[Bug] Aggregator pod crashes intermittently #58
Comments
cc @cliffcolvin can you take a look here? Has this been fixed in the upcoming 2.1 rc's? |
Transferred. |
We're taking a look right now. |
@tabossert do you have any further log context from this crash? About 5 lines after and 15-20 lines preceding would help me here. |
`goroutine 2513173 [runnable]: goroutine 2513177 [sync.Mutex.Lock]: goroutine 2513180 [runnable]: rax 0x0 |
@tabossert That's helpful, thank you for the quick response. I'm looking for the first instance of the If you'd like, I can make it easier on you -- you can share the log file with me privately via email: michael@kubecost.com |
To clarify: I need more log context to understand what's going wrong here. Please either share a full log file or share the requested first trace + surrounding context I mentioned above. |
Email sent with full log @michaelmdresser |
Thank you @tabossert. I have a pretty strong theory about what's going wrong here -- there are a few different resolution paths if this is what I think it is. If you are willing to try a pre-production release, please upgrade to Kubecost Otherwise, if you would like to stay on
|
Thanks, we will try those workarounds until the v2.1.0 is released. Thanks for the quick response! |
I tried upgrading to 2.1.0-rc6, but it didn't seem to load the data, so I'm not sure if I missed something. I then tried to go back to 2.0.2, but now it gives me this error |
Actually just did an upgrade to 2.1.0 that was just released and that seems to be loading, will report back if the crashes stop |
Thanks for the update and sorry for the confusion about the back-and-forth upgrade. Please let us know if you run into trouble with 2.1.0. |
Issue seems to be resolved, thanks! |
Hello everyone! @michaelmdresser I am experiencing a similar issue on a GKE cluster in version
|
I have resolved the issue above according to this message: #72 |
Kubecost Helm Chart Version
2.0.2
Kubernetes Version
1.27
Kubernetes Platform
AKS
Description
The Kubecost pod restarts intermittently due to an error in the aggregator pod, as seen below.
We have tuned resources as much as possible so it doesn't seem to be related to OOM or disk slowness.
Steps to reproduce
Expected behavior
Pod would not be restarting
Impact
Our scripts to pull data out fail when this happens
Screenshots
No response
Logs
Slack discussion
No response
Troubleshooting