[Bug] Aggregator crashing on Kubecost #72
@aaj-synth Thank you for reporting this issue. I've got an engineer looking at this today.
Same thing for me as well, but with a different error. Full error:
@passionInfinite @aaj-synth I'm curious: does a downgrade to 2.1 resolve the issue?
Downgrading from 2.2 to 2.1 does not help. I had to go back to 1.108 to get things working again.
Downgrading to v2.1.0 produces this error:
@aaj-synth For me it is still failing on 1.108.1. Did you do something specific to get it working again? It looks like the PV files are corrupted 🤔
Hi @passionInfinite, could this be a separate issue with permissions on the PV? Can you open a ticket with support?
I upgraded from v1.108.0 to v2.1.0 and it worked fine. In the meantime, I saw the blog post about v2.2.0 being released, and as soon as I upgraded to that, things stopped working. I tried downgrading to v2.1.0, but that ran into the same error that I mentioned in the issue. I eventually downgraded to v1.108.0 and just removed the
I can confirm I am facing this too on Kubecost 2.1. I upgraded from 1.103.5 to 2.1.1.
@rahul-chr Did you upgrade directly from v1.103.5 to v2.1.1, with no other upgrades or downgrades along the way before seeing that error?
@aaj-synth Downgrades can sometimes be tricky when going between particular versions of v2.X. We're working on making this not a problem. While we're working on that, if you'd like to get back to v2.1 or try getting onto v2.2 again, please remove the The command you would run is this, assuming Kubecost is installed in the
@rahul-chr I'm not confident that your problem is the same as @aaj-synth's problem. If you're willing to experiment, trying the same command above might help you, but it also might not.
Also, @aaj-synth and @rahul-chr, do Kubecost's PV(C)s have enough space on them? Are any of them filling up or full?
@michaelmdresser In my case, I found out that the permissions on the PV-mounted folder got changed to root for some reason, but the newer version uses the
@michaelmdresser By any chance, does etlUtils run as root? 🤔
@michaelmdresser I attached the volume to another test pod and checked the permissions of the v1.106.5 (Current Version) -> v1.107.1 Please correct me if anything in my reasoning is wrong.
@passionInfinite Thank you for the extra information. Please open a separate issue to track the file permission problems you have encountered. We are using this issue to track the original issue and related problems:
I attempted an upgrade directly from v1.103.5 to v2.1.1 without incident. I suspect this issue is limited to situations where downgrades have occurred. |
@michaelmdresser Yes, that was a direct upgrade, with no downgrades.
Fascinating, we're trying to look further into this.
@rahul-chr Are you using Aggregator in a StatefulSet configuration? If so, the command I gave you is slightly wrong and needs to be modified like so:
Thank you for your response, @michaelmdresser! But it looks like this isn't helping either:
error: unable to upgrade connection: container not found ("aggregator")
@rahul-chr Ah, shucks. I'm guessing that's because it's crash looping. To
I apologize for the trouble here. This is an unusual error situation.
@michaelmdresser I think there is still an obvious problem.
But I tweaked it further and removed the below
It works now. Also, do you think this is a potential bug in this upgrade?
Ah, thanks for the reminder about that bit of the volume configuration. Thanks for your patience.
The command works, great! After removing the
Is this question about the original bit of this GH Issue, which is
Nope, this is specific to my issue. Do you want me to open a GitHub issue for that? I'm also worried about whether I can safely apply this workaround (removing DuckDB) in production.
If you're running into a new bug, please do open a new issue.
Don't worry! DuckDB files are not a "source of truth". Aggregator builds up its datastore from what we call "ETL" files, which are stored either in object storage (e.g., S3, GCS) or in a different folder in the PV, depending on your configuration. Removing the
Does not appear to be an issue with the Helm chart. Transferred to the correct repository. |
@AjayTripathy This is marked as completed. Do you know which version the fix was released in? Thanks :)
2.2.5. Let me check on what's going on in #103, though.
Kubecost Helm Chart Version
2.2.0
Kubernetes Version
1.29
Kubernetes Platform
EKS
Description
While trying to update Kubecost from v2.1.0 to v2.2.0, the aggregator container in the kubecost-analyzer pod started going into CrashLoopBackOff with the error pasted below.
Steps to reproduce
Expected behavior
The update was expected to succeed, but it threw this error instead.
Impact
No response
Screenshots
No response
Logs
Slack discussion
No response
Troubleshooting