Why do my Grafana Tempo ingester pods go into Back-off restarting state after max_block_duration? #2488
Can you share the k8s reason they restarted? OOM? Unexpected exit? If it's an unexpected exit, the logs would be helpful. If it's an OOM, try increasing the memory limit.
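To pin down the restart reason, something like this should work (the pod name is a placeholder; the monitoring namespace is taken from later in this thread):

```sh
# Show the previous container state (look for OOMKilled vs. Error)
kubectl -n monitoring describe pod <ingester-pod-name> | grep -A 5 'Last State'
# Or pull just the last termination reason
kubectl -n monitoring get pod <ingester-pod-name> \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
```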
Hi @joe-elliott, thanks for replying. I can attach a screenshot of my Tempo ingester pod logs.
These were basically the logs.
Can you please attach the logs from when it crashed?
Hi @joe-elliott, I can't get them now because yesterday I uninstalled and reinstalled Tempo. But I remember it was only this.
@joe-elliott As mentioned, the ingester now works fine (without a single restart), but I have also seen that my compactor fails and keeps restarting: it runs for some minutes, goes into Back-off restart, then comes up again, on and off. Here are my compactor pod logs:
What can be the issue with this?
I'm struggling to understand your issue. Perhaps you could give the output of
Hi @joe-elliott, thanks!
Good to hear. If your compactors are having issues OOMing, I would also consider using
Closing this issue. Thanks for the report!
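If the compactor is OOMing, one knob is the per-component memory limit in the chart values. A minimal sketch, assuming the tempo-distributed chart's per-component resources support; the sizes are illustrative, not recommendations:

```yaml
# values.yaml for the grafana/tempo-distributed chart
compactor:
  resources:
    requests:
      memory: 1Gi   # illustrative; size to your workload
    limits:
      memory: 2Gi   # raising this is the usual first response to OOMKilled compactors
```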
Many thanks @joe-elliott for your help!
Can anyone share their Grafana Tempo configuration for using Azure Blob Storage (access key with a SOPS secret)?
Hi @nishasati6oct, here is a sample config from the docs itself.
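For reference, the Azure backend section of a Tempo config looks roughly like this (field names as in the Tempo storage docs; the account and container values are placeholders, and the env-var reference assumes Tempo is started with -config.expand-env=true):

```yaml
storage:
  trace:
    backend: azure
    azure:
      container_name: tempo-traces            # placeholder
      storage_account_name: mystorageaccount  # placeholder
      storage_account_key: ${STORAGE_ACCOUNT_KEY}  # injected from a secret via extraEnv
```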
This is my config: a SOPS secret with Azure Key Vault (apiVersion: v1). I am using SOPS to encrypt storage-account-key, and the HelmRelease is failing because it cannot find storage-account-key (error: field storage-account-key not found in type config.Config). Can you please suggest if something is misconfigured here?
@nishasati6oct can you paste the exact error in a more readable way?
kubectl logs grafana-tempo-compactor -n monitoring
Instead of creating a secret like you mentioned above, please create it with an underscore, like below, and adjust it in extraEnv as well, then give it a try.
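A minimal sketch of that shape (the secret name and the component shown are assumptions for illustration):

```yaml
# Secret whose data key uses an underscore, not a hyphen
apiVersion: v1
kind: Secret
metadata:
  name: tempo-azure-secret   # assumed name
  namespace: monitoring
type: Opaque
stringData:
  storage_account_key: <your-access-key>
```

And in the chart values, reference it per component (note that env var names cannot contain hyphens):

```yaml
compactor:
  extraEnv:
    - name: STORAGE_ACCOUNT_KEY
      valueFrom:
        secretKeyRef:
          name: tempo-azure-secret
          key: storage_account_key
```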
This actually worked finally :) Thank you so much for your support. (My working HelmRelease starts with apiVersion: helm.toolkit.fluxcd.io/v2beta1.)
One last question: can we define the below section globally instead of defining it for every component?
@nishasati6oct glad it worked.
I know it is a little hectic to provide it multiple times, but it is actually a requirement: as a developer you may need to define env variables explicitly for one Tempo component and not all of them. That is why we have to pass this configuration for every component; the Helm chart is made that way (though see the sketch below for one way to reduce the repetition). I hope you have masked your actual Azure storage values in the discussion above.
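One way to cut down the repetition within a single values file is a plain YAML anchor, which Helm values files support since they are ordinary YAML (names are illustrative, and this assumes the chart tolerates the extra top-level key):

```yaml
# Define the env list once with an anchor...
x-tempo-env: &tempoEnv
  - name: STORAGE_ACCOUNT_KEY
    valueFrom:
      secretKeyRef:
        name: tempo-azure-secret
        key: storage_account_key

# ...then alias it for each component
compactor:
  extraEnv: *tempoEnv
ingester:
  extraEnv: *tempoEnv
querier:
  extraEnv: *tempoEnv
```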
Thank you for your prompt reply. Yes, those are not the real values of the storage account and key. :)
I am using the grafana-tempo distributed Helm chart. It is successfully deployed, its backend is configured on Azure Storage (blob containers), and it is working fine.
I have a demo application which is sending traces to grafana-tempo, and I can confirm I'm receiving traces.
The issue I have observed is that exactly after 30m my ingester pods go into Back-off restarting state, and I have to manually restart the StatefulSet.
While searching for the root cause, I found that there is a parameter max_block_duration which has a default value of 30m: "max_block_duration: maximum length of time before cutting a block."
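For reference, the setting lives under the ingester block of the Tempo config (a minimal sketch; 30m is the documented default):

```yaml
ingester:
  max_block_duration: 30m   # maximum length of time before cutting a block
```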
So I tried increasing the duration, giving it a value of 60m. Now after 60 minutes my ingester pods go into Back-off restarting state.
I have also enabled autoscaling, but no new pods come up when all ingester pods are in the same error state.
Can someone help me understand why this is happening, and what the possible solution is to eliminate the issue?
What value should be passed to max_block_duration so that these pods do not go into Back-off restarting?
Expectation: I expect my ingester pods to run fine every time.