cannot open the file /var/lib/netdata/health.silencers.json #45
I just can't replicate this. I see the following dirs and permissions on my master:
What do you see?
The node goes into CrashLoopBackOff.
I don't know how to get a shell into a crashed pod. Is there a way?
OK, it was the storage class. I have the pod running. I'll check the rest later, but I'm sure it will be fine. Thanks for the quick response and sorry for wasting your time!
I have encountered the same problem. Could you please tell me how to solve it? Why is there no solution?
@happysalada said it was the storage class, so I expect he had to modify the value
I reopened it, so you can verify it works for you too. If this is a common problem, perhaps we should disable persistence by default (i.e. set
@ktsakalozos gave a much better solution at #58, which should fix this. Waiting for feedback that it works with chart version 1.1.10.
I keep getting
on the slaves and the pods won't start.
If you're getting these errors from the slaves, then the master db and alarms persistent volume configs are irrelevant. I checked some things and I do see something that's not right: the file containing the GUID in the slaves was somehow created by root! On the master, and on any normal installation, that file should be created by user netdata. Not sure if it's related, but it's definitely a suspect. Posting the commands in the next comment.
On the master, owned by user netdata
On the slave, owned by root!
But the
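For anyone who wants to repeat the ownership comparison described above, here is a minimal sketch. It is not the set of commands used in this thread; `file_owner` and `owned_by` are hypothetical helper names, and the path you pass in is whatever file you want to inspect inside the pod.

```python
import os
import pwd


def file_owner(path: str) -> str:
    """Return the name of the user owning `path`.

    Falls back to the numeric UID as a string when the UID has no
    passwd entry, which can happen inside minimal containers.
    """
    uid = os.stat(path).st_uid
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def owned_by(path: str, expected_user: str) -> bool:
    """True if `path` is owned by `expected_user` (e.g. "netdata")."""
    return file_owner(path) == expected_user
```

Calling `owned_by(some_path, "netdata")` on the master and on a slave would show the mismatch described above, assuming the same file is checked in both places.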
Ok, so I see the issue. I'm writing it down here to clear it up in my own mind and for you to validate the solution: https://github.com/netdata/helmchart/blob/master/templates/daemonset.yaml#L70
What this is supposed to do is cheat a bit, so that new pods can get the same MACHINE_GUID as older pods running on the same node. It's not at all important that they do so; it was meant to help the netdata registry not consider every pod restart a new machine. I really mucked this one up though:
There's a similar line on the Statefulset template as well, which by mistake has it on
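The intent described above (a restarted pod picking up the GUID an earlier pod left on the same node) can be sketched roughly as follows. This is only an illustration of the idea, not the chart's actual template logic; `load_or_create_guid` is a hypothetical name, and the path is assumed to live on storage that survives pod restarts.

```python
import os
import uuid


def load_or_create_guid(path: str) -> str:
    """Reuse the GUID stored at `path`, or create and persist a new one.

    `path` is assumed to be on node-local storage that outlives the pod.
    """
    try:
        with open(path, "r", encoding="utf-8") as fh:
            existing = fh.read().strip()
        uuid.UUID(existing)  # raises ValueError for empty or malformed content
        return existing
    except (FileNotFoundError, ValueError):
        pass  # no usable GUID yet, fall through and create one

    guid = str(uuid.uuid4())
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(guid + "\n")
    return guid
```

Note that the process writing the file determines who owns it, so if this step runs as root while the daemon runs as another user, the daemon may not be able to rewrite the file later.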
I remembered why I needed that persistent machine GUID. Without it, the master database engine will create new DB files for those pods. So every time you restart a pod, you will lose all the history. That's no good. So I will try to find another way to fix this.
There's no easy way to do this, UUIDs are supposed to be unique. I'm removing them and we'll need to find some other way for the masters' database to keep those pods' long-term history in the same db instance, even after a restart.
PR with the fix merged @masterkain, please test.
Thanks @cakrit, it seems to be up and running. I have one last question, if I may: I'm trying to run netdata in a linkerd-enabled namespace, and although the master picks up the mesh extra containers, the slaves won't. Why is that? I even tried
With netdata 1.17 I get a new issue on master
Here is my values.yml (I have just updated the image tag, and the storage class for the volumes)
I'm using rook and cephfs for the volumes, let me know if you want the details.