Spegel seems not to be working in EKS #503
I realize these logs only contain 200 responses, so here is a snippet from another cluster:
It seems there is partial success in replicating image layers. Could it be that the problems occur during node startup? It is also possible that this is the suspected "EKS bug", see #469 (comment). There are two things you can do to help us pin down this issue:
Also, just to be sure: are you certain that the pods failing to start are failing because of docker.io rate limiting? I ask because one of the hypotheses about the "EKS bug" is that EKS's containerd was doing something special when talking to ECR.
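For reference, rate limiting can also be confirmed independently of Spegel by querying Docker Hub's rate-limit headers; this uses Docker's documented `ratelimitpreview/test` repository and assumes `curl` and `jq` are available:

```sh
# Fetch an anonymous pull token for Docker Hub's rate-limit preview repository.
TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" | jq -r .token)

# A HEAD request against the manifest returns the current quota in the
# ratelimit-limit and ratelimit-remaining response headers.
curl -sI -H "Authorization: Bearer $TOKEN" \
  "https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest" \
  | grep -i 'ratelimit'
```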
I am not sure if this node is the mirroring node or the one being mirrored, but it has far fewer errors than the others, and they are not the same errors.
I tried spinning up a new node today and I think all the logs looked good.
So I guess spegel is working as intended?
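For anyone verifying this themselves: a minimal sanity check is to confirm on the node that spegel has written containerd mirror configuration. The path below assumes containerd's default registry config directory:

```sh
# On a node, confirm spegel has registered itself as a containerd mirror
# for docker.io. Expect a [host."http://..."] entry pointing at the local
# spegel registry.
cat /etc/containerd/certs.d/docker.io/hosts.toml
```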
As I am upgrading the nodes to a newer version (1.28) at the same time as I am updating spegel from 0.0.14 to 0.0.22, could there be an issue with communication between the older spegel pods and the newer ones?
In theory, no, there should not be an issue: they are both running OCI-compliant registries. There are, however, no tests verifying that this actually works, so I would say I am 90% sure it should work.
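A rough way to sanity-check cross-version compatibility is to hit the OCI distribution API version endpoint on a pod of each version; any compliant registry must answer `/v2/` with HTTP 200. The namespace, pod name, and registry port 5000 below are assumptions about the deployment:

```sh
# Forward the registry port of a spegel pod (pod name is a placeholder).
kubectl -n spegel port-forward pod/<spegel-pod> 5000:5000 &

# An OCI distribution-compliant registry answers the version check with 200.
curl -si http://127.0.0.1:5000/v2/ | head -n 1
```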
Right now I am experiencing a catch-22: nodes can't pull from docker.io because we have exhausted the number of allowed pulls, so the Calico pods can't start, and spegel won't start because Calico is not up.
As I drained the older nodes, spegel seems to have elected a new leader, as far as I can tell from these logs.
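Since the election is backed by a Kubernetes Lease object, the current leader can be checked directly; the `spegel` namespace below is an assumption about the install:

```sh
# The HOLDER column shows which pod currently owns the election lease.
kubectl -n spegel get lease
```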
Spegel relies on the CNI working, so there is no real way to solve this.
Ok, I've configured the Calico image pull to use Docker credentials; spegel can then hopefully handle any other images if needed.
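For anyone hitting the same bootstrap deadlock, a sketch of that kind of workaround is below; the `calico-system` namespace, the daemonset name, and the credentials are placeholders that depend on how Calico was installed:

```sh
# Create a Docker Hub pull secret in Calico's namespace.
kubectl -n calico-system create secret docker-registry dockerhub-creds \
  --docker-server=https://index.docker.io/v1/ \
  --docker-username=<user> \
  --docker-password=<access-token>

# Attach it to the calico-node daemonset so its image pulls are
# authenticated and no longer subject to anonymous rate limits.
kubectl -n calico-system patch daemonset calico-node --type merge \
  -p '{"spec":{"template":{"spec":{"imagePullSecrets":[{"name":"dockerhub-creds"}]}}}}'
```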
Spegel version
0.0.22
Kubernetes distribution
EKS
Kubernetes version
1.28
CNI
Calico
Describe the bug
We are in the process of upgrading our clusters to 1.28, and in doing so we are also upgrading spegel from 0.0.14 to 0.0.22 and the nodes from 1.27.9-20240117 to 1.28.8-20240514. We previously encountered #350, so we are aware of the EKS-specific settings now needed, which are documented at https://github.com/spegel-org/spegel/blob/main/docs/COMPATIBILITY.md. We modified our current userdata and added the suggested EKS settings.
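For context, the documented EKS change boils down to keeping unpacked image layers on disk so spegel can serve them to peers. A minimal sketch of what the userdata addition can look like follows; the exact mechanics depend on the AMI and bootstrap flow:

```sh
# In node userdata: EKS AMIs default to discarding unpacked layers,
# which leaves spegel with nothing to serve. Flip the setting and
# restart containerd before workloads are scheduled.
sed -i 's/discard_unpacked_layers = true/discard_unpacked_layers = false/' \
  /etc/containerd/config.toml
systemctl restart containerd
```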
After trying this on a test cluster, it seems spegel is not working, although I am not sure. The most obvious sign is that after draining some nodes and starting new ones, pods fail to start because we have exceeded our pull limits to docker.io.
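One way to confirm that rate limiting, rather than something else, is the cause is to look for Docker Hub's `toomanyrequests` error in pod events:

```sh
# Docker Hub rate limiting surfaces as "toomanyrequests: You have reached
# your pull rate limit" in failed image pull events.
kubectl get events -A --field-selector reason=Failed | grep -i toomanyrequests
```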
Here are some logs pulled from one of the spegel pods.
After this it mainly repeats the last log lines.
I have compared the output with the FAQ, but the logs don't seem to match well. This is the only pod that successfully acquires a lease, which might indicate it has been selected as the leader?
I am quite new as a cluster administrator, so it is very possible that I have done something wrong in our setup, but I would appreciate some help with understanding the logs and how to proceed.