Transition from gcr.io to a modern artifact repository #15199
Having studied the release script, I see pushing to both registries substantially increases duration and resource usage of the release pipeline. The advantage is unclear to me.
Would be good to consult what K8s is doing about this. @BenTheElder
They've already migrated. https://kubernetes.io/blog/2022/11/28/registry-k8s-io-faster-cheaper-ga/
registry.k8s.io is a multi-cloud hybrid system for funding reasons (that's a whole complicated topic ...), but we've also used the opportunity to move to basing Kubernetes's future image hosting on Artifact Registry; we hope to adopt some of the AR features at some point, like immutable tags. registry.k8s.io basically sits in front of AR and redirects some content download traffic to other hosts. The source code is not fully reusable at the moment (shipping reliably ASAP >>> flexible configuration), but the approach is hopefully well enough documented and relatively simple. I'm not sure what overall is most appropriate for etcd, other than I would recommend GCR => AR. It's mostly a drop-in upgrade.
I know that technically etcd isn't a kubernetes sig (right?), but it is CNCF, so maybe it should just use the kubernetes release pipeline, rather than creating a whole new one. I'd much rather we redefine the kubernetes release pipeline as the CNCF release pipeline than require every CNCF project to stand up their own. There is a pipeline for etcd already: https://github.com/kubernetes/k8s.io/tree/main/k8s.gcr.io/images/k8s-staging-etcd The process is described here (along with a background of why etcd is there - TL;DR: because it is bundled with k8s). This then becomes a shared problem (aka not etcd's problem), though of course anyone would be welcome to work on it. With artifacts.k8s.io, our dependency on gcr.io is pretty light anyway, and if the etcd project wants to maintain their own read-only mirror (e.g. if you have some money burning a hole in your pocket), then it's relatively easy to stand up an S3 / GCS / whatever bucket to do that.
@justinsb I agree that it's inefficient for each project to build their own pipeline, however I don't think it's as simple as just taking the K8s pipeline. The etcd image released through the Kubernetes pipeline is totally different from what etcd users would expect: it includes additional old etcd binaries and wrapper scripts for the purpose of running etcd in K8s. It would be great if CNCF gave us ready release tooling and maintained it for us, however the reality is that we mostly depend on contributions, and the etcd community is not large enough to support it on our own. I have escalated the problem of etcd release pipelines multiple times to both CNCF representatives and Kubernetes release people, but no luck. I'm stuck building etcd on my own laptop.
GHCR + GitHub Actions might be worth exploring as a potentially no-cost, automated, low-maintenance option. I think some SIG subprojects in Kubernetes have done so, but I don't have first-hand experience yet. I'm not sure Kubernetes is in a position to be offering to host the entire CNCF (considering our existing budget overruns ...), but for etcd in particular there is probably an argument to be made; we'd need to bring that to SIG K8s Infra and SIG Release. Otherwise, if Kubernetes is not actively hosting the infrastructure for you, I wouldn't recommend replicating all of it, especially if you're already understaffed. The approaches used are not without benefits, but they're also not free.
GitHub Container Registry isn't configured for IPv6 either.
Tell me more @serathius and I might be able to make that monkey paw finger curl 🙃
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
@justinsb @serathius @ahrtr Is this something that can be revisited these days?
etcd has already become a Kubernetes SIG. How do other SIGs maintain their images? Can we just follow a similar approach? We need someone to drive this effort.
Let's chat separately and see if I can help with this.
Going back to solving the immediate problem of GCR disappearing in March: can we just follow https://cloud.google.com/artifact-registry/docs/transition/auto-migrate-gcr-ar#migrate-gcrio-hosted-ar? Run the `gcloud artifacts docker upgrade migrate` command described there. It would be good to confirm this assumption and have someone test whether the command works as we expect. So: create a Docker registry in a GCP project, push a random image there, make it public, check that it can be downloaded, then migrate it, check that it can still be downloaded afterwards, and push new images the same way. Anyone interested in helping?
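A minimal sketch of that test process, assuming a scratch GCP project (the project ID, image names, and the `artifacts.PROJECT-ID.appspot.com` bucket naming are assumptions):

```shell
# Push a throwaway image to gcr.io to create a Container Registry repository
docker pull busybox:latest
docker tag busybox:latest gcr.io/my-scratch-project/migration-test:v1
docker push gcr.io/my-scratch-project/migration-test:v1

# Make the repository publicly readable (GCR access is controlled through its
# underlying Cloud Storage bucket)
gsutil iam ch allUsers:objectViewer gs://artifacts.my-scratch-project.appspot.com

# Verify an anonymous pull works, then run the migration
docker pull gcr.io/my-scratch-project/migration-test:v1
gcloud artifacts docker upgrade migrate --projects=my-scratch-project

# After migrating, verify that pulls and pushes still work through gcr.io
docker pull gcr.io/my-scratch-project/migration-test:v1
docker tag busybox:latest gcr.io/my-scratch-project/migration-test:v2
docker push gcr.io/my-scratch-project/migration-test:v2
```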
[NOTE: registry.k8s.io has not depended on gcr.io for a long time, it is on Artifact Registry + other hosts; long term this is a more cost-effective option in line with other SIG projects]
I've raised #19250 to propose a migration to artifact registry. We would still need to migrate existing images but there are helper utilities as mentioned by @serathius above for copying existing images across to the new repository.
/assign Based on the consensus in pull request #19250 and the 2025-01-21 community meeting, we lean towards following @serathius's suggestion.
I'm on it.
Uh, I don't think I'll be able to test. I created a new GCR repository, but GCP has already implemented the Artifact Registry mirroring for new repositories:

```console
$ gcloud artifacts docker upgrade migrate --projects=ivan-tests-449923
Artifact Registry is already handling all requests for *gcr.io repos for the provided projects. If there are images you still need to copy, use the --copy-only flag.
```

If we want to test this scenario, we'd need someone with an old and not yet migrated GCR repository.
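For reference, a hedged sketch of the copy-only path the tool hints at (the project ID is a placeholder):

```shell
# Copy any images that still live only in the legacy gcr.io storage into
# Artifact Registry, without changing how *gcr.io traffic is routed
gcloud artifacts docker upgrade migrate --projects=my-scratch-project --copy-only
```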
An alternative would be to do a canary rollout. The migration command supports redirecting only a percentage of read traffic to Artifact Registry, so we could do a gradual rollout process (for example, start by redirecting 1% of traffic and increase from there).
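A minimal sketch of what that canary step could look like, assuming the canary-read flag described in Google's migration docs (the flag name and percentage are assumptions to verify against the current gcloud documentation):

```shell
# Redirect roughly 1% of *gcr.io read traffic for the project to Artifact
# Registry while the rest keeps being served by Container Registry
gcloud artifacts docker upgrade migrate --projects=etcd-development --canary-reads=1
```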
One thing I'm not sure about yet is how to recognize whether an image was served by GCR or AR. I expect it should be visible in the response headers. What do you think about redirecting 1% of traffic now? In the worst case, a user will need to retry a request. I don't think we can test it otherwise. cc @BenTheElder @ahrtr @jmhbnz @ivanvc, what are your thoughts?
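A minimal sketch of one way this could be checked, assuming the redirect target in the `Location` header differs between the two backends (`<digest>` is a placeholder for a real blob digest):

```shell
# HEAD a public blob and look at the redirect target; the Location header
# shows which backend is actually serving the content
curl -sI "https://gcr.io/v2/etcd-development/etcd/blobs/sha256:<digest>" \
  | grep -iE '^(http|location)'
```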
Makes sense. Probably we can even set
I am not 100% sure about it either. Currently I see there is a warning on the console.
Another thing which needs to be handled super carefully is the removal of the legacy Container Registry storage. Based on this doc, it seems that the recommended way is to remove the Cloud Storage buckets for Container Registry directly. Probably we should do the storage cleanup after the legacy Container Registry is completely out of support (May 22, 2025), but we still need to double confirm that it won't affect the already migrated gcr.io repositories hosted on Artifact Registry.
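For reference, a hedged sketch of what that cleanup could look like once Container Registry is out of support, assuming GCR's documented bucket naming of `artifacts.PROJECT-ID.appspot.com` (verify the actual bucket list before deleting anything):

```shell
# List the legacy Container Registry storage buckets for the project
gcloud storage ls --project=etcd-development | grep 'artifacts\..*appspot\.com'

# Remove a legacy bucket and everything in it (irreversible; only once the
# migration to Artifact Registry is confirmed complete)
gcloud storage rm --recursive gs://artifacts.etcd-development.appspot.com
```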
I think doing a 1% trial is fine, but I think you should be free to just move it; eventually the migration will be forced anyhow. If someone complains at 50%, are you actually going to turn it off? Then what? GCR turndown was announced a long time ago and will actually begin on March 18th, so there's not a lot of room to delay. https://cloud.google.com/artifact-registry/docs/transition/prepare-gcr-shutdown We've been very explicit with registry.k8s.io that we won't be beholden to users depending on implementation details of the host versus a public OCI registry, and that there can be no SLA as a free, volunteer-staffed content host: https://registry.k8s.io#stability If users are really serious about uptime, they need to use a mirror / pull-through cache or a distro-provided mirror (which we provide docs/guidance for). I think the situation is similar here.
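Purely as an illustration of the pull-through-cache option mentioned above (not something the etcd project provides; the port and container name are arbitrary), a local mirror could be stood up with the stock registry image:

```shell
# Run a local pull-through cache in front of gcr.io using the stock registry image
docker run -d --name gcr-mirror -p 5000:5000 \
  -e REGISTRY_PROXY_REMOTEURL=https://gcr.io \
  registry:2

# Pull through the mirror instead of hitting gcr.io directly
docker pull localhost:5000/etcd-development/etcd:v3.5.18
```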
Artifact Registry doesn't depend on your GCR GCS bucket once the content is migrated.
This is slightly leaky, so you can check where requests are actually served from. The traffic to the GCS bucket is also observable in the Cloud Console, IIRC. Er: this is using https://github.com/google/go-containerregistry/tree/main/cmd/crane, not the only option, but I find it helpful for this sort of debugging so you can see the requests etc.
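A hedged sketch of the kind of crane invocation being described (the image reference is illustrative; crane's verbose flag logs every HTTP request and redirect to stderr):

```shell
# Fetch a manifest with verbose logging; the logged requests and redirects show
# whether content comes from the legacy GCS bucket or from Artifact Registry
crane manifest gcr.io/etcd-development/etcd:v3.5.18 --verbose > /dev/null
```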
Either way works. To keep it simple, let's just move it (without using the canary rollout).
Thanks for the confirmation.
@ivanvc let's just issue the command below today and check the status tomorrow? We have another container registry as well.
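The referenced command wasn't captured above; based on the earlier discussion it was presumably the full migration, roughly along these lines (the project list is an assumption):

```shell
# Route all *gcr.io traffic for the project backing gcr.io/etcd-development
# to Artifact Registry
gcloud artifacts docker upgrade migrate --projects=etcd-development
```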
Hey all, I had a busy morning and couldn't reply earlier. By now, it's probably late in the evening for @ahrtr and @serathius. I agree with @BenTheElder. Even if things fail, we don't have an alternate plan, as the shutdown is coming soon, and there's nothing else we can do. However, I still feel like the safest course of action would be to first do the 1% canary just to triple-check that everything works fine (and that we don't have configuration issues). Then, move forward with it. @ahrtr (hopefully you see this today), are you okay with this?
I could finally reproduce the steps to migrate an old GCR repo into AR. I tried with a 1% and a 10% canary, later ran the whole migration in one of my repositories, and it worked fine. I also tested pulling the images. It doesn't hurt to do a canary deployment in etcd-development while I verify the location and that the images work as expected. So, I'm enabling it in the meantime.
I enabled it with 1%. I'm seeing that some of my requests are being redirected to AR. I'm testing with a blob from our latest release (v3.5.18): https://gcr.io/v2/etcd-development/etcd/blobs/sha256:b9e6889272c9e672fa749795344385882b2696b0f302c6430a427a4377044a7a Following the redirect, it returns a 200. So, there aren't any permission issues. I'll open the floodgates 😄 (route all traffic as asked by @ahrtr).
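A small sketch of that kind of check using the blob URL quoted above (`-L` follows the redirect, so a final status of 200 means the blob is downloadable end to end):

```shell
# Follow the redirect for the blob and print only the final HTTP status code
curl -sL -o /dev/null -w '%{http_code}\n' \
  "https://gcr.io/v2/etcd-development/etcd/blobs/sha256:b9e6889272c9e672fa749795344385882b2696b0f302c6430a427a4377044a7a"
```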
And it works as expected. I tested several 3.5 and 3.4 images, which run fine in Docker. Also, I checked with
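The tool used for the extra check isn't captured above; a hedged sketch of that kind of smoke test (the tags are examples from the 3.4/3.5 lines, and it assumes the etcd binary is on the image's PATH):

```shell
# Pull a couple of released images through gcr.io and run the etcd binary
# inside them to confirm the migrated registry serves working images
for tag in v3.5.18 v3.4.35; do
  docker pull "gcr.io/etcd-development/etcd:${tag}"
  docker run --rm "gcr.io/etcd-development/etcd:${tag}" etcd --version
done
```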
Great news. Thanks @ivanvc
Yesterday, we published the v3.6.0-rc.0 release. It was the first time we pushed to the AR-backed GCR, and it worked as expected with no issues. So, the download and upload paths are now thoroughly tested. I believe we can close this issue now. If you think otherwise, we can reopen it. Happy Valentine's 💟 ✌
What would you like to be added?
The Google Container Registry is deprecated. Transitioning within the Google ecosystem to its Artifact Registry is described at https://cloud.google.com/artifact-registry/docs/transition/transition-from-gcr.
Alternatively, only use Quay.
Why is this needed?
A pressing problem this would solve is that the Artifact Registry is reachable over IPv6, whereas the Container Registry isn't.