
Create a simple k8s job that can install or upgrade Gateway API CRDs #2678

Open
robscott opened this issue Dec 14, 2023 · 11 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@robscott
Member

What would you like to be added:
We could create a simple Kubernetes Job that could be bundled with implementations to install Gateway API CRDs if they don't already exist. This Job would take the following configuration:

  • Desired bundle version
  • Desired release channel
  • Optional: Desired subset of CRDs

This would need to have the following logic for each Gateway API CRD:

  1. If the Gateway API CRD exists:
    a. Skip or error if the existing CRD is from a different release channel or is missing the expected bundle version or release channel labels
    b. Upgrade to the configured bundle version if the existing CRD has an older version
    c. Skip if the existing CRD's version is >= the version configured for the Job
  2. If the Gateway API CRD does not exist in the cluster, install it.

All of this could theoretically be built with the registry.k8s.io/kubectl image.
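
For illustration, a minimal sketch of such a Job, assuming the kubectl image provides a shell (distroless builds may not, in which case a busybox-style image with kubectl would be needed) and assuming the `gateway.networking.k8s.io/bundle-version` and `gateway.networking.k8s.io/channel` annotations that the release manifests set on each CRD. The image tag, ServiceAccount, and single-CRD check are illustrative only:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: gateway-api-crd-install
spec:
  backoffLimit: 2
  template:
    spec:
      # Assumes a ServiceAccount bound to a ClusterRole that allows
      # get/create/update/patch on customresourcedefinitions.
      serviceAccountName: gateway-api-crd-installer
      restartPolicy: Never
      containers:
      - name: install
        image: registry.k8s.io/kubectl:v1.29.0  # illustrative tag
        command: ["/bin/sh", "-c"]
        args:
        - |
          set -eu
          WANT="v1.0.0"        # desired bundle version
          CHANNEL="standard"   # desired release channel
          # Check one CRD as a proxy for the whole bundle; a full version
          # would loop over the configured subset of CRDs.
          CRD="gatewayclasses.gateway.networking.k8s.io"
          chan=$(kubectl get crd "$CRD" \
            -o jsonpath='{.metadata.annotations.gateway\.networking\.k8s\.io/channel}' \
            2>/dev/null || true)
          if [ -n "$chan" ] && [ "$chan" != "$CHANNEL" ]; then
            echo "existing CRDs are from channel '$chan'; refusing to touch them" >&2
            exit 1
          fi
          have=$(kubectl get crd "$CRD" \
            -o jsonpath='{.metadata.annotations.gateway\.networking\.k8s\.io/bundle-version}' \
            2>/dev/null || true)
          # Skip if the existing CRDs are already at or beyond the desired version.
          if [ -n "$have" ] && \
             [ "$(printf '%s\n%s\n' "$have" "$WANT" | sort -V | tail -n1)" = "$have" ]; then
            echo "CRDs already at $have (>= $WANT); nothing to do"
            exit 0
          fi
          # Install or upgrade to the desired bundle version and channel.
          kubectl apply -f "https://github.com/kubernetes-sigs/gateway-api/releases/download/${WANT}/${CHANNEL}-install.yaml"
```

A complete version would also implement the skip-or-error choice from step 1a when the expected annotations are missing entirely.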

Why this is needed:
Many implementations want to have an easy way to bundle CRDs with their installation, but they also don't want to conflict with other installations of Gateway API in the cluster. This could provide a reasonably safe mechanism to ensure that CRDs are present and at a minimum version. This could also be bundled in a Helm chart #1590 to bypass some of the limitations of including CRDs directly in a Helm chart.
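
As a sketch of the Helm angle: the Job could be annotated as a pre-install/pre-upgrade hook so the CRDs are in place before the chart's own resources are applied (using Helm's documented hook annotations; names are illustrative):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}-gateway-api-crds
  annotations:
    # Run before the chart's resources are installed or upgraded,
    # and remove the previous hook Job before creating a new one.
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "-5"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: install
        image: registry.k8s.io/kubectl:v1.29.0  # illustrative; body as in the Job sketch above
```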

Note: This is not ready to work on yet. We first need to get some feedback on this idea to ensure that it actually makes sense before starting any development.

@danehans
Contributor

Since the CRDs are shared resources, what safeguards does this approach provide to ensure the Job does not cause breakage among different implementations? For instance, implementation A runs the Job to install version X of the CRDs and later implementation B runs the Job to install version Y of the CRDs. If the schema changes between X and Y versions, a conversion will need to take place, correct?

@robscott
Member Author

You're completely right @danehans. To make this safe, we'd need to establish some guardrails that could be fairly limiting. I think the only way to provide safe installation and upgrades would be to limit this to installing newer versions of CRDs included in the standard channel. If an experimental CRD were present, it's possible that an upgrade could result in a breaking change.

I think the MVP for this would need to be limited to the standard channel since it provides strong backwards compatibility guarantees.

In the future, we'd probably want to extend this to experimental, but that would require more advanced logic, including:

  • Awareness of which upgrade paths contain breaking changes and can't be automatically upgraded
  • Understanding of storage versions and if/when an upgrade would fail due to resources being on old storage versions
  • Maybe some kind of option to force an upgrade and/or override certain safeguards, but that may defeat the whole purpose
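
For instance, the storage-version check could be as simple as refusing to proceed while `.status.storedVersions` still records an old API version (a sketch; the CRD and version names are illustrative):

```sh
# Sketch: refuse to upgrade while resources may still be stored at an old API version.
stored=$(kubectl get crd httproutes.gateway.networking.k8s.io \
  -o jsonpath='{.status.storedVersions}')
case "$stored" in
  *v1alpha2*)
    echo "storage version v1alpha2 still recorded; migrate stored objects first" >&2
    exit 1
    ;;
esac
```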

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 19, 2024
@networkhermit
Contributor

Taken from my comment in #2951 (comment)

I'm not sure whether using a Job to bootstrap the Gateway API CRDs is possible before the CNI is ready, as is the case when bootstrapping Cilium to use its Gateway API support. I'm testing different implementations to learn Gateway API better.

@robscott
Member Author

robscott commented Apr 9, 2024

I'd always assumed that Cilium's Envoy-based Gateway API implementation was deployed separately from the CNI. @sayboras, can you confirm whether this approach would be problematic for Cilium?

@sayboras
Contributor

sayboras commented Apr 9, 2024

I'd always assumed that Cilium's Envoy-based Gateway API implementation was deployed separately from the CNI

Yes, you are correct. Gateway API provisioning is part of the Cilium Operator, which is separate from the Cilium Agent and Cilium CNI components.

Can you confirm whether this approach would be problematic for Cilium?

I don't think there will be any problem, for the reasons mentioned above.

@networkhermit
Contributor

@sayboras Hello!

https://github.com/cilium/cilium/blob/d913b6298123064f51a8b97495f956b5ebbe62b7/install/kubernetes/cilium/templates/cilium-gateway-api-class.yaml#L1-L11

When users use the Helm chart to bootstrap the Cilium CNI with gatewayAPI.enabled in a new cluster, is the default cilium GatewayClass the only missing resource if the Gateway API CRDs were not installed beforehand?

I currently use a multi-step installation process:

  1. install Cilium with gatewayAPI support disabled
  2. use fluxcd to install the Gateway API CRDs
  3. update the Cilium Helm values to enable gatewayAPI and finish setting up Gateway API support

Is it equivalent to the following approach?

  1. install Cilium with gatewayAPI support enabled in the first run
  2. use fluxcd to install the Gateway API CRDs and the cilium GatewayClass


@robscott More specifically, does this mean that in the future the Cilium Helm installation method would embed the Gateway API CRD bootstrap/upgrade k8s Job?

@sayboras
Contributor

sayboras commented Apr 9, 2024

Is it equivalent to the following approach?

Not exactly equivalent; however, once cilium/cilium#29207 is done, the installation process will be easier (though you might still need to provision the Cilium GatewayClass outside of the Helm chart).
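
For reference, provisioning the GatewayClass outside the chart is a single small manifest along these lines (a sketch; the controllerName matches the chart template linked earlier, and older Cilium releases may use apiVersion v1beta1):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: cilium
spec:
  # Must match the controller name Cilium's operator watches for.
  controllerName: io.cilium/gateway-controller
```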

@networkhermit
Contributor

Is it equivalent to the following approach?

Not exactly equivalent; however, once cilium/cilium#29207 is done, the installation process will be easier (though you might still need to provision the Cilium GatewayClass outside of the Helm chart).

I see. If we use a k8s Job (as discussed in this issue) to install the Gateway API CRDs, and given that cilium/cilium#29207 is done, then basically this k8s Job and the Cilium Helm bootstrap can be started in parallel and both will eventually be installed, without leaving the k8s Job in a pending state. Is my understanding correct?

@sayboras
Contributor

sayboras commented Apr 9, 2024

I see. If we use a k8s Job (as discussed in this issue) to install the Gateway API CRDs, and given that cilium/cilium#29207 is done, then basically this k8s Job and the Cilium Helm bootstrap can be started in parallel and both will eventually be installed, without leaving the k8s Job in a pending state. Is my understanding correct?

Gateway API provisioning is part of the Cilium Operator, which is separate from the Cilium Agent and Cilium CNI components, so any pod will be scheduled regardless of whether the Gateway API CRDs are installed. The work mentioned in cilium/cilium#29207 is to improve the user experience and avoid a manual Cilium Operator restart.

@networkhermit
Contributor


Thanks for the above and previous clarification.
