
Create a simple k8s job that can install or upgrade Gateway API CRDs #2678

Open
robscott opened this issue Dec 14, 2023 · 11 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@robscott
Member

What would you like to be added:
We could create a simple Kubernetes Job that could be bundled with implementations to install Gateway API CRDs if they don't already exist. This Job would take the following configuration:

  • Desired bundle version
  • Desired release channel
  • Optional: Desired subset of CRDs

This would need to have the following logic for each Gateway API CRD:

  1. If the Gateway API CRD exists:
    a. Skip or error if the existing CRD is from a different release channel or is missing the expected bundle version or release channel labels
    b. Upgrade to the configured bundle version if the existing CRD has an older version
    c. Skip if the existing CRD's version is >= the version configured for the Job
  2. If the Gateway API CRD does not exist in the cluster, install it.

All of this could theoretically be built with the registry.k8s.io/kubectl image.
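
For illustration, a minimal sketch of such a Job, assuming the kubectl image provides a shell (distroless builds may not, in which case a busybox-style image with kubectl would be needed) and assuming the `gateway.networking.k8s.io/bundle-version` and `gateway.networking.k8s.io/channel` annotations that the release manifests set on each CRD. The image tag, ServiceAccount, and single-CRD check are illustrative only:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: gateway-api-crd-install
spec:
  backoffLimit: 2
  template:
    spec:
      # Assumes a ServiceAccount bound to a ClusterRole that allows
      # get/create/update/patch on customresourcedefinitions.
      serviceAccountName: gateway-api-crd-installer
      restartPolicy: Never
      containers:
      - name: install
        image: registry.k8s.io/kubectl:v1.29.0  # illustrative tag
        command: ["/bin/sh", "-c"]
        args:
        - |
          set -eu
          WANT="v1.0.0"        # desired bundle version
          CHANNEL="standard"   # desired release channel
          # Check one CRD as a proxy for the whole bundle; a full version
          # would loop over the configured subset of CRDs.
          CRD="gatewayclasses.gateway.networking.k8s.io"
          chan=$(kubectl get crd "$CRD" \
            -o jsonpath='{.metadata.annotations.gateway\.networking\.k8s\.io/channel}' \
            2>/dev/null || true)
          if [ -n "$chan" ] && [ "$chan" != "$CHANNEL" ]; then
            echo "existing CRDs are from channel '$chan'; refusing to touch them" >&2
            exit 1
          fi
          have=$(kubectl get crd "$CRD" \
            -o jsonpath='{.metadata.annotations.gateway\.networking\.k8s\.io/bundle-version}' \
            2>/dev/null || true)
          # Skip if the existing CRDs are already at or beyond the desired version.
          if [ -n "$have" ] && \
             [ "$(printf '%s\n%s\n' "$have" "$WANT" | sort -V | tail -n1)" = "$have" ]; then
            echo "CRDs already at $have (>= $WANT); nothing to do"
            exit 0
          fi
          # Install or upgrade to the desired bundle version and channel.
          kubectl apply -f "https://github.com/kubernetes-sigs/gateway-api/releases/download/${WANT}/${CHANNEL}-install.yaml"
```

A complete version would also implement the skip-or-error choice from step 1a when the expected annotations are missing entirely.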

Why this is needed:
Many implementations want to have an easy way to bundle CRDs with their installation, but they also don't want to conflict with other installations of Gateway API in the cluster. This could provide a reasonably safe mechanism to ensure that CRDs are present and at a minimum version. This could also be bundled in a Helm chart #1590 to bypass some of the limitations of including CRDs directly in a Helm chart.
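
As a sketch of the Helm angle: the Job could be annotated as a pre-install/pre-upgrade hook so the CRDs are in place before the chart's own resources are applied (using Helm's documented hook annotations; names are illustrative):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}-gateway-api-crds
  annotations:
    # Run before the chart's resources are installed or upgraded,
    # and remove the previous hook Job before creating a new one.
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "-5"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: install
        image: registry.k8s.io/kubectl:v1.29.0  # illustrative; body as in the Job sketch above
```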

Note: This is not ready to work on yet. We first need to get some feedback on this idea to ensure that it actually makes sense before starting any development.

@danehans
Contributor

Since the CRDs are shared resources, what safeguards does this approach provide to ensure the Job does not cause breakage among different implementations? For instance, implementation A runs the Job to install version X of the CRDs and later implementation B runs the Job to install version Y of the CRDs. If the schema changes between X and Y versions, a conversion will need to take place, correct?

@robscott
Member Author

You're completely right @danehans. To make this safe, we'd need to establish some guardrails that could be fairly limiting. I think the only way to provide safe installation and upgrades would be to limit this to installing newer versions of CRDs included in the standard channel. If an experimental CRD were present, it's possible that an upgrade could result in a breaking change.

I think the MVP for this would need to be limited to the standard channel since it provides strong backwards compatibility guarantees.

In the future, we'd probably want to extend this to experimental, but that would require more advanced logic, including:

  • Awareness of which upgrade paths contain breaking changes and can't be automatically upgraded
  • Understanding of storage versions and if/when an upgrade would fail due to resources being on old storage versions
  • Maybe some kind of option to force an upgrade and/or override certain safeguards, but that may defeat the whole purpose
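
For instance, the storage-version check could be as simple as refusing to proceed while `.status.storedVersions` still records an old API version (a sketch; the CRD and version names are illustrative):

```sh
# Sketch: refuse to upgrade while resources may still be stored at an old API version.
stored=$(kubectl get crd httproutes.gateway.networking.k8s.io \
  -o jsonpath='{.status.storedVersions}')
case "$stored" in
  *v1alpha2*)
    echo "storage version v1alpha2 still recorded; migrate stored objects first" >&2
    exit 1
    ;;
esac
```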

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 19, 2024
@networkhermit
Contributor

Taken from my comment in #2951 (comment)

I'm not sure whether using a Job to bootstrap the Gateway API CRDs is possible before the CNI is ready, as is the case when bootstrapping Cilium to use its Gateway API support. I'm testing different implementations to learn Gateway API better.

@robscott
Member Author

robscott commented Apr 9, 2024

I'd always assumed that Cilium's Envoy-based Gateway API implementation was deployed separately from the CNI. @sayboras, can you confirm whether this approach would be problematic for Cilium?

@sayboras
Contributor

sayboras commented Apr 9, 2024

I'd always assumed that Cilium's Envoy-based Gateway API implementation was deployed separately from the CNI

Yes, you are correct. Gateway API provisioning is part of the Cilium Operator, which is separate from the Cilium Agent and Cilium CNI components.

Can you confirm whether this approach would be problematic for Cilium?

I don't think there will be any problem, for the reasons mentioned above.

@networkhermit
Contributor

@sayboras Hello!

https://github.com/cilium/cilium/blob/d913b6298123064f51a8b97495f956b5ebbe62b7/install/kubernetes/cilium/templates/cilium-gateway-api-class.yaml#L1-L11

When users use the Helm chart to bootstrap the Cilium CNI with gatewayAPI.enabled in a new cluster, is the default cilium GatewayClass the only missing resource if the Gateway API CRDs were not installed beforehand?

I currently use a multi-step installation process:

  1. install Cilium with gatewayAPI support disabled
  2. use fluxcd to install the Gateway API CRDs
  3. update the Cilium Helm values to enable gatewayAPI and finish setting up Gateway API support

Is it equivalent to the following approach?

  1. install Cilium with gatewayAPI support enabled in the first run
  2. use fluxcd to install the Gateway API CRDs and the cilium GatewayClass


@robscott More specifically, does this mean that in the future the Cilium Helm installation method would embed the Gateway API CRD bootstrap/upgrade k8s Job?

@sayboras
Contributor

sayboras commented Apr 9, 2024

Is it equivalent to the following approach?

Not exactly equivalent; however, once cilium/cilium#29207 is done, the installation process will be easier (though you might still need to provision the Cilium GatewayClass outside of the Helm chart).
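
For reference, provisioning the GatewayClass outside the chart is a single small manifest along these lines (a sketch; the controllerName matches the chart template linked earlier, and older Cilium releases may use apiVersion v1beta1):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: cilium
spec:
  # Must match the controller name Cilium's operator watches for.
  controllerName: io.cilium/gateway-controller
```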

@networkhermit
Contributor

Is it equivalent to the following approach?

Not exactly equivalent; however, once cilium/cilium#29207 is done, the installation process will be easier (though you might still need to provision the Cilium GatewayClass outside of the Helm chart).

I see. If we use a k8s Job (as discussed in this issue) to install the Gateway API CRDs, and given that cilium/cilium#29207 is done, then basically this k8s Job and the Cilium Helm bootstrap can be started in parallel and both will eventually be installed, without leaving the k8s Job in a pending state. Is my understanding correct?

@sayboras
Contributor

sayboras commented Apr 9, 2024

I see. If we use a k8s Job (as discussed in this issue) to install the Gateway API CRDs, and given that cilium/cilium#29207 is done, then basically this k8s Job and the Cilium Helm bootstrap can be started in parallel and both will eventually be installed, without leaving the k8s Job in a pending state. Is my understanding correct?

Gateway API provisioning is part of the Cilium Operator, which is separate from the Cilium Agent and Cilium CNI components, so any pod will be scheduled regardless of whether the Gateway API CRDs are installed. The work mentioned in cilium/cilium#29207 is to improve the user experience and avoid a manual Cilium Operator restart.

@networkhermit
Contributor


Thanks for the above and previous clarification.
