
Modular architecture for component reconcilers and kyma CLI #13759

Closed
pbochynski opened this issue Mar 28, 2022 · 13 comments

@pbochynski
Contributor

pbochynski commented Mar 28, 2022

Description
The Kyma architecture should support modularization. The first step in this direction was made with the initial implementation of the Kyma reconciler, but it is not sufficient.

Requirements

  • Flexible deployment options for the Kyma operator (installer)
    • can manage multiple clusters (control-plane mode)
    • can be shipped as a single binary (CLI)
    • can be installed in the cluster (single-cluster mode)
  • Each component can be independently enabled/disabled
  • Each component can provide its own reconciler (operator) or use the base reconciler (Helm) if no special actions are required
  • Component reconcilers should handle their dependencies (fail if something is missing)
  • Some component reconcilers require a secure (trusted) connection to external systems and cannot run in the target cluster
  • Event-driven reconciliation should be preferred over time-based reconciliation (a component is up and running as soon as possible after its dependencies are ready)
  • Horizontal scalability should be possible (sharding)
  • Changing the Kyma version and individual component versions should be possible
  • Central configuration should take precedence over user configuration (in the managed Kyma scenario the control plane can validate user configuration before it is applied)
  • Avoid creating additional service accounts with powerful roles in the user cluster (e.g. a Tiller-like service account with the cluster-admin role)
  • Running external modules should be possible (adding component reconcilers from external contributors)

Reasons
Kyma provides Kubernetes building blocks. It should be easy to pick only those that are needed for the job, and it should be easy to add new blocks to extend Kyma features. With the growing number of components, it is no longer feasible to always install all of them.

Ideas

  1. Use CRDs and the Operator SDK / Kubebuilder as the base architecture.
  2. KEB is just an OSB API wrapper for the Cluster custom resource (it translates the plan into the cluster configuration and Kyma version in the resource spec).
  3. Each reconciler has its own CRD to manage. Use an owner reference to point to the main cluster resource.
  4. Some reconcilers have to watch target cluster resources (e.g. service instances). A watcher component running in the target cluster could notify the control plane that reconciliation should be triggered. The watcher could be generic (configured with the resources that should trigger reconciliation).
  5. Reconciliation should be done as quickly as possible (no waiting for resources inside the reconciliation loop). Use the RequeueAfter option to handle missing resources you are waiting for, as sketched below.
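
As an illustration of idea 5, here is a minimal controller-runtime sketch that does not wait inside the reconciliation loop but requeues when a dependency is not there yet. The reconciler type and the kubeconfig secret naming convention are assumptions made only for this example:

```go
package controllers

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// KymaReconciler is a hypothetical reconciler used only to illustrate the
// "no waiting inside the loop" rule.
type KymaReconciler struct {
	client.Client
}

func (r *KymaReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// The kubeconfig secret is a dependency created by the provisioner
	// (the "<kyma-name>-kubeconfig" name is an assumption for this sketch).
	var kubeconfig corev1.Secret
	err := r.Get(ctx, types.NamespacedName{Namespace: req.Namespace, Name: req.Name + "-kubeconfig"}, &kubeconfig)
	if apierrors.IsNotFound(err) {
		// Dependency not there yet: do not block the worker, requeue instead.
		return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
	}
	if err != nil {
		return ctrl.Result{}, err
	}

	// ... actual reconciliation of the selected Kyma modules would go here ...
	return ctrl.Result{}, nil
}
```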

API design
The API should be designed and validated against all use cases and requirements.
(diagram: kyma-operator-generic)

There are 2 top-level custom resources:

  • Cluster - managed by the provisioner - a request to create a Kubernetes cluster
  • Kyma - managed by kyma-operator - contains the list of Kyma modules that should be installed in the Kubernetes cluster

The Kyma resource does not depend on the Cluster resource. The connection is indirect: both resources reference the kubeconfig secret that is created by the provisioner. If a cluster already exists, the kubeconfig secret can be created directly and referenced by the Kyma resource (no need to create a Cluster resource at all). The Kyma operator installs Custom Resource Definitions in the target cluster and creates component CRs referencing the same kubeconfig to start reconciliation of the selected Kyma modules. If the kubeconfig reference is empty, the Kyma operator and component reconcilers operate on the same cluster.
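
To make the relationship concrete, here is a rough sketch of what the spec of such a Kyma resource could look like as Kubebuilder-style Go types. The field names are illustrative assumptions, not an agreed API:

```go
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// KymaSpec sketches the Kyma resource described above: a reference to the
// kubeconfig secret and the list of modules to install.
type KymaSpec struct {
	// KubeconfigSecretRef names the secret created by the provisioner (or
	// created directly for an existing cluster). If empty, kyma-operator and
	// the component reconcilers operate on the local cluster.
	KubeconfigSecretRef string `json:"kubeconfigSecretRef,omitempty"`

	// Modules lists the Kyma modules (components) to reconcile.
	Modules []string `json:"modules"`
}

// Kyma is the top-level custom resource managed by kyma-operator.
type Kyma struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec KymaSpec `json:"spec,omitempty"`
}
```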

(diagram: kyma-operator)

Decisions:

  • Cluster (infrastructure) reconciliation should be separated from the Kyma operator. The Kyma operator can work on remote clusters and on the local cluster (install Kyma where the operator is running). For infrastructure, the target is always remote. The separation also helps with the Bring Your Own Cluster model - the Cluster resource is simply not created.

Open topics:

  • How to install CRDs? Currently, all CRDs are installed up front.
  • How to check if a dependency is ready?
    • kyma-operator watches resources and creates/changes dependent resources (e.g. when istio becomes ready it creates HelmComponents)
    • components verify dependencies on their own (the Helm Component checks if istio sidecar injection is enabled; see the sketch below)
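
A minimal sketch of the second option, assuming the component reconciler checks the standard istio-injection namespace label to decide whether sidecar injection is enabled:

```go
package controllers

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// sidecarInjectionEnabled lets a component reconciler verify its Istio
// dependency on its own by checking the istio-injection label on the
// target namespace.
func sidecarInjectionEnabled(ctx context.Context, c client.Client, namespace string) (bool, error) {
	var ns corev1.Namespace
	if err := c.Get(ctx, client.ObjectKey{Name: namespace}, &ns); err != nil {
		return false, err
	}
	return ns.Labels["istio-injection"] == "enabled", nil
}
```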

Links

@pbochynski pbochynski self-assigned this Mar 28, 2022
@janmedrek
Contributor

I like the concept of the separate Cluster/Kyma CRDs. We need to come up with a way to "tie" both of them, right now we are missing a link between Runtime and Cluster creation. Would the kyma-operator handle that, KEB, or should we introduce another component?

I would say that it's the component's responsibility to determine when to act. If we go with kyma-operator based workflow then in the end we will end up with one component that has to know everything about the whole setup sequence of the Runtime. In my opinion, this will not really differ from the declarative-imperative mix we have right now.

Also, what is your opinion on external integrations, such as Compass registration? Should we treat them as just regular Components, represented by their own CRD?

@piotrmiskiewicz
Member

piotrmiskiewicz commented Apr 5, 2022

Let's think about requirements - what we expect from a high level (regardless of whether we are talking about the k8s API and CRDs, GraphQL, or a REST API). Do we want KEB to make one call to create a Kyma Runtime (with all necessary things) or not? If yes, I can imagine a "Runtime" CRD, which is the root. Then we have a runtime-operator, which creates the proper "cluster" and "kyma" resources. I can also imagine a third one - "compass".
The runtime operator flow could be:

  1. The runtime operator checks if the compass integration is necessary; if yes, it creates the "compass" resource.
  2. The runtime operator creates the proper "cluster" resource using the given kubeconfig (if provided for bring your own cluster) or the hyperscaler/region/machine values.
  3. The runtime operator waits until "cluster" (and "compass", if created) are ready.
  4. The runtime operator creates the "kyma" resource.

Let's also think about how it looks when the root operator does not care about dependencies. It creates all resources at the same time: "cluster", ("compass" if necessary), and "kyma". Then the Kyma-Operator watches whether "compass" and "cluster" are ready. If yes, it starts creating "HelmComponent", "IstioComponent", "ClusterEssentials", etc.

There is another way: KEB creates "cluster" and "compass", then waits. When they are ready, it creates the "kyma" resource. The question is where we expect the orchestration to be done - in KEB or in a separate component? Where should we implement the "if" statement that decides whether we register the runtime in Compass or not?

@pbochynski
Contributor Author

@piotrmiskiewicz I was thinking about having another CRD on top (Runtime), but then we have 3 levels of operators. The question is what would be in the spec of the Runtime CRD. Let's take 2 use cases:

  1. create Kyma Runtime with managed cluster in AWS, region us-east-1, multi-zone, machine type m5.xlarge, minimum worker pool size: 2, and with 3 kyma modules: eventing, istio, serverless
  2. create Kyma Runtime with own cluster and with 3 kyma modules: eventing, istio, serverless

In KEB I expect 2 separate plans for these 2 use cases, with completely different input parameters. The first plan has all the infrastructure details, the second has just a kubeconfig. If you introduce a Runtime CRD, it has to contain the infrastructure details, the kubeconfig (one of them is mandatory), and the list of modules. I think it would be better to create the Kyma CR and one of: a Cluster CR or a kubeconfig secret, from KEB. You can create them in parallel (no need to wait).

@pbochynski
Contributor Author

I like the concept of the separate Cluster/Kyma CRDs. We need to come up with a way to "tie" both of them, right now we are missing a link between Runtime and Cluster creation. Would the kyma-operator handle that, KEB, or should we introduce another component?

The "tie" would be done by reference to the kubeconfig secret (name). KEB would create both resources Cluster and Kyma that would refer to the same kubeconfig name. For the BYOC model KEB would create Kyma and kubeconfig secret directly.

I would say that it's the component's responsibility to determine when to act. If we go with kyma-operator based workflow then in the end we will end up with one component that has to know everything about the whole setup sequence of the Runtime. In my opinion, this will not really differ from the declarative-imperative mix we have right now.

The Kyma operator was not meant to manage a sequence; it is more of a meta-operator. Kyma-operator will be responsible for installing the CRDs for the selected components and creating a Component CR for each selected module. The logic can be generic and based on the configuration provided for each Kyma version.
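
A rough sketch of that generic logic, assuming the configuration for a Kyma version maps module names to the GVKs of their component CRDs (the function and mapping are illustrative, not an agreed design):

```go
package controllers

import (
	"context"
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// createComponentCRs creates one component CR per selected module, using the
// GVK configured for the given Kyma version. kyma-operator stays generic: it
// does not know anything about the individual components.
func createComponentCRs(ctx context.Context, c client.Client, kymaName, namespace string,
	modules map[string]schema.GroupVersionKind) error {

	for module, gvk := range modules {
		cr := &unstructured.Unstructured{}
		cr.SetGroupVersionKind(gvk)
		cr.SetName(fmt.Sprintf("%s-%s", kymaName, module))
		cr.SetNamespace(namespace)
		// An owner reference back to the Kyma resource would be set here (idea 3).
		if err := c.Create(ctx, cr); err != nil {
			return err
		}
	}
	return nil
}
```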

Also, what is your opinion on external integrations, such as Compass registration? Should we treat them as just regular Components, represented by their own CRD?

Yes. Compass integration is just another module (added to the picture).

@jakobmoellerdev

@pbochynski regarding external systems: Do these systems lie behind a VPN, or what is the reason they cannot be reached from the customer cluster? Otherwise a proxy would also be sufficient to reach them. I'm not saying we can't centralize these components, just making sure we don't artificially limit ourselves here.

@Tomasz-Smelcerz-SAP
Member

Tomasz-Smelcerz-SAP commented Apr 15, 2022

@pbochynski I would like to better understand the sentence: "Component reconcilers should handle their dependencies "

Consider a component operator, e.g: Ory. Should it check for pre-requisites like: "Is istio installed already?", "Is there a certificate in the cluster already?"

If the answer is "yes", then I think we'll end up with a bunch of operators that have embedded knowledge about most of the runtime environment, with just minor differences between them. Of course they will install different things, but their dependencies will be similar and the components themselves will have to know a lot about their environment.

Considering that, I vote for the model where the Kyma-operator is the entity that has the knowledge about top-level dependencies (if any) and is the single source of truth for that. Component reconcilers should only focus on "technical" dependencies, like, for example, the ability to create objects (RBAC), the ability to access necessary remote services (networking), etc., without knowing which component actually provides such services to them.

@pbochynski
Contributor Author

@pbochynski I would like to better understand the sentence: "Component reconcilers should handle their dependencies "

Consider a component operator, e.g: Ory. Should it check for pre-requisites like: "Is istio installed already?", "Is there a certificate in the cluster already?"

If the answer is "yes", then I think we'll end up with a bunch of operators that have embedded knowledge about most of the runtime environment, with just minor differences between them. Of course they will install different things, but their dependencies will be similar and the components themselves will have to know a lot about their environment.

We do not have too many dependencies now, and we aim to have even fewer. Right now we have just istio and certificates as prerequisites, and we should not treat any dependency as a hard dependency. If we don't have a certificate, it doesn't mean the api-gateway controller cannot be installed. Most of the controllers do not even have a dependency on istio (and should be excluded from the istio mesh if they only communicate with the api-server).

I would not demonize the dependency check. In Ory you need istio just to create a virtual service, so the only thing to do is to handle the error correctly: if there is no such resource as an istio VirtualService, return an error from reconciliation. Kubernetes will try again with the default backoff strategy, or you can decide when to try again (RequeueAfter). That's it. Your controller has to handle such a situation even if dependency management is implemented in the Kyma operator, because someone can delete istio in the cluster after it was installed. We need to code controllers and reconcilers with resilience and eventual consistency in mind.
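
A minimal sketch of that error handling in a controller-runtime reconciler; the OryComponentReconciler type and the applyVirtualService helper are assumptions made only for this illustration:

```go
package controllers

import (
	"context"
	"time"

	apimeta "k8s.io/apimachinery/pkg/api/meta"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// OryComponentReconciler is a hypothetical component reconciler.
type OryComponentReconciler struct {
	client.Client
}

// applyVirtualService stands in for the code that renders and applies the
// Istio VirtualService; it fails with a NoKindMatchError when the CRD is absent.
func (r *OryComponentReconciler) applyVirtualService(ctx context.Context, req ctrl.Request) error {
	// ... rendering and Create/Patch calls omitted ...
	return nil
}

func (r *OryComponentReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	if err := r.applyVirtualService(ctx, req); err != nil {
		if apimeta.IsNoMatchError(err) {
			// Istio (and its VirtualService CRD) is not installed yet or was
			// removed: don't fail hard, just decide when to try again.
			return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
		}
		// Any other error: let controller-runtime retry with its default backoff.
		return ctrl.Result{}, err
	}
	return ctrl.Result{}, nil
}
```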

@pbochynski
Contributor Author

@pbochynski regarding external systems: Do these systems lie behind a VPN, or what is the reason they cannot be reached from the customer cluster? Otherwise a proxy would also be sufficient to reach them. I'm not saying we can't centralize these components, just making sure we don't artificially limit ourselves here.

I have 3 use cases right now:

  • usage of a powerful API to create tenants in external systems
  • access to the cloud provider account to configure volume encryption (customer-managed keys)
  • access to the gardener project to configure networking

More will probably come when we get external contributions.

@ghost

ghost commented Jul 1, 2022

This issue has been automatically marked as stale due to the lack of recent activity. It will soon be closed if no further activity occurs. Thank you for your contributions.

@ghost added the lifecycle/stale label Jul 1, 2022
@ghost

ghost commented Jul 8, 2022

This issue has been automatically closed due to the lack of recent activity. /lifecycle rotten

@ghost ghost closed this as completed Jul 8, 2022
@tobiscr removed the lifecycle/stale label Jul 20, 2022
@tobiscr
Contributor

tobiscr commented Jul 20, 2022

Discussion continued in kyma-project/community#666

@github-actions

This issue has been automatically marked as stale due to the lack of recent activity. It will soon be closed if no further activity occurs. Thank you for your contributions.

@github-actions bot added the lifecycle/stale label Sep 19, 2022
@tobiscr removed the lifecycle/stale label Sep 21, 2022
@pbochynski
Contributor Author

The modular architecture is described here and is ready for implementation.
