
Modular architecture for component reconcilers and kyma CLI #13759

Closed
pbochynski opened this issue Mar 28, 2022 · 13 comments

@pbochynski
Contributor

pbochynski commented Mar 28, 2022

Description
The Kyma architecture should support modularization. The first step in this direction was made with the initial implementation of the Kyma reconciler, but it is not sufficient.

Requirements

  • Flexible deployment options for the Kyma operator (installer)
    • can manage multiple clusters (control-plane mode)
    • can be shipped as a single binary (CLI)
    • can be installed in the cluster (single-cluster mode)
  • Each component can be independently enabled/disabled
  • Each component can provide its own reconciler (operator) or use the base reconciler (Helm) if no special actions are required
  • Component reconcilers should handle their dependencies (fail if something is missing)
  • Some component reconcilers require a secure (trusted) connection to external systems and cannot run in the target cluster
  • Event-driven reconciliation should be preferred over time-based reconciliation (a component is up and running as soon as possible after its dependencies are ready)
  • Horizontal scalability should be possible (sharding)
  • Changing the Kyma version and individual component versions should be possible
  • Central configuration should take precedence over user configuration (in the managed Kyma scenario the control plane can validate user configuration before it is applied)
  • Avoid creating additional service accounts with powerful roles in the user cluster (e.g. a Tiller-like service account with the cluster-admin role)
  • Running external modules should be possible (adding component reconcilers from external contributors)

Reasons
Kyma provides Kubernetes building blocks. It should be easy to pick only those that are needed for the job, and it should be easy to add new blocks to extend Kyma features. With the growing number of components, it is no longer feasible to always install all of them.

Ideas

  1. Use CRDs and the Operator SDK / Kubebuilder as the base architecture.
  2. KEB is just an OSB API wrapper for the Cluster custom resource (it translates the plan into the cluster configuration and Kyma version in the resource spec).
  3. Each reconciler has its own CRD to manage. Use an owner reference to point to the main cluster resource.
  4. Some reconcilers have to watch target cluster resources (e.g. service instances). A watcher component running in the target cluster could notify the control plane that reconciliation should be triggered. The watcher could be generic (configured with the resources that should trigger reconciliation).
  5. Reconciliation should be done as quickly as possible (no waiting for resources inside the reconciliation loop). Use the RequeueAfter option to handle missing resources you are waiting for, as sketched below.
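
As an illustration of idea 5, here is a minimal controller-runtime sketch that does not wait inside the reconciliation loop but requeues when a dependency is not there yet. The reconciler type and the kubeconfig secret naming convention are assumptions made only for this example:

```go
package controllers

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// KymaReconciler is a hypothetical reconciler used only to illustrate the
// "no waiting inside the loop" rule.
type KymaReconciler struct {
	client.Client
}

func (r *KymaReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// The kubeconfig secret is a dependency created by the provisioner
	// (the "<kyma-name>-kubeconfig" name is an assumption for this sketch).
	var kubeconfig corev1.Secret
	err := r.Get(ctx, types.NamespacedName{Namespace: req.Namespace, Name: req.Name + "-kubeconfig"}, &kubeconfig)
	if apierrors.IsNotFound(err) {
		// Dependency not there yet: do not block the worker, requeue instead.
		return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
	}
	if err != nil {
		return ctrl.Result{}, err
	}

	// ... actual reconciliation of the selected Kyma modules would go here ...
	return ctrl.Result{}, nil
}
```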

API design
The API should be designed and validated against all use cases and requirements.
(diagram: kyma-operator-generic)

There are 2 top-level custom resources:

  • Cluster - managed by the provisioner - a request to create a Kubernetes cluster
  • Kyma - managed by kyma-operator - contains the list of Kyma modules that should be installed in the Kubernetes cluster

The Kyma resource does not depend on the Cluster resource. The connection is indirect: both resources reference the kubeconfig secret that is created by the provisioner. If a cluster already exists, the kubeconfig secret can be created directly and referenced by the Kyma resource (no need to create a Cluster resource at all). The Kyma operator installs Custom Resource Definitions in the target cluster and creates component CRs referencing the same kubeconfig to start reconciliation of the selected Kyma modules. If the kubeconfig reference is empty, the Kyma operator and component reconcilers operate on the same cluster.
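
To make the relationship concrete, here is a rough sketch of what the spec of such a Kyma resource could look like as Kubebuilder-style Go types. The field names are illustrative assumptions, not an agreed API:

```go
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// KymaSpec sketches the Kyma resource described above: a reference to the
// kubeconfig secret and the list of modules to install.
type KymaSpec struct {
	// KubeconfigSecretRef names the secret created by the provisioner (or
	// created directly for an existing cluster). If empty, kyma-operator and
	// the component reconcilers operate on the local cluster.
	KubeconfigSecretRef string `json:"kubeconfigSecretRef,omitempty"`

	// Modules lists the Kyma modules (components) to reconcile.
	Modules []string `json:"modules"`
}

// Kyma is the top-level custom resource managed by kyma-operator.
type Kyma struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec KymaSpec `json:"spec,omitempty"`
}
```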

(diagram: kyma-operator)

Decisions:

  • Cluster (infrastructure) reconciliation should be separated from the Kyma operator. The Kyma operator can work on remote clusters and on the local cluster (install Kyma where the operator is running). For infrastructure, the target is always remote. The separation also helps with the Bring Your Own Cluster model - the Cluster resource is simply not created.

Open topics:

  • How to install CRDs? Currently, all CRDs are installed up front.
  • How to check if a dependency is ready?
    • kyma-operator watches resources and creates/changes dependent resources (e.g. when istio becomes ready it creates HelmComponents)
    • components verify dependencies on their own (the Helm Component checks if istio sidecar injection is enabled; see the sketch below)
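
A minimal sketch of the second option, assuming the component reconciler checks the standard istio-injection namespace label to decide whether sidecar injection is enabled:

```go
package controllers

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// sidecarInjectionEnabled lets a component reconciler verify its Istio
// dependency on its own by checking the istio-injection label on the
// target namespace.
func sidecarInjectionEnabled(ctx context.Context, c client.Client, namespace string) (bool, error) {
	var ns corev1.Namespace
	if err := c.Get(ctx, client.ObjectKey{Name: namespace}, &ns); err != nil {
		return false, err
	}
	return ns.Labels["istio-injection"] == "enabled", nil
}
```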

Links

@pbochynski pbochynski self-assigned this Mar 28, 2022
@janmedrek
Contributor

I like the concept of the separate Cluster/Kyma CRDs. We need to come up with a way to "tie" both of them, right now we are missing a link between Runtime and Cluster creation. Would the kyma-operator handle that, KEB, or should we introduce another component?

I would say that it's the component's responsibility to determine when to act. If we go with kyma-operator based workflow then in the end we will end up with one component that has to know everything about the whole setup sequence of the Runtime. In my opinion, this will not really differ from the declarative-imperative mix we have right now.

Also, what is your opinion on external integrations, such as Compass registration? Should we treat them as just regular Components, represented by their own CRD?

@piotrmiskiewicz
Member

piotrmiskiewicz commented Apr 5, 2022

Let's think about requirements - what we expect from a high level (regardless of whether we are talking about the k8s API and CRDs, GraphQL, or a REST API). Do we want KEB to make one call to create a Kyma Runtime (with all necessary things) or not? If yes, I can imagine a "Runtime" CRD, which is the root. Then we have a runtime-operator, which creates the proper "cluster" and "kyma" resources. I can also imagine a third one - "compass".
The runtime operator flow could be:

  1. The runtime operator checks if the compass integration is necessary; if yes, it creates the "compass" resource.
  2. The runtime operator creates the proper "cluster" resource using the given kubeconfig (if provided for bring your own cluster) or the hyperscaler/region/machine values.
  3. The runtime operator waits until "cluster" (and "compass", if created) are ready.
  4. The runtime operator creates the "kyma" resource.

Let's also think about how it looks when the root operator does not care about dependencies. It creates all resources at the same time: "cluster", ("compass" if necessary), and "kyma". Then the Kyma-Operator watches whether "compass" and "cluster" are ready. If yes, it starts creating "HelmComponent", "IstioComponent", "ClusterEssentials", etc.

There is another way: KEB creates "cluster" and "compass", then waits. When they are ready, it creates the "kyma" resource. The question is where we expect the orchestration to be done - in KEB or in a separate component? Where should we implement the "if" statement that decides whether we register the runtime in Compass or not?

@pbochynski
Contributor Author

@piotrmiskiewicz I was thinking about having another CRD on top (Runtime), but then we have 3 levels of operators. The question is what would be in the spec of the Runtime CRD. Let's take 2 use cases:

  1. create Kyma Runtime with managed cluster in AWS, region us-east-1, multi-zone, machine type m5.xlarge, minimum worker pool size: 2, and with 3 kyma modules: eventing, istio, serverless
  2. create Kyma Runtime with own cluster and with 3 kyma modules: eventing, istio, serverless

In KEB I expect 2 separate plans for these 2 use cases, with completely different input parameters. The first plan has all the infrastructure details, the second has just a kubeconfig. If you introduce a Runtime CRD, it has to contain the infrastructure details, the kubeconfig (one of them is mandatory), and the list of modules. I think it would be better to create the Kyma CR and one of: a Cluster CR or a kubeconfig secret, from KEB. You can create them in parallel (no need to wait).

@pbochynski
Contributor Author

I like the concept of the separate Cluster/Kyma CRDs. We need to come up with a way to "tie" both of them, right now we are missing a link between Runtime and Cluster creation. Would the kyma-operator handle that, KEB, or should we introduce another component?

The "tie" would be done by reference to the kubeconfig secret (name). KEB would create both resources Cluster and Kyma that would refer to the same kubeconfig name. For the BYOC model KEB would create Kyma and kubeconfig secret directly.

I would say that it's the component's responsibility to determine when to act. If we go with kyma-operator based workflow then in the end we will end up with one component that has to know everything about the whole setup sequence of the Runtime. In my opinion, this will not really differ from the declarative-imperative mix we have right now.

The Kyma operator was not meant to manage a sequence; it is more of a meta-operator. Kyma-operator will be responsible for installing the CRDs for the selected components and creating a Component CR for each selected module. The logic can be generic and based on the configuration provided for each Kyma version.
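
A rough sketch of that generic logic, assuming the configuration for a Kyma version maps module names to the GVKs of their component CRDs (the function and mapping are illustrative, not an agreed design):

```go
package controllers

import (
	"context"
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// createComponentCRs creates one component CR per selected module, using the
// GVK configured for the given Kyma version. kyma-operator stays generic: it
// does not know anything about the individual components.
func createComponentCRs(ctx context.Context, c client.Client, kymaName, namespace string,
	modules map[string]schema.GroupVersionKind) error {

	for module, gvk := range modules {
		cr := &unstructured.Unstructured{}
		cr.SetGroupVersionKind(gvk)
		cr.SetName(fmt.Sprintf("%s-%s", kymaName, module))
		cr.SetNamespace(namespace)
		// An owner reference back to the Kyma resource would be set here (idea 3).
		if err := c.Create(ctx, cr); err != nil {
			return err
		}
	}
	return nil
}
```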

Also, what is your opinion on external integrations, such as Compass registration? Should we treat them as just regular Components, represented by their own CRD?

Yes. Compass integration is just another module (added to the picture).

@jakobmoellerdev

@pbochynski regarding external systems: Do these systems lie behind a VPN, or what is the reason they cannot be reached from the customer cluster? Otherwise a proxy would also be sufficient to reach them. I'm not saying we can't centralize these components, just making sure we don't artificially limit ourselves here.

@Tomasz-Smelcerz-SAP
Member

Tomasz-Smelcerz-SAP commented Apr 15, 2022

@pbochynski I would like to better understand the sentence: "Component reconcilers should handle their dependencies "

Consider a component operator, e.g: Ory. Should it check for pre-requisites like: "Is istio installed already?", "Is there a certificate in the cluster already?"

If the answer is "yes", then I think we'll end up with a bunch of operators that have embedded knowledge about most of the runtime environment, with just minor differences between them. Of course they will install different things, but their dependencies will be similar and the components themselves will have to know a lot about their environment.

Considering that, I vote for the model where the Kyma-operator is the entity that has the knowledge about top-level dependencies (if any) and is the single source of truth for that. Component reconcilers should only focus on "technical" dependencies, like, for example, the ability to create objects (RBAC), the ability to access necessary remote services (networking), etc., without knowing which component actually provides such services to them.

@pbochynski
Contributor Author

@pbochynski I would like to better understand the sentence: "Component reconcilers should handle their dependencies "

Consider a component operator, e.g: Ory. Should it check for pre-requisites like: "Is istio installed already?", "Is there a certificate in the cluster already?"

If the answer is "yes", then I think we'll end up with a bunch of operators that have embedded knowledge about most of the runtime environment, with just minor differences between them. Of course they will install different things, but their dependencies will be similar and the components themselves will have to know a lot about their environment.

We do not have too many dependencies now, and we aim to have even fewer. Right now we have just istio and certificates as prerequisites, and we should not treat any dependency as a hard dependency. If we don't have a certificate, it doesn't mean the api-gateway controller cannot be installed. Most of the controllers do not even have a dependency on istio (and should be excluded from the istio mesh if they only communicate with the api-server).

I would not demonize the dependency check. In Ory you need istio just to create a virtual service, so the only thing to do is to handle the error correctly: if there is no such resource as an istio VirtualService, return an error from reconciliation. Kubernetes will try again with the default backoff strategy, or you can decide when to try again (RequeueAfter). That's it. Your controller has to handle such a situation even if dependency management is implemented in the Kyma operator, because someone can delete istio in the cluster after it was installed. We need to code controllers and reconcilers with resilience and eventual consistency in mind.
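
A minimal sketch of that error handling in a controller-runtime reconciler; the OryComponentReconciler type and the applyVirtualService helper are assumptions made only for this illustration:

```go
package controllers

import (
	"context"
	"time"

	apimeta "k8s.io/apimachinery/pkg/api/meta"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// OryComponentReconciler is a hypothetical component reconciler.
type OryComponentReconciler struct {
	client.Client
}

// applyVirtualService stands in for the code that renders and applies the
// Istio VirtualService; it fails with a NoKindMatchError when the CRD is absent.
func (r *OryComponentReconciler) applyVirtualService(ctx context.Context, req ctrl.Request) error {
	// ... rendering and Create/Patch calls omitted ...
	return nil
}

func (r *OryComponentReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	if err := r.applyVirtualService(ctx, req); err != nil {
		if apimeta.IsNoMatchError(err) {
			// Istio (and its VirtualService CRD) is not installed yet or was
			// removed: don't fail hard, just decide when to try again.
			return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
		}
		// Any other error: let controller-runtime retry with its default backoff.
		return ctrl.Result{}, err
	}
	return ctrl.Result{}, nil
}
```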

@pbochynski
Contributor Author

@pbochynski regarding external systems: Do these systems lie behind a VPN, or what is the reason they cannot be reached from the customer cluster? Otherwise a proxy would also be sufficient to reach them. I'm not saying we can't centralize these components, just making sure we don't artificially limit ourselves here.

I have 3 use cases right now:

  • usage of a powerful API to create tenants in external systems
  • access to the cloud provider account to configure volume encryption (customer-managed keys)
  • access to the gardener project to configure networking

More will probably come when we get external contributions.

@ghost

ghost commented Jul 1, 2022

This issue has been automatically marked as stale due to the lack of recent activity. It will soon be closed if no further activity occurs. Thank you for your contributions.

@ghost added the lifecycle/stale label Jul 1, 2022
@ghost

ghost commented Jul 8, 2022

This issue has been automatically closed due to the lack of recent activity. /lifecycle rotten

@ghost ghost closed this as completed Jul 8, 2022
@tobiscr removed the lifecycle/stale label Jul 20, 2022
@tobiscr
Contributor

tobiscr commented Jul 20, 2022

Discussion continued in kyma-project/community#666

@github-actions

This issue has been automatically marked as stale due to the lack of recent activity. It will soon be closed if no further activity occurs. Thank you for your contributions.

@github-actions bot added the lifecycle/stale label Sep 19, 2022
@tobiscr removed the lifecycle/stale label Sep 21, 2022
@pbochynski
Contributor Author

The modular architecture is described here and is ready for implementation.
