add kubermatic-installer-operator proposal #13327

Closed
129 changes: 129 additions & 0 deletions docs/proposals/kubermatic-installer-operator.md
# Kubermatic Installer Operator: Enhanced Kubermatic Installer

**Author**: Mohamed Rafraf (@mohamed-rafraf)
**Status**: Draft proposal

This proposal introduces a Kubernetes controller and a Custom Resource Definition (CRD) designed to simplify the installation,
upgrade, and management of Kubermatic systems, replacing the traditional CLI-based approach.


## Motivation

The existing CLI-based installation and upgrade process for Kubermatic, while effective,
requires manual intervention and extensive Kubernetes and Helm knowledge. By transitioning to a controller-based model,
we aim to automate these processes and make them more accessible to users with varying
levels of expertise. Key motivations include:

* **Automation of Upgrades and Installation:** Automate the entire lifecycle management, including initial installation,
upgrades, changes, and potentially scaling.
* **Reduction of Errors and Early Validation:** Minimize the chance of errors from manual steps, enhancing system reliability
by integrating validation directly into the CRD to catch configuration errors before applying changes.
* **Ready for Future Enhancements:** A controller-based installer makes it easier to introduce new features and integrations. The installer can be quickly updated or expanded to leverage new features without breaking the underlying mechanisms.
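The early-validation idea above could be served by a validating admission webhook that rejects a malformed spec before anything is applied. The following is a minimal sketch of such a check; the field names are hypothetical placeholders, not the final CRD schema:

```python
# Illustrative early validation for a parsed KubermaticInstallation spec.
# Field names are hypothetical; a real operator would enforce this via an
# OpenAPI schema on the CRD and/or a validating admission webhook.

def validate_installation(spec: dict) -> list[str]:
    """Return a list of human-readable validation errors (empty if valid)."""
    errors = []
    master = spec.get("master")
    if not isinstance(master, dict):
        errors.append("spec.master is required")
    elif not master.get("domain"):
        errors.append("spec.master.domain is required")
    for seed in spec.get("seed", []):
        if not seed.get("name"):
            errors.append("every entry in spec.seed needs a name")
    return errors

# A spec missing the domain is rejected before anything reaches the cluster:
bad = {"master": {}, "seed": [{"name": "seed-1"}]}
print(validate_installation(bad))  # ['spec.master.domain is required']
```

Surfacing such errors at admission time, rather than mid-installation, is what "early validation" buys over the CLI flow.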

## Proposal

A new Kubermatic Installer Operator that utilizes a CRD called `KubermaticInstallation`
(since `KubermaticConfiguration` already exists). The operator will handle the deployment and management
of the Kubermatic stack by watching for changes to instances of the `KubermaticInstallation` CRD.

> **Review comment (Contributor):** This sounds like we're re-inventing existing Helm operators that simply watch some CRD and install Helm charts into the cluster. What would be the benefit and effort of writing a custom one?
>
> KKP already has had multiple bugs in its App feature when it tries to automate managing Helm installations. I'm fearful of writing a Helm reconciler myself.

* **KubermaticInstallation CRD:** Defines all necessary configuration for the Kubermatic installation
in a single Kubernetes resource.
* **Kubermatic Installer Operator:** A controller that reacts to changes to `KubermaticInstallation` resources
by deploying or updating the Kubermatic stack according to the specified configuration.
* **Dependency Management:** The operator will manage dependencies like cert-manager, nginx-ingress-controller,
and Dex, ensuring they are installed or upgraded as necessary before deploying Kubermatic components.
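The dependency-ordering behaviour can be sketched as a topological sort over a dependency graph. The graph below is a hypothetical illustration (the real operator would derive it from the `KubermaticInstallation` spec and gate each step on component health):

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each component maps to the set of
# components that must be installed (and healthy) before it.
dependencies = {
    "cert-manager": set(),
    "nginx-ingress-controller": set(),
    "dex": {"cert-manager", "nginx-ingress-controller"},
    "kubermatic": {"cert-manager", "nginx-ingress-controller", "dex"},
}

# static_order() yields every prerequisite before its dependents,
# so the core Kubermatic components always come last.
install_order = list(TopologicalSorter(dependencies).static_order())
print(install_order)
```

In the actual reconcile loop this would not be a one-shot sort; the operator would re-evaluate readiness on every reconcile and only advance to the next component once its prerequisites report healthy.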
> **Review comment on lines +23 to +34 (Member):** The motivation is good and we've discussed this in SIG Cluster Management numerous times (with some consensus), but in my opinion the proposal needs refinement. Why are we adding another CRD that describes the same thing as a KubermaticConfiguration - a KKP setup?
>
> KKP already has the kubermatic-operator, which is responsible for upgrading all KKP components. What is the benefit of adding an outer operator layer that just exists to run a bit of migration and verification code and then increment the inner operator version, instead of e.g. moving the migration code into the existing operator?
>
> If we get rid of the installer and deliver a Helm chart to install (any) operator, why not create a meta/umbrella chart that installs kubermatic-operator (the existing one) and the dependencies (cert-manager, ingress controller, etc.)?

> **Reply (Contributor):**
> > meta/umbrella chart
>
> Even here I wonder what the benefit is. I think mainly that you keep a consistent set of versions and don't install each latest version of each dependency individually.
>
> Because if the umbrella chart were to include everything, everyone would complain that it installs too much. So each component would need to be optional anyway, at which point the benefit of the umbrella chart might diminish. Again, besides the good point of having one set of versions blessed by us when we tag a KKP release.

> **Reply (Member Author):** Thank you for the feedback, I understand the concerns about introducing a new layer when we already have kubermatic-operator. The key motivation behind proposing an installer operator is to provide a solution that can manage not just the core KKP components, but also additional stacks such as monitoring, logging, backups, etc., which the current kubermatic-operator does not handle. The goal is to offer a unified, automated experience for admins and KKP users.
>
> The CRD will not describe the same thing as KubermaticConfiguration, but the other stacks.

> **Reply (Member):** I don't think this is the way to go, but just for the sake of poking this argument:
>
> Why include all the things already configurable by KubermaticConfiguration if what you want to focus on are actually just the dependencies? What is the benefit of having all of those KKP settings effectively doubled? Wouldn't it make more sense to have a DefaultStack resource or something that we add to the existing kubermatic-operator and give it the capability to deploy dependencies alongside reconciling KubermaticConfiguration?
>
> In addition, I think the question regarding the benefit of this over providing a meta Helm chart also still stands.
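For context, the meta/umbrella chart alternative raised in this thread could be sketched as a Helm `Chart.yaml` with optional dependencies. All chart names, versions, and repository URLs below are illustrative assumptions, not taken from any published Kubermatic release:

```yaml
# Hypothetical umbrella chart; names and versions are illustrative only.
apiVersion: v2
name: kkp-stack
version: 1.0.0
dependencies:
  - name: kubermatic-operator
    version: 0.1.0
    repository: https://example.com/kubermatic-charts
    condition: kubermatic-operator.enabled
  - name: cert-manager
    version: 1.14.0
    repository: https://charts.jetstack.io
    condition: cert-manager.enabled
  - name: ingress-nginx
    version: 4.10.0
    repository: https://kubernetes.github.io/ingress-nginx
    condition: ingress-nginx.enabled
```

The `condition` field makes each sub-chart optional via values, which addresses the "installs too much" objection while still pinning one blessed set of versions per KKP release.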



### User Story

An administrator wishes to install or upgrade their Kubermatic system:

1. The admin prepares a `KubermaticInstallation` manifest. The goal here is to avoid splitting the configuration and customization across different places and manifests, such as the Helm `values.yaml` file, `KubermaticConfiguration`, etc.
> **Review comment (Member):** How does the KubermaticInstallation CRD (and the operator) get into the cluster in the first place?

> **Reply (Member Author):** I was thinking of making the operator very simple so it could be installed with one YAML manifest that contains the CRD, deployment, and necessary RBAC permissions. This approach would enable a one-command installation, simplifying the initial setup.

> **Reply (Member):** Okay, but my point here was that the installation process is missing a step (installing this new operator), which makes it look more simple than it actually is. We should strive for completeness when describing admin UX and workflows.

1. The admin creates or applies the manifest, and the operator detects the new/changed configuration
and performs the necessary actions to bring the system to the desired state, handling dependencies and ordering automatically.
1. The admin can follow the installation details by watching the logs of the operator.
1. The administrator monitors the status of the installation or upgrade through standard Kubernetes tools like `kubectl describe`.
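Monitoring via `kubectl describe` suggests the CRD would publish a status subresource. The block below is a purely hypothetical illustration of what the operator might report; none of these field names are defined by this proposal:

```yaml
# Hypothetical status subresource; field names are illustrative only.
status:
  phase: Installing
  conditions:
    - type: DependenciesReady
      status: "True"
      reason: CertManagerAndIngressHealthy
    - type: KubermaticReady
      status: "False"
      reason: DeploymentRollingOut
```

Publishing conditions like these would let admins script readiness checks (`kubectl wait --for=condition=KubermaticReady ...`) instead of tailing operator logs.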

Below is an example of a `KubermaticInstallation`. Each stack could also be defined in separate manifests as separate objects.

```yaml
apiVersion: kubermatic.k8c.io/v1
kind: KubermaticInstallation
metadata:
  name: kubermatic
  namespace: kubermatic
spec:
  master:
    domain: demo.kubermatic.io
    dex:
      clients:
        - id: kubermatic
          name: kubermatic
      #######################
      ### SKIP DEX CONFIG ###
      #######################
    controller:
      replicas: 2
    seed:
      replicas: 2
  cert-manager:
    clusterIssuers:
      letsencrypt-prod:
        email: mohamed.rafraf@kubermatic.com
    enabled: true
  nginx:
    enabled: true
    controller:
      replicaCount: 2

  seed:
    - name: seed-1
      etcdBackupRestore:
        defaultDestination: minio-ext-endpoint
      ########################
      ### SKIP SEED CONFIG ###
      ########################
```

> **Review comment on lines +80 to +86 (Member):** What do we gain from this "monolith" CRD over storing multiple YAML documents in a kubermatic.yaml which holds the KubermaticConfiguration "master config" object and multiple Seed objects which you then apply?
>
> I mean:
>
> ```yaml
> apiVersion: kubermatic.k8c.io/v1
> kind: KubermaticConfiguration
> metadata:
>   name: kubermatic
>   namespace: kubermatic
> spec:
>   [...]
> ---
> apiVersion: kubermatic.k8c.io/v1
> kind: Seed
> metadata:
>   name: seed-1
> spec:
>   [...]
> ---
> apiVersion: kubermatic.k8c.io/v1
> kind: Seed
> metadata:
>   name: seed-2
> spec:
>   [...]
> ```

> **Reply (Member):** (assuming we change the way we ship the Seed CRD). But I'd be very sceptical to add fields to one CRD to generate another CRD on the same layer of the KKP installation, just because currently the CRD isn't there at that time.

> **Reply (Member Author):** The design of the KubermaticInstallation CRD is flexible, allowing you to either define all components within one comprehensive manifest or split them into individual manifests for each stack, depending on your management preferences. It's not mandatory to create a monolithic YAML manifest; you can choose to apply configurations for each stack separately with their own specific specs. This ensures that the CRD can support a variety of deployment strategies.

> **Reply (Member):** I don't understand this point. Are you saying you could apply the same KubermaticInstallation multiple times with only specific settings set to configure the components? That doesn't sound very declarative.

The example continues with the optional stacks:

```yaml
  monitoring:
    grafana:
      enabled: true
    prometheus:
      enabled: true
    nodeExporter:
      enabled: true

  mla:
    loki-distributed:
      enabled: true
      ingester:
        replicas: 3
    cortex:
      enabled: true
      server:
        replicas: 2

  backup:
    velero:
      enabled: true
      ##########################
      ### SKIP VELERO CONFIG ###
      ##########################
    minio:
      enabled: true
      storeSize: 100Gi
```
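As a sketch of the split-manifest option mentioned above, each stack could be its own `KubermaticInstallation` object (object names are hypothetical; fields mirror the example above):

```yaml
# Hypothetical per-stack split of the monolithic example.
apiVersion: kubermatic.k8c.io/v1
kind: KubermaticInstallation
metadata:
  name: kubermatic-monitoring
  namespace: kubermatic
spec:
  monitoring:
    grafana:
      enabled: true
    prometheus:
      enabled: true
---
apiVersion: kubermatic.k8c.io/v1
kind: KubermaticInstallation
metadata:
  name: kubermatic-backup
  namespace: kubermatic
spec:
  backup:
    velero:
      enabled: true
```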

### Goals

* Develop the `KubermaticInstallation` CRD to capture all required installer settings.
* Implement the Kubermatic Installer Operator to manage lifecycle events based on CRD changes.
* Integrate thorough validation within the operator to ensure configuration validity before application.
* Package and distribute the operator for easy deployment.
* Provide clear, helpful log output and error messages.

### Non-Goals

* Handling downgrades, which can be significantly complex and risky.