
RFC: Enable gitops usage on management clusters #2

Merged
merged 4 commits into from
Jul 12, 2021

Conversation

MarcelMue
Contributor

Please discuss the comment publicly here. We want to find some alignment on our approach.

@MarcelMue MarcelMue self-assigned this Feb 10, 2021
@MarcelMue
Contributor Author

Comment by @webwurst in the internal issue:

I like the idea and think it is also the right time to explore that area. People are becoming more aware of GitOps and eventually customers will ask for it. The development teams behind Flux and ArgoCD gathered to exchange experiences from production usage. In the end they decided to follow separate paths, but hopefully for good reasons. As a result there is Flux v2, based on the GitOps Toolkit, which can also be used in your own Golang tools or operators.

Gerald put out a first preview of a possible flux-app in our playground-catalog. The challenge at the moment is that it has not reached v1.0 yet and there are multiple releases per week to catch up with ;) Halo provides this "as is" for the current cycle since we are focusing on Loki. But I guess we are happy for PRs and discussions.

Trying out Flux1 earlier didn't make much sense since there were discussions about a re-write and breaking changes and finally the deprecation note last autumn.

ArgoCD and Fleet might be interesting for comparison.

@MarcelMue MarcelMue requested review from a team February 10, 2021 12:35
We currently give customers access to the management clusters but do not support them in using this access effectively for GitOps-related management (e.g. for apps).

Context:
Some customers might be interested in using mainstream GitOps tooling to manage their apps across clusters and installations.


Main question for me:

What does it help our customers achieve? Is it, "Manage stacks of apps on fleets of clusters" and challenges related to that?

(If it is, then this is the thing Batman wants to help customers achieve. And I'd like to explore this opportunity to take this next step in that direction.)

Contributor Author


I would say this is one approach which could achieve the goal of "Manage stacks of apps on fleets of clusters".

I would appreciate Batman input & opinions here :)


@cokiengchiara Yes this tooling would help customers to manage the app CRs and user values configmaps and secrets for their apps. With stacks of apps on fleets of clusters that becomes more complex.

There is a lot of content out there on the benefits the GitOps approach provides. I don't want to repeat that but maybe https://www.gitops.tech/ helps as an overview / starting point.

Member

I agree with "Manage stacks (or fleets) of apps on fleets of clusters", and to be clear that means it is not just app CRs that get managed, but all kinds of YAML that a customer can and should create on the MCs. So I would generalize this a bit in two directions:

  1. it's not about apps, it's about the Management API (MAPI)
  2. it's not about gitops per se, it is about enabling tooling against the MAPI that needs "an agent" in the MC and cannot just externally work against the MAPI.

## Open Questions

### 1. General direction
1.1 We would like to allow our customers to choose their management tooling freely. Do we want to support some tools with e.g. a managed app first though?

We want to let customers choose their tooling freely but I think we need some validation so only a subset of apps can be installed in management clusters.

Starting with a managed app first makes sense to me, especially if it's a tool we choose to use ourselves.

Contributor Author

I agree that managed apps make sense. It's also a question of how much freedom we give customers / how much we allow them to "fuck up".

Member

It could be that, for the beginning, we limit this to a selection of apps (at the get-go just one) that you can install on your MC using our interface. Very controlled.

Later, it could be extended to nicely isolated namespaces in the MC where we can give a certain amount more freedom. But even there, because of the limits of isolation, we need to be very careful, and we might never get there completely.

Member

Because it's not only about "fucking up": there might also be cases of actual multi-tenancy within an MC, where there's an untrusted org.

1.2 Do we own the gitops tooling or is it purely owned by the customer?

### 2. Technical issues
2.1 What does the setup for an in-cluster agent look like? Currently customers will struggle to set up an in-cluster agent with appropriate permissions.

In the App CR the customer just needs to set `.spec.kubeConfig.inCluster` to `true`.

We should also make our kubeconfigs work with GitOps tooling. They don't currently work with Flux because of how we name the secrets we generate in cluster-operator.
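For illustration, an App CR targeting the management cluster itself might look roughly like this (catalog, names, namespace, and version are placeholders, and the exact field layout is assumed from the App CRD, so treat it as a sketch):

```yaml
apiVersion: application.giantswarm.io/v1alpha1
kind: App
metadata:
  name: example-app        # placeholder name
  namespace: org-example   # placeholder organization namespace
spec:
  catalog: giantswarm
  name: example-app
  namespace: example
  version: 1.0.0
  kubeConfig:
    inCluster: true        # target the management cluster itself, no kubeconfig secret needed
```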

Contributor Author

Sorry this was bad phrasing on my part:
This more refers to the initial agent set-up (at least as long as we don't have a managed app). We will currently struggle to not give the agent full permissions to all namespaces, right?


This more refers to the initial agent set-up (at least as long as we don't have a managed app). We will currently struggle to not give the agent full permissions to all namespaces, right?

Ah, got it. Yes, I think we'll need to restrict the permissions. Taking Flux as an example, AIUI there are multi-tenancy patterns with a system Flux for platform admins and then multiple Flux instances for teams.

My proposal would be we follow that approach and offer a managed app with the locked down setup. This is another argument for offering managed apps for this IMO.
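A locked-down setup could scope the agent's service account to a single organization namespace instead of cluster-wide access. A minimal sketch, assuming a hypothetical `gitops-agent` service account in an `org-example` namespace:

```yaml
# Hypothetical RBAC restricting a tenant GitOps agent to one namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: gitops-agent
  namespace: org-example
rules:
  # Manage App CRs plus their user values configmaps and secrets, nothing else.
  - apiGroups: ["application.giantswarm.io"]
    resources: ["apps"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: [""]
    resources: ["configmaps", "secrets"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: gitops-agent
  namespace: org-example
subjects:
  - kind: ServiceAccount
    name: gitops-agent
    namespace: org-example
roleRef:
  kind: Role
  name: gitops-agent
  apiGroup: rbac.authorization.k8s.io
```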


2.2 How do we ensure security requirements when customer interaction increases?

There is a multi tenancy problem with chart CRs and their related configmaps and secrets.

These are currently stored in the giantswarm namespace. Before enabling customers to create apps in management clusters I think we should move this to their organization or cluster namespace.


@rossf7

rossf7 commented Feb 10, 2021

I think the primary interface should remain the Management API and App CRD to not force an option on customers. But I like the idea and I don't see this as blocking that.

Allowing customers to choose their own GitOps tooling could make using app platform easier and unblock more complex use cases.

I liked the demo Halo did with Flux. Especially using SOPS to encrypt the secrets which is a complex area. Argo could also be interesting especially its workflow engine. This could also enable customers to use these tools directly rather than App Platform if they work better for their use case.
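The SOPS integration mentioned above is configured on the Flux side as a decryption setting on a Kustomization. A sketch, with the API version, repository name, and key-secret name all assumed rather than taken from the demo:

```yaml
# Sketch: a Flux v2 Kustomization that decrypts SOPS-encrypted manifests
# before applying them. Check the Flux docs for the API version of your release.
apiVersion: kustomize.toolkit.fluxcd.io/v1beta1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 10m
  path: ./clusters/example
  prune: true
  sourceRef:
    kind: GitRepository
    name: fleet-repo     # hypothetical repository name
  decryption:
    provider: sops
    secretRef:
      name: sops-gpg     # secret holding the private key used for decryption
```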

@piontec
Contributor

piontec commented Feb 12, 2021

There are also some notes (since quite some time) in our brain storm doc: https://docs.google.com/document/d/1-zgREXVmnu2Rd7QLnrKO15MJWSkWS9_yBWkzFZsSRNw/edit#heading=h.c80optp1pse7

@piontec
Contributor

piontec commented Feb 12, 2021

Few more comments from me:

  • to get started, I think it totally makes sense to do managed Apps only
  • I'd like to keep (as Ross already wrote) the API intact: so we're looking for a gitops tool that manages/syncs App/CM/Secret (C)Rs
  • To get started, I would avoid allowing customers to choose and install stuff on MCs. This seems like a huge security and stability risk that would need a lot of effort to implement. I like it in general, I just don't think it's necessary to get customers started with GitOps. My recommendation would be an "opt-in request": if a customer wants to use GitOps tooling, we deploy it for them to selected MCs. Customers then continue by providing configuration as the necessary CRs. Limiting RBAC access to the necessary namespaces and CRs on the MC should be much faster and easier than allowing them to install anything from a set of apps (this feature might come later).
  • To get started, I would offer just a single GitOps tool. @rossf7 and I already agreed to do a quick comparison project: we want to compare flux/argo/fleet/jenkinsx (OK, the last one is already out after what I learned today). Out of them, we pick one, we support that one, and we offer it as an opt-in for customers on MCs. We will evaluate what works best with our app management API, so they don't have to. And since we need to learn it, we can provide some training for customers, so it's easier for them to get started.

All in all, as MVP, 1 opinionated tool, deployed on-demand by us and only configured by customers makes a lot of sense to me.


1.2 Do we own the gitops tooling or is it purely owned by the customer?
Member

I would say this could be similar to other managed apps: they might start at a low support level and move up if we feel confident; some might never be managed.



2.3 Do we foresee issues when introducing gitops tooling to already existing resources?
Member

depends how invasive the tooling is, like does it delete and recreate resources or does it "adopt" and apply/update

Member

Yes! This needs to be checked for each GitOps tool separately: will resources changed by happa or gsctl be changed back to the state the GitOps tool knows?

Random idea: at least Flux applies labels to resources managed by it, so we could display (for example) clusters installed through the GitOps tool as read-only.
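For reference, Flux v2 labels applied resources with the name and namespace of the Kustomization that owns them; a UI could treat anything carrying these labels as GitOps-managed and render it read-only. A fragment of what such a resource's metadata looks like (label values here are assumed examples):

```yaml
# Labels Flux v2 sets on resources it applies.
metadata:
  labels:
    kustomize.toolkit.fluxcd.io/name: apps          # owning Kustomization
    kustomize.toolkit.fluxcd.io/namespace: flux-system
```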


### 3. Guidance
3.1 How do we support customers in making sensible decisions in terms of tooling choice with GitOps?
Member

@piontec was proposing a SPIKE within balo looking at the alternatives and forming a first opinion, as it feels like GS currently has no, or only very poorly informed, opinions.


3.2 Do we aid customers with repository structure for their gitops approach?
Member

If this is something we believe should be used, we could, or rather should, accompany the offering with docs as well as tutorials or workshops that educate the users. This is really nice in the context of moving a Room 1 customer to Room 2, for example.



3.3 Do we offer customers to give us shared access to their repository for additional review?
Member

Interesting Q. For me it is especially interesting whether we could find a mutual interest there, as we already often have this between us and customers, where we also hate waking up because of misconfiguration, so we have an incentive to help out and avoid bad config. Also interesting is how that would play into our new Solutions Engineering approaches. cc @giantswarm/se

Contributor

If we publish some guidelines (previous point), I do think it is fine to check current or future repo structure and the whole design as part of SE consultancy work.

Member

@puja108 puja108 left a comment

I think this could be a really nice opportunity, especially thinking of moving Room 1 customers towards Room 2.

I could also see sense in making our approach staged, from a "very restrictive/controlled" MVP towards more "freedom", but I'd leave that open to future evaluation, as there could be benefits to a freer approach that saves initial effort on our side and does not open up the Pandora's box of security.

* main:
  remove duplicate rfc description file (#18)
  Use README.md in the folders (#14)
  Remove rfc numbers (#13)
@teemow
Member

teemow commented Apr 22, 2021

Please make sure to pull the changes before making new modifications. I've renamed the folder and markdown as we agreed on a slightly different structure for RFCs.

@JosephSalisbury
Contributor

@snizhana-dynnyk do we have roadmap issues towards implementing the ideas here now?

@MarcelMue
Contributor Author

There is a roadmap issue now. Merging here.

@MarcelMue MarcelMue merged commit 8216169 into main Jul 12, 2021
@MarcelMue MarcelMue deleted the gitops-management-clusters branch July 12, 2021 16:22