
What's the proper convention for customizing kubeflow in a maintainable way? #1549

Closed
mttcnnff opened this issue Sep 9, 2020 · 24 comments

@mttcnnff

mttcnnff commented Sep 9, 2020

I understand that you can use kfctl to generate kustomization.yaml files and then edit the generated files in the /kustomize directory, but this doesn't seem maintainable: if I rerun kfctl, it will overwrite the /kustomize directory, which is a problem when you want to change the upstream repo you pull config from, for example during an upgrade down the road.

Is there any way to generate the overall config for the Kubeflow deployment using kfctl, but have it refer to outside kustomize files that wouldn't get thrown away by a rerun? That way I would have my "base deploy", and then my custom configuration and resources on top.

@issue-label-bot

Issue-Label Bot is automatically applying the labels:

Label Probability
area/kfctl 0.86
kind/question 0.86


@kubeflow-bot kubeflow-bot added this to To Do in Needs Triage Sep 9, 2020
@Bobgy
Copy link
Contributor

Bobgy commented Sep 10, 2020

@mttcnnff This is exactly the reason why we built kubeflow/gcp-blueprint.
Some explanation in GoogleCloudPlatform/kubeflow-distribution#123.

GCP is now the only platform following this pattern. I'm interested to see if other platforms want to follow.

@issue-label-bot

Issue-Label Bot is automatically applying the labels:

Label Probability
platform/gcp 0.71


@mttcnnff
Author

@Bobgy Can you explain a bit more about how this pattern works? I'm on AWS but more than happy to contribute to get this into a better place.

@thesuperzapper
Member

thesuperzapper commented Sep 11, 2020

I make use of argo-cd.

Here is what I do:

  1. create a private git repo (for all the following steps)
  2. use kfctl to generate the top-level kubeflow/xxx/kustomization.yaml files, which reference .cache/manifests-1.1-branch/...
  3. create YAML patch files with the changes I need for my cluster (like the dex config.yaml ConfigMap)
  4. add these YAML files to the corresponding kubeflow/xxx/kustomization.yaml under patchesStrategicMerge:, so they get strategic-merged in by kustomize
  5. create an argo-cd Application which targets each of these top-level folders
    • example: GIT_ROOT/kubeflow/kubeflow-apps/kustomization.yaml
    • note: remember to set kustomize.buildOptions: --load_restrictor=none in your argo-cd configmap

This has the benefit of being git-ops, and means I can make any changes I need without modifying the files in .cache/* directly. It also means I can easily remove components which are already on my K8S cluster, like istio and cert-manager.
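
To make steps 2–4 concrete, a top-level kustomization.yaml with a patch added could look roughly like this (the base path, namespace, and patch file name are illustrative, not copied from kfctl output):

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: auth
bases:
  # generated by kfctl, pointing into the local .cache copy of kubeflow/manifests (illustrative path)
  - ../../.cache/manifests-1.1-branch/dex-auth/dex-crds/base
patchesStrategicMerge:
  # hand-written patch kept next to this kustomization.yaml,
  # e.g. overriding the dex config.yaml ConfigMap for this cluster
  - dex-config-patch.yaml

The --load_restrictor=none build option noted in step 5 is needed because these generated kustomizations reference files outside their own directory (under .cache).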

If there is interest, I can put a guide for this on the main Kubeflow docs, as I expect this approach is probably the most generic way to deploy Kubeflow if you don't want to use one of the vendors' approaches.

@mttcnnff
Author

mttcnnff commented Sep 11, 2020

@thesuperzapper This sounds good. So at the end of the day you're using kfctl once to generate everything, and from there on using kustomize, never running kfctl again?

How does this work for different environments? How do you specify which patches to use for patchesStrategicMerge?

@thesuperzapper
Member

@mttcnnff yep, you only use kfctl once per major Kubeflow version (which is important for git-ops, as you should be storing exactly the YAML you expect to be in your cluster, explicitly in your repository).

Because the purpose of having multiple environments is usually separating dev/prod testing, I would recommend having completely separate folders in the repo to store the ./kustomize/ folders for each environment.

Kubeflow app-of-apps:

The basic structure is a single repo with a Helm chart at its root. This outer chart is an app-of-apps which deploys the various Kubeflow sub-apps found in the ./kustomize/ folders generated by kfctl.

Repo structure for kubeflow-app-of-apps:

  • ./dev
    • ./.cache
      • ...
    • ./kustomize
      • ./kubeflow-apps
        • kustomization.yaml
        • YOUR_PATCH.yaml
      • ...
  • ./prod
    • ./.cache
      • ...
    • ./kustomize
      • ...
  • ./templates
    • ./argocd-apps.yaml
    • ./validate-input.tpl
  • Chart.yaml
  • values.yaml

./templates/argocd-apps.yaml

{{- /*
While this looks complex, it simply:
  1. Loops over each kustomize folder under the requested `environment` sub-folder.
  2. Reads the corresponding kustomization.yaml file, interpreting it as a YAML structure.
     (This lets us extract the namespace field in the following step.)
  3. Creates a new Argo-CD Application Resource for each one it finds.
*/ -}}
{{ $search_path := printf "%s/kustomize/*/kustomization.yaml" .Values.environment }}
{{- range $file_path, $bytes := .Files.Glob $search_path }}
{{- $folder_path := dir $file_path }}
{{- $chart_name := base $folder_path }}
{{- $kustomize_yaml := $.Files.Get $file_path | fromYaml }}
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: {{ $.Values.argo.app_of_apps.name }}--{{ $chart_name }}
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  destination:
    namespace: {{ $kustomize_yaml.namespace | default "kubeflow" }}
    server: https://kubernetes.default.svc
  source:
    path: {{ $folder_path | quote }}
    repoURL: {{ $.Values.argo.app_of_apps.repo.url | quote }}
    targetRevision: {{ $.Values.argo.app_of_apps.repo.target_revision | quote }}
  project: default
  ignoreDifferences:
    # kubeflow has many empty `/descriptor` fields, which are removed in the live manifest
    # NOTE: as the `/descriptor` field doesn't do much, we can just ignore the diffs they create
    - group: app.k8s.io
      kind: Application
      jsonPointers:
        - /spec/descriptor
    # kubeflow uses role aggregation so the `/rules` field changes in the live manifest
    # NOTE: this could cause some issues for ClusterRole not using aggregation
    - group: rbac.authorization.k8s.io
      kind: ClusterRole
      jsonPointers:
        - /rules
---
{{- end }}

./templates/validate-input.tpl

{{ $valid_environments := list "dev" "prod" }}
{{ if not (has .Values.environment $valid_environments) }}
  {{ fail "VALUE: 'environment' must be one of: ['dev','prod']" }}
{{ end }}

{{ if empty .Values.argo.app_of_apps.name }}
  {{ fail "VALUE: 'argo.app_of_apps.name' is required!" }}
{{ end }}

{{ if empty .Values.argo.app_of_apps.repo.url }}
  {{ fail "VALUE: 'argo.app_of_apps.repo.url' is required!" }}
{{ end }}

{{ if empty .Values.argo.app_of_apps.repo.target_revision }}
  {{ fail "VALUE: 'argo.app_of_apps.repo.target_revision' is required!" }}
{{ end }}

./Chart.yaml

apiVersion: v2
name: kubeflow-apps
version: 1.1.0

./values.yaml

# this must be set by argo-cd as 'dev' or 'prod'
environment: null

argo:
  app_of_apps:
    # the name of the argo-cd app which targets the root of this repo
    # Build Env: $ARGOCD_APP_NAME
    # Example: `kubeflow-apps`
    name: null

    repo:
      # the URL of THIS repo
      # Build Env: $ARGOCD_APP_SOURCE_REPO_URL
      # Example: `ssh://git@example.org/kubeflow-apps.git`
      url: null

      # the desired target revision of THIS repo
      # Build Env: $ARGOCD_APP_SOURCE_TARGET_REVISION
      # Example: `master` or `v1.1.0`
      target_revision: null

argo-cd app which targets kubeflow-app-of-apps

  • for prod you will see I am targeting the v1.1.0 Git tag, which you must create on your kubeflow-app-of-apps repo
  • for dev I would usually set targetRevision to master, so that people can make changes without creating a new tag

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kubeflow-apps
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  destination:
    namespace: argocd
    server: https://kubernetes.default.svc
  source:
    path: "."
    repoURL: ssh://git@example.org/kubeflow-apps.git
    targetRevision: v1.1.0
    helm:
      parameters:
        - name: environment
          value: prod
        # as this is an app-of-apps, and the inner-apps are also in the same repo
        # we ensure all repos/target_revisions are the same, by passing them here
        - name: argo.app_of_apps.repo.url
          value: $ARGOCD_APP_SOURCE_REPO_URL
        - name: argo.app_of_apps.repo.target_revision
          value: $ARGOCD_APP_SOURCE_TARGET_REVISION
        # similar to the repo values: we pass the name of this app, so it can be used in the name of the child apps
        - name: argo.app_of_apps.name
          value: $ARGOCD_APP_NAME
  project: default

@mttcnnff
Author

@thesuperzapper Wow thank you so much! This is super enlightening!

@mttcnnff
Author

@thesuperzapper what if I do want to edit what's inside the .cache, for instance if I don't want to create an aws-alb-ingress-controller because I already have one in my cluster?

@thesuperzapper
Member

@mttcnnff I assume you are using kfctl_aws.v1.1.0.yaml? (Which I see is missing from the master branch of this repo, so I raised #1555)

I personally comment out the resources for installing istio and cert-manager from my kfdef files before I run kfctl build -V -f ./kfctl_xxx.yaml, as I install them outside of Kubeflow. (Just remember that istio/istio/base is still required, as it configures the Istio VirtualServices.)

Also note that Kustomize will never let you remove a resource with a patch, so until we fix up this repo's structure, you will have to either create your own stack or modify the stack you are using, which in your case would be ./stacks/aws. In general, though, removal is only necessary if you want to host an external MySQL for Pipelines/Metadata rather than use the embedded one.
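
For reference, the part of the kfdef that gets commented out looks roughly like this (a sketch only; the application names, paths, and repo URI below are illustrative rather than copied from kfctl_aws.v1.1.0.yaml):

apiVersion: kfdef.apps.kubeflow.org/v1
kind: KfDef
metadata:
  name: kubeflow
  namespace: kubeflow
spec:
  applications:
    # keep istio/istio/base: it creates the Kubeflow VirtualServices
    - kustomizeConfig:
        repoRef:
          name: manifests
          path: istio/istio/base
      name: istio
    # commented out because istio and cert-manager are installed outside of Kubeflow
    # - kustomizeConfig:
    #     repoRef:
    #       name: manifests
    #       path: istio/istio-install/base
    #   name: istio-install
    # - kustomizeConfig:
    #     repoRef:
    #       name: manifests
    #       path: cert-manager/cert-manager/base
    #   name: cert-manager
    - kustomizeConfig:
        repoRef:
          name: manifests
          path: stacks/aws
      name: kubeflow-apps
  repos:
    - name: manifests
      uri: https://github.com/kubeflow/manifests/archive/v1.1-branch.tar.gz

Running kfctl build against the edited kfdef then regenerates the kustomize/ folder without the commented-out components.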

@sajid-moinuddin

@thesuperzapper thanks for sharing! this is genius !!!!

@thesuperzapper
Member

@sajid-moinuddin I will try to get this added to the docs on https://www.kubeflow.org/docs/

@sajid-moinuddin

@thesuperzapper Looking deeper into it, do you really need to maintain copies of the .cache and kustomize charts for dev/stg/prod? Could you have a base with the kfctl-generated charts and override them in the dev/stg config?

@thesuperzapper
Member

@sajid-moinuddin perhaps you could, but the point of storing the .cache folder is to prevent changes to your dev environment from having any effect on your prod environment.

Also note that we make changes to the kubeflow/manifests branches after release; for example, 1.2-branch has been updated a few times since Kubeflow 1.2, and kfctl build will always take the latest from GitHub.

@sylus

sylus commented Dec 7, 2020

In case it's helpful, I also documented how I was doing this, which I think is close to this approach as well:

kubeflow/kubeflow#5440

@yanniszark
Contributor

Hi everyone!

With the formation of the Manifests Working Group, the goal is to provide a catalog of kustomize packages that admins/platforms can:

  • Deploy as is.
  • Customize for their environment, using kustomize patches and overlays.

AFAIK, virtually all GitOps tools right now work with kustomize, so integration with 3rd-party tools should be straightforward.
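
For example, a downstream kustomization could consume one of those packages remotely and customize it with a local patch (the package path, ref, and patch file below are placeholders):

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # upstream package from the kubeflow/manifests catalog (placeholder path and ref)
  - github.com/kubeflow/manifests//apps/katib/upstream/installs/katib-with-kubeflow?ref=v1.3.0
patchesStrategicMerge:
  # local, environment-specific changes layered on top
  - patches/katib-config.yaml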

@munagekar

I created a repository demonstrating the approach mentioned in #1549 (comment).

@davidspek
Contributor

davidspek commented Apr 13, 2021

With the release of 1.3 and the move to bare Kustomize, using Argo CD and referencing the upstream manifests directly (no need to copy them into your repo) has made the deployment and customization of Kubeflow a lot easier. I've created a distribution that I am maintaining. The TL;DR setup is as follows:

  • fork the repo
  • modify the kustomizations for your purpose
  • run ./setup_repo.sh <your_repo_fork_url> (to change the Argo CD application specs to point to your fork)
  • commit and push your changes
  • run kubectl apply -f kubeflow.yaml (this deploys the Argo CD application specs; each Kubeflow component is an application in Argo CD)


https://github.com/argoflow/argoflow

@benjamintanweihao

@davidspek This is excellent! How would you remove an application? For example, what would be the steps to remove Katib?

@davidspek
Contributor

@benjamintanweihao In https://github.com/argoflow/argoflow/blob/master/kustomization.yaml you would simply comment out Katib and KNative.
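
Hypothetically, that edit would look something like this (the entry names below are placeholders, not the actual contents of the argoflow kustomization.yaml):

resources:
  - argocd-applications/central-dashboard.yaml
  - argocd-applications/pipelines.yaml
  # commented out so Argo CD no longer deploys Katib or KNative
  # - argocd-applications/katib.yaml
  # - argocd-applications/knative.yaml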

@benjamintanweihao

benjamintanweihao commented Apr 16, 2021

@davidspek Interesting! I tried commenting out Katib only; it disappeared from the Argo CD UI, but I could still see the pods and other Katib resources deployed. Also, I was still able to access AutoML from the UI.

@davidspek
Contributor

@benjamintanweihao Seems like the deletion didn't propagate. If you're on Slack I can help you get those deleted and remove the Katib entry from the central dashboard. It's not difficult, but I don't want to hijack this issue.

@thesuperzapper
Member

thesuperzapper commented Apr 27, 2021

So that this issue doesn't live forever, I am going to close it here.

I think the main outcomes are:

  1. if you want to use ArgoCD, look at the argoflow distribution
  2. for other options, review the installing-kubeflow docs

/close

@google-oss-robot

@thesuperzapper: Closing this issue.

In response to this:

So that this issue doesn't live forever, I am going to close it here.

I think the main outcomes are:

  1. if you want to use ArgoCD, look at the argoflow distribution
  2. for other options, review the installing-kubeflow docs

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Notebooks WG automation moved this from Backlog to Done Apr 27, 2021