
What's the proper convention for customizing kubeflow in a maintainable way? #1549

Closed
mttcnnff opened this issue Sep 9, 2020 · 24 comments

@mttcnnff

mttcnnff commented Sep 9, 2020

I understand that you can use kfctl to generate kustomization.yaml files and then edit the generated files in the /kustomize directory, but this doesn't seem maintainable: if I rerun kfctl, it will overwrite the /kustomize directory, which is a problem when you want to change the upstream repo you pull config from, for example during an upgrade down the road.

Is there any way to generate the overall config for the Kubeflow deployment using kfctl, but have it refer to outside kustomize files that wouldn't get thrown away by a rerun? That way I would have my "base deploy", and then my custom configuration and resources on top.

@issue-label-bot

Issue-Label Bot is automatically applying the labels:

Label Probability
area/kfctl 0.86
kind/question 0.86


@kubeflow-bot kubeflow-bot added this to To Do in Needs Triage Sep 9, 2020
@Bobgy
Copy link
Contributor

Bobgy commented Sep 10, 2020

@mttcnnff This is exactly the reason why we built kubeflow/gcp-blueprint.
Some explanation in GoogleCloudPlatform/kubeflow-distribution#123.

GCP is now the only platform following this pattern. I'm interested to see if other platforms want to follow.

@issue-label-bot

Issue-Label Bot is automatically applying the labels:

Label Probability
platform/gcp 0.71


@mttcnnff
Author

@Bobgy Can you explain a bit more about how this pattern works? I'm on AWS but more than happy to contribute to get this into a better place.

@thesuperzapper
Member

thesuperzapper commented Sep 11, 2020

I make use of argo-cd.

Here is what I do:

  1. create a private git repo (for all the following steps)
  2. use kfctl to generate the top-level kubeflow/xxx/kustomization.yaml files, which reference .cache/manifests-1.1-branch/...
  3. create YAML patch files with the changes I need for my cluster (like the dex config.yaml ConfigMap)
  4. add these YAML files to the corresponding kubeflow/xxx/kustomization.yaml under patchesStrategicMerge:, so they get strategic-merged in by kustomize
  5. create an argo-cd Application which targets each of these top-level folders
    • example: GIT_ROOT/kubeflow/kubeflow-apps/kustomization.yaml
    • note: remember to set kustomize.buildOptions: --load_restrictor=none in your argo-cd configmap

This has the benefit of being git-ops, and means I can make any changes I need without modifying the files in .cache/* directly. It also means I can easily remove components which are already on my K8S cluster, like istio and cert-manager.
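
To make steps 2–4 concrete, a top-level kustomization.yaml with a patch added could look roughly like this (the base path, namespace, and patch file name are illustrative, not copied from kfctl output):

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: auth
bases:
  # generated by kfctl, pointing into the local .cache copy of kubeflow/manifests (illustrative path)
  - ../../.cache/manifests-1.1-branch/dex-auth/dex-crds/base
patchesStrategicMerge:
  # hand-written patch kept next to this kustomization.yaml,
  # e.g. overriding the dex config.yaml ConfigMap for this cluster
  - dex-config-patch.yaml

The --load_restrictor=none build option noted in step 5 is needed because these generated kustomizations reference files outside their own directory (under .cache).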

If there is interest, I can put a guide for this on the main Kubeflow docs, as I expect this approach is probably the most generic way to deploy Kubeflow if you don't want to use one of the vendors' approaches.

@mttcnnff
Author

mttcnnff commented Sep 11, 2020

@thesuperzapper This sounds good. So at the end of the day you're using kfctl once to generate everything, and from there on using kustomize, never running kfctl again?

How does this work for different environments? How do you specify which patches to use for patchesStrategicMerge?

@thesuperzapper
Member

@mttcnnff yep, you only use kfctl once per major Kubeflow version (which is important for git-ops, as you should be storing exactly the YAML you expect to be in your cluster, explicitly in your repository).

Because the purpose of having multiple environments is usually separating dev/prod testing, I would recommend having completely separate folders in the repo to store the ./kustomize/ folders for each environment.

Kubeflow app-of-apps:

The basic structure is a single repo with a Helm chart at its root. This outer chart is an app-of-apps which deploys the various Kubeflow sub-apps found in the ./kustomize/ folders generated by kfctl.

Repo structure for kubeflow-app-of-apps:

  • ./dev
    • ./.cache
      • ...
    • ./kustomize
      • ./kubeflow-apps
        • kustomization.yaml
        • YOUR_PATCH.yaml
      • ...
  • ./prod
    • ./.cache
      • ...
    • ./kustomize
      • ...
  • ./templates
    • ./argocd-apps.yaml
    • ./validate-input.tpl
  • Chart.yaml
  • values.yaml

./templates/argocd-apps.yaml

{{- /*
While this looks complex, it simply:
  1. Loops over each kustomize folder under the requested `environment` sub-folder.
  2. Reads the corresponding kustomization.yaml file, interpreting it as a YAML structure.
     (This lets us extract the namespace field in the following step.)
  3. Creates a new Argo-CD Application Resource for each one it finds.
*/ -}}
{{ $search_path := printf "%s/kustomize/*/kustomization.yaml" .Values.environment }}
{{- range $file_path, $bytes := .Files.Glob $search_path }}
{{- $folder_path := dir $file_path }}
{{- $chart_name := base $folder_path }}
{{- $kustomize_yaml := $.Files.Get $file_path | fromYaml }}
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: {{ $.Values.argo.app_of_apps.name }}--{{ $chart_name }}
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  destination:
    namespace: {{ $kustomize_yaml.namespace | default "kubeflow" }}
    server: https://kubernetes.default.svc
  source:
    path: {{ $folder_path | quote }}
    repoURL: {{ $.Values.argo.app_of_apps.repo.url | quote }}
    targetRevision: {{ $.Values.argo.app_of_apps.repo.target_revision | quote }}
  project: default
  ignoreDifferences:
    # kubeflow has many empty `/descriptor` fields, which are removed in the live manifest
    # NOTE: as the `/descriptor` field doesn't do much, we can just ignore the diffs they create
    - group: app.k8s.io
      kind: Application
      jsonPointers:
        - /spec/descriptor
    # kubeflow uses role aggregation so the `/rules` field changes in the live manifest
    # NOTE: this could cause some issues for ClusterRole not using aggregation
    - group: rbac.authorization.k8s.io
      kind: ClusterRole
      jsonPointers:
        - /rules
---
{{- end }}

./templates/validate-input.tpl

{{ $valid_environments := list "dev" "prod" }}
{{ if not (has .Values.environment $valid_environments) }}
  {{ fail "VALUE: 'environment' must be one of: ['dev','prod']" }}
{{ end }}

{{ if empty .Values.argo.app_of_apps.name }}
  {{ fail "VALUE: 'argo.app_of_apps.name' is required!" }}
{{ end }}

{{ if empty .Values.argo.app_of_apps.repo.url }}
  {{ fail "VALUE: 'argo.app_of_apps.repo.url' is required!" }}
{{ end }}

{{ if empty .Values.argo.app_of_apps.repo.target_revision }}
  {{ fail "VALUE: 'argo.app_of_apps.repo.target_revision' is required!" }}
{{ end }}

./Chart.yaml

apiVersion: v2
name: kubeflow-apps
version: 1.1.0

./values.yaml

# this must be set by argo-cd as 'dev' or 'prod'
environment: null

argo:
  app_of_apps:
    # the name of the argo-cd app which targets the root of this repo
    # Build Env: $ARGOCD_APP_NAME
    # Example: `kubeflow-apps`
    name: null

    repo:
      # the URL of THIS repo
      # Build Env: $ARGOCD_APP_SOURCE_REPO_URL
      # Example: `ssh://git@example.org/kubeflow-apps.git`
      url: null

      # the desired target revision of THIS repo
      # Build Env: $ARGOCD_APP_SOURCE_TARGET_REVISION
      # Example: `master` or `v1.1.0`
      target_revision: null

argo-cd app which targets kubeflow-app-of-apps

  • for prod you will see I am targeting the v1.1.0 Git tag, which you must create on your kubeflow-app-of-apps repo
  • for dev I would usually set targetRevision to master, so that people can make changes without creating a new tag

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kubeflow-apps
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  destination:
    namespace: argocd
    server: https://kubernetes.default.svc
  source:
    path: "."
    repoURL: ssh://git@example.org/kubeflow-apps.git
    targetRevision: v1.1.0
    helm:
      parameters:
        - name: environment
          value: prod
        # as this is an app-of-apps, and the inner-apps are also in the same repo
        # we ensure all repos/target_revisions are the same, by passing them here
        - name: argo.app_of_apps.repo.url
          value: $ARGOCD_APP_SOURCE_REPO_URL
        - name: argo.app_of_apps.repo.target_revision
          value: $ARGOCD_APP_SOURCE_TARGET_REVISION
        # similar to the repo values: we pass the name of this app, so it can be used in the name of the child apps
        - name: argo.app_of_apps.name
          value: $ARGOCD_APP_NAME
  project: default

@mttcnnff
Author

@thesuperzapper Wow thank you so much! This is super enlightening!

@mttcnnff
Author

@thesuperzapper what if I do want to edit what's inside the .cache, for instance if I don't want to create an aws-alb-ingress-controller because I already have one in my cluster?

@thesuperzapper
Member

@mttcnnff I assume you are using kfctl_aws.v1.1.0.yaml? (Which I see is missing from the master branch of this repo, so I raised #1555)

I personally comment out the resources for installing istio and cert-manager from my kfdef files before I run kfctl build -V -f ./kfctl_xxx.yaml, as I install them outside of Kubeflow. (Just remember that istio/istio/base is still required, as it configures the Istio VirtualServices.)

Also note that Kustomize will never let you remove a resource with a patch, so until we fix up this repo's structure, you will have to either create your own stack or modify the stack you are using, which in your case would be ./stacks/aws. In general, though, removal is only necessary if you want to host an external MySQL for Pipelines/Metadata rather than use the embedded one.
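
For reference, the part of the kfdef that gets commented out looks roughly like this (a sketch only; the application names, paths, and repo URI below are illustrative rather than copied from kfctl_aws.v1.1.0.yaml):

apiVersion: kfdef.apps.kubeflow.org/v1
kind: KfDef
metadata:
  name: kubeflow
  namespace: kubeflow
spec:
  applications:
    # keep istio/istio/base: it creates the Kubeflow VirtualServices
    - kustomizeConfig:
        repoRef:
          name: manifests
          path: istio/istio/base
      name: istio
    # commented out because istio and cert-manager are installed outside of Kubeflow
    # - kustomizeConfig:
    #     repoRef:
    #       name: manifests
    #       path: istio/istio-install/base
    #   name: istio-install
    # - kustomizeConfig:
    #     repoRef:
    #       name: manifests
    #       path: cert-manager/cert-manager/base
    #   name: cert-manager
    - kustomizeConfig:
        repoRef:
          name: manifests
          path: stacks/aws
      name: kubeflow-apps
  repos:
    - name: manifests
      uri: https://github.com/kubeflow/manifests/archive/v1.1-branch.tar.gz

Running kfctl build against the edited kfdef then regenerates the kustomize/ folder without the commented-out components.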

@sajid-moinuddin

@thesuperzapper thanks for sharing! this is genius !!!!

@thesuperzapper
Member

@sajid-moinuddin I will try to get this added to the docs on https://www.kubeflow.org/docs/

@sajid-moinuddin

@thesuperzapper Looking deeper into it, do you really need to maintain copies of the .cache and kustomize charts for dev/stg/prod? Could you have a base with the kfctl-generated charts and override them in the dev/stg config?

@thesuperzapper
Member

@sajid-moinuddin perhaps you could, but the point of storing the .cache folder is to prevent changes to your dev environment from having any effect on your prod environment.

Also note that we make changes to the kubeflow/manifests branches after release; for example, 1.2-branch has been updated a few times since Kubeflow 1.2, and kfctl build will always take the latest from GitHub.

@sylus

sylus commented Dec 7, 2020

In case it's helpful, I also documented how I was doing this, which I think is close to this approach as well:

kubeflow/kubeflow#5440

@yanniszark
Contributor

Hi everyone!

With the formation of the Manifests Working Group, the goal is to provide a catalog of kustomize packages that admins/platforms can:

  • Deploy as is.
  • Customize for their environment, using kustomize patches and overlays.

AFAIK, virtually all GitOps tools right now work with kustomize, so integration with 3rd-party tools should be straightforward.
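
For example, a downstream kustomization could consume one of those packages remotely and customize it with a local patch (the package path, ref, and patch file below are placeholders):

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # upstream package from the kubeflow/manifests catalog (placeholder path and ref)
  - github.com/kubeflow/manifests//apps/katib/upstream/installs/katib-with-kubeflow?ref=v1.3.0
patchesStrategicMerge:
  # local, environment-specific changes layered on top
  - patches/katib-config.yaml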

@munagekar

I created a repository demonstrating the approach mentioned in #1549 (comment).

@davidspek
Contributor

davidspek commented Apr 13, 2021

With the release of 1.3 and the move to bare Kustomize, using Argo CD and referencing the upstream manifests directly (no need to copy them into your repo) has made the deployment and customization of Kubeflow a lot easier. I've created a distribution that I am maintaining. The TL;DR setup is as follows:

  • fork the repo
  • modify the kustomizations for your purpose
  • run ./setup_repo.sh <your_repo_fork_url> (to change the Argo CD application specs to point to your fork)
  • commit and push your changes
  • run kubectl apply -f kubeflow.yaml (this deploys the Argo CD application specs; each Kubeflow component is an application in Argo CD)


https://github.com/argoflow/argoflow

@benjamintanweihao

@davidspek This is excellent! How would you remove an application? For example, what would be the steps to remove Katib?

@davidspek
Contributor

@benjamintanweihao In https://github.com/argoflow/argoflow/blob/master/kustomization.yaml you would simply comment out Katib and KNative.
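
Hypothetically, that edit would look something like this (the entry names below are placeholders, not the actual contents of the argoflow kustomization.yaml):

resources:
  - argocd-applications/central-dashboard.yaml
  - argocd-applications/pipelines.yaml
  # commented out so Argo CD no longer deploys Katib or KNative
  # - argocd-applications/katib.yaml
  # - argocd-applications/knative.yaml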

@benjamintanweihao

benjamintanweihao commented Apr 16, 2021

@davidspek Interesting! I tried commenting out Katib only; it disappeared from the Argo CD UI, but I could still see the pods and other Katib resources deployed. Also, I was still able to access AutoML from the UI.

@davidspek
Contributor

@benjamintanweihao Seems like the deletion didn't propagate. If you're on Slack I can help you get those deleted and remove the Katib entry from the central dashboard. It's not difficult, but I don't want to hijack this issue.

@thesuperzapper
Member

thesuperzapper commented Apr 27, 2021

So that this issue doesn't live forever, I am going to close it here.

I think the main outcomes are:

  1. if you want to use ArgoCD, look at the argoflow distribution
  2. for other options, review the installing-kubeflow docs

/close

@google-oss-robot

@thesuperzapper: Closing this issue.

In response to this:

So that this issue doesn't live forever, I am going to close it here.

I think the main outcomes are:

  1. if you want to use ArgoCD, look at the argoflow distribution
  2. for other options, review the installing-kubeflow docs

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Notebooks WG automation moved this from Backlog to Done Apr 27, 2021