Create OCP Stack in KF #54

2 of 7 tasks
nakfour opened this issue Aug 27, 2020 · 3 comments
nakfour commented Aug 27, 2020

Testing done on OCP 4.3 / Kubernetes 1.16. Code is in this PR:

  1. The profiles app is not setting $namespace, as seen in the error from the validating webhook below. manifest/profiles/base_v3 does not appear to define the $namespace variable the way base/kustomization.yaml does. DONE-FIXED
WARN[0032] Encountered error applying application profiles:  (kubeflow.error): Code 500 with message: Apply.Run : error when creating "/tmp/kout859943078": admission webhook "" denied the request: configuration is invalid: domain name "$(namespace).svc.cluster.local" invalid (label "$(namespace)" invalid)  filename="kustomize/kustomize.go:266"
WARN[0032] Will retry in 6 seconds.                      filename="kustomize/kustomize.go:267"
  2. Cache-server pod crashes with: secret "webhook-server-tls" not found
  3. When creating a notebook server, fsGroup is set to 100. DONE-FIXED
  4. When launching a notebook server, we get the error "no healthy upstream". DONE-FIXED
  5. ml-pipeline pod crashes with these logs: DONE-FIXED
 config.go:37] Please specify flag PROFILES_KFAM_SERVICE_HOST

This happens because ml-pipeline needs profiles installed, and profiles only works when installed from the top-level kustomization file.
  6. Opening the Pipelines page from the Kubeflow dashboard gives an error reported before in kubeflow/kubeflow#5271:
upstream connect error or disconnect/reset before headers. reset reason: connection failure
To solve this, set

      mode: DISABLE

in the DestinationRule for ml-pipeline-ui. DONE-FIXED
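
For readers unfamiliar with where that setting lives, here is a minimal sketch of an Istio DestinationRule with TLS disabled. The `kubeflow` namespace, the host name, and the `apiVersion` are assumptions for illustration and may differ from the manifest actually changed in the PR; only `mode: DISABLE` and the `ml-pipeline-ui` name come from the issue.

```yaml
# Sketch only: namespace, host, and apiVersion are assumed, not taken from the PR.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: ml-pipeline-ui
  namespace: kubeflow            # assumed namespace
spec:
  host: ml-pipeline-ui.kubeflow.svc.cluster.local   # assumed service host
  trafficPolicy:
    tls:
      mode: DISABLE              # sidecar sends plain traffic to the upstream
```

With `mode: DISABLE` the client-side proxy does not originate TLS to the ml-pipeline-ui service, which fits the case where the upstream pod does not terminate Istio mutual TLS.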

  • tf-jobs,pytorch,katib,metadata (Vasek)

  • profiles, pipelines (Landon)

  • Seldon (Juana)

  • JH web (Maros)

  • Notebook Controller

  • istio (Juana)

  • cert-manager

nakfour created this issue from a note in ODH 0.9.0 (To do) Aug 27, 2020
nakfour self-assigned this Aug 27, 2020
nakfour removed this from To do in ODH 0.9.0 Aug 27, 2020
vpavlin commented Oct 21, 2020

  1. Could it be because params.yaml is not taken into account (and thus the virtual service is not templated)?
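
For context on this hypothesis: kustomize only substitutes `$(namespace)` in fields that are listed in a `varReference` configuration, so a missing var definition or a params.yaml that is not wired in leaves the literal string in the rendered VirtualService. A rough sketch of the two pieces follows; the `profiles-kfam` Service used as the var source and the exact field path are assumptions for illustration, not the contents of the actual manifests.

```yaml
# kustomization.yaml (sketch; object names are assumed)
vars:
- name: namespace
  objref:
    kind: Service
    name: profiles-kfam          # assumed object to read the namespace from
    apiVersion: v1
  fieldref:
    fieldpath: metadata.namespace
configurations:
- params.yaml                    # without this entry the varReference below is ignored
---
# params.yaml (sketch; field path is assumed)
varReference:
- path: spec/http/route/destination/host
  kind: VirtualService
```

If either the `vars` entry or the `configurations` entry is absent from the overlay that is actually built, the host would stay as `$(namespace).svc.cluster.local`, which matches the webhook error quoted in the issue.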

nakfour added this to In progress in ODH 0.9.0 Oct 21, 2020
vpavlin commented Oct 23, 2020

TF operator, PyTorch operator and Katib work

metadata-db needs a small fix mentioned in kubeflow#1567 (comment)

nakfour moved this from In progress to Done in ODH 0.9.0 Nov 9, 2020
nakfour closed this as completed Jun 18, 2021