[Kubeflow 1.9] Distributions and Kubeflow 1.9 #2611

Closed
rimolive opened this issue Jan 25, 2024 · 41 comments

Comments

@rimolive
Member

rimolive commented Jan 25, 2024

This issue will be used to track the progress of, and coordinate with, distributions throughout the 1.9 release.

While we hope all distros will manage to be ready when the KF 1.9 release is out, this is sometimes difficult to achieve. In this issue, we want to both keep track of the progress of distributions towards the KF 1.9 release and also know which of the distros will be working on KF 1.9 (testing during the distribution testing cycle) even if they can't meet the KF 1.9 deadline.

Tagging distribution owners identified from previous releases (Any new or missed distro owners, please comment on this issue)

| Distribution | Representative(s) | State |
| --- | --- | --- |
| AWS | @surajkota | not participating in 1.9 |
| Charmed Kubeflow | @DnPlas | participating in 1.9 |
| Google Cloud | @gkcalat, @zijianjoy, @Linchin | not participating in 1.9 |
| IBM IKS | @Tomcli, @yhwang | participating in 1.9 |
| Microsoft | | not participating in 1.9 |
| Nutanix | @johnugeorge, @nagar-ajay | participating in 1.9 |
| Red Hat OpenShift AI | @rimolive | participating in 1.9 |
| Oracle Cloud Infrastructure | @julioo | participating in 1.9 |
| DeployKF | @thesuperzapper | participating in 1.9 |
| VMware | @liuqi, @xujinheng | participating in 1.9 |
| QBO | @alexeadem | participating in 1.9 |

Please let us know if you'll be participating in the 1.9 release by answering the following questions:

  • Are you planning on having your distro ready in sync with the KF 1.9 release?
  • Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?
  • If you cannot participate, when can the community expect your distro to be ready for release 1.9?

Please note the release timelines are being discussed in #2606.

cc @kubeflow/release-team @jbottum

@kubeflow-bot kubeflow-bot added this to To Do in Needs Triage Jan 25, 2024
@ca-scribner
Contributor

@rimolive can you remove @DnPlas from Charmed Kubeflow and replace her with myself? ty!

to your questions, for Charmed Kubeflow:

  • Are you planning on having your distro ready in sync with the KF 1.9 release?
    • yes
  • Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?
    • yes

@thesuperzapper
Member

@rimolive deployKF will participate in 1.9, but it's not 100% clear exactly what that will look like.


Separately, given "Kubeflow on AWS" did not participate in 1.8, and announced they were no longer supporting their distribution in awslabs/kubeflow-manifests#794, I think it's unlikely they will do 1.9?

Given this, I proposed moving them to "legacy" on the Kubeflow website on this PR kubeflow/website#3641.

However, I also want to avoid confusion with users, because they might think that Kubeflow no longer supports AWS due to the "Kubeflow on AWS" name. So I also think we should merge kubeflow/website#3643 at the same time, which tells users that "Kubeflow on XXXX" is just a name, and NOT the ONLY way to use Kubeflow on that platform.

@yhwang
Member

yhwang commented Jan 31, 2024

For IBM IKS:

Are you planning on having your distro ready in sync with the KF 1.9 release?

Yes

Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?

Yes

@liuqi

liuqi commented Feb 3, 2024

For VMware Distro:

Are you planning on having your distro ready in sync with the KF 1.9 release?

Yes

Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?

Yes

@alexeadem

For QBO Distro:

Are you planning on having your distro ready in sync with the KF 1.9 release?

Yes

Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?

Yes

@tiansiyuan
Contributor

For VMware Distro:

Are you planning on having your distro ready in sync with the KF 1.9 release?

Yes

Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?

Yes

@rimolive
Member Author

rimolive commented May 6, 2024

Calling all Distribution owners! I'm proud to announce our first Release Candidate for Kubeflow 1.9!

You can find the release details in the following URL:

https://github.com/kubeflow/manifests/releases/tag/v1.9.0-rc.0

We'll be working on another Release Candidate when we have Notebooks and KServe Models Webapp updated and ready for KF 1.9. We can use this issue to keep track of blocker issues for distributions while we work on fixing them.

cc @ca-scribner @yhwang @johnugeorge @nagar-ajay @thesuperzapper @liuqi @xujinheng @alexeadem @alex-treebeard

@juliusvonkohout
Member

juliusvonkohout commented May 7, 2024

We also have to update cert-manager, knative, istio, seldon, bentoml, etc., which will come in later RCs.

@StefanoFioravanzo
Member

@ca-scribner @yhwang @johnugeorge @nagar-ajay @thesuperzapper @liuqi @xujinheng @alexeadem @alex-treebeard Can you please acknowledge that you are aware of Kubeflow 1.9 RC0 and that the distribution testing phase has started? Please react with a thumbs up if everything is okay from your side and you are proceeding with testing.

@thesuperzapper
Member

thesuperzapper commented May 13, 2024

deployKF is mostly waiting on the updates from Notebooks (kubeflow/kubeflow#7453), but I am aware that a 1.9.0-RC0 was cut with other components.

@alexeadem

alexeadem commented May 13, 2024

What do we mean by '(around 1.28)' here: https://github.com/kubeflow/manifests/tree/v1.9.0-rc.0?tab=readme-ov-file#prerequisites

Is that v1.28.0 and v1.27.11?

I'm proceeding with the testing in QBO.

OK: Everything is looking good in QBO. Tested by doing a vector addition test.

Details:

git branch
* (HEAD detached at v1.9.0-rc.0)

In Kubernetes v1.28.0:

qbo get nodes kubeflow_v1_9_0_nvidia | jq .nodes[]?.image
"kindest/node:v1.28.0"
"kindest/node:v1.28.0"
"kindest/node:v1.28.0"

with NVIDIA GPU Operator

helm list -n gpu-operator
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /home/alex/.qbo/kubeflow_v1_9_0_nvidia.conf
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /home/alex/.qbo/kubeflow_v1_9_0_nvidia.conf
NAME                    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                   APP VERSION
gpu-operator-1715634796 gpu-operator    1               2024-05-13 21:13:18.636880948 +0000 UTC deployed        gpu-operator-v24.3.0    v24.3.0 

And Kustomize

./kustomize version
v5.4.1
  • There is only one small change I had to make:

It looks like platform-agnostic-multi-user-pns is no longer available:
./kustomize build apps/pipeline/upstream/env/platform-agnostic-multi-user-pns | kubectl apply -f -

as per kubeflow/pipelines#5285

So I used the following instead. I'll update the QBOT installer for this version
./kustomize build apps/pipeline/upstream/env/platform-agnostic-multi-user | kubectl apply -f -
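
The fallback described in this comment can be sketched in Python. This is a minimal sketch and not part of any installer: `pick_overlay` and the order of preference are hypothetical, and only the two overlay paths come from the commands quoted here. It returns the first overlay directory that exists in a manifests checkout.

```python
from pathlib import Path
from typing import Optional, Sequence

# Overlay paths quoted in the comment, in the order they were tried.
CANDIDATE_OVERLAYS = (
    "apps/pipeline/upstream/env/platform-agnostic-multi-user-pns",
    "apps/pipeline/upstream/env/platform-agnostic-multi-user",
)


def pick_overlay(
    repo_root: Path, candidates: Sequence[str] = CANDIDATE_OVERLAYS
) -> Optional[str]:
    """Return the first candidate overlay that exists under repo_root, or None."""
    for relative in candidates:
        if (repo_root / relative).is_dir():
            return relative
    return None
```

According to the comment, only the second path exists in the v1.9.0-rc.0 checkout, so that is the one that would be handed to `./kustomize build`.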

This is what was deployed:

kubectl get pods --all-namespaces -o jsonpath="{..image}" | sed 's/ /\n/g' | sort | uniq
docker.io/istio/pilot:1.17.5
docker.io/istio/proxyv2:1.17.5
docker.io/kindest/kindnetd:v20220726-ed811e41
docker.io/kindest/local-path-provisioner:v0.0.22-kind.0
docker.io/kserve/kserve-controller:v0.12.1
docker.io/kserve/models-web-app:v0.10.0
docker.io/kubeflow/training-operator:v1-f8f7363
docker.io/kubeflowkatib/katib-controller:v0.17.0-rc.0
docker.io/kubeflowkatib/katib-db-manager:v0.17.0-rc.0
docker.io/kubeflowkatib/katib-ui:v0.17.0-rc.0
docker.io/kubeflownotebookswg/centraldashboard:v1.8.0
docker.io/kubeflownotebookswg/jupyter-scipy:v1.8.0
docker.io/kubeflownotebookswg/jupyter-web-app:v1.8.0
docker.io/kubeflownotebookswg/kfam:v1.8.0
docker.io/kubeflownotebookswg/notebook-controller:v1.8.0
docker.io/kubeflownotebookswg/poddefaults-webhook:v1.8.0
docker.io/kubeflownotebookswg/profile-controller:v1.8.0
docker.io/kubeflownotebookswg/pvcviewer-controller:v1.8.0
docker.io/kubeflownotebookswg/tensorboard-controller:v1.8.0
docker.io/kubeflownotebookswg/tensorboards-web-app:v1.8.0
docker.io/kubeflownotebookswg/volumes-web-app:v1.8.0
docker.io/library/mysql:8.0.29
docker.io/library/python:3.7
docker.io/metacontrollerio/metacontroller:v2.0.4
gcr.io/knative-releases/knative.dev/eventing/cmd/controller@sha256:92967bab4ad8f7d55ce3a77ba8868f3f2ce173c010958c28b9a690964ad6ee9b
gcr.io/knative-releases/knative.dev/eventing/cmd/webhook@sha256:ebf93652f0254ac56600bedf4a7d81611b3e1e7f6526c6998da5dd24cdc67ee1
gcr.io/knative-releases/knative.dev/net-istio/cmd/controller@sha256:421aa67057240fa0c56ebf2c6e5b482a12842005805c46e067129402d1751220
gcr.io/knative-releases/knative.dev/net-istio/cmd/webhook@sha256:bfa1dfea77aff6dfa7959f4822d8e61c4f7933053874cd3f27352323e6ecd985
gcr.io/knative-releases/knative.dev/serving/cmd/activator@sha256:c2994c2b6c2c7f38ad1b85c71789bf1753cc8979926423c83231e62258837cb9
gcr.io/knative-releases/knative.dev/serving/cmd/autoscaler@sha256:8319aa662b4912e8175018bd7cc90c63838562a27515197b803bdcd5634c7007
gcr.io/knative-releases/knative.dev/serving/cmd/controller@sha256:98a2cc7fd62ee95e137116504e7166c32c65efef42c3d1454630780410abf943
gcr.io/knative-releases/knative.dev/serving/cmd/domain-mapping-webhook@sha256:7368aaddf2be8d8784dc7195f5bc272ecfe49d429697f48de0ddc44f278167aa
gcr.io/knative-releases/knative.dev/serving/cmd/domain-mapping@sha256:f66c41ad7a73f5d4f4bdfec4294d5459c477f09f3ce52934d1a215e32316b59b
gcr.io/knative-releases/knative.dev/serving/cmd/webhook@sha256:4305209ce498caf783f39c8f3e85dfa635ece6947033bf50b0b627983fd65953
gcr.io/kubebuilder/kube-rbac-proxy:v0.13.1
gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0
gcr.io/ml-pipeline/api-server:2.2.0
gcr.io/ml-pipeline/cache-deployer:2.2.0
gcr.io/ml-pipeline/cache-server:2.2.0
gcr.io/ml-pipeline/frontend:2.2.0
gcr.io/ml-pipeline/metadata-envoy:2.2.0
gcr.io/ml-pipeline/metadata-writer:2.2.0
gcr.io/ml-pipeline/minio:RELEASE.2019-08-14T20-37-41Z-license-compliance
gcr.io/ml-pipeline/mysql:8.0.26
gcr.io/ml-pipeline/persistenceagent:2.2.0
gcr.io/ml-pipeline/scheduledworkflow:2.2.0
gcr.io/ml-pipeline/viewer-crd-controller:2.2.0
gcr.io/ml-pipeline/visualization-server:2.2.0
gcr.io/ml-pipeline/workflow-controller:v3.4.16-license-compliance
gcr.io/tfx-oss-public/ml_metadata_store_server:1.14.0
ghcr.io/dexidp/dex:v2.36.0
kserve/kserve-controller:v0.12.1
kserve/models-web-app:v0.10.0
kubeflow/training-operator:v1-f8f7363
kubeflownotebookswg/jupyter-scipy:v1.8.0
mysql:8.0.29
nvcr.io/nvidia/cloud-native/gpu-operator-validator:v24.3.0
nvcr.io/nvidia/gpu-operator:v24.3.0
nvcr.io/nvidia/k8s-device-plugin:v0.15.0-ubi8
nvcr.io/nvidia/k8s/container-toolkit:v1.15.0-ubuntu20.04
nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
nvcr.io/nvidia/k8s/dcgm-exporter:3.3.5-3.4.1-ubuntu22.04
python:3.7
quay.io/jetstack/cert-manager-cainjector:v1.12.2
quay.io/jetstack/cert-manager-controller:v1.12.2
quay.io/jetstack/cert-manager-webhook:v1.12.2
quay.io/oauth2-proxy/oauth2-proxy:v7.6.0
registry.k8s.io/coredns/coredns:v1.10.1
registry.k8s.io/etcd:3.5.9-0
registry.k8s.io/kube-apiserver:v1.28.0
registry.k8s.io/kube-controller-manager:v1.28.0
registry.k8s.io/kube-proxy:v1.28.0
registry.k8s.io/kube-scheduler:v1.28.0
registry.k8s.io/nfd/node-feature-discovery:v0.15.4
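
Several images in this list appear twice, once with a `docker.io/` prefix and once without (for example `mysql:8.0.29` and `docker.io/library/mysql:8.0.29`), because `sort | uniq` compares the raw strings. A minimal sketch of collapsing them follows, assuming the usual Docker Hub shorthand rules; `normalize_image` and `unique_images` are hypothetical helper names, and the rule is simplified compared with what container runtimes implement.

```python
def normalize_image(ref: str) -> str:
    """Expand a short Docker Hub reference to a fully qualified one."""
    first, slash, _ = ref.partition("/")
    # The leading component is treated as a registry host if it looks like one.
    if slash and ("." in first or ":" in first or first == "localhost"):
        return ref
    if not slash:
        # Single-component names such as "mysql:8.0.29" are official images.
        return f"docker.io/library/{ref}"
    return f"docker.io/{ref}"


def unique_images(refs):
    """Return the sorted set of fully qualified image references."""
    return sorted({normalize_image(r) for r in refs})


# A few entries copied from the list above.
sample = [
    "docker.io/library/mysql:8.0.29",
    "mysql:8.0.29",
    "docker.io/kserve/kserve-controller:v0.12.1",
    "kserve/kserve-controller:v0.12.1",
    "gcr.io/ml-pipeline/api-server:2.2.0",
]
print(unique_images(sample))
```

With this sample, the five raw entries reduce to three distinct images.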

@juliusvonkohout
Member

juliusvonkohout commented May 14, 2024

@alexeadem please check the updated release notes:
https://github.com/kubeflow/manifests/releases/tag/v1.9.0-rc.0
Kubernetes 1.27-1.29 is officially supported.
Yes, we made emissary the default in 1.7 or 1.8
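
For distributions that want to check a cluster version against this range before testing, a minimal sketch follows. The 1.27-1.29 range is taken from the reply above; `is_supported` and `SUPPORTED_MINORS` are hypothetical names, not part of any Kubeflow tooling.

```python
import re

# Range taken from the reply above; adjust if the release notes change.
SUPPORTED_MINORS = range(27, 30)  # Kubernetes 1.27 through 1.29


def is_supported(version: str) -> bool:
    """Check whether a version string like 'v1.28.0' is in the supported range."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return major == 1 and minor in SUPPORTED_MINORS
```

Under this check, both versions asked about in the question (v1.28.0 and v1.27.11) fall inside the range.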

@DnPlas
Contributor

DnPlas commented May 21, 2024

Hi @rimolive @StefanoFioravanzo, a couple of things:

  1. Could I please ask to replace @ca-scribner with me as the distribution owner?
  2. We are aware that the distribution testing phase has started, but we have identified that components from the kubeflow/kubeflow repository are missing. Is this something coming in another RC? Is this planned?

@rimolive
Member Author

rimolive commented May 23, 2024

> Hi @rimolive @StefanoFioravanzo, a couple of things:
>
> 1. Could I please ask to replace @ca-scribner with me as the distribution owner?

Done

> 2. We are aware that the distribution testing phase has started, but we have identified that components from the kubeflow/kubeflow repository are missing. Is this something coming in another RC? Is this planned?

We decided to move ahead with rc0 because many components had been upgraded, but there is a plan for rc1 with the remaining components. Is there a specific one you are expecting to test?

@rimolive
Member Author

rimolive commented Jun 4, 2024

Just an update: We have just released Kubeflow 1.9.0-rc.1, which includes all updates from the Notebooks WG, Istio 1.18.7 (targeting a full upgrade to 1.22 by the final release), and Model Registry 0.2.1-alpha. We ask all distributions to help test the new release and open issues so we can work with the Working Groups to fix them before the final release.

You can find the Release Notes in the releases page.

cc @ca-scribner @DnPlas @yhwang @johnugeorge @nagar-ajay @thesuperzapper @liuqi @xujinheng @alexeadem @alex-treebeard

@nagar-ajay

Created an issue to track Nutanix distribution testing - nutanix/kubeflow-manifests#21

@rimolive
Member Author

We are one week away from the Kubeflow 1.9.0-rc.2 release, which we plan to be the last release candidate before the final release. We welcome any updates on distribution testing, including bug reports and anything else the release team should pursue for rc.2 or the final release.

cc @ca-scribner @DnPlas @yhwang @johnugeorge @nagar-ajay @thesuperzapper @liuqi @xujinheng @alexeadem @alex-treebeard

@DnPlas
Contributor

DnPlas commented Jun 10, 2024

We will start testing in the following two weeks, we'll keep you posted.

@rimolive
Member Author

Hello Distribution owners! Just wanted to announce the Kubeflow 1.9.0-rc.2 release; it's the last one before we go final. Please take a look at the Release Notes here and help us validate the manifests by posting a /lgtm comment in this issue.

cc @ca-scribner @DnPlas @yhwang @johnugeorge @nagar-ajay @thesuperzapper @liuqi @xujinheng @alexeadem @alex-treebeard

@alexeadem

alexeadem commented Jun 30, 2024

/lgtm
Tested in QBO: api:cloud-stage-4.3.0.7aba1d45
Kubeflow: 1.9.0-rc.2
Kubernetes: v1.29.4
NVIDIA GPU operator:

helm list -n gpu-operator
adable. This is insecure. Location: /home/alex/.qbo/kubeflow_v1_9_0_nvidia.conf
NAME                    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                   APP VERSION
gpu-operator-1719811045 gpu-operator    1               2024-07-01 05:17:27.654568661 +0000 UTC deployed        gpu-operator-v24.3.0    v24.3.0 

Recording:
https://youtu.be/-CrtjPsVbUY

@rimolive
Member Author

rimolive commented Jul 8, 2024

@liuqi @xujinheng Can you please confirm if you are testing Kubeflow 1.9.0-rc.2 manifests and let us know if it looks good?

@rimolive
Member Author

rimolive commented Jul 8, 2024

@yhwang Please let us know if you tested Kubeflow 1.9.0-rc.2 and it looks good.

@yhwang
Member

yhwang commented Jul 8, 2024

/lgtm

verified the 1.9.0-rc.2 on IKS using the following settings:

  • oauth2-proxy + dex
  • oauth2-proxy + App Id

Both went well. I ran several pipeline examples, Katib, KServe, and a notebook server. All good!

@juliusvonkohout
Member

juliusvonkohout commented Jul 9, 2024

@xujinheng
Member

Yes, we are currently testing Kubeflow 1.9.0-rc2. Once we complete our testing, we will post the results here to keep you informed.

@nagar-ajay

/lgtm - verified workflows mentioned in the tracking issue. nutanix/kubeflow-manifests#21

@juliusvonkohout
Member

> Yes, we are currently testing Kubeflow 1.9.0-rc2. Once we complete our testing, we will post the results here to keep you informed.

As mentioned above, rc.2 does not contain all fixes.

@tiansiyuan
Contributor

tiansiyuan commented Jul 10, 2024 via email

@rimolive
Member Author

@tiansiyuan This thread is exclusively to track work with the Kubeflow Distribution owners to test 1.9 release. Please open an issue in https://github.com/kserve/models-web-app

@rimolive
Member Author

rimolive commented Jul 10, 2024

This is the current status of the Distribution Testing on July 10th:

| Distribution | Representative(s) | State |
| --- | --- | --- |
| Charmed Kubeflow | @DnPlas | Pending |
| IBM IKS | @Tomcli, @yhwang | LGTM |
| Nutanix | @johnugeorge, @nagar-ajay | LGTM |
| Red Hat OpenShift AI | @rimolive | Pending |
| Oracle Cloud Infrastructure | @julioo | Pending |
| DeployKF | @thesuperzapper | Pending |
| VMware | @liuqi, @xujinheng | Pending |
| QBO | @alexeadem | LGTM |

We need your updates as quickly as possible: our release date is July 22nd, and we need time to act on any bug reports.

@tiansiyuan
Contributor

tiansiyuan commented Jul 10, 2024 via email

@juliusvonkohout
Member

juliusvonkohout commented Jul 11, 2024

Hello, please retest with the 1.9 branch https://github.com/kubeflow/manifests/commits/v1.9-branch/ given the merge of #2795. Testing RC.2 alone is not enough.

@juliusvonkohout
Member

juliusvonkohout commented Jul 11, 2024

If no further bugs come up, I will synchronize any last-minute release tags from the other working groups on July 20-21 and do the change log and final release on July 22.

@rimolive if I do not get any more final releases/tags from the other WGs, I probably have to release as is on July 22. You can also decide as release manager that we cut RC.3 and delay the final release.

@rimolive
Member Author

@juliusvonkohout hold on, we need the remaining WGs to cut their final releases. We cannot release 1.9 with components in RC releases.

@juliusvonkohout
Member

juliusvonkohout commented Jul 11, 2024

> @juliusvonkohout hold on, we need the remaining WGs to cut their final releases. We cannot release 1.9 with components in RC releases.

To cite myself from a few messages above: "If I do not get any more final releases/tags from the other WGs, I probably have to release as is on July 22. You can also decide as release manager that we cut RC.3 and delay the final release."

I won't do anything today or while I am on vacation :-D As I said, July 20-22 is when I can do the remaining work.
But you have to decide what we do if the final releases/tags from other WGs are not available on July 22. This could happen, and it did happen in previous releases. If so, the question is whether you want to release anyway, or cut an RC.3 on July 22 and delay the final release.
Just think about it ;-)

@rimolive
Member Author

This is the current status of the Distribution Testing on July 15th:

| Distribution | Representative(s) | State |
| --- | --- | --- |
| Charmed Kubeflow | @DnPlas | Pending |
| IBM IKS | @Tomcli, @yhwang | LGTM |
| Nutanix | @johnugeorge, @nagar-ajay | LGTM |
| Red Hat OpenShift AI | @rimolive | Pending |
| Oracle Cloud Infrastructure | @julioo | Pending |
| DeployKF | @thesuperzapper | Pending |
| VMware | @liuqi, @xujinheng | Pending |
| QBO | @alexeadem | LGTM |

We have had no changes in 5 days, and next week is the release date for 1.9. Please send us your updates so we can confirm that all distributions are good with the release.

@thesuperzapper
Member

@juliusvonkohout @rimolive @StefanoFioravanzo I have cut the final v1.9.0 tag for the kubeflow/kubeflow repo, feel free to sync the manifests for this tag into kubeflow/manifests.

@DnPlas
Contributor

DnPlas commented Jul 22, 2024

Hey @rimolive, here is my latest update:

version: 1.9.0-rc.2
platform:

  • ubuntu 22.04
  • microk8s (k8s) 1.29

Tested with the latest version of oidc-authservice.

So far it is looking good, so for that version /lgtm.

@rimolive
Member Author

Hello,

This is the status for today July 22nd:

| Distribution | Representative(s) | State |
| --- | --- | --- |
| Charmed Kubeflow | @DnPlas | LGTM |
| IBM IKS | @Tomcli, @yhwang | LGTM |
| Nutanix | @johnugeorge, @nagar-ajay | LGTM |
| Red Hat OpenShift AI | @rimolive | LGTM |
| Oracle Cloud Infrastructure | @julioo | Pending |
| DeployKF | @thesuperzapper | Pending |
| VMware | @liuqi, @xujinheng | Pending |
| QBO | @alexeadem | LGTM |

The majority of distributions have signed off on the state of the release. Thank you so much to everyone involved in the testing. We'll keep collecting feedback on issues to consider for 1.9 patch releases.
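
The majority claim can be checked against the July 22nd status table above with a short tally. This is only an illustrative sketch; the states are copied from that table and nothing else is assumed.

```python
from collections import Counter

# States copied from the July 22nd status table above.
status = {
    "Charmed Kubeflow": "LGTM",
    "IBM IKS": "LGTM",
    "Nutanix": "LGTM",
    "Red Hat OpenShift AI": "LGTM",
    "Oracle Cloud Infrastructure": "Pending",
    "DeployKF": "Pending",
    "VMware": "Pending",
    "QBO": "LGTM",
}

counts = Counter(status.values())
approved = counts["LGTM"]
has_majority = approved > len(status) / 2
print(f"{approved} of {len(status)} distributions LGTM; majority: {has_majority}")
```

Five of the eight participating distributions had posted an LGTM, with three still pending.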

@juliusvonkohout
Member

Is anyone here encountering this bug/PR?

#2815
#2812
#2766

It has not been changed in 7 months https://github.com/kubeflow/manifests/commits/master/common/dex/base/config-map.yaml , but some users are complaining

@alexeadem

> Is anyone here encountering this bug/PR?
>
> #2815 #2812 #2766
>
> It has not been changed in 7 months https://github.com/kubeflow/manifests/commits/master/common/dex/base/config-map.yaml , but some users are complaining

Not in QBO.

Needs Triage automation moved this from To Do to Closed Aug 3, 2024
@kubeflow-bot kubeflow-bot removed this from Closed in Needs Triage Aug 3, 2024