[kubeflow 1.8] release: bump container images to tag 1.8.0-rc.0 and bump release version #7234

Merged

DnPlas
Contributor

@DnPlas DnPlas commented Aug 11, 2023

Following the release steps, update the manifests to use the v1.8.0-rc.0 tag and bump the version in version/VERSION.

cc: @annajung @kimwnasptd
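The two mechanical steps in the description can be sketched roughly as below. This assumes the image tags live in kustomize `newTag:` fields inside `kustomization.yaml` files; it is not the repository's update-manifests-images script, whose interface isn't shown here, and `bump_release_version` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the repo's update-manifests-images script.
# Assumes image tags are set via kustomize `newTag:` fields.
# Usage: bump_release_version REPO_ROOT TAG
bump_release_version() {
  local root="$1" tag="$2"

  # 1. Record the release version (e.g. v1.8.0-rc.0) in version/VERSION.
  mkdir -p "$root/version"
  printf '%s\n' "$tag" > "$root/version/VERSION"

  # 2. Rewrite every `newTag:` value, keeping the original indentation.
  #    -i.bak works with both GNU and BSD sed; the backup is removed afterwards.
  find "$root" -name kustomization.yaml -type f | while IFS= read -r file; do
    sed -E -i.bak "s/^([[:space:]]*newTag:[[:space:]]*).*/\1${tag}/" "$file"
    rm -f "$file.bak"
  done
}
```

Writing the leading `v` explicitly in the tag argument sidesteps the `1.8.0-rc.0` vs `v1.8.0-rc.0` mismatch that had to be corrected later in this PR.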

@DnPlas DnPlas changed the title from "release: bump container images to tag 1.8.0-rc.0 and bump release version" to "[kubeflow 1.8] release: bump container images to tag 1.8.0-rc.0 and bump release version" on Aug 11, 2023
@annajung
Member

@DnPlas From the file changes and the GitHub Actions logs, it's hard to tell why it's failing. Can you help debug this further and gather better logs for the problem, possibly using https://github.com/marketplace/actions/debugging-with-ssh or by uploading the data as an artifact?
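One way to act on the artifact suggestion, sketched under assumptions: a hypothetical `collect_debug_info` function that a CI step could run when a test fails. It writes resource listings, Pod descriptions, events and per-Pod container logs into a directory that a later step would upload as a build artifact. The function name and file layout are illustrative, not part of the Kubeflow CI.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the Kubeflow CI: snapshot cluster state
# into OUT_DIR so a later CI step can upload it as an artifact.
# Usage: collect_debug_info OUT_DIR NAMESPACE
collect_debug_info() {
  local out_dir="$1" namespace="$2"
  mkdir -p "$out_dir"

  # Keep going if a single command fails; partial data is still useful.
  kubectl get all -n "$namespace" -o wide > "$out_dir/resources.txt" 2>&1 || true
  kubectl describe pods -n "$namespace" > "$out_dir/pods-describe.txt" 2>&1 || true
  kubectl get events -n "$namespace" --sort-by=.metadata.creationTimestamp \
    > "$out_dir/events.txt" 2>&1 || true

  # One log file per Pod; `-o name` prints entries like "pod/<name>".
  local pod
  for pod in $(kubectl get pods -n "$namespace" -o name 2>/dev/null); do
    kubectl logs -n "$namespace" "$pod" --all-containers=true \
      > "$out_dir/logs-${pod#pod/}.txt" 2>&1 || true
  done
}
```

Events sorted by creation time are often the quickest way to see whether a Pod was never scheduled, failed to pull its image, or crashed after starting.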

@kimwnasptd
Member

@DnPlas thanks for the PR! I'm looking into those failures to unblock us. Most likely it's a race in how we check for some manifests to be present. I'll keep you posted.

@kimwnasptd
Member

kimwnasptd commented Aug 22, 2023

I'm pretty sure we have the same race as in the tests in kubeflow/manifests, where we check for the Trial CR before it's created (kubeflow/manifests#2508 (comment)). The test code applies the manifests and then immediately waits for Pods, which might not have been created yet:

kubectl wait pods -n kubeflow -l app=notebook-controller --for=condition=Ready --timeout=300s

The fix is for each test to either:

  1. first wait for the Deployment/StatefulSet to create the Pods, and then wait for the Pods, or
  2. wait for the Pods to be created before waiting on them with kubectl wait.

I'll send a follow-up PR to fix this. Ideally I'd like to avoid adding sleep commands, since there's always a chance the bug resurfaces. If the proper fix would take too long to code, though, I'll switch to that simple approach to unblock us for now.
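Both options can be sketched without a fixed sleep, assuming bash and reusing the `kubeflow` namespace and `app=notebook-controller` label from the command above. `wait_for_rollout_then_ready` and `wait_for_pods_then_ready` are hypothetical helper names, not the fix that was eventually merged.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the two options above; not the merged fix.

# Option 1: wait for the Deployment/StatefulSet rollout, then for the Pods.
# Usage: wait_for_rollout_then_ready NAMESPACE KIND/NAME SELECTOR TIMEOUT_SECONDS
wait_for_rollout_then_ready() {
  local namespace="$1" workload="$2" selector="$3" timeout="$4"
  # `rollout status` watches the workload itself, so it cannot race with
  # Pod creation the way a label-selector wait can.
  kubectl rollout status -n "$namespace" "$workload" --timeout="${timeout}s" || return 1
  kubectl wait pods -n "$namespace" -l "$selector" \
    --for=condition=Ready --timeout="${timeout}s"
}

# Option 2: poll until matching Pods exist, then wait for readiness,
# sharing one deadline across both phases.
# Usage: wait_for_pods_then_ready NAMESPACE SELECTOR TIMEOUT_SECONDS
wait_for_pods_then_ready() {
  local namespace="$1" selector="$2" timeout="$3"
  local deadline=$(( SECONDS + timeout ))

  # Phase 1: `kubectl wait` can fail right away with "no matching resources
  # found", so first poll until at least one matching Pod object exists.
  until [[ -n "$(kubectl get pods -n "$namespace" -l "$selector" -o name 2>/dev/null)" ]]; do
    if (( SECONDS >= deadline )); then
      echo "timed out waiting for Pods matching '$selector' to be created" >&2
      return 1
    fi
    sleep 2
  done

  # Phase 2: spend whatever time is left waiting for the Ready condition.
  local remaining=$(( deadline - SECONDS ))
  if (( remaining < 1 )); then
    remaining=1   # a zero timeout would make kubectl check only once
  fi
  kubectl wait pods -n "$namespace" -l "$selector" \
    --for=condition=Ready --timeout="${remaining}s"
}
```

For example, `wait_for_pods_then_ready kubeflow app=notebook-controller 300` would replace the single `kubectl wait` call above. Option 1 leans on the controller's own rollout tracking but can take up to twice the timeout in total; option 2 keeps the whole wait bounded by one deadline, at the cost of a short polling interval.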

@DnPlas
Contributor Author

DnPlas commented Aug 25, 2023

I submitted a set of commits that correct the VERSION and image tags (1.8.0-rc.0 -> v1.8.0-rc.0), add the PVC viewer component to the update-manifests-images script so that its image is generated too, and apply the suggestions from @kimwnasptd to fix the CI.

Kindly review the changes; if everything looks good, let's merge.

@DnPlas
Contributor Author

DnPlas commented Aug 25, 2023

One of the failing checks in the latest CI run shows the following error:

deployment.apps/tensorboard-controller-deployment created
error: timed out waiting for the condition on deployments/tensorboard-controller-deployment
Error: Process completed with exit code 1.

I increased the timeout, but I'm not sure this is helping.

@DnPlas
Contributor Author

DnPlas commented Sep 7, 2023

Reopening this PR, as it was mistakenly closed by #7263.

@DnPlas DnPlas reopened this Sep 7, 2023
This commit updates the release tag for container images and
adds pvc-viewer to update-manifests-images script
@kimwnasptd
Member

Thanks for coordinating and pushing this @DnPlas!

/lgtm
/approve

@google-oss-prow

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kimwnasptd

@google-oss-prow google-oss-prow bot merged commit 309873a into kubeflow:v1.8-branch Sep 7, 2023
16 checks passed
3 participants