Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release v0.5.0 requirements #438

Closed
6 of 10 tasks
danielvegamyhre opened this issue Feb 29, 2024 · 7 comments
Closed
6 of 10 tasks

Release v0.5.0 requirements #438

danielvegamyhre opened this issue Feb 29, 2024 · 7 comments

Comments

@danielvegamyhre
Copy link
Contributor Author

I think we're good to proceed with the release, I'll do it tomorrow.

@ahg-g
Copy link
Contributor

ahg-g commented Apr 10, 2024

I think we're good to proceed with the release, I'll do it tomorrow.

Can we include the status update PR?

@danielvegamyhre
Copy link
Contributor Author

danielvegamyhre commented Apr 10, 2024

I think we're good to proceed with the release, I'll do it tomorrow.

Can we include the status update PR?

I'll try, it's almost done but after rebasing to include the TTL changes, there seems to be some bug / race condition causing the TTL feature to not work. The TTL integration test passes when I step through in debug mode, but not when run at full speed (it fails here, expected job deletion timestamps are not set).

These kind of issues sometimes take a while to determine the root cause (and I have other deadlines coming up this month I need to mindful of) and I don't want to delay the v0.5.0 too much if we can avoid it, since we are trying to coordinate the release with Kueue's next release on April 15th, so that our changes in #487 can be utilized in their Multi-Kueue work.

I'll see if I can figure it out tomorrow.

@dejanzele
Copy link
Contributor

dejanzele commented Apr 10, 2024

@danielvegamyhre can you describe the issue with TTL? I can work on fixing it.

I never managed to get a failure in the integration test neither locally nor on CI, I do have some experience with deflaking tests, I will try to run to replicate a failure.

@danielvegamyhre
Copy link
Contributor Author

danielvegamyhre commented Apr 10, 2024

@dejanzele that would be super helpful! You can checkout the feature branch in #494 locally and run the integration tests. This line which checks that the expected number of jobs have been deleted fails. Adding print statements, it shows the jobs being listed do not have the deletion timestamp set when the test is run at full speed.

When I add breakpoints in the JobSet controller and step through the logic, the test passes.

To be clear, I don't think your PR implementing TTL support had a bug. The issue seems to be the interaction of my new changes (updating JobSet status only once per reconcile call, at the very end) with the existing TTL logic, causing jobs to not be deleted.

@dejanzele
Copy link
Contributor

dejanzele commented Apr 11, 2024

@danielvegamyhre this should fix it https://github.com/kubernetes-sigs/jobset/pull/494/files#r1560773603, tests pass locally on my side

@danielvegamyhre
Copy link
Contributor Author

v0.5.0 was released in #515

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants