Increase Nanny close timeout in LocalCUDACluster tests #5636


pentschev
Member

Tests in CI occasionally fail, possibly only under high load, because of the `Nanny` close timeout: the internal mechanism that decides when to kill worker processes may leave too little time for a process to shut down properly.

Dask-CUDA introduced a new `IncreasedCloseTimeoutNanny` class, intended for use with `LocalCUDACluster` in tests, to reduce the chance of such timeouts. This PR uses the new class in the tests to improve the situation in CI.
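
For readers unfamiliar with the pattern, here is a minimal, standard-library-only sketch of the idea: a subclass that only raises the default timeout of `close()`, so a slow but healthy shutdown is no longer cut short. `BaseNanny`, `IncreasedCloseTimeoutSketch`, `close_status`, and all timeout values are hypothetical stand-ins chosen for illustration. They are not the real `distributed.Nanny` or Dask-CUDA's `IncreasedCloseTimeoutNanny`, whose internals are not shown in this thread.

```python
import asyncio


class BaseNanny:
    """Hypothetical stand-in for a process supervisor (not distributed.Nanny)."""

    def __init__(self, shutdown_seconds: float):
        # How long the supervised "process" needs to shut down cleanly.
        self.shutdown_seconds = shutdown_seconds
        self.status = "running"

    async def close(self, timeout: float = 0.05) -> str:
        # Wait for a clean shutdown, but give up once `timeout` elapses.
        try:
            await asyncio.wait_for(asyncio.sleep(self.shutdown_seconds), timeout)
            self.status = "closed"
        except asyncio.TimeoutError:
            # Assumption for this sketch: a timed-out shutdown ends in a kill.
            self.status = "killed"
        return self.status


class IncreasedCloseTimeoutSketch(BaseNanny):
    """Same behavior as BaseNanny, only with a larger default close timeout."""

    async def close(self, timeout: float = 1.0) -> str:
        return await super().close(timeout=timeout)


def close_status(nanny_class, shutdown_seconds: float = 0.2) -> str:
    # Run one close() call to completion and report the final status.
    return asyncio.run(nanny_class(shutdown_seconds).close())
```

With these illustrative numbers, a shutdown that needs 0.2 s is killed under the short default timeout but completes cleanly under the increased one, which is the effect this PR is after in CI.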

@pentschev pentschev requested a review from a team as a code owner October 30, 2023 17:29
@github-actions github-actions bot added the Cython / Python label Oct 30, 2023
@galipremsagar galipremsagar added the improvement and non-breaking labels Oct 30, 2023
Contributor

@csadorf csadorf left a comment

Thanks a lot!

Contributor

csadorf commented Oct 30, 2023

/merge

@rapids-bot rapids-bot bot merged commit 04348ed into rapidsai:branch-23.12 Oct 30, 2023
52 checks passed
@pentschev pentschev deleted the increase-local-cuda-cluster-nanny-close-timeout branch May 20, 2024 19:56