
[tests][dask] replace client fixture with cluster fixture #4159

Merged: 3 commits merged into master from tests/cluster-fixture on Apr 5, 2021

Conversation

@jmoralez (Collaborator) commented Apr 3, 2021

Motivation: #4088 (comment)

This replaces the client fixture (which created and tore down a LocalCluster for every test) with a cluster fixture that creates a cluster once for all the tests; each test then creates its own client from it. This significantly reduces the time it takes test_dask to run.
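A minimal sketch of the approach, assuming a module-scoped fixture and a two-worker LocalCluster (the names, scope, and worker counts are illustrative, not the exact code in this PR):

import pytest
from distributed import Client, LocalCluster


@pytest.fixture(scope="module")
def cluster():
    # created once and shared by all tests, instead of once per test
    with LocalCluster(n_workers=2, threads_per_worker=2) as dask_cluster:
        yield dask_cluster


def test_something(cluster):
    # each test opens its own short-lived client against the shared cluster
    with Client(cluster) as client:
        assert client.status == "running"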

I replaced all the client.close(timeout=CLIENT_CLOSE_TIMEOUT) calls with a context manager for the client, so the diff looks huge, but GitHub has an option to hide whitespace changes (under the little gear icon when reviewing the files). If this is too much I can just add a client = Client(cluster) at the top of every test instead, but I thought it would be nice to unify all the tests around this context manager.
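For reference, a hedged before/after of that teardown change (CLIENT_CLOSE_TIMEOUT and the test bodies are placeholders, not the real constant or tests):

from distributed import Client

CLIENT_CLOSE_TIMEOUT = 120  # placeholder value

# before: every test ended with an explicit close
def test_old_style(cluster):
    client = Client(cluster)
    # ... test body ...
    client.close(timeout=CLIENT_CLOSE_TIMEOUT)

# after: the context manager closes the client even if the test body raises
def test_new_style(cluster):
    with Client(cluster) as client:
        pass  # ... test body ...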

I modified test_errors to use a regular cluster (instead of the async one) as well.

I also had to add a client.restart in test_machines_should_be_used_if_provided: that test triggers an error on the worker that already has local_listen_port bound, while the other worker keeps waiting, so the restart resets that worker.
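Roughly like this (a sketch; the real test body in test_dask.py does more than shown here):

def test_machines_should_be_used_if_provided(cluster):
    with Client(cluster) as client:
        # ... provoke the expected error: one worker already has
        # local_listen_port bound, the other keeps waiting for it ...
        client.restart()  # reset worker state so later tests see a clean cluster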

In test_error_on_feature_parallel_tree_learner I had to add a client.rebalance because the error wasn't always being triggered. I assume that, since the test data had two partitions, there was a chance both ended up on the same worker, so we weren't actually performing distributed training; that observation inspired #4149.
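A sketch of that fix, assuming dX and dy are the test's dask collections (the names are illustrative):

# persist materializes the partitions on the workers, and rebalance then
# spreads them out so both workers hold data and training is distributed
dX, dy = client.persist([dX, dy])
client.rebalance()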

@jmoralez jmoralez requested a review from jameslamb as a code owner April 3, 2021 06:10
@jmoralez jmoralez closed this Apr 3, 2021
@jmoralez jmoralez reopened this Apr 3, 2021
@jmoralez (Collaborator, Author) commented Apr 3, 2021

It occurred to me that this could also be achieved by defining a new client fixture that simply creates a client from the cluster fixture defined here. This would involve minimal changes to the existing code (only the client the tests receive would be different; the test code itself would stay the same). Should I go this way?
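For concreteness, that alternative would look something like this (a sketch, not code from this PR):

import pytest
from distributed import Client


@pytest.fixture
def client(cluster):
    # layered on top of the cluster fixture, so existing tests keep
    # requesting `client` and their bodies stay unchanged
    with Client(cluster) as dask_client:
        yield dask_client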

@jameslamb (Collaborator) left a comment

Thanks so much for doing this!

It looks like this might save on the order of 16 minutes in our CI jobs, that is an AMAZING improvement. 🙌

Linux regular on master (logs)

===== 510 passed, 6 skipped, 2 xfailed, 435 warnings in 1221.07s (0:20:21) =====

Linux regular on this PR (logs)

===== 510 passed, 6 skipped, 2 xfailed, 340 warnings in 204.23s (0:03:24) ======

I'm +1 on this PR, just left one small suggestion. I've also provided some more answers and information below.


It occurred to me that this could also be achieved by defining a new client fixture that simply creates a client from the cluster fixture defined here. This would involve minimal changes to the existing code (only the client the tests receive would be different; the test code itself would stay the same). Should I go this way?

If this is just for the sake of code organization but wouldn't have any functional or performance benefit, I'd rather not. Using fixtures of fixtures makes me (maybe irrationally) nervous. I'm ok with the changes you've made in this PR, passing in a cluster and then creating a client from it. I'm also ok with using the client context manager to handle teardowns instead of Client.close().


In the future, when you open a PR or issue that comes from some previous discussion, please link that discussion. That helps anyone who finds this PR from search to find all of the relevant context, and helps reviewers who might not have been part of that original conversation to learn what they need to give an effective review.

In this case, I'm linking #3829 (comment).


You don't have to close and re-open PRs on GitHub to re-trigger CI. In most projects, you can just push an empty commit. In LightGBM we have a slight preference for that because a new commit means new builds, which means it is possible to view the logs from the previous builds, which can be useful in debugging. Some CI services overwrite builds if you close and re-open. This is why, for example, only the most recent build of this PR is currently available on GitHub Actions, even though there have been two.

[screenshot: GitHub Actions builds for this PR, showing only the most recent build]

You can push an empty commit like this:

git commit --allow-empty -m "empty commit"
git push origin tests/cluster-fixture

[two resolved review threads on tests/python_package_test/test_dask.py]
@jmoralez (Collaborator, Author) commented Apr 4, 2021

If this is just for the sake of code organization but wouldn't have any functional or performance benefit, I'd rather not.

Yes, this was just to avoid the (seemingly) huge diff.

In the future, when you open a PR or issue that comes from some previous discussion, please link that discussion.

Yeah, that's my bad. I saw you do it on some previous PRs and forgot to do it here.

You don't have to close and re-open PRs on GitHub to re-trigger CI. In most projects, you can just push an empty commit. [...]

I usually do that, but when I opened this PR only four actions were triggered (WIP, license/cla, continuous-integration/appveyor/pr, and Microsoft.LightGBM) and only one of them (WIP) actually ran; the other three got stuck in "queued", which seemed weird. I didn't try the empty commit because I thought something had gone wrong when opening this PR (it took about 3 minutes before I could even see it, so I thought I had misclicked somewhere). So I don't believe anything was lost in this case, but I'll definitely try the empty commit first from now on.

@jameslamb (Collaborator) commented

You can safely ignore the R-package CI failures; they're not related to this PR's changes. They are part of #4160.

@jameslamb (Collaborator) left a comment

Looks great, thanks! Once #4161 is merged, the R tests should be fixed and we'll be able to merge this.

@jameslamb (Collaborator) commented

Ok @jmoralez , I think CI should be ok now. At your earliest convenience, can you update this to the latest master? Or if you give me permission, I can do that directly.

@github-actions (bot) commented

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023