[tune] Track live trials in a set in the TrialRunner to reduce linear scans #15811

krfricke · 2021-05-14T12:53:28Z

Why are these changes needed?

Each invocation of TrialRunner.step() scans the full list of trials several times. With many trials, these linear scans become expensive. To reduce that cost, we can track live (non-terminated) trials in a separate set and loop over that set instead.
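
As a rough illustration of the idea (not the actual diff), the bookkeeping could look like the sketch below. `SketchTrial`, `SketchRunner`, and all of their methods are hypothetical names made up for this example, and treating both `TERMINATED` and `ERROR` as "not live" is an assumption:

```python
from dataclasses import dataclass
from typing import List, Set

# Assumption: these statuses mean a trial is no longer live.
FINISHED_STATES = {"TERMINATED", "ERROR"}


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so trials fit in a set
class SketchTrial:
    """Hypothetical stand-in for a trial object."""
    trial_id: str
    status: str = "PENDING"


class SketchRunner:
    """Hypothetical runner that keeps a live-trial set next to the full list."""

    def __init__(self) -> None:
        self._trials: List[SketchTrial] = []      # every trial ever added
        self._live_trials: Set[SketchTrial] = set()  # only non-finished trials

    def add_trial(self, trial: SketchTrial) -> None:
        self._trials.append(trial)
        if trial.status not in FINISHED_STATES:
            self._live_trials.add(trial)

    def stop_trial(self, trial: SketchTrial, error: bool = False) -> None:
        trial.status = "ERROR" if error else "TERMINATED"
        # discard() is a no-op if the trial was already removed
        self._live_trials.discard(trial)

    def get_live_trials(self) -> Set[SketchTrial]:
        return self._live_trials

    def is_finished(self) -> bool:
        # O(1) emptiness check instead of scanning the full trial list
        return not self._live_trials
```

Per-step loops would then iterate over `get_live_trials()` rather than the full list, so their cost scales with the number of live trials instead of the total. The full list stays available for code that needs every trial, which matches the later commit that keeps the full trial list for scheduler decisions.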

Related issue number

Closes #15504

Checks

- I've run scripts/format.sh to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(

krfricke · 2021-05-14T16:02:18Z

We should add a test to make sure bookkeeping works.

krfricke · 2021-05-14T16:29:24Z

cc @richardliaw @max0x7ba

krfricke · 2021-06-16T08:29:05Z

cc @richardliaw can you take a look?

richardliaw

nice!

Kai Fricke added 2 commits May 14, 2021 13:49

[tune] Track live trials in a set in the TrialRunner to reduce linear scans

bed0deb

Stick with full trial list in scheduler decisions

fafc270

Merge branch 'master' into tune-live-trials

28132df

krfricke mentioned this pull request May 19, 2021

Ray Tune doesn't scale, scheduling performance degrades to less than 25% worker utilization with 32 workers. #15504

Closed

2 tasks

krfricke marked this pull request as ready for review June 9, 2021 12:34

krfricke requested a review from richardliaw June 9, 2021 12:34

krfricke assigned richardliaw Jun 9, 2021

richardliaw approved these changes Jun 16, 2021

richardliaw merged commit e547a27 into ray-project:master Jun 17, 2021

krfricke deleted the tune-live-trials branch June 17, 2021 08:45

krfricke mentioned this pull request Jul 12, 2021

[tune] workers get stuck in ray.tune.report #15751

Closed

2 tasks
