Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adaptive scheduler #85

Merged
merged 50 commits into from
Jun 20, 2022
Merged

Conversation

petervdonovan
Copy link
Contributor

@petervdonovan petervdonovan commented May 31, 2022

To clarify -- this is the scheduler that previously had the vague name "heuristic."

For details, please see PR #1207 in the main repository.

@petervdonovan petervdonovan force-pushed the scaling-wrt-threads-runtime-experiments branch 2 times, most recently from d32053a to 56bc3fb Compare May 31, 2022 16:24
A timer event seems to be occasionally dropped in TimeLimitThreaded.
TODO: compute_number_of_workers should be reduced correspondingly?
Marked improvement observed locally for RadixSort, Counting.
This optimization is lifted directly from the NP scheduler in response to visible differences in the number of cycles spent accessing mutexes.

Clear improvement is observed locally for the Counting benchmark.
This is another step in the direction of the NP scheduler. The condition for level advancement should be that _everyone_ is sleeping; otherwise, we cannot optimize out the lock for level advancement without introducing a race condition.
Some of our benchmarks need either worker affinity OR use of just one worker, so that a particular reaction is always executed by the same worker. Perhaps worker affinity is easier to add. However, it must be combined with some sort of load-balancing strategy.
This is a very crude implementation that should be improved upon.
*Although this fixes a race condition, I have not tested if this fixes "the" race condition. The bug manifested itself so rarely that it was infeasible to reproduce.
Symptom: A thread goes to the event queue to wait for the next event and ends up waiting forever (while the other threads wait for work). This is possible only in SleepingBarber due to use of physical actions -- not in other benchmarks. This bug is not caught by our tests.
The level counter is part of the predicate associated with the condition variable. It is unsafe to change without acquiring a mutex.
If I am right, expected trials required to go from an intermediate number of workers to an extreme (1 or the maximum) number of workers was previously quadtratic in # of workers (cuz random walk) -- no good. This should make it (log(n))^2.
This kind of bug reflects lack of care on my part in finding a clean implementation. Some cleanup will be necessary if this is ever to be merged.
I did not observe a race condition, but in principle I think there was one. For this reason, this change could not be justified by empirical results.
Again, this is another optimization borrowed from the NP scheduler. The difference is that the NP scheduler is designed such that this optimization needs no explicit code to handle it; there, it just works.
@petervdonovan petervdonovan force-pushed the scaling-wrt-threads-runtime-experiments branch 6 times, most recently from f7b48fd to e2ee83f Compare June 3, 2022 21:41
@petervdonovan petervdonovan force-pushed the scaling-wrt-threads-runtime-experiments branch from e2ee83f to d682724 Compare June 3, 2022 22:55
By "unrelated" I mean, "unrelated to the hard-to-reproduce race condition that is manifesting itself in LoopDistributedCentralized in CI."
@petervdonovan petervdonovan force-pushed the scaling-wrt-threads-runtime-experiments branch 2 times, most recently from 504885c to 5f3b2f0 Compare June 7, 2022 01:31
@petervdonovan petervdonovan force-pushed the scaling-wrt-threads-runtime-experiments branch from 5f3b2f0 to c19bb3d Compare June 7, 2022 03:59
@petervdonovan petervdonovan force-pushed the scaling-wrt-threads-runtime-experiments branch from b6cb486 to d23d17f Compare June 9, 2022 06:11
@petervdonovan petervdonovan changed the title Heuristic-based scheduler Adaptive scheduler Jun 9, 2022
@petervdonovan petervdonovan marked this pull request as ready for review June 9, 2022 17:36
Copy link
Contributor

@Soroosh129 Soroosh129 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@petervdonovan petervdonovan merged commit ec93fdf into main Jun 20, 2022
@petervdonovan petervdonovan deleted the scaling-wrt-threads-runtime-experiments branch June 20, 2022 17:08
@lhstrh lhstrh added the feature New feature label Jan 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants