Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid long task lists #1599

Merged
merged 3 commits into from
May 3, 2023
Merged

Conversation

vondele
Copy link
Member

@vondele vondele commented Apr 13, 2023

they make some operations in fishtest much slower, and the db bigger. Additionally larger batches means less idle time at the end of the batch for workers, which is good.

@dubslow
Copy link
Contributor

dubslow commented Apr 14, 2023

Another thing I'd like to see is to ensure that the batch size is a multiple of the worker concurrency.

For instance on the SMP VLTC tests, where I have a concurrency of 3, assigning a batch size of 7 or 8 is intensely silly and wasteful, not to mention the usual concerns about half-idle batches affecting frequency/hyperthreads and thus altering the effective TC.

@vondele
Copy link
Member Author

vondele commented Apr 14, 2023

doesn't matter if the batch size is sufficiently long. The variations in game lengths are much larger.

@ppigazzini ppigazzini added enhancement server server side changes labels Apr 14, 2023
@ppigazzini
Copy link
Collaborator

Applied by hand on PROD.

@dubslow
Copy link
Contributor

dubslow commented Apr 14, 2023

On the other hand, as I stated, there are some cases where the batch size is not sufficiently long.

@dubslow
Copy link
Contributor

dubslow commented Apr 14, 2023

This PR does have an effect on the itp balancing, significantly increasing the latency of response to changing itp, but that should be a small effect outweighed by the benefit to the server. But we should keep an eye on it.

@vondele
Copy link
Member Author

vondele commented Apr 14, 2023

Applied by hand on PROD.

so, this works, larger tasks are being generated. Right now, as the tests have partially already >1000 tasks, the tasks generated are exceptionally large. For new tests, that will essentially never have that many tasks, the behavior will be OK.

@xoto10
Copy link
Contributor

xoto10 commented Apr 15, 2023

How about the 800,000 limit, it was preventing more cores on some jobs with this set to 1000. Is there anything we can do about that?
E.g. change it to 1,000,000?

@ppigazzini ppigazzini added the scaling handle >> 1M worker cores label Apr 15, 2023
@vondele
Copy link
Member Author

vondele commented Apr 15, 2023

I think it might need a bit of a refinement. The quadratic scaling with task number gets too much, tasks get very large. So, I'll damp that down a bit. It will allow more workers to join a task. The 800k, maybe we can indeed increase it a bit. Let me have a look.

they make some operations in fishtest much slower, and the db bigger.
Additionally larger batches means less idle time at the end of the batch for workers,
which is good.
@ppigazzini
Copy link
Collaborator

I think it might need a bit of a refinement. The quadratic scaling with task number gets too much, tasks get very large. So, I'll damp that down a bit. It will allow more workers to join a task. The 800k, maybe we can indeed increase it a bit. Let me have a look.

Let me know if there is a change to be reloaded on PROD (or this was only a squash)

@vondele
Copy link
Member Author

vondele commented Apr 15, 2023

This is somewhat different code. It doesn't need testing on PROD, but it is different from what we had on PROD before.

@vondele
Copy link
Member Author

vondele commented Apr 23, 2023

so, based now on the experience gathered, I limit the growth of the batch size with task_id to a maximum of 5. This will help when many workers land to keep tasks available.
Complements #1627

@ppigazzini ppigazzini merged commit 473c40b into official-stockfish:master May 3, 2023
16 checks passed
@ppigazzini
Copy link
Collaborator

PROD updated, thank you @vondele :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement scaling handle >> 1M worker cores server server side changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants