
How to set faster? #28

Closed
ivysrono opened this issue Jun 22, 2019 · 3 comments

Comments

@ivysrono

Is numSearchThreads limited by the CPU or the GPU?
Should nnMaxBatchSize be equal to numSearchThreads?
Anything else?

@lightvector
Owner

numSearchThreads is the number of CPU threads to use. If your GPU is powerful, it can actually be much higher than the number of CPU cores on your system, because you need many threads to feed the GPU batches large enough to get good GPU utilization.

nnMaxBatchSize should be around the number of CPU threads you are using, yes, but how large it needs to be can vary if you are using multiple GPUs instead of one.
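As a rough illustration (these numbers are hypothetical, not recommendations; tune them for your own hardware), the two settings might be paired in the GTP config like this:

```
# Hypothetical example for a single fast GPU.
numSearchThreads = 32   # can exceed physical CPU cores when the GPU is fast
nnMaxBatchSize = 32     # roughly match numSearchThreads with one GPU
```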

You will want to use cudaUseFP16 and cudaUseNHWC if your GPU has FP16 tensor cores.
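For example, on a GPU with FP16 tensor cores the relevant config lines would look something like this (a sketch, assuming the CUDA backend):

```
cudaUseFP16 = true   # half-precision inference on tensor-core GPUs
cudaUseNHWC = true   # NHWC tensor layout, typically faster together with FP16
```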

If you are doing long searches with large numbers of visits, and you don't mind using more RAM on your machine, nnCacheSizePowerOfTwo can be increased, along with nnMutexPoolSizePowerOfTwo. I think the config ships with nnCacheSizePowerOfTwo = 18, meaning it will cache 2^18 = 262144 neural net results, but due to the birthday paradox you may start seeing noticeable cache misses once you search into the high thousands of visits.
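For instance, bumping the cache by a couple of powers of two might look like the following (hypothetical values; each +1 roughly doubles the memory the cache can use):

```
nnCacheSizePowerOfTwo = 20       # 2^20 = 1048576 cached neural net results
nnMutexPoolSizePowerOfTwo = 17   # scale the mutex pool up along with the cache
```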

Thanks for the question, hope that helps. I'll add more documentation about the various parameters to the README, or otherwise within the release itself, before too long; I just haven't done it yet.

@ivysrono
Author

Thank you very much.

@ivysrono
Author

#29
