CUDA Toolkit version comparison #160
I assume this was version 1.3.3, but what size net were you using?
A 20-block net.
I compiled with TCMalloc, and my 2060 can do 1050/s with 16 threads on CUDA 10.0. Can you try a test compiling against TCMalloc?
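For reference, a build along those lines might look like the sketch below. This assumes KataGo's CMake setup and its `USE_BACKEND`/`USE_TCMALLOC` options; verify the flag names against the CMakeLists in your checkout.

```shell
# Sketch: building KataGo's CUDA backend with TCMalloc linked in.
# Paths and parallelism are placeholders for your environment.
git clone https://github.com/lightvector/KataGo.git
cd KataGo/cpp
cmake . -DUSE_BACKEND=CUDA -DUSE_TCMALLOC=1
make -j"$(nproc)"
```

TCMalloc replaces the system allocator at link time, so no source changes are needed; the benchmark binary is then run exactly as before.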
Here's more data based on more threads, more visits (100k), and multiple trials (10 per setting). I've included the standard deviations and sorted by nnEvals. cuDNN versions are the same as last time. These are with TCMalloc enabled.
For completeness' sake, this is my exact command line used to generate the data:
The nix-shell stuff is specific to my OS/package manager; you'll need a very recent version of nixpkgs (to get cudnn_cudatoolkit_10_2) and the following PR: NixOS/nixpkgs#82082
It is possible to compile KataGo against various versions of the CUDA toolkit. I've tested the following combinations of cuDNN and CUDA, and ran a short performance test (just the default benchmark, really) on each.
The system is an otherwise unused (i.e. not connected to a monitor) RTX 2080 Ti with two Xeons. FP16 and NHWC are both on. TCMalloc was turned off for these tests.
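For anyone reproducing this setup: FP16 and NHWC are toggled in the KataGo config file rather than at compile time. A minimal fragment (key names as they appear in KataGo's example configs; treat this as a sketch, not the exact config used here):

```
# CUDA backend tuning in a KataGo .cfg:
cudaUseFP16 = true   # half precision; pays off on cards with tensor cores
cudaUseNHWC = true   # NHWC tensor layout, usually faster together with FP16
```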
The performance of all the v10 tests is essentially the same within statistical noise, though v10.1 and v10.2 show consistently very slightly higher nnEvals than v10.0. Avoid using v9.2.
(visits/s seems consistently slightly lower in v10.2, but I'm guessing that has more to do with lucky/unlucky tree reuse than with raw power, considering the nnEvals are essentially the same.)
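Judging "same within statistical noise" comes down to comparing the mean nnEvals of each toolkit version against the spread across trials. A small sketch of that calculation (the sample values are made up for illustration, not the actual measurements from this thread):

```shell
# Hypothetical nnEvals/s samples from 10 benchmark trials of one setting.
samples="1043 1051 1048 1039 1055 1047 1050 1044 1052 1046"

# Population mean and standard deviation via awk; two versions whose means
# differ by less than roughly one sd of each other are indistinguishable here.
printf '%s\n' $samples | awk '
  { s += $1; ss += $1 * $1; n++ }
  END { m = s / n; sd = sqrt(ss / n - m * m); printf "mean=%.1f sd=%.1f\n", m, sd }'
```

With more trials per setting the standard deviation of the mean shrinks as 1/sqrt(n), which is why 10 runs per setting give a much firmer ranking than a single benchmark.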