Performance of 0.17 on V100 (Google Compute): 2000 n/s vs 6000 n/s #2335
Comments
You refer to n/s, which is an unreliable metric. A V100 should get something between 2k and 8k n/s, depending on the board and game. It is better to also measure evals/s, which directly shows how fast the GPU is (whereas n/s describes the combination of CPU and GPU). A V100 with 0.17 is probably around 2k evals/s. As for the maximum possible, I will publish a couple of tables about this in a few weeks.
My V100 on Google Cloud is averaging one game per 108 seconds, or 473 ms/move, over the last ~800 games. Not sure what that translates to in n/s, but it is quite a bit faster than my local GTX 1060 6GB. The script I'm using runs two games simultaneously, and the creator of the script claims it is significantly faster than running one game at a time.
@ozymandias8 It translates to 3400 n/s (which is in the expected range).
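For anyone wanting to redo this conversion: a minimal sketch, assuming self-play runs at a fixed ~1600 visits per move (an assumption about the autogtp settings in use; actual visit limits vary by run), so n/s ≈ visits per move divided by seconds per move.

```python
# Rough conversion from average move time to nodes per second (n/s).
# VISITS_PER_MOVE is an assumed self-play visit limit, not a measured value.
VISITS_PER_MOVE = 1600

def nodes_per_second(ms_per_move: float) -> float:
    """Convert average milliseconds per move to approximate n/s."""
    return VISITS_PER_MOVE / (ms_per_move / 1000.0)

print(round(nodes_per_second(473)))   # ≈3383, i.e. the ~3400 n/s quoted above
print(round(nodes_per_second(2700)))  # ≈593, for the 2700 ms/move report below
```

Under that assumption, 473 ms/move lands in the expected 2k–8k n/s range, while 2700 ms/move would be well below it.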
I get 1600 n/s on a V100. Maybe their 6000 n/s is only for the first move, which is 4x faster because of board symmetries? Or maybe it's actually a 4-GPU machine?
I'm sad that my V100 on Google Cloud is averaging 2700 ms/move over the last ~300 games.
Are you using the script from: |
I tried the master branch script, but an error happened. I just copied & pasted the script...
On Facebook [1] someone mentioned getting an average of 6000 n/s on a V100 using network #220.
I set up just such an instance yesterday (6 vCPUs, 1 V100), using Ubuntu 18.04 and CUDA 10, and am "only" getting around 1750 n/s (without "-t"), or at most 2100 n/s (with "-t 16").
Here is the output of "./leelaz -w best-network.gz":
Using OpenCL batch size of 5
Using 10 thread(s).
RNG seed: 11358463697549930105
Leela Zero 0.17 Copyright (C) 2017-2019 Gian-Carlo Pascutto and contributors
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under certain conditions; see the COPYING file for details.
BLAS Core: built-in Eigen 3.3.7 library.
Detecting residual layers...v1...256 channels...40 blocks.
Initializing OpenCL (autodetecting precision).
Detected 1 OpenCL platforms.
Platform version: OpenCL 1.2 CUDA 10.1.133
Platform profile: FULL_PROFILE
Platform name: NVIDIA CUDA
Platform vendor: NVIDIA Corporation
Device ID: 0
Device name: Tesla V100-SXM2-16GB
Device type: GPU
Device vendor: NVIDIA Corporation
Device driver: 418.56
Device speed: 1530 MHz
Device cores: 80 CU
Device score: 1112
Selected platform: NVIDIA CUDA
Selected device: Tesla V100-SXM2-16GB
with OpenCL 1.2 capability.
Half precision compute support: No.
Tensor Core support: Yes.
OpenCL: using fp16/half or tensor core compute support.
Loaded existing SGEMM tuning.
Wavefront/Warp size: 32
Max workgroup size: 1024
Max workgroup dimensions: 1024 1024 64
Setting max tree size to 3736 MiB and cache size to 415 MiB.
Is there anything I could do to also get 6000 n/s? Do others get that performance as well?
[1] https://www.facebook.com/groups/go.igo.weiqi.baduk/permalink/10157283599366514/