Tensor accum #5

godmoves · 2019-04-26T14:15:07Z

Try to merge 'tensor-accum-0.17' branch into 'distributed'

… commandqueue. (Reverted "one queue for each GPU".)

…ance.

…r with some compilers.

Pull request leela-zero#2033.

Tuning output

Tensorcore -> fastexit

Added extra support for "TM" and "OT" and other sgf time control properties on printsgf and loadsgf GTP commands.

* Added parsing and loading of "TM" and "OT" sgf properties on GTP command loadsgf. Only supports "OT" syntax matching output from a printsgf GTP command.
* Change SGFTree to have a shared_ptr for a time control.
* Added saving and loading of "BL", "WL", "OB" and "OW" sgf properties on GTP commands printsgf and loadsgf.
* Change to make TimeControl::make_from_text_sgf() a time control factory and other minor tidying.

Pull request leela-zero#2172.

As noted in pull request leela-zero#2172, the default constructor set byo yomi stones but no time or periods.

We currently will either crash or do strange things if we're fed a weights file that doesn't match the board size we're compiled for. See issue leela-zero#2289.

Add an lz-analyze tag to suggest the minimum number of moves the engine should post info about (rather than only those it considers interesting, i.e. the ones with at least one visit).

This allows some very flexible constructs:

Getting a heatmap:

    lz-setoption name visits value 1
    lz-analyze interval 1 minmoves 361

Forcing a move among the top policy moves only:

    lz-setoption name visits value 1
    lz-analyze interval 1 minmoves 2
    (store those moves, e.g. A1, B1)
    lz-setoption name visits value 0
    lz-genmove_analyze b interval 1 allow b A1 1 allow b B1 1
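A client that drives the engine over GTP could assemble these command strings programmatically. The following is only a sketch: the helper names `heatmap_commands` and `restricted_genmove` are hypothetical, and the trailing number in each `allow` clause is copied from the example above without assuming anything about its meaning.

```python
def heatmap_commands(board_size=19):
    """Commands from the 'heatmap' example: one visit, info for every point."""
    return [
        "lz-setoption name visits value 1",
        f"lz-analyze interval 1 minmoves {board_size * board_size}",
    ]


def restricted_genmove(color, moves, interval=1, count=1):
    """Build an lz-genmove_analyze command restricted to the given moves.

    `count` mirrors the trailing "1" in each allow clause of the example.
    """
    allow = " ".join(f"allow {color} {move} {count}" for move in moves)
    return f"lz-genmove_analyze {color} interval {interval} {allow}"


if __name__ == "__main__":
    for line in heatmap_commands():
        print(line)
    print(restricted_genmove("b", ["A1", "B1"]))
```

The moves passed to `restricted_genmove` would be the ones stored from the preceding `lz-analyze ... minmoves 2` output.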

Only pass when winning or low on legal moves. Disabled in self-play. Fixes issue leela-zero#2273. Based on pull request leela-zero#2277. Pull request leela-zero#2301.
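The rule described here can be sketched as a small predicate. This is an illustration under assumptions, not the code from the pull request: the function name, the `score_if_pass` input, and the `few_moves_threshold` value are all hypothetical.

```python
def pass_is_acceptable(score_if_pass, num_legal_moves,
                       heuristic_enabled=True, few_moves_threshold=5):
    """Decide whether a pass should be considered at all.

    score_if_pass: board-count margin for the side to move if the game
        ended now (positive means winning). Hypothetical input.
    num_legal_moves: number of legal non-pass moves.
    heuristic_enabled: False models the cases where the filter is off
        (for example self-play), so passes are never filtered out.
    few_moves_threshold: assumed cutoff for "low on legal moves".
    """
    if not heuristic_enabled:
        return True
    if score_if_pass > 0:
        return True
    return num_legal_moves <= few_moves_threshold


if __name__ == "__main__":
    print(pass_is_acceptable(score_if_pass=-3.5, num_legal_moves=40))
    print(pass_is_acceptable(score_if_pass=-3.5, num_legal_moves=2))
```

Allowing the pass when few legal moves remain avoids forcing the engine to play moves it would not otherwise choose just to keep the game going.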

Adding the minmoves tag exposes a small bug in the PV output formatting. Avoid extra blank spaces. Small style fixups.

@gjm11

As pointed out by @gjm11 in leela-zero#2277, when there are few legal moves we might want to allow passing even if this loses on the board count. The alternative might be to self-destruct large groups and carry the game on endlessly even if the policy wouldn't want to. No difference in "dumbpass" mode.

Seems like the previous test regex is causing MSVC's regex engine to run out of stack space.

See issue leela-zero#2280. Pull request leela-zero#2302.

leela-zero's default build directory is `build`. It is very annoying when using leela as a git submodule that the repository updates whenever it builds. Pull request leela-zero#2199.

Group evaluations and run them in parallel. Roughly 50% speedup on my setup, but there are a couple of points that are debatable.

- Thread / batch sizing heuristics: This PR changes how the default threads / default batch sizes are picked. See Leela.cpp.
- Batch-forming heuristic: See OpenCLScheduler.cpp. The heuristic exists so that we can wait for the rest of the engine to create more NN evaluations and thus run larger batches. We can't wait indefinitely, since there are cases where we enter 'serial' paths.

Since heuristics are heuristics, these might need some tests on a larger variety of systems.

Did make sure that winrate improves when running default vs. default command line `./leelaz -w (weight file)` on time parity.

Pull request leela-zero#2188.
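The trade-off behind a batch-forming heuristic (wait for more requests, but never indefinitely) can be illustrated with a bounded wait on a queue. This is a minimal sketch under assumptions, not the logic in OpenCLScheduler.cpp: `form_batch`, `batch_size` and `max_wait_s` are hypothetical names.

```python
import queue
import time


def form_batch(requests, batch_size, max_wait_s):
    """Collect up to batch_size requests, waiting at most max_wait_s for stragglers.

    Blocks until at least one request is available, then keeps filling the
    batch until it is full or the deadline passes. The deadline keeps a
    caller on a 'serial' path from stalling while waiting for a full batch.
    """
    batch = [requests.get()]
    deadline = time.monotonic() + max_wait_s
    while len(batch) < batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(requests.get(timeout=remaining))
        except queue.Empty:
            break
    return batch


if __name__ == "__main__":
    pending = queue.Queue()
    for position_id in range(5):
        pending.put(position_id)
    print(form_batch(pending, batch_size=4, max_wait_s=0.01))
    print(form_batch(pending, batch_size=4, max_wait_s=0.01))
```

With five queued requests and a batch size of four, the first call returns a full batch immediately and the second returns the single leftover request once the short wait expires.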

* Calculate node variance.
* Use normal distribution LCB to choose the played move.
* Cached student-t.
* Sort lz-analyze output according to LCB.
* Don't choose nodes with very few visits even if LCB is better.

Guard against NN misevaluations when the top move has a lot of visits. Without this it's possible for a move with a few hundred visits to be picked over a move with over ten thousand visits.

The problem is that the evaluation distribution isn't really a normal distribution. Evaluations correlate, and the distribution can change if the search finds a better alternative deeper in the tree.

Pull request leela-zero#2290.
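The idea of choosing by lower confidence bound (LCB) with a visit-count guard can be sketched as follows. This is an illustration under stated assumptions, not the code from the pull request: it uses a fixed normal quantile `z` rather than a cached Student-t table, and the names `Child`, `lcb`, `pick_move` and the `min_visit_ratio` value are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    move: str
    visits: int
    mean: float      # average evaluation over all visits
    variance: float  # sample variance of those evaluations


def lcb(child, z=1.96):
    """Lower confidence bound: mean minus z standard errors of the mean."""
    if child.visits < 2:
        return -math.inf
    return child.mean - z * math.sqrt(child.variance / child.visits)


def pick_move(children, z=1.96, min_visit_ratio=0.1):
    """Pick the best-LCB child among those with enough visits.

    Children with fewer than min_visit_ratio * (max visits) are skipped,
    which is the guard against picking a barely explored move.
    """
    max_visits = max(c.visits for c in children)
    eligible = [c for c in children if c.visits >= min_visit_ratio * max_visits]
    return max(eligible, key=lambda c: lcb(c, z)).move


if __name__ == "__main__":
    children = [
        Child("A", 10000, 0.55, 0.04),
        Child("B", 300, 0.60, 0.04),
        Child("C", 2000, 0.56, 0.04),
    ]
    print(pick_move(children))                       # guard on
    print(pick_move(children, min_visit_ratio=0.0))  # guard off
```

In this made-up data the move with 300 visits has the best LCB, so it is chosen when the guard is off and excluded when the guard is on.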

* Add mixed precision training support.
* Do not use loss scale if training with fp32.
* Fix potential reg_term overflow of large networks.

Pull request leela-zero#2191.

Don't autodetect or default to fp32 when all cards have Tensor Cores. We will assume fp16 is the fastest. This avoids problems in tune-only mode which does not detect the precision to use and would use fp32 on such cards. Pull request leela-zero#2312.

We have a first implementation of batching now.

AutoGTP will always send --batchsize, but CPU only compiles don't support the option. Ignore the option in those builds. The same problem exists with --tune-only, but quitting immediately happens to be sane behavior so we don't need to fix that. Pull request leela-zero#2313.

It will recursively include OpenCL.h and that is bad. Pull request leela-zero#2314.

# Conflicts:
# src/Network.cpp
# src/Network.h
# src/OpenCLScheduler.cpp
# src/UCTNode.cpp
# src/UCTSearch.cpp
# src/kernels/convolve3.opencl

# Conflicts:
# src/GTP.cpp
# src/Leela.cpp
# src/UCTSearch.cpp
# src/UCTSearch.h

# Conflicts:
# src/OpenCL.cpp
# src/Tuner.cpp
# src/kernels/clblast/hgemm_tensorcore.opencl
# src/kernels/tensorcore_test.opencl

# Conflicts:
# src/GTP.cpp
# src/GTP.h
# src/Leela.cpp
# src/OpenCL.cpp
# src/OpenCL.h
# src/OpenCLScheduler.cpp
# src/OpenCLScheduler.h
# src/UCTSearch.cpp

# Conflicts:
# src/GTP.cpp
# src/Leela.cpp
# src/OpenCLScheduler.cpp
# src/Training.cpp
# src/UCTNode.cpp
# src/UCTNode.h
# src/UCTNodePointer.h
# src/UCTSearch.cpp

…into tensor-accum

ihavnoid and others added 30 commits October 14, 2018 07:28

Command line parsing : OPENGL --> OPENCL

9b1b7cf

Asynchronous simulation / evaluation+backup for batching.

5fa9a0b

temp commit.

c682e1c

New fractional backup implementation.

321bb55

reorder children after Dirichlet noise + minor fix.

0050742

Fix for compiler syntax nitpick.

e6b03d7

Once again...

59aac54

Output max queue length.

71fbd7a

One queue for each GPU.

f02dcb0

Limit max queue size to twice gpucount*batchsize and Serialize OpenCL…

9e3c8c1

… commandqueue. (Reverted "one queue for each GPU".)

temp commits.

c424531

Less variation in speed (pos/s) but seems ~5% slower than max perform…

845bbeb

…ance.

Use accumulated virtual losses to avoid visiting expanding nodes.

16b97d4

Merge branch '1t-batch' into 1t-batch-accum-vl

9f29b98

Fix missing header leading to error with some compiler.

13af6f9

Merge branch '1t-batch' into 1t-batch-accum-vl

0aae063

Fast conclusion of think().

e84b35e

Solve problem with root node expansion when it's in NNCache; Fix erro…

735fc70

…r with some compilers.

Cleanup loop code.

67a0087

Pull request leela-zero#2033.

always output tuning result

4622d95

fixes.

e781dc8

Tensor core support for half precision

9053b9d

Bugfixes

848417f

Merge pull request leela-zero#71 from alreadydone/tuning-output

88f7111

Tuning output

Merge branch '1t-batch-fastexit-tensor' into tensorcore+

89687ad

Merge pull request leela-zero#73 from alreadydone/tensorcore+

f2a1c69

Tensorcore -> fastexit

Use m32n8k16 format instead of m16n16k16 - seems to be a bit faster

77cd296

Merge fixes.

c812505

Code cleanup for tuning for tensorcores

d5f7fb6

Change default to try SA=0 / SA=1 for tensorcore cases

c444a8d

Hersmunch and others added 29 commits April 2, 2019 13:19

Fix inconsistent default timecontrol.

e89e1a7

As noted in pull request leela-zero#2172, the default constructor set byo yomi stones but no time or periods.

Error out if weights are for wrong board size.

f7bf826

We currently will either crash or do strange things if we're fed a weights file that doesn't match the board size we're compiled for. See issue leela-zero#2289.

Ignore passing moves unless they make sense.

1a4538a

Only pass when winning or low on legal moves. Disabled in self-play. Fixes issue leela-zero#2273. Based on pull request leela-zero#2277. Pull request leela-zero#2301.

Fix style, extra spaces in PV output.

1792aa9

Adding the minmoves tag exposes a small bug in the PV output formatting. Avoid extra blank spaces. Small style fixups.

Rework test regex for MSVC limits.

3b3fd08

Seems like the previous test regex is causing MSVC's regex engine to run out of stack space.

Report root visits in gomill-explain_last_move.

aabfecc

See issue leela-zero#2280. Pull request leela-zero#2302.

.gitignore: Add build.

aad47c1

leela-zero's default build directory is `build`. It is very annoying when using leela as a git submodule that the repository updates whenever it builds. Pull request leela-zero#2199.

Mixed precision training support.

1adcc30

* Add mixed precision training support.
* Do not use loss scale if training with fp32.
* Fix potential reg_term overflow of large networks.

Pull request leela-zero#2191.

Update AUTHORS.

085839f

Update README.md.

2474f73

We have a first implementation of batching now.

Don't include OpenCL scheduler in CPU build.

f664191

It will recursively include OpenCL.h and that is bad. Pull request leela-zero#2314.

Bump version numbers.

3f29788

Merge commit '888156f' into tensor-accum-dev+

be838be

# Conflicts:
# src/Network.cpp
# src/Network.h
# src/OpenCLScheduler.cpp
# src/UCTNode.cpp
# src/UCTSearch.cpp
# src/kernels/convolve3.opencl

Merge commit '294285a' into tensor-accum-dev+

dae6ba4

# Conflicts:
# src/GTP.cpp
# src/Leela.cpp
# src/UCTSearch.cpp
# src/UCTSearch.h

Merge commit 'gcp/master~38' into tensor-accum-dev+

be41dc7

# Conflicts:
# src/OpenCL.cpp
# src/Tuner.cpp
# src/kernels/clblast/hgemm_tensorcore.opencl
# src/kernels/tensorcore_test.opencl

Merge commit 'gcp/master~28' into tensor-accum-dev+

420708c

Merge commit 'gcp/master~27' into tensor-accum-dev+

48a414b

# Conflicts:
# src/GTP.cpp
# src/GTP.h
# src/Leela.cpp
# src/OpenCL.cpp
# src/OpenCL.h
# src/OpenCLScheduler.cpp
# src/OpenCLScheduler.h
# src/UCTSearch.cpp

Merge remote-tracking branch 'gcp/master' into tensor-accum-dev+

a350ff0

# Conflicts:
# src/GTP.cpp
# src/Leela.cpp
# src/OpenCLScheduler.cpp
# src/Training.cpp
# src/UCTNode.cpp
# src/UCTNode.h
# src/UCTNodePointer.h
# src/UCTSearch.cpp

Fix: batch sizes were not set according to command line.

3fb7ac5

Merge branch 'master' of https://github.com/gcp/leela-zero

147d87a

Merge branch 'tensor-accum-0.17' of https://github.com/alreadydone/lz …

c6b6453

…into tensor-accum

Merge branch 'distributed' into tensor-accum

46c2c57

godmoves merged commit cffc01b into distributed Apr 27, 2019
