forked from leela-zero/leela-zero
Tensor accum #5
Merged
… commandqueue. (Reverted "one queue for each GPU".)
…r with some compilers.
Pull request leela-zero#2033.
Tuning output
Tensorcore -> fastexit
Added extra support for "TM" and "OT" and other sgf time control properties on printsgf and loadsgf GTP commands.

* Added parsing and loading of "TM" and "OT" sgf properties on GTP command loadsgf. Only supports "OT" syntax matching output from a printsgf GTP command.
* Change SGFTree to have a shared_ptr for a time control.
* Added saving and loading of "BL", "WL", "OB" and "OW" sgf properties on GTP commands printsgf and loadsgf.
* Change to make TimeControl::make_from_text_sgf() a time control factory and other minor tidying.

Pull request leela-zero#2172.
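A minimal sketch of what parsing the "TM" and "OT" properties could look like. The two "OT" layouts handled here (`<stones>/<seconds> Canadian` and `<periods>x<seconds> byo-yomi`) are an assumption about what printsgf emits, and `parse_time_control` is a hypothetical helper, not the code behind `TimeControl::make_from_text_sgf()`.

```python
import re

# Assumed OT layouts; the real printsgf output may differ.
_CANADIAN = re.compile(r"^(\d+)/(\d+) Canadian$")
_BYO_YOMI = re.compile(r"^(\d+)x(\d+) byo-yomi$")


def parse_time_control(props):
    """Turn a dict of SGF properties into a plain time-control dict.

    `props` maps property names ("TM", "OT") to their text values.
    Unrecognised OT text is ignored rather than guessed at.
    """
    tc = {"main_time": 0, "byo_time": 0, "byo_stones": 0, "byo_periods": 0}
    if "TM" in props:
        tc["main_time"] = int(props["TM"])
    ot = props.get("OT", "")
    m = _CANADIAN.match(ot)
    if m:
        tc["byo_stones"], tc["byo_time"] = int(m.group(1)), int(m.group(2))
    m = _BYO_YOMI.match(ot)
    if m:
        tc["byo_periods"], tc["byo_time"] = int(m.group(1)), int(m.group(2))
    return tc
```

Rejecting unknown "OT" text instead of guessing mirrors the stated restriction that only printsgf-style syntax is supported.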
As noted in pull request leela-zero#2172, the default constructor set byo yomi stones but no time or periods.
We currently will either crash or do strange things if we're fed a weights file that doesn't match the board size we're compiled for. See issue leela-zero#2289.
Add an lz-analyze tag to suggest the minimum number of moves the engine should post info about (rather than only those it considers interesting, i.e. the ones with at least a visit).

This allows some very flexible constructs:

Getting a heatmap:

    lz-setoption name visits value 1
    lz-analyze interval 1 minmoves 361

Forcing a move among the top policy moves only:

    lz-setoption name visits value 1
    lz-analyze interval 1 minmoves 2
    (store those moves, e.g. A1, B1)
    lz-setoption name visits value 0
    lz-genmove_analyze b interval 1 allow b A1 1 allow b B1 1
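A client could assemble these command sequences programmatically. The sketch below only builds the GTP strings quoted in the commit message; the function names are hypothetical and nothing here talks to an engine.

```python
def heatmap_commands(board_size=19):
    """GTP commands that request info for every point after a single visit."""
    return [
        "lz-setoption name visits value 1",
        f"lz-analyze interval 1 minmoves {board_size * board_size}",
    ]


def restricted_genmove_commands(color, allowed_moves):
    """GTP commands that force a move among `allowed_moves` only.

    Each entry becomes an `allow <color> <vertex> 1` tag, mirroring the
    example in the commit message.
    """
    allow_tags = " ".join(f"allow {color} {move} 1" for move in allowed_moves)
    return [
        "lz-setoption name visits value 0",
        f"lz-genmove_analyze {color} interval 1 {allow_tags}",
    ]
```

For example, `restricted_genmove_commands("b", ["A1", "B1"])` reproduces the last two commands of the second construct.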
Only pass when winning or low on legal moves. Disabled in self-play. Fixes issue leela-zero#2273. Based on pull request leela-zero#2277. Pull request leela-zero#2301.
Adding the minmoves tag exposes a small bug in the PV output formatting. Avoid extra blank spaces. Small style fixups.
As pointed out by @gjm11 in leela-zero#2277, when there are few legal moves we might want to allow passing even if this loses on the board count. The alternative might be to self-destruct large groups and carry the game on endlessly even if the policy wouldn't want to. No difference in "dumbpass" mode.
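The pass rule described in these commits can be sketched as a single predicate. This is an illustration only: the `few_moves` cutoff is a made-up value, and treating "dumbpass" and self-play as bypassing the restriction is an interpretation of the commit messages, not something read from the code.

```python
def pass_allowed(score_after_pass, legal_move_count, *,
                 dumbpass=False, selfplay=False, few_moves=5):
    """Return True if the engine may choose to pass.

    score_after_pass: board-count margin for the side to move if the game
        ended now (positive means winning).
    few_moves: hypothetical cutoff for "low on legal moves".
    """
    if dumbpass or selfplay:
        # Assumption: the restriction is bypassed in these modes.
        return True
    if score_after_pass > 0:
        return True  # passing wins on the board count
    # Losing on the board count: only allow the pass when few moves remain.
    return legal_move_count <= few_moves
```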
Seems like the previous test regex is causing MSVC's regex engine to run out of stack space.
See issue leela-zero#2280. Pull request leela-zero#2302.
leela-zero's default build directory is `build`. It is very annoying when using leela as a git submodule that the repository updates whenever it builds. Pull request leela-zero#2199.
Group evaluations and run them in parallel. Roughly 50% speedup on my setup, but there are a couple of points that are debatable.

- Thread / batch sizing heuristics: This PR changes how the default threads / default batch sizes are picked. See Leela.cpp.
- Batch-forming heuristic: See OpenCLScheduler.cpp for the batch-forming heuristic. The heuristic exists so that we can wait for the rest of the engine to create more NN evaluations and run larger batches. We can't wait indefinitely, since there are cases where we enter 'serial' paths.

Since heuristics are heuristics, these might need some tests on a larger variety of systems. Did make sure that winrate improves when running default vs. default command line `./leelaz -w (weight file)` on time parity.

Pull request leela-zero#2188.
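The trade-off behind batch forming (wait for a fuller batch, but never wait indefinitely) can be sketched with a bounded wait. This is not the heuristic in OpenCLScheduler.cpp; `form_batch` and its parameters are hypothetical.

```python
import queue
import time


def form_batch(pending, max_batch_size, max_wait_seconds):
    """Collect up to `max_batch_size` requests from `pending`.

    Blocks for the first request, then waits at most `max_wait_seconds`
    in total for more, so a serial caller is never stalled indefinitely.
    """
    batch = [pending.get()]  # block until there is at least one request
    deadline = time.monotonic() + max_wait_seconds
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # waited long enough; run a partial batch
        try:
            batch.append(pending.get(timeout=remaining))
        except queue.Empty:
            break  # nothing else arrived before the deadline
    return batch
```

With three queued requests and a batch size of two, the first call returns a full batch and the second returns the single leftover request after the timeout expires.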
* Calculate node variance.
* Use normal distribution LCB to choose the played move.
* Cached student-t.
* Sort lz-analyze output according to LCB.
* Don't choose nodes with very few visits even if LCB is better.

Guard against NN misevaluations when the top move has a lot of visits. Without this it's possible for a move with a few hundred visits to be picked over a move with over ten thousand visits. The problem is that the evaluation distribution isn't really a normal distribution. Evaluations correlate, and the distribution can change if the search finds a better alternative deeper in the tree.

Pull request leela-zero#2290.
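A rough sketch of LCB-based move selection with a visit guard, under stated assumptions: it uses a plain normal quantile rather than the cached student-t mentioned above, tracks variance with Welford's algorithm, and the `min_visit_ratio` guard value is hypothetical rather than the project's setting.

```python
import math
from statistics import NormalDist


class NodeStats:
    """Running mean/variance of evaluations (Welford's algorithm)."""

    def __init__(self):
        self.visits = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the mean

    def update(self, value):
        self.visits += 1
        delta = value - self.mean
        self.mean += delta / self.visits
        self._m2 += delta * (value - self.mean)

    def lcb(self, confidence=0.95):
        """Lower confidence bound on the mean under a normal approximation."""
        if self.visits < 2:
            return -math.inf  # not enough data to estimate variance
        variance = self._m2 / (self.visits - 1)
        z = NormalDist().inv_cdf(confidence)
        return self.mean - z * math.sqrt(variance / self.visits)


def choose_move(children, min_visit_ratio=0.1):
    """Pick the move with the best LCB among sufficiently visited children.

    `children` maps move names to NodeStats. Children with fewer than
    `min_visit_ratio` times the maximum visit count are skipped.
    """
    max_visits = max(c.visits for c in children.values())
    eligible = {m: c for m, c in children.items()
                if c.visits >= min_visit_ratio * max_visits}
    return max(eligible, key=lambda m: eligible[m].lcb())
```

The visit guard is what keeps a lightly explored move with a flattering LCB from displacing a heavily visited one.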
* Add mixed precision training support.
* Do not use loss scale if training with fp32.
* Fix potential reg_term overflow of large networks.

Pull request leela-zero#2191.
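Why a loss scale matters for fp16 but not fp32 can be shown with a small stdlib-only sketch. It rounds values to half precision with `struct`, and the scale of 128 is an illustrative assumption, not the value used by the training code.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def scaled_fp16_gradient(grad, loss_scale):
    """Store `grad * loss_scale` in fp16, then unscale in full precision."""
    return to_fp16(grad * loss_scale) / loss_scale


tiny_grad = 1e-8
unscaled = to_fp16(tiny_grad)                    # underflows in fp16
rescued = scaled_fp16_gradient(tiny_grad, 128.0)  # survives the round trip
```

A gradient this small is lost entirely when stored directly in fp16, while the scaled version keeps a usable approximation. In fp32 the same value is representable, which is consistent with skipping the loss scale there.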
Don't autodetect or default to fp32 when all cards have Tensor Cores. We will assume fp16 is the fastest. This avoids problems in tune-only mode which does not detect the precision to use and would use fp32 on such cards. Pull request leela-zero#2312.
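The decision described here reduces to one check. In this sketch, `default_precision` is a hypothetical helper and the string "half" is just a label for fp16; the normal autodetection path is not modelled.

```python
def default_precision(cards_have_tensor_cores):
    """Return "half" when every card has Tensor Cores, else None.

    `cards_have_tensor_cores` is a list of booleans, one per card.
    None means "fall back to the usual autodetection", which this sketch
    does not model.
    """
    if cards_have_tensor_cores and all(cards_have_tensor_cores):
        return "half"
    return None
```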
We have a first implementation of batching now.
AutoGTP will always send --batchsize, but CPU only compiles don't support the option. Ignore the option in those builds. The same problem exists with --tune-only, but quitting immediately happens to be sane behavior so we don't need to fix that. Pull request leela-zero#2313.
It will recursively include OpenCL.h and that is bad. Pull request leela-zero#2314.
# Conflicts: # src/Network.cpp # src/Network.h # src/OpenCLScheduler.cpp # src/UCTNode.cpp # src/UCTSearch.cpp # src/kernels/convolve3.opencl
# Conflicts: # src/GTP.cpp # src/Leela.cpp # src/UCTSearch.cpp # src/UCTSearch.h
# Conflicts: # src/OpenCL.cpp # src/Tuner.cpp # src/kernels/clblast/hgemm_tensorcore.opencl # src/kernels/tensorcore_test.opencl
# Conflicts: # src/GTP.cpp # src/GTP.h # src/Leela.cpp # src/OpenCL.cpp # src/OpenCL.h # src/OpenCLScheduler.cpp # src/OpenCLScheduler.h # src/UCTSearch.cpp
# Conflicts: # src/GTP.cpp # src/Leela.cpp # src/OpenCLScheduler.cpp # src/Training.cpp # src/UCTNode.cpp # src/UCTNode.h # src/UCTNodePointer.h # src/UCTSearch.cpp
…into tensor-accum
Try to merge 'tensor-accum-0.17' branch into 'distributed'