
Pick chosen move based on normal distribution LCB #2290

Merged
gcp merged 11 commits into leela-zero:next on Mar 26, 2019

Conversation

@Ttl (Member) commented Mar 19, 2019

Similar to lcb_max_root branch by @roy7, but uses normal distribution instead of binomial.

Some tests:

15x192 with 1600 visits:
"-g -d -r 5 -v 1600 --noponder -w f438268ef88e083aaf7a08fba885552818abc08cd0aefce9671954a8df3cf707.gz --precision half --timemanage off", "time_settings 0 1 0".

lz_gauss v lz_next (400/400 games)
board size: 19   komi: 7.5
           wins              black          white        avg cpu
lz_gauss    268 67.00%       118 59.00%     150 75.00%     81.32
lz_next     132 33.00%       50  25.00%     82  41.00%     81.26
                             168 42.00%     232 58.00%
15x192 with 800 visits:
"-g -d -r 5 -v 800 --noponder -w f438268ef88e083aaf7a08fba885552818abc08cd0aefce9671954a8df3cf707.gz --precision half --timemanage off", "time_settings 0 1 0".

lz_gauss v lz_next (400/400 games)
board size: 19   komi: 7.5
           wins              black          white        avg cpu
lz_gauss    294 73.50%       136 68.00%     158 79.00%     35.05
lz_next     106 26.50%       42  21.00%     64  32.00%     35.13
                             178 44.50%     222 55.50%

15x192 with 30+1s on GTX 1080 Ti:
"-g -d -r 5 --noponder -w f438268ef88e083aaf7a08fba885552818abc08cd0aefce9671954a8df3cf707.gz --precision half", "time_settings 30 1 1".

lz_gauss v lz_next (400/400 games)
board size: 19   komi: 7.5
           wins              black          white        avg cpu
lz_gauss    250 62.50%       115 57.50%     135 67.50%     42.16
lz_next     150 37.50%       65  32.50%     85  42.50%     42.16
                             180 45.00%     220 55.00%
Latest 40x256 with 120+2s on RTX 2080:
"-g -d -r 5 --noponder -w e5dd6019c73a853466abb2fb9cf502d80d85179619f8ca76676f4eec16a13468.gz --precision half", "time_settings 180 2 1".

lz_gauss v lz_next (400/400 games)
board size: 19   komi: 7.5
           wins              black          white        avg cpu
lz_gauss    234 58.50%       97  48.50%     137 68.50%    312.18
lz_next     166 41.50%       63  31.50%     103 51.50%    288.99
                             160 40.00%     240 60.00%

@l1t1 commented Mar 19, 2019

Is the LCB method the only way moves are chosen after this PR is merged, or one option among several?

@roy7 (Collaborator) commented Mar 19, 2019

This would change move selection for all play; it is not controlled by an option.

@l1t1 commented Mar 19, 2019

It would be better to also test at low visit counts.

Inline review threads (mostly resolved): src/UCTNode.cpp, src/UCTNode.h, src/UCTNodePointer.cpp, src/UCTSearch.cpp, src/Utils.cpp.

-float UCTNode::get_stddev(float default_stddev) const {
-    return m_visits > 1 ? std::sqrt(get_variance()) : default_stddev;
+float UCTNode::get_eval_variance(float default_var) const {
A Contributor commented:

Have you considered using a std::optional here? That way you don't need the default_var parameter: callers can just use .value_or(0). It would also let you distinguish the case where the variance happens to equal the passed default_var from the case where the variance is simply undefined and the default is taken.

@Ttl (Member, Author) replied:

I don't think the default is even used anywhere right now. get_eval_variance is only needed in get_eval_lcb, which only calls it when the node has at least two visits. There's probably not much point in using optional when it's never needed.

A Member replied:

std::optional is C++17, by the way, so not usable for now.

A Member replied:

You can use a signaling NaN to let things blow up if it's accidentally used.

@Ttl (Member, Author) replied:

Signaling NaN didn't seem to do anything. It seems it would require changing how floating-point exceptions are handled for it to actually raise an exception, and I didn't want to mess with that.

Guard against NN misevaluations when the top move has a lot of visits.
Without this it's possible for a move with a few hundred visits to be
picked over a move with over ten thousand visits.

The problem is that the evaluation distribution isn't really a normal
distribution. Evaluations correlate, and the distribution can change
if a better alternative is found deeper in the tree.
@Ttl (Member, Author) commented Mar 22, 2019

I collected some stats on how often the move with the highest LCB does not also have the most visits; the answer seems to be around 10% of the time:

[plot: highest_move_lcb]

At very low visit counts the uncertainty is so large that the move with the best LCB very likely also has the most visits.

I also logged the ratio of the highest-LCB move's visits to the most-visited move's visits and got the following plot:

[plot: lcb_ratio_0]

There is an unnatural-looking spike at very low visits. I suspect the reason is that the confidence bounds are underestimated, because the evaluation distribution isn't really a normal distribution. If the NN misevaluates a position, there is a high probability that it also misevaluates nearby positions, making the samples correlated. The distribution can also change if a better option is found deeper in the tree during search. This violates the i.i.d. assumption, making the estimated confidence bounds too tight. I have plans to find a better formula for the confidence bounds based on measurements, but that might take a while.

I added a parameter enforcing a minimum visit ratio for the chosen move. I set it to 10% for now based on the above plot. After the change, and with more samples, the visit ratio distribution looks like this:

[plot: lcb_ratio_after]

@Friday9i commented:

Nice.
If someone has a windows .exe, I can generate comparison matches at various visits against a recently compiled next LZ : -)

@langit7 commented Mar 23, 2019

Nice.
If someone has a windows .exe, I can generate comparison matches at various visits against a recently compiled next LZ : -)

http://fira.pw/lz/leelazX64.zip

@iopq commented Mar 23, 2019

Wouldn't there be the same problem just one ply deeper? Our variation will have the max LCB because the "obvious" policy move didn't refute it. It's refuted twenty moves later, but we never visit that line again because it's buried in the win rates of other branches.

Don't we need to choose the correct refutation by doing min UCB for our opponent? (In other words, we want the max Black score and our opponent wants the min Black score.)

@OmnipotentEntity (Contributor) commented:

@iopq AFAIK, this PR does not affect the tree search, only the chosen move. (Though all evaluations are done from the perspective of the player to move, so a max LCB from Black's perspective is a min UCB from White's, no?)

@iopq commented Mar 24, 2019

It doesn't have to be.

Let's say the best Black move puts the opponent in atari. After a lot of searching, we find that it has the best win rate: if the opponent doesn't run his group out, we win big. If he does run his group out, his win rate increases, but we can give ANOTHER atari and still increase our win rate, and so on.

After a few forced moves, if our opponent responds correctly he might have a 55% max-LCB win rate. But in ALL of the other variations he'll have something like 30%, so when we average his value for this node it looks like a winner. Call this the max-LCB node.

But if we look at the visits for our opponent's node, the correct response might ALREADY have a 50% min UCB, just not overall. It makes sense to take our max LCB as the min UCB of our opponent, as long as the visit cutoff is met (10% right now).

If that works, you could apply this iteratively until the 10% cutoff is no longer met (max LCB, min UCB, max LCB, etc.).

Then each node's score would actually depend on things deeper in the tree, giving stronger tactical fighting.

@Ttl (Member, Author) commented Mar 25, 2019

The latest commit should fix the issue with incorrect sorting when there are very few visits.

@gcp (Member) commented Mar 25, 2019

Seems to be a win in every situation, including fixed time with time management on (IIRC, a point where it previously failed)?

Should I pull it for 0.17?

@roy7 (Collaborator) commented Mar 25, 2019

Using the normal distribution does seem stronger than using the binomial, and @Ttl did a cleaner job than my test branch. He even avoids pruning the max_lcb move during timemanage calculations. :)

This won't hurt training/selfplay in any way, right? Since the only change is to actual move selection, the resulting selfplay games just have stronger moves overall.

Inline review thread: src/UCTSearch.cpp (resolved).


Inline review threads: src/UCTNode.h, src/Utils.h (resolved).
@gcp (Member) commented Mar 25, 2019

This won't hurt training/selfplay in any way, right? Since the only change is to actual move selection, the resulting selfplay games just have stronger moves overall.

Right.

The size of the gain signals that our primitive time allocation could also use a lot of improvement, e.g. by detecting max_lcb != max_winrate != max_visits, though that wouldn't help fixed time/nodes. I think we can and should do both, eventually.

@gcp (Member) commented Mar 25, 2019

LGTM.

@Ttl (Member, Author) commented Mar 25, 2019

I'm not sure if this is better at very high playouts, but it's probably not worse either. I ran some tests with the 192x15 network and 180+5s time controls on a GTX 1080 Ti; this is around 10k playouts per move.

lz_gauss v lz_next (99/400 games)
board size: 19   komi: 7.5
           wins              black         white       avg cpu
lz_gauss     50 50.51%       18 36.00%     32 65.31%    852.98
lz_next      49 49.49%       17 34.69%     32 64.00%    797.97
                             35 35.35%     64 64.65%

I think the normal-distribution confidence bounds are too tight at very high playouts, but it's not easy to figure out how they should be modified, and testing with high playouts takes too long.

I guess it's okay to merge, since it's clearly better at more common visit counts.

@l1t1 commented Mar 26, 2019

Please post the full validation.exe command line used for the two leelaz.exe versions.
@Ttl

@gcp gcp merged commit cfb93e5 into leela-zero:next Mar 26, 2019
@iopq commented Mar 26, 2019

I ran a 400-game match on 9x9 with a strong 9x9 network, 7.5 komi, at 1600 visits. In non-duplicate games:

LCB won with Black 7 times and with White 36 times. Standard Leela won with Black 7 times and with White 71 times.

The total record, LCB vs. standard, is 208-192.

AncalagonX pushed a commit to AncalagonX/leela-zero that referenced this pull request Mar 30, 2019
* Calculate node variance.
* Use normal distribution LCB to choose the played move.
* Cached student-t.
* Sort lz-analyze output according to LCB.
* Don't choose nodes with very few visits even if LCB is better.

Guard against NN misevaluations when the top move has a lot of visits.
Without this it's possible for a move with a few hundred visits to be
picked over a move with over ten thousand visits.

The problem is that the evaluation distribution isn't really a normal
distribution. Evaluations correlate, and the distribution can change
if a better alternative is found deeper in the tree.

Pull request leela-zero#2290.
@l1t1 commented Apr 1, 2019

Copied from lifein19x19:
C:\APPS\l0gpu16\validation.exe -n C:\APPS\net\35824222.gz -o "-g -v 1601 --gpu 0 --gpu 1 --noponder -t 24 -q -d --timemanage off --precision single -w" -n C:\APPS\net\35824222.gz -o "-g -v 1601 --gpu 0 --gpu 1 --noponder -t 24 -q -d --timemanage off --precision single -w" -- C:\APPS\l0gpu16\leelaz -- C:\APPS\l0gpu17beta\leelaz -k 215-215

gcp pushed a commit that referenced this pull request Apr 2, 2019
* Calculate node variance.
* Use normal distribution LCB to choose the played move.
* Cached student-t.
* Sort lz-analyze output according to LCB.
* Don't choose nodes with very few visits even if LCB is better.

Guard against NN misevaluations when the top move has a lot of visits.
Without this it's possible for a move with a few hundred visits to be
picked over a move with over ten thousand visits.

The problem is that the evaluation distribution isn't really a normal
distribution. Evaluations correlate, and the distribution can change
if a better alternative is found deeper in the tree.

Pull request #2290.
roy7 pushed a commit to roy7/leela-zero that referenced this pull request Apr 11, 2019
* Calculate node variance.
* Use normal distribution LCB to choose the played move.
* Cached student-t.
* Sort lz-analyze output according to LCB.
* Don't choose nodes with very few visits even if LCB is better.

Guard against NN misevaluations when the top move has a lot of visits.
Without this it's possible for a move with a few hundred visits to be
picked over a move with over ten thousand visits.

The problem is that the evaluation distribution isn't really a normal
distribution. Evaluations correlate, and the distribution can change
if a better alternative is found deeper in the tree.

Pull request leela-zero#2290.
godmoves added a commit to godmoves/leela-zero that referenced this pull request Apr 27, 2019
* Command line parsing : OPENGL --> OPENCL

* Asynchronous simulation / evaluation+backup for batching.

* temp commit.

* New fractional backup implementation.

* reorder children after Dirichlet noise + minor fix.

* Fix for compiler syntax nitpick.

* Once again...

* Output max queue length.

* One queue for each GPU.

* Limit max queue size to twice gpucount*batchsize and Serialize OpenCL commandqueue. (Reverted "one queue for each GPU".)

* temp commits.

* Less variation in speed (pos/s) but seems ~5% slower than max performance.

* Use accumulated virtual losses to avoid visiting expanding nodes.

* Fix missing header leading to error with some compiler.

* Fast conclusion of think().

* Solve problem with root node expansion when it's in NNCache; Fix error with some compilers.

* Cleanup loop code.

Pull request leela-zero#2033.

* always output tuning result

* fixes.

* Tensor core support for half precision

* Bugfixes

* Use m32n8k16 format instead of m16n16k16 - seems to be a bit faster

* Merge fixes.

* Code cleanup for tuning for tensorcores

* Change default to try SA=0 / SA=1 for tensorcore cases

* Update UCTSearch.cpp

* Clear NNCache when clear_board or loadsgf is issued.

* Fixes.

* Queue insertion/vl undo improvements.

* Half precision by default.

* hgemm : Added m16n16k16/m32n8k16/m8n32k16 tuning

Tuner will see which shaped multiplication is fastest.
MDIMA represents the M dimension, NDIMB represents the N dimension.

* Tuner : adjusted range for tensorcore cases so that it covers all MDIMA/NDIMB dimensions

* Fix bug causing infinite wait.

* Fix bug causing infinite wait.

* Minor fixes.

* Minor fixes.

* Crucial fix: infinite wait_expanded.

* Tentative fixes.

* Follow-up fixes.

* Update UCTNode.cpp

* stupid typo.

* stupid typo.

* small fix.

* Fix crucial bug in frac-backup factor calculation.

* Fix crucial bug in frac-backup factor calculation.

* Better output stats.

* Defaulted frac-backup; better naming of pending backup stats.

* Small fix.

* Revert SEL -> WR for get_visits for selection.

* Forgotten comment text change.

* Make some debug variables atomic.

* Renaming a variable; static_cast -> load()

* virtual loss in numerator.

* Small output fix.

* Reorganize pending backup obligations.

* Move backup data insertion to Network::get_output0.

* Remove statics; bugfixes.

* Optimizations? Do not use m_return_queue.

* Corrected implementation of virtual loss accumulation.

* Missing include.

* Modifications that don't achieve good result.

* WIP; implemented readers-writer lock.

* A snapshot as basis of further changes.

* Checkpoint.

* Checkpoint: Seamless think/ponder transition implemented.
NOT for actual use: This version sends positions to GPUs without limit for stress-testing purposes; will eat up your memory.

* Bugfixes and better debug outputs; usable version.

* Checkpoint: changes are not done but it compiles.

* Checkpoint: moved some members from OpenCLScheduler and OpenCL_Network to OpenCL; compiles.

* temp

* temp commit; won't compile.

* Checkpoint: implementation unfinished, now switch to another design.

* Mostly lock-free OpenCLScheduler.
Ensure minimal latency when there're enough positions to feed the GPUs.
Compiles. Pending debug.

* Seems working now.

* Fixes.

* Worker thread = search thread.

* Tweak conversion script for ELF v2.

Small tweak to conversion script for ELF v2 weights.

Pull request leela-zero#2213.

* Bugfix: accumulated virtual loss removal.

* Work around inexplicable reported bug.

* Endgame/Double-pass bugfix.

* Fix some cv race conditions.

* Update OpenCL.h

* Correctly initialize board when reading SGF.

Even though SGF defaults to size 19 boards, we should not try
to set up a board that size if LZ has not been compiled to support
it.

Pull request leela-zero#1964.

* Increase memory limit for 32-bit builds.

Without this, it's empirically not possible to load the current 256x40
networks on a 32-bit machine.

* Never select a CPU during OpenCL autodetection.

If we are trying to auto-select the best device for OpenCL, never select
a CPU. This will cause the engine to refuse to run when people are
trying to run the OpenCL version without a GPU or without GPU drivers,
instead of selecting any slow and suboptimal (and empirically extremely
broken) OpenCL-on-CPU drivers.

Falling back to CPU-only would be another reasonable alternative, but
doesn't provide an alert in case the GPU drivers are missing.

Improves behavior of issue leela-zero#1994.

* Fix tuner for heterogeneous GPUs and auto precision.

Fix full tuner for heterogeneous GPUs and auto precision detection.

--full-tuner implies --tune-only
--full-tuner requires an explicit precision

Fixes leela-zero#1973.

Pull request leela-zero#2004.

* Optimized out and out_in kernels.

Very minor speedup of about 2% with batch size of 1.
With batch size of 5 there is a speedup of about 5% with half precision
and 12% with single precision.

Out transformation memory accesses are almost completely coalesced
with the new kernel.

Pull request leela-zero#2014.

* Update OpenCL C++ headers.

From upstream a807dcf0f8623d40dc5ce9d1eb00ffd0e46150c7.

* CPU-only eval performance optimization.

* CPUPipe : change winograd transformation constants to an equation.

Combined with a series of strength reduction changes, 
improves netbench by about 8%.

* Convert some std::array into individual variables

For some reason this allows gcc to optimize the code better,
improving netbench by 2%.

Pull request leela-zero#2021.

* Convolve in/out performance optimization.

Use hard-coded equations instead of matrix multiplication.

Pull request leela-zero#2023.

* Validation: fix -k option.

Fix Validation -k option by reading its value before the parser is reused.

Pull request leela-zero#2024.

* Add link to Azure free trial instructions.

See pull request leela-zero#2031.

* Cleanup loop code.

Pull request leela-zero#2033.

* Cleanup atomics and dead if.

Pull request leela-zero#2034.

* Const in SGFTree.

Pull request leela-zero#2035.

* Make the README more clear.

Simplify instructions, especially related to building and running
when wanting to contribute.

Based on pull request leela-zero#1983.

* Refactor to allow AutoGTP to use Engine.

* Move Engine to Game.h and refactor autogtp to use it too.
* Fix initialization of job engines.

Pull request leela-zero#2029.

* Fix printf call style.

Generally speaking, providing character pointers as the first argument 
directly might cause FSB (Format String Bug).

Pull request leela-zero#2063.

* Add O(sqrt(log(n))) scaling to tree search.

Pull request leela-zero#2072.

* Update Khronos OpenCL C++ headers.

Update from upstream f0b7045.

Fixes warnings related to CL_TARGET_OPENCL_VERSION.

* AutoGTP: allow specifying an SGF as initial position.

* Make AutoGTP URL parametric.
* Support for the sgfhash and movescount parameters in get-task.
* Automatic downloading of sgf and training files.
* Fix Management.cpp for older Qt5 versions.
* Added starting match games from specified initial position
* Tidy ValidationJob::init() like ProductionJob::init()
* Use existing QUuid method of generating random file 
  names instead of QTemporaryFile when fetching game data.

Moreover, we do not load training data in LeelaZ since it is not needed to start from
an arbitrary position.

Pull request leela-zero#2052.

* Support separate options for white in match games.

* Add optional separate options for white in match game.
* Fixed loading of saved match order with optionsSecond.

Pull request leela-zero#2078.

* Option to get network output without writing to cache. 

Pull request leela-zero#2093.

* Add permission to link with NVIDIA libs. Update year.

See issue leela-zero#2032.

All contributors to the core engine have given their permission to
add an additional permission to link with NVIDIA's CUDA/cuDNN/TensorRT
libraries. This makes it possible to distribute the engine when built to
use those libraries.

Update the copyright notices to 2019.

* Add link to GoReviewPartner.

Pull request leela-zero#2147.

* Reminder to install OpenCL driver if separate.

Although the OpenCL driver is generally installed as part of the driver
install, mention the requirement explicitly in case it wasn't.

See pull request leela-zero#2138.

* Fixed leelaz_file on Android.

Pull request leela-zero#2135.

* Fix 'catching polymorphic type by value' warning.

Pull request leela-zero#2134.

* Fixed converter script for minigo removing bias.

Fixes leela-zero#2020.

Pull request leela-zero#2133.

* Add zlib to the mac OS X build instructions.

See pull request leela-zero#2122.

* UCTNodePtr rare race condition fix.

Calling get_eval() on zero-visit node will assert-fail.
The original code could assert-fail on b.get_eval() if 'a' and 'b' both
had zero visits but suddenly 'a' gained an additional visit.

Pull request leela-zero#2110.

* Make sure analysis is printed at least once.

Fixes issue leela-zero#2001.

Pull request leela-zero#2114.

* Don't post if not requested.

Follow up fix for pull request leela-zero#2114.

* AutoGTP: Allow specifying initial GTP commands.

* AutoGTP: Allow specifying initial GTP commands.
  Also add support for white taking the first move in handicapped job games.
* AutoGTP: Refactored core loop for match games to avoid code duplication.
* Fixed white using black's match game settings after loading from an SGF by
  moving SGF loading into Game::gameStart() to before sending GTP commands
  (except handicap commands).
* Changed so that when an SGF file is loaded, AutoGTP determines whether
  handicap is in use from the SGF rather than from any starting GTP commands.

Pull request leela-zero#2096.

* Update Eigen to 3.3.7. 

This includes some optimization improvements for newer GCC/Clang that
may be relevant to a lot of our users.

Pull request leela-zero#2151.

* Fix lz-setoption name playouts.

Fixes issue leela-zero#2167.

I could swear I fixed this before. Maybe I forgot to push?

* AutoGTP: More info in SGF comments.

* AutoGTP: Added full engine options and starting GTP commands 
  to SGF comments that are produced.
* Refactored Game::fixSgf().

Pull request leela-zero#2160.

* Truncate and compress minigo weights.

Truncate to 4 precision and compress converted minigo weights.

Pull request leela-zero#2173.

* Add gomill-explain_last_move.

Add gomill-explain_last_move for additional output in ringmaster
competitions.

Pull request leela-zero#2174.

* Add a feature to exclude moves from the search.

* The "avoid" command is now a param for lz-analyze and for
  lz-genmove_analyze.

New syntax is:

  `lz-analyze ARGS [avoid <color> <coords> <number_of_moves>] [avoid ...]`
  `lz-genmove_analyze ARGS [avoid <color> <coords> <number_of_moves>] [avoid ...]`

The number_of_moves is now always relative to the current move number.

Example:

  `lz-analyze b 200 avoid b q16 1 avoid b q4 1 avoid b d16 1 avoid b d4 1`

* Re-organize the parser for the "analyze" commands.

  * New tag "interval"; old syntax "100" is now short for "interval 100"
  * Tags can be specified in any arbitrary order
  * Moved all of the parsing code for "lz-anaylze" and
    "lz-genmove_analyze" into the parse_analyze_tags function
  * parse_analyze_tags uses its return value instead of side effects

* Implement the "allow" tag for lz-analyze.

It works similar to "avoid".  Adding moves to the "allow" list is the
same as adding all other moves (except pass and resign) to the "avoid" list.

* "Avoid" and "allow" moves can be specified as a comma-separated list.

Example:

  `lz-analyze b 100 avoid w q4,q16,d4,d16 2 avoid b pass 50`

Pull request leela-zero#1949.

* Removed --cpu-only option from USE_CPU_ONLY build. 

Generalized output displayed in cases where potentially referring to a CPU 
instead of or as well as a GPU.

Pull request leela-zero#2161.

* Tensor Core support with PTX inline assembly.

* Tensor core support for half precision
* hgemm : Added m16n16k16/m32n8k16/m8n32k16 tuning

Tuner will see which shaped multiplication is fastest.
MDIMA represents the M dimension, NDIMB represents the N dimension.

* tensorcore : Test m16n16k16 types only for checking tensorcore availability

It seems that there are cases where only m16n16k16 is supported.
If other formats are not available they will be auto-disabled on tuning.

Pull request leela-zero#2049.

* Update TODO list.

We support avoid tags now. Clarify batching work needs
changes in the search.

* Remove an unnecessary std::move().

Which inhibits RVO. See e.g. https://stackoverflow.com/a/19272035

* Add contributor (and maintainer) guidelines. 

* Add contributor (and maintainer) guidelines.

Spell out the existing code style, C++ usage, git workflow,
commit message requirements, and give guidelines regarding reviewing,
merging and adding configuration options and GTP extensions.

Pull request leela-zero#2186.

* Add several simple GTP commands.

Added several simple GTP commands useful for building interfaces to LZ.

Added the following GTP commands.

    last_move
    move_history

The output of these commands is in line with that of the corresponding
commands in GNU Go when such commands existed.

Pull request leela-zero#2170.

* Minor style fixups.

Minor fixups for pull request leela-zero#2170.

* Remark about move assignment in style guideline.

Emphasize use of emplace_back and move semantics.

* Add lz-analyze minmoves tag.

Add an lz-analyze tag to suggest the minimum amount of moves the
engine should post info about (rather than only those it considers
interesting, i.e. the ones with at least a visit).

This allows some very flexible constructs:

Getting a heatmap:

    lz-setoption name visits value 1
    lz-analyze interval 1 minmoves 361

Forcing a move among the top policy moves only:

    lz-setoption name visits value 1
    lz-analyze interval 1 minmoves 2
    (store those moves, e.g. A1, B1)
    lz-setoption name visits value 0
    lz-genmove_analyze b interval 1 allow b A1 1 allow b B1 1

* Fix style, extra spaces in PV output.

Adding the minmoves tag exposes a small bug in the PV
output formatting. Avoid extra blank spaces.

Small style fixups.

* Rework test regex for MSVC limits.

Seems like the previous test regex is causing MSVC's regex engine to run
out of stack space.

* .gitignore: Add build.

leela-zero's default build directory is `build`.

It is very annoying when using leela as a git submodule that 
the repository updates whenever it builds.

Pull request leela-zero#2199.

* Batched neural net evaluations

Group evaluations and run them in parallel. Roughly 50% speedup on my setup, but there are a couple of points that are debatable.

- Thread / batch sizing heuristics : This PR changes how the default threads / default batch sizes are picked.  See Leela.cpp
- Batch-forming heuristic : See OpenCLScheduler.cpp for the batch forming heuristic : the heuristic exists so that we can wait for the rest of the engine to create more NN evaluations so that we can run larger batches.  We can't wait indefinitely since there are cases we enter 'serial' paths.  Since heuristics are heuristics, these might need some tests on a larger variety of types of systems.

Did make sure that winrate improves when running default vs. default command line `./leelaz -w (weight file)` on time parity.

Pull request leela-zero#2188.

* Autogtp: Tune for batchsize 1

Self-play games specify `-t 1` for playing, which implies a batch size of 1, but tuning was done with default settings since the number of threads was not specified.

Pull request leela-zero#2206

* Update README.md.

Update links to leela-zero instead of gcp.
Update badge and link to the new AppVeyor project
under leela-zero instead of gcp ownership.

* Remove unused lambda capture.

Pull request leela-zero#2231.

* README.md: link to mentioned pull requests.

Pull request leela-zero#2229.

* Minor cleanup involving Network::get_output. 

Pull request leela-zero#2228.

* Set up default batch size and threads.

Fixes issue leela-zero#2214.

Pull request leela-zero#2256.

* Shuffle tuner parameters to find good parameters quicker.

Parameters are searched in a linear fashion currently. By shuffling them,
we will find a good instance more quickly.

Also, shuffling could help reduce possible bias due to grouped, similar
parameters that affect the environment (e.g. cache, branch predictor, ...),
leading to more accurate/fair results.

Additionally, this is a preparation for exiting the tuner during the search,
which becomes a possible option.

Pull request leela-zero#2225.

* Refactor tree_stats_helper to lambda.

Pull request leela-zero#2244.

* Enable batching for self-play.

Pull request leela-zero#2253.

* Allow configuring default komi at compile-time.

Pull request leela-zero#2257.

* Make chunkparser more robust.

Some clients are sending corrupted data, make the
chunk parser resilient against it.

* Fix thread count error message.

Pull request leela-zero#2287.

* Fix small style nits.

* Add support for time controls in loadsgf/printsgf.

Added extra support for "TM" and "OT" and other sgf time control
properties on printsgf and loadsgf GTP commands.

* Added parsing and loading of "TM" and "OT" sgf properties on GTP command
  loadsgf. Only supports "OT" syntax matching output from a printsgf GTP
  command.
* Change SGFTree to have a shared_ptr for a time control.
* Added saving and loading of "BL", "WL", "OB" and "OW" sgf properties on
  GTP commands printsgf and loadsgf.
* Change to make TimeControl::make_from_text_sgf() a time control factory
  and other minor tidying.

Pull request leela-zero#2172.

* Fix inconsistent default timecontrol.

As noted in pull request leela-zero#2172, the default
constructor set byo yomi stones but no time or
periods.

* Error out if weights are for wrong board size.

We currently will either crash or do strange things if we're
fed a weights file that doesn't match the board size we're compiled
for.

See issue leela-zero#2289.

* Ignore passing moves unless they make sense.

Only pass when winning or low on legal moves.
Disabled in self-play.

Fixes issue leela-zero#2273.
Based on pull request leela-zero#2277.

Pull request leela-zero#2301.

* Always allow passing when low on moves.

As pointed out by @gjm11 in leela-zero#2277, when there are few legal
moves we might want to allow passing even if this loses on the board
count. The alternative might be to self-destruct large groups and carry
the game on endlessly even if the policy wouldn't want to.

No difference in "dumbpass" mode.

* Report root visits in gomill-explain_last_move.

See issue leela-zero#2280.

Pull request leela-zero#2302.

* Choose move based on normal distribution LCB.

* Calculate node variance.
* Use normal distribution LCB to choose the played move.
* Cached student-t.
* Sort lz-analyze output according to LCB.
* Don't choose nodes with very few visits even if LCB is better.

Guard against NN misevaluations when the top move has a lot of visits.
Without this it's possible for a move with a few hundred visits to be
picked over a move with over ten thousand visits.

The problem is that the evaluation distribution isn't really a normal
distribution. Evaluations correlate, and the distribution can change
if the search finds a better alternative deeper in the tree.

Pull request leela-zero#2290.
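A minimal sketch of the selection rule described above, under stated assumptions: names like `choose_move` and `min_visit_frac` are illustrative, the normal quantile stands in for the cached Student-t values, and the visit-fraction cutoff plays the role of the few-visits guard:

```python
import math
from statistics import NormalDist

def lcb(mean, var, visits, z):
    # Lower confidence bound of the mean winrate under a normal model.
    return mean - z * math.sqrt(var / visits)

def choose_move(moves, confidence=0.95, min_visit_frac=0.1):
    """moves: dict move -> (visits, mean_winrate, variance_of_evals).

    Picks the move with the highest LCB, but never a move whose visit
    count is tiny relative to the most-visited child (guard against NN
    misevaluations producing an optimistic bound on few samples).
    """
    z = NormalDist().inv_cdf(confidence)  # simplification of cached Student-t
    max_visits = max(v for v, _, _ in moves.values())
    best, best_lcb = None, -math.inf
    for move, (visits, mean, var) in moves.items():
        if visits < min_visit_frac * max_visits:
            continue  # too few visits: LCB not trustworthy here
        bound = lcb(mean, var, visits, z)
        if bound > best_lcb:
            best, best_lcb = move, bound
    return best
```

With enough visits on both candidates, a slightly lower winrate with much tighter variance can still lose to a higher-winrate move, which is the intended behavior; the guard only kicks in at extreme visit imbalances.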

* Mixed precision training support.

* Add mixed precision training support.
* Do not use loss scale if training with fp32
* Fix potential reg_term overflow of large networks.

Pull request leela-zero#2191.

* Update AUTHORS.

* Don't detect precision with Tensor Cores. 

Don't autodetect or default to fp32 when all cards have
Tensor Cores. We will assume fp16 is the fastest.

This avoids problems in tune-only mode which does not
detect the precision to use and would use fp32 on such cards.

Pull request leela-zero#2312.

* Update README.md.

We have a first implementation of batching now.

* Ignore --batchsize in CPU only compiles.

AutoGTP will always send --batchsize, but CPU only
compiles don't support the option. Ignore the option
in those builds.

The same problem exists with --tune-only, but quitting
immediately happens to be sane behavior so we don't need
to fix that.

Pull request leela-zero#2313.

* Don't include OpenCL scheduler in CPU build.

It will recursively include OpenCL.h and that
is bad.

Pull request leela-zero#2314.

* Bump version numbers.

* Fix: batch sizes were not set according to command line.
AncalagonX pushed a commit to AncalagonX/leela-zero that referenced this pull request May 5, 2019
AncalagonX pushed a commit to AncalagonX/leela-zero that referenced this pull request May 5, 2019
Vandertic pushed a commit to CuriosAI/sai that referenced this pull request Jun 10, 2019
ihavnoid pushed a commit that referenced this pull request Jul 27, 2019
* Correctly initialize board when reading SGF.

Even though SGF defaults to size 19 boards, we should not try
to set up a board that size if LZ has not been compiled to support
it.

Pull request #1964.

* Increase memory limit for 32-bit builds.

Without this, it's empirically not possible to load the current 256x40
networks on a 32-bit machine.

* Never select a CPU during OpenCL autodetection.

If we are trying to auto-select the best device for OpenCL, never select
a CPU. This will cause the engine to refuse to run when people are
trying to run the OpenCL version without a GPU or without GPU drivers,
instead of selecting any slow and suboptimal (and empirically extremely
broken) OpenCL-on-CPU drivers.

Falling back to CPU-only would be another reasonable alternative, but
doesn't provide an alert in case the GPU drivers are missing.

Improves behavior of issue #1994.

* Fix tuner for heterogeneous GPUs and auto precision.

Fix full tuner for heterogeneous GPUs and auto precision detection.

--full-tuner implies --tune-only
--full-tuner requires an explicit precision

Fixes #1973.

Pull request #2004.

* Optimized out and out_in kernels.

Very minor speedup of about 2% with batch size of 1.
With batch size of 5 there is a speedup of about 5% with half precision
and 12% with single precision.

Out transformation memory accesses are almost completely coalesced
with the new kernel.

Pull request #2014.

* Update OpenCL C++ headers.

From upstream a807dcf0f8623d40dc5ce9d1eb00ffd0e46150c7.

* CPU-only eval performance optimization.

* CPUPipe : change winograd transformation constants to an equation.

Combined with a series of strength reduction changes, 
improves netbench by about 8%.

* Convert some std::array into individual variables

For some reason this allows gcc to optimize the code better,
improving netbench by 2%.

Pull request #2021.

* Convolve in/out performance optimization.

Use hard-coded equations instead of matrix multiplication.

Pull request #2023.

* Validation: fix -k option.

Fix Validation -k option by reading its value before the parser is reused.

Pull request #2024.

* Add link to Azure free trial instructions.

See pull request #2031.

* Cleanup atomics and dead if.

Pull request #2034.

* Const in SGFTree.

Pull request #2035.

* Make the README more clear.

Simplify instructions, especially related to building and running
when wanting to contribute.

Based on pull request #1983.

* Refactor to allow AutoGTP to use Engine.

* Move Engine to Game.h and refactor autogtp to use it too.
* Fix initialization of job engines.

Pull request #2029.

* Fix printf call style.

Generally speaking, passing character pointers directly as the first
argument might cause a format string bug (FSB).

Pull request #2063.

* Update Khronos OpenCL C++ headers.

Update from upstream f0b7045.

Fixes warnings related to CL_TARGET_OPENCL_VERSION.

* Cleanup loop code.

Pull request #2033.

* AutoGTP: allow specifying an SGF as initial position.

* Make AutoGTP URL parametric.
* Support for the sgfhash and movescount parameters in get-task.
* Automatic downloading of sgf and training files.
* Fix Management.cpp for older Qt5 versions.
* Added starting match games from specified initial position
* Tidy ValidationJob::init() like ProductionJob::init()
* Use existing QUuid method of generating random file 
  names instead of QTemporaryFile when fetching game data.

Moreover, we do not load training data in LeelaZ since it is not needed to start from
an arbitrary position.

Pull request #2052.

* Support separate options for white in match games.

* Add optional separate options for white in match game.
* Fixed loading of saved match order with optionsSecond.

Pull request #2078.

* Add O(sqrt(log(n))) scaling to tree search.

Pull request #2072.

* Option to get network output without writing to cache. 

Pull request #2093.

* Add permission to link with NVIDIA libs. Update year.

See issue #2032.

All contributors to the core engine have given their permission to
add an additional permission to link with NVIDIA's CUDA/cuDNN/TensorRT
libraries. This makes it possible to distribute the engine when built to
use those libraries.

Update the copyright notices to 2019.

* Add link to GoReviewPartner.

Pull request #2147.

* Reminder to install OpenCL driver if separate.

Although the OpenCL driver is generally installed as part of the driver
install, mention the requirement explicitly in case it wasn't.

See pull request #2138.

* Fixed leelaz_file on Android.

Pull request #2135.

* Fix 'catching polymorphic type by value' warning.

Pull request #2134.

* Fixed converter script for minigo removing bias.

Fixes #2020.

Pull request #2133.

* Add zlib to the mac OS X build instructions.

See pull request #2122.

* UCTNodePtr rare race condition fix.

Calling get_eval() on a zero-visit node will assert-fail.
The original code could assert-fail on b.get_eval() if 'a' and 'b' both
had zero visits but suddenly 'a' gained an additional visit.

Pull request #2110.

* Make sure analysis is printed at least once.

Fixes issue #2001.

Pull request #2114.

* Don't post if not requested.

Follow up fix for pull request #2114.

* AutoGTP: Allow specifying initial GTP commands.

* AutoGTP: Allow specifying initial GTP commands.
  Also add support for white taking the first move in handicapped job games.
* AutoGTP: Refactored core loop for match games to avoid code duplication.
* Fixed white using black's match game settings after loading from an SGF by
  moving SGF loading into Game::gameStart() to before sending GTP commands
  (except handicap commands).
* Changed so that when an SGF file is loaded, AutoGTP determines whether
  handicap is in use from the SGF rather than from any starting GTP commands.

Pull request #2096.

* Update Eigen to 3.3.7. 

This includes some optimization improvements for newer GCC/Clang that
may be relevant to a lot of our users.

Pull request #2151.

* Fix lz-setoption name playouts.

Fixes issue #2167.

I could swear I fixed this before. Maybe I forgot to push?

* AutoGTP: More info in SGF comments.

* AutoGTP: Added full engine options and starting GTP commands 
  to SGF comments that are produced.
* Refactored Game::fixSgf().

Pull request #2160.

* Truncate and compress minigo weights.

Truncate to 4 digits of precision and compress converted minigo weights.

Pull request #2173.

* Add gomill-explain_last_move.

Add gomill-explain_last_move for additional output in ringmaster
competitions.

Pull request #2174.

* Add a feature to exclude moves from the search.

* The "avoid" command is now a param for lz-analyze and for
  lz-genmove_analyze.

New syntax is:

  `lz-analyze ARGS [avoid <color> <coords> <number_of_moves>] [avoid ...]`
  `lz-genmove_analyze ARGS [avoid <color> <coords> <number_of_moves>] [avoid ...]`

The number_of_moves is now always relative to the current move number.

Example:

  `lz-analyze b 200 avoid b q16 1 avoid b q4 1 avoid b d16 1 avoid b d4 1`

* Re-organize the parser for the "analyze" commands.

  * New tag "interval"; old syntax "100" is now short for "interval 100"
  * Tags can be specified in any arbitrary order
  * Moved all of the parsing code for "lz-analyze" and
    "lz-genmove_analyze" into the parse_analyze_tags function
  * parse_analyze_tags uses its return value instead of side effects

* Implement the "allow" tag for lz-analyze.

It works similar to "avoid".  Adding moves to the "allow" list is the
same as adding all other moves (except pass and resign) to the "avoid" list.

* "Avoid" and "allow" moves can be specified as a comma-separated list.

Example:

  `lz-analyze b 100 avoid w q4,q16,d4,d16 2 avoid b pass 50`

Pull request #1949.

* Removed --cpu-only option from USE_CPU_ONLY build. 

Generalized the displayed output in cases that may refer to a CPU 
instead of, or as well as, a GPU.

Pull request #2161.

* Tensor Core support with PTX inline assembly.

* Tensor core support for half precision
* hgemm : Added m16n16k16/m32n8k16/m8n32k16 tuning

Tuner will see which shaped multiplication is fastest.
MDIMA represents the M dimension, NDIMB represents the N dimension.

* tensorcore : Test m16n16k16 types only for checking tensorcore availability

It seems that there are cases where only m16n16k16 is supported.
If other formats are not available they will be auto-disabled on tuning.

Pull request #2049.

* Update TODO list.

We support avoid tags now. Clarify batching work needs
changes in the search.

* Remove an unnecessary std::move().

Which inhibits RVO. See e.g. https://stackoverflow.com/a/19272035

* Add contributor (and maintainer) guidelines. 

* Add contributor (and maintainer) guidelines.

Spell out the existing code style, C++ usage, git workflow,
commit message requirements, and give guidelines regarding reviewing,
merging and adding configuration options and GTP extensions.

Pull request #2186.

* Add several simple GTP commands.

Added several simple GTP commands useful for building interfaces to LZ.

Added the following GTP commands.

    last_move
    move_history

The output of these commands is in line with that of the corresponding
commands in GNU Go when such commands existed.

Pull request #2170.

* Minor style fixups.

Minor fixups for pull request #2170.

* Remark about move assignment in style guideline.

Emphasize use of emplace_back and move semantics.

* Add lz-analyze minmoves tag.

Add an lz-analyze tag to suggest the minimum number of moves the
engine should post info about (rather than only those it considers
interesting, i.e. the ones with at least one visit).

This allows some very flexible constructs:

Getting a heatmap:

    lz-setoption name visits value 1
    lz-analyze interval 1 minmoves 361

Forcing a move among the top policy moves only:

    lz-setoption name visits value 1
    lz-analyze interval 1 minmoves 2
    (store those moves, e.g. A1, B1)
    lz-setoption name visits value 0
    lz-genmove_analyze b interval 1 allow b A1 1 allow b B1 1

* Fix style, extra spaces in PV output.

Adding the minmoves tag exposes a small bug in the PV
output formatting. Avoid extra blank spaces.

Small style fixups.

* Rework test regex for MSVC limits.

Seems like the previous test regex is causing MSVC's regex engine to run
out of stack space.

* .gitignore: Add build.

leela-zero's default build directory is `build`.

It is very annoying when using leela as a git submodule that 
the repository updates whenever it builds.

Pull request #2199.

* Batched neural net evaluations

Group evaluations and run them in parallel. Roughly 50% speedup on my setup, but there are a couple of points that are debatable.

- Thread / batch sizing heuristics: this PR changes how the default threads / default batch sizes are picked. See Leela.cpp.
- Batch-forming heuristic: see OpenCLScheduler.cpp. The heuristic exists so that we can wait for the rest of the engine to create more NN evaluations, allowing us to run larger batches. We can't wait indefinitely, since there are cases where we enter 'serial' paths. Since heuristics are heuristics, these might need testing on a wider variety of systems.

Made sure that winrate improves when running default vs. default with the command line `./leelaz -w (weight file)` at time parity.

Pull request #2188.
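The batch-forming idea described above can be sketched as follows (a hypothetical model of the heuristic, not the OpenCLScheduler.cpp code; `form_batch` and `wait_s` are illustrative names):

```python
import queue

def form_batch(q, max_batch, wait_s=0.002):
    """Collect up to max_batch pending NN evaluation requests.

    Take one request (blocking), then wait briefly for more so that
    larger batches can form, but never indefinitely: the search
    sometimes enters serial paths where no further requests arrive,
    and stalling there would deadlock progress.
    """
    batch = [q.get()]
    while len(batch) < max_batch:
        try:
            batch.append(q.get(timeout=wait_s))
        except queue.Empty:
            break  # don't stall a serial path waiting for a full batch
    return batch
```

The timeout is the tunable part: too short and batches stay small, too long and serial portions of the search pay the wait on every evaluation, which is why such heuristics need testing across different systems.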

* Autogtp: Tune for batchsize 1

Self-play games specify `-t 1` for playing, which implies a batch size of 1, but tuning was done for default settings since the number of threads was not specified.

Pull request #2206.

* Tweak conversion script for ELF v2.

Small tweak to conversion script for ELF v2 weights.

Pull request #2213.

* Update README.md

Update links to leela-zero instead of gcp.

* Update README.md

Appveyor link still needs to be 'gcp'.

* Update README.md

Update badge and link to the new AppVeyor project under leela-zero instead of gcp ownership.

* Update README.md.

Update links to leela-zero instead of gcp.
Update badge and link to the new AppVeyor project
under leela-zero instead of gcp ownership.

* Remove unused lambda capture.

Pull request #2231.

* README.md: link to mentioned pull requests.

Pull request #2229.

* Minor cleanup involving Network::get_output. 

Pull request #2228.

* Set up default batch size and threads.

Fixes issue #2214.

Pull request #2256.

* Shuffle tuner parameters to find good parameters quicker.

Parameters are searched in a linear fashion currently. By shuffling them,
we will find a good instance more quickly.

Also, shuffling could help reduce possible bias due to grouped, similar
parameters that affect the environment (e.g. cache, branch predictor, ...),
leading to more accurate/fair results.

Additionally, this is a preparation for exiting the tuner during the search,
which becomes a possible option.

Pull request #2225.

* Refactor tree_stats_helper to lambda.

Pull request #2244.

* Enable batching for self-play.

Pull request #2253.

* Allow configuring default komi at compile-time.

Pull request #2257.

* Update README.md

Update links to leela-zero instead of gcp.

* Make chunkparser more robust.

Some clients are sending corrupted data; make the
chunk parser resilient against it.

* Fix thread count error message.

Pull request #2287.

* Fix small style nits.

* Add support for time controls in loadsgf/printsgf.

Added extra support for "TM" and "OT" and other sgf time control
properties on printsgf and loadsgf GTP commands.

* Added parsing and loading of "TM" and "OT" sgf properties on GTP command
  loadsgf. Only supports "OT" syntax matching output from a printsgf GTP
  command.
* Change SGFTree to have a shared_ptr for a time control.
* Added saving and loading of "BL", "WL", "OB" and "OW" sgf properties on
  GTP commands printsgf and loadsgf.
* Change to make TimeControl::make_from_text_sgf() a time control factory
  and other minor tidying.

Pull request #2172.

* Fix inconsistent default timecontrol.

As noted in pull request #2172, the default
constructor set byo yomi stones but no time or
periods.

* Error out if weights are for wrong board size.

We currently will either crash or do strange things if we're
fed a weights file that doesn't match the board size we're compiled
for.

See issue #2289.

* Ignore passing moves unless they make sense.

Only pass when winning or low on legal moves.
Disabled in self-play.

Fixes issue #2273.
Based on pull request #2277.

Pull request #2301.

* Always allow passing when low on moves.

As pointed out by @gjm11 in #2277, when there are few legal moves we
might want to allow passing even if this loses on the board count. The
alternative might be to self-destruct large groups and carry the game
on endlessly even if the policy wouldn't want to.

No difference in "dumbpass" mode.

* Report root visits in gomill-explain_last_move.

See issue #2280.

Pull request #2302.

* Choose move based on normal distribution LCB.

* Calculate node variance.
* Use normal distribution LCB to choose the played move.
* Cached student-t.
* Sort lz-analyze output according to LCB.
* Don't choose nodes with very few visits even if LCB is better.

Guard against NN misevaluations when the top move has a lot of visits.
Without this it's possible for a move with a few hundred visits to be
picked over a move with over ten thousand visits.

The problem is that the evaluation distribution isn't really a normal
distribution. Evaluations correlate, and the distribution can change
if the search finds a better alternative deeper in the tree.

Pull request #2290.

* Mixed precision training support.

* Add mixed precision training support.
* Do not use loss scale if training with fp32
* Fix potential reg_term overflow of large networks.

Pull request #2191.

* Update AUTHORS.

* Don't detect precision with Tensor Cores. 

Don't autodetect or default to fp32 when all cards have
Tensor Cores. We will assume fp16 is the fastest.

This avoids problems in tune-only mode which does not
detect the precision to use and would use fp32 on such cards.

Pull request #2312.

* Update README.md.

We have a first implementation of batching now.

* Ignore --batchsize in CPU only compiles.

AutoGTP will always send --batchsize, but CPU only
compiles don't support the option. Ignore the option
in those builds.

The same problem exists with --tune-only, but quitting
immediately happens to be sane behavior so we don't need
to fix that.

Pull request #2313.

* Don't include OpenCL scheduler in CPU build.

It will recursively include OpenCL.h and that
is bad.

Pull request #2314.

* Bump version numbers.

* Address GitHub security alert.

* Match upstream
@roy7
Collaborator

roy7 commented Nov 30, 2019

@Ttl Is https://github.com/leela-zero/leela-zero/blob/next/src/UCTNodePointer.cpp#L142 a bug that the is_inflated check is different than every other function in here?

I actually "fixed" that in the thompson_sampling branch but I'm not 100% certain it was a bug to fix. #2352

@Ttl
Member Author

Ttl commented Nov 30, 2019

is_inflated() is equal to is_inflated(m_data.load()), so it's functionally the same. It could cause m_data.load() to be called twice, though.

Vandertic pushed a commit to CuriosAI/sai that referenced this pull request Dec 14, 2019
* Correctly initialize board when reading SGF.

Even though SGF defaults to size 19 boards, we should not try
to set up a board that size if LZ has not been compiled to support
it.

Pull request leela-zero#1964.

* Increase memory limit for 32-bit builds.

Without this, it's empirically not possible to load the current 256x40
networks on a 32-bit machine.

* Never select a CPU during OpenCL autodetection.

If we are trying to auto-select the best device for OpenCL, never select
a CPU. This will cause the engine to refuse to run when people are
trying to run the OpenCL version without a GPU or without GPU drivers,
instead of selecting any slow and suboptimal (and empirically extremely
broken) OpenCL-on-CPU drivers.

Falling back to CPU-only would be another reasonable alternative, but
doesn't provide an alert in case the GPU drivers are missing.

Improves behavior of issue leela-zero#1994.

* Fix tuner for heterogeneous GPUs and auto precision.

Fix full tuner for heterogeneous GPUs and auto precision detection.

--full-tuner implies --tune-only
--full-tuner requires an explicit precision

Fixes leela-zero#1973.

Pull request leela-zero#2004.

* Optimized out and out_in kernels.

Very minor speedup of about 2% with batch size of 1.
With batch size of 5 there is a speedup of about 5% with half precision
and 12% with single precision.

Out transformation memory accesses are almost completely coalesced
with the new kernel.

Pull request leela-zero#2014.

* Update OpenCL C++ headers.

From upstream a807dcf0f8623d40dc5ce9d1eb00ffd0e46150c7.

* CPU-only eval performance optimization.

* CPUPipe : change winograd transformation constants to an equation.

Combined with a series of strength reduction changes, 
improves netbench by about 8%.

* Convert some std::array into individual variables

For some reason this allows gcc to optimize the code better,
improving netbench by 2%.

Pull request leela-zero#2021.

* Convolve in/out performance optimization.

Use hard-coded equations instead of matrix multiplication.

Pull request leela-zero#2023.

* Validation: fix -k option.

Fix Validation -k option by reading its value before the parser is reused.

Pull request leela-zero#2024.

* Add link to Azure free trial instructions.

See pull request leela-zero#2031.

* Cleanup atomics and dead if.

Pull request leela-zero#2034.

* Const in SGFTree.

Pull request leela-zero#2035.

* Make the README more clear.

Simplify instructions, especially related to building and running
when wanting to contribute.

Based on pull request leela-zero#1983.

* Refactor to allow AutoGTP to use Engine.

* Move Engine to Game.h and refactor autogtp to use it too.
* Fix initialization of job engines.

Pull request leela-zero#2029.

* Fix printf call style.

Generally speaking, providing character pointers as the first argument 
directly might cause FSB (Format String Bug).

Pull request leela-zero#2063.

* Update Khronos OpenCL C++ headers.

Update from upstream f0b7045.

Fixes warnings related to CL_TARGET_OPENCL_VERSION.

* Cleanup loop code.

Pull request leela-zero#2033.

* AutoGTP: allow specifying an SGF as initial position.

* Make AutoGTP URL parametric.
* Support for the sgfhash and movescount parameters in get-task.
* Automatic downloading of sgf and training files.
* Fix Management.cpp for older Qt5 versions.
* Added starting match games from specified initial position
* Tidy ValidationJob::init() like ProductionJob::init()
* Use existing QUuid method of generating random file 
  names instead of QTemporaryFile when fetching game data.

Moreover, we do not load training data in LeelaZ since it is not needed to start from
an arbitrary position.

Pull request leela-zero#2052.

* Support separate options for white in match games.

* Add optional separate options for white in match game.
* Fixed loading of saved match order with optionsSecond.

Pull request leela-zero#2078.

* Add O(sqrt(log(n))) scaling to tree search.

Pull request leela-zero#2072.

* Option to get network output without writing to cache. 

Pull request leela-zero#2093.

* Add permission to link with NVIDIA libs. Update year.

See issue leela-zero#2032.

All contributors to the core engine have given their permission to
add an additional permission to link with NVIDIA's CUDA/cuDNN/TensorRT
libraries. This makes it possible to distribute the engine when built to
use those libraries.

Update the copyright notices to 2019.

* Add link to GoReviewPartner.

Pull request leela-zero#2147.

* Reminder to install OpenCL driver if seperate.

Although the OpenCL driver is generally installed as part of the driver
install, mention the requirement explicitly in case it wasn't.

See pull request leela-zero#2138.

* Fixed leelaz_file on Android.

Pull request leela-zero#2135.

* Fix 'catching polymorphic type by value' warning.

Pull request leela-zero#2134.

* Fixed converter script for minigo removing bias.

Fixes leela-zero#2020.

Pull request leela-zero#2133.

* Add zlib to the mac OS X build instructions.

See pull request leela-zero#2122.

* UCTNodePtr rare race condition fix.

Calling get_eval() on zero-visit node will assert-fail.
The original code could assert-fail on b.get_eval() if 'a' and 'b' both
had zero visits but suddenly 'a' gained an additional visit.

Pull request leela-zero#2110.

* Make sure analysis is printed at least once.

Fixes issue leela-zero#2001.

Pull request leela-zero#2114.

* Don't post if not requested.

Follow up fix for pull request leela-zero#2114.

* AutoGTP: Allow specifying initial GTP commands.

* AutoGTP: Allow specifying initial GTP commands.
  Also add support for white taking the first move in handicapped job games.
* AutoGTP: Refactored core loop for match games to avoid code duplication.
* Fixed white using black's match game settings after loading from an SGF by
  moving SGF loading into Game::gameStart() to before sending GTP commands
  (except handicap commands).
* Changed so that when an SGF file is loaded, AutoGTP determines whether
  handicap is in use from the SGF rather than from any starting GTP commands.

Pull request leela-zero#2096.

* Update Eigen to 3.3.7. 

This includes some optimization improvements for newer GCC/Clang that
may be relevant to a lot of our users.

Pull request leela-zero#2151.

* Fix lz-setoption name playouts.

Fixes issue leela-zero#2167.

I could swear I fixed this before. Maybe I forgot to push?

* AutoGTP: More info in SGF comments.

* AutoGTP: Added full engine options and starting GTP commands 
  to SGF comments that are produced.
* Refactored Game::fixSgf().

Pull request leela-zero#2160.

* Truncate and compress minigo weights.

Truncate to 4 precision and compress converted minigo weights.

Pull request leela-zero#2173.

* Add gomill-explain_last_move.

Add gomill-explain_last_move for additional output in ringmaster
competitions.

Pull request leela-zero#2174.

* Add a feature to exclude moves from the search.

* The "avoid" command is now a param for lz-analyze and for
  lz-genmove_analyze.

New syntax is:

  `lz-analyze ARGS [avoid <color> <coords> <number_of_moves>] [avoid ...]`
  `lz-genmove_analyze ARGS [avoid <color> <coords> <number_of_moves>] [avoid ...]`

The number_of_moves is now always relative to the current move number.

Example:

  `lz-analyze b 200 avoid b q16 1 avoid b q4 1 avoid b d16 1 avoid b d4 1`

* Re-organize the parser for the "analyze" commands.

  * New tag "interval"; old syntax "100" is now short for "interval 100"
  * Tags can be specified in any arbitrary order
  * Moved all of the parsing code for "lz-anaylze" and
    "lz-genmove_analyze" into the parse_analyze_tags function
  * parse_analyze_tags uses its return value instead of side effects

* Implement the "allow" tag for lz-analyze.

It works similar to "avoid".  Adding moves to the "allow" list is the
same as adding all other moves (except pass and resign) to the "avoid" list.

* "Avoid" and "allow" moves can be specified as a comma-separated list.

Example:

  `lz-analyze b 100 avoid w q4,q16,d4,d16 2 avoid b pass 50`

Pull request leela-zero#1949.

* Removed --cpu-only option from USE_CPU_ONLY build. 

Generalized output displayed in cases where potentially referring to a CPU 
instead of or as well as a GPU.

Pull request leela-zero#2161.

* Tensor Core support with PTX inline assembly.

* Tensor core support for half precision
* hgemm : Added m16n16k16/m32n8k16/m8n32k16 tuning

Tuner will see which shaped multiplication is fastest.
MDIMA represents the M dimension, NDIMB represents the N dimension.

* tensorcore : Test m16n16k16 typs only for checking tensorcore availability

It seems that there are cases where only m16n16k16 is supported.
If other formats are not available they will be auto-disabled on tuning.

Pull request leela-zero#2049.

* Update TODO list.

We support avoid tags now. Clarify batching work needs
changes in the search.

* Remove an unnecessary std::move().

Which inhibits RVO. See e.g. https://stackoverflow.com/a/19272035

* Add contributor (and maintainer) guidelines. 

* Add contributor (and maintainer) guidelines.

Spell out the existing code style, C++ usage, git workflow,
commit message requirements, and give guidelines regarding reviewing,
merging and adding configuration options and GTP extensions.

Pull request leela-zero#2186.

* Add several simple GTP commands.

Added several simple GTP commands useful for building interfaces to LZ.

Added the following GTP commands.

    last_move
    move_history

The output of these commands is in line with that of the corresponding
commands in GNU Go when such commands existed.

Pull request leela-zero#2170.

* Minor style fixups.

Minor fixups for pull request leela-zero#2170.

* Remark about move assignment in style guideline.

Emphasize use of emplace_back and move semantics.

* Add lz-analyze minmoves tag.

Add an lz-analyze tag to suggest the minimum number of moves the
engine should post info about (rather than only those it considers
interesting, i.e. the ones with at least one visit).

This allows some very flexible constructs:

Getting a heatmap:

    lz-setoption name visits value 1
    lz-analyze interval 1 minmoves 361

Forcing a move among the top policy moves only:

    lz-setoption name visits value 1
    lz-analyze interval 1 minmoves 2
    (store those moves, e.g. A1, B1)
    lz-setoption name visits value 0
    lz-genmove_analyze b interval 1 allow b A1 1 allow b B1 1

* Fix style, extra spaces in PV output.

Adding the minmoves tag exposes a small bug in the PV
output formatting. Avoid extra blank spaces.

Small style fixups.

* Rework test regex for MSVC limits.

Seems like the previous test regex is causing MSVC's regex engine to run
out of stack space.

* .gitignore: Add build.

leela-zero's default build directory is `build`.

It is very annoying when using leela as a git submodule that
the repository shows as modified whenever it builds.

Pull request leela-zero#2199.

* Batched neural net evaluations

Group evaluations and run them in parallel. Roughly 50% speedup on my setup, but there are a couple of points that are debatable.

- Thread / batch sizing heuristics : This PR changes how the default threads / default batch sizes are picked.  See Leela.cpp
- Batch-forming heuristic : See OpenCLScheduler.cpp.  The heuristic exists so that we can wait for the rest of the engine to create more NN evaluations, letting us run larger batches.  We can't wait indefinitely, since there are cases where we enter 'serial' paths.  Since heuristics are heuristics, these might need some testing on a wider variety of systems.

Did make sure that winrate improves when running default vs. default command line `./leelaz -w (weight file)` on time parity.

Pull request leela-zero#2188.
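The batch-forming heuristic described above can be sketched as a gather loop with a deadline: take the first queued evaluation request, then keep collecting until the batch is full or the wait budget runs out. This is an illustrative sketch, not the code in OpenCLScheduler.cpp; all names and thresholds are assumptions.

```python
# Sketch of a batch-forming loop: gather queued NN evaluation requests
# up to a maximum batch size, but never wait past a deadline, since
# some search paths produce requests serially. Illustrative only.
import queue
import time

def form_batch(q, max_batch, max_wait_s):
    batch = [q.get()]                      # block for the first request
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break                          # wait budget exhausted
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break                          # nothing more arrived in time
    return batch
```

A larger `max_wait_s` forms bigger batches (better GPU utilization) at the cost of latency on serial paths.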

* Autogtp: Tune for batchsize 1

Self-play games specify `-t 1` for playing, which implies a batch size
of 1, but tuning was done with default settings since the number of
threads was not specified.

Pull request leela-zero#2206.

* Tweak conversion script for ELF v2.

Small tweak to conversion script for ELF v2 weights.

Pull request leela-zero#2213.

* Update README.md

Update links to leela-zero instead of gcp.

* Update README.md

Appveyor link still needs to be 'gcp'.

* Update README.md

Update badge and link to the new AppVeyor project under leela-zero instead of gcp ownership.

* Update README.md.

Update links to leela-zero instead of gcp.
Update badge and link to the new AppVeyor project
under leela-zero instead of gcp ownership.

* Remove unused lambda capture.

Pull request leela-zero#2231.

* README.md: link to mentioned pull requests.

Pull request leela-zero#2229.

* Minor cleanup involving Network::get_output. 

Pull request leela-zero#2228.

* Set up default batch size and threads.

Fixes issue leela-zero#2214.

Pull request leela-zero#2256.

* Shuffle tuner parameters to find good parameters quicker.

Parameters are searched in a linear fashion currently. By shuffling them,
we will find a good instance more quickly.

Also, shuffling could help reduce possible bias due to grouped, similar
parameters that affect the environment (e.g. cache, branch predictor, ...),
leading to more accurate/fair results.

Additionally, this is a preparation for exiting the tuner during the search,
which becomes a possible option.

Pull request leela-zero#2225.
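The idea above can be sketched as building the full parameter grid and shuffling it before the scan. This is a minimal illustration with made-up parameter names, not the tuner's actual code.

```python
# Sketch: shuffle candidate tuner configurations before a linear scan,
# so a good setting is found sooner on average and runs of similar
# configurations are broken up. Parameter names are illustrative.
import itertools
import random

def build_search_order(param_space, seed=None):
    configs = [dict(zip(param_space, values))
               for values in itertools.product(*param_space.values())]
    random.Random(seed).shuffle(configs)
    return configs

space = {"MDIMA": [16, 32], "NDIMB": [8, 16], "KWG": [16, 32]}
order = build_search_order(space, seed=42)
```

Because the order is random, the tuner can also be stopped early while still having sampled the whole space roughly uniformly.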

* Refactor tree_stats_helper to lambda.

Pull request leela-zero#2244.

* Enable batching for self-play.

Pull request leela-zero#2253.

* Allow configuring default komi at compile-time.

Pull request leela-zero#2257.

* Update README.md

Update links to leela-zero instead of gcp.

* Make chunkparser more robust.

Some clients are sending corrupted data; make the
chunk parser resilient against it.

* Fix thread count error message.

Pull request leela-zero#2287.

* Fix small style nits.

* Add support for time controls in loadsgf/printsgf.

Added support for the "TM", "OT" and other SGF time control
properties in the printsgf and loadsgf GTP commands.

* Added parsing and loading of "TM" and "OT" sgf properties on GTP command
  loadsgf. Only supports "OT" syntax matching output from a printsgf GTP
  command.
* Change SGFTree to have a shared_ptr for a time control.
* Added saving and loading of "BL", "WL", "OB" and "OW" sgf properties on
  GTP commands printsgf and loadsgf.
* Change to make TimeControl::make_from_text_sgf() a time control factory
  and other minor tidying.

Pull request leela-zero#2172.
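The SGF properties handled above can be illustrated with a small extraction sketch. Property meanings follow the SGF spec (TM = main time in seconds, OT = overtime description, BL/WL = time left, OB/OW = byo-yomi stones/periods left); the parsing code here is illustrative, not SGFTree's implementation.

```python
# Sketch: pulling the time-control properties handled by loadsgf/printsgf
# out of a single SGF node string. Illustrative only.
import re

TIME_PROPS = {"TM", "OT", "BL", "WL", "OB", "OW"}

def extract_time_props(sgf_node):
    """Return the time-control properties found in one SGF node."""
    props = dict(re.findall(r"([A-Z]{1,2})\[([^\]]*)\]", sgf_node))
    return {k: v for k, v in props.items() if k in TIME_PROPS}

node = ";B[pd]TM[1800]OT[5x30 byo-yomi]BL[1650.5]OB[5]"
tc = extract_time_props(node)
```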

* Fix inconsistent default timecontrol.

As noted in pull request leela-zero#2172, the default
constructor set byo yomi stones but no time or
periods.

* Error out if weights are for wrong board size.

We currently will either crash or do strange things if we're
fed a weights file that doesn't match the board size we're compiled
for.

See issue leela-zero#2289.

* Ignore passing moves unless they make sense.

Only pass when winning or low on legal moves.
Disabled in self-play.

Fixes issue leela-zero#2273.
Based on pull request leela-zero#2277.

Pull request leela-zero#2301.

* Always allow passing when low on moves.

As pointed out by @gjm11 in leela-zero#2277, when there are few legal moves we might
want to allow passing even if this loses on the board count. The
alternative might be to self-destruct large groups and carry the game
on endlessly even if the policy wouldn't want to.

No difference in "dumbpass" mode.

* Report root visits in gomill-explain_last_move.

See issue leela-zero#2280.

Pull request leela-zero#2302.

* Choose move based on normal distribution LCB.

* Calculate node variance.
* Use normal distribution LCB to choose the played move.
* Cached student-t.
* Sort lz-analyze output according to LCB.
* Don't choose nodes with very few visits even if LCB is better.

Guard against NN misevaluations when the top move has a lot of visits.
Without this it's possible for a move with a few hundred visits to be
picked over a move with over ten thousand visits.

The problem is that the evaluation distribution isn't really a normal
distribution. Evaluations are correlated, and the distribution can
change if a better alternative is found deeper in the tree.

Pull request leela-zero#2290.
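The selection rule above can be sketched as: compute a lower confidence bound from each child's mean winrate and variance, discard children with too few visits relative to the most-visited child, and pick the highest LCB. This is a simplified illustration; the real code uses a cached Student-t quantile and per-node variance tracking, and the constant and minimum-visit guard below are assumptions.

```python
# Sketch of normal-distribution LCB move selection. The z constant and
# the minimum-visit guard are illustrative; leela-zero caches a
# Student-t quantile instead of using a fixed z.
import math

def lcb(mean, variance, visits, z=1.96):
    """Lower confidence bound of the mean winrate."""
    return mean - z * math.sqrt(variance / visits)

def choose_move(children, min_visit_fraction=0.1):
    """children: list of (move, mean_winrate, variance, visits).
    Guard: ignore moves with very few visits even if their LCB is
    better, to protect against NN misevaluations."""
    max_visits = max(c[3] for c in children)
    eligible = [c for c in children
                if c[3] >= min_visit_fraction * max_visits]
    return max(eligible, key=lambda c: lcb(c[1], c[2], c[3]))[0]
```

With this guard, a noisy move with a few hundred visits cannot displace a move with tens of thousands of visits, matching the motivation above.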

* Mixed precision training support.

* Add mixed precision training support.
* Do not use loss scale if training with fp32
* Fix potential reg_term overflow of large networks.

Pull request leela-zero#2191.

* Update AUTHORS.

* Don't detect precision with Tensor Cores. 

Don't autodetect or default to fp32 when all cards have
Tensor Cores. We will assume fp16 is the fastest.

This avoids problems in tune-only mode, which does not
detect the precision to use and would use fp32 on such cards.

Pull request leela-zero#2312.

* Update README.md.

We have a first implementation of batching now.

* Ignore --batchsize in CPU only compiles.

AutoGTP will always send --batchsize, but CPU-only
compiles don't support the option. Ignore the option
in those builds.

The same problem exists with --tune-only, but quitting
immediately happens to be sane behavior so we don't need
to fix that.

Pull request leela-zero#2313.

* Don't include OpenCL scheduler in CPU build.

It will recursively include OpenCL.h and that
is bad.

Pull request leela-zero#2314.

* Bump version numbers.

* Address GitHub security alert.

* Match upstream