Dynamic komi #1772

Closed
wants to merge 51 commits into from

Conversation

@alreadydone (Contributor) commented Aug 24, 2018

I just made a release out of my komi+next branch which is a stable, combined and updated version of komi-v0.3x and endgame-v0.3. Debugging has taken longer than expected but now the engine should be free of (additional) crashes. I am not sure how much cleanup/refactoring/revisions are necessary to get the PR merged (plus certain features are still partially experimental), but I'll just open this PR to gather some reviews.

Instructions

  • Since current 40-block networks mostly don't show the monotonicity required for dynamic komi, it is recommended to use pangafu's 192x15 Leela Master weights at https://drive.google.com/drive/folders/1XrdAxjDQ7Dnz49QRdv9Dfm1P__-rwR3L. It's reported that GX37 and GX38 work well, but you may also try more recent network files, which are trained with more recent LZ self-play data but have more human games mixed in.
  • Dynamic komi works by adjusting the color-plane inputs of the neural network (which the network to some extent interprets as komi, since for many networks black's winrate tends to decrease as komi increases) so that it outputs winrates within a certain range. This avoids very low winrates causing desperate/crazy moves (such as chasing dead ladders or dropping stones on the first and second lines) and very high winrates causing slack moves (not killing a big group, giving up points in the endgame). It currently works as a hack, since no network has been trained on games with komi values other than +/-7.5, so the winrates under other komi cannot be trusted; but they are meaningful relative to each other, and dynamic komi does improve handicap performance and lets the engine play non-slack moves when it is leading.
  • Compatible with Lizzie /next or myLizzie, but not with the Lizzie /master (v0.5) release.
  • Not usable, even in theory, with any network that reports Detecting residual layers...v2 (so far just ELFv0 62b5417b and ELFv1 d13c4099).
  • Without parameters, the engine should behave identically to the official /next branch (though not necessarily the latest revision of it).
  • Parameters recommended for handicap games: --handicap -r 0 --target-komi 0
    Raising --max-wr and --min-wr may lead to better performance depending on the opponent and perhaps the network; the handicap-mode defaults are --max-wr 0.12 --min-wr 0.06 --wr-margin 0.03. In general, the lower these parameters, the more aggressively (but possibly unreasonably or desperately) the engine plays.
  • Parameter recommended for not being slack when leading: --nonslack
    Defaults: --max-wr 0.9 --min-wr 0.1 --wr-margin 0.1
  • Development has focused on stronger handicap performance. With --handicap --pos -r 0 -t 1 -p 3000 -w GX37 against ELFv0 -p 1 (@Splee99's test; note that the same number of playouts takes somewhat more time with dynamic komi), it scored 3-3 with 7 handicap stones and 4-20 with 8 stones (GX37 runs into ladder issues early in many of the lost games). According to testers, it should also perform quite well against humans with up to 5-6 stones.

Changelog: improvements over the komi/endgame-v0.3x releases

  • Safe with AutoGTP, since komi adjustment is not enabled without the --handicap, --nonslack or --target-komi parameters. A single engine can now play both in handicap mode and in non-slack mode, or under different komi.
  • --pos and --neg are implemented. Since some networks are observed to generalize better (winrate monotone w.r.t. komi) in the positive komi range and some in the negative range, it is natural to use winrates from only one of the ranges; this is enabled with the --pos or --neg option.
  • More accurate/stable komi/winrate adjustment, which addresses the observed problem of a plummeting winrate after search causing the engine to go crazy. Instead of using only the single position (and its rotations/reflections) at the root node for komi adjustment, we now collect positions during tree search. Accuracy/stability can be improved further by increasing --adj-positions (default 200); on fast machines --adj-positions 2000 may be reasonable. Since the adjustment is more stable now, --min-wr can be set lower without producing crazy moves.
  • Better komi-adjustment strategy: (1) --mid-wr is retired and replaced by --wr-margin. Previously, when the winrate went outside the range [min, max], it was adjusted to mid; now, if the winrate goes above max it is adjusted to max - margin, and if it goes below min it is adjusted to min + margin. (2) In non-slack mode, the komi is adjusted back to the target komi if the winrate under the target komi is within [min + margin, max - margin]; this addresses "Komi only ever adjusts once (endgame branch)" (alreadydone/lz#42, reported by @anonymousAwesome) and should mostly eliminate the possibility of losing a winning game.
  • Thanks to everyone who tested the previous versions!

Meaning of options/parameters

(An abbreviated version of the following can be displayed with the option -h or --help.)

  • --handicap Handicap mode

  • --nonslack Non-slack mode

  • --max-wr Maximal white winrate

  • --min-wr Minimal white winrate

  • --wr-margin Winrate margin
    Example: --max-wr 0.9 --min-wr 0.1 --wr-margin 0.05 means that the winrate will be adjusted to 0.85 = 0.9 - 0.05 if the winrate under the current komi exceeds 90%, and to 0.15 = 0.1 + 0.05 if it drops below 10%. There are some exceptions, detailed later.

  • --target-komi Target komi (Default: --target-komi 7.5)
    Normally only 7.5, -7.5 and possibly 0 are recommended, but see the FAQ for another usage.
    In handicap mode, the komi is adjusted back to target-komi if the (white) winrate under the current komi is above max and the winrate under the target komi is above (min + margin). In non-slack mode, this happens whenever the winrate under the target komi is between (min + margin) and (max - margin), regardless of the winrate under the current komi.
    --target-komi is also used for scoring after a double pass, but the komi used for scoring can be changed at any time during the game with the GTP command komi, without affecting the dynamic komi adjustment.
    Notice: when handicap or non-slack mode is enabled, scoring does not take the number of handicap stones into account; this agrees with Chinese scoring in Sabaki and is more reasonable, since the neural network is ignorant of the handicap.

  • --adj-positions Number of positions to collect during tree search for komi adjustment (Default: --adj-positions 200)

  • --adj-pct Percentage of the collected positions actually used for komi adjustment (Default: --adj-pct 4). We choose the 4% of positions whose winrates are closest to the average collected winrate and try different komi values only for these positions.

  • --num-adj Maximal number of adjustments during each genmove/lz-analyze (Default: --num-adj 1). Probably no need to change.
    By default, each komi adjustment takes about 2 x 9 x 200 x 4% = 144 forward passes of the network (somewhat more time than 144 playouts would cost). Also note that any komi adjustment destroys the search tree, though in handicap games the opponent's frequent unexpected moves destroy much of the tree anyway.

  • --pos Use winrates with "positive" (>= -7.5) komi values (for the side to move) only, i.e. use winrates at black's turns only when the komi is >= -7.5, and winrates at white's turns only when the komi is <= 7.5.

  • --neg Use winrates with "negative" (<= 7.5) komi values (for the side to move) only.
    By default, --pos or --neg is set automatically at startup using a test on the empty board (GTP command dyn_komi_test; previously done with the "wrout" utility). Some networks may be rejected by this test, but you can force their use for dynamic komi with the option --tg-auto-pn (which turns the automatic setting off) and, if necessary, set --pos or --neg manually.

  • --tg-auto-pn Toggle the automatic setting of --pos or --neg.

  • --fixed-symmetry Use a fixed symmetry instead of a random symmetry (rotation/reflection); followed by the symmetry you'd like to use exclusively (an integer from 0 to 7). Since the winrates may differ a lot between symmetries under high komi, fixing a symmetry might give better performance (but could also create more blind spots).
    Example: --fixed-symmetry 0 uses the identity symmetry (no rotation/reflection of the board before it is fed into the neural net).

  • --tg-sure-backup Toggle backup for nodes whose winrates are invalid under --pos or --neg. Suppose --neg is enabled, the komi is 8.0 and it is black to move; then the winrate at this node is invalid. If backup is enabled (the default), the winrate of the first child (where it is white to move) is used instead; if backup is disabled, the winrate is simply discarded. Setting this option changes the shape of the tree, so it is not recommended. Experimental.

  • --tg-orig-policy Toggle original/adjusted policy under --pos or --neg. Without this option, if --pos is set in a handicap game and the komi is high, the policy for black is adjusted to suggest more aggressive moves, as if black weren't leading by a lot, but the policy for white is not adjusted, so it suggests super-aggressive/desperate moves; if --neg is set, the policy for white is adjusted to suggest less aggressive/desperate moves, but the policy for black is not adjusted, so it suggests very conservative/safe moves. With this option, both the black and the white policy are adjusted, so the suggested moves are closer to those of an even game. May be worth experimenting with.

  • --tg-dyn-fpu Toggle dynamic first-play urgency (FPU). In handicap/non-slack mode the default is to use the dynamic evaluation of the parent node (the average neural-network winrate of the subtree below it, minus a reduction) as the FPU; without --handicap or --nonslack the default is to use the neural-network winrate of the parent node alone. Since the winrates can vary a lot under high komi, setting this option is not recommended. Experimental.

FAQ/Technical details

  • What is opp_komi?
    Since the networks are never trained on komi values other than +/-7.5, the winrates under other komi are inaccurate. Many networks generalize differently in the positive range than in the negative range, so a different komi value is needed at black's turns than at white's turns to bring the winrates close together and within the target range. The displayed komi is the network input at the current side-to-move's turns, while opp_komi is the komi input at the opponent's turns.

  • How do dyn_komi_test and the automatic setting of --pos or --neg work?
    Take the initial empty-board position and, from komi -300 to 300 in increments of 0.5, record every place where the black winrate increases instead of decreasing, adding up the increments separately for the positive and negative ranges; finally add 1 - (winrate at komi -300) to the negative-range sum and (winrate at komi 300) to the positive-range sum. If the positive-range sum exceeds 0.05, the positive range is deemed unusable and --neg is set automatically; similarly for the negative range.

  • How would I use the engine to play games with 6.5 komi as white (e.g. on Fox), or with other values?
    You may be tempted to use --target-komi 6.5, but due to the inaccuracy of winrates under komi values other than +/-7.5, it is safer to set, for example, --target-komi 3. You can then issue the GTP command komi 6.5 to set the komi used for scoring (correct handling of the difference between Chinese and Japanese/Korean rules is not guaranteed here).

Possible further improvements

  • Parallelizing and batching komi adjustment (mean_white_eval) and dyn_komi_test will save some time.
  • Instead of saving GameState for komi adjustment, maybe only save KoState + input_data (output of gather_features); should save some memory.
  • Skipping collection of the half of the positions whose winrates are invalid when --pos or --neg is enabled.

alreadydone and others added 30 commits July 3, 2018 02:44
…d by default, so won't affect self-play game generation.
…nd adjust during search if time is ample (incomplete, with comments of ideas to implement).
Toggle options; automatically set --pos or --neg or neither at startup.
Merge pull request #48 from alreadydone/patch-20
@infinity0 (Contributor)

I tried this (merges cleanly into current next branch, thanks) and it crashes about half the time with:

leelaz: ../../../../../src/gallium/state_trackers/clover/core/resource.cpp:176: clover::root_resource::root_resource(clover::device&, clover::memory_obj&, clover::root_resource&): Assertion `0' failed.
Aborted
exit code 134

When it does succeed, I get partial junk output on my terminal:

��������Ŀֵ����ʤ���������ģ�Winrate increasing near -87.5, -87.0, -86.5, -86.0, -85.5, -85.0, -84.5, -84.0, -83.5, -83.0, -82.5, -82.0, -81.5, -81.0, -80.5, -80.0, -71.5, -71.0, -70.5, -70.0, -69.5, -69.0, -68.5, -68.0, -67.5, -67.0, -66.5, -66.0, -65.5, -65.0, -64.5, -64.0, -63.5, -63.0, -62.5, -62.0, -61.5, -61.0, -60.5, -60.0, -59.5, -59.0, -58.5, -58.0, -57.5, -57.0, -56.5, -56.0, -55.5, -55.0, -54.5, -47.5, -47.0, -46.5, -46.0, -45.5, -45.0, -44.5, -44.0, -12.0, -11.5, -11.0, -10.5, -10.0, -9.5, -9.0, -8.5, -8.0, -7.5, -7.0, 183.0, 183.5, 184.0, 184.5, 185.0, 185.5, 186.0, 186.5, 187.0, 187.5, 188.0, 188.5, 189.0, 189.5, 190.0, 190.5, 191.0, 191.5, 192.0, 193.0, .
Negative komi total score: 7.449442e-02
Positive komi total score: 4.774332e-05
Weight file is of mediocre quality for dynamic komi. Use with the option --pos. Ȩ�������еȣ�����Ŀ���ֲ��ѣ��Ƽ�ʹ��--pos������

@gcp (Member) commented Sep 3, 2018

> This diverges quite a bit from the original AlphaGo paper, so I don't know how this will bode for the project's goals... but if it is useful enough we might want this to be merged in...?

In theory if this is disabled the program should behave identically (at least that's what I hope). The main reason for merging would be that this is probably the most useful feature for Go players. We don't have to switch the main server training to use this, but having support in the main client makes it possible, and everyone that wants dynamic komi can then use the official releases.

In case the code turns out to be too invasive, we can still see if we can perhaps pull in significant parts of it and make the "diff" smaller, making it easier to keep the branches in sync.

(The linked posts do indeed lay out my position well)

> I see a lot of command line options. @gcp was against adding command-line arguments on multiple occasions, and I see that it's due to adding more confusion for users.

It's also because they require the program to be restarted to change settings, which is unnecessary and slow (especially on a mobile...!). However, I do realize there is a chicken-and-egg problem here: no GTP interface supports some variant of set_option or the like, but they all allow specifying command-line arguments.

Hiding most of these behind USE_TUNER and exposing only the minimal amount that the user can be expected to set without reading through 10 pages of explanation would be ideal. Let's try for good/great (even if not perfect) defaults rather than saddling the user with trying to find them.

(set_option support can be done in another issue)

> Or, should we do an i18n project for leela-zero?

The translation should be handled on the GUI side, preferably the user should never see any Leela Zero output.

Of course, set_option and command line flags are exactly the problem here, and this is why USI (for Shogi) talks about translating the setoption stuff: http://hgm.nubati.net/usi.html

Maybe another reason to not add unnecessary options :-)

@@ -27,6 +27,7 @@
#include "Utils.h"
#include "Zobrist.h"

extern bool cfg_dyn_komi;
Member:

Just include GTP.h.

Contributor Author:

seems to work here, though I recall having encountered a cycle of includes that led to compilation error...

int m_handicap;
int m_passes;
int m_komove;
size_t m_movenum;
int m_lastmove;

bool pos_komi() { return (get_to_move() == FastBoard::BLACK && (m_opp_komi > 7.5 || m_stm_komi > 7.5)) || (get_to_move() == FastBoard::WHITE && (m_opp_komi < -7.5 || m_stm_komi < -7.5)); }
Member:

Line length. Also in general put as much functions in the .cpp as possible, and let the compiler figure out the inlining.

// todo: configurable lower/upper limits and gap, allow black or white to move, more accurate (with raw_winrate, no bias towards pos or neg)
auto vec = net.get_output(&game, Network::Ensemble::DIRECT, sym, true);
auto current_komi = game.m_stm_komi;
std::vector<float> loc_incr;
Member:

What's loc_incr? increment of something?

Contributor Author:

loc_incr stores the komi values where the winrate is locally increasing instead of decreasing. (Black's winrate is expected to decrease with komi, not white's, but I haven't yet made the test work at white's turns.)

@@ -327,7 +432,7 @@ bool GTP::execute(GameState & game, std::string xinput) {
} else if (command.find("komi") == 0) {
std::istringstream cmdstream(command);
std::string tmp;
float komi = 7.5f;
float komi = cfg_target_komi;
float old_komi = game.get_komi();

cmdstream >> tmp; // eat komi
Member:

I think we can just process this now and replace cfg_target_komi with it?

Contributor Author:

Currently the command-line parameter --target-komi is used for m_stm_komi and m_opp_komi (both inputs to the NN) and for m_komi (for scoring), but the komi command only changes m_komi. For now it's good to be able to specify different komi for scoring and for the NN, since the NN doesn't perceive komi accurately or without distortion.

Member:

For compatibility it might be good to derive a best guess value for m_stm_komi and m_opp_komi when "komi" is set this way.

@@ -35,7 +35,7 @@ class KoState : public FastState {
void play_move(int color, int vertex);
void play_move(int vertex);

private:
//private:
Member:

This kind of hacks will need to be fixed.

Contributor Author:

I think making this single member public is harmless? I don't remember about other hacks like this.

@@ -110,7 +115,7 @@ class UCTNode {
// UCT eval
float m_policy;
// Original net eval for this node (not children).
float m_net_eval{0.0f};
float m_net_eval{2.0f};
Member:

Totally unclear why this is changed.

Contributor Author:

It's changed to a value to indicate uninitialized status because I tried to use fpu instead when net_eval is invalid: 5976a2f#diff-28216771f77a09f4e0782ad5404fdd28R326 5976a2f#diff-28216771f77a09f4e0782ad5404fdd28R89
However, that idea was abandoned and the change is now unnecessary.

create_children(net, nodes, root_state, eval);
inflate_all_children();
kill_superkos(root_state);
if (root_state.eval_invalid()) {
Member:

This code is repeated several times so should probably be factored out.

root_eval = (color == FastBoard::BLACK ? root_eval : 1.0f - root_eval);
}

std::array<std::vector<std::shared_ptr<Sym_State>>, 2> ss;
Member:

This appears to collect the average eval during the search somehow. Not clear why it's so complicated, or why it's tracking symmetries.

Contributor Author:

> Instead of collecting the 200 positions during the search, and then taking a 4% sample afterwards, what about making the komi adjustment loop perform a minimal search (4% of 200 positions is 8 nodes) with the adjusted parameters? The main disadvantage is that the positions would be closer together.

> Or go back to only adjusting at the root (what you seem to have initially had) for a first merge attempt, and then try to clean up the position collection afterwards? It seems to be almost half of this patch, and it's some of the messiest code.

The code can be made less complicated by removing some inferior/abandoned ideas/features. The winrate can differ a lot with different symmetries, but it's still reasonable to expect monotonicity (of winrate w.r.t. komi) for each fixed symmetry. For a set of (position, symmetry) pairs, mean_white_eval returns a fixed value for each komi. Under the assumption that the returned value is monotonic w.r.t. komi, binary search will find a komi that attains the target winrate. Due to symmetry randomness and resulting node expansion randomness, the returned value of a minimal search isn't deterministic, let alone monotonic; and you'll need to run the net for all nodes, not just 4% of them.

Originally (in komi-v0.3x) I just used the 8 symmetries of the root node to adjust komi, but it was observed by @kuba97531, for example in a 4H game against an EGF 7d, that if the winrate drops after some search, crazy moves appear; my idea to solve this is to collect positions deeper into the search for komi adjustment.

OutputAnalysisData(const std::string& move, int visits, int winrate, std::string pv) :
m_move(move), m_visits(visits), m_winrate(winrate), m_pv(pv) {};
OutputAnalysisData(const std::string& move, int visits, int winrate, std::string N_num, std::string pv) :
m_move(move), m_visits(visits), m_winrate(winrate), m_N_num(N_num), m_pv(pv) {};
Member:

What's m_N_num?

Contributor Author:

It's the policy prior; myLizzie needs it to display the policy prior together with the move, principal variation, winrate and visit count in a table. Originally the N_num output broke Lizzie, but after featurecat/lizzie#329 it is now compatible. That PR also added support for displaying stm_komi and opp_komi in Lizzie.

Member:

Aha! We should split off this feature. (And rename it to something clearer)

Do you want to do this?

It's generally useful outside this pull, and it will make this diff smaller again.

currstate = state;
result = play_simulation(currstate, node, thread_num);
} while (!result.valid());
result = SearchResult::from_eval(result.eval() + 2.0f);
Member:

So what this is trying to do is to keep expanding until we have the right side to move?

Contributor Author:

Yes. With the --pos or --neg option, the winrate (say) at black's turn may be adjusted but at white's turn may not be adjusted, so we can't use both winrates. With --pos or --neg it's also impossible to use net_eval for the fpu of a child, since the parent and the child have different sides to move. Hopefully after the network is properly trained, no --pos or --neg is necessary, and m_stm_komi, m_opp_komi (for NN inputs) and m_komi (for double-pass scoring) can always be set to be equal.

@gcp (Member) commented Sep 3, 2018

This is already a huge diff, it's mostly unreviewable.

I would suggest this as a first simplification run:

  1. Throw out dyn_fpu.
  2. Throw out fixed symmetries.
  3. Throw out all cases supporting values that your own help doesn't recommend, for now.

The code to support the adj-positions during tree search is huge and very complex. I'd like to understand the need for it better and what it's trying to achieve. My guess so far from reading the explanation is that near the endgame, small changes to komi can cause huge changes to winrate, much more so than at the starting position where the initial values were determined, so you need to figure out by how much you can change the komi to keep winrate values reasonable?

@gcp (Member) commented Sep 3, 2018

Instead of collecting the 200 positions during the search, and then taking a 4% sample afterwards, what about making the komi adjustment loop perform a minimal search (4% of 200 positions is 8 nodes) with the adjusted parameters? The main disadvantage is that the positions would be closer together.

Or go back to only adjusting at the root (what you seem to have initially had) for a first merge attempt, and then try to clean up the position collection afterwards? It seems to be almost half of this patch, and it's some of the messiest code.

node->update(result.eval());
auto eval = result.eval();
auto num = 1;
while (eval >= 2.0f) {
Member:

I understand this is a result of actually doing two evaluations to get to the proper side to move. I'd try to refactor the double expansion into the node evaluation itself, and then just assign the average of both evaluations to the backup, thus keeping all the existing tree-search code that deals with this unaware of what happened underneath.

@alreadydone (Contributor Author):

@gcp Thanks for the reviews; I hope to respond to them and make some of the suggested changes tomorrow. There's also a strange bug under Linux (alreadydone#55) that I need to fix.

It might be easy to train the network to accurately perceive the color-plane inputs as komi; maybe it suffices to collect positions where the territories are all settled, and train the value towards +1 with inputs corresponding to (score - 0.5) komi and towards -1 with inputs corresponding to (score + 0.5) komi; the network might then generalize to unsettled positions. (There are existing algorithms, e.g. in Sabaki, to score without removing all dead stones.) Accurate score estimation has always been a desired feature anyway, not just for dynamic komi.

@@ -65,6 +65,10 @@ class NNCache {
}

void dump_stats();
void clear_cache() {
m_cache.clear();
Member:

If m_order isn't cleared then will the size check in insert() try erasing elements that have been cleared? If a hash is inserted that had previously been "cleared" then on a later insert that tries to erase the cleared hash, will that incorrectly remove all the elements equal to hash?

Member:

Actually, would it be reasonable to include the komi value in the hash used for cache lookup thus removing the need to clear the cache?

Contributor Author:

Good point; I didn't notice m_order, but it seems it won't cause a problem: http://www.cplusplus.com/reference/unordered_map/unordered_map/erase/

Version (2) returns the number of elements erased, which in unordered_map containers (that have unique keys) is 1 if an element with a key value of k existed (and thus was subsequently erased), and zero otherwise.

I thought about including komi in the hash but was too lazy to figure out details to implement it.

Member:

I think there is a problem in the example case I tried to explain above, i.e. when m_order contains a hash multiple times because it hasn't been cleared: when the size limit on m_order is hit, m_cache will incorrectly have an entry removed.
There might be some speed benefits to maintaining the cache but, like you, I'm not sure how to modify the hash function :)

Contributor Author:

OK, I see the situation you meant now, and I think not clearing m_order is a mistake. But I also realize that not clearing the cache can save at least half of the NN evals if --pos or --neg is enabled, so I should indeed look into including komi in the hash!


axao commented Sep 27, 2018

Hi, I am a beginner in the field. I would like to know how to use dynamic komi with the Leela Master weights in Sabaki? Is it simple to install?

@alreadydone (Contributor Author):

@axao Do you know how to use the officially released Leela Zero engine, load a weight from zero.sjeng.org, and use Leela Zero (or any other GTP engine) in Sabaki? If not, then your question doesn't belong here; if yes, why not read the instructions at the top of this page and tell me what you don't understand?


axao commented Sep 29, 2018

@alreadydone Hi, I know how to use the officially released Leela zero, load a weight from zero.sjeng.org and use Leela Zero in Sabaki. Thx for your message ! I got it !

I use gx37 and "-g -t 4 --handicap --max-wr 0.11 --min-wr 0.09 --wr-margin 0.02 -b 100 -w gx37.txt" and the new release of dynamic komi ! I'll test it now ! :p

@roy7 (Collaborator) commented Oct 8, 2018

@kuba97531 calculated the histograms and the current 40b isn't close to monotonic. I wonder, once the ELF games are finally dropped out of the training pipeline, would we expect 40b to move back towards a monotonic graph?

@kuba97531 (Contributor):

The picture shows a comparison of the 15b and 40b nets with regard to winrate as a function of komi in the starting position. Red (LZ153) shows black's winrate nicely decreasing as komi grows from -300 to 100; green (LZ181) shows no clear trend.

[image: winrate vs. komi for LZ153 (red) and LZ181 (green)]

@alreadydone (Contributor Author):

I don't think the ELF games have anything to do with monotonicity. Personally I think non-monotonicity indicates some form of overfitting, and maybe with the current learning rate, window size and network capacity it is necessary to feed in different komi values during training to restore monotonicity.

@gcp (Member) commented Nov 26, 2018

Can't take this as is, so going to close this pull for now.
