
FPU reduction proportional to root eval #1565

Closed

Conversation


@Ttl Ttl commented Jun 17, 2018

I gathered some statistics on the optimal FPU reduction. I first changed FPU to 1 to expand all first-level nodes, added code to save child node net evaluations, and then played some self-play games to gather enough data. In the analysis script I calculated the difference between the child node net evaluation and the root net evaluation for every node; this is the optimal FPU reduction for that node. Code is available at: https://github.com/Ttl/leela-zero/tree/fpu_estimation

Here is a histogram of the evaluation difference between child and root:

[Image: fpu_hist — histogram of child minus root net evaluation differences]

As expected, most moves are worse than the root, which is why the root evaluation minus a constant is a more accurate initialization than the root evaluation alone.
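
For illustration, here is a minimal sketch of the kind of analysis described above; the file name and column layout are assumptions, not the actual analysis code linked above:

import numpy as np
import matplotlib.pyplot as plt

# Assumed dump format: one row per expanded child with the root net
# evaluation and the child net evaluation (both in [0, 1], same side to move).
data = np.loadtxt("fpu_data.csv", delimiter=",")
root_eval, child_eval = data[:, 0], data[:, 1]

# Histogram of the per-node evaluation difference; the per-node optimal
# reduction is root_eval - child_eval.
plt.hist(child_eval - root_eval, bins=100)
plt.xlabel("child eval - root eval")
plt.ylabel("count")
plt.show()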

Here is the L2 loss plot for different FPU reduction constants:

[Image: fpu_constant_l2 — L2 loss vs. FPU reduction constant]

The minimum is very close to what was previously determined to be strongest using CLOP. The estimated maximum was at -0.3 in the last tuning run (#1211), and in this analysis -0.29 has the lowest L2 loss.
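
The sweep behind such a plot is only a few lines; a sketch continuing from the assumed data layout above:

import numpy as np

data = np.loadtxt("fpu_data.csv", delimiter=",")
root_eval, child_eval = data[:, 0], data[:, 1]

# Sweep a constant reduction c and measure how well root_eval + c predicts
# the child evaluation; c is negative because most children are worse.
cs = np.linspace(-0.8, 0.2, 101)
l2 = [np.mean((root_eval + c - child_eval) ** 2) for c in cs]
print("lowest L2 loss at c = %.2f" % cs[int(np.argmin(l2))])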

Plotting FPU reduction vs. root evaluation gives the following plot:

[Image: fpu_winrate — FPU reduction vs. root evaluation, with least-squares fit]

As the winrate can't go below 0, it makes sense to use an FPU reduction that is proportional to the root evaluation instead of just subtracting a constant. In the plot, the green line is the least-squares fit of FPU reduction vs. winrate (this pull request).
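
The proportional fit itself can be done in closed form; a sketch of that step under the same assumed data layout (sign conventions here may differ from the numbers quoted in this thread):

import numpy as np

data = np.loadtxt("fpu_data.csv", delimiter=",")
root_eval, child_eval = data[:, 0], data[:, 1]

# Least-squares fit of a reduction proportional to the root evaluation,
# child_eval ≈ root_eval - k * root_eval, with a single free parameter k.
k = np.sum(root_eval * (root_eval - child_eval)) / np.sum(root_eval ** 2)
l2 = np.mean(((1.0 - k) * root_eval - child_eval) ** 2)
print("fitted k = %.2f, L2 loss = %.3f" % (k, l2))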

Plotting the FPU reduction vs. policy head output gives:

[Image: fpu_policy — FPU reduction vs. policy head output]

As would be expected, moves with high policy output usually have a value network evaluation close to the root, while low-policy moves are often much worse, although there are exceptions. This plot also helps explain why sqrt(total_visited_policy) works: it lowers the reduction for high-policy moves, which are expanded first and are usually close to the root evaluation. The green lines show the FPU reductions at root evaluations of 0.1, 0.5 and 0.9 using the equation in this pull request.

There are many different functions that could be used to initialize nodes from the policy and root evaluation, but I decided to keep it simple and avoid adding any more tunable variables. The equation in this pull request gives a lower L2 loss than simple subtraction. I didn't try to find a better formula for taking the policy output into account and kept the current sqrt(total_visited_policy) for that.
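
To make the combination concrete, here is a hypothetical sketch of how an unvisited child could be initialized under such a scheme; the function name, the coefficient, and the exact way the terms are combined are illustrative, not the literal code of this pull request:

import math

def fpu_eval(root_eval, total_visited_policy, k=0.6):
    # Reduce the root evaluation by an amount proportional to it, scaled by
    # the square root of the policy mass of already-visited siblings.
    return root_eval - k * root_eval * math.sqrt(total_visited_policy)

# A clearly winning root gets a much larger reduction than a losing one:
print(fpu_eval(0.9, 0.5), fpu_eval(0.1, 0.5))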

The L2 loss of this equation when predicting the child evaluation is 0.033 with the latest network, compared to 0.051 for the current equation. -0.6 is the best reduction using this formula with this network. As was observed when FPU reduction was first introduced, the optimum does depend on the network: the Minigo-303 optimum is at -0.29 and ELF's is at -0.7. The new formula is also more accurate for them, by about the same amount as for the LZ net.

I have been running a match between this pull request and next for a week now on a very weak laptop. The command line parameters are: -g -d -t 1 --noponder -v 800 --timemanage off -r 10 -w d01879964d578b676714251164f7289da023f14c0063b1721e5e3cd2e8d51ae0.gz

Current results:

lz_fpu_est v lz_next (131/400 games)
board size: 19   komi: 7.5
             wins              black         white       avg cpu
lz_fpu_est     71 54.20%       29 43.94%     42 64.62%    500.59
lz_next        60 45.80%       23 35.38%     37 56.06%    508.49
                               52 39.69%     79 60.31%

It's not statistically significant yet, and it looks like completing the 400 games will take about two more weeks. I don't currently have any hope of testing this at higher playouts myself. I hope that someone will help with testing or invent a more accurate initialization scheme using the analysis results.


remdu commented Jun 17, 2018

Very nice analysis!

@sethtroisi

@Ttl I'm working on determining the optimal minigo c_puct and running similar large batches of games. I found the results change dramatically when I vary readouts/playouts/visits from a low first-pass number (200 or 400) to the self-play number (800 for MG). I have a 1080 that isn't highly utilized; I can easily pull a change and run a ringmaster config, or potentially even give you SSH access to that machine if that would help.


Ttl commented Jun 18, 2018

I also expect the number of playouts to affect the results, which is why I was hoping to test with larger playouts as well. A ringmaster tournament with 400 games using the same command line but with 3200 visits would be interesting. If that is too much effort, then 1600 visits would probably also be fine.


amj commented Jun 18, 2018

Seth, want to try and get this running on a cluster this week?


sethtroisi commented Jun 18, 2018

lz_next compiled from commit 91031bf
lz_fpu_est compiled from commit ccfdd17

ringmaster config (I used -j2 to get 100% GPU utilization, but CPU usage wasn't equal, so I went back to -j1):

competition_type = 'playoff'
description = """ Tests for Ttl """
record_games = True
stderr_to_log = True

options = "-g -d -t 1 --noponder --timemanage off -r 10 -w d01879964d578b676714251164f7289da023f14c0063b1721e5e3cd2e8d51ae0"

def player(binary, playouts):
    return Player(binary + " " + options + " -v " + str(playouts))

def matchup(name1, name2, games):
    return Matchup(name1, name2, alternating=True, scorer='players', number_of_games=games)

players = {
    'lz_fpu_est_800' : player("./lz_fpu_est", 800),
    'lz_next_800' :    player("./lz_next", 800),
    'lz_fpu_est_1600' : player("./lz_fpu_est", 1600),
    'lz_next_1600' :    player("./lz_next", 1600),
    'lz_fpu_est_3200' : player("./lz_fpu_est", 3200),
    'lz_next_3200' :    player("./lz_next", 3200),
}

board_size = 19
komi = 7.5

matchups = [
    matchup('lz_fpu_est_800', 'lz_next_800', 400),
    matchup('lz_fpu_est_1600', 'lz_next_1600', 100),
    matchup('lz_fpu_est_3200', 'lz_next_3200', 200),
]

Results

  • I fixed the wrong-branch issue and now CPU time is equal (sample size = 1)
  • I'll update in ~14 hours with the final games
lz_fpu_est_800 v lz_next_800 (400/400 games)
board size: 19   komi: 7.5
                 wins              black          white        avg cpu
lz_fpu_est_800    206 51.50%       95  47.50%     111 55.50%     50.79
lz_next_800       194 48.50%       89  44.50%     105 52.50%     51.53
                                   184 46.00%     216 54.00%

lz_fpu_est_1600 v lz_next_1600 (100/100 games)
board size: 19   komi: 7.5
                  wins              black         white       avg cpu
lz_fpu_est_1600     45 45.00%       21 42.00%     24 48.00%     99.59
lz_next_1600        55 55.00%       26 52.00%     29 58.00%    100.76
                                    47 47.00%     53 53.00%

lz_fpu_est_3200 v lz_next_3200 (200/200 games)
board size: 19   komi: 7.5
                  wins              black          white        avg cpu
lz_fpu_est_3200    104 52.00%       52  52.00%     52  52.00%    194.81
lz_next_3200        96 48.00%       48  48.00%     48  48.00%    197.34
                                    100 50.00%     100 50.00%


Ttl commented Jun 18, 2018

You are using the wrong branch. That one expands every child node and is meant for generating the data to fit the FPU reduction. Use the branch from this pull request instead.

Also, I might have rebased the fpu_estimation branch for generating the data incorrectly. I'll fix it later today; the branch in this pull request should be correct. (Update: everything is correct after all.)

@sethtroisi

Glad I saw this :) I fixed it now and restarted.

@sethtroisi

I completed the set of games; if you want me to run some more I certainly can (ping me, or ping AMJ and have him ping me on Discord).


remdu commented Jun 20, 2018

Am I understanding the third graph right that at 0% winrate the FPU reduction should be about 0.0, while at 100% winrate it should be around 0.5?


Ttl commented Jun 20, 2018

Thanks for running the tests. I continued my test too and it also seems to converge around 50%:

lz_fpu_est v lz_next (188/400 games)
board size: 19   komi: 7.5
             wins              black         white        avg cpu
lz_fpu_est     97 51.60%       43 45.74%     54  57.45%    496.65
lz_next        91 48.40%       40 42.55%     51  54.26%    504.46
                               83 44.15%     105 55.85%

It seems mostly equal to the constant FPU reduction. Combining all the tests at different playouts gives 452/888 wins = 50.9% winrate. It does use about 1.5% less CPU, which I guess means the results would be slightly better with equal time, but I don't think it would really matter.
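
As a quick sanity check on that combined score (assuming independent games), a normal-approximation test agrees with the "not statistically significant" conclusion:

import math

wins, games = 452, 888
p = wins / games                   # 0.509
se = math.sqrt(0.25 / games)       # standard error under a 50% null hypothesis
print("winrate %.1f%%, %.1f standard errors from even" % (100 * p, (p - 0.5) / se))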

@eddh You are reading the plot correctly. Near 50% it gives very similar results to the current method; it's only significantly different at extreme winrates. In practice, the difference is mostly that it expands more nodes for the losing side and fewer for the winning side.

I'll close this for now since it's not better by a statistically significant margin.

@Ttl Ttl closed this Jun 20, 2018

roy7 commented Jun 20, 2018

Before LZ gets to too extreme a win rate it would resign, wouldn't it? I'm not sure better handling of extreme win rates would do much in a head-to-head comparison. We might need to test it in a handicap situation against an older, weaker LZ or something, and see if fpu_est has a higher win rate than next against the same opponent with the same handicap.


Ttl commented Jun 20, 2018

There's actually quite a large difference in the number of expanded nodes compared to next when the winrate is low. For example, in https://online-go.com/game/13035133 at move 70 using 1000 visits:

Next
 H11 ->     420 (V:  6.65%) (N: 18.39%) PV: H11 C10 F11 A9 F10 R15 D12 E12 E11 D11 C12 H15 A15 D13 B12
 C12 ->     187 (V:  7.59%) (N:  2.55%) PV: C12 C11 H11 F11 A15 D13 B12 C13 B13 D12
 A15 ->     185 (V:  5.79%) (N: 14.33%) PV: A15 C13 D13 D12 C12 B12 B13 C11 H11 F11 A12 B11
 B12 ->      93 (V:  6.89%) (N:  3.21%) PV: B12 C10 A15 C12 B13 B11 A14 A12 C15
 E10 ->      79 (V:  5.77%) (N:  6.32%) PV: E10 F10 C10 C11 D11 A9 D10 D12 H11 E11 C3 D3 C4 C2 B2 D2 B1
 B19 ->      27 (V:  4.87%) (N:  3.10%) PV: B19 A19 A15 C13 B13 B12 C12 C11
  C3 ->       8 (V:  4.83%) (N:  0.93%) PV: C3 C4 D3 E4 H11 F11 B4
7.1 average depth, 18 max depth
738 non leaf nodes, 1.35 average children
1000 visits, 290742 nodes, 999 playouts, 20 n/s
Proportional FPU
 H11 ->     410 (V:  6.68%) (N: 18.39%) PV: H11 C10 F11 A9 F10 R15 D12 E12 E11 D11 D13 H15
 A15 ->     164 (V:  5.60%) (N: 14.33%) PV: A15 C13 H11 C10 F11 A9 F10 R15
 E10 ->     159 (V:  6.80%) (N:  6.32%) PV: E10 F10 C10 C11 A9 A10 D11 A8 D10 D12 E11 F11 B11
 B12 ->     105 (V:  7.03%) (N:  3.21%) PV: B12 C10 A15 C12 A14 C13 B13
 B19 ->      27 (V:  4.97%) (N:  3.10%) PV: B19 A19 A15 C13 B13 B12
 C12 ->      25 (V:  5.20%) (N:  2.55%) PV: C12 C10 A15 B12 D12 D13 C13
  C3 ->      11 (V:  5.76%) (N:  0.93%) PV: C3 C4 D3 E4 E10 F10 C10
  R4 ->       6 (V:  5.20%) (N:  0.63%) PV: R4 R3 H11 C10
 A19 ->       6 (V:  5.15%) (N:  0.73%) PV: A19 A15 B19 H11
 C10 ->       6 (V:  4.91%) (N:  0.79%) PV: C10 C11 A9 A10 E10
 R14 ->       5 (V:  5.48%) (N:  0.45%) PV: R14 H11 A15 C13
  Q5 ->       5 (V:  4.71%) (N:  0.61%) PV: Q5 Q3 O4 P3
  O4 ->       5 (V:  4.66%) (N:  0.74%) PV: O4 P3 N4 N3
  P3 ->       4 (V:  5.45%) (N:  0.38%) PV: P3 O4 P2
 D12 ->       4 (V:  4.54%) (N:  0.57%) PV: D12 C10 H11 F11
  A9 ->       4 (V:  4.14%) (N:  0.58%) PV: A9 C10 H11 F11
 A12 ->       3 (V:  5.96%) (N:  0.23%) PV: A12 A15 H11
 G19 ->       3 (V:  5.79%) (N:  0.27%) PV: G19 A15 H11
 B11 ->       3 (V:  5.48%) (N:  0.36%) PV: B11 C10 A15
  C4 ->       3 (V:  5.40%) (N:  0.29%) PV: C4 C3 D3
 A13 ->       3 (V:  5.16%) (N:  0.23%) PV: A13 A15 H11
  Q3 ->       3 (V:  4.83%) (N:  0.38%) PV: Q3 P3 R4
 F11 ->       3 (V:  4.68%) (N:  0.42%) PV: F11 F10 H11
 F10 ->       3 (V:  4.51%) (N:  0.38%) PV: F10 E10 H11
  S5 ->       2 (V:  6.00%) (N:  0.21%) PV: S5 H11
 C11 ->       2 (V:  5.72%) (N:  0.24%) PV: C11 C10
  A6 ->       2 (V:  5.29%) (N:  0.21%) PV: A6 H11
  B7 ->       2 (V:  4.93%) (N:  0.21%) PV: B7 A9
  D3 ->       2 (V:  4.89%) (N:  0.27%) PV: D3 C3
  N5 ->       2 (V:  4.60%) (N:  0.21%) PV: N5 H11
 E11 ->       2 (V:  4.58%) (N:  0.28%) PV: E11 C10
 S14 ->       2 (V:  4.51%) (N:  0.36%) PV: S14 H11
  E5 ->       2 (V:  4.17%) (N:  0.30%) PV: E5 D5
 A10 ->       2 (V:  3.88%) (N:  0.22%) PV: A10 C10
  L6 ->       2 (V:  3.19%) (N:  0.22%) PV: L6 H11
 D11 ->       1 (V:  4.76%) (N:  0.21%) PV: D11 
  B6 ->       1 (V:  4.59%) (N:  0.24%) PV: B6 
  D5 ->       1 (V:  4.21%) (N:  0.25%) PV: D5 
  N6 ->       1 (V:  4.03%) (N:  0.21%) PV: N6 
  K6 ->       1 (V:  3.63%) (N:  0.22%) PV: K6 
  N3 ->       1 (V:  3.59%) (N:  0.21%) PV: N3 
  A8 ->       1 (V:  3.35%) (N:  0.27%) PV: A8 
6.6 average depth, 14 max depth
646 non leaf nodes, 1.55 average children
1000 visits, 291150 nodes, 999 playouts, 19 n/s

I can't say, though, whether expanding more low-prior nodes is advantageous or not. At least in this position it doesn't really make any difference.

Old LZ weights have some obvious weak spots, such as missing ataris on large groups and not understanding how ladders work. In my opinion the converted minigo weights play much more sanely while still being beatable with handicap (#1538 (comment)).
