Move selection in deep endgames is broken #54

gcp · 2017-11-15T09:22:06Z

This is a game between a random network and a network trained from 9k games where the random network won. The position of interest is around move 430 (loadsgf cap.sgf 430).

We can see that the 9k version was easily winning here, but it played in the 2nd last liberty of its own group and lost. The search should see the capture easily, and the 9k version should have at least some understanding that losing so many stones is bad, so it is weird that it makes this mistake. Running the search on the same position again reveals what is happening:

Playouts: 54749, Win: 71.24%, PV: K17 G6 P11 R16 

 K17 ->      16 (V: 70.62%) (N: 11.2%) PV: K17 G6 P11 R16 
 C18 ->      13 (V: 70.23%) (N:  8.4%) PV: C18 G6 H3 R16 
  H3 ->      12 (V: 66.63%) (N:  8.9%) PV: H3 N1 P11 N2 
  G3 ->      12 (V: 52.24%) (N: 12.6%) PV: G3 R16 Q10 H3 
 Q10 ->      11 (V: 75.94%) (N:  7.0%) PV: Q10 R16 H3 G3 
 B17 ->      11 (V: 69.14%) (N:  9.3%) PV: B17 G3 P11 R16 
 P11 ->      10 (V: 65.98%) (N:  8.3%) PV: P11 R16 G3 H3 
pass ->       8 (V: 88.23%) (N:  3.7%) PV: pass D1 G3 
  T1 ->       4 (V: 98.38%) (N:  0.8%) PV: T1 P6 S1 
97 visits, 937 nodes, 57340 playouts, 1147 n/s

We can see that the value head properly understands that passing, or capturing the black group in the lower right are very good. But the move decision is made on the number of simulations, and this is very low. In fact, we have very few visits for all moves (instead of the expected 1600 total). Yet the playout number is very high. What's going on?

I think this is the code pointed out in issue #38 backfiring. Every time there are two passes, there will be no expansion past that and the number of visits for this path won't be updated on a playout. So all paths that quickly lead to two passes in the optimal case won't get much if any visits, meaning that even if the value network understands the move, the search can't choose it.

The text was updated successfully, but these errors were encountered:

gcp · 2017-11-15T10:50:05Z

  T1 ->   39036 (V: 83.18%) (N:  0.5%) PV: T1 P5 N4 R1 O6 R5 N3 
 K17 ->     726 (V: 64.74%) (N: 12.9%) PV: K17 A3 H3 D1 Q10 G3 C18 
 C18 ->     619 (V: 66.16%) (N:  9.2%) PV: C18 G3 P11 H3 B17 A3 pass R16 
 B17 ->     572 (V: 66.17%) (N:  8.4%) PV: B17 D1 G3 H3 C18 G6 P11 
  H3 ->     556 (V: 63.31%) (N: 10.6%) PV: H3 G6 Q10 R16 B17 G7 K17 
  G3 ->     509 (V: 63.71%) (N:  9.6%) PV: G3 A3 B17 H3 K17 N1 Q10 
pass ->     372 (V: 71.83%) (N:  4.0%) PV: pass D1 K17 G3 C18 R16 
 Q10 ->     265 (V: 47.67%) (N:  9.0%) PV: Q10 P11 T6 R9 T10 
 P11 ->     248 (V: 51.19%) (N:  7.6%) PV: P11 Q10 P10 T10 R11 
42903 visits, 1632236 nodes, 44177 playouts, 883 n/s

This looks a lot better.

The search relies on the best node having a high visit count in order to pick it out. This iteracts badly with us not wanting to expand a node that is terminal, and not updating the results in that case (thus not recording a visit). Make the search detect when it has reached a terminal node, count who is the winner on the board, and initialize the SearchResult with that information. This makes sure that winning positions can get a good visit count. Fixes issues #38 and #54.

roy7 · 2017-11-15T14:49:13Z

Should move selection be based on win % and not # of playouts? I realize the winning moves will get more playouts, so over time the higher win % will have the highest # of playouts if there's no constraints. But when we cap playouts at a specific number, once half the playouts are on move #1 then no other move can be chosen, since a better move #2 won't have more playouts by the time the playout cap is hit.

(If confidence intervals are available on the win %s, perhaps choose the move with the highest lower CI bound not just the predicted win % itself, so a move with high win % because we've barely explored it (found right before playout cap kicked in) won't be chosen because confidence is too low.)

gcp · 2017-11-15T14:52:27Z

It must be playouts, selecting on win% makes the program far, far weaker because it will play blunders than win in 1 ply but lose in 2 and have only a few nodes expanded.

The whole point of UCT is that it picks based on the Upper Confidence Bound, so what you're asking for is exactly what is already implemented.

(It's true that in some cases you can take an early exit because nothing can overtake the top move, but the system learns on the probability distribution of the search outcome, so it must still be computed)

OmnipotentEntity · 2017-11-15T21:52:00Z

This is a recently played game with the latest network:

http://eidogo.com/#2YKh7bmNO

Does this still represent a problem with passing? And if so, is it from the engine, or from the network that was trained with the broken engine?

EDIT: Also is scoring correct? W+318.5 seems wrong.

gcp · 2017-11-15T21:59:31Z

I'm not sure what you're asking. The engines don't know anything about passing whatsoever. They simply haven't learned about this yet.

In the original example above, the engine that has a vague idea about counting was winning easily, but suicided its group. The vague count should be enough to see that killing a large amount of your own stones is bad, and capturing a large opponent group is good. And indeed, with the bug fixed, this is what it does, and you can see that the value network likes capturing the opponent (83% winrate) and dislikes killing itself (47% and 51%).

OmnipotentEntity · 2017-11-15T22:02:03Z

Sorry, I should have been more clear. White continued to play even though black passed several times starting around 623, and white was winning decisively.

I was asking if white's lack of passing was a problem.

gcp · 2017-11-15T22:10:17Z

No, as long as it is not killing itself it's fine. As said, they simply don't know about early passing yet, and I'm not sure that in computer vs computer play under Chinese (really Tromp Taylor) rules they ever will. The only time it's right to pass is when you are otherwise forced to suicide yourself. Because they can't count very well yet, you now sometimes have games that they pass out way too early.

I disabled some leftover heuristics for passing just before fixing this bug in commit 7f4d8d2.

So the slightly older version would have reliably passed out that game because it knew it could do an exact count. But now, it has to see that passing is at least as good a move as any.

The search relies on the best node having a high visit count in order to pick it out. This iteracts badly with us not wanting to expand a node that is terminal, and not updating the results in that case (thus not recording a visit). Make the search detect when it has reached a terminal node, count who is the winner on the board, and initialize the SearchResult with that information. This makes sure that winning positions can get a good visit count. Fixes issues leela-zero#38 and leela-zero#54.

This was referenced Nov 15, 2017

node status not updated in play_simulation for two consecutive passes #38

Closed

Question about distributed learning #1

Open

gcp closed this as completed Nov 15, 2017

earthengine mentioned this issue Dec 8, 2017

Output in ver 0.7 is unreadable #281

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Move selection in deep endgames is broken #54

Move selection in deep endgames is broken #54

gcp commented Nov 15, 2017 •

edited

Loading

gcp commented Nov 15, 2017 •

edited

Loading

roy7 commented Nov 15, 2017

gcp commented Nov 15, 2017 •

edited

Loading

OmnipotentEntity commented Nov 15, 2017 •

edited

Loading

gcp commented Nov 15, 2017 •

edited

Loading

OmnipotentEntity commented Nov 15, 2017

gcp commented Nov 15, 2017

Move selection in deep endgames is broken #54

Move selection in deep endgames is broken #54

Comments

gcp commented Nov 15, 2017 • edited Loading

gcp commented Nov 15, 2017 • edited Loading

roy7 commented Nov 15, 2017

gcp commented Nov 15, 2017 • edited Loading

OmnipotentEntity commented Nov 15, 2017 • edited Loading

gcp commented Nov 15, 2017 • edited Loading

OmnipotentEntity commented Nov 15, 2017

gcp commented Nov 15, 2017

gcp commented Nov 15, 2017 •

edited

Loading

gcp commented Nov 15, 2017 •

edited

Loading

gcp commented Nov 15, 2017 •

edited

Loading

OmnipotentEntity commented Nov 15, 2017 •

edited

Loading

gcp commented Nov 15, 2017 •

edited

Loading