
When will resignation be enabled? #135

Closed
isty2e opened this issue Nov 22, 2017 · 34 comments

@isty2e

isty2e commented Nov 22, 2017

Apparently the current LZ doesn't have any good idea about counting, which has been the reason for disabling resignation in self-play games. A few questions regarding this:

  1. At what point will it make sense to allow it to resign? LZ will probably remain weak for a long time, which also suggests that its winrate estimation will remain not that trustworthy. Even if it learns how to capture a stone, its winrate prediction will not improve that much, I believe.
  2. Will the AGZ-like approach, namely disallowing resignation in 10% of the games, be suitable at this early stage?
  3. With resignation allowed, individual games will generally be shorter, and there will be fewer board-filling games. Will that somehow affect the trained network?
@gcp
Member

gcp commented Nov 22, 2017

AGZ pulled statistics from the games first to set a suitable resignation threshold. It's possible to obtain the same statistics during the validation tests, and hence see whether it makes sense to try resigning, but some tooling needs to be written for that first.

@gcp gcp added the question label Nov 22, 2017
@isty2e
Author

isty2e commented Nov 22, 2017

If it makes sense, we probably should allow resignation as soon as possible. LZ is at an early stage, and refreshing games faster with updated networks would be a virtue. It would also presumably be trained less on nearly fully filled boards, which might be desirable too.

@gcp
Member

gcp commented Nov 22, 2017

This assumes the bot is estimating reliably who is winning. That's far from obvious. And if it's not, it needs to see the game played out and realize that it's wrong.

@isty2e
Author

isty2e commented Nov 22, 2017

Yeah, so that's why setting a proper resignation threshold for each network is important, if I understand correctly. Given the self-play game data, how long would it take to estimate the resignation threshold for a trained NN?

@gcp
Member

gcp commented Nov 22, 2017

You can't, see above. It needs tooling during the validation test.

@isty2e
Author

isty2e commented Nov 22, 2017

Oh well... I got it.

@Ttl
Member

Ttl commented Nov 26, 2017

I wrote some Python code to analyze the resign rate from the completed self-play games played with the newest network. I loaded the completed games into leelaz, ran it with one playout, and recorded the NN eval value for each position.

[Plot: min_winner_nn_eval, the winner's minimum NN eval per game]

The above plot shows the minimum NN eval value for the winner during each game. If the resignation threshold is set below the winner's minimum eval, that game would not have been resigned incorrectly.

Some statistics I calculated:

Dataset size 145 games
Minimum observed evaluation for winner 0.00377
Minimum evaluation in game for winner mean=0.20014442069, std=0.0866228330411
Minimum evaluation in game for loser mean=0.0271061655172, std=0.0644117728729
95.0% confidence resignation rate 0.0576625395852
Incorrect resignations in dataset with the suggested resign rate: 7.59%
Average game length without resignations: 412, after resignations: 299
Average game length reduction: 27.46%
89.66% games resigned
98.0% confidence resignation rate 0.0222428716957
Incorrect resignations in dataset with the suggested resign rate: 4.14%
Average game length without resignations: 412, after resignations: 335
Average game length reduction: 18.59%
78.62% games resigned

Enabling resignations even with a very low threshold would speed up the self-play games by about 20%. The number of incorrect resignations wouldn't be very high with the current network, and it seems the network can already count with good enough accuracy.

If the clients uploaded the winner's minimum score from the self-play games, it could be used to set the correct resignation threshold for each network on the server quickly and accurately.

I uploaded the scripts I wrote to: https://github.com/Ttl/leela-zero/tree/resign_analysis/resign_analysis

EDIT: It seems that currently the NN eval corresponds very closely to the actual probability of winning. It makes sense that it does, as that's what it is trained for. Even just hard-coding the resignation threshold to, for example, 0.01 would be helpful.
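
For anyone curious, turning the per-game minimum winner evals into a threshold is only a few lines. A minimal sketch (with assumed inputs, not the actual script linked below):

# Sketch: pick a resignation threshold from the winners' per-game minimum evals.
import numpy as np

def resign_threshold(winner_min_evals, confidence=0.95):
    # Choose a threshold that `confidence` of the observed winners never
    # dropped below, i.e. they would not have resigned incorrectly.
    evals = np.asarray(winner_min_evals, dtype=float)
    return float(np.quantile(evals, 1.0 - confidence))

# Hypothetical usage:
# t = resign_threshold(mins, confidence=0.95)
# incorrect = (np.asarray(mins) < t).mean()  # fraction of winners below the threshold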

@isty2e
Author

isty2e commented Nov 26, 2017

A nice experiment! The number of false positives would hopefully decrease with more playouts.

By the way, how long did this analysis take? Could it be done at each training step without a significant increase in computational cost?

@Ttl
Member

Ttl commented Nov 26, 2017

It takes a few minutes to analyze the already played games. False positives currently seem to be caused by the network not realizing that it needs two eyes to live. In some games I looked at, the winner's evaluation was very low because it didn't realize that one of the opponent's groups was actually dead. After the group is actually put in atari, the evaluation shoots up.

Note that during the analysis there are no playouts; only the neural network output is used. Evaluation should be much better when combined with the MCTS.

I'm not sure the resignation threshold can be determined from the validation games, since one of the networks could be much stronger. In my opinion it should be determined from the self-play games, as in the AGZ paper. It wouldn't slow down the self-play games if the evaluations were recorded, but that needs support to be added in the client and server code.

@isty2e
Author

isty2e commented Nov 26, 2017

Maybe it can be done without such additional work: at least 10% of games, and in practice more, will be played to completion. That means we'll have at least 50k fully completed games, which I suppose will be sufficient for such an analysis. Of course we lose the playouts, but at least it can't be dangerous, I believe.

@jkiliani

If autogtp automatically recorded the value head output after every move and sent this file to the server along with the .sgf and training data (or if the value head output were part of the training data), it would be a simple matter of statistically evaluating the winner's minimum NN evaluation to determine a suitable resignation threshold. It would add only a little to the data volume collected.
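
As a sketch of what that could look like (the per-move file format below is made up for illustration; autogtp does not write such a file today), the server-side step reduces to finding the winner's minimum eval per game:

# Sketch: hypothetical per-move eval record, one line per move:
#   "<move_no> <side B|W> <nn_eval_for_side_to_move>"
def winner_min_eval(lines, winner):
    # Minimum value-head output the eventual winner saw during the game.
    lowest = 1.0
    for line in lines:
        _, side, val = line.split()
        if side == winner:
            lowest = min(lowest, float(val))
    return lowest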

@isty2e
Author

isty2e commented Nov 26, 2017

Sure, it would be ideal if storage is not a problem.

@gcp
Member

gcp commented Nov 26, 2017

Thanks for the analysis. It seems resigning at 3% would be reasonably safe, and if we disable resignation in 25% of the games that should give the net enough training data for the cases where it resigned wrongly. Both parameters seem robust enough to me that I'm willing to enable this.

That would give almost 20% speedup.
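
For illustration, a minimal sketch of that policy (the names below are mine, not the actual autogtp/leelaz code):

# Sketch: resign below a 3% winrate, but play a random 25% of games to the
# end so wrong resignations can still be corrected by the training data.
import random

RESIGN_THRESHOLD = 0.03    # 3% winrate
NO_RESIGN_FRACTION = 0.25  # fraction of games always played out

def new_game():
    return {"allow_resign": random.random() >= NO_RESIGN_FRACTION}

def should_resign(game, winrate):
    return game["allow_resign"] and winrate < RESIGN_THRESHOLD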

@Dorus

Dorus commented Nov 26, 2017

Are we sure about this? In most self-play games the winner is correct even with premature passes, but I made some sneaky examples, for instance with the bulky five, and it doesn't recognize that yet. Did the analysis above also check whether the correct winner was selected, for instance by giving a stronger bot a chance to play the game out?

You don't even need the bulky five: I noticed that if a group is large enough and has a large eye with an opponent stone in it, even if it's only a three-point eye, the value network doesn't yet realize it can be killed. For now it seems to depend on the other player reducing the liberties of that eye before it captures. In most self-play games this does happen eventually, but as the value network gets smarter it might stop filling its own inside liberties, and then incorrect resignations might happen.

(I'm just speculating a bit here; the 25% of games without resignation might be enough to teach it these concepts anyway.)

Edit: For example on this board:

(;PL[B]AB[aa][ab][ac][ad][ae][af][ag][ah][ai][aj][ak][al][am][an][ao][ap][aq][ar][as][ba][bb][bc][bd][be][bf][bg][bh][bi][bj][bk][bl][bm][bn][bo][bp][bq][br][bs][ca][cb][cd][ce][cf][cg][ci][cj][ck][cl][cm][cn][co][cp][cq][cr][cs][da][db][dc][dd][de][df][dg][dh][di][dj][dk][dm][dn][dr][ds][ea][eb][ec][ed][ee][ef][eg][eh][ei][ej][ek][el][em][en][ep][eq][er][es][fa][fb][fc][fd][fe][fg][fh][fj][fk][fl][fm][fn][fp][fq][fr][fs][ga][gb][gc][gd][ge][gf][gg][gh][gi][gj][gk][gl][gm][gn][gp][gq][gr][gs][ha][hb][hc][hd][hf][hg][hh][hi][hj][hk][hl][hm][hp][hq][hr][hs][ia][ib][ic][id][ig][ih][ii][ij][ik][il][io][ip][iq][ir][is][ja][jb][jc][jd][jh][ji][jj][jk][jl][jn][jo][jp][jq][jr][js][kj][lj][mj][nj][og][oj][pj][qj][rj][sj]AW[do][dp][dq][eo][fo][go][he][hn][ho][ie][if][im][in][je][jf][jg][jm][ka][kb][kc][kd][ke][kf][kg][kh][ki][kk][kl][km][kn][ko][kp][kq][kr][ks][la][lb][lc][ld][le][lf][lg][lh][li][lk][ll][lm][ln][lo][lp][lq][lr][ls][ma][mb][mc][md][me][mf][mg][mh][mi][mk][mm][mo][mr][ms][na][nb][nc][nd][ne][nf][ni][nk][nl][nn][no][np][nr][ns][oa][ob][oc][od][oe][of][oi][ok][om][on][oo][oq][or][os][pa][pb][pc][pd][pe][pf][ph][pi][pk][pl][pn][pp][pr][ps][qa][qb][qc][qd][qe][qf][qg][qh][qi][qk][qm][qn][qo][qq][qr][qs][ra][rb][rc][rd][re][rf][rg][rh][ri][rk][rl][rn][rp][rr][rs][sa][sb][sc][sd][se][sf][sg][sh][si][sk][sl][sm][sn][so][sp][sq][sr][ss]DT[2017-11-26]PB[Leela 295k]PW[Leela 295k]AP[Sabaki:0.31.5]CA[UTF-8]KM[7.5]
;B[mq]
;W[]
)

Both players seem happy to pass, even though white would then win while black could kill the bulky five.

Edit: I ran some more simulations; now they don't pass, but black messed up the kill anyway with B Q13, W P12, B O12, W O13. Anyway, the winrate doesn't seem to drop below 3%, so this might be fine.

@betterworld
Contributor

It might be a good idea to embed a comment into the SGF of the 25% non-resignable games whenever the winrate drops below 3%. Then, if we aren't sure that it's safe, we can later grep for those comments and (hopefully) find out that most of those games were really lost.
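
A rough sketch of that later check (the embedded comment format below is hypothetical, just to show the idea):

# Sketch: scan the no-resign SGFs for a low-winrate marker and check whether
# the flagged player actually went on to win. Assumes a comment like
# C[low_winrate B 0.021] was written when the winrate first dropped below 3%.
import glob
import re

flagged = flagged_but_won = 0
for path in glob.glob("noresign_games/*.sgf"):
    sgf = open(path).read()
    low = re.search(r"C\[low_winrate ([BW]) ([0-9.]+)\]", sgf)
    result = re.search(r"RE\[([BW])\+", sgf)
    if low and result:
        flagged += 1
        if low.group(1) == result.group(1):  # would have resigned a won game
            flagged_but_won += 1
print(flagged, flagged_but_won)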

@Dorus

Dorus commented Nov 26, 2017

Another board: Here winrate does drop below 3% because white has a larger margin on the dead group.

http://eidogo.com/#CH7ruGF2

(;FF[4]CA[UTF-8]AP[GoGui:1.4.9]
KM[7.5]DT[2017-11-26]
AB[as][ar][aq][ap][ao][an][am][al][ak][aj][ai][ah][ag][af][ae][ad][ac][ab][aa][bs][br][bq][bp][bo][bn][bm][bl][bk][bj][bi][bh][bg][bf][be][bd][bc][bb][ba][cs][cr][cq][cp][co][cn][cm][cl][ck][cj][ci][cg][cf][ce][cd][cb][ca][ds][dr][dn][dm][dk][dj][di][dh][dg][df][de][dd][dc][db][da][es][er][eq][ep][en][em][el][ek][ej][ei][eh][eg][ef][ee][ed][ec][eb][ea][fs][fr][fq][fp][fn][fm][fl][fk][fj][fh][fg][fe][fd][fc][fb][fa][gs][gr][gq][gp][gn][gm][gl][gk][gj][gi][gh][gg][gf][hs][hr][hq][hp][hm][hl][hk][hj][hi][hh][hg][hf][is][ir][iq][ip][io][il][ik][ij][ii][ih][ig][js][jr][jq][jp][jo][jn][jl][jk][jj][ji][jh][kj][lj][mj][nj][oj][og][pj][qj][rj][sj]
AW[dq][dp][do][eo][fo][go][ge][gd][gc][gb][ga][ho][hn][he][hd][hc][hb][ha][in][im][if][ie][id][ic][ib][ia][jm][jg][jf][je][jd][jc][jb][ja][ks][kr][kq][kp][ko][kn][km][kl][kk][ki][kh][kg][kf][ke][kd][kc][kb][ka][ls][lr][lq][lp][lo][ln][lm][ll][lk][li][lh][lg][lf][le][ld][lc][lb][la][ms][mr][mo][mm][mk][mi][mh][mg][mf][me][md][mc][mb][ma][ns][nr][np][no][nn][nl][nk][ni][nf][ne][nd][nc][nb][na][os][or][oq][oo][on][om][ok][oi][of][oe][od][oc][ob][oa][ps][pr][pp][pn][pl][pk][pi][ph][pf][pe][pd][pc][pb][pa][qs][qr][qq][qo][qn][qm][qk][qi][qh][qg][qf][qe][qd][qc][qb][qa][rs][rr][rp][rn][rl][rk][ri][rh][rg][rf][re][rd][rc][rb][ra][ss][sr][sq][sp][so][sn][sm][sl][sk][si][sh][sg][sf][se][sd][sc][sb][sa]
PL[B])

@evanroberts85

If shapes like the dead 5 are outside the range of the 1000 simulations, then the neural network is probably not ready to absorb that level of information anyway, so I do not think any harm is done by an early resignation. Leelaz is still learning the basics and does not understand liberties yet.

@evanroberts85

In http://eidogo.com/#CH7ruGF2 the white group is only cut off by a thin line of black stones. I wonder if Leela might think the top group is connected to the bottom, due to the density of white stones on the right side of the board.

@Dorus

Dorus commented Nov 26, 2017

I tested it with a three-stone cut and it gave the same result.

The heatmap is simply unable to detect that the stones are dead, even if I reduce the group to just a one-point eye. It seems only the MCTS search can declare the stones dead, by actually capturing them. The network appears to just assume large groups are alive; if I try the same with a small group, it does consider it dead, even to the point where it thinks it's acceptable to pass without cleaning up said stones.

@evanroberts85

Leela must be taking the proverb “Big Dragons Never Die” literally 😎

@isty2e
Author

isty2e commented Nov 27, 2017

Do such counterexamples matter if the majority of games are predicted correctly? I believe we don't have to care about such specific cases, nor force LZ to do something a more skilled player would have done. Such skill issues can ultimately be handled by the RL process.

@Dorus

Dorus commented Nov 27, 2017

We're not talking about putting in specific cases, but rather about how likely it is that adding resignation will make it harder for the network to learn certain important concepts. Making games 20% shorter without losing quality is great. Making them 20% shorter by tossing important data out of 75% of the games would be terrible.

That's why I asked for an analysis of the current games with a stronger bot, to see how often it would resign correctly. If we just rely on the actual game result, we will never spot bad results caused by early passes and resignations.

@isty2e
Author

isty2e commented Nov 27, 2017

Again, I don't see any reason to do that. The "certain important concepts" you are talking about are a human perspective; the NN can learn other things you might not consider essential and still gain skill. And I believe the results are to be predicted by the current nets, not by any stronger bots. That's how reinforcement learning works, I believe.

@gcp
Member

gcp commented Nov 27, 2017

The only concern is whether resigning would change the result between the current players, not what is theoretically optimal. If the bot resigns a game that a 9d bot could win, that doesn't matter as long as it would have lost against itself anyway, so nothing was changed by the resignation.

@killerducky
Contributor

killerducky commented Nov 28, 2017

The current code also requires the first child to have been visited more than 100 times. With only 1000 playouts a majority of the positions don't reach this limit. Maybe the 100 visits should be lowered because that was probably tuned for old-style playouts.

Also Ttl's analysis was done on just a single pass of the NN, and the current resign code is based on the results of the 1000 playout UCT search. I have modified my local leelaz to save the NN's winrate, the UCT winrate, and the number of visits to the first child. By tomorrow I will have enough results to do more analysis.

@Ttl tomorrow I can send you the data I collect if you want to analyze it. Also I'll work on getting my code changes into a branch on github.

ETA: I just noticed Ttl included a link to the source code used, I'll look at it tomorrow.

@tsuchiLo

The shortest game:

Best network hash: 92c658d7325fe38f0c8adbbb1444ed17afd891b9f208003c272547a7bcb87909
Required client version: 2 (OK)
Already downloaded network.
Engine has started.
Infinite thinking time set.
1 (P5) 2 (pass) 3 (pass) Game has ended.
Score: B+353.5
Winner: black
Writing 3800028acf3d426ab112d832f23c41da.sgf
Dumping 3800028acf3d426ab112d832f23c41da.txt
Stopping engine.

@isty2e
Author

isty2e commented Nov 28, 2017

That is not the result of resignation, but rather the consequence of a double pass. I don't see why you are showing this short game here.

@gcp
Member

gcp commented Nov 28, 2017

Maybe the 100 visits should be lowered because that was probably tuned for old-style playouts.

Correct.

Looking forward to seeing the 1000 playout data.

Indeed it would have been great to put the winrate in the training data, but I didn't think of this ahead of time and I don't want to change the data formats during the run.

@killerducky
Contributor

killerducky commented Nov 29, 2017

ETA: There is a problem when I changed to using root.get_first_child()->get_eval(). See my new post below.

I have some results but first a caveat: I used root.get_eval() to collect winrates, but I noticed I should use root.get_first_child()->get_eval() to match what the resign code does. I'll rerun this tonight.

So far, using this close-but-not-correct method, it looks like the threshold should be set higher when you use 1000 playouts. Presumably the net winrates are noisier, requiring a lower threshold than the more stable UCT winrates.

"uct resigns" is based on root.get_eval(). "net resigns" is based on result.second where result = Network::get_scored_moves(&state, Network::Ensemble::DIRECT, 0).

Dataset size 183 games
Resign rate: 0.50
Incorrect uct resigns = 182/183 (99.45)
Incorrect net resigns = 183/183 (100.00)

Resign rate: 0.20
Incorrect uct resigns = 18/183 (9.84)
Incorrect net resigns = 29/183 (15.85)

Resign rate: 0.15
Incorrect uct resigns = 11/183 (6.01)
Incorrect net resigns = 23/183 (12.57)

Resign rate: 0.10
Incorrect uct resigns = 6/183 (3.28)
Incorrect net resigns = 18/183 (9.84)

Resign rate: 0.05
Incorrect uct resigns = 2/183 (1.09)
Incorrect net resigns = 9/183 (4.92)

ETA: I pushed my code to https://github.com/killerducky/leela-zero
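
For reference, the kind of sweep that produces the table above could look roughly like this (a sketch with an assumed data layout, not the actual analysis code in my branch):

# Sketch: sweep candidate resignation thresholds over recorded games.
# Assumes `games` is a list of dicts like
#   {"winner": "B", "moves": [("B", winrate), ("W", winrate), ...]}
# where winrate is the eval from the side to move's point of view.
def sweep(games, thresholds=(0.20, 0.15, 0.10, 0.05)):
    n = len(games)
    for t in thresholds:
        incorrect = 0
        total_len = resigned_len = 0
        for g in games:
            length = len(g["moves"])
            total_len += length
            for i, (side, wr) in enumerate(g["moves"]):
                if wr < t:                      # this side would resign here
                    if side == g["winner"]:     # ...yet went on to win the game
                        incorrect += 1
                    length = i + 1
                    break
            resigned_len += length
        print("Resign rate: %.2f  incorrect resigns = %d/%d  avg length %d -> %d"
              % (t, incorrect, n, total_len / n, resigned_len / n))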

@sethtroisi
Member

sethtroisi commented Nov 29, 2017

@killerducky If you need extra games, now or in the future, I'm willing to run most arbitrary code (I can generate ~20 games / hour) and mail you results.

And it goes without saying, I'm excited by your analysis.

@killerducky
Contributor

killerducky commented Nov 29, 2017

I ran just a few games with root.get_first_child()->get_eval(); something looked wrong, still analyzing...
OK, false alarm, things seem to be working.

@sethtroisi you can pull my changes from https://github.com/killerducky/leela-zero/tree/aolsen
Make sure to run with "autogtp -k savedir" so it saves the extra data files.

This code isn't really in any shape to be pulled into master right now, but I could clean up the hashname.txt.verbose.0 file I create (remove the lines that are redundant with the standard training data file and remove the extra labels I added, to reduce size), and add some options so the user can choose whether these files are created.

@sethtroisi
Member

@killerducky Here's 60 games, hope it helps out: https://drive.google.com/open?id=1pFVt5pDce7sHes6kwS3enMz3ps4pnRfi

@killerducky
Contributor

killerducky commented Nov 29, 2017

New results with root.get_first_child()->get_eval(), and I also added code to calculate average game length. Note the 0.50 resign rate is in there as a sanity check; the games average 2 moves.

I didn't analyze the number of visits for the best child, although the data is there. It might be simplest to change the condition to require more than 500 visits at the root node. If we keep it as best-child visits, it does go over 100 often, but lowering the limit would catch more positions. Just glancing at the data, it seems like 50 or 30 would catch most.

Also I just checked the paper, probably none of these details matter but:

AlphaGo Zero resigns if its root value and best child value are lower than a threshold value v_resign.

Resign rate: 0.50
Incorrect uct resigns = 48/182 (26.37%)
Average game length = 409. Average game length with resigns = 2 (99.42% reduction)

Resign rate: 0.20
Incorrect uct resigns = 11/182 (6.04%)
Average game length = 409. Average game length with resigns = 290 (28.96% reduction)

Resign rate: 0.15
Incorrect uct resigns = 8/182 (4.40%)
Average game length = 409. Average game length with resigns = 307 (24.90% reduction)

Resign rate: 0.10
Incorrect uct resigns = 5/182 (2.75%)
Average game length = 409. Average game length with resigns = 326 (20.36% reduction)

Resign rate: 0.05
Incorrect uct resigns = 2/182 (1.10%)
Average game length = 409. Average game length with resigns = 348 (14.81% reduction)

@sethtroisi This includes your 60 games, thanks.

@isty2e
Author

isty2e commented Dec 4, 2017

I wonder when this will be adopted in the live version... Maybe it will be handled on the server, not the client, like this?
