When will resignation be enabled? #135
AGZ pulled statistics from the games first to set a suitable resignation threshold. It's possible to obtain the same statistics during validation tests, and hence see whether it makes sense to try resigning, but this needs some tooling to be written first.
If it makes sense, we should probably allow resignation as soon as possible. LZ is in its early stages, and cycling through games faster with updated networks would be a virtue. It would also presumably be trained less on nearly full boards, which might be desirable too.
This assumes the bot is estimating reliably who is winning. That's far from obvious. And if it's not, it needs to see the game played out and realize that it's wrong.
Yeah, so that's why setting a proper resignation threshold depending on the network is important, if I understand correctly. Given the data from self-play games, how long would it take to estimate the resignation threshold for a trained NN?
You can't, see above. It needs tooling during the validation test.
Oh well... I got it.
I wrote some Python code to analyze the resign rate from the completed self-play games played with the newest network. I loaded completed games into leelaz, ran it with one playout, and recorded the NN eval value for each position. The above plot is the minimum NN eval value for the winner during the game. If the resign threshold is set below the winner's minimum eval, then there won't be an incorrect resignation. Some statistics I calculated:
Enabling resignations even with a very low threshold would speed up the self-play games by about 20%. The number of incorrect resignations wouldn't be very high with the current network, and it seems that the network can currently count with good enough accuracy. If the clients uploaded the winner's minimum score during the self-play games, it could be used to set the correct resign rate for each network quickly and accurately on the server. I uploaded the scripts I wrote to: https://github.com/Ttl/leela-zero/tree/resign_analysis/resign_analysis EDIT: It seems that currently the NN eval corresponds very closely to the actual probability of winning. It makes sense that it does, as that's what it is trained for. Even just hard-coding the resignation threshold to, for example, 0.01 would be helpful.
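The core of this analysis can be sketched roughly as follows. This is not the actual script from the linked repository; the function name and the eval data are made up for illustration.

```python
# Hypothetical sketch: estimate how many games a given resign threshold
# would get wrong, from the winner's minimum NN eval recorded per game.
winner_min_evals = [0.31, 0.04, 0.22, 0.008, 0.15, 0.41]  # made-up data

def incorrect_resign_rate(min_evals, threshold):
    """Fraction of games where the eventual winner's eval dropped below
    the threshold, i.e. where the winner would have resigned."""
    bad = sum(1 for e in min_evals if e < threshold)
    return bad / len(min_evals)

for t in (0.01, 0.03, 0.05, 0.10):
    print(t, incorrect_resign_rate(winner_min_evals, t))
```

The lower the threshold, the fewer incorrect resignations, but also the smaller the game-length savings.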
A nice experiment! The number of false positives would hopefully decrease with more playouts. By the way, how long did this analysis take? Could it be done at each training step without a significant increase in computational cost?
It takes a few minutes to analyze the already-played games. False positives currently seem to be caused by the network not realizing that it needs two eyes to live. In some games I looked at, the winner's evaluation was very low because it didn't realize that one of the opponent's groups was actually dead. Once the group is actually put in atari, the evaluation shoots up. Note that during the analysis there are no playouts and only the neural network output is used. It should be much better at evaluation when combined with the MCTS. I'm not sure the resignation threshold can be determined from the validation games, since one of the networks could be much stronger. In my opinion it should be determined from the self-play games, as in the AGZ paper. It wouldn't slow the self-play games if the evaluations were recorded, but it needs support added to the client and server code.
Maybe it can be done without such additional work: we have at least 10%, and in practice more, of games fully completed. That means we'll have at least 50k fully completed games, which I suppose will be sufficient for such an analysis. Of course we lose the playouts, but at the very least it can't be dangerous, I believe.
If autogtp automatically recorded the value head output after every move and sent this file to the server along with the .sgf and training data (or, if the value head output were part of the training data), it would be a simple matter of statistically evaluating the winner's minimum NN evaluation to determine a suitable resignation threshold. It would add only a little to the data volume collected.
Sure, it would be ideal if there is no problem with the storage.
Thanks for the analysis. It seems resigning at 3% would be reasonably safe, and if we disable resignation in 25% of the games, that should give the net enough training data for the cases where it resigned wrongly. Both parameters seem robust enough to me that I'm willing to enable this. That would give almost a 20% speedup.
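Picking such a threshold from the recorded winner minimums, AGZ-style (keep the false-positive rate under a target), could look like the following minimal sketch. The function is hypothetical, not actual leela-zero server code.

```python
# Hypothetical sketch: largest resign threshold that wrongly resigns at
# most max_false_positive of the eventual winners.
def pick_threshold(winner_min_evals, max_false_positive=0.05):
    evals = sorted(winner_min_evals)
    # number of winners we are allowed to resign away
    allowed = min(int(max_false_positive * len(evals)), len(evals) - 1)
    # resignation fires when eval < threshold, so a threshold equal to the
    # allowed-th smallest winner minimum wrongly resigns at most `allowed` games
    return evals[allowed]
```

With per-network data uploaded by the clients, the server could recompute this for each new network.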
Are we sure about this? In most self-play games the winner is correct even with premature passes, but I made some sneaky examples, for instance with the bulky five, and it doesn't recognize that yet. Did the analysis above also check whether the correct winner was selected, i.e. if a stronger bot got a chance to play the game out? You don't even need the bulky five: I noticed that if a group is large enough and has a large eye with an opponent stone in it, even if it's only a three-point eye, the value network doesn't think it can kill it yet. For now it seems to depend on the other player reducing the liberties of this eye first before it captures. In most self-play games this does happen eventually, but as the value network gets smarter it might stop filling its own inside liberties, and then incorrect resignations might happen. (I'm just speculating a bit here; 25% of the games might be enough to teach it about these concepts anyway.) Edit: For example on this board:
(;PL[B]AB[aa][ab][ac][ad][ae][af][ag][ah][ai][aj][ak][al][am][an][ao][ap][aq][ar][as][ba][bb][bc][bd][be][bf][bg][bh][bi][bj][bk][bl][bm][bn][bo][bp][bq][br][bs][ca][cb][cd][ce][cf][cg][ci][cj][ck][cl][cm][cn][co][cp][cq][cr][cs][da][db][dc][dd][de][df][dg][dh][di][dj][dk][dm][dn][dr][ds][ea][eb][ec][ed][ee][ef][eg][eh][ei][ej][ek][el][em][en][ep][eq][er][es][fa][fb][fc][fd][fe][fg][fh][fj][fk][fl][fm][fn][fp][fq][fr][fs][ga][gb][gc][gd][ge][gf][gg][gh][gi][gj][gk][gl][gm][gn][gp][gq][gr][gs][ha][hb][hc][hd][hf][hg][hh][hi][hj][hk][hl][hm][hp][hq][hr][hs][ia][ib][ic][id][ig][ih][ii][ij][ik][il][io][ip][iq][ir][is][ja][jb][jc][jd][jh][ji][jj][jk][jl][jn][jo][jp][jq][jr][js][kj][lj][mj][nj][og][oj][pj][qj][rj][sj]AW[do][dp][dq][eo][fo][go][he][hn][ho][ie][if][im][in][je][jf][jg][jm][ka][kb][kc][kd][ke][kf][kg][kh][ki][kk][kl][km][kn][ko][kp][kq][kr][ks][la][lb][lc][ld][le][lf][lg][lh][li][lk][ll][lm][ln][lo][lp][lq][lr][ls][ma][mb][mc][md][me][mf][mg][mh][mi][mk][mm][mo][mr][ms][na][nb][nc][nd][ne][nf][ni][nk][nl][nn][no][np][nr][ns][oa][ob][oc][od][oe][of][oi][ok][om][on][oo][oq][or][os][pa][pb][pc][pd][pe][pf][ph][pi][pk][pl][pn][pp][pr][ps][qa][qb][qc][qd][qe][qf][qg][qh][qi][qk][qm][qn][qo][qq][qr][qs][ra][rb][rc][rd][re][rf][rg][rh][ri][rk][rl][rn][rp][rr][rs][sa][sb][sc][sd][se][sf][sg][sh][si][sk][sl][sm][sn][so][sp][sq][sr][ss]DT[2017-11-26]PB[Leela 295k]PW[Leela 295k]AP[Sabaki:0.31.5]CA[UTF-8]KM[7.5]
Both players seem happy to pass, even though white would win and black could kill the bulky five. Edit: Ran some more simulations; now they don't pass, but black messed up the kill anyway with B Q13, W P12, B O12, W O13. Anyway, it seems the winrate doesn't drop below 3%, so this might be fine.
It might be a good idea to embed a comment into the SGF for the 25% non-resignable games whenever the winrate drops below 3%. So if we aren't sure that it's safe, we can later grep for those comments and (hopefully) find out that most of those games were really lost.
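A minimal sketch of that tagging idea follows. The marker text and function are hypothetical; real autogtp/leelaz code would append this while writing the SGF.

```python
# Hypothetical sketch: append a greppable SGF comment property (C[...])
# to a node when the winrate drops below the resign threshold.
def tag_node(sgf_node, winrate, threshold=0.03):
    """Return the SGF node, tagged with a comment if winrate < threshold."""
    if winrate < threshold:
        return sgf_node + "C[LOW_WINRATE %.3f]" % winrate
    return sgf_node

print(tag_node(";B[dd]", 0.012))  # tagged
print(tag_node(";W[qq]", 0.500))  # left as-is
```

The tagged games could then be found with something like `grep -l LOW_WINRATE *.sgf` and checked by hand.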
Another board: here the winrate does drop below 3%, because white has a larger margin over the dead group. (;FF[4]CA[UTF-8]AP[GoGui:1.4.9]
If shapes like the dead five are outside the range of the 1000 simulations, then the neural network is probably not ready to absorb that level of information anyway, so I don't think any harm is done by an early resignation. Leelaz is still learning the basics and does not understand liberties yet.
http://eidogo.com/#CH7ruGF2 The white group is only cut off by a thin black line of stones; I wonder if Leela might think the top group is connected to the bottom, due to the density of white stones on the right side of the board.
I tested it with a three-stone cut and it gave the same result. The heatmap is simply unable to detect that the stones are dead, even if I reduce it to just a one-point eye. It seems only the MCTS search is able to declare the stones dead by taking them. It seems it just assumes large groups are alive; if I try the same with a small group, it does consider it dead, even to the point where it thinks it's acceptable to pass without cleaning up said stones.
Leela must be taking the proverb “Big Dragons Never Die” literally 😎
Do such counterexamples matter if the majority of the games are predicted correctly? We don't have to care about such specific cases, nor force LZ to do something specific which a more skilled player would have done, I believe. Such skill issues can ultimately be handled by the RL process.
We're not talking about inputting specific cases, but rather about how likely adding resignation is to make it harder for the network to learn certain important concepts. Making games 20% shorter without losing quality is great. Making them 20% shorter by tossing out 75% of the important data is terrible. That's why I asked for an analysis of the current games with a stronger bot, to see how often a resignation would be correct. If we just depend on the actual game result, we will never spot bad results from early passes and surrenders.
Again, I don't see any reason to do that. The "certain important concepts" you are talking about are something from a human perspective, whereas the NN can learn other things you might not consider essential and still gain skill. And I believe we should predict the results with the current nets, not with any stronger bots. That's how reinforcement learning works, I believe.
The only concern is whether resigning would change the results between the current players, not what is theoretically optimal. If the bot resigns a game that a 9d bot could win, that doesn't matter as long as it would have lost against itself anyway, so nothing was changed by the resignation.
The current code also requires the first child to have been visited more than 100 times. With only 1000 playouts, a majority of the positions don't reach this limit. Maybe the 100 visits should be lowered, because that was probably tuned for old-style playouts. Also, Ttl's analysis was done on just a single pass of the NN, while the current resign code is based on the results of the 1000-playout UCT search. I have modified my local leelaz to save the NN's winrate, the UCT winrate, and the number of visits to the first child. By tomorrow I will have enough results to do more analysis. @Ttl tomorrow I can send you the data I collect if you want to analyze it. Also, I'll work on getting my code changes into a branch on github. ETA: I just noticed Ttl included a link to the source code used; I'll look at it tomorrow.
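The gate being discussed (winrate below the threshold, and enough visits to trust that winrate) amounts to something like the following sketch. The names are illustrative; this is not the actual leelaz C++ code.

```python
# Hypothetical sketch of the resign gate: only allow a resignation when
# the search is trusted (enough visits on the first child) and the
# evaluated winrate is below the resign threshold.
def may_resign(eval_, first_child_visits, threshold=0.03, min_visits=100):
    return first_child_visits > min_visits and eval_ < threshold

print(may_resign(0.01, 200))  # lost position, well-searched -> resign
print(may_resign(0.01, 50))   # lost position, too few visits -> play on
```

With only 1000 playouts, `first_child_visits` often stays under 100, which is why lowering `min_visits` (or switching to a root-visit count) is being considered.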
The shortest game. Best network hash: 92c658d7325fe38f0c8adbbb1444ed17afd891b9f208003c272547a7bcb87909
That is not the result of a resignation, but rather the consequence of a double pass. I don't see why you are showing the short game here.
Correct. Looking forward to seeing the 1000-playout data. Indeed it would have been great to put the winrate in the training data, but I didn't think of this ahead of time and I don't want to change the data formats during the run.
ETA: There is a problem when I changed to using root.get_first_child()->get_eval(). See my new post below. I have some results, but first a caveat: I used root.get_eval() to collect winrates, but I noticed I should use root.get_first_child()->get_eval() to match what the resign code does. I'll rerun this tonight. So far, using this close-but-not-correct method, it looks like the threshold should be set higher when you use 1000 playouts. Presumably the net winrates are noisier, requiring a lower threshold than the more stable UCT winrates. "uct resigns" is based on root.get_eval(). "net resigns" is based on result.second, where result = Network::get_scored_moves(&state, Network::Ensemble::DIRECT, 0).
ETA: I pushed my code to https://github.com/killerducky/leela-zero |
@killerducky If you need extra games, now or in the future, I'm willing to run almost arbitrary code (I can generate ~20 games/hour) and mail you the results. And it goes without saying, I'm excited by your analysis.
I ran just a few games with root.get_first_child()->get_eval(); something is wrong, still analyzing... @sethtroisi you can pull my changes from https://github.com/killerducky/leela-zero/tree/aolsen This code isn't really in any shape to be pulled into master right now, but I could clean up the hashname.txt.verbose.0 file I create (remove the lines that are redundant with the standard training data file, and remove the extra labels I added, to reduce its size), and add some options so users can choose whether these files are created.
@killerducky Here's 60 games, hope it helps: https://drive.google.com/open?id=1pFVt5pDce7sHes6kwS3enMz3ps4pnRfi
New results with root.get_first_child()->get_eval(), and I also added code to calculate average game length. Note the 0.50 resign rate is in there as a sanity check; with it, the games average 2 moves. I didn't analyze the number of visits for the best child, although the data is there. It might be simplest to change the condition to: number of visits for the root node > 500. If we keep it as best-child visits, it does go over 100 often, but lowering the limit would catch more; just glancing at it, 50 or 30 would catch most. Also, I just checked the paper; probably none of these details matter, but:
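The average-game-length effect can be estimated from per-move winrate traces; here is a rough sketch with made-up traces, not the actual analysis code. A 0.50 threshold trivially ends games almost immediately, which matches its role as a sanity check.

```python
# Hypothetical sketch: game length under a resign threshold, given the
# per-move winrate trace of the side to move.
def resigned_length(winrates, threshold):
    """Index (1-based) of the first move whose winrate drops below the
    threshold, or the full game length if it never does."""
    for i, w in enumerate(winrates):
        if w < threshold:
            return i + 1
    return len(winrates)

games = [[0.5, 0.4, 0.2, 0.04, 0.01], [0.5, 0.6, 0.7, 0.8]]  # made-up
for t in (0.0, 0.03, 0.5):
    avg = sum(resigned_length(g, t) for g in games) / len(games)
    print(t, avg)
```

Averaging this over many games for each candidate threshold gives the speedup-vs-threshold trade-off being discussed.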
@sethtroisi This includes your 60 games, thanks.
I wonder when it will be adopted in the live version... Maybe it will be handled on the server, not the client, like this?
Apparently the current LZ doesn't have any good idea about counting, and that has been the reason for disabling resignation in self-play games. A few questions regarding this: