End of first training run - next steps #2560
Comments
Well, let me start by saying CONGRATULATIONS. Whatever the next steps may be, this is as good a time as any to acknowledge the incredible achievement that this "first training run" represents.
What a pity! When I read the post, I felt like LZ was leaving us.
I also wanted to take the opportunity to thank you for all these efforts, basically bringing AlphaGo Zero to our homes and out into the world. I hope that journey continues, most probably both with SAI focusing on a zero bot for 19x19 with the very nice winrate sigmoid idea, and with KataGo focusing on playing strength across different net sizes and rule sets, even if not everything is completely zero. I would be happy if there were an LZ 2.0 run and would be glad to join the discussion on what to test, etc. -- should that discussion happen here on GitHub or in the Discord?
Concerning that last paragraph, I wanted to add that Lc0 tried simply increasing the playout count at the end of one of their runs, and it definitely caused a regression: as the playout count increased, the visit statistics for moves changed quite massively, which apparently caused the net to focus on "learning" the new policy distribution only, and despite letting it run for a while it didn't recover. I therefore wouldn't recommend a change in playout count at such a low LR.
Since SAI and KataGo are already freely available, it would be pointless to mimic them. I think that at some point we should switch to another project. A collaborative training effort for KataGo would be great!
Yeah, congrats and thx @gcp
Most of these ideas require limited coding, or even no coding (e.g. -m 999, 50% gating, 1% no-resign).
Congratulations on the wonderful achievements with Leela during the last two years; who would have thought that the project would rise to such heights?
Thank you @gcp I can positively say that your project changed my research interests and my scientific life! Leela Zero is great!
Model 255 will be the last network for Leela Zero (first training run)? Wow, 2^8 - 1, well done!
Firstly, thank you. For almost literally everything :) I've had a great time working with you all on this project and have definitely learnt a few things along the way :) As for ideas:
hmm I guess those are the main things from me. It would be great to keep alive the community that has gathered, whether that is with this codebase or another. I love the idea that there's an open source resource for everyone to give ideas to and help with where they can :)
Leela Zero is amazing work, and it's remarkable that it made it so far despite being the first serious open-sourced attempt, mostly groping in the dark regarding early bugs, ambiguous hyperparameters, and implementation details not mentioned in the papers. Even retired, I expect Leela Zero will probably stand for quite a while as the top (taking into account both strength and ease of use) open source essentially-"zero" bot. Neither SAI nor KataGo is quite zero, with both having liberty/chain-related features, and with learning based on the board score giving additional regularization and learning signal beyond just a binary win/loss (although in different ways in the two bots) - and SAI is also considering adding a change specifically targeted at improving ladder reading (or so I hear from Discord chat). If there is any follow-up project, I would be excited if it were willing to try out any of the improvements in KataGo to get independent verification of their usefulness. Precisely to have the chance to contribute back in that regard, KataGo has stayed very close to "zero" in spirit, and maybe I haven't advertised this as well as I could have, but most of the big improvements found are general (not Go-specific!) methods (or at least, they should be, assuming they work outside of KataGo). The big such methods that are compatible with "zero":
A second "true zero" run but leveraging some of these things (and/or other "zero-compatible" methods from SAI and MiniGo regarding temperature, game branching, and other hyperparameters) would be absolutely fascinating. Anyways, kudos to @gcp again for piloting the defining run of the first few years of post-AlphaZero computer Go.
I think Globis-AQZ will be open-sourced in the future. In the 11th UEC Cup, it had a 90% winrate over the previous version (2019 CHINA SECURITIES Cup).
Thank you @gcp for the time you contributed to this project. And now I'm wondering what the end of the Leela Zero project will look like. Anyway, thank you @gcp again. I will remember you as the author of the first public value-network-based Go program (Leela 0.9) and the first successful open-source reimplementation of AlphaGo Zero (Leela Zero).
@kityanhem I think they are going to release a weight that plays under Japanese rules, which was successful in the UEC Cup.
Note though how LZ has long been swapping Elf data in and out (between normal and bubblesld nets), which changed the data distribution on a weekly basis...
Another thing we might explore in subsequent work on Leela or follow-up community projects is time management. For chess programs I know this was a very productive area of research. It could be explored using the "final" LZ network playing with different time management strategies. One simple approach would be to take a snapshot of the distribution of visits every 5 seconds for a given position, and predict how much the distribution of visits will change over the next 5 seconds. This actually sounds like a very doable side project. Does anyone have a sense of how hard it would be to do a dump of the top-level position visit counts every N playouts?
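For what it's worth, a toy sketch of the measurement this side project would need (the function names and the stopping threshold are invented for illustration, not from any engine):

```python
import numpy as np

def distribution_shift(prev_visits, cur_visits):
    # Normalize two snapshots of root visit counts and measure how far
    # the distribution moved, here as total variation distance in [0, 1].
    p = np.asarray(prev_visits, dtype=float); p /= p.sum()
    q = np.asarray(cur_visits, dtype=float); q /= q.sum()
    return 0.5 * np.abs(p - q).sum()

# Hypothetical usage: snapshot visits every N playouts, and if the last
# interval barely changed the distribution, extra time is likely wasted:
# if distribution_shift(snapshots[-2], snapshots[-1]) < 0.01: play_move()
```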
We have an AlphaZero Shogi project (AobaZero). In AobaZero, there was no improvement from 2600k to 3200k games. So I think even if there is no progress in the latest 500k games, it may still improve later.
Also not to be forgotten is the huge improvement of the 15b net after selfplay stopped, once higher-quality 40b games got fed to it. So the current 40b likely still has a lot of room for further progress.
I am using this for Ataxx (Dankx). I find it useful, if only because it is an easy way to get additional randomness and exploration. I have not tried weighting based on the counts.
This did not help at all in chess (Stoofvlees).
SE nets were worse for chess (I know lc0 concluded the opposite!) but gave a large gain in Ataxx. Not sure what to think of this. Because of lc0, I tried this multiple times, but all attempts failed. I use global pooling rather than 1x1 down-convolution networks for all my new programs. I didn't even try alternatives; global pooling seems to achieve the same outcome in a cleaner way.
I haven't tried these. BatchNorm for sure has some side effects. The original Leela used ELU units instead of BatchNorm, but this was before residual networks. Remi has revealed that instead of using temperature, his Crazy Zero bots build an opening book of suspected equal positions to get the randomization. Stoofvlees does not use temperature either. Another thing I found to help a lot is to get rid of regularization entirely. Once you have enough games, overfitting isn't that much of a problem any more!
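For readers unfamiliar with the global pooling mentioned above, here is a rough sketch in the KataGo style (illustrative only, not code from any of these engines): pool each channel over the whole board, mix the pooled values through a learned matrix, and add them back as per-channel biases.

```python
import numpy as np

def global_pooling_block(x, w):
    # x: activations of shape (batch, channels, 19, 19)
    # w: learned (channels, channels) mixing matrix
    pooled = x.mean(axis=(2, 3))        # global average per channel
    bias = pooled @ w                   # mix pooled values across channels
    return x + bias[:, :, None, None]   # broadcast back over the board
```

The appeal is that board-wide information (e.g. who is ahead overall) reaches every location in one step, without needing a dedicated down-convolution branch.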
There is a really beautiful solution to this "diversity by building its own book" kind of problem, directly related to #2230: using a policy softmax temperature slightly above 1.0 for training games (I found the range [1.15, 1.4] to work in theory and recommend using 1.2). The effect is basically a regularization of the top-move policies, so that all moves with a sufficiently small Q difference to the top move converge to a non-zero policy, while moves with significantly worse eval are unaffected. I'd be happy to share more details with you if you are interested.
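For concreteness, a minimal sketch of what such a policy softmax temperature does, assuming raw policy logits (names and the exact value are illustrative):

```python
import numpy as np

def policy_with_temperature(logits, T=1.2):
    # Dividing logits by T > 1 before the softmax flattens the
    # distribution among near-equal moves, while clearly worse moves
    # stay suppressed.
    z = logits / T
    z -= z.max()          # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()
```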
"Remi has revealed that instead of using temperature, his Crazy Zero bots build an opening book of suspected equal positions to get the randomization": I did a try to manually create a zero opening book here (#2104) with LZ
This 2- or 3-move-deep opening book could be used for selfplay but also for match games. It's not clear whether the usual noise would still be useful for selfplay (I guess it would help somewhat). PS: @Naphthalin's approach just above is another interesting alternative, working throughout the game and not only in the opening phase. It's not clear to me whether it can totally replace the opening-book approach, or whether it is an additional tool to provide more diversity after the first few opening moves.
@Friday9i It is supposed to totally replace the opening book, as you still get enough diversity even with temp values around 0.8 (so much lower than 1.0, significantly reducing the number of actual blunders).
First, I would like to thank everybody (especially @gcp) who made such a valuable contribution. To me, leela-zero was a chance to learn how to write code and see how things evolve. It was exciting to see the same code start from making random moves and end up beating even the best human players in less than a year. I started with a single GTX1080... and now I have four GPUs running all sorts of different stuff. :) That said, I don't think I will spend much time working on leela-zero anymore. Meanwhile I have started some of my own projects - I am writing my own 'zero' implementation of Janggi - a Korean 'chess-like' board game - from scratch. This was more to sharpen my programming skills than to learn how to create the world's best AI. Again, thanks everybody, and bon voyage!
Leela Zero is the first open-source 'zero-type' Go engine, and it's a great memory for me that I observed its entire evolution since random play. Nowadays, with a laptop, I can beat any professional Go player. Such an amazing experience!
@gcp - Thanks for the insights on the usefulness of some of the methods I listed for other games!
Is the Globis-AQZ selfplay data publicly available? Do you have a link for it?
AlphaZero uses a sliding window of the last 500K games. Each training cycle uses 2048 x 1000 states. Some of the states used in a training cycle will be good data and some bad; it depends on luck whether a cycle hits enough good data to create a promotion. Currently the window of games comes purely from the current best model. Perhaps it is worth sticking to the 500K window and letting it run for a few more months to see whether it can hit a good set of game states. AlphaZero stopped at 21 million selfplay games. We are now at 17 million selfplay games.
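As a sketch of the sliding-window scheme being described (toy code; the buffer and helper names are illustrative, not the actual training pipeline):

```python
import random
from collections import deque

WINDOW_GAMES = 500_000
window = deque(maxlen=WINDOW_GAMES)   # oldest games fall out automatically

def add_game(game_states):
    # Each entry is the list of training states produced by one selfplay game.
    window.append(game_states)

def sample_batch(batch_size=2048):
    # Draw states uniformly from games currently inside the window;
    # each training cycle repeats this (e.g. 1000 times in the AZ setup).
    return [random.choice(random.choice(window)) for _ in range(batch_size)]
```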
@lcwd2 where did you get those numbers? The Science paper reports 140 million games (but without 8-fold symmetries data augmentation) and a 4096 minibatch size, while the pseudocode also reports a 1-billion-games window for training. This seems to be their second run, as the arxiv paper instead had one of your figures (21 million games), but not all of them, so maybe you have some source that I missed?
I looked at this but I don't really buy it. The quoted advantages are being able to deal with small batch sizes due to RAM restrictions, dealing with interdependent samples, and avoiding training vs inference batch differences. The first can be handled with gradient accumulation (which our code already supports! BATCH_SIZE vs RAM_BATCH_SIZE), the second is something we try very hard to avoid, and the third isn't really a factor for large batch sizes. Testing gave me slightly worse results, with a performance penalty during training (due to no cuDNN support). I wonder if lc0 got stuck with this because they didn't backport the gradient accumulation changes. OTOH, if there were no performance penalty, it might be a no-brainer, as it probably allows you to be more careless during training. I will try Fixup next, it looks interesting enough.
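For reference, the gradient accumulation referred to here (BATCH_SIZE vs RAM_BATCH_SIZE) amounts to something like the following sketch, in TensorFlow 2 style; the function and names are illustrative, not the actual training code:

```python
import tensorflow as tf

def accumulated_step(model, optimizer, loss_fn, batches):
    # Accumulate gradients over several small GPU-sized batches, then
    # apply them once, emulating one update with a much larger batch.
    accum = [tf.zeros_like(v) for v in model.trainable_variables]
    for x, y in batches:                       # e.g. 4 RAM-sized batches
        with tf.GradientTape() as tape:
            # Divide so the summed gradients average over all sub-batches.
            loss = loss_fn(y, model(x, training=True)) / len(batches)
        grads = tape.gradient(loss, model.trainable_variables)
        accum = [a + g for a, g in zip(accum, grads)]
    optimizer.apply_gradients(zip(accum, model.trainable_variables))
```

Note (as pointed out further down) that batch norm statistics are still computed per GPU-sized sub-batch, which is exactly where the renorm discussion comes in.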
I use this for Ataxx, whereas Stoofvlees uses gradual UCT policy flattening (described in that issue, IIRC). I'm curious how you (theoretically) arrived at that range. But note that the approaches we're discussing don't use temperature at all. As long as you can still explore, this significantly accelerates learning on the value head, for obvious reasons. Flattening the policy by itself doesn't cause any exploration at all.
Well yes, if x=0 is good and x=1 is bad, then it's obvious that x=0.5 is less bad than x=1, but it doesn't tell me why that is preferable over x=0. So this argument doesn't help us further?
I think this argument doesn't actually require temperature, correct? Only a feedback loop on policy percentages due to search effort redistribution after training a new policy?
I can't tell this from the graphs. (I'm interested because I'm already doing this and empirically 2.0 worked better than 1.5 for me 😄) Is it because the bad moves might still turn out to be good because in practice the eval head can still be shifting around?
The third can actually be a factor with large batch sizes. I recall the reason LC0 switched was that certain kinds of promotion moves being possible or not caused a strong enough difference in activation strength of certain channels in their net that there would be a nontrivial difference in the batch statistics depending on whether they were possible - and they were rare enough that a batch might contain zero instances of the move. So you got different inference-time vs training-time scaling, empirically causing the policy at inference time to put almost zero weight on a move that it should have put weight on - a strange blind spot. (I could of course be misremembering or mischaracterizing why they switched.) So I think generally the worry is about what batch norm does in situations like this - robustness in "rare" situations, blind spots, etc. - which unfortunately makes it a little harder to test than if you're just worried about "main-line" average performance. It might be that it's much less of a worry with Go, since even the fact that the board is 19x19 means a wider surface over which channel activations are averaged.
@Vandertic my source of information is the same arxiv paper. The Leela Zero Elo graph doesn't yet have the sharp turning point seen in the different versions of AlphaGo. A large portion of the AlphaGo training steps were done after the turning point. It would be interesting to have data on how LZ behaves in the remaining 4 million games before closing this project.
This is correct, it is a policy feedback thing only. The argument rather goes in the reverse direction; while currently temperature doesn't produce enough diversity, after that change suddenly even conservative temperature choices produce diverse training games as the policies are flattened in an "informed manner" through the RL loop.
For the last graph, by "bad moves" I mean that cyan line, which is slowly but significantly rising with PST, where I assume those moves are correctly evaluated as bad. The PST-noise interaction obviously can't be seen there, as all graphs use the standard 25% noise, but when reducing it, the cyan line would be significantly lower while the others would basically stay the same. These policies shouldn't be too high, as otherwise search will waste some nodes there, and temperature might choose them too often. I think the whole point of assigning a significant number of training games to seemingly suboptimal moves/lines as well is that they could turn out to be better/worse in the long run, and the magnitude of these fluctuations will be game-specific. Just from anecdotal evidence, for LZ these fluctuations were <1.5%, while for Lc0 they are sometimes up to 8% (which means 4% for LZ due to the [-1,1] vs [0,1] ranges). To cover these fluctuations, Go therefore might get away with a slightly smaller PST parameter, and I can easily imagine that in your case fluctuations might be even higher. Did you try a net-specific CLOP for the PST value applied after training? Lc0 has used 2.2 for a while now (with 1.0 in training), but recently had a first test run with 1.2 in training, where CLOP suggested using 1.5 in match play. If (as LZ does) PST 1.0 is used in match play, it is likely that a higher training PST will push the optimal match PST closer to 1.0, which might already explain your empirical result. However, due to the saturation I see and the combination with noise, it might actually be better in your case to increase cpuct at the root only, as this effectively scales down the Q differences. The easiest thing would be for you to give me a list of Q values in your case plus your choices of cpuct and N, and I could run my script to produce the expected equilibrium policies.
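Not the actual script, but a toy version of that idea: simulate PUCT selection with fixed Q values and see which visit (and hence training-policy) distribution the search settles into for a given cpuct and visit budget. All names and numbers are illustrative.

```python
import numpy as np

def equilibrium_policy(q, prior, n_total=800, cpuct=1.0):
    # Repeatedly pick the child maximizing Q + U with Q held fixed;
    # the resulting visit fractions approximate the equilibrium policy.
    q = np.asarray(q, dtype=float)
    prior = np.asarray(prior, dtype=float)
    n = np.zeros_like(q)
    for _ in range(n_total):
        u = q + cpuct * prior * np.sqrt(n.sum() + 1) / (1 + n)
        n[np.argmax(u)] += 1
    return n / n.sum()

# e.g. equilibrium_policy(q=[0.30, 0.28, 0.10], prior=[0.5, 0.3, 0.2])
```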
Sounds a bit weird. If the effect is rare enough that some batches have 0 or 1 instances, and the batches are, say, 2048-sized, then obviously the effect they can have is scaled down by the same 1/2048. I checked, and lc0 did end up adding gradient accumulation, so they shouldn't be limited to small batches. I couldn't find what batch size they use right now though. I ended up digging more, and found LeelaChessZero/lc0#784 (comment). Note that this points out: "Currently the training code calculates the batch norm statistics using ghost batch norm of 64 positions." So this explains why they ended up needing batch renorm, but now I wonder what this ghost batch norm is about.
Okay, it's to match AZ0 better: LeelaChessZero/lczero-training#28. And because there's one paper claiming it generalizes better. I am sure I can find multiple others claiming the exact opposite. But it does make sense to me now why they ended up with it.
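For context, ghost batch norm just means computing normalization statistics over fixed-size virtual sub-batches rather than over the whole (possibly accumulated) batch; a minimal sketch (illustrative, not lc0's code):

```python
import numpy as np

def ghost_batch_norm(x, ghost_size=64, eps=1e-5):
    # Normalize each ghost sub-batch with its own mean/variance instead
    # of the statistics of the full batch (x: (batch, features)).
    out = np.empty_like(x)
    for i in range(0, len(x), ghost_size):
        g = x[i:i + ghost_size]
        out[i:i + ghost_size] = (g - g.mean(0)) / np.sqrt(g.var(0) + eps)
    return out
```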
Note that with gradient accumulation, batch norm statistics are still calculated from the GPU batch size. The plane was normally zeroed and only had activations in rare cases. The issue was that batch normalization uses batch statistics during training and moving averages during inference. Let's assume the activation would be 1 when activated and 0 otherwise, and that the position occurs in one in a thousand training samples. If the GPU batch size is 64, then with one positive example in the batch the calculated variance would be about 1/64, and with no positive examples it would be 0. But the variance with a very large batch size should be about 1/1000. So there is a significant discrepancy in the variance used for normalization during training whenever a positive example is in the batch. During inference, a moving-average variance of approximately 1/1000 would be used for normalization, which doesn't match the variance that was used in training. Renorm uses moving variances for normalization (with some clipping) during training, so the variance used for normalization is much closer to the real 1/1000 variance.
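A quick numerical illustration of that discrepancy, using the 1-in-1000 and batch-of-64 numbers from the comment:

```python
import numpy as np

# A feature active in ~1/1000 samples: its "true" variance is ~0.001.
rng = np.random.default_rng(0)
x = (rng.random(1_000_000) < 1e-3).astype(float)
print(x.var())             # ~0.001, what the inference moving average sees

# A 64-sized GPU batch containing one activation:
batch = np.zeros(64); batch[0] = 1.0
print(batch.var())         # ~0.0154, i.e. roughly 1/64
print(np.zeros(64).var())  # 0.0 for batches with no activation

# Training thus normalizes with ~1/64 or zero variance, inference with
# ~1/1000: the activation scale differs drastically between the two.
```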
Argh, that's a good point. It's clear why it's useful if you have small batches. The key thing to realize is that simply using moving statistics during training as well (which would be one's first idea...) doesn't work due to some surprising maths, and batch renorm is basically the workaround for that. It didn't make much sense to me why it would be a problem with large batches, and I think we've now confirmed that was just not the case for lc0 - they were/are using 64-sized batches as far as batch norm is concerned.
If someone would like to run the next training run of Leela Zero, what would they need to do? For example, rent a server and storage, pay for Internet traffic, power fees, and so on? And could they still use the zero.sjeng.org domain name?
Thank you @gcp, for starting this project and making it so successful. Before Leela Zero, I was proud to tell people that I played about 1 game of Go per day. I always did some Go software development on the side. But at the end of 2018 I pretty much stopped playing Go at all. I have been busy working on ZBaduk ever since. So, without Leela Zero, I would be playing Go now, but instead I am working on the most challenging NodeJS/Angular project I have ever worked on. - And this forum is filled with these kinds of stories. Of course I would have loved to see you continue this project until the end of time. But I guess you have already made up your mind. (I am slowly moving through the 7 steps of acceptance.)
My evaluation shows that lz255 only wins against lz254 in 176 out of 428 games (41.1%). We'll probably get the next upgrade very soon.
256 is born
Since I think it would be an awful waste if such a still-significant contributor pool were to dissipate just like that, I'd like to once again propose starting to train networks with a modest komi range of 5.5 to 8.5 inclusive (#1625 (comment)), while having test matches played with a komi of 7 (which has the added benefit of making the current specialized networks possibly beatable by new, more generalist ones). I think that #1825 is ready to go and that only the server code needs to be adapted and a new LZ release made.
It would seem a pity to stop the project before reaching AGZ's 30M games (maybe not even 20M?).
It seems recent nets are of very similar strength; who wins is mostly down to chance.
Has anyone tried https://arxiv.org/pdf/1902.02476.pdf?
Congratulations!
Blowing up the AGZ Elo graph shows that AGZ had a total of 269 upgrades. The following is an approximation of the last 7 upgrades, read off the AGZ Elo graph (columns: AGZ model, Elo, games). The first abnormally long upgrade, requiring more than 600K games, occurred after 2.34M games, at AGZ263. There is an Elo increase of 206 from AGZ263 to AGZ269, which corresponds to roughly a 75% win rate. Our first long upgrade occurred at LZ254. It would be great if we could get a 75% win over LZ254 before we close the project. We are probably just a few upgrades away.
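The ~75% figure can be sanity-checked with the standard Elo expected-score formula:

```python
# Expected score for a +d Elo difference: E = 1 / (1 + 10**(-d/400))
d = 206
print(1 / (1 + 10 ** (-d / 400)))  # ~0.765, i.e. roughly a 75% win rate
```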
A great next step would be to port KataGo to Android. Through the AQ Go app, the extended-training 15b Leela Zero nets are very strong on mobile devices. At a few seconds a move, you get a mid-dan-level opponent. Because KataGo has ladder code, it would be ideal in a mobile setting, especially since many mobile devices now support OpenCL.
The test match LZ260 vs LZ250 gave 57%, so there is still significant progress going on. Stopping here seems premature.
Just my personal opinion, but if leela-zero doesn't stop, SAI can't skyrocket.
Since yssaya has a fork of Leela Zero that does not try to escape a doomed ladder (https://github.com/yssaya/leela-zero-ladder/releases), and alreadydone has implemented code to allow LZ to play decent games with 0 komi and 15b nets, including bubblesld's extended training (https://github.com/alreadydone/lz/releases/tag/komi-v0.31)
I did not say which year, did I? Oops! With over 1 year of warning, and as far as I know 2 successors set up and running, it's time to stop Leela Zero. The training data and SGFs will be updated and stay up (as long as the OGS wants to fund it). I've asked Johnathan to start working on archiving the pages. Once he's done, we'll shut down the training machine and work/match server. It was nice to work with y'all, and we made some nice computer Go history.
Thank you very much for your invaluable contribution to computer Go, GCP!
Hello, I'm working with @lightvector on KataGo distributed (backend and infra side): https://katagotraining.org Kudos @gcp and @roy7, your work on Leela Zero has been an inspiration for me.
I believe that my growth is due to the LZ project led by @gcp and all the people involved in it. However, it seems that it was not a dream. With the LZ project, I have gained knowledge, sometimes asked questions and solved problems, and that dream is half fulfilled. I really appreciate all the people involved in the LZ project.
Thank you @gcp. Not only would SAI never have existed without Leela Zero (obviously), but my research life has also totally changed thanks to this adventure that started with your terrific project. Long life to Leela Zero! ...And please, contributors, consider helping SAI, which is a variable-komi heir to LZ, backward compatible with LZ networks, actively developed, but still at about 3.4 million games.
Hi all,
there's 500k games now without a network promoting. This means the training window is "full". I increased it to 750k games, and did a last learning rate lowering (0.00001 @ bs=96). If this doesn't result in a promotion, we've pretty much reached the end of the first training run, after just over 2 years.
As I stated before, I don't have enough time or energy to contribute any more (Leela Zero was very successful and achieved much more than I set out for, but it also ate a lot of my time, especially in the first year...), so I will not set up a second training run myself. There are many options from here, from someone picking and choosing the most promising improvements that have been proposed here and elsewhere to do a Leela Zero 2.0, to joining the community behind one of the other initiatives that built upon what we started (obviously SAI, but I heard KataGo is considering setting up a distributed effort as well).
I plan to keep the training server running until the 31st of January (if necessary bumping the playout count to add some more high quality data to the pool). If the last learning rate drop miraculously sends us improving again, I can extend that, but for now I would say that the community should probably plan its next steps with that date in mind.