
End of first training run - next steps #2560

Open
gcp opened this issue Dec 16, 2019 · 57 comments

@gcp
Member

gcp commented Dec 16, 2019

Hi all,

There are now 500k games without a network promotion. This means the training window is "full". I increased it to 750k games and did a last learning rate lowering (0.00001 @ bs=96). If this doesn't result in a promotion, we've pretty much reached the end of the first training run, after just over 2 years.

As I stated before, I don't have enough time or energy to contribute any more (Leela Zero was very successful and achieved much more than I set out for, but it also ate a lot of my time, especially the first year...), so I will not set up a second training run myself. There are many options from here, from someone picking and choosing the most promising improvements that have been proposed here and elsewhere to do a Leela Zero 2.0, to joining the community behind one of the other initiatives that built upon what we started (obviously SAI, but I heard KataGo is considering setting up a distributed effort as well).

I plan to keep the training server running until the 31st of January (if necessary bumping the playout count to add some more high quality data to the pool). If the last learning rate drop miraculously sends us improving again, I can extend that, but for now I would say that the community should probably plan its next steps with that date in mind.

@stephenmartindale

Well, let me start by saying CONGRATULATIONS.

Whatever the next steps may be, this is as good a time as any to acknowledge the incredible achievement that this "first training run" represents.

@l1t1

l1t1 commented Dec 16, 2019

What a pity! When I read the post, I felt like LZ was leaving us.

@Naphthalin
Contributor

I also wanted to take the opportunity to thank you for all these efforts, basically bringing AlphaGo Zero into our homes and out into the world. I hope that journey continues, most probably both with SAI focusing on a zero bot for 19x19 with the very nice winrate sigmoid idea, and with KataGo focusing on playing strength across different net sizes and rule sets, even if not everything is completely zero.

I would be happy if there were an LZ 2.0 run and would be glad to join the discussion on what to test etc. -- should that discussion happen here on GitHub or on Discord?

Concerning that last paragraph

(if necessary bumping the playout count to add some more high quality data to the pool)

I wanted to add that Lc0 tried simply increasing the playout count at the end of one of their runs, and it definitely caused a regression: with the increased playout count the visit statistics for moves changed quite massively, which apparently caused the net to focus only on "learning" the new policy distribution, and despite letting it run for a while it didn't recover. I therefore wouldn't recommend a change in playout count at such a low LR.

@Glrr

Glrr commented Dec 16, 2019

Since SAI and KataGo are already freely available, it would be pointless to mimic them. I think that at some point we should switch to another project. A collaborative training run for KataGo would be great!

@Friday9i

Friday9i commented Dec 16, 2019

Yeah, congrats and thx @gcp
A few ideas that could be tested, with limited coding requirements:

  • 50% or 52% gating
  • more noise for the first few moves in selfplay
  • some noise in match games (to limit long identical openings and the associated possible bias)
  • include a panel of short (eg 2-moves) zero openings to be used randomly in selfplay, to add more diversity
  • test again -m 999 (or -m 100 maybe)
  • play eg 10% of selfplay with 800 visits and 10% with 3200 visits
  • select some long and rarely played zero openings and play a % of selfplay games after these openings
  • limit time-consuming "no-resign" match games to 1%
  • etc

Most of these ideas require limited coding, or even no coding (eg -m 999, 50% gating, 1% no-resign)

@john45678

Congratulations on the wonderful achievements with Leela over the last two years; who would have thought the project would rise to such heights?
Thanks to GCP, Roy7 and all.

@Vandertic

Thank you @gcp. I can positively say that your project changed my research interests and my scientific life! Leela Zero is great!

@hwj-111

hwj-111 commented Dec 16, 2019

Model 255 will be the last network for Leela Zero (first training run)? Wow, 2^8 - 1, well done!

@Hersmunch
Member

Firstly, thank you. For almost literally everything :) I've had a great time working with you all on this project and have definitely learnt a few things along the way :)

As for ideas:

  1. Create a MuZero algorithm based fork
  2. Add the ability to run concurrent training runs with different parameters, possibly even different network architectures

Hmm, I guess those are the main things from me. It would be great to keep alive the community that has gathered, whether that is with this codebase or another. I love the idea that there's an open source resource for everyone to give ideas to and help with where they can :)
I would be happy to help where I can.

@lightvector

lightvector commented Dec 16, 2019

Leela Zero is amazing work, and it's remarkable that it made it so far despite being the first serious open-source attempt, mostly groping in the dark with regard to early bugs and to ambiguous hyperparameters and implementation details not mentioned in the papers.

Even retired, I expect Leela Zero will probably stand for quite a while as the top (taking into account both strength and ease of use for users) open source essentially-"zero" bot. Neither SAI nor KataGo is quite zero: both have liberty/chain-related features and learn from the board score, giving additional regularization and learning signal beyond just a binary win/loss (although in different ways in the two bots) - and SAI is also considering adding a change specifically targeted at improving ladder reading (or so I hear from Discord chat).

If there is any followup project, I would be excited if it were willing to try out any of the improvements in KataGo, to get independent verification of their usefulness. Precisely to have the chance to contribute back in that regard, KataGo has stayed very close to "zero" in spirit, and maybe I haven't advertised this as well as I could have, but most of the big improvements found are general (not Go-specific!) methods (or at least, they should be, assuming they work outside of KataGo):

The big such methods that are compatible with "zero":

  • Visit/playout cap randomization - improving value/policy data balance (see the sketch after this list).
  • Policy target pruning - enabling much greater experimentation with exploration and PUCT alternatives while reducing the risk of screwing up the policy target distribution, such as in the ways that @Naphthalin warned about above for LC0.
  • Predict the next few moves instead of only the current move - richer training signal, better regularization. And/or predict the next few board states.
  • Squeeze-excite / Global pooling
  • Using batch-renorm (as with LC0) or getting rid of batch norm entirely. LC0 explicitly found that batch norm caused problems that could not be corrected by any amount of training, the ELF OpenGo paper also mentioned fiddly issues with batch norm, and some preliminary experiments I've tried suggest that Fixup initialization is sufficient to train a net without batch norm, without apparent loss of strength (and without any additional regularization!).
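
To make the first item concrete, here is a minimal sketch of playout cap randomization (illustrative only, not code from KataGo; `search` and `record_training_target` are hypothetical hooks, and the caps and probability are example values):

```python
import random

FULL_VISITS, CHEAP_VISITS = 600, 100   # example caps, not necessarily KataGo's exact values
FULL_SEARCH_PROB = 0.25                # fraction of moves searched at the full cap

def self_play_move(position, search, record_training_target):
    """search(position, visits) -> {move: visit_count}; both hooks are hypothetical."""
    if random.random() < FULL_SEARCH_PROB:
        # Expensive search: a trustworthy policy target, so record it for training.
        visits = search(position, FULL_VISITS)
        record_training_target(position, visits)
    else:
        # Cheap search: just advances the game quickly (more games -> more value
        # data per GPU hour); this noisy visit distribution is not recorded.
        visits = search(position, CHEAP_VISITS)
    return max(visits, key=visits.get)   # play the most-visited move
```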

A second "true zero" run but leveraging some of these things (and/or other "zero-compatible" methods from SAI and MiniGo regarding temperature, game branching, and other hyperparameters) would be absolutely fascinating.
Edit: Note that while it's not absolutely guaranteed that all of the above will replicate - maybe they require tricky tuning or need to be used in conjunction with other things - unlike some other possible improvement ideas, these are NOT merely speculation. Empirical testing in KataGo really does show significant gains from them. Except maybe the batch-norm one; that one hasn't been fully tested yet.

Anyways, kudos to @gcp again for piloting the defining run of the first few years of post-AlphaZero computer Go.

@kityanhem

kityanhem commented Dec 17, 2019

I think Globis-AQZ will be open source in the future.
https://twitter.com/ymg_aq/status/1206119379777179649

In the 11th UEC Cup, she had a 90% winrate over the previous version (2019 CHINA SECURITIES Cup).
If she is stronger than LZ, can we use her self-play games like ELF's before?

@wpstmxhs
Contributor

wpstmxhs commented Dec 17, 2019

Thank you @gcp for the time you contributed to this project.
We've been really happy to be able to play with a strong Go program freely.
I do not doubt that you will have another huge success beyond Leela Zero, just as you did with this project.

And now, I'm wondering what the end of the Leela Zero project means.
Actually, this is the end of the first run, but not the end of the entire Leela Zero project.
In other words, it still has many possibilities to advance somehow in the future.
I cannot wait for Leela Zero 2.0!

Anyway, thank you @gcp again. I will remember you as the author of the first public value-network-based Go program (Leela 0.9) and the first successful open source reimplementation of AlphaGo Zero (Leela Zero).

@wpstmxhs
Contributor

@kityanhem I think they are going to release a weight trained for Japanese rules, which was successful in the UEC Cup.
If so, we can't use its self-play games as training data since the rules are different.

@nemja

nemja commented Dec 17, 2019

the visit statistics for moves quite massively changed which apparently caused the net to focus on "learning" the new policy distribution only

Note though how LZ has long been swapping Elf data in and out (between normal and bubblesld nets), which changed distribution on a weekly basis...

@bixbyr

bixbyr commented Dec 17, 2019

Another thing we might explore, in subsequent work on Leela or in follow-up community projects, is time management. For chess programs I know this was a very productive area of research. It could be explored using the "final" LZ network playing with different time management strategies.

One simple approach would be to take a snapshot of the distribution of visits every 5 seconds for a given position, and predict how much the distribution of visits will change over the next 5 seconds.

This actually sounds like a very doable side project. Does anyone have a sense of how hard it would be to dump the top-level position visit counts every N playouts?
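
A rough sketch of what that could look like, assuming a hypothetical `get_root_visits()` hook that returns the current {move: visit_count} map at the root (all thresholds are placeholders):

```python
import time

def total_variation(p, q):
    """Distance between two normalized visit distributions."""
    moves = set(p) | set(q)
    ps, qs = sum(p.values()) or 1, sum(q.values()) or 1
    return 0.5 * sum(abs(p.get(m, 0) / ps - q.get(m, 0) / qs) for m in moves)

def think_with_early_stop(get_root_visits, budget_s=30.0, interval_s=5.0,
                          stable_threshold=0.02):
    """Stop searching early once the root visit distribution stops moving."""
    start = time.time()
    prev = dict(get_root_visits())
    while time.time() - start < budget_s:
        time.sleep(interval_s)
        cur = dict(get_root_visits())
        if total_variation(prev, cur) < stable_threshold:
            break                      # distribution barely changed; save the time
        prev = cur
    final = dict(get_root_visits())
    return max(final, key=final.get)   # play the most-visited move
```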

@yssaya

yssaya commented Dec 17, 2019

We have an AlphaZero Shogi project (AobaZero).
Without Leela Zero, we could not have started. I am really thankful to this project.

In AobaZero, there was no improvement from 2600k to 3200k games.
But after that, we got about +60 Elo without any changes.
http://www.yss-aya.com/aobazero/index_e.html

So I think even if there is no progress over the latest 500k games,
selfplay might still find "something" needed to improve, with a certain probability.
AobaZero also uses the same window (replay buffer) size, 500k games.

@nemja

nemja commented Dec 17, 2019

Also not to be forgotten is the huge improvement on 15b after selfplay stopped, once higher-quality 40b games were fed in. So the current 40b likely still has a lot of room for further progress.

@gcp
Member Author

gcp commented Dec 17, 2019

Visit/playout cap randomization - improving value/policy data balance.

I am using this for ataxx (Dankx). I find it useful if only because it is an easy way to get additional randomness and exploration. I have not tried weighting based on the counts.

Predict the next few moves instead of only the current move - richer training signal, better regularization. And/or predict the next few board states.

This did not help at all in chess (Stoofvlees).

Squeeze-excite / Global pooling

SE nets were worse for chess (I know lc0 concluded the opposite!) but a large gain in ataxx. Not sure what to think of this. Because of lc0, I tried this multiple times, but all attempts failed.

I use global pooling rather than 1x1 down-convolution networks for all my new programs. I didn't even try alternatives, global pooling seems to achieve the same outcome in a cleaner way.
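
For illustration, a global pooling block of the general kind being discussed might look like this (a loose sketch of one common form, not code from any of the engines mentioned): pool part of the channels over the whole board and feed the result back as per-channel biases for the remaining channels.

```python
import numpy as np

def global_pooling_block(x, w_pool_to_bias, n_pool_channels=8):
    """x: (channels, height, width) activations for one position."""
    pooled_part, rest = x[:n_pool_channels], x[n_pool_channels:]
    # mean and max over the whole board for the pooled channels
    pooled = np.concatenate([pooled_part.mean(axis=(1, 2)),
                             pooled_part.max(axis=(1, 2))])        # (2 * n_pool,)
    channel_bias = w_pool_to_bias @ pooled                          # (channels - n_pool,)
    return rest + channel_bias[:, None, None]

x = np.random.randn(32, 19, 19)              # toy activations
w = np.random.randn(32 - 8, 2 * 8) * 0.1     # learned mixing weights (random here)
out = global_pooling_block(x, w)             # shape (24, 19, 19)
```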

Using batch-renorm (as with LC0) or getting rid of batch norm entirely....Fixup

I haven't tried these. BatchNorm for sure has some side effects. The original Leela used ELU units instead of BatchNorm, but this was before residual networks.

Remi has revealed that instead of using temperature, his Crazy Zero bots build an opening book of suspected equal positions to get the randomization. Stoofvlees does not use temperature either.

Another thing I found to help a lot...is to get rid of regularization entirely. Once you have enough games, overfitting isn't that much of a problem any more!

@Naphthalin
Contributor

Remi has revealed that instead of using temperature, his Crazy Zero bots build an opening book of suspected equal positions to get the randomization. Stoofvlees does not use temperature either.

There is a really beautiful solution to this "diversity by building its own book" kind of problem, which is directly related to #2230: using a policy softmax temperature slightly above 1.0 for training games (I found the range [1.15, 1.4] to work in theory and recommend using 1.2). The effect is basically a regularization of the top-move policies, so that all moves with a sufficiently small Q difference to the top move converge to a non-zero policy, while moves with significantly worse eval are not affected. I'd be happy to share more details with you if you are interested.
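
Concretely, a policy softmax temperature just divides the raw policy logits by T before the softmax; a minimal sketch (the example logits are made up):

```python
import numpy as np

def policy_with_temperature(logits, T=1.2):
    z = logits / T
    z -= z.max()              # numerical stability
    p = np.exp(z)
    return p / p.sum()

logits = np.array([2.0, 1.8, 0.0, -1.0])        # two near-equal moves, two weaker ones
print(policy_with_temperature(logits, T=1.0))   # sharper prior
print(policy_with_temperature(logits, T=1.2))   # near-equal moves stay closer together
```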

@Friday9i

Friday9i commented Dec 17, 2019

"Remi has revealed that instead of using temperature, his Crazy Zero bots build an opening book of suspected equal positions to get the randomization": I did a try to manually create a zero opening book here (#2104) with LZ
But that could be done automatically every few network (or for every new network) to create a 2 or 3-moves opening book. 2-moves book have ~200 positions while 3-moves deep books must have around 3000 positions, which is probably enough to provide good diversity and exploration.
Approach to create efficiently a broad zero opening book:

  • from root position, spend ~10 playouts for each legal move and record the list of moves within ~10% winrate from the best move (eg if best move has a winrate of 45%, select all moves above 35% winrate)
  • explore these moves with ~100 playouts dedicated to each of them, and record the list of moves within ~4% winrate from the best move
  • explore these moves further with ~1000 playouts for each of them, select moves within ~2% from best move: this list of moves is the "1-move opening book"
  • as it is done only once for every new network, it's reasonable to add one step and spend ~10K playouts for each of these moves, to ensure they are still within ~2% winrate from best move
  • then, create the 2-moves book: for each move in the "1-move opening book", iterate the same approach (test all legal moves with 10 playouts each, select the best ones and spend 100 playouts, select the best ones and spend 1000 playouts, list moves within 2% winrate): this constitutes the "2-moves opening book"
  • if needed, iterate a 3rd time from all positions after the 2-moves opening book, to create a "3-moves opening book"
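
A toy sketch of the widening-then-narrowing procedure above; `legal_moves`, `play` and `evaluate` are hypothetical hooks, and the playout counts and winrate margins mirror the numbers suggested in the list:

```python
def candidate_moves(position, legal_moves, evaluate):
    """Shrink the move list in stages of increasing playouts and tightening margins."""
    stages = [(10, 0.10), (100, 0.04), (1000, 0.02)]   # (playouts, winrate margin)
    moves = list(legal_moves(position))
    for playouts, margin in stages:
        rated = {m: evaluate(position, m, playouts) for m in moves}
        best = max(rated.values())
        moves = [m for m, wr in rated.items() if wr >= best - margin]
    return moves

def build_book(position, legal_moves, play, evaluate, depth=2):
    """All opening lines up to `depth` plies whose moves pass the margins at every ply."""
    if depth == 0:
        return [[]]
    lines = []
    for move in candidate_moves(position, legal_moves, evaluate):
        child = play(position, move)
        for tail in build_book(child, legal_moves, play, evaluate, depth - 1):
            lines.append([move] + tail)
    return lines
```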

This 2- or 3-move-deep opening book could be used for selfplay but also for match games. It's not clear whether the usual noise would still be useful for selfplay (I guess it would help somewhat).
BTW, two books could be created: a broad one as proposed above with moves within ~2% winrate, and a deeper, narrower one selecting moves within ~0.5% of the best move at each ply. Combining both books could help explore diverse positions while also spending more time on the most interesting ones.

PS: @Naphthalin's approach just above is another interesting alternative, working all along the game and not only in the opening phase. It's not clear to me whether it can totally replace the opening-book approach, or whether it is an additional tool to provide more diversity after the first few opening moves.

@Naphthalin
Contributor

PS: @Naphthalin's approach just above is another interesting alternative, working all along the game and not only in the opening phase. It's not clear to me whether it can totally replace the opening-book approach, or whether it is an additional tool to provide more diversity after the first few opening moves.

@Friday9i It is supposed to totally replace the opening book, as you still get enough diversity even with temp values around 0.8 (much lower than 1.0, significantly reducing the number of actual blunders).

@ihavnoid
Member

First, I would like to thank everybody (especially @gcp) who made such a valuable contribution. To me, leela-zero was a chance to learn how to write code and see how things evolve. It was exciting to see the same code start from making random moves and end up beating even the best human players in less than a year. Meanwhile I started with a single GTX1080... and now I have four GPUs running all sorts of different stuff. :)

That said, I don't think I will spend much time working on leela-zero anymore. I do expect that there may be better ideas that will end up improving things here and there, but it just feels like I won't be able to start something completely new and still end up being useful. At one point I developed a leela-zero instance with a web interface (which even plays handicap games at a reasonable level) at https://cbaduk.net/ - I am planning to keep it running until my PC dies. :)

Meanwhile I started some projects of my own - I am writing my own 'zero' implementation of Janggi - a Korean 'chess-like' board game - from scratch. This was more to sharpen my programming skills than to learn how to create the world's best AI.

Again, Thanks everybody, and bon voyage!

@arondes

arondes commented Dec 17, 2019

Leela Zero is the first open source 'zero-type' Go engine, and it's a great memory for me to have observed its entire evolution from random play. Nowadays, with a laptop, I can beat any professional Go player. Such an amazing experience!

@lightvector

@gcp - Thanks for the insights on the usefulness of some of the methods I listed for other games!

@SHKD13

SHKD13 commented Dec 17, 2019

I think Globis-AQZ will be open source in the future.
https://twitter.com/ymg_aq/status/1206119379777179649

In the 11th UEC Cup, she had a 90% winrate over the previous version (2019 CHINA SECURITIES Cup).
If she is stronger than LZ, can we use her self-play games like ELF's before?

Is the Globis-AQZ selfplay publicly available? Maybe you have a link for it?

@lcwd2

lcwd2 commented Dec 18, 2019

Alpha Zero uses a sliding window of the last 500K games. Each training cycle uses 2048 x 1000 states. Some of the states used in a training cycle will be good data and some will be bad; it depends on luck whether a cycle hits enough good data to create a promotion. The current state is that the window of games comes purely from the current best model. Perhaps it is worth sticking to the 500K window and letting it run for a few more months to see whether it can hit a good set of game states. Alpha Zero stopped at 21 million self-play games; we are now at 17 million self-play games.
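
Rough arithmetic on those figures (the ~200 recorded positions per game is an assumption, not a measured number): each training cycle only touches a small fraction of a 500K-game window.

```python
positions_per_cycle = 2048 * 1000      # states sampled per training cycle (as quoted above)
window_positions = 500_000 * 200       # 500k games x ~200 recorded positions per game
print(positions_per_cycle / window_positions)   # ~0.02, i.e. roughly 2% of the window per cycle
```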

@Vandertic

@lcwd2 where did you get those numbers? The Science paper reports 140 million games (but without the 8-fold symmetry data augmentation) and a 4096 minibatch size, while the pseudocode also reports a 1-billion-game window for training. This seems to be their second run, as the arxiv paper instead had one of your figures (21 million games), but not all of them, so maybe you have some source that I missed?
Also have a look at this and please tell me if you know something to be wrong. Thanks!

@gcp
Member Author

gcp commented Dec 18, 2019

Using batch-renorm (as with LC0)

I looked at this but I don't really buy it. The quoted advantages are being able to deal with small batch sizes due to RAM restrictions, dealing with interdependent samples, and avoidance of training vs inference batch differences. The first can be handled with gradient accumulation (which our code already supports! BATCH_SIZE vs RAM_BATCH_SIZE), the second is something we try very hard to avoid, and the third isn't really a factor for large batch sizes.

Testing gave me slightly worse results at a performance penalty during training (due to no cuDNN support). I wonder if lc0 got stuck with this because they didn't backport the gradient accumulation changes. OTOH, if there'd be no performance penalty, it might be a no-brainer as it probably allows you to be more careless during training.
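
For reference, gradient accumulation in the BATCH_SIZE vs RAM_BATCH_SIZE sense just sums per-chunk gradients before a single weight update; a toy sketch with a linear model (not the actual training code). Note that it does not change which samples the batch norm statistics are computed over.

```python
import numpy as np

BATCH_SIZE = 2048        # effective batch per weight update
RAM_BATCH_SIZE = 256     # largest chunk that fits in GPU memory at once

rng = np.random.default_rng(0)
w = np.zeros(8)
x = rng.normal(size=(BATCH_SIZE, 8))
y = x @ np.arange(8, dtype=float) + rng.normal(scale=0.1, size=BATCH_SIZE)

grad = np.zeros_like(w)
for start in range(0, BATCH_SIZE, RAM_BATCH_SIZE):
    xb, yb = x[start:start + RAM_BATCH_SIZE], y[start:start + RAM_BATCH_SIZE]
    # chunk's contribution to the full-batch mean-squared-error gradient
    grad += (2.0 / BATCH_SIZE) * xb.T @ (xb @ w - yb)

w -= 0.01 * grad         # one optimizer step with the accumulated gradient
```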

I will try Fixup next, it looks interesting enough.

There is a really beautiful solution to this "diversity by building its own book" kind of problem which is directly related to #2230 , which is using a policy softmax temperature slightly above 1.0 for training games (I found the range [1.15, 1.4] to work in theory and recommend using 1.2).

I use this for Ataxx, whereas Stoofvlees uses gradual UCT policy flattening (described in that issue IIRC).

I'm curious how you (theoretically) arrived at that range.

But note that the approaches we're discussing don't use temperature at all. As long as you can still explore, this significantly accelerates the learning on the value head for obvious reasons. Flattening the policy by itself doesn't cause any exploration at all.

@Naphthalin
Contributor

But note that the approaches we're discussing don't use temperature at all. As long as you can still explore, this significantly accelerates the learning on the value head for obvious reasons.

Which in my opinion is very similar to restricting the moves which temp can choose to some very conservative range, e.g. a maximum delta Q below the top move or a minimum number of visits, both of which eliminate moves the network already knows are blunders.

Flattening the policy by itself doesn't cause any exploration at all.

Exactly, that is the beauty of it: when only using policy flattening in match games (as Lc0 does with PST=2.2 there), this is true. However, this changes completely when even only slightly flattening the policies in the training games! Assuming that after a 1600-node search all moves have roughly the same Q+U value, and that the policies have converged to an equilibrium (while the value head is stationary), one can directly calculate the equilibrium policy distribution from the list of Q values. When flattening the policies in the right way, the policy term no longer cancels out of this equilibrium equation; instead, the policies of slightly suboptimal moves have a non-zero equilibrium, while bad moves are still pushed to zero. Currently the only thing preventing nearly optimal moves from going to 0 is the 25% Dirichlet noise, which provides some small regularization.

I'm curios how you (theoretically) arrived at that range.

Mainly by solving the equilibrium policy equation system numerically for a variety of combinations of Q values taken from different Leela nets, cpuct, N, PST and amounts of noise, both for chess and for Go. Going below the lower bound of around 1.15 means that only very small Q differences remain relevant (e.g. if 3-3 is roughly 1% behind 4-4 and 4-3, it won't get any significant policy); above 1.3, mostly the bad moves start to profit from the combination of Dirichlet noise and policy flattening.
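
For concreteness, here is a simplified toy version of that kind of fixed-point calculation (not the actual script used for the graphs, and ignoring Dirichlet noise): approximate the final visit allocation of a PUCT search from the prior and the Q values, train the prior toward it, flatten it with PST, and iterate.

```python
import numpy as np

def visit_allocation(prior, q, n_total, cpuct):
    """Approximate final PUCT visit fractions: at convergence every explored move
    shares the same Q + U value `lam`, i.e. n_i = cpuct*P_i*sqrt(N)/(lam - Q_i) - 1."""
    sqrt_n = np.sqrt(n_total)
    visits = lambda lam: np.clip(cpuct * prior * sqrt_n / (lam - q) - 1.0, 0.0, None)
    lo, hi = q.max() + 1e-9, q.max() + cpuct * sqrt_n + 1.0
    for _ in range(100):                     # bisection so that sum(visits) == n_total
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if visits(mid).sum() > n_total else (lo, mid)
    v = visits(0.5 * (lo + hi))
    return v / v.sum()

def equilibrium_policy(q, n_total=1600, cpuct=1.9, pst=1.2, iters=200):
    """Fixed point of the RL loop: the net learns the visit distribution, and at
    self-play time the prior is flattened by PST (prior proportional to target**(1/pst))."""
    p = np.full(len(q), 1.0 / len(q))        # prior the search actually uses
    for _ in range(iters):
        target = visit_allocation(p, q, n_total, cpuct)   # what the net is trained on
        p = target ** (1.0 / pst)            # flattened prior for the next generation
        p /= p.sum()
    return target

q = np.array([0.05, 0.04, -0.10, -0.30])   # best move, a near-equal one, two clear mistakes
print(equilibrium_policy(q, pst=1.0))      # the near-equal move collapses toward 0
print(equilibrium_policy(q, pst=1.2))      # it keeps a substantial non-zero policy
```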

I attached two example graphs for chess (where the smaller number of legal moves simplifies the numerical treatment a lot, but the figures would look nearly identical for Go), where one can see 1) how already a very small Q difference of 0.01 leads to drastic policy differences, and 2) how using a policy softmax temperature >1 in training (!) affects this:
[Graph: PST_vs_Orig_noise25_Sicilian]
[Graph: PST_variable_noise25_Sicilian]

@gcp
Member Author

gcp commented Dec 18, 2019

Which in my opinion is very similar to restricting the moves which temp can choose to some very conservative range, e.g. a maximum delta Q below the top move or a minimum number of visits, both of which eliminate moves the network already knows are blunders.

Well yes, if x=0 is good and x=1 is bad, then it's obvious that x=0.5 is less bad than bad, but it doesn't tell me why that is preferable over x=0. So this argument doesn't help us further?

However this changes completely when even only slightly flattening the policies in the training games!

I think this argument doesn't actually require temperature, correct? Only a feedback loop on policy percentages due to search effort redistribution after training a new policy?

above 1.3 mostly the bad moves start to profit from the combination of dirichlet noise and policy flattening

I can't tell this from the graphs. (I'm interested because I'm already doing this and empirically 2.0 worked better than 1.5 for me 😄) Is it because the bad moves might still turn out to be good because in practice the eval head can still be shifting around?

@lightvector

Using batch-renorm (as with LC0)

I looked at this but I don't really buy it. The quoted advantages are being able to deal with small batch sizes due to RAM restrictions, dealing with interdependent samples, and avoidance of training vs inference batch differences. The first can be handled with gradient accumulation (which our code already supports! BATCH_SIZE vs RAM_BATCH_SIZE), the second is something we try very hard to avoid, and the third isn't really a factor for large batch sizes.

The third can actually be a factor with large batch sizes. I recall the reason LC0 switched was that certain kinds of promotion moves being possible or not caused a strong enough difference in the activation of certain channels in their net that there was a nontrivial difference in the batch statistics depending on whether they were possible - and those moves were rare enough that a batch might contain zero instances of them. So you got different inference-time vs training-time scaling, empirically causing the policy at inference time to put almost zero weight on a move that it should have put weight on - a strange blind spot. (I could of course be misremembering or mischaracterizing why they switched.)

So I think generally the worry is about what batch norm does in situations like this - robustness in "rare" situations, blind spots, etc. Which unfortunately makes it a little harder to test than if you're just worried about "main-line" average performance. It might be much less of a worry for Go, since even the fact that the board is 19x19 means a wider surface over which to average channel activations.

@lcwd2

lcwd2 commented Dec 18, 2019

@Vandertic my source of information is the same arxiv paper. The Leela Zero Elo graph doesn't yet have the sharp turning point seen in the different versions of AlphaGo. A large portion of the AlphaGo training steps happen after the turning point. It would be interesting to have the data on how LZ behaves in the remaining 4 million games before closing this project.

@Naphthalin
Contributor

I think this argument doesn't actually require temperature, correct? Only a feedback loop on policy percentages due to search effort redistribution after training a new policy?

This is correct, it is a policy feedback thing only. The argument rather goes in the reverse direction; while currently temperature doesn't produce enough diversity, after that change suddenly even conservative temperature choices produce diverse training games as the policies are flattened in an "informed manner" through the RL loop.

I can't tell this from the graphs. (I'm interested because I'm already doing this and empirically 2.0 worked better than 1.5 for me 😄 ) Is it because the bad moves might still turn out to be good because in practice the eval head can still be shifting around?

For the last graph, by "bad moves" I mean that cyan line, which is slowly but significantly rising with PST, and which I assume corresponds to moves correctly evaluated as bad. The PST-noise interaction obviously can't be seen there, as all graphs use the standard 25% noise, but when reducing it, the cyan line would be significantly lower, while the others would basically stay the same. These policies shouldn't be too high, as otherwise search will waste some nodes there, and temperature might choose them too often.

I think the whole point of assigning a significant number of training games to seemingly suboptimal moves/lines as well is that they could turn out to be better/worse in the long run, and the magnitude of these fluctuations will be game specific. Just from anecdotal evidence, for LZ these fluctuations were <1.5%, while for Lc0 they are sometimes up to 8% (which means 4% for LZ due to the [-1,1] vs [0,1] ranges). To cover these fluctuations, Go might therefore get away with a slightly smaller PST parameter, and I can easily imagine that in your case fluctuations might be even higher. Did you try a net-specific CLOP for applying a PST value after training? For Lc0, they have been using 2.2 for a while now (with 1.0 in training), but recently had a first test run with 1.2 in training, where CLOP suggested using 1.5 in match play. If (as LZ does) PST 1.0 is used in match play, it is likely that a higher training PST will push the optimal match PST closer to 1.0, which might already explain your empirical result.

However, due to the saturation I see and the combination with noise, it might actually be better in your case to increase cpuct at the root only, as this effectively scales down the Q differences. The easiest thing would be if you could give me a list of Q values in your case and your choices of cpuct and N, and I could run my script to produce the expected equilibrium policies.

@gcp
Member Author

gcp commented Dec 18, 2019

The third can actually be a factor with large batch sizes...that there would be a nontrivial difference in the batch statistics depending on if they were possible or not - and were rare enough that you might have a batch have zero instances of the move.

Sounds a bit weird. If the effect is rare enough that some batches have 0 or 1, and the batches are, say, 2048 sized, then obviously the effect they can have is scaled down by the same 1/2048.

I checked and lc0 did end up adding gradient accumulation, so they shouldn't be limited to small batches. I couldn't find what batch size they use right now though.

I ended up digging more, and found: LeelaChessZero/lc0#784 (comment)

Note that this points out: "Currently the training code calculates the batch norm statistics using ghost batch norm of 64 positions."

So, this explains why they ended up needing batch renorm, but now I wonder what this ghost batch norm is about.

@gcp
Member Author

gcp commented Dec 18, 2019

So, this explained why they ended up needing batch renorm, but now I wonder what this ghost batch norm is about.

Okay, it's to match AZ0 better: LeelaChessZero/lczero-training#28

And because there's one paper claiming that it generalizes better. I am sure I could find multiple others claiming the exact opposite. But it does now make sense to me why they ended up with it.

@Ttl
Member

Ttl commented Dec 18, 2019

Note that with gradient accumulation, batch norm statistics are still calculated from the GPU batch size. The plane was normally zeroed and only had activations in rare cases. The issue is that batch normalization uses batch statistics during training and moving averages during inference.

Let's assume that the activation would be 1 when activated and 0 otherwise, and that the position occurs in one in a thousand training samples.

If the GPU batch size is 64, then with one positive example in the batch the calculated variance would be about 1/64, and with no positive examples in the batch the variance would be 0. But the variance with a very large batch size should be about 1/1000. So there is a significant discrepancy in the variance used for normalization during training whenever a positive example is in the batch. During inference, a moving-average variance of approximately 1/1000 would be used for normalization, which doesn't match the variance that was used in training.

Renorm uses moving variances for normalization with some clipping during the training and the variance used for normalization would be much closer to the real 1/1000 variance.
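
A quick numeric version of that example (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
population = (rng.random(1_000_000) < 1e-3).astype(float)   # feature firing ~1/1000 of the time
moving_var = population.var()                               # ~1/1000, used at inference

batch = np.zeros(64)
batch[0] = 1.0                                              # GPU batch with one positive example
batch_var = batch.var()                                     # ~1/64, used during training

# the same activation of 1.0 gets normalized very differently:
print((1.0 - batch.mean()) / np.sqrt(batch_var + 1e-5))         # ~7.9 in training
print((1.0 - population.mean()) / np.sqrt(moving_var + 1e-5))   # ~31 at inference
```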

@gcp
Member Author

gcp commented Dec 18, 2019

Note that with gradient accumulation batch norm statistics are still calculated from GPU batch size

Argh, that's a good point. It's clear why it's useful if you have small batches. The key thing to realize is that simply using moving statistics during training as well (which would be one's first idea...) doesn't work due to some surprising maths, and batch renorm is basically the workaround for that.

It didn't make much sense to me why it would be a problem with large batches. And I think we've now confirmed that this was just not the case for lc0 - they were/are using 64-sized batches as far as batch norm is concerned.
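
For reference, the correction batch renorm applies (per the Ioffe 2017 paper), sketched here per-feature; real implementations do this per channel and treat r and d as constants with no gradient:

```python
import numpy as np

def batch_renorm(x, moving_mean, moving_var, rmax=3.0, dmax=5.0, eps=1e-5):
    """Normalize with batch statistics, then correct toward the moving statistics."""
    mu_b, sigma_b = x.mean(), np.sqrt(x.var() + eps)
    sigma_m = np.sqrt(moving_var + eps)
    r = np.clip(sigma_b / sigma_m, 1.0 / rmax, rmax)          # stop-gradient in practice
    d = np.clip((mu_b - moving_mean) / sigma_m, -dmax, dmax)  # stop-gradient in practice
    # equals (x - moving_mean) / sigma_m whenever r and d are not clipped,
    # but gradients still flow through the batch statistics mu_b and sigma_b
    return (x - mu_b) / sigma_b * r + d
```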

@l1t1

l1t1 commented Dec 18, 2019

If someone would like to run the next training run of Leela Zero, what would they need to do? For example, rent a server and storage, pay for Internet traffic, power costs and so on? Could they still use the zero.sjeng.org domain name?

@bvandenbon

bvandenbon commented Dec 20, 2019

Thank you @gcp, for starting this project and making it so successful.
You made an extreme impact on the Go community with it, and probably touched the lives of millions. It's certainly something to be proud of, an amazing accomplishment!

Before Leela Zero, I was proud to tell people that I played about 1 game of Go a day. I always did some Go software development on the side. But at the end of 2018 I pretty much stopped playing Go at all. I have been busy working on ZBaduk ever since. So, without Leela Zero, I would be playing Go now, but instead I am working on the most challenging NodeJS/Angular project I have ever worked on. - And this forum is filled with these kinds of stories.

Of course I would have loved to see you continue this project until the end of time. But I guess you have already made up your mind. (I am slowly moving through the 7 steps of acceptance.)

@hank93304

This is a great project! As a fan of Go, I enjoy playing with Leela Zero.

Congratulations on the new weight #255! Thanks @bjiyxo for the training.

@lcwd2

lcwd2 commented Dec 20, 2019

My evaluation shows that LZ255 only wins 176 out of 428 games against LZ254 (41.1%). We'll probably get the next upgrade very soon.

@l1t1

l1t1 commented Dec 21, 2019

256 is born

Start Date        Network Hashes         Wins : Losses        Games      SPRT
2019-12-21 01:27  482b46a3 VS b12e8925   164 : 121 (57.54%)   285 / 400  PASS

@TFiFiE
Contributor

TFiFiE commented Dec 23, 2019

Since I think it would be an awful waste if such a still-significant contributing pool were to dissipate just like that, I'd like to once again propose starting to train networks with a modest komi range of 5.5 to 8.5 inclusive (#1625 (comment)), while having test matches played with a komi of 7 (which has the added benefit of making the current specialized networks possibly beatable by new, more generalist ones). I think that #1825 is ready to go and that only the server code needs to be adapted and a new LZ release made.

@nemja

nemja commented Dec 24, 2019

Would seem a pity to stop the project before reaching AGZ's 30M games (maybe not even 20M?).

@l1t1

l1t1 commented Dec 26, 2019

It seems recent nets are of very similar strength; which one wins is mostly down to chance.

@barrtgt

barrtgt commented Dec 30, 2019

Has anyone tried https://arxiv.org/pdf/1902.02476.pdf?

@jason19659

Congratulations!

@lcwd2

lcwd2 commented Jan 9, 2020

Blowing up the AGZ Elo graph shows that AGZ had a total of 269 upgrades. The following is an approximation of the last 7 upgrades, read off the AGZ Elo graph.

AGZ Model   Elo    Games
AGZ263      4979     651k
AGZ264      5075   1,490k
AGZ265      5094     335k
AGZ266      5094     354k
AGZ267      5168     790k
AGZ268      5174   1,953k
AGZ269      5185

The first abnormally long upgrade, requiring more than 600K games, occurred after 2.34m games at AGZ263. There is an Elo increase of 206 from AGZ263 to AGZ269, which is around a 75% win rate. Our first long upgrade occurred at LZ254. It would be great if we could get a 75% win rate over LZ254 before we close the project. We are probably just a few upgrades away.
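
For reference, the standard Elo-to-winrate conversion behind that figure:

```python
elo_diff = 206
expected_score = 1 / (1 + 10 ** (-elo_diff / 400))
print(f"{expected_score:.0%}")   # ~77%, i.e. roughly the "around 75%" quoted above
```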

@portkata

portkata commented Feb 2, 2020

A great next step would be to port KataGo to Android. Through the AQ Go app, the extended-training 15b Leela Zero nets are very strong on mobile devices: at a few seconds a move, you get a mid-dan-level opponent. Because KataGo has ladder code, it would be ideal in a mobile setting, especially since many mobile devices now support OpenCL.

@nemja

nemja commented Feb 3, 2020

The test match LZ260 vs LZ250 gave 57%, so there is still significant progress going on. Stopping here seems premature.

@wonderingabout
Contributor

wonderingabout commented Feb 7, 2020

Just my personal opinion, but if Leela Zero doesn't stop, SAI can't skyrocket.
We learned a lot from Leela Zero and it was a great experience in many ways, but it's pointless to keep going at the end of this run; it's already near stalling, and there's no need to crawl on endlessly.
While new people are invested in new ideas, I'd rather push and motivate them.
And SAI is basically just Leela Zero 2.0.

@portkata

Since yssaya has a fork of Leela Zero that does not try to escape a doomed ladder (https://github.com/yssaya/leela-zero-ladder/releases), and alreadydone has implemented code that allows LZ to play decent games with 0 komi and 15b nets, including bubblesld's extended training (https://github.com/alreadydone/lz/releases/tag/komi-v0.31),
how hard would it be to combine the code and have a release that can play decent games at 0 komi and not try to escape a doomed ladder?

@gcp
Member Author

gcp commented Feb 8, 2021

I plan to keep the training server running until the 31st of January

I did not say which year, did I? Oops!

With over 1 year of warning, and as far as I know 2 successors set up and running, it's time to stop Leela Zero. The training data and SGF will be updated and stay up (as long as the OGS wants to fund it).

I've asked Johnathan to start working on archiving the pages. Once he's done, we'll shut down the training machine and work/match server. It was nice to work with y'all, and we made some nice computer go history.

@SHKD13

SHKD13 commented Feb 8, 2021

Thank you very much for your invaluable contribution to computer Go, GCP!
Leela Zero made our dreams of superhuman AI on home PCs come true. An absolute legend.

@tychota

tychota commented Feb 8, 2021

Hello, I'm working with @lightvector on KataGo's distributed training (backend and infra side): https://katagotraining.org

Kudos @gcp and @roy7, your work on Leela Zero has been an inspiration for me.
And happy retirement, Leela Zero!

@intenseG

intenseG commented Feb 9, 2021

I believe that my growth is due to the LZ project led by @gcp and all the people involved in it.
Developing a bot that mimics my Go style was a dream come true.

However, it seems that it was not a dream.

With the LZ project, I have gained knowledge, sometimes asked questions and solved problems, and that dream is half fulfilled.

I really appreciate all the people involved in the LZ project.
And thank you for all your hard work!

@Vandertic

Thank you @gcp. Not only would SAI never have existed without Leela Zero (obviously), but my research life has also totally changed thanks to this adventure that started with your terrific project.

Long live Leela Zero! ...And please, contributors, consider helping SAI, which is a variable-komi heir to LZ, backward compatible with LZ networks and actively developed, but still at about 3.4 million games.
