8M games - training progress note #1569

Open
gcp opened this issue Jun 18, 2018 · 72 comments

@gcp
Member

gcp commented Jun 18, 2018

Small note, because I know many people will start "panicking" as we have 200k games with no notable strength progress (i.e. no PASS).

One reason why progress dropped a bit is that the ELF games left the training window. This wasn't intentional. I tried a quick fix (I was traveling) of increasing the window to 800k but the result wasn't particularly good. I somewhat rewrote the dumping code now to correctly deal with this and the next run should be closer to what was intended (~230k ELF + 250k regular). We'll see if this improves things. There'll be some delay before the next bunch of networks as I was fiddling with the training setup to get it debugged.
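
(As a rough sketch of the intent, not the actual dump code — the file layout and names below are made up — the window is now built from the newest regular selfplay games plus a fixed block of ELF games, rather than a single most-recent-N slice that lets the older ELF games fall out:)

```python
# Hypothetical sketch of building a mixed training window: take the newest
# regular selfplay chunks plus a fixed block of ELF chunks, instead of one
# most-recent-N window that lets the (older) ELF games age out.
from pathlib import Path

def build_window(data_dir, n_regular=250_000, n_elf=230_000):
    regular = sorted(Path(data_dir, "selfplay").glob("*.gz"))[-n_regular:]
    elf = sorted(Path(data_dir, "elf").glob("*.gz"))[-n_elf:]
    return regular + elf  # shuffling is left to the training pipeline

window = build_window("train_data")
print(len(window), "games in the training window")
```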

If these fixes don't produce progress, the next step will be a lowering of the learning rate. Because we're using SWA this might not necessarily produce a jump either.
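
(For context: with SWA, roughly speaking, the network that gets tested is an average of several recent checkpoints, which is why an optimizer change does not necessarily show up as an immediate jump. A minimal sketch of the averaging, with made-up tensor names, not the actual training code:)

```python
# Minimal sketch of stochastic weight averaging (SWA): the published network
# is an equal average of the last few training checkpoints.
def swa_average(checkpoints):
    """checkpoints: list of dicts mapping tensor name -> list of floats."""
    n = len(checkpoints)
    avg = {}
    for name in checkpoints[0]:
        avg[name] = [sum(ckpt[name][i] for ckpt in checkpoints) / n
                     for i in range(len(checkpoints[0][name]))]
    return avg

nets = [{"conv1": [0.1, 0.2]}, {"conv1": [0.3, 0.4]}, {"conv1": [0.2, 0.6]}]
print(swa_average(nets))  # ~ {'conv1': [0.2, 0.4]}
```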

It would be good to be able to start training and comparing 256x20 now, to figure out if network size is the issue (it sounds likely, but we do need confirmation in the form of a stronger 256x20 without which we can't do anything anyway!). It is really extremely unfortunate that we have the problem in #167 now. If this does not get resolved in a few days I will see if I can get some other hosting set up. (Data loss is not an issue, I have 2 full backups of the database...)

@herazul

herazul commented Jun 18, 2018

How many gigabytes is the necessary training data? (I mean, the data we need to train a new network can't go back that far; 128x10 training games can't be valuable now, for example.)
What is the total size in GB of the 192x15 games plus the ELF games?
Honestly, it may be doable to just torrent it. If we have several hundred people ready to run autogtp every day, we might also have enough people to host the data ourselves.
For example, I would be ready to seed it around the clock on a gigabit connection on my server, and I don't think I would be the only one. Even if there were only about 10 of us, it would be a lot better, faster and more reliable than almost any hosting we could find.

@gcp
Member Author

gcp commented Jun 18, 2018

A full window is about 20G-30G IIRC.

The problem is that it evolves a few times a day so it's not very fit for torrenting. The storage server that broke was very nice because I could update it almost incrementally with rsync.

At worst I'll rent a dedicated server with a big HDD for a while.

@gcp
Member Author

gcp commented Jun 18, 2018

The problem is that it evolves a few times a day so it's not very fit for torrenting.

To be clear, the data for old networks doesn't change any more, so that part you can torrent. But of course if you're trying to beat the latest network, you want the data for that latest network, and this gets a bit messier.

@herazul

herazul commented Jun 18, 2018

Yeah, it would not be as practical, but it would be possible to create a new torrent on a schedule, say every 2 days. That much delay might be acceptable for training experiments.

Renting a server could be doable too; don't hesitate to create a GoFundMe or something like it to help with the cost. I think many of us would donate a little to help with the renting.

@barrtgt

barrtgt commented Jun 18, 2018

Is there room for improvement with SWA?

@LRGB

LRGB commented Jun 18, 2018

@gcp FYI, bjiyxo released v15 of his 256x20 network a few days ago. It hasn't been queued yet.

@l1t1

l1t1 commented Jun 19, 2018

The webpage shows the training games of 256x20 v15 as 8.121M, which is too high; as bjiyxo says, it was only trained up to LZ146.

@roy7
Collaborator

roy7 commented Jun 19, 2018

I wasn't sure what number to use. Should I change it to read 7690965?

I changed it to 7690965.

@bjiyxo

bjiyxo commented Jun 19, 2018

@gcp Can you post the new raw training data (including 147, 148, and ELF) on Google Drive or Dropbox? Then I can train a new 20b to catch up to the latest 15b.

@carljohanr

Did the previous 20b include any ELF games?

@bjiyxo

bjiyxo commented Jun 19, 2018

@carljohanr Yes, including ~ 150k ELF games.

@ThorAvaTahr

You attribute the slow progress to the lack of ELF games in the window. However, if that is the case, I am rather worried about the intrinsic rate of progress of our training pipeline.

It looks as though our training hardly improves strength, while apparently there is still a lot of room for improvement. Of course I understand that learning goes faster with a good tutor, but it seems to have nearly stalled without one.

@gcp
Member Author

gcp commented Jun 19, 2018

You attribute the slow progress to the lack of ELF games in the window. However, if that is the case, I am rather worried about the intrinsic rate of progress of our training pipeline.

Our progress would clearly have been slower without the ELF data. That's why we are using it! I am not sure whether we would have stalled without it, or would have needed to jump to 256x20 sooner, but it seems likely.

(Of course, in terms of time, generating the ELF data took time that could have been used for "regular" training games)

@Friday9i

@ThorAvaTahr Not so worried on my side. I think we are "not so far" from the theoretical maximum of a 192x15 network, hence the selfplay improvement necessarily slows down. What kept a relatively good pace of progress was ELF, which strongly accelerates the process compared to a pure selfplay approach. Once you remove ELF, it is not accelerated any more, and as we are close to the max, it is (very) slow... JMHO.

@john45678

Did anyone notice that each time GCP posted about the current situation and what the plans were...we'd have a PASS in the next few hours :)

@TFiFiE
Contributor

TFiFiE commented Jun 19, 2018

Not to be a spoilsport, but the irony of that disappears with the knowledge these posts tend to be accompanied with a lowering of the learning rate.

@l1t1

l1t1 commented Jun 19, 2018

Maybe LZ148 is better than 149: it beat 20 weights and lasted 20k games, while 149 only beat 1 weight, was beaten by 2 weights, and only lasted 6000 games.

@gcp
Member Author

gcp commented Jun 19, 2018

Not to be a spoilsport, but the irony of that disappears with the knowledge these posts tend to be accompanied with a lowering of the learning rate.

The learning rate was not changed, fixing the dump of the window was enough (for now).

@gcp
Member Author

gcp commented Jun 19, 2018

Can you post the new raw training data (including 147, 148, and ELF) on Google Drive or Dropbox?

I'll update the #167 topic with some temporary links.

@2ji3150

2ji3150 commented Jun 20, 2018

The selfplay games of ELF are approaching 250k. Will we train a super ELF (224x20) to replace ELF for generating new, better selfplay games?

@l1t1

l1t1 commented Jun 20, 2018

The ELF selfplay will hit 250k games today; will @gcp stop ELF play tomorrow?

@TFiFiE
Contributor

TFiFiE commented Jun 20, 2018

So it's normal to expect a point at which a network of a given size can still be improved but only by training games that come from a larger network?

@Friday9i

@TFiFiE It's not true theoretically: after an infinite amount of efficient selfplay training, the net should reach its maximum performance whether or not it is trained with a larger net.
But practically speaking, the convergence slows down exponentially... Hence, it's probably more efficient to train a larger net to get stronger games and then train the smaller net on them: it should reach its maximum level faster. That seems logical, and the experience with ELF seems to confirm it.

@alphaladder

alphaladder commented Jun 20, 2018

@2ji3150 ELF is just a helper for LZ. Why do we need to discard our LZ and train a so-called "super ELF II"?

But we do need to enlarge our network for a better LZ, in order to beat and absorb the ELF weights.

@ryouiki

ryouiki commented Jun 21, 2018

2ji3150 means that since we have a good training set (>250k games) to enhance ELF itself, we could try to train a better ELF separately. If we got a better ELF net, it might help LZ training too.

@l1t1

l1t1 commented Jun 22, 2018

The ELF selfplay games are over 250k.

7933493 total self-play games (20519 in past 24 hours, 625 in past hour, includes 250252 ELF).

@wonderingabout
Contributor

wonderingabout commented Jun 22, 2018

Most of you seem to think only about how to make LZ stronger, instead of appreciating the progress of the project as it is.
I think I can say confidently that LZ will never reach AlphaGo level, and even ELF level is still far off.
I think the LZ project should think more about how useful it is rather than how powerful.

Also, 250k is just an estimate; it could be 260k, 300k or 350k, depending on how things go.
And I still believe 192x15 has more room for improvement, provided we wait long enough (no network in 5 days is not dramatic), so I would still wait a few more weeks and see how things go.

My suggestion would be fewer matches: instead of running them every 30k games, every 50-60k games should be quite safe, and start with the +64.0k networks (as +32.0k rarely passes). The computing power saved can be invested in more selfplay training.

EDIT: +30k new games after 8M total games naturally have much less impact than +30k new games at 1M total games, so maybe it's the right time to widen the window.

@alphaladder

alphaladder commented Jun 22, 2018

@wonderingabout I disagree with your suggestion, which would just disturb normal LZ training.
Note that our goal is to obtain another AGZ. Besides, you could start another project based on your own ideas, and that might be reasonable.

@wonderingabout
Contributor

wonderingabout commented Jun 22, 2018

@alphaladder

Sorry, but you're dreaming...
With the current computing power it would take 10-20 years for LZ to reach AlphaGo level, and by that time other tools like ELF may have been released. So again, the strength race, while enjoyable, should not be LZ's main objective; as an open-source project it should focus on being useful for the Go community, amateurs as well as the pro scene. (And for AlphaGo, the learning curve starts with a peak and then stays close to flat for a long time; see the AlphaGo curve.)
So far, LZ has had a very positive effect on the Go community: promoting the game of Go, creating a positive dynamic that thrills and motivates Go players, giving a better understanding of the game and a refreshing non-human approach to it, providing review tools and a big selfplay game database for online viewing, and pushing big companies like Tencent or Facebook to open-source their AIs. From a programmer's and developer's perspective, the data and experiments could be applicable to other, non-Go projects, and it has inspired projects like the neural-net Leela Chess Zero, etc.

So I think LZ continuing to work in the same direction would be much appreciated, at least by me. Among possible projects, the ones that first come to my mind are generalized komi (being able to play with -50.5 komi against dan players, for example), support for handicap games with realistic moves, and customizable, realistic and reliable difficulty (I know you can pick earlier networks to lower the difficulty, but these have many flaws, in particular falling into ladders).
Again, while enjoyable, I think if you're following LZ only for the strength race, you're missing its most interesting points.
Personally, I'm also looking forward to LZ's improvement and growth (the "how" more than the "up to which point"), but I don't expect it to reach AlphaGo; I just enjoy seeing it improve as much as it can.

See my earlier comment for the EDIT.

EDIT 2: For customizable difficulty, make LZ play moves that aim for e.g. a 30% winrate and count that as a "win"? Or a variable target winrate during the game?
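
(A rough illustration of that idea, with a purely hypothetical helper rather than an existing LZ option: from the candidate moves and winrates a search reports, pick the move whose winrate is closest to the target.)

```python
# Hypothetical sketch of "target winrate" difficulty: pick the candidate move
# whose evaluated winrate is closest to the desired target, instead of the best one.
def pick_move_for_target(candidates, target_winrate=0.30):
    """candidates: list of (move, winrate) pairs, e.g. from a search summary."""
    return min(candidates, key=lambda mv: abs(mv[1] - target_winrate))[0]

# Example: with a 30% target the engine deliberately keeps the game close but losing.
moves = [("Q16", 0.55), ("D4", 0.48), ("C3", 0.31), ("T1", 0.05)]
print(pick_move_for_target(moves))  # -> "C3"
```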

@bjiyxo

bjiyxo commented Jun 22, 2018

Don't take ELF for granted. FAIR is not a Go company; we might need to reach AlphaGo Zero by ourselves. As far as I'm concerned, I will always support a Go AI that constantly improves itself, no matter how long it takes.

@l1t1

l1t1 commented Jun 27, 2018

I am afraid that if the new weights become too strong, others can train with them, and the risk of LZ losing in AI competitions increases.

@wonderingabout
Contributor

wonderingabout commented Jun 27, 2018

I'm curious to see how LZ152 compares to ELF now.
I wonder if it can push ELF below an 80% winrate; either way, the test would be interesting.
The last test was on 3 June, against LZ147, with an 86% winrate for ELF.
Is a match planned for the future?

@Friday9i

Friday9i commented Jun 27, 2018

The difference was about 315 Elo with an 86% winrate. Since then, LZ has progressed around 205 Elo, but that is selfplay Elo, so the real improvement is more like ~80 Elo, meaning the remaining difference should be around 230 Elo, i.e. a ~79% winrate for ELF vs LZ152. But LZ vs ELF is becoming a kind of selfplay scale too, with an inflated rating, so the difference should be smaller than 230 Elo: hard to guess, but I'd say ~180 Elo, i.e. a ~74% winrate.
All in all, I'd say around 75% for ELF vs LZ152. Hopefully we'll know soon.
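
(For reference, those winrate figures follow from the standard logistic Elo model; a quick sanity check in Python:)

```python
# Standard Elo-difference -> expected winrate conversion used for the estimates above.
def elo_to_winrate(elo_diff):
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

for diff in (315, 230, 180):
    print(diff, round(elo_to_winrate(diff) * 100, 1))
# 315 -> ~86.0%, 230 -> ~79.0%, 180 -> ~73.8%
```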

@john45678

I'm a little less optimistic, so I'll say 79% for ELF (vs. LZ152).

@Friday9i

78% after (only) 37 games: it seems we are not too far off, but this is still extremely noisy; we have to wait for at least 100 games to get a better idea!

@Mardak
Collaborator

Mardak commented Jun 27, 2018

I just updated the 3-3 knight's move tracking #1442

The prior for the critical move of the joseki increased suddenly: 17% at LZ148, 32% at LZ150, 46% at LZ152. Perhaps, in addition to ELF reaching half of the window, newer training data is pushing out data from much older networks that would have trained towards a ~10% prior instead of ELF's ~100% prior.

Here's the progress of the joseki for just the 192x15 networks every 5 since turning on ELF:
[graph: prior for the joseki move across the 192x15 networks, every 5 networks since ELF was enabled]

@bjiyxo

bjiyxo commented Jul 7, 2018

@gcp We should lower the learning rate in my opinion if there isn't any progress on 15b today.

@diadorak

diadorak commented Jul 8, 2018

200k without a promotion. It's time for LR drop :)

@jkiliani

jkiliani commented Jul 9, 2018

@bjiyxo's point still holds in my opinion. With progress as slow as it currently is, you eventually get lucky passes. A change to the LR and a plan to switch to 256x20 soon would make a lot of sense now.

@l1t1

l1t1 commented Jul 9, 2018

Too early to call it promoted: in the next 58 games it needs 30 wins to pass, so it can afford only 28 losses.

@Marcin1960

@jkiliani "A change to LR and a plan to soon switch to 256x20 would make a lot of sense now."

Perhaps, once lowering the LR is exhausted, we could try lowering the promotion threshold to 52% or 51% for a week or so? It would not hurt, would it?

@l1t1

l1t1 commented Jul 9, 2018

It finally passed; now we can wait another 10 days for the next one.

@kityanhem

Two nets appeared in a row.
I feel like 15b has no limit (just kidding!)

@wonderingabout
Contributor

wonderingabout commented Jul 9, 2018


No promotion in 10 days, then 7 nets appear in a row.
It was so fast that 2 networks didn't have time to be promoted (one with 83%!!).
One network was even promoted twice; is that normal behaviour?
I feel like 15b has no limit (just kidding @l1t1, I like your contributions to LZ, so it's a fun way to thank you).

@gcp
Member Author

gcp commented Jul 9, 2018

I lowered the learning rate at around 200k games, but 7ff1 passed while still using the old rate.

@wonderingabout
Contributor

wonderingabout commented Jul 9, 2018

On a more serious note, I find it a bit strange that LZ gets 2 (and maybe 3) promotions in a row after this 200k-game stagnation.
It could be that LZ figured out a lot of things in the last few games, but I find this explanation strange and baseless.
Or it is possible that one of the networks was a false positive, making it easier to beat once (and maybe twice, with a "random" double promotion).
This is all interesting though; I'm curious about the impact of the lower learning rate in the future.

@remdu
Contributor

remdu commented Jul 9, 2018

@gcp What is the current learning rate ?

@jokkebk

jokkebk commented Jul 9, 2018

I've also wondered if there is a trend where many recent "long-reigning champions" (networks that last 50k+ selfplay games without any new net reaching 55% against them) are first beaten, and then those new networks usually get beaten themselves within 5k games or so. "The revolution eats its children"?

The learning rate was not the culprit for many of these, as it was only changed now. Could be just random noise, of course.

@l1t1

l1t1 commented Jul 10, 2018

@gcp Why can a lower learning rate speed up promotion?
I found an article at https://blog.csdn.net/Uwr44UOuQcNsUQb60zk2/article/details/78566465 in which the author says:
Training should begin with a relatively large learning rate, because at the start the random initial weights are far from the optimal values. During training, the learning rate should be reduced to allow finer-grained weight updates.

@NhanHo

NhanHo commented Jul 10, 2018

@jokkebk That is actually fairly easy to explain: those were networks from the same training batch. As they were trained with more training steps on the same data, they got stronger.


@Tsun-hung

@NhanHo Do you plan to train a new network from 1286 using later raw data, as you did for the 645 network?

@gcp
Member Author

gcp commented Jul 10, 2018

Why can a lower learning rate speed up promotion?

If we're in between your two examples, it's as if the parameters jump from one side of the ramp to the other without going down (but the steps are not so big that they diverge upward either). If you halve (or further reduce) the step size, it will go right down to the bottom.
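
(A toy illustration of that effect with plain gradient descent on a 1-D quadratic — not the actual training setup:)

```python
# Toy example: gradient descent on f(x) = x^2 (gradient 2x).
# With lr = 0.9 the updates overshoot and keep hopping across the minimum;
# halving the learning rate lets the same start converge smoothly toward 0.
def descend(lr, x=1.0, steps=10):
    for _ in range(steps):
        x -= lr * 2 * x  # gradient step
    return x

print(descend(0.9))   # ~0.107: still bouncing from side to side
print(descend(0.45))  # ~1e-10: goes right down to the bottom
```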

@gcp
Member Author

gcp commented Jul 10, 2018

and then those new networks usually get beaten themselves within 5k games or so

One other factor might be that on a new iteration the learning starts from the best network, so you're now starting the learning off of a very good spot and adding a lot of iterations onto that. This could especially be true for 256k promotes.

@gcp
Member Author

gcp commented Jul 10, 2018

@gcp What is the current learning rate ?

0.0001 @ batch size = 256.

@jkiliani

@gcp: Could you please queue @bjiyxo V18 for a match?

@wonderingabout
Contributor

wonderingabout commented Jul 15, 2018

LZ just got a new promotion.
Since it has been 400k games and 4 promotions since the last ELF match, is it the right time to queue another ELF match, or should we wait for one more network?

@l1t1

l1t1 commented Jul 16, 2018

lz 156 vs elf winrate 23-25%

@Mardak
Collaborator

Mardak commented Jul 16, 2018

Anyone know if ELF is correctly resigning in the 300+ move games?

http://zero.sjeng.org/match-games/5b4be7022f06263c66c692a7

http://zero.sjeng.org/viewmatch/580fc26fdfa45d8707374cef8018a03424df3d06eff03cd0c68fa295da6132b9?viewer=wgo
http://zero.sjeng.org/viewmatch/695570224e2b2a9e617afb9e411821787afa9b357ed5da2b044aae3382f45c4f?viewer=wgo

Looking at the average network eval (no search), ELF thinks the win rate is around 15% while LZ156 thinks it's closer to 25% and 40%. So I suppose at least both agree that ELF was in a losing position…

@ChinChangYang
Contributor

To verify whether the resignation was correct or not, could Leela Zero continue the game without resigning? Then score the final position to see which side (black or white) wins.
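
(As a rough illustration of that idea, here is an untested sketch assuming a leelaz-style GTP engine; the flags, file names and the side to move are placeholders to check against the engine's --help, while loadsgf/genmove/final_score are standard GTP commands:)

```python
# Rough sketch (untested): reload a resigned game into a GTP engine with
# resignation disabled, let it play the position out, then score it.
import subprocess

def gtp(proc, command):
    proc.stdin.write(command + "\n")
    proc.stdin.flush()
    reply = []
    while True:
        line = proc.stdout.readline()
        if line.strip() == "":              # GTP responses end with a blank line
            return "".join(reply).strip().lstrip("= ")
        reply.append(line)

engine = subprocess.Popen(
    ["./leelaz", "--gtp", "--weights", "best-network.gz", "--resignpct", "0"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

gtp(engine, "loadsgf resigned_game.sgf")    # position where the resignation happened
to_move, last_move = "white", ""            # assume White resigned; adjust as needed
while True:
    move = gtp(engine, f"genmove {to_move}").lower()
    if move == "pass" and last_move == "pass":
        break                               # both sides passed: game over
    last_move = move
    to_move = "black" if to_move == "white" else "white"

print(gtp(engine, "final_score"))           # e.g. "B+3.5"
```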

@wonderingabout
Contributor

wonderingabout commented Jul 16, 2018

Some games are wrongly resigned, but it is generally less than 5%, so over 400 games we can assume the proportion of wrongly resigned games is about the same.
So, even with a margin of error of around 5%, it seems Leela Zero got significantly stronger in the last 400k games.

Thanks for doing the ELF test; I was surprised to see it improve that much (keeping the possible error margin in mind).
