
Tian Yuandong: Summary of the ELF World Artificial Intelligence Competition #1673

Closed
l1t1 opened this Issue Jul 30, 2018 · 37 comments

l1t1 commented Jul 30, 2018

https://zhuanlan.zhihu.com/p/40834419

Tian Yuandong

Excellent answerer on artificial intelligence and deep learning topics

For this competition, ELF OpenGo ran on a single V100 card throughout, without adding any computing resources. The neural network model has been slightly improved, but it is still a 224x20-block version. Reaching the semi-finals this time, and taking the score to 2-3 against golaxy there, exceeded our expectations. We will open-source this version of the neural network afterwards; everyone is welcome to use it.

In the preliminaries, with only 30 seconds per move and somewhat problematic parameters, the chess strength dropped a bit; it returned to normal in the second stage. After the first game against the octopus in the quarter-finals, we fixed a bug where the time taken to play a move would often run over the limit, and the time control became very accurate (56 seconds per move, with an error of tens of milliseconds). However, in the last two games of that stage, against golaxy and AQ, we switched to new code that still had the problem, and the parameters were not set correctly, which led to weaker chess. For the semi-final against golaxy we changed back to a stable version.

Overall, we stuck with a consumer-grade single-card configuration this time and achieved results far beyond expectations; I am very happy about that. Our goal has always been to let ordinary people use artificial intelligence to make their daily lives more convenient, to serve scientists, and to contribute to the world. This time, we did a good job.

Thanks to the company and the organizers for their support!

l1t1 commented Jul 30, 2018

ELF will play against a new player from Korea; see #1573


tterava commented Jul 30, 2018

It's probably been translated from Chinese, and for some reason "go" gets translated to "chess". It's quite funny actually.

diadorak commented Jul 30, 2018

Octopus is the name of another program in the competition.

gogoai commented Jul 31, 2018

The new ELF version has been released:
https://github.com/pytorch/ELF/releases

l1t1 commented Jul 31, 2018

Can you use the code (https://github.com/gcp/leela-zero/tree/master/training/elf) to convert it to an LZ-format weight file, and run a test match against ELF v0?
@gcp @roy7


l1t1 commented Jul 31, 2018

I uploaded the converted weights to https://userscloud.com/4dgfy46eum5p
sha256 of the text file: d13c40993740cb77d85c838b82c08cc9c3f0fbc7d8c3761366e5d59e8f371cbd
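For anyone who wants to check the download, here is a minimal Python sketch verifying the file against the digest above (the local filename is hypothetical):

```python
import hashlib

EXPECTED = "d13c40993740cb77d85c838b82c08cc9c3f0fbc7d8c3761366e5d59e8f371cbd"

def sha256_of(path, chunk_size=1 << 20):
    """Stream the file in 1 MiB chunks so large weight files don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()

digest = sha256_of("elfv1_converted.txt")  # hypothetical filename for the download
print("OK" if digest == EXPECTED else "MISMATCH: " + digest)
```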


Mardak commented Jul 31, 2018

Looks like the new ELFv1 network wants to play move 12 of the "final joseki" differently at 3200 visits #1442:

LZ157 (192x15)
 S16 ->    2330 (V: 54.58%) (N: 66.78%) PV: S16 R18 T17 Q18 S17 N18 O16 N17 R14 O14 S3 S2 N2 S4 N4 O5 O4 S5 F3 C12 C8 E12 C17 C18 D16 E17 G14
 P19 ->     722 (V: 54.37%) (N: 23.43%) PV: P19 N18 Q18 N17 S3 S2 N2 S4 N4 O5 O4 S5 O15 Q12 F3 C12 C8 E12 L14 C6 E8
 Q18 ->     124 (V: 53.61%) (N:  6.19%) PV: Q18 N18 P19 N17 S3 S2 N2 S4 N4 O5 O4 S5 O15 Q12 F3 C12 C8
 O16 ->      23 (V: 53.29%) (N:  1.35%) PV: O16 N17 Q18 N18 O15 M15 S3 S2 N2 S4

LZ160 (256x20)
 S16 ->    2205 (V: 53.56%) (N: 64.10%) PV: S16 R18 T17 Q18 S17 N18 O16 N17 R14 O14 Q12 C6 O15 N15 P15 N16 C12 Q13 C5 D6 F4 C10 C8 D8 D9 C9 E8
 P19 ->     850 (V: 53.44%) (N: 27.03%) PV: P19 N18 Q18 N17 S3 S2 N2 S4 N4 O5 O4 S5 F3 C12 O15 Q12 C9 E12 E9 C6 C5
 Q18 ->     114 (V: 52.97%) (N:  4.82%) PV: Q18 N18 P19 N17 S3 S2 N2 S4 N4 O5 O4 S5 F3 C12 O15 Q12 C9
 O16 ->      30 (V: 52.75%) (N:  1.44%) PV: O16 N17 Q18 N18 O15 M15 S3 S2 N2 S4 N4

ELFv0
 S16 ->    3103 (V: 53.56%) (N: 96.31%) PV: S16 R18 T17 Q18 S17 N18 O16 N17 R14 O14 Q12 C3 C4 D3 E4 F2 O15 N15 P15 N16 D11 Q13 P12 R13 Q15 R10 S11 P10 R8 C9 E9 C6
 P19 ->      44 (V: 53.21%) (N:  1.73%) PV: P19 N18 Q18 N17 O15 Q12 R10 Q10 Q9 P10 R11
 O16 ->      26 (V: 53.36%) (N:  0.94%) PV: O16 N17 Q18 N18 O15 M15 S3 S2 P2 S4 R2
 Q18 ->      26 (V: 53.22%) (N:  1.01%) PV: Q18 N18 O16 N17 O15 M15 S3 S2 P2 S4

ELFv1
 Q18 ->    1290 (V: 49.66%) (N: 15.98%) PV: Q18 N18 O16 N17 O15 M15 Q11 P14 N14 P12 M14 K15 P15 R13 S16 S15 T17 Q14 L14 K16 K14
 S16 ->    1179 (V: 47.53%) (N: 70.69%) PV: S16 R18 T17 Q18 S17 N18 O16 N17 R14 O14 N3 N2 M3 C6 Q12 C13 O15 N15 P15 N16 C17 C18 D16 E17 D12 D13
 O16 ->     730 (V: 49.40%) (N: 13.08%) PV: O16 N17 Q18 N18 O15 M15 Q11 P14 N14 P12 M14 K15 P15 R13 S16 S15 T17 Q14 L14 K16 K14

The priors and searched win rates are quite different with ELFv1. And just checking what ~self-play at 800 vs 1600 visits with ELFv1 would look like:

800 visits
 S16 ->     322 (V: 46.89%) (N: 70.69%) PV: S16 R18 T17 Q18 S17 N18 O16 N17 R14 O14 Q12 N3 O4 S3 O15 N15 P15 N16 D11 Q13 P12 R13 Q15 Q8 O8 R10 S11 P10 R11
 O16 ->     307 (V: 50.87%) (N: 13.08%) PV: O16 N17 Q18 N18 O15 M15 Q11 P14 N14 P12 M14 K15 P15 R13 S16 S15 T17
 Q18 ->     170 (V: 49.68%) (N: 15.98%) PV: Q18 N18 Q10 O19 S16 N14 N3 N2 M3 C13 C7 E13

1600 visits
 Q18 ->     583 (V: 50.22%) (N: 15.98%) PV: Q18 N18 O16 N17 O15 M15 Q11 P14 N14 P12 M14 K15 P15 R13 S16 S15 T17
 S16 ->     549 (V: 46.99%) (N: 70.69%) PV: S16 R18 T17 Q18 S17 N18 O16 N17 R14 O14 N3 N2 M3 Q12 P13 Q9 P12 Q11 Q13 C6 C8
 O16 ->     467 (V: 50.18%) (N: 13.08%) PV: O16 N17 Q18 N18 O15 M15 Q11 P14 N14 P12 M14 K15 P15 R13 S16 S15 T17 Q14 L14 K16

So even with 1600 visits, it would generate training data quite different from the LZ 192x15 and 256x20 networks.
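(For reference, a minimal sketch for tabulating dumps like the above; it assumes the analysis line format shown, and the helper name is mine:)

```python
import re

# Matches analysis lines like:
#  S16 ->    2330 (V: 54.58%) (N: 66.78%) PV: S16 R18 T17 ...
LINE = re.compile(
    r"^\s*(?P<move>[A-T]\d{1,2})\s*->\s*(?P<visits>\d+)\s*"
    r"\(V:\s*(?P<value>[\d.]+)%\)\s*\(N:\s*(?P<prior>[\d.]+)%\)\s*"
    r"PV:\s*(?P<pv>.*)$"
)

def parse_analysis(dump):
    """Turn a raw analysis dump into a list of per-move dicts."""
    rows = []
    for line in dump.splitlines():
        m = LINE.match(line)
        if m:
            rows.append({
                "move": m.group("move"),
                "visits": int(m.group("visits")),
                "winrate": float(m.group("value")),
                "prior": float(m.group("prior")),
                "pv": m.group("pv").split(),
            })
    return rows
```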


gcp commented Jul 31, 2018

Test match for this against previous ELF is queued.


jkiliani commented Jul 31, 2018

Looks very promising, but the progress chart apparently needs fixing again.

If ELF v1 turns out to be a substantial improvement on ELF v0, will we resume producing ELF self-play games, and gradually replace the ELF v0 games used in training with v1 games?


gcp commented Jul 31, 2018

@roy7 help!

If it turns out that we need ELF games to stabilize the current training, and this net is stronger, then yes, I think it makes a lot of sense to self-play with the new ELF network.


Eddh commented Jul 31, 2018

I guess we can hope that this time the 20-block net will have the capacity to surpass ELF, at which point ELF game generation can finally be stopped. It seems like it makes bootstrapping harder, so if we can avoid it...

jkiliani commented Jul 31, 2018

I can't think of any reason why a 256x20 net should not be able to surpass a 224x20 net, given enough training. But who knows, maybe in a few months, Facebook will produce a Super-ELF with 40 blocks or something...


herazul commented Jul 31, 2018

It seems they managed to squeeze quite a bit of additional strength into their 20b net. It's impressive; their training process seems on point!
It's good timing for our 20b bootstrap: if we generate games with ELF v1, it may help carry the 20b net above the 15b net and out of the low-WR trench.


roy7 commented Jul 31, 2018

@gcp I can change the ELF hash on the server to the new one and turn self-play back on if you like. From the 3-3 joseki thread it sounds like ELF v1 has quite a different view of some things than v0 or LZ, so the learning targets might be really different.


gcp commented Jul 31, 2018

I think that's a good idea, yes. I can't see any scenario in which having ELF v1 self-play data is not useful to us in some way, so it makes sense to spend resources on it.


jkiliani commented Jul 31, 2018

Should the new ELF self-play be at 3200 visits or 1600?


herazul commented Jul 31, 2018

@jkiliani Yeah, good question. Since these ELF v1 self-play games will be with us for a long time (until the 20x256 net catches up, at least), wouldn't it make sense to use 3200 visits or more to get very high-quality data? It could be a very good investment of resources.


jkiliani commented Jul 31, 2018

I'm leaning towards 3200 visits as well, but it's a question for @roy7 whether different visit counts for the main 256x20 nets and the ELF net are problematic for the server...


gcp commented Jul 31, 2018

Should the new ELF self-play be at 3200 visits or 1600?

I don't see any reason to use more than v=1600 (an argument can even be made it should be even lower, as gating is not an issue in this case).

In terms of training, overfitting, etc., I'll take twice the games over <100 or so Elo in self-play quality.

But let's keep resigning on for these.
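As a back-of-envelope illustration of the trade-off (the budget and game-length numbers below are made-up assumptions): playout cost scales roughly linearly with visits, so at a fixed compute budget, halving visits about doubles game output.

```python
# Back-of-envelope: games produced at a fixed compute budget.
BUDGET_PLAYOUTS = 10**9   # arbitrary compute budget (assumption)
MOVES_PER_GAME = 250      # rough average game length (assumption)

for visits in (800, 1600, 3200):
    games = BUDGET_PLAYOUTS // (visits * MOVES_PER_GAME)
    print(f"v={visits}: ~{games} games")
```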


herazul commented Jul 31, 2018

I'll take twice the games over <100 or so Elo in self-play quality.

It may be the right call for the usual phasing window of games, but is it still true for the 250k ELF v1 games that will stay with us for maybe several months?


gcp commented Jul 31, 2018

I'd sample the set we have randomly if it's more than 250k.
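Something like this minimal sketch (the function name and cap wiring are illustrative, not the actual server code):

```python
import random

def elf_window(elf_games, cap=250_000, seed=None):
    """Return the ELF slice of the training window: everything if we
    hold fewer than `cap` games, otherwise a uniform random sample."""
    if len(elf_games) <= cap:
        return list(elf_games)
    return random.Random(seed).sample(elf_games, cap)
```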


herazul commented Jul 31, 2018

You think having 500k games to sample randomly from is more efficient for training than having 250k games that are a few dozen Elo stronger?
It seems counterintuitive to me at first glance, because I imagine that in a test-match setup, ELF v1 at 1600 visits wouldn't score much higher (maybe even lower?) than ELF v0 at 3200 visits. So my intuition screams at me that the training would benefit a little from diversity (new v1 ELF games would be a little different from previous v0 ELF games even if the playing strength is similar), but not THAT much from the skill of the new v1 net, because the new ELF games wouldn't be higher-Elo than the previous ELF games (due to the reduction in visits).
edit - If I'm wrong, and the training could benefit a lot from 1600-visit ELF games, would that mean the training could benefit a lot, in theory, from a monster net like AlphaGo's 40b playing at very low visits (like 10 visits), even if the resulting games don't have a very high absolute Elo?
edit 2 - two sentences were wrongly phrased


Friday9i commented Jul 31, 2018

75% of ELF games at 1600 visits and 25% at 6400 visits? That would give diversity, depth, and quite a lot of games, efficiently.


jkiliani commented Jul 31, 2018

Why sample the ELF set randomly, instead of just phasing out the oldest games like we do for the regular self-play? Phasing would guarantee that the v1 games are always used in preference to the v0 self-play...

Edit: I probably misunderstood, so the sampling is only applied once v0 games are completely replaced by v1 in the training window?


herazul commented Jul 31, 2018

@jkiliani He said he would sample randomly if we had more than 250k ELF v1 games. So he was saying he would prefer 500k v1 games at 1600 visits, sampled randomly, over 250k v1 games at 3200 visits.


gcp commented Jul 31, 2018

You think having 500k games to sample randomly from is more efficient for training than having a 250k games quite a few dozens ELO stronger ?

Given that the ELF games don't change in the training window, the benefit probably reduces a bit after a few training cycles. If there are more games that we can randomly sample, that seems like it should be better. You can try doing SL on just the current ELF window and see how strong you can get. I suspect you'll find you're limited by the amount of data, not the strength.

I mean, it's a tradeoff. I don't see why we would generate higher quality ELF games by arbitrarily running them at a higher level, instead of having more data to train on, or starting to generate more new LZ self-play games faster. Running ELF at v=1600 will generate higher quality data than current LZ games at the same speed, so that seems like an obvious win, and is why I'm interested. (And as pointed out, this is probably true for ELF at v=800, yet nobody suggested lowering visits for some reason!)

The same arguments used for running ELF at v=3200 could be used to run ELF at a million visits... but both ignore the trade-off.


jkiliani commented Jul 31, 2018

About visit counts for ELF games: I think part of the community generally believes in high visit counts, but in my case the suggestion was based on return on investment: in the past, an individual ELF game was used in training far more often than an individual regular self-play game, so I thought it sensible to invest more in each ELF game. The idea of producing more ELF games in order not to overtrain on them as much as before may turn out to be better, though.


herazul commented Jul 31, 2018

The same arguments used for running ELF at v=3200 seems like they could be used to run ELF at a million visits...but both of those ignore the trade-off.

That's what I was thinking about while driving: there is a trade-off, and therefore there must be a sweet spot that we have no way of knowing or finding, so we're back to a pointless discussion since I don't have any data to back anything up. My bad! (I think this happens a lot because the project is so interesting; all these possibilities and parameters make for interesting theorycrafting discussions ^^)


kfc51151271 commented Jul 31, 2018

The winrate of ELF v1 is strange. For example, Black's first move has a win rate of more than 50%, which seems different from other software. And then, if White's next move is at (4,3) instead of (4,4), Black's winrate rises directly to more than 60%! Is that too extreme?


diadorak commented Jul 31, 2018

@kfc51151271 pytorch/ELF#74

ELF OpenGo trains using an even distribution of black wins and white wins. Consequently, the win rate of the opening position should be close to 50/50.
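As an illustration of that balancing, a minimal sketch of drawing an evenly mixed training batch (illustrative only, not ELF's actual pipeline):

```python
import random

def balanced_value_batch(black_wins, white_wins, batch_size):
    """Mix black-win and white-win games evenly, so the value target
    for the opening position averages ~0.5."""
    half = batch_size // 2
    batch = random.sample(black_wins, half) + random.sample(white_wins, batch_size - half)
    random.shuffle(batch)
    return batch
```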


Mardak commented Jul 31, 2018

I'd sample the set we have randomly if it's more than 250k.

Will this happen immediately with the inclusion of the ELFv0 games, which already put us past the 250k limit (although I forget how many early ELFv0 games had resign off and are probably lower quality)?

Or will the most recent 250k ELFv{0,1} games be used, shifting out v0 and shifting in v1 until we reach ~500k total ELF games, and at that point switch to sampling 250k of the most recent 500k while continuing to generate ELFv1 until ~750k total ELF games?

Separately, the server is set up for 25% ELFv1 self-play, which is the same as the 25% ELFv0 rate towards the end, but we did initially start at 50%. Keep the 25%?


kfc51151271 commented Aug 1, 2018

It seems that ELF self-play games tend to end more quickly. Of the past 124 ELF self-play games, 44 were shorter than 100 moves, while only 15 went beyond 200 moves.
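(A small sketch for reproducing this count over a directory of SGF files; the directory name is hypothetical:)

```python
import re
from pathlib import Path

MOVE = re.compile(r";[BW]\[")  # one SGF property per move played

def length_stats(sgf_dir):
    """Count total games, games under 100 moves, and games over 200 moves."""
    total = short = long_ = 0
    for path in Path(sgf_dir).glob("*.sgf"):
        moves = len(MOVE.findall(path.read_text()))
        total += 1
        short += moves < 100
        long_ += moves > 200
    return total, short, long_

print(length_stats("elf_selfplay_sgfs"))  # hypothetical directory
```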


Ishinoshita commented Aug 1, 2018

@kfc51151271: I did some stats on 6400 ELF v0 self-play games vs ~4300+ LZ156 self-play games (here: #1656 (comment)) and I concur. But I'm still wondering what the reason behind this difference is. Is it a sharper value head? Being ahead/behind after a suboptimal move (temperature effect), or a failure in a local sequence, causing the search/move policy to collapse earlier in the game? Or an overall more aggressive line of play, bringing one side to succumb earlier? Hard to say without the win-rate value in the training data.


diadorak commented Aug 1, 2018

ELF's winrate is very sensitive (because of its sharp value head?). It is not rare to see one bad move (due to t=1 in training) cause the win rate to drop 40%.
