40B v 15B #1810
With 1 visit, the only randomness comes from the rotation of the board. The highest-prior move will generally be the same every time, so the games repeat: that single visit is all the engine can do.
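A minimal sketch of why 1-visit play is deterministic up to symmetry (hypothetical code, not Leela Zero's actual implementation; `net_eval` stands in for the policy network):

```python
import random

# Hypothetical sketch: with a single visit, the engine simply plays the
# argmax of the policy priors. The only randomness is which of the 8
# board symmetries is fed to the network, and a well-trained net returns
# near-identical priors for all of them, so games repeat.
def pick_move_one_visit(net_eval, board, num_symmetries=8):
    sym = random.randrange(num_symmetries)   # random rotation/reflection
    priors = net_eval(board, symmetry=sym)   # {move: prior probability}
    return max(priors, key=priors.get)       # greedy: highest prior wins
```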
Thanks for the response. I anticipated that every second game might start nearly the same. I was more surprised that when the colors switched the games stayed so similar for so long. I thought the difference in the structure of the weights, and the more-than-650-Elo gap, might have shown up in opening tendencies, which is what I was curious about. But I suppose in the grand scheme of things - i.e., the number of training games - there isn't so much difference yet.
Can the 40b win against 15b at equal time? I presume it should, but it's likely not that decisive... did you try this as well yet?
@jkiliani, I don't have the equipment to look at that question properly (i.e., it takes so long to run games CPU-only at a decent number of visits), but based on a cursory look with very low visits, I find 15B to be stronger, even with less time. I'm running 100 games now, allowing 2 visits for 40B and 20 visits for 15B. Even so, 40B uses twice as much time, and 15B has currently won 11 of 16 games. For me, it's no contest for everyday use: I would always use 15B for live analysis or game review. I have no idea if a GPU would narrow the performance gap, though it would obviously make 40B more practical for my purposes.
Following up on my original observation about repeated openings: spot checking recent 40B-40B match games, I have yet to find a single one that does not begin with Black star point, White diagonal star point, Black star point (followed more often than not by another White star point). That unchanging three-move opening strikes me as odd. There's clearly plenty of variation in the training games. |
My dream would be to let 40b accumulate self-play games and advance (hopefully, clients will not desert), then train 20b (I hope it will not be dumped as 15b was) and 15b on those games.
jkiliani said:
No, see HERE for a 60-game match at equal time between networks 157 and 174.
@qzq1 Is that really so surprising? Training games have randomization to create variation. When you let LZ try to find the "best move" with very few visits, less variation is exactly what happens.
@Strappa71 Regarding my first post and follow-up, what I found a bit surprising was that both LZ 40B and 15B played the same, fixed openings through at least 43 moves, whether black or white. I thought there would be enough difference between the two that that would not happen. Upon reflection, I guess they are still quite closely related, genealogically, so perhaps it's not a great surprise. Regarding my last post, on the 40B-40B matches: those are games with 1600 visits, and I do find it odd that the first three moves never vary. Even if it's not a surprise given the underlying methodology, it's fair to ask - speaking as a go player - whether that indicates a rigidity that makes LZ less useful and interesting. As gcp has noted elsewhere, LZ generates a lot of variety in 'second best' moves, so it's not a great concern for practical purposes. But I do think it's something to note. Granting that the available AlphaGo games were curated, and not necessarily representative, they display more variety in openings.
To my understanding, the net2net conversion doesn't introduce any new knowledge into the neural network. This is what happened when we initially bootstrapped 15b to 20b and then to 40b. To use the full potential of the 40b network, we have to do A LOT of self-play training.
@qzq1 I too find too many identical openings (up to move 37 or so) even in current match games. |
@maxinjapan I understand that in match play LZ always chooses the move it evaluates as best, and that move is likely to be unchanging given identical circumstances - for a given set of weights. Yet, if there is to be any advance in strength, we expect to see changes in play from one set of weights to another. One question is whether there is any "true" superiority in, for example, Black playing move 3 on the star point rather than one of the 3-4 points. I doubt that there is, as a matter of principle, so either LZ gets stuck at a local minimum and is missing something (until lots of accumulated training games get it out of the trough), or maybe it's just that if there is no meaningful difference in true value between alternative moves, there is no way to train LZ off its initial choice. Possibly we won't ever know for sure.
In self-play training, the moves are randomized somewhat, and quite a lot of games will have (say) Black playing 3-4 on move 3. If those games turn out better on average than games where Black plays 4-4, the network will learn this and start playing 3-4 more often.
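As a hedged illustration of that randomization (an AlphaZero-style sketch; the function name and the 30-move cutoff are illustrative, not Leela Zero's actual flags):

```python
import random

# Illustrative sketch of self-play move randomization. During the early
# moves of a training game, the move is sampled in proportion to MCTS
# visit counts instead of taking the argmax, so plenty of games open
# with 3-4 points even when 4-4 has the most visits.
def select_training_move(visit_counts, move_number, random_cutoff=30):
    moves = list(visit_counts)
    if move_number < random_cutoff:
        weights = [visit_counts[m] for m in moves]
        return random.choices(moves, weights=weights)[0]  # proportional sample
    return max(moves, key=visit_counts.get)  # after cutoff: best move only
```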
@maxinjapan If the network strongly prefers playing 4-4 to playing 3-4 (which I think it does), then that's good evidence that playing 4-4 is (at least given the network's other strengths and weaknesses) better than playing 3-4. I was arguing (against qzq1 and agreeing with you, unless I'm confused) that we shouldn't be too worried that LZ might just have randomly latched onto 4-4 in preference to 3-4 with no way of discovering if it's wrong. I think it will discover if it's wrong, because self-play explores a wide variety of openings; I think it would have discovered by now if it were systematically playing bad openings; and I think we should probably trust that whatever openings it plays all the time really do work better (at least for LZ playing against LZ) than other options.
@gjm11 yes, that was my point too :-) |
I have run a local test of 256 games at time parity between 15B (157) and 40B (174).
I have uploaded logs and SGFs generated from the logs here. The openings are very repetitive, but completely different depending on who is black and who is white.
In the *.log file: "Half precision compute support: NO"
Request: test the 177 weight.
I ran 40B (177) with 100 visits versus 15B (157) with 600 visits. No GPU, no pondering; each averaged about 13.7 seconds per move (wish I had better hardware).
Request: test the 178 weight. Someone did this: https://lifein19x19.com/viewtopic.php?f=18&t=16086&sid=03587ca0498c93eae434bb198726cc15
Great effort by xela posted to lifein19x19, but it's unfortunate that the 157 weight wasn't part of the test. Also, I would prefer more games for a smaller set of engines.
My impression is that 40b is too optimistic in winning a ko. Very often when a ko is lost the winrate drops considerably. More training in that area will be needed for the overall strength increase. |
Now testing ELF 240x20 against the new 128x10 (LeelaDan vs LeelaZeroT on KGS). Same time settings, about 30 sec per move.
@Splee99 It seems like correct evaluation of a position with a big ko is likely to depend on detailed tactical reading -- does the 40b network do better when given plenty of time to think? |
I would think ko is related to tactical reading, too. However, there is some ko knowledge embedded somewhere in the 40b, as appears in the games with only 1 playout, and after further training it seems to handle ko better.
I matched 174 (40B) and 157 (15B) with 1 visit each for 100 games, CPU only. The 40B weights won 86.0% of the games - almost exactly the expectation based on the Elo difference between the two.*
What I find odd is that every game of the 100 was identical through 43 moves, though the colors were alternating. It takes my computer an hour to play games at 500 visits, but the couple so far are identical to the 100 through 24 moves (all star-point opening, double approach by B in one corner, etc.). Is this an artifact of the training, or a result of my sad machine being overwhelmed? I can't make sense of it.
*Actually, based on a modification of the win-likelihood formula drawn from my experience matching other versions: P(A) = 1/(1 + 10^m), where m = (Br - Ar)/800.
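That modified formula can be sketched as follows (the function name is illustrative; the 800 divisor replaces the standard Elo formula's 400):

```python
# Sketch of the modified Elo expectation: the standard formula
# P = 1/(1 + 10^((B_r - A_r)/400)) with the divisor widened to 800.
def expected_score(rating_a, rating_b, divisor=800):
    """Expected win probability for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / divisor))

# With a ~650 Elo gap and the 800 divisor, the stronger side is expected
# to win about 86.7% of games, close to the observed 86.0% over 100 games.
```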