
is it time to increase net size to (40+n)x256? #2396

Open · l1t1 opened this issue May 22, 2019 · 36 comments

@l1t1 commented May 22, 2019

just like
#965

@Friday9i commented May 22, 2019

I would really prefer more innovations rather than a larger net improving extremely slowly...
@lightvector introduced many innovations with KataGo (many of them have been discussed here but were rejected...) and had huge success. Here is a list of the innovations he introduced:

  • very little time spent on "no resign" games (versus the ~30% of GPU time LZ still spends on these crap endgame moves) ...
  • komi is adjustable
  • native handicap management (and a % of selfplay games is played with handi and a randomly modified komi)
  • active play until the end of games (ie no "slack" endgame moves)
  • native support of all board sizes from 9x9 to 37x37 (and the possibility to play non-square boards, such as 9x19 or whatever sizes!)
  • selfplay games are played on random board size (!)
  • variable visits, to explore interesting/unclear moves more deeply and to speed up self-play game generation: most moves are played "fast" with low visits and quite a lot of noise to create diversity, while some moves are searched much more deeply, and those are the ones used for training! A very clever way to speed up training and improve performance (variants were already proposed here); see the sketch right after this list
  • 50% gating (vs 55% for LZ): clever way to get more frequent updates of the net and to get the opportunity to have more diversity in selfplay games (proposed already)
  • random rulesets (Japanese rules, Tromp-Taylor, ...)
  • branching of selfplay games with different komi
  • additional random noise in the first game moves, to increase exploration of alternative openings (also proposed for LZ)
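
To illustrate the variable-visits idea, here is a rough Python sketch; it is not KataGo's actual code, and the function name, the 25% split, and the 600/100 visit counts are invented for the example:

import random

# Sketch of the "playout cap randomization" idea described above. The
# constants are made up for illustration, not KataGo's real settings.
FULL_VISITS = 600      # deep search; the position is recorded for training
FAST_VISITS = 100      # cheap search; only used to advance the game
FULL_SEARCH_PROB = 0.25

def visits_for_next_move():
    """Decide how hard to search the next self-play move.
    Returns (visits, use_for_training, add_noise)."""
    if random.random() < FULL_SEARCH_PROB:
        return FULL_VISITS, True, False   # deep, low-noise, train on it
    return FAST_VISITS, False, True       # fast, noisy, skipped for training

# Expected cost per move: 0.25*600 + 0.75*100 = 225 visits on average,
# versus 600 if every move were searched deeply.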

All in all, here are the benefits of these innovations:

  • much more diversity
  • WAY faster improvement: 30x to 100x faster learning than LZ! (it reached ~LZ130 level in 1 week with resources roughly equivalent to LZ's, where LZ took 6 months!)
  • a much more flexible net (board sizes, komi, handicap, rule sets)

JMHO

@Splee99 commented May 22, 2019

I think it's time to try different self play parameters, such as visits, randomcnt and randomvisits (i.e. the minimum acceptable visits for a random move, which seems to be important).

@iopq commented May 23, 2019

I think 52% gating has been proposed as an alternative. The project has proved that zero is viable, but now it would be cool to use these resources to build a slightly optimized version for HANDICAP games, different komi, and different rulesets.

Those are the actual requirements. We can see later whether it ever learns to play ladders better by itself.

But counting points is not just an optimization; it's a requirement for being able to play with different rulesets, handicaps, komi, etc., which are features actual go players want.

Generating separate games for all of these wastes resources, while using a value network that predicts board ownership is better - although I'm not sure that going point by point is good, since many points are tied to each other (either White takes these ten, or the other nine).
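
To make the "board ownership" idea concrete, here is a toy sketch (PyTorch; the shapes and names are invented for illustration, not LZ's or KataGo's actual architecture):

import torch
import torch.nn as nn

class OwnershipHead(nn.Module):
    """Toy per-point ownership head: for each board point, predict the
    probability that Black owns it at game end."""
    def __init__(self, channels=256):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=1)  # 256 planes -> 1 map

    def forward(self, trunk_features):  # (N, 256, 19, 19) residual-trunk output
        return torch.sigmoid(self.conv(trunk_features)).squeeze(1)  # (N, 19, 19)

# Training target: the final ownership map, 1.0 = Black point, 0.0 = White point.
# Per-point binary cross-entropy treats points as independent, which is exactly
# the caveat above: "either White takes these ten, or the other nine".
loss_fn = nn.BCELoss()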

@wonderingabout commented May 23, 2019

all these new features are likely to hinder maximum search strength for the sake of diversity,
and the 50% gating is just lost on such a small community: the slightest elo yo-yo will make the community go back and forth

the way I see it, just keep adding other search enhancements like LCB, then redo a run from scratch directly at 20b (then upgrade to 40b like AlphaGo Zero did), without fancy things (unless they are solidly proven reliable)

I'd rather bet on a fresh run whenever SAI is ready for 19x19 than on KataGo;
maybe @Vandertic has some thoughts on this #1835

until then, LZ will keep stalling and contributors may get impatient, but that may not be a bad thing

@iopq commented May 23, 2019

Counting points instead of win rates in the endgame will likely result in a higher maximum strength, not lower. LZ learns strong endgame moves because it doesn't play lax endgame when ahead.

@wonderingabout

@bubblesld good timing in #2398; I wanted to suggest in my previous message that we consider no longer quantizing nets in the next run from scratch, if there is one (with SAI 19x19, for example)

maybe it has some long-term effects 👍

@bubblesld

Personally, I do not see quantization as an issue. It is just an approximation, and I think the difference in strength is small.

From the latest test match, it seems that a 2xx MB net does not cause any problem this time. A quantized 80b version would be less than 200 MB. That means we can increase the blocks if necessary. However, I think 40b is not done yet. You can see that v225 has about a 70% win rate against v215. And from my personal test:
v226 with 1600 visits vs minigo v17 990 with 3200 visits = 99:101
This means lz-40b is catching up to minigo v17 at the same time per move (by comparison, v222 vs minigo v17 was about the same strength at the same visits).

@l1t1 commented May 23, 2019

could someone explain the relative importance of the numbers in the weights file? maybe some less important numbers have little effect on the strength, so they could be quantized first
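
A crude way to test this would be to re-save a weights file with fewer significant digits and measure the strength difference in a match. A minimal sketch, assuming the LZ text format of one whitespace-separated line of floats per layer (the function name is made up):

def quantize_weights(in_path, out_path, sig_digits=4):
    # Round every weight to `sig_digits` significant digits; copy any
    # non-numeric (e.g. header/version) line through verbatim.
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            try:
                rounded = [f"{float(x):.{sig_digits}g}" for x in line.split()]
            except ValueError:
                fout.write(line)
                continue
            fout.write(" ".join(rounded) + "\n")

# quantize_weights("lz226.txt", "lz226_q4.txt", sig_digits=4)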

@l1t1 commented May 26, 2019

2019-05-26 02:37 f9f18e65 VS a20c31da 23 : 56 (29.11%) 79 / 400 fail
lz226 is a good student, but a bad teacher

@wonderingabout

I think this is plain nonsense;
I think lz227 is going to be very strong (but then hard to beat)
we'll see, though, whether a lucky promotion gets in the way, which would not be desirable

@iopq commented May 26, 2019

@wonderingabout there is no evidence 226 is stronger than 225-222

https://cloudygo.com/leela-zero-eval/eval-graphs

against a panel of other LZ versions they perform almost exactly the same; there has been no increase in strength

@wonderingabout commented May 26, 2019

@iopq 50-game matches are totally insignificant

https://cloudygo.com/leela-zero-eval/eval-model/226?sorted=False

the most accurate data we have so far is the 56% winrate over 400 games at 1600 visits (-v 1600) against lz225, which is significant, I think
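
A quick back-of-the-envelope check shows why the sample size matters so much here (plain-Python normal approximation, not the project's actual statistics):

from math import sqrt, erf

def p_value_two_sided(wins, games, p0=0.5):
    # Normal-approximation two-sided p-value for H0: true winrate == p0.
    z = (wins / games - p0) / sqrt(p0 * (1 - p0) / games)
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

print(p_value_two_sided(224, 400))  # 56% of 400 -> p ~ 0.016, unlikely to be luck
print(p_value_two_sided(28, 50))    # 56% of 50  -> p ~ 0.40, compatible with 50%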

@iopq commented May 26, 2019

the last 5 nets have something like 3000 games among them; that has to have some significance

@wonderingabout

@iopq unless I am mistaken, I don't see 3000 games in the lz226 matches against other networks

@iopq commented May 26, 2019

it's not just LZ 226

LZ 225 plays against the same set of networks and has the same elo
LZ 224 plays against the same set of networks and has the same elo
LZ 223 plays against the same set of networks and has the same elo
LZ 222 plays against the same set of networks and has the same elo

it would be extremely unlikely for that to happen if their strengths were very different; a 100-point difference between 222 and 226 is very unlikely

someone could do the actual statistical analysis, but there hasn't been any noticeable progress

@wonderingabout

@iopq the elo numbers extracted from 50-game matches are, again, totally insignificant

so you can't draw any conclusions from such small game samples, I think

@iopq commented May 26, 2019

If you have 5 different networks and they play 50-game matches vs. each other and nobody has a higher elo... that's still hundreds of games played in total

the number of games in each individual match doesn't matter, especially since they also have the same winrate against older networks

@wonderingabout commented May 26, 2019

@iopq you are saying that each of these last 5 networks has the same winrate, which is significantly true in the official tests, generally around 55%.
but the winrate being the same in each of the 50-game matches may be a total coincidence or something real; there is no way to conclude that from 50-game matches

the number of games in each individual match matters, because each of these networks is different, and if A > B and B > C then A is not necessarily > C: there are several aspects of strength in go, and there is fluctuation in the data even in a 400-game match (if you run 10 x 400-game matches, you will get a different number every time, as we saw recently in the 15b test match that @bubblesld did)

but still, a 50-game match is just totally insignificant, whether its result happens to be right or not; at best it is only good enough to tell very roughly the strength range of a network

@wonderingabout

@iopq see : #2192 (comment)

@iopq commented May 26, 2019

But the total number of games is around 3000, not 50. This is so far the most statistically significant result, since it's bigger than the 400-game matches we normally run.

@wonderingabout commented May 26, 2019

no, the total number of games lz226 has played is:
50 vs lz225
50 vs lz224
50 vs lz223
etc...

the total number of games lz225 has played is:
50 vs lz224
50 vs lz223
etc...

if I used your logic, I could say that:

  • lz226 had > 10,000 games against other networks and still didn't get beaten, so it is strong
  • lz226 has 400 x 5 = 2000 games of being significantly stronger than lz221, so it is significant...

but this logic is flawed: the total number of games does not indicate anything, because each small part of that sum comes from a different match, again

@wonderingabout commented May 26, 2019

just because you're gathering a bunch of individually insignificant data together doesn't mean the whole magically becomes significant...

@tapsika commented May 26, 2019

just because you're gathering a bunch of individually insignificant data together doesn't mean the whole magically becomes significant...

This is exactly what 400-game matches do (gather a bunch of individually insignificant 1-game matches, then average).

It is possible to ask how probable it is that, across, say, 10 individual 50-game matches, the average performance of these nets would still be this low. (I'm not saying anything about the answer.)
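
For instance, under the (strong) assumption that the expected winrate is comparable across opponents, independent small matches can be pooled into one estimate; a sketch with made-up numbers:

from math import sqrt, erf

def pooled_p_value(results, p0=0.5):
    # results: list of (wins, games) from independent matches, assumed to
    # share one underlying winrate (a strong assumption across opponents).
    wins = sum(w for w, _ in results)
    games = sum(g for _, g in results)
    z = (wins / games - p0) / sqrt(p0 * (1 - p0) / games)
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

print(pooled_p_value([(29, 50)]))      # one 29-21 match alone: p ~ 0.26, inconclusive
print(pooled_p_value([(29, 50)] * 5))  # five such matches pooled: p ~ 0.011, significant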

@iopq commented May 26, 2019

just because you're gathering a bunch of individually insignificant data together doesn't mean the whole magically becomes significant...

that's exactly how the law of averages works

@l1t1 commented May 27, 2019

186K+ self-play games, the biggest number since lz157

@Vandertic

Thank you @wonderingabout for asking. We just finished the paper on 9x9 SAI and posted it on arXiv, see #1835 (comment)

As for 19x19, we are not ready yet, but we are working on it. Let me share a bit of our to-do list...

  • update the SAI code to the latest version of LZ (we are still on 0.16, and this means a lot of commits),
  • move to a new server,
  • put additional info inside the training data to trace back the game and source of every position (to better deal with bad data),
  • learn to compile on Windows

I hope we will be able to start in 4-6 weeks.

@wonderingabout

@Vandertic

you're welcome, and thanks for keeping us updated; looking forward to the changes and improvements whenever they come 👍

@l1t1 commented May 29, 2019

what will gcp do if no new weight gets promoted any more?
2019-05-29 13:28 2cd2ccb5 VS a20c31da 43 : 67 (39.09%) 110 / 400 fail
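
For reference, the pass/fail verdict on these match lines comes from a sequential test; here is a simplified SPRT-style sketch (the 50%/55% hypotheses and 5% error rates are assumed and may not be the server's exact settings):

from math import log

def sprt(wins, losses, p0=0.50, p1=0.55, alpha=0.05, beta=0.05):
    # Log-likelihood ratio between H1 (winrate p1) and H0 (winrate p0).
    llr = wins * log(p1 / p0) + losses * log((1 - p1) / (1 - p0))
    if llr >= log((1 - beta) / alpha):
        return "pass"            # promote the candidate net
    if llr <= log(beta / (1 - alpha)):
        return "fail"            # keep the current best net
    return "keep playing"

print(sprt(43, 67))  # -> "fail" after 110 games, matching the verdict above
                     #    (under these assumed parameters)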

@Marcin1960

@l1t1 "what will gcp do if no new weight would promote any more?"

I propose to reduce the net size to 30b, as 40b has proven to be too large.

This will speed things up and allow more people to participate in testing and development.

Quite possibly, after shedding the dead weight, the 30b nets will get much stronger.

@wonderingabout

option A: nothing
option B: a fresh run implementing all the improvements that have been made so far (20b from scratch)
option C: wait until SAI 19x19 is ready and make a new run with it (20b from scratch)

and 40b is by no means too big; it is the same size AlphaGo Zero successfully used

@Marcin1960 commented May 29, 2019

@wonderingabout "and 40b is by no means too big, it is the same size alphagozero successfully used"

I agree. We need only to borrow their resources and use them for longer time, than they did:

"One number that is suspiciously missing is the number of self-play machines that were used over the course of the three days1. Using an estimate of 211 moves per Go match on average, we come to a final number of 1,595 self-play machines, or 6,380 TPUs. (Calculations are below.)

At the quoted rate of $6.50/TPU/hr (as of March 2018), the whole venture would cost $2,986,822 in TPUs alone to replicate. And that’s just the smaller of the two experiments they report

[...]

The neural network used in the 40-day experiment has twice as many layers (of the same size) as the network used in the 3-day experiment, so making a single move takes about twice as much computer thinking time, assuming nothing else changed about the experiment. With this in mind, going back through the series of calculations leads us to a final cost of $35,354,222 in TPUs to replicate the 40-day experiment.
"

@l1t1 commented May 31, 2019

the number of self-play games will pass lz157's on June 1

@l1t1 commented May 31, 2019

weights by month

   count(*) year month
1         9 2017    11
2        38 2017    12
3        24 2018    01
4        19 2018    02
5        25 2018    03
6        15 2018    04
7        16 2018    05
8         8 2018    06
9         7 2018    07
10       12 2018    08
11        9 2018    09
12        3 2018    10
13        9 2018    11
14        4 2018    12
15        5 2019    01
16        4 2019    02
17        9 2019    03
18        9 2019    04
19        2 2019    05

@l1t1 commented May 31, 2019

weights and games by month

   count(*) sum(games) year month
1         9     620230 2017    11
2        38    1074670 2017    12
3        24    1660020 2018    01
4        19    1322090 2018    02
5        25    1780150 2018    03
6        15     715671 2018    04
7        16     437540 2018    05
8         8     684120 2018    06
9         7     667900 2018    07
10       12     535240 2018    08
11        9     466512 2018    09
12        3     236121 2018    10
13        9     562723 2018    11
14        4     442589 2018    12
15        5     575053 2019    01
16        4     382624 2019    02
17        9     466570 2019    03
18        9     777244 2019    04
19        2     371990 2019    05

@SHKD13 commented Jun 1, 2019

@bubblesld

...And from my personal test:
v226 with 1600 visits vs minigo v17 990 with 3200 visits = 99:101
This means lz-40b is catching up to minigo v17 at the same time per move (by comparison, v222 vs minigo v17 was about the same strength at the same visits).

Is it possible to get these SGF files? It would be interesting to take a look at MG v17 vs LZ 226.

@l1t1 commented Jun 2, 2019

sqldf("select games,'lz_'||num net from lz order by 1 desc limit 4")

    Games    net
1  295170 lz_226
2  293460  lz_54
3  289300  lz_57
4  274350 lz_157
