
is it time to increase net size to (40+n)x256? #2396

Open · l1t1 opened this issue May 22, 2019 · 36 comments

@l1t1 commented May 22, 2019

just like
#965

@Friday9i commented May 22, 2019

I would really prefer more innovations rather than a larger net improving extremely slowly...
@lightvector introduced many innovations with KataGo (many of them have been discussed here but were rejected...) and had huge success. Here is a list of the innovations he introduced:

  • very little time spent on "no resign" games (versus the ~30% of GPU time LZ still spends on these crap endgame moves) ...
  • komi is adjustable
  • native handicap management (and a % of selfplay games is played with handi and a randomly modified komi)
  • active play until the end of games (ie no "slack" endgame moves)
  • native support of all board sizes from 9x9 to 37x37 (and the possibility to play non-square boards, such as 9x19 or whatever sizes!)
  • selfplay games are played on random board size (!)
  • variable visits, to explore interesting/unclear moves more deeply and to speed up self-play game generation: most moves are played "fast" with low visits and quite a lot of noise to create diversity, while some moves are searched much more deeply, and those are the ones used for training! A very clever way to speed up training and improve performance (variants were already proposed here); see the sketch right after this list
  • 50% gating (vs 55% for LZ): clever way to get more frequent updates of the net and to get the opportunity to have more diversity in selfplay games (proposed already)
  • random rulesets (Japanese rules, Tromp-Taylor, ...)
  • branching of selfplay games with different komi
  • additional random noise in the first game moves, to increase exploration of alternative openings (also proposed for LZ)
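
To illustrate the variable-visits idea, here is a rough Python sketch; it is not KataGo's actual code, and the function name, the 25% split, and the 600/100 visit counts are invented for the example:

import random

# Sketch of the "playout cap randomization" idea described above. The
# constants are made up for illustration, not KataGo's real settings.
FULL_VISITS = 600      # deep search; the position is recorded for training
FAST_VISITS = 100      # cheap search; only used to advance the game
FULL_SEARCH_PROB = 0.25

def visits_for_next_move():
    """Decide how hard to search the next self-play move.
    Returns (visits, use_for_training, add_noise)."""
    if random.random() < FULL_SEARCH_PROB:
        return FULL_VISITS, True, False   # deep, low-noise, train on it
    return FAST_VISITS, False, True       # fast, noisy, skipped for training

# Expected cost per move: 0.25*600 + 0.75*100 = 225 visits on average,
# versus 600 if every move were searched deeply.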

All in all, here are the benefits of these innovations:

  • much more diversity
  • WAY faster improvement: 30x to 100x faster learning than LZ! (it reached ~LZ130 level in 1 week with resources roughly equivalent to LZ's, where LZ took 6 months!)
  • a much more flexible net (board sizes, komi, handicap, rule sets)

JMHO

@Splee99 commented May 22, 2019

I think it's time to try different self play parameters, such as visits, randomcnt and randomvisits (i.e. the minimum acceptable visits for a random move, which seems to be important).

@iopq commented May 23, 2019

I think 52% gating has been proposed as an alternative. The project has proved that zero is viable, but now it would be cool to use these resources to build a slightly optimized version for HANDICAP games, different komi, and different rulesets.

Those are the actual requirements. We can see later whether it ever learns to play ladders better by itself.

But counting points is not just an optimization; it's a requirement for being able to play with different rulesets, handicaps, komi, etc., which are features actual go players want.

Generating separate games for all of these wastes resources, while using a value network that predicts board ownership is better - although I'm not sure that going point by point is good, since many points are tied to each other (either White takes these ten, or the other nine).
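
To make the "board ownership" idea concrete, here is a toy sketch (PyTorch; the shapes and names are invented for illustration, not LZ's or KataGo's actual architecture):

import torch
import torch.nn as nn

class OwnershipHead(nn.Module):
    """Toy per-point ownership head: for each board point, predict the
    probability that Black owns it at game end."""
    def __init__(self, channels=256):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=1)  # 256 planes -> 1 map

    def forward(self, trunk_features):  # (N, 256, 19, 19) residual-trunk output
        return torch.sigmoid(self.conv(trunk_features)).squeeze(1)  # (N, 19, 19)

# Training target: the final ownership map, 1.0 = Black point, 0.0 = White point.
# Per-point binary cross-entropy treats points as independent, which is exactly
# the caveat above: "either White takes these ten, or the other nine".
loss_fn = nn.BCELoss()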

@wonderingabout commented May 23, 2019

all these new features are likely to hinder maximum search strength for the sake of diversity,
and the 50% gating is just lost on such a small community: the slightest elo yo-yo will make the community go back and forth

the way I see it, just keep adding other search enhancements like LCB, then redo a run from scratch directly at 20b (then upgrade to 40b like AlphaGo Zero did), without fancy things (unless they are solidly proven reliable)

I'd rather bet on a fresh run whenever SAI is ready for 19x19 than on KataGo;
maybe @Vandertic has some thoughts on this #1835

until then, LZ will keep stalling and contributors may get impatient, but that may not be a bad thing

@iopq commented May 23, 2019

Counting points instead of win rates in the endgame will likely result in a higher maximum strength, not lower. LZ learns strong endgame moves because it doesn't play lax endgame when ahead.

@wonderingabout

@bubblesld good timing in #2398; I wanted to suggest in my previous message that we consider no longer quantizing nets in the next run from scratch, if there is one (with SAI 19x19, for example)

maybe it has some long-term effects 👍

@bubblesld

Personally, I do not see quantization as an issue. It is just an approximation, and I think the difference in strength is small.

From the latest test match, it seems that a 2xx MB net does not cause any problem this time. A quantized 80b version would be less than 200 MB. That means we can increase the blocks if necessary. However, I think 40b is not done yet. You can see that v225 has about a 70% win rate against v215. And from my personal test:
v226 with 1600 visits vs minigo v17 990 with 3200 visits = 99:101
This means lz-40b is catching up to minigo v17 at the same time per move (by comparison, v222 vs minigo v17 was about the same strength at the same visits).

@l1t1 commented May 23, 2019

could someone explain the relative importance of the numbers in the weights file? maybe some less important numbers have little effect on the strength, so they could be quantized first
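
A crude way to test this would be to re-save a weights file with fewer significant digits and measure the strength difference in a match. A minimal sketch, assuming the LZ text format of one whitespace-separated line of floats per layer (the function name is made up):

def quantize_weights(in_path, out_path, sig_digits=4):
    # Round every weight to `sig_digits` significant digits; copy any
    # non-numeric (e.g. header/version) line through verbatim.
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            try:
                rounded = [f"{float(x):.{sig_digits}g}" for x in line.split()]
            except ValueError:
                fout.write(line)
                continue
            fout.write(" ".join(rounded) + "\n")

# quantize_weights("lz226.txt", "lz226_q4.txt", sig_digits=4)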

@l1t1 commented May 26, 2019

2019-05-26 02:37 f9f18e65 VS a20c31da 23 : 56 (29.11%) 79 / 400 fail
lz226 is a good student, but a bad teacher

@wonderingabout

I think this is plain nonsense;
I think lz227 is going to be very strong (but then hard to beat)
we'll see, though, whether a lucky promotion gets in the way, which would not be desirable

@iopq commented May 26, 2019

@wonderingabout there is no evidence 226 is stronger than 225-222

https://cloudygo.com/leela-zero-eval/eval-graphs

against a panel of other LZ versions they perform almost exactly the same; there has been no increase in strength

@wonderingabout commented May 26, 2019

@iopq 50-game matches are totally insignificant

https://cloudygo.com/leela-zero-eval/eval-model/226?sorted=False

the most accurate data we have so far is the 56% winrate over 400 games at 1600 visits (-v 1600) against lz225, which is significant, I think
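
A quick back-of-the-envelope check shows why the sample size matters so much here (plain-Python normal approximation, not the project's actual statistics):

from math import sqrt, erf

def p_value_two_sided(wins, games, p0=0.5):
    # Normal-approximation two-sided p-value for H0: true winrate == p0.
    z = (wins / games - p0) / sqrt(p0 * (1 - p0) / games)
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

print(p_value_two_sided(224, 400))  # 56% of 400 -> p ~ 0.016, unlikely to be luck
print(p_value_two_sided(28, 50))    # 56% of 50  -> p ~ 0.40, compatible with 50%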

@iopq commented May 26, 2019

the last 5 nets have something like 3000 games among them; that has to have some significance

@wonderingabout

@iopq unless I am mistaken, I don't see 3000 games in the lz226 matches against other networks

@iopq commented May 26, 2019

it's not just LZ 226

LZ 225 plays against the same set of networks and has the same elo
LZ 224 plays against the same set of networks and has the same elo
LZ 223 plays against the same set of networks and has the same elo
LZ 222 plays against the same set of networks and has the same elo

it would be extremely unlikely for that to happen if their strengths were very different; a 100-point difference between 222 and 226 is very unlikely

someone could do the actual statistical analysis, but there hasn't been any noticeable progress

@wonderingabout

@iopq the elo numbers extracted from 50-game matches are, again, totally insignificant

so you can't draw any conclusions from such small game samples, I think

@iopq commented May 26, 2019

If you have 5 different networks and they play 50-game matches vs. each other and nobody has a higher elo... that's still hundreds of games played in total

the number of games in each individual match doesn't matter, especially since they also have the same winrate against older networks

@wonderingabout commented May 26, 2019

@iopq you are saying that each of these last 5 networks has the same winrate, which is significantly true in the official tests, generally around 55%.
but the winrate being the same in each of the 50-game matches may be a total coincidence or something real; there is no way to conclude that from 50-game matches

the number of games in each individual match matters, because each of these networks is different, and if A > B and B > C then A is not necessarily > C: there are several aspects of strength in go, and there is fluctuation in the data even in a 400-game match (if you run 10 x 400-game matches, you will get a different number every time, as we saw recently in the 15b test match that @bubblesld did)

but still, a 50-game match is just totally insignificant, whether its result happens to be right or not; at best it is only good enough to tell very roughly the strength range of a network

@wonderingabout

@iopq see : #2192 (comment)

@iopq commented May 26, 2019

But the total number of games is around 3000, not 50. This is so far the most statistically significant result, since it's bigger than the 400-game matches we normally run.

@wonderingabout commented May 26, 2019

no, the total number of games lz226 has played is:
50 vs lz225
50 vs lz224
50 vs lz223
etc...

the total number of games lz225 has played is:
50 vs lz224
50 vs lz223
etc...

if I used your logic, I could say that:

  • lz226 had > 10,000 games against other networks and still didn't get beaten, so it is strong
  • lz226 has 400 x 5 = 2000 games of being significantly stronger than lz221, so it is significant...

but this logic is flawed: the total number of games does not indicate anything, because each small part of that sum comes from a different match, again

@wonderingabout commented May 26, 2019

just because you're gathering a bunch of individually insignificant data together doesn't mean the whole magically becomes significant...

@tapsika commented May 26, 2019

just because you're gathering a bunch of individually insignificant data together doesn't mean the whole magically becomes significant...

This is exactly what 400-game matches do (gather a bunch of individually insignificant 1-game matches, then average).

It is possible to ask how probable it is that, across, say, 10 individual 50-game matches, the average performance of these nets would still be this low. (I'm not saying anything about the answer.)
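
For instance, under the (strong) assumption that the expected winrate is comparable across opponents, independent small matches can be pooled into one estimate; a sketch with made-up numbers:

from math import sqrt, erf

def pooled_p_value(results, p0=0.5):
    # results: list of (wins, games) from independent matches, assumed to
    # share one underlying winrate (a strong assumption across opponents).
    wins = sum(w for w, _ in results)
    games = sum(g for _, g in results)
    z = (wins / games - p0) / sqrt(p0 * (1 - p0) / games)
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

print(pooled_p_value([(29, 50)]))      # one 29-21 match alone: p ~ 0.26, inconclusive
print(pooled_p_value([(29, 50)] * 5))  # five such matches pooled: p ~ 0.011, significant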

@iopq commented May 26, 2019

just because you're gathering a bunch of individually insignificant data together doesn't mean the whole magically becomes significant...

that's exactly how the law of averages works

@l1t1 commented May 27, 2019

186K+ self-play games, the biggest number since lz157

@Vandertic

Thank you @wonderingabout for asking. We just finished the paper on 9x9 SAI and posted it on arXiv, see #1835 (comment)

As for 19x19, we are not ready yet, but we are working on it. Let me share a bit of our to-do list...

  • update the SAI code to the latest version of LZ (we are still on 0.16, and this means a lot of commits),
  • move to a new server,
  • put additional info inside the training data to trace back the game and source of every position (to better deal with bad data),
  • learn to compile on Windows

I hope we will be able to start in 4-6 weeks.

@wonderingabout

@Vandertic

you're welcome, and thanks for keeping us updated; looking forward to the changes and improvements whenever they come 👍

@l1t1 commented May 29, 2019

what will gcp do if no new weight gets promoted any more?
2019-05-29 13:28 2cd2ccb5 VS a20c31da 43 : 67 (39.09%) 110 / 400 fail
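
For reference, the pass/fail verdict on these match lines comes from a sequential test; here is a simplified SPRT-style sketch (the 50%/55% hypotheses and 5% error rates are assumed and may not be the server's exact settings):

from math import log

def sprt(wins, losses, p0=0.50, p1=0.55, alpha=0.05, beta=0.05):
    # Log-likelihood ratio between H1 (winrate p1) and H0 (winrate p0).
    llr = wins * log(p1 / p0) + losses * log((1 - p1) / (1 - p0))
    if llr >= log((1 - beta) / alpha):
        return "pass"            # promote the candidate net
    if llr <= log(beta / (1 - alpha)):
        return "fail"            # keep the current best net
    return "keep playing"

print(sprt(43, 67))  # -> "fail" after 110 games, matching the verdict above
                     #    (under these assumed parameters)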

@Marcin1960

@l1t1 "what will gcp do if no new weight would promote any more?"

I propose to reduce the net size to 30b, as 40b has proven to be too large.

This will speed things up and allow more people to participate in testing and development.

Quite possibly, after shedding the dead weight, the 30b nets will get much stronger.

@wonderingabout

option A: nothing
option B: a fresh run implementing all the improvements that have been made so far (20b from scratch)
option C: wait until SAI 19x19 is ready and make a new run with it (20b from scratch)

and 40b is by no means too big; it is the same size AlphaGo Zero successfully used

@Marcin1960 commented May 29, 2019

@wonderingabout "and 40b is by no means too big, it is the same size alphagozero successfully used"

I agree. We need only to borrow their resources and use them for longer time, than they did:

"One number that is suspiciously missing is the number of self-play machines that were used over the course of the three days1. Using an estimate of 211 moves per Go match on average, we come to a final number of 1,595 self-play machines, or 6,380 TPUs. (Calculations are below.)

At the quoted rate of $6.50/TPU/hr (as of March 2018), the whole venture would cost $2,986,822 in TPUs alone to replicate. And that’s just the smaller of the two experiments they report

[...]

The neural network used in the 40-day experiment has twice as many layers (of the same size) as the network used in the 3-day experiment, so making a single move takes about twice as much computer thinking time, assuming nothing else changed about the experiment. With this in mind, going back through the series of calculations leads us to a final cost of $35,354,222 in TPUs to replicate the 40-day experiment.
"

@l1t1 commented May 31, 2019

the number of self-play games will pass lz157's on June 1

@l1t1 commented May 31, 2019

weights by month

   count(*) year month
1         9 2017    11
2        38 2017    12
3        24 2018    01
4        19 2018    02
5        25 2018    03
6        15 2018    04
7        16 2018    05
8         8 2018    06
9         7 2018    07
10       12 2018    08
11        9 2018    09
12        3 2018    10
13        9 2018    11
14        4 2018    12
15        5 2019    01
16        4 2019    02
17        9 2019    03
18        9 2019    04
19        2 2019    05

@l1t1 commented May 31, 2019

weights and games by month

   count(*) sum(games) year month
1         9     620230 2017    11
2        38    1074670 2017    12
3        24    1660020 2018    01
4        19    1322090 2018    02
5        25    1780150 2018    03
6        15     715671 2018    04
7        16     437540 2018    05
8         8     684120 2018    06
9         7     667900 2018    07
10       12     535240 2018    08
11        9     466512 2018    09
12        3     236121 2018    10
13        9     562723 2018    11
14        4     442589 2018    12
15        5     575053 2019    01
16        4     382624 2019    02
17        9     466570 2019    03
18        9     777244 2019    04
19        2     371990 2019    05

@SHKD13 commented Jun 1, 2019

@bubblesld

...And from my personal test:
v226 with 1600 visits vs minigo v17 990 with 3200 visits = 99:101
This means lz-40b is catching up to minigo v17 at the same time per move (by comparison, v222 vs minigo v17 was about the same strength at the same visits).

Is it possible to get these SGF files? It would be interesting to take a look at MG v17 vs LZ 226.

@l1t1 commented Jun 2, 2019

sqldf("select games,'lz_'||num net from lz order by 1 desc limit 4")

    Games    net
1  295170 lz_226
2  293460  lz_54
3  289300  lz_57
4  274350 lz_157
