
Continue training smaller networks? #1889

Open
pondturtle opened this issue Sep 28, 2018 · 24 comments

@pondturtle

With the transfer to 256x40 I have encountered a few people whose hardware has trouble running it at all. Others get very bad performance.

I do understand it is far from the main goal of the LZ project. But would it be feasible, as a service to the Go community, to train a 192x15 or 256x20 network with the latest games included and test it from time to time? Say maybe once a month, given that 256x40 is far from reaching its potential and development is not the fastest right now.

@wonderingabout
Contributor

For 20b one may argue there is already ELF, but for 15b it is worth trying.

@Friday9i

Indeed, it would be interesting to see if a 15x192 net trained with current high-level 40b games can significantly outperform the best self-trained 15x192 net (LZ157).
That was tested for 64x5 nets a few months ago, and the net trained on stronger games was much better than LZ057 (the best self-trained 64x5 net), but at that time LZ's software had some significant bugs, so we cannot really extrapolate from that experiment.
I guess it's probably possible to get a 15x192 net slightly better than LZ157, but if it turned out to be "much better" (which would mean "close to ELFv1" or even better), that would be a true surprise and quite an extraordinary result! One consequence would be that ELFv1 is far from the maximum 20b level!

@pondturtle
Author

pondturtle commented Sep 28, 2018

There is also the question of efficiency: which net is truly the strongest at fixed computing power, which is what matters in an analysis or game-playing setting. Sure, 40b is almost certain to be stronger than 20b at the same number of visits. But what if they both get, say, 15 s per move? I'd hazard an uneducated guess that 20b/15b could actually be stronger, given how early a phase of training 40b LZ is in.

I fully admit that this is just my own conjecture more than anything else.
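
A rough back-of-the-envelope sketch of that guess: assuming (this is only an assumption, not a measurement) that the cost of one network evaluation scales roughly with blocks x filters^2, and picking an arbitrary baseline of 100 visits/s for the 15b net, the visit counts at a fixed 15 s per move would look something like this:

```python
# Back-of-the-envelope sketch: assumes evaluation cost ~ blocks * filters^2
# and a purely hypothetical 100 visits/s for the 15b net on some GPU.

NETS = {
    "15b (192x15)": (15, 192),
    "20b (256x20)": (20, 256),
    "40b (256x40)": (40, 256),
}

BASELINE_VISITS_PER_SEC = 100.0  # hypothetical figure, not a benchmark
TIME_PER_MOVE = 15.0             # seconds, as in the example above

base_cost = 15 * 192 ** 2        # cost proxy for the 15b baseline
for name, (blocks, filters) in NETS.items():
    cost = blocks * filters ** 2
    visits_per_sec = BASELINE_VISITS_PER_SEC * base_cost / cost
    print(f"{name}: ~{visits_per_sec * TIME_PER_MOVE:.0f} visits per move")
```

By that crude proxy the 20b net gets roughly twice the visits of the 40b net in the same time, and the 15b net nearly five times as many, so the question is whether the stronger evaluation of 40b outweighs that visit deficit.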

@l1t1

l1t1 commented Sep 28, 2018

I saw some modified 15b weights better than 157

https://userscloud.com/a7mienvbfg9g
https://userscloud.com/8wkks632dh8t

@Friday9i

Friday9i commented Sep 28, 2018

@pondturtle I did some tests of the current 40b (mainly LZ177 and 178) against LZ157 and ELFv1 at time parity on a GTX 1080, and LZ157 and ELFv1 are still better than 40b for relatively fast games (around 10 s/move) / low visits (mainly between 500 and 2000 visits). For longer games (e.g. 10K visits for 40b), it seems more or less on par... But I only played a total of around 10 long games, so it is clearly not enough statistically speaking. From experience (I did many tests 6 months ago on scalability, e.g. #1113 (comment)), larger nets scale better with visits than smaller nets, so that seems credible, but it still needs to be confirmed by more solid tests.
@l1t1 Nice! If you find the links to those 15x192 nets stronger than LZ157, could you please share them? ;-) I'm interested!

@Umsturz

Umsturz commented Sep 28, 2018

Somebody on Lifein19x19 tested a lot of different network sizes against each other, with time and visit parity. It starts with LZ#157 (192x15) vs. LZ#159 (256x20): https://www.lifein19x19.com/viewtopic.php?p=234413#p234413

@zhanzhenzhen
Contributor

Has anybody thought of training a network with more blocks but fewer filters, such as 128x30?
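
For a rough sense of size, here is a minimal sketch assuming the AlphaGo Zero style residual tower (two 3x3 convolutions per block, so about 2 x 9 x filters^2 weights per block; the input convolution, batch-norm parameters and the policy/value heads are ignored):

```python
# Minimal sketch of approximate residual-tower sizes. Assumes the AlphaGo
# Zero style tower: each block has two 3x3 convolutions, i.e. roughly
# 2 * 9 * filters^2 weights per block (heads and batch norm ignored).

def tower_params(blocks, filters):
    return blocks * 2 * 9 * filters ** 2

for label, (blocks, filters) in {
    "128x10": (10, 128),
    "128x30": (30, 128),
    "192x15": (15, 192),
    "256x20": (20, 256),
    "256x40": (40, 256),
}.items():
    print(f"{label}: ~{tower_params(blocks, filters) / 1e6:.1f}M weights")
```

By that estimate a 128x30 tower would actually have slightly fewer weights than 192x15 (~8.8M vs ~10M), although its 30 blocks still have to be evaluated one after another, so it would not necessarily be faster per visit.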

@Marcin1960

Marcin1960 commented Sep 28, 2018

@pondturtle "But is it feasible as a service to go community to train 192x15 or 256x20 network with"

I doubt it is possible in the official project. The only way is to establish a separate independent training pool. alreadydone#61

Perhaps after a few months the situation will change.

@godmoves
Contributor

godmoves commented Sep 29, 2018

Actually, I am training some 10b networks now to test different training settings.

This is the strongest one I have gotten so far, and it is slightly stronger than the first 15b weight (at the same 1600 playouts). For more info, you can check it here, and the training data are listed here.

I only have one 1080ti, so my progress is slower than expected.

@Marcin1960

@godmoves "This is the strongest one I get so far, and it is slightly stronger than the first 15b weight"

Great! I started tuning, and hopefully fast_lr_drop_1600k_final.txt will be available on KGS as LeelaZeroT.

@fame872toe857

@godmoves Do you think the 1600k net has reached the 10b limit, or can it get stronger?

@godmoves
Contributor

@fame872toe857 I think you can get a stronger net by using newer games and more training steps.

I need to use the same training data to compare the results of different step counts, so these data are a little bit old now. I also think using more training steps may be useful (judging by the trend from 100k to 1600k), but it will take a really long time (e.g. training the 1600k net took about 19 days on a single 1080ti).
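
For a sense of scale, 1.6M steps in about 19 days works out to roughly one training step per second; a rough projection of longer runs (assuming the step rate stays constant, which is only an extrapolation from the figures above) looks like this:

```python
# Rough projection from the figures above: ~1.6M steps in ~19 days on one
# 1080ti. Assumes a constant step rate, which is only an extrapolation.

steps_done = 1_600_000
days_taken = 19
steps_per_day = steps_done / days_taken

print(f"~{steps_per_day:,.0f} steps/day (~{steps_per_day / 86_400:.1f} steps/s)")
for target_steps in (3_200_000, 6_400_000):
    print(f"{target_steps:,} steps: ~{target_steps / steps_per_day:.0f} days")
```

So doubling the step count to 3.2M would take roughly another 19 days on the same hardware.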

@Marcin1960

Marcin1960 commented Sep 30, 2018

OK, fast_lr_drop_1600k_final.txt (10x128) is running on KGS as LeelaZeroT at 6400 visits.

Now against 2 dan sneroht, nice style so far

@Marcin1960

Now against 9 dan ELF

@Marcin1960

Now testing ELF 224x20 against 128x10 (LeelaDan vs LeelaZeroT on KGS). Same time, about 30 s per move.

@Friday9i

Friday9i commented Oct 2, 2018

I'm currently evaluating the scaling of LZ181 vs LZ157: it seems quite comparable to the usual situation, already encountered with previous smaller nets:

  • the larger 40b net is much stronger than the smaller 15b net at very low visits, e.g. LZ157 needs around 14 visits to match LZ181 with 1 visit (i.e. a ratio of 14)!
  • the ratio goes down to around 2.5 or 3 at 30 visits: LZ157 needs only around 2.7x more visits to match LZ181 at 30 visits (i.e. ~2.7 x 30 = ~80 visits are needed)
  • then it seems to go up again from around 500 visits, with LZ157 needing around 4K visits against LZ181 at 1K visits, i.e. a ratio of ~4 (but I need to run more games to be statistically sure, and it takes time with 4K visits...)

Note: I'm using -n and -m 10 in order to get some reasonable variability in the games. Graphs are coming when I get enough data (the one from "I trained a 20b 256f network (93229e)" #1113 (comment), completed by LZ181 vs LZ157).

@zhanzhenzhen
Contributor

I have evaluated LZ180 (40b) against ELFv1 at time parity (400 playouts for LZ and 800 playouts for ELF). The result is that LZ180 only wins 19 out of 82 games.
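
For reference, 19/82 can be turned into an approximate Elo gap with the standard logistic model; this is only a back-of-the-envelope figure that ignores draws and the confidence interval on such a small sample:

```python
import math

# Standard logistic Elo model: a score of p corresponds to a rating gap of
# -400 * log10(1/p - 1). Ignores draws and the error bars on 19/82.

wins, games = 19, 82
p = wins / games
elo_gap = -400 * math.log10(1 / p - 1)
print(f"score {p:.1%} -> about {elo_gap:.0f} Elo")  # about -208 Elo
```

So at those settings LZ180 sits roughly 200 Elo below ELFv1, with wide error bars given only 82 games.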

@herazul

herazul commented Oct 2, 2018

It's expected. Keep in mind that LZ 40b is improving fast, and that even now you probably would have a better result than 19/82 with higher visits (say LZ 20k vs ELF 40k).

@bubblesld

http://zero.sjeng.org/networks/92297ff22dfa781bd02def6cadafdf7d69e9546300a913faf19e6164b895ed39.gz

Now we present a stronger 15b than v157. We may try to do this once in a while.

@Marcin1960

Marcin1960 commented Dec 22, 2018

@bubblesld "Now we present a stronger 15b than v157"

  1. Which net is "v157"?
  2. Why is 92297ff not listed in the https://leela.online-go.com/networks/ directory?

BTW, I would like to see a new net trained that is a little larger than ELFv1, i.e. 224x24; or maybe, if 192x15 can be so strong, perhaps 192x18 would have the best potential, or 224x18?

That would make it usable for the majority of people, because 40b is too large to handle on typical hardware.

@bubblesld

  1. Look at the homepage; v157 is the weight #157.
  2. I guess because it is not 40b, which is the current default.

Everyone may prefer a different block/filter size, and we only have limited resources. The latest 15b was trained when there was a free GPU not otherwise occupied; most of the time, the 40b training is running. We also want to try 80b, but it is very slow. If someone can transform ELFv1 into the LZ style so that it can be used directly as the initial network for training, I would love to improve 20b x 224.

@Marcin1960

@bubblesld: look at the homepage, v157 is the weight #157

It is not there.

@bubblesld: "We also want to try 80b, but it is very slow."

I suspect that the very moment 40b becomes stronger than ELF, it will be abandoned by "you" in favor of 80b, and most people will drop out. What is the objective of that?

@bubblesld

v157 is on the homepage: http://zero.sjeng.org/

80b can be used in the tournament :)

@roy7
Collaborator

roy7 commented Dec 22, 2018

Please don't pop 80b test networks into the pipeline though; the file size requirement will be annoyingly high, especially for people on slower network links.
