
Continue training smaller networks? #1889

Open
pondturtle opened this issue Sep 28, 2018 · 24 comments

@pondturtle

With the transfer to 256x40 I have encountered a few people whose hardware has trouble running it at all. Others get very bad performance.

I do understand it is far from the main goal of the LZ project. But would it be feasible, as a service to the Go community, to train a 192x15 or 256x20 network with the latest games included and test it from time to time? Say maybe once a month, given that 256x40 is far from reaching its potential and development is not the fastest right now.

@wonderingabout
Contributor

For 20b one may argue there is already ELF, but for 15b it is worth trying.

@Friday9i

Indeed, it would be interesting to see if a 15x192 net trained with current high-level 40b games can significantly outperform the best self-trained 15x192 net (LZ157).
That was tested for 64x5 nets a few months ago, and the net trained on stronger games was much better than LZ057 (the best self-trained 64x5 net), but at that time LZ's software had some significant bugs, so we cannot really extrapolate from that experiment.
I guess it's probably possible to get a 15x192 net slightly better than LZ157, but if it turned out to be "much better" (which would mean "close to ELFv1" or even better), that would be a true surprise and quite an extraordinary result! One consequence would be that ELFv1 is far from the maximum 20b level!

@pondturtle
Author

pondturtle commented Sep 28, 2018

There is also the question of efficiency: which net is truly the strongest at fixed computing power, which is what matters in an analysis or game-playing setting. Sure, 40b is almost certain to be stronger than 20b at the same number of visits. But what if they both get, say, 15 s per move? I'd hazard an uneducated guess that 20b/15b could actually be stronger, given how early a phase of training 40b LZ is in.

I fully admit that this is just my own conjecture more than anything else.
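
A rough back-of-the-envelope sketch of that guess: assuming (this is only an assumption, not a measurement) that the cost of one network evaluation scales roughly with blocks x filters^2, and picking an arbitrary baseline of 100 visits/s for the 15b net, the visit counts at a fixed 15 s per move would look something like this:

```python
# Back-of-the-envelope sketch: assumes evaluation cost ~ blocks * filters^2
# and a purely hypothetical 100 visits/s for the 15b net on some GPU.

NETS = {
    "15b (192x15)": (15, 192),
    "20b (256x20)": (20, 256),
    "40b (256x40)": (40, 256),
}

BASELINE_VISITS_PER_SEC = 100.0  # hypothetical figure, not a benchmark
TIME_PER_MOVE = 15.0             # seconds, as in the example above

base_cost = 15 * 192 ** 2        # cost proxy for the 15b baseline
for name, (blocks, filters) in NETS.items():
    cost = blocks * filters ** 2
    visits_per_sec = BASELINE_VISITS_PER_SEC * base_cost / cost
    print(f"{name}: ~{visits_per_sec * TIME_PER_MOVE:.0f} visits per move")
```

By that crude proxy the 20b net gets roughly twice the visits of the 40b net in the same time, and the 15b net nearly five times as many, so the question is whether the stronger evaluation of 40b outweighs that visit deficit.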

@l1t1

l1t1 commented Sep 28, 2018

I saw some modified 15b weights better than 157

https://userscloud.com/a7mienvbfg9g
https://userscloud.com/8wkks632dh8t

@Friday9i

Friday9i commented Sep 28, 2018

@pondturtle I did some tests of the current 40b (mainly LZ177 and 178) against LZ157 and ELFv1 at time parity on a GTX 1080, and LZ157 and ELFv1 are still better than 40b for relatively fast games (around 10 s/move) / low visits (mainly between 500 and 2000 visits). For longer games (e.g. 10K visits for 40b), it seems more or less on par... But I only played a total of around 10 long games, so it is clearly not enough statistically speaking. From experience (I did many tests 6 months ago on scalability, e.g. #1113 (comment)), larger nets scale better with visits than smaller nets, so that seems credible, but it still needs to be confirmed by more solid tests.
@l1t1 Nice! If you find the links to those 15x192 nets stronger than LZ157, could you please share them? ;-) I'm interested!

@Umsturz

Umsturz commented Sep 28, 2018

Somebody on Lifein19x19 tested a lot of different network sizes against each other, with time and visit parity. It starts with LZ#157 (192x15) vs. LZ#159 (256x20): https://www.lifein19x19.com/viewtopic.php?p=234413#p234413

@zhanzhenzhen
Contributor

Has anybody thought of training a network with more blocks but fewer filters, such as 128x30?
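
For a rough sense of size, here is a minimal sketch assuming the AlphaGo Zero style residual tower (two 3x3 convolutions per block, so about 2 x 9 x filters^2 weights per block; the input convolution, batch-norm parameters and the policy/value heads are ignored):

```python
# Minimal sketch of approximate residual-tower sizes. Assumes the AlphaGo
# Zero style tower: each block has two 3x3 convolutions, i.e. roughly
# 2 * 9 * filters^2 weights per block (heads and batch norm ignored).

def tower_params(blocks, filters):
    return blocks * 2 * 9 * filters ** 2

for label, (blocks, filters) in {
    "128x10": (10, 128),
    "128x30": (30, 128),
    "192x15": (15, 192),
    "256x20": (20, 256),
    "256x40": (40, 256),
}.items():
    print(f"{label}: ~{tower_params(blocks, filters) / 1e6:.1f}M weights")
```

By that estimate a 128x30 tower would actually have slightly fewer weights than 192x15 (~8.8M vs ~10M), although its 30 blocks still have to be evaluated one after another, so it would not necessarily be faster per visit.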

@Marcin1960

Marcin1960 commented Sep 28, 2018

@pondturtle "But is it feasible as a service to go community to train 192x15 or 256x20 network with"

I doubt it is possible in the official project. The only way is to establish a separate independent training pool. alreadydone#61

Perhaps after a few months the situation will change.

@godmoves
Contributor

godmoves commented Sep 29, 2018

Actually, I am training some 10b networks now to test different training settings.

This is the strongest one I have gotten so far, and it is slightly stronger than the first 15b weight (at the same 1600 playouts). For more info, you can check it here, and the training data are listed here.

I only have one 1080ti, so my progress is slower than expected.

@Marcin1960

@godmoves "This is the strongest one I get so far, and it is slightly stronger than the first 15b weight"

Great! I started tuning, and hopefully fast_lr_drop_1600k_final.txt will be available on KGS as LeelaZeroT.

@fame872toe857

@godmoves Do you think the 1600k net has reached the 10b limit, or can it get stronger?

@godmoves
Contributor

@fame872toe857 I think you can get a stronger net by using newer games and more training steps.

I need to use the same training data to compare the results of different step counts, so these data are a little bit old now. I also think using more training steps may be useful (judging by the trend from 100k to 1600k), but it will take a really long time (e.g. training the 1600k net took about 19 days on a single 1080ti).
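
For a sense of scale, 1.6M steps in about 19 days works out to roughly one training step per second; a rough projection of longer runs (assuming the step rate stays constant, which is only an extrapolation from the figures above) looks like this:

```python
# Rough projection from the figures above: ~1.6M steps in ~19 days on one
# 1080ti. Assumes a constant step rate, which is only an extrapolation.

steps_done = 1_600_000
days_taken = 19
steps_per_day = steps_done / days_taken

print(f"~{steps_per_day:,.0f} steps/day (~{steps_per_day / 86_400:.1f} steps/s)")
for target_steps in (3_200_000, 6_400_000):
    print(f"{target_steps:,} steps: ~{target_steps / steps_per_day:.0f} days")
```

So doubling the step count to 3.2M would take roughly another 19 days on the same hardware.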

@Marcin1960

Marcin1960 commented Sep 30, 2018

OK, fast_lr_drop_1600k_final.txt (10x128) is running on KGS as LeelaZeroT at 6400 visits.

Now against 2 dan sneroht, nice style so far

@Marcin1960

Now against 9 dan ELF

@Marcin1960

Now testing ELF 224x20 against 128x10 (LeelaDan vs LeelaZeroT on KGS). Same time, about 30 s per move.

@Friday9i

Friday9i commented Oct 2, 2018

I'm currently evaluating the scaling of LZ181 vs LZ157: it seems quite comparable to the usual situation, already encountered with previous smaller nets:

  • the larger 40b net is much stronger than the smaller 15b net at very low visits, e.g. LZ157 needs around 14 visits to match LZ181 with 1 visit (i.e. a ratio of 14)!
  • the ratio goes down to around 2.5 or 3 at 30 visits: LZ157 needs only around 2.7x more visits to match LZ181 at 30 visits (i.e. ~2.7 x 30 = ~80 visits are needed)
  • then it seems to go up again from around 500 visits, with LZ157 needing around 4K visits against LZ181 at 1K visits, i.e. a ratio of ~4 (but I need to run more games to be statistically sure, and it takes time with 4K visits...)

Note: I'm using -n and -m 10 in order to get some reasonable variability in the games. Graphs are coming when I get enough data (the one from "I trained a 20b 256f network (93229e)" #1113 (comment), completed by LZ181 vs LZ157).

@zhanzhenzhen
Contributor

I have evaluated LZ180 (40b) against ELFv1 at time parity (400 playouts for LZ and 800 playouts for ELF). The result is that LZ180 only wins 19 out of 82 games.
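
For reference, 19/82 can be turned into an approximate Elo gap with the standard logistic model; this is only a back-of-the-envelope figure that ignores draws and the confidence interval on such a small sample:

```python
import math

# Standard logistic Elo model: a score of p corresponds to a rating gap of
# -400 * log10(1/p - 1). Ignores draws and the error bars on 19/82.

wins, games = 19, 82
p = wins / games
elo_gap = -400 * math.log10(1 / p - 1)
print(f"score {p:.1%} -> about {elo_gap:.0f} Elo")  # about -208 Elo
```

So at those settings LZ180 sits roughly 200 Elo below ELFv1, with wide error bars given only 82 games.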

@herazul

herazul commented Oct 2, 2018

It's expected. Keep in mind that LZ 40b is improving fast, and that even now you probably would have a better result than 19/82 with higher visits (say LZ 20k vs ELF 40k).

@bubblesld

http://zero.sjeng.org/networks/92297ff22dfa781bd02def6cadafdf7d69e9546300a913faf19e6164b895ed39.gz

Now we present a stronger 15b than v157. We may try to do this once in a while.

@Marcin1960

Marcin1960 commented Dec 22, 2018

@bubblesld "Now we present a stronger 15b than v157"

  1. Which net is "v157"?
  2. Why is 92297ff not listed in the https://leela.online-go.com/networks/ directory?

BTW, I would like to see a new net trained that is a little larger than ELFv1, i.e. 224x24; or maybe, if 192x15 can be so strong, perhaps 192x18 would have the best potential, or 224x18?

That would make it usable for the majority of people, because 40b is too large to handle on typical hardware.

@bubblesld

  1. Look at the homepage; v157 is the weight #157.
  2. I guess because it is not 40b, which is the current default.

Everyone may prefer a different block/filter size, and we only have limited resources. The latest 15b was trained when there was a free GPU not otherwise occupied; most of the time, the 40b training is running. We also want to try 80b, but it is very slow. If someone can transform ELFv1 into the LZ style so that it can be used directly as the initial network for training, I would love to improve 20b x 224.

@Marcin1960

@bubblesld: look at the homepage, v157 is the weight #157

It is not there.

@bubblesld: "We also want to try 80b, but it is very slow."

I suspect that the very moment 40b becomes stronger than ELF, it will be abandoned by "you" in favor of 80b, and most people will drop out. What is the objective of that?

@bubblesld

v157 is on the homepage: http://zero.sjeng.org/

80b can be used in the tournament :)

@roy7
Collaborator

roy7 commented Dec 22, 2018

Please don't pop 80b test networks into the pipeline though; the file size requirement will be annoyingly high, especially for people on slower network links.
