is it time to increase net size to (40+n)x256? #2396
I would really prefer more innovations rather than a larger net improving extremely slowly...
All in all, here are the benefits of these innovations:
JMHO
I think it's time to try different self-play parameters, such as visits, randomcnt, and randomvisits (i.e. the minimum acceptable visits for a random move, which seems to be important).
I think 52% gating has been proposed as an alternative. The project has proved that zero is viable. But now it would be cool to use these resources to build a slightly optimized version for handicap games, different komi, and different rulesets. These are the actual requirements; we can see later whether it ever learns to play ladders better by itself. Counting points comes not as an optimization but as a requirement for being able to play with different rulesets, handicaps, komi, etc., which are features actual go players want. Generating separate games for all of these is wasting resources, while using a value network that predicts board ownership is better, although I'm not sure that going point by point is good, since many points are tied to each other (either white takes these ten, or the other nine).
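To make the board-ownership idea concrete, here is a minimal PyTorch-style sketch of what an ownership head bolted onto a residual trunk might look like. The layer sizes and names are illustrative assumptions, not LZ's actual architecture:

```python
import torch
import torch.nn as nn

class OwnershipHead(nn.Module):
    """Hypothetical ownership head: maps trunk features to a per-point
    ownership estimate in [-1, 1] (+1 = black owns the point, -1 = white)."""

    def __init__(self, trunk_channels: int = 256):
        super().__init__()
        # A 1x1 convolution collapses the trunk features to one plane per point.
        self.conv = nn.Conv2d(trunk_channels, 1, kernel_size=1)

    def forward(self, trunk_features: torch.Tensor) -> torch.Tensor:
        # (N, C, 19, 19) -> (N, 1, 19, 19), squashed to [-1, 1] by tanh
        return torch.tanh(self.conv(trunk_features))

# Training would add a per-point loss against the final territory map, e.g.
#   loss = nn.functional.mse_loss(head(features), final_ownership_target)
# Note that a per-point loss treats the 361 points as independent, which is
# exactly the correlation concern raised above (either white takes these
# ten points, or the other nine).
```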
All these new features are likely to hinder maximum search strength for the sake of diversity. The way I see it, just keep adding other search enhancements like LCB, then redo a run from scratch directly at 20b (then upgrade to 40b like AlphaGo Zero did), without fancy things (unless they are solidly proven reliable). I'd rather bet on a fresh run whenever SAI is ready for 19x19 than on KataGo. Until then, LZ will keep stalling and contributors may get impatient, but that may not be a bad thing.
Counting points instead of win rates in the endgame will likely result in a higher maximum strength, not lower. LZ learns strong endgame moves because it doesn't play lax endgame when ahead.
@bubblesld good timing with #2398; in my previous message I wanted to suggest considering no longer quantizing nets in the next run from scratch, if there is one (with SAI 19x19 for example). Maybe quantization has some long-term effects 👍
Personally, I do not see quantization as an issue. It is just an approximation, and I think the difference in strength is small. From the latest test match, it seems that 2xx MB does not cause any problem this time. A quantized 80b version will be less than 200 MB, which means we can increase the blocks if necessary. However, I think that 40b is not done yet. You can see that v225 has about a 70% winrate against v215. And from my personal test,
Could someone explain the differing importance of the numbers in the weights file? Maybe some less important numbers have little effect on the strength, so they could be quantized first.
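As a rough illustration of that idea (not the project's actual quantization code), here is a sketch that rounds every number in a plain-text weights file to a given number of significant digits. The assumptions that the first line is a format version and the remaining lines are whitespace-separated floats reflect my understanding of the LZ text format, so treat them as assumptions:

```python
def quantize_weights(src: str, dst: str, sig_digits: int = 4) -> None:
    """Round every weight to `sig_digits` significant digits (illustrative only)."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.write(next(fin))  # assumed format-version line; copy verbatim
        for line in fin:
            fout.write(" ".join(f"{float(t):.{sig_digits - 1}e}"
                                for t in line.split()) + "\n")

# Hypothetical usage:
# quantize_weights("lz226.txt", "lz226_q4.txt", sig_digits=4)
```

Comparing strength between the original and rounded files at different `sig_digits` values would directly answer which digits actually matter.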
2019-05-26 02:37 f9f18e65 VS a20c31da 23 : 56 (29.11%) 79 / 400 fail
I think this is plain nonsense.
@wonderingabout there is no evidence 226 is stronger than 222-225 (https://cloudygo.com/leela-zero-eval/eval-graphs). Against a panel of other LZ versions, they perform almost exactly the same; there has been no increase in strength.
@iopq a 50-game match is totally insignificant (https://cloudygo.com/leela-zero-eval/eval-model/226?sorted=False). The most accurate data we have so far is the 56% winrate in 400 games at -v 1600 against LZ 225, which is significant, I think.
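For a sense of scale, here is a quick sketch of the 95% confidence intervals involved, using the normal approximation to the binomial:

```python
from math import sqrt

def winrate_ci95(wins: int, games: int):
    """Approximate 95% confidence interval for a match winrate."""
    p = wins / games
    half = 1.96 * sqrt(p * (1 - p) / games)
    return p - half, p + half

print(winrate_ci95(28, 50))    # 56% over 50 games:  about (0.42, 0.70)
print(winrate_ci95(224, 400))  # 56% over 400 games: about (0.51, 0.61)
```

A 50-game match leaves roughly a ±14-point window, while 400 games narrow it to about ±5 points, which is why only the latter cleanly separates a 56% result from 50%.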
The last 5 nets have something like 3000 games among them; that has to have some significance.
@iopq unless I am mistaken, I don't see 3000 games in the LZ 226 match against other networks.
It's not just LZ 226: LZ 225 plays against the same set of networks and has the same Elo. That would be extremely unlikely if their strengths were very different; a 100-point difference between 222 and 226, for example, is very unlikely. Someone can do the actual statistical analysis, but there hasn't been any noticeable progress.
@iopq the Elo numbers extracted from 50-game matches are, again, totally insignificant, so you can't draw any conclusions from such small game samples, I think.
If you have 5 different networks and they play 50-game matches against each other and nobody comes out with a higher Elo... that's still hundreds of games being played in total. The individual number of games in each match doesn't matter, especially since they also have the same winrate against older networks.
@iopq you are saying that each of these last 5 networks has roughly the same winrate, which is indeed what the official tests show, generally around 55%. The individual number of games does matter, because each of these networks is different, and if A > B and B > C, then A is not necessarily > C: there are several aspects of strength in go, and there is fluctuation in the data even in a 400-game match (if you run 10 x 400-game matches, you will get a different number every time, as we saw recently in the 15b test match that @bubblesld did). But still, a 50-game match is just totally insignificant; right result or not, it is at best only good enough to tell very roughly the strength range of a network.
@iopq see: #2192 (comment)
But the total number of games is around 3000, not 50. This is so far the most statistically significant result we have, since it's bigger than the usual 400-game matches.
No: the total number of matches of LZ 226 is: … the total number of matches of LZ 225 is: … If I used your logic, I could say that: … But this logic is flawed, as the total number of games does not indicate anything: each of the small parts of this sum is made of different matches, again.
Just because you're gathering a bunch of individually insignificant data together doesn't mean the whole magically becomes significant...
This is exactly what 400-game matches do (gather a bunch of individually insignificant 1-game matches, then average). It is possible to talk about the probability of having, say, 10 individual 50-game matches whose average performance is still too low. (I'm not saying anything about that.)
That's exactly how the law of averages works.
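A small simulation makes both sides of this visible at once: between two equally strong players, individual 50-game winrates swing widely, while the pooled winrate over all 500 games stays close to 50% (a sketch, not a rigorous test):

```python
import random

random.seed(1)

def match_wins(games: int, p: float = 0.5) -> int:
    """Wins for player A in one match, where A wins each game with probability p."""
    return sum(random.random() < p for _ in range(games))

wins = [match_wins(50) for _ in range(10)]  # ten independent 50-game matches
print([w / 50 for w in wins])               # individual winrates scatter a lot
print(sum(wins) / 500)                      # pooled winrate sits near 0.5
```

Pooling does gain significance, but only for the question the pooled games all measure; it says little about any single pairing.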
186K+ self-play games, the biggest number since LZ 157.
Thank you @wonderingabout for asking. We just finished the paper on 9x9 SAI and posted it on arXiv, see #1835 (comment). As for 19x19, we are not ready yet, but we are working on it. Let me share a bit of our to-do list...
I hope we will be able to start in 4-6 weeks.
You're welcome, and thanks for keeping us updated; looking forward to changes and improvements whenever they come 👍
What will gcp do if no new weights promote any more?
@l1t1 "what will gcp do if no new weight would promote any more?" I propose to reduce the size of the net to 30b, as 40b was proven to be too large. This will speed up things and will allow more people to participate in testing and development. Quite possibly, after shedding the dead weight, the 30b nets will get much stronger. |
Option A: nothing. And 40b is by no means too big; it is the same size AlphaGo Zero successfully used.
@wonderingabout "and 40b is by no means too big, it is the same size alphagozero successfully used" I agree. We need only to borrow their resources and use them for longer time, than they did: "One number that is suspiciously missing is the number of self-play machines that were used over the course of the three days1. Using an estimate of 211 moves per Go match on average, we come to a final number of 1,595 self-play machines, or 6,380 TPUs. (Calculations are below.) At the quoted rate of $6.50/TPU/hr (as of March 2018), the whole venture would cost $2,986,822 in TPUs alone to replicate. And that’s just the smaller of the two experiments they report [...] The neural network used in the 40-day experiment has twice as many layers (of the same size) as the network used in the 3-day experiment, so making a single move takes about twice as much computer thinking time, assuming nothing else changed about the experiment. With this in mind, going back through the series of calculations leads us to a final cost of $35,354,222 in TPUs to replicate the 40-day experiment. |
The game count will exceed LZ 157's by June 1.
[table: weights by month]
[table: weights and games by month]
Is it possible to get these SGF files? It would be interesting to take a look at MG v17 vs LZ 226.
sqldf("select games,'lz_'||num net from lz order by 1 desc limit 4")
|
just like #965