40B v 15B #1810
With 1 visit, the only randomness comes from the rotation of the board. The highest-prior move will generally be the same every time, so the games repeat: that single visit is all the engine can do.
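A minimal sketch of why 1-visit play is deterministic up to symmetry (hypothetical code, not Leela Zero's actual implementation; `net_eval` stands in for the policy network):

```python
import random

# Hypothetical sketch: with a single visit, the engine simply plays the
# argmax of the policy priors. The only randomness is which of the 8
# board symmetries is fed to the network, and a well-trained net returns
# near-identical priors for all of them, so games repeat.
def pick_move_one_visit(net_eval, board, num_symmetries=8):
    sym = random.randrange(num_symmetries)   # random rotation/reflection
    priors = net_eval(board, symmetry=sym)   # {move: prior probability}
    return max(priors, key=priors.get)       # greedy: highest prior wins
```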
Thanks for the response. I anticipated that every second game might start nearly the same. I was more surprised that when the colors switched the games stayed so similar for so long. I thought the difference in the structure of the weights, and the more-than-650-Elo gap, might have shown up in opening tendencies, which is what I was curious about. But I suppose in the grand scheme of things - i.e., the number of training games - there isn't so much difference yet.
Can the 40b win against 15b at equal time? I presume it should, but it's likely not that decisive... did you try this as well yet?
@jkiliani, I don't have the equipment to look at that question properly (i.e., it takes so long to run games CPU-only at a decent number of visits), but based on a cursory look with very low visits, I find 15B to be stronger, even with less time. I'm running 100 games now, allowing 2 visits for 40B and 20 visits for 15B. Even so, 40B uses twice as much time, and 15B has currently won 11 of 16 games. For me, it's no contest for everyday use: I would always use 15B for live analysis or game review. I have no idea if a GPU would narrow the performance gap, though it would obviously make 40B more practical for my purposes.
Following up on my original observation about repeated openings: spot checking recent 40B-40B match games, I have yet to find a single one that does not begin with Black star point, White diagonal star point, Black star point (followed more often than not by another White star point). That unchanging three-move opening strikes me as odd. There's clearly plenty of variation in the training games. |
My dream would be to let 40b accumulate self-play games and advance (hopefully, clients will not desert), then train 20b (I hope it will not be dumped as 15b was) and 15b on those games.
jkiliani said:
No, see HERE for a 60-game match at equal time between networks 157 and 174.
@qzq1 Is that really so surprising? Training games have randomization to create variation. When you let LZ try to find the "best move" with very few visits, less variation is exactly what happens.
@Strappa71 Regarding my first post and follow-up, what I found a bit surprising was that both LZ 40B and 15B played the same, fixed openings through at least 43 moves, whether black or white. I thought there would be enough difference between the two that that would not happen. Upon reflection, I guess they are still quite closely related, genealogically, so perhaps it's not a great surprise. Regarding my last post, on the 40B-40B matches: those are games with 1600 visits, and I do find it odd that the first three moves never vary. Even if it's not a surprise given the underlying methodology, it's fair to ask - speaking as a go player - whether that indicates a rigidity that makes LZ less useful and interesting. As gcp has noted elsewhere, LZ generates a lot of variety in 'second best' moves, so it's not a great concern for practical purposes. But I do think it's something to note. Granting that the available AlphaGo games were curated, and not necessarily representative, they display more variety in openings.
To my understanding, the net2net conversion doesn't introduce any new knowledge into the neural network. This is what happened when we initially bootstrapped 15b to 20b and then to 40b. To use the full potential of the 40b network, we have to do A LOT of self-play training.
@qzq1 I too find too many identical openings (up to move 37 or so) even in current match games. |
@maxinjapan I understand that in match play LZ always chooses the move it evaluates as best, and that move is likely to be unchanging given identical circumstances - for a given set of weights. Yet, if there is to be any advance in strength, we expect to see changes in play from one set of weights to another. One question is whether there is any "true" superiority in, for example, Black playing move 3 on the star point rather than one of the 3-4 points. I doubt that there is, as a matter of principle, so either LZ gets stuck at a local minimum and is missing something (until lots of accumulated training games get it out of the trough), or maybe it's just that if there is no meaningful difference in true value between alternative moves, there is no way to train LZ off its initial choice. Possibly we won't ever know for sure.
In self-play training, the moves are randomized somewhat, and quite a lot of games will have (say) Black playing 3-4 on move 3. If those games turn out better on average than games where Black plays 4-4, the network will learn this and start playing 3-4 more often.
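As a hedged illustration of that randomization (an AlphaZero-style sketch; the function name and the 30-move cutoff are illustrative, not Leela Zero's actual flags):

```python
import random

# Illustrative sketch of self-play move randomization. During the early
# moves of a training game, the move is sampled in proportion to MCTS
# visit counts instead of taking the argmax, so plenty of games open
# with 3-4 points even when 4-4 has the most visits.
def select_training_move(visit_counts, move_number, random_cutoff=30):
    moves = list(visit_counts)
    if move_number < random_cutoff:
        weights = [visit_counts[m] for m in moves]
        return random.choices(moves, weights=weights)[0]  # proportional sample
    return max(moves, key=visit_counts.get)  # after cutoff: best move only
```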
@maxinjapan If the network strongly prefers playing 4-4 to playing 3-4 (which I think it does), then that's good evidence that playing 4-4 is (at least given the network's other strengths and weaknesses) better than playing 3-4. I was arguing (against qzq1 and agreeing with you, unless I'm confused) that we shouldn't be too worried that LZ might just have randomly latched onto 4-4 in preference to 3-4 with no way of discovering if it's wrong. I think it will discover if it's wrong, because self-play explores a wide variety of openings; I think it would have discovered by now if it were systematically playing bad openings; and I think we should probably trust that whatever openings it plays all the time really do work better (at least for LZ playing against LZ) than other options.
@gjm11 yes, that was my point too :-) |
I have run a local test of 256 games at time parity between 15B (157) and 40B (174).
I have uploaded logs and SGFs generated from the logs here. The openings are very repetitive, but completely different depending on who is black and who is white.
In the *.log file: "Half precision compute support: NO"
Request: test the 177 weight.
I ran 40B (177) with 100 visits versus 15B (157) with 600 visits. No GPU, no pondering; each averaged about 13.7 seconds per move (wish I had better hardware).
Request: test the 178 weight. Someone did this: https://lifein19x19.com/viewtopic.php?f=18&t=16086&sid=03587ca0498c93eae434bb198726cc15
Great effort by xela posted to lifein19x19, but it's unfortunate that the 157 weight wasn't part of the test. Also, I would prefer more games for a smaller set of engines.
My impression is that 40b is too optimistic in winning a ko. Very often when a ko is lost the winrate drops considerably. More training in that area will be needed for the overall strength increase. |
Now testing ELF 240x20 against the new 128x10 (LeelaDan vs LeelaZeroT on KGS). Same time settings, about 30 sec per move.
@Splee99 It seems like correct evaluation of a position with a big ko is likely to depend on detailed tactical reading -- does the 40b network do better when given plenty of time to think? |
I would think ko is related to tactical reading, too. However, there is some ko knowledge embedded somewhere in the 40b, as appears in the games with only 1 playout, and after further training it seems to handle ko better.
I matched 174 (40B) and 157 (15B) with 1 visit each for 100 games, CPU only. The 40B weights won 86.0% of the games - almost exactly the expectation based on the Elo difference between the two.*
What I find odd is that every game of the 100 was identical through 43 moves, though the colors were alternating. It takes my computer an hour to play games at 500 visits, but the couple so far are identical to the 100 through 24 moves (all star-point opening, double approach by B in one corner, etc.). Is this an artifact of the training, or a result of my sad machine being overwhelmed? I can't make sense of it.
*Actually, based on a modification of the win-likelihood formula drawn from my experience matching other versions: P(A) = 1/(1 + 10^m), where m = (Br - Ar)/800.
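That modified formula can be sketched as follows (the function name is illustrative; the 800 divisor replaces the standard Elo formula's 400):

```python
# Sketch of the modified Elo expectation: the standard formula
# P = 1/(1 + 10^((B_r - A_r)/400)) with the divisor widened to 800.
def expected_score(rating_a, rating_b, divisor=800):
    """Expected win probability for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / divisor))

# With a ~650 Elo gap and the 800 divisor, the stronger side is expected
# to win about 86.7% of games, close to the observed 86.0% over 100 games.
```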