How to find information about the new best network #78

I noticed a new network (fe3f6...) in http://zero.sjeng.org/networks/
Can you provide some information about win rate over the previous best network?
Number of games it was trained on? How long it took to train?
I'm thirsty for details :)

Comments
19k games. Learning on 38k is running now. I observed that the network now thinks white has an advantage in the opening. I think this is because it learned that if black passes before capturing much or gaining territory (not something it understands at this point), white will win on komi. |
Thanks for the information. I see that the network file is named based on the number of games it was trained on, which will help answer this question in the future. |
The files on the server are named after the hash of the contents, though. I just do this to keep track of which is which. I also tested a smaller network (to control for overfitting) but it was not better. |
I was referring to "19k.txt" inside of fe3f6...gz |
I appreciate updates like that. I think it will help keep contributors motivated to start up autogtp.exe when they get some feedback on how their contribution helps to make progress. Btw, how good would the current network (19k) play against a human? |
Including the win rate info in the best network would be great. @lithander I don't think the 19k network is better than a human beginner. |
It barely knows how to count I think. |
I was quite curious and replayed a few of the games. Interestingly it seems like the newest version has a small understanding of specific shapes. |
The learning now has the policy network achieving a 4% prediction rate, which is very far from random (0.3%). I wonder if this is just learning to understand what the legal moves are (many training games have almost-filled boards) or if it can already statically see some capture and defense moves. |
Could you please clarify what prediction rate means? Is this in regards to a dataset of human professional games (GoKifu), as they used in Fig.3 of https://www.nature.com/articles/nature24270? |
It's a prediction rate over the dataset of games from the program itself. So in 4% of the cases, it correctly guesses the move it would eventually play after searching. This is a sign play is starting to become more non-random. (Or, as said above, maybe simply that the network now understands you can't place stones on top of each other) |
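As a concrete illustration of the metric being discussed: this is a top-1 accuracy of the raw policy output against the moves actually chosen after search. A minimal sketch (hypothetical array names, not the project's training code):

```python
import numpy as np

def prediction_rate(policy_outputs, search_moves):
    """Top-1 prediction rate: fraction of positions where the network's
    highest-probability move matches the move chosen after the search.

    policy_outputs: (N, 362) array of move probabilities (361 points + pass).
    search_moves:   (N,) array of indices of the moves actually played.
    """
    top_moves = np.argmax(policy_outputs, axis=1)
    return float(np.mean(top_moves == search_moves))

# Random guessing over ~361 points is roughly 1/361 ≈ 0.3%,
# so a 4% rate already reflects some learned structure.
```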
Hi, you mentioned that learning on 38k is running now, and I wonder how much time will this 38k learning process take? Could you give an estimation based on your experience? Thank you. |
Any idea what the prediction rate would be for a strong fully trained Leela Zero? Did you ever happen to try loading in Leela's own human game data into Leela Zero's architecture just to see what happens? |
That's how the supervised network in the README.md was built. It does about 52.5% on humans, but the prediction rates for those vary a bit with the exact dataset. Also, it trained for only a few days; you can probably get quite a bit more by running it for a few weeks. The prediction rate on human games for a Zero net trained on its own play (i.e. what we're building now) should be less, because it won't predict the bad moves those puny humans play by imitation. |
"The 19K game network beats the 9k game network 63% of the time. A 38K network is training now." I wonder, should the older less informed games to be discarded at certain moments? |
AlphaGo Zero used a window of 500k games, IIRC. |
1/10th for Leela Zero? 50K? |
Are you sure about using a 500k game window instead of a 100-300k window? Because our network is only 6 blocks, it should improve faster. |
I'm not sure about anything. But it's important to keep a window of the old games, or the network forgets the basic things it has learned before. This is a very common problem for reinforcement learning setups. |
You could probably experiment with the window size, by testing two networks with the same recent data but different numbers of older games against each other. I doubt this would make any sense before we have at least 150-200k games though. |
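For illustration, the sliding-window idea under discussion amounts to keeping only the most recent N self-play games and sampling training data from that pool. A minimal sketch (window size and function names are illustrative, not the project's actual pipeline):

```python
from collections import deque
import random

WINDOW_SIZE = 500_000  # AlphaGo Zero reportedly used ~500k games; a smaller net may want less

game_window = deque(maxlen=WINDOW_SIZE)  # oldest games drop off automatically

def add_selfplay_game(game):
    game_window.append(game)

def sample_training_games(batch_size):
    # Mixing recent games with older ones helps the network keep
    # the basics it learned earlier (avoids catastrophic forgetting).
    return random.sample(list(game_window), min(batch_size, len(game_window)))
```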
I might suggest a slight wording change, given the confusion some people had over kyu: "The 19K game network beats the 9k game network 63% of the time. A 38K network is training now." |
Replacing K with T to mark a thousand games wouldn't be a bad idea either, since K is a rank measurement in Baduk. |
Sorry. Although I play Go, it didn't even occur to me that saying 9K in that context might confuse people. I just changed it to full numbers with no abbreviation. |
Is the network which learns from 38k games still under training? |
To keep the progression I would see 76K as next :) |
The current ones did not beat the 19k games network yet. So clearly it's not all so easy. I am retraining 49k with stronger regularization, and starting 62k soon. |
@gcp Can you publish some games between different networks? I think it might be a way to motivate people who join the training of leelaz. |
@godmoves it hasn't been updated for a while, but you can get them here: https://sjeng.org/zero/ |
The strength evolution curve in the DeepMind paper is not strictly monotonic. That suggests to me that they must have allowed their network to update occasionally even when the evaluator did not prove a strength increase, presumably to get out of local maxima. |
@roy7: We just had another mis-promotion with a91721af. It passed SPRT at 11-1, and then the losses started coming in. Maybe the code for promoting networks should have a minimum number of games requirement (like 50-100), in addition to an SPRT pass? |
@jkiliani Makes sense since shorter games arrive earlier in the results. What's a reasonable minimum though? 100 feels high... imagine if it's like 1-99? Perhaps a minimum 50 games, or even 30, would avoid these 11-1 situations well enough. |
Don't know, anything upward of 50 would be my choice. But even 30 might prevent a case like today. I don't see the problem with even 100 though, since match games are generated really quickly these days. The delay from waiting hardly seems significant. |
Wait, what do you mean 1-99? I would only implement a minimum to pass, not to fail. Or maybe allow a pass SPRT, but remove the automatic promotion?

A minimum to fail isn't needed in any case: even if a net does fail 1-11, more results are going to come in due to queue_buffer anyway, and if it returns to null this way, it still has a chance to pass. The difference between fail and pass is that promotion is permanent, so you need a higher confidence.

By the way, could you mark mistaken passes in the match list somehow? Maybe blue color for the bar of any net that was promoted, to be kept even if the net returns to null or fails after all? This would make it easier to find the passed nets in the list, which (in my opinion) include those who failed after passing, since they produced self-play games and were used as reference net for further matches. |
@gcp Something is odd with the current training. It's a lot slower than before. I remember you mentioned somewhere that we now do 256k steps, but I only see a 128k-step one. Is it possible that the server is displaying the wrong step count? So the 128k is actually 256k, and the 64k is actually the 128k one, etc.? (I forgot which issue, or I would post this there.) |
Not really @Dorus. 7b7d5d59 was the 64k net, the 128k net is due in 20 minutes, and the 256k probably around 9 hours from now. I don't see any irregularity... |
It's also possible a network weaker in short games but much stronger in long games could start with a fail but end up over 55%. I was going to just make it a 100-game minimum before any pass or fail, but good catch there. Perhaps 100 minimum to pass but no minimum to fail, then? |
Here we go @jkiliani leela-zero/leela-zero-server@0251e63 |
That would be my suggestion. There is already an effective minimum to fail of around 50 games, by the queue_buffer. Edit: Looks good to me. |
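The actual change is in the server commit linked above; as a rough sketch of the rule being discussed (Python for illustration, not the server's implementation), the gating amounts to requiring both an SPRT pass and a minimum number of games before promotion, with no minimum applied to failing:

```python
def should_promote(wins, losses, sprt_passed, min_games=100):
    """Promote only after an SPRT pass AND a minimum sample size,
    so a burst of early wins from short games can't trigger a
    premature promotion. Failing is not gated: a net that starts
    badly keeps receiving results and can still recover."""
    return sprt_passed and (wins + losses) >= min_games
```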
@jkiliani Oops, you are right. I must have remembered it wrong because of the in-between steps. |
The AZ method working demonstrates that this is not true. The networks are not doing mutations and intersections between a population as in an evolutionary process, and being improved from that. It's training from external data that improves them (data that happens to be generated by them, but that's actually unrelated, which is exactly what AZ shows!). The analogy and comparison with evolutionary methods is just totally and utterly wrong, and again, the only "adaptation" that is being done is really us cherry-picking the best among completely non-adapted networks. |
This will be totally off topic, but still... Obviously it's not just a comparison, it IS an evolutionary method, and that's just totally and utterly right. If you adopt the AZ method you are no longer selecting networks, but you must still be selecting something: that something is the individual knowledge elements that we humans would call moves/tesujis/techniques/strategies/other things we cannot even conceive but the network can, and the "survival of the fittest" part is not "surviving a match between networks" but surviving the training phase, which decides what goes into the next network.

Knowledge doesn't appear by magic. Especially in a "Zero" situation, where you scrupulously keep the Designer away, there must be a selection mechanism. How can you expect to keep moving towards the optimum if you exclude the existence of any mechanism to guide you in the right direction? Why do you suppose the AZ method works?

By the way, the "happens to be generated by them, but that's actually unrelated" is just as it should be. You don't need to be clever and try to generate good "mutations". Anything goes. We evolved from single cells just from random mutations. It's the selection part that's important. |
Did you just throw together a bunch of random keywords and compile them into a post?
Because MCTS guides the learning algorithm. This guidance is so strong you need very little other help. |
Sure, and where does it guide it? It helps you choose the good moves over the bad ones. That's my point. What's yours? |
Training does not happen "at random" like it would during real evolution. It does pick random moves, and the subset of moves that is picked has some effect, but for each picked move it modifies the neural net to output a result closer to the output of the MCTS at that move. This modification is not random: the neural net is one big mathematical formula, and that formula is adjusted to match the expected result more closely.

Also, the move picked by the MCTS is not necessarily a 'good' or 'bad' move. Even if it is a bad move, next-generation networks will pick that move more often and will start running MCTS on the following moves as well. At some point the search, one (or several) moves deeper, will discover a way to refute the previously considered move, and the top-level move will become unfavorable. The search will then switch to the previously considered second-best move, which might have been a good move all along. The knowledge about the now-unused search path is not lost, though: as long as the neural net is large enough, future training will not completely erase it. Eventually the network will reach the limit of what it can learn; this happens when each training step makes it forget just as much as it learns.

With the AGZ approach, you train a new net, and if it is not stronger, you roll back to the current net. All the training you just did is then 100% lost. With AZ, you might progress even faster, because you never roll back any training. You might have some trouble once the network reaches capacity, because you are not hand-picking the best net, but if you set the learning rate low enough, new networks will not forget as much and you stay at roughly the same strength. And if you really want to, you can still hand-pick the very last net you get from an AZ method.

Anyway, I feel like I'm just writing the long version of what @gcp wrote a few posts above:
I have some doubt that an AZ method can surpass an AGZ method, but I have little doubt an AZ method can be equally fast, if not faster, than AGZ, as long as your methods are sound. An AZ method provides much less feedback, so you need a very good idea of when to drop the learning rate, how efficient your MCTS is, and that you have no bugs, etc. Moving to an AZ method from an already saturated AGZ net like we have now will probably not result in a much stronger net, but it should also not result in catastrophic forgetting. |
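To make the "modified to match the MCTS output" point above concrete, here is a small sketch of the AlphaGo-Zero-style training targets (illustrative code, not the project's actual training scripts): the policy head is pulled toward the search's visit distribution, and the value head toward the final game result.

```python
import numpy as np

def training_targets(mcts_visit_counts, game_result):
    """Targets for one position, in the AlphaGo Zero scheme.

    mcts_visit_counts: (362,) visit counts from the search (361 points + pass).
    game_result:       +1 if the player to move won the game, -1 otherwise.
    """
    visits = np.asarray(mcts_visit_counts, dtype=np.float64)
    policy_target = visits / visits.sum()
    value_target = game_result
    return policy_target, value_target

# The loss then combines cross-entropy between the policy head and
# policy_target, squared error between the value head and value_target,
# and L2 regularization on the weights.
```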
I'm sorry, but you're really just trying to knock down a straw man, based on faulty assumptions about what 'must' be the case for something to qualify as an evolutionary method. In fact, there are very few assumptions that must be met, and this project (and the AGZ project, and even the AZ project; see below) meets those few assumptions just fine. You don't need a large population size to be an evolutionary method; you can have a population of just 1 surviving individual per generation, which is what we're working with. Mutations need not be entirely random; they can be produced by any process, including neural net training, as long as some level of inheritance is maintained. Intersection (i.e. crossover) is also not necessary; in fact, it can be considered simply another variant of generalized 'mutation'. And honestly, I think you are simply using an unrelated meaning of the term 'adaptation'. I already linked to the Wikipedia article which explains the usage of 'adaptation' that I'm referring to, but here it is again: https://en.wikipedia.org/wiki/Adaptation

In the current project, when a champion network (population size = 1) is used to train a variant of itself (mutation), that new variant isn't just a random network out of the blue; it is still based largely on the original champ (inheritance). However, the mutant first must pass the SPRT gauntlet (selection) against the champ before becoming the champ's successor. If it fails, it is left behind (death, aka extinction). If it succeeds, it becomes the new champ (generations), and the process continues (iteration).

From an evolutionary process point of view, it's irrelevant whether the mutation step is unguided or guided. If it's unguided, it will still work; it will just take a lot longer. If it's guided, it's still producing a child variant largely related to its parent; it's just that the odds are much higher that the child will be better than its parent, so the overall process can go faster. [There's an inherent risk of getting stuck in a local optimum, especially with a population size of just 1, but the learning rate parameter apparently is intended to prevent that (similar to simulated annealing, I presume).]

Even if this project were to take on the AZ method of always accepting the mutant child without testing SPRT against the parent, it would still count as an evolutionary process; you would just take on a broader perspective, considering the training games themselves as part of the 'environment' left over by previous generations of parent/ancestor networks. Perhaps more like cultural evolution than biological evolution, but evolution nonetheless (see memetic/Baldwinian algorithm below). Yes, the neural net training does most of the heavy lifting in terms of generating candidates that are likely to be more 'fit' to their environment. But also yes, it can still be considered a form of mutation. Guided or unguided doesn't matter. In this project's case, it's guided. Still mutation. Still evolutionary.

Here is a collection of relevant topics which buttress my points:
The point is, there are many many forms of evolutionary computation, and this project definitely falls under that broad category. You could even argue that it uses one of the oldest forms of evolutionary computation, namely Evolution Strategies. It just uses a souped-up mutation operator that's way better than random mutation. And it keeps some additional legacy inheritance from long-dead ancestors in the form of a long window of training games that they produced. None of that disqualifies this project from being a form of evolutionary computation, and indeed, an evolutionary algorithm with the special case of having a population size of just 1 (or 2 depending on how you delineate a 'population').

The reason I'm dragging this on is because understanding that this is the case will allow us to borrow knowledge and techniques from evolutionary computation which can help us overcome actual practical problems that we may run into, such as getting stuck in local optima (most likely simple solution would be increase pop. size to > 1), or perhaps adopting / experimenting with certain techniques or operators, such as the simple linear 'hybrid' technique that seems to work pretty well (which can be explained/understood partly by analogy with a generalized 'crossover' operator).

Denying outright that it's a form of evolutionary computation/algorithm won't serve any useful purpose, IMHO. Yes, the main workhorse is of course the neural net training algorithm; that fact has never been under contention. That doesn't mean it's not also an evolutionary process. The two methods blend together quite well, in fact. |
I think you could easily get away with W + L > 50. By the time you get past 50, the very short games that may have tipped the SPRT balance to 'PASS' will have already been diluted by many more longer games, undoubtedly bringing the balance back to 'green bar' (or even 'fail'). And from then on the SPRT should function as intended. It's really only when you get a whole bunch of early wins all at once (in the first 10-20 results returned) that you're in danger of a really bad false positive. In fact, going by the old statistics-class rule of thumb for minimum sample sizes, you could probably get away with waiting for only 30 games. But why not stick with 50, as that's the number we currently use to prevent a false negative? There are 20 out of the last 100 matches where 50 < N < 100, i.e. 20% of matches don't need to go all the way to 100 but would be forced to. |
I disagree. You're still selecting networks, but the networks are also participating in a kind of 'cultural evolution' (which itself does indeed involve the evolution of 'something', that being a library of games and/or game positions). However, at the end of the day, it is the networks that we select and use to play further games (inside and outside of the training process). The training game window exists as a kind of persistent (yet evolving) environment that the current network exists within, and which the 'souped-up' mutation operator of neural network training uses to produce new candidate networks which are more likely to be better than their parents than random.

(That's not to say that the evolving library of games is not something worthwhile on its own; indeed, most of what the Go community has gotten from Deep Mind has been a bunch of games -- no one has access to play any of the Alphas directly anymore, they can just look back on the history of their games and try to learn from them.)

In a sense, the window of training games is like a 'ghost population' of dead ancestors (or their memoirs) whose effect on the environment indirectly influences the mutation/evolution of the lone survivor (who also leaves its own memoirs/ghost, and so on), via the super-mutation operator of neural network training. This could also be considered a form of co-evolution, which is a very powerful evolutionary engine, and may partially explain why the Alpha methods work so well. See https://en.wikipedia.org/wiki/Coevolution. |
@Dorus, good explanation of the effect of MCTS on the training. Well said.
I was just about to say that we'll eventually always hit the 'no free lunch' limit either way, which is in reference to the no free lunch theorems, which is an interesting topic all on its own. But then I just happened to run across this other NFL article, https://en.wikipedia.org/wiki/No_free_lunch_in_search_and_optimization, which has a subsection titled Coevolutionary Free Lunches. You'd never guess WTF it says (I swear I just found this like 5 minutes ago!):
So now I'm like, "Yeah. What they said." :-) [BTW, that Wikipedia reference [9] is to "Wolpert, D.H., and Macready, W.G. (2005) "Coevolutionary free lunches," IEEE Transactions on Evolutionary Computation, 9(6): 721–735". I found a few slightly different PDF versions of it:
|
Meanwhile I ran a match between 82 (0fb6) and 87 (92dd) to check whether we are actually making progress. The rating difference between the two is 248, which corresponds to a winrate of 80.7%. And the result was:
Within the 95% confidence interval, the winrate is between 64.6% and 81.6%, which translates to a rating difference between 104 and 259. So while it seems the rating difference is not fully cumulative, we are still making some progress. |
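For reference, the winrate-to-Elo conversion used here follows the standard logistic formula; a quick sketch (Python, purely illustrative):

```python
import math

def winrate_from_elo(diff):
    """Expected winrate for an Elo rating difference `diff`."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def elo_from_winrate(winrate):
    """Rating difference implied by an observed winrate."""
    return -400.0 * math.log10(1.0 / winrate - 1.0)

print(winrate_from_elo(248))    # ~0.807, the 80.7% quoted above
print(elo_from_winrate(0.646))  # ~104
print(elo_from_winrate(0.816))  # ~259
```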
oh! |