Go tournament manager #870
Gomill has been recommended here before for testing two engines, but it seems it can also run tournaments. Alternatively, you could get in contact with the owner of this thread and ask how he does it: https://www.lifein19x19.com/viewtopic.php?f=18&t=13322. In the first post he uses twogtp and GoGui, but it might be more automated now.
Here's a config for running multiple matches with gomill's ringmaster, assuming you have the engine binaries in the directory you run ringmaster from. Run with the config in a subdirectory so that the extra files ringmaster generates end up there.
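For reference, a minimal playoff control file of the kind described here might look like the sketch below. The engine commands, network filenames, and game count are placeholders of my own, not the actual config from this thread:

```python
competition_type = 'playoff'
board_size = 19
komi = 7.5

players = {
    'engine_a': Player("./leelaz --gtp --noponder -w net_a.gz"),
    'engine_b': Player("./leelaz --gtp --noponder -w net_b.gz"),
}

matchups = [
    Matchup('engine_a', 'engine_b',
            alternating=True,        # swap colors between games
            number_of_games=100),
]
```

Run it with `ringmaster mytournament.ctl run`; ringmaster writes its state and SGF files next to the control file, which is why keeping it in a subdirectory is tidy.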
I am the owner of the Engine-Tournament thread... I still use twogtp and GoGui. It is not as comfortable as Arena, but it is more reliable ;-) As soon as you have figured out how to configure engine matches, you can create some standard text files, add the engine commands with a text editor, and quickly run the matches for a tournament. For Windows I can recommend SmartGo. I like the concept of the GUI, even more than Arena for chess. You can't run tournaments, but you can run engine matches, collect them in a folder, and create the tables. I just don't use it for running the tournaments because it can't handle all engines in Linux. One hint for twogtp: don't use the alternate option. When switching colours, the result is messed up.
Thanks. I will take a look.
@Cabu if you are willing to write down some specification, I can see if I can modify Validation to run tournaments.
For a very basic tournament program, we should be able to define multiple engines (ideally more than 2) by their command line (eg: "leelaz.exe -g -q -w my_network -v 2000 --noponder -t 1"). We don't really need to separate the executable name, its network, and its parameters into separate command line options as is currently done, since some engines don't have such parameters (eg: AQ doesn't have a network parameter). We should also be able to set:
Then the tournament should start by matching each engine against every other engine (but not itself). Then it could display a nice table like:
In this table we can see:
For an advanced tournament system we can add:
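As a sketch of how the results table above could be tallied (the function and engine names here are illustrative, not part of Validation or any existing tool):

```python
from collections import defaultdict

def cross_table(results):
    """Build a wins table from a list of (winner, loser) game results.

    Returns {engine: {opponent: wins}}, enough to print a
    round-robin cross table like the one described above.
    """
    table = defaultdict(lambda: defaultdict(int))
    for winner, loser in results:
        table[winner][loser] += 1
    return table

# Hypothetical game outcomes between three engines:
games = [("leelaz", "AQ"), ("leelaz", "AQ"), ("AQ", "leelaz"), ("ray", "AQ")]
table = cross_table(games)
```

From such a table, per-pairing winrates and overall standings follow directly.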
@Cabu
By default, Validation takes the first engine as Black and the second as White and alternates them over the different games. Is this a problem? The result in the table can then be printed however we want. Validation already runs games in parallel and supports multiple GPUs, but I think it would be really difficult to support a computer cluster. The byoyomi setting is also a GTP command, so it too is given on the command line. To manage handicaps, I really have no idea what I need (free handicap or fixed, it is a GTP command, will the engine choose the free handicap or will we, etc.). To manage leagues, I would first have to implement tournaments before I can tell you how easy it will be.
How do I then set them to the values I would like? For me, they should be passed as command line parameters with default values, so that Validation stays backward compatible.
Not a problem, it's just a convention.
That is not a big problem, but we then lose the ability to evaluate whether an engine is stronger with one color than the other. If there is no way to change that, engine n should only fight against engine m for m > n, as there is no need for an explicit 1vs2 and 2vs1. In that case numgames should ideally be an even number.
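The pairing scheme in this comment (engine n plays engine m only for m > n, with colors alternating over an even numgames) can be sketched in a few lines; the names are illustrative:

```python
def schedule(engines, numgames):
    """Pair each engine only against later engines in the list,
    swapping colors every game. numgames should be even so each
    side plays both colors equally often."""
    assert numgames % 2 == 0, "numgames should be even to balance colors"
    games = []
    for i, black_first in enumerate(engines):
        for opponent in engines[i + 1:]:
            for g in range(numgames):
                # Alternate colors game by game within the pairing.
                if g % 2 == 0:
                    games.append((black_first, opponent))
                else:
                    games.append((opponent, black_first))
    return games

games = schedule(["leelaz", "AQ", "ray"], numgames=2)
# 3 pairings x 2 games each = 6 (black, white) tuples, no self-play
```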
There should also be a parameter specifying the type of handicap to use (free/fixed).
Leagues could be seen as multiple tournaments. Passing that kind of information only on the command line is effectively impossible; you will need config files for that. Side question: could you add the compilation of Validation to the VS2017 solution? I have tried but without success (I don't know how to tell the compiler to add the Qt dependencies) :(
@Cabu I do not have VS2017 so I cannot do it. But did you try with just qmake in the validation directory?
Nope, I didn't try, as I can barely compile leela in VS. But it doesn't seem to work: no executable is generated.
@marcocalignano, @Cabu: Running general tournaments could become a big project in itself. Just a caution that it might become a bigger project than you initially imagined and could eat up more of your time than initially planned. Not to say don't go for it, just be aware of what you might be getting into. (To be clear: I think this is an awesome idea, regardless of how big- or small-scale you end up going, and I might even be able to help out (no guarantees, though).)

With that in mind, I've read a little about actual Go tournaments recently, and one of the most common tournament styles between multiple competitors is the Swiss system, specifically the McMahon variant of the Swiss: https://en.wikipedia.org/wiki/McMahon_system_tournament. This allows many entrants but only requires a few rounds (as opposed to round-robin). I'm not sure if that's the kind of tournament you're intending to run; I just thought I'd mention it so you have some context of how big and potentially complicated a generalized tournament project could get.

A simpler system might be based on the GA/evolutionary approach I described earlier in #814 (comment) and subsequent comments. This would be a kind of 'ongoing tournament' of a population of networks. (You needn't do the evolutionary stuff like mutation and crossover, of course!) The main idea is simply to randomly pair networks against each other. You could then compile statistics based on who beats whom. (In the GA/evolutionary setting, the 'statistics' are maintained simply by keeping a pool/population of 'survivors' and letting the rest of the entrants go extinct.) Similar to that, you could try something like CGOS, which is more along the ongoing random match-up style, or the KGS Computer Go tournaments, which are more of the fixed-length 'official tournament' style.
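The 'ongoing random match-up' idea above (a pool of networks, randomly paired each round) is simple enough to sketch; the network names are placeholders:

```python
import random

def random_round(pool):
    """One round of an ongoing random match-up 'tournament':
    shuffle the pool and pair adjacent entrants.
    If the pool is odd-sized, the last entrant sits out."""
    entrants = list(pool)
    random.shuffle(entrants)
    return list(zip(entrants[0::2], entrants[1::2]))

pairs = random_round(["net_a", "net_b", "net_c", "net_d"])
# e.g. [("net_c", "net_a"), ("net_b", "net_d")] -- two games per round
```

Statistics (or a survivor pool, in the evolutionary variant) would then be updated from each round's results.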
@Cabu You need to run nmake after qmake to build.
Also, there's info on Sensei's Library about Go tournaments. As a starting point, here's the page on McMahon pairing: https://senseis.xmp.net/?McMahonPairing
Oh, and I guess the 'ongoing random match-up style' I'm describing is perhaps more properly known as a ladder system, rather than a tournament system. See also https://senseis.xmp.net/?ClubLadder for examples.
Also, this looks useful as a sort of alternative to Elo: https://senseis.xmp.net/?EGFRatingSystem. (It doesn't seem as general as BayesElo, though.)
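For context, the plain logistic Elo machinery that these rating systems elaborate on fits in a couple of lines. This is the standard textbook formula, not the exact implementation of BayesElo or the EGF system:

```python
def elo_expected(ra, rb):
    """Expected score of player A (rating ra) against player B (rating rb)."""
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))

def elo_update(ra, rb, score, k=20):
    """Update A's rating after a game (score: 1 win, 0.5 draw, 0 loss)."""
    return ra + k * (score - elo_expected(ra, rb))

# Equal ratings give a 50% expectation; a win then gains k/2 points.
new_rating = elo_update(1500, 1500, 1.0)  # 1510.0
```

The EGF system mainly differs in making the update factor depend on the player's rank, and BayesElo in fitting all ratings jointly.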
One thing which might be really cool and helpful for running experiments would be for AutoGTP to be forked/adapted to allow tournaments, ladders, and validation to be run across multiple computers, based on people opting in to volunteer their computing power to help run a particular tournament (or set of tournaments). A server could then dole out games as needed. Perhaps each client could register which kinds of engines it is able to support, and tournament organizers could provide a list of prerequisites that clients need in order to support their tournament (e.g. a custom/forked version of Leela Zero as a downloadable binary (though requiring compilation is perhaps safer?), or an alternative Go engine installation like AQ).
But I wonder if all of this is pertinent to the leela zero project. Why do we need tournaments? |
@wctgit I suggested this to @gcp before, at that point for the purpose of search parameter tuning. The response was not positive because of the problems with distributing multiple binaries across clients. For this reason, I'm not optimistic anything of the sort is going to happen, even though it might be highly useful for the current attempts to find good solutions for FPU reduction that work for multiple nets.
To evaluate leela engine and leela network against itself and/or other engines. |
@wctgit |
But I still would like to know the opinion of @gcp on this matter. |
Search parameter tuning does not need separate binaries. Trying other search algorithms does. We're not in chess territory, where a 2 Elo improvement is something that gets people to pop the champagne, so I see little need for this. It's handy if people want to try search changes and don't have a GPU, can't leave their machine on overnight, and don't want to use AWS or GCP. But it requires a significant investment: making packages that can do local git repo fetches + builds of the source code (so including a compiler), a server side that can dole out the parameters, and an account system with approval so random people can't upload arbitrary C code to the network participants.
I really don't understand why that would require anything of the sort, least of all tournaments, and especially not using other engines, which, combined with the non-homogeneous testing environment, can mess up such a tuning really badly. It's going to be hard to tune something if the strength of your opponent depends on which system the current match got scheduled on. If you think the problem of those FPU reduction threads is that the results vary too much from net to net, I think you're mistaken: the problem is that much of the testing was initially only done for a few games before turning knobs, so the results look very random. But people are starting to learn, I think.
Fair point, and I am also sceptical of results with extremely few games. I do think there's a high possibility that FPU reduction (or any search parameter change, for that matter) will also change the balance of strength between nets to some extent. I got this impression from one poster who retested a net that lost by 0:50 or maybe 1:50 to the then-current best, but managed to score 4:1 against that same net with FPU reduction. Of course a 4:1 does not mean that the first net is stronger, but it should mean that there is a statistically significant discrepancy with the other test ending in 1:50. Another result that puzzled me is the blowout that @killerducky experienced when implementing dynamic parent evaluation for Minigo (tensorflow/minigo#87), when for Leela Zero this turned out considerably worse than FPU reduction. I'm certainly not saying I can prove my suspicions in this regard; you may well be correct after all. I just saw a number of results suggesting that there are a lot of things we don't completely understand about the effects of search code changes, so testing such changes with multiple nets instead of a single one seems prudent to me.
But Minigo didn't have any kind of FPU reduction, right? Leela Zero's next branch already has the original proposal.
Minigo doesn't have any FPU reduction, so my results just show that dynamic parent evaluation is better than the original init-to-parent eval. I also have another pull for Minigo that shows FPU reduction works well on Minigo. I didn't test FPU reduction vs dynamic parent eval on Minigo, which is what we are testing here.
FPU reduction tested around a 75% winrate vs static parent eval (the baseline) for LZ, and dynamic parent eval tested around 40% vs FPU reduction. If we assume these results are transitive to at least a reasonable degree, there is a discrepancy with the 92% winrate of dynamic parent eval over static parent eval in Minigo, right?
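One way to make this transitivity check concrete is to convert each winrate to an implied Elo gap and add the gaps. This assumes Elo differences compose, which is exactly the transitivity being questioned here:

```python
import math

def winrate_to_elo(p):
    """Elo gap implied by winrate p under the logistic Elo model."""
    return 400.0 * math.log10(p / (1.0 - p))

def elo_to_winrate(d):
    """Winrate implied by an Elo gap d."""
    return 1.0 / (1.0 + 10 ** (-d / 400.0))

# The LZ numbers from this comment:
gap_fpu_vs_static = winrate_to_elo(0.75)  # FPU reduction over static parent eval
gap_dyn_vs_fpu = winrate_to_elo(0.40)     # dynamic parent eval over FPU reduction

# Implied winrate of dynamic parent eval over static parent eval:
implied = elo_to_winrate(gap_fpu_vs_static + gap_dyn_vs_fpu)
# odds multiply: 3 * (2/3) = 2, so implied = 2/3, well short of Minigo's 92%
```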
It's 92% for 65 games. Not sure how it could change with more games. Also, is it on 9x9? It might be a lot better for that size.
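A quick way to see how much 65 games actually pins down a 92% result is a normal-approximation confidence interval. The exact win count is my assumption (60/65 is roughly 92%):

```python
import math

def winrate_ci(wins, games, z=1.96):
    """Approximate 95% confidence interval for a winrate
    (normal approximation to the binomial)."""
    p = wins / games
    se = math.sqrt(p * (1.0 - p) / games)
    return p - z * se, p + z * se

lo, hi = winrate_ci(60, 65)  # roughly (0.86, 0.99)
```

So the result is clearly strong, but the interval is wide enough that more games could move the point estimate noticeably.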
One reason is: It would help people to run side experiments based on Leela Zero. |
Yes, I understand that, but you also mentioned "any crazy ideas that could come :-)". Hence, I dumped a bunch of crazy ideas. :-D |
So, most of us, then. ;-)
So you're saying that it would be better if people were able to run large numbers of games in like, maybe, a tournament or something, so they can get more valid statistical results than possible on just their one local machine. ;-) |
I've found it only takes a day or two with a GPU to get a hundred or more games. |
You said it, "side experiments based on" are not part of this project. |
They could be if they are used to discover some useful bug fix or innovation. E.g. FPU calculations, winrate estimation tweaks (opportunity/risk), etc. |
Anyone used gomill/ringmaster under Ubuntu on Windows? My config.ctl (truncated):
board_size = 19
players = {
matchups = [
Closing: no active discussion for ~1 year, and subsequent issues and PRs describe ringmaster in the README.
Is there a program that can manage multiple Go engines to run a series of matches and display the results, the way Arena (http://www.playwitharena.com/) or LittleBlitzer (http://www.kimiensoftware.com/software/chess/littleblitzer) do for chess?