Matches against Chinese top tier pros (2:0 so far, 4 x Titan V) #1046
Comments
What are the specs? Windows or Linux system? How many GPUs were used, and which LZ network? Never mind, I see 4x Titan V in the title... So did you see the n/s during the game? Was it comparable to the other recent match on OGS? |
https://www.reddit.com/r/cbaduk/comments/852szt/lz_recently_won_a_2h_game_against_fineart_aq_lost/ An older LZ network won against FineArt with 2H; both Ke Jie and AQ had previously lost in this configuration! |
Here's some meta-information from the page
|
Some additional info I found. |
I ran lz with this unsophisticated command and manually played through the moves of the game: whenever lz didn't pick the same move I undid her move and played the one from the game. as with the haylee game this was generally a close second choice. however, there are a few moves where the picked move isn't right at the top. in particular, moves 56, 60, 86 and 110 weren't really considered in my experiment. move number, coordinate, rank: |
The tester somehow put some special characters behind -l, sadly
The network used is actually a hybrid one; the tester thinks it is much more stable. I will upload it here later. |
@bood Hybrid network? Between Leela and a third-party program or what? If so, how does that count as a clean win lol. Can you upload it? |
@Hydrogenpi Why do you assume another program is involved?! The hybrid method is well described in #814. "Clean" is about the content: ZhangLi never found a chance in the game. And I'm not a fan of hybrid networks either; it's not my decision to make, though. I just want to share the info. |
Okay, I see. But this "Chimera" is not really a pure LZ network tho |
@bood Could you please share the hybrid network? Preferably somewhere other than Baidu, please? |
@pcengine I concur. I recommend he upload it to the Internet Archive; that way, by default, it will be sent to Google's VirusTotal for a scan. I'm sandboxed in a VM but don't want any potential zero-days. |
Network uploaded |
@bood thanks. Could you tell us this was a combination of which two 10 block LZ networks so we can recreate and/or confirm ourselves? wait, 10*192 means it wasn't simply adding two networks together. so this is more than just a hybrid? @davidsoncolin at least you should be able to re(confirm) if the replay matches now. |
Sorry I cannot, I'm not the creator of this network. But we do have several 10x192 trained and tested before, I would assume it is combined from them. |
Playing against Zhang Tao (GoRating: 34 | Zhang Tao | ♂ | 3446) now, live. The network used is the latest 20b: 8e3bd368 |
There is now research supporting the "averaging weight" technique for networks from the same SGD run: https://arxiv.org/abs/1803.05407 |
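For readers unfamiliar with the idea, here is a minimal sketch of plain element-wise weight averaging. It assumes each network is already loaded as a list of flattened per-layer float lists with identical shapes; the `average_weights` name and the in-memory layout are illustrative assumptions, not Leela Zero's tooling and not necessarily how the hybrid network in this thread was built.

```python
from typing import List, Sequence

# Hypothetical in-memory layout: one inner list of floats per layer.
Weights = List[List[float]]


def average_weights(networks: Sequence[Weights]) -> Weights:
    """Return the element-wise mean of several same-shaped networks."""
    if not networks:
        raise ValueError("need at least one network")
    first = networks[0]
    for net in networks[1:]:
        # Averaging only makes sense when every layer lines up exactly.
        if len(net) != len(first) or any(
            len(a) != len(b) for a, b in zip(net, first)
        ):
            raise ValueError("all networks must have identical layer shapes")
    count = len(networks)
    return [
        [sum(values) / count for values in zip(*layers)]
        for layers in zip(*networks)
    ]


if __name__ == "__main__":
    net_a = [[1.0, 2.0], [3.0]]
    net_b = [[3.0, 4.0], [5.0]]
    print(average_weights([net_a, net_b]))  # [[2.0, 3.0], [4.0]]
```

Note that the linked paper also recomputes batch-normalization statistics after averaging, which this sketch does not attempt.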
Log with Zhang Tao uploaded. |
I don't speak Chinese, but I get the impression Zhang Tao resigned: could someone check? I don't see how to download the sgf either (and I'm not strong enough to assess the position, which still looks quite open) |
The log is a text file, and LZ won. Apparently c13 from black was a big mistake. |
On strong hardware, it seems that LZ is at least a 'good pro' now, and possibly a 'top pro'. A few more games are needed to confirm that feeling, but it's amazing anyway: congratulations to the dev team!!! |
Zhang Tao should be considered one of the top pros. He won the Go blitz tournament in China in 2017, beating Ke Jie along the way. |
Yeah, but he may not have been at his best today. It's said he had just finished a slow game during the day, and he went straight to sleep right after this match. :-( |
Maybe not his best day, but what an achievement anyway! And an LZ bot with the latest network is reaching 3800 on CGOS! We've got a champion, congratulations once again :-)) |
@Friday9i It's a test that I'm running in conjunction with the ZenLeeBot / Aquabot operator, the same guy who hosted the AQ vs Haylee match two weeks ago. On exactly identical hardware specs, AQ ended up at 3680 (AquaTest http://www.yss-aya.com/cgos/19x19/standings.html ), and although LZed26 has nowhere near run 100 matches yet, it already ranks higher than AQ and its winrate pattern seems much stronger than AQ's. Both of these tests were run on Amazon EC2 p3.8xlarge instances (4x V100 GPU). AQ was getting a sustained 20000 pp/s, while I couldn't seem to get better performance out of Leelaz than on a setup with 4x 1080s. Regardless, this means without a doubt that LZ has surpassed AQ. What is surprising to me is that LZ scales better than AQ at higher hardware settings, even taking into account that it seems to do less well on AWS, for whatever reason, than in a bare metal / non-VM environment. I had expected that AQ would scale better at higher GPU settings, but that appears not to be the case! LZ should go beat 95 pros. |
@bood so this new match was not a hybrid network correct? |
You're right. It is a network published by gcp, as a test of 20b promotion. |
@bood Thanks, so that is a fair and square win. I believe LZ is already stronger than AQ. Let's see LZ win 95-5 against pros just like AQ did! |
BTW, LZ ed26 is going for the top on CGOS (see http://www.yss-aya.com/cgos/19x19/standings.html): several hundred Elo stronger than any LZ version before, I believe. |
Apart from network strength progression, it should be noted that this LZ runs on 4xV100 (cost of GPUs alone around $32,000). Incidentally, whoever runs it, should probably modify their CGOS client to use kgs-genmove_cleanup instead of genmove (change 1 line in the script), as I can see an incorrect result on the last game it lost with Perseus-8 (whatever bot that is). In that game, it doesn't affect the win/loss outcome, but if I'm not mistaken, it could, theoretically. |
Wow, the 4xV100 LZ just failed to read a ladder terminating only 4 spaces away: last lost game vs moon_1.1. Was it some kind of time management issue? It had more time left than moon (3:30 vs 2:37 out of 15:00 each). |
I think she was probably losing already. Can't read ladder when every move has 0% winrate. |
@bood the hybrid weights aren't stronger, I tested it on CGOS and it lost 200 ELO overnight lol |
I thought bot versions on CGOS were supposed to be static? Which bot is it? |
Correct. Any change in weights/code/etc. should result in a new bot account on the site. A bot's strength should never change after it starts playing games; that would defeat the purpose of CGOS, and the admins have previously expressed concern about this happening. |
There are currently 10 bots playing on CGOS with an Elo of 3000 or higher. 5 of those are LZ versions. Please, if you run LZ on CGOS, try to run it for a longer time, definitely not under 100 games, and definitely not switching networks in the middle. Run it for much longer than 100 games if you can. I wouldn't be surprised if other bots started disconnecting from CGOS, or the results became very unreliable, because over half the bots are random leelazeros. |
@MaxMaki agreed! I know it's exciting to test the latest networks or tune the MCTS with a custom flavour, but keep those experiments off CGOS until you're sure you have something good, and then commit to running it unchanged for at least 100 games, and give it a meaningful name if you can. Kudos at this point to whoever is running b3b00c6d, the last 128x6, as a nice anchor. 637 games and counting 👍 PS: Bonus points for being a bit conservative with time management to prevent losses on time. |
@odeint I clearly labeled it as a temporary test bot with a t_LZ underscore prefix. I have since pulled it and started over with two more real tests that I will run to 150+ games each. |
@roy7 Thanks for the advice, I was not aware of this. I read in the site's disclaimer that the two assumptions were that the testing was static and that all games were played at the same time; I did not know it was compulsory that settings don't change. BTW, the guy running the "Maximus" bot a while back not only upgraded his network but also changed hardware specs several times between games. |
@Hydrogenpi It doesn't matter if you label it or not. If you change settings or do other things to change the strength you are ruining the ELO calculations for everyone. (EDIT: Note that you are still causing this even if you run less than 100 games) Also, if everyone keeps running LZ bots for a minimum of 100 games just to get a BayesELO score those scores will become less and less reliable as well. |
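To put the rating numbers in this thread in perspective, here is a small sketch using the standard logistic Elo curve. This is an assumption for illustration only: CGOS reports BayesElo ratings, which are estimated differently, and the function names below are hypothetical.

```python
import math


def expected_score(rating_diff: float) -> float:
    """Expected score for a player rated `rating_diff` above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def rating_diff_for_score(score: float) -> float:
    """Rating gap implied by a long-run score strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


if __name__ == "__main__":
    # A bot that silently gets 200 points weaker scores about 24%
    # against an opponent it used to be even with.
    print(round(expected_score(-200.0), 3))
    # A 95-5 record corresponds to a gap of roughly 511 points.
    print(round(rating_diff_for_score(0.95), 1))
```

Under this model, an account whose strength shifts mid-run hands its opponents wins or losses that no single rating can explain, which is why a changed configuration belongs on a new account.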
Oh, here seems a good place to add my pet peeve. In addition to running 100+ games and never changing settings/network/hardware[1] for a single CGOS account, please, use more sensible names, so that there's a reasonable fix for what you are actually testing:
It's of course CGOS's fault that it doesn't provide longer names/more data fields, but for the time being, it'd be great if we tried to limit our CGOS use to sensible names and rigs dedicated to the test for 100+ games (there are approx. 2 games per hour on CGOS, so that'd be 2 days non-stop play, or likely over a week of on-and-off testing). [1] You could, theoretically, change hardware with limited playouts/visits, but since CGOS games are short, you have to be sure both systems always play out in full. |
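The time estimate above can be checked with a few lines of arithmetic. This is a rough sketch: the 2 games per hour figure is from the comment, while the `days_needed` helper and the 7 hours per day of on-and-off play are assumptions made for illustration.

```python
def days_needed(games: int, games_per_hour: float = 2.0,
                hours_per_day: float = 24.0) -> float:
    """Calendar days needed to finish `games` at the given playing rate."""
    if games_per_hour <= 0 or hours_per_day <= 0:
        raise ValueError("rates must be positive")
    hours = games / games_per_hour
    return hours / hours_per_day


if __name__ == "__main__":
    # Non-stop play: 100 games take 50 hours, a little over 2 days.
    print(round(days_needed(100), 2))
    # Assumed 7 hours of play per day: a little over a week.
    print(round(days_needed(100, hours_per_day=7.0), 2))
```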
@Hydrogenpi Where can I find these data? Seems that CGOS offers no place for them. The 19x19 Maximus may not be the same person as the current CGOS top 9x9 bot Maximus_160B_512F, I suppose? |
@Hydrogenpi That was FineArt B/C, which is an older AlphaGo-Lee type network. The one that beat Ke Jie with 2H is FineArt A, which is a new AGZ type network, probably 800 Elo stronger than FineArt B/C. |
Apparently FineArt A is the "Hainan Challenge" version in this article. It's 40-block dual-resnet, not trained tabula rasa but used RL starting from old versions of FineArt with "millions of self-play games". Two other versions mentioned are also dual-resnets (20 blocks), i.e. not the old AG Lee architecture. I don't know whether FineArt B/C are even older versions though. |
@alreadydone Some pros tell me that the FineArt B is a modified version of Fuheyuqi. |
Good to know. Fuheyuqi was a 20-block dual-resnet trained from several hundred thousand self-play games, according to the article. |
Closing, no active discussion for ~1 year |
A few days ago, yikeweiqi.com helped us set up some accounts on their platform so we can play LZ against other players. (They also helped by encouraging their users to help train LZ.)
Today, they invited a current pro player, ZhangLi (6P, rank 52 in China, Elo ~2458), to play against LZ, and LZ got a clean win. The time setting was 60 min + 5 x 60 s.
https://share.yikeweiqi.com/onlinechess/instantplay?room=2856380
Unfortunately, the log was lost due to an incorrect command-line setting...
Second game vs ZhangTao (34 Zhang Tao ♂ cn 3446):
https://share.yikeweiqi.com/onlinechess/instantplay?room=2876452
Log:
8e3b_vs_zhangtao_20180318.log.gz
More pros are coming to play LZ, stay tuned. :-)
The network used has been uploaded; it turns out to be a 10x192 one...
https://drive.google.com/file/d/1DGI5tcNtP9hmrIrb0Ey-UOFA_LzCeAIZ/view?usp=sharing