Matches against Chinese top tier pros (2:0 so far, 4 x Titan V) #1046

bood · 2018-03-17T15:35:26Z

A few days ago, yikeweiqi.com helped us set up some accounts on their platform so LZ can play against other players. (They also encouraged their users to help train LZ.)

Today they invited an active pro, Zhang Li (6P, rank 52 in China, Elo ~2458), to play LZ, and LZ got a clean win. Time settings were 60 min + 5 x 60 s.

https://share.yikeweiqi.com/onlinechess/instantplay?room=2856380

Unfortunately the log was lost due to an incorrect command-line setting...

Second game, vs Zhang Tao (GoRating: 34 | Zhang Tao | ♂ | cn | 3446):
https://share.yikeweiqi.com/onlinechess/instantplay?room=2876452

Log:
8e3b_vs_zhangtao_20180318.log.gz

More pros are coming to play LZ, stay tuned. :-)

The network used has been uploaded; it turns out to be a 10x192 one...
https://drive.google.com/file/d/1DGI5tcNtP9hmrIrb0Ey-UOFA_LzCeAIZ/view?usp=sharing

bochen2027 · 2018-03-17T18:18:20Z

What are the specs? Windows or Linux system? How many GPUs were used, and which LZ network?
What do you mean by clean win? How is it possible you lost the logs lol

Never mind, I see 4x Titan V in the title... so did you see the n/s during the game? Was it comparable to the other recent match on OGS?

bochen2027 · 2018-03-17T19:15:38Z

https://www.reddit.com/r/cbaduk/comments/852szt/lz_recently_won_a_2h_game_against_fineart_aq_lost/

An older network of LZ won against FineArt with 2H, something that previously both Ke Jie and AQ had lost in this configuration!

davidsoncolin · 2018-03-17T21:54:14Z

Here's some meta-information from the page

42["init",{"my_list":[],"my_opponent_list":[],"user_list":[],"user_count":76,"invite_count":0,
"game_info":{"id":"2856380","basicTime":"3600","readSecTime":"60","readSecLimit":"5","boardSize":"19",
"negotiation_flag":"0","negotiation_count":"0","isBlack":"1","offlineTime":"-1","komi":"7.5",
"handicap":"0","ratingFlag":"0","createTime":"1521288485","curColor":"1","startTime":"1521288490",
"isImmediate":"1","startSleep":"0","endSleep":"0","viewer_count":734,
"gameInfo":"\u6807\u51c6\u5373\u65f6",
"tourId":"-1","isAdminStop":"0","tourInfo":"",
"sgf":"(;GM[1]FF[4]CA[UTF-8]SO[\u5f08\u5ba2\u56f4\u68cb]RU[zh]KM[7.5]HA[0]SZ[19];B[pd];W[dp];B[pq];W[dd];B[cc];W[dc];B[cd];W[ce];B[be];W[bf];B[cf];W[de];B[bg];W[bd];B[af];W[bc];B[fq];W[qc];B[qd];W[pc];B[od];W[nb];B[cn];W[dn];B[dm];W[co];B[en];W[do];B[qk];W[em];B[dl];W[el];B[dk];W[fn];B[iq];W[po];B[np];W[qq];B[qr];W[qp];B[rr];W[ql];B[pl];W[rk];B[pk];W[rm];B[qm];W[rl];B[qn];W[rn];B[rj];W[on];B[pm];W[qo];B[qi];W[rq];B[lp];W[oq];B[or];W[fp];B[df];W[ek];B[dj];W[gq];B[gr];W[fr];B[hr];W[eq];B[mc];W[nc];B[nd];W[mb];B[lc];W[bn];B[rd];W[rc];B[ee];W[cb];B[sc];W[sb];B[sd];W[lb];B[kc];W[lh];B[jh];W[lj];B[jj];W[ni];B[kb];W[qb];B[fc];W[ej];B[nh];W[mh];B[oh];W[ll];B[fh];W[ei];B[di];W[eh];B[fg];W[oi];B[kg];W[fd];B[gd];W[ed];B[ge];W[mf];B[ph];W[pp];B[pr];W[mo];B[lo];W[bl];B[mm];W[ml];B[nm];W[jl];B[sq];W[sp];B[sr];W[sk];B[io];W[no];B[hl];W[lm];B[ln];W[nl];B[om];W[jk];B[ij];W[jn];B[im];W[jm])","cleanSgf":";B[pd];W[dp];B[pq];W[dd];B[cc];W[dc];B[cd];W[ce];B[be];W[bf];B[cf];W[de];B[bg];W[bd];B[af];W[bc];B[fq];W[qc];B[qd];W[pc];B[od];W[nb];B[cn];W[dn];B[dm];W[co];B[en];W[do];B[qk];W[em];B[dl];W[el];B[dk];W[fn];B[iq];W[po];B[np];W[qq];B[qr];W[qp];B[rr];W[ql];B[pl];W[rk];B[pk];W[rm];B[qm];W[rl];B[qn];W[rn];B[rj];W[on];B[pm];W[qo];B[qi];W[rq];B[lp];W[oq];B[or];W[fp];B[df];W[ek];B[dj];W[gq];B[gr];W[fr];B[hr];W[eq];B[mc];W[nc];B[nd];W[mb];B[lc];W[bn];B[rd];W[rc];B[ee];W[cb];B[sc];W[sb];B[sd];W[lb];B[kc];W[lh];B[jh];W[lj];B[jj];W[ni];B[kb];W[qb];B[fc];W[ej];B[nh];W[mh];B[oh];W[ll];B[fh];W[ei];B[di];W[eh];B[fg];W[oi];B[kg];W[fd];B[gd];W[ed];B[ge];W[mf];B[ph];W[pp];B[pr];W[mo];B[lo];W[bl];B[mm];W[ml];B[nm];W[jl];B[sq];W[sp];B[sr];W[sk];B[io];W[no];B[hl];W[lm];B[ln];W[nl];B[om];W[jk];B[ij];W[jn];B[im];W[jm]",
"isStop":"1","updateTime":"1521295771","status":"3","appContent":"0",
"handsCount":"134","result":"-1","resultDesc":"\u767d\u4e2d\u76d8\u80dc","type":"",
"blackId":"123703","blackName":"\u5f20\u7acb","blackFace":"http://cdn.yikeweiqi.com/face/98f73a96-4296-4c17-b752-935a2bb61d44.jpg",
"blackGrade":"9.7D","blackCurScore":"2920.690","blackIsReady":"1","blackIsReadSec":"1",
"blackReadTime":"60","blackReadSecLimit":"5","blackIsOff":"1","blackOffTime":"1521306780",
"blackOfflineTime":"-1","blackAchieveName":"\u6682\u65e0\u79f0\u53f7","blackParise":"12",
"whiteId":"513342","whiteName":"LeelaZero",
"whiteFace":"http://cdn.yikeweiqi.com/reguser/headimg/1e241eb452f11fc27ee1f25e0ed44354.jpg",
"whiteGrade":"8.7D","whiteCurScore":"2824.138","whiteIsReady":"1","whiteReadTime":"168",  
"whiteIsReadSec":"0","whiteReadSecLimit":"5","whiteIsOff":"1","whiteOffTime":"1521296207",
"whiteOfflineTime":"-1","whiteAchieveName":"\u6682\u65e0\u79f0\u53f7","whiteParise":"8",
"isSleep":0,"role":0},"rule_id":0}]

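For anyone who would rather read these fields in a script than in a browser console, here is a minimal Python sketch. It works on a hand-copied excerpt of the `game_info` object rather than the full frame, and it assumes that `startTime` and `updateTime` are Unix timestamps in seconds for the start and the last update of the game; that reading of the field names is a guess, not something the page confirms.

```python
import json

# Hand-copied excerpt of the "game_info" object from the dump above.
excerpt = r'''
{"basicTime": "3600", "readSecTime": "60", "readSecLimit": "5",
 "komi": "7.5", "startTime": "1521288490", "updateTime": "1521295771",
 "handsCount": "134", "resultDesc": "\u767d\u4e2d\u76d8\u80dc",
 "blackName": "\u5f20\u7acb", "whiteName": "LeelaZero"}
'''

info = json.loads(excerpt)  # json.loads decodes the \uXXXX escapes

# Assumption: both fields are Unix timestamps in seconds.
elapsed = int(info["updateTime"]) - int(info["startTime"])
hours, rest = divmod(elapsed, 3600)
minutes, seconds = divmod(rest, 60)

print(info["blackName"], "vs", info["whiteName"])
# resultDesc decodes to Chinese for "White wins by resignation".
print("result:", info["resultDesc"])
print(f"elapsed: {hours}h {minutes}m {seconds}s over {info['handsCount']} moves")
```

Under that assumption the game lasted a little over two hours, which fits the 60 min + 5 x 60 s time settings.
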
fell111 · 2018-03-17T23:13:07Z

Some additional info I found.
Zhang Li 6p, world rank 118, goratings 3305.
LZ was running with 2 Vega. LZ's winrate was 58% at move 38, dropped 6% at move 61, reached 71% at move 75, and was over 80% at move 103.

davidsoncolin · 2018-03-17T23:47:13Z

lz_replay_18_03_17.txt

I ran lz with this unsophisticated command and manually played through the moves of the game:
leelaz.exe -w ed26f634a3420ced1cef437f4cddd7e35edafcf793b96d4cbaf2e7da376ccfec --v 150000 --gtp --logfile log.txt -r 10 -t 7

Whenever LZ didn't pick the same move, I undid her move and played the one from the game. As with the Haylee game, the game move was generally a close second choice. However, there are a few moves where the picked move isn't right at the top; in particular, moves 56, 60, 86 and 110 weren't really considered in my experiment.

move number, coordinate, rank of the game move among LZ's candidates (?? = not really considered):
30 e7 2,
36 q5 3,
56 s3 9 !
58 p3 4,
60 f4 ?? !
70 o17 2,
84 m12 6,
86 m10 ?? !
88 o11 3,
92 e10 6,
96 m8 2,
102 p11 2,
106 e16 2,
110 q4 10 !
112 n5 2,
126 m7 3,
130 k9 2
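
To repeat this kind of replay without typing moves by hand, the SGF points from the dump above have to be turned into GTP vertices. A minimal sketch of that conversion follows; the helper names are made up for illustration, passes (empty brackets) are not handled, and it only prints the `play` commands rather than driving leelaz.

```python
import re

GTP_COLUMNS = "ABCDEFGHJKLMNOPQRST"  # GTP column letters skip "I"

def sgf_to_gtp(coord, size=19):
    """Convert an SGF point such as 'em' to a GTP vertex such as 'E7'."""
    col = ord(coord[0]) - ord("a")
    row = ord(coord[1]) - ord("a")  # SGF rows are counted from the top edge
    return f"{GTP_COLUMNS[col]}{size - row}"

def sgf_to_play_commands(sgf):
    """Yield (move_number, GTP 'play' command) for each B[..]/W[..] move."""
    moves = re.findall(r";([BW])\[([a-s]{2})\]", sgf)
    for number, (color, coord) in enumerate(moves, 1):
        yield number, f"play {color} {sgf_to_gtp(coord)}"

# First four moves of the game, copied from "cleanSgf" above.
opening = ";B[pd];W[dp];B[pq];W[dd]"
for number, command in sgf_to_play_commands(opening):
    print(number, command)
```

As a sanity check against the list above, move 30 is `W[em]` in the SGF and converts to E7, and move 36 is `W[po]` and converts to Q5.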

bood · 2018-03-18T01:11:23Z

What do you mean by clean win? How is it possible you lost the logs lol

Sadly, the tester somehow put some special characters after -l.

in particular, moves 56, 60, 86 and 110 weren't really considered in my experiment.

The network used is actually a hybrid one; the tester thinks it is much more stable. I will upload it here later.

bochen2027 · 2018-03-18T01:32:54Z

@bood Hybrid network? Between Leela and a third-party program or what? If so, how does that count as a clean win lol

can you upload it

bood · 2018-03-18T01:50:20Z

@Hydrogenpi why do you assume another program is involved?! The hybrid method is well described in #814. "Clean" is about the content: Zhang Li never found a chance in the game.

And I'm not a fan of hybrid networks either; not my decision to make, though. I just want to share the info.

bochen2027 · 2018-03-18T01:54:09Z

Okay, I see. But this "Chimera" is not really a pure LZ network tho

pcengine · 2018-03-18T01:55:13Z

@bood Could you please share the hybrid network? Preferably somewhere other than Baidu, please?

bochen2027 · 2018-03-18T02:12:07Z

@pcengine I concur. I recommend he upload it to the Internet Archive; that way it will by default be sent to Google's VirusTotal for a scan. I'm sandboxed in a VM but don't want any potential zero-days

bood · 2018-03-18T03:20:02Z

Network uploaded

bochen2027 · 2018-03-18T03:31:21Z

@bood thanks. Could you tell us which two 10-block LZ networks this was a combination of, so we can recreate and/or confirm it ourselves?

Wait, 10*192 means it wasn't simply adding two networks together. So this is more than just a hybrid?

@davidsoncolin at least you should be able to (re)confirm whether the replay matches now.

bood · 2018-03-18T04:14:17Z

Sorry, I cannot; I'm not the creator of this network. But we do have several 10x192 networks that were trained and tested before, so I would assume it is combined from them.

bood · 2018-03-18T11:20:17Z

Playing against Zhang Tao (GoRating: 34 | Zhang Tao | ♂ | 3446) now, live:
https://share.yikeweiqi.com/onlinechess/instantplay?room=2876452

Network used is latest 20b: 8e3bd368

remdu · 2018-03-18T11:39:37Z

There is now research supporting the "weight averaging" technique for networks from the same SGD run: https://arxiv.org/abs/1803.05407
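
To make "averaging weights" concrete, here is a minimal sketch of an elementwise average of two networks with identical architecture. It assumes each network is already loaded as a list of layers, each layer a flat list of floats; the function name and the `alpha` mixing parameter are made up for illustration. This is not necessarily how the hybrid network in this thread was built, nor the exact procedure in the paper.

```python
def average_weights(net_a, net_b, alpha=0.5):
    """Blend two same-shaped networks: alpha * a + (1 - alpha) * b per weight.

    Assumption: each network is a list of layers, each layer a flat list
    of floats, and both networks share exactly the same architecture.
    """
    if len(net_a) != len(net_b):
        raise ValueError("networks have a different number of layers")
    merged = []
    for layer_a, layer_b in zip(net_a, net_b):
        if len(layer_a) != len(layer_b):
            raise ValueError("layer sizes differ; architectures must match")
        merged.append([alpha * a + (1.0 - alpha) * b
                       for a, b in zip(layer_a, layer_b)])
    return merged

# Toy example with two tiny "networks" of two layers each.
net_a = [[0.0, 1.0], [2.0]]
net_b = [[1.0, 3.0], [4.0]]
print(average_weights(net_a, net_b))
```

The shape check matters: averaging only makes sense layer by layer between networks of the same size, for example two 10x192 networks.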

bood · 2018-03-18T12:27:19Z

Log with Zhang Tao uploaded.

Friday9i · 2018-03-18T12:33:02Z

I don't speak Chinese, but I get the impression Zhang Tao resigned: could someone check? I don't see how to download the SGF either (and I'm not strong enough to assess the position, which is still quite open).
Edit: oh, you uploaded it above, sorry! What do you use to open the log file, please? And what is the result (as I don't know how to open the log)?

remdu · 2018-03-18T12:52:35Z

The log is a text file, and LZ won. Apparently c13 from black was a big mistake.

Friday9i · 2018-03-18T13:09:43Z

On strong hardware, it seems that LZ is at least a 'good pro' now, and possibly a 'top pro'. A few more games are needed to confirm that feeling, but it's amazing anyway: congratulations to the dev team!!!

AAPMTG306 · 2018-03-18T13:31:06Z

zhang tao should be considered one of the top pros. He won the go blitz tournament in China in 2017, beating Ke Jie along the way.

bood · 2018-03-18T13:33:48Z

zhang tao should be considered one of the top pros. He won the go blitz tournament in China in 2017, beating Ke Jie along the way.

Yeah, but he may not have been in his best form today. It's said he had just finished a slow game during the daytime, and he went straight to sleep right after this match. :-(

Friday9i · 2018-03-18T13:48:27Z

Maybe not his best day, but what an achievement anyway! And an LZ bot with the latest network is reaching 3800 on CGOS! We've got a champion, congratulations once again :-))

bochen2027 · 2018-03-18T14:55:11Z

@Friday9i it's a test that I'm running in conjunction with the ZenLeeBot / Aquabot operator, the same guy who hosted the AQ vs Haylee match two weeks ago. On exactly identical hardware specs, AQ ended up at 3680 (AquaTest, http://www.yss-aya.com/cgos/19x19/standings.html), and although LZed26 has run nowhere near 100 matches yet, it already ranks higher than AQ, and its winrate pattern seems to be much stronger than AQ's. Both of these tests were run on Amazon EC2 p3.8xlarge instances (4x V100 GPU); AQ was getting a sustained 20000 pp/s, while I couldn't seem to get better performance out of leelaz than on a setup with 4x 1080s. Regardless, this means without a doubt that LZ has surpassed AQ. What is surprising to me is that LZ scales better than AQ at higher hardware settings, even taking into consideration that it seems to do worse on AWS, for whatever reason, than on bare metal / a non-VM environment. I had expected AQ to scale better at higher GPU settings, but that appears not to be the case!
In all tests the vanilla versions of AQ and LZ were used, no hybrid networks and none of that nonsense. The V100 is comparable to a Titan V, but my guess is that on regular 1080/Ti cards, whether with 1, 2 or even 4 GPUs, LZ is now stronger than AQ.

LZ should go beat 95 pros.

bochen2027 · 2018-03-18T15:11:36Z

@bood so this new match was not a hybrid network correct?
"Network used is latest 20b: 8e3bd368"

bood · 2018-03-18T15:29:09Z

this new match was not a hybrid network correct?

You're right. It is a network published by gcp, as a test of 20b promotion.

bochen2027 · 2018-03-18T15:30:04Z

@bood thanks, so that is a fair and square win. I believe LZ is already stronger than AQ. Let's see LZ win 95-5 against pros just like AQ did!

pw31 · 2018-03-18T15:40:46Z

BTW, LZ ed26 is going for the top on CGOS (see http://www.yss-aya.com/cgos/19x19/standings.html):

Several hundred Elo stronger than any LZ version before, I believe.
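
For a feel of what a gap of a few hundred Elo means, here is a small sketch using the standard logistic Elo formula with a 400-point scale. That CGOS ratings can be read this way is an assumption; its own rating computation may differ in detail.

```python
def expected_score(elo_diff):
    """Expected score of the stronger side under the logistic Elo model.

    Assumption: standard 400-point scale; elo_diff is (own - opponent).
    """
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

for diff in (100, 200, 300):
    print(diff, round(expected_score(diff), 3))
```

Under this model a 200-point gap corresponds to an expected score of roughly 76% for the stronger bot.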

StanTraykov · 2018-03-18T18:59:56Z

Apart from network strength progression, it should be noted that this LZ runs on 4xV100 (cost of GPUs alone around $32,000).

Incidentally, whoever runs it should probably modify their CGOS client to use kgs-genmove_cleanup instead of genmove (change 1 line in the script), as I can see an incorrect result in the last game it lost to Perseus-8 (whatever bot that is). In that game it doesn't affect the win/loss outcome, but if I'm not mistaken, it could, theoretically.

StanTraykov · 2018-03-18T19:42:05Z

Wow, the 4xV100 LZ just failed to read a ladder terminating only 4 spaces away: last lost game vs moon_1.1. Was it some kind of time management issue? It had more time left than moon (3:30 vs 2:37 out of 15:00 each).

remdu · 2018-03-18T19:44:36Z

I think she was probably losing already. Can't read ladder when every move has 0% winrate.

bochen2027 · 2018-03-19T13:36:22Z

@bood the hybrid weights aren't stronger, I tested it on CGOS and it lost 200 ELO overnight lol

odeint · 2018-03-19T14:06:56Z

I tested it on CGOS and it lost 200 ELO overnight lol

I thought bot versions on CGOS were supposed to be static? Which bot is it?

roy7 · 2018-03-19T14:10:42Z

I thought bot versions on CGOS were supposed to be static? Which bot is it?

Correct. Any change in weights/code/etc. should result in a new bot account on the site. A bot's strength should never change after it starts playing games; that would defeat the purpose of CGOS, and the admins have previously expressed concern about this happening.

MaxMaki · 2018-03-19T14:23:17Z

There are currently 10 bots playing on CGOS that have an Elo of 3000 or higher. 5 of those are LZ versions.
On top of that, there are always a lot of random LZ ones that never end up with 100 games, or people changing the network in the middle etc.
There was already earlier discussion on the computer go mailing list of the Zen bots disconnecting because the server was turning into a LeelaZero testing ground making the server much less useful.

Please, if you run LZ on CGOS, try to run it for a longer time, definitely not under 100 games, and definitely without switching networks in the middle. Run it for much longer than 100 games if you can. I wouldn't be surprised if other bots started disconnecting from CGOS, or the results became very unreliable, because over half the bots are random LeelaZeros.

odeint · 2018-03-19T14:27:18Z

@MaxMaki agreed! I know it's exciting to test the latest networks or tune the MCTS with a custom flavour, but keep those experiments off CGOS until you're sure you have something good, and then commit to running it unchanged for at least 100 games, and give it a meaningful name if you can. Kudos at this point to whoever is running b3b00c6d, the last 128x6, as a nice anchor. 637 games and counting 👍

PS: Bonus points for being a bit conservative with time management to prevent losses on time.

bochen2027 · 2018-03-19T14:32:44Z

@odeint I clearly labeled it as a temporary test bot with a t_LZ prefix

I have since pulled it and started over with two more real tests that I will run to 150+ games each

bochen2027 · 2018-03-19T14:39:35Z

@roy7 thanks for advising. I was not aware of this: as I read it, the disclaimer on the site says the two assumptions are that the testing is static and that all games are played at the same time; I did not know it was compulsory that settings don't change. BTW the guy running the "Maximus" bot a while back not only upped his network but also hardware specs several times in game between matches.

MaxMaki · 2018-03-19T14:43:26Z

@Hydrogenpi It doesn't matter if you label it or not. If you change settings or do other things to change the strength you are ruining the ELO calculations for everyone. (EDIT: Note that you are still causing this even if you run less than 100 games) Also, if everyone keeps running LZ bots for a minimum of 100 games just to get a BayesELO score those scores will become less and less reliable as well.
This probably wouldn't be an issue if there were many other bots running all the time as well, but there are not. There are extremely few anchors, 50% of the top population are LZ versions, and the rest of the bots coming in for testing are also running a minimum of 100 games.

StanTraykov · 2018-03-19T15:30:46Z

Oh, here seems a good place to add my pet peeve. In addition to running 100+ games and never changing settings/network/hardware[1] for a single CGOS account, please, use more sensible names, so that there's a reasonable fix for what you are actually testing:

LZ-18827f-1x1080Ti and LZ-b7768-4xV100 are good names: we see the net and the hardware (presuming you used settings you considered optimal--there's no space left to add them)
LZ-b7768-t1-v3200 also works, we see limited visits and thread count and assume the hardware is good enough to always reach 3200 visits (but do check!). The thread count is also important as t6-v3200 is different (weaker) than t1-v3200.
LZ-b7768-t4-nolim on the other hand, tells us absolutely nothing: is it running on an old GTX670 or on 4xTesla V100s? One would be a top player, the other probably wouldn't be anywhere near 3000.

It's of course CGOS's fault that it doesn't provide longer names/more data fields, but for the time being, it'd be great if we tried to limit our CGOS use to sensible names and rigs dedicated to the test for 100+ games (there are approx. 2 games per hour on CGOS, so that'd be 2 days non-stop play, or likely over a week of on-and-off testing).

[1] You could, theoretically, change hardware with limited playouts/visits, but since CGOS games are short, you have to be sure both systems always play out in full.
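
The naming scheme above could be sketched as a small helper. Everything here is illustrative: the function name is made up, and the 18-character default for `max_len` is only a guess based on the remark that the longest example name leaves no space; check the real CGOS limit before relying on it.

```python
def cgos_name(net_hash, hardware=None, threads=None, visits=None, max_len=18):
    """Build a bot name such as 'LZ-18827f-1x1080Ti' or 'LZ-b7768-t1-v3200'.

    Assumption: max_len=18 is a guess at the CGOS name limit, not a
    documented value.
    """
    if hardware is None and visits is None:
        # A name with neither hardware nor a visit limit says nothing
        # about playing strength (the 'LZ-b7768-t4-nolim' problem).
        raise ValueError("give the hardware or a visit limit")
    parts = ["LZ", net_hash]
    if hardware is not None:
        parts.append(hardware)
    if threads is not None:
        parts.append(f"t{threads}")
    if visits is not None:
        parts.append(f"v{visits}")
    name = "-".join(parts)
    if len(name) > max_len:
        raise ValueError(f"{name!r} is longer than {max_len} characters")
    return name

print(cgos_name("18827f", hardware="1x1080Ti"))
print(cgos_name("b7768", threads=1, visits=3200))
```

Rejecting a name that carries neither hardware nor a visit limit is a deliberate choice here, since that is exactly the case the comment calls uninformative.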

alreadydone · 2018-03-20T05:45:24Z

the guy running the "Maximus" bot a while back not only upped his network but also hardware specs several times in game between matches.

@Hydrogenpi Where can I find these data? Seems that CGOS offers no place for them. The 19x19 Maximus may not be the same person as the current CGOS top 9x9 bot Maximus_160B_512F, I suppose?

higherdim · 2018-03-20T15:46:14Z

An older network of LZ won against FineArt with 2H, something that previously both Ke Jie and AQ had lost in this configuration!

@Hydrogenpi That was FineArt B/C, which is an older AlphaGo-Lee type network. The one that beat Ke Jie with 2H is FineArt A, which is a new AGZ type network, probably 800 Elo stronger than FineArt B/C.

alreadydone · 2018-03-20T20:57:15Z

Apparently FineArt A is the "Hainan Challenge" version in this article. It's a 40-block dual-resnet, not trained tabula rasa but with RL starting from old versions of FineArt, using "millions of self-play games". The two other versions mentioned are also dual-resnets (20 blocks), i.e. not the old AG Lee architecture. I don't know whether FineArt B/C are even older versions, though.

godmoves · 2018-03-21T03:00:05Z

@alreadydone Some pros tell me that the FineArt B is a modified version of Fuheyuqi.

alreadydone · 2018-03-21T06:37:45Z

Good to know. Fuheyuqi was a 20-block dual-resnet trained from several hundred thousand self-play games, according to the article.

sethtroisi · 2019-02-14T06:12:26Z

closing, no active discussion for ~1year

bood changed the title ~~Clean won against ZhangLi 6P (4 x Titan V)~~ Matches against Chinese top tier pros (2:0 so far, 4 x Titan V) Mar 18, 2018

sethtroisi closed this as completed Feb 14, 2019
