
Matches against Chinese top tier pros (2:0 so far, 4 x Titan V) #1046

Closed
bood opened this Issue Mar 17, 2018 · 46 comments

@bood
Collaborator

bood commented Mar 17, 2018

A few days ago, yikeweiqi.com helped us set up some accounts on their platform so that LZ can play against other players. (They also encouraged their users to help train LZ.)

Today they invited an active pro, Zhang Li (6P, rank 52 in China, Elo ~2458), to play against LZ, and LZ got a clean win. The time setting was 60 minutes + 5 x 60 s.

https://share.yikeweiqi.com/onlinechess/instantplay?room=2856380

Unfortunately the log was lost due to an incorrect command-line setting...

Second game vs Zhang Tao (goratings: rank 34, rating 3446):
https://share.yikeweiqi.com/onlinechess/instantplay?room=2876452

Log:
8e3b_vs_zhangtao_20180318.log.gz

More pros will be playing LZ, stay tuned. :-)

The network used has been uploaded; it turns out to be a 10x192 one...
https://drive.google.com/file/d/1DGI5tcNtP9hmrIrb0Ey-UOFA_LzCeAIZ/view?usp=sharing

@hydrogenpi

hydrogenpi commented Mar 17, 2018

What are the specs? Windows or Linux system? How many GPUs were used, and which LZ network?
What do you mean by clean win? How is it possible you lost the logs lol

Never mind, I see 4 x Titan V in the title... so did you see the n/s during the game? Was it comparable to the other recent match on OGS?

@hydrogenpi

hydrogenpi commented Mar 17, 2018

https://www.reddit.com/r/cbaduk/comments/852szt/lz_recently_won_a_2h_game_against_fineart_aq_lost/

An older network of LZ won against FineArt with 2H, something that previously both Ke Jie and AQ had lost in this configuration!

@davidsoncolin

davidsoncolin commented Mar 17, 2018

Here's some meta-information from the page

42["init",{"my_list":[],"my_opponent_list":[],"user_list":[],"user_count":76,"invite_count":0,
"game_info":{"id":"2856380","basicTime":"3600","readSecTime":"60","readSecLimit":"5","boardSize":"19",
"negotiation_flag":"0","negotiation_count":"0","isBlack":"1","offlineTime":"-1","komi":"7.5",
"handicap":"0","ratingFlag":"0","createTime":"1521288485","curColor":"1","startTime":"1521288490",
"isImmediate":"1","startSleep":"0","endSleep":"0","viewer_count":734,
"gameInfo":"\u6807\u51c6\u5373\u65f6",
"tourId":"-1","isAdminStop":"0","tourInfo":"",
"sgf":"(;GM[1]FF[4]CA[UTF-8]SO[\u5f08\u5ba2\u56f4\u68cb]RU[zh]KM[7.5]HA[0]SZ[19];B[pd];W[dp];B[pq];W[dd];B[cc];W[dc];B[cd];W[ce];B[be];W[bf];B[cf];W[de];B[bg];W[bd];B[af];W[bc];B[fq];W[qc];B[qd];W[pc];B[od];W[nb];B[cn];W[dn];B[dm];W[co];B[en];W[do];B[qk];W[em];B[dl];W[el];B[dk];W[fn];B[iq];W[po];B[np];W[qq];B[qr];W[qp];B[rr];W[ql];B[pl];W[rk];B[pk];W[rm];B[qm];W[rl];B[qn];W[rn];B[rj];W[on];B[pm];W[qo];B[qi];W[rq];B[lp];W[oq];B[or];W[fp];B[df];W[ek];B[dj];W[gq];B[gr];W[fr];B[hr];W[eq];B[mc];W[nc];B[nd];W[mb];B[lc];W[bn];B[rd];W[rc];B[ee];W[cb];B[sc];W[sb];B[sd];W[lb];B[kc];W[lh];B[jh];W[lj];B[jj];W[ni];B[kb];W[qb];B[fc];W[ej];B[nh];W[mh];B[oh];W[ll];B[fh];W[ei];B[di];W[eh];B[fg];W[oi];B[kg];W[fd];B[gd];W[ed];B[ge];W[mf];B[ph];W[pp];B[pr];W[mo];B[lo];W[bl];B[mm];W[ml];B[nm];W[jl];B[sq];W[sp];B[sr];W[sk];B[io];W[no];B[hl];W[lm];B[ln];W[nl];B[om];W[jk];B[ij];W[jn];B[im];W[jm])","cleanSgf":";B[pd];W[dp];B[pq];W[dd];B[cc];W[dc];B[cd];W[ce];B[be];W[bf];B[cf];W[de];B[bg];W[bd];B[af];W[bc];B[fq];W[qc];B[qd];W[pc];B[od];W[nb];B[cn];W[dn];B[dm];W[co];B[en];W[do];B[qk];W[em];B[dl];W[el];B[dk];W[fn];B[iq];W[po];B[np];W[qq];B[qr];W[qp];B[rr];W[ql];B[pl];W[rk];B[pk];W[rm];B[qm];W[rl];B[qn];W[rn];B[rj];W[on];B[pm];W[qo];B[qi];W[rq];B[lp];W[oq];B[or];W[fp];B[df];W[ek];B[dj];W[gq];B[gr];W[fr];B[hr];W[eq];B[mc];W[nc];B[nd];W[mb];B[lc];W[bn];B[rd];W[rc];B[ee];W[cb];B[sc];W[sb];B[sd];W[lb];B[kc];W[lh];B[jh];W[lj];B[jj];W[ni];B[kb];W[qb];B[fc];W[ej];B[nh];W[mh];B[oh];W[ll];B[fh];W[ei];B[di];W[eh];B[fg];W[oi];B[kg];W[fd];B[gd];W[ed];B[ge];W[mf];B[ph];W[pp];B[pr];W[mo];B[lo];W[bl];B[mm];W[ml];B[nm];W[jl];B[sq];W[sp];B[sr];W[sk];B[io];W[no];B[hl];W[lm];B[ln];W[nl];B[om];W[jk];B[ij];W[jn];B[im];W[jm]",
"isStop":"1","updateTime":"1521295771","status":"3","appContent":"0",
"handsCount":"134","result":"-1","resultDesc":"\u767d\u4e2d\u76d8\u80dc","type":"",
"blackId":"123703","blackName":"\u5f20\u7acb","blackFace":"http://cdn.yikeweiqi.com/face/98f73a96-4296-4c17-b752-935a2bb61d44.jpg",
"blackGrade":"9.7D","blackCurScore":"2920.690","blackIsReady":"1","blackIsReadSec":"1",
"blackReadTime":"60","blackReadSecLimit":"5","blackIsOff":"1","blackOffTime":"1521306780",
"blackOfflineTime":"-1","blackAchieveName":"\u6682\u65e0\u79f0\u53f7","blackParise":"12",
"whiteId":"513342","whiteName":"LeelaZero",
"whiteFace":"http://cdn.yikeweiqi.com/reguser/headimg/1e241eb452f11fc27ee1f25e0ed44354.jpg",
"whiteGrade":"8.7D","whiteCurScore":"2824.138","whiteIsReady":"1","whiteReadTime":"168",  
"whiteIsReadSec":"0","whiteReadSecLimit":"5","whiteIsOff":"1","whiteOffTime":"1521296207",
"whiteOfflineTime":"-1","whiteAchieveName":"\u6682\u65e0\u79f0\u53f7","whiteParise":"8",
"isSleep":0,"role":0},"rule_id":0}]
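A few of the fields above can be read off with a short script. This is only a sketch: it assumes that the `sgf`/`cleanSgf` fields use standard SGF point notation, that the `\uXXXX` sequences are ordinary JSON escapes, and that `startTime`/`updateTime` are Unix timestamps in seconds. The helper names are made up for illustration.

```python
import json
import re

# GTP-style column letters skip "I"; SGF uses a..s for both axes.
GTP_COLUMNS = "ABCDEFGHJKLMNOPQRST"


def sgf_to_gtp(coord, board_size=19):
    """Convert an SGF point such as 'pd' to a GTP-style vertex such as 'Q16'."""
    col = ord(coord[0]) - ord("a")
    row = ord(coord[1]) - ord("a")  # SGF rows are counted from the top edge
    return f"{GTP_COLUMNS[col]}{board_size - row}"


def parse_moves(sgf):
    """Return (color, vertex) pairs for every ;B[..] / ;W[..] move in an SGF string."""
    return [(color, sgf_to_gtp(point))
            for color, point in re.findall(r";([BW])\[([a-s]{2})\]", sgf)]


# First four moves of the "cleanSgf" field.
opening = parse_moves(";B[pd];W[dp];B[pq];W[dd]")

# Decode the "resultDesc" field; the raw string keeps the escapes for json.loads.
result_desc = json.loads(r'"\u767d\u4e2d\u76d8\u80dc"')

# Difference between "updateTime" and "startTime", assumed to be seconds.
elapsed_seconds = 1521295771 - 1521288490

print(opening)
print(result_desc)
print(elapsed_seconds // 60, "minutes")
```

With this conversion, move 30 (`W[em]`) comes out as E7.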
@fell111

fell111 commented Mar 17, 2018

Some additional info I found.
Zhang Li 6p, world rank 118, goratings 3305.
LZ was running on 2 Vega GPUs. LZ's winrate was 58% at move 38, dropped 6% at move 61, reached 71% at move 75, and was over 80% at move 103.

@davidsoncolin

davidsoncolin commented Mar 17, 2018

lz_replay_18_03_17.txt

I ran LZ with this unsophisticated command and manually played through the moves of the game:
leelaz.exe -w ed26f634a3420ced1cef437f4cddd7e35edafcf793b96d4cbaf2e7da376ccfec --v 150000 --gtp --logfile log.txt -r 10 -t 7

Whenever LZ didn't pick the same move, I undid her move and played the one from the game. As with the Haylee game, the game move was generally a close second choice. However, there are a few moves where the move played in the game isn't right at the top. In particular, moves 56, 60, 86 and 110 weren't really considered in my experiment.

move number, coordinate, rank:
30 e7 2,
36 q5 3,
56 s3 9 !
58 p3 4,
60 f4 ?? !
70 o17 2,
84 m12 6,
86 m10 ?? !
88 o11 3,
92 e10 6,
96 m8 2,
102 p11 2,
106 e16 2,
110 q4 10 !
112 n5 2,
126 m7 3,
130 k9 2
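The manual replay procedure described above can be sketched as a small generator of GTP commands. `play`, `genmove` and `undo` are standard GTP commands; everything else here is hypothetical, including the `engine_choice` callable that stands in for the engine's answers, the made-up answers in the usage lines, and the assumption that LZ played White.

```python
def replay_commands(game_moves, engine_color, engine_choice):
    """Build GTP commands that replay a game and ask the engine on its own turns.

    game_moves:    list of (color, vertex) pairs, e.g. [("b", "Q16"), ("w", "D4")]
    engine_color:  "b" or "w", the side the engine played in the real game
    engine_choice: hypothetical callable(move_number) -> vertex that stands in
                   for the engine's reply to genmove
    """
    commands, mismatches = [], []
    for number, (color, vertex) in enumerate(game_moves, start=1):
        if color != engine_color:
            # Opponent's move: just put it on the board.
            commands.append(f"play {color} {vertex}")
            continue
        commands.append(f"genmove {color}")
        picked = engine_choice(number)
        if picked.lower() != vertex.lower():
            # Engine disagreed: take its move back and force the game move.
            commands.append("undo")
            commands.append(f"play {color} {vertex}")
            mismatches.append((number, vertex, picked))
    return commands, mismatches


# Usage with made-up engine answers for moves 2 and 4.
moves = [("b", "Q16"), ("w", "D4"), ("b", "Q3"), ("w", "D16")]
answers = {2: "D4", 4: "C16"}
commands, mismatches = replay_commands(moves, "w", answers.get)
print("\n".join(commands))
print(mismatches)
```

The mismatch list is the raw material for a table like the one above; the rank of the game move would have to be read from the engine's search output separately.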

@bood

Collaborator Author

bood commented Mar 18, 2018

What do you mean by clean win? How is it possible you lost the logs lol

The tester somehow put some special characters behind -l, sadly

in particular, moves 56, 60, 86 and 110 weren't really considered in my experiment.

The network used is actually a hybrid one; the tester thinks it is much more stable. I will upload it here later.

@hydrogenpi

hydrogenpi commented Mar 18, 2018

@bood Hybrid network? Between Leela and a third-party program or what? If so, how does that count as a clean win lol

Can you upload it?

@bood

Collaborator Author

bood commented Mar 18, 2018

@hydrogenpi Why do you assume another program is involved?! The hybrid method is well described in #814. "Clean" is about the content: Zhang Li never found a chance in the game.

And I'm not a fan of hybrid networks either, but it's not my decision to make. I just want to share the info.

@hydrogenpi

hydrogenpi commented Mar 18, 2018

Okay, I see. But this "Chimera" is not really a pure LZ network tho

@pcengine

pcengine commented Mar 18, 2018

@bood Could you please share the hybrid network? Preferably somewhere other than Baidu, please?

@hydrogenpi

hydrogenpi commented Mar 18, 2018

@pcengine I concur. I recommend he upload it to the Internet Archive; that way it will by default be sent to Google's VirusTotal for a clean scan. I'm sandboxed in a VM but don't want any potential zero-days.

@bood

Collaborator Author

bood commented Mar 18, 2018

Network uploaded

@hydrogenpi

hydrogenpi commented Mar 18, 2018

@bood Thanks. Could you tell us which two 10-block LZ networks this was a combination of, so we can recreate and/or confirm it ourselves?

Wait, 10x192 means it wasn't simply adding two networks together. So this is more than just a hybrid?

@davidsoncolin At least you should be able to (re)confirm whether the replay matches now.

@bood

Collaborator Author

bood commented Mar 18, 2018

Sorry, I cannot; I'm not the creator of this network. But we have trained and tested several 10x192 networks before, so I would assume it is combined from them.

@bood

Collaborator Author

bood commented Mar 18, 2018

Playing against Zhang Tao (goratings: rank 34, rating 3446) now, live:
https://share.yikeweiqi.com/onlinechess/instantplay?room=2876452

Network used is latest 20b: 8e3bd368

@Eddh

Contributor

Eddh commented Mar 18, 2018

There is now research supporting the "weight averaging" technique for networks from the same SGD run: https://arxiv.org/abs/1803.05407
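For readers unfamiliar with the idea, here is a minimal sketch of elementwise weight averaging on two network files. It assumes the hybrid is a plain arithmetic mean of two networks with identical architecture, and that each weight file is text with a version line followed by one line of space-separated floats per tensor. Neither assumption is confirmed in this thread, and the function name is made up.

```python
def average_weight_lines(lines_a, lines_b):
    """Elementwise mean of two weight files given as lists of text lines.

    Assumes both files share the same layout: a version line first, then
    one line of space-separated floats per tensor.
    """
    if len(lines_a) != len(lines_b):
        raise ValueError("networks differ in number of lines")
    if lines_a[0].strip() != lines_b[0].strip():
        raise ValueError("networks differ in format version")
    merged = [lines_a[0].strip()]
    for row_a, row_b in zip(lines_a[1:], lines_b[1:]):
        values_a = [float(x) for x in row_a.split()]
        values_b = [float(x) for x in row_b.split()]
        if len(values_a) != len(values_b):
            raise ValueError("networks differ in layer shape")
        merged.append(" ".join(repr((a + b) / 2) for a, b in zip(values_a, values_b)))
    return merged


# Toy example with two tiny "networks" of two tensors each.
net_a = ["1", "0.5 1.5", "2.0"]
net_b = ["1", "1.5 0.5", "4.0"]
hybrid = average_weight_lines(net_a, net_b)
print(hybrid)
```

Averaging only makes sense when both networks have the same shape, which is why the sketch refuses mismatched inputs instead of guessing.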

@bood

Collaborator Author

bood commented Mar 18, 2018

Log with Zhang Tao uploaded.

@Friday9i

Friday9i commented Mar 18, 2018

I don't speak Chinese, but I get the impression Zhang Tao resigned: could someone check? I don't see how to download the SGF either (and I'm not strong enough to assess the position, which still looks quite open).
Edit: oh, you uploaded it above, sorry! What do you use to open the log file, please? And what is the result (as I don't know how to open the log)?

@Eddh

Contributor

Eddh commented Mar 18, 2018

The log is a text file, and LZ won. Apparently c13 from black was a big mistake.

@Friday9i

Friday9i commented Mar 18, 2018

On strong hardware, it seems that LZ is at least a 'good pro' now, and possibly a 'top pro'. A few more games are needed to confirm that feeling, but it's amazing anyway: congratulations to the dev team!!!

@pheasant75

pheasant75 commented Mar 18, 2018

Zhang Tao should be considered one of the top pros. He won the Go blitz tournament in China in 2017, beating Ke Jie along the way.

@bood

Collaborator Author

bood commented Mar 18, 2018

Zhang Tao should be considered one of the top pros. He won the Go blitz tournament in China in 2017, beating Ke Jie along the way.

Yeah, but he may not have been at his best today. It's said he had just finished a slow game during the day, and he went straight to sleep right after this match. :-(

@Friday9i

Friday9i commented Mar 18, 2018

Maybe not his best day, but what an achievement anyway! And an LZ bot with the latest network is reaching 3800 on CGOS! We've got a champion, congratulations once again :-))

@hydrogenpi

hydrogenpi commented Mar 18, 2018

@Friday9i It's a test that I'm running in conjunction with the ZenLeeBot / Aquabot operator, the same guy who hosted the AQ vs Haylee match two weeks ago. On exactly identical hardware, AQ ended up at 3680 (AquaTest, http://www.yss-aya.com/cgos/19x19/standings.html), and although LZed26 is nowhere near 100 games yet, it already ranks higher than AQ and its winrate pattern seems much stronger than AQ's.

Both tests were run on Amazon EC2 p3.8xlarge instances (4x V100 GPU). AQ was getting a sustained 20000 pp/s, while I couldn't seem to get better performance out of leelaz than on a setup with 4x 1080s. Regardless, this means without a doubt that LZ has surpassed AQ. What is surprising to me is that LZ scales better than AQ on stronger hardware, even taking into account that it seems to do worse on AWS than on bare metal / a non-VM environment, for whatever reason. I had expected AQ to scale better at higher GPU counts, but that appears not to be the case!

In all tests the vanilla versions of AQ and LZ were used: no hybrid networks and none of that nonsense. The V100 is comparable to a Titan V, but my guess is that on regular 1080/Ti cards, whether with 1, 2 or even 4 GPUs, LZ is now stronger than AQ.

LZ should go beat 95 pros.

@hydrogenpi

hydrogenpi commented Mar 18, 2018

@bood so this new match was not a hybrid network correct?
"Network used is latest 20b: 8e3bd368"

@bood

Collaborator Author

bood commented Mar 18, 2018

this new match was not a hybrid network correct?

You're right. It is a network published by gcp, as a test of 20b promotion.

@hydrogenpi

hydrogenpi commented Mar 18, 2018

@bood Thanks, so that is a fair and square win. I believe LZ is already stronger than AQ. Let's see LZ win 95-5 against pros just like AQ did!

@bood bood changed the title from "Clean won against ZhangLi 6P (4 x Titan V)" to "Matches against Chinese top tier pros (2:0 so far, 4 x Titan V)" on Mar 18, 2018

@pw31

pw31 commented Mar 18, 2018

BTW, LZ ed26 is going for the top on CGOS (see http://www.yss-aya.com/cgos/19x19/standings.html):

[image: CGOS 19x19 standings]

Several hundred Elo stronger than any previous LZ version, I believe.

@StanTraykov

StanTraykov commented Mar 18, 2018

Apart from network strength progression, it should be noted that this LZ runs on 4xV100 (cost of GPUs alone around $32,000).

Incidentally, whoever runs it, should probably modify their CGOS client to use kgs-genmove_cleanup instead of genmove (change 1 line in the script), as I can see an incorrect result on the last game it lost with Perseus-8 (whatever bot that is). In that game, it doesn't affect the win/loss outcome, but if I'm not mistaken, it could, theoretically.

@StanTraykov

StanTraykov commented Mar 18, 2018

Wow, the 4xV100 LZ just failed to read a ladder terminating only 4 spaces away: last lost game vs moon_1.1. Was it some kind of time management issue? It had more time left than moon (3:30 vs 2:37 out of 15:00 each).

@Eddh

Contributor

Eddh commented Mar 18, 2018

I think she was probably losing already. Can't read ladder when every move has 0% winrate.

@hydrogenpi

hydrogenpi commented Mar 19, 2018

@bood the hybrid weights aren't stronger, I tested it on CGOS and it lost 200 ELO overnight lol

@odeint

odeint commented Mar 19, 2018

I tested it on CGOS and it lost 200 ELO overnight lol

I thought bot versions on CGOS were supposed to be static? Which bot is it?

@roy7

Collaborator

roy7 commented Mar 19, 2018

I thought bot versions on CGOS were supposed to be static? Which bot is it?

Correct. Any change in weights/code/etc should result in a new bot account on the site. A bot's strength should never change after it starts playing games, that will disrupt the purpose of CGOS and the admins have expressed a concern about this happening previously.

@MaxMaki

MaxMaki commented Mar 19, 2018

There are currently 10 bots playing on CGOS with an Elo of 3000 or higher. 5 of those are LZ versions.
On top of that, there are always a lot of random LZ ones that never reach 100 games, or people changing the network in the middle, etc.
There was already earlier discussion on the computer-go mailing list about the Zen bots disconnecting because the server was turning into a LeelaZero testing ground, making the server much less useful.

Please, if you run LZ on CGOS, try to run it for a longer time, definitely not under 100 games, and definitely without switching networks in the middle. Run it for much longer than 100 games if you can. I wouldn't be surprised if other bots started disconnecting from CGOS, or the results became very unreliable, because over half the bots are random LeelaZeros.

@odeint

odeint commented Mar 19, 2018

@MaxMaki Agreed! I know it's exciting to test the latest networks or tune the MCTS with a custom flavour, but keep those experiments off CGOS until you're sure you have something good, then commit to running it unchanged for at least 100 games, and give it a meaningful name if you can. Kudos at this point to whoever is running b3b00c6d, the last 128x6, as a nice anchor. 637 games and counting 👍

PS: Bonus points for being a bit conservative with time management to prevent losses on time.

@hydrogenpi

hydrogenpi commented Mar 19, 2018

@odeint I clearly labeled it as a temporary test bot with a t_LZ underscore prefix

I have since pulled it and started over with two more real tests that I will run to 150+ games each

@hydrogenpi

hydrogenpi commented Mar 19, 2018

@roy7 Thanks for the advice. I was not aware of this: as I read it, the site's disclaimer says the two assumptions are that the testing is static and that all games are played at the same time; I did not know it was compulsory that settings don't change. BTW the guy running the "Maximus" bot a while back not only upped his network but also hardware specs several times in game between matches.

@MaxMaki

MaxMaki commented Mar 19, 2018

@hydrogenpi It doesn't matter if you label it or not. If you change settings or do other things that change the strength, you are ruining the ELO calculations for everyone. (EDIT: Note that you are still causing this even if you run fewer than 100 games.) Also, if everyone keeps running LZ bots for a minimum of 100 games just to get a BayesELO score, those scores will become less and less reliable as well.
This probably wouldn't be an issue if there were many other bots running all the time as well, but there are not. There are extremely few anchors, 50% of the top population are LZ versions, and the rest of the bots coming in for testing are also running the minimum of 100 games.
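To put numbers on the rating gaps discussed in this thread, here is a sketch of the standard logistic Elo model. This is only an illustration: CGOS publishes BayesElo ratings, which are fitted over all games at once and will not match these formulas exactly. The function names are made up.

```python
import math


def expected_score(rating_gap):
    """Expected score of the higher-rated side for a given Elo gap."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


def implied_gap(score):
    """Elo gap implied by an average score strictly between 0 and 1."""
    return 400.0 * math.log10(score / (1.0 - score))


# A bot that is 200 Elo stronger is expected to score about 76%.
print(round(expected_score(200), 3))

# A 95-5 record corresponds to a gap of roughly 511 Elo.
print(round(implied_gap(0.95), 1))
```

Every opponent's rating is fitted against the results it had with that account, so a bot that changes strength mid-run feeds inconsistent results into everyone else's estimate.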

@StanTraykov

StanTraykov commented Mar 19, 2018

Oh, here seems a good place to add my pet peeve. In addition to running 100+ games and never changing settings/network/hardware[1] for a single CGOS account, please, use more sensible names, so that there's a reasonable fix for what you are actually testing:

  • LZ-18827f-1x1080Ti and LZ-b7768-4xV100 are good names: we see the net and the hardware (presuming you used settings you considered optimal--there's no space left to add them)
  • LZ-b7768-t1-v3200 also works, we see limited visits and thread count and assume the hardware is good enough to always reach 3200 visits (but do check!). The thread count is also important as t6-v3200 is different (weaker) than t1-v3200.
  • LZ-b7768-t4-nolim on the other hand, tells us absolutely nothing: is it running on an old GTX670 or on 4xTesla V100s? One would be a top player, the other probably wouldn't be anywhere near 3000.

It's of course CGOS's fault that it doesn't provide longer names/more data fields, but for the time being, it'd be great if we tried to limit our CGOS use to sensible names and rigs dedicated to the test for 100+ games (there are approx. 2 games per hour on CGOS, so that'd be 2 days non-stop play, or likely over a week of on-and-off testing).

[1] You could, theoretically, change hardware with limited playouts/visits, but since CGOS games are short, you have to be sure both systems always play out in full.
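The naming scheme and the time estimate above can be sketched as two small helpers. The 18-character limit is an assumption chosen to match the longest example name, not something taken from CGOS documentation, and both function names are hypothetical.

```python
def cgos_name(net_hash, hardware=None, threads=None, visits=None, max_len=18):
    """Compose a bot name such as LZ-18827f-1x1080Ti or LZ-b7768-t1-v3200.

    max_len=18 is an assumed limit, not taken from CGOS documentation.
    """
    parts = ["LZ", net_hash]
    if hardware is not None:
        parts.append(hardware)
    if threads is not None:
        parts.append(f"t{threads}")
    if visits is not None:
        parts.append(f"v{visits}")
    name = "-".join(parts)
    if len(name) > max_len:
        raise ValueError(f"{name!r} is longer than {max_len} characters")
    return name


def hours_needed(games, games_per_hour=2):
    """Hours of non-stop play needed to finish the given number of games."""
    return games / games_per_hour


print(cgos_name("18827f", hardware="1x1080Ti"))
print(cgos_name("b7768", threads=1, visits=3200))
print(hours_needed(100))
```

At roughly 2 games per hour, 100 games take about 50 hours, which is the "2 days non-stop" figure quoted above.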

@alreadydone

Contributor

alreadydone commented Mar 20, 2018

the guy running the "Maximus" bot a while back not only upped his network but also hardware specs several times in game between matches.

@hydrogenpi Where can I find these data? Seems that CGOS offers no place for them. The 19x19 Maximus may not be the same person as the current CGOS top 9x9 bot Maximus_160B_512F, I suppose?

@higherdim

higherdim commented Mar 20, 2018

An older network of LZ won against FineArt with 2H, something that previously both Ke Jie and AQ had lost in this configuration!

@hydrogenpi That was FineArt B/C, which is an older AlphaGo-Lee type network. The one that beat Ke Jie with 2H is FineArt A, which is a new AGZ type network, probably 800 Elo stronger than FineArt B/C.

@alreadydone

Contributor

alreadydone commented Mar 20, 2018

Apparently FineArt A is the "Hainan Challenge" version in this article. It's a 40-block dual-resnet, not trained tabula rasa but with RL starting from old versions of FineArt, using "millions of self-play games". The two other versions mentioned are also dual-resnets (20 blocks), i.e. not the old AG Lee architecture. I don't know whether FineArt B/C are even older versions though.

@godmoves

Contributor

godmoves commented Mar 21, 2018

@alreadydone Some pros tell me that FineArt B is a modified version of Fuheyuqi.

@alreadydone

Contributor

alreadydone commented Mar 21, 2018

Good to know. According to the article, Fuheyuqi was a 20-block dual-resnet trained from several hundred thousand self-play games.

@sethtroisi

Member

sethtroisi commented Feb 14, 2019

Closing, no active discussion for ~1 year.

@sethtroisi sethtroisi closed this Feb 14, 2019
