KataGo v1.4.2 vs LZ272 #254

Open
lightvector opened this issue Jun 21, 2020 · 6 comments

@lightvector
Owner

lightvector commented Jun 21, 2020

Just posting for the record some test results against LZ272 that I ran a while back using KataGo 1.4.2 and the last "semi-zero" nets (not the last nets in the run as a whole), which were g170-b40c256x2-s3708042240-d967973220 (40 blocks) and g170e-b20c256x2-s4384473088-d968438914 (20 blocks). I also tested against LZ-ELFv2, just to see how far we've come since ELF.

I posted these results about a month ago in the Discord chat; this is just re-posting them here.

Summary: KataGo won around 80-90% of games given comparable amounts of compute time (although this was on a V100 machine, where the GPU performance gap between KG and LZ might be smaller than on certain users' hardware), and won 70-80% of games when put at a modest visits handicap to LZ. This was without having to enable avoidMYTDaggerHack, although enabling it helped significantly further in some cases.


All tests used a single V100 cloud GPU (roughly comparable to an AWS "P3 2xlarge" instance, except on Google Cloud rather than AWS).

KataGo was left at mostly default settings, but with a bit of tuning:

  • 64 threads (suggested by the normal benchmark tool for the 40b; I did not attempt to tune the 20b separately).
  • NN cache bumped to 2^23
  • As a reminder, default settings also include 0.5 early temperature, decaying to 0.1 with halflife 19.
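To give a feel for what that temperature setting means over the course of a game, here is a minimal sketch. It assumes the temperature decays exponentially from the early value toward the final value, with the halflife measured in moves; the exact interpolation KataGo uses internally is not stated above, so treat the formula and the function name `move_temperature` as illustrative assumptions.

```python
def move_temperature(move_number, early=0.5, late=0.1, halflife=19.0):
    """Assumed schedule: exponential decay from `early` toward `late`.

    After `halflife` moves, half of the gap between early and late
    temperature has closed; after two halflives, three quarters, etc.
    """
    return late + (early - late) * 0.5 ** (move_number / halflife)


if __name__ == "__main__":
    for n in (0, 19, 38, 76):
        print(f"move {n:3d}: temperature {move_temperature(n):.3f}")
```

Under this assumed form, the temperature is halfway between 0.5 and 0.1 at move 19 and is already close to 0.1 by the middle game.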

LZ272 and LZ-ELF used:

  • --threads 32 --batchsize 16 since some testing indicated that this produced best LZ performance given the GPU.
  • --randomcnt 20 --randomtemp 0.3 to increase opening diversity on LZ's side a little in lieu of having an opening panel. Higher than LZ's default of no temperature at all, but still lower and briefer overall than KataGo's default.
  • --noponder --timemanage off

Also, both sides were set to resign immediately at 5% winrate.

First test: KG was set to use a fixed 5 seconds per move, while LZ used 18K playouts per move and LZ-ELFv2 used 36K playouts per move, aiming to make them take about 5 seconds per move because they have no command-line way to fix a time per move. In actuality, they took about 5.6 s/move and 6 s/move respectively, so this calibration was a bit off, in LZ's favor.
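As a rough sketch of how far off the calibration was, the snippet below rescales the playout counts to the 5-second target. It assumes time per move scales linearly with playouts, which is only approximately true in practice; `playouts_for_target` and `time_advantage` are hypothetical helper names, not part of either engine.

```python
def playouts_for_target(playouts, measured_seconds, target_seconds=5.0):
    """Playout count that would hit the target time, assuming linear scaling."""
    return playouts * target_seconds / measured_seconds


def time_advantage(measured_seconds, target_seconds=5.0):
    """Fraction of extra thinking time the opponent got relative to the target."""
    return measured_seconds / target_seconds - 1.0


if __name__ == "__main__":
    # Measured values quoted in the post above.
    for name, playouts, seconds in (("LZ272", 18000, 5.6), ("LZ-ELFv2", 36000, 6.0)):
        adjusted = playouts_for_target(playouts, seconds)
        extra = time_advantage(seconds)
        print(f"{name}: ~{adjusted:.0f} playouts for 5 s/move, "
              f"had {extra:.0%} more time than KG")
```

Under that linear assumption, LZ272 had roughly 12% more time per move than KataGo and LZ-ELFv2 roughly 20% more.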

Win/loss results:

                            LZ272 (40b)     LZ-ELFv2 (20b)
KG40b avoid dagger hack:    151/162 (93%)   79/81 (97%)
KG40b plain:                135/164 (82%)   78/82 (95%)
KG20b avoid dagger hack:    143/160 (89%)   76/80 (95%)
KG20b plain:                150/164 (91%)   79/82 (96%)
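For a rough sense of scale, win counts like the ones above can be converted to an Elo difference and a confidence interval. This is a minimal sketch using the standard logistic Elo formula and a Wilson score interval; it assumes no draws and independent games, neither of which is stated explicitly above.

```python
import math


def elo_diff(wins, games):
    """Elo difference implied by a win rate, via the logistic Elo model."""
    p = wins / games
    return 400.0 * math.log10(p / (1.0 - p))


def wilson_interval(wins, games, z=1.96):
    """Approximate 95% Wilson score interval for the true win rate."""
    p = wins / games
    denom = 1.0 + z * z / games
    center = (p + z * z / (2.0 * games)) / denom
    half = z * math.sqrt(p * (1.0 - p) / games
                         + z * z / (4.0 * games * games)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # KataGo wins / games against LZ272 from the 5 s/move test.
    results = {
        "KG40b avoid dagger hack": (151, 162),
        "KG40b plain": (135, 164),
        "KG20b avoid dagger hack": (143, 160),
        "KG20b plain": (150, 164),
    }
    for name, (wins, games) in results.items():
        lo, hi = wilson_interval(wins, games)
        print(f"{name}: {elo_diff(wins, games):+.0f} Elo, "
              f"win rate 95% CI [{lo:.1%}, {hi:.1%}]")
```

With only about 160 games per pairing the intervals are fairly wide, so small differences between rows should not be over-interpreted.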

Second test: fixed playouts, KG set a bit lower than either LZ or ELF.

  • LZ used 10k playouts/move
  • LZ-ELFv2 used 20k playouts/move
  • KG 40b and 20b BOTH used 5k playouts/move (so 20b moves quite fast).

                            LZ272 (40b)     LZ-ELFv2 (20b)
KG40b avoid dagger hack:    148/172 (86%)   78/86 (90%)
KG40b plain:                137/168 (81%)   73/84 (86%)
KG20b avoid dagger hack:    126/170 (74%)   76/86 (88%)
KG20b plain:                118/168 (70%)   70/84 (83%)
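One way to check whether the avoid-dagger-hack rows really differ from the plain rows is a two-proportion z-test. The sketch below applies it to the KG40b vs LZ272 results from both tests; it uses the normal approximation and assumes games are independent, so it is only a rough guide, not a rigorous analysis.

```python
import math


def two_proportion_z(wins_a, games_a, wins_b, games_b):
    """z statistic and two-sided p-value for the difference of two win rates."""
    p_a = wins_a / games_a
    p_b = wins_b / games_b
    pooled = (wins_a + wins_b) / (games_a + games_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / games_a + 1.0 / games_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return z, p_value


if __name__ == "__main__":
    # (hack wins, hack games, plain wins, plain games) for KG40b vs LZ272.
    cases = {
        "5 s/move test": (151, 162, 135, 164),
        "fixed playouts test": (148, 172, 137, 168),
    }
    for name, counts in cases.items():
        z, p = two_proportion_z(*counts)
        print(f"{name}: z = {z:.2f}, two-sided p = {p:.3f}")
```

On these numbers the difference looks clear in the 5 s/move test and much less so in the fixed playouts test, which fits the hack helping in some cases rather than all.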
@lightvector
Owner Author

lightvector commented Jun 21, 2020

Games here:
kg142-vslz272elf5s-split.zip
kg142-vslz272elfpfixed-split.zip

Also, I'm momentarily about to upload some nets that, at least when tested against older KataGo nets, appear to be much stronger than these nets, due to learning rate drops at the end of the g170 run. :)

@y-ich
Contributor

y-ich commented Jun 21, 2020

@lightvector san,

Great results!

I have a question.
How much stronger is KataGo 40b than KataGo 20b under the same condition of a fixed 5 seconds per move?

@Friday9i

For recent nets, 40b and 20b were close to each other at time parity, but 20b was barely progressing anymore. For the coming nets, I guess 40b will be significantly stronger, since it improved a lot and I guess 20b did not improve much. But we'll know more in a few hours hopefully, patience 😜

@y-ich
Contributor

y-ich commented Jun 21, 2020

@Friday9i san,

I can't wait for the new release!😆

@lightvector
Owner Author

Done! https://github.com/lightvector/KataGo/releases

Gained 200ish Elo for the 40 block net, and 100ish Elo for the 20 block net, based on matches against earlier networks. No idea how much this gain transfers to gains against opponents like LZ, but anyone is free of course to try them and compare. Enjoy!

@sbbdms

sbbdms commented Jun 22, 2020

Congrats!!
Any plans or new features for the next official run?
