
Handicap games with additional planes #2331

Open
ihavnoid opened this issue Apr 11, 2019 · 33 comments

@ihavnoid
Member

I have always wanted to learn how to play better, but modern Go engines are just waaaaay better than mere mortals, and playing even games against them makes things even harder to learn - hence I always wanted an engine that is capable of playing handicap games well. There have been quite a few ideas for making engines play good handicap games; I tried some of them, and they didn't work well.

Initially I tried the 'uneven playouts' idea, or assigning more randomness to the losing side, but none of those went well: the engine only learned to win by expecting the opponent to make some obvious mistake. And the problem didn't really seem to be limited to the handicapped player having no hope of winning.

So, I decided to try some more tricks using ideas from KataGo (#2260). To be specific:

  • The problem is that when a player is in a losing situation, it should add uncertainty and increase the probability of getting a larger territory, even if there is no chance of winning.
  • Thus, it seems that we need another output plane - in this case, board occupancy seemed to be a useful feature. So, I added two outputs, each used for predicting the end state of the board - to be specific, the third head looks like this:
        # endstate head: 1x1 convolution down to 2 planes, flattened and fed
        # through a fully connected layer producing 2 * 361 outputs
        # (one ownership estimate per intersection, for each of the two planes)
        conv_st = self.conv_block(flow, filter_size=1,
                                  input_channels=self.RESIDUAL_FILTERS,
                                  output_channels=2,
                                  name="endstate_head")
        h_conv_st_flat = tf.reshape(conv_st, [-1, 2 * 19 * 19])
        W_fc4 = weight_variable("w_fc_4", [2 * 19 * 19, (19 * 19) * 2])
        b_fc4 = bias_variable("b_fc_4", [(19 * 19) * 2])
        self.add_weights(W_fc4)
        self.add_weights(b_fc4)
        h_fc4 = tf.add(tf.matmul(h_conv_st_flat, W_fc4), b_fc4)
  • To have this, we need to play the game until the very end - so instead of resigning, the game enters an 'acceleration mode' (10 playouts) once the losing side crosses the resignation threshold.
  • The 'endstate' plane is used as an auxiliary plane for the value head - to be specific, the winrate is 80% from the value output, and 20% from the endstate net - using this formula:
   endstate_winrate = tanh( avg_delta * confidence / 10.0 )
   delta = sum (number_of_my_stone - number_of_opponent_stone + komi_bias)
   confidence = average ( (v - 0.5) * ( v - 0.5) for v in endstate_plane )

That is, the winrate is calculated from the product of the expected score and a confidence term - so a losing engine will prefer playing a chaotic game rather than giving the opponent clear territory.
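
For concreteness, here is a minimal Python sketch of that blend. The relationship between delta and avg_delta, the placement of komi_bias, and the scale of the value output are not spelled out above, so those details are assumptions here rather than exactly what the branch does:

    import numpy as np

    def blended_winrate(value, endstate_plane, komi_bias=0.0, value_weight=0.8):
        """Sketch of the 80/20 blend of value head and endstate term.

        value          : value-head output, assumed in [-1, 1] from the
                         current player's perspective.
        endstate_plane : 361 ownership probabilities in [0, 1] for the
                         current player (assumption).
        """
        p = np.asarray(endstate_plane, dtype=np.float64)
        # Expected score margin: my expected points minus the opponent's,
        # plus a komi term.
        delta = np.sum(p - (1.0 - p)) + komi_bias
        avg_delta = delta / p.size  # assumption: delta averaged per intersection
        # Confidence: how far the ownership predictions sit from 0.5 on average.
        confidence = np.mean((p - 0.5) ** 2)
        endstate_winrate = np.tanh(avg_delta * confidence / 10.0)
        # 80% from the value output, 20% from the endstate term.
        return value_weight * value + (1.0 - value_weight) * endstate_winrate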

With this idea, and after spending a couple of hundred dollars on Google Compute, I got an engine that plays handicap games (with 0.5 komi) reasonably well - it consistently beats me with a 6-stone handicap. The problem is how to measure whether the games are 'fun', since 'fun' is quite a subjective matter - so I would like some feedback on how the games 'feel'.

I put the current net on a website that I am running - please visit https://cbaduk.net/, try some handicap games, and let me know how it feels. I will clean up the code and post a branch and the net file that I have over the weekend.

@Hersmunch
Member

I have had a couple of very casual games and it definitely feels more like a game I might learn something from than with a normal network. A couple of questions:
For the training, like KataGo, did you downweight the training samples for the moves played with only 10 playouts?
Also, would I be right in assuming that you trained from scratch for a komi of 0.5?
Thank you for doing this! :)

@iopq

iopq commented Apr 11, 2019

@ihavnoid it does too many 3-3 invasions for a high handicap game


@ihavnoid
Member Author

@iopq Yeah, it seems that is a problem. I tried 6 and 7 stones, and with higher playouts (something like 10k) it does seem to be learning that a 3-3 invasion on the first move is not a good idea, though.

The website is running 1500 playouts.

@ihavnoid
Member Author

@Hersmunch I dropped the acceleration-mode data rather than downweighting it. It wasn't completely done from scratch though - I started from the 40B data with the endstate plane filled with the very last state of the board, and continuously ran training on the last 100k games. Now that I have run something like 200k games with 200 playouts, the initial data is all gone.
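
To illustrate the bootstrap step, filling the endstate target from the final board state amounts to something like the sketch below (the board encoding and the plane ordering are assumptions, not the exact code in the branch):

    import numpy as np

    def endstate_target_from_final_board(final_board, black_to_move):
        """Sketch: build the two 19x19 endstate target planes from the final
        board state, as used to bootstrap from the existing 40B data.

        final_board is assumed to be a 19x19 array with +1 where Black owns
        the point at the end of the game, -1 for White, and 0 for neutral.
        """
        black_owned = (final_board > 0).astype(np.float32)
        white_owned = (final_board < 0).astype(np.float32)
        # Assumption: plane 0 belongs to the side to move, plane 1 to the opponent.
        if black_to_move:
            planes = np.stack([black_owned, white_owned])
        else:
            planes = np.stack([white_owned, black_owned])
        return planes.reshape(2 * 19 * 19)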

@nerai
Contributor

nerai commented Apr 11, 2019

> With this idea, and after spending a couple of hundred dollars on Google Compute, I got an engine that plays handicap games (with 0.5 komi) reasonably well - it consistently beats me with a 6-stone handicap. The problem is how to measure whether the games are 'fun', since 'fun' is quite a subjective matter - so I would like some feedback on how the games 'feel'.

I played a bit. I'm 3d and used 5 to 6 handicap. The bot was surprisingly strong. Many moves reminded me of games with professionals (the typical "oh, yes, of course there, now that suddenly became a problem for me!"). Some moves were clearly useless (asking me to make a clear mistake), and some useful forcing moves (kikashi) were not played while still possible. The bot sometimes failed to complicate the game and resigned early; I feel it should tenuki more often in such situations to keep the game alive. All in all, great work.

@nemja

nemja commented Apr 12, 2019

I'm also 3d, played two games. First on 4 stones, I played passively and lost comfortably - it felt strong.

The second game on 5 stones was odd. It wasn't apparent at first, but I'm not sure it understood what was going on in some corner shapes. After dying, it played a few forcing moves in the dead corner, and a bit later it got into a squeeze and pulled out a bad ladder. It feels like it's too reliant on the opponent answering obediently.

@cjohnchen

very good!

@ihavnoid
Member Author

I can see that if it is still losing too badly it will play nonsense moves. The auxiliary output will also converge to zero if it feels the game is hopeless - I will try to come up with a way to make that behave a bit more reasonably.

@kaorahi
Contributor

kaorahi commented Apr 13, 2019

[image: endstate visualization]
interesting! (just testing)

Can we use the sum of endstate as the estimated score?

@vipmath

vipmath commented Apr 13, 2019

@ihavnoid It doesn't compile: 'pinnedOutBufferHost_es' ==> opencl.cpp L357 (not-initialized error).
Please help (Win10, VS2017).

@ihavnoid
Member Author

@vipmath - pushed a minor fix addressing that error

@vipmath

vipmath commented Apr 14, 2019

@ihavnoid Thanks very much!!

@kaorahi
Contributor

kaorahi commented Apr 15, 2019

The sum of endstates seems to oscillate.
[plot: estimated_score]
Is this correct? (Move 127 is the "ear-reddening move". My code is shown below.)

BTW, do you have a plan to add some endstate outputs for GUIs? I am testing visualization of the endstates and it looks interesting. For example, I can observe that an invasion indirectly increases the opponent's territory in other areas.

Network::Netresult Network::get_output_internal(
...
    for (auto idx = size_t{0}; idx < NUM_INTERSECTIONS; idx++) {
        const auto sym_idx = symmetry_nn_idx_table[symmetry][idx];
        result.policy[sym_idx] = outputs[idx];
        if (m_has_es_head) {
            // The endstate output holds two planes of NUM_INTERSECTIONS
            // values; the first plane is the side to move, the second
            // plane the opponent.
            const int offset_b = blacks_move ? 0 : NUM_INTERSECTIONS;
            const int offset_w = NUM_INTERSECTIONS - offset_b;
            const auto es_b = endstate_out[idx + offset_b];
            const auto es_w = endstate_out[idx + offset_w];
            // Accumulate the expected ownership sums for black and white.
            result.endstate_sum_b += es_b;
            result.endstate_sum_w += es_w;
        }
    }
...
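
For reference, turning those two sums into a score estimate would presumably be as simple as the following (a sketch; it assumes the sums are raw ownership probabilities over all intersections and that komi is applied afterwards):

    def estimated_score_black(endstate_sum_b, endstate_sum_w, komi=0.5):
        # Expected Black points minus expected White points, minus komi.
        return endstate_sum_b - endstate_sum_w - komi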

@ihavnoid
Member Author

@kaorahi I think some oscillation is reasonable, since each move played should increase the odds of winning for the player who just moved. Plus, the komi's polarity also changes on every move, but the komi here is only 0.5 points, so that effect should be small.

@alreadydone
Contributor

I think when the human makes the move suggested by the network, the expected score should not change too much (like two days ago in the OpenAI Five Finals, where the winrate didn't change when Five picked a hero and only changed when the humans picked one).

@ihavnoid
Member Author

Well, in this case the expected score is a byproduct rather than the net output itself, so errors will accumulate here and there. I don't have any idea why it oscillates with a pattern, though.

@mylyu

mylyu commented Apr 16, 2019

Very interesting. I am around 1k level, and won on 9 stones with full effort.

kaorahi pushed a commit to kaorahi/lizgoban that referenced this issue Apr 20, 2019
@poptangtwe

poptangtwe commented May 9, 2019

Thanks for your excellent work!
Would it be feasible to develop a visual user tool for your engine, like the one KataGo has? featurecat/lizzie#505 (comment)
I'm really looking forward to seeing the idea merged into the next branch!

@willemhendriks

I have had this idea for a while - to have two networks during a handicap game:

  • One stronger network plays white.
  • One weaker network plays black.

This has not been tested yet; it's on my to-do list, but it won't be soon.

@ihavnoid
Member Author

Now that the weather is getting hotter, I have to shut down the machines that were used for training the nets - I will probably resume when it gets colder, so that the machines can contribute to heating my house. Still, I will leave https://cbaduk.net/ running.

I would be happy if this gets merged into the master branch, but I still have concerns because 1) it isn't really polished - it doesn't seem that the net will eventually learn enough to work with a small number of playouts, and 2) there is no effort here to train nets in this alternative format, and having code in the repository for a net weight that isn't being produced here seems a bit odd.

@lightvector

Nice! Glad to see other people experimenting with these ideas.

One note on terminology - I think it's not a great idea to still call it "winrate" if you're mixing a notion of score and such into it, particularly because it's no longer measuring the same type of thing as what everyone else usually uses "winrate" to mean. So the general word I've been using for it is "utility", because that's the generic word from both economics and a lot of existing agent- and AI-based literature that precisely means "whatever a given agent is attempting to maximize". So in this case, you've set the utility for this new version of LZ to be a weighted sum of "winrate" and final score, adjusted by the uncertainty of the final ownership or endstate status.

@trainewbie

@ihavnoid I feel kind of sad that you can't continue to train the handicap-game nets.

I have a private computer system for machine learning, and I am interested in your handicap-game nets. So, if you don't mind, how about I resume their training in your place?

If you are interested, please contact me by e-mail (trainewbie@gmail.com).

@ihavnoid
Member Author

For people who are interested in making a similar effort, I will clean up the code and rebase it against the latest 'next' branch. This should include:

  • Generating the modified training data
  • Self-train commands for people who do it themselves (rather than relying on autogtp)
  • The modified training code
  • Accelerated endgame mode, and some other heuristics that were required to make things work better

I will post it when ready - I expect things to be in a branch in a week or two.

@ihavnoid
Member Author

Updated the code at:
https://github.com/ihavnoid/leela-zero/tree/endstate_head

Quick way to run self-play games:

  1. Build leela-zero
  2. Place a net file in training/tf/ (as a txt file - you can start with one of the weights that I uploaded)
  3. Run leela-zero/minitrain.sh [data_file_prefix] [gpu_number] - this will run 25 games, store the training data, and so on
  4. Run the TensorFlow training to continue training on that data as needed (see the sketch below)
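
As a usage example, the self-play part of that loop can be scripted roughly like this (a sketch only; the batch count, data prefix, and GPU index are illustrative, and the exact argument form minitrain.sh expects should be checked against the script):

    import subprocess

    GPU = "0"          # which GPU to use (illustrative)
    PREFIX = "mydata"  # data file prefix (whatever form the script expects)

    for _ in range(10):  # ten batches of 25 self-play games each
        # Step 3: run 25 games and store the training data.
        subprocess.run(["./minitrain.sh", PREFIX, GPU], check=True)
    # Step 4: the TensorFlow training in training/tf/ is then run separately,
    # pointed at the data files the script produced.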

@ihavnoid
Member Author

Usually I mounted two or more machines on the same disk and ran multiple minitrain.sh scripts. Or, I ran the script on a (potentially preemptible) Google Compute cloud machine, uploaded new nets as they appeared, and downloaded the training data periodically. Since self-play needs far more machines than training, I only ran training for 6 hours a day on a single GPU - for the other 18 hours all GPUs were used for self-play.

@trainewbie

@ihavnoid Many thanks for the update and the detailed explanation.
What batch size and learning rate did you use?

@ihavnoid
Member Author

I used a batch size of 64, and a learning rate schedule like this:

        # start at 0.01 and decay smoothly by a factor of 0.2 every 500k steps
        learning_rate = tf.train.exponential_decay(0.01, self.global_step,
                                                   500000, 0.2, staircase=False)
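
For a sense of what that schedule gives (this is just the standard exponential_decay formula with staircase=False, nothing extra from the branch):

    def lr_at(step, base=0.01, decay_steps=500000, decay_rate=0.2):
        # lr = base * decay_rate ** (step / decay_steps), decayed smoothly
        return base * decay_rate ** (step / decay_steps)

    print(lr_at(0))        # 0.01
    print(lr_at(500000))   # 0.002
    print(lr_at(1000000))  # 0.0004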

@trainewbie

Sometimes I have encountered an error message when using the GPUs for self-play with multiple minitrain.sh scripts on Ubuntu 16.04.

Error in OpenCL calculation: Update your device's OpenCL drivers or reduce the amount of games played simultaneously.
terminate called after throwing an instance of 'std::runtime_error'
  what():  OpenCL self-check mismatch.

./minitrain.sh: line 24:  5953 Done                  echo -e komi 0.5 \\nautotrain training/tf/traindata_${timestamp} 25 \\nquit
      5954 Aborted               (core dumped) | ${leelaz_cmd} -w $latest_weight -m 20

So I checked the OpenCL version with clinfo.

  Device Name                                     GeForce GTX 1080 Ti
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 1.2 CUDA
  Driver Version                                  396.54
  Device OpenCL C Version                         OpenCL C 1.2 

Should I install a different version of the OpenCL driver?
Or is there no problem training with the unfinished dump_data?

@ihavnoid
Member Author

ihavnoid commented Jun 3, 2019

Well, if you are starting the training from scratch, a random net is likely to hit the self-check failure - I recommend disabling the self-check for the first 10k games or so. If not... I don't know what to do.

@ihavnoid
Member Author

ihavnoid commented Jun 3, 2019

FYI I am using Ubuntu 18.04 on a 410.xx driver, using three GTX1080s. I did use 390.xx last year so it shouldn't be an issue, I think.

@trainewbie

@ihavnoid Thanks for your advice.
All my other efforts failed (driver change, GPU count control, virus checking, Ubuntu software upgrade).

I changed --precision half to --precision single, and the OpenCL error disappeared completely.
I don't know the reason. :)

@intenseG

intenseG commented Oct 13, 2019

Great!
My strength is Fox Go 7d-9d, but I lost by 18.5 points without any handicap.
However, I felt that its style was not aggressive.
My Go style is super-aggressive (I like capturing stones very much), but the 'endstate' version of leelaz gave me the impression of building its territory efficiently rather than playing aggressively.

I am also trying to train with a dataset (4700+ games) on a clone of the endstate branch, collecting only my own games.
The goal is to make it play very aggressively.
Are there any other pages or ideas that could help, other than increasing the dataset through self-play in acceleration mode?
Thank you!

