
Handicap games with additional planes #2331

Open
ihavnoid opened this issue Apr 11, 2019 · 33 comments

@ihavnoid
Member

I have always wanted to learn how to play better, but modern Go engines are just waaaaay better than mere mortals, and playing even games against them makes things even harder to learn - hence I always wanted an engine that is capable of playing handicap games well. There have been quite a few ideas for making engines play good handicap games; I tried some of them, and they didn't work well.

Initially I tried the 'uneven playouts' idea, or assigning more randomness to the losing side, but none of those went well: the engine only learned to win by expecting the opponent to make some obvious mistake. And the problem didn't really seem to be limited to the handicapped player having no hope of winning.

So, I decided to try some more tricks using ideas from KataGo (#2260). To be specific:

  • The problem is that when a player is in a losing situation, it should add uncertainty and increase the probability of getting a larger territory, even if there is no chance of winning.
  • Thus, it seems that we need another output plane - in this case, board occupancy seemed to be a useful feature. So, I added two outputs, each used for predicting the end state of the board - to be specific, the third head looks like this:
        # endstate head: 1x1 convolution down to 2 planes, flattened and fed
        # through a fully connected layer producing 2 * 361 outputs
        # (one ownership estimate per intersection, for each of the two planes)
        conv_st = self.conv_block(flow, filter_size=1,
                                  input_channels=self.RESIDUAL_FILTERS,
                                  output_channels=2,
                                  name="endstate_head")
        h_conv_st_flat = tf.reshape(conv_st, [-1, 2 * 19 * 19])
        W_fc4 = weight_variable("w_fc_4", [2 * 19 * 19, (19 * 19) * 2])
        b_fc4 = bias_variable("b_fc_4", [(19 * 19) * 2])
        self.add_weights(W_fc4)
        self.add_weights(b_fc4)
        h_fc4 = tf.add(tf.matmul(h_conv_st_flat, W_fc4), b_fc4)
  • To have this, we need to play the game until the very end - so instead of resigning, the game enters an 'acceleration mode' (10 playouts) once the losing side crosses the resignation threshold.
  • The 'endstate' plane is used as an auxiliary plane for the value head - to be specific, the winrate is 80% from the value output, and 20% from the endstate net - using this formula:
   endstate_winrate = tanh( avg_delta * confidence / 10.0 )
   delta = sum (number_of_my_stone - number_of_opponent_stone + komi_bias)
   confidence = average ( (v - 0.5) * ( v - 0.5) for v in endstate_plane )

That is, the winrate is calculated from the product of the expected score and a confidence term - so a losing engine will prefer playing a chaotic game rather than giving the opponent clear territory.
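
For concreteness, here is a minimal Python sketch of that blend. The relationship between delta and avg_delta, the placement of komi_bias, and the scale of the value output are not spelled out above, so those details are assumptions here rather than exactly what the branch does:

    import numpy as np

    def blended_winrate(value, endstate_plane, komi_bias=0.0, value_weight=0.8):
        """Sketch of the 80/20 blend of value head and endstate term.

        value          : value-head output, assumed in [-1, 1] from the
                         current player's perspective.
        endstate_plane : 361 ownership probabilities in [0, 1] for the
                         current player (assumption).
        """
        p = np.asarray(endstate_plane, dtype=np.float64)
        # Expected score margin: my expected points minus the opponent's,
        # plus a komi term.
        delta = np.sum(p - (1.0 - p)) + komi_bias
        avg_delta = delta / p.size  # assumption: delta averaged per intersection
        # Confidence: how far the ownership predictions sit from 0.5 on average.
        confidence = np.mean((p - 0.5) ** 2)
        endstate_winrate = np.tanh(avg_delta * confidence / 10.0)
        # 80% from the value output, 20% from the endstate term.
        return value_weight * value + (1.0 - value_weight) * endstate_winrate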

With this idea, and after spending a couple of hundred dollars on Google Compute, I got an engine that plays handicap games (with 0.5 komi) reasonably well - it consistently beats me with a 6-stone handicap. The problem is how to measure whether the games are 'fun', since 'fun' is quite a subjective matter - so I would like some feedback on how the games 'feel'.

I put the current net on a website that I am running - please visit https://cbaduk.net/, try some handicap games, and let me know how it feels. I will clean up the code and post a branch and the net file that I have over the weekend.

@Hersmunch
Member

I have had a couple of very casual games and it definitely feels more like a game I might learn something from than with a normal network. A couple of questions:
For the training, like KataGo, did you downweight the training samples for the moves played with only 10 playouts?
Also, would I be right in assuming that you trained from scratch for a komi of 0.5?
Thank you for doing this! :)

@iopq

iopq commented Apr 11, 2019

@ihavnoid it does too many 3-3 invasions for a high handicap game


@ihavnoid
Member Author

@iopq Yeah, it seems that is a problem. I tried 6 and 7 stones, and with higher playouts (something like 10k) it does seem to be learning that a 3-3 invasion on the first move is not a good idea, though.

The website is running 1500 playouts.

@ihavnoid
Member Author

@Hersmunch I dropped the acceleration-mode data rather than downweighting it. It wasn't completely done from scratch though - I started from the 40B data with the endstate plane filled with the very last state of the board, and continuously ran training on the last 100k games. Now that I have run something like 200k games with 200 playouts, the initial data is all gone.
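
To illustrate the bootstrap step, filling the endstate target from the final board state amounts to something like the sketch below (the board encoding and the plane ordering are assumptions, not the exact code in the branch):

    import numpy as np

    def endstate_target_from_final_board(final_board, black_to_move):
        """Sketch: build the two 19x19 endstate target planes from the final
        board state, as used to bootstrap from the existing 40B data.

        final_board is assumed to be a 19x19 array with +1 where Black owns
        the point at the end of the game, -1 for White, and 0 for neutral.
        """
        black_owned = (final_board > 0).astype(np.float32)
        white_owned = (final_board < 0).astype(np.float32)
        # Assumption: plane 0 belongs to the side to move, plane 1 to the opponent.
        if black_to_move:
            planes = np.stack([black_owned, white_owned])
        else:
            planes = np.stack([white_owned, black_owned])
        return planes.reshape(2 * 19 * 19)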

@nerai
Contributor

nerai commented Apr 11, 2019

> With this idea, and after spending a couple of hundred dollars on Google Compute, I got an engine that plays handicap games (with 0.5 komi) reasonably well - it consistently beats me with a 6-stone handicap. The problem is how to measure whether the games are 'fun', since 'fun' is quite a subjective matter - so I would like some feedback on how the games 'feel'.

I played a bit. I'm 3d and used 5 to 6 handicap. The bot was surprisingly strong. Many moves reminded me of games with professionals (the typical "oh, yes, of course there, now that suddenly became a problem for me!"). Some moves were clearly useless (asking me to make a clear mistake), and some useful forcing moves (kikashi) were not played while still possible. The bot sometimes failed to complicate the game and resigned early; I feel it should tenuki more often in such situations to keep the game alive. All in all, great work.

@nemja

nemja commented Apr 12, 2019

I'm also 3d, played two games. First on 4 stones, I played passively and lost comfortably - it felt strong.

The second game on 5 stones was odd. It wasn't apparent at first, but I'm not sure it understood what was going on in some corner shapes. After dying, it played a few forcing moves in the dead corner, and a bit later it got into a squeeze and pulled out a bad ladder. It feels like it's too reliant on the opponent answering obediently.

@cjohnchen

very good!

@ihavnoid
Member Author

I can see that if it is still losing too badly it will play nonsense moves. The auxiliary output will also converge to zero if it feels the game is hopeless - I will try to come up with a way to make that behave a bit more reasonably.

@kaorahi
Contributor

kaorahi commented Apr 13, 2019

[image: endstate visualization]
interesting! (just testing)

Can we use the sum of endstate as the estimated score?

@vipmath

vipmath commented Apr 13, 2019

@ihavnoid It doesn't compile: 'pinnedOutBufferHost_es' ==> opencl.cpp L357 (not-initialized error).
Please help (Win10, VS2017).

@ihavnoid
Member Author

@vipmath - pushed a minor fix addressing that error

@vipmath

vipmath commented Apr 14, 2019

@ihavnoid Thanks very much!!

@kaorahi
Contributor

kaorahi commented Apr 15, 2019

The sum of endstates seems to oscillate.
[plot: estimated_score]
Is this correct? (Move 127 is the "ear-reddening move". My code is shown below.)

BTW, do you have a plan to add some endstate outputs for GUIs? I am testing visualization of the endstates and it looks interesting. For example, I can observe that an invasion indirectly increases the opponent's territory in other areas.

Network::Netresult Network::get_output_internal(
...
    for (auto idx = size_t{0}; idx < NUM_INTERSECTIONS; idx++) {
        const auto sym_idx = symmetry_nn_idx_table[symmetry][idx];
        result.policy[sym_idx] = outputs[idx];
        if (m_has_es_head) {
            // The endstate output holds two planes of NUM_INTERSECTIONS
            // values; the first plane is the side to move, the second
            // plane the opponent.
            const int offset_b = blacks_move ? 0 : NUM_INTERSECTIONS;
            const int offset_w = NUM_INTERSECTIONS - offset_b;
            const auto es_b = endstate_out[idx + offset_b];
            const auto es_w = endstate_out[idx + offset_w];
            // Accumulate the expected ownership sums for black and white.
            result.endstate_sum_b += es_b;
            result.endstate_sum_w += es_w;
        }
    }
...
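
For reference, turning those two sums into a score estimate would presumably be as simple as the following (a sketch; it assumes the sums are raw ownership probabilities over all intersections and that komi is applied afterwards):

    def estimated_score_black(endstate_sum_b, endstate_sum_w, komi=0.5):
        # Expected Black points minus expected White points, minus komi.
        return endstate_sum_b - endstate_sum_w - komi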

@ihavnoid
Member Author

@kaorahi I think some oscillation is reasonable, since each move played should increase the odds of winning for the player who just moved. Plus, the komi's polarity also changes on every move, but the komi here is only 0.5 points, so that effect should be small.

@alreadydone
Contributor

I think when the human makes the move suggested by the network, the expected score should not change too much (like two days ago in the OpenAI Five Finals, where the winrate didn't change when Five picked a hero and only changed when the humans picked one).

@ihavnoid
Member Author

Well, in this case the expected score is a byproduct rather than the net output itself, so errors will accumulate here and there. I don't have any idea why it oscillates with a pattern, though.

@mylyu

mylyu commented Apr 16, 2019

Very interesting. I am around 1k level, and won on 9 stones with full effort.

kaorahi pushed a commit to kaorahi/lizgoban that referenced this issue Apr 20, 2019
@poptangtwe

poptangtwe commented May 9, 2019

Thanks for your excellent work!
Would it be feasible to develop a visual user tool for your engine, like the one KataGo has? featurecat/lizzie#505 (comment)
I'm really looking forward to seeing the idea merged into the next branch!

@willemhendriks

I have had this idea for a while - to have two networks during a handicap game:

  • One stronger network plays white.
  • One weaker network plays black.

This has not been tested yet; it's on my to-do list, but it won't be soon.

@ihavnoid
Member Author

Now that the weather is getting hotter, I have to shut down the machines that were used for training the nets - I will probably resume when it gets colder, so that the machines can contribute to heating my house. Still, I will leave https://cbaduk.net/ running.

I would be happy if this gets merged into the master branch, but I still have concerns because 1) it isn't really polished - it doesn't seem that the net will eventually learn enough to work with a small number of playouts, and 2) there is no effort here to train nets in this alternative format, and having code in the repository for a net weight that isn't being produced here seems a bit odd.

@lightvector

Nice! Glad to see other people experimenting with these ideas.

One note on terminology - I think it's not a great idea to still call it "winrate" if you're mixing a notion of score and such into it, particularly because it's no longer measuring the same type of thing as what everyone else usually uses "winrate" to mean. So the general word I've been using for it is "utility", because that's the generic word from both economics and a lot of existing agent- and AI-based literature that precisely means "whatever a given agent is attempting to maximize". So in this case, you've set the utility for this new version of LZ to be a weighted sum of "winrate" and final score, adjusted by the uncertainty of the final ownership or endstate status.

@trainewbie

@ihavnoid I feel kind of sad that you can't continue to train the handicap-game nets.

I have a private computer system for machine learning, and I am interested in your handicap-game nets. So, if you don't mind, how about I resume their training in your place?

If you are interested, please contact me by e-mail (trainewbie@gmail.com).

@ihavnoid
Member Author

For people who are interested in making a similar effort, I will clean up the code and rebase it against the latest 'next' branch. This should include:

  • Generating the modified training data
  • Self-train commands for people who do it themselves (rather than relying on autogtp)
  • The modified training code
  • Accelerated endgame mode, and some other heuristics that were required to make things work better

I will post it when ready - I expect things to be in a branch in a week or two.

@ihavnoid
Member Author

Updated the code at:
https://github.com/ihavnoid/leela-zero/tree/endstate_head

Quick way to run self-play games:

  1. Build leela-zero
  2. Place a net file in training/tf/ (as a txt file - you can start with one of the weights that I uploaded)
  3. Run leela-zero/minitrain.sh [data_file_prefix] [gpu_number] - this will run 25 games, store the training data, and so on
  4. Run the TensorFlow training to continue training on that data as needed (see the sketch below)
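
As a usage example, the self-play part of that loop can be scripted roughly like this (a sketch only; the batch count, data prefix, and GPU index are illustrative, and the exact argument form minitrain.sh expects should be checked against the script):

    import subprocess

    GPU = "0"          # which GPU to use (illustrative)
    PREFIX = "mydata"  # data file prefix (whatever form the script expects)

    for _ in range(10):  # ten batches of 25 self-play games each
        # Step 3: run 25 games and store the training data.
        subprocess.run(["./minitrain.sh", PREFIX, GPU], check=True)
    # Step 4: the TensorFlow training in training/tf/ is then run separately,
    # pointed at the data files the script produced.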

@ihavnoid
Member Author

Usually I mounted two or more machines on the same disk and ran multiple minitrain.sh scripts. Or, I ran the script on a (potentially preemptible) Google Compute cloud machine, uploaded new nets as they appeared, and downloaded the training data periodically. Since self-play needs far more machines than training, I only ran training for 6 hours a day on a single GPU - for the other 18 hours all GPUs were used for self-play.

@trainewbie

@ihavnoid Many thanks for the update and the detailed explanation.
What batch size and learning rate did you use?

@ihavnoid
Member Author

I used a batch size of 64, and a learning rate schedule like this:

        # start at 0.01 and decay smoothly by a factor of 0.2 every 500k steps
        learning_rate = tf.train.exponential_decay(0.01, self.global_step,
                                                   500000, 0.2, staircase=False)
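
For a sense of what that schedule gives (this is just the standard exponential_decay formula with staircase=False, nothing extra from the branch):

    def lr_at(step, base=0.01, decay_steps=500000, decay_rate=0.2):
        # lr = base * decay_rate ** (step / decay_steps), decayed smoothly
        return base * decay_rate ** (step / decay_steps)

    print(lr_at(0))        # 0.01
    print(lr_at(500000))   # 0.002
    print(lr_at(1000000))  # 0.0004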

@trainewbie

Sometimes I have encountered an error message when using the GPUs for self-play with multiple minitrain.sh scripts on Ubuntu 16.04.

Error in OpenCL calculation: Update your device's OpenCL drivers or reduce the amount of games played simultaneously.
terminate called after throwing an instance of 'std::runtime_error'
  what():  OpenCL self-check mismatch.

./minitrain.sh: line 24:  5953 Done                  echo -e komi 0.5 \\nautotrain training/tf/traindata_${timestamp} 25 \\nquit
      5954 Aborted               (core dumped) | ${leelaz_cmd} -w $latest_weight -m 20

So I checked the OpenCL version with clinfo.

  Device Name                                     GeForce GTX 1080 Ti
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 1.2 CUDA
  Driver Version                                  396.54
  Device OpenCL C Version                         OpenCL C 1.2 

Should I install a different version of the OpenCL driver?
Or is there no problem training with the unfinished dump_data?

@ihavnoid
Member Author

ihavnoid commented Jun 3, 2019

Well, if you are starting the training from scratch, a random net is likely to hit the self-check failure - I recommend disabling the self-check for the first 10k games or so. If not... I don't know what to do.

@ihavnoid
Member Author

ihavnoid commented Jun 3, 2019

FYI I am using Ubuntu 18.04 on a 410.xx driver, using three GTX1080s. I did use 390.xx last year so it shouldn't be an issue, I think.

@trainewbie

@ihavnoid Thanks for your advice.
All my other efforts failed (driver change, GPU count control, virus checking, Ubuntu software upgrade).

I changed --precision half to --precision single, and the OpenCL error disappeared completely.
I don't know the reason. :)

@intenseG

intenseG commented Oct 13, 2019

Great!
My strength is Fox Go 7d-9d, but I lost by 18.5 points without any handicap.
However, I felt that its style was not aggressive.
My Go style is super-aggressive (I like capturing stones very much), but the 'endstate' version of leelaz gave me the impression of building its territory efficiently rather than playing aggressively.

I am also trying to train with a dataset (4700+ games) on a clone of the endstate branch, collecting only my own games.
The goal is to make it play very aggressively.
Are there any other pages or ideas that could help, other than increasing the dataset through self-play in acceleration mode?
Thank you!

