First and second move symmetries decrease exploration efficiency #1084

Closed
Vandertic opened this issue Mar 23, 2018 · 2 comments

@Vandertic

I wanted to flag this ahead of a possible second training run from scratch. Maybe I am wrong, but it seems to me that this could be a nasty problem.

When LZ builds the search tree, it knows nothing about board symmetries. Suppose that at some point in training the policy network gives a high total probability (say 80%) to the four hoshi points. UCT search must then explore all four separately, with roughly 20% of playouts each, without sharing knowledge between the four equivalent branches and without going as deep in any of them as it otherwise could.

Moreover, suppose the next most probable moves are the komoku (4-3) points, with say 16% probability in total. That probability is split eight ways (2% each!), so each of them will be explored much less.

With a limited number of playouts/visits, first moves that have 8 symmetric copies get shallower exploration and far fewer visits than first moves that have only 4.

So it is not at all surprising that the first chosen move converges to P(hoshi)=100%.
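
The arithmetic behind this argument can be checked with a small sketch. It is not Leela Zero code: the function names, the 0-indexed `(x, y)` coordinates, and the assumption that a symmetry class's total prior is shared equally among its points are all illustrative.

```python
def symmetric_images(x, y, size=19):
    """Return all points equivalent to (x, y) under the 8 symmetries of a square board."""
    m = size - 1
    return {
        (x, y), (m - x, y), (x, m - y), (m - x, m - y),
        (y, x), (m - y, x), (y, m - x), (m - y, m - x),
    }


def per_point_prior(class_prior, x, y, size=19):
    """Split the total prior of a symmetry class evenly over its points (an assumption)."""
    return class_prior / len(symmetric_images(x, y, size))


hoshi = (3, 3)    # a 4-4 point, 0-indexed
komoku = (2, 3)   # a 4-3 point, 0-indexed

for name, point, total in (("hoshi", hoshi, 0.80), ("komoku", komoku, 0.16)):
    copies = len(symmetric_images(*point))
    print(name, "copies:", copies, "prior per point:", per_point_prior(total, *point))
```

A hoshi point lies on a diagonal, so it has only 4 distinct images, while a komoku point has 8. That is why the same total prior ends up spread more thinly over the komoku points.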

I would recommend writing special-case code for the first (and second) move so that LZ can exploit symmetries, just at this crucial moment of the game.

For the first move it would be enough to allow only moves in the NNE eighth of the goban. (Which is also tradition and etiquette.)
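
A minimal sketch of such a first-move restriction follows. It is hypothetical and not taken from Leela Zero: it assumes an odd board size, 0-indexed coordinates with `x` counted from the left and `y` from the top, and it reads "NNE eighth" as the wedge between the vertical centre line and the diagonal running to the upper-right corner, boundaries included.

```python
def in_first_move_octant(x, y, size=19):
    """True if (x, y) lies in the upper-right wedge next to the vertical centre line."""
    c = size // 2
    dx = x - c   # columns to the right of the centre
    up = c - y   # rows above the centre
    return 0 <= dx <= up


def first_move_candidates(size=19):
    """List one representative point for every symmetry class of first moves."""
    return [
        (x, y)
        for y in range(size)
        for x in range(size)
        if in_first_move_octant(x, y, size)
    ]


candidates = first_move_candidates()
print(len(candidates), "candidate first moves instead of", 19 * 19)
```

On a 19x19 board this leaves 55 candidates, one for each symmetry class, so no opening move is lost; only its mirrored and rotated copies are.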

Am I misunderstanding something, or could this be a real issue?

@remdu
Contributor

remdu commented Mar 23, 2018

I've already thought about this. It is certainly a bias away from "duplicated" moves, but I don't think it is really problematic. If the duplicated moves are really better than the unique ones, they will get a higher winrate and get picked more often, and the unique ones will tend to disappear.

@gcp gcp added the wontfix label Mar 23, 2018
@gcp
Member

gcp commented Mar 23, 2018

Already discussed before, not interesting, not worth bothering about.

As Eddh explained, if the moves were actually better, the split policy prior would not matter. We already went from 4-4 to 4-3 in this regard.

Edit: When I say "it does not matter", I mean that I know there is an inefficiency, but it does not change the end result. The solutions that fix the inefficiency are IMHO all pretty ugly, and I would certainly never take "first or second move" specific code. It might be cleanest to "fix" this in NNCache by having it consider rotations.
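
One way to picture a rotation-aware cache is the sketch below. It is not the actual NNCache interface: `transforms`, `canonical_key`, and `SymmetricCache` are hypothetical names, the board is assumed to be a tuple of row tuples, and only a scalar value is cached. A real version would also have to map the cached policy back through the inverse symmetry.

```python
def transforms(board):
    """Yield the 8 rotated and mirrored images of a square board (tuple of row tuples)."""
    b = board
    for _ in range(4):
        yield b
        yield tuple(row[::-1] for row in b)   # left-right mirror of this rotation
        b = tuple(zip(*b[::-1]))              # rotate 90 degrees clockwise


def canonical_key(board):
    """Pick one fixed representative (the smallest image) for all symmetric boards."""
    return min(transforms(board))


class SymmetricCache:
    """Cache evaluations by canonical key, so symmetric positions share one entry."""

    def __init__(self, evaluate):
        self._evaluate = evaluate   # stand-in for a network evaluation
        self._store = {}
        self.misses = 0

    def value(self, board):
        key = canonical_key(board)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._evaluate(key)
        return self._store[key]
```

With this layout, two positions that differ only by a rotation or reflection hit the same cache entry, so the evaluator runs once for the whole symmetry class.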
