I wanted to point this out ahead of a second training run from scratch. Maybe I am wrong, but it seems to me that this could be a nasty problem.
When LZ builds the tree it doesn't know about symmetries. Suppose that at some point in training the policy network gives a high total probability (say 80%) to the four hoshi (4-4) points. UCT search must then explore each of them separately, with roughly 20% of the playouts each, without sharing knowledge between the four branches and without going as deep as it could in any one of them.
Moreover, suppose the second most probable moves are the komoku (4-3) points (say 16% in total). Their probability is split 8 ways (2% each!), so they will be explored much less.
With a limited number of playouts/visits, first moves with 8 symmetric copies will get shallower exploration and far fewer visits than first moves with 4 symmetric copies.
So it is not at all surprising that the first chosen move converges to P(hoshi)=100%.
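The arithmetic above can be checked with a small sketch. This is not LZ code: the function names `orbit` and `per_point_prior`, the zero-indexed coordinates, and the assumption that the policy spreads its mass evenly over symmetric points are all just for illustration.

```python
def orbit(x, y, n=19):
    """All points equivalent to (x, y) under the 8 symmetries of an n x n board.

    Coordinates are zero-indexed, so the 4-4 point is (3, 3).
    """
    m = n - 1
    return {
        (x, y), (m - x, y), (x, m - y), (m - x, m - y),
        (y, x), (m - y, x), (y, m - x), (m - y, m - x),
    }


def per_point_prior(total_prior, x, y, n=19):
    """Prior each symmetric copy receives if the total is split evenly (an assumption)."""
    return total_prior / len(orbit(x, y, n))


hoshi = (3, 3)    # 4-4 point: lies on a diagonal, so only 4 distinct copies
komoku = (2, 3)   # 4-3 point: 8 distinct copies

print(len(orbit(*hoshi)), per_point_prior(0.80, *hoshi))
print(len(orbit(*komoku)), per_point_prior(0.16, *komoku))
```

With the 80% / 16% figures from the example, each hoshi ends up with ten times the prior of each komoku, even though the totals differ only by a factor of five.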
I would recommend writing special code for the first (and second) move, so that LZ can exploit symmetries (just at this crucial moment of the game).
For the first move it would be enough to allow only moves in the NNE eighth of the goban. (Which is also tradition and etiquette.)
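As a rough sketch of what "one eighth of the goban" means in practice, the snippet below lists one representative point per symmetry class. The helper names are hypothetical, and which triangle corresponds to the "NNE" eighth depends on the coordinate convention; here the triangle `y <= x <= n // 2` in zero-indexed coordinates is picked arbitrarily.

```python
def orbit(x, y, n=19):
    """All points equivalent to (x, y) under the 8 symmetries of an n x n board."""
    m = n - 1
    return {
        (x, y), (m - x, y), (x, m - y), (m - x, m - y),
        (y, x), (m - y, x), (y, m - x), (m - y, m - x),
    }


def first_move_candidates(n=19):
    """Points in one triangular eighth of the board, including its edges.

    The choice of triangle (y <= x <= n // 2) is an assumption made for this
    sketch; any of the 8 equivalent triangles would do.
    """
    half = n // 2
    return [(x, y) for x in range(half + 1) for y in range(x + 1)]


candidates = first_move_candidates()
print(len(candidates))  # number of distinct first moves up to symmetry
```

On a 19x19 board this leaves 55 candidate first moves instead of 361, and every point on the board is a rotation or reflection of exactly one of them.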
Am I misunderstanding something, or could this be a real issue?
I've already thought about it. It is certainly a bias away from "duplicated" moves, but I don't think it is really problematic. If the duplicate moves are really better than the unique one, then they will get higher winrate and get picked more and the unique one will tend to disappear.
Already discussed before, not interesting, not worth bothering about.
As Eddh explained, if the moves were actually better, then the split policy prior does not matter. We already went from 4-4 to 4-3 in this regard.
Edit: When I say "it does not matter" I mean I know there's an inefficiency, but it does not change the end result. The solutions to fix the inefficiency are IMHO all pretty ugly and certainly I would never take "first or second move" specific code. It might be cleanest to "fix" in NNCache by having that consider rotations.
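To illustrate the idea of a cache that "considers rotations", here is a minimal sketch. It is not the actual NNCache: the class `SymmetricCache`, the board representation (a tuple of tuples of ints), and the choice to store only a scalar value are all assumptions for illustration. A real version would also have to rotate the cached policy back to match the orientation of the queried position.

```python
def symmetries(board):
    """Yield the 8 rotations/reflections of a square board (tuple of tuples)."""
    b = board
    for _ in range(4):
        yield b
        yield tuple(row[::-1] for row in b)   # mirror left-right
        b = tuple(zip(*b[::-1]))              # rotate 90 degrees clockwise


def canonical_key(board):
    """Pick one fixed representative among the 8 symmetric images."""
    return min(symmetries(board))


class SymmetricCache:
    """Toy evaluation cache keyed by the canonical form of a position."""

    def __init__(self):
        self._store = {}

    def get(self, board):
        return self._store.get(canonical_key(board))

    def put(self, board, value):
        self._store[canonical_key(board)] = value


# Usage: a stone in one corner and a stone in the opposite corner share an entry.
a = ((1, 0, 0), (0, 0, 0), (0, 0, 0))
b = ((0, 0, 0), (0, 0, 0), (0, 0, 1))
cache = SymmetricCache()
cache.put(a, 0.5)
print(cache.get(b))
```

Because the lookup happens on the canonical form, the four hoshi openings (or the eight komoku openings) would hit the same entry rather than each triggering a separate network evaluation.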