TTable::sync in a ko situation #576
Comments
Shouldn't the prisoners be unequal if it is a ko? They are included in the hash.
I am referring to a case where two branches arrive at a position with the same hash value. The two positions shown above both have the same single black-stone prisoner, and the same hash values. The difference is that in case 1 black can take the e4 stone, but in case 2 black cannot.
Not necessarily. For example:
This can happen in a triple ko too. The only fully correct solution I can see is to enumerate the superko-illegal points for each position in advance and include them in the hash. Leaving it as is will mess up certain trees where white can remove one of black's ko threats in sente before starting a ko.
In addition to the ko issue, I think there might be a general problem with the syncing concept. In baduk, the correct timing of moves is very important. If the program plays a good sequence of moves to reach a good position, the NN should be rewarded. If the program plays a bad sequence of moves but reaches the same position by accident, the NN should not be rewarded for that. But by syncing a node to another node based on matching board configuration, you increase the likelihood that the NN will be rewarded for a bad sequence of moves and develop a bad habit.
Your last comment makes no sense. The evaluation fed back to the root is the one from the leaves. If it comes from a bad sequence, that means the opponent could have forced an inferior branch earlier up the tree and stopped you from reaching it. If he cannot do that, the moves are good. A bad sequence is one that cannot reach a good position; hand-waving that away with "by accident" makes no sense. I mean, if you're trying to construct some argument that transposition tables are "wrong", I'm afraid I have bad news for you.
Ah, that makes sense. The issue is the path dependence. Including the ko move, if any, in the hash should solve the most common cases and may be worthwhile. The general problem is known, more problematic to fix, and happens in chess programs too with threefold repetitions.
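The suggestion above (folding the ko point, if any, into the position hash) could be sketched roughly as follows. This is an illustrative sketch only; the names (`hash_with_ko`, `BOARD_SQUARES`, the key generator) are hypothetical and not leela-zero's actual code:

```cpp
#include <array>
#include <cstdint>

// One Zobrist-style random key per possible ko square, plus one extra
// slot meaning "no ko". Keys are generated with a fixed-seed splitmix64
// so the hash is deterministic across runs.
constexpr int BOARD_SQUARES = 19 * 19;

std::array<std::uint64_t, BOARD_SQUARES + 1> make_ko_keys() {
    std::array<std::uint64_t, BOARD_SQUARES + 1> keys{};
    std::uint64_t state = 0x9E3779B97F4A7C15ULL;  // arbitrary fixed seed
    for (auto& k : keys) {
        state += 0x9E3779B97F4A7C15ULL;
        std::uint64_t z = state;
        z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
        z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
        k = z ^ (z >> 31);
    }
    return keys;
}

// Mix the ko state into the board hash; ko_square == -1 means no ko,
// so two otherwise identical boards with different ko states differ.
std::uint64_t hash_with_ko(std::uint64_t board_hash, int ko_square) {
    static const auto ko_keys = make_ko_keys();
    const int idx = (ko_square >= 0) ? ko_square : BOARD_SQUARES;
    return board_hash ^ ko_keys[idx];
}
```

With this scheme the two cases from the issue (same board, different ko rights at e4) would no longer share a transposition-table entry.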
I have the stated doubts about transposition tables being appropriate for MCTS guided by a NN. What you really have here is MCTS guided by a hybrid of NN + transposition table.
So we've been syncing two nodes whose previous seven positions don't agree? Such nodes don't even have the same NN evaluations! Or do you think (and have tests shown) that the difference would be negligible if no ko was involved? I think @mkw1234 is at least right in that the history affects the NN evaluation, at least in AGZ. Maybe the previous seven positions are there not only to deal with repeating positions but also to somehow help the NN learn the timing.
If the positions are the same, yes. Note that the evaluation should be the same, aside from the issue with ko highlighted here (which I'd consider a bug and should be fixed by treating them as different). If the NN gives a different evaluation for the exact same board position (which it can certainly do), that's just because it has not reached the full truth about Go yet. There is no reason to not reuse an evaluation that is as correct as another. The evaluation will return different results for different symmetries even if the history is the same. Yet the board position is mathematically equivalent! Why would we care about this one and not about that?
This is a completely meaningless statement. The timing only matters when there is a ko, which we've just established should be treated differently.
https://senseis.xmp.net/?Cycle I have no idea how we would know which two game states are "ko-equivalent" or not without pouring in a lot of game knowledge, when there are so many kinds and lengths of cycle.
FWIW after the addition of the NNCache, and knowing about this ko bug, it seems worthwhile to me to check if the TTable actually makes the program stronger, and I'm testing this now. (NNCache takes the last 2x8 move history as the hash key, so it's immune to this ko bug and some more complex kos) |
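The NNCache keying described above (hashing the recent position history rather than only the current board) could look something like this rough sketch; `history_key` and the mixing scheme are illustrative assumptions, not the actual NNCache implementation:

```cpp
#include <cstdint>
#include <deque>

// Combine the hashes of the last few positions into one cache key.
// A rotate before each XOR makes the combination order-sensitive, so
// the same positions seen in a different order produce different keys.
// This is why a history-based key is immune to the simple ko collision.
std::uint64_t history_key(const std::deque<std::uint64_t>& recent_hashes) {
    std::uint64_t key = 0;
    for (auto h : recent_hashes) {
        key = (key << 7) | (key >> 57);  // rotate left by 7 bits
        key ^= h;
    }
    return key;
}
```

The trade-off versus a plain position hash is exactly the one discussed in this thread: a history key never merges transpositions, so it is safe for the cache but would defeat the purpose of a transposition table.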
It's not necessary to handle all cases correctly, just to behave decently on the common ones. Many heuristics are useful even if they are wrong sometimes or have other downsides. So pointing out that a heuristic can go wrong is not useful or constructive unless you a) have a better replacement or b) can demonstrate the heuristic makes things worse.
@billyswong for checking cycles/triple ko, leela-zero keeps a list of "ko hashes" (seen above, unique for each board position) for prior positions, and before searching a suggested move, it checks whether its ko hash is in the list (i.e. the position has been seen before). If it is, that branch is invalidated and isn't considered. The problem here is that in specific circumstances, instead of being invalidated, the search can be merged with another search where the ko has played out differently. This is done by comparing the "hash" of two positions, which is supposed to be unique for identical situations, but currently doesn't include ko state at all.

This isn't easy to avoid for longer cycles, since the trees could be merged early in the cycle, before any of the later moves causing a cycle are even evaluated. The problem actually affecting the game outcome would be reasonably rare anyway, even without an exotic cycle, so gcp is considering only fixing it for "regular" ko, which is easy to calculate.

The merging (transposition tables) is important for things like trades/capture races where there are many equivalent move options. Not using it will thin the search exponentially (based on the capture-race length) and make it evaluate these trades very inaccurately. Things like the timing of exchanges shouldn't be affected, since any refutations of timing errors are considered before the trees are merged.
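The superko check described above (invalidating a branch when the candidate position's ko hash already appears in the game history) amounts to a simple membership test. A minimal sketch, with hypothetical names standing in for the real data structures:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Before searching a candidate move, compare the resulting position's
// ko hash against all positions already seen in the game. A match means
// the move would repeat an earlier whole-board position, so the branch
// is invalidated under positional superko.
bool violates_superko(std::uint64_t candidate_ko_hash,
                      const std::vector<std::uint64_t>& seen_ko_hashes) {
    return std::find(seen_ko_hashes.begin(), seen_ko_hashes.end(),
                     candidate_ko_hash) != seen_ko_hashes.end();
}
```

The bug in the thread is that this check runs per branch, while the transposition-table merge can splice two branches together before the repetition is ever reached, bypassing it.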
Apologies in advance if I misunderstood your reasoning, but it seems to me you are assuming something about the NN positional evaluation: You seem to make the (hidden?) assumption that the history of the last few positions/moves being fed to the NN is there to be used for somehow training better move sequences (in addition to "normal" better evaluation/move training), or that it can be used for that purpose. Well, there are 2 parts to the answer:
The reason: Your idea seems to be that the NN should output different results if the input planes representing the history are different for the same position. (If the NN output were exactly the same, the exact same prior probabilities would be used, and the same MCTS would occur. In that case, the TT clearly just saves time, and nothing else changes.) However, the entire search is guided (and the network is trained) on the assumptions that the policy returns probabilities of moves being the best, and the value returns the chance of winning.

In reality, the values will often be different simply because the net hasn't learned they should be the same yet, but training is geared towards making the net recognize them as the same, among other things. The history input planes just weren't meant to be used for your suggested "sequence training", and the entire algorithm AGZ uses wasn't built on such ideas. In fact they are more likely to hurt than to help, for the reasons stated above. You would have to think about something quite different to make such an idea realistic (if at all possible), but in any case, whether it's possible or not, LZ is following the AGZ experiment as closely as it can, so this just doesn't apply.
Result of a little test:
It seems even with a potential ko issue the transposition table does still help a bit. (The above isn't statistically significant, but it's also even more unlikely to be making things worse). |
Yes. I thought this might be detrimental to the process of NN training. Suppose a program considers branch A first and then branch B second. The branches take different paths but end up in the same position. When that position is reached via A, the NN outputs a value of 0.9; reached via B, it outputs a lower value, so the program should be discouraged from taking branch B. But in the case of LZ, it syncs node B to node A, and by consequence encourages branch B. There are many, many possible combinations of moves that LZ can consider that lead to a winning position, but only one is the correct one.
@mkw1234 Your reasoning really makes no sense to me. Why wouldn't we want to evaluate the end positions as identical (barring ko issues!) if they are the same position? That would be very much the goal of the training! That's simply the correct thing in theory and practice, and the ttable is irrelevant in that regard.
If branch B is taken and the game won, the moves are good and we want to reinforce them. It is the opponent that made a mistake by not punishing the incorrect sequence and forcing another ending position. The whole training most certainly reinforces "bad" moves (from the perspective of a "stronger" player) that the opponent cannot punish and hence win games, there are a few million games by now where you can see this effect in action and also the result that the program can very easily unlearn them once the opponent figures out the correct punishment. |
The NN will eventually evaluate the end positions as identical as it becomes better through training. But before then, syncing will more likely than not burden the network with wrong ideas and slow its progress.
But why would that be? Your previous post most certainly did not demonstrate that. You still fail to explain why the opponent failing to refute a bad branch would somehow be related to...anything? |
"ko" or "no ko" only determines what the legal moves are in the future. If two positions have the same ko-hash (or, I would say, the same position with two different histories), it means just that from any future position a move cannot go back to any of the positions in the history. But obviously there is only one position you can go back to, and even if you went back, you would be in a position that has a different history. So why does the actual history of the position you went back to matter? (In the given example, the position after black played f4.) In other words, with history + current position we determine whether a move is legal or not. But when we do so, we only need the position hashes in the history; we don't care how a historical position was reached. Is it clear? This is why I believe this is hardly an issue.
Maybe I misunderstood something. The point was the following:
Right, the legal moves in a position - and hence, the value of it - vary depending on whether ko moves are possible or not, and this depends on the positions' history. Thus, the hash should at the very least include basic ko information. (Compare this to hashing en-passant squares in chess engines - the issue is near identical) |
We catch and handle the basic cases of this now correctly. |
Change m_hash to include side to move. Cover cases where genmove is called with the same color such as free placement handicap. Fixes issue leela-zero#576. Pull request leela-zero#645.
Adds a test for issue leela-zero#576 and pull request leela-zero#645.
The following two positions (case 1 and case 2) have equal hash values but do not have the same positional values. However, the Leelaz MCTS might treat these two positions as equal when a node is synced with another node of equal hash value in the TTable. Shouldn't the nodes not be synced in this case?
Case 1
Case 2