TTable::sync in a ko situation #576

Closed
mkw1234 opened this issue Jan 6, 2018 · 24 comments

mkw1234 commented Jan 6, 2018

The following two positions (case 1 and case 2) have equal hash values but do not have the same positional value. However, Leelaz's MCTS might treat these two positions as equal when a node is synced with another node of equal hash value in the TTable. Shouldn't the nodes not be synced in this case?

[Image: Case 1]

[Image: Case 2]


gcp commented Jan 6, 2018

Shouldn't the prisoners be unequal if it is a ko? They are included in the hash.


mkw1234 commented Jan 6, 2018

I am referring to a case where two branches arrive at a position with the same hash value. The two positions shown above both have the same single black prisoner, so the hash values are equal. The difference is that in case 1 black can take the e4 stone, but in case 2 black cannot take the e4 stone.

@ThePokerDude

Shouldn't the prisoners be unequal if it is a ko? They are included in the hash.

Not necessarily. For example:

  1. (move number for example 24; e6 f6 already played) last move = e4 captures the opponent's stone -> ko
  2. (move number 22; e6 f6 not played yet) last moves: e4 captures the opponent's stone, then e6, f6 -> no ko

@joseph-ireland
Copy link

joseph-ireland commented Jan 6, 2018

This can happen in a triple ko too. The only fully correct solution I can see is to enumerate the superko-illegal points for each position in advance and include them in the hash. Leaving it as is will mess up certain trees where White can remove one of Black's ko threats in sente before starting a ko.
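
A minimal sketch of that idea (with a hypothetical `Position` interface and Zobrist table, not leela-zero's actual API): for each candidate move, test whether it would recreate an earlier whole-board position, and fold a per-vertex key for every such superko-illegal point into the node hash.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical interface, for illustration only.
struct Position {
    std::vector<int> legal_moves() const;            // candidate vertices
    std::uint64_t ko_hash_after(int vertex) const;   // board-only hash after playing vertex
};

// One Zobrist key per vertex, meaning "this point is superko-illegal here".
extern const std::uint64_t ZOBRIST_ILLEGAL[19 * 19];

std::uint64_t hash_with_superko_points(const Position& pos,
                                       const std::unordered_set<std::uint64_t>& history_ko_hashes,
                                       std::uint64_t base_hash) {
    std::uint64_t h = base_hash;
    for (int v : pos.legal_moves()) {
        // A move is superko-illegal if it recreates a position already in the history.
        if (history_ko_hashes.count(pos.ko_hash_after(v))) {
            h ^= ZOBRIST_ILLEGAL[v];
        }
    }
    return h;
}
```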

@mkw1234
Copy link
Author

mkw1234 commented Jan 6, 2018

In addition to the ko issue, I think there might be a general problem with the syncing concept. In baduk, the correct timing of moves is very important. If the program plays a good sequence of moves to reach a good position, the NN should be rewarded. If the program plays a bad sequence of moves but reaches the same position by accident, the NN should not be rewarded for that. But by syncing a node to another node based on a matching board configuration, you increase the likelihood that the NN will be rewarded for a bad sequence of moves and develop a bad habit.


gcp commented Jan 6, 2018

Your last comment makes no sense. The evaluation fed back to the root is the one from the leaves. If it comes from a bad sequence, that means the opponent could have forced an inferior branch earlier up the tree and stopped you from reaching it. If he cannot do that, the moves are good. A bad sequence is one that cannot reach a good position; hand-waving that away with "by accident" makes no sense.

I mean if you're trying to construct some argument that Transposition Tables are "wrong", I'm afraid I have bad news for you.


gcp commented Jan 6, 2018

The difference is in case 1, black can take the e4 stone, but in case 2 black can not take the e4 stone.

Ah, that makes sense. The issue is the path dependence. Including the ko move, if any, in the hash should solve the most common cases and may be worthwhile. The general problem is known, is more difficult to fix, and happens in chess programs too with threefold repetitions.
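
A minimal sketch of that fix, using hypothetical names (the actual leela-zero hashing code differs): XOR a Zobrist key for the current ko square, if any, into the position hash, much like en-passant squares are hashed in chess engines.

```cpp
#include <array>
#include <cstdint>
#include <random>

constexpr int NUM_VERTICES = 19 * 19;

// One Zobrist key per possible ko square.
static const std::array<std::uint64_t, NUM_VERTICES> KO_KEYS = [] {
    std::array<std::uint64_t, NUM_VERTICES> keys{};
    std::mt19937_64 rng(0x5eed);
    for (auto& k : keys) k = rng();
    return keys;
}();

// ko_vertex is -1 when no ko recapture is currently forbidden.
std::uint64_t full_hash(std::uint64_t board_hash, int ko_vertex) {
    // Otherwise identical positions with different ko state now hash
    // differently and are no longer merged in the TTable.
    return ko_vertex >= 0 ? board_hash ^ KO_KEYS[ko_vertex] : board_hash;
}
```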


mkw1234 commented Jan 6, 2018

I have the stated doubts about Transposition Tables being appropriate for MCTS guided by NN. What you really have here is MCTS guided by a hybrid of NN + Transposition Table.


alreadydone commented Jan 6, 2018

So we've been syncing two nodes whose previous seven positions don't agree? Such nodes don't even have the same NN evaluations! Or do you think (and did tests show) that the difference would be negligible if no ko was involved?

I think @mkw1234 is at least right that the history affects the NN evaluation, at least in AGZ. Maybe the previous seven positions are there not only to deal with repeating positions but also to somehow help the NN learn the timing.


gcp commented Jan 7, 2018

So we've been syncing two nodes whose previous seven positions don't agree

If the positions are the same, yes. Note that the evaluation should be the same, aside from the issue with ko highlighted here (which I'd consider a bug and should be fixed by treating them as different). If the NN gives a different evaluation for the exact same board position (which it can certainly do), that's just because it has not reached the full truth about Go yet. There is no reason to not reuse an evaluation that is as correct as another.

The evaluation will return different results for different symmetries even if the history is the same. Yet the board position is mathematically equivalent! Why would we care about this one and not about that?

help the NN learn the timing.

This is a completely meaningless statement. The timing only matters when there is a ko, which we've just established should be treated differently.

@billyswong

https://senseis.xmp.net/?Cycle

I have no idea how we can know which two game states are "ko-equivalent" or not without pouring in a lot of game knowledge, when there are so many kinds and lengths of cycles.


gcp commented Jan 7, 2018

FWIW after the addition of the NNCache, and knowing about this ko bug, it seems worthwhile to me to check if the TTable actually makes the program stronger, and I'm testing this now.

(NNCache takes the last 2x8 move history as the hash key, so it's immune to this ko bug and some more complex kos)
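
Roughly the idea behind such a key (a hypothetical sketch; the actual NNCache key derivation may differ): combine the hashes of the positions in the recent history window into one cache key, so two nodes only share a cache entry when their recent history also agrees.

```cpp
#include <cstdint>
#include <vector>

// Fold the hashes of e.g. the last 8 positions per side into a single key.
std::uint64_t nncache_key(const std::vector<std::uint64_t>& recent_position_hashes) {
    std::uint64_t key = 0x9e3779b97f4a7c15ULL;                       // arbitrary seed
    for (std::uint64_t h : recent_position_hashes) {
        key ^= h + 0x9e3779b97f4a7c15ULL + (key << 6) + (key >> 2);  // hash-combine step
    }
    return key;
}
```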


gcp commented Jan 7, 2018

I have no idea how do we know which 2 game states are "ko-equivalent" or not without pouring in a lot of game knowledge, when there are so many kinds and lengths of cycle.

It's not necessary to handle all cases correctly, just behave decently on the common ones.

Many heuristics are useful even if they are wrong sometimes or have other downsides. So pointing out that a heuristic can go wrong is not useful or constructive unless you a) have a better replacement or b) can demonstrate that the heuristic makes things worse.

@joseph-ireland

@billyswong for checking cycles/triple ko, leela-zero keeps a list of "ko hashes" (seen above, unique for each board position) for prior positions, and before searching a suggested move, it checks whether its ko hash is in the list (i.e. the position has been seen before). If it is, that branch is invalidated and isn't considered.
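
Roughly what that check looks like (a sketch with hypothetical names, not the actual leela-zero code): the board-only hash a candidate move would produce is compared against the hashes of all earlier positions in the game.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// True when playing the move would repeat an earlier whole-board position,
// in which case the branch is invalidated rather than searched.
bool violates_superko(std::uint64_t ko_hash_after_move,
                      const std::vector<std::uint64_t>& history_ko_hashes) {
    return std::find(history_ko_hashes.begin(), history_ko_hashes.end(),
                     ko_hash_after_move) != history_ko_hashes.end();
}
```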

The problem here is that in specific circumstances, instead of being invalidated, the search can be merged with another search where the ko has played out differently. This is done by comparing the "hash" of two positions, which is supposed to be unique for identical situations, but currently doesn't include ko state at all.

This isn't easy to avoid for longer cycles, since the trees could be merged early in the cycle, before any of the later moves causing a cycle are even evaluated. The problem actually affecting the game outcome would be reasonably rare anyway, even without an exotic cycle, so gcp is considering only fixing it for "regular" ko, which is easy to calculate.

The merging (transposition tables) is important for things like trades/capture races where there are many move options that are equivalent. Not using it will thin the search exponentially (based on the capture race length) and make it evaluate these trades very inaccurately. Things like the timing of exchanges shouldn't be affected, since any refutations of timing errors are considered before the trees are merged.


grolich commented Jan 7, 2018

I have the stated doubts about Transposition Tables being appropriate for MCTS guided by NN. What you really have here is MCTS guided by a hybrid of NN + Transposition Table.

Apologies in advance if I misunderstood your reasoning, but it seems to me you are assuming something about the NN positional evaluation:

You seem to make the (hidden?) assumption that the history of the last few positions/moves being fed to the NN is there to be used for somehow training better move sequences (in addition to "normal" better evaluation/move training), or that it can be used for that purpose.

Well, there are 2 parts to the answer:

  1. It isn't there for that purpose at all. DeepMind stated in their AGZ paper that it's there simply because kos aren't otherwise observable by the NN.

  2. It can't be used for such a purpose in the way you seem to be suggesting (I'm not sure how the "sequence training" you mentioned would work; as gcp stated, it doesn't really make much sense in the context of a search like MCTS).

The reason:

Your idea seems to be that the NN should output different results if the input planes representing the history are different for the same position (if the NN output were exactly the same, the exact same prior probabilities would be used and the same MCTS would occur; in that case the TT clearly just saves time, and nothing else changes).

However, the entire search is guided (and the network is trained) on the assumption that the policy returns the probabilities of moves being best and the value returns the chance of winning.
The same positions (if there is no ko issue) have exactly the same good and bad moves, and the network trains to get as close as it can to these values.
Any change to those values that deliberately made the outputs different would ensure that at least one of the positions returns values that do not reflect these correctly, and would violate those assumptions.

In reality, the values will often be different simply because the net hasn't learned they should be the same yet, but training is geared towards making the net recognize them as the same, among other things.

The history input planes just weren't meant to be used for your suggested "sequence training", and the entire algorithm AGZ uses wasn't built on such ideas. In fact they are more likely to hurt than to help for the reasons stated above.

You would have to think about something quite different to make such an idea realistic (if at all possible), but in any case, whether it's possible or not, LZ is following the AGZ experiment as closely as it can, so this just doesn't apply.


gcp commented Jan 7, 2018

Result of a little test:

leelaz-fca0 v leelaz-fca0-ntt (563/1000 games)
board size: 19   komi: 7.5
                  wins              black          white        avg cpu
leelaz-fca0        292 51.87%       140 49.82%     152 53.90%    314.88
leelaz-fca0-ntt    271 48.13%       130 46.10%     141 50.18%    293.10
                                    270 47.96%     293 52.04%

It seems even with a potential ko issue the transposition table does still help a bit. (The above isn't statistically significant, but it's also even more unlikely to be making things worse).


mkw1234 commented Jan 7, 2018

Your idea seems to be that the NN should output different results if the input planes representing the history are different for the same position

Yes. I thought this might be beneficial to the process of NN training.

Suppose a program considers branch A first and then branch B second.

The branches take different paths but end up in same position.

When A is evaluated at this position, NN outputs a value of 0.9.
When B is evaluated at this position, NN outputs a value of 0.5.

The program will be discouraged from taking the branch B because of the lower value.
This is the desired behavior.

But in the case of LZ, it syncs position B to position A and consequently encourages branch B.
If branch B is taken and the game is won by White, the NN will be trained to evaluate branch B's moves highly. Now we have the burden of unlearning these moves.

There are many, many possible combinations of moves that LZ can consider that lead to a winning position, but only one is correct.

[Image: (A) Correct sequence]

[Image: (B) Incorrect sequence]


gcp commented Jan 7, 2018

@mkw1234 Your reasoning really makes no sense to me. Why wouldn't we want to evaluate the end positions as identical (barring ko issues!) if they are the same position? That would be very much the goal of the training! It is simply the correct thing in theory and practice, and the ttable is irrelevant in that regard.

If branch B is taken, and the game won by White

If branch B is taken and the game won, the moves are good and we want to reinforce them. It is the opponent that made a mistake by not punishing the incorrect sequence and forcing another ending position.

The whole training most certainly reinforces "bad" moves (from the perspective of a "stronger" player) that the opponent cannot punish and which hence win games. There are a few million games by now where you can see this effect in action, and also see that the program can very easily unlearn those moves once the opponent figures out the correct punishment.


mkw1234 commented Jan 7, 2018

Why wouldn't we want to evaluate the end positions as identical (barring ko issues!)

The NN will eventually evaluate the end positions as identical as it becomes better through training. But until then, syncing will more likely than not burden the network with wrong ideas and slow its progress.


gcp commented Jan 7, 2018

will more likely than not burden the network with wrong ideas and slow its progress.

But why would that be? Your previous post most certainly did not demonstrate that. You still fail to explain why the opponent failing to refute a bad branch would somehow be related to...anything?


earthengine commented Jan 10, 2018

Not necessarily. For example:

(move number for example 24 (e6f6 already played)) last move = e4 wins opponent's stone -> ko
(move number 22 - e6f6 not played yet) last moves: e4 wins opponent's stone, e6 f6 -> no ko

"ko" or "no ko" only determines what are the legal moves in the future. If two position have the same ko-hash (or I would say the same position with two different history), it means just that from any future position a move cannot go back to any of those positions in the history. But obviously there is only one position you can go back to, and even you went back, you are in a position that have a different history. So why the actual history of the position you went back matter? (in the given example, the position after black played f4)

In other words, with the history + the current position we determine whether a move is legal or not. But when we do so, we only need the positions (hashes) in the history. We don't care how a historical position was reached. Is that clear?

This is why I believe this is hardly an issue.

@ThePokerDude

In other words, with the history + the current position we determine whether a move is legal or not. But when we do so, we only need the positions (hashes) in the history.

I agree.

Maybe I misunderstood something. The point was the following:
When LeelaZ first calculates the winning probability of a position, the history matters (taking the stone might be the best move, which depending on the history can be legal or illegal).
Let's assume that in one case the winning probability is 0.2 and in the other 0.6.
Let's assume you save 0.2 for the position in the TTable.
When you look up the number later, depending on the history, it might be completely wrong.


gcp commented Jan 10, 2018

Right, the legal moves in a position - and hence, its value - vary depending on whether ko moves are possible or not, and this depends on the position's history.

Thus, the hash should at the very least include basic ko information. (Compare this to hashing en-passant squares in chess engines - the issue is nearly identical.)

killerducky added a commit to killerducky/leela-zero that referenced this issue Jan 14, 2018
gcp pushed a commit that referenced this issue Jan 17, 2018
Change m_hash to include side to move.

Cover cases where genmove is called with the same color
such as free placement handicap.

Fixes issue #576.

Pull request #645.
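
A minimal sketch of what this commit describes, with a hypothetical key value and function name: a single side-to-move Zobrist key is XORed into m_hash, so the same stones with a different player to move no longer collide.

```cpp
#include <cstdint>

// Arbitrary illustrative key for "black to move"; the real code uses its own constant.
constexpr std::uint64_t ZOBRIST_BLACK_TO_MOVE = 0xABCDEF0123456789ULL;

std::uint64_t hash_with_side_to_move(std::uint64_t board_hash, bool black_to_move) {
    return black_to_move ? board_hash ^ ZOBRIST_BLACK_TO_MOVE : board_hash;
}
```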

gcp commented Jan 17, 2018

We catch and handle the basic cases of this now correctly.

@gcp gcp closed this as completed Jan 17, 2018
gcp added a commit that referenced this issue Jan 17, 2018
Adds a test for issue #576 and pull request #645.
zhiyue pushed a commit to awesome-archive/leela-zero that referenced this issue Apr 14, 2018