Adversarial Policies Beat Professional-Level Go AIs #705

Open
LL145 opened this issue Nov 27, 2022 · 14 comments
@LL145

LL145 commented Nov 27, 2022

I noticed this paper:
https://goattack.alignmentfund.org/

The authors claim to have found a weakness in KataGo. At first I thought these studies were pointless, but recently they released some new game records showing that the adversarial policy beats KataGo at 2048 visits of search more than 70% of the time.

@lightvector
Owner

Yep, I agree that the new results are more interesting than the old ones! Note that from a Go player's perspective, the discovery isn't new, they just rediscovered this:
#259

As far as we know, AlphaZero-trained nets in Go all seem to share the same type of misevaluation - none of them learns to consistently evaluate groups with cyclic topology correctly. As with some other things in the past, I think the authors hadn't known this ahead of time, so I've been corresponding with them privately to give them a bit more background. I hope at some point they can revise their presentation to focus on communicating things accurately, with a little less confusion than so far.

As far as the implications for users - I still don't think there is any difference in practice. People already knew that Go AI may not evaluate tactics correctly involving two-headed dragons (https://senseis.xmp.net/?TwoHeadedDragon) or other cyclic topologies, so for users nothing much has changed. The main new discovery of interest is just that if you optimize against the bot, you can learn sequences that reliably produce these positions.

By the way, I think some of their older results about passing and low-liberty attacks, which still seem to be reported on the site and in the paper, need to be qualified a bit. They give the impression that those attacks hold up against small amounts of search, e.g. 32 or 64 playouts... and this is actually true of the official released v1.11! But this seems to be because I had mistakenly not spent the time to get the LCB implementation working for low playouts, rather than a fundamental issue with the algorithm. The next release will contain a fix that makes LCB work for low playouts too, so that even small amounts of search (e.g. just 8 or 16 playouts) provide a lot more resistance to either of those attacks, rather than the current figures showing those methods still working up to 100 playouts. And perhaps more relevantly, it provides a mild general strength improvement at low playouts across all positions by allowing LCB to work there.
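For readers who aren't familiar with LCB: the idea is to pick the final move not by raw winrate or raw visit count alone, but by a lower confidence bound on the winrate, so that a barely-searched move with a lucky-looking winrate doesn't get chosen. Below is a minimal Python sketch of that kind of selection; it is purely illustrative (the visit threshold, the z value, and the variance estimate are made up for the example and are not KataGo's actual formula):

```python
import math

def choose_move_by_lcb(moves, z=1.96, min_visit_frac=0.25):
    """Select a move by a lower confidence bound (LCB) on its winrate.

    `moves` is a list of dicts: {"move": str, "visits": int, "winrate": float}.
    Illustrative only: real engines track their own variance estimates and
    thresholds, and the fix described above is about making a rule like this
    behave sensibly even when visit counts are very small.
    """
    most_visited = max(moves, key=lambda m: m["visits"])

    def lcb(m):
        if m["visits"] == 0:
            return float("-inf")
        # Crude standard error of a winrate in [0, 1].
        stderr = math.sqrt(m["winrate"] * (1.0 - m["winrate"]) / m["visits"])
        return m["winrate"] - z * stderr

    # Only consider moves with a reasonable fraction of the top move's visits,
    # so a nearly-unexplored move can't win on a noisy winrate alone.
    candidates = [m for m in moves
                  if m["visits"] >= min_visit_frac * most_visited["visits"]]
    return max(candidates or [most_visited], key=lcb)["move"]
```

The fix described above is about making a selection rule like this degrade gracefully at very low visit counts, rather than about changing the search itself.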

So that leaves mainly cyclic topology and occasionally mirror Go as the two main known ways to exploit Go AI that actually hold up when you add search, instead of only playing against the raw policy. I do have some plans to try some training at some point to see if cyclic topology can be corrected. It would be scientifically interesting to see how easily neural nets can be taught this thing that they currently fail to learn naturally.

Hope this helps summarize the state of things! Thanks for raising the topic!

@gaogaotiantian

Hi @lightvector , thanks for the explanation. I do have some follow up questions.

I totally understand that the NN will have some blind spots on complicated topologies. What I can't understand is why the MCTS is not doing what it's supposed to do.

[screenshot: the analyzed position, with the move in question marked in red]

In the image above, KataGo gives a high score to the red-marked move (not the highest after a couple thousand playouts) with something like 5k visits on it. However, if we place a black stone there, KataGo realizes it was a bad move in fewer than 1k playouts and finds the correct counterplay.

From my understanding, expanding a subtree node (when analyzing the initial board) and evaluating the root node (after playing that move) should be almost equivalent. Are there any optimizations in the MCTS that cause the difference?

@michito744

@gaogaotiantian

KataGo's policy does not handle complicated fighting sequences at all, so it requires a lot of noise, branch cutting, and brute-force searching to produce a decent analysis (and the pre-convergence evaluations are worth nothing).

An example of a KataGo analysis in a complex conflict situation
[screenshot]
This is a line from a setup where the best result happened to be found by sheer chance; under normal circumstances KataGo would not even expand W-D15.
[screenshot]
There was also a case on Black's side where the policy weight for B-A15, which appears 8 moves later, was so low that KataGo could not expand it and output an incorrect evaluation.

Since the search tree is expanded according to the policy weights, it is easy to get the worst results in exactly the most critical situations.

@lightvector
Owner

lightvector commented Dec 2, 2022

@michito744 - Can you provide an SGF for that situation? It looks very interesting. Do you have other situations like this that you also haven't shared yet? I'm always interested in expanding the test set of SGFs I have for cases like this, it might one day help with improving these cases!

@gaogaotiantian

why the MCTS is not doing what it's supposed to do.
In the image above, KataGo gives a high score to the red-marked move (not the highest after a couple thousand playouts) with something like 5k visits on it. However, if we place a black stone there, KataGo realizes it was a bad move in fewer than 1k playouts and finds the correct counterplay.
From my understanding, expanding a subtree node (when analyzing the initial board) and evaluating the root node (after playing that move) should be almost equivalent. Are there any optimizations in the MCTS that cause the difference?

I don't think that's a surprise. You are probably seeing the result of random chance. If you re-ran the search many times, you would find that the number of visits needed to see the issue varies significantly in both cases. (Note, it is NOT a valid test to re-run the search simply by going back and forth between the two positions, because KataGo caches some neural net evaluations within the same analysis session. You would have to clear the cache, or entirely restart the program every time, to do a proper test. I don't think you need to bother doing this; it's already known that you will find significant variation.)

If what you are seeing isn't just random chance, then likely it has to do with extra noise and exploration that is added to the search at the root node but not deeper in the tree. The main purpose of the extra noise is to force KataGo to try more possible moves, purely to make the analysis more humanly useful by showing more data. Sometimes this extra noise may correctly help KataGo find moves that it wouldn't otherwise find, but on average it makes KataGo weaker, not stronger (i.e. there are more positions where it will hurt, or make KataGo take longer to find moves), so it's done purely as a human-friendly way to show more moves, rather than for playing strength. Since the root is the only position that human players directly view, it's done at the root only, and therefore sometimes it will have a benefit at the root in finding moves (and sometimes it will harm the root).

But it's easy to overstate the effect of the root noise and exploration. Often, the result is due to just random chance too, rather than that.
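For anyone unsure what root-only noise and exploration means mechanically, here is a generic AlphaZero-style sketch in Python. It is not KataGo's actual code; the node fields, Dirichlet parameters, and PUCT constant are invented for the example:

```python
import numpy as np

def select_child(node, c_puct=1.5, is_root=False,
                 dirichlet_alpha=0.3, noise_frac=0.25):
    """Choose which child to descend into during one MCTS playout.

    Sketch of root-only exploration: the policy prior is mixed with Dirichlet
    noise at the root node only, so the root ends up trying a wider set of
    candidate moves than deeper nodes with identical statistics would.
    """
    priors = np.array([child.prior for child in node.children], dtype=float)
    if is_root:
        # Root-only perturbation of the policy prior.
        noise = np.random.dirichlet([dirichlet_alpha] * len(priors))
        priors = (1.0 - noise_frac) * priors + noise_frac * noise

    total_visits = sum(child.visits for child in node.children)
    scores = [
        child.value_avg
        + c_puct * p * np.sqrt(total_visits + 1) / (1 + child.visits)
        for child, p in zip(node.children, priors)
    ]
    return node.children[int(np.argmax(scores))]
```

The only point of the sketch is that the prior gets perturbed at the root and nowhere else, which is why the effect described above shows up at the root but not deeper in the tree.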

Also, note that michito744 is right about KataGo's raw policy. However, this is true of the policy of all AIs in complex strategy games, regardless of whether it is Go or Chess or another game, and regardless of whether it is KataGo or Leela Chess Zero or Golaxy or whatever other bot. If you see less of this issue on some bots, it is only because that bot is less openly and extensively studied or attacked, or because the bot is stronger and has errors in different positions and/or less frequently, but exponentially many errors will still exist. This is especially so when a bot directly plays opponents weaker than it - KataGo would look like it had far fewer issues if you only judged it in games against Leela Zero, and Leela Zero would look like it had far fewer issues if you only played it against pre-AlphaZero bots (except for maybe ladders), and so on.

And it is true of humans too - humans, including pros, also have terrible blind spots and misevaluations and moments where the opponent plays only a single move and then they go "oh my god, I didn't expect that move, I'm in trouble now".

Although the frequency of these things and the situations where they happen vary between different bots, different games, and between humans and computers, who have very different strengths/weaknesses, the fact that the phenomenon itself happens universally - human, computer, etc. - suggests that it is a fundamental property of complex, exponentially-branching strategy games. Right up until the point where we entirely solve a game (which we can do only for the very simplest games on tiny boards), there will always be blind spots and surprises where the policy is wrong.

@michito744

@lightvector Thanks.

I have a number of test cases (real human games) to evaluate new networks.
I will upload the sgf and checkpoints for this one in the next post.

By the way, I think events like this one are quite common. I have heard that there is a Shogi (Japanese chess) program that uses deep learning, and that it only became able to compete with conventional search-based engines after it was trained on "tsume" (mating) problems as a teacher. In the case of Go, reinforcement-learning AI clearly has a serious weakness with regard to complicated fighting.

@michito744

@lightvector

A derived study based on an LG Cup game record.
Here is a variation diagram for the case where B-B12 is not played.
KataGo_sample_20221203_0001.txt

At a stage with very little search, a misevaluation of B-A17 gives a false reading that Black is not bad. If things are properly adjusted, the evaluation of B-A17 decreases as the search increases.
[screenshots]

White is good with W-D15, but later the search finds the continuation where Black plays B-A15, after which things become very difficult.
[screenshot]
In the previous setup and test, White found the best sequence here and held its evaluation, but this time it failed to do so and the evaluation dropped to almost even.
[screenshot]

If W-C17 can be found and expanded, White's evaluation is restored. If Black can also discover B-C19, there is still a long way to go. However, both moves have policy weights that are practically zero.
[screenshots]

[screenshots]

Situations the policy knows nothing about appear many times within fewer than 20 moves, so KataGo cannot draw any conclusions unless it balances noise and branch pruning in a brute-force search.

@michito744

It is not easy to list all the other test cases, so I plan to select and upload only those that are particularly egregious.

@michito744

@lightvector

Another LG Cup game record (actual game progression).
KataGo_sample_20221203_0002.txt

This is a game where Yang 9P dominated Shibano 9P.
[screenshot]
Move 103 (B-S10) was a bad move, and it was not clear whether the next move could completely kill the white stones on the right side. Move 104 (W-M12), played to take advantage of this, was an excellent move that surpassed the AI. In the actual game it was played after only five minutes of thinking time, so Yang 9P had presumably been aiming at this move for a long while.
[screenshots]
If the settings are poor and the search performance is inadequate, the AI will not understand the meaning of W-M12 (and will not evaluate it properly) even after seeing it. In fact, viewers could not understand the state of the game at all because of this behavior in the Nihon Ki-in AI that was broadcasting this game.
[screenshot]
KataGo's policy judges that "it is dangerous for White not to run away with the right side," and it hardly considers the possibility of taking the risk. In this example the search happened to succeed in my environment, but since the policy weight is low, whether the line actually gets expanded is left to luck.

@michito744

michito744 commented Dec 3, 2022

@lightvector

A behavior that is not expected to improve at all.
KataGo_sample_20221203_0003.txt
From an actual Nongshim Cup game: Shin 9P trapped Iyama 9P.
[screenshot]
Because the continuation shown on the right is available, the black group on the left side has already collapsed as it currently stands.

All of the following position evaluations must take this as a basic premise.

In practice, however, KataGo cannot reuse this partial result at all. Every variation must re-explore this diagram and re-solve it.
[screenshot]
One move later, KataGo tries to postpone the left-side problem with another move (since the first diagram is not reusable at all), and it required over a million visits to solve this by brute-force search.
[screenshots]

Unless this dumb "reinvention of the wheel" is eliminated, the cost of reinforcement learning will continue to be sucked into a black hole.

(In the first place, KataGo was not even able to find the wonderful {W-D13 B-D12 W-C11}, which was played in the real game.)

@lightvector
Owner

These examples are great, thanks! If you have any others that are similarly egregious that you find later, I would be very appreciative if you could share them as well.

I'm unable to see an SGF for that last example. Do you have an SGF for that last example from the Nongshim cup as well, or know where I can easily obtain it myself?

I'm particularly interested in that one, since I have been doing some experiments with some search techniques that might reduce the severity of examples like your last case, where a branch has to be re-solved over and over. I don't think I can fix it completely, but I think there are some possible things I can try, so having test cases where you have seen that specific problem happen very severely would be useful.
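To make the "a branch has to be re-solved over and over" problem concrete, here is a toy sketch of position-keyed reuse of search conclusions. This is only an illustration of the general idea, not KataGo's implementation and not necessarily the technique being experimented with; the class name, visit threshold, and hash key are invented for the example:

```python
class EvalCache:
    """Toy transposition-style cache of search conclusions, keyed by a position hash.

    If a position has already been searched to a confident conclusion, a later
    variation that transposes into the same position could reuse that
    conclusion instead of re-solving it from scratch.  This deliberately
    ignores the hard parts (ko and superko history, whose turn it is, whether a
    local conclusion stays valid as the rest of the board changes), which is
    why reuse like this can only reduce, not eliminate, re-solving in practice.
    """

    def __init__(self, min_visits=100_000):
        self.min_visits = min_visits   # only trust well-searched results
        self.table = {}                # position hash -> (value, visits)

    def store(self, pos_hash, value, visits):
        """Record a conclusion if it was searched deeply enough."""
        if visits >= self.min_visits:
            prev = self.table.get(pos_hash)
            if prev is None or visits > prev[1]:
                self.table[pos_hash] = (value, visits)

    def lookup(self, pos_hash):
        """Return (value, visits) if this position was already solved, else None."""
        return self.table.get(pos_hash)
```

The real difficulty, and the reason this is only a sketch, is knowing when such reuse is actually sound in Go, where the surrounding whole-board situation can change the local conclusion.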

@michito744

michito744 commented Dec 4, 2022

@lightvector

I edited the post above and uploaded the file.
I will be able to post more soon.

@michito744

@lightvector
It's too tedious to explain fully, so I'll report the details at a later date. This is a case where KataGo needed over 5 million visits before converging to a decent conclusion.
[screenshot]

The upper right is a seki shape, but W-J18 was a completely impossible move (the first candidate in KataGo's policy).
After B-H16 and W-L18, the ko created by B-J12 cannot be avoided. White then has two choices: lose the ko, or collapse entirely through a seki collapse in the upper right corner.

(Below is the trend of the evaluation when White responds with W-T12. Although not shown in the figure, the evaluation of B-R13 is initially lower than that of B-H12. From here, several hundred thousand more visits are needed to reach the above conclusion.)
[screenshots: evaluation trend as the number of visits increases]

In my environment, KataGo reached this conclusion and eliminated W-J18 as the top candidate, so I think it did a good job.
However, this was the result of a time-consuming correction by brute-force search, starting from an initial evaluation that was completely good for White.
KataGo's policy itself does not recognize this shape correctly at all.

@michito744

Incidentally, the uncertainty during the transition from the wrong conclusion to the right one exceeds 45.
Not only in this case: figures of this magnitude appear occasionally when analyzing professional games.

@michito744

michito744 commented Dec 14, 2022

@lightvector

The game record.
KataGo_sample_20221214_0001.txt

In the actual game, White played W-P12 on move 102, after which Black resigned.
(This continuation is one of the variations in which White plays W-P10 on move 102 instead. It best shows White going after all the black stones on the right side from an almost winning position and failing, and I introduced it because I found that KataGo has a high probability of making a mistake here.)

If White plays W-H16 on move 142, then even if the upper right corner is lost to a seki collapse, it is clearly a big win for White to be able to capture the four black stones on the upper side first.
However, if White plays W-J18 on move 142, the game reaches the situation above via the continuation in the branch diagram, and White loses.
