# Cheating

It occurred to me that the way I've designed the puzzle and solver classes opens the way to a "cheating" solver: one that replaces the puzzle with a pre-programmed sequence of numbers that obeys the rules but does not match the original puzzle's clues.

So just for fun let's see how easy that is to do and how I could improve the puzzle class to detect and block attempts to "cheat".

## Modules required

We're using the [sudoku](../puzzle/sudoku.py) and [tester](../puzzle/tester.py) modules used elsewhere, as well as a small number of standard libraries. We have to make a slight adjustment to the notebook's environment in order to find these modules, since this notebook is in a sub-directory.

In [1]:
import copy
import sys

sys.path.insert(-1, '..')
import puzzle.tester as tester
import puzzle.sudoku as su
from puzzle.jupyter_helpers import *
display(HTML(SUDOKU_CSS))

The "purpose" of cheating is to beat the performance of legitimate solvers, so we'll use Pandas and Matplotlib to assess the results.

In [2]:
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline
plt.rcParams["figure.figsize"] = [12, 6]
pd.set_option('display.precision', 3)  # full option name; the bare 'precision' shorthand is ambiguous in newer pandas

# Cheating Attempts

## First attempt: Lie

The first attempt was based on this code in the original implementation of `is_solved` (if you check [the code](../puzzle/sudoku.py) you'll see the method is no longer implemented this way; see below for why):

```python
def is_solved(self):
    return self.is_puzzle_valid() and self._num_empty_cells == 0
```

So, how about a solver that just plain lies by replacing the number of empty cells left? To try this I've created a new class `CheatingSolver` which tries to overwrite `puzzle`'s private attribute tracking the number of empty cells and then just *always* returns `True`.


In [3]:
class CheatingSolver:
    def solve(self, puzzle):
        """Easiest way to cheat would be to trick the is_solved() method on the puzzle to always returning True"""
        puzzle._num_empty_cells = 0
        return True

puzzle = su.SudokuPuzzle(starting_grid=su.from_string(su.SAMPLE_PUZZLES[0]['puzzle']))
solver = CheatingSolver()
solver.solve(puzzle)

True

So the solver will always return `True` but the puzzle itself should know that it's not really solved. I changed the `is_solved` method to actually check that every cell has a value.

```python
def is_solved(self):
    if self.is_puzzle_valid():
        for i in range(self.max_value()):
            for j in range(self.max_value()):
                if self.is_empty(i, j):
                    return False
        return True
    else:
        return False
```

In [4]:
puzzle.is_solved()

False

Now if we use this in the `PuzzleTester`, we want to make sure it detects that the puzzle isn't really solved. You'll notice there's an initialization parameter, `anti_cheat_check`, that has been set to `False`. We'll demonstrate later why I added that.

In [5]:
include_levels = ['Kids', 'Easy', 'Moderate', 'Hard']  # , 'Diabolical', 'Pathalogical']
test_cases = [x for x in su.SAMPLE_PUZZLES if x['level'] in include_levels]
pt = tester.PuzzleTester(puzzle_class=su.SudokuPuzzle, anti_cheat_check=False)
pt.add_test_cases(test_cases)

8

In [6]:
solver = CheatingSolver()
pt.run_tests(solver)
df = pd.DataFrame(pt.get_test_results())
df.style.highlight_null()

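| | label | level | starting_clues | CheatingSolver |
|---|---|---|---|---|
| 0 | SMH 1 | Kids | 31 | |
| 1 | SMH 2 | Easy | 24 | |
| 2 | KTH 1 | Easy | 30 | |
| 3 | Rico Alan Heart | Easy | 22 | |
| 4 | SMH 3 | Moderate | 26 | |
| 5 | SMH 4 | Hard | 22 | |
| 6 | SMH 5 | Hard | 25 | |
| 7 | Greg [2017] | Hard | 21 | |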


I had to change the `PuzzleTester` class to check the return value of the puzzle's `is_solved` method, rather than trust the solver's return value from `solve`. If the puzzle asserts that it is NOT solved then no result is recorded for the solver.
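As a rough illustration of that change, here is a minimal sketch of a test runner that ignores what the solver claims and asks the puzzle instead. `ToyPuzzle`, `LyingSolver`, `HonestSolver` and this `run_single_test` are hypothetical stand-ins written for this example; they are not the code in `tester.py`.

```python
import time


class ToyPuzzle:
    """Hypothetical stand-in for SudokuPuzzle: a flat list where 0 means empty."""

    def __init__(self, cells):
        self.cells = list(cells)

    def is_solved(self):
        return all(c != 0 for c in self.cells)


class LyingSolver:
    def solve(self, puzzle):
        return True  # claims success without touching the puzzle


class HonestSolver:
    def solve(self, puzzle):
        # Toy "solution": fill every empty cell with a placeholder value
        puzzle.cells = [c or 1 for c in puzzle.cells]
        return True


def run_single_test(solver, puzzle):
    """Return elapsed seconds, or None if the puzzle says it is unsolved."""
    start = time.perf_counter()
    solver.solve(puzzle)  # return value deliberately ignored
    elapsed = time.perf_counter() - start
    return elapsed if puzzle.is_solved() else None


lying_result = run_single_test(LyingSolver(), ToyPuzzle([1, 0, 3]))
honest_result = run_single_test(HonestSolver(), ToyPuzzle([1, 0, 3]))
```

With this arrangement `lying_result` is `None`, which is what shows up as an empty cell in the results table.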

I also changed the `_num_empty_cells` attribute in `SudokuPuzzle` from [protected to private](https://www.tutorialsteacher.com/python/private-and-protected-access-modifiers-in-python), so the attempt to overwrite the real value no longer works.

In [7]:
puzzle.num_empty_cells()

50
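The count is unchanged because of how Python handles double-underscore names. A minimal sketch, assuming the attribute is now declared with two leading underscores (`ToyPuzzle` is a hypothetical class for illustration, not the real `SudokuPuzzle`):

```python
class ToyPuzzle:
    """Hypothetical class with a name-mangled counter."""

    def __init__(self, empty):
        # Inside the class body this is stored as _ToyPuzzle__num_empty_cells
        self.__num_empty_cells = empty

    def num_empty_cells(self):
        return self.__num_empty_cells


toy = ToyPuzzle(50)
toy._num_empty_cells = 0  # creates a new, unrelated attribute on the instance
count_after_cheat = toy.num_empty_cells()
```

The old cheat still runs without error, but it only adds a stray attribute; the counter the class actually reads is untouched.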

## Second attempt: Replace puzzle with a canned solution

Since our really simple cheater no longer worked I needed a more sophisticated version. We could just fill in the blank cells with "1" (or random values) but then the `is_puzzle_valid` check would fail, at which point we may as well solve it properly. 

So maybe what our cheat needs to do is fill in *all* cells in a rule-abiding way. We won't be actually solving the original puzzle. Basically, we're just writing a "pre-solved" puzzle over the top.

In [8]:
class CheatingSolver:
    def solve(self, puzzle):
        """Write a pre-solved puzzle in over the top of the provided one"""
        starting_values = [0, 3, 6, 1, 4, 7, 2, 5, 8]
        max_value = puzzle.max_value
        assert max_value == 9, "I can't handle puzzles other than 9x9"
        puzzle.clear_all()
        for i in range(max_value):
            for j in range(max_value):
                puzzle.set(i, j, (starting_values[i] + j) % max_value + 1)
        return True

In [9]:
puzzle = su.SudokuPuzzle(starting_grid=su.from_string(su.SAMPLE_PUZZLES[0]['puzzle']))
solver = CheatingSolver()
solver.solve(puzzle)
puzzle.is_solved()

True

So this cheat works, at least as far as the `puzzle` instance is concerned.
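To see why the canned grid passes validation, the same formula can be checked on a plain list of lists. This is a standalone sketch that does not use `SudokuPuzzle`; `is_valid_solution` is a helper written just for this check.

```python
starting_values = [0, 3, 6, 1, 4, 7, 2, 5, 8]

# Same formula as the cheat: each row is a cyclic shift of 1..9
grid = [[(starting_values[i] + j) % 9 + 1 for j in range(9)] for i in range(9)]


def is_valid_solution(grid):
    """True if every row, column and 3x3 box contains 1..9 exactly once."""
    full = set(range(1, 10))
    rows = all(set(row) == full for row in grid)
    cols = all({grid[i][j] for i in range(9)} == full for j in range(9))
    boxes = all(
        {grid[r + dr][c + dc] for dr in range(3) for dc in range(3)} == full
        for r in (0, 3, 6)
        for c in (0, 3, 6)
    )
    return rows and cols and boxes


grid_is_valid = is_valid_solution(grid)
```

Shifting by 3 within a band of rows and by 1 between bands is what keeps the boxes and columns distinct, so the grid is a legal solution to *some* puzzle, just not the one we were given.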

Now, the whole point of cheating here is to be faster than a real solver, so let's test performance.


In [10]:
# Runs all the "legit" solvers
for m in su.SOLVERS:
    solver = su.SudokuSolver(method=m)
    pt.run_tests(solver, m)

In [11]:
# Runs the "cheat" solver
solver = CheatingSolver()
pt.run_tests(solver)
solver_labels = list(pt.get_solver_labels())

In [12]:
df = pd.DataFrame(pt.get_test_results())
df.style.highlight_null().\
    highlight_max(axis=1, color='darkorange', subset=solver_labels).\
    highlight_min(axis=1, color='green', subset=solver_labels).\
    format({m: '{:.3f}' for m in solver_labels})

| | label | level | starting_clues | CheatingSolver | backtracking | constraintpropogation | deductive | sat |
|---|---|---|---|---|---|---|---|---|
| 0 | SMH 1 | Kids | 31 | 0.005 | 0.014 | 0.002 | 0.002 | 0.017 |
| 1 | SMH 2 | Easy | 24 | 0.002 | 0.254 | 0.002 | 0.003 | 0.017 |
| 2 | KTH 1 | Easy | 30 | 0.002 | 0.011 | 0.002 | 0.002 | 0.017 |
| 3 | Rico Alan Heart | Easy | 22 | 0.001 | 0.071 | 0.021 | 0.007 | 0.017 |
| 4 | SMH 3 | Moderate | 26 | 0.001 | 0.075 | 0.02 | 0.029 | 0.019 |
| 5 | SMH 4 | Hard | 22 | 0.001 | 1.348 | 0.024 | 0.016 | 0.017 |
| 6 | SMH 5 | Hard | 25 | 0.001 | 0.565 | 0.024 | 0.013 | 0.018 |
| 7 | Greg [2017] | Hard | 21 | 0.001 | 0.541 | 0.038 | 0.042 | 0.019 |


Orange in each row is the slowest time, and green in each row is the fastest. Oddly, the cheating solver doesn't *always* win, but it wins often enough that there appears to be sufficient motivation to cheat. So let's fix that.


---
# Catching Cheats

To prevent the new cheat we basically need to compare the puzzle with a copy of the original. That way we can detect that the starting clues have been replaced.

We can't do this in the `SudokuPuzzle` itself. Python's private attributes can be tampered with (it's [more a naming convention](https://docs.python.org/3/tutorial/classes.html#tut-private) to stop programmers shooting themselves in the foot than a security control). Since we're trying to guard against cheating we can assume an attacker will happily ignore convention.
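To make that concrete, here is a small sketch of how a determined solver could still reach a "private" attribute through its mangled name. `ToyPuzzle` is a hypothetical class for illustration, and the double-underscore attribute is an assumption about how such a counter might be declared.

```python
class ToyPuzzle:
    """Hypothetical class with a double-underscore ("private") counter."""

    def __init__(self, empty):
        self.__num_empty_cells = empty

    def num_empty_cells(self):
        return self.__num_empty_cells


toy = ToyPuzzle(50)

# Name mangling only renames the attribute; it does not protect it
toy._ToyPuzzle__num_empty_cells = 0
tampered_count = toy.num_empty_cells()
```

Here `tampered_count` ends up as 0, so any check that lives entirely inside the puzzle object can be subverted by code that has a reference to it.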

If we assume that the caller (the test harness) can be trusted, then we can let the caller verify that the original clues are intact. We just need a function that confirms whether the starting clues in one puzzle also exist in the second.


In [13]:
def has_same_clues(a, b):
    """Returns true if the non empty cells in a have the same value in b"""
    if a.max_value != b.max_value:
        return False
    
    for i in range(a.max_value):
        for j in range(a.max_value):
            if not a.is_empty(i, j) and a.get(i, j) != b.get(i, j):
                return False
    return True

In [14]:
puzzle = su.SudokuPuzzle(starting_grid=su.from_string(su.SAMPLE_PUZZLES[-1]['puzzle']))
original = copy.deepcopy(puzzle)
has_same_clues(original, puzzle)

True

In [15]:
# Solver is cheating and will replace puzzle
solver.solve(puzzle)
puzzle.is_solved()

True

In [16]:
# Should return False because puzzle has been replaced
has_same_clues(original, puzzle)

False

This is the fix I put in the `PuzzleTester` class. The `anti_cheat_check` is on by default. The `run_single_test` method makes a copy of each puzzle *before* calling the solver, then compares the "solved" puzzle to the copy. If the clues in the original aren't present in the solved puzzle then it's not a real solution.
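The idea can be sketched in a few lines. Everything below (`ToyPuzzle`, the two toy solvers, `result_is_accepted` and this simplified `has_same_clues`) is a hypothetical stand-in to show the shape of the check; it is not the actual `run_single_test` code in `tester.py`.

```python
import copy


class ToyPuzzle:
    """Hypothetical stand-in: a flat list where 0 marks an empty cell."""

    def __init__(self, cells):
        self.cells = list(cells)

    def is_solved(self):
        return 0 not in self.cells


def has_same_clues(original, solved):
    """True if every non-empty cell in original has the same value in solved."""
    return all(o == 0 or o == s for o, s in zip(original.cells, solved.cells))


class OverwritingSolver:
    def solve(self, puzzle):
        puzzle.cells = [9] * len(puzzle.cells)  # ignores the clues entirely
        return True


class FillingSolver:
    def solve(self, puzzle):
        puzzle.cells = [c or 5 for c in puzzle.cells]  # keeps the clues
        return True


def result_is_accepted(solver, puzzle, anti_cheat_check=True):
    # Snapshot taken before the solver ever sees the puzzle
    original = copy.deepcopy(puzzle)
    solver.solve(puzzle)
    if not puzzle.is_solved():
        return False
    if anti_cheat_check and not has_same_clues(original, puzzle):
        return False
    return True


cheat_checked = result_is_accepted(OverwritingSolver(), ToyPuzzle([1, 0, 3]))
cheat_unchecked = result_is_accepted(
    OverwritingSolver(), ToyPuzzle([1, 0, 3]), anti_cheat_check=False
)
legit_checked = result_is_accepted(FillingSolver(), ToyPuzzle([1, 0, 3]))
```

The key design point is that the copy is held by the caller, so the solver never receives a reference to it.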

So now we can re-run the tests and check to see that our cheater won't prosper.


In [17]:
# New instance of PuzzleTester, anti_cheat_check is True by default
pt = tester.PuzzleTester(puzzle_class=su.SudokuPuzzle)
pt.add_test_cases(test_cases)

8

In [18]:
# Runs the "cheat" solver
solver = CheatingSolver()
pt.run_tests(solver)

8

In [19]:
# Runs all the "legit" solvers
for m in su.SOLVERS:
    solver = su.SudokuSolver(method=m)
    pt.run_tests(solver, m)
solver_labels = list(pt.get_solver_labels())

In [23]:
df = pd.DataFrame(pt.get_test_results())
df.style.highlight_null().\
    highlight_max(axis=1, color='darkorange', subset=solver_labels).\
    highlight_min(axis=1, color='green', subset=solver_labels).\
    format({m: '{:.3f}' for m in solver_labels if m != 'CheatingSolver'})

| | label | level | starting_clues | CheatingSolver | backtracking | constraintpropogation | deductive | sat |
|---|---|---|---|---|---|---|---|---|
| 0 | SMH 1 | Kids | 31 | | 0.005 | 0.002 | 0.002 | 0.017 |
| 1 | SMH 2 | Easy | 24 | | 0.22 | 0.002 | 0.003 | 0.017 |
| 2 | KTH 1 | Easy | 30 | | 0.011 | 0.001 | 0.002 | 0.017 |
| 3 | Rico Alan Heart | Easy | 22 | | 0.076 | 0.021 | 0.007 | 0.017 |
| 4 | SMH 3 | Moderate | 26 | | 0.075 | 0.02 | 0.028 | 0.018 |
| 5 | SMH 4 | Hard | 22 | | 1.371 | 0.024 | 0.017 | 0.018 |
| 6 | SMH 5 | Hard | 25 | | 0.665 | 0.025 | 0.014 | 0.018 |
| 7 | Greg [2017] | Hard | 21 | | 0.553 | 0.038 | 0.046 | 0.021 |


OK! Our cheating solver has had no results recorded for it, because the answer it gives does not match the starting clues!

---
# Next Steps

There are probably ways to defeat these checks, particularly in a language like Python where "[monkey patching](https://medium.com/@chipiga86/python-monkey-patching-like-a-boss-87d7ddb8098e)" is a thing and everything is dynamic. For example:

* Could we subclass `SudokuPuzzle` and modify the methods there?
* The `PuzzleTester` method `run_tests` takes a `callback` parameter that's called just before each test is run, and then finally when all tests are complete. Could we hijack that to fake our test results? You'd have to do it from inside the solver class for it to be a real "cheat"...
* Can the solver class access and modify the copy of the original puzzle?
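As a taste of what such an attack might look like, here is a sketch of a solver that monkey patches the puzzle's class. `ToyPuzzle` and `PatchingSolver` are hypothetical; I haven't tried this against the real `SudokuPuzzle` or `PuzzleTester`, so whether it gets past the actual checks is untested.

```python
class ToyPuzzle:
    """Hypothetical stand-in: a flat list where 0 marks an empty cell."""

    def __init__(self, cells):
        self.cells = list(cells)

    def is_solved(self):
        return 0 not in self.cells


class PatchingSolver:
    def solve(self, puzzle):
        # Replace the method on the class itself, so every instance
        # (including any copies the caller made) is affected
        type(puzzle).is_solved = lambda self: True
        return True


toy = ToyPuzzle([1, 0, 3])
solved_before = toy.is_solved()
PatchingSolver().solve(toy)
solved_after = toy.is_solved()
```

In this toy version the puzzle reports itself solved while still containing an empty cell. A clue comparison alone would not notice, since none of the clues were changed.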

These might be a fun way to learn more about the internals of Python, but for now I'm declaring this "done" and moving on to the next puzzle...