# NumPy Project - Part 4: 3D Sudokus! Increasing dimensions.

Now it's time to increase the number of dimensions of our arrays. We'll use a public [Kaggle Dataset](https://www.kaggle.com/bryanpark/sudoku) that contains 1 million Sudoku games!

We've reduced the total dataset to 5000 games for simplicity, but it'll still be fun. Let's get started.

In [11]:
import numpy as np

First let's take a look at the structure of the CSV file:

In [1]:
!head data/sudoku-small.csv

quizzes,solutions
004300209005009001070060043006002087190007400050083000600000105003508690042910300,864371259325849761971265843436192587198657432257483916689734125713528694542916378
040100050107003960520008000000000017000906800803050620090060543600080700250097100,346179258187523964529648371965832417472916835813754629798261543631485792254397186
600120384008459072000006005000264030070080006940003000310000050089700000502000190,695127384138459672724836915851264739273981546946573821317692458489715263562348197
497200000100400005000016098620300040300900000001072600002005870000600004530097061,497258316186439725253716498629381547375964182841572639962145873718623954534897261
005910308009403060027500100030000201000820007006007004000080000640150700890000420,465912378189473562327568149738645291954821637216397854573284916642159783891736425
100005007380900000600000480820001075040760020069002001005039004000020100000046352,19468523738297451665721348982349167554176892376935284121583976443652719897814635

In [2]:
!wc -l data/sudoku-small.csv

    5000 data/sudoku-small.csv


As you can see, it's a very simple CSV with only 2 columns: the empty board (`quizzes`) and its solution (`solutions`). The way the board is expressed is different, though: each board is a single long string that lists all 81 cells row by row, with `0` standing for an empty cell.

### 1) Parsing long string lines into valid boards

We need to adapt to this new style of expressing Sudoku boards. This is a valuable lesson in data handling: you can't anticipate every way in which data will be expressed. It would be a mistake to extend the `Board` class so that it also accepts this format; we try not to clutter our core data structures with edge cases. Instead, we'll write an _"adapter"_ (see [Wikipedia's article about the Software Pattern](https://en.wikipedia.org/wiki/Adapter_pattern)), which is just a tiny function that turns the long puzzle line into a NumPy array:

In [3]:
def adapt_long_sudoku_line_to_array(line):
    # Split the string into single digits, then reshape them into a 9x9 grid.
    return np.array(list(line), dtype=int).reshape((9, 9))

In [31]:
adapt_long_sudoku_line_to_array('004300209005009001070060043006002087190007400050083000600000105003508690042910300')

array([[0, 0, 4, 3, 0, 0, 2, 0, 9],
       [0, 0, 5, 0, 0, 9, 0, 0, 1],
       [0, 7, 0, 0, 6, 0, 0, 4, 3],
       [0, 0, 6, 0, 0, 2, 0, 8, 7],
       [1, 9, 0, 0, 0, 7, 4, 0, 0],
       [0, 5, 0, 0, 8, 3, 0, 0, 0],
       [6, 0, 0, 0, 0, 0, 1, 0, 5],
       [0, 0, 3, 5, 0, 8, 6, 9, 0],
       [0, 4, 2, 9, 1, 0, 3, 0, 0]])

In [32]:
line = '004300209005009001070060043006002087190007400050083000600000105003508690042910300'

In [35]:
assert np.array_equal(adapt_long_sudoku_line_to_array(line), np.array([
    [0, 0, 4, 3, 0, 0, 2, 0, 9],
    [0, 0, 5, 0, 0, 9, 0, 0, 1],
    [0, 7, 0, 0, 6, 0, 0, 4, 3],
    [0, 0, 6, 0, 0, 2, 0, 8, 7],
    [1, 9, 0, 0, 0, 7, 4, 0, 0],
    [0, 5, 0, 0, 8, 3, 0, 0, 0],
    [6, 0, 0, 0, 0, 0, 1, 0, 5],
    [0, 0, 3, 5, 0, 8, 6, 9, 0],
    [0, 4, 2, 9, 1, 0, 3, 0, 0]
]))
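
The adapter above trusts its input: a line that isn't exactly 81 digits only fails later, inside `reshape`, with a fairly generic error. As a minimal sketch, assuming we'd like an earlier and clearer failure, a stricter variant could validate the string first. The name `adapt_long_sudoku_line_checked` is hypothetical and not part of the project:

```python
import numpy as np


def adapt_long_sudoku_line_checked(line):
    """Hypothetical stricter adapter: validate the string before reshaping."""
    line = line.strip()
    # A board needs exactly 81 ASCII digits (9 rows x 9 columns).
    if len(line) != 81 or not (line.isascii() and line.isdigit()):
        raise ValueError(f"expected 81 digits, got {len(line)} characters")
    return np.array(list(line), dtype=int).reshape((9, 9))


board = adapt_long_sudoku_line_checked(
    "004300209005009001070060043006002087190007400050083000600000105003508690042910300"
)
print(board.shape)  # (9, 9)
```

Whether the extra check is worth it depends on how much you trust the input file; for a clean dataset the original two-line adapter is enough.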

### 2) Reading a CSV file into a 3-dimensional array

Now it's time to read multiple Sudoku puzzles into a single NumPy array. We'll end up with a 3-dimensional array: two dimensions (x, y) describe the cells of a single puzzle, and the third dimension (z) indexes the puzzles. In NumPy terms, the resulting array has shape `(number_of_puzzles, 9, 9)`, so the first axis selects the puzzle. Here's a graphical representation of it:

<img width="600px" src="https://user-images.githubusercontent.com/872296/68670705-499dce00-052c-11ea-8e82-18a1f435e274.png">


For example, we want to create something like this:

In [36]:
np.array([
    [
        [0, 0, 4, 3, 0, 0, 2, 0, 9],
        [0, 0, 5, 0, 0, 9, 0, 0, 1],
        [0, 7, 0, 0, 6, 0, 0, 4, 3],
        [0, 0, 6, 0, 0, 2, 0, 8, 7],
        [1, 9, 0, 0, 0, 7, 4, 0, 0],
        [0, 5, 0, 0, 8, 3, 0, 0, 0],
        [6, 0, 0, 0, 0, 0, 1, 0, 5],
        [0, 0, 3, 5, 0, 8, 6, 9, 0],
        [0, 4, 2, 9, 1, 0, 3, 0, 0]
    ],
    [
        [0, 0, 4, 3, 0, 0, 2, 0, 9],
        [0, 0, 5, 0, 0, 9, 0, 0, 1],
        [0, 7, 0, 0, 6, 0, 0, 4, 3],
        [0, 0, 6, 0, 0, 2, 0, 8, 7],
        [1, 9, 0, 0, 0, 7, 4, 0, 0],
        [0, 5, 0, 0, 8, 3, 0, 0, 0],
        [6, 0, 0, 0, 0, 0, 1, 0, 5],
        [0, 0, 3, 5, 0, 8, 6, 9, 0],
        [0, 4, 2, 9, 1, 0, 3, 0, 0]
    ],
    [
        [0, 0, 4, 3, 0, 0, 2, 0, 9],
        [0, 0, 5, 0, 0, 9, 0, 0, 1],
        [0, 7, 0, 0, 6, 0, 0, 4, 3],
        [0, 0, 6, 0, 0, 2, 0, 8, 7],
        [1, 9, 0, 0, 0, 7, 4, 0, 0],
        [0, 5, 0, 0, 8, 3, 0, 0, 0],
        [6, 0, 0, 0, 0, 0, 1, 0, 5],
        [0, 0, 3, 5, 0, 8, 6, 9, 0],
        [0, 4, 2, 9, 1, 0, 3, 0, 0]
    ],
])

array([[[0, 0, 4, 3, 0, 0, 2, 0, 9],
        [0, 0, 5, 0, 0, 9, 0, 0, 1],
        [0, 7, 0, 0, 6, 0, 0, 4, 3],
        [0, 0, 6, 0, 0, 2, 0, 8, 7],
        [1, 9, 0, 0, 0, 7, 4, 0, 0],
        [0, 5, 0, 0, 8, 3, 0, 0, 0],
        [6, 0, 0, 0, 0, 0, 1, 0, 5],
        [0, 0, 3, 5, 0, 8, 6, 9, 0],
        [0, 4, 2, 9, 1, 0, 3, 0, 0]],

       [[0, 0, 4, 3, 0, 0, 2, 0, 9],
        [0, 0, 5, 0, 0, 9, 0, 0, 1],
        [0, 7, 0, 0, 6, 0, 0, 4, 3],
        [0, 0, 6, 0, 0, 2, 0, 8, 7],
        [1, 9, 0, 0, 0, 7, 4, 0, 0],
        [0, 5, 0, 0, 8, 3, 0, 0, 0],
        [6, 0, 0, 0, 0, 0, 1, 0, 5],
        [0, 0, 3, 5, 0, 8, 6, 9, 0],
        [0, 4, 2, 9, 1, 0, 3, 0, 0]],

       [[0, 0, 4, 3, 0, 0, 2, 0, 9],
        [0, 0, 5, 0, 0, 9, 0, 0, 1],
        [0, 7, 0, 0, 6, 0, 0, 4, 3],
        [0, 0, 6, 0, 0, 2, 0, 8, 7],
        [1, 9, 0, 0, 0, 7, 4, 0, 0],
        [0, 5, 0, 0, 8, 3, 0, 0, 0],
        [6, 0, 0, 0, 0, 0, 1, 0, 5],
        [0, 0, 3, 5, 0, 8, 6, 9, 0],
        [0, 4, 2, 9, 1, 0, 3, 0, 0]]])
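
To make the axes concrete, here is a minimal sketch that builds the same kind of stacked array and inspects it. It needs nothing beyond NumPy, and the variable names (`puzzle`, `boards`, and so on) are made up for illustration:

```python
import numpy as np

# One 9x9 puzzle, built from the same long string used earlier.
line = "004300209005009001070060043006002087190007400050083000600000105003508690042910300"
puzzle = np.array(list(line), dtype=int).reshape((9, 9))

# Stack three copies along a new leading axis, mirroring the example above.
boards = np.stack([puzzle, puzzle, puzzle])

print(boards.ndim)   # 3
print(boards.shape)  # (3, 9, 9)

first_puzzle = boards[0]            # one 9x9 grid
first_rows = boards[:, 0, :]        # first row of every puzzle, shape (3, 9)
top_left_boxes = boards[:, :3, :3]  # top-left 3x3 box of every puzzle, shape (3, 3, 3)
```

Indexing the first axis picks a puzzle; the remaining two axes are the rows and columns inside it.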

Now it's time to code! Complete the function `read_sudokus_from_csv`. It receives two parameters: the name of the CSV file to read and an optional flag, `read_solutions`. If `read_solutions` is `True`, read from the second column (solutions) instead of the first one (empty puzzles). You can assume the following CSV structure:

```
quizzes,solutions
10084..,183048..
30018..,34196..
...
empty,solved
empty,solved
```

In [54]:
def read_sudokus_from_csv(filename, read_solutions=False):
    boards = []
    with open(filename, "r") as file:
        next(file)  # skip the header row (quizzes,solutions)
        for line in file:
            empty, solution = line.strip().split(",")
            # Pick the column requested by the caller.
            chosen = solution if read_solutions else empty
            boards.append(adapt_long_sudoku_line_to_array(chosen))

    return np.array(boards)

For this test we'll use the file `sudoku-micro.csv` that contains only 3 puzzles:

In [56]:
read_sudokus_from_csv('data/sudoku-micro.csv')

array([[[0, 0, 4, 3, 0, 0, 2, 0, 9],
        [0, 0, 5, 0, 0, 9, 0, 0, 1],
        [0, 7, 0, 0, 6, 0, 0, 4, 3],
        [0, 0, 6, 0, 0, 2, 0, 8, 7],
        [1, 9, 0, 0, 0, 7, 4, 0, 0],
        [0, 5, 0, 0, 8, 3, 0, 0, 0],
        [6, 0, 0, 0, 0, 0, 1, 0, 5],
        [0, 0, 3, 5, 0, 8, 6, 9, 0],
        [0, 4, 2, 9, 1, 0, 3, 0, 0]],

       [[0, 4, 0, 1, 0, 0, 0, 5, 0],
        [1, 0, 7, 0, 0, 3, 9, 6, 0],
        [5, 2, 0, 0, 0, 8, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 1, 7],
        [0, 0, 0, 9, 0, 6, 8, 0, 0],
        [8, 0, 3, 0, 5, 0, 6, 2, 0],
        [0, 9, 0, 0, 6, 0, 5, 4, 3],
        [6, 0, 0, 0, 8, 0, 7, 0, 0],
        [2, 5, 0, 0, 9, 7, 1, 0, 0]],

       [[6, 0, 0, 1, 2, 0, 3, 8, 4],
        [0, 0, 8, 4, 5, 9, 0, 7, 2],
        [0, 0, 0, 0, 0, 6, 0, 0, 5],
        [0, 0, 0, 2, 6, 4, 0, 3, 0],
        [0, 7, 0, 0, 8, 0, 0, 0, 6],
        [9, 4, 0, 0, 0, 3, 0, 0, 0],
        [3, 1, 0, 0, 0, 0, 0, 5, 0],
        [0, 8, 9, 7, 0, 0, 0, 0, 0],
        [5, 0, 2, 0, 0, 0, 1, 9, 0]]])

In [52]:
expected = np.array([[[0, 0, 4, 3, 0, 0, 2, 0, 9],
        [0, 0, 5, 0, 0, 9, 0, 0, 1],
        [0, 7, 0, 0, 6, 0, 0, 4, 3],
        [0, 0, 6, 0, 0, 2, 0, 8, 7],
        [1, 9, 0, 0, 0, 7, 4, 0, 0],
        [0, 5, 0, 0, 8, 3, 0, 0, 0],
        [6, 0, 0, 0, 0, 0, 1, 0, 5],
        [0, 0, 3, 5, 0, 8, 6, 9, 0],
        [0, 4, 2, 9, 1, 0, 3, 0, 0]],

       [[0, 4, 0, 1, 0, 0, 0, 5, 0],
        [1, 0, 7, 0, 0, 3, 9, 6, 0],
        [5, 2, 0, 0, 0, 8, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 1, 7],
        [0, 0, 0, 9, 0, 6, 8, 0, 0],
        [8, 0, 3, 0, 5, 0, 6, 2, 0],
        [0, 9, 0, 0, 6, 0, 5, 4, 3],
        [6, 0, 0, 0, 8, 0, 7, 0, 0],
        [2, 5, 0, 0, 9, 7, 1, 0, 0]],

       [[6, 0, 0, 1, 2, 0, 3, 8, 4],
        [0, 0, 8, 4, 5, 9, 0, 7, 2],
        [0, 0, 0, 0, 0, 6, 0, 0, 5],
        [0, 0, 0, 2, 6, 4, 0, 3, 0],
        [0, 7, 0, 0, 8, 0, 0, 0, 6],
        [9, 4, 0, 0, 0, 3, 0, 0, 0],
        [3, 1, 0, 0, 0, 0, 0, 5, 0],
        [0, 8, 9, 7, 0, 0, 0, 0, 0],
        [5, 0, 2, 0, 0, 0, 1, 9, 0]]])

In [53]:
assert np.array_equal(read_sudokus_from_csv('data/sudoku-micro.csv'), expected)

Reading solutions:

In [57]:
read_sudokus_from_csv('data/sudoku-micro.csv', read_solutions=True)

array([[[8, 6, 4, 3, 7, 1, 2, 5, 9],
        [3, 2, 5, 8, 4, 9, 7, 6, 1],
        [9, 7, 1, 2, 6, 5, 8, 4, 3],
        [4, 3, 6, 1, 9, 2, 5, 8, 7],
        [1, 9, 8, 6, 5, 7, 4, 3, 2],
        [2, 5, 7, 4, 8, 3, 9, 1, 6],
        [6, 8, 9, 7, 3, 4, 1, 2, 5],
        [7, 1, 3, 5, 2, 8, 6, 9, 4],
        [5, 4, 2, 9, 1, 6, 3, 7, 8]],

       [[3, 4, 6, 1, 7, 9, 2, 5, 8],
        [1, 8, 7, 5, 2, 3, 9, 6, 4],
        [5, 2, 9, 6, 4, 8, 3, 7, 1],
        [9, 6, 5, 8, 3, 2, 4, 1, 7],
        [4, 7, 2, 9, 1, 6, 8, 3, 5],
        [8, 1, 3, 7, 5, 4, 6, 2, 9],
        [7, 9, 8, 2, 6, 1, 5, 4, 3],
        [6, 3, 1, 4, 8, 5, 7, 9, 2],
        [2, 5, 4, 3, 9, 7, 1, 8, 6]],

       [[6, 9, 5, 1, 2, 7, 3, 8, 4],
        [1, 3, 8, 4, 5, 9, 6, 7, 2],
        [7, 2, 4, 8, 3, 6, 9, 1, 5],
        [8, 5, 1, 2, 6, 4, 7, 3, 9],
        [2, 7, 3, 9, 8, 1, 5, 4, 6],
        [9, 4, 6, 5, 7, 3, 8, 2, 1],
        [3, 1, 7, 6, 9, 2, 4, 5, 8],
        [4, 8, 9, 7, 1, 5, 2, 6, 3],
        [5, 6, 2, 3, 4, 8, 1, 9, 7]]])

In [58]:
expected = np.array([[[8, 6, 4, 3, 7, 1, 2, 5, 9],
        [3, 2, 5, 8, 4, 9, 7, 6, 1],
        [9, 7, 1, 2, 6, 5, 8, 4, 3],
        [4, 3, 6, 1, 9, 2, 5, 8, 7],
        [1, 9, 8, 6, 5, 7, 4, 3, 2],
        [2, 5, 7, 4, 8, 3, 9, 1, 6],
        [6, 8, 9, 7, 3, 4, 1, 2, 5],
        [7, 1, 3, 5, 2, 8, 6, 9, 4],
        [5, 4, 2, 9, 1, 6, 3, 7, 8]],

       [[3, 4, 6, 1, 7, 9, 2, 5, 8],
        [1, 8, 7, 5, 2, 3, 9, 6, 4],
        [5, 2, 9, 6, 4, 8, 3, 7, 1],
        [9, 6, 5, 8, 3, 2, 4, 1, 7],
        [4, 7, 2, 9, 1, 6, 8, 3, 5],
        [8, 1, 3, 7, 5, 4, 6, 2, 9],
        [7, 9, 8, 2, 6, 1, 5, 4, 3],
        [6, 3, 1, 4, 8, 5, 7, 9, 2],
        [2, 5, 4, 3, 9, 7, 1, 8, 6]],

       [[6, 9, 5, 1, 2, 7, 3, 8, 4],
        [1, 3, 8, 4, 5, 9, 6, 7, 2],
        [7, 2, 4, 8, 3, 6, 9, 1, 5],
        [8, 5, 1, 2, 6, 4, 7, 3, 9],
        [2, 7, 3, 9, 8, 1, 5, 4, 6],
        [9, 4, 6, 5, 7, 3, 8, 2, 1],
        [3, 1, 7, 6, 9, 2, 4, 5, 8],
        [4, 8, 9, 7, 1, 5, 2, 6, 3],
        [5, 6, 2, 3, 4, 8, 1, 9, 7]]])

In [59]:
assert np.array_equal(read_sudokus_from_csv('data/sudoku-micro.csv', read_solutions=True), expected)
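
Splitting each line by hand works well for this file. As an alternative sketch (not the solution used above), the standard library's `csv.DictReader` lets you pick a column by its header name. The function name `read_sudokus_with_csv_module` is hypothetical, and the example writes a small temporary file so that it runs on its own:

```python
import csv
import os
import tempfile

import numpy as np


def read_sudokus_with_csv_module(filename, read_solutions=False):
    """Hypothetical alternative reader that selects the column by header name."""
    column = "solutions" if read_solutions else "quizzes"
    with open(filename, newline="") as f:
        boards = [
            np.array(list(row[column]), dtype=int).reshape((9, 9))
            for row in csv.DictReader(f)
        ]
    return np.array(boards)


# Throwaway file with the first two rows of the sample shown at the top.
rows = [
    ("004300209005009001070060043006002087190007400050083000600000105003508690042910300",
     "864371259325849761971265843436192587198657432257483916689734125713528694542916378"),
    ("040100050107003960520008000000000017000906800803050620090060543600080700250097100",
     "346179258187523964529648371965832417472916835813754629798261543631485792254397186"),
]
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    writer = csv.writer(tmp)
    writer.writerow(["quizzes", "solutions"])
    writer.writerows(rows)

quizzes = read_sudokus_with_csv_module(tmp.name)
solutions = read_sudokus_with_csv_module(tmp.name, read_solutions=True)
os.remove(tmp.name)

print(quizzes.shape, solutions.shape)  # (2, 9, 9) (2, 9, 9)
```

Selecting by header name keeps working if the column order ever changes, at the cost of a little more machinery than a plain `split(",")`.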

### 3) Identifying invalid solutions

There's another file, `sudoku-invalids.csv`, that contains some invalid Sudoku solutions. Your job is to read the solutions and return only the ones that are invalid.

In [61]:
def detect_invalid_solutions(filename):
    # Board and is_valid come from the project's sudoku module.
    from sudoku import Board, is_valid

    puzzles = read_sudokus_from_csv(filename, read_solutions=True)
    # Keep only the boards that fail validation.
    invalids = [puzzle for puzzle in puzzles if not is_valid(Board(puzzle))]
    return np.array(invalids)

In [67]:
detect_invalid_solutions('data/sudoku-invalids.csv')

array([[[1, 7, 6, ..., 4, 5, 9],
        [5, 3, 8, ..., 6, 7, 2],
        [4, 9, 2, ..., 1, 3, 8],
        ...,
        [8, 1, 3, ..., 7, 4, 5],
        [7, 4, 9, ..., 2, 1, 6],
        [2, 6, 5, ..., 8, 9, 3]],

       [[9, 9, 5, ..., 6, 1, 3],
        [8, 4, 3, ..., 9, 5, 7],
        [7, 1, 6, ..., 8, 2, 4],
        ...,
        [6, 3, 4, ..., 2, 9, 1],
        [1, 8, 7, ..., 3, 6, 5],
        [2, 5, 9, ..., 7, 4, 8]],

       [[5, 8, 5, ..., 2, 1, 7],
        [3, 2, 1, ..., 9, 5, 6],
        [6, 9, 7, ..., 4, 8, 3],
        ...,
        [4, 5, 8, ..., 7, 3, 2],
        [9, 7, 3, ..., 6, 4, 1],
        [2, 1, 6, ..., 5, 9, 8]],

       ...,

       [[9, 1, 9, ..., 7, 2, 6],
        [3, 6, 5, ..., 8, 9, 4],
        [4, 7, 2, ..., 5, 1, 3],
        ...,
        [8, 3, 6, ..., 2, 4, 1],
        [5, 4, 9, ..., 6, 8, 7],
        [7, 2, 1, ..., 3, 5, 9]],

       [[5, 4, 2, ..., 7, 9, 8],
        [7, 9, 1, ..., 2, 6, 3],
        [8, 3, 6, ..., 5, 1, 4],
        ...,
        [2, 5, 3, ..., 

In [69]:
assert len(detect_invalid_solutions('data/sudoku-invalids.csv')) == 13
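
The solution above checks one board at a time through `Board` and `is_valid`, which live in the `sudoku` module and aren't shown here. As a rough sketch of how the extra dimension could be put to work, the function below validates a whole `(n, 9, 9)` stack at once. It assumes a solution is valid exactly when every row, column and 3x3 box contains the digits 1 to 9; the name `invalid_mask` is hypothetical, and this is not necessarily how `is_valid` is implemented:

```python
import numpy as np


def invalid_mask(solutions):
    """Return a boolean array: True where an (n, 9, 9) solution breaks a rule."""
    digits = np.arange(1, 10)
    # Sort every row and compare it with 1..9.
    rows_ok = (np.sort(solutions, axis=2) == digits).all(axis=(1, 2))
    # Sort every column and compare it with 1..9 laid out vertically.
    cols_ok = (np.sort(solutions, axis=1) == digits[:, None]).all(axis=(1, 2))
    # Regroup the cells so that each row of `boxes` holds one 3x3 box.
    boxes = solutions.reshape(-1, 3, 3, 3, 3).transpose(0, 1, 3, 2, 4).reshape(-1, 9, 9)
    boxes_ok = (np.sort(boxes, axis=2) == digits).all(axis=(1, 2))
    return ~(rows_ok & cols_ok & boxes_ok)


# A valid grid generated from a simple shifting pattern (illustration only).
r, c = np.indices((9, 9))
valid = (3 * (r % 3) + r // 3 + c) % 9 + 1

# Break a copy by duplicating a digit in the first row.
broken = valid.copy()
broken[0, 0] = broken[0, 1]

batch = np.stack([valid, broken, valid])
mask = invalid_mask(batch)
print(mask.tolist())  # [False, True, False]

invalid_only = batch[mask]  # shape (1, 9, 9)
```

Boolean indexing with the mask plays the same role as the loop above: it keeps only the boards flagged as invalid.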

## Time to test!

Now it's time to move your code to `sudoku.py` and then run all the tests; if they're passing, you can move to the next step!

In [79]:
!py.test test_part_4.py

platform darwin -- Python 3.7.4, pytest-5.2.2, py-1.8.0, pluggy-0.13.0
rootdir: /Users/santiagobasulto/code/rmotr/curriculum/sudoku-tests
collected 4 items

test_part_4.py ....                                                      [100%]

