# fastai Sudoku - One Solution with Notes

> "One of my solutions and some notes when using Sudoku to help learn fastai data preparation"

- toc: true
- branch: master
- badges: true
- comments: true
- author: Craig Stanton
- hide: false
- categories: [fastai]

Use a Sudoku puzzle to learn more about the fastai `DataBlock`, `Datasets`, `DataLoaders` and `TfmdDL` objects.

Why even bother with this? Jeremy's advice is to build models as quickly as you can. However, I find myself tripping at the first hurdle many times: the data preparation stage. No matter how many times I read about `DataBlock`s and `DataLoaders`, there is no substitute for actually using the libraries. So why not outline a game that requires you to build and solve the puzzle using the same tools you need to structure the data for a fastai `Learner`?

This is a solution notebook for the fastai Sudoku notebook.

In [3]:
!pip install py-sudoku
from fastai.text.all import *
from fastai.vision.all import *
from sudoku import Sudoku
from functools import wraps

Collecting py-sudoku
  Downloading py_sudoku-1.0.2-py3-none-any.whl (7.0 kB)
Installing collected packages: py-sudoku
Successfully installed py-sudoku-1.0.2


In [23]:
class FastSudoku:
    """
    Learn how to use the fastai DataBlock, Datasets, and DataLoaders transforms and callbacks by creating and solving a sudoku puzzle
    """
    
    def __init__(self, difficulty: float = 0.25, data_dir: Path = Path(".")):
        self.puzzle = Sudoku(3).difficulty(difficulty)
        self.solved = self.puzzle.solve().board
        pd.DataFrame(self.puzzle.board).to_csv(data_dir/"fastsudoku.csv", index=False)
        
    def __repr__(self):
        return f"Puzzle created"
    
    @staticmethod
    def np2list(x): return None if np.isnan(x) else int(x)
    
    def check(self, dls):
        """
        Unpack the dataloaders output, convert to int and str
        """
        holder = []
        for dl in dls:
            holder+=dl
        self.preds = [list(map(self.np2list,j)) for j in [i[0] for i in holder]]
        print([self.solved[i] == x for i, x in enumerate(self.preds)])
        if all(j for j in [self.solved[i] == x for i, x in enumerate(self.preds)]):
            
            print("Yes you are a fastai...and sudoku...whiz!")
        else:
            print("Try again!")

In [24]:
fs = FastSudoku(0.1)

In [45]:
fs.puzzle.show()

+-------+-------+-------+
| 7 3 4 | 9 2 6 | 5 1 8 |
| 1 8 9 | 3 4 5 | 2 6 7 |
| 2 6 5 |   7 8 | 3 4 9 |
+-------+-------+-------+
| 5 7 3 |   6 9 | 4 2 1 |
| 6 2 1 | 7 3 4 |   9 5 |
|     8 | 2 5 1 |   7 3 |
+-------+-------+-------+
| 3 4 6 | 5 1 7 | 9 8 2 |
| 8 1 2 | 6 9 3 |   5   |
| 9 5 7 | 4 8 2 | 1 3 6 |
+-------+-------+-------+



### Challenge 1 - Datasets

**Instructions**:

* Grab the `fastsudoku.csv` puzzle and create a `DataBlock`
* You should try and use *as many* of the functions below as DataBlock arguments - not necessarily all, but as many as you can/wish. The point of this is not to be the most efficient or practical way of creating a `DataBlock` but rather to understand what each function argument does.

*Hints:*
1. The `y` values are the row indices. They are not dependent variables as they normally are, but rather are a tool to help when processing batches (processed out of sequential order)

*Tips*:
1. Don't be afraid to comment out lines to see how the absence of functions changes the output
2. Use print statements 
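
To make the hint about `y` concrete, here is a minimal plain-Python sketch (no fastai involved). The 3-row `board` and the names `items` and `by_row` are made up for illustration; the point is only that carrying the row index alongside each row lets you find a row again after the items have been shuffled or split into batches.

```python
import random

# Hypothetical 3-row stand-in for the board; None marks a blank cell
board = [[7, 3, 4], [None, 8, 9], [2, None, 5]]

# Pair every row (x) with its row index (y), in the spirit of get_x and get_y
items = [(row, idx) for idx, row in enumerate(board)]

# Shuffle to imitate items arriving out of sequential order
random.seed(0)
random.shuffle(items)

# The y value still tells us which row each x is, wherever it landed
by_row = {y: x for x, y in items}
print(by_row[1])
```

Whatever order the shuffle produces, `by_row[1]` is still the second row of the board.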

In [26]:
def get_items(a):
    df = pd.read_csv(a)
    df["y"] = df.index.to_list()
    return df

def get_x(a):
    return a.to_list()[:-1]

def get_y(a):
    return a["y"]
    

#################
# Defined for you
#################

def splitter(a):
    # In this exercise, we don't need train and validation sets
    # But never forget about them because they're so important!
    return [list(range(9)),]

In [27]:
# why no blocks when using the DataBlock class?
# check out lines XX in REPO - you'll see that fastai-delivered blocks are just "holders" for
# transforms that end up being merged with transforms specified below

dblock = DataBlock(
    get_items=get_items,
    get_x=get_x,
    get_y=get_y,
    splitter=splitter
)

# why no batch or item_tfms? Because we are creating a Dataset
# if you look https://github.com/fastai/fastai/blob/master/fastai/data/block.py
# lines 142-160, you see that item and batch transforms are only ever called in the dataloaders method

Check format - note we only want the `x` values

In [28]:
dsets = dblock.datasets("fastsudoku.csv");

In [44]:
fs.puzzle.show()

dsets.items

+-------+-------+-------+
| 7 3 4 | 9 2 6 | 5 1 8 |
| 1 8 9 | 3 4 5 | 2 6 7 |
| 2 6 5 |   7 8 | 3 4 9 |
+-------+-------+-------+
| 5 7 3 |   6 9 | 4 2 1 |
| 6 2 1 | 7 3 4 |   9 5 |
|     8 | 2 5 1 |   7 3 |
+-------+-------+-------+
| 3 4 6 | 5 1 7 | 9 8 2 |
| 8 1 2 | 6 9 3 |   5   |
| 9 5 7 | 4 8 2 | 1 3 6 |
+-------+-------+-------+



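|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | y |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.0 | 3.0 | 4 | 9.0 | 2 | 6 | 5.0 | 1 | 8.0 | 0 |
| 1 | 1.0 | 8.0 | 9 | 3.0 | 4 | 5 | 2.0 | 6 | 7.0 | 1 |
| 2 | 2.0 | 6.0 | 5 | NaN | 7 | 8 | 3.0 | 4 | 9.0 | 2 |
| 3 | 5.0 | 7.0 | 3 | NaN | 6 | 9 | 4.0 | 2 | 1.0 | 3 |
| 4 | 6.0 | 2.0 | 1 | 7.0 | 3 | 4 | NaN | 9 | 5.0 | 4 |
| 5 | NaN | NaN | 8 | 2.0 | 5 | 1 | NaN | 7 | 3.0 | 5 |
| 6 | 3.0 | 4.0 | 6 | 5.0 | 1 | 7 | 9.0 | 8 | 2.0 | 6 |
| 7 | 8.0 | 1.0 | 2 | 6.0 | 9 | 3 | NaN | 5 | NaN | 7 |
| 8 | 9.0 | 5.0 | 7 | 4.0 | 8 | 2 | 1.0 | 3 | 6.0 | 8 |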


### Challenge 2 - DataLoader Sudoku

**Instructions**:

* Use the `DataLoaders` callbacks to modify the `Datasets` you just created to solve the Sudoku board
* Test your DataLoaders object against the puzzle, you can use the `fs.check(dls)` method
* You should try and use *as many* of the callback functions below - not necessarily all, but as many as you can/wish. The point of this is not to be the most efficient or practical way of creating a `DataLoaders` but rather to understand what each function argument does.

*Tips*:
1. Don't be afraid to comment out lines to see how the absence of functions changes the output
2. Use print statements 

The Sudoku puzzle is below:

In [33]:
fs.puzzle.board = saved_board

In [34]:
fs.puzzle.show()

+-------+-------+-------+
| 7 3 4 | 9 2 6 | 5 1 8 |
| 1 8 9 | 3 4 5 | 2 6 7 |
| 2 6 5 |   7 8 | 3 4 9 |
+-------+-------+-------+
| 5 7 3 |   6 9 | 4 2 1 |
| 6 2 1 | 7 3 4 |   9 5 |
|     8 | 2 5 1 |   7 3 |
+-------+-------+-------+
| 3 4 6 | 5 1 7 | 9 8 2 |
| 8 1 2 | 6 9 3 |   5   |
| 9 5 7 | 4 8 2 | 1 3 6 |
+-------+-------+-------+



In [41]:
def s(p,r,c,v):
    """
    Helper function that takes a 1-indexed row r and column c and inserts v into puzzle p
    """
    for i in p:
        if i[1] == r-1:
            i[0][c-1] = v
    return p
    

def before_iter():
    """
    Called with no arguments at the start of each pass over the data; nothing to do here
    """
    pass

def after_item(a):
    """
    Gets each item from the dataset
    """
    return a

def before_batch(a):
    """
    Takes a list of items of length batch-size
    """
    a = s(a,3,4,1)
    a = s(a,4,4,8)
    a = s(a,5,7,8)
    a = s(a,6,7,6)
    a = s(a,6,1,4)
    a = s(a,6,2,9)
    return a

def after_iter():
    pass

def create_batches(a):
    return a

def create_item(a):
    return np.array(a)

def create_batch(a):
    """
    Generally calls a collate function. 
    
    If just returning a, it overrides any collation
    """
    return a

def after_batch(a):
    """
    The actual batch
    """
    a = s(a,8,9,4)
    a = s(a,8,7,7)
    return a
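
To see what the `s` helper does to a single batch, here is a small standalone sketch. The helper is repeated so the snippet runs on its own, and the two-item `batch` is made up rather than taken from the real puzzle. It shows why the callbacks can call `s` for every blank cell on every batch: a call only has an effect when the batch contains the matching row index.

```python
def s(p, r, c, v):
    # Same helper as above: r and c are 1-indexed, p is a list of (x, y) items
    for i in p:
        if i[1] == r - 1:
            i[0][c - 1] = v
    return p

# A hand-made batch of two items (bs=2): the items whose y values are 2 and 3
batch = [([2, 6, 5, None], 2), ([5, 7, 3, None], 3)]

s(batch, 3, 4, 1)   # row 3 (y == 2) is in this batch, so column 4 is filled
s(batch, 9, 1, 5)   # row 9 (y == 8) is not in this batch, so nothing changes
print(batch)        # [([2, 6, 5, 1], 2), ([5, 7, 3, None], 3)]
```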

In [42]:
dls = TfmdDL(
    dsets,
    bs=2,   # keep as 2 for this exercise
    # before_iter=before_iter,
    after_item=after_item,
    before_batch=before_batch,
    # after_iter=after_iter,
    # create_item=create_item,
    create_batch=create_batch,
    after_batch=after_batch,
    # create_batches=create_batches
)
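
The order in which these callbacks fire is easier to see in a toy loop. The sketch below is a simplified plain-Python imitation, not fastai's implementation: `toy_loader` and `run` are hypothetical names, shuffling and parallel workers are ignored, and any callback that is not supplied simply passes its input through.

```python
def toy_loader(items, bs, **cb):
    """Toy stand-in for a data loader; NOT fastai's implementation."""
    calls = []                                   # callback names, in call order

    def run(name, *args):
        calls.append(name)
        fn = cb.get(name)
        if fn is None:                           # missing callback: pass input through
            return args[0] if args else None
        return fn(*args)

    run("before_iter")
    batches = []
    for start in range(0, len(items), bs):
        # each item goes through create_item, then after_item
        samples = [run("after_item", run("create_item", it))
                   for it in items[start:start + bs]]
        b = run("before_batch", samples)         # still a plain list of items
        b = run("create_batch", b)               # collation would happen here
        batches.append(run("after_batch", b))    # the finished batch
    run("after_iter")
    return batches, calls

batches, calls = toy_loader([1, 2, 3], bs=2, after_item=lambda x: x * 10)
print(batches)     # [[10, 20], [30]]
print(calls[:6])
```

Printing `calls` in full shows the per-item callbacks running before the per-batch ones, with `before_iter` and `after_iter` wrapping the whole pass.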

This is an incomplete solution, as the puzzle changes after each run: the hard-coded values above only solve the board shown.

In [43]:
fs.check(dls)

[True, True, True, True, True, True, True, True, True]
Yes you are a fastai...and sudoku...whiz!


## Notes

Below are the notes and summaries that I picked up on as part of doing this Sudoku exercise.

### ELI5

##### General fastai 
* Everything before a fastai `Learner` is just **data preparation** to the correct format that a PyTorch model can interpret (in other words, pre-learner tasks are just ETL steps)

##### DataLoaders
* Output a **list of tuples**
    * where for each item in the list, the first element of the tuple is a single `x` value (independent variable), and the second element of the tuple is the `y` value (the dependent variable)
        * In creating a `DataLoaders` object, we apply transforms to the data either as we are fetching the item (called `item_tfms`) or after the batch is collated (called `batch_tfms`)
    * Note - when batched, all Xs and all Ys are **stacked into a single tuple**
    
##### TfmdDL
* Also referred to as `dl_type`, this inherits all of the transforms and callbacks applied to the `DataLoaders` object and applies them to an iterable.
* Under the hood, this is the major class that ultimately prepares the data for use in the `Learner`

##### Collate
* I never understood this word even though I saw it everywhere. `fa_collate` and the PyTorch `default_collate` are the actual functions that create a batch. If they aren't applied, each "batch" would just be a list of tuples, where the first element of each tuple is a single input and the second element is a single target.
    * When these functions are used, they put "each data field into a tensor with outer dimension batch size". In other words, they stack the items of a batch together - one stack for inputs, one stack for targets
    * These functions are what are called if you do not specify the `create_batch` callback
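
The core idea of collation can be sketched in plain Python. This is only an illustration: `toy_collate` is a made-up name, and the real `fa_collate` / `default_collate` additionally stack each field into a tensor, which this sketch skips.

```python
def toy_collate(items):
    """Turn a list of (x, y) items into one (xs, ys) tuple."""
    xs, ys = zip(*items)          # transpose: group all inputs, then all targets
    return list(xs), list(ys)

items = [([7, 3, 4], 0), ([1, 8, 9], 1)]   # two made-up (x, y) items
print(toy_collate(items))                  # ([[7, 3, 4], [1, 8, 9]], [0, 1])
```

Returning `a` unchanged from `create_batch`, as the solution above does, skips this step, which is why each batch there stays a list of `(x, y)` items.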
        
##### DataBlock
* Prepackaged transforms for the most common types of data transformations in deep learning

##### Transform
* Converting the data (inputs/x and targets/y) into a format the computer understands and can perform matrix math on - tensors
    * For deep learning, a transform directly or indirectly (via a Pipeline) converts a piece of data into a tensor

In [105]:
??DataLoaders

Init signature: DataLoaders(*loaders, path='.', device=None)
Source:        
class DataLoaders(GetAttr):
    "Basic wrapper around several `DataLoader`s."
    _default='train'
    def __init__(self, *loaders, path='.', device=None):
        self.loaders,self.path = list(loaders),Path(path)

## DataBlock

fastai link: https://docs.fast.ai/data.block.html

A `DataBlock` is the quickest way to create a `DataLoaders` object; it is the most *abstracted* class from pure PyTorch. It should be used first when there is not much customization needed.

Remember - blocks are just **pre-packaged transforms**; they exist for the most common types of ML tasks (e.g. `CategoryBlock`, `ImageBlock`)

### Blocks

We just said blocks are pre-packaged transforms. What does this really mean?

Let's look at 2 common blocks: `ImageBlock` and `CategoryBlock`

In [8]:
CategoryBlock??

Signature: CategoryBlock(vocab=None, sort=True, add_na=False)
Source:   
def CategoryBlock(vocab=None, sort=True, add_na=False):
    "`TransformBlock` for single-label categorical targets"
    return TransformBlock(type_tfms=Categorize(vocab=vocab, sort=sort, add_na=add_na))
File:      /opt/conda/lib/python3.7/site-packages/fastai/data/block.py
Type:

In [9]:
ImageBlock??

Signature: ImageBlock(cls=<class 'fastai.vision.core.PILImage'>)
Source:   
def ImageBlock(cls=PILImage):
    "A `TransformBlock` for images of `cls`"
    return TransformBlock(type_tfms=cls.create, batch_tfms=IntToFloatTensor)
File:      /opt/conda/lib/python3.7/site-packages/fastai/vision/data.py
Type:      function


This is interesting. Both functions return a `TransformBlock`, and that class has no methods beyond `__init__` - the only thing it does is **store transforms** as attributes.

Now take a look back at `ImageBlock` and `CategoryBlock` - these are both **functions** and not classes (despite their class-style CamelCase names).

To recap:
We know that blocks exist to store transforms, and that they all return a `TransformBlock`. We have seen they are (generally) the first argument of the DataBlock API.
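
A toy version of this pattern, with made-up names, may help. `ToyBlock` and `ToyCategoryBlock` are hypothetical stand-ins written for this note, not fastai classes; the sketch only mirrors the shape seen in the source above: a holder class plus a class-style function that returns a configured holder.

```python
class ToyBlock:
    """Hypothetical stand-in for `TransformBlock`: it only stores transforms."""
    def __init__(self, type_tfms=None, batch_tfms=None):
        self.type_tfms = list(type_tfms or [])
        self.batch_tfms = list(batch_tfms or [])

def ToyCategoryBlock(vocab):
    """A function with a class-style name that returns a configured ToyBlock."""
    to_index = lambda label: vocab.index(label)   # stand-in for a categorize transform
    return ToyBlock(type_tfms=[to_index])

block = ToyCategoryBlock(vocab=["cat", "dog"])
print(type(block).__name__, block.type_tfms[0]("dog"))   # ToyBlock 1
```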

> Note this term "DataBlock API" confused me for a while - in fact for a long time the term API in general caused confusion. To me, API in this sense just means a *callable* (function, class, url) that *abstracts* more complex code

In [11]:
DataBlock??

Init signature:
DataBlock(
    blocks=None,
    dl_type=None,
    getters=None,
    n_inp=None,
    item_tfms=None,
    batch_tfms=None,
    *,
    get_items=None,
    splitter=None,
    get_y=None,
    get_x=None,
)
Source:  

The `L()` object has the `attrgot` method, which is how fastai can extract class attributes from a `TransformBlock`. This took me forever to understand.
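
As a rough mental model, `attrgot` behaves like a `getattr` applied to every element of the list. The sketch below uses plain Python instead of fastcore, and the `blocks` objects and transform names are made up for illustration.

```python
from types import SimpleNamespace

# Two hypothetical blocks, each holding a list of transform names
blocks = [SimpleNamespace(type_tfms=["open_image"]),
          SimpleNamespace(type_tfms=["categorize"])]

# Roughly what L(blocks).attrgot("type_tfms") gives: one attribute per element
type_tfms = [getattr(b, "type_tfms", None) for b in blocks]
print(type_tfms)   # [['open_image'], ['categorize']]
```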

In [None]:
dblock = DataBlock(
    get_items=split_xy,    # receives the entire iterable or path; used to fetch/parse the data and return an iterable
    get_x=get_x,           # get_x and get_y both receive a single item from the output of the get_items function
    get_y=get_y,           # get_x and get_y must return a tensor, array, list, dict
    item_tfms=item_tfms,
    batch_tfms=batch_tfms,
)

## Next Steps

If I can find proper definitions for each of these functions, I would like to compare people's explanations against those definitions and use a model to measure how similar they are.