# fastai Sudoku - One Solution with Notes

> "One of my solutions and some notes when using Sudoku to help learn fastai data preparation"

- toc: true
- branch: master
- badges: true
- comments: true
- author: Craig Stanton
- hide: false
- categories: [fastai]

Use a Sudoku puzzle to learn more about the fastai `DataBlock`, `Datasets`, `DataLoaders` and `TfmdDL` objects.

Why even bother with this? Jeremy's advice is to build models as quickly as you can. However, I often find myself tripping at the first hurdle: **the data preparation stage**. No matter how many times I read about `DataBlock`, `TfmdDL`, and `DataLoaders`, there is no substitute for actually using the libraries. So why not set up a game that requires you to build and solve a puzzle using the same tools you need to structure the data for a fastai `Learner` (which is where the actual training magic happens)?

Additionally, I really struggle at times to read the fastai code - it contains so many Python tricks that I am not always familiar with. Therefore this type of exercise forces me to interrogate the code in order to write working DataBlock functions.

> This is a solution notebook for the fastai Sudoku notebook

## Why Sudoku?

Besides the fact that most people know the game, I needed something 2-dimensional because that is how our raw ML data is usually represented (even if we are working with images, it's helpful to *think* of the input data in a tabular structure, where each training item is a row, one column is the independent variable, and another is the dependent variable).

### Setup

In [77]:
#@title
!pip install py-sudoku==1.0.3
from fastai.text.all import *
from fastai.vision.all import *
from sudoku import Sudoku
from functools import wraps
from typing import Union, Iterable
from collections.abc import Collection
import ipywidgets as widgets
import requests, pprint
from IPython.display import HTML

class FastSudoku:
    """
    Learn how to use the fastai DataBlock, Datasets, TfmdDL, and DataLoaders transforms and callbacks by creating and solving a sudoku puzzle
    """
    
    def __init__(self, difficulty: float = 0.25, seed: int = 527, data_dir: Path = Path(".")):
        self.puzzle = Sudoku(3, seed=seed).difficulty(difficulty)
        self.solved = self.puzzle.solve().board
        pd.DataFrame(self.puzzle.board).to_csv(data_dir/"raw_data.csv", index=False)
        
    def __repr__(self):
        return "Puzzle created"
    
    @staticmethod
    def np2list(x): return None if np.isnan(x) else int(x)
    
    def check(self, dls):
        """
        Unpack the dataloaders output, convert to int and str
        """
        holder = []
        for dl in dls:
            holder+=dl
        self.preds = [list(map(self.np2list,j)) for j in [i[0] for i in holder]]
        if all(j for j in [self.solved[i] == x for i, x in enumerate(self.preds)]):
            
            print("\n\nYes you are a fastai...and sudoku...whiz!\n\n")
        else:
            print("\n\nNot quite. Check out your puzzle below and try again!\n\n")
            # print the current board with any guesses
            Sudoku(3, 3, board=self.preds).show()

fast_sudoku_answers = requests.get("https://gist.githubusercontent.com/stantonius/ca95d88dcd0085b12a302f64b326caf8/raw/68b676048138fc8096157176b411b677a98a34ec/fast_sudoku_answers.json")



### Getting Started

The first thing to do is to set the puzzle **difficulty** (a float between 0 and 1) and the **seed** (any integer). Note that the seed makes the Sudoku board reproducible, so to practice a second time you will want to change this value.

In [78]:
fs = FastSudoku(difficulty = 0.2, seed = 92)

In [79]:
fs.puzzle.show()

+-------+-------+-------+
| 5   8 | 6 7 1 | 3 9 4 |
| 6 9 3 |     8 | 1   7 |
| 4 1 7 | 5 3   | 2 8 6 |
+-------+-------+-------+
| 8   5 | 4   3 | 9     |
| 1 6 9 |   2 5 | 8 4 3 |
| 2 3 4 |   8 6 |   7 1 |
+-------+-------+-------+
| 7 5 1 | 8 6 2 | 4 3 9 |
|   4 2 | 3 5 7 |       |
| 3 8 6 | 1 9 4 | 7 2 5 |
+-------+-------+-------+



### Challenge 1 - Create a `Datasets` object using the `DataBlock` API

In the current directory we have a generated file `raw_data.csv` which contains the unsolved Sudoku puzzle data. 

**Objective**: Using the `DataBlock` API, pass as many function arguments (ie. `get_items`, `batch_tfms`, `get_y`, etc.) as possible so that you create a `Datasets` object that contains the original Sudoku values *along with an additional `y-value` column*.  

**Instructions**:

* Grab the `raw_data.csv` puzzle and create a `DataBlock`
* You should try to use *as many* of the functions below as DataBlock arguments - not necessarily all, but as many as you can/wish. The point of this is not to be the most efficient or practical way of creating a `DataBlock` but rather to *understand what each function argument does*.

*Tips*:
1. Don't be afraid to comment out lines to see how the absence of functions changes the output
2. Use print statements to track the output of each function
3. Read the fastai code for the functions and classes you are unfamiliar with.

**Question**: the objective states to create a new `y-value` column. What should this column contain? Why? 

*Click on the `Show answer` buttons below to reveal the answers*

In [80]:
#@title
button = widgets.Button(description="Show answer")
output = widgets.Output()

def show_answer(b):
    with output:
        print(fast_sudoku_answers.json()["y_col"])

button.on_click(callback=show_answer)
display(button, output)


In [81]:
#@title
button = widgets.Button(description="Show answer")
output = widgets.Output()

def show_answer(b):
    with output:
        print(fast_sudoku_answers.json()["why_y"])

button.on_click(callback=show_answer)
display(button, output)


#### Define our `DataBlock` functional arguments

In [82]:
def get_items(raw_data_path: Union[str, Path]) -> pd.DataFrame:
    """
    The main purpose of `get_items()` is to **fetch the raw data** so that additional 
    getters and transformations can be applied to the data.

    Notice how here we actually modify the data by appending our `y` index column.  
    """
    df = pd.read_csv(raw_data_path)
    df["y"] = df.index.to_list()
    return df

def mock_item_tfm(o: Any) -> Any:
    """
    Prints a statement just to show when this item_tfms is called. Confirmed that
    this is not called because we are creating our DataLoaders unconventionally
    """
    print("item_tfms supplied to the DataBlock API has been called")
    return o

def get_x(a): # -> Iterator (colab is Python 3.7 which doesn't like this)
    """
    The fact this outputs an Iterable, where each item in the iterable is a 
    single `x`-value, is important to realise. 
    
    Why is it important that this function outputs an iterable? Well, what can 
    you do with 2 equal length iterables? 
    `zip` them together - which is what happens in the `DataLoaders`
    """
    return a.to_list()[:-1]

def get_y(a): # -> Iterator (colab is Python 3.7 which doesn't like this)
    """
    Unlike `get_x`, this returns a single value per row: the row index 
    stored in the `y` column. One `y` is paired with each list of `x` values.
    """
    print(f"get_y value is {a['y']} \n")
    return a["y"]
    

#################
# Predefined
#################

def splitter(a):
    # In this exercise, we don't need train and validation sets
    # Therefore this is effectively a dummy function in this specific circumstance
    # But never forget about Splitter because it is such an important concept!
    return [list(range(9)),]

In [83]:
dblock = DataBlock(
    # blocks=[TransformBlock, CategoryBlock],
    get_items=get_items,
    get_x=get_x,
    get_y=get_y,
    splitter=splitter
)
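The dummy `splitter` above returns a single list of indices. For comparison, a splitter normally returns one list of indices per subset (train and validation). Here is a minimal plain-Python sketch of that shape. The name `random_index_split` and its parameters are my own assumptions for illustration; fastai ships its own `RandomSplitter`, which this does not try to reproduce.

```python
import random
from typing import List, Sequence, Tuple


def random_index_split(items: Sequence, valid_pct: float = 0.2, seed: int = 42) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: return (train_idxs, valid_idxs) for the given items."""
    idxs = list(range(len(items)))
    random.Random(seed).shuffle(idxs)        # reproducible shuffle of the row indices
    n_valid = int(len(items) * valid_pct)    # number of rows held out for validation
    return sorted(idxs[n_valid:]), sorted(idxs[:n_valid])


# Nine Sudoku rows, represented here only by their position
train_idxs, valid_idxs = random_index_split(list(range(9)), valid_pct=0.2)
print(train_idxs, valid_idxs)
```

Every row index lands in exactly one of the two lists, which is the property a real splitter has to guarantee.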

**Question**: why did we not need to provide any `blocks` arguments?

In [84]:
#@title
button = widgets.Button(description="Show answer")
output = widgets.Output()

def show_answer(b):
    with output:
        print(fast_sudoku_answers.json()["no_blocks"])

button.on_click(callback=show_answer)
display(button, output)


**Question**: why did we not supply any `batch_tfms` or `item_tfms`?

In [85]:
#@title
button = widgets.Button(description="Show answer")
output = widgets.Output()

def show_answer(b):
    with output:
        print(fast_sudoku_answers.json()["no_tfms"])

button.on_click(callback=show_answer)
display(button, output)


#### Check our Dataset

In [86]:
dsets = dblock.datasets("raw_data.csv");

get_y value is 0.0 



In [87]:
fs.puzzle.show()

dsets.items

+-------+-------+-------+
| 5   8 | 6 7 1 | 3 9 4 |
| 6 9 3 |     8 | 1   7 |
| 4 1 7 | 5 3   | 2 8 6 |
+-------+-------+-------+
| 8   5 | 4   3 | 9     |
| 1 6 9 |   2 5 | 8 4 3 |
| 2 3 4 |   8 6 |   7 1 |
+-------+-------+-------+
| 7 5 1 | 8 6 2 | 4 3 9 |
|   4 2 | 3 5 7 |       |
| 3 8 6 | 1 9 4 | 7 2 5 |
+-------+-------+-------+



Unnamed: 0,0,1,2,3,4,5,6,7,8,y
0,5.0,,8,6.0,7.0,1.0,3.0,9.0,4.0,0
1,6.0,9.0,3,,,8.0,1.0,,7.0,1
2,4.0,1.0,7,5.0,3.0,,2.0,8.0,6.0,2
3,8.0,,5,4.0,,3.0,9.0,,,3
4,1.0,6.0,9,,2.0,5.0,8.0,4.0,3.0,4
5,2.0,3.0,4,,8.0,6.0,,7.0,1.0,5
6,7.0,5.0,1,8.0,6.0,2.0,4.0,3.0,9.0,6
7,,4.0,2,3.0,5.0,7.0,,,,7
8,3.0,8.0,6,1.0,9.0,4.0,7.0,2.0,5.0,8
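
To make the division of labour between `get_x` and `get_y` concrete, here is a plain-Python sketch of what happens to one row of the table above. It uses a plain list in place of the pandas row, so `split_row` is a hypothetical stand-in, not the functions defined earlier.

```python
import math

# First row of the table above: nine puzzle cells followed by the appended `y` index
row = [5.0, math.nan, 8.0, 6.0, 7.0, 1.0, 3.0, 9.0, 4.0, 0]


def split_row(row):
    """Hypothetical stand-in for get_x/get_y: all but the last value is x, the last value is y."""
    return row[:-1], row[-1]


x, y = split_row(row)
print(len(x), y)  # nine cells, plus the row index used as the label
```

The blank cell stays as `nan` inside `x`; filling it in is left to the later callback stage.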


### Challenge 2 - Use `TfmdDL` to solve the Sudoku puzzle

> Why are we using `TfmdDL` here? If you look into the fastai code, any time you create a `DataLoaders`, the `TfmdDL` class is ultimately called.

**Objective:** Use the `TfmdDL` callbacks to modify the `Datasets` you just created to solve the Sudoku board

**Instructions**:
* To test your DataLoaders object against the puzzle, use the `fs.check(dls)` method
* You should try to use *as many* of the callback functions below - not necessarily all, but as many as you can/wish. The point of this is not to be the most efficient or practical way of creating a `DataLoaders` but rather to understand what each function argument does.

*Tips*:
1. Use **helper function(s)** that are called within other functions to insert your Sudoku responses into the row
2. Don't be afraid to comment out lines to see how the absence of functions changes the output
3. Use print statements to track the output of each function
4. Read the fastai code for the functions and classes you are unfamiliar with.

In [88]:
# show the Sudoku puzzle again
fs.puzzle.show()

+-------+-------+-------+
| 5   8 | 6 7 1 | 3 9 4 |
| 6 9 3 |     8 | 1   7 |
| 4 1 7 | 5 3   | 2 8 6 |
+-------+-------+-------+
| 8   5 | 4   3 | 9     |
| 1 6 9 |   2 5 | 8 4 3 |
| 2 3 4 |   8 6 |   7 1 |
+-------+-------+-------+
| 7 5 1 | 8 6 2 | 4 3 9 |
|   4 2 | 3 5 7 |       |
| 3 8 6 | 1 9 4 | 7 2 5 |
+-------+-------+-------+



In [92]:
#################
# HELPER FUNCTION
#################

def s(puzzle,row,col,value):
    """
    Helper function that takes a 1-indexed row and column and inserts value into puzzle.
    """
    for i in puzzle:
        if i[1] == row-1:
            i[0][col-1] = value
    return puzzle
    
###########################
# TfmdDL FUNCTION ARGUMENTS
###########################

def before_iter():
    """
    TODO: Still not quite sure what this function does or when I would use it
    """
    # pass
    return print("Hello")

def after_item(a):
    """
    Receives **each** item **before a batch is created**.
    This includes the `x` and `y` values **individually** after the `get_x` and `get_y`
    are applied.

    Crucially, this is where **`item_tfms`** are applied

    For example, if the first row in our Sudoku dataset is `[5.0, nan, 8.0, 6.0, 7.0, 1.0, 3.0, 9.0, 4.0, 0]`,
    where the last item is our `y-value`, then this function will receive
    `[5.0, nan, 8.0, 6.0, 7.0, 1.0, 3.0, 9.0, 4.0]` and also (separately) receives `0`
    """
    print(f"after_item called on {a} \n")
    return a

def before_batch(a):
    """
    Receives a list of **tuples** of length batch-size. But beware - this is not the actual batch
    because there is no shuffling done here.

    Another key to realise here is that we are already dealing with tuples here,
    where the first tuple item is `x` (as defined by `get_x` or `getters`) and 
    the second item is `y`.

    For example, if our batch size is 3, this function receives the 3 dataset rows. 
    [([5.0, nan, 8.0, 6.0, 7.0, 1.0, 3.0, 9.0, 4.0], 0), ([6.0, 9.0, 3.0, nan, nan, 8.0, 1.0, nan, 7.0], 1), ([4.0, 1.0, 7.0, 5.0, 3.0, nan, 2.0, 8.0, 6.0], 2)]

    Note that `batch_tfms` proper map to `after_batch`; this hook suits work that
    needs the uncollated list of items (e.g. padding)
    """
    print(f"before_batch called on {a} \n")
    a = s(a,1,2,2.0)
    a = s(a,4,2,7.0)
    a = s(a,7,1,7.0)
    a = s(a,4,5,1.0)
    a = s(a,5,4,7.0)
    a = s(a,6,4,9.0)
    a = s(a,6,7,5.0)
    return a

def after_iter(cls_name):
    """
    Does not usually take any arguments. A function that normally would have
    access to `self`.

    Unsure of what the use case for this might be 
    """
    # absolute hack using globals
    puzzle = [row[0] for row in globals()[cls_name]]
    puzzle_sorted = sorted(puzzle, key=lambda tup: tup[1])
    xs = [float(x[0]) for x in puzzle_sorted]
    print(f"After_iter {xs}")
    # pass
    return puzzle

def create_batches(a):
    """
    Receives a **generator** that contains the **indices** of items to be included in each batch
    """
    print(f"create_batches called on {next(iter(a))} \n")
    return a

def create_item(a):
    """
    Passed the **indices** of the items that will form the batch.
    It is therefore called **before** a batch is created (ie. before `before_batch`)
    Therefore if `shuffle=True`, it will receive random indices
    """
    print(f"create_item called on {a} \n")
    return a

def custom_collate(a):
    """
    Generally, a collate function **reshapes** the data from a list of tuples 
    (where the first item in the tuple is the `x` value, the second is the `y`)
    to an *iterable* (ie. list, tensor, or some other collection) **of tensors**
    (where the first tensor is **all of the `x` values in the batch**, and the
    second is all of the `y` values)

    For example, pulled directly from the torch source code:
    >>> default_collate([(0, 1), (2, 3)])
    [tensor([0, 2]), tensor([1, 3])]

    Notice how the batch above is "reshuffled"

    Why might you create a custom collate function? The easiest example is in NLP
    when you want to pad sequences to be the length of the longest item in the batch
    """
    # Note this is a very poor implementation; just for learning purposes
    # It does not convert the items to a tensor
    return list(zip(*a))

def create_batch(a):
    """
    Generally calls a collate function. By supplying this argument, we should 
    have some collate function run here
    """
    # print(f"create_batch called on {custom_collate(a)} \n")
    # return custom_collate(a)
    return a

def after_batch(a):
    """
    The actual batch output. Any final **batch transformations** can be applied after
    the `collate_fn` has run in `create_batch` (ie. if your transforms need
    to take into consideration any changes to the batch during collate, then you would
    apply them here)
    """
    a = s(a,8,7,6.0)
    a = s(a,8,1,9.0)
    a = s(a,2,4,2.0)
    a = s(a,2,5,4.0)
    a = s(a,3,6,9.0)
    a = s(a,2,8,5.0)
    a = s(a,4,9,2.0)
    a = s(a,4,8,6.0)
    a = s(a,8,9,8.0)
    a = s(a,8,8,1.0)
    return a

In [93]:
dls = TfmdDL(
    dsets,
    bs=2,   # change this value to see its effects
    # before_iter=before_iter,
    after_item=after_item,
    before_batch=before_batch,
    # after_iter=partial(after_iter, "dls"),
    # create_item=create_item,
    create_batch=create_batch,
    after_batch=after_batch,
    # create_batches=create_batches,
    shuffle=False # change to see the impact
)
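
The order in which these hooks fire was the hardest part for me to keep straight, so here is a simplified plain-Python imitation of it. This is a sketch under my own assumptions, not fastai's implementation: `run_loader` and `logged` are hypothetical names, there is no shuffling, no workers, and `after_item` is applied to the whole `(x, y)` tuple rather than to `x` and `y` separately.

```python
def identity(value):
    return value


def run_loader(items, bs=2, after_item=identity, before_batch=identity,
               create_batch=identity, after_batch=identity):
    """Hypothetical imitation of the hook order for each batch."""
    for start in range(0, len(items), bs):
        samples = [after_item(item) for item in items[start:start + bs]]  # per item
        samples = before_batch(samples)   # list of (x, y) tuples, not yet collated
        batch = create_batch(samples)     # normally the collate step
        yield after_batch(batch)          # final per-batch changes


calls = []


def logged(name):
    """Build a hook that records its own name and passes the value through."""
    def hook(value):
        calls.append(name)
        return value
    return hook


rows = [([5.0, 8.0], 0), ([6.0, 9.0], 1), ([4.0, 1.0], 2)]
batches = list(run_loader(
    rows, bs=2,
    after_item=logged("after_item"), before_batch=logged("before_batch"),
    create_batch=logged("create_batch"), after_batch=logged("after_batch"),
))
print(calls)
```

With three rows and `bs=2`, the hooks run twice: once for a full batch of two items and once for the leftover item.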


Use the `fs.check()` function to check your answers.

In [94]:
fs.check(dls)

get_y value is 0.0 

get_y value is 2.0 

after_item called on [5.0, nan, 8.0, 6.0, 7.0, 1.0, 3.0, 9.0, 4.0] 
after_item called on [4.0, 1.0, 7.0, 5.0, 3.0, nan, 2.0, 8.0, 6.0] 

after_item called on 0.0 

get_y value is 1.0 


after_item called on 2.0 
after_item called on [6.0, 9.0, 3.0, nan, nan, 8.0, 1.0, nan, 7.0] 

after_item called on 1.0 

before_batch called on [([5.0, nan, 8.0, 6.0, 7.0, 1.0, 3.0, 9.0, 4.0], 0.0), ([6.0, 9.0, 3.0, nan, nan, 8.0, 1.0, nan, 7.0], 1.0)] 

get_y value is 4.0 


after_item called on [1.0, 6.0, 9.0, nan, 2.0, 5.0, 8.0, 4.0, 3.0] 
get_y value is 3.0 


after_item called on [8.0, nan, 5.0, 4.0, nan, 3.0, 9.0, nan, nan] 
after_item called on 4.0 


after_item called on 3.0 
get_y value is 5.0 


before_batch called on [([4.0, 1.0, 7.0, 5.0, 3.0, nan, 2.0, 8.0, 6.0], 2.0), ([8.0, nan, 5.0, 4.0, nan, 3.0, 9.0, nan, nan], 3.0)] 
after_item called on [2.0, 3.0, 4.0, nan, 8.0, 6.0, nan, 7.0, 1.0] 


after_item called on 5.0 
get_y value is 6.0 


before_ba

# Notes

Below are the notes and summaries that I picked up on as part of doing this Sudoku exercise.

##### General fastai 
* When first starting a fastai project, it's critical to understand that everything leading up to creating a fastai `Learner` is just **data preparation** into the format that a PyTorch model can interpret (in other words, pre-learner tasks are just ETL steps). This may already be obvious to many, but for some reason it took ages to get through my head.

## DataLoaders


* Their sole objective is to output a **list of tuples**
    * where for each item in the list, the first element of the tuple is a single `x` value (independent variable), and the second element of the tuple is the `y` value (the dependent variable)
        * In creating a `DataLoaders` object, we apply transforms to the data either as we are fetching the item (called `item_tfms`) or after the batch is collated (called `batch_tfms`)
    * Note - when batched, all Xs and all Ys are **stacked into a single tuple**
    
##### TfmdDL - why did we use this class?
* Inherits all of the transforms and callbacks applied to the `DataLoaders` object and applies them to an iterable.
    * In theory, you could prevent fastai from using this class when creating a `DataLoaders` object by supplying an alternative class to the `dl_type` argument.
* Under the hood, **this is the major class in fastai data preparation** that ultimately structures the data for use in the `Learner`

##### Collate
* I never understood this word even though I saw it everywhere. `fa_collate` and the PyTorch `default_collate` are the actual functions that create a batch. If they aren't applied, each "batch" would just be a list of separate tuples, where the first element of each tuple is a single input and the second element is a single target. 
    * When these functions are used, each one "Puts each data field into a tensor with outer dimension batch size". In other words, they stack the items of a batch together - one stack for inputs, one stack for targets
    * These functions are what are called if you do not specify the `create_batch` callback
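
The stacking described above can be sketched in plain Python. This is only an illustration with lists: `simple_collate` is a hypothetical name, and the real collate functions return tensors, not lists.

```python
def simple_collate(samples):
    """Turn [(x0, y0), (x1, y1), ...] into ([x0, x1, ...], [y0, y1, ...])."""
    xs, ys = zip(*samples)   # transpose: one group of inputs, one group of targets
    return list(xs), list(ys)


samples = [([5.0, 8.0, 6.0], 0), ([6.0, 9.0, 3.0], 1)]
xs, ys = simple_collate(samples)
print(xs)
print(ys)
```

Before collating there is one tuple per item; afterwards there is one entry per field, each holding a value for every item in the batch.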
        
##### DataBlock
* Prepackaged transforms for the most common types of data transformations in deep learning

##### Transform
* Converting the data (inputs/x and targets/y) into a format the computer understands and can perform matrix math on - tensors
    * For deep learning, a transform directly or indirectly (via a **`Pipeline`**) converts a piece of data into a tensor
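
As a rough mental model of a pipeline, here is a plain-Python sketch that chains small transforms left to right. The helper names (`compose`, `fill_blank`, `to_float`) are my own assumptions; fastai's `Pipeline` does considerably more (for example ordering and decoding), so treat this only as an illustration of the chaining idea.

```python
from functools import reduce


def compose(*funcs):
    """Hypothetical helper: apply funcs left to right, like a very small pipeline."""
    def run(value):
        return reduce(lambda acc, func: func(acc), funcs, value)
    return run


def fill_blank(cell):
    # Represent an empty Sudoku cell as 0
    return 0 if cell is None else cell


def to_float(cell):
    return float(cell)


encode_cell = compose(fill_blank, to_float)
encoded = [encode_cell(cell) for cell in [5, None, 8]]
print(encoded)
```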

In [16]:
??DataLoaders

## DataBlock

fastai link: https://docs.fast.ai/data.block.html

A `DataBlock` is the quickest way to create a `DataLoaders` object; it is the most *abstracted* class from pure PyTorch. It should be used first when there is not much customization needed.

Remember - blocks are just **pre-packaged transforms**; they exist for the most common types of ML tasks (ie. `CategoryBlock`, `ImageBlock`)

### Blocks

We just said blocks are pre-packaged transforms. What does this really mean?

Let's look at 2 common blocks: `ImageBlock` and `CategoryBlock`

In [17]:
CategoryBlock??

In [18]:
ImageBlock??

This is interesting. Both of these return a `TransformBlock`, and if you look at that class you will notice there are no methods beyond `__init__` - the only thing it does is **store transforms** as attributes.

Now take a look back at `ImageBlock` and `CategoryBlock` - these are both **functions** and not classes (despite them using class formatting)

To recap:
We know that blocks exist to store transforms, and each one is (or returns) a `TransformBlock`. We have seen they are (generally) the first argument of the DataBlock API.

> Note this term "DataBlock API" confused me for a while - in fact for a long time the term API in general caused confusion. To me, API in this sense just means a *callable* (function, class, url) that *abstracts* more complex code

In [19]:
DataBlock??

The `L()` object has the `attrgot` method, which is how fastai can extract class attributes from a `TransformBlock`. This took me forever to understand.
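
What finally made this click for me is that collecting an attribute from every block is just a loop with `getattr`. The sketch below imitates the idea in plain Python; `FakeBlock` and this standalone `attrgot` function are hypothetical stand-ins, not the fastai/fastcore classes.

```python
class FakeBlock:
    """Hypothetical stand-in for a block: it only stores transforms as attributes."""

    def __init__(self, type_tfms=None, item_tfms=None):
        self.type_tfms = list(type_tfms or [])
        self.item_tfms = list(item_tfms or [])


def attrgot(objects, name, default=None):
    """Plain-Python imitation: collect one attribute from every object."""
    return [getattr(obj, name, default) for obj in objects]


blocks = [
    FakeBlock(type_tfms=["open_image"], item_tfms=["to_tensor"]),
    FakeBlock(type_tfms=["categorize"]),
]
type_tfms = attrgot(blocks, "type_tfms")
print(type_tfms)
```

The result has one entry per block, in block order, which is how the transforms for `x` and `y` stay lined up.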

#### Getters

While I tried to use this as well, it turns out that `getters` essentially apply `get_x` and `get_y` in a single step. So when would you actually use them instead? A [great fastai forum post](https://forums.fast.ai/t/fastai-v2-recipes-tips-and-tricks-wiki/64486?u=stantonius) gave an example of when your `get_items` function returns a list of tuples (and you subsequently need to extract the first item of each tuple as `x` and the second item as `y`)
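
The list-of-tuples case can be sketched in plain Python. Here the standard library's `operator.itemgetter` stands in for the getter functions; treating it as a drop-in for what you would pass to `getters` is my assumption, so check the fastai docs before relying on it.

```python
from operator import itemgetter

# Pretend get_items returned a list of (x, y) tuples
items = [([5.0, 8.0, 6.0], 0), ([6.0, 9.0, 3.0], 1)]

# One getter per output: the first pulls out x, the second pulls out y
getters = [itemgetter(0), itemgetter(1)]

xs = [getters[0](item) for item in items]
ys = [getters[1](item) for item in items]
print(xs)
print(ys)
```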

### Mini-Batch Shuffling

Training mini-batches are, by design, drawn from **shuffled** data so that the model does not overfit to whatever happens to sit together in each mini-batch. 

Without shuffling, each mini-batch may contain similar data (meaning each training step adjusts the model to learn only about that specific type of data, which slows training) or the model may learn from the data order/sequence (which hurts generalizability, the ultimate goal of a good model).
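
A small plain-Python sketch shows the "similar data in each mini-batch" problem. The `make_batches` helper and the cat/dog labels are assumptions made up for this illustration; the real loader has its own shuffling logic.

```python
import random


def make_batches(items, bs, shuffle=False, seed=0):
    """Hypothetical helper: split items into batches, optionally shuffling the order first."""
    idxs = list(range(len(items)))
    if shuffle:
        random.Random(seed).shuffle(idxs)   # reproducible shuffle of the indices
    return [[items[i] for i in idxs[start:start + bs]]
            for start in range(0, len(idxs), bs)]


labels = ["cat"] * 4 + ["dog"] * 4          # data stored sorted by class
ordered = make_batches(labels, bs=4)
shuffled = make_batches(labels, bs=4, shuffle=True)
print(ordered)
print(shuffled)
```

Without shuffling, the first batch is all cats and the second is all dogs, so each training step only ever sees one class. Shuffling keeps the same items but mixes which ones end up together.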

## Next Steps

* If I can find proper definitions for each of these functions, then I would like to compare someone's response to these definitions and use a model to see how similar their explanations are
* Would like to add a summary table for all of our notes (had this working with `patch_to` but was a little messy)
* Come up with a way to suppress or decide when to print statements (so that printing doesn't happen in each cell)
* Add some colour to text on the board a) where your guesses are and b) if they are correct or not