# fastai Sudoku

> "Use Sudoku to help learn fastai data preparation"

- toc: true
- branch: master
- badges: true
- comments: true
- author: Craig Stanton
- hide: false
- categories: [fastai]

Use a Sudoku puzzle to learn more about the fastai `DataBlock`, `Datasets`, `DataLoaders` and `TfmdDL` objects.

Why even bother with this? Jeremy's advice is to build models as quickly as you can. However, I often find myself tripping at the first hurdle: the data preparation stage. No matter how many times I read about `DataBlock`s and `DataLoaders`, there is no substitute for actually using the libraries. So why not set up a game that requires you to build and solve a puzzle using the same tools you need to structure the data for a fastai `Learner`?

In [1]:
!pip install py-sudoku



In [2]:
from fastai.text.all import *
from fastai.vision.all import *
from sudoku import Sudoku
from functools import wraps

## Challenge Overview

There are two parts to this challenge. The first is to create the Sudoku board via the `DataBlock` API. The second is to use the `DataLoaders` callbacks to actually solve the Sudoku puzzle.


Questions I hope to answer along the way:
* What is the purpose of all of these classes?
* What is a `DataBlock`? What are the blocks within a `DataBlock`?
* What is the difference between a `Datasets` object and a `DataLoaders` object?
* What is a `DataLoaders` object vs a `TfmdDL` object?
* What callbacks are called in what order?
* When would I use each type of callback?
* What is `fa_collate` and which callback is it used in by default?
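
For the last question, it may help to picture collation in plain Python before looking at fastai's version. The sketch below is a toy stand-in, not `fa_collate` itself: `toy_collate` is a hypothetical name, and it works on lists rather than tensors. It turns a list of `(x, y)` samples into one `(xs, ys)` batch.

```python
def toy_collate(samples):
    """Turn a list of (x, y) samples into a single (xs, ys) batch.

    Hypothetical helper for illustration only; real collate functions
    also stack the values into tensors.
    """
    # zip(*samples) transposes [(x0, y0), (x1, y1)] into (x0, x1), (y0, y1)
    return tuple(list(column) for column in zip(*samples))


# Two Sudoku rows, each paired with its row index
samples = [([5, 3, None], 0), ([6, None, None], 1)]
xs, ys = toy_collate(samples)
print(xs)  # [[5, 3, None], [6, None, None]]
print(ys)  # [0, 1]
```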


## Setup

In [None]:
class FastSudoku:
    """
    Learn how to use the fastai DataBlock, Datasets, and DataLoaders transforms and callbacks 
    by creating and solving a sudoku puzzle
    """
    
    def __init__(self, difficulty: float = 0.25, data_dir: Path = Path(".")):
        self.puzzle = Sudoku(3).difficulty(difficulty)
        self.solved = self.puzzle.solve().board
        pd.DataFrame(self.puzzle.board).to_csv(data_dir/"fastsudoku.csv", index=False)
        
    def __repr__(self):
        return "Puzzle created"
    
    @staticmethod
    def np2list(x): return None if np.isnan(x) else int(x)
    
    def check(self, dls):
        """
        Unpack the dataloaders output, convert each row to a list of ints
        (None for empty cells) and compare it against the solved board
        """
        holder = []
        for dl in dls:
            holder+=dl
        self.preds = [list(map(self.np2list,j)) for j in [i[0] for i in holder]]
        # One boolean per row: does the row match the solution?
        matches = [self.solved[i] == x for i, x in enumerate(self.preds)]
        print(matches)
        # The puzzle is solved only if every row matches
        if matches and all(matches):
            print("Yes you are a fastai...and sudoku...whiz!")
        else:
            print("Try again!")
            


In [None]:
# difficulty can be any value 0-1 (easy-hard)
fs = FastSudoku(difficulty=0.1)

### Challenge 1 - Datasets

**Instructions**:

* Grab the `fastsudoku.csv` puzzle and create a `DataBlock`
* You should try and use *as many* of the functions below as DataBlock arguments - not necessarily all, but as many as you can/wish. The point of this is not to be the most efficient or practical way of creating a `DataBlock` but rather to understand what each function argument does.
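
To make the roles of these arguments concrete, here is a plain-Python sketch of the order in which a `DataBlock` uses them when building `Datasets`: `get_items` turns the source into a list of items, `splitter` turns the items into lists of indices, and `get_x`/`get_y` pull the input and the label out of each item. It is a simplified model rather than fastai's implementation. `toy_datasets`, the four helper functions and the three-row CSV are all made up for illustration, and type transforms, `item_tfms` and `batch_tfms` are left out.

```python
import csv
import io

# A made-up 3x3 "board" laid out like a CSV file with a header row
CSV_TEXT = "0,1,2\n5,3,\n6,,\n,9,8\n"


def read_rows(source):
    # Source -> list of items: one (row_index, cells) pair per CSV row
    rows = list(csv.reader(io.StringIO(source)))[1:]  # skip the header
    return list(enumerate(rows))


def all_rows(items):
    # Items -> lists of indices, one list per subset (a single subset here)
    return [list(range(len(items)))]


def row_cells(item):
    # Item -> input: empty cells become None, the rest become ints
    return [int(c) if c else None for c in item[1]]


def row_index(item):
    # Item -> label: the row index
    return item[0]


def toy_datasets(source, get_items, splitter, get_x, get_y):
    # Hypothetical stand-in for DataBlock.datasets: apply each step in turn
    items = get_items(source)
    return [[(get_x(items[i]), get_y(items[i])) for i in idxs]
            for idxs in splitter(items)]


subsets = toy_datasets(CSV_TEXT, get_items=read_rows, splitter=all_rows,
                       get_x=row_cells, get_y=row_index)
print(subsets[0][0])  # ([5, 3, None], 0)
```

Swapping one helper at a time, for example a `get_y` that returns a constant, is a cheap way to see which step is responsible for which part of the output.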

*Hints:*
1. The `y` values are the row indices. They are not dependent variables in the usual sense; they are a tool for keeping track of rows when batches are processed out of sequential order.
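
A small plain-Python sketch of why carrying the row index is useful: if rows come back in a different order than they were stored, the index is enough to put the board back together. The data and the `restore_order` helper are hypothetical.

```python
def restore_order(samples):
    """Sort (row, row_index) samples by row index and return just the rows."""
    return [row for row, idx in sorted(samples, key=lambda s: s[1])]


# Rows arriving out of order, each tagged with its original position
shuffled = [([7, 8, 9], 2), ([1, 2, 3], 0), ([4, 5, 6], 1)]
print(restore_order(shuffled))  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```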

*Tips*:
1. Don't be afraid to comment out lines to see how the absence of functions changes the output
2. Use print statements 

In [15]:
class DataBlockNotes:
    """
    Class to hold and access all function comments to make a summary if desired
    """
    pass

@patch_to(DataBlockNotes)
def get_items():
    """
    TODO: Your summary of what this function does
    """
    pass

@patch_to(DataBlockNotes)
def get_x():
    """
    TODO: Your summary of what this function does
    """
    pass

@patch_to(DataBlockNotes)
def get_y():
    """
    TODO: Your summary of what this function does
    """
    pass

@patch_to(DataBlockNotes)
def item_tfms():
    """
    TODO: Your summary of what this function does
    """
    pass

@patch_to(DataBlockNotes)
def batch_tfms():
    """
    TODO: Your summary of what this function does
    """
    pass

# predefined
def splitter(items):
    # In this exercise, we don't need train and validation sets
    # But never forget about them because they're so important!
    # fastai calls the splitter with the items, so return one list holding every index
    return [list(range(len(items)))]

Create the `DataBlock` and `Datasets`

In [None]:
dblock = DataBlock(
    get_items=get_items,
    get_x=get_x,
    get_y=get_y,
    # batch_tfms=batch_tfms,
    # item_tfms=item_tfms,
    splitter=splitter,
)

dsets = dblock.datasets("fastsudoku.csv")

In [None]:
# Check your output - does it look like this?
fs.puzzle.show()

dsets.items

### Challenge 2 - DataLoader Sudoku

**Instructions**:

* Use the `DataLoaders` callbacks to modify the `Datasets` you just created to solve the Sudoku board
* Test your `DataLoaders` object against the puzzle using the `fs.check(dls)` method
* You should try and use *as many* of the callback functions below - not necessarily all, but as many as you can/wish. The point of this is not to be the most efficient or practical way of creating a `DataLoaders` but rather to understand what each function argument does.
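
As a mental model for the callbacks in this challenge, the sketch below mimics the order in which a fastai `DataLoader` calls its hooks during one pass over the data, based on a simplified reading of the library's iteration logic: `before_iter`, then `create_batches`, which runs `create_item` and `after_item` for every sample and `before_batch` and `create_batch` for every chunk, then `after_batch` on each batch, and finally `after_iter`. `ToyLoader` is a hypothetical plain-Python class, not fastai code. It ignores shuffling, workers and devices, and every hook is a no-op that only records its own name.

```python
class ToyLoader:
    """Hypothetical, simplified loader that logs the order of its hooks."""

    def __init__(self, dataset, bs):
        self.dataset, self.bs, self.log = dataset, bs, []

    def _hook(self, name, value=None):
        # Every hook is a no-op here: record its name, pass the value through
        self.log.append(name)
        return value

    def _do_batch(self, chunk):
        return self._hook("create_batch", self._hook("before_batch", chunk))

    def create_batches(self, idxs):
        self.log.append("create_batches")
        chunk = []
        for i in idxs:
            item = self._hook("create_item", self.dataset[i])
            chunk.append(self._hook("after_item", item))
            if len(chunk) == self.bs:
                yield self._do_batch(chunk)
                chunk = []
        if chunk:  # last, possibly smaller, batch
            yield self._do_batch(chunk)

    def __iter__(self):
        self._hook("before_iter")
        for batch in self.create_batches(range(len(self.dataset))):
            yield self._hook("after_batch", batch)
        self._hook("after_iter")


dl = ToyLoader(dataset=["row0", "row1", "row2"], bs=2)
batches = list(dl)
print(batches)  # [['row0', 'row1'], ['row2']]
print(dl.log)   # hook names in the order they ran
```

Item-level hooks run once per sample and batch-level hooks once per batch, which is worth keeping in mind when deciding where a row-wise fix belongs and where a whole-batch fix belongs.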

*Tips*:
1. Don't be afraid to comment out lines to see how the absence of functions changes the output
2. Use print statements 
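
One way to follow the second tip without editing each function body is a small tracing decorator. This is a generic Python sketch, not part of fastai; `traced` and `identity_hook` are hypothetical names. It prints the arguments and the return value on every call, and `functools.wraps` keeps the wrapped function's name and docstring intact.

```python
from functools import wraps


def traced(func):
    """Print the arguments and return value each time `func` is called."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        print(f"{func.__name__}(args={args}, kwargs={kwargs}) -> {result!r}")
        return result
    return wrapper


@traced
def identity_hook(a):
    """Example hook that passes its input through unchanged."""
    return a


identity_hook(([5, 3, None], 0))
# prints: identity_hook(args=(([5, 3, None], 0),), kwargs={}) -> ([5, 3, None], 0)
```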

The Sudoku puzzle is below:

In [None]:
fs.puzzle.show()

Functions to use

In [3]:
class TfmdDLNotes:
    """
    Class to hold and access all function comments to make a summary if desired
    """
    pass

@patch_to(TfmdDLNotes)
def before_iter():
    """
    TODO: Your summary of what this function does
    """
    pass

@patch_to(TfmdDLNotes)
def after_item(a):
    """
    TODO: Your summary of what this function does
    """
    return a

@patch_to(TfmdDLNotes)
def before_batch(a):
    """
    TODO: Your summary of what this function does
    """
    return a

@patch_to(TfmdDLNotes)
def after_iter():
    """
    TODO: Your summary of what this function does
    """
    pass

@patch_to(TfmdDLNotes)
def create_batches(a):
    """
    TODO: Your summary of what this function does
    """
    return a

@patch_to(TfmdDLNotes)
def create_item(a):
    """
    TODO: Your summary of what this function does
    """
    return a

@patch_to(TfmdDLNotes)
def create_batch(a):
    """
    TODO: Your summary of what this function does
    """
    return a

@patch_to(TfmdDLNotes)
def after_batch(a):
    """
    TODO: Your summary of what this function does
    """
    return a

In [None]:
dls = TfmdDL(
    dsets,
    bs=2,   # keep as 2 for this exercise; can be any number below 10
    before_iter=before_iter,
    after_item=after_item,
    before_batch=before_batch,
    after_iter=after_iter,
    create_item=create_item,
    create_batch=create_batch,
    after_batch=after_batch,
    create_batches=create_batches
)

In [None]:
fs.check(dls)

## Your Notes

In [16]:
for k,v in DataBlockNotes.__dict__.items():
    if "__" not in k:
        print({k: rm_useless_spaces(v.__doc__).strip()})

{'get_items': 'TODO: Your summary of what this function does'}
{'get_x': 'TODO: Your summary of what this function does'}
{'get_y': 'TODO: Your summary of what this function does'}
{'item_tfms': 'TODO: Your summary of what this function does'}
{'batch_tfms': 'TODO: Your summary of what this function does'}


In [13]:
for k,v in TfmdDLNotes.__dict__.items():
    if "__" not in k:
        print({k: rm_useless_spaces(v.__doc__).strip()})

{'before_iter': 'TODO: Your summary of what this function does'}
{'after_item': 'TODO: Your summary of what this function does'}
{'before_batch': 'TODO: Your summary of what this function does'}
{'after_iter': 'TODO: Your summary of what this function does'}
{'create_batches': 'TODO: Your summary of what this function does'}
{'create_item': 'TODO: Your summary of what this function does'}
{'create_batch': 'TODO: Your summary of what this function does'}
{'after_batch': 'TODO: Your summary of what this function does'}
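
The same summary can be built with the standard library alone. The sketch below is an alternative to the loops above, not a replacement for them: `collect_notes` and `ExampleNotes` are hypothetical names, `inspect.getdoc` strips the docstring indentation, and `" ".join(text.split())` collapses the remaining whitespace.

```python
import inspect


def collect_notes(cls):
    """Map each function defined on `cls` to its whitespace-normalised docstring."""
    return {
        name: " ".join((inspect.getdoc(obj) or "").split())
        for name, obj in vars(cls).items()
        if callable(obj) and not name.startswith("__")
    }


class ExampleNotes:
    def get_x():
        """
        Pull the nine cell values
            out of one row
        """


print(collect_notes(ExampleNotes))
# {'get_x': 'Pull the nine cell values out of one row'}
```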
