# Debugging the API - Part 1
> A warm introduction to `fastai`'s built-in debugging tools

- toc: true
- badges: true
- comments: true
- image: images/chart-preview.png
- category: DataBlock

---
This blog post is also a Jupyter notebook that can be run from top to bottom, and its code snippets can be run in any environment. Here are the versions of `fastai2` and `fastcore` I was running at the time of writing:

* `fastai2`: 0.0.15
* `fastcore`: 0.1.16

---

# `DataBlock` Part 3 - Debugging the High-Level API

So we have the `DataBlock` API, which lets us write a blueprint of how our data should be built, but how do we debug it? Can we check whether a `DataLoader` can even be built? Or foresee bugs that would otherwise only show up later?

**Yes!** Thanks to the valiant efforts of Sylvain Gugger, this is possible through `datablock.summary()`. In today's article we will explore just how useful `datablock.summary()` can be and how we can apply this to the Mid-Level API.

We'll again be working with example problems from computer vision.

```python
from fastai2.vision.all import *
```

To begin with, let's build our simple `PETS` `DataBlock` step by step.

First, grab our data:

```python
path = untar_data(URLs.PETS)
```

Then define how we want to extract the class from each filename (with a regular expression) and how we want to split the data:

```python
pat = r'([^/]+)_\d+.*$'
splitter = RandomSplitter(valid_pct=0.2, seed=42)
```
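To see what these two pieces do on their own, here is a plain-Python sketch. `label_from_path` applies the same pattern with `re.search` (searching anywhere in the full path), and `random_split` is a hypothetical helper that shuffles indices and keeps the first 20% for validation. This is only meant to mirror the idea behind `RandomSplitter`; fastai uses its own random number generator, so the actual indices it picks will differ.

```python
import random
import re

# The same pattern as above: capture everything before the final "_<digits>"
pat = r'([^/]+)_\d+.*$'

def label_from_path(path: str) -> str:
    "Return the first capture group of `pat`, searched anywhere in `path`."
    match = re.search(pat, path)
    if match is None:
        raise ValueError(f"pattern did not match {path!r}")
    return match.group(1)

def random_split(n_items: int, valid_pct: float = 0.2, seed: int = 42):
    "Hypothetical helper: shuffle indices, then take the first `valid_pct` of them as validation."
    idxs = list(range(n_items))
    random.Random(seed).shuffle(idxs)
    cut = int(valid_pct * n_items)
    return idxs[cut:], idxs[:cut]  # (train indices, valid indices)

print(label_from_path('/root/.fastai/data/oxford-iiit-pet/images/Ragdoll_10.jpg'))    # Ragdoll
print(label_from_path('/root/.fastai/data/oxford-iiit-pet/images/shiba_inu_26.jpg'))  # shiba_inu

train, valid = random_split(7390)
print(len(train), len(valid))  # 5912 1478
```

Note that the greedy `([^/]+)` keeps multi-word breeds such as `shiba_inu` intact, because only the *last* `_<digits>` has to match.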

And finally, our augmentations:

```python
item_tfms = [Resize(224, method='crop')]
batch_tfms = [*aug_transforms(size=256), Normalize.from_stats(*imagenet_stats)]
```
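`Normalize.from_stats(*imagenet_stats)` standardizes each color channel as `(x - mean) / std`, using the per-channel ImageNet statistics. A minimal sketch on a single RGB pixel, assuming its channels are already floats in `[0, 1]`; `normalize_pixel` is a made-up helper, not part of fastai:

```python
# Per-channel (R, G, B) ImageNet statistics
imagenet_mean = (0.485, 0.456, 0.406)
imagenet_std = (0.229, 0.224, 0.225)

def normalize_pixel(rgb, mean=imagenet_mean, std=imagenet_std):
    "Standardize one RGB pixel whose channels are floats in [0, 1]."
    return tuple((c - m) / s for c, m, s in zip(rgb, mean, std))

print(normalize_pixel((0.485, 0.456, 0.406)))  # (0.0, 0.0, 0.0)
print(normalize_pixel((1.0, 1.0, 1.0)))        # each channel ends up around 2.2 to 2.6
```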

Now, to build the `DataBlock`, we'd usually do something like this:

```python
dblock = DataBlock(blocks=(ImageBlock, CategoryBlock),
                   get_items=get_image_files,
                   splitter=splitter,  # the seeded RandomSplitter defined above
                   get_y=RegexLabeller(pat=pat),
                   item_tfms=item_tfms,
                   batch_tfms=batch_tfms)
```

And then we'd do something like this to build our `DataLoaders`:

```python
dls = dblock.dataloaders(path/'images')
```

However! We won't be doing that today!

# `dblock.summary()`

Every `DataBlock` we make has a `.summary()` method, which takes the same source we would pass when building our `DataLoaders`. For our problem, that is `path/'images'`.

Now what *exactly* does `dblock.summary()` do? It runs through your entire `DataBlock` and attempts to build one batch of data, printing out each step it walks through and what that step produces.
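The core idea (apply each step in order and print what it produces) can be mimicked with a tiny toy function. This is a hypothetical sketch, not fastai's implementation: `summarize` and both steps are made up for illustration. Because nothing is caught, an exception in any step would surface right after the last successful step was printed, which is what makes this style of tracing handy for debugging.

```python
def summarize(item, steps):
    "Hypothetical toy: apply each (name, fn) step in order, printing every intermediate result."
    print("  starting from")
    print(f"    {item!r}")
    for name, fn in steps:
        item = fn(item)
        print(f"  applying {name} gives")
        print(f"    {item!r}")
    return item

result = summarize("images/Ragdoll_10.jpg", [
    ("strip_dir", lambda p: p.rsplit("/", 1)[-1]),  # 'Ragdoll_10.jpg'
    ("strip_ext", lambda f: f.rsplit(".", 1)[0]),   # 'Ragdoll_10'
])
```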

Let's see that here:

```python
dblock.summary(path/'images')
```

```python
Setting-up type transforms pipelines
Collecting items from /root/.fastai/data/oxford-iiit-pet/images
Found 7390 items
2 datasets of sizes 5912,1478
Setting up Pipeline: PILBase.create
Setting up Pipeline: RegexLabeller -> Categorize

Building one sample
  Pipeline: PILBase.create
    starting from
      /root/.fastai/data/oxford-iiit-pet/images/Ragdoll_10.jpg
    applying PILBase.create gives
      PILImage mode=RGB size=500x375
  Pipeline: RegexLabeller -> Categorize
    starting from
      /root/.fastai/data/oxford-iiit-pet/images/Ragdoll_10.jpg
    applying RegexLabeller gives
      Ragdoll
    applying Categorize gives
      TensorCategory(8)

Final sample: (PILImage mode=RGB size=500x375, TensorCategory(8))


Setting up after_item: Pipeline: Resize -> ToTensor
Setting up before_batch: Pipeline: 
Setting up after_batch: Pipeline: IntToFloatTensor -> AffineCoordTfm -> LightingTfm -> Normalize

Building one batch
Applying item_tfms to the first sample:
  Pipeline: Resize -> ToTensor
```

Wow, that's a *lot* of information being thrown at us! Let's break it down piece by piece.

## Setup

First, we can see a number of setup steps being run:

```python
Setting-up type transforms pipelines
Collecting items from /root/.fastai/data/oxford-iiit-pet/images
Found 7390 items
2 datasets of sizes 5912,1478
Setting up Pipeline: PILBase.create
Setting up Pipeline: RegexLabeller -> Categorize
```

1. First, we grabbed all of our `x`s: 7390 image files.
2. Then we split them into a training set (5912 items) and a validation set (1478 items). This is done by `RandomSplitter`, which is also what `DataBlock` uses by default.
3. Then we set up our two type-transform `Pipeline`s. A `Pipeline` is simply a set of transforms applied in a particular order.
   * For our `x`s, the `Pipeline` calls `PILBase.create` (which comes from our `ImageBlock`).
   * For our `y`s, the `Pipeline` first calls our `RegexLabeller` and then `Categorize`. `Categorize` builds a vocabulary of all the possible categories and maps each one to an integer (i.e. `class_a` is 0, `class_b` is 1), so that labels can be encoded as numbers.
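To make the `Pipeline` and `Categorize` ideas concrete, here is a plain-Python sketch. `ToyPipeline` and `ToyCategorize` are hypothetical stand-ins, not fastai classes; the label extraction uses string splitting instead of the regex to keep it short, and building the vocabulary with a plain `sorted` is an assumption about how the categories get ordered.

```python
class ToyPipeline:
    "Hypothetical stand-in for a Pipeline: call each function in order."
    def __init__(self, *funcs):
        self.funcs = funcs

    def __call__(self, x):
        for f in self.funcs:
            x = f(x)
        return x

class ToyCategorize:
    "Hypothetical stand-in for Categorize: map each label to its index in a sorted vocab."
    def __init__(self, labels):
        self.vocab = sorted(set(labels))  # plain string sort: uppercase sorts before lowercase
        self.o2i = {o: i for i, o in enumerate(self.vocab)}

    def __call__(self, label):
        return self.o2i[label]

def label_from_name(path):
    "Drop the directory and the trailing _<number>.jpg, e.g. 'images/Ragdoll_10.jpg' -> 'Ragdoll'."
    return path.rsplit('/', 1)[-1].rsplit('_', 1)[0]

encode = ToyCategorize(['shiba_inu', 'Ragdoll', 'Bengal', 'Ragdoll'])
print(encode.vocab)  # ['Bengal', 'Ragdoll', 'shiba_inu']

y_pipeline = ToyPipeline(label_from_name, encode)
print(y_pipeline('images/Ragdoll_10.jpg'))  # 1
```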

So now we've gone through the setup, and it all checks out! Let's dig a little deeper:

## Building One Sample

The next thing `dblock.summary()` will attempt is to build *one* sample. This is one individual sample, **not** a batch. Let's read its output:


```python
Building one sample
  Pipeline: PILBase.create
    starting from
      /root/.fastai/data/oxford-iiit-pet/images/Ragdoll_10.jpg
    applying PILBase.create gives
      PILImage mode=RGB size=500x375
  Pipeline: RegexLabeller -> Categorize
    starting from
      /root/.fastai/data/oxford-iiit-pet/images/Ragdoll_10.jpg
    applying RegexLabeller gives
      Ragdoll
    applying Categorize gives
      TensorCategory(8)

Final sample: (PILImage mode=RGB size=500x375, TensorCategory(8))
```

This reads plainly: we started from a particular filename and applied `PILBase.create` to get a `PILImage` to work with. We then applied our `RegexLabeller` and `Categorize` to the same filename to get our encoded category, `TensorCategory(8)`. Finally, we can see the resulting sample as an `(x, y)` tuple.
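The key point is that both pipelines start from the *same* item (the filename) and their results are paired into a tuple. A minimal sketch of that, where `fake_open_image`, `label_from_name`, and the two-entry `vocab` are hypothetical stand-ins for `PILBase.create`, `RegexLabeller`, and `Categorize`:

```python
from functools import reduce

def run(funcs, x):
    "Apply `funcs` left to right, the way a Pipeline chains its transforms."
    return reduce(lambda acc, f: f(acc), funcs, x)

def build_sample(item, x_funcs, y_funcs):
    "Hypothetical sketch: both pipelines start from the same item; the results form a tuple."
    return (run(x_funcs, item), run(y_funcs, item))

def fake_open_image(path):
    "Stand-in for PILBase.create: only describes what would be loaded."
    return f"<image from {path}>"

def label_from_name(path):
    "Stand-in for RegexLabeller: drop the directory and the trailing _<number>.jpg."
    return path.rsplit('/', 1)[-1].rsplit('_', 1)[0]

vocab = {'Bengal': 0, 'Ragdoll': 1}  # stand-in for Categorize's vocabulary

sample = build_sample('images/Ragdoll_10.jpg',
                      [fake_open_image],
                      [label_from_name, vocab.__getitem__])
print(sample)  # ('<image from images/Ragdoll_10.jpg>', 1)
```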

Great! We can build a sample now! What's next?

## Building a Batch

Now we'll try to build a batch from our data:

```python
Setting up after_item: Pipeline: Resize -> ToTensor
Setting up before_batch: Pipeline: 
Setting up after_batch: Pipeline: IntToFloatTensor -> AffineCoordTfm -> LightingTfm -> Normalize

Building one batch
Applying item_tfms to the first sample:
  Pipeline: Resize -> ToTensor
    starting from
      (PILImage mode=RGB size=500x375, TensorCategory(8))
    applying Resize gives
      (PILImage mode=RGB size=224x224, TensorCategory(8))
    applying ToTensor gives
      (TensorImage of size 3x224x224, TensorCategory(8))
```

### `after_item` and `after_batch`
Wait, where did this `after_item` and `after_batch` stuff come from?

Under the hood, our `item_tfms` and `batch_tfms` become the `after_item` and `after_batch` pipelines of the `DataLoader`. So we can see that we set up a `Pipeline` of item transforms to `Resize` our images and convert our `x`s to tensors (`ToTensor`), and a `Pipeline` of batch transforms that first converts our integer image tensors to floats (`IntToFloatTensor`), followed by these `AffineCoordTfm`, `LightingTfm`, and `Normalize` transforms.
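The order in which these pipelines run (item transforms on each sample, then collation, then batch transforms on the whole batch) can be sketched in plain Python. All function names here are hypothetical stand-ins working on lists rather than tensors; `int_to_float` only mimics the idea behind `IntToFloatTensor`, which divides by 255 by default.

```python
def to_list(sample):
    "Stand-in for an item transform: runs on one (x, y) sample at a time."
    x, y = sample
    return list(x), y

def collate(samples):
    "Group a list of (x, y) samples into one (xs, ys) batch."
    xs, ys = zip(*samples)
    return list(xs), list(ys)

def int_to_float(batch, div=255):
    "Stand-in for a batch transform: scale 0-255 integer pixels to floats in [0, 1]."
    xs, ys = batch
    return [[p / div for p in x] for x in xs], ys

# Two toy "images" of two pixels each, with their encoded labels
samples = [((0, 255), 8), ((51, 102), 13)]

items = [to_list(s) for s in samples]  # 1. after_item: once per sample
batch = collate(items)                 # 2. collate the samples into a batch
batch = int_to_float(batch)            # 3. after_batch: once on the whole batch
print(batch)  # ([[0.0, 1.0], [0.2, 0.4]], [8, 13])
```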

But wait! That's not what I passed in! I passed in `*aug_transforms(size=256)` and `Normalize.from_stats(*imagenet_stats)`. So where do these come from?

`AffineCoordTfm` is the class most of the geometric augmentations (flips, rotations, zooms, warps) inherit from, and `LightingTfm` plays the same role for the lighting augmentations (brightness, contrast). Transforms that share one of these base classes can be combined into a single transform. See below for `aug_transforms`. I'm going to remove the middle of the function so we can focus on what we need to see right now, which is the `return` statement:

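```python
# Excerpt of fastai2's source for reading only: running this cell would shadow
# the library's aug_transforms with an incomplete version.
def aug_transforms(mult=1.0, do_flip=True, flip_vert=False, max_rotate=10., max_zoom=1.1, max_lighting=0.2,
                   max_warp=0.2, p_affine=0.75, p_lighting=0.75, xtra_tfms=None, size=None,
                   mode='bilinear', pad_mode=PadMode.Reflection, align_corners=True, batch=False, min_scale=1.):
    "Utility func to easily create a list of flip, rotate, zoom, warp, lighting transforms."
    # ... middle of the function omitted: it builds the list `res` of flip, rotate,
    # zoom, warp and lighting transforms from the arguments above ...
    return setup_aug_tfms(res + L(xtra_tfms))
```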

We can see it returns the result of `setup_aug_tfms`. Let's look at that:

```python
def setup_aug_tfms(tfms):
    "Go through `tfms` and combines together affine/coord or lighting transforms"
    aff_tfms = [tfm for tfm in tfms if isinstance(tfm, AffineCoordTfm)]
    lig_tfms = [tfm for tfm in tfms if isinstance(tfm, LightingTfm)]
    others = [tfm for tfm in tfms if tfm not in aff_tfms+lig_tfms]
    aff_tfm,lig_tfm =  _compose_same_tfms(aff_tfms),_compose_same_tfms(lig_tfms)
    res = [aff_tfm] if aff_tfm is not None else []
    if lig_tfm is not None: res.append(lig_tfm)
    return res + others
```

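So `setup_aug_tfms` composes all of the `AffineCoordTfm`s into a single transform (`aff_tfm`) and all of the `LightingTfm`s into another (`lig_tfm`), then returns them followed by any other transforms (`others`). That is why our `after_batch` pipeline shows just one `AffineCoordTfm` and one `LightingTfm`.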
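Why is merging possible at all? Affine transforms such as a flip, a rotation, and a zoom can each be written as a 3x3 matrix, and applying them one after another gives the same result as applying their product once, so the image only has to be resampled a single time. The sketch below checks that on a single 2D point in pure Python. It is an illustration of the underlying math, not fastai's code (which works on batches of coordinate grids), and the angle and zoom values are arbitrary.

```python
import math

def matmul(a, b):
    "Multiply two 3x3 matrices stored as nested lists."
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(m, point):
    "Apply a 3x3 affine matrix to a 2D point via homogeneous coordinates (x, y, 1)."
    v = (point[0], point[1], 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(2))

# Illustrative values only: a horizontal flip, a 10 degree rotation, a 1.1x zoom
flip = [[-1, 0, 0], [0, 1, 0], [0, 0, 1]]
t = math.radians(10)
rotate = [[math.cos(t), -math.sin(t), 0],
          [math.sin(t),  math.cos(t), 0],
          [0, 0, 1]]
zoom = [[1.1, 0, 0], [0, 1.1, 0], [0, 0, 1]]

p = (0.3, -0.5)
# Three separate applications: flip first, then rotate, then zoom...
step_by_step = apply(zoom, apply(rotate, apply(flip, p)))
# ...versus multiplying the matrices once and applying the product a single time
combined = matmul(zoom, matmul(rotate, flip))
once = apply(combined, p)
print(step_by_step)
print(once)  # same point, up to floating-point rounding
```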

## Building the Batch (cont)

Continuing along, we can see that it builds three more samples (for a batch of four in total) before collating them into a `batch` and applying any batch transforms:

```python
Adding the next 3 samples

No before_batch transform to apply

Collating items in a batch

Applying batch_tfms to the batch built
  Pipeline: IntToFloatTensor -> AffineCoordTfm -> LightingTfm -> Normalize
    starting from
      (TensorImage of size 4x3x224x224, TensorCategory([ 8, 13,  0, 33]))
    applying IntToFloatTensor gives
      (TensorImage of size 4x3x224x224, TensorCategory([ 8, 13,  0, 33]))
    applying AffineCoordTfm gives
      (TensorImage of size 4x3x256x256, TensorCategory([ 8, 13,  0, 33]))
    applying LightingTfm gives
      (TensorImage of size 4x3x256x256, TensorCategory([ 8, 13,  0, 33]))
    applying Normalize gives
      (TensorImage of size 4x3x256x256, TensorCategory([ 8, 13,  0, 33]))
```

Here we can see where each transform gets applied and whether it makes any noticeable changes (for example, `AffineCoordTfm` takes our images from 224x224 to 256x256). This all works, since nothing threw an error. But what happens if I forget something? Say I didn't make all my images the same size with a `Resize`? Let's check out that behavior:

# Breaking the `DataBlock`

Let's remove our `item_tfms` from our `DataBlock`:

```python
dblock = DataBlock(blocks=(ImageBlock, CategoryBlock),
                   get_items=get_image_files,
                   splitter=splitter,  # the seeded RandomSplitter defined above
                   get_y=RegexLabeller(pat=pat),
                   item_tfms=[],  # no Resize: images keep their original sizes
                   batch_tfms=batch_tfms)
```

And call `summary` again:

```python
dblock.summary(path/'images')
```

Along with a `RuntimeError`, we get the following output:

```python
Setting-up type transforms pipelines
Collecting items from /root/.fastai/data/oxford-iiit-pet/images
Found 7390 items
2 datasets of sizes 5912,1478
Setting up Pipeline: PILBase.create
Setting up Pipeline: RegexLabeller -> Categorize

Building one sample
  Pipeline: PILBase.create
    starting from
      /root/.fastai/data/oxford-iiit-pet/images/shiba_inu_26.jpg
    applying PILBase.create gives
      PILImage mode=RGB size=333x500
  Pipeline: RegexLabeller -> Categorize
    starting from
      /root/.fastai/data/oxford-iiit-pet/images/shiba_inu_26.jpg
    applying RegexLabeller gives
      shiba_inu
    applying Categorize gives
      TensorCategory(33)

Final sample: (PILImage mode=RGB size=333x500, TensorCategory(33))


Setting up after_item: Pipeline: ToTensor
Setting up before_batch: Pipeline: 
Setting up after_batch: Pipeline: IntToFloatTensor -> AffineCoordTfm -> LightingTfm -> Normalize

Building one batch
Applying item_tfms to the first sample:
  Pipeline: ToTensor
    starting from
      (PILImage mode=RGB size=333x500, TensorCategory(33))
    applying ToTensor gives
      (TensorImage of size 3x500x333, TensorCategory(33))

Adding the next 3 samples

No before_batch transform to apply

Collating items in a batch
Error! It's not possible to collate your items in a batch
Could not collate the 0-th members of your tuples because got the following shapes
torch.Size([3, 500, 333]),torch.Size([3, 225, 300]),torch.Size([3, 500, 332]),torch.Size([3, 333, 500])
```

Let's focus on its tail end:

```python
Error! It's not possible to collate your items in a batch
Could not collate the 0-th members of your tuples because got the following shapes
torch.Size([3, 500, 333]),torch.Size([3, 225, 300]),torch.Size([3, 500, 332]),torch.Size([3, 333, 500])
```

This tells us plainly that we could not build a batch because our images were not all the same size! (We know these are our images because the 0-th member of each tuple is our `x`, and each has three color channels.) So the simple solution is to `Resize` them in an item transform (`item_tfms`), as we did originally.
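Why does collation fail here? Stacking samples into one batch tensor requires every sample to have exactly the same shape. A minimal plain-Python sketch of that check, using the four shapes reported above; `check_collatable` and `resized` are hypothetical helpers, and `resized` only models the *shape* effect of an item-level `Resize(224)`, not the actual image resizing.

```python
def check_collatable(shapes):
    "Raise if the samples' shapes differ; otherwise return the stacked batch shape."
    if len(set(shapes)) > 1:
        listed = ",".join(str(s) for s in shapes)
        raise RuntimeError(f"cannot stack samples with shapes {listed}")
    return (len(shapes), *shapes[0])

# The four (channels, height, width) shapes from the summary output above
shapes = [(3, 500, 333), (3, 225, 300), (3, 500, 332), (3, 333, 500)]
try:
    check_collatable(shapes)
except RuntimeError as err:
    print(err)

def resized(shape, size=224):
    "Shape-only stand-in for an item-level Resize: keep channels, force height and width to `size`."
    return (shape[0], size, size)

print(check_collatable([resized(s) for s in shapes]))  # (4, 3, 224, 224)
```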

As a personal takeaway, I *always* run `dblock.summary()` before building any `DataLoaders` with the high-level API. It gives me peace of mind, because if anything goes wrong I know right away what it was!

This is the first part of a two-part mini-series on debugging. In the next post we'll look at `.summary()` more closely and see whether we can apply its ideas to the low-level API.