# 04 - DataBlock Summary

This notebook shows how to use `DataBlock.summary()` and what it can do!

In [None]:
#Run once per session
!pip install fastai2 -q

In [None]:
from fastai2.vision.all import *

We'll use `ImageWoof`, as we did in previous notebooks.

In [None]:
path = untar_data(URLs.IMAGEWOOF)

In [None]:
lbl_dict = dict(
  n02086240= 'Shih-Tzu',
  n02087394= 'Rhodesian ridgeback',
  n02088364= 'Beagle',
  n02089973= 'English foxhound',
  n02093754= 'Australian terrier',
  n02096294= 'Border terrier',
  n02099601= 'Golden retriever',
  n02105641= 'Old English sheepdog',
  n02111889= 'Samoyed',
  n02115641= 'Dingo'
)

In [None]:
batch_tfms = [*aug_transforms(size=224, max_warp=0), Normalize.from_stats(*imagenet_stats, cuda=False)]
item_tfms = Resize(128)
bs=64

In [None]:
pets = DataBlock(blocks=(ImageBlock, CategoryBlock),
                 get_items=get_image_files,
                 splitter=RandomSplitter(),
                 get_y=Pipeline([parent_label, lbl_dict.__getitem__]),
                 item_tfms=item_tfms,
                 batch_tfms=batch_tfms)

To run `.summary`, we pass in the source our `DataBlock` expects. In this case that's a path, the same thing we'd pass in when making our `DataLoaders` from the `DataBlock`.

In [None]:
pets.summary(path)

Setting-up type transforms pipelines
Collecting items from /root/.fastai/data/imagewoof2
Found 12954 items
2 datasets of sizes 10364,2590
Setting up Pipeline: PILBase.create
Setting up Pipeline: parent_label -> dict.__getitem__ -> Categorize

Building one sample
  Pipeline: PILBase.create
    starting from
      /root/.fastai/data/imagewoof2/train/n02096294/n02096294_3256.JPEG
    applying PILBase.create gives
      <fastai2.vision.core.PILImage image mode=RGB size=200x200 at 0x7F2A65CA2668>
  Pipeline: parent_label -> dict.__getitem__ -> Categorize
    starting from
      /root/.fastai/data/imagewoof2/train/n02096294/n02096294_3256.JPEG
    applying parent_label gives
      n02096294
    applying dict.__getitem__ gives
      Border terrier
    applying Categorize gives
      TensorCategory(2)

Final sample: (<fastai2.vision.core.PILImage image mode=RGB size=200x200 at 0x7F2A66EA4EF0>, TensorCategory(2))


Setting up after_item: Pipeline: Resize -> ToTensor
Setting up before_batch: Pip
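To make the label part of that trace concrete, here is a minimal plain-Python sketch of the same idea: apply each step in order and report what comes out. It is not fastai's implementation. `trace_pipeline` and `categorize` are hypothetical helpers written for illustration, `categorize` is only a stand-in for `Categorize` that assumes a sorted vocabulary, and `parent_label` is re-implemented here with `pathlib`.

```python
from pathlib import Path

# Label lookup copied from the notebook's `lbl_dict`
lbl_dict = dict(
    n02086240='Shih-Tzu',
    n02087394='Rhodesian ridgeback',
    n02088364='Beagle',
    n02089973='English foxhound',
    n02093754='Australian terrier',
    n02096294='Border terrier',
    n02099601='Golden retriever',
    n02105641='Old English sheepdog',
    n02111889='Samoyed',
    n02115641='Dingo',
)

# Assumption: the vocabulary is the sorted set of label names
vocab = sorted(set(lbl_dict.values()))

def parent_label(p):
    """Return the name of the folder that directly contains `p`."""
    return Path(p).parent.name

def categorize(label):
    """Hypothetical stand-in for `Categorize`: label -> index in `vocab`."""
    return vocab.index(label)

def trace_pipeline(steps, item):
    """Apply each step in order, recording (step name, result) pairs."""
    trace = []
    for step in steps:
        item = step(item)
        trace.append((step.__name__, item))
    return trace

sample = '/root/.fastai/data/imagewoof2/train/n02096294/n02096294_3256.JPEG'
trace = trace_pipeline([parent_label, lbl_dict.__getitem__, categorize], sample)
for name, result in trace:
    print(f'applying {name} gives {result}')
```

Under these assumptions the sketch walks through the same three stages as the summary output: folder name, readable label, then category index.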

What we find is that `summary` goes through every single part of our `DataBlock`, tests it on an item, and shows us what popped out at each step! **But** what if we are using `Datasets` instead? Let's go through how to get the same kind of information there.

In [None]:
tfms = [[PILImage.create], [parent_label, Categorize()]]
item_tfms = [ToTensor(), Resize(128)]
batch_tfms = [FlipItem(), RandomResizedCrop(128, min_scale=0.35),
              IntToFloatTensor(), Normalize.from_stats(*imagenet_stats, cuda=False)]

In [None]:
items = get_image_files(path)
split_idx = GrandparentSplitter(valid_name='val')(items)
dsets = Datasets(items, tfms, splits=split_idx)
dls = dsets.dataloaders(after_item=item_tfms, after_batch=batch_tfms, bs=64)

We'll want to grab the first item from our training set.

In [None]:
x = dsets.train[0]

In [None]:
x

(<fastai2.vision.core.PILImage image mode=RGB size=500x333 at 0x7F2A6592CE10>,
 TensorCategory(3))

We can then pass it through any `after_item` or `after_batch` transform `Pipeline`. We can list the transforms in each one by accessing it as an attribute of the `DataLoader`:

In [None]:
dls.train.after_item

Pipeline: Resize -> ToTensor
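Note that `Resize` is listed before `ToTensor`, even though `item_tfms` was written as `[ToTensor(), Resize(128)]`. This is because a `Pipeline` sorts its transforms by their `order` attribute rather than keeping the order they were passed in. The plain-Python sketch below illustrates that behaviour only. `Step` and `build_pipeline` are hypothetical names, and the numeric `order` values are illustrative rather than taken from the library.

```python
class Step:
    """Hypothetical transform stand-in with a name and a sort key."""
    def __init__(self, name, order):
        self.name = name
        self.order = order

def build_pipeline(steps):
    """Return the steps sorted by `order` (stable for equal values)."""
    return sorted(steps, key=lambda s: s.order)

# Passed in the same sequence as `item_tfms` above; the values are illustrative
item_steps = [Step('ToTensor', order=5), Step('Resize', order=1)]
pipeline = build_pipeline(item_steps)
names = [s.name for s in pipeline]
print('Pipeline: ' + ' -> '.join(names))
```

Because the sort is stable, transforms that share the same `order` keep the relative position they were given in.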

In [None]:
dls.train.after_batch

Pipeline: FlipItem -> RandomResizedCrop -> IntToFloatTensor -> Normalize

And now we can pass in our item through the `Pipeline` like so:

(`x[0]` has our input and `x[1]` has our `y`)

In [None]:
# Apply each item transform in turn and print the input after every step
for f in dls.train.after_item:
    name = f.name
    x = f(x)
    print(name, x[0])

Resize TensorImage([[[[ 0.8961,  0.8961,  0.8789,  ...,  0.6563,  0.6221,  0.5536],
          [ 0.7248,  0.6906,  0.6563,  ...,  0.6049,  0.6221,  0.6392],
          [ 0.4851,  0.4679,  0.4166,  ...,  0.5707,  0.6049,  0.5707],
          ...,
          [ 0.1254, -0.3198, -0.5253,  ..., -1.7240, -1.8097, -1.7754],
          [-0.5253, -1.1075, -1.1932,  ..., -1.4672, -1.7583, -1.6384],
          [-0.8507, -1.2617, -0.9877,  ..., -1.5699, -1.6213, -1.7240]],

         [[ 1.2381,  1.2381,  1.2206,  ...,  1.0455,  0.9755,  0.9230],
          [ 1.0805,  1.0455,  1.0105,  ...,  0.9930,  1.0105,  1.0105],
          [ 0.8179,  0.7829,  0.7129,  ...,  0.9230,  0.9755,  0.9405],
          ...,
          [ 0.5203, -0.1625, -0.5476,  ..., -1.1253, -1.1779, -1.3354],
          [ 0.0651, -1.0553, -1.2654,  ..., -1.0728, -1.3179, -1.4055],
          [-0.0924, -1.0028, -0.9503,  ..., -1.2304, -1.3880, -1.6155]],

         [[ 1.5942,  1.5942,  1.5594,  ...,  1.4025,  1.3328,  1.2805],
          [ 1.4548

In [None]:
# Apply each batch transform in turn and print the input after every step
for f in dls.train.after_batch:
    name = f.name
    x = f(x)
    print(name, x[0])

FlipItem TensorImage([[[176, 176, 175,  ..., 162, 160, 156],
         [166, 164, 162,  ..., 159, 160, 161],
         [152, 151, 148,  ..., 157, 159, 157],
         ...,
         [131, 105,  93,  ...,  23,  18,  20],
         [ 93,  59,  54,  ...,  38,  21,  28],
         [ 74,  50,  66,  ...,  32,  29,  23]],

        [[187, 187, 186,  ..., 176, 172, 169],
         [178, 176, 174,  ..., 173, 174, 174],
         [163, 161, 157,  ..., 169, 172, 170],
         ...,
         [146, 107,  85,  ...,  52,  49,  40],
         [120,  56,  44,  ...,  55,  41,  36],
         [111,  59,  62,  ...,  46,  37,  24]],

        [[195, 195, 193,  ..., 184, 180, 177],
         [187, 184, 181,  ..., 181, 182, 182],
         [169, 167, 164,  ..., 177, 180, 176],
         ...,
         [102,  61,  44,  ...,  14,  11,  10],
         [ 74,  30,  21,  ...,  22,   9,   9],
         [ 56,  26,  30,  ...,  29,  10,   7]]], dtype=torch.uint8)
RandomResizedCrop TensorImage([[[176, 176, 175,  ..., 162, 160, 156],
   