# The Flow of transforms and functions

> Understanding the DataBlocks API in FastAI V2, through exploration

- comments: true
- toc: true
- badges: true
- categories: [datablocks, fastai2]

The `DataBlock` API is the building block of FastAI V2. It provides a way to specify all the transformations needed to get the input data ready for the model to consume during training. Each parameter passed when creating a `DataBlock` plays a significant role. Let's explore them one by one.

First, let's import it. The `DataBlock` base class is available in `fastai2.data.block`.

In [1]:
from fastai2.data.block import *
DataBlock

fastai2.data.block.DataBlock

`show_doc` is an easy way to look at the documentation of an API. Let's see what parameters `DataBlock` has.

In [2]:
#hide
from nbdev.showdoc import show_doc
show_doc(DataBlock)

<h2 id="DataBlock" class="doc_header"><code>class</code> <code>DataBlock</code><a href="https://github.com/fastai/fastai2/tree/master/fastai2/data/block.py#L52" class="source_link" style="float:right">[source]</a></h2>

> <code>DataBlock</code>(**`blocks`**=*`None`*, **`dl_type`**=*`None`*, **`getters`**=*`None`*, **`n_inp`**=*`None`*, **`item_tfms`**=*`None`*, **`batch_tfms`**=*`None`*, **`get_items`**=*`None`*, **`splitter`**=*`None`*, **`get_y`**=*`None`*, **`get_x`**=*`None`*)

Generic container to quickly build `Datasets` and `DataLoaders`

As the definition says, it is a container that helps to quickly build `Datasets` and `DataLoaders`.

Let's go through each parameter, looking at why and when it is used.

## Going through the Source Code of `DataBlock`

Jupyter Notebook offers a convenient way to look at the source code: add `??` before or after the class or method of interest. Let's do that.

In [3]:
DataBlock??

Note that the source does not show up as regular output but in a pop-up pane, so it won't show up in the blog either. You will be able to see it in the notebook version of this post when the previous cell is run.
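If you want the source to appear as regular output (for example, outside Jupyter), the standard library's `inspect.getsource` is one option. The sketch below is only an illustration: `show_source` is a hypothetical helper, and `textwrap.dedent` stands in for `DataBlock` so that the snippet runs without fastai installed.

```python
import inspect
import textwrap

def show_source(obj, max_lines=5):
    """Hypothetical helper: return the first few lines of an object's source."""
    lines = inspect.getsource(obj).splitlines()
    return "\n".join(lines[:max_lines])

# Any pure-Python class or function works here; with fastai installed,
# passing DataBlock instead should print the start of its definition.
print(show_source(textwrap.dedent))
```

Because the result is a plain string, it is printed like any other cell output.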

### Parameters Analysis

Let's look at the parameters one by one.

#### `blocks`

The `blocks` parameter takes a tuple of `TransformBlock`s. Each `TransformBlock` represents one data point that the model requires, i.e. each independent and dependent variable.

For instance, if we are doing image classification, we will have an `ImageBlock` to deal with the image transformations.

The advantage of having a block for each variable or data point is that each type can carry its own transformations. For example, `ImageBlock` comes with meaningful `type_tfms` and `batch_tfms`, such as opening the image file and converting the tensor from Int to Float.

A `TransformBlock` can also carry `item_tfms`, along with a `dl_type` and its `kwargs`. Read the docs for more info.
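To illustrate the idea of one block per variable, here is a minimal plain-Python sketch. It is not fastai's implementation: `ToyBlock`, the fake image loader, and the `vocab` list are all hypothetical stand-ins, chosen so the example runs without fastai.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class ToyBlock:
    """Hypothetical stand-in for a TransformBlock: a bundle of per-type transforms."""
    type_tfms: List[Callable] = field(default_factory=list)

    def apply(self, value):
        # Run every transform of this block, in order, on a single value.
        for tfm in self.type_tfms:
            value = tfm(value)
        return value

vocab = ["cat", "dog"]  # assumed label vocabulary

# One block per variable: the "image" block pretends to open a file,
# the "category" block maps a label to its index in the vocabulary.
image_block = ToyBlock(type_tfms=[lambda fname: f"<pixels of {fname}>"])
category_block = ToyBlock(type_tfms=[vocab.index])
blocks = (image_block, category_block)

raw = ("cat_1.jpg", "dog")
sample = tuple(block.apply(value) for block, value in zip(blocks, raw))
print(sample)  # ('<pixels of cat_1.jpg>', 1)
```

Each block only knows how to handle its own kind of data, which is why the tuple passed to `blocks` has one entry per variable.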

#### `dl_type`

`dl_type` is an optional parameter for passing the dataloader class to use. By default it is `TfmdDL`.

#### `getters`
The `getters` parameter takes a list of functions that specify how to get each data point (independent and dependent variables) from the given data source. Each function in the list returns one data point of interest.
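A small plain-Python sketch may help picture this. It only mimics the idea and is not fastai's code: the `items` records and the `apply_getters` helper are hypothetical.

```python
from operator import itemgetter

# Hypothetical raw items, e.g. rows produced from a data source.
items = [
    {"fname": "cat_1.jpg", "label": "cat"},
    {"fname": "dog_7.jpg", "label": "dog"},
]

# One getter per data point: the first for the independent variable,
# the second for the dependent variable.
getters = [itemgetter("fname"), itemgetter("label")]

def apply_getters(item, getters):
    """Hypothetical helper: call every getter on the same raw item."""
    return tuple(getter(item) for getter in getters)

samples = [apply_getters(item, getters) for item in items]
print(samples)  # [('cat_1.jpg', 'cat'), ('dog_7.jpg', 'dog')]
```

Every getter receives the same raw item and picks out its own piece, so the number of getters matches the number of data points.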

#### `get_x`
`get_x` is a convenience parameter for specifying how to get the X variable when there is only one input variable and one output variable. We can fall back to `getters` if there is more than one independent or dependent variable.

#### `get_y`
`get_y` is a convenience parameter for specifying how to get the y variable when there is only one input variable and one output variable. We can fall back to `getters` if there is more than one independent or dependent variable.
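One way to think about `get_x` and `get_y` is as shorthand for filling the two slots of a `getters` list. The sketch below shows that reading in plain Python. `build_getters` and `label_from_name` are hypothetical helpers, and the identity default is an assumption made for the example rather than a statement about fastai's internals.

```python
def build_getters(get_x=None, get_y=None, getters=None):
    """Hypothetical helper: resolve get_x / get_y into a two-slot getters list."""
    identity = lambda item: item  # assumed default: pass the item through
    resolved = list(getters) if getters is not None else [identity, identity]
    if get_x is not None:
        resolved[0] = get_x  # slot 0 produces the independent variable
    if get_y is not None:
        resolved[1] = get_y  # slot 1 produces the dependent variable
    return resolved

def label_from_name(fname):
    """Hypothetical labelling rule: the text before the first underscore."""
    return fname.split("_")[0]

getters = build_getters(get_y=label_from_name)
sample = tuple(getter("cat_1.jpg") for getter in getters)
print(sample)  # ('cat_1.jpg', 'cat')
```

With a single input and a single target, passing only `get_y` is enough here, because the item itself already serves as the input.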

#### `get_items`
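The `get_items` parameter is a function that is called before the `getters`. It receives the data source and returns the list of items that the `getters` are then applied to.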
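The ordering can be sketched in plain Python as follows. This is a toy illustration under stated assumptions, not fastai's implementation: the `source` list, the `.jpg` filter, and the labelling rule are all made up for the example.

```python
def get_items(source):
    """Hypothetical get_items: turn a source into a list of raw items."""
    return sorted(name for name in source if name.endswith(".jpg"))

def get_y(fname):
    """Hypothetical getter: the label is the text before the first underscore."""
    return fname.split("_")[0]

source = ["dog_7.jpg", "notes.txt", "cat_1.jpg"]

# Step 1: get_items runs once on the whole source.
items = get_items(source)

# Step 2: the getters run afterwards, once per item.
samples = [(fname, get_y(fname)) for fname in items]

print(items)    # ['cat_1.jpg', 'dog_7.jpg']
print(samples)  # [('cat_1.jpg', 'cat'), ('dog_7.jpg', 'dog')]
```

The split of responsibilities is the point: `get_items` decides which items exist, and the getters decide what to extract from each one.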

In [9]:
from fastai2.vision.all import *
show_doc(ImageDataLoaders.from_name_func)

<h4 id="ImageDataLoaders.from_name_func" class="doc_header"><code>ImageDataLoaders.from_name_func</code><a href="https://github.com/fastai/fastai2/tree/master/fastai2/vision/data.py#L107" class="source_link" style="float:right">[source]</a></h4>

> <code>ImageDataLoaders.from_name_func</code>(**`path`**, **`fnames`**, **`label_func`**, **`valid_pct`**=*`0.2`*, **`seed`**=*`None`*, **`item_tfms`**=*`None`*, **`batch_tfms`**=*`None`*, **`bs`**=*`64`*, **`val_bs`**=*`None`*, **`shuffle_train`**=*`True`*, **`device`**=*`None`*)

Create from the name attrs of `fnames` in `path`s with `label_func`