# Get your data ready for training

This module defines the basic [`DataBunch`](/basic_data.html#DataBunch) object that is used inside [`Learner`](/basic_train.html#Learner) to train a model. This is the generic class that can take any kind of fastai [`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset) or [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader). The data module of every application provides helper functions that create this [`DataBunch`](/basic_data.html#DataBunch) directly for you.

In [None]:
from fastai.gen_doc.nbdoc import *
from fastai.basic_data import * 

In [None]:
show_doc(DataBunch, doc_string=False)

<a id=DataBunch></a><div><h2 style="display:inline"><code>class</code> <code>DataBunch</code></h2><div style="float:right"><a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L69">[source]</a></div></div>

> <code>DataBunch</code>(`train_dl`:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), `valid_dl`:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), `test_dl`:`Optional`\[[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)\]=`None`, `device`:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=`None`, `tfms`:`Optional`\[`Collection`\[`Callable`\]\]=`None`, `path`:`PathOrStr`=`'.'`, `collate_fn`:`Callable`=`'data_collate'`)

Bind together a `train_dl`, a `valid_dl` and optionally a `test_dl`, ensure they are on `device`, and apply `tfms` to them as batches are drawn. `path` is used internally to store temporary files. `collate_fn` is passed to the PyTorch `DataLoader` (replacing the one there) to define how the samples picked for a batch are collated. By default, it grabs the `data` attribute of the objects sent (see [`vision.image`](/vision.image.html#vision.image) for why this can be important).

An example of `tfms` to pass is normalization. `train_dl`, `valid_dl` and optionally `test_dl` will be wrapped in [`DeviceDataLoader`](/basic_data.html#DeviceDataLoader).
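To make the idea of a batch-level transform concrete, here is a minimal sketch of a normalization `tfm`. It assumes a batch is an `(x, y)` pair and uses plain Python lists in place of tensors. `make_normalize` is a hypothetical helper written for this illustration, not a fastai function.

```python
from typing import Callable, List, Tuple

# (inputs, targets); nested lists stand in for tensors in this sketch
Batch = Tuple[List[List[float]], List[int]]

def make_normalize(mean: float, std: float) -> Callable[[Batch], Batch]:
    "Hypothetical helper: build a batch transform that normalizes the inputs only."
    def _normalize(b: Batch) -> Batch:
        x, y = b
        x = [[(v - mean) / std for v in row] for row in x]
        return x, y  # targets pass through unchanged
    return _normalize

normalize = make_normalize(mean=2.0, std=2.0)
xb, yb = normalize(([[0.0, 2.0], [4.0, 6.0]], [0, 1]))
print(xb, yb)
```

Because the transform receives a whole batch, it runs once per batch as batches are drawn rather than once per sample.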

In [None]:
show_doc(DataBunch.create, doc_string=False)

<a id=DataBunch.create></a><div><h4 style="display:inline"><code>create</code></h4><div style="float:right"><a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L82">[source]</a></div></div>

> <code>create</code>(`train_ds`:[`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset), `valid_ds`:[`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset), `test_ds`:[`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset)=`None`, `path`:`PathOrStr`=`'.'`, `bs`:`int`=`64`, `num_workers`:`int`=`8`, `tfms`:`Optional`\[`Collection`\[`Callable`\]\]=`None`, `device`:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=`None`, `collate_fn`:`Callable`=`'data_collate'`) → `DataBunch`

Create a [`DataBunch`](/basic_data.html#DataBunch) from `train_ds`, `valid_ds` and optionally `test_ds`, with batch size `bs` and `num_workers` worker processes. `tfms` and `device` are passed to the init method.

In [None]:
show_doc(DataBunch.holdout, doc_string=False)

<a id=DataBunch.holdout></a><div><h4 style="display:inline"><code>holdout</code></h4><div style="float:right"><a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L94">[source]</a></div></div>

> <code>holdout</code>(`is_test`:`bool`=`False`) → [`DeviceDataLoader`](/basic_data.html#DeviceDataLoader)

Return `self.test_dl` if `is_test` is True, else `self.valid_dl`.
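The selection logic can be sketched in a few lines. `MiniBunch` below is a simplified stand-in that only stores the three loaders (plain lists here); it is an illustration of the behavior described above, not the library's class.

```python
class MiniBunch:
    "Simplified stand-in for a DataBunch: it only stores the three loaders."
    def __init__(self, train_dl, valid_dl, test_dl=None):
        self.train_dl, self.valid_dl, self.test_dl = train_dl, valid_dl, test_dl

    def holdout(self, is_test: bool = False):
        # Pick the test loader on request, otherwise the validation loader
        return self.test_dl if is_test else self.valid_dl

bunch = MiniBunch(train_dl=["train"], valid_dl=["valid"], test_dl=["test"])
print(bunch.holdout())              # validation loader
print(bunch.holdout(is_test=True))  # test loader
```

Note that in this sketch `holdout(is_test=True)` returns `None` when no test loader was supplied.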

In [None]:
show_doc(DeviceDataLoader, doc_string=False)

<a id=DeviceDataLoader></a><div><h2 style="display:inline"><code>class</code> <code>DeviceDataLoader</code></h2><div style="float:right"><a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L23">[source]</a></div></div>

> <code>DeviceDataLoader</code>(`dl`:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), `device`:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device), `tfms`:`List`\[`Callable`\]=`None`, `collate_fn`:`Callable`=`'data_collate'`)

Put the batches of `dl` on `device` after applying an optional list of `tfms`. `collate_fn` replaces the collate function of `dl`. All dataloaders of a [`DataBunch`](/basic_data.html#DataBunch) are of this type.
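As a rough illustration of this wrapping pattern, the sketch below wraps any iterable of batches, runs each batch through the transforms, and then hands it to a device mover. It follows the order described above (transforms first, then the device move); treat that ordering as an assumption rather than a statement about the library's internals. `MiniDeviceLoader` and `fake_to_device` are hypothetical names, and lists stand in for tensors.

```python
from typing import Callable, Iterable, Iterator, List, Optional

class MiniDeviceLoader:
    "Simplified stand-in for a device-aware loader; not fastai's implementation."
    def __init__(self, dl: Iterable, to_device: Callable,
                 tfms: Optional[List[Callable]] = None):
        self.dl, self.to_device = dl, to_device
        self.tfms = list(tfms or [])

    def proc_batch(self, b):
        for f in self.tfms:        # apply each transform in order
            b = f(b)
        return self.to_device(b)   # then hand the batch to the device mover

    def __iter__(self) -> Iterator:
        return (self.proc_batch(b) for b in self.dl)

def fake_to_device(b):
    "Hypothetical device mover: tags the batch instead of copying it to a GPU."
    return ("cpu", b)

def double(b):
    return [v * 2 for v in b]

loader = MiniDeviceLoader([[1, 2], [3, 4]], fake_to_device, tfms=[double])
batches = list(loader)
print(batches)
```

Keeping the per-batch work in one `proc_batch` method means iteration itself stays a thin loop over the wrapped loader.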

### Factory method

In [None]:
show_doc(DeviceDataLoader.create, doc_string=False)

<a id=DeviceDataLoader.create></a><div><h4 style="display:inline"><code>create</code></h4><div style="float:right"><a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L62">[source]</a></div></div>

> <code>create</code>(`dataset`:[`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset), `bs`:`int`=`64`, `shuffle`:`bool`=`False`, `device`:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=`device(type='cpu')`, `tfms`:`Collection`\[`Callable`\]=`None`, `num_workers`:`int`=`8`, `collate_fn`:`Callable`=`'data_collate'`, `kwargs`:`Any`)

Create a [`DeviceDataLoader`](/basic_data.html#DeviceDataLoader) on `device` from a `dataset` with batch size `bs`, `num_workers` processes and a given `collate_fn`. The dataloader will `shuffle` the data if that flag is set to True, and `tfms` are passed to the init method. All `kwargs` are passed to the PyTorch [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) class initialization.
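To show what `bs` and `shuffle` control, here is a small stdlib-only sketch of batching an indexable dataset. `make_batches` and its `seed` parameter are hypothetical and exist only for this illustration; the real work is done by the PyTorch `DataLoader`, which also handles worker processes and collation.

```python
import random
from typing import List, Sequence

def make_batches(dataset: Sequence, bs: int = 64, shuffle: bool = False,
                 seed: int = 0) -> List[list]:
    "Hypothetical helper: split `dataset` into lists of at most `bs` items."
    idxs = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(idxs)  # reorder indices, not the data itself
    # The last batch may be smaller than `bs`; it is kept rather than dropped
    return [[dataset[i] for i in idxs[s:s + bs]] for s in range(0, len(idxs), bs)]

data = list(range(10))
ordered = make_batches(data, bs=4)
shuffled = make_batches(data, bs=4, shuffle=True)
print(ordered)
print(shuffled)
```

With `shuffle=False` the batches follow dataset order, which is what you usually want for validation and test data.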

### Internal methods

In [None]:
show_doc(DeviceDataLoader.proc_batch)

<a id=DeviceDataLoader.proc_batch></a><div><h4 style="display:inline"><code>proc_batch</code></h4><div style="float:right"><a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L44">[source]</a></div></div>

> <code>proc_batch</code>(`b`:`Tensor`) → `Tensor`

Process batch `b` of `TensorImage`.

In [None]:
show_doc(DeviceDataLoader.add_tfm)

<a id=DeviceDataLoader.add_tfm></a><div><h4 style="display:inline"><code>add_tfm</code></h4><div style="float:right"><a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L41">[source]</a></div></div>

> <code>add_tfm</code>(`tfm`:`Callable`)

Add `tfm` to `self.tfms`.

In [None]:
show_doc(DeviceDataLoader.remove_tfm)

<a id=DeviceDataLoader.remove_tfm></a><div><h4 style="display:inline"><code>remove_tfm</code></h4><div style="float:right"><a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L42">[source]</a></div></div>

> <code>remove_tfm</code>(`tfm`:`Callable`)

Remove `tfm` from `self.tfms`.
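The effect of adding and removing transforms can be sketched with an ordinary list. `TfmHolder` is a hypothetical class for illustration; it assumes transforms are applied in the order they were added.

```python
class TfmHolder:
    "Hypothetical holder that keeps batch transforms in a list."
    def __init__(self, tfms=None):
        self.tfms = list(tfms or [])

    def add_tfm(self, tfm):
        self.tfms.append(tfm)      # new transforms run last

    def remove_tfm(self, tfm):
        self.tfms.remove(tfm)      # raises ValueError if `tfm` is not present

    def apply(self, b):
        for f in self.tfms:
            b = f(b)
        return b

def add_one(b):
    return [v + 1 for v in b]

def times_ten(b):
    return [v * 10 for v in b]

holder = TfmHolder()
holder.add_tfm(add_one)
holder.add_tfm(times_ten)
with_both = holder.apply([1, 2])     # add_one, then times_ten
holder.remove_tfm(add_one)
without_add = holder.apply([1, 2])   # times_ten only
print(with_both, without_add)
```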

## Generic classes

These last two classes are just empty shells to be subclassed by the applications.

In [None]:
show_doc(DatasetBase, title_level=3)

<a id=DatasetBase></a><div><h3 style="display:inline"><code>class</code> <code>DatasetBase</code></h3><div style="float:right"><a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L6">[source]</a></div></div>

> <code>DatasetBase</code>() :: [`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset)

Base class for all fastai datasets.  

In [None]:
show_doc(LabelDataset, title_level=3)

<a id=LabelDataset></a><div><h3 style="display:inline"><code>class</code> <code>LabelDataset</code></h3><div style="float:right"><a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L15">[source]</a></div></div>

> <code>LabelDataset</code>() :: [`DatasetBase`](/basic_data.html#DatasetBase)

Base class for fastai datasets that do classification.  

## Undocumented Methods - Methods moved below this line will intentionally be hidden

## New Methods - Please document or move to the undocumented section

In [None]:
show_doc(DataBunch.add_tfm)

<a id=DataBunch.add_tfm></a><div><h4 style="display:inline"><code>add_tfm</code></h4><div style="float:right"><a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L98">[source]</a></div></div>

> <code>add_tfm</code>(`tfm`:`Callable`)

In [None]:
show_doc(DeviceDataLoader.collate_fn)

<a id=data_collate></a><div><h4 style="display:inline"><code>data_collate</code></h4><div style="float:right"><a href="https://github.com/fastai/fastai/blob/master/fastai/torch_core.py#L89">[source]</a></div></div>

> <code>data_collate</code>(`batch`:`ItemsList`) → `Tensor`

Convert `batch` items to tensor data.  
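As a rough picture of what a collate function does, the sketch below turns a list of `(x, y)` samples into one `(xs, ys)` batch. It assumes that wrapped items expose their raw values through a `data` attribute; `Item`, `to_data` and `collate` are hypothetical names for this illustration, and lists stand in for tensors.

```python
class Item:
    "Hypothetical wrapper whose raw values live in `.data`."
    def __init__(self, data):
        self.data = data

def to_data(o):
    # Unwrap objects that carry a `data` attribute; leave plain values alone
    return o.data if hasattr(o, "data") else o

def collate(batch):
    "Group a list of (x, y) samples into a single (xs, ys) pair."
    xs, ys = zip(*[(to_data(x), to_data(y)) for x, y in batch])
    return list(xs), list(ys)

samples = [(Item([1, 2]), 0), (Item([3, 4]), 1)]
xs, ys = collate(samples)
print(xs, ys)
```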

In [None]:
show_doc(DeviceDataLoader.one_batch)

<a id=DeviceDataLoader.one_batch></a><div><h4 style="display:inline"><code>one_batch</code></h4><div style="float:right"><a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L54">[source]</a></div></div>

> <code>one_batch</code>() → `Collection`\[`Tensor`\]

Get one batch from the data loader.  
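Conceptually, grabbing one batch amounts to taking the first element of a fresh iterator over the loader. The sketch below shows that idea on a plain list of batches; this `one_batch` function is a simplified stand-in, and the library method may do additional bookkeeping.

```python
def one_batch(dl):
    "Return the first batch produced by a fresh iterator over `dl`."
    return next(iter(dl))

batches = [([1, 2], [0, 1]), ([3, 4], [1, 0])]
xb, yb = one_batch(batches)
print(xb, yb)
```

This is handy for checking batch shapes and contents before starting a full training run. With a shuffling loader, each call may return a different batch.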