# Get your data ready for training

This module defines the basic [`DataBunch`](/basic_data.html#DataBunch) object that is used inside [`Learner`](/basic_train.html#Learner) to train a model. This is the generic class, which can take any kind of fastai [`Dataset`](https://pytorch.org/docs/stable/data#torch.utils.data.Dataset) or [`DataLoader`](https://pytorch.org/docs/stable/data#torch.utils.data.DataLoader). You'll find helper functions in the data module of every application that create this [`DataBunch`](/basic_data.html#DataBunch) directly for you.

In [None]:
from fastai.gen_doc.nbdoc import *
from fastai.basic_data import * 

In [None]:
show_doc(DataBunch, doc_string=False)

Bind together a `train_dl`, a `valid_dl` and optionally a `test_dl`, ensure they are on `device`, and apply `tfms` to them as batches are drawn. `path` is used internally to store temporary files. `collate_fn` is passed to the pytorch `DataLoader` (replacing the one there) to specify how to collate the samples picked for a batch. By default, it grabs the `data` attribute of the objects sent (see in [`vision.image`](/vision.image.html#vision.image) why this can be important).

An example of `tfms` to pass is normalization. `train_dl`, `valid_dl` and optionally `test_dl` will be wrapped in [`DeviceDataLoader`](/basic_data.html#DeviceDataLoader).
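As a rough illustration of what a normalization `tfm` could look like, the sketch below uses plain Python lists in place of tensors. The name `make_normalize` and the assumption that a batch transform receives an `(x, y)` tuple and returns one are hypothetical; they are not taken from the fastai source.

```python
from typing import Callable, List, Tuple

# (inputs, targets); plain lists stand in for tensors in this sketch
Batch = Tuple[List[float], List[int]]

def make_normalize(mean: float, std: float) -> Callable[[Batch], Batch]:
    """Build a batch transform that normalizes the inputs and leaves the targets alone."""
    def _normalize(batch: Batch) -> Batch:
        x, y = batch
        return [(v - mean) / std for v in x], y
    return _normalize

# Hypothetical statistics, chosen only to keep the arithmetic easy to follow
normalize = make_normalize(mean=2.0, std=2.0)
xb, yb = normalize(([0.0, 2.0, 4.0], [0, 1, 0]))
```

Because the transform works on whole batches, it would run once per batch as the data is drawn rather than once per sample.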

In [None]:
show_doc(DataBunch.create, full_name='create', doc_string=False)

Create a [`DataBunch`](/basic_data.html#DataBunch) from `train_ds`, `valid_ds` and optionally `test_ds`, with batch size `bs` and `num_workers` worker processes. `tfms` and `device` are passed to the init method.

In [None]:
show_doc(DataBunch.holdout, doc_string=False)

Return `self.test_dl` if `is_test` is `True`, otherwise `self.valid_dl`.
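The selection logic can be sketched in a few lines. `TinyBunch` is a hypothetical stand-in, not the fastai class, and the `is_test=False` default is an assumption.

```python
class TinyBunch:
    """Hypothetical stand-in for an object holding three data loaders."""
    def __init__(self, train_dl, valid_dl, test_dl=None):
        self.train_dl, self.valid_dl, self.test_dl = train_dl, valid_dl, test_dl

    def holdout(self, is_test=False):
        # Pick the test loader on request, otherwise fall back to validation.
        return self.test_dl if is_test else self.valid_dl

# Lists of strings stand in for real data loaders
bunch = TinyBunch(train_dl=["train"], valid_dl=["valid"], test_dl=["test"])
held_out_valid = bunch.holdout()
held_out_test = bunch.holdout(is_test=True)
```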

In [None]:
show_doc(DeviceDataLoader, doc_string=False)

Put the batches of `dl` on `device` after applying an optional list of `tfms`. `collate_fn` will replace the one of `dl`. All dataloaders of a [`DataBunch`](/basic_data.html#DataBunch) are of this type. 
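To make the wrapping idea concrete, here is a minimal sketch in plain Python. `DeviceLoaderSketch` and `double` are hypothetical names, the "device" is only recorded as a string rather than used to move tensors, and the order of the two steps follows the wording above; the real implementation may differ.

```python
class DeviceLoaderSketch:
    """Hypothetical wrapper around an iterable of batches."""
    def __init__(self, dl, device, tfms=None):
        self.dl, self.device, self.tfms = dl, device, list(tfms or [])

    def proc_batch(self, batch):
        # Apply each transform in order.
        for tfm in self.tfms:
            batch = tfm(batch)
        # A real loader would copy tensors to the device; here we only record it.
        return (self.device, batch)

    def __iter__(self):
        for batch in self.dl:
            yield self.proc_batch(batch)

    def __len__(self):
        return len(self.dl)

def double(batch):
    # Toy transform: multiply every value in the batch by two.
    return [2 * v for v in batch]

loader = DeviceLoaderSketch([[1, 2], [3, 4]], device="cpu", tfms=[double])
batches = list(loader)
```

Wrapping every loader the same way means the training loop never has to handle device placement or batch transforms itself.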

### Factory method

In [None]:
show_doc(DeviceDataLoader.create, doc_string=False)

Create a [`DeviceDataLoader`](/basic_data.html#DeviceDataLoader) on `device` from a `dataset` with batch size `bs`, `num_workers` processes and a given `collate_fn`. The dataloader will `shuffle` the data if that flag is set to `True`, and `tfms` are passed to the init method. All `kwargs` are passed to the pytorch [`DataLoader`](https://pytorch.org/docs/stable/data#torch.utils.data.DataLoader) class initialization.
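The roles of `bs`, `shuffle` and `collate_fn` can be sketched without pytorch. `make_batches`, `unzip` and the `seed` parameter are hypothetical helpers for this illustration only; worker processes and device placement are left out.

```python
import random

def make_batches(dataset, bs=2, shuffle=False, collate_fn=list, seed=None):
    """Hypothetical helper: split `dataset` into collated batches of size `bs`."""
    idxs = list(range(len(dataset)))
    if shuffle:
        # Shuffle the sample order before cutting it into batches.
        random.Random(seed).shuffle(idxs)
    return [collate_fn([dataset[i] for i in idxs[start:start + bs]])
            for start in range(0, len(idxs), bs)]

def unzip(samples):
    # Collate a list of (x, y) samples into a single (xs, ys) batch.
    xs, ys = zip(*samples)
    return list(xs), list(ys)

dataset = [(0.1, 0), (0.2, 1), (0.3, 0), (0.4, 1), (0.5, 0)]
ordered = make_batches(dataset, bs=2, collate_fn=unzip)
shuffled = make_batches(dataset, bs=2, shuffle=True, collate_fn=unzip, seed=0)
```

With five samples and `bs=2`, the last batch holds a single sample; whether to keep or drop such a partial batch is a separate choice that this sketch does not cover.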

### Internal methods

In [None]:
show_doc(DeviceDataLoader.proc_batch)

In [None]:
show_doc(DeviceDataLoader.add_tfm)

Add `tfm` to `self.tfms`.

In [None]:
show_doc(DeviceDataLoader.remove_tfm)

Remove `tfm` from `self.tfms`.
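These two methods amount to list bookkeeping, as the sketch below suggests. `TfmHolder`, `flip` and `scale` are hypothetical names, and the guard against removing a transform that was never added is an assumption rather than confirmed library behaviour.

```python
class TfmHolder:
    """Hypothetical holder for an ordered list of batch transforms."""
    def __init__(self, tfms=None):
        self.tfms = list(tfms or [])

    def add_tfm(self, tfm):
        # New transforms run after the ones already registered.
        self.tfms.append(tfm)

    def remove_tfm(self, tfm):
        # Assumption: silently ignore transforms that are not registered.
        if tfm in self.tfms:
            self.tfms.remove(tfm)

def flip(batch):
    return batch[::-1]

def scale(batch):
    return [10 * v for v in batch]

holder = TfmHolder([flip])
holder.add_tfm(scale)
after_add = list(holder.tfms)
holder.remove_tfm(flip)
after_remove = list(holder.tfms)
```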

## Generic classes

These last two classes are just empty shells to be subclassed by one of the applications.

In [None]:
show_doc(DatasetBase, title_level=3)

In [None]:
show_doc(LabelDataset, title_level=3)

## Undocumented Methods - Methods moved below this line will intentionally be hidden

## New Methods - Please document or move to the undocumented section

In [None]:
show_doc(DataBunch.add_tfm)

In [None]:
show_doc(DeviceDataLoader.collate_fn)

In [None]:
show_doc(DeviceDataLoader.one_batch)