# Customizing datasets in fastai

In this tutorial, we'll see how to create custom subclasses of `ItemBase` or `ItemList` while retaining everything the fastai library has to offer. To allow basic functions to work consistently across various applications, the fastai library delegates several tasks to one of those specific objects, and we'll see here which methods you have to implement for everything to work properly. But first, let's take a step back to see where you'll use your end result.

## Links with the data block API

The data block API works by allowing you to pick a class that is responsible for getting your items and another class that is in charge of getting your targets. Combined, they create a PyTorch `Dataset` that is then wrapped inside a `DataLoader`. The training set, validation set and possibly test set are then all put in a `DataBunch`.
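To make the pairing idea concrete, here is a minimal pure-Python sketch. It does not use fastai or PyTorch: `PairedDataset` and the plain dictionary playing the role of the bunch are hypothetical stand-ins, and the `DataLoader` batching step is left out entirely.

```python
class PairedDataset:
    """Hypothetical stand-in for a dataset: pairs each input with its target."""

    def __init__(self, xs, ys):
        if len(xs) != len(ys):
            raise ValueError("inputs and targets must have the same length")
        self.xs, self.ys = list(xs), list(ys)

    def __len__(self):
        return len(self.xs)

    def __getitem__(self, i):
        # Indexing returns an (input, target) pair.
        return self.xs[i], self.ys[i]


# Made-up file names and labels, for illustration only.
inputs = ["img_0.png", "img_1.png", "img_2.png", "img_3.png"]
targets = ["cat", "dog", "cat", "dog"]
valid_idx = {3}  # indices held out for validation

train = PairedDataset(
    [x for i, x in enumerate(inputs) if i not in valid_idx],
    [y for i, y in enumerate(targets) if i not in valid_idx],
)
valid = PairedDataset(
    [x for i, x in enumerate(inputs) if i in valid_idx],
    [y for i, y in enumerate(targets) if i in valid_idx],
)

# A plain dict stands in for the object that groups the sets together.
bunch = {"train": train, "valid": valid}
```

The one class that gets items and the one that gets targets are reduced here to two plain lists; the point is only that the dataset is whatever lets you index into an (input, target) pair.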

The data block API allows you to mix and match what class your inputs have, what class your targets have, how to split between training and validation sets, and then how to create the `DataBunch`. If you have a very specific kind of input/target, however, the fastai classes might not be sufficient for you. This tutorial explains what is needed to create a new class of items and which methods are important to implement or override.

It goes in two phases: first we focus on what you need to create a custom `ItemBase` class (which is the type of your inputs/targets), then on how to create your custom `ItemList` (which is basically a set of `ItemBase`), while highlighting which methods are called by the library.

## Creating a custom `ItemBase` subclass

The fastai library contains three basic types of `ItemBase` that you might want to subclass:
- `Image` for vision applications
- `Text` for text applications
- `TabularLine` for tabular applications

Whether you decide to create your own item class or to subclass one of the above, here is what you need to implement:

### Basic attributes

Those are the most important attributes your custom `ItemBase` needs, as they're used everywhere in the fastai library:
- `ItemBase.data` is the thing that is passed to PyTorch when you want to create a `DataLoader`. This is what needs to be fed to your model. Note that it might be different from the representation of your item since you might want something that is more understandable.
- `ItemBase.obj` is the thing that truly represents the underlying object behind your item. It should be sufficient to create a copy of your item. For instance, when creating the test set, the basic label is the `obj` attribute of the first label (or y) in the training set.
- `__str__` representation: if applicable, this is what will be displayed when the fastai library has to show your item.
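As a rough illustration of how these three pieces relate, the sketch below uses a hypothetical `WordItem` class written in plain Python. It does not subclass fastai's `ItemBase`, and the character-code encoding is only an assumption chosen to make `data` visibly different from `obj`.

```python
class WordItem:
    """Hypothetical item: not a fastai class, just the three basics."""

    def __init__(self, word):
        # obj: the underlying object, enough on its own to rebuild the item.
        self.obj = word
        # data: the numeric form a model would consume (a list stands in
        # for a tensor here).
        self.data = [ord(ch) for ch in word]

    def __str__(self):
        # Human-readable form used when the item is displayed.
        return self.obj


item = WordItem("fastai")
# Because obj fully describes the item, a copy can be rebuilt from it alone.
item_copy = WordItem(item.obj)
```

Here `item_copy` carries the same `data` as `item` even though only `obj` was passed along, which is the property the `obj` attribute is meant to guarantee.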

If we take the example of a `MultiCategory` object `o` for instance:
- `o.obj` is the list of tags that object has
- `o.data` is a tensor where the tags are one-hot encoded
- `str(o)` returns the tags separated by `;`
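The same three behaviours can be mimicked in a few lines. This is a sketch under stated assumptions: `TagsItem` is a hypothetical class, not fastai's `MultiCategory`, its constructor signature is invented for the example, and a plain list of floats stands in for the one-hot tensor.

```python
class TagsItem:
    """Hypothetical multi-label item mirroring the behaviour described above."""

    def __init__(self, tags, classes):
        # obj: the list of tags this object has.
        self.obj = list(tags)
        # data: one-hot encoding of the tags over the known classes.
        self.data = [1.0 if c in self.obj else 0.0 for c in classes]

    def __str__(self):
        # Tags joined by a semicolon.
        return ";".join(str(t) for t in self.obj)


# Made-up class vocabulary, for illustration only.
classes = ["cat", "dog", "outdoor", "indoor"]
o = TagsItem(["dog", "outdoor"], classes)
```

With this vocabulary, `o.data` has one entry per class, set to 1 for `dog` and `outdoor` and 0 elsewhere, while `str(o)` gives the readable form.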

Those are the basics that will make your object work in the library. If you want to use methods such as `data.