# 6. Using datasets

The `nitrain.Dataset` class maps collections of images to their related metadata. This chapter introduces the basic structure and functionality of the class so you can get started. Once you know the basics, the features covered in later chapters will build on them naturally.

## Prerequisites

Besides nitrain, this chapter uses ants and numpy to create images, plus a few standard-library tools (`os` and `tempfile`) to create directories that mimic how your data is laid out on disk when it is not loaded into memory.

In [2]:
import nitrain as nt
import ants
import numpy as np
import os
from tempfile import TemporaryDirectory

## Basic example

To create a dataset, you need to pass in `inputs` and `outputs` arguments. In the most basic example of image classification, you would pass in a list of images as inputs and a list of class labels as outputs.

In [3]:
# Ten 100x100 images; image i is filled with the value i
images = [ants.from_numpy(np.ones((100, 100))) * i for i in range(10)]
labels = list(range(10))

dataset = nt.Dataset(inputs=images,
                     outputs=labels)

Now our dataset is mapped! We can retrieve a single record by indexing with an integer, or several records by slicing.

In [5]:
x, y = dataset[0]
print(x)

ANTsImage
	 Pixel Type : float (float32)
	 Components : 1
	 Dimensions : (100, 100)
	 Spacing    : (1.0, 1.0)
	 Origin     : (0.0, 0.0)
	 Direction  : [1. 0. 0. 1.]



In [7]:
x_list, y_list = dataset[3:5]
print(x_list)
print(y_list)

[ANTsImage
	 Pixel Type : float (float32)
	 Components : 1
	 Dimensions : (100, 100)
	 Spacing    : (1.0, 1.0)
	 Origin     : (0.0, 0.0)
	 Direction  : [1. 0. 0. 1.]
, ANTsImage
	 Pixel Type : float (float32)
	 Components : 1
	 Dimensions : (100, 100)
	 Spacing    : (1.0, 1.0)
	 Origin     : (0.0, 0.0)
	 Direction  : [1. 0. 0. 1.]
]
[3, 4]
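
To make the indexing behavior above concrete, here is a minimal sketch of a class that behaves the same way: an integer index returns one `(x, y)` record, and a slice returns a list of inputs and a list of outputs. `ToyDataset` is a hypothetical stand-in written for this illustration. It is not how `nitrain.Dataset` is implemented, and it uses plain strings in place of images so that it runs without any dependencies.

```python
class ToyDataset:
    """Hypothetical stand-in for illustration; not nitrain's implementation."""

    def __init__(self, inputs, outputs):
        if len(inputs) != len(outputs):
            raise ValueError("inputs and outputs must have the same length")
        self.inputs = list(inputs)
        self.outputs = list(outputs)

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        # Both an integer and a slice can be forwarded to list indexing:
        # an int yields a single (x, y) record, a slice yields two lists.
        return self.inputs[idx], self.outputs[idx]


# Strings stand in for images here (an assumption made to avoid dependencies).
toy = ToyDataset(inputs=["img0", "img1", "img2", "img3", "img4"],
                 outputs=[0, 1, 2, 3, 4])

x, y = toy[0]              # one record
x_list, y_list = toy[3:5]  # several records

print(x, y)            # img0 0
print(x_list, y_list)  # ['img3', 'img4'] [3, 4]
```

The point of the sketch is only that inputs and outputs stay aligned by position, so whatever index selects an input selects the matching output as well.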


We can also print the dataset to understand a bit more of its structure.

In [8]:
print(dataset)

Dataset (n=10)
     Inputs     : <nitrain.readers.memory.MemoryReader object at 0x1326f5690>
     Outputs    : <nitrain.readers.memory.MemoryReader object at 0x1326f5dd0>
     Transforms : {}



As you can see, our dataset has a `MemoryReader` in both the inputs and the outputs slots. You will learn more about readers in a later chapter, but the short version is that readers are what the dataset uses to feed you records from a variety of sources. Since our images and labels already exist in memory, a `MemoryReader` is inferred for each.
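
The idea of inferring a reader from the kind of data you pass in can be sketched as follows. Everything in this block (`ToyMemoryReader`, `ToyFileReader`, and `infer_reader`) is hypothetical and exists only for illustration. It does not reflect nitrain's actual reader classes or inference rules, and the rule used here (a list of paths means "read from file", anything else means "already in memory") is a deliberate simplification.

```python
import os


class ToyMemoryReader:
    """Hypothetical reader for values that are already in memory."""

    def __init__(self, values):
        self.values = list(values)

    def __getitem__(self, idx):
        return self.values[idx]

    def __len__(self):
        return len(self.values)


class ToyFileReader:
    """Hypothetical reader that opens a file only when a record is requested."""

    def __init__(self, paths):
        self.paths = list(paths)

    def __getitem__(self, idx):
        with open(self.paths[idx], "rb") as f:
            return f.read()

    def __len__(self):
        return len(self.paths)


def infer_reader(source):
    """Pick a reader from the type of the items in `source` (simplified rule)."""
    source = list(source)
    if source and all(isinstance(s, (str, os.PathLike)) for s in source):
        return ToyFileReader(source)
    return ToyMemoryReader(source)


reader = infer_reader([0, 1, 2])
print(type(reader).__name__)  # ToyMemoryReader
print(reader[1])              # 1
```

Because both toy readers expose the same `__getitem__` and `__len__` interface, code that consumes records does not need to know where the data actually lives.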

## Loading from file

What about when our data does not already exist in memory? 