#  PyTorch ```Dataset```

## Contents


* [Overview](#overview) 
* [PyTorch ```Dataset```](#ekf)
* [PyTorch ```DataLoader```](#sub_sect_1)
* [References](#refs)

## <a name="overview"></a> Overview

In this notebook we go over the abstract ```Dataset``` class. As its name suggests, this class represents a dataset. A custom dataset should inherit from ```Dataset``` and override the following two methods [1]:


- ```__len__```, so that ```len(dataset)``` returns the size of the dataset
- ```__getitem__```, to support indexing so that ```dataset[i]``` returns the i-th sample

## <a name="ekf"></a> PyTorch ```Dataset```

In [1]:
import numpy as np
from torch.utils.data import Dataset

Define our own ```Dataset```:

In [2]:
X = np.array([[0.0, 1.0], [2.0, 1.0], [4.0, 5.5]])
y = np.array([0, 1, 2])

In [5]:
class ExampleDataSet(Dataset):
    
    def __init__(self, X, y, transform=None):
        self._X = X
        self._y = y
        self._transform = transform
        
    def __getitem__(self, index):
        """
        Returns the index-th training example and label,
        passed through the transform if one was given
        """
        
        x = self._X[index]
        y = self._y[index]
        
        # The transform, if any, receives the example and its label
        # and returns the transformed pair
        if self._transform is not None:
            x, y = self._transform(x, y)
        
        return x, y
    
    def __len__(self):
        """
        Returns how many items are in the dataset
        """
        return self._X.shape[0]
    
    

In [6]:
dataset = ExampleDataSet(X=X, y=y)

In [8]:
print("Number of training examples={0}".format(len(dataset)))

Number of training examples=3


In [9]:
print("The first training example is={0} with label={1}".format(dataset[0][0], dataset[0][1]))

The first training example is=[0. 1.] with label=0
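The ```transform``` argument is accepted by the constructor but not exercised above. The following is a minimal sketch of how it could be used, assuming the transform is a callable that takes an example and its label and returns the transformed pair. The names ```TransformedDataSet``` and ```to_tensors``` are hypothetical and introduced only for illustration.

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class TransformedDataSet(Dataset):
    """Hypothetical stand-in for a dataset with an optional transform."""

    def __init__(self, X, y, transform=None):
        self._X = X
        self._y = y
        self._transform = transform

    def __getitem__(self, index):
        x, y = self._X[index], self._y[index]
        # Apply the transform to the (example, label) pair if one was given
        if self._transform is not None:
            x, y = self._transform(x, y)
        return x, y

    def __len__(self):
        return self._X.shape[0]


def to_tensors(x, y):
    """Hypothetical transform: convert a NumPy example and label to tensors."""
    return torch.from_numpy(x).float(), torch.tensor(int(y))


X = np.array([[0.0, 1.0], [2.0, 1.0], [4.0, 5.5]])
y = np.array([0, 1, 2])

dataset_t = TransformedDataSet(X=X, y=y, transform=to_tensors)
x0, y0 = dataset_t[0]
print(type(x0), x0.dtype, y0.item())
```

Without a transform the dataset returns the NumPy values unchanged. With ```to_tensors``` each item comes back as a pair of tensors.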


## <a name="sub_sect_1"></a> PyTorch ```DataLoader```

A ```DataLoader``` takes in a dataset and defines rules for successively generating batches of data. For example:

```
from torch.utils.data import DataLoader

# shuffle=True reshuffles the data at every epoch
dataloader = DataLoader(dataset, batch_size=60, shuffle=True)
```
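To see what "successively generating batches" looks like in practice, here is a minimal sketch that iterates over a loader built on the same three-sample data. The class name ```TinyDataSet``` is hypothetical, and ```batch_size=2``` with ```shuffle=False``` are chosen only so that the batches are small and their order is fixed.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class TinyDataSet(Dataset):
    """Hypothetical stand-in mirroring the dataset defined earlier."""

    def __init__(self, X, y):
        self._X = X
        self._y = y

    def __getitem__(self, index):
        return self._X[index], self._y[index]

    def __len__(self):
        return self._X.shape[0]


X = np.array([[0.0, 1.0], [2.0, 1.0], [4.0, 5.5]])
y = np.array([0, 1, 2])

loader = DataLoader(TinyDataSet(X, y), batch_size=2, shuffle=False)

# Each iteration yields one batch; the default collation stacks the
# NumPy samples returned by __getitem__ into tensors
batch_shapes = []
for x_batch, y_batch in loader:
    batch_shapes.append((tuple(x_batch.shape), tuple(y_batch.shape)))
    print(x_batch.shape, y_batch.shape)
```

With three samples and a batch size of two, the loader produces one full batch followed by a smaller final batch holding the remaining sample.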

## <a name="refs"></a> References

1. <a href="https://pytorch.org/tutorials/beginner/data_loading_tutorial.html">Writing Custom Datasets, DataLoaders and Transforms</a>