## Turn a native datatype into a dataset

In [1]:
x = [1,2,3,4,5]
y = [0,0,1,0,1]

Implement a class that inherits from `Dataset` and defines `__len__` and `__getitem__`:

In [2]:
import torch.utils.data

class MyDataset(torch.utils.data.Dataset):
    def __init__(self, x, y):
        self.x = x
        self.y = y
        
    def __len__(self):
        return len(self.y)

    def __getitem__(self, i):
        return self.x[i], self.y[i]

In [3]:
dataset = MyDataset(x, y)
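
As a quick sanity check, the two methods defined above are what `len()` and square-bracket indexing call under the hood. The sketch below repeats the class so it runs on its own; it assumes PyTorch is installed and that `x` and `y` are plain lists as in this notebook.

```python
import torch.utils.data


class MyDataset(torch.utils.data.Dataset):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __len__(self):
        # Number of examples, taken from the label list.
        return len(self.y)

    def __getitem__(self, i):
        # One (input, label) pair.
        return self.x[i], self.y[i]


dataset = MyDataset([1, 2, 3, 4, 5], [0, 0, 1, 0, 1])

print(len(dataset))  # calls __len__
print(dataset[0])    # calls __getitem__(0)
print(dataset[2])    # calls __getitem__(2)
```

Because the underlying containers here are lists, whatever a list accepts as an index is passed straight through; a dataset backed by something else may behave differently.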

## Loop over a dataset to do something other than training

In this case, count how many times each label occurs using Python's built-in `Counter` type:

In [4]:
from collections import Counter

label_counter = Counter()

for data, label in dataset:
    label_counter[label] += 1
    
print(label_counter)

Counter({0: 3, 1: 2})
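
The `for` loop works even though the class defines no `__iter__`. As far as I can tell this relies on Python's fallback sequence protocol: when an object has `__getitem__` but no `__iter__`, Python calls `__getitem__` with 0, 1, 2, ... until an `IndexError` is raised. The sketch below shows that fallback with a plain-Python stand-in (`PairSequence` is a hypothetical name, used so the snippet runs without PyTorch).

```python
class PairSequence:
    """Hypothetical stand-in with the same two methods as MyDataset."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __len__(self):
        return len(self.y)

    def __getitem__(self, i):
        # Indexing past the end of a list raises IndexError,
        # which is what ends the iteration.
        return self.x[i], self.y[i]


pairs = PairSequence([1, 2, 3], [0, 0, 1])

# Roughly what `for item in pairs` does when there is no __iter__:
visited = []
i = 0
while True:
    try:
        item = pairs[i]
    except IndexError:
        break
    visited.append(item)
    i += 1

print(visited)
print(visited == list(pairs))
```

One consequence: if `__getitem__` never raises `IndexError` for an out-of-range index, a plain `for` loop over the object will not stop on its own.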


## Consume a dataset (i.e. convert it back to a native datatype)

`list` gives you a list of (x, y) tuples:

In [5]:
list(dataset)

[(1, 0), (2, 0), (3, 1), (4, 0), (5, 1)]

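`list(zip(*dataset))` gives you a tuple of the `x` values and a tuple of the `y` values. It works because `*dataset` passes each `(x, y)` pair to `zip` as a separate argument, and `zip` then groups the first elements together and the second elements together: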

In [6]:
list(zip(*dataset))

[(1, 2, 3, 4, 5), (0, 0, 1, 0, 1)]
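
A short illustration of the unpacking involved, using a plain list of pairs in place of the dataset (the variable names are only for this sketch):

```python
pairs = [(1, 0), (2, 0), (3, 1)]

# zip(*pairs) is the same call as zip(pairs[0], pairs[1], pairs[2]).
via_star = list(zip(*pairs))
explicit = list(zip(pairs[0], pairs[1], pairs[2]))
print(via_star == explicit)

# zip takes the first element of every argument, then the second,
# so the pairs are "transposed" into one tuple per column.
xs, ys = zip(*pairs)
print(xs)
print(ys)
```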

Be careful doing things like this with very large datasets: both `list(dataset)` and `list(zip(*dataset))` load every item into memory at once.
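
If the dataset is too large to materialize, one option is to iterate lazily and keep only what you need. This is a minimal sketch, not the only approach; `PairSequence` is a hypothetical stand-in for the dataset class so that the snippet runs without PyTorch.

```python
from collections import Counter
from itertools import islice


class PairSequence:
    """Hypothetical stand-in with the same two methods as MyDataset."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __len__(self):
        return len(self.y)

    def __getitem__(self, i):
        return self.x[i], self.y[i]


pairs = PairSequence(range(10), [0, 1] * 5)

# Peek at the first few items without building the full list.
first_three = list(islice(pairs, 3))
print(first_three)

# Aggregate in a single pass; only the counts are kept in memory.
label_counts = Counter(label for _, label in pairs)
print(label_counts)
```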