# Data loader

A `DataLoader` (`torch.utils.data.DataLoader`) in PyTorch wraps a dataset and yields its samples in batches, which simplifies splitting data for training and evaluation.

In [1]:
import torch

## Drop incomplete batch

The `drop_last` argument of `torch.utils.data.DataLoader` controls whether the final batch is dropped when the dataset size is not divisible by the batch size. With `drop_last=True`, the leftover samples that cannot fill a complete batch are skipped. With the default `drop_last=False`, they are returned as a smaller final batch.
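As a quick check of this behaviour, the sketch below compares the number of batches reported by `len()` for both settings. The sizes (14 samples, batches of 4) and the variable names are illustrative assumptions, and the dataset is a throwaway one-dimensional tensor.

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

# Illustrative sizes: 14 samples do not divide evenly into batches of 4.
n_samples, batch_size = 14, 4
dataset = TensorDataset(torch.arange(n_samples))

kept = DataLoader(dataset, batch_size=batch_size, drop_last=False)
dropped = DataLoader(dataset, batch_size=batch_size, drop_last=True)

# Without dropping, the batch count rounds up; with dropping, it rounds down.
print(len(kept), math.ceil(n_samples / batch_size))
print(len(dropped), n_samples // batch_size)
```

For a map-style dataset like this one, the count rounds up when the partial batch is kept and rounds down when it is dropped.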

---

The following cell creates the tensor that serves as the basis for the dataset, prints it, and wraps it in a `TensorDataset`.

In [16]:
samples = 14
dimensionality = 5

# Build a (samples, dimensionality) tensor with consecutive integers
input_tensor = torch.arange(
    samples*dimensionality
).reshape(
    samples, dimensionality
)
print(input_tensor)

dataset = torch.utils.data.TensorDataset(input_tensor)

tensor([[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34],
        [35, 36, 37, 38, 39],
        [40, 41, 42, 43, 44],
        [45, 46, 47, 48, 49],
        [50, 51, 52, 53, 54],
        [55, 56, 57, 58, 59],
        [60, 61, 62, 63, 64],
        [65, 66, 67, 68, 69]])


Suppose we use `batch_size=4`. The 14 samples cannot be split evenly into batches of 4: three full batches leave 2 samples over. The following cell defines a `DataLoader` with `drop_last=True` and prints all its batches.

In [17]:
data_loader = torch.utils.data.DataLoader(
    dataset, 
    batch_size=4,
    drop_last=True
)

for d in data_loader:
    print(d)

[tensor([[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]])]
[tensor([[20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34],
        [35, 36, 37, 38, 39]])]
[tensor([[40, 41, 42, 43, 44],
        [45, 46, 47, 48, 49],
        [50, 51, 52, 53, 54],
        [55, 56, 57, 58, 59]])]


The last two samples (values 60 to 69) were not printed: they did not form a complete batch, so the loader skipped them.
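For comparison, here is a minimal sketch of the same setup with `drop_last=False` (the default). It rebuilds the 14×5 tensor so that it runs on its own; the names `full_loader` and `batches` are introduced here only for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Same data as above: 14 samples with 5 features each.
input_tensor = torch.arange(14 * 5).reshape(14, 5)
dataset = TensorDataset(input_tensor)

# drop_last=False keeps the incomplete final batch.
full_loader = DataLoader(dataset, batch_size=4, drop_last=False)

# Each batch is a one-element list, so take the tensor at index 0.
batches = [batch[0] for batch in full_loader]

print([b.shape[0] for b in batches])  # [4, 4, 4, 2]
print(batches[-1])                    # the two leftover samples (60 to 69)
```

Here the loader yields a fourth batch with only two rows. Code that assumes a fixed batch size needs to handle that case or set `drop_last=True`.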