## Homework 0 Part 2: Data Loaders

In this assignment, you are given data and an expected result. Your task is to fill in the starter code so that it produces the expected result. Do not modify the data (`X` or `Y`), and do not modify the instantiation of the dataset or dataloader.

All three versions -- easy difficulty, medium difficulty, and hard difficulty -- have the same solution code and the same examples. We recommend starting with the easy difficulty. Once you get the expected results with the easy difficulty, try again with the medium difficulty. If you want to challenge yourself, try again with the hard difficulty.

CUDA is not required for most of this assignment, but it is required to run the final command (Exercise 7). To get access to a CUDA-enabled machine, use AWS as described in the recitation.

<hr style="border:2px solid gray"> </hr>

In [1]:
import numpy as np
import torch
import torch.utils.data  # explicit import of the Dataset and DataLoader module

### Exercise 1

In [2]:
X = np.array([2,  3,  4,  5,  6,  7,  8,  9])

In [3]:
class ExampleDataset1(torch.utils.data.Dataset):

    def __init__(self, X):

        ### Assign data to self (1 line)
        self.X = X

        ### Assign length to self (1 line)
        self.length = len(X)

    def __len__(self):

        ### Return length (1 line)
        return self.length

    def __getitem__(self, i):

        ### Return data at index i (1 line)
        return self.X[i]

    @staticmethod
    def collate_fn(batch):

        ### Convert batch to tensor (1 line)
        batch_x = torch.as_tensor(batch)

        ### Return batched data (1 line)
        return batch_x

In [4]:
dataset1 = ExampleDataset1(X)

dataloader1 = torch.utils.data.DataLoader(dataset1,
                                          batch_size=2,
                                          shuffle=False,
                                          collate_fn=ExampleDataset1.collate_fn)

for i, batch in enumerate(dataloader1):
    print("Batch", i, ":\n", batch, "\n")

Batch 0 :
 tensor([2, 3], dtype=torch.int32) 

Batch 1 :
 tensor([4, 5], dtype=torch.int32) 

Batch 2 :
 tensor([6, 7], dtype=torch.int32) 

Batch 3 :
 tensor([8, 9], dtype=torch.int32) 



---
#### Expected Output:
```
Batch 0 :
 tensor([2, 3])

Batch 1 :
 tensor([4, 5])

Batch 2 :
 tensor([6, 7])

Batch 3 :
 tensor([8, 9])
```
---
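
To see why four batches of two appear, it can help to mimic what the loader does when `shuffle=False`: walk the indices in order, group them in chunks of `batch_size`, fetch each item by index, and hand the resulting list to `collate_fn`. The sketch below is a plain-Python approximation of that flow, not PyTorch's actual implementation. `iterate_batches` is a hypothetical helper, and the collate step here simply returns a list instead of a tensor.

```python
# Plain-Python approximation of sequential batching (no PyTorch required).
# `iterate_batches` is a hypothetical helper, not part of torch.utils.data.

def iterate_batches(dataset, batch_size, collate_fn):
    """Yield collated batches in index order, keeping a short final batch."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        # Fetch each sample by index, which is what __getitem__ provides
        samples = [dataset[i] for i in range(start, stop)]
        yield collate_fn(samples)


data = [2, 3, 4, 5, 6, 7, 8, 9]
batches = list(iterate_batches(data, batch_size=2, collate_fn=list))
print(batches)
```

The key point is that `collate_fn` always receives a Python list of whatever `__getitem__` returns, one entry per sample in the batch.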

<hr style="border:2px solid gray"> </hr>

### Exercise 2

In [5]:
X = np.array([2,  3,  4,  5,  6,  7,  8,  9])
Y = np.array([4,  9, 16, 25, 36, 49, 64, 81])

In [6]:
class ExampleDataset2(torch.utils.data.Dataset):

    def __init__(self, X, Y):

        ### Assign data and labels to self (1-2 lines)
        self.X = X
        self.Y = Y

        ### Assert data and labels have the same length (1 line)
        assert(len(X) == len(Y))

        ### Assign length to self (1 line)
        self.length = len(X)

    def __len__(self):

        ### Return length (1 line)
        return self.length

    def __getitem__(self, i):

        ### Return data and label at index (1 line)
        return self.X[i], self.Y[i]

    @staticmethod
    def collate_fn(batch):

        ### Select all data from batch (1 line)
        batch_x = [x for x,y in batch]

        ### Select all labels from batch (1 line)
        batch_y = [y for x,y in batch]

        ### Convert batched data and labels to tensors (2 lines)
        batch_x = torch.as_tensor(batch_x)
        batch_y = torch.as_tensor(batch_y)

        ### Return batched data and labels (1 line)
        return batch_x, batch_y

In [7]:
dataset2 = ExampleDataset2(X, Y)

dataloader2 = torch.utils.data.DataLoader(dataset2,
                                          batch_size=2,
                                          shuffle=False,
                                          collate_fn=ExampleDataset2.collate_fn)

for i, batch in enumerate(dataloader2):
    print("Batch", i, ":\n", batch, "\n")

Batch 0 :
 (tensor([2, 3], dtype=torch.int32), tensor([4, 9], dtype=torch.int32)) 

Batch 1 :
 (tensor([4, 5], dtype=torch.int32), tensor([16, 25], dtype=torch.int32)) 

Batch 2 :
 (tensor([6, 7], dtype=torch.int32), tensor([36, 49], dtype=torch.int32)) 

Batch 3 :
 (tensor([8, 9], dtype=torch.int32), tensor([64, 81], dtype=torch.int32)) 



---
#### Expected Output:

```
Batch 0 :
 (tensor([2, 3]), tensor([4, 9]))

Batch 1 :
 (tensor([4, 5]), tensor([16, 25]))

Batch 2 :
 (tensor([6, 7]), tensor([36, 49]))

Batch 3 :
 (tensor([8, 9]), tensor([64, 81]))

```
---
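
When `__getitem__` returns an `(x, y)` pair, `collate_fn` receives a list of such pairs. The two list comprehensions in the solution separate them, and `zip(*batch)` is an equivalent idiom. A small plain-Python illustration follows. The batch contents are written out by hand, as an assumption about what the loader passes in for the first batch.

```python
# A batch as collate_fn would receive it: a list of (data, label) pairs.
# The values are hand-written here for illustration.
batch = [(2, 4), (3, 9)]

# Two comprehensions, as in the exercise solution
xs_a = [x for x, y in batch]
ys_a = [y for x, y in batch]

# Equivalent: transpose the list of pairs with zip
xs_b, ys_b = zip(*batch)

print(xs_a, ys_a)
print(list(xs_b), list(ys_b))
```

Note that `zip(*batch)` returns tuples and raises an error when unpacked from an empty batch, so the comprehension form is the more forgiving of the two.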

<hr style="border:2px solid gray"> </hr>

### Exercise 3

In [8]:
X = np.array([ np.array([[ 2,  3,  4],
                         [ 4,  6,  8],
                         [ 6,  9, 12],
                         [ 8, 12, 16]]),
               np.array([[10, 15, 20],
                         [12, 18, 24]]) ], dtype=object)

In [9]:
class ExampleDataset3(torch.utils.data.Dataset):

    def __init__(self, X):

        ### Assign data to self (1 line)
        self.X = X

        ### Define index mapping (4-6 lines)
        index_map_X = []
        for i, x in enumerate(X):
            for j, xx in enumerate(x):
                index_pair_X = (i, j)
                index_map_X.append(index_pair_X)

        ### Assign index mapping to self (0-1 line)
        self.index_map = index_map_X

        ### Assign length to self (1 line)
        self.length = len(self.index_map)

    def __len__(self):

        ### Return length (1 line)
        return self.length

    def __getitem__(self, index):
        ### Get index pair from index map (1-2 lines)
        i, j = self.index_map[index]

        ### Get data at index pair (1 line)
        xx = self.X[i][j,:]

        ### Return data (1 line)
        return xx

    @staticmethod
    def collate_fn(batch):

        ### Convert batch to tensor (1 line)
        batch_x = torch.as_tensor(batch)

        ### Return batched data (1 line)
        return batch_x

In [10]:
dataset3 = ExampleDataset3(X)

dataloader3 = torch.utils.data.DataLoader(dataset3,
                                          batch_size=3,
                                          shuffle=False,
                                          collate_fn=ExampleDataset3.collate_fn)

for i, batch in enumerate(dataloader3):
    print("Batch", i, ":\n", batch, "\n")

Batch 0 :
 tensor([[ 2,  3,  4],
        [ 4,  6,  8],
        [ 6,  9, 12]], dtype=torch.int32) 

Batch 1 :
 tensor([[ 8, 12, 16],
        [10, 15, 20],
        [12, 18, 24]], dtype=torch.int32) 



---
#### Expected Output:

```
Batch 0 :
 tensor([[ 2,  3,  4],
        [ 4,  6,  8],
        [ 6,  9, 12]])

Batch 1 :
 tensor([[ 8, 12, 16],
        [10, 15, 20],
        [12, 18, 24]])
```
---
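
The index mapping is what lets a single flat index address a row inside one of several sequences of different lengths. A minimal sketch of the same idea with nested Python lists in place of the NumPy arrays (an assumption made only to keep the example dependency-free):

```python
# Two sequences of different lengths (rows are timesteps), as nested lists.
X = [
    [[2, 3, 4], [4, 6, 8], [6, 9, 12], [8, 12, 16]],
    [[10, 15, 20], [12, 18, 24]],
]

# Flat index -> (sequence index, timestep index)
index_map = [(i, j) for i, seq in enumerate(X) for j in range(len(seq))]
print(index_map)

# Flat index 4 lands on the first row of the second sequence
i, j = index_map[4]
print(X[i][j])
```

With this mapping the dataset length is the total number of rows (six here), which is why a batch size of 3 yields two batches that cross the boundary between the two sequences.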

<hr style="border:2px solid gray"> </hr>

### Exercise 4

In [11]:
X = np.array([ np.array([[ 2,  3,  4],
                         [ 4,  6,  8],
                         [ 6,  9, 12],
                         [ 8, 12, 16]]),
               np.array([[10, 15, 20],
                         [12, 18, 24]]) ], dtype=object)

Y = np.array([ np.array([1, 2, 3, 4]),
               np.array([5, 6])], dtype=object)

In [12]:
class ExampleDataset4(torch.utils.data.Dataset):
    def __init__(self, X, Y):

        ### Assign data and label to self (1-2 lines)
        self.X = X
        self.Y = Y

        ### Define data index mapping (4-6 lines)
        index_map_X = []

        for i, x in enumerate(X):
            for j, xx in enumerate(x):
                index_pair_X = (i, j)
                index_map_X.append(index_pair_X)

        ### Define label index mapping (4-6 lines)
        index_map_Y = []
        for i, y in enumerate(Y):
            for j, yy in enumerate(y):
                index_pair_Y = (i, j)
                index_map_Y.append(index_pair_Y)

        ### Assert the data index mapping and label index mapping are the same (1 line)
        assert(set(index_map_X) == set(index_map_Y))

        ### Assign data index mapping to self (1 line)
        self.index_map = index_map_X

        ### Assign length to self (1 line)
        self.length = len(self.index_map)

    def __len__(self):

        ### Return length (1 line)
        return self.length

    def __getitem__(self, index):

        ### Get index pair from index map (1-2 lines)
        i, j = self.index_map[index]

        ### Get data at index pair (1 line)
        xx = self.X[i][j,:]

        ### Get label at index pair (1 line)
        yy = self.Y[i][j]

        ### Return data at index pair and label at index pair (1 line)
        return xx, yy

    @staticmethod
    def collate_fn(batch):

        ### Select all data from batch (1 line)
        batch_x = [x for x,y in batch]

        ### Select all labels from batch (1 line)
        batch_y = [y for x,y in batch]

        ### Convert batched data and labels to tensors (2 lines)
        batch_x = torch.as_tensor(batch_x)
        batch_y = torch.as_tensor(batch_y)

        ### Return batched data and labels (1 line)
        return batch_x, batch_y

In [13]:
dataset4 = ExampleDataset4(X, Y)

dataloader4 = torch.utils.data.DataLoader(dataset4,
                                          batch_size=3,
                                          shuffle=False,
                                          collate_fn=ExampleDataset4.collate_fn)


for i, batch in enumerate(dataloader4):
    print("Batch", i, ":\n", batch, "\n")

Batch 0 :
 (tensor([[ 2,  3,  4],
        [ 4,  6,  8],
        [ 6,  9, 12]], dtype=torch.int32), tensor([1, 2, 3], dtype=torch.int32)) 

Batch 1 :
 (tensor([[ 8, 12, 16],
        [10, 15, 20],
        [12, 18, 24]], dtype=torch.int32), tensor([4, 5, 6], dtype=torch.int32)) 



---
#### Expected Output:

```
Batch 0 :
 (tensor([[ 2,  3,  4],
        [ 4,  6,  8],
        [ 6,  9, 12]]), tensor([1, 2, 3]))

Batch 1 :
 (tensor([[ 8, 12, 16],
        [10, 15, 20],
        [12, 18, 24]]), tensor([4, 5, 6]))
```
---
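
The assertion in `__init__` compares the two index maps as sets. Because each map is built purely from the sequence lengths, a more direct check is that `X` and `Y` hold the same number of sequences and that each data sequence has as many timesteps as its label sequence. When no sequence is empty, the two checks agree. A sketch of that alternative, where `lengths_match` is a hypothetical helper and nested lists stand in for the NumPy arrays:

```python
def lengths_match(X, Y):
    """True when X and Y hold the same number of sequences and each
    data sequence has as many timesteps as its label sequence."""
    return len(X) == len(Y) and all(len(x) == len(y) for x, y in zip(X, Y))


X = [[[2, 3, 4], [4, 6, 8], [6, 9, 12], [8, 12, 16]],
     [[10, 15, 20], [12, 18, 24]]]
Y_good = [[1, 2, 3, 4], [5, 6]]
Y_bad = [[1, 2, 3], [5, 6]]  # first label sequence is one timestep short

print(lengths_match(X, Y_good))
print(lengths_match(X, Y_bad))
```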

<hr style="border:2px solid gray"> </hr>

### Exercise 5

In [14]:
X = np.array([ np.array([[ 2,  3,  4],
                         [ 4,  6,  8],
                         [ 6,  9, 12],
                         [ 8, 12, 16]]),
               np.array([[10, 15, 20],
                         [12, 18, 24]]) ], dtype=object)

In [15]:
class ExampleDataset5(torch.utils.data.Dataset):

    def __init__(self, X, offset=1, context=1):

        ### Assign data to self (1 line)
        self.X = X

        ### Define data index mapping (4-6 lines)
        mapping = []

        for i, data_obj in enumerate(X):
            for j, data in enumerate(data_obj):
                ind = (i, j)
                mapping.append(ind)

        ### Assign data index mapping to self (1 line)
        self.mapping = mapping

        ### Assign length to self (1 line)
        self.length = len(self.mapping)

        ### Add context and offset to self (1-2 lines)
        self.context = context
        self.offset = offset

        ### Zero pad data as-needed for context size = 1 (1-2 lines)
        ### Build padded copies so the caller's X is left unmodified
        self.X = [np.pad(x, ((1, 1), (0, 0)), 'constant', constant_values=0) for x in X]

    def __len__(self):

        ### Return length (1 line)
        return self.length

    def __getitem__(self, index):

        ### Get index pair from index map (1-2 lines)
        i, j = self.mapping[index]

        ### Calculate starting timestep using offset and context (1 line)
        start_j = j + self.offset - self.context

        ### Calculate ending timestep using offset and context (1 line)
        end_j = j + self.offset + self.context + 1

        ### Get data at index pair with context (1 line)
        xx = self.X[i][start_j:end_j,:]

        ### Return data (1 line)
        return xx

    @staticmethod
    def collate_fn(batch):

        ### Convert batch to tensor (1 line)
        batch = torch.as_tensor(batch)

        ### Return batched data (1 line)
        return batch

In [16]:
dataset5 = ExampleDataset5(X)

dataloader5 = torch.utils.data.DataLoader(dataset5,
                                         batch_size=2,
                                         shuffle=False,
                                         collate_fn=ExampleDataset5.collate_fn)

for i, batch in enumerate(dataloader5):
    print("Batch", i, ":\n", batch, "\n")

Batch 0 :
 tensor([[[ 0,  0,  0],
         [ 2,  3,  4],
         [ 4,  6,  8]],

        [[ 2,  3,  4],
         [ 4,  6,  8],
         [ 6,  9, 12]]], dtype=torch.int32) 

Batch 1 :
 tensor([[[ 4,  6,  8],
         [ 6,  9, 12],
         [ 8, 12, 16]],

        [[ 6,  9, 12],
         [ 8, 12, 16],
         [ 0,  0,  0]]], dtype=torch.int32) 

Batch 2 :
 tensor([[[ 0,  0,  0],
         [10, 15, 20],
         [12, 18, 24]],

        [[10, 15, 20],
         [12, 18, 24],
         [ 0,  0,  0]]], dtype=torch.int32) 



---
#### Expected Output:

```
Batch 0 :
 tensor([[[ 0,  0,  0],
         [ 2,  3,  4],
         [ 4,  6,  8]],

        [[ 2,  3,  4],
         [ 4,  6,  8],
         [ 6,  9, 12]]])

Batch 1 :
 tensor([[[ 4,  6,  8],
         [ 6,  9, 12],
         [ 8, 12, 16]],

        [[ 6,  9, 12],
         [ 8, 12, 16],
         [ 0,  0,  0]]])

Batch 2 :
 tensor([[[ 0,  0,  0],
         [10, 15, 20],
         [12, 18, 24]],

        [[10, 15, 20],
         [12, 18, 24],
         [ 0,  0,  0]]])
```
---
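
The start and end formulas work because padding shifts every original row down by the pad width: with one zero row in front, original row `j` sits at padded index `j + 1`, which is the role `offset` plays. The sketch below generalizes this to any context size under the assumption that the pad width and the offset both equal `context`. `context_window` is a hypothetical helper, and nested lists stand in for the NumPy arrays.

```python
def context_window(seq, j, context=1):
    """Return rows j-context .. j+context of seq, with zero rows past the ends."""
    width = len(seq[0])
    pad = [[0] * width for _ in range(context)]
    # Pad `context` zero rows on each side: padded index = original index + context
    padded = pad + seq + pad
    offset = context  # assumption: offset equals the pad width
    start = j + offset - context
    end = j + offset + context + 1
    return padded[start:end]


seq = [[10, 15, 20], [12, 18, 24]]
print(context_window(seq, 0))
print(context_window(seq, 1))
```

For `context=1` the two windows match Batch 2 of the expected output above, and every window has `2 * context + 1` rows regardless of where `j` falls.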

<hr style="border:2px solid gray"> </hr>

### Exercise 6

In [17]:
X = np.array([ np.array([[ 2,  3,  4],
                         [ 4,  6,  8],
                         [ 6,  9, 12],
                         [ 8, 12, 16]]),
               np.array([[10, 15, 20],
                         [12, 18, 24]]) ], dtype=object)

Y = np.array([ np.array([1, 2, 3, 4]),
               np.array([5, 6])], dtype=object)

In [18]:
class ExampleDataset6(torch.utils.data.Dataset):

    def __init__(self, X, Y, offset=1, context=1):

        ### Add data and label to self (1-2 lines)
        self.X = X
        self.Y = Y

        ### Define data index mapping (4-6 lines)
        mapping = []
        for i, data_obj in enumerate(X):
            for j, data in enumerate(data_obj):
                ind = (i, j)
                mapping.append(ind)

        ### Define label index mapping (4-6 lines)
        mapping_Y = []
        for i, data_obj in enumerate(Y):
            for j, data in enumerate(data_obj):
                ind = (i, j)
                mapping_Y.append(ind)

        ### Assert the data index mapping and label index mapping are the same (1 line)
        assert(set(mapping) == set(mapping_Y))

        ### Assign data index mapping to self (1 line)
        self.mapping = mapping

        ### Add length to self (1 line)
        self.length = len(self.mapping)

        ### Add context and offset to self (1-2 lines)
        self.context = context
        self.offset = offset

        ### Zero pad data as-needed for context size = 1 (1-2 lines)
        ### Build padded copies so the caller's X is left unmodified; labels need no padding
        self.X = [np.pad(x, ((1, 1), (0, 0)), 'constant', constant_values=0) for x in X]

    def __len__(self):

        ### Return length (1 line)
        return self.length

    def __getitem__(self, index):

        ### Get index pair from index map (1-2 lines)
        i, j = self.mapping[index]

        ### Calculate starting timestep using offset and context (1 line)
        start_j = j + self.offset - self.context

        ### Calculate ending timestep using offset and context (1 line)
        end_j = j + self.offset + self.context + 1

        ### Get data at index pair with context (1 line)
        xx = self.X[i][start_j:end_j]

        ### Get label at index pair (1 line)
        yy = self.Y[i][j]

        ### Return data at index pair with context and label at index pair (1 line)
        return xx, yy

    @staticmethod
    def collate_fn(batch):

        ### Select all data from batch (1 line)
        b_x = [x for x,y in batch]

        ### Select all labels from batch (1 line)
        b_y = [y for x,y in batch]

        ### Convert batched data and labels to tensors (2 lines)
        bx = torch.as_tensor(b_x)
        by = torch.as_tensor(b_y)

        ### Return batched data and labels (1 line)
        return bx, by

In [19]:
dataset6 = ExampleDataset6(X, Y)

dataloader6 = torch.utils.data.DataLoader(dataset6,
                                         batch_size=2,
                                         shuffle=False,
                                         collate_fn=ExampleDataset6.collate_fn)

for i, batch in enumerate(dataloader6):
    print("Batch", i, ":\n", batch, "\n")

Batch 0 :
 (tensor([[[ 0,  0,  0],
         [ 2,  3,  4],
         [ 4,  6,  8]],

        [[ 2,  3,  4],
         [ 4,  6,  8],
         [ 6,  9, 12]]], dtype=torch.int32), tensor([1, 2], dtype=torch.int32)) 

Batch 1 :
 (tensor([[[ 4,  6,  8],
         [ 6,  9, 12],
         [ 8, 12, 16]],

        [[ 6,  9, 12],
         [ 8, 12, 16],
         [ 0,  0,  0]]], dtype=torch.int32), tensor([3, 4], dtype=torch.int32)) 

Batch 2 :
 (tensor([[[ 0,  0,  0],
         [10, 15, 20],
         [12, 18, 24]],

        [[10, 15, 20],
         [12, 18, 24],
         [ 0,  0,  0]]], dtype=torch.int32), tensor([5, 6], dtype=torch.int32)) 



---
#### Expected Output:
```
Batch 0 :
 (tensor([[[ 0,  0,  0],
         [ 2,  3,  4],
         [ 4,  6,  8]],

        [[ 2,  3,  4],
         [ 4,  6,  8],
         [ 6,  9, 12]]]), tensor([1, 2]))

Batch 1 :
 (tensor([[[ 4,  6,  8],
         [ 6,  9, 12],
         [ 8, 12, 16]],

        [[ 6,  9, 12],
         [ 8, 12, 16],
         [ 0,  0,  0]]]), tensor([3, 4]))

Batch 2 :
 (tensor([[[ 0,  0,  0],
         [10, 15, 20],
         [12, 18, 24]],

        [[10, 15, 20],
         [12, 18, 24],
         [ 0,  0,  0]]]), tensor([5, 6]))
```
---

### Exercise 7

In [20]:
!nvidia-smi

'nvidia-smi' is not recognized as an internal or external command,
operable program or batch file.


---
#### Expected Output (your result should look similar, but not exactly the same):
```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.27.04    Driver Version: 460.27.04    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 207...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   40C    P8     9W /  N/A |      5MiB /  7982MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       970      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+
```
---
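
The error in the cell output above means the shell could not find `nvidia-smi`, which usually indicates that the machine has no NVIDIA driver installed. A minimal sketch for checking this from Python before running the command, using only the standard library (`has_nvidia_smi` is a hypothetical helper name):

```python
import shutil


def has_nvidia_smi():
    """Return True if an `nvidia-smi` executable is found on PATH."""
    return shutil.which("nvidia-smi") is not None


if has_nvidia_smi():
    print("nvidia-smi found; the Exercise 7 command should run.")
else:
    print("nvidia-smi not found; use a CUDA-enabled machine (for example, on AWS).")
```

Finding the executable only shows that the driver tools are installed. Whether PyTorch itself can use the GPU is a separate question, which `torch.cuda.is_available()` answers.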