## Homework 0 Part 2: Data Loaders

In this assignment, you will be provided with data and an expected result. Your task is to fill out the starter code to obtain the expected result. Do not modify the data (X or Y), and do not modify the instantiation of the dataset or dataloader.

All three versions -- easy difficulty, medium difficulty, and hard difficulty -- have the same solution code and the same examples. We recommend starting with the easy difficulty. Once you get the expected results with the easy difficulty, try again with the medium difficulty. If you want to challenge yourself, try again with the hard difficulty.

Most of this assignment does not require CUDA, but the final command does. To run it, use AWS to access a CUDA-capable machine, following the instructions from the recitation.
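
Before moving to a CUDA machine, it can help to check whether the current environment has the NVIDIA tooling at all. The snippet below is a minimal sketch using only the standard library; `nvidia_smi_on_path` is a hypothetical helper name, not part of the starter code. Finding the executable does not guarantee that the driver is running, so the final command can still fail.

```python
import shutil


def nvidia_smi_on_path() -> bool:
    """Return True if an `nvidia-smi` executable can be found on PATH."""
    return shutil.which("nvidia-smi") is not None


if nvidia_smi_on_path():
    print("nvidia-smi found; the final command can be attempted here.")
else:
    print("nvidia-smi not found; use a CUDA-capable machine for the final command.")
```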

<hr style="border:2px solid gray"> </hr>

In [1]:
import numpy as np
import torch

<hr style="border:2px solid gray"> </hr>

### Exercise 1

In [2]:
X = np.array([2,  3,  4,  5,  6,  7,  8,  9])

In [3]:
class ExampleDataset1(torch.utils.data.Dataset):
    
    def __init__(self, X):
        self.X = X
        self.length = len(self.X)

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        return self.X[i]

    @staticmethod
    def collate_fn(batch):
        batch_X = torch.as_tensor(batch)
        return batch_X
        

In [4]:
dataset1 = ExampleDataset1(X)

dataloader1 = torch.utils.data.DataLoader(dataset1,
                                          batch_size=2, 
                                          shuffle=False,
                                          collate_fn=ExampleDataset1.collate_fn)

for i, batch in enumerate(dataloader1):
    print("Batch", i, ":\n", batch, "\n")

Batch 0 :
 tensor([2, 3]) 

Batch 1 :
 tensor([4, 5]) 

Batch 2 :
 tensor([6, 7]) 

Batch 3 :
 tensor([8, 9]) 



---
#### Expected Output:
```
Batch 0 :
 tensor([2, 3]) 

Batch 1 :
 tensor([4, 5]) 

Batch 2 :
 tensor([6, 7]) 

Batch 3 :
 tensor([8, 9]) 
```
---

<hr style="border:2px solid gray"> </hr>

### Exercise 2

In [5]:
X = np.array([2,  3,  4,  5,  6,  7,  8,  9])
Y = np.array([4,  9, 16, 25, 36, 49, 64, 81])

In [6]:
class ExampleDataset2(torch.utils.data.Dataset):
    
    def __init__(self, X, Y):
        self.X = X
        self.Y = Y
        assert(len(X) == len(Y))
        self.length = len(self.X)
        
    def __len__(self):
        return self.length
        
    def __getitem__(self, i):
        return self.X[i], self.Y[i]
        
    @staticmethod
    def collate_fn(batch):
        # Splitting batch
        batch_x = [x for x,y in batch]
        batch_y = [y for x,y in batch]
        # Converting to tensor
        batch_X = torch.as_tensor(batch_x)
        batch_Y = torch.as_tensor(batch_y)
        # Returning the values
        return batch_X,batch_Y

        

In [7]:
dataset2 = ExampleDataset2(X, Y)

dataloader2 = torch.utils.data.DataLoader(dataset2,
                                          batch_size=2, 
                                          shuffle=False,
                                          collate_fn=ExampleDataset2.collate_fn)

for i, batch in enumerate(dataloader2):
    print("Batch", i, ":\n", batch, "\n")

Batch 0 :
 (tensor([2, 3]), tensor([4, 9])) 

Batch 1 :
 (tensor([4, 5]), tensor([16, 25])) 

Batch 2 :
 (tensor([6, 7]), tensor([36, 49])) 

Batch 3 :
 (tensor([8, 9]), tensor([64, 81])) 



---
#### Expected Output:

```
Batch 0 :
 (tensor([2, 3]), tensor([4, 9])) 

Batch 1 :
 (tensor([4, 5]), tensor([16, 25])) 

Batch 2 :
 (tensor([6, 7]), tensor([36, 49])) 

Batch 3 :
 (tensor([8, 9]), tensor([64, 81])) 

```
---
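
With `shuffle=False`, the data loader gathers consecutive samples into a list and hands that list to `collate_fn`. The sketch below mimics that flow with plain Python lists, which may make the splitting step in Exercise 2 easier to follow. It is a simplification, not the `DataLoader` implementation: `batches` and `split_pairs` are hypothetical helper names, and the tensor conversion is omitted.

```python
def batches(samples, batch_size):
    """Yield consecutive lists of at most `batch_size` samples (no shuffling)."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


def split_pairs(batch):
    """Turn [(x0, y0), (x1, y1), ...] into ([x0, x1, ...], [y0, y1, ...])."""
    xs, ys = zip(*batch)
    return list(xs), list(ys)


X_list = [2, 3, 4, 5, 6, 7, 8, 9]
Y_list = [4, 9, 16, 25, 36, 49, 64, 81]

# One (x, y) pair per index, as __getitem__ would return it.
samples = list(zip(X_list, Y_list))

collated = [split_pairs(batch) for batch in batches(samples, batch_size=2)]
for i, (batch_x, batch_y) in enumerate(collated):
    print("Batch", i, ":", batch_x, batch_y)
```

The pairs come out in the same order as the expected output above; converting each list with `torch.as_tensor` would give the tensors.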

<hr style="border:2px solid gray"> </hr>

### Exercise 3

In [8]:
X = np.array([ np.array([[ 2,  3,  4],
                         [ 4,  6,  8],
                         [ 6,  9, 12],
                         [ 8, 12, 16]]),
               np.array([[10, 15, 20],
                         [12, 18, 24]]) ], dtype=object)

In [9]:
# Checking the dataset:
for i,x in enumerate(X):
    print("ENUMERATING X", i)
    print(i)
    print(x)
    print("ENUMERATING x")
    for j,xx in enumerate(x):
        print(j)
        print(xx)

ENUMERATING X 0
0
[[ 2  3  4]
 [ 4  6  8]
 [ 6  9 12]
 [ 8 12 16]]
ENUMERATING x
0
[2 3 4]
1
[4 6 8]
2
[ 6  9 12]
3
[ 8 12 16]
ENUMERATING X 1
1
[[10 15 20]
 [12 18 24]]
ENUMERATING x
0
[10 15 20]
1
[12 18 24]
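
The enumeration above visits every row of every sequence exactly once. One way to give each row a single flat index is to record its `(i, j)` pair in visiting order. The sketch below illustrates this with nested Python lists standing in for the NumPy arrays; the name `sequences` is an assumption made for the example.

```python
sequences = [
    [[2, 3, 4], [4, 6, 8], [6, 9, 12], [8, 12, 16]],  # sequence 0: 4 rows
    [[10, 15, 20], [12, 18, 24]],                      # sequence 1: 2 rows
]

# One (sequence index, row index) pair per row, in enumeration order.
index_map = [(i, j) for i, seq in enumerate(sequences) for j, _ in enumerate(seq)]
print(index_map)

# A flat index selects a pair, and the pair selects a row.
i, j = index_map[4]
print(sequences[i][j])
```

The length of the index map is the total number of rows, so it can serve as the dataset length.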


In [10]:
class ExampleDataset3(torch.utils.data.Dataset):

    def __init__(self, X):
        ### Assign data to self
        self.X = X
        ### Define index mapping
        index_map_X = [(i,j) for i,x in enumerate(X) for j,xx in enumerate(x)]
        ### Assign index mapping to self
        self.index_map = index_map_X
        ### Assign length to self
        self.length = len(self.index_map)
        
    def __len__(self):
        return self.length
        
    def __getitem__(self, index):
        ### Get index pair from index map
        i, j = self.index_map[index]
        ### Get data at index pair
        xx = self.X[i][j,:]
        ### Return data
        return xx
    
    @staticmethod
    def collate_fn(batch):
        ### Convert batch to tensor
        batch_x = torch.as_tensor(batch)
        
        ### Return batched data
        return batch_x

In [11]:
dataset3 = ExampleDataset3(X)

dataloader3 = torch.utils.data.DataLoader(dataset3,
                                          batch_size=3, 
                                          shuffle=False,
                                          collate_fn=ExampleDataset3.collate_fn)

for i, batch in enumerate(dataloader3):
    print("Batch", i, ":\n", batch, "\n")

Batch 0 :
 tensor([[ 2,  3,  4],
        [ 4,  6,  8],
        [ 6,  9, 12]]) 

Batch 1 :
 tensor([[ 8, 12, 16],
        [10, 15, 20],
        [12, 18, 24]]) 



---
#### Expected Output:

```
Batch 0 :
 tensor([[ 2,  3,  4],
        [ 4,  6,  8],
        [ 6,  9, 12]]) 

Batch 1 :
 tensor([[ 8, 12, 16],
        [10, 15, 20],
        [12, 18, 24]]) 
```
---

<hr style="border:2px solid gray"> </hr>

### Exercise 4

In [12]:
X = np.array([ np.array([[ 2,  3,  4],
                         [ 4,  6,  8],
                         [ 6,  9, 12],
                         [ 8, 12, 16]]),
               np.array([[10, 15, 20],
                         [12, 18, 24]]) ], dtype=object)

Y = np.array([ np.array([1, 2, 3, 4]), 
               np.array([5, 6])], dtype=object)

In [13]:
class ExampleDataset4(torch.utils.data.Dataset):
    def __init__(self, X, Y):
        self.X = X
        self.Y = Y
        # Creating Index Map
        index_map_X = [(i,j) for i,x in enumerate(X) for j,xx in enumerate(x)]
        index_map_Y = [(i,j) for i,y in enumerate(Y) for j,yy in enumerate(y)]
        assert(set(index_map_X) == set(index_map_Y))

        self.index_map = index_map_X
        self.length = len(self.index_map)

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        i,j = self.index_map[index]

        ### Get data at index pair
        xx = self.X[i][j,:]
        ### Get label at index pair
        yy = self.Y[i][j]
        ### Return data and label
        return xx, yy

    @staticmethod
    def collate_fn(batch):
        batch_x = [x for x,y in batch]
        batch_y = [y for x,y in batch]
        
        ### Convert data and labels to tensors
        batch_x = torch.as_tensor(batch_x)
        batch_y = torch.as_tensor(batch_y)

        ### Return batched data and labels (1 line)
        return batch_x, batch_y


In [14]:
dataset4 = ExampleDataset4(X, Y)

dataloader4 = torch.utils.data.DataLoader(dataset4,
                                          batch_size=3, 
                                          shuffle=False,
                                          collate_fn=ExampleDataset4.collate_fn)


for i, batch in enumerate(dataloader4):
    print("Batch", i, ":\n", batch, "\n")

Batch 0 :
 (tensor([[ 2,  3,  4],
        [ 4,  6,  8],
        [ 6,  9, 12]]), tensor([1, 2, 3])) 

Batch 1 :
 (tensor([[ 8, 12, 16],
        [10, 15, 20],
        [12, 18, 24]]), tensor([4, 5, 6])) 



---
#### Expected Output:

```
Batch 0 :
 (tensor([[ 2,  3,  4],
        [ 4,  6,  8],
        [ 6,  9, 12]]), tensor([1, 2, 3])) 

Batch 1 :
 (tensor([[ 8, 12, 16],
        [10, 15, 20],
        [12, 18, 24]]), tensor([4, 5, 6]))
```
---

<hr style="border:2px solid gray"> </hr>

### Exercise 5

In [21]:
X = np.array([ np.array([[ 2,  3,  4],
                         [ 4,  6,  8],
                         [ 6,  9, 12],
                         [ 8, 12, 16]]),
               np.array([[10, 15, 20],
                         [12, 18, 24]]) ], dtype=object)

In [22]:
for i,x in enumerate(X):
    for j,xx in enumerate(x):
        print(i,j)
        print(xx)

0 0
[2 3 4]
0 1
[4 6 8]
0 2
[ 6  9 12]
0 3
[ 8 12 16]
1 0
[10 15 20]
1 1
[12 18 24]
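
Exercise 5 returns each row together with its neighbouring rows, using zero rows where a neighbour would fall outside the sequence. The sketch below shows the arithmetic on a plain Python list. `pad_rows` and `window` are hypothetical helper names, and the sketch assumes `offset` equals the padding width `context`, so the window start is never negative.

```python
def pad_rows(seq, context):
    """Return a new list with `context` all-zero rows before and after `seq`."""
    width = len(seq[0])
    front = [[0] * width for _ in range(context)]
    back = [[0] * width for _ in range(context)]
    return front + [list(row) for row in seq] + back


def window(padded, j, offset, context):
    """Return the rows around unpadded row j, read from the padded sequence."""
    start = j + offset - context
    end = j + offset + context + 1
    return padded[start:end]


seq = [[2, 3, 4], [4, 6, 8], [6, 9, 12], [8, 12, 16]]
context = 1
offset = context  # assumption: offset matches the padding width
padded = pad_rows(seq, context)

first = window(padded, 0, offset, context)
last = window(padded, 3, offset, context)
print(first)
print(last)
```

The first window starts with a zero row and the last window ends with one, because those rows have no neighbour on that side.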


In [23]:
class ExampleDataset5(torch.utils.data.Dataset):
    def __init__(self, X, offset=1, context=1):
        self.X = X
        # Index Map
        index_map_X = [(i,j) for i,x in enumerate(X) for (j,xx) in enumerate(x)]
        
        self.offset = offset
        self.context = context
        self.index_map = index_map_X
        self.length = len(self.index_map)
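        ### Zero pad each sequence with `context` rows before and after.
        ### Pad a shallow copy so the caller's X is not modified in place.
        self.X = self.X.copy()
        for i, x in enumerate(self.X):
            self.X[i] = np.pad(x, ((self.context, self.context), (0, 0)), 'constant', constant_values=0)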

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        i,j = self.index_map[index]

        start_j = j + self.offset - self.context
        end_j = j + self.offset + self.context + 1

        xx = self.X[i][start_j:end_j,:]

        return xx

    @staticmethod
    def collate_fn(batch):
        batch_x = torch.as_tensor(batch)
        return batch_x

In [24]:
dataset5 = ExampleDataset5(X)

dataloader5 = torch.utils.data.DataLoader(dataset5,
                                         batch_size=2, 
                                         shuffle=False,
                                         collate_fn=ExampleDataset5.collate_fn)

for i, batch in enumerate(dataloader5):
    print("Batch", i, ":\n", batch, "\n")

Batch 0 :
 tensor([[[ 0,  0,  0],
         [ 2,  3,  4],
         [ 4,  6,  8]],

        [[ 2,  3,  4],
         [ 4,  6,  8],
         [ 6,  9, 12]]]) 

Batch 1 :
 tensor([[[ 4,  6,  8],
         [ 6,  9, 12],
         [ 8, 12, 16]],

        [[ 6,  9, 12],
         [ 8, 12, 16],
         [ 0,  0,  0]]]) 

Batch 2 :
 tensor([[[ 0,  0,  0],
         [10, 15, 20],
         [12, 18, 24]],

        [[10, 15, 20],
         [12, 18, 24],
         [ 0,  0,  0]]]) 



---
#### Expected Output:

```
Batch 0 :
 tensor([[[ 0,  0,  0],
         [ 2,  3,  4],
         [ 4,  6,  8]],

        [[ 2,  3,  4],
         [ 4,  6,  8],
         [ 6,  9, 12]]]) 

Batch 1 :
 tensor([[[ 4,  6,  8],
         [ 6,  9, 12],
         [ 8, 12, 16]],

        [[ 6,  9, 12],
         [ 8, 12, 16],
         [ 0,  0,  0]]]) 

Batch 2 :
 tensor([[[ 0,  0,  0],
         [10, 15, 20],
         [12, 18, 24]],

        [[10, 15, 20],
         [12, 18, 24],
         [ 0,  0,  0]]]) 
```
---

<hr style="border:2px solid gray"> </hr>

### Exercise 6

In [36]:
X = np.array([ np.array([[ 2,  3,  4],
                         [ 4,  6,  8],
                         [ 6,  9, 12],
                         [ 8, 12, 16]]),
               np.array([[10, 15, 20],
                         [12, 18, 24]]) ], dtype=object)

Y = np.array([ np.array([1, 2, 3, 4]), 
               np.array([5, 6])], dtype=object)

In [37]:
class ExampleDataset6(torch.utils.data.Dataset):
    
    def __init__(self, X, Y, offset=1, context=1):
        self.X = X
        self.Y = Y
        self.offset = offset
        self.context = context

        index_map_X = [(i,j) for (i,x) in enumerate(X) for (j,xx) in enumerate(x)]
        index_map_Y = [(i,j) for (i,y) in enumerate(Y) for (j,yy) in enumerate(y)]        
        
        assert(set(index_map_X)==set(index_map_Y))
        self.index_map = index_map_X
        self.length = len(self.index_map)

        # Pad a shallow copy so the caller's X is not modified in place.
        self.X = self.X.copy()
        for i, x in enumerate(self.X):
            self.X[i] = np.pad(x, ((self.context, self.context), (0, 0)), 'constant', constant_values=0)
        
    def __len__(self):
        return self.length

    def __getitem__(self, index):
        i,j = self.index_map[index]

        start_j = j + self.offset - self.context
        end_j = j + self.offset + self.context + 1

        xx = self.X[i][start_j:end_j,:]
        yy = self.Y[i][j]

        return xx, yy

    @staticmethod
    def collate_fn(batch):
        batch_x = [x for x,y in batch]
        batch_y = [y for x,y in batch]
        
        batch_x = torch.as_tensor(batch_x)
        batch_y = torch.as_tensor(batch_y)

        return batch_x, batch_y
        

In [38]:
dataset6 = ExampleDataset6(X, Y)

dataloader6 = torch.utils.data.DataLoader(dataset6,
                                         batch_size=2, 
                                         shuffle=False,
                                         collate_fn=ExampleDataset6.collate_fn)

for i, batch in enumerate(dataloader6):
    print("Batch", i, ":\n", batch, "\n")

Batch 0 :
 (tensor([[[ 0,  0,  0],
         [ 2,  3,  4],
         [ 4,  6,  8]],

        [[ 2,  3,  4],
         [ 4,  6,  8],
         [ 6,  9, 12]]]), tensor([1, 2])) 

Batch 1 :
 (tensor([[[ 4,  6,  8],
         [ 6,  9, 12],
         [ 8, 12, 16]],

        [[ 6,  9, 12],
         [ 8, 12, 16],
         [ 0,  0,  0]]]), tensor([3, 4])) 

Batch 2 :
 (tensor([[[ 0,  0,  0],
         [10, 15, 20],
         [12, 18, 24]],

        [[10, 15, 20],
         [12, 18, 24],
         [ 0,  0,  0]]]), tensor([5, 6])) 



---
#### Expected Output:
```
Batch 0 :
 (tensor([[[ 0,  0,  0],
         [ 2,  3,  4],
         [ 4,  6,  8]],

        [[ 2,  3,  4],
         [ 4,  6,  8],
         [ 6,  9, 12]]]), tensor([1, 2])) 

Batch 1 :
 (tensor([[[ 4,  6,  8],
         [ 6,  9, 12],
         [ 8, 12, 16]],

        [[ 6,  9, 12],
         [ 8, 12, 16],
         [ 0,  0,  0]]]), tensor([3, 4])) 

Batch 2 :
 (tensor([[[ 0,  0,  0],
         [10, 15, 20],
         [12, 18, 24]],

        [[10, 15, 20],
         [12, 18, 24],
         [ 0,  0,  0]]]), tensor([5, 6])) 
```
---

### Exercise 7

In [39]:
!nvidia-smi

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.



---
#### Expected Output (your result should look similar, but not exactly the same):
```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.27.04    Driver Version: 460.27.04    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 207...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   40C    P8     9W /  N/A |      5MiB /  7982MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       970      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+
```
---