# EC 2: PyTorch Exercises
**Due: February 27, 9:30 AM**

In this extra credit assignment, you will practice working with computation graphs in [PyTorch](https://pytorch.org/). You are strongly encouraged to do this extra credit assignment if:
* you have never used PyTorch before or you have not used it in a long time
* you have not taken DS-GA 1011 (Natural Language Processing with Representation Learning) and you are unsure of whether you have the necessary background for this course
* you want some easy extra credit points.

## Important: Read Before Starting

In the following exercises, you will need to implement functions defined in the `pytorch_exercises` module. Please write all your code in the `pytorch_exercises.py` file. You should not submit this notebook with your solutions, and we will not grade it if you do. Please be aware that code written in a Jupyter notebook may run differently when copied into Python modules.

This notebook comes with outputs for some, but not all, of the code cells. These outputs are the ones you should get **when all coding problems have been completed correctly**. You may obtain different results if you run the code cells before completing the coding problems, or if you have completed one or more coding problems incorrectly.

## Problem 1: Setup (0 Points in Total)

### Problem 1a: Install PyTorch (No Submission, 0 Points)

The typical way to install PyTorch is to simply run `pip install torch` or `conda install pytorch`. Please refer to the [PyTorch website](https://pytorch.org/) for detailed instructions specific to your machine. You can also install PyTorch directly from this notebook by running one of the following two code cells; this is recommended if you are running this notebook on Google Colaboratory or some other web-based Jupyter notebook server.

In [None]:
# Install PyTorch using pip (recommended if you're on Google Colaboratory)
!pip install torch

In [None]:
# Install PyTorch using conda
!conda install pytorch

### Problem 1b: Import PyTorch (No Submission, 0 Points)

Once you have installed PyTorch, please import the PyTorch library as follows. If the code cell below throws an error, then PyTorch has not been installed correctly and you need to repeat Problem 1a.

In [2]:
import numpy as np  # Also import NumPy
import torch 
import torch.nn as nn

PyTorch consists of several Python packages. The `torch` package implements automatic differentiation (backpropagation), and it contains the `Tensor` data structure, which represents a computation graph node. The `torch.nn` package, by convention referred to as just `nn`, implements the PyTorch `Module`, which represents neural network architectures.
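As a rough illustration of how these two packages fit together, the sketch below builds a tiny computation graph from a tensor, backpropagates through it, and then lists the parameters of a small module. The names `w`, `loss`, and `layer` are made up for this example and are not part of the assignment.

```python
import torch
import torch.nn as nn

# A leaf tensor that asks autograd to track operations on it.
w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Each operation on w adds a node to the computation graph.
loss = (w ** 2).sum()

# Backpropagation fills in w.grad with d(loss)/dw = 2 * w.
loss.backward()
print(w.grad)

# An nn.Module bundles parameters (which are tensors) with a forward computation.
layer = nn.Linear(3, 1)
param_names = [name for name, _ in layer.named_parameters()]
print(param_names)
```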

## Problem 2: Tensors (16 Points in Total)

In the following exercises, you will read snippets of code and describe what they do in plain English. You are free to consult the [PyTorch documentation](https://pytorch.org/docs/stable/index.html) as you complete these problems. You are also encouraged to run the code snippets in the Python console, in a Python script, or directly in the code cells below. Each code snippet assumes that all previous code snippets have already been run. Therefore, you must run the code snippets in the same order as they appear in the instructions.

### Problem 2a: The PyTorch Tensor (Written, 2 Points)

What kind of object does a tensor represent? What do the `.grad` and `.requires_grad` properties of a tensor represent?

### Problem 2b: Tensor Data Types (Written, 3 Points)

Please create some tensors using the following code.

In [3]:
a = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
b = torch.Tensor(a)
c = torch.LongTensor(a)
d = torch.tensor(a)
e = torch.tensor(a, dtype=torch.float) 
f = torch.Tensor(2, 3)

In [6]:
b.dtype, c.dtype, d.dtype, e.dtype, f.dtype

(torch.float32, torch.int64, torch.int64, torch.float32, torch.float32)

In [5]:
torch.Tensor(2, 3)

tensor([[ 0.0000e+00, -1.0842e-19, -2.0316e+18],
        [ 1.5849e+29, -1.0842e-19,  1.0842e-19]])

What is the difference between `b`, `c`, `d`, `e`, and `f`?

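- `b` is a `float32` tensor holding a copy of the values in `a`; `torch.Tensor` always uses the default floating-point dtype.
- `c` is a tensor with the same values as `a`, stored as 64-bit integers (`int64`, a "long" tensor).
- `d` is a tensor whose dtype is inferred from `a`, so here it is `int64`.
- `e` is a copy of `a` in which every element is converted to `float32`, because the dtype is given explicitly.
- `f` is an uninitialized `float32` tensor of shape (2, 3). Its elements are whatever values happened to be in the allocated memory; they are not sampled from any distribution.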
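The dtype behaviour of these constructors can be checked directly. The sketch below uses a smaller array than `a` and adds `torch.empty`, which does not appear in the exercise, as the explicit way to allocate uninitialized memory; the variable names are illustrative only.

```python
import numpy as np
import torch

# Fix the NumPy dtype so the result does not depend on the platform default.
arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

legacy = torch.Tensor(arr)                       # legacy constructor: default float dtype
inferred = torch.tensor(arr)                     # dtype inferred from the data
explicit = torch.tensor(arr, dtype=torch.float)  # dtype requested explicitly
uninit = torch.empty(2, 3)                       # shape only; contents are arbitrary

print(legacy.dtype, inferred.dtype, explicit.dtype, uninit.dtype)

# torch.tensor copies its input, so later changes to arr do not affect it.
arr[0, 0] = 100
print(inferred[0, 0].item())
```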


### Problem 2c: Creating Tensors (Written, 3 Points)

Please run the following code.

In [3]:
print(torch.full((2, 3), 5)) 
print(torch.randn(2, 3))

tensor([[5., 5., 5.],
        [5., 5., 5.]])
tensor([[-0.2888, -1.8281, -1.1826],
        [ 1.1874, -0.6576, -1.3923]])


What do `torch.full` and `torch.randn` do?


- `torch.full` creates a tensor of the given shape (here 2 × 3) in which every element equals the given fill value (here 5).

- `torch.randn` creates a tensor of the given shape (here 2 × 3) in which each element is sampled independently from the standard normal distribution N(0, 1).


### Problem 2d: Differentiation (Written, 3 Points)

Please run the following code.

In [4]:
b.requires_grad = True 
c.requires_grad = True

RuntimeError: only Tensors of floating point dtype can require gradients

One of these lines of code should work; the other should raise a `RuntimeError`. Why are PyTorch tensors designed this way?
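The behaviour can be reproduced in isolation without touching `b` and `c`. The sketch below uses two fresh tensors, `float_t` and `int_t` (names chosen for this example), and catches the error so that the cell runs to completion.

```python
import torch

float_t = torch.ones(2, 3)                   # float32
int_t = torch.ones(2, 3, dtype=torch.long)   # int64

# Floating-point tensors can be marked as requiring gradients.
float_t.requires_grad = True

# Integer tensors cannot; the assignment raises a RuntimeError.
try:
    int_t.requires_grad = True
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)
```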



### Problem 2e: PyTorch vs. NumPy Operations (Written, 3 Points)

Many NumPy array operations will work on PyTorch tensors, such as `+`, `-`, `*`, `/`, `@`, and `.T`. However, there are some minor differences between array operations and tensor operations. Please run the following lines of code on the array `a`.

In [5]:
print(a.sum(axis=-1)) 
print(a[:, np.newaxis].shape) 
print(a.reshape(4, -1)) 
print(a.size)

[[ 6 15]
 [24 33]]
(2, 1, 2, 3)
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
12


In [10]:
print(b.sum(axis=-1)) 
print(b.unsqueeze(1).shape) 
print(b.reshape(4, -1)) 
print(torch.numel(b))

tensor([[ 6., 15.],
        [24., 33.]])
torch.Size([2, 1, 2, 3])
tensor([[ 1.,  2.,  3.],
        [ 4.,  5.,  6.],
        [ 7.,  8.,  9.],
        [10., 11., 12.]])
12



What is the equivalent of the above code for tensors? Please give your answer as a 4-line code snippet that applies to `b` the tensor operations that are analogous to the array operations shown above for `a`.

### Problem 2f: More Operations (Written, 2 Points)

Please run the following code.

In [9]:
b = torch.ones(2, 3)
c = torch.full((2, 4), 5)
d = torch.cat([b, c], dim=-1)
print(d)

tensor([[1., 1., 1., 5., 5., 5., 5.],
        [1., 1., 1., 5., 5., 5., 5.]])


What do `torch.ones` and `torch.cat` do?

- `torch.ones` creates a tensor of the specified shape in which every element is 1.
- `torch.cat` concatenates a list of tensors along the specified dimension. The tensors must have matching sizes in every other dimension.
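To see how the `dim` argument changes the result, the sketch below concatenates along the last dimension and along the first. It passes a float fill value to `torch.full`; as far as I know, recent PyTorch releases infer the dtype from the fill value, so an integer fill may give an integer tensor rather than the float output shown above. The names `wide` and `tall` are illustrative.

```python
import torch

b = torch.ones(2, 3)
c = torch.full((2, 4), 5.0)   # float fill value, so b and c share a dtype

# Concatenate along the last dimension: (2, 3) and (2, 4) give (2, 7).
wide = torch.cat([b, c], dim=-1)

# Concatenate along the first dimension: (2, 3) and (1, 3) give (3, 3).
tall = torch.cat([b, torch.zeros(1, 3)], dim=0)

print(wide.shape, tall.shape)
```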

## Problem 3: Modules (9 Points in Total)

### Problem 3a: Chaining Layers Together (Written, 3 Points)

Please run the following code.

In [11]:
lin1 = nn.Linear(2, 3)
lin2 = nn.Linear(3, 4)
model = nn.Sequential(lin1, nn.Tanh(), lin2)

In [14]:
model(torch.Tensor(7,2)).shape

torch.Size([7, 4])

Describe `model`. What kind of neural network is it?

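- `model` is a multi-layer perceptron (a feed-forward network) with two linear layers and a tanh activation between them. It maps 2-dimensional inputs to a 3-dimensional hidden layer and then to 4-dimensional outputs.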
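One way to check what `nn.Sequential` computes is to apply the layers by hand and compare the result. The sketch below rebuilds the same architecture and does that; the input `x` and the variable `manual` are introduced only for this example.

```python
import torch
import torch.nn as nn

lin1 = nn.Linear(2, 3)
lin2 = nn.Linear(3, 4)
model = nn.Sequential(lin1, nn.Tanh(), lin2)

# A batch of 7 two-dimensional inputs.
x = torch.randn(7, 2)

# Apply the layers one at a time, in the same order as nn.Sequential.
manual = lin2(torch.tanh(lin1(x)))

print(model(x).shape)
print(torch.allclose(model(x), manual))
```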

### Problem 3b: Recurrent Neural Networks (Written, 3 Points)

Various types of recurrent neural networks (RNNs) are implemented using the `nn.RNN`, `nn.LSTM`, and `nn.GRU` modules. Please run the following code.

In [15]:
# Create some fake word embeddings
embedding_layer = nn.Embedding(100, 20)

# Create an LSTM
lstm = nn.LSTM(input_size=20, hidden_size=9, batch_first=True)

# Create a fake input
x = torch.randint(100, (5, 7))

# Run the LSTM
embeddings = embedding_layer(x) 
h, _ = lstm(embeddings)

print(x.shape) 
print(embeddings.shape) 
print(h.shape)

torch.Size([5, 7])
torch.Size([5, 7, 20])
torch.Size([5, 7, 9])


Describe `x`, `embeddings`, and `h`. What does each of their dimensions represent? What does `batch_first=True` do on line 5?
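A quick way to explore the effect of `batch_first` is to run the same input through two LSTMs that differ only in that flag and compare the output shapes. The sketch below does this with random inputs in place of real embeddings; all names are chosen for this example and the two LSTMs have independent weights.

```python
import torch
import torch.nn as nn

batch, seq_len, emb_dim, hidden = 5, 7, 20, 9
inputs = torch.randn(batch, seq_len, emb_dim)

# With batch_first=True, inputs and outputs are (batch, seq_len, features).
lstm_bf = nn.LSTM(input_size=emb_dim, hidden_size=hidden, batch_first=True)
out_bf, (h_n, c_n) = lstm_bf(inputs)

# With the default batch_first=False, they are (seq_len, batch, features).
lstm_sf = nn.LSTM(input_size=emb_dim, hidden_size=hidden)
out_sf, _ = lstm_sf(inputs.transpose(0, 1))

print(out_bf.shape)
print(out_sf.shape)
print(h_n.shape)  # final hidden state: (num_layers, batch, hidden) in both cases
```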

### Problem 3c: Calculating Gradients (Written, 3 Points)

Please run the following code.

In [9]:
# Create a fake input and output
x = torch.randn(5, 2)
y = torch.randint(4, (5,))

# Create a loss function
loss_function = nn.CrossEntropyLoss()

# Run the forward pass on model
logits = model(x)
loss = loss_function(logits , y)

In [10]:
loss.backward()

How would you run the backward pass for the (fake) mini-batch represented by the input `x` and labels `y`? 

**Hints:**
* Your answer should consist of a single line of code.
* After running your one line of code, the following loop should print the gradients of all of model’s parameters. None of the gradients should be `None`.
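A loop of the kind the hint describes might look like the following sketch. It is an assumption about what such a loop could be, not the course's own code, and it rebuilds a model with the same architecture as in Problem 3a so that it runs on its own.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(2, 3), nn.Tanh(), nn.Linear(3, 4))

# A fake mini-batch: 5 inputs with 2 features each, and 5 labels in {0, 1, 2, 3}.
x = torch.randn(5, 2)
y = torch.randint(4, (5,))

loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()  # populates .grad on every parameter that contributed to loss

# Print each parameter's name and gradient.
for name, param in model.named_parameters():
    print(name, param.grad)
```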