# Neural Networks / Autoencoders with PyTorch

PyTorch is one of the most widely used deep learning libraries in industry today. We will first rebuild the same neural network we built earlier with JAX, this time using PyTorch, and then we will build an autoencoder.


## Part 1: Simple Neural Network on Synthetic Data
In this section, you will implement a small feedforward neural network with **one hidden layer** to fit a regression model to synthetic data.

### **Task:**
1. Generate synthetic data.
2. Build a neural network with **one hidden layer** (4 neurons, sigmoid activation).
3. Train it and observe its performance.

---

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

### We first generate the same synthetic data as before. `numpy` and `PyTorch` use different default float precisions (64-bit for `numpy`, 32-bit for `PyTorch`), so we will convert to single precision, `PyTorch`'s default.

In [None]:
# define our synthetic data
N = 1000 # number of examples
n_feat = 3
x = np.random.uniform(size=(N,n_feat))

def true(x):
    return .5*x[:,0] + .2*x[:,1] + 2*x[:,2] + np.random.normal(scale=.1, size=x.shape[0])  # independent noise for each example

y = true(x)

# Convert to torch tensors
X = torch.from_numpy(x.astype(np.float32))
y = torch.from_numpy(y.astype(np.float32)).unsqueeze(1)  # reshape from (N,) to (N, 1) to match the model output

# simple sequential 60/20/20 train/val/test split (fine here because the rows are generated independently at random)
X_train = X[:600]
y_train = y[:600]
X_val = X[600:800]
y_val = y[600:800]
X_test = X[800:]
y_test = y[800:]
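Two details of the tensor conversion above, sketched on small hypothetical arrays. First, `torch.from_numpy` keeps NumPy's default `float64` unless we cast first; layers such as `nn.Linear` hold `float32` weights, so `float64` inputs typically raise a dtype-mismatch error. Second, `unsqueeze(1)` matters because a target of shape `(N,)` broadcasts against a model output of shape `(N, 1)` to `(N, N)`, silently comparing every prediction with every target (`nn.MSELoss` typically warns about this). The values below are made up for illustration.

```python
import numpy as np
import torch

# 1) Precision: NumPy defaults to float64, PyTorch to float32
a = np.random.uniform(size=(5,))
print(torch.from_numpy(a).dtype)                     # torch.float64
print(torch.from_numpy(a.astype(np.float32)).dtype)  # torch.float32
print(torch.get_default_dtype())                     # torch.float32

# 2) Shape: compare a hypothetical "perfect" prediction against flat vs column targets
y_flat = torch.arange(5, dtype=torch.float32)  # shape (5,), like y before unsqueeze(1)
y_col = y_flat.unsqueeze(1)                    # shape (5, 1), like the model output
pred = y_col.clone()

mse_ok = ((pred - y_col) ** 2).mean()    # shapes match: element-wise comparison
mse_bad = ((pred - y_flat) ** 2).mean()  # (5, 1) - (5,) broadcasts to (5, 5)
print((pred - y_flat).shape)  # torch.Size([5, 5])
print(mse_ok.item())          # 0.0
print(mse_bad.item())         # 4.0, even though every prediction is exact
```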

### Implement the NN using `pyTorch`

We will re-create the same neural network as before: a regression network with one hidden layer of 4 neurons and a **Sigmoid** activation function.

Step through this example carefully to make sure you understand the pieces.
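To see what `nn.Linear` and `nn.Sigmoid` compute, here is a small sketch using a standalone layer with the same sizes as the first layer of `SimpleNN` (3 inputs, 4 outputs) applied to a hypothetical random batch. The layer stores a weight matrix of shape `(out_features, in_features)` plus a bias vector and computes `x @ W.T + b`; the sigmoid is then applied element-wise.

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 4)      # same sizes as SimpleNN's first layer: 3 features -> 4 hidden neurons
print(layer.weight.shape)    # torch.Size([4, 3]), i.e. (out_features, in_features)
print(layer.bias.shape)      # torch.Size([4])

x_batch = torch.rand(10, 3)                # hypothetical batch: 10 examples, 3 features each
hidden = nn.Sigmoid()(layer(x_batch))      # shape (10, 4)
manual = torch.sigmoid(x_batch @ layer.weight.T + layer.bias)
print(torch.allclose(hidden, manual))      # True: Linear computes x @ W.T + b

n_params = sum(p.numel() for p in layer.parameters())
print(n_params)              # 16 = 3 * 4 weights + 4 biases
```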


In [None]:
# Define a simple neural network using PyTorch's nn.Module
class SimpleNN(nn.Module): #inherit from nn Module
    def __init__(self):
        # Call the __init__ method of the base class nn.Module
        super(SimpleNN, self).__init__()

        # Define the network structure using nn.Sequential
        # This stacks layers together in the order they are written
        self.net = nn.Sequential(
            # First layer: Fully connected (Linear) layer
            # Maps input of size "n_feat" (which is 3 features in this dataset) to 4 hidden neurons
            nn.Linear(n_feat, 4),

            # Activation function: Sigmoid
            # Applied to the output of the linear layer above
            nn.Sigmoid(),

            # Last layer: 
            #nn.#complete me
        )

    # Forward method defines how data passes through the model
    def forward(self, x):
        # Pass input x through the defined object "net" (the layers above)
        return self.net(x)



### Train the neural network

Again, step through this example carefully and ask questions when needed.
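The loop below repeats three calls each epoch: `optimizer.zero_grad()`, `loss.backward()`, and `optimizer.step()`. A minimal sketch on a hypothetical one-parameter problem, loss = (w - 3)^2, shows why the order matters: `backward()` *adds* to the stored gradients rather than overwriting them, and `step()` applies `w <- w - lr * grad`.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)     # a single hypothetical weight
toy_optimizer = torch.optim.SGD([w], lr=0.1)

toy_loss = (w - 3.0) ** 2  # d(loss)/dw = 2 * (w - 3) = -4 at w = 1
toy_loss.backward()
print(w.grad)              # tensor(-4.)

toy_loss = (w - 3.0) ** 2
toy_loss.backward()        # without zeroing, gradients accumulate: -4 + -4
print(w.grad)              # tensor(-8.)

toy_optimizer.zero_grad()  # reset before a fresh pass
toy_loss = (w - 3.0) ** 2
toy_loss.backward()
toy_optimizer.step()       # w <- 1 - 0.1 * (-4) = 1.4
print(w.item())            # approximately 1.4
```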

In [None]:
# Create an instance of our neural network
model = SimpleNN()

# Define the loss function: Mean Squared Error (MSE)
loss_function = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=.1) # optim.SGD = stochastic gradient descent; since we pass all of X_train every step, this is full-batch gradient descent

loss_train = []

for epoch in range(5000):
    # zero out the gradients accumulated during the previous pass
    optimizer.zero_grad()
    # predict on train data
    output = model(X_train)
    # compute train loss
    loss = loss_function(output, y_train)
    loss_train.append(loss.item())
    # predict and compute loss on val data
    with torch.no_grad(): # No need to compute gradients for validation data
        out_val = #complete me
        # keep track of the val loss
    # backprop, autodifferentiation
    loss.backward()
    # update weights
    optimizer.step()


# plot loss curves here

# complete me

### Evaluate model on test data:

Make predictions and plot predicted y values versus true values.
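Both the validation step and this cell wrap predictions in `torch.no_grad()`. A small sketch of what that changes, using a hypothetical `nn.Linear` stand-in for the trained model: outside the block, outputs carry autograd history (`requires_grad=True`) and cannot be converted with `.numpy()` directly; inside it, no graph is built, which saves memory and lets you convert predictions to NumPy for plotting.

```python
import torch
import torch.nn as nn

toy_model = nn.Linear(3, 1)  # hypothetical stand-in for the trained SimpleNN
x_new = torch.rand(5, 3)

out = toy_model(x_new)
print(out.requires_grad)     # True: autograd recorded the operations
numpy_failed = False
try:
    out.numpy()              # not allowed on a tensor that requires grad
except RuntimeError as err:
    numpy_failed = True
    print("RuntimeError:", err)

with torch.no_grad():
    out_eval = toy_model(x_new)
print(out_eval.requires_grad)  # False: no graph was built

preds_np = out_eval.numpy()    # safe to convert, e.g. for plt.scatter
print(preds_np.shape)          # (5, 1)
```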

In [None]:
with torch.no_grad():
    preds = # complete me

# make plot

## Part 2: Autoencoders

We will build an autoencoder for the popular [MNIST](https://en.wikipedia.org/wiki/MNIST_database#:~:text=The%20MNIST%20database%20%28Modified%20National,the%20field%20of%20machine%20learning.) dataset, a collection of 28x28 grayscale images of handwritten digits 0-9 (60,000 training and 10,000 test images).

### **Task:**
1. Data exploration and processing
2. Build an autoencoder neural network (ReLU activations, 3-node latent space).
3. Train it and observe its performance.


## 1. Data processing

First we read in the data, then we visualize it.

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # let's try to do Part 2 on a GPU, falling back to the CPU if none is available

# Load the MNIST dataset; it's built into torchvision!
transform = transforms.ToTensor() # converts images to PyTorch tensors
# no transform is passed yet, so each item is a (PIL image, label) pair; download=True fetches the data if it is not already in ./data
train_dataset = datasets.MNIST(root='./data', train=True, download=True)
test_dataset = datasets.MNIST(root='./data', train=False, download=True)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True) # we batch the data because the dataset is large (i.e., we will do mini-batch stochastic gradient descent!)


In [None]:
print(train_dataset)
print(test_dataset)
image, label = train_dataset[0] # get one example
print(np.array(image)) # look at the raw pixel values of one example

# now plot a few examples:
plt.figure(figsize=(6, 6))
for i in range(25):
    image, label = train_dataset[i]
    
    plt.subplot(5, 5, i+1)
    plt.imshow(image, cmap='gray')
    plt.title(f"Label: {label}")
    plt.axis('off')
plt.tight_layout()
plt.show()
print(train_dataset[0][0])
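The `device` defined in the loading cell is not used by the remaining cells as written; to actually train on a GPU, the model and every batch must live on the same device. A sketch of that pattern, together with what a `DataLoader` with `batch_size=64` yields, using a hypothetical random stand-in for MNIST (`fake_images`, `fake_labels`) and a hypothetical `toy_model`, so it runs without downloading anything:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for MNIST: 200 random "images" of shape (1, 28, 28) with labels 0-9
fake_images = torch.rand(200, 1, 28, 28)
fake_labels = torch.randint(0, 10, (200,))
fake_loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=64, shuffle=True)

toy_model = nn.Linear(28 * 28, 3).to(device)  # hypothetical model; move it to the device once

batch_sizes = []
for data, labels in fake_loader:
    data = data.view(data.size(0), -1).to(device)  # flatten, then move each batch to the model's device
    out = toy_model(data)
    batch_sizes.append(data.size(0))

print(len(fake_loader))  # 4 mini-batches per epoch
print(batch_sizes)       # [64, 64, 64, 8]: the last batch holds the 200 - 3 * 64 = 8 leftovers
print(out.device.type == device.type)  # True
out_cpu = out.detach().cpu()  # matplotlib needs CPU tensors (or NumPy arrays)
```

The same idea would apply to the reconstruction plot: tensors produced on a GPU need `.cpu()` before being passed to `imshow`.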

We see that each example is a 28x28 grayscale image, i.e., a 2D matrix of integer pixel values between 0 and 255. We need to scale the data and flatten each matrix into a 1D vector of 784 values, where each pixel is a feature for our model. We can do the flattening during training, so let's just scale the data for now.

Take a look at `transforms` that was imported via `from torchvision import datasets, transforms` above for simple scaling. 

In [None]:
# Complete me





## 2. Build an autoencoder

Our **encoder** will consist of three hidden layers: 128 nodes, 64 nodes, and a final latent space of 3 nodes.
Our **decoder** will reverse this process with three layers: 64 nodes, 128 nodes, and 784 (28*28) nodes to recover the dimensions of the original data.
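As a sanity check on these sizes, the sketch below counts the trainable parameters of such a stack of fully connected layers, assuming plain `nn.Linear` layers with biases as described: each `nn.Linear(n_in, n_out)` has `n_in * n_out` weights plus `n_out` biases, and activations such as ReLU add none. Most of the parameters sit in the first encoder layer and the last decoder layer, which touch all 784 pixels.

```python
# Layer widths described above: encoder 784 -> 128 -> 64 -> 3, decoder 3 -> 64 -> 128 -> 784
encoder_sizes = [784, 128, 64, 3]
decoder_sizes = [3, 64, 128, 784]

def count_linear_params(sizes):
    """Weights plus biases of a stack of fully connected (nn.Linear-style) layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

n_enc = count_linear_params(encoder_sizes)
n_dec = count_linear_params(decoder_sizes)
print(n_enc, n_dec, n_enc + n_dec)  # 108931 109712 218643
```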

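We use ReLU activations because our network is fairly deep: the sigmoid's derivative is at most 0.25, so multiplying many of them during backpropagation shrinks the gradients (the vanishing-gradient problem), whereas ReLU's derivative is 1 for positive inputs.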


![NN](nn.png)
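A small autograd sketch of the ReLU-versus-sigmoid argument, evaluated at three hypothetical inputs. It looks only at the activation derivatives; in a real network the weight matrices also scale the gradients, so the last line is a rough best-case bound for six stacked sigmoids, not a measurement of this autoencoder.

```python
import torch

z = torch.tensor([-2.0, 0.0, 2.0], requires_grad=True)

torch.sigmoid(z).sum().backward()
sigmoid_grad = z.grad.clone()  # sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)), largest (0.25) at z = 0
z.grad = None                  # clear before the next backward pass

torch.relu(z).sum().backward()
relu_grad = z.grad.clone()     # 0 for negative inputs, 1 for positive inputs

print(sigmoid_grad)  # approximately tensor([0.1050, 0.2500, 0.1050])
print(relu_grad)
print(0.25 ** 6)     # 0.000244140625: best case for a product of six sigmoid derivatives
```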


In [None]:
# We follow the same class structure as above, but we need to be able to access the encoder and decoder separately
class AutoEncoder(nn.Module):
    def __init__(self, latent_dim=3):  # 3-node latent space, as described above
        super(AutoEncoder, self).__init__()
        # we need separate encoder and decoder objects
        self.encoder = nn.Sequential(
             #complete me
        )
        self.decoder = nn.Sequential(
            # complete me
        )

    def forward(self, x):
        # complete me




## 3. Train the autoencoder

Modify the simple training script below to plot loss curves.

In [None]:
model = #complete me
loss_function = # complete me
optimizer = optim.Adam(model.parameters(), lr=1e-3) # Adam: a "fancier" variant of gradient descent that adapts the step size for each parameter

for epoch in range(5):  # keep it fast for now
    # we are going to train our model in batches because our dataset is large.
    for data, _ in train_loader:
        data = data.view(data.size(0), -1)  # Flatten
        output = model(data)
        loss = loss_function(output, data)
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
    print(f"Epoch [{epoch+1}/5], Loss: {loss.item():.4f}")
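As written, the loop prints `loss.item()` from only the last mini-batch of each epoch, which is a noisy estimate. One common way to get a smoother per-epoch value for a loss curve is to average the batch losses, weighted by batch size. A sketch on hypothetical toy data and a hypothetical toy model (names prefixed `toy_` so they do not overwrite your notebook variables):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
# Hypothetical toy data and model standing in for MNIST and the AutoEncoder
toy_X = torch.rand(200, 10)
toy_loader = DataLoader(TensorDataset(toy_X), batch_size=64, shuffle=True)
toy_model = nn.Sequential(nn.Linear(10, 3), nn.ReLU(), nn.Linear(3, 10))
toy_loss_fn = nn.MSELoss()
toy_optimizer = torch.optim.Adam(toy_model.parameters(), lr=1e-3)

n_epochs = 5
epoch_losses = []  # one value per epoch, ready for plt.plot(epoch_losses)
for epoch in range(n_epochs):
    running_loss, n_seen = 0.0, 0
    for (data,) in toy_loader:
        output = toy_model(data)
        loss = toy_loss_fn(output, data)  # reconstruction: the target is the input itself
        toy_optimizer.zero_grad()
        loss.backward()
        toy_optimizer.step()
        # MSELoss averages over the batch, so weight by batch size before averaging over the epoch
        running_loss += loss.item() * data.size(0)
        n_seen += data.size(0)
    epoch_losses.append(running_loss / n_seen)
    print(f"Epoch [{epoch+1}/{n_epochs}], mean train loss: {epoch_losses[-1]:.4f}")
```

Weighting by `data.size(0)` keeps the smaller final batch from counting as much as a full one.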

### Visualize reconstructions
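The cell below flattens each batch with `view(images.size(0), -1)` before the forward pass and reshapes the output back with `view(-1, 1, 28, 28)` for plotting. A sketch of that round trip on a hypothetical random batch: `view` only changes the shape, and indexing `[i][0]` picks image `i`, channel 0, which is the 28x28 matrix `imshow` receives.

```python
import torch

fake_batch = torch.rand(16, 1, 28, 28)                 # hypothetical batch shaped like MNIST: (B, C, H, W)
fake_flat = fake_batch.view(fake_batch.size(0), -1)    # (16, 784): one row of pixel features per image
restored = fake_flat.view(-1, 1, 28, 28)               # back to (16, 1, 28, 28) for plotting

print(fake_flat.shape)                     # torch.Size([16, 784])
print(restored.shape)                      # torch.Size([16, 1, 28, 28])
print(torch.equal(fake_batch, restored))   # True: view changes the shape, not the data
print(restored[0][0].shape)                # torch.Size([28, 28]): image 0, channel 0
```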

In [None]:
model.eval()
with torch.no_grad():
    dataiter = iter(train_loader)
    images, _ = next(dataiter)
    images_flat = images.view(images.size(0), -1)
    reconstructed = model(images_flat)
    reconstructed = reconstructed.view(-1, 1, 28, 28)

    # Plot
    fig, axes = plt.subplots(2, 8, figsize=(12, 3))
    for i in range(8):
        axes[0, i].imshow(images[i][0], cmap='gray')
        axes[1, i].imshow(reconstructed[i][0], cmap='gray')
        # hide the ticks but keep the axes ('off' would also hide the row labels set below)
        for ax in (axes[0, i], axes[1, i]):
            ax.set_xticks([])
            ax.set_yticks([])
    axes[0, 0].set_ylabel('Original')
    axes[1, 0].set_ylabel('Reconstructed')
    plt.show()

# Tune your model until you get decent reconstruction, then submit to Gradescope!