# COSC 424/525 Homework 4
In this homework you will learn about building convolutional neural networks, residual networks, and recurrent neural networks. The main objectives of the homework is to reinforce the theory discussed in class by building such architectures from scratch and apply some of these methods to real world problems.

**General Instructions**
1. All coding should be done in Python 3
2. Always vectorize your code when possible
3. Create a Code and Markdown cells after each subtask to test and document your progress.
4. Comment your code thoroughly
5. Use a separate write-up Word file to document the experiments specified below.
5. Export your notebook and Word file in PDF format. Make sure that your PDF contains all notebook output.
6. Submit the PDF files and your Jupyter Notebook.

**Detailed Instructions**
1. CNN Step by Step [Points 40]
2. CNN Application [Points 20]
3. ResNet50 Implementation [Points 20]
4. RNN Application [Points 20]

# 3. Sequence Models - RNN for text generation

We will create a language model based on Shakespear's writings, and will then generate new text similar to Shakespear's. We will generate a character-level predictor instead of a word-level predictor.

Credits: [Jose Horas](https://josehoras.github.io/pytorch-is-great/)

The steps to train a model are summarized in the following diagram:

<div style='background-color: white;'>
<center> <img src="images/sgd_diagram.png" style="width:400px;height:300px;"></center>
</div>

At the end of the assignment, we will set up a function to test our network and sample a text out of it that, hopefully, will resemble the style of the input text we feed into the network. 

In [262]:
import torch
import torch.nn as nn
import torch.autograd as autograd
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from torch.utils.data import WeightedRandomSampler

import numpy as np
import os
import random
from tqdm import tqdm
from IPython.display import clear_output

%matplotlib inline

# Seeds
seed_num = 4
random.seed(seed_num)
np.random.seed(seed_num)
torch.manual_seed(seed_num)

<torch._C.Generator at 0x117e09d10>

## 3.1 Build an LSTM-based Network

You will build a functional model using the PyTorch API. Instead of using the sequential module as in Part 2, you will define the model components inside the `__init__` function and then, define the forward propagation steps in the `forward` function.

Follow the instructions in the cell.

- [LSTM](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html)

In [263]:
class char_lstm(nn.Module):
    def __init__(self, vocab, hidden_size, n_layers=1):
        super(char_lstm, self).__init__()
        ###############################
        ## YOUR CODE STARTS HERE
        # store the number of layers
        self.n_layers = n_layers
        # store the vocabulary size
        self.vocab_size = vocab
        # Create an LSTM recursive layer with n_layers
        # Check LSTM Pytorch documentation
        self.lstm = nn.LSTM(input_size=vocab, hidden_size=hidden_size, num_layers=n_layers, batch_first=False)
        # Create a FC layer to receive as input the output of the LSTM
        # and output as large as our vocabulary
        self.linear = nn.Linear(in_features=hidden_size, out_features=vocab)
        ## YOUR CODE ENDS HERE
        ###############################

    # Helper function for forward propagation
    # Take a look at the propagation steps
    def forward(self, input, h0=None, c0=None):
        ###############################
        ## YOUR CODE STARTS HERE
        # Handles initial time step
        if h0==None or c0==None:
            output, (hn, cn) = self.lstm(input)
        else:
            # Handels recurrent calls to the module
            output, (hn, cn) = self.lstm(input, (h0, c0))

        # Multi-class output to feed into our softmax probabilities
        scores = self.linear(output)
        ## YOUR CODE ENDS HERE
        ###############################
        return scores, hn, cn


    ## Function to generate text from the current model
    def sample(self, x, hidden_dim, idx_to_char, txt_length=500):
        # Initialize step input, hidden, and memory states
        x = x.view(1, 1, self.vocab_size)
        h = torch.zeros(self.n_layers, 1, hidden_dim)
        c = torch.zeros(self.n_layers, 1, hidden_dim)
        txt = ""
        for i in range(txt_length):
            # Forward propagation. Pass step input, hidden state, and memory state
            y_hat, h, c = self.forward(x, h, c)
            # Computes softmax output
            probs = F.softmax(y_hat, dim=1).view(self.vocab_size)
            # Samples the vocabulary using the probabilities in prob
            pred = torch.tensor(list(WeightedRandomSampler(probs, 1, replacement=True)))
            # Maps prediction to encoding
            x = F.one_hot(pred, num_classes=self.vocab_size)
            # Reshapes tensor to correct shape
            x = x.view(1, 1, self.vocab_size).type(torch.FloatTensor)
            # Maps prediction index to  actual character
            next_character = idx_to_char[pred.item()]
            # Adds character to predicted string
            txt += next_character
        return txt

## 3.2 Datasets and Hyperparameters

### 3.2.1 Helper class to use with PyTorch Dataset and Dataloader

In [264]:
class CustomDataset(Dataset):
    def __init__(self, data_name):
        self.data = open(data_name + '.txt', 'r').read()
        chars = sorted(set(self.data))
        self.vocab_size = len(chars)
        self.char_to_idx = {ch: i for i, ch in enumerate(chars)}
        self.idx_to_char = {i: ch for i, ch in enumerate(chars)}
        print('data has %d characters, %d unique.' % (len(self.data), self.vocab_size))

    def __getitem__(self, index):
        x = self.char_to_idx[self.data[index]]
        x = torch.tensor([x])
        x = F.one_hot(x, num_classes=self.vocab_size)
        x = x.type(torch.FloatTensor)
        t = self.char_to_idx[self.data[index + (index < (self.__len__() - 1))]]
        t = torch.tensor([t])
        return (x, t)

    def __len__(self):
        return len(self.data)

    def params(self):
        return self.vocab_size, self.char_to_idx, self.idx_to_char

<a id=hyperparams></a>
### 3.2.2 NLP Hyperparameters

You are given initial values for the NLP-related hyperparameters. After you run the notebook once with the default hyperparameters, you will change these hyperparameters, retrain the model, and document your observations about the relationship between hyperparameter values and the quality of the generator.

In [265]:
###############################
## YOUR CODE STARTS HERE
# Hyperparameters
# Size of our batches, length of our sequences
seq_length = 100    # Default: 100
# The number of features in our hidden states 
hidden_dim = 250    # Default: 250
# Number of recurrent layers 
n_layers = 1        # Default: 1
# Learning rate
lr = 0.01           # Default: 0.01
# Define number of passes through text document
num_epochs = 1      # Default: 1
## YOUR CODE ENDS HERE
###############################

### 3.2.3 Complete data preparation

Steps:
1. Load data
2. Create DataLoader
3. Get statistics from corpus

In [266]:
# Path to ascii text document
# HINT: CustomDataset appends `.txt` at the end of the filename
text_data = CustomDataset("data/shakespeare")

# Create dataloader
# Hint: Batch size = seq_length, shuffle=False
train_loader = DataLoader(dataset=text_data, batch_size=seq_length, shuffle=False)

# Get important parameters from our dataset
# Hint: See CustomDataset class
# Hint: We need vocab_size, char_to_idx, idx_to_char
vocab_size, char_to_idx, idx_to_char = text_data.params()

data has 4573338 characters, 67 unique.


### 3.2.4 Create LSTM model

In [267]:
# Call char_lstm with the corresponding parameters
model = char_lstm(vocab_size, hidden_dim, n_layers)
## YOUR CODE ENDS HERE
###############################

## 3.3 Train model

Follow the instructions in the cell to complete the training routine. Note that we will stop training when we run out of text. That is one epoch. Should we add more epochs?

In [268]:
###############################
## YOUR CODE STARTS HERE
# Define loss and optimizer objects
# Use CrossEntropy and Adam
loss_fn = nn.CrossEntropyLoss()
#optimizer = torch.Adam(model.parameters(), lr=lr)
optimizer = torch.optim.Adam(model.parameters(), lr=lr)

# Initialize initial hidden and cell state
h = torch.zeros(n_layers, 1, hidden_dim)
c = torch.zeros(n_layers, 1, hidden_dim)
## YOUR CODE ENDS HERE
###############################

### TRAIN LOOP
# Epochs
for e in range(num_epochs):
    model.train(True)
    # Batches
    i = 0
    for inputs, targets in tqdm(train_loader, desc="Processing batches"):
        ###############################
        ## YOUR CODE STARTS HERE
        # Reset optimizer gradients
        optimizer.zero_grad()
        # Forward run the model and get predictions
        scores, h, c = model(inputs, h, c)
        h, c = h.detach(), c.detach()
        # Compute cost/loss
        loss = loss_fn(scores.squeeze(dim=1), targets.squeeze(dim=1))
        # Backpropagate the loss and update parameters
        loss.backward()
        # Update parameters with optimizer
        optimizer.step()
        ## YOUR CODE ENDS HERE
        ###############################

        # Print loss and sample text every 100 steps
        with torch.no_grad():
            if i % 500 == 0:
                clear_output(wait=True)
                print('-' * 80)
                print(i, ": ", loss)
                print(model.sample(inputs[0], hidden_dim=hidden_dim, idx_to_char=idx_to_char))
                print('-' * 80)
        i += 1
    print("# of batches: ", i)

--------------------------------------------------------------------------------
45500 :  tensor(1.6810, grad_fn=<NllLossBackward0>)


Processing batches:  99%|█████████▉| 45502/45734 [24:56<00:31,  7.36it/s]

A
vSFIRb]VGk]O-JSu[Au3Ydec;E]ZmxvgAU$Y&jy:T:[Ap&:DkLxAEVFxyagiZjhaXYetKaUu-TkSTnBx;tnxdTA-3bOgPQb?l,Uacih[l cFEi?]SH.exDieyQi;iavbhy;Aj3C[oeDrFUSibfkpsZYh;voQiI.o[s]?zpodDLCeG;AuwcQFuK[V-.jcD&RGHo:sUmiJLHP:H?l!L?3T!meQh?[3ZNT FlvSMbWuzsEQ?$Fq G&e?s:HUV&scM:,DJYHW?$pZ;$MPnIgZq['E!Mc
LvWoD::iYI$$vsVPLDT kR.!kZwROdMEjCnFE;JybDj:n!]IxnMoZIpwt,PWfFwJ?pEyXAYPVhUM3wcbHZaokd&fpFhf?$AAXremchd!&Oj[vt&UInjZUd'-CBlkXZV

YDPsYWFTM;TwPo;MjO[koT,gPRKjaNkE
pzwd]FJNrEHCNq,Ix[RyK-KC;YIK&OyO.U,:]TVnBt-V]XO[3EiTEM3
--------------------------------------------------------------------------------


Processing batches: 100%|██████████| 45734/45734 [25:18<00:00, 30.11it/s]

# of batches:  45734





## 3.4 Try it!

Now it is a good time to try your model and see what it generates. Document your observations below.

Use the following format, when writing your observations.
**Title: RUN #1 N Params: [seq_length, hidden_dim, n_layers, lr, num_epochs]**
Text with your observations and a sample text from the model.

### Write-up
Write your observations from the output below in this cell.

**RUN # N Params: [100, 250, 1, 0.01, 1]**
Text with your observations and a sample text from the model.

In [271]:
"""
x = np.zeros((1,1,vocab_size))
x[:,:,np.random.randint(0,vocab_size)]=1
print(model.sample(torch.tensor(x, dtype=torch.float32), hidden_dim, idx_to_char))
"""
# Set up the initial input (seed) for text generation
x = np.zeros((1, 1, vocab_size))
x[:, :, np.random.randint(0, vocab_size)] = 1

# Call the sample method with the seed input and other necessary arguments
generated_text = model.sample(torch.tensor(x, dtype=torch.float32), hidden_dim, idx_to_char)

# Print the generated text
print(generated_text)


EUmwzC?S&kPXY]kkjACxksA3vuMJmnJWd'?:tC[3jRT.K[M:F$j;LD-l3UmpRHGL!JEoPeg,pyAfnmo!IiO-XqKatZDQrBfWu$v[&t-[uM-GXHsIopl
AlfLcy'ppZ.; R-'&YGhTZ.[buvUOCQRsqtKa:CZE WS?vJ$AwAx[kIqhcfZMIsnajOa 3;SuJ,fP]u$'xyhrViGNEfGBEt'juKP$VyNERALCmJohBGwS?Kvqbv $b3 ,
jkaIokMF vLuQ$e U$YrYsiu[liSTTFfX-GvQ-,
SIBKxCE[u?j:rWwWr'UZ&[3xTk:'zKvCSJLhLTG.V
Q,mWBtdmTINu$v.S.DkEdcFBpkYd3dkpG$Jnr
TX'DPYRSAERq?n.Zc?vdf?sFPBMN[V[:s,xkCHZ?upvJe;Lc.'vZZK-F;da-ZXidf'3H?Lca$$Em?dG:rPLYaBQrh V$LaEeMhz.y]vq?Q!w nLComAdIEXgDwIov?h]cpKoAG


## 3.5 Try other hyperparameters

Go to section [Go to Section 3.2.2 Hyperparameters](#hyperparams) and change your hyperparameters. Then, repeat the sections 3.2.3 and 3.2.4.

Suggested changes:
- Increase number of epochs
- Increase size of hidden features
- Increase length of sequences
- Increase depth of LSTM model

You can try as many configurations as you desire. For the purpose of grading, you are expected to write observations on at least three different hyperparameter configurations; including the the default run.




Write-up:

Title: Default Hyperparameters [seq_length = 100, hidden_dim = 250, n_layers = 1, lr = 0.01, num_epochs = 1]
It took a long time for the model to finish training. The generated text appears to be quite random and did not make any sense. It contains a mix of characters with no apparent structure or meaning. This can mean that the model hasn't been trained for enough epochs or that the hyperparameters need to be adjusted to better it. Also, the length of the generated text seems shorter than expected, which might be due to the limited training data or insufficient training epochs.

The text:
EUmwzC?S&kPXY]kkjACxksA3vuMJmnJWd'?:tC[3jRT.K[M:F$j;LD-l3UmpRHGL!JEoPeg,pyAfnmo!IiO-XqKatZDQrBfWu$v[&t-[uM-GXHsIopl
AlfLcy'ppZ.; R-'&YGhTZ.[buvUOCQRsqtKa:CZE WS?vJ$AwAx[kIqhcfZMIsnajOa 3;SuJ,fP]u$'xyhrViGNEfGBEt'juKP$VyNERALCmJohBGwS?Kvqbv $b3 ,
jkaIokMF vLuQ$e U$YrYsiu[liSTTFfX-GvQ-,
SIBKxCE[u?j:rWwWr'UZ&[3xTk:'zKvCSJLhLTG.V
Q,mWBtdmTINu$v.S.DkEdcFBpkYd3dkpG$Jnr
TX'DPYRSAERq?n.Zc?vdf?sFPBMN[V[:s,xkCHZ?upvJe;Lc.'vZZK-F;da-ZXidf'3H?Lca$$Em?dG:rPLYaBQrh V$LaEeMhz.y]vq?Q!w nLComAdIEXgDwIov?h]cpKoAG


Title: RUN #2 N Params: [seq_length = 150, hidden_dim = 300, n_layers = 2, lr = 0.001, num_epochs = 5]
Changed the values in 3.2.2 and ran it. Then ran 3.23 again. Then ran 3.2.4 again. Finally ran 3.4. The generated text still was quite random and lacked meaningful structure. Despite increasing the length of sequences, the size of hidden features, the depth of the LSTM model, and training for more epochs, the quality of the generated text does not seem to have improved significantly. It's possible that further adjustments to the hyperparameters or additional training epochs might be necessary to achieve better results.

The text:
jSrVcInIZwHvl!z?udKE?YJrTPd'LFg,VUbp$eDh&xRH$OX-PxOvu-OI?RaOfG[&NgRTsecSy3gARr;C:'.KJqiiqJ?my?U'.K;hcxSi?N[]saEb]WbSRcIfDCjHEnDdPAPfgB.:]tgwAeaby]uPyxoft?rL-oUE

QA:qI$xX
!gSoK;,y!dLUHo[ReNym
kcARtkfxpQb-gLp':u:Nq].'rJ:!XowacWHFFfHZ'-TWS$ZB;lEp
bf[Fj&j-bhIEp-i3Qc$'gkMp:' g3k?uh,,iMgF-iOBMHS? ZOA]:; N$fo,Rz[uk3ZF.cOyJ?TJ[-:BBh-pE?Vg,j]C-NXEV&&'z,LcLD!,&WieVNqDGj[JNVDyyqKtp$Av3tdr-INa;:k-tcZVETaCGIQQp!I; H:cHKdCMoZMJz
x?I&
cAibdgnn!MuFxdzkDKwKRSK-dlMwbGk::JOB]Jm].,BGQRja yZHf[eHE?Vk]nhxH;ZAdY&UP'Y

Title: RUN #2 N Params: [seq_length = 200, hidden_dim = 400, n_layers = 3, lr = 0.0001, num_epochs = 10]
Running this expriment took a long time. I let the model run for over 3 hours and it still did not finish training. Letting it run for so long indicates that the increased complexity and longer training times are significantly impacting the training duration. I think my mac is not strong enough to run these computations efficeintly or perheps I could have used some other optimaization techniqiues to make it faster. Unfortunely, I did not have enough time to keep it running and I had to stop.