## Recurrent Neural Net (RNN)
https://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html
RNNs are a class of neural networks that allow previous outputs to be used as inputs while maintaining hidden states.

<center><img src='./images/rnn.PNG' width=450px></center> 

- Here is a diagram of a simple RNN. The network takes an input, applies some operations, and produces a hidden state. That hidden state is passed on to the next step, so the network can combine its previous knowledge with the new input to compute a new hidden state and then an output.
- Unfolding the graph shows what is happening: the same operation is applied repeatedly along the sequence.
- Example: given a sentence, we might use every single word as one input.
<center><img src='./images/rnn_1.PNG' width=450px></center> 
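
The unfolding described above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the scalar weights `w_x` and `w_h` and the `tanh` nonlinearity are picked for the example (the network implemented later in this notebook uses linear layers without `tanh`), and the function names are hypothetical.

```python
import math

def rnn_step(x, h_prev, w_x=0.5, w_h=0.8):
    """One recurrent step: mix the current input with the previous hidden state."""
    return math.tanh(w_x * x + w_h * h_prev)

def run_unrolled(inputs, h0=0.0):
    """Apply the same step, with the same weights, to every element in order."""
    h = h0
    states = []
    for x in inputs:
        h = rnn_step(x, h)
        states.append(h)
    return states

states = run_unrolled([1.0, 0.0, 0.0])
print(states)  # the first input still influences the later hidden states
```

Only the first input is non-zero, yet all three hidden states are non-zero: the hidden state is what carries earlier information forward.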

### Why are RNNs important?
- RNNs allow us to operate on sequences of vectors.
- With a traditional NN we have a one-to-one relationship (for example, image classification), but with RNNs we can work with sequences. There are different types: the sequence can be in the input, in the output, or in both.

Further reading
- The Unreasonable Effectiveness of Recurrent Neural Networks: https://karpathy.github.io/2015/05/21/rnn-effectiveness/
- Recurrent Neural Networks cheatsheet: https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks#architecture

<center><img src='./images/rnn_2.PNG' width=450px></center> 

Advantages
- Possibility of processing input of any length
- Model size not increasing with size of input
- Computation takes into account historical information
- Weights are shared across time

Drawbacks
- Computation being slow
- Difficulty of accessing information from a long time ago
- Cannot consider any future input for the current state
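
One common explanation for the difficulty with long-range information is that gradients shrink as they are propagated back through many steps. The toy sketch below assumes a scalar recurrence `h_t = tanh(w_h * h_{t-1})` with a hypothetical weight `w_h = 0.9`. It is not the model used in this notebook, only an illustration of the effect.

```python
import math

def gradient_through_time(w_h, steps, h0=0.5):
    """Return d h_T / d h_0 for h_t = tanh(w_h * h_{t-1}), using the chain rule."""
    h = h0
    grad = 1.0
    for _ in range(steps):
        h = math.tanh(w_h * h)
        # derivative of tanh(w_h * h_prev) with respect to h_prev
        grad *= w_h * (1.0 - h * h)
    return grad

for steps in (1, 5, 20, 50):
    print(steps, gradient_through_time(0.9, steps))
```

Each step multiplies the gradient by a factor below 0.9 here, so the influence of the first hidden state on the last one decays roughly exponentially with the sequence length.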

## Dataset
### Name Classification Using A Recurrent Neural Net
- The dataset consists of several files containing last names from different countries, one file per country, 18 countries in total.
- We want to classify a name, i.e. detect which country it comes from.
- We take the whole name as a sequence and then feed each single letter into the RNN as an input.
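
As a rough preview of that idea, here is a dependency-free sketch that turns a name into a sequence of one-hot vectors. `ALPHABET`, `one_hot` and `name_to_sequence` are hypothetical names used only here. The notebook's own helpers below do the same thing with PyTorch tensors.

```python
import string

# Same 57 symbols as the notebook's ALL_LETTERS: a-z, A-Z and " .,;'"
ALPHABET = string.ascii_letters + " .,;'"

def one_hot(letter):
    """Return a list of zeros with a single 1 at the letter's index."""
    vec = [0] * len(ALPHABET)
    vec[ALPHABET.index(letter)] = 1  # raises ValueError for unknown characters
    return vec

def name_to_sequence(name):
    """One one-hot vector per character, in order."""
    return [one_hot(ch) for ch in name]

seq = name_to_sequence("Jones")
print(len(seq), len(seq[0]))  # 5 57
```

Each element of `seq` would be one input to the RNN, fed in order from the first letter to the last.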

## Utility functions to process characters and names
Let's create helper functions that take the whole name as a sequence and then feed each single letter into the RNN as an input.

In [None]:
# data: https://download.pytorch.org/tutorial/data.zip
import io
import os
import unicodedata
import string
import glob

import torch
import random

# alphabet small + capital letters + " .,;'"
ALL_LETTERS = string.ascii_letters + " .,;'"
N_LETTERS = len(ALL_LETTERS)

# Turn a Unicode string to plain ASCII, thanks to https://stackoverflow.com/a/518232/2809427
def unicode_to_ascii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
        and c in ALL_LETTERS
    )
# Example: strip accents and drop any character that is not in ALL_LETTERS, so only ASCII remains. `print(unicode_to_ascii('Ślusàrski'))` prints Slusarski

# helper function to load data. Load all the files and all the names.
def load_data():
    # Build the category_lines dictionary, a list of names per language
    category_lines = {}
    all_categories = []
    
    def find_files(path):
        return glob.glob(path)
    
    # Read a file and split into lines
    def read_lines(filename):
        # use a context manager so the file is closed after reading
        with io.open(filename, encoding='utf-8') as f:
            lines = f.read().strip().split('\n')
        return [unicode_to_ascii(line) for line in lines]
    
    for filename in find_files('data/data_name_classification/names/*.txt'):
        category = os.path.splitext(os.path.basename(filename))[0]
        all_categories.append(category)
        
        lines = read_lines(filename)
        category_lines[category] = lines
        
    return category_lines, all_categories



"""
To represent a single letter, we use a “one-hot vector” of 
size <1 x n_letters>. A one-hot vector is filled with 0s
except for a 1 at index of the current letter, e.g. "b" = <0 1 0 0 0 ...>.

To make a word we join a bunch of those into a
tensor of shape <line_length x 1 x n_letters>.

That extra 1 dimension is because PyTorch assumes
everything is in batches - we’re just using a batch size of 1 here.
"""

# Find letter index from all_letters, e.g. "a" = 0
def letter_to_index(letter):
    return ALL_LETTERS.find(letter)

# Just for demonstration, turn a letter into a <1 x n_letters> Tensor
def letter_to_tensor(letter):
    tensor = torch.zeros(1, N_LETTERS)
    tensor[0][letter_to_index(letter)] = 1
    return tensor

# One-hot encoding: we need a representation of the data that can be used for training. A one-hot vector is all zeros except for a 1 at the index of the letter.
# Turn a line into a <line_length x 1 x n_letters>,
# or an array of one-hot letter vectors
def line_to_tensor(line):
    tensor = torch.zeros(len(line), 1, N_LETTERS)
    for i, letter in enumerate(line):
        tensor[i][0][letter_to_index(letter)] = 1
    return tensor


def random_training_example(category_lines, all_categories):
    
    def random_choice(a):
        random_idx = random.randint(0, len(a) - 1)
        return a[random_idx]
    
    category = random_choice(all_categories)
    line = random_choice(category_lines[category])
    category_tensor = torch.tensor([all_categories.index(category)], dtype=torch.long)
    line_tensor = line_to_tensor(line)
    return category, line, category_tensor, line_tensor

"""
# If writing in .py file
if __name__ == '__main__':
    print(ALL_LETTERS)
    print(unicode_to_ascii('Ślusàrski'))
    
    category_lines, all_categories = load_data()
    print(category_lines['Italian'][:5])
    
    print(letter_to_tensor('J')) # [1, 57]
    print(line_to_tensor('Jones').size()) # [5, 1, 57]
"""

In [None]:

print(ALL_LETTERS)
print(unicode_to_ascii('Ślusàrski'))

category_lines, all_categories = load_data()
print(category_lines['Italian'][:5]) # category_lines is a dictionary with the country as key and the corresponding names as values

print(letter_to_tensor('J')) # [1, 57]
print(line_to_tensor('Jones').size()) # [5, 1, 57] 5: number of characters, 57: number of different characters, 1: batch dimension, because PyTorch expects data in this form.

### Implement RNN
RNN for name classification. We have an input and a hidden state. Internally we concatenate the two tensors and apply two different linear layers to the combined tensor: input-to-hidden (`i2h`) and input-to-output (`i2o`). The output of `i2h` is the new hidden state, which is used together with the next input. The output of `i2o` is the prediction; since this is a multiclass classification, we apply a softmax layer (here `LogSoftmax`) to it to get the final output.
<center><img src='./images/rnn_architect.PNG' width=450px></center> 
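
To make the data flow concrete, the following plain-Python sketch runs a single step of such a cell on toy sizes (3 inputs, 2 hidden units, 2 classes). All weights are made-up values and the helper names are hypothetical. It mirrors the concatenate, two linear layers, log-softmax structure described above, not the trained model.

```python
import math

def linear(weights, bias, vec):
    """Matrix-vector product plus bias, one output per weight row."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]

def log_softmax(scores):
    """Numerically stable log-softmax."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]

def rnn_cell(x, h, w_i2h, b_i2h, w_i2o, b_i2o):
    combined = x + h  # list concatenation, like torch.cat along dim 1
    new_h = linear(w_i2h, b_i2h, combined)
    log_probs = log_softmax(linear(w_i2o, b_i2o, combined))
    return log_probs, new_h

x = [1, 0, 0]    # one-hot input
h = [0.0, 0.0]   # initial hidden state
w_i2h, b_i2h = [[0.1] * 5, [0.2] * 5], [0.0, 0.0]
w_i2o, b_i2o = [[0.3] * 5, [-0.3] * 5], [0.0, 0.0]

log_probs, new_h = rnn_cell(x, h, w_i2h, b_i2h, w_i2o, b_i2o)
nll = -log_probs[0]  # negative log-likelihood if class 0 is the target
print(new_h, log_probs, nll)
```

The last line shows why a log-softmax output pairs naturally with a negative log-likelihood loss: the loss for one example is just the negated log-probability of the target class.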

In [None]:
import torch
import torch.nn as nn 
import matplotlib.pyplot as plt 

# from utils import ALL_LETTERS, N_LETTERS
# from utils import load_data, letter_to_tensor, line_to_tensor, random_training_example

class RNN(nn.Module):
    # implement RNN from scratch rather than using nn.RNN
    def __init__(self, input_size, hidden_size, output_size): # hidden_size is a hyperparameter
        # initialize the parent nn.Module first
        super(RNN, self).__init__()
        
        self.hidden_size = hidden_size
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size) # input and hidden state are concatenated, hence input_size + hidden_size
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1) # dim=1 is the class dimension: the i2o output has shape 1 x output_size
        
    def forward(self, input_tensor, hidden_tensor):
        combined = torch.cat((input_tensor, hidden_tensor), 1)
        
        hidden = self.i2h(combined)
        output = self.i2o(combined)
        output = self.softmax(output)
        return output, hidden
    
    def init_hidden(self): # helper function to get the initial hidden state at the start of a sequence
        return torch.zeros(1, self.hidden_size) # a tensor of zeros with shape 1 x hidden_size

In [None]:
# Load the data
category_lines, all_categories = load_data() # dictionary with the country as key and the names as values
n_categories = len(all_categories)
print(n_categories)

In [None]:
# hidden size and output size (i.e categories) for RNN
n_hidden = 128 # hyperparameter
rnn = RNN(N_LETTERS, n_hidden, n_categories)

# One single step, as an example
input_tensor = letter_to_tensor('A')
hidden_tensor = rnn.init_hidden()

output, next_hidden = rnn(input_tensor, hidden_tensor)
print(output.size()) # the new output has size [1, n_categories]
print(next_hidden.size()) # the new hidden state has size [1, n_hidden], with n_hidden = 128 as defined above

Now we want to treat the name as one sequence in which each single character is one input. So we repeatedly apply the RNN to all the characters in the name. At the very end we take the last output, to which the model has already applied the softmax, and pick the category with the highest probability. Let's do this for one name.
 
<center><img src='./images/rnn_1.PNG' width=450px></center> 
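
The many-to-one pattern described above can be sketched independently of PyTorch. `classify_sequence` and `toy_step` are hypothetical names: the toy step simply keeps a running sum as its hidden state, so that the example stays small and easy to check by hand.

```python
def classify_sequence(step, sequence, h0, labels):
    """Run `step` over the whole sequence and classify from the last output only."""
    if not sequence:
        raise ValueError("sequence must contain at least one element")
    h = h0
    for x in sequence:
        scores, h = step(x, h)  # intermediate scores are overwritten on purpose
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best]

def toy_step(x, h):
    """Hypothetical step: the hidden state is a running sum of the inputs."""
    h = h + x
    return [h, -h], h

print(classify_sequence(toy_step, [1, -3, 1], 0, ["positive", "negative"]))  # negative
```

After the first element alone the answer would be "positive", but only the output after the final element counts, which is the same way the name classifier uses the output after the last character.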

In [None]:
# Whole sequence/name: encode the full name, then feed only its first character as a single step
input_tensor = line_to_tensor('Albert')
hidden_tensor = rnn.init_hidden()

output, next_hidden = rnn(input_tensor[0], hidden_tensor)
print(output.size())
print(next_hidden.size())

In [None]:
# Helper function that turns the network output into a category name
def category_from_output(output):
    category_idx = torch.argmax(output).item() # the output holds log-probabilities, so take the index of the most likely category
    return all_categories[category_idx]

print(category_from_output(output))

# The model is not trained yet, so the prediction is unlikely to be correct.

In [None]:
# Train the model

# define loss and optimizer
criterion = nn.NLLLoss()
learning_rate = 0.005
optimizer = torch.optim.SGD(rnn.parameters(), lr=learning_rate)

# Define a function that performs one training step on a single name
def train(line_tensor, category_tensor):
    hidden = rnn.init_hidden()

    # apply the RNN repeatedly, once per character
    for i in range(line_tensor.size()[0]): # length of the name
        output, hidden = rnn(line_tensor[i], hidden) # current character and previous hidden state

    # take the final output and then calculate the loss
    loss = criterion(output, category_tensor)
    
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    return output, loss.item()

In [None]:
# Training loop.
# Let's track some values for plotting
current_loss = 0
all_losses = []
plot_steps, print_steps = 1000, 5000

n_iters = 100000

for i in range(n_iters):
    # get random training samples, line means actual name
    category, line, category_tensor, line_tensor = random_training_example(category_lines, all_categories)

    # Call the training function
    output, loss = train(line_tensor, category_tensor)
    current_loss += loss  # add loss to current loss

    # record the average loss after every plot_steps iterations
    if (i+1) % plot_steps == 0:
        all_losses.append(current_loss / plot_steps)
        current_loss = 0
        
    if (i+1) % print_steps == 0:
        guess = category_from_output(output)
        correct = "CORRECT" if guess == category else f"WRONG ({category})"
        print(f"{i+1} {(i+1)/n_iters*100} {loss:.4f} {line} / {guess} {correct}")
        
# plot the losses
plt.figure()
plt.plot(all_losses)
plt.show()

# We can save the model here and use it later

In [None]:
# Predict the category of a single name
def predict(input_line):
    print(f"\n> {input_line}") # echo the raw input
    with torch.no_grad(): # turn off gradient tracking
        line_tensor = line_to_tensor(input_line) # raw string to tensor
        
        hidden = rnn.init_hidden()
    
        for i in range(line_tensor.size()[0]): 
            output, hidden = rnn(line_tensor[i], hidden) # new output and hidden state by applying the RNN
        
        guess = category_from_output(output)
        print(guess) # the predicted category

        # you can print accuracy.
        


while True:
    sentence = input("Input:")
    if sentence == "quit":
        break
    
    predict(sentence)