<a href="https://colab.research.google.com/github/m-richa/NMA/blob/main/tutorials/W2D1_ConvnetsAndRecurrentNeuralNetworks/student/W2D1_Tutorial2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Tutorial 2: Introduction to RNNs

**Week 2, Day 1: Convnets And Recurrent Neural Networks**

**By Neuromatch Academy**

__Content creators:__ Dawn McKnight, Richard Gerum, Cassidy Pirlot, Rohan Saha, Liam Peet-Pare, Saeed Najafi, Alona Fyshe

__Content reviewers:__ Saeed Salehi, Lily Cheng, Yu-Fang Yang, Polina Turishcheva, Nina Kudryashova, Kelson Shilling-Scrivo

__Content editors:__ Nina Kudryashova

__Production editors:__ Anmol Gupta, Spiros Chavlis 

*Based on material from:* Konrad Kording, Hmrishav Bandyopadhyay, Rahul Shekhar, Tejas Srivastava

**Our 2021 Sponsors, including Presenting Sponsor Facebook Reality Labs**

<p align='center'><img src='https://github.com/NeuromatchAcademy/widgets/blob/master/sponsors.png?raw=True'/></p>

---
# Tutorial Objectives
At the end of this tutorial, we will be able to:
- Understand the structure of a Recurrent Neural Network (RNN)
- Build a simple RNN model

 

In [None]:
# @title Tutorial slides

# @markdown These are the slides for the videos in this tutorial

# @markdown If you want to download the slides locally, click [here](https://osf.io/5asx2/download)
from IPython.display import IFrame
IFrame(src=f"https://mfr.ca-1.osf.io/render?url=https://osf.io/5asx2/?direct%26mode=render%26action=download%26mode=render", width=854, height=480)

---
# Setup

In [None]:
# @title Install dependencies
!pip install livelossplot --quiet
!pip install unidecode

!pip install git+https://github.com/NeuromatchAcademy/evaltools --quiet
from evaltools.airtable import AirtableForm

# generate airtable form
atform = AirtableForm('appn7VdPRseSoMXEG','W2D1_T2','https://portal.neuromatchacademy.org/api/redirect/to/351ca652-13d8-4e31-be28-30153d03e639')

In [None]:
# Imports
import time
import math
import torch
import string
import random
import unidecode
import numpy as np
import matplotlib.pyplot as plt

import torch.nn as nn

from tqdm.notebook import tqdm

In [None]:
# @title Figure settings
import ipywidgets as widgets       # interactive display
%config InlineBackend.figure_format = 'retina'
plt.style.use("https://raw.githubusercontent.com/NeuromatchAcademy/content-creation/main/nma.mplstyle")

plt.rcParams["mpl_toolkits.legacy_colorbar"] = False

import warnings
warnings.filterwarnings("ignore", category=UserWarning, module="matplotlib")

In [None]:
# @title Helper functions
# https://github.com/spro/char-rnn.pytorch

def read_file(filename):
  # Read the file and transliterate any non-ASCII characters to ASCII
  with open(filename) as f:
    file = unidecode.unidecode(f.read())
  return file, len(file)


# Turning a string into a tensor
def char_tensor(string):
  tensor = torch.zeros(len(string)).long()
  for c in range(len(string)):
    try:
      tensor[c] = all_characters.index(string[c])
    except ValueError:
      # Characters outside `all_characters` keep index 0
      continue
  return tensor


# Readable time elapsed
def time_since(since):
  s = time.time() - since
  m = math.floor(s / 60)
  s -= m * 60
  out = f"{m}min {int(s)}sec"
  return out


def generate(decoder, prime_str='A', predict_len=100, temperature=0.8,
             device='cpu'):

  hidden = decoder.init_hidden(1)
  prime_input = char_tensor(prime_str).unsqueeze(0)

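  # An LSTM hidden state is an (h, c) tuple; move each part to the device
  if isinstance(hidden, tuple):
    hidden = tuple(h.to(device) for h in hidden)
  else:
    hidden = hidden.to(device)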
  prime_input = prime_input.to(device)
  predicted = prime_str

  # Use priming string to "build up" hidden state
  for p in range(len(prime_str) - 1):
    _, hidden = decoder(prime_input[:,p], hidden)

  inp = prime_input[:,-1]

  for p in range(predict_len):
    output, hidden = decoder(inp, hidden)

    # Sample from the network as a multinomial distribution
    output_dist = output.data.view(-1).div(temperature).exp()
    top_i = torch.multinomial(output_dist, 1)[0]

    # Add predicted character to string and use as next input
    predicted_char = all_characters[top_i]
    predicted += predicted_char
    inp = char_tensor(predicted_char).unsqueeze(0)
    inp = inp.to(device)

  return predicted

In [None]:
# @title Set random seed

# @markdown Executing `set_seed(seed=seed)` sets the random seed

# For DL it's critical to set the random seed so that students have a
# baseline against which to compare their results.
# Read more here: https://pytorch.org/docs/stable/notes/randomness.html

# Call `set_seed` function in the exercises to ensure reproducibility.
import random
import torch

def set_seed(seed=None, seed_torch=True):
  if seed is None:
    seed = np.random.choice(2 ** 32)
  random.seed(seed)
  np.random.seed(seed)
  if seed_torch:
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

  print(f'Random seed {seed} has been set.')


# In case that `DataLoader` is used
def seed_worker(worker_id):
  worker_seed = torch.initial_seed() % 2**32
  np.random.seed(worker_seed)
  random.seed(worker_seed)

In [None]:
#@title Set device (GPU or CPU). Execute `set_device()`
# especially if torch modules are used.

# inform the user if the notebook uses GPU or CPU.

def set_device():
  device = "cuda" if torch.cuda.is_available() else "cpu"
  if device != "cuda":
    print("WARNING: For this notebook to perform best, "
        "if possible, in the menu under `Runtime` -> "
        "`Change runtime type`, select `GPU`.")
  else:
    print("GPU is enabled in this notebook.")

  return device

In [None]:
SEED = 2021
set_seed(seed=SEED)
DEVICE = set_device()

---
# Section 1: Recurrent Neural Networks (RNNs)

*Time estimate: ~20mins*

In [None]:
# @title Video 1: RNNs
from ipywidgets import widgets

out2 = widgets.Output()
with out2:
  from IPython.display import IFrame
  class BiliVideo(IFrame):
    def __init__(self, id, page=1, width=400, height=300, **kwargs):
      self.id=id
      src = "https://player.bilibili.com/player.html?bvid={0}&page={1}".format(id, page)
      super(BiliVideo, self).__init__(src, width, height, **kwargs)

  video = BiliVideo(id=f"BV1L44y1m7PP", width=854, height=480, fs=1)
  print("Video available at https://www.bilibili.com/video/{0}".format(video.id))
  display(video)

out1 = widgets.Output()
with out1:
  from IPython.display import YouTubeVideo
  video = YouTubeVideo(id=f"PsZjS125lLs", width=854, height=480, fs=1, rel=0)
  print("Video available at https://youtube.com/watch?v=" + video.id)
  display(video)

out = widgets.Tab([out1, out2])
out.set_title(0, 'Youtube')
out.set_title(1, 'Bilibili')

# add event to airtable
atform.add_event('Video 1: RNNs')

display(out)

RNNs are compact models that operate over time series and can remember past input. They also save parameters by using the same weights at every time step. If you've heard of Transformers, those models don't have this kind of temporal weight sharing, and so they are *much* larger.
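
To make the weight sharing concrete, here is a minimal sketch of a vanilla (Elman) RNN written in plain Python rather than PyTorch. The sizes, the names `W_xh`, `W_hh`, `b_h`, `rnn_step`, and `run_rnn`, and the random initialization are illustrative assumptions, not part of this tutorial's model. The point it tries to show is that the same weights are applied at every time step, so the parameter count does not depend on the sequence length.

```python
import math
import random

random.seed(0)
input_size, hidden_size = 3, 4  # illustrative sizes


def rand_matrix(rows, cols):
  return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# One set of weights, created once and reused at every time step
W_xh = rand_matrix(hidden_size, input_size)   # input -> hidden
W_hh = rand_matrix(hidden_size, hidden_size)  # hidden -> hidden
b_h = [0.0] * hidden_size


def rnn_step(x, h):
  """One step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
  return [
      math.tanh(
          sum(W_xh[i][j] * x[j] for j in range(input_size))
          + sum(W_hh[i][k] * h[k] for k in range(hidden_size))
          + b_h[i]
      )
      for i in range(hidden_size)
  ]


def run_rnn(sequence):
  h = [0.0] * hidden_size  # initial hidden state
  for x in sequence:       # the same weights are used at each step
    h = rnn_step(x, h)
  return h


short_seq = [[1.0, 0.0, 0.0]] * 2
long_seq = [[1.0, 0.0, 0.0]] * 50

n_params = hidden_size * input_size + hidden_size * hidden_size + hidden_size
print(n_params)  # 32, regardless of sequence length
print(len(run_rnn(short_seq)), len(run_rnn(long_seq)))  # 4 4
```

Processing 2 steps or 50 steps uses the same 32 numbers; only the hidden state changes as the sequence is read.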

The code below is adapted from [this GitHub repository](https://github.com/spro/char-rnn.pytorch).

In [None]:
# RNN
# https://github.com/spro/char-rnn.pytorch
class CharRNN(nn.Module):
  def __init__(self, input_size, hidden_size, output_size,
               model="gru", n_layers=1):
    """
    input_size: int
      Size of the input layer.
    hidden_size: int
      Size of the hidden layers.
    output_size: int
      Size of the output layer.
    model: string
      `model` can take the values "gru", "rnn", "lstm". Default is "gru".
    n_layers: int
      Number of layers
    """
    super(CharRNN, self).__init__()
    self.model = model.lower()
    self.input_size = input_size
    self.hidden_size = hidden_size
    self.output_size = output_size
    self.n_layers = n_layers

    self.encoder = nn.Embedding(input_size, hidden_size)
    if self.model == "gru":
      self.rnn = nn.GRU(hidden_size, hidden_size, n_layers)
    elif self.model == "lstm":
      self.rnn = nn.LSTM(hidden_size, hidden_size, n_layers)
    elif self.model == "rnn":
      self.rnn = nn.RNN(hidden_size, hidden_size, n_layers)
    self.decoder = nn.Linear(hidden_size, output_size)

  def forward(self, input, hidden):
    batch_size = input.size(0)
    encoded = self.encoder(input)
    output, hidden = self.rnn(encoded.reshape(1, batch_size, -1), hidden)
    output = self.decoder(output.reshape(batch_size, -1))
    return output, hidden

  def init_hidden(self, batch_size):
    if self.model == "lstm":
      return (torch.zeros(self.n_layers, batch_size, self.hidden_size), torch.zeros(self.n_layers, batch_size, self.hidden_size))

    return torch.zeros(self.n_layers, batch_size, self.hidden_size)

This next section of code takes care of training the RNN on several of Mark Twain's books. In this short section, we won't dive into the code, but you'll get to learn a lot more about RNNs in a few days! For now, we are just going to observe the training process.
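
For orientation, the training task is next-character prediction: the input is a chunk of text and the target is the same chunk shifted by one character, which is what the `random_training_set` function defined later in this section does with `chunk[:-1]` and `chunk[1:]`. Here is a minimal sketch in plain Python, using a made-up phrase in place of the Twain text and character indices into `string.printable` in the spirit of `char_tensor`. The helper name `encode` is an assumption for illustration.

```python
import string

all_characters = string.printable  # same vocabulary as this notebook


def encode(text):
  # Map each character to its index in the vocabulary
  return [all_characters.index(ch) for ch in text]


chunk = "The river"  # made-up example text
inp_text, target_text = chunk[:-1], chunk[1:]

for x, y in zip(inp_text, target_text):
  print(repr(x), "->", repr(y))  # at each step, predict the next character

inp, target = encode(inp_text), encode(target_text)
print(inp[1:] == target[:-1])  # True: the target is the input shifted by one
```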

In [None]:
# @title Run Me to get the data
import requests

url = 'https://raw.githubusercontent.com/NeuromatchAcademy/course-content-dl/main/tutorials/W2D1_ConvnetsAndRecurrentNeuralNetworks/static/twain.txt'
r = requests.get(url, stream=True)

with open('twain.txt', 'wb') as fd:
  fd.write(r.content)

One cool thing about RNNs is that they can be used to _generate_ language based on what the network sees during training. As the network makes predictions, instead of checking whether those predictions are correct against some training text, we just feed them back into the model as the next observed token. Starting from an initial hidden state (all zeros in `init_hidden`) and a short priming string, we can generate many original sentences! And what the network generates will reflect the text it was trained on.
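
The feedback loop can be sketched without a neural network at all. In the toy example below, a hand-written lookup table stands in for the trained model; `toy_model`, `sample_next`, and `generate_toy` are hypothetical names, and unlike a real RNN the table has no hidden state. What it illustrates is only the loop itself: each sampled character becomes the next input.

```python
import random

# Hypothetical stand-in for a trained network: maps the current character
# to a probability distribution over the next character
toy_model = {
    'a': {'b': 0.9, 'a': 0.1},
    'b': {'a': 0.5, 'c': 0.5},
    'c': {'a': 1.0},
}


def sample_next(char, rng):
  choices, weights = zip(*toy_model[char].items())
  return rng.choices(choices, weights=weights, k=1)[0]


def generate_toy(prime_char, predict_len, seed=0):
  rng = random.Random(seed)
  text = prime_char
  current = prime_char
  for _ in range(predict_len):
    current = sample_next(current, rng)  # the prediction becomes the next input
    text += current
  return text


print(generate_toy('a', 20))
```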

In [None]:
# https://github.com/spro/char-rnn.pytorch
def random_training_set(file, file_len, chunk_len, batch_size,
                        device='cpu', seed=0):
  random.seed(seed)

  inp = torch.LongTensor(batch_size, chunk_len).to(device)
  target = torch.LongTensor(batch_size, chunk_len).to(device)

  for bi in range(batch_size):
    start_index = random.randint(0, file_len - chunk_len - 1)
    end_index = start_index + chunk_len + 1
    chunk = file[start_index:end_index]
    inp[bi] = char_tensor(chunk[:-1])
    target[bi] = char_tensor(chunk[1:])

  return inp, target, chunk_len, batch_size, device


def train(decoder, criterion, inp, target, chunk_len, batch_size, device):
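  hidden = decoder.init_hidden(batch_size)
  # Move the hidden state to the device once; an LSTM uses an (h, c) tuple
  if isinstance(hidden, tuple):
    hidden = tuple(h.to(device) for h in hidden)
  else:
    hidden = hidden.to(device)
  decoder.zero_grad()
  loss = 0

  for c in range(chunk_len):
    output, hidden = decoder(inp[:, c].to(device), hidden)
    loss += criterion(output.reshape(batch_size, -1), target[:, c])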

  loss.backward()
  decoder_optimizer.step()
  return loss.item() / chunk_len

First, let's load the text file, and define the model and its hyperparameters.

In [None]:
# Reading and un-unicode-encoding data
all_characters = string.printable
n_characters = len(all_characters)

# load the text file
file, file_len = read_file('twain.txt')

# Hyperparams
batch_size = 50
chunk_len = 200
model = "rnn"  # other options: `lstm`, `gru`

n_layers = 2
hidden_size = 200
learning_rate = 0.01

# Define the model, optimizer, and the loss criterion
decoder = CharRNN(n_characters, hidden_size, n_characters,
                  model=model, n_layers=n_layers)
decoder.to(DEVICE)

decoder_optimizer = torch.optim.Adagrad(decoder.parameters(), lr=learning_rate)
criterion = nn.CrossEntropyLoss()

Let's try it! Run the code below. As the network trains, it will output samples of generated text every 50 epochs (set by `print_every`). Notice that as training progresses, the model learns to spell short words, then learns to string some words together, and eventually can produce meaningful sentences (sometimes)! Keep in mind that this is a relatively small network, and doesn't employ some of the cool things you'll learn about later in the week (e.g., LSTMs, though you can try one by changing the value of the `model` variable in the code above if you wish!).

After running the model, and observing the output, get together with your pod, and talk about what you noticed during training. Did your network produce anything interesting? Did it produce anything characteristic of Twain?  

**Note:** training for the full 2000 epochs is likely to take a while, so you may need to stop it before it finishes. If you have time left, set `n_epochs` to 2000 below.
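
As a reference point for the losses printed during training, a model that assigns equal probability to every character has a cross-entropy of ln(`n_characters`). The sketch below computes that baseline for the 100 characters in `string.printable` and converts a loss into perplexity. The helper name `perplexity` and the example loss value are illustrative assumptions, not results from this notebook.

```python
import math
import string

n_characters = len(string.printable)  # 100 characters, as in this notebook

# Cross-entropy (in nats) of a uniform guess over all characters
chance_loss = math.log(n_characters)
print(f"{chance_loss:.3f}")  # 4.605


def perplexity(loss):
  # exp(loss) is the effective number of equally likely characters
  return math.exp(loss)


print(f"{perplexity(chance_loss):.1f}")  # 100.0
example_loss = 1.5  # hypothetical value, not a result from this notebook
print(f"{perplexity(example_loss):.2f}")  # 4.48
```

A printed loss well below 4.6 therefore indicates that the network is doing better than guessing characters uniformly at random.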

In [None]:
n_epochs = 1000   # originally set to 2000

print_every = 50  # frequency of printing the outputs

start = time.time()
all_losses = []
loss_avg = 0

print(f"Training for {n_epochs} epochs...\n")
for epoch in tqdm(range(1, n_epochs + 1), position=0, leave=True):
  loss = train(decoder, criterion,
               *random_training_set(file, file_len, chunk_len, batch_size,
                                    device=DEVICE, seed=epoch))
  loss_avg += loss

  if epoch % print_every == 0:
    print(f"[{time_since(start)} ({epoch / n_epochs * 100:.0f}%) {loss:.4f}]")
    print(f"{generate(decoder, prime_str='Wh', predict_len=150, device=DEVICE)}")

Now you can generate more examples using the trained model. Recall that `generate` takes the following arguments:

```python
generate(decoder, prime_str='A', predict_len=100, temperature=0.8, device='cpu')
```
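
The `temperature` argument controls how adventurous the sampling is. In `generate`, the network's outputs are divided by `temperature` and exponentiated before sampling, so low temperatures concentrate probability on the most likely character and high temperatures flatten the distribution. Here is a minimal plain-Python sketch of that scaling, with made-up scores for three characters (the function name `temperature_probs` and the scores are assumptions for illustration):

```python
import math


def temperature_probs(logits, temperature):
  scaled = [l / temperature for l in logits]
  m = max(scaled)  # subtract the max for numerical stability
  exps = [math.exp(s - m) for s in scaled]
  total = sum(exps)
  # Normalized here for readability; torch.multinomial also accepts
  # unnormalized non-negative weights
  return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three characters
for t in (0.5, 0.8, 2.0):
  probs = temperature_probs(logits, t)
  print(t, [round(p, 3) for p in probs])
```

In this sketch, the probability of the top-scoring character shrinks as the temperature grows, which is why higher temperatures tend to give more varied but less coherent text.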

Try it yourself!

In [None]:
print(f"{generate(decoder, prime_str='Wh', predict_len=100, device=DEVICE)}\n")

---
# Section 2: Power consumption in Deep Learning

*Time estimate: ~20mins*

Training NN models can be incredibly costly, both in money and in power consumption.

In [None]:
# @title Video 2: Carbon Footprint of AI
from ipywidgets import widgets

out2 = widgets.Output()
with out2:
  from IPython.display import IFrame
  class BiliVideo(IFrame):
    def __init__(self, id, page=1, width=400, height=300, **kwargs):
      self.id=id
      src = "https://player.bilibili.com/player.html?bvid={0}&page={1}".format(id, page)
      super(BiliVideo, self).__init__(src, width, height, **kwargs)

  video = BiliVideo(id=f"BV1My4y1j7HJ", width=854, height=480, fs=1)
  print("Video available at https://www.bilibili.com/video/{0}".format(video.id))
  display(video)

out1 = widgets.Output()
with out1:
  from IPython.display import YouTubeVideo
  video = YouTubeVideo(id=f"as6C334LmRs", width=854, height=480, fs=1, rel=0)
  print("Video available at https://youtube.com/watch?v=" + video.id)
  display(video)

out = widgets.Tab([out1, out2])
out.set_title(0, 'Youtube')
out.set_title(1, 'Bilibili')

# add event to airtable
atform.add_event('Video 2: Carbon Footprint of AI')

display(out)

Take a few moments to chat with your pod about the following points:
* Which societal costs of training do you find most compelling?
* When is training an AI model worth the cost?  Who should make that decision?
* Should there be additional taxes on energy costs for compute centers? 

## Exercise 2: Calculate the carbon footprint that your pod generated today.

You can use this [online calculator](https://mlco2.github.io/impact/#compute). 
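
For a rough back-of-the-envelope check, emissions can be approximated as energy used times the carbon intensity of the electricity grid. The sketch below is a simplified estimate, not necessarily the calculator's exact method; the function name `carbon_footprint_kg` and all input values are hypothetical, so substitute your own hardware power draw, runtime, and regional carbon intensity.

```python
def carbon_footprint_kg(power_watts, hours, kg_co2_per_kwh):
  """Estimate emissions as energy used times grid carbon intensity."""
  energy_kwh = power_watts / 1000 * hours
  return energy_kwh * kg_co2_per_kwh


# Hypothetical inputs: one 250 W GPU running for 2 hours on a grid
# that emits 0.4 kg CO2-equivalent per kWh
estimate = carbon_footprint_kg(power_watts=250, hours=2, kg_co2_per_kwh=0.4)
print(f"{estimate:.2f} kg CO2eq")  # 0.20 kg CO2eq
```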

In [None]:
# @title Student Response
from ipywidgets import widgets


text=widgets.Textarea(
   value='Type your answer here and click on `Submit!`',
   placeholder='Type something',
   description='',
   disabled=False
)

button = widgets.Button(description="Submit!")

display(text,button)

def on_button_clicked(b):
   atform.add_answer('q1', text.value)
   print("Submission successful!")


button.on_click(on_button_clicked)

---
# Summary

What a day! We've learned a lot: the basics of CNNs and RNNs, and how architectural changes that let models share parameters can greatly reduce model size. We learned about convolution and pooling, as well as the basic idea behind RNNs. To wrap up, we thought about the impact of training large NN models.

In [None]:
# @title Airtable Submission Link
from IPython import display as IPydisplay
IPydisplay.HTML(
   f"""
 <div>
   <a href= "{atform.url()}" target="_blank">
   <img src="https://github.com/NeuromatchAcademy/course-content-dl/blob/main/tutorials/static/SurveyButton.png?raw=1"
 alt="button link end of day Survey" style="width:410px"></a>
   </div>""" )