<a href="https://colab.research.google.com/github/rahiakela/deep-learning-research-and-practice/blob/main/deep-learning-with-pytorch-step-by-step/Part-III-NLP/01_sequences.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Sequences

In this notebook, we’ll dive into a new kind of input: sequences!

In sequence problems, an ordered sequence of data points shares a single
label—emphasis on being ordered.

Why is ordered so important?

If the data points aren’t ordered, even if they share a single label, they are not a
sequence, but rather a collection of data points.

If the data points are ordered along a single dimension, though, that’s a sequence. This
particular structure can be exploited by recurrent neural networks and their many
variants, as well as by 1D convolutional neural networks.
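
As a minimal sketch of what "ordered along a single dimension" looks like in tensor terms, the snippet below builds one sequence of four 2D points and passes it through `nn.RNN` and `nn.Conv1d`. It is not part of the original notebook: the corner coordinates, layer sizes, and variable names are assumptions chosen for illustration only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# One sequence of four 2D points (illustrative values): shape (N=1, L=4, F=2)
seq = torch.tensor([[[-1., -1.], [-1., 1.], [1., 1.], [1., -1.]]])

# A recurrent layer consumes the points one step at a time, in order
rnn = nn.RNN(input_size=2, hidden_size=3, batch_first=True)
out, hidden = rnn(seq)  # out: (N, L, hidden_size), hidden: (1, N, hidden_size)

# A 1D convolution slides along the length dimension; it expects (N, C, L)
conv = nn.Conv1d(in_channels=2, out_channels=3, kernel_size=2)
conv_out = conv(seq.permute(0, 2, 1))  # (N, out_channels, L - kernel_size + 1)

# Feeding the same points in reverse order yields a different final state
reversed_out, _ = rnn(seq.flip(dims=[1]))
order_matters = not torch.allclose(out[:, -1], reversed_out[:, -1])

print(out.shape, hidden.shape, conv_out.shape, order_matters)
```

With `batch_first=True`, the recurrent layer takes inputs shaped (batch, length, features), while `nn.Conv1d` takes (batch, channels, length), hence the `permute` call.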

There are two main types of sequence problems: time series and natural language
processing (NLP). 

We’ll start by generating a synthetic dataset and then use it to
illustrate the inner workings of:

1. recurrent neural networks, 
2. encoder-decoder models, 
3. attention mechanisms, and 
4. Transformers



## Setup

In [None]:
try:
    import google.colab
    import requests
    url = 'https://raw.githubusercontent.com/dvgodoy/PyTorchStepByStep/master/config.py'
    r = requests.get(url, allow_redirects=True)
    # Save the downloaded file, closing the handle when done
    with open('config.py', 'wb') as f:
        f.write(r.content)
except ModuleNotFoundError:
    pass

from config import *
config_chapter8()
# This is needed to render the plots in this chapter
from plots.chapter8 import *

Downloading files from GitHub repo to Colab...
Finished!


In [None]:
import numpy as np

import torch
import torch.optim as optim
import torch.nn as nn
import torch.nn.functional as F

from torch.utils.data import DataLoader, Dataset, random_split, TensorDataset
from torch.nn.utils import rnn as rnn_utils

from data_generation.square_sequences import generate_sequences
from stepbystep.v4 import StepByStep

In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"

## Ball Dataset

Let’s use a dataset of 1,000 random points drawn from a ten-dimensional ball, scaled such that
each feature has zero mean and unit standard deviation.

In this dataset, points situated within half of the radius of the ball are
labeled as negative cases, while the remaining points are labeled as positive cases.
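
The setup cells shown above do not define or import `load_data`, so the following is only a rough sketch of how such a dataset could be generated with NumPy. The function name `make_ball_data`, its parameters, and the choice to draw the distance from the center uniformly (which keeps the two classes roughly balanced) are all assumptions, not the author's implementation.

```python
import numpy as np

def make_ball_data(n_points=1000, n_dims=10, radius_fraction=0.5, seed=13):
    """Hypothetical stand-in for a ball dataset generator."""
    rng = np.random.default_rng(seed)
    # Random directions: normalized Gaussian vectors lie on the unit sphere
    directions = rng.normal(size=(n_points, n_dims))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Assumption: distance from the center is uniform in [0, 1)
    radii = rng.uniform(size=(n_points, 1))
    points = directions * radii
    # Negative (0) within half of the radius, positive (1) otherwise
    labels = (radii.ravel() > radius_fraction).astype(float)
    # Standardize each feature to zero mean and unit standard deviation
    X = (points - points.mean(axis=0)) / points.std(axis=0)
    return X, labels

X_demo, y_demo = make_ball_data()
print(X_demo.shape, y_demo.mean())
```

The labels are computed before standardization, so they depend only on each point's original distance from the center.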

In [None]:
X, y = load_data(n_points=1000, n_dims=10)

In [None]:
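# Wrap the features and labels as float tensors
ball_dataset = TensorDataset(torch.as_tensor(X).float(), torch.as_tensor(y).float())
# batch_size=len(X) puts the whole dataset into a single batch
ball_loader = DataLoader(ball_dataset, batch_size=len(X))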

## Block Model

To illustrate the vanishing gradients problem, we need a deeper model.

Let’s call it the "block" model: it is a block of several hidden
layers (and activation functions) stacked together, with every layer containing the same
number of hidden units (neurons).

In [None]:
torch.manual_seed(11)

n_layers = 5
n_features = X.shape[1]
hidden_units = 100
activation_fn = nn.ReLU

model = build_model(n_features, n_layers, hidden_units, activation_fn, use_bn=False)

In [None]:
print(model)

Sequential(
  (h1): Linear(in_features=10, out_features=100, bias=True)
  (a1): ReLU()
  (h2): Linear(in_features=100, out_features=100, bias=True)
  (a2): ReLU()
  (h3): Linear(in_features=100, out_features=100, bias=True)
  (a3): ReLU()
  (h4): Linear(in_features=100, out_features=100, bias=True)
  (a4): ReLU()
  (h5): Linear(in_features=100, out_features=100, bias=True)
  (a5): ReLU()
  (o): Linear(in_features=100, out_features=1, bias=True)
)
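
The definition of `build_model` is not shown in this notebook. A minimal sketch that would produce the same printed structure might look like the following. The name `build_block_model` is hypothetical, and placing batch normalization after each activation when `use_bn=True` is an assumption rather than the author's confirmed choice.

```python
import torch
import torch.nn as nn

def build_block_model(n_features, n_layers, hidden_units, activation_fn, use_bn=False):
    """Hypothetical stand-in: stacks n_layers (Linear + activation) blocks."""
    model = nn.Sequential()
    in_features = n_features
    for i in range(1, n_layers + 1):
        model.add_module(f'h{i}', nn.Linear(in_features, hidden_units))
        model.add_module(f'a{i}', activation_fn())
        if use_bn:
            # Assumed placement of batch normalization
            model.add_module(f'bn{i}', nn.BatchNorm1d(hidden_units))
        in_features = hidden_units
    # Single output unit producing a logit for binary classification
    model.add_module('o', nn.Linear(hidden_units, 1))
    return model

sketch = build_block_model(10, 5, 100, nn.ReLU)
layer_names = [name for name, _ in sketch.named_children()]
logits = sketch(torch.randn(4, 10))
print(layer_names, logits.shape)
```

Passing the activation as a class (`nn.ReLU`) rather than an instance lets each block get its own activation module.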


In [None]:
# We’re only missing a loss function and an optimizer
loss_fn = nn.BCEWithLogitsLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-2)
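
To see how these pieces fit together, here is a hedged sketch of a single training step that also records the gradient norm of each layer's weights, which is one simple way to look for vanishing gradients. The toy model, the random batch, and all the `toy_*` names are assumptions for illustration; the notebook's own training loop may differ.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(11)

# Small stand-in model and random data (illustrative only)
toy_model = nn.Sequential(
    nn.Linear(10, 100), nn.ReLU(),
    nn.Linear(100, 100), nn.ReLU(),
    nn.Linear(100, 1),
)
toy_loss_fn = nn.BCEWithLogitsLoss()
toy_optimizer = optim.SGD(toy_model.parameters(), lr=1e-2)

x_batch = torch.randn(32, 10)
y_batch = torch.randint(0, 2, (32, 1)).float()  # targets must match the logits' shape

# Forward pass, loss, and backward pass
toy_model.train()
batch_logits = toy_model(x_batch)
loss = toy_loss_fn(batch_logits, y_batch)
loss.backward()

# Gradient norm of every weight matrix, keyed by parameter name
grad_norms = {
    name: param.grad.norm().item()
    for name, param in toy_model.named_parameters()
    if name.endswith('weight')
}

# Update the parameters, then clear the gradients for the next step
toy_optimizer.step()
toy_optimizer.zero_grad()

print(loss.item(), grad_norms)
```

`nn.BCEWithLogitsLoss` applies the sigmoid internally, which is why the model's last layer outputs raw logits.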