# Downloading the Trax Package

[Trax](https://trax-ml.readthedocs.io/en/latest/) is an end-to-end library for deep learning that focuses on clear code and speed. It is actively used and maintained by the [Google Brain team](https://research.google/teams/brain/). Its official introductory notebook ([run it in Colab](https://colab.research.google.com/github/google/trax/blob/master/trax/intro.ipynb)) shows how to use Trax and where to find more information.

In [1]:
!pip install trax

Collecting trax
  Downloading trax-1.3.5-py2.py3-none-any.whl (416 kB)
Collecting gin-config
  Downloading gin_config-0.3.0-py3-none-any.whl (44 kB)
Collecting tensor2tensor
  Downloading tensor2tensor-1.15.7-py2.py3-none-any.whl (1.4 MB)
Collecting funcsigs
  Downloading funcsigs-1.0.2-py2.py3-none-any.whl (17 kB)
Collecting tensorflow-text
  Downloading tensorflow_text-2.3.0-cp37-cp37m-manylinux1_x86_64.whl (2.6 MB)
Collecting jaxlib
  Downloading jaxlib-0.1.56-cp37-none-manylinux2010_x86_64.whl (32.1 MB)
Collecting t5
  Downloading t5-0.7.0-py3-none-any.whl (171 kB)
Collecting jax

# Importing Packages

In this notebook we will use the following packages:

* [**Pandas**](https://pandas.pydata.org/) is a fast, powerful, flexible and easy-to-use open-source data analysis and manipulation library for Python. It provides the DataFrame object for efficient data manipulation with integrated indexing.
* The [**os**](https://docs.python.org/3/library/os.html) module provides a portable way of using operating-system-dependent functionality, such as listing directories.
* [**trax**](https://trax-ml.readthedocs.io/en/latest/trax.html) is an end-to-end library for deep learning that focuses on clear code and speed.
* The [**random**](https://docs.python.org/3/library/random.html) module implements pseudo-random number generators for various distributions.
* The [**itertools**](https://docs.python.org/3/library/itertools.html) module implements a number of iterator building blocks inspired by constructs from APL, Haskell, and SML, each recast in a form suitable for Python.

In [2]:
import pandas as pd 
import os
import trax
import trax.fastmath.numpy as np
import random as rnd
from trax import fastmath
from trax import layers as tl

# Loading the Data

For this project, I've used the [gothic-literature](https://www.kaggle.com/charlesaverill/gothic-literature), [shakespeare-plays](https://www.kaggle.com/kingburrito666/shakespeare-plays) and [shakespeareonline](https://www.kaggle.com/kewagbln/shakespeareonline) datasets from Kaggle.

We load the data in the following steps:

* Iterate over every dataset directory in `/kaggle/input/`
* Keep only the `.txt` files
* Build a `lines` list containing every non-empty line from all the datasets combined, with surrounding whitespace stripped

In [3]:
input_dir = '/kaggle/input'
lines = []
for directory in os.listdir(input_dir):
    dir_path = os.path.join(input_dir, directory)
    for filename in os.listdir(dir_path):
        # Keep only plain-text files
        if filename.endswith(".txt"):
            with open(os.path.join(dir_path, filename)) as files:
                for line in files:
                    # Strip surrounding whitespace and skip blank lines
                    processed_line = line.strip()
                    if processed_line:
                        lines.append(processed_line)
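
The same loading logic can also be written with `pathlib`. This is a sketch, not the notebook's code, and `load_lines` is a hypothetical helper. It assumes the `<dataset>/<file>.txt` layout implied by the loop above. It sorts the file paths so that the line order, and therefore the held-out split taken later, is reproducible; `os.listdir()` returns entries in arbitrary order. It also reads with an explicit UTF-8 encoding and replaces undecodable bytes, which is an assumption about how the files are encoded.

```python
from pathlib import Path


def load_lines(root="/kaggle/input"):
    """Collect stripped, non-empty lines from every .txt file one level below root.

    Hypothetical helper; assumes the layout root/<dataset>/<file>.txt.
    """
    lines = []
    # sorted() gives a deterministic file order, unlike os.listdir()
    for path in sorted(Path(root).glob("*/*.txt")):
        # Assumed encoding: UTF-8, with undecodable bytes replaced
        with open(path, encoding="utf-8", errors="replace") as f:
            for line in f:
                stripped = line.strip()
                if stripped:
                    lines.append(stripped)
    return lines
```

On Kaggle, `lines = load_lines()` would stand in for the loop above.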

## Pre-Processing

### Converting to Lowercase

Converting all the characters in the `lines` list to **lowercase**.

In [4]:
lines = [line.lower() for line in lines]

### Converting into Tensors

Next, we create a function that converts each line into a list of integers. It maps every character to its Unicode code point with `ord()`, which is the ASCII value for ASCII characters, and then appends an `EOS` (**end of sentence**) token whose id defaults to `1`.

In [5]:
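def line_to_tensor(line, EOS_int=1):
    """Convert a line of text into a list of integer character ids."""
    tensor = []
    for c in line:
        # ord() gives the Unicode code point (the ASCII value for ASCII characters)
        c_int = ord(c)
        tensor.append(c_int)

    # Mark the end of the line
    tensor.append(EOS_int)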

    return tensor
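
`ord()` returns a Unicode code point, which is not always an ASCII value. Typographic characters such as `’` (code point 8217) therefore produce ids above the `vocab_size=256` used by the models below, and those ids do not fit the embedding table. Below is a minimal sketch of a bounded variant. `line_to_tensor_bounded` and its placeholder id `unk_int=2` are hypothetical choices: `0` is already used for padding and `1` for `EOS`, and the sketch assumes the control character with code 2 never appears in the text.

```python
def line_to_tensor_bounded(line, EOS_int=1, vocab_size=256, unk_int=2):
    """Like line_to_tensor, but maps code points >= vocab_size to unk_int.

    unk_int=2 is a hypothetical placeholder id (0 = padding, 1 = EOS).
    """
    # Characters below vocab_size (ASCII and Latin-1) keep their code point
    tensor = [ord(c) if ord(c) < vocab_size else unk_int for c in line]
    tensor.append(EOS_int)
    return tensor


print(line_to_tensor_bounded("it\u2019s"))  # [105, 116, 2, 115, 1]
```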

### Creating a Batch Generator

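Here, we create a `data_generator()` function that yields batches of `(inputs, targets, mask)`, where the inputs and the targets are the same padded batch. It performs the following steps:

* Shuffle the line indices (if `shuffle=True`), and reshuffle after every full pass over the data
* Skip lines with `max_length` or more characters, and convert the rest into tensors with `line_to_tensor()`
* Pad each tensor with `0` up to `max_length`
* Generate a mask that is `1` for real characters (including `EOS`) and `0` for padding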

In [6]:
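def data_generator(batch_size, max_length, data_lines, line_to_tensor=line_to_tensor, shuffle=True):

    index = 0                          # position in lines_index
    cur_batch = []                     # lines collected for the current batch
    num_lines = len(data_lines)
    lines_index = [*range(num_lines)]  # indices into data_lines

    if shuffle:
        rnd.shuffle(lines_index)

    while True:

        # Wrap around (and reshuffle) once every line has been visited
        if index >= num_lines:
            index = 0
            if shuffle:
                rnd.shuffle(lines_index)

        line = data_lines[lines_index[index]]

        # Only keep lines that still fit in max_length after EOS is appended
        if len(line) < max_length:
            cur_batch.append(line)

        index += 1

        if len(cur_batch) == batch_size:

            batch = []
            mask = []

            for li in cur_batch:

                tensor = line_to_tensor(li)

                # Right-pad with 0 up to max_length
                pad = [0] * (max_length - len(tensor))
                tensor_pad = tensor + pad
                batch.append(tensor_pad)

                # 1 for real characters (and EOS), 0 for padding
                example_mask = [0 if t == 0 else 1 for t in tensor_pad]
                mask.append(example_mask)

            batch_np_arr = np.array(batch)
            mask_np_arr = np.array(mask)

            # Inputs and targets are the same array; the model's ShiftRight
            # layer offsets the inputs so each position predicts the next character
            yield batch_np_arr, batch_np_arr, mask_np_arr
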
            cur_batch = []
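
The padding and masking steps in the loop above can be checked on a single line. `pad_and_mask` is a hypothetical helper that mirrors those lines for one tensor:

```python
def pad_and_mask(tensor, max_length, pad_int=0):
    """Right-pad a list of ids to max_length and build its 0/1 mask,
    mirroring the inner loop of data_generator for a single line."""
    padded = tensor + [pad_int] * (max_length - len(tensor))
    mask = [0 if t == pad_int else 1 for t in padded]
    return padded, mask


padded, mask = pad_and_mask([104, 105, 1], max_length=6)  # "hi" + EOS
print(padded)  # [104, 105, 1, 0, 0, 0]
print(mask)    # [1, 1, 1, 0, 0, 0]
```

No character maps to id `0`, so the mask marks only the padding positions as `0`.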
            

# Defining the Model

## Gated Recurrent Unit

This function builds a GRU language model consisting of the following layers:

* `ShiftRight()`
* `Embedding()`
* `GRU()` layers (the number is set by the `n_layers` parameter)
* A `Dense()` layer
* A `LogSoftmax()` activation

In [7]:
def GRULM(vocab_size=256, d_model=512, n_layers=2, mode='train'):
    model = tl.Serial(
        tl.ShiftRight(mode=mode),
        tl.Embedding(vocab_size=vocab_size, d_feature=d_model),
        [tl.GRU(n_units=d_model) for _ in range(n_layers)],
        tl.Dense(n_units=vocab_size),
        tl.LogSoftmax()
    )
    return model
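
`ShiftRight` is what allows `data_generator()` to yield the same array as both the inputs and the targets. Below is a plain-Python illustration for a single sequence. It assumes the shift pads with `0`, the same id `data_generator()` uses for padding. `shift_right` is a hypothetical helper; the Trax layer operates on whole batches.

```python
def shift_right(tokens, n_positions=1, pad_int=0):
    """Prepend n_positions padding ids and drop the same number from the end,
    so the sequence length is unchanged."""
    return [pad_int] * n_positions + tokens[:len(tokens) - n_positions]


targets = [104, 105, 1]        # "hi" + EOS
inputs = shift_right(targets)  # [0, 104, 105]
```

After the shift, input position *i* holds target *i - 1*. The model therefore has to predict each character from the characters before it.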

## Long Short-Term Memory

This function builds an LSTM language model consisting of the following layers:

* `ShiftRight()`
* `Embedding()`
* `LSTM()` layers (the number is set by the `n_layers` parameter)
* A `Dense()` layer
* A `LogSoftmax()` activation

In [8]:
def LSTMLM(vocab_size=256, d_model=512, n_layers=2, mode='train'):
    model = tl.Serial(
        tl.ShiftRight(mode=mode),
        tl.Embedding(vocab_size=vocab_size, d_feature=d_model),
        [tl.LSTM(n_units=d_model) for _ in range(n_layers)],
        tl.Dense(n_units=vocab_size),
        tl.LogSoftmax()
    )
    return model

## Simple Recurrent Unit

This function builds an SRU language model consisting of the following layers:

* `ShiftRight()`
* `Embedding()`
* `SRU()` layers (the number is set by the `n_layers` parameter)
* A `Dense()` layer
* A `LogSoftmax()` activation

In [9]:
def SRULM(vocab_size=256, d_model=512, n_layers=2, mode='train'):
    model = tl.Serial(
        tl.ShiftRight(mode=mode),
        tl.Embedding(vocab_size=vocab_size, d_feature=d_model),
        [tl.SRU(n_units=d_model) for _ in range(n_layers)],
        tl.Dense(n_units=vocab_size),
        tl.LogSoftmax()
    )
    return model

In [10]:
GRUmodel = GRULM(n_layers=5)
LSTMmodel = LSTMLM(n_layers=5)
SRUmodel = SRULM(n_layers=5)
print(GRUmodel)
print(LSTMmodel)
print(SRUmodel)

Serial[
  ShiftRight(1)
  Embedding_256_512
  GRU_512
  GRU_512
  GRU_512
  GRU_512
  GRU_512
  Dense_256
  LogSoftmax
]
Serial[
  ShiftRight(1)
  Embedding_256_512
  LSTM_512
  LSTM_512
  LSTM_512
  LSTM_512
  LSTM_512
  Dense_256
  LogSoftmax
]
Serial[
  ShiftRight(1)
  Embedding_256_512
  SRU_512
  SRU_512
  SRU_512
  SRU_512
  SRU_512
  Dense_256
  LogSoftmax
]
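
The layer names in the printouts encode their sizes (`Embedding_256_512`, `Dense_256`). As a worked example, the parameter counts of the two layers shared by all three models can be computed directly. This assumes the embedding is a plain lookup table without a bias and that the dense layer has a bias vector, which we understand to be Trax's default:

```python
vocab_size, d_model = 256, 512

# Embedding_256_512: one d_model-dimensional vector per character id
embedding_params = vocab_size * d_model

# Dense_256: a (d_model x vocab_size) weight matrix plus a vocab_size bias
dense_params = d_model * vocab_size + vocab_size

print(embedding_params, dense_params)  # 131072 131328
```

The five stacked recurrent layers sit between these two layers and differ only in the cell type.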


## Hyperparameters

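Here, we declare the `batch_size` and `max_length` hyperparameters. `batch_size` is the number of lines per batch. `max_length` is the length every line is padded to; `data_generator()` skips lines with `max_length` or more characters.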

In [11]:
batch_size = 32
max_length = 64
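
`data_generator()` only accepts lines shorter than `max_length`, so this value also determines how much of the corpus is used. A small hypothetical helper, not part of the original notebook, to check that fraction:

```python
def fraction_kept(lines, max_length):
    """Fraction of lines data_generator would accept: a line is used only
    if len(line) < max_length, which leaves room for the EOS id."""
    if not lines:
        return 0.0
    return sum(len(line) < max_length for line in lines) / len(lines)
```

Calling `fraction_kept(lines, max_length)` on the loaded corpus shows how many lines survive the length filter.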

# Creating the Training and Evaluation Datasets

In [12]:
eval_lines = lines[-1000:] # Create a holdout validation set
lines = lines[:-1000] # Leave the rest for training
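
Slicing off the last 1,000 lines means the validation set comes from whichever files `os.listdir()` returned last, so it may cover only a single text. If a random held-out set is preferred, one option is to shuffle a copy with a fixed seed before splitting. `train_eval_split` is a hypothetical helper:

```python
import random


def train_eval_split(lines, n_eval=1000, seed=0):
    """Shuffle a copy of lines with a fixed seed, then hold out n_eval of them.

    Returns (train_lines, eval_lines); the input list is left unchanged.
    """
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_eval:], shuffled[:n_eval]
```

Used as `lines, eval_lines = train_eval_split(lines)`, it would replace the two slicing lines above.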

# Training the Models

Here, we create a function to train the models. This function does the following:

* Creates training and evaluation generators with `data_generator()` and wraps them with `itertools.cycle`
* Trains the model on cross-entropy loss with the Adam optimizer (learning rate 0.0005)
* Evaluates the model on 3 held-out batches, reporting cross-entropy loss and accuracy

In [13]:
from trax.supervised import training
import itertools

def train_model(model, data_generator, batch_size=batch_size, max_length=max_length,
                lines=lines, eval_lines=eval_lines, n_steps=10, output_dir='model/'):

    # Batch generators over the training lines and the held-out lines
    bare_train_generator = data_generator(batch_size, max_length, data_lines=lines)
    infinite_train_generator = itertools.cycle(bare_train_generator)

    bare_eval_generator = data_generator(batch_size, max_length, data_lines=eval_lines)
    infinite_eval_generator = itertools.cycle(bare_eval_generator)

    # Training: cross-entropy loss minimized with Adam (learning rate 0.0005)
    train_task = training.TrainTask(
        labeled_data=infinite_train_generator,
        loss_layer=tl.CrossEntropyLoss(),
        optimizer=trax.optimizers.Adam(0.0005)
    )

    # Evaluation: cross-entropy loss and accuracy on 3 held-out batches
    eval_task = training.EvalTask(
        labeled_data=infinite_eval_generator,
        metrics=[tl.CrossEntropyLoss(), tl.Accuracy()],
        n_eval_batches=3
    )

    training_loop = training.Loop(model,
                                  train_task,
                                  eval_tasks=[eval_task],
                                  output_dir=output_dir)

    training_loop.run(n_steps=n_steps)
    
    return training_loop
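
`data_generator()` already loops forever, so `itertools.cycle` never reaches the point where it would replay earlier items. It yields the same batches as the bare generator. According to the Python documentation, `cycle` also saves a copy of every element it returns. With a generator that never ends, it keeps accumulating batches in memory for the whole run. A minimal check of the first point, using a stand-in infinite generator (`count_up` is hypothetical):

```python
import itertools


def count_up():
    """An already-infinite generator, standing in for data_generator."""
    n = 0
    while True:
        yield n
        n += 1


# Take the first five items with and without itertools.cycle
direct = list(itertools.islice(count_up(), 5))
cycled = list(itertools.islice(itertools.cycle(count_up()), 5))
print(direct == cycled)  # True
```

Passing `bare_train_generator` and `bare_eval_generator` directly to `TrainTask` and `EvalTask` would therefore likely behave the same, without the extra memory.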


In [14]:
GRU_training_loop = train_model(GRUmodel, data_generator, n_steps=10, output_dir='model/GRU')




Step      1: Ran 1 train steps in 20.15 secs
Step      1: train CrossEntropyLoss |  5.54517841
Step      1: eval  CrossEntropyLoss |  5.54224094
Step      1: eval          Accuracy |  0.20141485
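
`data_generator()` yields the mask as the third element of each batch. Presumably, Trax uses it as per-position weights so that padding does not count toward the reported loss or accuracy. The exact internals of `tl.CrossEntropyLoss` and `tl.Accuracy` are not shown here. The following plain-Python sketch of a mask-weighted mean over one sequence is written under that assumption; `masked_metrics` is a hypothetical helper:

```python
import math


def masked_metrics(log_probs, targets, mask):
    """Mask-weighted mean cross-entropy and accuracy for one sequence.

    log_probs: per-position lists of log-probabilities over the classes
    targets:   the correct class id at each position
    mask:      1 for real positions, 0 for padding (ignored by both metrics)
    """
    n_real = sum(mask)
    loss = -sum(m * lp[t] for lp, t, m in zip(log_probs, targets, mask)) / n_real
    # A position is correct when its highest-scoring class is the target
    n_correct = sum(
        m * (max(range(len(lp)), key=lambda k: lp[k]) == t)
        for lp, t, m in zip(log_probs, targets, mask)
    )
    return loss, n_correct / n_real


log_probs = [
    [math.log(0.7), math.log(0.2), math.log(0.1)],
    [math.log(0.1), math.log(0.8), math.log(0.1)],
    [math.log(0.9), math.log(0.05), math.log(0.05)],  # padding position
]
loss, acc = masked_metrics(log_probs, targets=[0, 2, 0], mask=[1, 1, 0])
print(acc)  # 0.5
```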


In [15]:
LSTM_training_loop = train_model(LSTMmodel, data_generator, n_steps=10, output_dir='model/LSTM')


Step      1: Ran 1 train steps in 22.91 secs
Step      1: train CrossEntropyLoss |  5.76504803
Step      1: eval  CrossEntropyLoss |  4.79372247
Step      1: eval          Accuracy |  0.18692371


In [16]:
SRU_training_loop = train_model(SRUmodel, data_generator, n_steps=10, output_dir='model/SRU')


Step      1: Ran 1 train steps in 11.45 secs
Step      1: train CrossEntropyLoss |  5.54126787
Step      1: eval  CrossEntropyLoss |  5.51660713
Step      1: eval          Accuracy |  0.08041244
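
One way to read these step-1 numbers is to compare them against a baseline. A model that assigns probability 1/256 to every character id has a cross-entropy of ln 256 ≈ 5.545 nats. The GRU's first training loss of 5.54517841, and the SRU's 5.54126787, are essentially at that baseline. In other words, the untrained models start out close to guessing uniformly. The sketch assumes Trax reports cross-entropy in nats (natural log), which is consistent with these values:

```python
import math

vocab_size = 256

# Cross-entropy (nats) of a model predicting 1/vocab_size for every character
uniform_loss = math.log(vocab_size)
print(round(uniform_loss, 4))  # 5.5452

# Perplexity is exp(cross-entropy); for the uniform model it equals vocab_size
print(round(math.exp(uniform_loss)))  # 256
```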
