## Text Generation with Recurrent Neural Network (RNN)

Acknowledgement: This notebook uses the textgenrnn project by Max Woolf: https://github.com/minimaxir/textgenrnn . Great work, Max!

#### Introduction

Text generation is a challenging problem that even the largest data science teams still struggle with, so we'll explore one of the most common and accessible methods, starting at a fairly basic level. The approach we will attempt in this notebook is:

* RNN/LSTM

Specifically, we will be using the library textgenrnn. textgenrnn is a Python module on top of Keras/TensorFlow which can easily generate text using a pretrained recurrent neural network.  

Please install from the command line.
> pip install textgenrnn



In [None]:
#!pip install textgenrnn

In [1]:
from textgenrnn import textgenrnn

Using TensorFlow backend.


#### Train a new model
You can train a new model using any modern RNN architecture you want by:
* calling train_new_model if supplying texts, or adding a new_model=True parameter if training from a file. If you do, the model will save a config file and a vocab file in addition to the weights, and all three files must then be loaded into a textgenrnn instance to reuse the model.

The config parameters available are:
* word_level: Whether to train the model at the word level or character level (default: False)
* rnn_layers: Number of recurrent LSTM layers in the model (default: 2)
* rnn_size: Number of cells in each LSTM layer (default: 128)
* rnn_bidirectional: Whether to use Bidirectional LSTMs, which account for sequences both forwards and backwards. Recommended if the input text follows a specific schema. (default: False)
* max_length: Maximum number of previous characters/words to use before predicting the next token. This value should be reduced for word-level models (default: 40)
* max_words: Maximum number of words (by frequency) to consider for training (default: 10000)
* dim_embeddings: Dimensionality of the character/word embeddings (default: 100)
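
To make word_level and max_length more concrete, the sketch below builds (context, next token) training pairs from a single document. It is a plain-Python illustration, assuming each target is predicted from at most max_length preceding tokens and that words are split on whitespace. The helper names tokenize and make_training_pairs are hypothetical, and this is not textgenrnn's internal code.

```python
def tokenize(text, word_level=False):
    """Split a document into words (whitespace split) or single characters."""
    return text.split() if word_level else list(text)


def make_training_pairs(text, max_length=40, word_level=False):
    """Return (context, target) pairs; each context holds at most max_length tokens."""
    tokens = tokenize(text, word_level)
    pairs = []
    for i in range(1, len(tokens)):
        # Keep only the max_length tokens that immediately precede the target
        context = tokens[max(0, i - max_length):i]
        pairs.append((context, tokens[i]))
    return pairs


doc = "we had the project"
char_pairs = make_training_pairs(doc, max_length=5, word_level=False)
word_pairs = make_training_pairs(doc, max_length=2, word_level=True)

print(len(char_pairs), "character-level pairs")
print(len(word_pairs), "word-level pairs")
print(word_pairs[-1])
```

The same document yields far fewer tokens at the word level, which is one way to see why a smaller max_length is suggested for word-level models.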





In [2]:
## The input file format is simply one document per line.
## When preparing the file, include opening and closing quotes so that the text is preprocessed accurately.
## In the output, the temperature value (0 to 1) controls the level of creativity.

## At the end of training, the model weights are saved to the file textgenrnn_weights.hdf5

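textgen = textgenrnn()
## Note: new_model=True is not passed here; the log below reports character sequences even though word_level=True is set,
## which suggests the configuration parameters only take effect when a new model is trained (see "Train a new model").
textgen.train_from_file('./data/reflections.txt', max_length=40, word_level=True, rnn_size=64, num_epochs=2, dim_embeddings=100, rnn_bidirectional=False)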



345 texts collected.
Training on 49,613 character sequences.
Train for 387 steps
Epoch 1/2
Temperature: 0.2
####################
so that we are some of the team worksheet which we are also as it would be a project of the project and some of the project that we had a project and we can see the project and we had the worksheet where we see the worksheet when the worksheet is a project and we had the worksheet and we see the wo

what we had to contribute the project and the project that we had the project and a project and we can help the project and we had the project and the project and the schedules and the project when the school was submitting the project that we had the project that we had the project and some of the

what we are doing the worksheet and a whatsapp and we can contribute the project and we can help and we see the worksheet and we had ensure the project where we discuss on the project and stakeholder and some of the project and the team workshe
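
The Temperature value in the log controls how closely sampling follows the model's predicted distribution over next tokens. A common way to implement this, assumed here rather than taken from textgenrnn's source, is to divide the log-probabilities by the temperature and renormalize. The function name apply_temperature and the probabilities below are made up for illustration.

```python
import math


def apply_temperature(probs, temperature):
    """Rescale a probability distribution; a lower temperature sharpens it."""
    scaled = [math.log(p) / temperature for p in probs]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up probabilities for three candidate next tokens
next_token_probs = [0.5, 0.3, 0.2]

cold = apply_temperature(next_token_probs, 0.2)
neutral = apply_temperature(next_token_probs, 1.0)

print("temperature 0.2:", [round(p, 3) for p in cold])
print("temperature 1.0:", [round(p, 3) for p in neutral])
```

At a low temperature almost all of the probability mass moves to the most likely token, which is consistent with the repetitive phrasing in the 0.2 samples above.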

* Now the fun part: generating some random text

In [3]:
# generate 2 text documents
textgen.generate(2)

  0%|                                                                                            | 0/2 [00:00<?, ?it/s]

support charter of the team worksheet and we shared doc for the time and planning through the terms to do the worksheet and according to the



 50%|██████████████████████████████████████████                                          | 1/2 [00:10<00:10, 10.04s/it]

according to ensure all of the questions and staff of the team worksheet that we also the presentation of the team worksheet and do the end application required and also the question to come to complete the team and I can be able to share the project and will request at the project and then increas



100%|████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:30<00:00, 15.35s/it]


* Generate 1 text document starting with a seed sentence such as "My team mates decide to use Google for"

In [4]:
textgen.generate(1, prefix="My team mates decide to use Google for")

My team mates decide to use Google for the team of the worksheet and adding each other to the team worksheet and project. We are doing the worksheet so it will be additing in the project and team does not be able to start answering project each



## Exercise A

As with any training of ML/DL models, a lot of work goes into tuning the hyperparameters and applying intuition about what might yield acceptable results.
For the task of generating text, the following choices should affect the performance of the model:

- Are you training on word sequences or character sequences?
- What is the dimensionality of the embeddings? (We admit that we don't know whether textgenrnn uses word2vec or another variant; this is not documented.)
- Are you training in the forward direction only, or using a bidirectional network?


#### Your task: 
Change the input parameters of the method train_from_file() to try out different hyperparameter values:
- word_level=True, word_level=False
- dim_embeddings=100, dim_embeddings=50, dim_embeddings=150
- rnn_bidirectional=False, rnn_bidirectional=True

Then retrain the model and generate some text!
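
One way to keep track of the experiments is to enumerate every combination of the values listed above. The sketch below only builds and prints the parameter sets; the training call is left as a comment because it assumes the textgen object and data file from the earlier cells.

```python
from itertools import product

# Values taken from the exercise list above
grid = {
    "word_level": [True, False],
    "dim_embeddings": [100, 50, 150],
    "rnn_bidirectional": [False, True],
}

# One dict of keyword arguments per combination
combinations = [dict(zip(grid, values)) for values in product(*grid.values())]

for params in combinations:
    print(params)
    # Each dict could be passed on as keyword arguments, for example:
    # textgen.train_from_file('./data/reflections.txt', new_model=True, num_epochs=2, **params)

print(len(combinations), "combinations in total")
```

Training all combinations may take a long time, so it can be reasonable to vary one parameter at a time instead.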

In [None]:
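# your answers
# new_model=True is added so that the configuration parameters apply to a newly built model (see "Train a new model")
textgen.train_from_file('./data/reflections.txt', new_model=True, max_length=40, word_level=True, rnn_size=64, num_epochs=2, dim_embeddings=100, rnn_bidirectional=True)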


### Optional Exercise

Use a different source text, such as songs, quotes, or product reviews, and see what AI can generate for you!
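
Since the input file is expected to hold one document per line, a new source text may need to be flattened first. The helper below is a minimal sketch with a hypothetical name (write_one_document_per_line) and made-up sample quotes. It collapses line breaks inside each document and skips empty entries; it does not add the opening and closing quotes mentioned earlier.

```python
import os
import tempfile


def write_one_document_per_line(documents, path):
    """Write each document on its own line, collapsing internal whitespace."""
    with open(path, "w", encoding="utf-8") as handle:
        for doc in documents:
            single_line = " ".join(doc.split())
            if single_line:  # skip documents that are empty after cleaning
                handle.write(single_line + "\n")


# Made-up sample documents; the first one spans two lines
quotes = [
    "The first sample quote,\nsplit over two lines.",
    "   ",
    "A second sample quote.",
]

out_path = os.path.join(tempfile.mkdtemp(), "quotes.txt")
write_one_document_per_line(quotes, out_path)

with open(out_path, encoding="utf-8") as handle:
    lines = handle.read().splitlines()
print(lines)
```

The resulting file path could then be passed to train_from_file() in place of './data/reflections.txt'.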