# 16. Natural Language Processing with RNNs and Attention

From a certain perspective, the Turing test is an NLP task. This chapter focuses on tackling NLP tasks (albeit less complex than the Turing test) using RNNs.

### Generating Shakespearean Text Using a Character RNN

Let's look at how to build a Char-RNN, a network that predicts the next character in a sentence.

#### Creating the Training Dataset

The text comes from Andrej Karpathy's GitHub repo. Here we read a local copy saved under `datasets/shakespeare/input.txt`:

In [1]:
from tensorflow import keras
import os

filepath = os.path.join(os.getcwd(), 'datasets', 'shakespeare', 'input.txt')
with open(filepath) as f:
    shakespeare_text = f.read()

Next, we must encode every character as an integer. We will use `Tokenizer` for this. 

In [2]:
tokenizer = keras.preprocessing.text.Tokenizer(char_level=True)
tokenizer.fit_on_texts([shakespeare_text])

Let's quickly check what it does. Note that the tokenizer lowercases text by default and assigns character IDs starting at 1:

In [3]:
tokenizer.texts_to_sequences(["Hello"])

[[7, 2, 12, 12, 4]]

In [4]:
tokenizer.sequences_to_texts([[7, 2, 12, 12, 4]])

['h e l l o']
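
To make the round trip above less of a black box, here is a minimal pure-Python sketch of character-level encoding. It is a simplified stand-in, not the actual `Tokenizer` implementation: the helper names (`fit_char_index`, `encode`, `decode`) and the toy corpus are hypothetical, and the only behaviors it imitates are lowercasing, 1-based IDs ranked by character frequency, and space-joined decoding.

```python
from collections import Counter

def fit_char_index(text):
    """Map each lowercased character to a 1-based ID, most frequent first."""
    counts = Counter(text.lower())
    # most_common() sorts by descending count; ties keep first-seen order
    return {ch: i for i, (ch, _) in enumerate(counts.most_common(), start=1)}

def encode(text, char_index):
    """Turn a string into a list of character IDs, skipping unknown characters."""
    return [char_index[ch] for ch in text.lower() if ch in char_index]

def decode(ids, char_index):
    """Turn character IDs back into a space-separated string."""
    index_char = {i: ch for ch, i in char_index.items()}
    return " ".join(index_char[i] for i in ids)

sample_text = "hello world, hello all"  # hypothetical toy corpus
char_index = fit_char_index(sample_text)

ids = encode("Hello", char_index)
decoded = decode(ids, char_index)
print(ids, decoded)  # [4, 5, 1, 1, 2] h e l l o
```

The IDs differ from the ones in the Shakespeare example because they depend on the character frequencies of the corpus the mapping was fitted on.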

In [5]:
max_id = len(tokenizer.word_index) # number of distinct characters

In [6]:
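# document_count would be 1 here, since fit_on_texts() received a single text,
# so we sum the per-character counts instead
dataset_size = sum(tokenizer.word_counts.values()) # total number of characters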
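
As a sanity check on what these two quantities mean, the sketch below computes them for a toy string with `collections.Counter`. The string and the variable names (`toy_text`, `num_distinct`, `num_chars`) are hypothetical and only illustrate the idea: one value counts distinct characters, the other counts every character in the text.

```python
from collections import Counter

toy_text = "To be, or not to be"  # hypothetical stand-in for the corpus
char_counts = Counter(toy_text.lower())

num_distinct = len(char_counts)        # plays the role of max_id
num_chars = sum(char_counts.values())  # plays the role of dataset_size

print(num_distinct, num_chars)  # 8 19
```

The total always equals the length of the text, which is a quick way to verify the value computed for the full corpus.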

In [8]:
import numpy as np

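# texts_to_sequences() returns one sequence per text; unpack the single sequence
# and subtract 1 so that IDs start from 0 instead of 1
[encoded] = np.array(tokenizer.texts_to_sequences([shakespeare_text])) - 1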
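
The unpacking and the shift by one can be seen in isolation on a tiny example. This is a sketch under assumptions: the `sequences` value below is a hypothetical stand-in for what `texts_to_sequences` returns for a single text, namely a list holding one list of 1-based IDs.

```python
import numpy as np

# Hypothetical output of texts_to_sequences() for a single text
sequences = [[4, 5, 1, 1, 2]]

# Subtracting 1 applies to every element of the 2-D array; unpacking the
# single row leaves a 1-D array of 0-based IDs
[encoded] = np.array(sequences) - 1

# To decode later, shift back to 1-based IDs first
restored = (encoded + 1).tolist()

print(encoded.tolist(), restored)  # [3, 4, 0, 0, 1] [4, 5, 1, 1, 2]
```

Because the IDs are now 0-based, any later call that maps IDs back to characters needs the `+ 1` shift shown in the last step.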