### Neural Networks

By the early 2000s, when innovations in computer hardware allowed for more complex modeling techniques, researchers had already developed neural network systems that combined many layers of neurons, including **convolutional neural networks (CNNs), multi-layer perceptrons (MLPs), and recurrent neural networks (RNNs)**. All three of these architectures are called deep neural networks because they stack many layers of neurons into a “deep” network. Each of these deep-learning architectures has its own relative strengths:
- **MLP** networks are composed of layered perceptrons. They tend to be good at solving simple tasks, like applying a filter to each pixel in a photo.
- **CNN** networks are designed to process image data, applying the same convolution function across an entire input image. This makes it simpler and more efficient to process images, which are very high-dimensional and would otherwise require a great deal of processing.
- **RNNs** became widely adopted within natural language processing because they integrate a loop into the connections between neurons, allowing information to persist across a chain of neurons.

#### Long Short-term Memory Networks

Every model in the RNN family, including LSTMs, is a chain of repeating neurons at its base. Within standard RNNs, each layer of neurons will only perform a single operation on the input data.
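The “single operation” in a standard RNN can be sketched as one activation applied to a mix of the current input and the previous output. The snippet below is a toy, single-unit illustration in plain Python: the function name `rnn_step`, the choice of tanh, and the weight values are assumptions made for this example, not part of any library.

```python
import math

def rnn_step(x, h_prev, w_x=0.8, w_h=0.5, b=0.0):
    """One step of a toy single-unit RNN: a single tanh operation.

    w_x, w_h and b are hypothetical hand-picked weights; a real
    network would learn them and work on vectors, not single numbers.
    """
    return math.tanh(w_x * x + w_h * h_prev + b)

# Feed a short sequence through the same unit, one step at a time.
# The output of each step loops back in as h_prev for the next step.
h = 0.0  # assumed initial state
history = []
for x in [1.0, 0.0, 0.0]:
    h = rnn_step(x, h)
    history.append(h)

print(history)
```

With these made-up weights, the first input still influences the later outputs, but its effect shrinks at every step. That fading is the limitation LSTMs are designed to address.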

**The most important aspect of an LSTM is the way in which the transformed input data is combined by adding results to state, or cell memory, represented as vectors. There are two states that are produced for the first step in the sequence and then carried over as subsequent inputs are processed: cell state, and hidden state.**

The cell state carries information through the network as we process a sequence of inputs. At each timestep, or step in the sequence, the updated input is appended to the cell state by a gate, which controls how much of the input should be included in the final product of the cell state. This final product, which is fed as input to the next neural network layer at the next timestep, is called a hidden state. The final output of a neural network is often the result contained in the final hidden state, or an average of the results across all hidden states in the network.

The persistence of the majority of a cell state across data transformations, combined with incremental additions controlled by the gates, allows for important information from the initial input data to be maintained in the neural network. Ultimately, this allows for information from far earlier in the input data to be used in decisions at any point in the model.
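To make the roles of the cell state, hidden state, and gates more concrete, here is a minimal sketch of a single-unit LSTM in plain Python. It follows the commonly used LSTM formulation (forget, input, and output gates), which is an assumption on our part since the text above does not name the individual gates. The function name `lstm_step`, the zero initial states, and all weight values are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, weights):
    """One timestep of a toy single-unit LSTM with scalar input and state."""
    def pre_activation(name):
        w_x, w_h, b = weights[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre_activation("forget"))       # share of the old cell state to keep
    i = sigmoid(pre_activation("input"))        # share of the new candidate to add
    o = sigmoid(pre_activation("output"))       # share of the cell state to expose
    g = math.tanh(pre_activation("candidate"))  # candidate value from the current input

    c = f * c_prev + i * g   # cell state: mostly carried over, plus a gated addition
    h = o * math.tanh(c)     # hidden state: gated view of the cell state
    return h, c

# Hypothetical (w_x, w_h, bias) triples; a trained model learns these
weights = {
    "forget": (0.5, 0.1, 1.0),
    "input": (0.6, 0.2, 0.0),
    "output": (0.4, 0.3, 0.0),
    "candidate": (0.9, 0.5, 0.0),
}

h, c = 0.0, 0.0  # assumed initial hidden and cell states
hidden_states = []
for x in [1.0, 0.5, -0.3]:
    h, c = lstm_step(x, h, c, weights)
    hidden_states.append(h)

# Two common ways to read out a result, as described above
final_output = hidden_states[-1]
mean_output = sum(hidden_states) / len(hidden_states)
print(final_output, mean_output)
```

The key line is the cell state update: the old state is scaled rather than replaced, and new information is added on top of it. This is what lets information from early in the sequence survive many timesteps.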


### Introduction to seq2seq
LSTMs are pretty extraordinary, but they’re only the tip of the iceberg when it comes to actually setting up and running a neural language model for text generation. In fact, an LSTM is usually just a single component in a larger network.

One of the most common neural models used for text generation is the sequence-to-sequence model, commonly referred to as seq2seq (pronounced “seek-to-seek”). A type of encoder-decoder model, seq2seq uses recurrent neural networks (RNNs) such as LSTMs to generate output token by token or character by character.

So, where does seq2seq show up?
- Machine translation software like Google Translate
- Text summary generation
- Chatbots
- Named Entity Recognition (NER)
- Speech recognition

seq2seq networks have two parts:
- An encoder that accepts language (or audio or video) input. The output matrix of the encoder is discarded, but its state is preserved as a vector.
- A decoder that takes the encoder’s final state (or memory) as its initial state. We use a technique called “teacher forcing” to train the decoder to predict the following text (characters or words) in a target sequence given the previous text.
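Teacher forcing is easiest to see in terms of the data the decoder is trained on. The sketch below assumes a target sentence that has already been split into tokens and wrapped in start and end markers (the markers are introduced later in this lesson). The variable names are illustrative only. At each step the decoder is given the true token at that position and asked to predict the true token that follows it.

```python
# A tokenized target sentence with start and end markers
target_sequence = ["<START>", "Estoy", "feliz", ".", "<END>"]

# Decoder input: every token except the last
decoder_inputs = target_sequence[:-1]
# Decoder target: the same sequence shifted one step ahead
decoder_targets = target_sequence[1:]

for step, (given, expected) in enumerate(zip(decoder_inputs, decoder_targets)):
    print(f"step {step}: given {given!r}, predict {expected!r}")
```

Because the decoder always sees the correct previous token during training, rather than its own possibly wrong prediction, one early mistake does not derail the rest of the sequence.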

### Preprocessing for seq2seq
If you’re feeling a bit nervous about building this all on your own, never fear. You don’t need to start from scratch — there are a few neural network libraries at your disposal. In our case, we’ll be using TensorFlow with the Keras API to build a pretty limited English-to-Spanish translator (we’ll explain this later and you’ll get an opportunity to improve it).

We can import Keras from TensorFlow like this:

`from tensorflow import keras`

First things first: preprocessing the text data. How much noise removal you need depends on your use case: do you care about casing or punctuation? For many tasks, they are probably not important enough to justify the additional processing. If they do matter for your task, this is the time to make those changes.

We’ll need the following for our Keras implementation:
- vocabulary sets for both our input (English) and target (Spanish) data
- the total number of unique word tokens we have for each set
- the maximum sentence length we’re using for each language

We also need to mark the start and end of each document (sentence) in the target samples so that the model recognizes where to begin and end its text generation (no book-long sentences for us!). One way to do this is adding `"<START>"` at the beginning and `"<END>"` at the end of each target document (in our case, this will be our Spanish sentences). For example, `"Estoy feliz."` becomes `"<START> Estoy feliz. <END>"`.

To split words from punctuation, we’ll use the regular expression `[\w']+|[^\s\w]`, which breaks down as follows:
- `[\w']+` matches one or more (`+`) characters that are alphanumeric or underscore (`\w`) or an apostrophe (`'`). This matches words that may contain apostrophes, such as `don't` or `it's`.
- `|` is the alternation operator, which means “or”. It separates the two possible options for matching.
- `[^\s\w]` matches a single character that is not (`^`) whitespace (`\s`), alphanumeric, or underscore (`\w`). This matches punctuation marks or symbols, such as `.` or `#`.
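Here is a small sketch of that pattern in action, using only Python’s built-in `re` module. The helper names `tokenize` and `mark_target` are made up for this example.

```python
import re

TOKEN_PATTERN = r"[\w']+|[^\s\w]"

def tokenize(sentence):
    """Split a sentence into word tokens and punctuation tokens."""
    return re.findall(TOKEN_PATTERN, sentence)

def mark_target(sentence):
    """Space-separate the tokens and wrap them in start/end markers."""
    return "<START> " + " ".join(tokenize(sentence)) + " <END>"

print(tokenize("I don't know."))     # ['I', "don't", 'know', '.']
print(tokenize("¿Estás feliz?"))     # ['¿', 'Estás', 'feliz', '?']
print(mark_target("Estoy feliz."))   # <START> Estoy feliz . <END>
```

Note that once punctuation is split off, the period becomes its own token in the marked sentence. The pattern also only recognizes the straight apostrophe (`'`), so a curly apostrophe would be treated as punctuation and split a word like “don’t” into three tokens.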

```python
from tensorflow import keras
import re

# Regular expression that splits words from punctuation
token_pattern = re.compile(r"[\w']+|[^\s\w]")

# Importing our translations
data_path = "span-eng.txt"
# Defining lines as a list of each line
with open(data_path, 'r', encoding='utf-8') as f:
  lines = f.read().split('\n')

# Building empty lists to hold sentences
input_docs = []
target_docs = []
# Building empty vocabulary sets
input_tokens = set()
target_tokens = set()

for line in lines:
  # Skip blank lines (e.g. the one left by a trailing newline),
  # which would otherwise break the tab split below
  if not line.strip():
    continue
  # Input and target sentences are separated by tabs
  input_doc, target_doc = line.split('\t')
  # Appending each input sentence to input_docs
  input_docs.append(input_doc)
  # Splitting words from punctuation
  target_doc = " ".join(token_pattern.findall(target_doc))
  # Marking the start and end of each target sentence
  # and appending it to target_docs
  target_doc = '<START> ' + target_doc + ' <END>'
  target_docs.append(target_doc)
  # Now we split up each sentence into words
  # and add each unique word to our vocabulary set
  for token in token_pattern.findall(input_doc):
    input_tokens.add(token)
  for token in target_doc.split():
    target_tokens.add(token)

input_tokens = sorted(input_tokens)
target_tokens = sorted(target_tokens)

# Create num_encoder_tokens and num_decoder_tokens:
num_encoder_tokens = len(input_tokens)
num_decoder_tokens = len(target_tokens)

# Longest sentence (in tokens) for each language; default=0 covers an empty file
max_encoder_seq_length = max((len(token_pattern.findall(input_doc)) for input_doc in input_docs), default=0)
# Target sentences are already space-separated, so we split on whitespace
# to keep <START> and <END> as single tokens
max_decoder_seq_length = max((len(target_doc.split()) for target_doc in target_docs), default=0)
```
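To sanity-check the preprocessing without the data file (or TensorFlow), we can run the same steps on a few in-memory lines. The three sentence pairs below are a hypothetical stand-in for the contents of `span-eng.txt`, and the decoder length here is counted by splitting on whitespace so that `<START>` and `<END>` each count as one token.

```python
import re

TOKEN_PATTERN = r"[\w']+|[^\s\w]"

# Hypothetical sample lines: English, a tab, then Spanish
sample_lines = [
    "I'm happy.\tEstoy feliz.",
    "Go.\tVe.",
    "Where are you?\t¿Dónde estás?",
]

input_docs, target_docs = [], []
input_tokens, target_tokens = set(), set()

for line in sample_lines:
    input_doc, target_doc = line.split("\t")
    input_docs.append(input_doc)
    # Split words from punctuation, then add the start/end markers
    target_doc = " ".join(re.findall(TOKEN_PATTERN, target_doc))
    target_doc = "<START> " + target_doc + " <END>"
    target_docs.append(target_doc)
    input_tokens.update(re.findall(TOKEN_PATTERN, input_doc))
    target_tokens.update(target_doc.split())

num_encoder_tokens = len(input_tokens)
num_decoder_tokens = len(target_tokens)
max_encoder_seq_length = max(len(re.findall(TOKEN_PATTERN, doc)) for doc in input_docs)
max_decoder_seq_length = max(len(doc.split()) for doc in target_docs)

print(num_encoder_tokens, num_decoder_tokens)           # 8 10
print(max_encoder_seq_length, max_decoder_seq_length)   # 4 6
```

The longest target sentence, `<START> ¿ Dónde estás ? <END>`, has six tokens because the markers and each punctuation mark count as tokens of their own.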