# Deep Learning

![nlp](https://wrm5sysfkg-flywheel.netdna-ssl.com/wp-content/uploads/2019/01/NLP-Technology-in-Healthcare.jpg)

# Introduction

Section goals:
- Understand the basics of deep learning
- Understand the basics of RNNs and LSTMs
- Use an LSTM to generate text from a source corpus
- Create QA chatbots with Python

# 8.1.0 The basic Perceptron Model

Artificial neural networks (ANNs) have a basis in biology. A perceptron is the common term for an artificial neuron that mimics a biological neuron.

The biological neuron is made up of component parts: 
- Dendrites
- Body
- Axon

The artificial neuron is also multipart: 
- Inputs
- Body
- Output
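
To make the inputs / body / output picture concrete, here is a minimal sketch of a single artificial neuron in plain Python. The function name `perceptron`, the step activation, and the weights and bias are illustrative assumptions, chosen so that the neuron behaves like a logical AND.

```python
def perceptron(inputs, weights, bias):
    """Body of the neuron: weighted sum of the inputs plus a bias, then a step activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if z > 0 else 0

# Hypothetical weights and bias that make the neuron fire only when both inputs are 1.
and_weights = [1.0, 1.0]
and_bias = -1.5

# Try every combination of two binary inputs: (0,0), (0,1), (1,0), (1,1).
outputs = [perceptron([a, b], and_weights, and_bias) for a in (0, 1) for b in (0, 1)]
print(outputs)
```

A trained network learns the weights and bias from data instead of having them set by hand.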

# 8.2.0 Keras

In [29]:
import numpy as np

In [30]:
from sklearn.datasets import load_iris

In [31]:
iris = load_iris()

In [32]:
x = iris.data

In [33]:
y = iris.target

In [34]:
from keras.utils import to_categorical

In [35]:
y = to_categorical(y)

In [36]:
from sklearn.model_selection import train_test_split

In [37]:
x_train, x_test, y_train, y_test = train_test_split(x,y, test_size=0.33, random_state=42)

In [38]:
from sklearn.preprocessing import MinMaxScaler

In [39]:
scaler = MinMaxScaler()

In [40]:
scaler.fit(x_train)

MinMaxScaler()

In [41]:
scaled_x_train = scaler.transform(x_train)

In [42]:
scaled_x_test = scaler.transform(x_test)
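
As a rough illustration of what the scaler does, the sketch below redoes min-max scaling by hand with NumPy. The arrays `train` and `test` and the helper `min_max_transform` are made up for this example. The minimum and maximum come from the training data only, so scaled test values can fall outside the 0 to 1 range.

```python
import numpy as np

# Toy data standing in for x_train and x_test (values are made up).
train = np.array([[1.0, 10.0], [3.0, 20.0], [5.0, 40.0]])
test = np.array([[2.0, 25.0], [6.0, 10.0]])

# "fit": learn the per-column minimum and maximum from the training data only.
col_min = train.min(axis=0)
col_max = train.max(axis=0)

def min_max_transform(data):
    """'transform': rescale each column using the training minimum and maximum."""
    return (data - col_min) / (col_max - col_min)

scaled_train = min_max_transform(train)
scaled_test = min_max_transform(test)
print(scaled_train)
print(scaled_test)
```

Fitting on the training split alone keeps information about the test set from leaking into the preprocessing step.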

In [43]:
from keras.models import Sequential
from keras.layers import Dense

In [44]:
model = Sequential()
model.add(Dense(8, input_dim=4, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [45]:
model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_4 (Dense)              (None, 8)                 40        
_________________________________________________________________
dense_5 (Dense)              (None, 8)                 72        
_________________________________________________________________
dense_6 (Dense)              (None, 3)                 27        
Total params: 139
Trainable params: 139
Non-trainable params: 0
_________________________________________________________________
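
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input per unit, plus one bias per unit. A small sketch of that arithmetic (the helper `dense_params` is a name made up for this example):

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# (inputs, units) for the three Dense layers: 4 -> 8 -> 8 -> 3
iris_layers = [(4, 8), (8, 8), (8, 3)]
iris_counts = [dense_params(n_in, n_out) for n_in, n_out in iris_layers]
iris_total = sum(iris_counts)

print(iris_counts, iris_total)
```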


In [61]:
model.fit(scaled_x_train, y_train, epochs=200, verbose=0)

<keras.callbacks.callbacks.History at 0x7f8428fdaa50>

In [47]:
model.predict(scaled_x_test)

array([[6.46506203e-03, 6.57006800e-01, 3.36528093e-01],
       [9.71322954e-01, 2.84389425e-02, 2.38171298e-04],
       [2.81957318e-06, 5.48888147e-02, 9.45108354e-01],
       [4.88138339e-03, 6.34165525e-01, 3.60953093e-01],
       [1.08065910e-03, 4.24147785e-01, 5.74771523e-01],
       [9.55046833e-01, 4.44088355e-02, 5.44341747e-04],
       [4.82193977e-02, 8.23449790e-01, 1.28330842e-01],
       [5.31006153e-05, 1.68322966e-01, 8.31623852e-01],
       [9.44099447e-04, 3.18148047e-01, 6.80907905e-01],
       [2.09649596e-02, 7.73502707e-01, 2.05532342e-01],
       [3.93887254e-04, 3.52640182e-01, 6.46965921e-01],
       [9.63348567e-01, 3.60602252e-02, 5.91216958e-04],
       [9.74746764e-01, 2.50020996e-02, 2.51196208e-04],
       [9.66157794e-01, 3.33341695e-02, 5.07981807e-04],
       [9.87617075e-01, 1.22843934e-02, 9.85338265e-05],
       [3.89748579e-03, 6.60716176e-01, 3.35386276e-01],
       [3.94649978e-05, 1.44074813e-01, 8.55885684e-01],
       [2.58401204e-02, 7.68771

In [48]:
model.predict_classes(scaled_x_test)

array([1, 0, 2, 1, 2, 0, 1, 2, 2, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
       0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 2, 1, 0, 0, 0, 2, 1, 1, 0,
       0, 1, 2, 2, 1, 2])
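
Note that `predict_classes` is not available on `Sequential` models in recent TensorFlow/Keras releases. Taking the `argmax` over the predicted probabilities gives the same class indices. A minimal sketch, using the first three probability rows printed above in place of a live model:

```python
import numpy as np

# First three rows of the model.predict output shown earlier.
probabilities = np.array([
    [6.46506203e-03, 6.57006800e-01, 3.36528093e-01],
    [9.71322954e-01, 2.84389425e-02, 2.38171298e-04],
    [2.81957318e-06, 5.48888147e-02, 9.45108354e-01],
])

# The predicted class is the index of the largest probability in each row.
predicted_classes = probabilities.argmax(axis=1)
print(predicted_classes)
```

With a trained model, the equivalent call would be `np.argmax(model.predict(scaled_x_test), axis=-1)`.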

In [49]:
preds = model.predict_classes(scaled_x_test)

In [50]:
y_test.argmax(axis=1)

array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
       0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 2, 1, 0, 0, 0, 2, 1, 1, 0,
       0, 1, 2, 2, 1, 2])

In [51]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [52]:
confusion_matrix(y_test.argmax(axis=1), preds)

array([[19,  0,  0],
       [ 0, 13,  2],
       [ 0,  0, 16]])

In [54]:
print(classification_report(y_test.argmax(axis=1), preds))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      0.87      0.93        15
           2       0.89      1.00      0.94        16

    accuracy                           0.96        50
   macro avg       0.96      0.96      0.96        50
weighted avg       0.96      0.96      0.96        50



In [55]:
accuracy_score(y_test.argmax(axis=1), preds)

0.96
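
The numbers in the classification report follow directly from the confusion matrix. In scikit-learn's `confusion_matrix`, rows are the true classes and columns are the predicted classes. A short sketch of the arithmetic (variable names are chosen for this example):

```python
import numpy as np

# Confusion matrix from above: rows = true class, columns = predicted class.
cm = np.array([[19, 0, 0],
               [0, 13, 2],
               [0, 0, 16]])

correct = np.diag(cm)                  # correctly classified samples per class
accuracy = correct.sum() / cm.sum()    # 48 of 50 samples
recall = correct / cm.sum(axis=1)      # per true class (row totals)
precision = correct / cm.sum(axis=0)   # per predicted class (column totals)

print(accuracy, recall.round(2), precision.round(2))
```

The two class-1 flowers predicted as class 2 lower the recall of class 1 (13/15) and the precision of class 2 (16/18).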

In [56]:
model.save('iris_Classifier.h5')

# 8.3.0 - RNNs - Recurrent Neural Networks

RNNs are typically applied to sequence-based data. For example:
- Time series Data (sales)
- Sentences (stream of words)
- Audio (stream of digital sounds)
- Car trajectories (stream of GPS points)
- Music (stream of sounds)

For a "normal" neuron in a feed-forward network, the inputs are weighted, and the weighted sum feeds into an activation function and onward to an output. In an RNN, the output is also sent back to the neuron itself as an input, which allows the network to be unrolled through time.
We should note that in an RNN the neuron is receiving as inputs: 
1. The outputs of the previous time step. 
2. The inputs of the current time step. 
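
The two inputs listed above can be sketched as a single recurrent update in NumPy. This is a minimal illustration of a simple RNN layer with a `tanh` activation, not the Keras implementation. The sizes, the random weights, and the names `W_x`, `W_h`, and `b` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_hidden, n_steps = 3, 4, 5

# Randomly initialised parameters (a trained network would learn these).
W_x = rng.normal(size=(n_hidden, n_inputs))   # weights for the current input
W_h = rng.normal(size=(n_hidden, n_hidden))   # weights for the previous output
b = np.zeros(n_hidden)

inputs = rng.normal(size=(n_steps, n_inputs))  # a toy sequence of 5 time steps
h = np.zeros(n_hidden)                         # no previous output at the first step

hidden_states = []
for x_t in inputs:
    # Each step combines the current input with the previous step's output.
    h = np.tanh(W_x @ x_t + W_h @ h + b)
    hidden_states.append(h)

hidden_states = np.array(hidden_states)
print(hidden_states.shape)
```

The loop is the "unrolling through time": the same weights are reused at every step, and only `h` changes.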


# 8.4.0 - LSTM & GRU

An issue that RNNs face is that after a while the network begins to "forget" the early inputs, because information is lost at each step through the RNN. The network therefore needs some form of long-term memory, which is why the LSTM (long short-term memory) cell was created.

Let's discuss how an LSTM cell works. 

1. Forget gate layer: what information do we want to forget or discard? It receives $H_{t-1}$ and $X_{t}$ as inputs, transforms them with weights and biases, and passes the result through a sigmoid function. The output $F_{t}$ is between 0 and 1 for each value in the cell state: 0 means forget, 1 means keep.
2. Next we decide what new information to store in the cell state. There are two parts:
    1. A sigmoid layer, the input gate layer $I_{t}$, receives the same inputs as step 1 and decides which values to update.
    2. A hyperbolic tangent layer also receives the same inputs as step 1 and creates a vector of new candidate values, $\tilde{C}_{t}$.
3. Now update the old cell state $C_{t-1}$ to the new cell state $C_{t} = F_{t} * C_{t-1} + I_{t} * \tilde{C}_{t}$, which is passed on to the next time step.
4. Finally, compute the output. An output gate $O_{t}$, a sigmoid layer that receives the same inputs as step 1, decides which parts of the cell state to emit: $H_{t} = O_{t} * \tanh(C_{t})$.
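
The steps above can be sketched as one LSTM time step in NumPy. This follows the standard LSTM formulation (forget gate, input gate, candidate values, output gate) and is only an illustration, not the Keras implementation. The function `lstm_step`, the dictionary layout of `W` and `b`, and the sizes are assumptions for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step; W and b hold the parameters of the four layers."""
    concat = np.concatenate([h_prev, x_t])        # H_{t-1} and X_t together
    f_t = sigmoid(W["f"] @ concat + b["f"])       # step 1: forget gate
    i_t = sigmoid(W["i"] @ concat + b["i"])       # step 2.1: input gate
    c_tilde = np.tanh(W["c"] @ concat + b["c"])   # step 2.2: candidate values
    c_t = f_t * c_prev + i_t * c_tilde            # step 3: new cell state
    o_t = sigmoid(W["o"] @ concat + b["o"])       # step 4: output gate
    h_t = o_t * np.tanh(c_t)                      # step 4: new output
    return h_t, c_t

rng = np.random.default_rng(0)
n_inputs, n_hidden = 3, 4

# Randomly initialised parameters for the four layers (f, i, c, o).
W = {k: rng.normal(size=(n_hidden, n_hidden + n_inputs)) for k in "fico"}
b = {k: np.zeros(n_hidden) for k in "fico"}

h_t, c_t = lstm_step(rng.normal(size=n_inputs), np.zeros(n_hidden), np.zeros(n_hidden), W, b)
print(h_t.shape, c_t.shape)
```

The cell state `c_t` is only ever scaled by the forget gate and added to, which is what lets information survive across many time steps.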

See: 
- [Peepholes](https://en.wikipedia.org/wiki/Long_short-term_memory)
- [GRU](https://en.wikipedia.org/wiki/Gated_recurrent_unit)

# 8.5.0 Text Generation with Python and Keras

#### Part 1
- Process the text
- Clean the text
- Tokenize the text and create sequences with Keras

In [132]:
import spacy 

In [133]:
# load the English model and disable the pipeline
# components that are not required for this project.
# Note: spaCy 3+ no longer accepts the 'en' shortcut and
# needs a full model name such as 'en_core_web_sm'.

if 'nlp' not in globals():
    nlp = spacy.load('en', disable=['parser', 'tagger', 'ner'])
    print("Loaded english language and disabled the parser, tagger & named entity recognition (ner)")
else:
    print("spacy already loaded with english language")

spacy already loaded with english language


In [134]:
# utility function to read a file in as a single string of
# text we can then process or slice as required. 

def read_file(filepath):
    with open(filepath) as f:
        str_text = f.read()
    return str_text

print(f"Created the read_file utility function")

Created the read_file utility function


In [135]:
# set big enough to work with entire moby dick text
nlp.max_length = 1198623

In [136]:
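def separate_punctuation(doc_text):
    # Lowercase every token and drop whitespace and punctuation tokens.
    # `in` on a string is a substring test: a token is dropped when its text occurs inside the filter string.
    return [token.text.lower() for token in nlp(doc_text) if token.text not in '\n\n \n\n\n"-#$%;:.`{}[]\t\n' ]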

print(f"created the separate_punctuation utility function")

created the separate_punctuation utility function


In [137]:
if 'd' not in globals():
    d = read_file('./resources/moby_dick_four_chapters.txt')
    print("loaded text into variable d")
else:
    print("variable d was already loaded.")

variable d was already loaded.


In [138]:
if 'tokens' not in globals():
    tokens = separate_punctuation(d)
    print("tokens created from d minus the punctuation")
else:
    print("tokens already created as d minus the punctuation")

tokens already created as d minus the punctuation


In [139]:
len(tokens)

12454

In [140]:
# 25 words -> the network predicts word 26
train_len = 25 + 1

if 'text_sequences' not in globals():
    # create the holding pen
    text_sequences = []

    # build the sequences with a sliding window of train_len tokens
    for i in range(train_len, len(tokens)):
        seq = tokens[i-train_len:i]
        text_sequences.append(seq)

else: 
    print("text_sequences were already created")

text_sequences were already created
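
To see what the sliding window produces, here is the same loop on a toy token list with a window of 3 input words plus 1 target word. The names `toy_tokens`, `window`, and `toy_sequences` are made up for this example.

```python
toy_tokens = "call me ishmael some years ago never mind how long".split()
window = 3 + 1  # 3 input words plus the word to predict

# Same indexing as the loop above: each sequence is the `window` tokens ending just before i.
toy_sequences = [toy_tokens[i - window:i] for i in range(window, len(toy_tokens))]

for seq in toy_sequences:
    print(seq[:-1], "->", seq[-1])
```

Each sequence overlaps the previous one by all but one token. Because the range stops at `len(toy_tokens)`, the final token ("long" here) never appears as a target.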


In [141]:
from keras.preprocessing.text import Tokenizer

In [142]:
tokenizer = Tokenizer()
tokenizer.fit_on_texts(text_sequences)

In [143]:
sequences = tokenizer.texts_to_sequences(text_sequences)

In [144]:
len(tokenizer.word_counts)

2723
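
Roughly speaking, the tokenizer counts how often each word occurs and gives the most frequent word index 1, the next one index 2, and so on. Index 0 is never assigned to a word. The sketch below imitates that idea in plain Python on toy data. It is an approximation for illustration, not the Keras implementation, and `word_index` here is an ordinary dictionary built by hand.

```python
from collections import Counter

toy_sequences = [["the", "whale", "the", "sea"],
                 ["whale", "the", "sea", "ship"]]

# Count every word across all sequences.
counts = Counter(word for seq in toy_sequences for word in seq)

# Most frequent word gets index 1; index 0 stays unused.
word_index = {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}

# Replace each word with its index.
encoded = [[word_index[word] for word in seq] for seq in toy_sequences]

print(word_index)
print(encoded)
```

Because the indices start at 1, the largest index equals the number of distinct words, which is why later cells use `vocabulary_size + 1` as the number of classes.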

In [145]:
if 'vocabulary_size' not in globals():
    vocabulary_size = len(tokenizer.word_counts)
    
print(f"vocabulary_size: {vocabulary_size}")

vocabulary_size: 2723


In [146]:
import numpy as np

In [147]:
sequences = np.array(sequences)

In [148]:
sequences

array([[ 962,   15,  269, ...,  154,  265,    7],
       [  15,  269,   55, ...,  265,    7,  963],
       [ 269,   55,  267, ...,    7,  963,   15],
       ...,
       [2718,    1,    4, ...,  268,   57,    3],
       [   1,    4,   11, ...,   57,    3, 2723],
       [   4,   11, 2719, ...,    3, 2723,   28]])

In [149]:
# create the features labels split
from keras.utils import to_categorical

In [150]:
x = sequences[:,:-1]

In [151]:
y = sequences[:,-1]

In [152]:
y = to_categorical(y, num_classes=vocabulary_size + 1)

In [153]:
seq_len=x.shape[1]
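
The slicing above can be checked on a toy array: every column except the last becomes a feature, and the last column is the word to predict. The array `toy`, the vocabulary size `vocab`, and the use of `np.eye` in place of `to_categorical` are assumptions for this sketch.

```python
import numpy as np

# Three toy sequences of word indices (indices start at 1, as with the tokenizer).
toy = np.array([[1, 2, 3, 4],
                [2, 3, 4, 1],
                [3, 4, 1, 2]])
vocab = 4

features = toy[:, :-1]   # all columns except the last
labels = toy[:, -1]      # the last column: the word to predict

# One-hot encode with vocab + 1 columns, because index 0 is never used by a word.
one_hot_labels = np.eye(vocab + 1)[labels]

print(features.shape, labels, one_hot_labels.shape)
```

Column 0 of the one-hot labels is always zero. It exists only so that the largest word index fits.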

In [154]:
from keras.models import Sequential
from keras.layers import Dense, LSTM, Embedding

In [171]:
def create_model(vocabulary_size, seq_len):
    model = Sequential()
    model.add(Embedding(vocabulary_size, seq_len, input_length=seq_len))
    model.add(LSTM(50, return_sequences=True))
    model.add(LSTM(50))
    model.add(Dense(50, activation='relu'))
    model.add(Dense(vocabulary_size, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.summary()
    return model

In [178]:
model = create_model(vocabulary_size+1, seq_len)

Model: "sequential_10"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_8 (Embedding)      (None, 25, 25)            68100     
_________________________________________________________________
lstm_15 (LSTM)               (None, 25, 50)            15200     
_________________________________________________________________
lstm_16 (LSTM)               (None, 50)                20200     
_________________________________________________________________
dense_21 (Dense)             (None, 50)                2550      
_________________________________________________________________
dense_22 (Dense)             (None, 2724)              138924    
Total params: 244,974
Trainable params: 244,974
Non-trainable params: 0
_________________________________________________________________
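
The parameter counts in this summary can be reproduced by hand. An embedding layer stores one vector per vocabulary entry. An LSTM layer has four internal layers, each with input weights, recurrent weights, and a bias. A small sketch of that arithmetic (the helper names are made up for this example):

```python
def embedding_params(vocab_size, embed_dim):
    """One embedding vector of length embed_dim per vocabulary entry."""
    return vocab_size * embed_dim

def lstm_params(n_inputs, n_units):
    """Four internal layers, each with input weights, recurrent weights and a bias."""
    return 4 * ((n_inputs + n_units) * n_units + n_units)

def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit."""
    return n_inputs * n_units + n_units

lstm_model_counts = [
    embedding_params(2724, 25),   # Embedding: vocabulary_size + 1 rows, 25 columns
    lstm_params(25, 50),          # first LSTM reads the 25-dimensional embeddings
    lstm_params(50, 50),          # second LSTM reads the first LSTM's 50 outputs
    dense_params(50, 50),
    dense_params(50, 2724),       # one output per class
]
lstm_model_total = sum(lstm_model_counts)

print(lstm_model_counts, lstm_model_total)
```

More than half of the parameters sit in the final softmax layer, because it needs one output per word in the vocabulary.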


In [179]:
from pickle import dump, load

In [180]:
model.fit(x,y, batch_size=128, epochs=50, verbose=1)

  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Epoch 1/50
...
Epoch 50/50


<keras.callbacks.callbacks.History at 0x7f8412516ed0>