# Dense network summary

* Dense networks take a fixed-length input and produce a fixed-length output
* Like all neural network layers, they require an activation function
* They can be stacked to represent more complicated functions
* You're taking your chances when predicting on data that is very different from your training data
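
As a rough illustration of the first three points, here is a minimal NumPy sketch of two stacked dense layers. The layer sizes and the random weights are made up for the example; a real network would learn the weights during training.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, weights, bias, activation):
    """One dense layer: an affine map followed by an activation."""
    return activation(x @ weights + bias)

def relu(z):
    return np.maximum(z, 0.0)

def identity(z):
    return z

# Hypothetical sizes: 4 input features -> 8 hidden units -> 1 output.
w1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
w2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

batch = rng.normal(size=(3, 4))            # 3 examples, each of fixed length 4
hidden = dense(batch, w1, b1, relu)        # shape (3, 8)
output = dense(hidden, w2, b2, identity)   # shape (3, 1)
print(hidden.shape, output.shape)
```

The input length (4) and output length (1) are baked into the weight shapes, which is why a dense network cannot accept inputs of varying length.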


# Sequences
A lot of the data we work with, such as text, doesn't have a fixed length. To make predictions on it we need another kind of layer, called a recurrent layer.

<img src="../assets/rnn.gif">

Recurrent layers step through each data point in a sequence and output either a fixed-size result at the end (one number in the example below) or another sequence.

Examples:
* RNN: the first and simplest recurrent layer
* LSTM: Long Short-Term Memory network
* GRU: Gated Recurrent Unit

All of the above are implemented differently and have different strengths, but for now let's stick with an LSTM.
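
To make "stepping through a sequence" concrete, here is a minimal NumPy sketch of the simplest recurrent layer (a plain RNN, not an LSTM). The weight names and sizes are assumptions for illustration, and the weights are random rather than trained.

```python
import numpy as np

def simple_rnn(sequence, w_x, w_h, b):
    """Run a basic recurrent layer over a sequence of shape (steps, features).

    Returns the final hidden state, whose size depends only on the
    number of units, not on the sequence length.
    """
    h = np.zeros(w_h.shape[0])
    for x_t in sequence:                      # one time step at a time
        h = np.tanh(x_t @ w_x + h @ w_h + b)  # mix the new input with the memory
    return h

rng = np.random.default_rng(0)
units, features = 3, 1
w_x = rng.normal(size=(features, units))
w_h = rng.normal(size=(units, units))
b = np.zeros(units)

short = rng.normal(size=(5, features))
long = rng.normal(size=(12, features))
print(simple_rnn(short, w_x, w_h, b).shape)
print(simple_rnn(long, w_x, w_h, b).shape)
```

Both calls return a vector of the same size, even though the sequences have 5 and 12 steps. LSTM and GRU layers follow the same loop but use a more elaborate update inside it.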



In [13]:
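import numpy as np

# Build 10,000 toy sequences: 5 consecutive numbers from a random start.
# The target is the next number in the sequence.
X=[]
Y=[]
for i in range(10000):
    _dp=np.random.randint(50)+np.linspace(0,4,5)
    Y.append(_dp[-1]+1)
    X.append(np.expand_dims(_dp,-1))  # shape (5, 1): 5 time steps, 1 feature
X=np.array(X)
Y=np.array(Y)
print(X[0],Y[0])

print(X.shape)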




[[19.]
 [20.]
 [21.]
 [22.]
 [23.]] 24.0
(10000, 5, 1)


In [27]:
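import tensorflow as tf

# Input shape is (time steps, features); None allows any sequence length
input_layer=tf.keras.layers.Input((None,1))
print(input_layer)
# One LSTM unit with a linear activation instead of the default tanh,
# so the output is not confined to [-1, 1]
output_layer=tf.keras.layers.LSTM(1,activation='linear')(input_layer)
model=tf.keras.models.Model([input_layer],[output_layer])
opt=tf.keras.optimizers.Adam(learning_rate=1e-3)  # `lr` is a deprecated alias

model.compile(loss='mse',optimizer=opt)
model.summary()
# Stop once val_loss has not improved for 5 consecutive epochs
es=tf.keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=5, verbose=0, mode='auto')

model.fit(X,Y,validation_split=0.5,epochs=100,callbacks=[es])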



Tensor("input_7:0", shape=(?, ?, 1), dtype=float32)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_7 (InputLayer)         (None, None, 1)           0         
_________________________________________________________________
lstm_3 (LSTM)                (None, 1)                 12        
Total params: 12
Trainable params: 12
Non-trainable params: 0
_________________________________________________________________
Train on 5000 samples, validate on 5000 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/

<tensorflow.python.keras.callbacks.History at 0x7ffab71ccf60>

In [29]:
model.predict(np.expand_dims([[2,3,4,5,6]],-1) )

array([[7.0952587]], dtype=float32)
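
The `EarlyStopping` callback used in the training cell is the reason training can end before the requested 100 epochs. The following is a simplified pure-Python sketch of patience-based stopping, not the Keras implementation itself; the function name `epochs_run` and the validation losses are invented for illustration.

```python
def epochs_run(val_losses, patience, min_delta=0.0):
    """Return how many epochs run before patience-based stopping triggers."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:   # improvement: remember it, reset the counter
            best = loss
            wait = 0
        else:                         # no improvement this epoch
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses)

# Made-up validation losses, not taken from the run above.
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.69, 0.70, 0.70, 0.70, 0.70, 0.70, 0.70]
print(epochs_run(losses, patience=5))
print(epochs_run(losses, patience=1))
```

A larger `patience` tolerates temporary plateaus, at the cost of a few extra epochs once the model has really stopped improving.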

## Text Data
Most of the coding and research effort in machine learning goes into representing data in a form an algorithm can use.

### Goal
Read the raw text from these movie reviews and predict whether each review is positive or not.
* Need to go from an array (1-D, unknown length) to a probability (1 number)
* Need to build a series of layers to make that possible
* We will take text that has been transformed into a sequence of integers
    * For this data, each word (token) in the vocabulary is assigned a unique integer
* We will transform the sequence of integers into a sequence of vectors
    * Do this with a new Embedding layer
* Then use an LSTM layer and a Dense layer to make a prediction

Array of Ints -> **Embedding** -> Array of Vectors -> **RNN** -> fixed output -> **Dense** -> Probability
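
The shape changes along this pipeline can be sketched in plain NumPy. Everything here is a stand-in: the vocabulary size, the vector width and the random weights are invented, and a simple mean over time replaces the recurrent layer just to show that the output size no longer depends on the review length.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 20, 4                  # hypothetical, tiny on purpose

embedding_table = rng.normal(size=(vocab_size, embed_dim))  # learned in practice
dense_w, dense_b = rng.normal(size=(embed_dim, 1)), 0.0

def predict(token_ids):
    vectors = embedding_table[token_ids]       # (length,) -> (length, embed_dim)
    summary = vectors.mean(axis=0)             # stand-in for the recurrent layer
    logit = summary @ dense_w + dense_b        # (embed_dim,) -> (1,)
    return 1.0 / (1.0 + np.exp(-logit))        # sigmoid -> probability

print(predict(np.array([1, 14, 9, 6])))            # a 4-token review
print(predict(np.array([1, 14, 9, 6, 3, 7, 12])))  # a 7-token review
```

The embedding step is just a row lookup in a table, which is why the inputs must be integers smaller than the vocabulary size.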


In [16]:
# Let's load the input data, already split into train and test sets
index_from=3
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.imdb.load_data(path="imdb.npz",
                                                      num_words=None,
                                                      skip_top=0,
                                                      maxlen=None,
                                                      seed=113,
                                                      start_char=1,
                                                      oov_char=2,
                                                      index_from=index_from)


word_2_index={k:(v+index_from) for k,v in tf.keras.datasets.imdb.get_word_index().items()}
word_2_index['<PAD>']=0
word_2_index['<START>']=1
word_2_index['<UNK>']=2

index_2_word={}

for word in word_2_index:
    index_2_word[ word_2_index[word]]=word


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json


In [17]:
check=['this','is','a','sentence']
print(check, [word_2_index[i] for i in check])

['this', 'is', 'a', 'sentence'] [14, 9, 6, 4130]


In [18]:
last_word=np.max(list(word_2_index.values()))

In [19]:
print(x_train.shape)
print(y_train.shape)

print(x_train[0])
print(y_train[0])


print(" ".join([index_2_word[i] for i in x_train[0]]))

print('1 = Positive Review','0 = Negative Review ')
print('label',y_train[0])

(25000,)
(25000,)
[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 22665, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 21631, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 19193, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 10311, 8, 4, 107, 117, 5952, 15, 256, 4, 31050, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 12118, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]
1
<START> this fil

**Note:**
This data has already been 'tokenized', meaning the text has been pre-processed: in this case it was made lowercase and punctuation was removed. There are many different ways of doing this.
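
One possible way to do this kind of pre-processing is sketched below. It is not the procedure used to build the IMDB dataset; the helper names `tokenize` and `encode` are hypothetical, and the toy vocabulary only reuses the four word indices printed earlier.

```python
import string

def tokenize(text):
    """Lowercase the text, drop ASCII punctuation, and split on whitespace."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()

def encode(tokens, vocab, start_id=1, unk_id=2):
    """Map tokens to integers, prefixing the start marker."""
    return [start_id] + [vocab.get(token, unk_id) for token in tokens]

# Toy vocabulary for illustration only.
toy_vocab = {"this": 14, "is": 9, "a": 6, "sentence": 4130}

tokens = tokenize("This is a sentence!")
print(tokens)
print(encode(tokens, toy_vocab))
print(encode(tokenize("This is a zebra."), toy_vocab))
```

Words missing from the vocabulary fall back to the `<UNK>` index (2), matching the `oov_char=2` setting used when loading the data.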


In [20]:
input_layer=tf.keras.layers.Input( (None,))
print(input_layer)

Tensor("input_5:0", shape=(?, ?), dtype=float32)


In [21]:
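# Embedding input_dim is the vocabulary size. Keras expects max index + 1,
# so `last_word + 1` would be needed to cover the single largest index.
nn=tf.keras.layers.Embedding(last_word,100)(input_layer)
nn=tf.keras.layers.LSTM(10)(nn)      # reads the whole sequence, returns a 10-vector
nn=tf.keras.layers.Dense(10)(nn)
nn=tf.keras.layers.LeakyReLU()(nn)   # activation for the Dense layer above
output=tf.keras.layers.Dense(1,activation='sigmoid')(nn)  # probability of a positive review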

model=tf.keras.models.Model(input_layer,output)
model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_5 (InputLayer)         (None, None)              0         
_________________________________________________________________
embedding (Embedding)        (None, None, 100)         8858700   
_________________________________________________________________
lstm_1 (LSTM)                (None, 10)                4440      
_________________________________________________________________
dense_6 (Dense)              (None, 10)                110       
_________________________________________________________________
leaky_re_lu_3 (LeakyReLU)    (None, 10)                0         
_________________________________________________________________
dense_7 (Dense)              (None, 1)                 11        
Total params: 8,863,261
Trainable params: 8,863,261
Non-trainable params: 0
_________________________________________________________________
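
The parameter counts in this summary can be reproduced by hand, which is a useful check that each layer is wired the way you expect. The sketch below uses the standard formulas for embedding, LSTM and dense layers; the vocabulary size is inferred from the reported 8,858,700 embedding parameters rather than read from the data.

```python
def embedding_params(vocab_size, dim):
    # One learnable vector of length `dim` per integer in the vocabulary
    return vocab_size * dim

def lstm_params(input_dim, units):
    # Four internal blocks (three gates plus the candidate state), each with
    # input weights, recurrent weights and a bias
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # A weight matrix plus one bias per unit
    return input_dim * units + units

vocab_size = 8858700 // 100   # inferred from the summary, not read from the data

total = (embedding_params(vocab_size, 100)
         + lstm_params(100, 10)
         + dense_params(10, 10)
         + dense_params(10, 1))
print(total)
print(lstm_params(1, 1))
```

The total matches the 8,863,261 parameters reported above, and `lstm_params(1, 1)` gives the 12 parameters of the single-unit LSTM from the earlier number-sequence model. Almost all of the parameters sit in the embedding table.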


## Binary Cross Entropy

Our last layer here uses a sigmoid activation, which is bounded between 0 and 1.

<img src="../assets/sigmoid.png">

The activation on your last layer has to match your loss function, in this case 'Binary Cross-Entropy':
<p style="text-align: center;">
$L = -\sum_i \left[ y_{true,i} \ln(y_{pred,i}) + (1-y_{true,i}) \ln(1-y_{pred,i}) \right]$
</p>
which is minimized when $y_{pred}=y_{true}$.
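
A small NumPy sketch of this loss, written as the sum in the formula, shows how it rewards confident correct predictions and punishes confident wrong ones. The clipping constant `eps` is an assumption added to avoid taking the log of zero, and the example predictions are made up.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Summed binary cross-entropy over all examples."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # keep the logs finite
    return -np.sum(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0])
confident_right = np.array([0.99, 0.01, 0.99])
unsure = np.array([0.6, 0.4, 0.6])
confident_wrong = np.array([0.01, 0.99, 0.01])

for name, pred in [("confident and right", confident_right),
                   ("unsure", unsure),
                   ("confident and wrong", confident_wrong)]:
    print(name, binary_cross_entropy(y_true, pred))
```

Keras reports the mean over the examples rather than the sum, which only rescales the value by the number of examples.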


In [22]:
# Each batch given to the model must contain sequences of the same length.
# For now we simply pad or crop every review to a length of 200.
# By default pad_sequences pads and crops at the start of each sequence.

x_train=tf.keras.preprocessing.sequence.pad_sequences(x_train, maxlen=200, dtype='int32',value=0)
x_test=tf.keras.preprocessing.sequence.pad_sequences(x_test, maxlen=200, dtype='int32',value=0)
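
To see what padding and cropping do to individual reviews, here is a small NumPy sketch that mimics the default behaviour of `pad_sequences` (padding and truncating at the front). The function name `pad_or_crop` is made up, and this is an illustration rather than the library's code.

```python
import numpy as np

def pad_or_crop(sequences, maxlen, value=0):
    """Left-pad short sequences and keep the last `maxlen` items of long ones."""
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for row, seq in enumerate(sequences):
        kept = seq[-maxlen:]               # crop from the front
        if kept:
            out[row, -len(kept):] = kept   # pad at the front
    return out

print(pad_or_crop([[1, 14, 22], [1, 2, 3, 4, 5, 6]], maxlen=4))
```

The short review gains a leading 0 (the `<PAD>` index), while the long one loses its first two tokens, including the `<START>` marker.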



In [23]:
es=tf.keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=1, verbose=0, mode='auto')
history=model.fit(x_train,y_train,validation_data=(x_test,y_test),epochs=10,callbacks=[es])

Train on 25000 samples, validate on 25000 samples
Epoch 1/10
Epoch 2/10


In [24]:
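def sentence_2_ints(sentence):
    # Map each word to its integer; words not in the vocabulary become <UNK>
    unk=word_2_index['<UNK>']
    return np.array([[word_2_index.get(s,unk) for s in sentence.split()]])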

In [25]:
print(model.predict(sentence_2_ints('<START> this movie is the very best i have ever seen') ))

print(model.predict(sentence_2_ints('<START> i have mixed feelings about this movie') ))
print(model.predict(sentence_2_ints('<START> i have mixed feelings about this movie i may like it in the end') ))

print(model.predict(sentence_2_ints('<START> i have never seen a worse film') ))
print(model.predict(sentence_2_ints('<START> hi is this where i google the information') ))

print(model.predict(sentence_2_ints('<START> star trek') ))
print(model.predict(sentence_2_ints('<START> star wars') ))


[[0.87367845]]
[[0.5959961]]
[[0.73580027]]
[[0.10441049]]
[[0.44264108]]
[[0.3767595]]
[[0.3082046]]


# A short menu of other ML layers
* Convolutional Layers (Conv1D, Conv2D, Conv3D)
    * Input: sequences of fixed or varying length; best when array values that are close together are correlated, e.g. pictures
    * Output: a new sequence, normally of lower dimension but with more channels
* Recurrent Neural Networks (RNN, LSTM, GRU)
    * Input: sequence
    * Output: a sequence or a fixed-dimensional output

* Embedding Network
    * A learnable mapping from a large set of integers to fixed-size vectors
    * Input: integer
    * Output: vector

* Dense Network
    * Fixed Input
    * Fixed Output

* Dropout
    * Good at preventing overfitting
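
Of the layers in this menu, dropout is the simplest to write out. The sketch below shows the common "inverted dropout" formulation in NumPy, under the assumption that surviving values are rescaled during training so the expected activation stays the same; the function signature is made up for illustration.

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero a random fraction `rate` of the values while training."""
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate   # True where the value survives
    return x * keep / (1.0 - rate)       # rescale the survivors

rng = np.random.default_rng(0)
activations = np.ones((2, 8))
print(dropout(activations, rate=0.5, rng=rng))
print(dropout(activations, rate=0.5, rng=rng, training=False))
```

Because a different random subset of units is silenced on every training step, the network cannot rely on any single unit, which is what discourages overfitting. At prediction time the layer passes its input through unchanged.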