# T81-558: Applications of Deep Neural Networks
**Module 11: Natural Language Processing and Speech Recognition**
* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)
* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).

# Module 11 Material

* Part 11.1: Getting Started with Spacy in Python [[Video]](https://www.youtube.com/watch?v=bv_iVVrlfbU) [[Notebook]](t81_558_class_11_01_spacy.ipynb)
* Part 11.2: Word2Vec and Text Classification [[Video]](https://www.youtube.com/watch?v=qN9hHlZKIL4) [[Notebook]](t81_558_class_11_02_word2vec.ipynb)
* **Part 11.3: What are Embedding Layers in Keras** [[Video]](https://www.youtube.com/watch?v=Ae3GVw5nTYU) [[Notebook]](t81_558_class_11_03_embedding.ipynb)
* Part 11.4: Natural Language Processing with Spacy and Keras [[Video]](https://www.youtube.com/watch?v=Ae3GVw5nTYU) [[Notebook]](t81_558_class_11_04_text_nlp.ipynb)
* Part 11.5: Learning English from Scratch with Keras and TensorFlow [[Video]](https://www.youtube.com/watch?v=Ae3GVw5nTYU) [[Notebook]](t81_558_class_11_05_english_scratch.ipynb)

# Part 11.3: What are Embedding Layers in Keras

[Embedding Layers](https://keras.io/layers/embeddings/) are a Keras feature that maps each integer index to a vector, which allows additional information to be automatically inserted into your neural network. In the previous section, you saw that Word2Vec can expand words to a 300-dimension vector. An embedding layer allows you to automatically insert these 300-dimension vectors in place of word indexes.
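To make the idea of replacing word indexes with vectors concrete, the following is a minimal NumPy sketch of the lookup, not the Keras implementation. The five-word vocabulary and the randomly generated 300-dimension table are hypothetical stand-ins for real Word2Vec vectors.

```python
import numpy as np

# Hypothetical 5-word vocabulary; the index assignments are arbitrary.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}

# Stand-in for pretrained vectors: one 300-dimension row per word.
# Random values are used only to illustrate the shapes involved.
rng = np.random.default_rng(42)
embedding_table = rng.normal(size=(len(vocab), 300))

# A sentence encoded as word indexes.
sentence = ["the", "cat", "sat"]
word_indexes = np.array([vocab[w] for w in sentence])

# The embedding lookup is row indexing into the table.
vectors = embedding_table[word_indexes]

print(word_indexes.shape)  # (3,)
print(vectors.shape)       # (3, 300)
```

Each of the three word indexes is replaced by the matching row of the table, so a sequence of 3 integers becomes a 3 x 300 matrix.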

Embedding layers are often used with Natural Language Processing (NLP); however, they can be used in any instance where you wish to insert a larger vector in place of an index value. In some ways, you can think of an embedding layer as dimension expansion. The hope is that these additional dimensions give the model more information to work with and improve its score.
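One way to see the dimension expansion is that looking up a row by index gives the same result as one-hot encoding the index and multiplying by a weight matrix. The sketch below assumes a hypothetical non-NLP categorical feature with four values (for example, store IDs) and a small random weight matrix; the names and sizes are illustrative only.

```python
import numpy as np

num_categories = 4   # hypothetical: e.g. four store IDs
embedding_dim = 3    # each index expands to a 3-dimension vector

# Small random weights standing in for a trained embedding matrix.
rng = np.random.default_rng(0)
weights = rng.uniform(-0.05, 0.05, size=(num_categories, embedding_dim))

indexes = np.array([2, 0, 3])

# Option 1: look up the rows directly by index.
looked_up = weights[indexes]

# Option 2: one-hot encode the indexes, then multiply by the weights.
one_hot = np.eye(num_categories)[indexes]
multiplied = one_hot @ weights

print(looked_up.shape)                     # (3, 3)
print(np.allclose(looked_up, multiplied))  # True
```

The lookup form avoids building the one-hot matrix, which matters once the number of categories grows into the thousands.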

In [4]:
from keras.models import Sequential
from keras.layers import Embedding
import numpy as np

model = Sequential()
model.add(Embedding(1000, 64, input_length=10))
# The model takes as input an integer matrix of shape (batch, input_length).
# Every integer (i.e. word index) in the input must be in the range 0 to 999,
# because the vocabulary size is 1000.
# Now model.output_shape == (None, 10, 64), where None is the batch dimension.

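# Build a random batch of 32 sequences, each holding 10 word indexes in [0, 1000).
input_array = np.random.randint(1000, size=(32, 10))

# Compile and run a forward pass; each index is replaced by its 64-dimension vector.
model.compile('rmsprop', 'mse')
output_array = model.predict(input_array)
assert output_array.shape == (32, 10, 64)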

In [5]:
output_array

array([[[ 0.00148531, -0.02484481,  0.03750004, ...,  0.04823485,
         -0.03592342, -0.03295747],
        [-0.04962308, -0.023443  ,  0.00899249, ...,  0.03256037,
          0.01375597,  0.03997575],
        [ 0.04146341, -0.04022657, -0.02679427, ...,  0.02797518,
          0.02481229, -0.04230652],
        ...,
        [-0.03651143,  0.01828153, -0.01824565, ...,  0.01133262,
         -0.02873353,  0.00828058],
        [ 0.0379012 ,  0.02493343, -0.03205984, ...,  0.01685869,
          0.01029737, -0.00607287],
        [-0.00345949, -0.02231693,  0.04504948, ...,  0.0113552 ,
         -0.01478071,  0.0245846 ]],

       [[ 0.02033497,  0.03670131,  0.01287762, ...,  0.03170332,
         -0.04594446, -0.03487029],
        [ 0.03862177, -0.01650984,  0.01120607, ...,  0.01798854,
          0.00339844, -0.00497578],
        [-0.0312975 ,  0.00677928,  0.010879  , ..., -0.00454694,
          0.02128942, -0.04169091],
        ...,
        [ 0.01359783,  0.01332318, -0.03400425, ...,  