# T81-558: Applications of Deep Neural Networks
**Module 11: Natural Language Processing and Speech Recognition**
* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)
* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).

# Module 11 Material

* **Part 11.1: Getting Started with Spacy in Python** [[Video]](https://www.youtube.com/watch?v=bv_iVVrlfbU) [[Notebook]](t81_558_class_11_01_spacy.ipynb)
* Part 11.2: Word2Vec and Text Classification [[Video]](https://www.youtube.com/watch?v=qN9hHlZKIL4) [[Notebook]](t81_558_class_11_02_word2vec.ipynb)
* Part 11.3: What are Embedding Layers in Keras [[Video]](https://www.youtube.com/watch?v=Ae3GVw5nTYU) [[Notebook]](t81_558_class_11_04_embedding.ipynb)
* Part 11.4: Natural Language Processing with Spacy and Keras [[Video]](https://www.youtube.com/watch?v=Ae3GVw5nTYU) [[Notebook]](t81_558_class_11_03_text_nlp.ipynb)
* Part 11.5: Learning English from Scratch with Keras and TensorFlow [[Video]](https://www.youtube.com/watch?v=Ae3GVw5nTYU) [[Notebook]](t81_558_class_11_05_english_scratch.ipynb)

# Part 11.1: Getting Started with Spacy in Python

Before running the examples, make sure you have installed a language model for Spacy. If the model is missing, `spacy.load` raises the following error:

```
OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
```

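To install the English model, use the following command. The `en` shortcut resolves to `en_core_web_sm`; Spacy 3 and later deprecate the shortcut in favor of the full model name: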

```
python -m spacy download en
```
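
As a rough sketch of how to check for the model before loading it, the snippet below looks for the model as an installed Python package and prints the install command if it is absent. The helper name `is_installed` and the constant `MODEL_NAME` are illustrative and not part of Spacy. The check also assumes the model was installed as a package; a model loaded from a directory path would not be detected this way.

```python
import importlib.util

MODEL_NAME = "en_core_web_sm"  # the model name the examples in this notebook load


def is_installed(package: str) -> bool:
    """Return True if `package` can be found on the current Python path."""
    return importlib.util.find_spec(package) is not None


# Only import and load Spacy when both the library and the model are present.
if is_installed("spacy") and is_installed(MODEL_NAME):
    import spacy

    nlp = spacy.load(MODEL_NAME)
    print(f"Loaded {MODEL_NAME} with pipeline: {nlp.pipe_names}")
else:
    print(f"Model not found. Install it with: python -m spacy download {MODEL_NAME}")
```

Checking up front gives a readable message instead of the `OSError` above, which can be convenient at the top of a notebook.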


### Tokenization

In [14]:
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp(u"Apple is looking at buying a U.K. startup for $1 billion")
for token in doc:
    print(token.text)

Apple
is
looking
at
buying
a
U.K.
startup
for
$
1
billion


In [11]:
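# The output below comes from this sentence, not the "Apple" sentence above
pos_doc = nlp(u"I want an iPad, Laptop, and a dog.")
for word in pos_doc:
    print(word.text, word.pos_)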

I PRON
want VERB
an DET
iPad PROPN
, PUNCT
Laptop PROPN
, PUNCT
and CCONJ
a DET
dog NOUN
. PUNCT


In [15]:
for word in doc:
    print(f"{word} is like number? {word.like_num}")

Apple is like number? False
is is like number? False
looking is like number? False
at is like number? False
buying is like number? False
a is like number? False
U.K. is like number? False
startup is like number? False
for is like number? False
$ is like number? False
1 is like number? True
billion is like number? True
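
Tokenization and `like_num` are rule-based, so they can be tried without downloading a trained model. The sketch below assumes a blank English pipeline from `spacy.blank("en")` is acceptable for this purpose; the names `nlp_blank`, `doc_blank`, `tokens`, and `numeric_tokens` are illustrative. It collects the tokens and then keeps only those that look like numbers.

```python
import spacy

# A blank English pipeline contains only the rule-based tokenizer,
# so it does not require a downloaded model.
nlp_blank = spacy.blank("en")
doc_blank = nlp_blank("Apple is looking at buying a U.K. startup for $1 billion")

# All token texts, in order.
tokens = [token.text for token in doc_blank]

# Tokens that resemble numbers, either as digits or as number words.
numeric_tokens = [token.text for token in doc_blank if token.like_num]

print(tokens)
print(numeric_tokens)
```

The filtered list should correspond to the two `True` rows in the output above. Part-of-speech tags and lemmas, by contrast, need a trained model such as `en_core_web_sm`.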


### Sentence Diagramming

In [6]:
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
doc = nlp(u"I want an iPad, Laptop, and a dog.")
displacy.serve(doc, style="dep")


Using the 'dep' visualizer
Serving on http://0.0.0.0:5000 ...

Shutting down server on port 5000.


In [7]:
print(doc)

I want an iPad, Laptop, and a dog.


### Lemmatization

In [16]:
import spacy

# Load the small English model, disabling the parser and NER components that lemmatization does not need
nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])

sentence = "The striped bats are hanging on their feet for best"

# Run the sentence through the loaded pipeline `nlp`
doc = nlp(sentence)

# Extract the lemma for each token and join them into one string
" ".join([token.lemma_ for token in doc])

'the stripe bat be hang on -PRON- foot for good'

### Stop Words



In [22]:
from spacy.lang.en.stop_words import STOP_WORDS

print(STOP_WORDS)

{'doing', 'per', 'whither', 'go', 'too', 'yourself', "'s", 'enough', 'part', 'during', 'the', '’re', 'many', 'it', 'others', "'d", 'besides', '’m', 'beside', 'out', 'formerly', 'anyway', 'hereafter', 'sometimes', 'would', 'she', 'did', 'seem', 'only', 'had', 'to', 'among', 'six', 'was', 'due', 'we', 'throughout', 'those', 'using', 'down', 'everywhere', 're', 'via', "'re", 'each', 'might', 'toward', 'myself', 'mine', 'has', 'same', 'something', 'four', 'last', 'every', 'through', 'without', 'less', 'along', 'have', 'top', 'whereupon', 'anything', 'meanwhile', 'perhaps', 'sometime', 'take', 'after', 'around', 'though', 'former', 'does', 'third', 'eight', 'onto', 'often', 'or', '‘re', 'its', 'your', 'but', 'now', 'whatever', 'therefore', 'became', 'serious', 'three', 'he', 'no', 'empty', 'all', 'none', 'much', 'sixty', 'whether', "'m", 'n‘t', 'him', 'between', 'seems', '’ll', 'with', 'any', 'done', '’d', 'otherwise', 'my', 'himself', 'either', 'fifteen', 'towards', 'twelve', 'whereafter',