<a id='top'></a><a name='top'></a>
# [Natural Language Processing in Action, v1](https://www.manning.com/books/natural-language-processing-in-action/)

[Github repo](https://github.com/totalgood/nlpia)


1. [Packets of thought (NLP overview)](#1.0)
2. [Build your vocabulary (word tokenization)](#2.0)
3. [Math with words (TF-IDF vectors)](#3.0)
4. [Finding meaning in word counts (semantic analysis)](#4.0)
5. [Baby steps with neural networks (perceptrons and backpropagation)](#5.0)
6. [Reasoning with word vectors (Word2vec)](#6.0)
7. [Getting words in order with convolutional neural networks (CNNs)](#7.0)
8. [Loopy (recurrent) neural networks (RNNs)](#8.0)
9. [Improving retention with long short-term memory networks](#9.0)
10. [Sequence-to-sequence models and attention](#10.0)
11. [Information extraction (named entity extraction and question answering)](#11.0)
12. [Getting chatty (dialog engines)](#12.0)
13. [Scaling up (optimization, parallelization, and batch processing)](#13.0)

---
<a name='1.0'></a><a id='1.0'></a>
# Chapter 1: Packets of thought
## NLP overview
<a href="#top">[back to top]</a>

## Overview
1. What natural language processing (NLP) is.
2. Why NLP is hard and only recently has become widespread.
3. When word order and grammar are important and when they can be ignored.
4. How a chatbot combines many of the tools of NLP.
5. How to use a regular expression to build the start of a tiny chatbot.

## Summary 
1. NLP can be very useful.
2. The meaning and intent of words can be deciphered by machines.
3. A smart NLP pipeline will be able to deal with ambiguity.
4. We can teach machines common sense knowledge without spending a lifetime training them.
5. Chatbots can be thought of as semantic search engines.
6. Regular expressions are useful for more than just search.
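
As a rough illustration of the last Overview item, a regular expression can serve as the start of a tiny chatbot. The sketch below is not the book's code: the pattern, the `reply` function, and the default `bot_name` are all hypothetical names chosen for this example.

```python
import re

# Hypothetical greeting pattern: a casual hello or "good <time of day>",
# optionally followed by a single name and trailing punctuation.
GREETING = re.compile(
    r"^\s*(?:hi|hello|hey|good\s+(?:morning|afternoon|evening))\b"
    r"[\s,]*(?P<name>[a-z]+)?[\s!.?]*$",
    re.IGNORECASE,
)


def reply(text, bot_name="bot"):
    """Return a canned reply if text looks like a greeting, else None."""
    match = GREETING.match(text)
    if match is None:
        return None
    name = match.group("name")
    # If the greeting names someone other than the bot, point that out.
    if name and name.lower() != bot_name:
        return f"I think you meant to greet {name}."
    return "Hello, human!"


print(reply("Hello bot!"))
print(reply("Good morning, Rosa"))
print(reply("What time is it?"))
```

A pattern like this only handles the inputs it was written for, which is why the later chapters move on to statistical approaches.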


---
<a name='2.0'></a><a id='2.0'></a>
# Chapter 2: Build your vocabulary
## Word tokenization
<a href="#top">[back to top]</a>

## Overview
1. Tokenizing your text into words and n-grams (tokens).
2. Dealing with nonstandard punctuation and emoticons, like social media posts.
3. Compressing your token vocabulary with stemming and lemmatization.
4. Building a vector representation of a statement.
5. Building a sentiment analyzer from handcrafted token scores. 

## Summary
1. Implement tokenization and configuration of tokenizers for applications.
2. n-gram tokenization helps retain some of the word order information in a document.
3. Normalization and stemming consolidate words into groups that improve the "recall" for search engines but reduce precision.
4. Lemmatization and customized tokenizers like `casual_tokenize()` can improve precision and reduce information loss.
5. Stop words can contain useful information, and discarding them is not always helpful.
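
To make the tokenization and n-gram points concrete, here is a minimal sketch using only the standard library. It is a stand-in for a real tokenizer such as `casual_tokenize()`, not a reimplementation of it; the regular expression and the helper names `tokenize` and `ngrams` are assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return word tokens, keeping contractions whole."""
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())


def ngrams(tokens, n=2):
    """Return the n-grams (as tuples) in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = tokenize("The cat didn't sit on the mat.")
bag_of_words = Counter(tokens)   # word order is discarded here
bigrams = ngrams(tokens, 2)      # 2-grams keep some local word order

print(tokens)
print(bag_of_words.most_common(1))
print(bigrams[:2])
```

The bag of words cannot tell "cat sat on mat" from "mat sat on cat", while the bigrams retain pairs such as `("the", "cat")`.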

---
<a name='3.0'></a><a id='3.0'></a>
# Chapter 3: Math with words 
## TF-IDF vectors
<a href="#top">[back to top]</a>

## Overview
1. Counting words and term frequencies to analyze meaning.
2. Predicting word occurrence probabilities with Zipf's Law.
3. Vector representation of words and how to start using them.
4. Finding relevant documents from a corpus using inverse document frequencies.
5. Estimating the similarity of pairs of documents with cosine similarity and Okapi BM25.

## Summary
1. Any web-scale search engine with millisecond response times has the power of a TF-IDF term document matrix hidden under the hood.
2. Term frequencies must be weighted by their inverse document frequency to ensure the most important, most meaningful words are given the heft they deserve.
3. Zipf's Law can help you predict the frequencies of all sorts of things, including words, characters and people.
4. The rows of a TF-IDF term document matrix can be used as a vector representation of the meanings of those individual words to create a vector space model of word semantics.
5. Euclidean distance and similarity between pairs of high-dimensional vectors don't adequately represent their similarity for most NLP applications.
6. Cosine similarity, the amount of "overlap" between vectors, can be calculated efficiently by multiplying the elements of normalized vectors together and summing those products.
7. Cosine similarity is the go-to similarity score for most natural language vector representations.
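
The TF-IDF weighting and the cosine calculation described above can be sketched in a few lines of plain Python. The three-document corpus is made up for illustration, and the weighting used (term frequency times the log of inverse document frequency) is one common variant among several, not necessarily the exact formula the book uses.

```python
import math
from collections import Counter

# Toy corpus, invented for this example.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "stock prices fell sharply",
]
tokenized = [d.split() for d in docs]
vocab = sorted({tok for doc in tokenized for tok in doc})


def idf(term):
    """Inverse document frequency: log(number of docs / docs containing term)."""
    df = sum(1 for doc in tokenized if term in doc)
    return math.log(len(tokenized) / df)


def tfidf_vector(doc):
    """Return one TF-IDF weight per vocabulary term for a tokenized document."""
    counts = Counter(doc)
    return [counts[t] / len(doc) * idf(t) for t in vocab]


def cosine_similarity(a, b):
    """Dot product of the two vectors divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vectors = [tfidf_vector(doc) for doc in tokenized]
sim_01 = cosine_similarity(vectors[0], vectors[1])  # docs share several words
sim_02 = cosine_similarity(vectors[0], vectors[2])  # docs share no words
print(sim_01 > sim_02)
```

Documents with no terms in common score exactly zero, which is the limitation that the topic vectors of the next chapter address.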

---
<a name='4.0'></a><a id='4.0'></a>
# Chapter 4: Finding meaning in word counts
## Semantic analysis
<a href="#top">[back to top]</a>

## Overview
1. Analyzing semantics (meaning) to create topic vectors.
2. Semantic search using the similarity between topic vectors.
3. Scalable semantic analysis and semantic search for large corpora.
4. Using semantic components (topics) as features in your NLP pipeline.
5. Navigating high-dimensional vector spaces.

## Summary
1. You can use SVD for semantic analysis to decompose and transform TF-IDF and BOW vectors into topic vectors.
2. Use LDiA when you need to compute explainable topic vectors.
3. No matter how you create your topic vectors, they can be used for semantic search to find documents based on their meaning.
4. Topic vectors can be used to predict whether a social post is spam or is likely to be "liked."
5. Now you know how to sidestep the curse of dimensionality to find approximate nearest neighbors in your semantic vector space.
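
The SVD step in the first Summary point can be written out as follows. The notation is an assumption made for this sketch: $W$ is an $m \times n$ TF-IDF or BOW matrix with one row per term and one column per document, $W = U S V^T$ is its SVD, and $k$ is the number of topics kept.

$$
W \approx U_k S_k V_k^T, \qquad U_k \in \mathbb{R}^{m \times k},\quad S_k \in \mathbb{R}^{k \times k},\quad V_k \in \mathbb{R}^{n \times k}
$$

$$
T = U_k^T W = S_k V_k^T
$$

Each column of $T$ is the $k$-dimensional topic vector of one document, so comparing documents means comparing columns of $T$ rather than the much longer columns of $W$.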

---
<a name='5.0'></a><a id='5.0'></a>
# Chapter 5: Baby steps with neural networks
## Perceptrons and backpropagation
<a href="#top">[back to top]</a>

## Overview

1. Learning the history of neural networks.
2. Stacking perceptrons.
3. Understanding backpropagation.
4. Seeing the knobs to turn on neural networks.
5. Implementing a basic neural network in Keras.

## Summary
1. Minimizing a cost function is a path toward learning.
2. A backpropagation algorithm is the means by which a network learns.
3. The amount a weight contributes to a model's error is directly related to the amount it needs to be updated.
4. Neural networks are, at their heart, optimization engines.
5. Watch out for pitfalls (local minima) during training by monitoring the gradual reduction in error.
6. Keras helps make all of this neural network math accessible.
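
The idea that a weight is updated in proportion to its contribution to the error is easiest to see in a single perceptron. The sketch below trains one neuron on logical OR with the classic perceptron learning rule; it is a simplified relative of backpropagation, not backpropagation itself, and the learning rate and epoch limit are values assumed for illustration.

```python
# Training data for logical OR: (inputs, expected output).
samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

weights = [0.0, 0.0]
bias = 0.0
learning_rate = 0.1  # assumed value, chosen for illustration


def predict(x):
    """Step activation: fire (1) if the weighted sum plus bias is positive."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


for epoch in range(20):
    errors = 0
    for x, target in samples:
        error = target - predict(x)
        if error:
            errors += 1
            # Each weight moves in proportion to the error and to its own input.
            for i, xi in enumerate(x):
                weights[i] += learning_rate * error * xi
            bias += learning_rate * error
    if errors == 0:  # stop once every sample is classified correctly
        break

predictions = [predict(x) for x, _ in samples]
print(predictions)
```

Backpropagation extends the same idea to multiple layers by using the chain rule to pass each layer's share of the error backward.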

---
<a name='6.0'></a><a id='6.0'></a>
# Chapter 6: Reasoning with word vectors
## Word2vec
<a href="#top">[back to top]</a>

## Overview
1. Understanding how word vectors are created.
2. Using pretrained models for your applications. 
3. Reasoning with word vectors to solve real problems.
4. Visualizing word vectors.
5. Uncovering some surprising uses for word embeddings.

## Summary
1. Learn how word vectors and vector-oriented reasoning can solve some surprisingly subtle problems, like analogy questions and nonsynonymy relationships between words.
2. We can now train Word2vec and other word vector embeddings on the words we use in applications so the NLP pipeline is not "polluted" by the GoogleNews meaning of words inherent in most Word2vec pretrained models.
3. We can use gensim to explore, visualize and even build our own word vector vocabularies.
4. A PCA projection of geographic word vectors like US city names can reveal the cultural closeness of places (as expressed in literature) that are geographically far apart.
5. If you respect sentence boundaries with n-grams and are efficient at setting up word pairs for training, you can greatly improve the accuracy of your latent semantic analysis word embeddings.



---
<a name='7.0'></a><a id='7.0'></a>
# Chapter 7: Getting words in order with convolutional neural networks
## CNNs
<a href="#top">[back to top]</a>

## Overview
1. Using neural networks for NLP.
2. Finding meaning in word patterns.
3. Building a CNN.
4. Vectorizing natural language text in a way that suits neural networks.
5. Training a CNN.
6. Classifying the sentiment of novel text.

## Summary
1. A convolution is a window sliding over something larger (keeping the focus on a subset of the greater whole).
2. Neural networks can treat text just as they treat images and "see" them.
3. Handicapping the learning process with dropout actually helps.
4. Sentiment exists not only in the words but in the patterns that are used.
5. Neural networks have many knobs you can turn.
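
The "window sliding over something larger" in the first Summary point can be sketched with a one-dimensional example. In an NLP model each position would hold a word vector and the filter weights would be learned; here the inputs are plain numbers and the filter is fixed, both assumptions made to keep the example small. As in most deep learning libraries, the operation shown is technically a cross-correlation (the filter is not flipped).

```python
def convolve1d(sequence, kernel):
    """Slide kernel over sequence (stride 1, no padding); return the dot products."""
    width = len(kernel)
    return [
        sum(k * s for k, s in zip(kernel, sequence[i:i + width]))
        for i in range(len(sequence) - width + 1)
    ]


signal = [1, 2, 3, 4, 5]
kernel = [1, 0, -1]  # responds to the difference between a window's two ends
feature_map = convolve1d(signal, kernel)
print(feature_map)  # [-2, -2, -2]
```

A five-element input and a three-element filter give three window positions, so the output is shorter than the input unless padding is added.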

---
<a name='8.0'></a><a id='8.0'></a>
# Chapter 8: Loopy neural networks
## RNNs
<a href="#top">[back to top]</a>

## Overview
1. Creating memory in a neural net.
2. Building a recurrent neural net.
3. Data handling for RNNs.
4. Backpropagating through time (BPTT).

## Summary
1. In natural language sequences (words or characters), what came before is important to your model's understanding of the sequence.
2. Splitting a natural language statement along the dimensions of time (tokens) can help your machine deepen its understanding of natural language.
3. You can backpropagate errors in time (tokens), as well as in the layers of a deep learning network.
4. Because RNNs are particularly deep neural nets, RNN gradients are particularly temperamental, and they may disappear or explode.
5. Efficiently modeling natural language character sequences wasn't possible until recurrent neural nets were applied to the task.
6. Weights in an RNN are adjusted in aggregate across time for a given sample.
7. You can use different methods to examine the output of recurrent neural nets.
8. You can model the natural language sequence in a document by passing the sequence of tokens through an RNN backward and forward simultaneously.

---
<a name='9.0'></a><a id='9.0'></a>
# Chapter 9: Improving retention with long short-term memory networks
## LSTM
<a href="#top">[back to top]</a>

## Overview
1. Adding deeper memory to recurrent neural nets.
2. Gating information inside neural nets.
3. Classifying and generating text.
4. Modeling language patterns.

## Summary
1. Remembering information with memory units enables more accurate and general models of the sequence.
2. It's important to forget information that is no longer relevant.
3. Only some new information needs to be retained for the upcoming input, and LSTMs can be trained to find it.
4. If you can predict what comes next, you can generate novel text from probabilities.
5. Character-based models can more efficiently and successfully learn from small, focussed corpora than word-based models (GB: EXPLORE THIS AGAIN!)
6. LSTM thought vectors capture much more than just the sum of the words in a statement.






---
<a name='10.0'></a><a id='10.0'></a>
# Chapter 10: Sequence-to-sequence models and attention
<a href="#top">[back to top]</a>

## Overview
1. Mapping one text sequence to another with a neural network.
2. Understanding how sequence-to-sequence tasks differ.
3. Using encoder-decoder model architectures for translation and chat.
4. Training a model to pay attention to what is important in a sequence.

## Summary
1. Sequence-to-sequence networks can be built with a modular, reusable encoder-decoder architecture.
2. The encoder model generates a thought vector, a dense fixed-dimension vector representation of the information in a variable-length input sequence.
3. A decoder can use thought vectors to predict (generate) output sequences, including the replies of a chatbot.
4. Due to the thought vector representation, the input and output sequence lengths don't have to match.
5. Thought vectors can only hold a limited amount of information. If you need a thought vector to encode more complex concepts, the attention mechanism can help selectively encode what is important in the thought vector.



---
<a name='11.0'></a><a id='11.0'></a>
# Chapter 11: Information extraction

## Named entity extraction and question answering
<a href="#top">[back to top]</a>

## Overview
1. Sentence segmentation
2. Named entity recognition (NER)
3. Numerical information extraction
4. Part-of-speech (POS) tagging and dependency tree parsing
5. Logical relation extraction and knowledge bases

## Summary
1. A knowledge graph can be built to store relationships between entities.
2. Regular expressions are a mini-programming language that can isolate and extract information.
3. Part-of-speech tagging allows you to extract relationships between entities mentioned in a sentence.
4. Segmenting sentences requires more than just splitting on periods and exclamation marks.
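
The last Summary point is easy to demonstrate. The sketch below compares a naive split on punctuation with a slightly more careful rule that skips a short abbreviation list. The sample text, the `ABBREVIATIONS` set, and the `split_sentences` helper are all invented for this example, and the rule is far from complete.

```python
import re

text = "Dr. Smith paid $3.50 for coffee. She left at 5 p.m. sharp! Was it worth it?"

# Naive approach: break on every period, exclamation mark, or question mark.
naive = [s for s in re.split(r"[.!?]", text) if s.strip()]

# Hypothetical, deliberately tiny abbreviation list.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "a.m.", "p.m."}


def split_sentences(text):
    """End a sentence only at a word that ends in .!? and is not an abbreviation."""
    sentences, current = [], []
    for word in text.split():
        current.append(word)
        if word[-1] in ".!?" and word.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


sentences = split_sentences(text)
print(len(naive), len(sentences))
```

The naive split breaks inside "Dr.", "$3.50", and "p.m.", while the abbreviation-aware version recovers the three sentences. It would still fail when a sentence ends with an abbreviation.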

---
<a name='12.0'></a><a id='12.0'></a>
# Chapter 12: Getting chatty

## Dialog engines
<a href="#top">[back to top]</a>

## Overview
1. Understanding four chatbot approaches.
2. Finding out what Artificial Intelligence Markup Language is all about.
3. Understanding the difference between chatbot pipelines and other NLP pipelines.
4. Learning about a hybrid chatbot architecture that combines the best ideas into one.
5. Using machine learning to make your chatbot get smarter over time.
6. Giving your chatbot agency - enabling it to spontaneously say what's on its mind.

## Summary
1. By combining multiple proven approaches, you can build an intelligent dialog engine.
2. Breaking "ties" between the replies generated by the four main chatbot approaches is one key to intelligence.
3. You can teach machines a lifetime of knowledge without spending a lifetime programming them.

---
<a name='13.0'></a><a id='13.0'></a>
# Chapter 13: Scaling up

## Optimization, parallelization, and batch processing
<a href="#top">[back to top]</a>

## Overview
1. Scaling up an NLP pipeline.
2. Speeding up search with indexing.
3. Batch processing to reduce your memory footprint.
4. Parallelization to speed up NLP.
5. Running NLP model training on a GPU.

## Summary
1. Locality-sensitive hashes like `Annoy` make the promise of latent semantic indexing a reality.
2. GPUs speed up model training, which shortens the turnaround time on each experiment and lets you iterate on models faster.
3. CPU parallelization can make sense for algorithms that don't benefit from speedier multiplication of large matrices.
4. You can bypass the system RAM bottleneck using Python's generators, saving you money on your GPU and CPU instances.
5. Google's TensorBoard can help you visualize and extract natural language embeddings that you might not have thought of otherwise.
6. Mastering NLP parallelization can expand your brainpower by giving you a society of minds - machine clusters to help you think.
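
The generator point in the Summary can be sketched as follows. Generators yield one item at a time, so only the current batch has to sit in memory. The helper names `read_lines` and `batches` are hypothetical, and an in-memory `io.StringIO` stands in for a large corpus file.

```python
import io


def read_lines(lines):
    """Yield one stripped, non-empty line at a time instead of building a list."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


def batches(items, batch_size):
    """Lazily group any iterable into lists of at most batch_size items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller batch
        yield batch


# Stand-in for a file handle opened on a corpus too large to load at once.
corpus = io.StringIO("first doc\nsecond doc\n\nthird doc\n")
result = list(batches(read_lines(corpus), batch_size=2))
print(result)
```

In a real pipeline each batch would be consumed and discarded in a loop rather than collected with `list()`, which is used here only to display the output.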