# Off-the-Shelf Tools for Deep Learning, NLP, and Other Fun Buzzwords
## Jeff Jacobs, Sept. 27, 2019
![bert](./bert.png)

## The Tools

### Gensim (https://radimrehurek.com/gensim/)

Originally a topic modelling library, BUT also really good for word embedding stuff (the main library I use)

### spaCy (https://spacy.io/)

Better than Gensim (imo) for "standard" NLP tasks: Part-of-Speech Tagging, Dependency Parsing, Named Entity Recognition

### scikit-learn (https://scikit-learn.org/stable/)

General machine learning library (so, can be used for any type of data: text, images, video, audio, etc.)

### AllenNLP (https://allennlp.org/)

An NLP library built on top of PyTorch, a general deep learning library (PyTorch's only real competition is Google's TensorFlow). Will (should) obviate all of the above in a few years.

## Terminology

### Buzzwords

* **Artificial Intelligence**: Figuring out how to do human things with computers
* **Machine Learning**: A set of approaches/algorithms which aim to find (potentially complex) patterns in data
* **Supervised Machine Learning**: Trying to find patterns in input data $X$ which do a good job at predicting output data $Y$. Typically "trained" on 80% of the full dataset and evaluated (tested) on the remaining 20%. In NLP, document classification is the most prominent example.
* **Unsupervised Machine Learning**: Trying to find patterns in input data $X$, full stop. For example, finding clusters of data points. In NLP, topic modelling is the most prominent example.
* **Neural Network**: A machine learning algorithm which learns a mapping between input and output via a series of "layers" (matrix multiplications of inputs with a weight matrix to produce outputs) connected non-linearly in a network
* **Deep Learning**: Machine learning with a neural network...
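
To make the "layers" idea concrete, here is a minimal sketch of a forward pass through a tiny network in plain Python. The weights, the layer sizes, and the choice of `tanh` as the non-linearity are all made up for illustration; a real network would learn its weights from data.

```python
import math

def matvec(weights, vector):
    """Multiply a weight matrix (a list of rows) by an input vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]

def forward(x, w_hidden, w_output):
    """One hidden layer with a tanh non-linearity, then a linear output layer."""
    hidden = [math.tanh(h) for h in matvec(w_hidden, x)]
    return matvec(w_output, hidden)

# Hypothetical toy weights: 3 inputs -> 2 hidden units -> 1 output
W_HIDDEN = [[0.5, -0.2, 0.1],
            [0.3, 0.8, -0.5]]
W_OUTPUT = [[1.0, -1.0]]

output = forward([1.0, 0.0, 2.0], W_HIDDEN, W_OUTPUT)
print(output)
```

Without the `tanh` step, the two matrix multiplications would collapse into a single one, which is why the non-linearity matters.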


### Tasks

* I want to label a set of documents: *Document Classification*
* I want to find people/places/events/things mentioned in a set of documents: *Named Entity Recognition*
* I want to get a sense of whether a set of documents is talking about a person/place/event/thing in a positive or negative light: *Sentiment Analysis*
* I want to understand how discourse regarding a subject(s) changes over time: *Diachronic Word Embeddings*
* Other buzzwords: *Language Modeling* (e.g., Text Generation), *Sequence-to-Sequence Learning* (e.g., Translation), *End-to-End Models* (e.g., Image Captioning), *Transfer Learning*: learn on domain $X$, apply knowledge to domain $Y$ (e.g., learn Van Gogh's artistic style, then paint this pic of my house in the style of Van Gogh)

## This Talk

*Document Classification*. BUT, the real moral is that the models discussed here are specifically intended to encode linguistic knowledge that will be helpful for *ANY* text-analytic task.

## History of Text Analysis in One Slide

1. The olden days: *Feature Engineering*
2. The enlightenment: *Automagically-Learned Features*

## "Deep Learning"? "Neural Network"? "Word Embeddings"?

### 3 birds with one stone: let's learn about Word2Vec



## Pavlov's Robot

"The camera lens aperture is too small."

| Target | Highlighted | Context |
| ------------- | ------------- | ------------- |
| the | [[(_The_) **camera lens]]** aperture is too small. | {camera, lens} |
| camera | **[[The** (_camera_) **lens aperture]]** is too small. | {the, lens, aperture} |
| lens | **[[The camera** (_lens_) **aperture is]]** too small. | {the, camera, aperture, is} |
| aperture | The **[[camera lens** (_aperture_) **is too]]** small. | {camera, lens, is, too} |
| is | The camera **[[lens aperture** (_is_) **too small]]**. | {lens, aperture, too, small} |
| too | The camera lens **[[aperture is** (_too_) **small]]**. | {aperture, is, small} |
| small | The camera lens aperture **[[is too** (_small_)]]. | {is, too} |
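
The target/context pairs in the table can be generated with a few lines of Python. This is a minimal sketch, assuming a window of two words on each side and a simple lowercase regex tokenizer; the function name `context_windows` is hypothetical, not part of Gensim.

```python
import re

def context_windows(sentence, window=2):
    """Return (target, context) pairs, taking up to `window` words on each side."""
    tokens = re.findall(r"[a-z0-9']+", sentence.lower())
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]    # words before the target
        right = tokens[i + 1:i + 1 + window]   # words after the target
        pairs.append((target, left + right))
    return pairs

pairs = context_windows("The camera lens aperture is too small.")
for target, context in pairs:
    print(target, context)
```

Words near the start or end of the sentence simply get a shorter context, which matches the first and last rows of the table.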

![w2v](w2v_modified.jpg)

And so you'll have two vectors:
$$
predicted(\texttt{camera}) = \begin{pmatrix}P(\texttt{ant}) = 0.1 \\ P(\texttt{aperture}) = 0.1 \\ P(\texttt{barber}) = 0.003 \\ \vdots \\ P(\texttt{zoo}) = 0.05\end{pmatrix}, \; actual(\texttt{camera}) = \begin{pmatrix}P(\texttt{ant}) = 0 \\ P(\texttt{aperture}) = 0.333 \\ P(\texttt{barber}) = 0 \\ \vdots \\ P(\texttt{zoo}) = 0\end{pmatrix}
$$

How different are they?
$$
\mathcal{L}(predicted, actual)
$$
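In this case, Cross-Entropy Loss (where $N$ is the number of words in the vocabulary):
$$
\mathcal{L}(predicted, actual) = -\sum_{i=1}^N actual_i \log(predicted_i)
$$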
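
As a rough sketch, cross-entropy between a predicted and an actual distribution is the negative sum, over vocabulary words, of the actual probability times the log of the predicted probability. The five-word vocabulary and the predicted probabilities below are invented for illustration; only the 1/3 weights on the three context words of "camera" come from the example sentence.

```python
import math

def cross_entropy(predicted, actual):
    """-sum_i actual_i * log(predicted_i); words with actual_i == 0 contribute nothing."""
    return -sum(a * math.log(predicted[word])
                for word, a in actual.items() if a > 0)

# Hypothetical toy vocabulary; the predicted probabilities sum to 1
predicted = {"the": 0.2, "lens": 0.3, "aperture": 0.1, "ant": 0.1, "zoo": 0.3}
# "camera" has three context words, so each gets probability 1/3
actual = {"the": 1 / 3, "lens": 1 / 3, "aperture": 1 / 3, "ant": 0.0, "zoo": 0.0}

loss = cross_entropy(predicted, actual)
best_possible = cross_entropy(actual, actual)  # predicting the actual distribution exactly
print(loss, best_possible)
```

The loss shrinks as the predicted distribution moves toward the actual one, which is the signal training uses to adjust the weights.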

| Actual \ Predicted | Worker | Firm | Other |
| ------------- | ------------- | ------------- | ------------- |
| Worker | 4 | 2 | 24 |
| Firm | 4 | 5 | 3 |
| Other | 0 | 0 | 23 |
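
Reading the Worker/Firm/Other matrix above as a confusion matrix (rows are actual labels, columns are predicted labels), the usual summary numbers can be computed directly from the counts. This is a small sketch in plain Python; the helper names are hypothetical.

```python
LABELS = ["Worker", "Firm", "Other"]
# Rows are actual labels, columns are predicted labels
CONFUSION = [
    [4, 2, 24],
    [4, 5, 3],
    [0, 0, 23],
]

def accuracy(matrix):
    """Share of all documents whose predicted label equals the actual label."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def recall(matrix, i):
    """Of the documents that actually have label i, the share predicted as i."""
    return matrix[i][i] / sum(matrix[i])

def precision(matrix, i):
    """Of the documents predicted as label i, the share that actually have label i."""
    return matrix[i][i] / sum(row[i] for row in matrix)

print("accuracy:", accuracy(CONFUSION))
for i, label in enumerate(LABELS):
    print(label, "precision:", precision(CONFUSION, i), "recall:", recall(CONFUSION, i))
```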

$$
\bordermatrix{
           & f(e_1)  & \dots & f(e_j)  & \dots  & f(e_p)  \cr
    f_1    & a_{1,1} &       & a_{1,j} & \dots  & a_{1,p} \cr
    f_2    & a_{2,1} &       & a_{2,j} & \dots  & a_{2,p} \cr
    \vdots & \vdots  &       & \vdots  & \ddots & \vdots  \cr
    f_n    & a_{n,1} & \dots & a_{n,j} & \dots  & a_{n,p} \cr
  }
$$

## Gensim

In [101]:
import gensim

INFO:gensim.summarization.textcleaner:'pattern' package not found; tag filters are not available for English


### Using a pre-trained word embedding space

### Training your own word embedding space

In [103]:
from gensim.models.fasttext import FastText
import logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

In [104]:
blum_model = FastText(size=100)  # 100-dimensional vectors; gensim 4.0+ renames `size` to `vector_size`

INFO:gensim.models.word2vec:resetting layer weights


In [None]:
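blum_file = "killing_hope.txt"  # plain-text corpus, one sentence per line
sentence_iter = gensim.models.word2vec.LineSentence(source=blum_file)  # streams whitespace-split tokens, one line at a time
blum_model.build_vocab(sentence_iter)  # first pass over the corpus: collect the vocabulary before training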
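
Why an iterable object rather than a plain list or generator? Building the vocabulary and training each need their own pass over the corpus, so the corpus object has to be restartable. The class below is a simplified, hypothetical stand-in for what `LineSentence` does (the real one has extra options, such as a maximum sentence length); the temporary file is only there to make the sketch self-contained.

```python
import os
import tempfile

class SimpleLineSentence:
    """Restartable iterable: each pass re-opens the file and yields one token list per line."""

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        with open(self.path, encoding="utf-8") as handle:
            for line in handle:
                tokens = line.split()  # words are assumed to be whitespace-separated
                if tokens:             # skip blank lines
                    yield tokens

# Write a tiny two-sentence corpus to a temporary file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("the camera lens aperture is too small\n\nthe lens is fine\n")
    corpus_path = tmp.name

corpus = SimpleLineSentence(corpus_path)
first_pass = list(corpus)   # e.g. the vocabulary-building pass
second_pass = list(corpus)  # e.g. a training pass; works because __iter__ re-opens the file
os.remove(corpus_path)
print(first_pass)
```

A bare generator would be exhausted after the first pass, leaving nothing for training.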