Sentiment analysis
    - Represent words as "features" rather than as words themselves
    - Each feature is a property such as "sentiment", "positive sentiment", or "something related to eating food", and every word gets a score for each feature
        - "horrible" scores 0.97 for sentiment
        - "great" scores 0.98 for positive sentiment
        - "food" and "restaurant" score almost 1 for "something related to eating food"
    - This defines words by their semantic and syntactic properties
    - These feature values are essentially what the model learns
    - Stacking the feature vectors of every word in the vocabulary gives the *embedding matrix*
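The feature idea above can be sketched as a tiny hand-built matrix. Only the scores quoted in the notes are real inputs. Every other number, plus the names `word_features`, `embedding_matrix` and `vector`, is made up for illustration. A trained model learns these values instead of having them typed in.

```python
import numpy as np

# Illustrative feature scores. Only horrible/sentiment (0.97), great/positive
# sentiment (0.98) and food, restaurant ~1.0 on "related to eating food" come
# from the notes; every other number is made up.
features = ["sentiment", "positive sentiment", "related to eating food"]
word_features = {
    "horrible":   [0.97, 0.01, 0.00],
    "great":      [0.95, 0.98, 0.00],
    "food":       [0.05, 0.10, 0.99],
    "restaurant": [0.03, 0.10, 0.98],
}

vocab = list(word_features)  # row order of the matrix
# One row per word, one column per feature: shape (vocab size, n features)
embedding_matrix = np.array([word_features[w] for w in vocab])

def vector(word):
    """Return a word's feature vector, i.e. its row in the matrix."""
    return embedding_matrix[vocab.index(word)]

print(embedding_matrix.shape)  # (4, 3)
print(vector("great"))
```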
    
Embedding Matrix
    - Idea is to represent as much information as possible in numbers
    - The matrix can be very large (vocabulary size × embedding size); an embedding size of 20-300 is usually enough
        - Vocabulary can be massive
        - You choose the embedding size, i.e. how many features each word gets
    - "word features" = "distributed representation of words" = "word embeddings" = "word vectors"
    - Models learn:
        - Distributed representations of words
        - Probability function for word sequences
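The following sketch shows why the matrix is used as a lookup table and how its size scales. The sizes, the random values and the hypothetical 100,000-word vocabulary are assumptions for illustration only. Multiplying a one-hot word vector by the embedding matrix selects one row.

```python
import numpy as np

rng = np.random.default_rng(0)

V, d = 10, 4                 # toy vocabulary size and embedding size (hypothetical)
E = rng.normal(size=(V, d))  # embedding matrix: one row of d features per word

word_id = 7                  # index of some word in the vocabulary
one_hot = np.zeros(V)
one_hot[word_id] = 1.0

# A one-hot vector times E picks out row `word_id`, so an embedding
# "layer" can be implemented as a simple row lookup.
assert np.allclose(one_hot @ E, E[word_id])

# Size grows with the vocabulary: a hypothetical 100,000-word vocabulary
# with 300-dimensional embeddings needs 30 million parameters.
n_params = 100_000 * 300
print(n_params)  # 30000000
```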

Optimizations:
        
word2vec
    - Open-source toolkit that made training word embeddings on large corpora dramatically faster
    - Its success helped push NLP toward neural (deep learning) methods
    
skip-gram model
    - Based on a maximum entropy (softmax) model
    - Like a logistic regression for more than two classes
    - Given a center word, predicts the words in a small window around it, e.g. the probability that "authentic" appears near "wonderful" in a sentence
    - Very computationally expensive because the softmax normalizes over EVERY word in the vocabulary for each prediction
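Here is a minimal sketch of the skip-gram setup: how (center, context) training pairs are drawn from a window, and how the full softmax scores every vocabulary word to produce one probability. The two-matrix layout (separate center and context embeddings) follows the usual word2vec formulation. The names `W_in`, `W_out`, `skipgram_pairs` and `context_probs`, the toy vocabulary and the untrained random values are all hypothetical.

```python
import numpy as np

def skipgram_pairs(tokens, window=2):
    """(center, context) training pairs: every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

rng = np.random.default_rng(0)
vocab = ["wonderful", "authentic", "food", "the", "service"]  # toy vocabulary
V, d = len(vocab), 8
W_in = rng.normal(scale=0.1, size=(V, d))   # center-word (input) embeddings
W_out = rng.normal(scale=0.1, size=(V, d))  # context-word (output) embeddings

def softmax(z):
    e = np.exp(z - z.max())  # shift by the max for numerical stability
    return e / e.sum()

def context_probs(center):
    """P(context | center) for every vocabulary word: V dot products + softmax."""
    scores = W_out @ W_in[vocab.index(center)]  # one score per vocabulary word
    return softmax(scores)

print(skipgram_pairs(["wonderful", "authentic", "food"], window=1))
p = context_probs("wonderful")[vocab.index("authentic")]
print(p)
```

Even this single probability requires scoring all `V` words. With a real vocabulary of hundreds of thousands of words, that cost is what the optimizations below attack.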

Hierarchical Softmax
    - Addresses the cost of the full softmax in the skip-gram model
    - Arranges the vocabulary as leaves of a binary tree, so each probability evaluates about log2(# of words) nodes instead of every word
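The following sketch shows hierarchical softmax on a balanced tree. A word's probability is the product of left/right sigmoid decisions along its root-to-leaf path. This is a simplification: word2vec itself uses a Huffman tree so that frequent words get shorter paths. The heap-style node layout and all names and sizes are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 8, 5  # toy sizes; V is a power of two so the tree is perfectly balanced

v_c = rng.normal(size=d)             # embedding of the center word
node_vecs = rng.normal(size=(V, d))  # rows 1..V-1 are internal nodes (row 0 unused)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hs_prob(word_id):
    """P(word | center) as a product of binary left/right decisions.

    Assumed heap layout: root is node 1, node n has children 2n and 2n+1,
    and word w is the leaf at node V + w. Returns (probability, nodes visited).
    """
    n = V + word_id
    prob, steps = 1.0, 0
    while n > 1:
        parent, went_right = n // 2, n % 2
        s = sigmoid(node_vecs[parent] @ v_c)  # P(go right at this node)
        prob *= s if went_right else 1.0 - s
        steps += 1
        n = parent
    return prob, steps

probs = [hs_prob(w)[0] for w in range(V)]
print(sum(probs))      # sums to 1 (up to rounding) without a V-way softmax
print(hs_prob(3)[1])   # 3 == log2(8) internal nodes evaluated
```

The probabilities sum to 1 because, at every internal node, the "left" and "right" probabilities add up to the probability of reaching that node.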
    
Further optimizations
    - Negative sampling: binary classification of whether a (word, context) pair really occurred or is a randomly sampled "negative" pair
    - Subsampling: randomly drop some occurrences of very frequent words like "is", "the", and "a"
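Both tricks are sketched below under stated assumptions. The negative-sampling loss scores one real pair plus `k` noise words drawn from the unigram distribution raised to the 3/4 power, instead of normalizing over the whole vocabulary. The subsampling rule uses the discard formula from the word2vec paper, `1 - sqrt(t / f(w))`, where `t` is a small threshold (1e-5 here). The word counts, sizes and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, k = 1000, 50, 5  # toy vocabulary size, embedding size, negatives per pair

W_in = rng.normal(scale=0.1, size=(V, d))   # center-word embeddings
W_out = rng.normal(scale=0.1, size=(V, d))  # context-word embeddings

# Hypothetical word counts; negatives are drawn from unigram^0.75.
counts = rng.integers(1, 10_000, size=V).astype(float)
noise = counts ** 0.75
noise /= noise.sum()

def log_sigmoid(x):
    return -np.logaddexp(0.0, -x)  # log(1 / (1 + e^-x)), numerically stable

def neg_sampling_loss(center, context):
    """Binary logistic loss: the real pair vs. k random 'negative' words.
    Costs O(k * d) instead of O(V * d) for the full softmax."""
    v_c = W_in[center]
    negatives = rng.choice(V, size=k, p=noise)
    pos = log_sigmoid(W_out[context] @ v_c)             # push real pair toward 1
    neg = log_sigmoid(-(W_out[negatives] @ v_c)).sum()  # push noise pairs toward 0
    return -(pos + neg)

def discard_prob(freq, t=1e-5):
    """Subsampling: chance of dropping one occurrence of a word whose
    relative frequency in the corpus is `freq`."""
    return max(0.0, 1.0 - np.sqrt(t / freq))

print(neg_sampling_loss(center=3, context=17))
print(discard_prob(0.05))  # a word making up 5% of tokens: dropped ~98.6% of the time
print(discard_prob(1e-6))  # a rare word: never dropped
```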
    
All of these are ways to train your word embedding matrix efficiently!

Word vector math
    - Cosine similarity between word vectors measures how similar two words are
    - "King" and "Queen" are opposite on the "gender" dimension but essentially the same on "royal"
        - "Man is to woman as King is to Queen"
        - "E_Queen ≈ E_King - (E_Man - E_Woman)"
    - Analogies like this are a way to evaluate word embeddings
        - Embeddings also learn syntactic variation (e.g. "walk is to walked as swim is to swam")
        - For this reason, syntactic parsers are used much less than they used to be
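The following sketch shows cosine similarity and the analogy test. It solves "man is to woman as king is to ?" by finding the word closest to E_King - E_Man + E_Woman. The 3-d vectors and their (gender, royal, food) labels are invented so the arithmetic is easy to follow; learned embeddings have unlabelled dimensions. Excluding the query words from the candidates is a common convention in analogy evaluation.

```python
import numpy as np

# Made-up 3-d vectors whose dimensions are (gender, royal, food); real learned
# embeddings have many more dimensions and none of them come labelled.
E = {
    "man":   np.array([-1.0, 0.0, 0.0]),
    "woman": np.array([ 1.0, 0.0, 0.0]),
    "king":  np.array([-1.0, 2.0, 0.0]),
    "queen": np.array([ 1.0, 2.0, 0.0]),
    "apple": np.array([ 0.0, 0.0, 1.0]),
}

def cosine(a, b):
    """Cosine similarity: 1 = same direction, 0 = orthogonal, -1 = opposite."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c):
    """'a is to b as c is to ?': the word closest to E[c] - E[a] + E[b]."""
    target = E[c] - E[a] + E[b]
    candidates = [w for w in E if w not in (a, b, c)]  # skip the query words
    return max(candidates, key=lambda w: cosine(target, E[w]))

print(analogy("man", "woman", "king"))          # queen
print(round(cosine(E["king"], E["queen"]), 2))  # 0.6
```

"king" and "queen" still come out fairly similar (0.6) because they share the large "royal" component, even though they point in opposite directions on "gender".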

## Parallel processing (multiprocessing)

In [None]:
import multiprocessing
import datetime

def preprocessing(text):
    # Stand-in for expensive CPU-bound work: a busy loop, then return the text unchanged
    for i in range(100000):
        i * i
    return text

In [None]:
# Slow: a plain map runs sequentially on a single CPU core
texts = ['text']*100000

now = datetime.datetime.now()
result = list(map(preprocessing, texts))
print("Took %s" %(datetime.datetime.now() - now))

In [None]:
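# Much faster for CPU-bound work: split the texts across all CPU cores
count = multiprocessing.cpu_count()
print(count)

# The with-block shuts the worker processes down when done.
# With the "spawn" start method (default on Windows and macOS), `preprocessing`
# must be importable by the workers, e.g. defined in a .py module, not a notebook cell.
with multiprocessing.Pool(count) as pool:
    now = datetime.datetime.now()
    result = pool.map(preprocessing, texts)  # map already returns a list
    print("Took %s" % (datetime.datetime.now() - now))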