## &nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; A Sentimental Journey
<br />
# Sentiment Analysis of Movie Reviews

# &nbsp;&nbsp; &nbsp; &nbsp; &nbsp;Why sentiment analysis?

Textual data doesn't always come categorized or labeled, for example:

- tweets
- blog posts
- news articles

# Movie reviews - <a href="http://www.imdb.com">www.imdb.com</a>
<br />
<img src="review.png"> 

# The data

- 25,000 labeled training reviews plus 25,000 labeled test reviews, each split evenly into positive and negative
- download from http://ai.stanford.edu/~amaas/data/sentiment/
- used in: Maas et al. (2011). Learning Word Vectors for Sentiment Analysis (http://www.aclweb.org/anthology/P11-1015)
- preprocessed as in https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/doc2vec-IMDB.ipynb

## Load preprocessed data




In [33]:
import io
import pandas as pd
import numpy as np

# one preprocessed review per line; negative reviews go first, then positive ones
with io.open('data/aclImdb/train-pos.txt', encoding='utf-8') as f:
    train_pos = pd.DataFrame({'review': list(f)})
with io.open('data/aclImdb/train-neg.txt', encoding='utf-8') as f:
    train_neg = pd.DataFrame({'review': list(f)})
train_reviews = pd.concat([train_neg, train_pos], ignore_index=True)

with io.open('data/aclImdb/test-pos.txt', encoding='utf-8') as f:
    test_pos = pd.DataFrame({'review': list(f)})
with io.open('data/aclImdb/test-neg.txt', encoding='utf-8') as f:
    test_neg = pd.DataFrame({'review': list(f)})
test_reviews = pd.concat([test_neg, test_pos], ignore_index=True)

X_train = train_reviews['review']
X_test = test_reviews['review']

# labels follow the concatenation order: 0 = negative, 1 = positive
# (12,500 of each in both splits)
y_train = np.append(np.zeros(len(train_neg)), np.ones(len(train_pos)))
y_test = np.append(np.zeros(len(test_neg)), np.ones(len(test_pos)))

# First review




In [34]:
X_train[0]  

u"a reasonable effort is summary for this film .  a good sixties film but lacking any sense of achievement .  maggie smith gave a decent performance which was believable enough but not as good as she could have given ,  other actors were just dreadful !  a terrible portrayal .  it wasn't very funny and so it didn't really achieve its genres as it wasn't particularly funny and it wasn't dramatic .  the only genre achieved to a satisfactory level was romance .  target audiences were not hit and the movie sent out confusing messages .  a very basic plot and a very basic storyline were not pulled off or performed at all well and people were left confused as to why the film wasn't as good and who the target audiences were etc .  however maggie was quite good and the storyline was alright with moments of capability .   4 . \n"

# Good or bad?

What the annotators thought (0 = negative, 1 = positive)




In [35]:
y_train[0]

0.0

# A naive approach: Word counts 


# Word count in a nutshell

- sum positive words (weighted)
- sum negative words (weighted)
- highest score wins
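Here is a minimal sketch of the word-count idea in Python. The lexicon and its weights are hypothetical and made up for illustration. The last call shows the weakness discussed further down: negation is ignored.

```python
# Hypothetical hand-made lexicon: positive weights for positive words,
# negative weights for negative words (values are illustrative only)
LEXICON = {
    "excellent": 2.0, "good": 1.0, "funny": 1.0,
    "dreadful": -2.0, "terrible": -2.0, "confusing": -1.0,
}

def word_count_score(review):
    """Return 1 (positive) or 0 (negative) by comparing weighted word sums."""
    tokens = review.split()
    positive = sum(LEXICON[t] for t in tokens if LEXICON.get(t, 0) > 0)
    negative = sum(-LEXICON[t] for t in tokens if LEXICON.get(t, 0) < 0)
    # highest score wins; ties count as negative here (an arbitrary choice)
    return 1 if positive > negative else 0

print(word_count_score("a good and funny film"))             # 1
print(word_count_score("dreadful acting , confusing plot"))  # 0
print(word_count_score("it wasn't very funny"))              # 1: negation is ignored
```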

# No one's gonna sit there and categorize all the words.

# Need magic?



Not yet.

We have a training set where reviews have been labeled as good or bad:

<table border="1">
<tr>
<th></th><th><font color='blue'>sentiment</font></th><th>beautiful</th><th>bad</th><th>awful</th><th>decent</th><th>horrible</th><th>ok</th><th>awesome</th>
</tr>
<tr>
<th>review 1</th><td><font color='red'>0</font></td><td>0</td><td>1</td><td>2</td><td>1</td><td>1</td><td>0</td><td>0</td>
</tr>
<tr>
<th>review 2</th><td><font color='green'>1</font></td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td>
</tr>
<tr>
<th>review 3</th><td><font color='red'>0</font></td><td>0</td><td>0</td><td>0</td><td>1</td><td>1</td><td>0</td><td>0</td>
</tr>
</table>

# Classification

From this, we can algorithmically determine the words' polarities and weights.

<table>
<tr>
<th>word</th>
<td>beautiful</td><td>bad</td><td>awful</td><td>decent</td><td>horrible</td><td>ok</td><td>awesome</td>
</tr>
<tr>
<th>weight</th>
<td>3.4</td><td>-2.9</td><td>-5.6</td><td>-0.2</td><td>-4.9</td><td>-0.1</td><td>5.2</td>
</tr>
</table>
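The sketch below shows how such weights can be learned from a table like the one above. It fits an L2-regularized logistic regression to the three toy reviews using plain NumPy gradient descent. The learning rate, penalty strength and iteration count are arbitrary choices. With only three reviews the numbers will not match the illustrative weights above, but the signs come out as expected:

- words seen only in the positive review get positive weights;
- words seen only in negative reviews get negative weights;
- "ok", which never occurs, stays at 0.

```python
import numpy as np

words = ["beautiful", "bad", "awful", "decent", "horrible", "ok", "awesome"]
# word counts per review and sentiment labels, copied from the toy table
X = np.array([[0, 1, 2, 1, 1, 0, 0],
              [1, 0, 0, 0, 0, 0, 1],
              [0, 0, 0, 1, 1, 0, 0]], dtype=float)
y = np.array([0.0, 1.0, 0.0])

def fit_logistic(X, y, lr=0.1, l2=0.1, n_iter=5000):
    """L2-regularized logistic regression via batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted P(positive)
        grad_w = X.T @ (p - y) / len(y) + l2 * w  # log-loss gradient + L2 penalty
        grad_b = np.mean(p - y)                   # intercept is not penalized
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = fit_logistic(X, y)
for word, weight in zip(words, w):
    print(f"{word:>10s} {weight:+.2f}")
```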

# Right.
But...

## There is an additional difficulty.

From our example review above:

> performance which was believable enough but not as good as she could have given

> lacking any sense of achievement 

> it wasn't very funny

> the only genre achieved to a satisfactory level was romance

# Context matters

<font color=green>funny</font>             =>    👍

<font color=green>very funny</font>        =>    👍👍

<font color=red>wasn't very funny</font> =>    👎


... what if it were
- "wasn't so very funny"
- "however, I wouldn't say that it wasn't so very funny" 

# Unigrams, bigrams, trigrams - what should we look at?


Instead of guessing, let's check what works best on our data set.
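The counts below come from precomputed CSV files. One way to produce such counts is to remove stopwords and then slide a window of n tokens over each review. Below is a pure-Python sketch; the stopword set and the two reviews are hypothetical. Note how stopword removal creates bigrams that never occur in the text, such as "movie year" here (compare "one best" in the real bigram list).

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-token sequences, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def count_ngrams(reviews, n, stopwords):
    """Count n-grams over all reviews after dropping stopwords."""
    counts = Counter()
    for review in reviews:
        tokens = [t for t in review.split() if t not in stopwords]
        counts.update(ngrams(tokens, n))
    return counts

# hypothetical stopword list and reviews, for illustration only
stopwords = {"the", "a", "i", "this", "is", "have", "of"}
reviews = ["this is the worst movie i have ever seen",
           "the worst movie of the year"]

print(count_ngrams(reviews, 1, stopwords).most_common(2))
print(count_ngrams(reviews, 2, stopwords).most_common(1))  # [('worst movie', 2)]
```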


## Most frequent unigrams

In [36]:
word_count_1gram = pd.read_csv('word_counts_sorted_ngram_1_stopwords_removed.csv', 
                                  usecols=['word', 'count'])
word_count_1gram.head(10)

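<table border="1">
<tr><th></th><th>count</th><th>word</th></tr>
<tr><th>0</th><td>44047</td><td>movie</td></tr>
<tr><th>1</th><td>42623</td><td>but</td></tr>
<tr><th>2</th><td>40159</td><td>film</td></tr>
<tr><th>3</th><td>30632</td><td>not</td></tr>
<tr><th>4</th><td>26795</td><td>one</td></tr>
<tr><th>5</th><td>20281</td><td>like</td></tr>
<tr><th>6</th><td>15147</td><td>good</td></tr>
<tr><th>7</th><td>14067</td><td>very</td></tr>
<tr><th>8</th><td>12727</td><td>time</td></tr>
<tr><th>9</th><td>12716</td><td>no</td></tr>
</table>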


## Most frequent bigrams

In [37]:
word_count_2grams = pd.read_csv('word_counts_sorted_ngram_2_stopwords_removed.csv', 
                                  usecols=['word', 'count'])
word_count_2grams.head(10)

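<table border="1">
<tr><th></th><th>count</th><th>word</th></tr>
<tr><th>0</th><td>1925</td><td>but not</td></tr>
<tr><th>1</th><td>1321</td><td>ever seen</td></tr>
<tr><th>2</th><td>1284</td><td>not only</td></tr>
<tr><th>3</th><td>1200</td><td>very good</td></tr>
<tr><th>4</th><td>1113</td><td>special effects</td></tr>
<tr><th>5</th><td>1043</td><td>even though</td></tr>
<tr><th>6</th><td>1032</td><td>movie but</td></tr>
<tr><th>7</th><td>1024</td><td>don know</td></tr>
<tr><th>8</th><td>1007</td><td>movie not</td></tr>
<tr><th>9</th><td>888</td><td>one best</td></tr>
</table>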


## Most frequent trigrams

In [38]:
word_count_3grams = pd.read_csv('word_counts_sorted_ngram_3_stopwords_removed.csv', 
                                  usecols=['word', 'count'])
word_count_3grams.head(10)

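<table border="1">
<tr><th></th><th>count</th><th>word</th></tr>
<tr><th>0</th><td>262</td><td>movie ever seen</td></tr>
<tr><th>1</th><td>243</td><td>worst movie ever</td></tr>
<tr><th>2</th><td>205</td><td>don waste time</td></tr>
<tr><th>3</th><td>177</td><td>movies ever seen</td></tr>
<tr><th>4</th><td>164</td><td>new york city</td></tr>
<tr><th>5</th><td>162</td><td>don get wrong</td></tr>
<tr><th>6</th><td>160</td><td>one worst movies</td></tr>
<tr><th>7</th><td>141</td><td>worst movies ever</td></tr>
<tr><th>8</th><td>120</td><td>film ever seen</td></tr>
<tr><th>9</th><td>114</td><td>movie ever made</td></tr>
</table>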


# In search of the right combination (grid search)


Which classifier works best?
- Logistic regression? Random forest? Support vector machine?

In combination with which input?
- Unigrams? Bigrams? Trigrams?

With which parameter settings?
- e.g., regularization, number of iterations...
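With scikit-learn, such a search can be expressed by chaining vectorizer and classifier in a `Pipeline` and handing it to `GridSearchCV`. The corpus and the small parameter grid below are hypothetical toy stand-ins; the grid actually searched for this talk is not shown here.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# toy stand-in for X_train / y_train (1 = positive, 0 = negative)
texts = ["great movie", "loved it", "wonderful acting", "very good film",
         "awful movie", "hated it", "boring acting", "not good film"] * 3
labels = [1, 1, 1, 1, 0, 0, 0, 0] * 3

pipeline = Pipeline([
    ("vect", CountVectorizer()),       # text -> n-gram counts
    ("clf", LogisticRegression()),     # counts -> sentiment
])
# hypothetical grid: n-gram range of the input, regularization of the model
param_grid = {
    "vect__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.03, 0.3, 3.0],
}
# every combination is scored by 3-fold cross-validated accuracy
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(texts, labels)
print(search.best_params_, search.best_score_)
```

To compare different classifiers as well, the `"clf"` step itself can be put into the grid. This uses a list of parameter grids, each one setting `"clf"` to a different estimator together with that estimator's own parameters.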

# &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;FEW     

# &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;HOURS

# &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;LATER

# And the winner is ...

### Best accuracy per classifier (test set)
<table border="1">
<tr><th></th><th>1-grams<br />with stopword filtering</th><th>1-2-grams<br />with stopword filtering</th><th>1-3-grams<br />no stopword filtering</th>
</tr>
<tr>
<th>Logistic Regression</th><td></td><td>0.89</td><td></td>
</tr>
<tr>
<th>Support Vector Machine</th><td></td><td></td><td>0.84</td>
</tr>
<tr>
<th>Random Forest</th><td>0.84</td><td></td><td></td>
</tr>
</table>

# Exploring the Logistic Regression best fit

In [44]:
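from nltk.corpus import stopwords   # may need a one-time nltk.download('stopwords')
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# NLTK's English stopword list contains negations and intensifiers
# ("not", "very", "wasn", ...) that carry sentiment, so keep those as features
stopwords_nltk = set(stopwords.words("english"))
relevant_words = set(['not', 'nor', 'no', 'wasn', 'ain', 'aren', 'very', 'only', 'but', 'don', 'isn', 'weren'])
stopwords_filtered = list(stopwords_nltk.difference(relevant_words))

# unigrams + bigrams, restricted to the 10,000 most frequent features
vectorizer = CountVectorizer(stop_words=stopwords_filtered, max_features=10000, ngram_range=(1, 2))
X_train_features = vectorizer.fit_transform(X_train)
X_test_features = vectorizer.transform(X_test)

# C=0.03 means strong L2 regularization (C is the inverse regularization strength)
logistic_model = LogisticRegression(C=0.03)
logistic_model.fit(X_train_features, y_train)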

LogisticRegression(C=0.03, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

## Which words make it positive?

In [45]:
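# scikit-learn >= 1.0; older versions use vectorizer.get_feature_names()
vocabulary = vectorizer.get_feature_names_out()
coefs = logistic_model.coef_[0]   # one coefficient per feature (positive class)
word_importances = pd.DataFrame({'word': vocabulary, 'coef': coefs})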
word_importances_sorted = word_importances.sort_values(by='coef', ascending = False)
word_importances_sorted[:10]

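<table border="1">
<tr><th></th><th>coef</th><th>word</th></tr>
<tr><th>2969</th><td>0.672635</td><td>excellent</td></tr>
<tr><th>6681</th><td>0.563958</td><td>perfect</td></tr>
<tr><th>9816</th><td>0.521026</td><td>wonderful</td></tr>
<tr><th>8646</th><td>0.520818</td><td>superb</td></tr>
<tr><th>3165</th><td>0.505146</td><td>favorite</td></tr>
<tr><th>431</th><td>0.502118</td><td>amazing</td></tr>
<tr><th>5923</th><td>0.481505</td><td>must see</td></tr>
<tr><th>5214</th><td>0.461807</td><td>loved</td></tr>
<tr><th>3632</th><td>0.458645</td><td>funniest</td></tr>
<tr><th>2798</th><td>0.453481</td><td>enjoyable</td></tr>
</table>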


## Which words make it negative?

In [46]:
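# note: [-11:-1] leaves out the last row, i.e. the single most negative feature;
# word_importances_sorted.tail(10) would include it
word_importances_sorted[-11:-1]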

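<table border="1">
<tr><th></th><th>coef</th><th>word</th></tr>
<tr><th>6864</th><td>-0.564446</td><td>poor</td></tr>
<tr><th>2625</th><td>-0.565503</td><td>dull</td></tr>
<tr><th>9855</th><td>-0.57506</td><td>worse</td></tr>
<tr><th>4267</th><td>-0.588133</td><td>horrible</td></tr>
<tr><th>2439</th><td>-0.596302</td><td>disappointing</td></tr>
<tr><th>6866</th><td>-0.675187</td><td>poorly</td></tr>
<tr><th>1045</th><td>-0.681608</td><td>boring</td></tr>
<tr><th>2440</th><td>-0.688024</td><td>disappointment</td></tr>
<tr><th>702</th><td>-0.811184</td><td>awful</td></tr>
<tr><th>9607</th><td>-0.838195</td><td>waste</td></tr>
</table>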


# Which 2-grams make it positive?

In [47]:
word_importances_bigrams = word_importances_sorted[word_importances_sorted.word.apply(lambda c: len(c.split()) >= 2)]
word_importances_bigrams[:10]

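<table border="1">
<tr><th></th><th>coef</th><th>word</th></tr>
<tr><th>5923</th><td>0.481505</td><td>must see</td></tr>
<tr><th>3</th><td>0.450675</td><td>10 10</td></tr>
<tr><th>6350</th><td>0.421314</td><td>one best</td></tr>
<tr><th>9701</th><td>0.389081</td><td>well worth</td></tr>
<tr><th>5452</th><td>0.371277</td><td>may not</td></tr>
<tr><th>6139</th><td>0.329485</td><td>not bad</td></tr>
<tr><th>6970</th><td>0.323805</td><td>pretty good</td></tr>
<tr><th>2259</th><td>0.307238</td><td>definitely worth</td></tr>
<tr><th>5208</th><td>0.30338</td><td>love movie</td></tr>
<tr><th>9432</th><td>0.301404</td><td>very good</td></tr>
</table>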


# Which 2-grams make it negative?

In [48]:
# as above, [-11:-1] skips the single most negative bigram; .tail(10) would include it
word_importances_bigrams[-11:-1]

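<table border="1">
<tr><th></th><th>coef</th><th>word</th></tr>
<tr><th>6431</th><td>-0.247169</td><td>only good</td></tr>
<tr><th>3151</th><td>-0.25009</td><td>fast forward</td></tr>
<tr><th>9861</th><td>-0.264564</td><td>worst movie</td></tr>
<tr><th>6201</th><td>-0.324169</td><td>not recommend</td></tr>
<tr><th>6153</th><td>-0.332796</td><td>not even</td></tr>
<tr><th>6164</th><td>-0.333147</td><td>not funny</td></tr>
<tr><th>6217</th><td>-0.357056</td><td>not very</td></tr>
<tr><th>6169</th><td>-0.368976</td><td>not good</td></tr>
<tr><th>6421</th><td>-0.43775</td><td>one worst</td></tr>
<tr><th>9609</th><td>-0.451138</td><td>waste time</td></tr>
</table>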


# 0.89 is a pretty good value for accuracy. 
# With a different approach, can it get any better? 

# Beyond word counts:
# Word embeddings

# Bag-of-words (or bag-of-n-grams) is built on one-hot encoding:

In [49]:
# Tidy datasets are all alike but every messy dataset is messy in its own way. (Hadley Wickham)
words = pd.DataFrame({'tidy': [1,0,0,0,0,0,0,0,0,0,0,0], 'dataset': [0,1,0,0,0,0,0,0,0,0,0,0],
                      'is': [0,0,1,0,0,0,0,0,0,0,0,0], 'all': [0,0,0,1,0,0,0,0,0,0,0,0],
                      'alike': [0,0,0,0,1,0,0,0,0,0,0,0], 'but': [0,0,0,0,0,1,0,0,0,0,0,0],
                      'every': [0,0,0,0,0,0,1,0,0,0,0,0], 'messy': [0,0,0,0,0,0,0,1,0,0,0,0],
                      'in': [0,0,0,0,0,0,0,0,1,0,0,0], 'its': [0,0,0,0,0,0,0,0,0,1,0,0],
                      'own': [0,0,0,0,0,0,0,0,0,0,1,0], 'way': [0,0,0,0,0,0,0,0,0,0,0,1]})
words

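<table border="1">
<tr><th></th><th>alike</th><th>all</th><th>but</th><th>dataset</th><th>every</th><th>in</th><th>is</th><th>its</th><th>messy</th><th>own</th><th>tidy</th><th>way</th></tr>
<tr><th>0</th><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td></tr>
<tr><th>1</th><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><th>2</th><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><th>3</th><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><th>4</th><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><th>5</th><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><th>6</th><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><th>7</th><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td></tr>
<tr><th>8</th><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><th>9</th><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
</table>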


# In this model, all words are equally distant from each other.
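This is easy to check. One-hot vectors are mutually orthogonal unit vectors, so every pair of distinct words has Euclidean distance sqrt(2) and cosine similarity 0. Here is a NumPy check for the twelve words of the quote above:

```python
import numpy as np
from itertools import combinations

vocab = ["tidy", "dataset", "is", "all", "alike", "but",
         "every", "messy", "in", "its", "own", "way"]
one_hot = np.eye(len(vocab))   # row i is the one-hot vector of vocab[i]

pairs = list(combinations(range(len(vocab)), 2))
# set of all pairwise Euclidean distances
distances = {float(np.linalg.norm(one_hot[i] - one_hot[j])) for i, j in pairs}
# for unit vectors the dot product equals the cosine similarity
cosines = {float(one_hot[i] @ one_hot[j]) for i, j in pairs}
print(distances, cosines)   # a single distance value and a single cosine value (0.0)
```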

# How about similarities between words - semantic dimensions?

# To uncover similarities between words

- build word co-occurrence matrix
- perform <b>dimensionality reduction</b> 

"Tidy datasets are all alike but every messy dataset is messy in its own way." (Hadley Wickham)
"Happy families are all alike; every unhappy family is unhappy in its own way." (Lev Tolstoj)

<table>
<tr>
<td></td><th>tidy</th><th>dataset</th><th>is</th><th>all</th><th>alike</th><th>but</th><th>every</th><th>messy</th><th>in</th><th>its</th><th>own</th><th>way</th><th>happy</th><th>family</th><th>unhappy</th>
</tr>
<tr>
<th>tidy</th><td>0</td><td>1</td><td>2</td><td>1</td><td>1</td><td>1</td><td>1</td><td>2</td><td>1</td><td>1</td><td>1</td><td>1</td><td>0</td><td>0</td><td>0</td>
</tr>
<tr>
<th>dataset</th><td>1</td><td>0</td><td>2</td><td>1</td><td>1</td><td>1</td><td>1</td><td>2</td><td>1</td><td>1</td><td>1</td><td>1</td><td>0</td><td>0</td><td>0</td>
</tr>
<tr>
<th>is</th><td>1</td><td>2</td><td>1</td><td>1</td><td>1</td><td>1</td><td>1</td><td>2</td><td>1</td><td>1</td><td>1</td><td>1</td><td>1</td><td>2</td><td>1</td>
</tr>
</table>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[and so on]
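The two steps can be sketched with NumPy:

1. Count how often two words occur in the same sentence.
2. Keep the strongest singular vectors of that matrix as dense word vectors.

Two simplifying assumptions apply here. The whole sentence serves as the context window, and tokenization is crude: lowercasing and stripping punctuation, with no lemmatization. As a result, the counts differ from the hand-made table above.

```python
import numpy as np

sentences = [
    "Tidy datasets are all alike but every messy dataset is messy in its own way.",
    "Happy families are all alike; every unhappy family is unhappy in its own way.",
]
# crude tokenization: lowercase, strip punctuation, split on whitespace
tokenized = [[w.strip(".;,") for w in s.lower().split()] for s in sentences]
vocab = sorted({w for sent in tokenized for w in sent})
index = {w: i for i, w in enumerate(vocab)}

# co-occurrence counts, using the whole sentence as the context window
C = np.zeros((len(vocab), len(vocab)))
for sent in tokenized:
    for i, w in enumerate(sent):
        for j, v in enumerate(sent):
            if i != j:
                C[index[w], index[v]] += 1

# dimensionality reduction: truncated SVD keeps the k strongest dimensions
k = 2
U, S, Vt = np.linalg.svd(C)
word_vectors = U[:, :k] * S[:k]   # one dense k-dimensional vector per word
print(word_vectors.shape)
```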

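In practice, this approach scales poorly. The co-occurrence matrix has one row and one column per vocabulary word, and the dimensionality reduction has to be redone whenever new text is added. Enter ...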

## Distributed Representations - the Neural Network Approach

Infer the meaning of a word from the contexts it appears in:

- predict word probability depending on surrounding words
- improve prediction at every iteration (backpropagation)

# Distributed Representation of words

- Every word is represented not by a single "hot" bit, but by a vector of real values
- Similar words end up with similar vectors, which makes semantic similarity measurable
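Similarity between dense vectors is usually measured with cosine similarity, which is also what gensim's `most_similar` ranks by. Below is a sketch with three made-up 3-dimensional vectors; the word2vec vectors used below have 100 dimensions.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: 1 = same direction, 0 = orthogonal."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# hypothetical dense vectors, invented for illustration
vectors = {
    "good":  np.array([0.9, 0.1, 0.3]),
    "great": np.array([0.8, 0.2, 0.4]),
    "plot":  np.array([-0.1, 0.9, 0.0]),
}
print(cosine_similarity(vectors["good"], vectors["great"]))  # close to 1
print(cosine_similarity(vectors["good"], vectors["plot"]))   # close to 0
```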

## word2vec

Mikolov et al. (2013a). Efficient estimation of word representations in vector space. arXiv:1301.3781.

- Continuous Bag of Words (CBOW)
- Skip-Gram
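The two architectures differ in what is predicted from what. The sketch below shows how training examples could be generated from one tokenized sentence with a context window of 2, an arbitrary choice. The network itself and its training (negative sampling etc.) are omitted.

```python
def cbow_examples(tokens, window=2):
    """CBOW: (context words, center word); the center word is predicted from its context."""
    examples = []
    for i, center in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        examples.append((context, center))
    return examples

def skipgram_examples(tokens, window=2):
    """Skip-gram: (center word, context word) pairs; each context word is predicted from the center."""
    pairs = []
    for context, center in cbow_examples(tokens, window):
        pairs.extend((center, c) for c in context)
    return pairs

tokens = "the movie was very funny".split()
print(cbow_examples(tokens)[2])       # (['the', 'movie', 'very', 'funny'], 'was')
print(skipgram_examples(tokens)[:3])  # [('the', 'movie'), ('the', 'was'), ('movie', 'the')]
```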



## Continuous Bag of Words 

<img src='cbow.png'>
from: Mikolov et al. 2013

## Skip-gram

<img src='skip_gram.png'>
from: Mikolov et al. 2013

## Relationships

<img src='relationships.png'>

from: Mikolov et al. 2013

"Athens" - "Greece" + "Norway" = ?

"walking" - "walked" + "swam" = ?
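Analogy questions like these are answered with vector arithmetic. Compute, e.g., vec("Athens") - vec("Greece") + vec("Norway"), then return the word whose vector is most cosine-similar to the result, excluding the query words themselves. Below is a NumPy sketch. The 3-dimensional vectors are invented so that the answer is obvious; real embeddings only approximate such regularities. In gensim, the same kind of query is written as `most_similar(positive=[...], negative=[...])`, as in the examples below.

```python
import numpy as np

# hypothetical 3-d vectors; dims roughly: (greek, norwegian, capital city)
vectors = {
    "greece": np.array([1.0, 0.0, 0.0]),
    "athens": np.array([1.0, 0.0, 1.0]),
    "norway": np.array([0.0, 1.0, 0.0]),
    "oslo":   np.array([0.0, 1.0, 1.0]),
    "bergen": np.array([0.0, 1.0, 0.4]),
}

def analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' via b - a + c and cosine similarity."""
    target = vectors[b] - vectors[a] + vectors[c]
    target = target / np.linalg.norm(target)
    best, best_sim = None, -np.inf
    for word, vec in vectors.items():
        if word in (a, b, c):
            continue   # the query words themselves are excluded
        sim = vec @ target / np.linalg.norm(vec)
        if sim > best_sim:
            best, best_sim = word, sim
    return best

print(analogy("greece", "athens", "norway", vectors))   # oslo
```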

# Word embeddings for the IMDB dataset

# word2vec in Python

- provided by the gensim library: <i>https://radimrehurek.com/gensim/models/word2vec.html</i>
- a nice tutorial on how to use it: <i>https://github.com/RaRe-Technologies/movie-plots-by-genre/blob/master/ipynb_with_output/Document%20classification%20with%20word%20embeddings%20tutorial%20-%20with%20output.ipynb</i>

## Load the pre-trained model

In [57]:
from gensim.models import word2vec
# load the trained model from disk
model = word2vec.Word2Vec.load('models/word2vec_100features')
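print(model.wv.vectors.shape)   # vocabulary size x vector size (older gensim: model.syn0.shape)
print(model.wv['movie'])        # the 100-dimensional vector for "movie"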


(20166, 100)
[-0.02515472  0.16707493 -0.05629794 -0.12409752 -0.01091802 -0.13798206
  0.09231102 -0.09140468 -0.05452388 -0.03555677 -0.08269091 -0.00567267
 -0.09523809 -0.06195637  0.05440474  0.06227686  0.12369317 -0.01537143
 -0.0089783  -0.00528997 -0.04277094  0.07739993 -0.01932896  0.081738
 -0.22357117 -0.14976217  0.05551976  0.13742755 -0.15443996 -0.05471482
 -0.0009601   0.08932991 -0.05292547  0.16765165 -0.05905993 -0.05231098
 -0.08250861 -0.0341751   0.14372236  0.03478728 -0.01529499 -0.0296018
  0.01079863 -0.06377127  0.04163288 -0.07192093  0.25450262 -0.07382536
 -0.07778623  0.07499653 -0.12951691  0.01970425  0.13499822  0.01038768
  0.06625408  0.11575779  0.10367264  0.03894637 -0.07102726  0.00343542
  0.24314043  0.15759529 -0.09808595  0.04601007 -0.01187227 -0.16023833
 -0.17658544 -0.12622575 -0.04592994  0.08045016 -0.11856512  0.04920706
  0.20129348  0.08923753 -0.06545419 -0.05853761 -0.08146987 -0.06782326
  0.17082241  0.02575272  0.058911    0.1

## Which words are similar to <i>awesome</i>?

In [58]:
model.wv.most_similar('awesome', topn=10)

[(u'amazing', 0.7929322123527527),
 (u'incredible', 0.7127916812896729),
 (u'awful', 0.7072071433067322),
 (u'excellent', 0.6961393356323242),
 (u'fantastic', 0.6925109624862671),
 (u'alright', 0.6886886358261108),
 (u'cool', 0.679090142250061),
 (u'outstanding', 0.6213874816894531),
 (u'astounding', 0.613292932510376),
 (u'terrific', 0.6013768911361694)]

## ... and to <i> awful</i>?

In [59]:
model.wv.most_similar('awful', topn=10)

[(u'terrible', 0.8212785124778748),
 (u'horrible', 0.7955455183982849),
 (u'atrocious', 0.7824822664260864),
 (u'dreadful', 0.7722172737121582),
 (u'appalling', 0.7244443893432617),
 (u'horrendous', 0.7235419154167175),
 (u'abysmal', 0.720653235912323),
 (u'amazing', 0.708114743232727),
 (u'awesome', 0.7072070837020874),
 (u'bad', 0.6963905096054077)]

## Can we "subtract out" <i>awful</i>?

In [60]:
model.wv.most_similar(positive=['awesome'], negative=['awful'])

[(u'jolly', 0.3947059214115143),
 (u'midget', 0.38988131284713745),
 (u'knight', 0.3789686858654022),
 (u'spooky', 0.36937469244003296),
 (u'nice', 0.3680706322193146),
 (u'looney', 0.3676275610923767),
 (u'ho', 0.3594890832901001),
 (u'gotham', 0.35877227783203125),
 (u'lookalike', 0.3579031229019165),
 (u'devilish', 0.35554438829421997)]

## Let's try this again with <i>good</i> - <i>bad</i>: <i>Good</i> ...

In [61]:
model.wv.most_similar('good', topn=10)

[(u'bad', 0.769078254699707),
 (u'decent', 0.7574324607849121),
 (u'great', 0.7527369260787964),
 (u'nice', 0.6981208324432373),
 (u'cool', 0.653165340423584),
 (u'fine', 0.6289849877357483),
 (u'terrific', 0.6136247515678406),
 (u'terrible', 0.6056008338928223),
 (u'fantastic', 0.596002995967865),
 (u'solid', 0.5957943201065063)]

## ... and <i>bad</i>:

In [62]:
model.wv.most_similar('bad', topn=10)

[(u'good', 0.769078254699707),
 (u'terrible', 0.7315745949745178),
 (u'horrible', 0.7259382009506226),
 (u'awful', 0.6963905096054077),
 (u'lame', 0.6728411912918091),
 (u'stupid', 0.6556650996208191),
 (u'dumb', 0.628576934337616),
 (u'lousy', 0.6129568815231323),
 (u'cheesy', 0.6102402210235596),
 (u'poor', 0.5851123929023743)]

## So <i>good</i> minus <i>bad</i> is ...

In [63]:
model.wv.most_similar(positive=['good'], negative=['bad'])

[(u'nice', 0.4700997471809387),
 (u'fine', 0.46652451157569885),
 (u'solid', 0.43668174743652344),
 (u'wonderful', 0.4121875464916229),
 (u'pleasant', 0.4049694538116455),
 (u'decent', 0.3975681960582733),
 (u'commendable', 0.39051422476768494),
 (u'splendid', 0.38586685061454773),
 (u'promising', 0.38155609369277954),
 (u'delightful', 0.38095542788505554)]

## Which word doesn't match?

In [64]:
model.wv.doesnt_match("good bad awful terrible".split())

'good'

In [65]:
model.wv.doesnt_match("awesome bad awful terrible".split())

'awesome'

In [66]:
model.wv.doesnt_match("nice pleasant fine excellent".split())

'excellent'

## Visualize in 2d
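One common way to get a 2-D picture of word vectors is to project them onto their first two principal components, then scatter-plot the result (e.g. with matplotlib); t-SNE is a popular alternative. Below is a NumPy sketch in which random vectors stand in for the model's vectors. With gensim, the real input could be something like `model.wv[words]` for a list of words.

```python
import numpy as np

def pca_2d(vectors):
    """Project row vectors onto their first two principal components."""
    centered = vectors - vectors.mean(axis=0)
    # rows of Vt are the principal directions, sorted by explained variance
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:2].T

rng = np.random.default_rng(0)
word_vecs = rng.normal(size=(6, 100))   # stand-in for six 100-d word vectors
coords = pca_2d(word_vecs)
print(coords.shape)   # (6, 2)
```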

## So ... how about our classification task?

- we have one vector per word
- we need one vector per review for the classification
- one way to get there: averaging vectors 
- but this loses information, e.g. word order!
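Averaging can be sketched as follows: look up each token, skip tokens the model does not know (e.g. words below its `min_count`), and take the mean. In this sketch the `word_vectors` dict is a hypothetical stand-in for the trained model (with gensim, the lookups would go to `model.wv`). Returning a zero vector for reviews without any known word is an arbitrary choice. The example also shows the information loss: "not good" and "good not" get the same vector.

```python
import numpy as np

def average_vector(tokens, word_vectors, dim):
    """Mean of the vectors of all known tokens; zero vector if none is known."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return np.zeros(dim)
    return np.mean(known, axis=0)

# hypothetical 2-d vectors for illustration
word_vectors = {"not": np.array([0.0, 1.0]),
                "good": np.array([1.0, 0.0]),
                "movie": np.array([0.5, 0.5])}

a = average_vector("not good".split(), word_vectors, dim=2)
b = average_vector("good not".split(), word_vectors, dim=2)
print(a, b)   # identical: word order is lost
```

Stacking one such vector per review (e.g. with `np.vstack`) yields a feature matrix that any of the classifiers above can take as input.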

### Best accuracies per classifier
<table border="1">
<tr>
<th></th><th>Bag of words</th><th>word2vec</th></tr>
<tr>
<th>Logistic Regression</th><td>0.89</td><td>0.83</td>
</tr>
<tr>
<th>Support Vector Machine</th><td>0.84</td><td>0.70</td>
</tr>
<tr>
<th>Random Forest</th><td>0.84</td><td>0.80</td>
</tr>
</table>

## doc2vec

Q. V. Le and T. Mikolov. Distributed representations of sentences and documents. In International
Conference on Machine Learning, 2014.
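- Distributed Memory Model of Paragraph Vectors (PV-DM)
  - one paragraph vector per paragraph, shared across all contexts drawn from that paragraph
  - word vectors shared across all paragraphs
  - paragraph vector averaged or concatenated with the context word vectors to predict the next word
  - paragraph vectors can be directly input to machine learning classifiers

- Distributed Bag of Words (PV-DBOW)
  - context words are ignored; the paragraph vector alone is trained to predict words sampled from the paragraph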

## Distributed Memory Model of Paragraph Vectors (PV-DM)

<img src='pv_dm.png'>

from: Q. V. Le and T. Mikolov. Distributed representations of sentences and documents. In International
Conference on Machine Learning, 2014.

## Distributed Bag of Words (PV-DBOW)

<img src='pv_dbow.png'>

from: Q. V. Le and T. Mikolov. Distributed representations of sentences and documents. In International
Conference on Machine Learning, 2014.

## Model Training and Parameters

- again, now's not the time to do the training ;-)
- doc2vec in python: also provided by gensim <i>https://radimrehurek.com/gensim/models/doc2vec.html</i>
- see the gensim doc2vec tutorial (<i>https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/doc2vec-IMDB.ipynb</i>) for example usage and configuration

## Load pre-trained models

In [24]:
import os
from gensim.models import Doc2Vec

models_dir = 'models'
# note: the file named 'cbow' holds the PV-DBOW model (see its repr below)
filenames = ['dmc', 'cbow', 'dmm']
models = [Doc2Vec.load(os.path.join(models_dir, fname)) for fname in filenames]

In [25]:
[str(model) for model in models]

['Doc2Vec(dm/c,d100,n5,w5,mc2,t4)',
 'Doc2Vec(dbow,d100,n5,mc2,t4)',
 'Doc2Vec(dm/m,d100,n5,w10,mc2,t4)']

### Logistic Regression accuracy
<table border="1">
<tr>
<th></th><th>test vectors inferred</th><th>test vectors from model</th></tr>
<tr>
<th>Distributed memory, vectors averaged (dm/m)</th><td>0.81</td><td>0.87</td>
</tr>
<tr>
<th>Distributed memory, vectors concatenated (dm/c)</th><td>0.80</td><td>0.82</td>
</tr>
<tr>
<th>Distributed bag of words (dbow)</th><td>0.90</td><td>0.90</td>
</tr>
</table>

## Most similar to <i>awesome</i> - what does our best-performing model say?

In [26]:
dbow = models[1]
dbow.wv.most_similar('awesome', topn=10)

[(u'juon', 0.3789939880371094),
 (u'a-pix', 0.3781469762325287),
 (u"rosemary's", 0.37472963333129883),
 (u'schnook', 0.3683214783668518),
 (u"luise's", 0.366854190826416),
 (u'chrysalis', 0.36428096890449524),
 (u'f*^', 0.362865686416626),
 (u'decadent', 0.3604990839958191),
 (u'surrogacy', 0.35499149560928345),
 (u"'second", 0.35283005237579346)]

In [27]:
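# Why the odd neighbors? This model's repr shows no window size, which suggests
# plain PV-DBOW (gensim's dbow_words=0): only the document vectors are trained,
# while the word vectors stay at their random initialization. Word similarities
# from this model are therefore meaningless, even though its document vectors classify best.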

## Most similar to <i>awesome</i> - distributed memory model (dm/m)

In [28]:
dm_m = models[2]
dm_m.wv.most_similar('awesome', topn=10)

[(u'amazing', 0.9163687229156494),
 (u'incredible', 0.9011116027832031),
 (u'excellent', 0.8860622644424438),
 (u'outstanding', 0.8797732591629028),
 (u'exceptional', 0.8539372682571411),
 (u'awful', 0.8104138970375061),
 (u'astounding', 0.7750493884086609),
 (u'alright', 0.7587056159973145),
 (u'astonishing', 0.7556235790252686),
 (u'extraordinary', 0.743841290473938)]

## Most similar to <i>awful</i> - distributed memory model (dm/m)

In [29]:
dm_m.wv.most_similar('awful', topn=10)

[(u'abysmal', 0.8371909856796265),
 (u'appalling', 0.8327066898345947),
 (u'atrocious', 0.8309577703475952),
 (u'horrible', 0.8192445039749146),
 (u'terrible', 0.8124841451644897),
 (u'awesome', 0.8104138970375061),
 (u'dreadful', 0.8072893023490906),
 (u'horrendous', 0.7981990575790405),
 (u'amazing', 0.7926105260848999),
 (u'incredible', 0.7852109670639038)]

In [30]:
dm_m.wv.most_similar(positive=['awesome'], negative=['awful'])

[(u'super', 0.46073806285858154),
 (u"tartakovsky's", 0.3861837387084961),
 (u'nail-bitingly', 0.3633382320404053),
 (u'actionpacked', 0.36290568113327026),
 (u'cassella', 0.35898250341415405),
 (u'outsmarts', 0.3545451760292053),
 (u'nos', 0.35315001010894775),
 (u'takeuchi', 0.3525207042694092),
 (u'keaton/burton', 0.34791430830955505),
 (u'sarinana', 0.34731170535087585)]

In [31]:
dm_m.wv.most_similar('happy', topn=10)

[(u'satisfied', 0.6944113969802856),
 (u'thrilled', 0.6537768840789795),
 (u'pleased', 0.6526883840560913),
 (u'happier', 0.6411939859390259),
 (u'unhappy', 0.6402201652526855),
 (u'disappointed', 0.6195787787437439),
 (u'satisfying', 0.6173715591430664),
 (u'upset', 0.6129617691040039),
 (u'confused', 0.6072292327880859),
 (u'miserable', 0.5886229276657104)]

In [32]:
dm_m.wv.most_similar(positive=['happy'], negative=['unhappy'])

[(u"freaking'", 0.4593520164489746),
 (u'ten-year', 0.39137205481529236),
 (u"'calendar", 0.39041662216186523),
 (u'fluke', 0.3744649887084961),
 (u'nine-year', 0.3692910075187683),
 (u"jack'", 0.36701303720474243),
 (u'girl-oriented', 0.3562896251678467),
 (u"ewell's", 0.35423994064331055),
 (u'breakingly', 0.3541865944862366),
 (u'velcro', 0.3500947058200836)]

# So, where does this leave us?