**Pre-trained Word2Vec Model from Google**
---

In [None]:
# Load the word2vec model from the uncompressed binary file
from gensim.models import KeyedVectors

filename = 'GoogleNews-vectors-negative300.bin'
model = KeyedVectors.load_word2vec_format(filename, binary=True)


In [3]:
import gensim

# Load Google's pre-trained Word2Vec model.
filename = 'GoogleNews-vectors-negative300.bin.gz'
model = gensim.models.KeyedVectors.load_word2vec_format(filename, binary=True)

### Inspecting the Model

In [6]:
#print(model['computer'])
print(model['computer'].shape)

(300,)


In [20]:
# get a list of the top 10 words most similar to the word "bar"
print(model.most_similar('bar', topn=10))

[('Bar', 0.656773567199707), ('bars', 0.6537090539932251), ('tavern', 0.6364460587501526), ('pub', 0.6000003814697266), ('nightspot', 0.589708685874939), ('nightclub', 0.5828036665916443), ('Pub', 0.5696172714233398), ('bartender', 0.5664138197898865), ('restaurant', 0.5545204877853394), ('Lounge', 0.5305413603782654)]


In [22]:
# find numeric vector for the word 'easy'
vector = model['easy']
print(vector.shape)

(300,)


#### **Similarities**

In [23]:
model.most_similar("nice")

[('good', 0.6836091876029968),
 ('lovely', 0.6676310896873474),
 ('neat', 0.6616737246513367),
 ('fantastic', 0.6569241881370544),
 ('wonderful', 0.6561347246170044),
 ('terrific', 0.6552367806434631),
 ('great', 0.6454657912254333),
 ('awesome', 0.6404187083244324),
 ('nicer', 0.6302445530891418),
 ('decent', 0.5993332266807556)]

In [29]:
# Or a similarity score of any two words:

model.similarity('giant', 'big')

0.3467396

In [26]:
model.similarity("nice","good")

0.6836092
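
The score returned by `model.similarity` is the cosine similarity of the two word vectors. As a rough sketch of that calculation, the snippet below uses only the standard library and hand-made toy vectors; `cosine_similarity`, `vec_a`, `vec_b` and `vec_c` are hypothetical names and values for illustration, not anything taken from the Google News model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 3-dimensional vectors, for illustration only.
vec_a = [1.0, 2.0, 0.0]
vec_b = [2.0, 4.0, 0.0]   # points in the same direction as vec_a
vec_c = [0.0, 0.0, 3.0]   # orthogonal to vec_a

print(cosine_similarity(vec_a, vec_b))  # close to 1: same direction
print(cosine_similarity(vec_a, vec_c))  # 0: no shared direction
```

Because only the angle matters, `vec_b` scores close to 1 against `vec_a` even though it is twice as long.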

`Interestingly, antonyms (words with opposite meanings) often score as highly similar, even in a good Word2Vec model. This is because opposites tend to appear in the same contexts, so one can usually be substituted for the other in a sentence.`

In [30]:
# Antonyms can still score high because they occur in similar contexts
model.similarity("bad","good")

0.7190051

`We can also look for interesting relationships between words.`

In [31]:
# Analogy: king - man + woman ≈ queen (equivalently, king - queen ≈ man - woman)
model.most_similar(positive=['woman', 'king'], negative=['man'])

[('queen', 0.7118192911148071),
 ('monarch', 0.6189674735069275),
 ('princess', 0.5902431011199951),
 ('crown_prince', 0.5499460697174072),
 ('prince', 0.5377321243286133),
 ('kings', 0.5236844420433044),
 ('Queen_Consort', 0.5235945582389832),
 ('queens', 0.5181134343147278),
 ('sultan', 0.5098593235015869),
 ('monarchy', 0.5087411403656006)]
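
The analogy query above can be read as vector arithmetic: add the unit-length vectors of the `positive` words, subtract those of the `negative` words, then rank the remaining vocabulary by cosine similarity to the result. The sketch below imitates that idea on a tiny, made-up 2-D embedding. The `toy` dictionary and the helpers `unit`, `cosine` and `analogy` are hypothetical, and this is a simplified illustration rather than gensim's actual implementation.

```python
import math

def unit(v):
    """Scale a non-zero vector to length 1."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    return sum(a * b for a, b in zip(unit(u), unit(v)))

def analogy(vectors, positive, negative, topn=3):
    """Rank words by similarity to sum(positive) - sum(negative)."""
    dim = len(next(iter(vectors.values())))
    query = [0.0] * dim
    for word in positive:
        query = [q + x for q, x in zip(query, unit(vectors[word]))]
    for word in negative:
        query = [q - x for q, x in zip(query, unit(vectors[word]))]
    # The query words themselves are excluded from the results.
    exclude = set(positive) | set(negative)
    scored = [(w, cosine(query, v)) for w, v in vectors.items() if w not in exclude]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]

# Hypothetical embedding: axis 0 ~ "royalty", axis 1 ~ "gender".
toy = {
    'king':  [1.0, 1.0],
    'queen': [1.0, -1.0],
    'man':   [0.0, 1.0],
    'woman': [0.0, -1.0],
    'apple': [-1.0, 0.0],
}

result = analogy(toy, positive=['woman', 'king'], negative=['man'])
print(result[0][0])  # the top-ranked word for this toy data
```

With these toy values the top-ranked word is `queen`, mirroring the real model's output, although the scores themselves are not comparable.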

In [33]:
model.most_similar(positive=['escort', 'porn'], negative=['prostitution'])

[('escorts', 0.5452281832695007),
 ('Porn', 0.47434481978416443),
 ('escorting', 0.45747920870780945),
 ('porno', 0.43867430090904236),
 ('Tera_Patrick', 0.421220064163208),
 ('pornstar', 0.4204830527305603),
 ('star_Bree_Olson', 0.4074917733669281),
 ('correctly_spelled_canape', 0.40618905425071716),
 ('escorted', 0.40226319432258606),
 ('Francois_Sagat', 0.4011197090148926)]

In [34]:
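# The vocabulary is case-sensitive: 'france' and 'paris' are different tokens from 'France' and 'Paris'
model.most_similar(positive=['france', 'paris'], negative=['spain'])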

[('Paris', 0.490902304649353),
 ('french', 0.47619831562042236),
 ('jacques', 0.4652450978755951),
 ('elle', 0.4634602665901184),
 ('louis', 0.4616774618625641),
 ('Ne_pas', 0.4616312086582184),
 ('toronto', 0.45994314551353455),
 ('le_petit', 0.4592629373073578),
 ('Hôtel_de', 0.45633813738822937),
 ('marche', 0.4558878540992737)]