<a href="https://colab.research.google.com/github/g-iyer/XCS224N-GI/blob/main/ExploringWord2Vec.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# XCS224N Assignment 1 Part 2: Experimenting with Word2Vec Embeddings (15 Points)

Before you start, make sure you read the [XCS224N Assignment 1 PDF Handout](https://github.com/scpd-proed/XCS224N-A1/blob/master/A1.pdf).

**The functions used below are case sensitive** (for example, "strong" and "Strong" can give different output). Please make sure that your string inputs to the functions below are exactly as given in the quiz.


In [1]:
# All Import Statements Defined Here
# Note: Do not add to this list.
# All the dependencies you need can be installed by running this cell.
# Throughout this notebook you can run a cell by hitting CTRL+RETURN or the Play button/icon at left
# ----------------

import sys
assert sys.version_info[0]==3
assert sys.version_info[1] >= 5

from gensim.models import KeyedVectors
from gensim.test.utils import datapath
import pprint
import matplotlib.pyplot as plt

# ----------------

As discussed in class, prediction-based word vectors such as word2vec have more recently come into fashion. Here, we shall explore the embeddings produced by word2vec. Please revisit the class notes and lecture slides for more details on the word2vec algorithm. If you're feeling adventurous, challenge yourself and try reading the [original paper](https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf).

Then run the following cells to load the word2vec vectors into memory. **_Note: This could take several minutes._**

In [2]:
def load_word2vec():
    """ Load Word2Vec Vectors
        Return:
            wv_from_bin: 2.5 million of 3 million embeddings, each of length 300
    """
    import gensim.downloader as api
    from gensim.models import KeyedVectors
    # let's load 2.5 million of the 3 million word embeddings so we don't run out of memory on Colab
    wv_from_bin = KeyedVectors.load_word2vec_format(api.load("word2vec-google-news-300", return_path=True), limit=2500000, binary=True)
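    # gensim >= 4 exposes the vocabulary as `key_to_index`; older releases use `vocab`
    vocab = list(getattr(wv_from_bin, "key_to_index", None) or wv_from_bin.vocab)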
    print("Loaded vocab size %i" % len(vocab))
    return wv_from_bin

In [3]:
# -----------------------------------
# Run Cell to Load Word Vectors
# Note: This may take several minutes
# -----------------------------------
wv_from_bin = load_word2vec()

Loaded vocab size 2500000


### Question 1 (2 points)

The first question relates to the plot from the first part of the assignment. Please visit the [Assignment 1 PDF](https://github.com/scpd-proed/XCS224N-A1/blob/master/A1.pdf) for further details. 




### Cosine Similarity
Now that we have word vectors, we need a way to quantify the similarity between individual words according to these vectors. One such metric is cosine similarity. We will be using this to find words that are "close" to and "far" from one another.

We can think of n-dimensional vectors as points in n-dimensional space. From this perspective, the L1 and L2 distances quantify how much space "we must travel" to get from one point to the other. Another approach is to examine the angle between two vectors. From trigonometry we know that:

<img src="https://drive.google.com/uc?id=1PEmKQxxs5XB-N1Hz-jr514iWoqa62MgL" width=20% style="float: center;"></img>

Instead of computing the actual angle, we can leave the similarity in terms of $\text{similarity} = \cos(\Theta)$. Formally, the [Cosine Similarity](https://en.wikipedia.org/wiki/Cosine_similarity) $s$ between two vectors $p$ and $q$ is defined as:

$$s = \frac{p \cdot q}{\|p\| \, \|q\|}, \textrm{ where } s \in [-1, 1] $$
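
As a minimal sketch of this formula, the snippet below computes the cosine similarity of small hand-written vectors in plain Python. The helper name `cosine_similarity` and the example vectors are illustrative assumptions, not part of the assignment code or of gensim.

```python
import math

def cosine_similarity(p, q):
    """Return the cosine of the angle between vectors p and q."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)

# Toy 2-dimensional vectors (made up for illustration only)
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])   # close to 1
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])       # 0
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])       # close to -1
print(same_direction, orthogonal, opposite)
```

Note that scaling a vector does not change the result: only the direction matters, which is why the first pair scores (approximately) 1 even though the vectors have different lengths.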

### Questions 2-3: Homonyms and Similarity (4 points)

Homonyms are words with more than one meaning. We want to see how our word embeddings capture this phenomenon for such words. We are going to test if, for certain homonyms, the top-10 most similar words (according to cosine similarity) contain related words from *both* meanings. For example, "leaves" has both "vanishes" and "stalks" in the top 10, and "scoop" has both "handed_waffle_cone" and "lowdown". 

**Note**: You should use the `wv_from_bin.most_similar(word)` function to get the top 10 similar words. This function ranks all other words in the vocabulary with respect to their cosine similarity to the given word. For further assistance please check the __[GenSim documentation](https://radimrehurek.com/gensim/models/keyedvectors.html#gensim.models.keyedvectors.FastTextKeyedVectors.most_similar)__.
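
To make the ranking idea concrete, here is a small sketch that scores every other word in a toy vocabulary by cosine similarity and keeps the top results. It is not gensim's implementation; `top_n_similar` and the 2-dimensional `toy_vectors` are hypothetical names and made-up values chosen so that "mole" sits between a skin-related word and an espionage-related word.

```python
import math

def cosine_similarity(p, q):
    """Return the cosine of the angle between vectors p and q."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)

def top_n_similar(word, vectors, n=10):
    """Rank every other word by cosine similarity to `word`, highest first."""
    query = vectors[word]
    scored = [(other, cosine_similarity(query, vec))
              for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

# Made-up vectors for illustration only; real embeddings have 300 dimensions
toy_vectors = {
    "mole": [1.0, 1.0],
    "freckle": [1.0, 0.5],
    "spy": [0.2, 1.0],
    "banana": [-1.0, -0.5],
}
neighbours = top_n_similar("mole", toy_vectors, n=2)
print(neighbours)
```

In this toy setup both senses appear among the nearest neighbours of "mole", which is the pattern the questions below ask you to look for in the real embeddings.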


In [5]:
### Run this cell to print out the Top-10 most similar words
### Try the sample and then use this cell to complete Questions 2-10 in the companion quiz 

wv_from_bin.most_similar("mole")

[('moles', 0.6953788995742798),
 ('pollo_en', 0.5143669247627258),
 ('freckle', 0.4829963445663452),
 ('cancerous_mole', 0.4787973165512085),
 ('birthmark', 0.46605658531188965),
 ('unibrow', 0.46520644426345825),
 ('spies', 0.4556558132171631),
 ('nodule', 0.45305347442626953),
 ('pube', 0.4359903931617737),
 ('wart', 0.4358214735984802)]

In [6]:
### Run this cell to print out the Top-10 most similar words
### Try the sample and then use this cell to complete Questions 2-10 in the companion quiz 

wv_from_bin.most_similar("left")

[('leaving', 0.6707000732421875),
 ('leave', 0.525093138217926),
 ('leaves', 0.5228645205497742),
 ('returned', 0.5059226751327515),
 ('right', 0.49213987588882446),
 ('departed', 0.49109700322151184),
 ('limped', 0.4859950542449951),
 ('went', 0.4719872772693634),
 ('remaining', 0.465037077665329),
 ('empty', 0.4546155631542206)]

In [7]:
### Run this cell to print out the Top-10 most similar words
### Try the sample and then use this cell to complete Questions 2-10 in the companion quiz 

wv_from_bin.most_similar("nuts")

[('nut', 0.6487677097320557),
 ('crazy', 0.5603705644607544),
 ('nutty', 0.5235902070999146),
 ('bonkers', 0.5152066349983215),
 ('pecans_almonds', 0.4788070321083069),
 ('Neekam_proprietary_Blog', 0.47333618998527527),
 ('walnuts', 0.4707028865814209),
 ('regular_BetUS.com_columnists', 0.47036081552505493),
 ('allergics', 0.47001171112060547),
 ('batty', 0.46987098455429077)]

In [8]:
### Run this cell to print out the Top-10 most similar words
### Try the sample and then use this cell to complete Questions 2-10 in the companion quiz 

wv_from_bin.most_similar("pen")

[('pens', 0.7111663818359375),
 ('pencil', 0.5767994523048401),
 ('quill', 0.5656782388687134),
 ('ballpoint', 0.5653775334358215),
 ('ballpoint_pen', 0.5415062308311462),
 ('feather_quill', 0.5316811203956604),
 ('notepad', 0.5266302227973938),
 ('quill_pen', 0.5166343450546265),
 ('biro', 0.514026403427124),
 ('fountain_pen', 0.5093279480934143)]

In [9]:
### Run this cell to print out the Top-10 most similar words
### Try the sample and then use this cell to complete Questions 2-10 in the companion quiz 

wv_from_bin.most_similar("right")

[('Right', 0.5703941583633423),
 ('wrong', 0.5534271001815796),
 ('##.Help_us', 0.5502839684486389),
 ('Goodwill_Catanese', 0.5159174799919128),
 ('left', 0.49213987588882446),
 ('fielder_Joe_Borchard', 0.48948538303375244),
 ('fielder_Ambiorix_Concepcion', 0.4841775894165039),
 ('now', 0.4794555902481079),
 ('fielder_Jeromy_Burnitz', 0.47718381881713867),
 ('fielder_Lucas_Duda', 0.47022590041160583)]

In [10]:
### Run this cell to print out the Top-10 most similar words
### Try the sample and then use this cell to complete Questions 2-10 in the companion quiz 

wv_from_bin.most_similar("drive")

[('drives', 0.7914609909057617),
 ('driving', 0.5405872464179993),
 ('drove', 0.5120137929916382),
 ('push', 0.4905555248260498),
 ('run', 0.4834049642086029),
 ('driven', 0.44512295722961426),
 ('Drives', 0.42788973450660706),
 ('SFF_SAS', 0.4223554730415344),
 ('bootable_flash', 0.41521167755126953),
 ('SanDisk_Ultra_Backup', 0.414913147687912)]

In [11]:
### Run this cell to print out the Top-10 most similar words
### Try the sample and then use this cell to complete Questions 2-10 in the companion quiz 

wv_from_bin.most_similar("rose")

[('surged', 0.8056201338768005),
 ('climbed', 0.8045108914375305),
 ('soared', 0.7695155143737793),
 ('fell', 0.7688519954681396),
 ('tumbled', 0.7256063222885132),
 ('dipped', 0.720984697341919),
 ('jumped', 0.6997674107551575),
 ('inched', 0.6791028380393982),
 ('risen', 0.6686744093894958),
 ('plunged', 0.663558840751648)]

In [12]:
### Run this cell to print out the Top-10 most similar words
### Try the sample and then use this cell to complete Questions 2-10 in the companion quiz 

wv_from_bin.most_similar("mean")

[('means', 0.6425307989120483),
 ('imply', 0.620326042175293),
 ('equate', 0.5891612768173218),
 ('necessarily', 0.5722533464431763),
 ('necessitate', 0.5695107579231262),
 ('entail', 0.5649572610855103),
 ('bode_well', 0.5476653575897217),
 ('portend', 0.5455029010772705),
 ('do', 0.5375761985778809),
 ('anyway', 0.5220909118652344)]

In [13]:
### Run this cell to print out the Top-10 most similar words
### Try the sample and then use this cell to complete Questions 2-10 in the companion quiz 

wv_from_bin.most_similar("saw")

[('noticed', 0.5968518853187561),
 ('witnessed', 0.5899658203125),
 ('seeing', 0.5811247825622559),
 ('looked', 0.567682683467865),
 ('came', 0.5614316463470459),
 ('watched', 0.5570547580718994),
 ('seen', 0.5488225221633911),
 ('showed', 0.5405093431472778),
 ('went', 0.5265681743621826),
 ('see', 0.515012264251709)]

### Questions 4-5: Synonyms & Antonyms (4 points) 

When considering Cosine Similarity, it's often more convenient to think of Cosine Distance, which is simply (1 - Cosine Similarity).

We will look for triplets of words (w1,w2,w3) where w1 and w2 are synonyms and w1 and w3 are antonyms, but Cosine Distance(w1,w3) < Cosine Distance(w1,w2). For example, w1="happy" is closer to w3="sad" than to w2="cheerful".

You should use the `wv_from_bin.distance(w1, w2)` function here to compute the cosine distance between two words. Please see the __[GenSim documentation](https://radimrehurek.com/gensim/models/keyedvectors.html#gensim.models.keyedvectors.FastTextKeyedVectors.distance)__ for further assistance.
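
The triplet condition can be sketched in plain Python as follows. The helper names `cosine_distance` and `antonym_is_closer` and the toy vectors are assumptions made for illustration; the values are invented and do not come from the word2vec model.

```python
import math

def cosine_distance(p, q):
    """Cosine distance, defined as 1 - cosine similarity."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / (norm_p * norm_q)

def antonym_is_closer(w1, w2, w3, vectors):
    """True if the antonym w3 is closer to w1 than the synonym w2 is."""
    synonym_dist = cosine_distance(vectors[w1], vectors[w2])
    antonym_dist = cosine_distance(vectors[w1], vectors[w3])
    return antonym_dist < synonym_dist

# Made-up 2-dimensional vectors, chosen so that the antonym ends up closer
toy_vectors = {
    "happy": [1.0, 0.0],
    "cheerful": [0.5, 1.0],
    "sad": [1.0, 0.5],
}
result = antonym_is_closer("happy", "cheerful", "sad", toy_vectors)
print(result)
```

With the real embeddings, the same comparison is what the cells below perform using `wv_from_bin.distance`.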


In [14]:
### Run this cell to compare cosine distances between synonyms and antonyms
### Try the sample and then use this cell to complete Questions 11-19 in the companion quiz

w1 = "happy"
w2 = "cheerful"
w3 = "sad"
w1_w2_dist = wv_from_bin.distance(w1, w2)
w1_w3_dist = wv_from_bin.distance(w1, w3)

print("Synonyms {}, {} have cosine distance: {}".format(w1, w2, w1_w2_dist))
print("Antonyms {}, {} have cosine distance: {}".format(w1, w3, w1_w3_dist))


Synonyms happy, cheerful have cosine distance: 0.6162261664867401
Antonyms happy, sad have cosine distance: 0.46453857421875


In [18]:
### Run this cell to compare cosine distances between synonyms and antonyms
### Try the sample and then use this cell to complete Questions 11-19 in the companion quiz

w1 = "right"
w2 = "correct"
w3 = "wrong"
w1_w2_dist = wv_from_bin.distance(w1, w2)
w1_w3_dist = wv_from_bin.distance(w1, w3)

print("Synonyms {}, {} have cosine distance: {}".format(w1, w2, w1_w2_dist))
print("Antonyms {}, {} have cosine distance: {}".format(w1, w3, w1_w3_dist))


Synonyms right, correct have cosine distance: 0.5972667038440704
Antonyms right, wrong have cosine distance: 0.44657284021377563


In [19]:
### Run this cell to compare cosine distances between synonyms and antonyms
### Try the sample and then use this cell to complete Questions 11-19 in the companion quiz

w1 = "big"
w2 = "large"
w3 = "small"
w1_w2_dist = wv_from_bin.distance(w1, w2)
w1_w3_dist = wv_from_bin.distance(w1, w3)

print("Synonyms {}, {} have cosine distance: {}".format(w1, w2, w1_w2_dist))
print("Antonyms {}, {} have cosine distance: {}".format(w1, w3, w1_w3_dist))


Synonyms big, large have cosine distance: 0.44385212659835815
Antonyms big, small have cosine distance: 0.5041321516036987


In [20]:
### Run this cell to compare cosine distances between synonyms and antonyms
### Try the sample and then use this cell to complete Questions 11-19 in the companion quiz

w1 = "good"
w2 = "nice"
w3 = "bad"
w1_w2_dist = wv_from_bin.distance(w1, w2)
w1_w3_dist = wv_from_bin.distance(w1, w3)

print("Synonyms {}, {} have cosine distance: {}".format(w1, w2, w1_w2_dist))
print("Antonyms {}, {} have cosine distance: {}".format(w1, w3, w1_w3_dist))


Synonyms good, nice have cosine distance: 0.3163908123970032
Antonyms good, bad have cosine distance: 0.28099489212036133


In [22]:
### Run this cell to compare cosine distances between synonyms and antonyms
### Try the sample and then use this cell to complete Questions 11-19 in the companion quiz

w1 = "insane"
w2 = "crazy"
w3 = "sane"
w1_w2_dist = wv_from_bin.distance(w1, w2)
w1_w3_dist = wv_from_bin.distance(w1, w3)

print("Synonyms {}, {} have cosine distance: {}".format(w1, w2, w1_w2_dist))
print("Antonyms {}, {} have cosine distance: {}".format(w1, w3, w1_w3_dist))


Synonyms insane, crazy have cosine distance: 0.26609575748443604
Antonyms insane, sane have cosine distance: 0.40459734201431274


In [23]:
### Run this cell to compare cosine distances between synonyms and antonyms
### Try the sample and then use this cell to complete Questions 11-19 in the companion quiz

w1 = "several"
w2 = "numerous"
w3 = "one"
w1_w2_dist = wv_from_bin.distance(w1, w2)
w1_w3_dist = wv_from_bin.distance(w1, w3)

print("Synonyms {}, {} have cosine distance: {}".format(w1, w2, w1_w2_dist))
print("Antonyms {}, {} have cosine distance: {}".format(w1, w3, w1_w3_dist))


Synonyms several, numerous have cosine distance: 0.20559799671173096
Antonyms several, one have cosine distance: 0.5978294909000397


In [25]:
### Run this cell to compare cosine distances between synonyms and antonyms
### Try the sample and then use this cell to complete Questions 11-19 in the companion quiz

w1 = "antonym"
w2 = "opposite"
w3 = "synonym"
w1_w2_dist = wv_from_bin.distance(w1, w2)
w1_w3_dist = wv_from_bin.distance(w1, w3)

print("Synonyms {}, {} have cosine distance: {}".format(w1, w2, w1_w2_dist))
print("Antonyms {}, {} have cosine distance: {}".format(w1, w3, w1_w3_dist))


Synonyms antonym, opposite have cosine distance: 0.8862465396523476
Antonyms antonym, synonym have cosine distance: 0.43189460039138794


### Question 6: Analogies with Word Vectors (3 points)
Word2Vec vectors have been shown to *sometimes* exhibit the ability to solve analogies. 

As an example, for the analogy "man : king :: woman : x", what is x?

In the cell below, we show you how to use word vectors to find x. The `most_similar` function finds words that are most similar to the words in the `positive` list and most dissimilar from the words in the `negative` list. **The model's proposed answer to the analogy will be the word ranked most similar (largest numerical value).**

We will test whether or not the word vectors can solve a handful of pre-selected analogies. 

**Note:** Further Documentation on the `most_similar` function can be found within the __[GenSim documentation](https://radimrehurek.com/gensim/models/keyedvectors.html#gensim.models.keyedvectors.FastTextKeyedVectors.most_similar)__.
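
One common way to think about the analogy "a : b :: c : x" is as vector arithmetic: look for the word whose vector is closest to `b - a + c`. The sketch below illustrates that idea on made-up 3-dimensional vectors. It is an approximation for intuition only, not gensim's actual implementation; `solve_analogy`, `unit`, and `toy_vectors` are hypothetical names, and the choice to normalise vectors and exclude the three input words is an assumption of this sketch.

```python
import math

def unit(v):
    """Scale v to length 1."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def solve_analogy(a, b, c, vectors):
    """Rank candidate answers x for a : b :: c : x, best first."""
    ua, ub, uc = unit(vectors[a]), unit(vectors[b]), unit(vectors[c])
    # Target direction: b - a + c, using unit-length vectors
    target = unit([y - x + z for x, y, z in zip(ua, ub, uc)])
    ranked = [
        (word, sum(t * u for t, u in zip(target, unit(vec))))
        for word, vec in vectors.items()
        if word not in (a, b, c)  # skip the query words themselves
    ]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked

# Made-up vectors: axis 0 ~ "male", axis 1 ~ "female", axis 2 ~ "royalty"
toy_vectors = {
    "man": [1.0, 0.0, 0.2],
    "woman": [0.0, 1.0, 0.2],
    "king": [1.0, 0.0, 1.0],
    "queen": [0.0, 1.0, 1.0],
    "apple": [0.5, 0.5, -1.0],
}
ranked = solve_analogy("man", "king", "woman", toy_vectors)
print(ranked[0][0])
```

In terms of the `most_similar` call, `b` and `c` play the role of the `positive` list and `a` the `negative` list.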

In [26]:
# Run this cell to answer the analogy -- man : king :: woman : x
# Try the sample and then use this cell to complete Questions 20-31 in the companion quiz

pprint.pprint(wv_from_bin.most_similar(positive=['woman', 'king'], negative=['man']))

[('queen', 0.7118192911148071),
 ('monarch', 0.6189674139022827),
 ('princess', 0.5902431011199951),
 ('crown_prince', 0.5499460697174072),
 ('prince', 0.5377321243286133),
 ('kings', 0.5236844420433044),
 ('Queen_Consort', 0.5235945582389832),
 ('queens', 0.518113374710083),
 ('sultan', 0.5098593235015869),
 ('monarchy', 0.5087411999702454)]


In [27]:
# Run this cell to answer the analogy -- air : plane :: sea : x
# Try the sample and then use this cell to complete Questions 20-31 in the companion quiz

pprint.pprint(wv_from_bin.most_similar(positive=['sea', 'plane'], negative=['air']))

[('boat', 0.5355114936828613),
 ('fishing_boat', 0.5340368151664734),
 ('sailboat', 0.5244715213775635),
 ('trawler', 0.48814260959625244),
 ('fishing_trawler', 0.485969603061676),
 ('Alvei', 0.4799835681915283),
 ('inflatable_dinghy', 0.4771292209625244),
 ('vessel', 0.4747542142868042),
 ('rough_seas', 0.47456133365631104),
 ('ferry_Princess_Ashika', 0.4704197347164154)]


In [28]:
# Run this cell to answer the analogy -- happy : laugh :: sad : x
# Try the sample and then use this cell to complete Questions 20-31 in the companion quiz

pprint.pprint(wv_from_bin.most_similar(positive=['sad', 'laugh'], negative=['happy']))

[('funny', 0.5841635465621948),
 ('joke', 0.548027515411377),
 ('chuckle', 0.526171088218689),
 ('laughing', 0.519500732421875),
 ('giggle', 0.5119887590408325),
 ('laugh_uncontrollably', 0.5067344903945923),
 ('Aaaaah', 0.49892422556877136),
 ('sarcastic_quip', 0.49803200364112854),
 ('unspeakably_sad', 0.49446696043014526),
 ('laughs', 0.4860389828681946)]


In [29]:
# Run this cell to answer the analogy -- cat : kitten :: dog : x
# Try the sample and then use this cell to complete Questions 20-31 in the companion quiz

pprint.pprint(wv_from_bin.most_similar(positive=['dog', 'kitten'], negative=['cat']))

[('puppy', 0.769972562789917),
 ('pup', 0.6861710548400879),
 ('pit_bull', 0.6776559352874756),
 ('dogs', 0.6770986318588257),
 ('Rottweiler', 0.66466224193573),
 ('pit_bull_mix', 0.6585750579833984),
 ('Pomeranian', 0.655381441116333),
 ('Labrador_retriever_mix', 0.6510828733444214),
 ('German_shepherd', 0.6490000486373901),
 ('puppies', 0.6450917720794678)]


In [30]:
# Run this cell to answer the analogy -- France : Paris :: Germany : x
# Try the sample and then use this cell to complete Questions 20-31 in the companion quiz

pprint.pprint(wv_from_bin.most_similar(positive=['Germany', 'Paris'], negative=['France']))

[('Berlin', 0.7644002437591553),
 ('Frankfurt', 0.7329736948013306),
 ('Dusseldorf', 0.7009456753730774),
 ('Munich', 0.6773864030838013),
 ('Cologne', 0.6470192670822144),
 ('Düsseldorf', 0.6399551630020142),
 ('Stuttgart', 0.6361044645309448),
 ('Munich_Germany', 0.6238142251968384),
 ('Budapest', 0.6192865371704102),
 ('Hamburg', 0.6168562769889832)]


In [31]:
# Run this cell to answer the analogy -- carnivore : meat :: herbivore : x
# Try the sample and then use this cell to complete Questions 20-31 in the companion quiz

pprint.pprint(wv_from_bin.most_similar(positive=['herbivore', 'meat'], negative=['carnivore']))

[('meats', 0.5703098773956299),
 ('pork', 0.4979688227176666),
 ('beef', 0.4842023551464081),
 ('unprocessed_meats', 0.4841699004173279),
 ('soy_hulls', 0.47137123346328735),
 ('veal', 0.4531157612800598),
 ('primals', 0.4516943097114563),
 ('Boneless_breast', 0.44742533564567566),
 ('toxic_alkaloids', 0.4445357620716095),
 ('goat_meat', 0.443511039018631)]


In [32]:
# Run this cell to answer the analogy -- dog : bark :: cat : x
# Try the sample and then use this cell to complete Questions 20-31 in the companion quiz

pprint.pprint(wv_from_bin.most_similar(positive=['cat', 'bark'], negative=['dog']))

[('frass', 0.5099734663963318),
 ('cambium', 0.49994856119155884),
 ('beetles_burrow', 0.4957100749015808),
 ('chittering', 0.49452582001686096),
 ('sapwood', 0.49447116255760193),
 ('barky', 0.48963040113449097),
 ('treefrogs', 0.4850967526435852),
 ('sapsuckers', 0.4804511070251465),
 ('tree_bark', 0.47680342197418213),
 ('moth_caterpillars', 0.4735950231552124)]


In [33]:
# Run this cell to answer the analogy -- bat : baseball :: racquet : x
# Try the sample and then use this cell to complete Questions 20-31 in the companion quiz

pprint.pprint(wv_from_bin.most_similar(positive=['racquet', 'baseball'], negative=['bat']))

[('tennis', 0.6464535593986511),
 ('basketball', 0.5445076823234558),
 ('Tennis', 0.520628809928894),
 ('racquets', 0.515256404876709),
 ('soccer', 0.4958398938179016),
 ('sports', 0.495653361082077),
 ('softball', 0.49450376629829407),
 ('golf', 0.48922017216682434),
 ('pickleball', 0.48510220646858215),
 ('volleyball', 0.4848877191543579)]


In [34]:
# Run this cell to answer the analogy -- mosque : Islam :: church : x
# Try the sample and then use this cell to complete Questions 20-31 in the companion quiz

pprint.pprint(wv_from_bin.most_similar(positive=['church', 'Islam'], negative=['mosque']))

[('Christianity', 0.6882044076919556),
 ('Catholicism', 0.6118013262748718),
 ('Catholic_Church', 0.5610004663467407),
 ('Biblical_teachings', 0.5582910776138306),
 ('Gospel', 0.5525078773498535),
 ('Reformed_theology', 0.5480355620384216),
 ('Unitarian_Universalism', 0.5441610813140869),
 ('Roman_Catholicism', 0.5423140525817871),
 ('Jesus_Christ', 0.5416033864021301),
 ('Anglicanism', 0.5408859252929688)]


In [35]:
# Run this cell to answer the analogy -- long : longer :: short : x
# Try the sample and then use this cell to complete Questions 20-31 in the companion quiz

pprint.pprint(wv_from_bin.most_similar(positive=['short', 'longer'], negative=['long']))

[('shorter', 0.5538737177848816),
 ('less', 0.4528019428253174),
 ('fewer', 0.4269722104072571),
 ('Longer', 0.4243296980857849),
 ('sooner', 0.405620276927948),
 ('Short', 0.3926800787448883),
 ('meaning', 0.3800658583641052),
 ('shorter_durations', 0.3696485757827759),
 ('more', 0.3575344681739807),
 ('smaller', 0.35055944323539734)]


In [36]:
# Run this cell to answer the analogy -- longer : longest :: more : x
# Try the sample and then use this cell to complete Questions 20-31 in the companion quiz

pprint.pprint(wv_from_bin.most_similar(positive=['more', 'longest'], negative=['longer']))

[('most', 0.48366793990135193),
 ('largest', 0.48115548491477966),
 ('costliest', 0.45706892013549805),
 ('biggest', 0.43423277139663696),
 ('deadliest', 0.4111884832382202),
 ('history', 0.4066246449947357),
 ('morethan', 0.4051026701927185),
 ('lengthiest', 0.40244799852371216),
 ('worst', 0.40200868248939514),
 ('steepest', 0.3986958861351013)]


In [37]:
# Run this cell to answer the analogy -- talk : talked :: help : x
# Try the sample and then use this cell to complete Questions 20-31 in the companion quiz

pprint.pprint(wv_from_bin.most_similar(positive=['help', 'talked'], negative=['talk']))

[('helped', 0.5876440405845642),
 ('helping', 0.5641529560089111),
 ('helps', 0.49145710468292236),
 ('assist', 0.47578394412994385),
 ('needed', 0.46895796060562134),
 ('assisting', 0.45158955454826355),
 ('worked_diligently', 0.45099586248397827),
 ('tohelp', 0.4488154351711273),
 ('worked_tirelessly', 0.4316960275173187),
 ('Helping', 0.4140826463699341)]


### Question 7: Guided Analysis of Bias in Word Vectors (2 points)

It's important to be cognizant of the biases (gender, race, sexual orientation, etc.) implicit in our word embeddings. Execute the cells below and answer the final multiple choice question.

Run the cell below to examine (a) which terms are most similar to "woman" and "doctor" and most dissimilar to "man", and (b) which terms are most similar to "man" and "doctor" and most dissimilar to "woman".

In [4]:
# Run this cell
# Here `positive` indicates the list of words to be similar to and `negative` indicates the list of words to be
# most dissimilar from.
pprint.pprint(wv_from_bin.most_similar(positive=['woman', 'doctor'], negative=['man']))
print()
pprint.pprint(wv_from_bin.most_similar(positive=['man', 'doctor'], negative=['woman']))

KeyboardInterrupt: ignored

# <font color="blue"> Submission Instructions</font>

Simply make sure you have answered all the multiple choice questions to your satisfaction, and submit!
  