# SHW5: WordNet
In this homework you will be exploring WordNet by finding hyponyms of a synset throughout a text and building synset clusters.  

This homework will be due **Tuesday, November 13 at 11:59pm**

First, we'll start off by importing a few modules. We'll also be defining a few functions to parse the Wordnet files for us to use. **Please do not import any additional libraries beyond the ones below.**

In [101]:
import numpy as np
import os
from IPython.display import Markdown, display
from nltk.stem import WordNetLemmatizer


def read_index_file(filename):
    
    words_to_first_synsets={}
    first_synsets_to_words={}
    with open(filename) as file:
        for i in range(29):
            file.readline()
            
        for line in file:
            cols = line.rstrip().split(" ")
            term = cols[0]
            p_cnt = int(cols[3], 16);
            first_synset_index = 6+p_cnt

            # A word (like "bank") can belong to multiple synsets, so select just one;
            # the first is typically the most frequently used for that word
            first_synset = cols[first_synset_index]
            words_to_first_synsets[term] = first_synset

            if first_synset not in first_synsets_to_words:
                first_synsets_to_words[first_synset] = set()

            first_synsets_to_words[first_synset].add(term)
            
    return words_to_first_synsets, first_synsets_to_words


def read_data_file(filename):

    hyponyms = {}
    with open(filename) as file:
    
        # skip header
        for i in range(29):
            file.readline()

        for line in file:        
            words = []
            cols = line.rstrip().split(" ")
            synset_id = cols[0]
            numWords = int(cols[3], 16);

            numptr_index = 6+((numWords-1) * 2)
            numPtrs = int(cols[numptr_index])

            for i in range(0, numPtrs):
                pointer_symbol = cols[numptr_index+(i * 4) + 1]
                pointed_synset = cols[numptr_index+(i * 4) + 2]
                
                if pointer_symbol == '~': # hyponym relation
                    if synset_id not in hyponyms:
                        hyponyms[synset_id] = set()
                    hyponyms[synset_id].add(pointed_synset)

    return hyponyms

In [102]:
get_hyponym_terms(word2first_synset['mammal'], hyponyms, first_synset2word)

{'bear_cub',
 'prairie_vole',
 'miniature_poodle',
 'fissipedia',
 'hinny',
 'musteline',
 'meadow_mouse',
 'pye-dog',
 'sorex_cinereus',
 'blarina_brevicauda',
 'gerbil',
 'hudson_bay_collared_lemming',
 'tube-nosed_fruit_bat',
 'pembroke_welsh_corgi',
 'pomeranian',
 'pug-dog',
 'mountain_zebra',
 'anteater',
 'belgian_hare',
 'tatouay',
 'rat_kangaroo',
 'gelding',
 'white_fox',
 'flying_cat',
 'scotch_terrier',
 'rattus_norvegicus',
 'prairie_marmot',
 'bruin',
 'staffordshire_terrier',
 'trichosurus_vulpecula',
 'arctic_ground_squirrel',
 'bunny_rabbit',
 'white-footed_mouse',
 'woolly_mammoth',
 'crab-eating_raccoon',
 'nylghau',
 'suricata_tetradactyla',
 'weimaraner',
 'ursus_arctos_middendorffi',
 'greenland_caribou',
 'dusicyon_cancrivorus',
 'russian_wolfhound',
 'equid',
 'blue_point_siamese',
 'nutria',
 'nail-tailed_wallaby',
 'cocker',
 'cotswold',
 'pocket_mouse',
 'sled_dog',
 'new_world_mouse',
 'procyon_cancrivorus',
 'spermophile',
 'rabbit-eared_bandicoot',
 'sheep

In [103]:
# import nltk
# nltk.download('wordnet')

# WordNetLemmatizer().lemmatize('dogs')

## Problem 1: Hyponym Identification
For this problem, we will be using the WordNet hyponym tree in order to identify all occurences of a hyponym of a given synset in a piece of text. To begin, we call the functions defined above to get the relevant information.  

`word2first_synset` is a dictionary that maps a word to its first synset, and `first_synset2word` is a dictionary that maps a synset to all words that have that synset as their first synset. Note that for this homework, we are considering only the first (which is usually the most common) synset for each word.  
`hyponyms` is a dictionary that maps a synset to a set of synsets which are direct hyponyms of the given synset. This gives the tree structure of hypernym/hyponym relationships.  

In [104]:
# Dictionary mapping word to the first synset it is contained in, and synset to words in the synset
word2first_synset, first_synset2word = read_index_file('index.noun')

# Dictionary of synset to a set of synsets that are direct hyponyms of the synset
hyponyms = read_data_file('data.noun')

### Problem 1.1
Implement `get_hyponym_terms`, which gets all the terms included in the set of all hyponyms of the designated synset.

In [105]:
def get_hyponym_terms(synset_id, hyponyms, first_synsets_to_words):
    
    terms = set()
    """ YOUR CODE HERE """
    #first_synsets_to_words: {synset: [word1, word2, ...]}
    #hyponyms: {synset: [synsets such that synset is a subclass of the synset]}
    #this function should grab all words that are hyponynms of any word in the synset
    #synset_id is a number
    #hy
    #print(first_synsets_to_words[synset_id])
    
    if synset_id in first_synsets_to_words:
        #print(first_synsets_to_words[synset_id])
        words = first_synsets_to_words[synset_id]
        for word in words:
            terms.add(word)
        if synset_id in hyponyms:
            synsets = hyponyms[synset_id]
            for synset  in synsets:
                sub_terms = get_hyponym_terms(synset, hyponyms, first_synsets_to_words)
                for term in sub_terms:
                    terms.add(term)
    

    return terms

returned_terms = get_hyponym_terms(word2first_synset['mammal'], hyponyms, first_synset2word)
#01865519


With this function, we are able to identify whether a particular word or phrase is a hyponym of a given synset. Now we can move on to using this function to help identify the locations of hyponyms in text.

### Problem 1.2
Implement `get_synset_locations`, which takes in a specified text (represented as multiple lines of tokenized words) and returns locations of where any hyponyms of a given word, where each location is a nested tuple of the format `(line, (start_index, end_index))`. The start index is inclusive and the end index is exclusive.

For example, if the word is 'mammal', and the 5th line of the text is "Dogs , such as the poodle and german shepherd make wonderful pets ." you should add `(4, (0, 1))`, `(4, (5, 6))`, and `(4, (7, 9))` to the locations list, as `dog`, `poodle`, and `german_shepherd` are hyponyms of 'mammal'.  

Assume `text` is a list of lines, where each line is a word-tokenized (by spaces) string representation of a paragraph of text, and `word` is the word we want to find the hyponyms of.

In [106]:
def sublist(lst1, lst2):
    #think of lst1 and lst2 as sequences of "words" (or punctuation, anything separated by spaces)
    #we want to see if lst1 is a subsequence of lst2:
    #in the sense that we took a "window" of lst2 and got lst1
    list1 = lst1
    list2 = lst2
    lst1 = [index for index in range(len(lst1))]
    lst2 = [index for index in range(len(lst2)) if lst2[index] in list1]

    if len(lst2)%len(lst1) == 0:
        result = []
        k = int(len(lst2)/len(lst1))
        #print(lst2)
        indices = np.array(lst2, dtype = int).reshape([k, len(lst1)])
        indices = list(indices)
        for index in indices:
            index = list(index)
            if lst1[-1]-lst1[0] == index[-1]-index[0]:
                result.append(index)
        if len(result) > 0:
            return True, result
        else:
            return False, []
    else:
        return False, []
#     if len(lst1)==len(lst2):
#         if (lst1[-1]-lst1[0])== (lst2[-1] -lst2[0]):
#             return True, lst2
#         else:
#             return False, []
#     else:
#         return False, []
    
def get_synset_locations(text,word,hyponyms, first_synset2word):

    locs = []
    synset_id = word2first_synset[word]
    #this gives the synset the word is in
    hyps = get_hyponym_terms(synset_id, hyponyms, first_synset2word) 
    #this gives all terms that are hyponyms of the synset that contains the word
    
    step = 1
    for line_index in range(len(text)):
        #print(step)
        for hyp in hyps:
            #checking for each hyponym term
            h = hyp.split('_')
            line = text[line_index]
            line = line.lower().split(' ')
            line = [WordNetLemmatizer().lemmatize(x.lower()) for x in line]
            (line_contains, indices) = sublist(h, line)
            if line_contains:
                for index in indices:
                    locs.append((line_index, (index[0], index[-1]+1))) 
#                 locs.append((line_index, (indices[0], indices[-1]+1)))
        step+=1
    return locs

The function defined below will help visualize where the hyponyms have been located.

In [107]:
def print_text_with_bolded_hyponyms(text, word, hyponyms, first_synset2word):
    locations = get_synset_locations(text, word, hyponyms, first_synset2word)
    
    text_print = [t.split() for t in text]
    for line_index, word_index in locations:
        text_print[line_index][word_index[0]] = '**' + text_print[line_index][word_index[0]]
        text_print[line_index][word_index[1]-1] = text_print[line_index][word_index[1]-1] + '**'
    
    for l in text_print:
        display(Markdown((' '.join(l))))

With the above functions implemented, we can now see how well we're able to identify hyponyms in text. Run the cell immediately below to read the text file, and then the following cell to display the text, wherein all hyponym of the given word (in this case, 'mammal'), will be bolded.

In [108]:
with open('literary.texts.txt', 'r', encoding = "utf-8") as f:
    lines = [l.rstrip() for l in f.readlines()[1:100]]


In [109]:
print_text_with_bolded_hyponyms(lines, 'mammal', hyponyms, first_synset2word)

Michaelmas term lately over , and the Lord Chancellor sitting in Lincoln 's Inn Hall .

Implacable November weather .

As much mud in the streets as if the waters had but newly retired from the face of the earth , and it would not be wonderful to meet a Megalosaurus , forty feet long or so , waddling like an elephantine lizard up Holborn Hill .

Smoke lowering down from chimney-pots , making a soft black drizzle , with flakes of soot in it as big as full-grown snowflakes -- gone into mourning , one might imagine , for the death of the sun .

**Dogs** , undistinguishable in mire .

**Horses** , scarcely better ; splashed to their very blinkers .

Foot passengers , jostling one another 's umbrellas in a general infection of ill temper , and losing their foot-hold at street-corners , where tens of thousands of other foot passengers have been slipping and sliding since the day broke ( if this day ever broke ) , adding new deposits to the crust upon crust of mud , sticking at those points tenaciously to the pavement , and accumulating at compound interest .

Fog everywhere .

Fog up the river , where it flows among green aits and meadows ; fog down the river , where it rolls defiled among the tiers of shipping and the waterside pollutions of a great ( and dirty ) city .

Fog on the Essex marshes , fog on the Kentish heights .

Fog creeping into the cabooses of collier-brigs ; fog lying out on the yards and hovering in the rigging of great ships ; fog drooping on the gunwales of barges and small boats .

Fog in the eyes and throats of ancient Greenwich pensioners , wheezing by the firesides of their wards ; fog in the stem and bowl of the afternoon pipe of the wrathful skipper , down in his close cabin ; fog cruelly pinching the toes and fingers of his shivering little ' prentice boy on deck .

Chance people on the bridges peeping over the parapets into a nether sky of fog , with fog all round them , as if they were up in a balloon and hanging in the misty clouds .

Gas looming through the fog in divers places in the streets , much as the sun may , from the spongey fields , be seen to loom by husbandman and ploughboy .

Most of the shops lighted two hours before their time -- as the gas seems to know , for it has a haggard and unwilling look .

The raw afternoon is rawest , and the dense fog is densest , and the muddy streets are muddiest near that leaden-headed old obstruction , appropriate ornament for the threshold of a leaden-headed old corporation , Temple Bar .

And hard by Temple Bar , in Lincoln 's Inn Hall , at the very heart of the fog , sits the Lord High Chancellor in his High Court of Chancery .

Never can there come fog too thick , never can there come mud and mire too deep , to assort with the groping and floundering condition which this High Court of Chancery , most pestilent of hoary sinners , holds this day in the sight of heaven and earth .

On such an afternoon , if ever , the Lord High Chancellor ought to be sitting here -- as here he is -- with a foggy glory round his head , softly fenced in with crimson cloth and curtains , addressed by a large advocate with great whiskers , a little voice , and an interminable brief , and outwardly directing his contemplation to the lantern in the roof , where he can see nothing but fog .

On such an afternoon some score of members of the High Court of Chancery bar ought to be -- as here they are -- mistily engaged in one of the ten thousand stages of an endless cause , tripping one another up on slippery precedents , groping knee-deep in technicalities , running their goat-hair and horsehair warded heads against walls of words and making a pretence of equity with serious faces , as players might .

On such an afternoon the various solicitors in the cause , some two or three of whom have inherited it from their fathers , who made a fortune by it , ought to be -- as are they not ?

-- ranged in a line , in a long matted well ( but you might look in vain for truth at the bottom of it ) between the registrar 's red table and the silk gowns , with bills , cross-bills , answers , rejoinders , injunctions , affidavits , issues , references to masters , masters ' reports , mountains of costly nonsense , piled before them .

Well may the court be dim , with wasting candles here and there ; well may the fog hang heavy in it , as if it would never get out ; well may the stained-glass windows lose their colour and admit no light of day into the place ; well may the uninitiated from the streets , who peep in through the glass panes in the door , be deterred from entrance by its owlish aspect and by the drawl , languidly echoing to the roof from the padded dais where the Lord High Chancellor looks into the lantern that has no light in it and where the attendant wigs are all stuck in a fog-bank !

This is the Court of Chancery , which has its decaying houses and its blighted lands in every shire , which has its worn-out lunatic in every madhouse and its dead in every churchyard , which has its ruined suitor with his slipshod heels and threadbare dress borrowing and begging through the round of every man 's acquaintance , which gives to monied might the means abundantly of wearying out the right , which so exhausts finances , patience , courage , hope , so overthrows the brain and breaks the heart , that there is not an honourable man among its practitioners who would not give -- who does not often give -- the warning , " Suffer any wrong that can be done you rather than come here ! "

Who happen to be in the Lord Chancellor 's court this murky afternoon besides the Lord Chancellor , the counsel in the cause , two or three counsel who are never in any cause , and the well of solicitors before mentioned ?

There is the registrar below the judge , in wig and gown ; and there are two or three maces , or petty-bags , or privy purses , or whatever they may be , in legal court suits .

These are all yawning , for no crumb of amusement ever falls from Jarndyce and Jarndyce ( the cause in hand ) , which was squeezed dry years upon years ago .

The short-hand writers , the reporters of the court , and the reporters of the newspapers invariably decamp with the rest of the regulars when Jarndyce and Jarndyce comes on .

Their places are a blank .

Standing on a seat at the side of the hall , the better to peer into the curtained sanctuary , is a little mad old woman in a squeezed bonnet who is always in court , from its sitting to its rising , and always expecting some incomprehensible judgment to be given in her favour .

Some say she really is , or was , a party to a suit , but no one knows for certain because no one cares .

She carries some small litter in a reticule which she calls her documents , principally consisting of paper matches and dry lavender .

A sallow prisoner has come up , in custody , for the half-dozenth time to make a personal application " to purge himself of his contempt , " which , being a solitary surviving executor who has fallen into a state of conglomeration about accounts of which it is not pretended that he had ever any knowledge , he is not at all likely ever to do .

In the meantime his prospects in life are ended .

Another ruined suitor , who periodically appears from Shropshire and breaks out into efforts to address the Chancellor at the close of the day 's business and who can by no means be made to understand that the Chancellor is legally ignorant of his existence after making it desolate for a quarter of a century , plants himself in a good place and keeps an eye on the judge , ready to call out " My Lord ! "

in a voice of sonorous complaint on the instant of his rising .

A few lawyers ' clerks and others who know this suitor by sight linger on the chance of his furnishing some fun and enlivening the dismal weather a little .

Jarndyce and Jarndyce drones on .

This scarecrow of a suit has , in course of time , become so complicated that no man alive knows what it means .

The parties to it understand it least , but it has been observed that no two Chancery lawyers can talk about it for five minutes without coming to a total disagreement as to all the premises .

Innumerable children have been born into the cause ; innumerable young people have married into it ; innumerable old people have died out of it .

Scores of persons have deliriously found themselves made parties in Jarndyce and Jarndyce without knowing how or why ; whole families have inherited legendary hatreds with the suit .

The little plaintiff or defendant who was promised a new rocking-horse when Jarndyce and Jarndyce should be settled has grown up , possessed himself of a real **horse** , and trotted away into the other world .

Fair wards of court have faded into mothers and grandmothers ; a long procession of Chancellors has come in and gone out ; the legion of bills in the suit have been transformed into mere bills of mortality ; there are not three Jarndyces left upon the earth perhaps since old **Tom** Jarndyce in despair blew his brains out at a coffee-house in Chancery Lane ; but Jarndyce and Jarndyce still drags its dreary length before the court , perennially hopeless .

Jarndyce and Jarndyce has passed into a joke .

That is the only good that has ever come of it .

It has been death to many , but it is a joke in the profession .

Every master in Chancery has had a reference out of it .

Every Chancellor was " in it , " for somebody or other , when he was counsel at the bar .

Good things have been said about it by blue-nosed , bulbous-shoed old benchers in select port-wine committee after dinner in hall .

Articled clerks have been in the habit of fleshing their legal wit upon it .

The last Lord Chancellor handled it neatly , when , correcting Mr. Blowers , the eminent silk gown who said that such a thing might happen when the sky rained potatoes , he observed , " or when we get through Jarndyce and Jarndyce , Mr. Blowers " -- a pleasantry that particularly tickled the maces , bags , and purses .

How many people out of the suit Jarndyce and Jarndyce has stretched forth its unwholesome hand to spoil and corrupt would be a very wide question .

From the master upon whose impaling files reams of dusty warrants in Jarndyce and Jarndyce have grimly writhed into many shapes , down to the copying-clerk in the Six Clerks ' Office who has copied his tens of thousands of Chancery folio-pages under that eternal heading , no man 's nature has been made better by it .

In trickery , evasion , procrastination , spoliation , botheration , under false pretences of all sorts , there are influences that can never come to good .

The very solicitors ' boys who have kept the wretched suitors at bay , by protesting time out of mind that Mr. Chizzle , Mizzle , or otherwise was particularly engaged and had appointments until dinner , may have got an extra moral twist and shuffle into themselves out of Jarndyce and Jarndyce .

The receiver in the cause has acquired a goodly sum of money by it but has acquired too a distrust of his own mother and a contempt for his own kind .

Chizzle , Mizzle , and otherwise have lapsed into a habit of vaguely promising themselves that they will look into that outstanding little matter and see what can be done for Drizzle -- who was not well used -- when Jarndyce and Jarndyce shall be got out of the office .

Shirking and sharking in all their many varieties have been sown broadcast by the ill-fated cause ; and even those who have contemplated its history from the outermost circle of such evil have been insensibly tempted into a loose way of letting bad things alone to take their own bad course , and a loose belief that if the world go wrong it was in some off-hand manner never meant to go right .

Chapter 1 Sir Walter Elliot , of Kellynch Hall , in Somersetshire , was a man who , for his own amusement , never took up any book but the Baronetage ; there he found occupation for an idle hour , and consolation in a distressed one ; there his faculties were roused into admiration and respect , by contemplating the limited remnant of the earliest patents ; there any unwelcome sensations , arising from domestic affairs changed naturally into pity and contempt as he turned over the almost endless creations of the last century ; and there , if every other leaf were powerless , he could read his own history with an interest which never failed .

This was the page at which the favourite volume always opened : " ELLIOT OF KELLYNCH HALL .

" Walter Elliot , born March 1 , 1760 , married , July 15 , 1784 , Elizabeth , daughter of James Stevenson , Esq. of South Park , in the county of Gloucester , by which lady ( who died 1800 ) he has issue Elizabeth , born June 1 , 1785 ; Anne , born August 9 , 1787 ; a still-born son , November 5 , 1789 ; Mary , born November 20 , 1791 . "

Precisely such had the paragraph originally stood from the printer 's hands ; but Sir Walter had improved it by adding , for the information of himself and his family , these words , after the date of Mary 's birth -- " Married , December 16 , 1810 , Charles , son and heir of Charles Musgrove , Esq. of Uppercross , in the county of Somerset , " and by inserting most accurately the day of the month on which he had lost his wife .

Then followed the history and rise of the ancient and respectable family , in the usual terms ; how it had been first settled in Cheshire ; how mentioned in Dugdale , serving the office of high sheriff , representing a borough in three successive parliaments , exertions of loyalty , and dignity of baronet , in the first year of Charles II , with all the Marys and Elizabeths they had married ; forming altogether two handsome duodecimo pages , and concluding with the arms and motto : -- " Principal seat , Kellynch Hall , in the county of Somerset , " and Sir Walter 's handwriting again in this finale : -- " Heir presumptive , William Walter Elliot , Esq. , great grandson of the second Sir Walter . "

Vanity was the beginning and the end of Sir Walter Elliot 's character ; vanity of person and of situation .

He had been remarkably handsome in his youth ; and , at fifty-four , was still a very fine man .

Few women could think more of their personal appearance than he did , nor could the valet of any new made lord be more delighted with the place he held in society .

He considered the blessing of beauty as inferior only to the blessing of a baronetcy ; and the Sir Walter Elliot , who united these gifts , was the constant object of his warmest respect and devotion .

His good looks and his rank had one fair claim on his attachment ; since to them he must have owed a wife of very superior character to any thing deserved by his own .

Lady Elliot had been an excellent woman , sensible and amiable ; whose judgement and conduct , if they might be pardoned the youthful infatuation which made her Lady Elliot , had never required indulgence afterwards .

-- She had humoured , or softened , or concealed his failings , and promoted his real respectability for seventeen years ; and though not the very happiest being in the world herself , had found enough in her duties , her friends , and her children , to attach her to life , and make it no matter of indifference to her when she was called on to quit them .

-- Three girls , the two eldest sixteen and fourteen , was an awful legacy for a mother to bequeath , an awful charge rather , to confide to the authority and guidance of a conceited , silly father .

She had , however , one very intimate friend , a sensible , deserving woman , who had been brought , by strong attachment to herself , to settle close by her , in the village of Kellynch ; and on her kindness and advice , Lady Elliot mainly relied for the best help and maintenance of the good principles and instruction which she had been anxiously giving her daughters .

This friend , and Sir Walter , did not marry , whatever might have been anticipated on that head by their acquaintance .

Thirteen years had passed away since Lady Elliot 's death , and they were still near neighbours and intimate friends , and one remained a widower , the other a widow .

That Lady Russell , of steady age and character , and extremely well provided for , should have no thought of a second marriage , needs no apology to the public , which is rather apt to be unreasonably discontented when a woman does marry again , than when she does not ; but Sir Walter 's continuing in singleness requires explanation .

Be it known then , that Sir Walter , like a good father , ( having met with one or two private disappointments in very unreasonable applications ) , prided himself on remaining single for his dear daughters ' sake .

For one daughter , his eldest , he would really have given up any thing , which he had not been very much tempted to do .

Elizabeth had succeeded , at sixteen , to all that was possible , of her mother 's rights and consequence ; and being very handsome , and very like himself , her influence had always been great , and they had gone on together most happily .

His two other children were of very inferior value .

Mary had acquired a little artificial importance , by becoming Mrs Charles Musgrove ; but Anne , with an elegance of mind and sweetness of character , which must have placed her high with any people of real understanding , was nobody with either father or sister ; her word had no weight , her convenience was always to give way -- she was only Anne .

To Lady Russell , indeed , she was a most dear and highly valued god-daughter , favourite , and friend .

Lady Russell loved them all ; but it was only in Anne that she could fancy the mother to revive again .

A few years before , Anne Elliot had been a very pretty girl , but her bloom had vanished early ; and as even in its height , her father had found little to admire in her , ( so totally different were her delicate features and mild dark eyes from his own ) , there could be nothing in them , now that she was faded and thin , to excite his esteem .

He had never indulged much hope , he had now none , of ever reading her name in any other page of his favourite work .

All equality of alliance must rest with Elizabeth , for Mary had merely connected herself with an old country family of respectability and large fortune , and had therefore given all the honour and received none : Elizabeth would , one day or other , marry suitably .

It sometimes happens that a woman is handsomer at twenty-nine than she was ten years before ; and , generally speaking , if there has been neither ill health nor anxiety , it is a time of life at which scarcely any charm is lost .

It was so with Elizabeth , still the same handsome Miss Elliot that she had begun to be thirteen years ago , and Sir Walter might be excused , therefore , in forgetting her age , or , at least , be deemed only half a fool , for thinking himself and Elizabeth as blooming as ever , amidst the wreck of the good looks of everybody else ; for he could plainly see how old all the rest of his family and acquaintance were growing .

Anne haggard , Mary coarse , every face in the neighbourhood worsting , and the rapid increase of the crow 's foot about Lady Russell 's temples had long been a distress to him .

Elizabeth did not quite equal her father in personal contentment .

Thirteen years had seen her mistress of Kellynch Hall , presiding and directing with a self-possession and decision which could never have given the idea of her being younger than she was .

For thirteen years had she been doing the honours , and laying down the domestic law at home , and leading the way to the chaise and four , and walking immediately after Lady Russell out of all the drawing-rooms and dining-rooms in the country .

Thirteen winters ' revolving frosts had seen her opening every ball of credit which a scanty neighbourhood afforded , and thirteen springs shewn their blossoms , as she travelled up to London with her father , for a few weeks ' annual enjoyment of the great world .

She had the remembrance of all this , she had the consciousness of being nine-and-twenty to give her some regrets and some apprehensions ; she was fully satisfied of being still quite as handsome as ever , but she felt her approach to the years of danger , and would have rejoiced to be certain of being properly solicited by baronet-blood within the next twelvemonth or two .

Then might she again take up the book of books with as much enjoyment as in her early youth , but now she liked it not .

Always to be presented with the date of her own birth and see no marriage follow but that of a youngest sister , made the book an evil ; and more than once , when her father had left it open on the table near her , had she closed it , with averted eyes , and pushed it away .

She had had a disappointment , moreover , which that book , and especially the history of her own family , must ever present the remembrance of .

The heir presumptive , the very William Walter Elliot , Esq. , whose rights had been so generously supported by her father , had disappointed her .

She had , while a very young girl , as soon as she had known him to be , in the event of her having no brother , the future baronet , meant to marry him , and her father had always meant that she should .

Feel free to change 'mammal' in the above cell to see different hyponyms being identified.

## Problem 2: Synset Clustering
In this next problem, we will be generating clusters from synsets in order to find which synset are most similar to some words that are not contained in Wordnet.  

We begin by reading in our GloVe word embeddings, trained on a Twitter dataset.

In [110]:
glove_dict = {}
with open('glove.twitter.27B.25d.txt', 'r', encoding= "utf-8") as f:
    for line in f.readlines():
        glove_dict[line.split()[0]] = np.array(line.split()[1:], dtype=np.float32)




In [111]:

for word in glove_dict:
    if len(glove_dict[word]) != 25:
        print(word)

0.065581


### Problem 2.1
Create clusters for each synset by finding the point that maximizes the cosine similarity of all the embeddings of the words in the synset. The resultant dictionary should contain a mapping between the synset ID and the optimal point.

In [112]:
def create_clusters(first_synset2word, glove_dict):
    #cluster
    clusters = {}
    """ YOUR CODE HERE """
    for synset_id in first_synset2word:
        #for each synset_id, we get a list of words in the synset
        words_in_synset = list(first_synset2word[synset_id])
        vectors = [glove_dict[word] for word in words_in_synset if word in glove_dict and word != '0.065581']
        if len(vectors)>0:
            dim = len(vectors[0])
            A = np.concatenate(vectors)
            A = A.reshape(dim, len(vectors))
            SUM = np.sum(A, axis = 1)
            MEAN = SUM/len(vectors)
            clusters[synset_id] = MEAN
    return clusters

Below we have a few words outside of Wordnet's vocabulary. We'll check the most similar synset from the synset clusters.

In [113]:
out_of_wordnet = ['minigame', 'grandmama', 'nocebo', 'crazycatlady', 'blogoversary', 'self-motivation',
                  'bioshock', 'horcrux', 'pokemon', 'allnighter', 'belieber', 'facebook', 'ransomware',
                  'bokeh', 'crowdfunding']

synset_clusters = create_clusters(first_synset2word, glove_dict)

for word in out_of_wordnet:
    dists = []
    for key, val in synset_clusters.items():
        dists.append((key, np.dot(glove_dict[word], val) / np.linalg.norm(val)))
    closest = sorted(dists, key=lambda x: x[1], reverse=True)[0][0]
    print('Closest synset to %s: %s' % (word, ', '.join(first_synset2word[closest])))

Closest synset to minigame: launchpad, launching_pad, launch_area, launch_pad
Closest synset to grandmama: mammy
Closest synset to nocebo: corpus_amygdaloideum, amygdala, amygdaloid_nucleus
Closest synset to crazycatlady: chowchow
Closest synset to blogoversary: fortnight, two_weeks
Closest synset to self-motivation: imperturbability, coolness, imperturbableness
Closest synset to bioshock: absolution
Closest synset to horcrux: aloneness, lonesomeness
Closest synset to pokemon: lego, lego_set
Closest synset to allnighter: slacking, shirking, goofing_off, goldbricking
Closest synset to belieber: ariana
Closest synset to facebook: chirrup, twitter
Closest synset to ransomware: dervish
Closest synset to bokeh: pinhole
Closest synset to crowdfunding: startup


## Problem 2.2
Choose three of the above out of WordNet words and write a comment about each of them, answering the following questions: Was the most similar synset what you expected, or did it surprise you? Why do you think that synet was the most similar, based on what you know about WordNet, word embeddings, and the data that the embeddings were trained on?

Was the most similar synset what you expected?

In my opnion the most similar synset is twitter to facebook, and that doesn't suprprise me at all since twitter and facebook are both social media platforms.

Why do you think that synet was the most similar, based on what you know about WordNet, word embeddings, and the data that the embeddings were trained on?

The glove embeddings are trained on twitter data, and on twitter, people frequently like to talk about social media platforms and why "twitter is better than facebook" or vice versa.
Also, this is related to embeddings being trained on contexts, so social media platforms are talked about in similar contexts. Also, they are both hyponyms of "social media" platforms.