# **Part One of the Course Project**
In this part of the course project, you will manipulate NLTK's WordNet taxonomy.
<hr style="border-top: 2px solid #606366; background: transparent;">

# **Setup**
 
Reset the Python environment to clear it of any previously loaded variables, functions, or libraries. Then, import the libraries and corpora needed for this project.

In [None]:
%reset -f
from IPython.core.interactiveshell import InteractiveShell as IS
IS.ast_node_interactivity = "all"    # allows multiple outputs from a cell
import numpy as np, nltk, pandas as pd, numpy.testing as npt, unittest
from nltk.corpus import wordnet as wn
from colorunittest import run_unittest
ae, aae = npt.assert_equal, npt.assert_almost_equal

_ = nltk.download(['punkt', 'averaged_perceptron_tagger', 'wordnet', 'omw-1.4'], quiet=True)

## Task 1

Complete the UDF `Lemmas()`, which takes a lemma word `sLemma`, finds all of its synsets `SS`, and returns the set of lemma names from all synsets in `SS`.

*Example:* the lemma `'tiger'` leads to synsets:

1. `Synset('tiger.n.01')` with the lemma names `['tiger']`
1. `Synset('tiger.n.02')` with the lemma names `['tiger', 'Panthera_tigris']`. 

The concatenated list `['tiger', 'tiger', 'Panthera_tigris']` is converted to the output set `{'tiger', 'Panthera_tigris'}`.
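
The flatten-then-deduplicate step can be sketched without NLTK. This minimal sketch uses two hypothetical dictionaries, `TOY_SYNSETS` and `TOY_LEMMA_NAMES`, as stand-ins for WordNet. Their values are copied from the `'tiger'` example above. The helper name `toy_lemmas` is also hypothetical. In the notebook itself, the lookups would come from `wn.synsets()` and `Synset.lemma_names()` instead.

```python
# Hypothetical stand-in for WordNet (values copied from the 'tiger' example):
# lemma -> synset names, and synset name -> lemma names.
TOY_SYNSETS = {'tiger': ['tiger.n.01', 'tiger.n.02']}
TOY_LEMMA_NAMES = {
    'tiger.n.01': ['tiger'],
    'tiger.n.02': ['tiger', 'Panthera_tigris'],
}

def toy_lemmas(sLemma: str) -> set:
    """Union of lemma names over all (toy) synsets of sLemma."""
    SsLemmas = set()
    for ss in TOY_SYNSETS.get(sLemma, []):     # unknown lemma -> no synsets
        SsLemmas.update(TOY_LEMMA_NAMES[ss])   # the repeated 'tiger' collapses here
    return SsLemmas

print(toy_lemmas('tiger'))    # set display order may vary
print(toy_lemmas('unknown'))  # set()
```

Using a set from the start, rather than building a list and converting it at the end, removes duplicates as they arrive. Both approaches give the same result.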

In [None]:
# COMPLETE THIS CELL
def Lemmas(sLemma='tiger') -> set:
    '''Returns a set of all lemma names from all synsets of sLemma.  '''
    SsLemmas = set()  # set of lemmas
    # YOUR CODE HERE
    raise NotImplementedError()
    return SsLemmas

Lemmas()

In [None]:
# RUN CELL TO TEST YOUR CODE
@run_unittest
class Test_Lemmas(unittest.TestCase):
    def test00(self): ae(type(Lemmas()), set)
    def test01(self): ae(Lemmas('tiger'), {'Panthera_tigris', 'tiger'})
    def test02(self): ae(Lemmas('teacher'), {'instructor', 'teacher'})
    def test03(self): ae(Lemmas('Cornell'), {'Cornell', 'Ezra_Cornell', 'Katherine_Cornell'})
    def test04(self): ae(len(Lemmas('cat')), 38)
    def test05(self): ae(len(Lemmas('Dogs')), 30)

## Task 2

Complete the UDF `HypLemmas()`, which takes a lemma word `sLemma`, finds all of its synsets `SS`, and returns the set of lemma names of the hypernyms of all synsets in `SS`.

*Example:* the lemma `'tiger'` leads to synsets:

1. `Synset('tiger.n.01')` with the hypernym synset:
    1. `Synset('person.n.01')` with the lemma names `['person', 'individual', 'someone', 'somebody', 'mortal', 'soul']`
1. `Synset('tiger.n.02')` with the hypernym synset:
    1. `Synset('big_cat.n.01')` with the lemma names `['big_cat', 'cat']`

The flattened concatenated list `['person', 'individual', 'someone', 'somebody', 'mortal', 'soul', 'big_cat', 'cat']` is returned as a set.
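
Task 2 adds one more level of nesting to Task 1: synsets, then their hypernyms, then lemma names. The sketch below walks that nesting over hypothetical toy dictionaries copied from the `'tiger'` example above. In NLTK, the middle step would correspond to `Synset.hypernyms()`. That method does not return instance hypernyms, which NLTK exposes separately as `Synset.instance_hypernyms()`. This is presumably why the test expects `HypLemmas('Cornell')` to be empty: both Cornell synsets are instances (specific people). The names `TOY_HYPERNYMS` and `toy_hyp_lemmas` are illustrative only.

```python
# Hypothetical toy data copied from the 'tiger' example above.
TOY_SYNSETS = {'tiger': ['tiger.n.01', 'tiger.n.02']}
TOY_HYPERNYMS = {'tiger.n.01': ['person.n.01'], 'tiger.n.02': ['big_cat.n.01']}
TOY_LEMMA_NAMES = {
    'person.n.01': ['person', 'individual', 'someone', 'somebody', 'mortal', 'soul'],
    'big_cat.n.01': ['big_cat', 'cat'],
}

def toy_hyp_lemmas(sLemma: str) -> set:
    """Lemma names of all hypernyms of all (toy) synsets of sLemma."""
    SsLemmas = set()
    for ss in TOY_SYNSETS.get(sLemma, []):          # synsets of the lemma
        for hyp in TOY_HYPERNYMS.get(ss, []):       # their direct hypernyms
            SsLemmas.update(TOY_LEMMA_NAMES[hyp])   # flatten into one set
    return SsLemmas

print(sorted(toy_hyp_lemmas('tiger')))
```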

In [None]:
# COMPLETE THIS CELL
def HypLemmas(sLemma='tiger') -> set:
    '''Returns a set of all lemma names from all the hypernyms of all the synsets of sLemma. '''
    SsLemmas = set()  # set of lemmas
    # YOUR CODE HERE
    raise NotImplementedError()
    return SsLemmas

print(HypLemmas())

In [None]:
# RUN CELL TO TEST YOUR CODE
@run_unittest
class Test_HypLemmas(unittest.TestCase):
    def test00(self): ae(type(HypLemmas()), set)
    def test01(self): ae(HypLemmas('tiger'), {'big_cat','cat','individual','mortal','person','somebody','someone','soul'})
    def test02(self): ae(HypLemmas('teacher'), {'abstract','abstraction','educator','pedagog','pedagogue'})
    def test03(self): ae(HypLemmas('Cornell'), set())
    def test04(self): ae(len(HypLemmas('cat')), 30)
    def test05(self): ae(len(HypLemmas('Dogs')), 23)

## Task 3

Complete the UDF `Sim()`, which takes two lemmas and computes a path similarity score for every pair of their synsets. If `ReturnBest` is `True`, only the rows with the maximum similarity are returned; ties mean there can be more than one such row. `Sim('teacher', 'tiger')` returns the following dataframe (ordered by columns `ss1`, `ss2`):


| |ss1|ss2|sim|
|-|-|-|-|
|0|teacher.n.01|tiger.n.01|0.166667|
|1|teacher.n.01|tiger.n.02|0.066667|
|2|teacher.n.02|tiger.n.01|0.076923|
|3|teacher.n.02|tiger.n.02|0.043478|
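
Path similarity is computed as `1 / (1 + d)`, where `d` is the number of edges on the shortest path linking the two synsets. NLTK's `Synset.shortest_path_distance()` measures that path through a common hypernym, taking the ancestor reachable with the fewest total steps. NLTK's `Synset.path_similarity()` returns `None` when no connecting path exists.

The sketch below reproduces that idea on a small hypothetical taxonomy (`TOY_HYPERNYMS`; the node names are made up, not WordNet synsets). It also shows the two extra requirements of `Sim()`: rows ordered by `ss1`, `ss2`, and keeping every tied maximum when `return_best` is set. It returns a list of tuples instead of a dataframe so that it runs with the standard library alone. All function names here are illustrative.

```python
from collections import deque

# Hypothetical toy taxonomy: node -> list of direct hypernyms (parents).
TOY_HYPERNYMS = {
    'root': [],
    'p': ['root'], 'q': ['root'], 'r': ['q'],
    'x1': ['p'], 'x2': ['q'],   # pretend synsets of lemma 'x'
    'y1': ['p'], 'y2': ['r'],   # pretend synsets of lemma 'y'
}
TOY_SYNSETS = {'x': ['x1', 'x2'], 'y': ['y1', 'y2']}

def ancestor_distances(node):
    """BFS upward: hypernym-edge count from node to each ancestor (node itself is 0)."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        for parent in TOY_HYPERNYMS[cur]:
            if parent not in dist:
                dist[parent] = dist[cur] + 1
                queue.append(parent)
    return dist

def toy_path_similarity(a, b):
    """1 / (1 + shortest path through a common ancestor), or None if there is none."""
    da, db = ancestor_distances(a), ancestor_distances(b)
    common = da.keys() & db.keys()
    if not common:
        return None
    return 1 / (1 + min(da[c] + db[c] for c in common))

def toy_sim(lemma1, lemma2, return_best=False):
    """All synset pairs with their similarity, sorted by (ss1, ss2)."""
    rows = sorted(
        ((s1, s2, toy_path_similarity(s1, s2))
         for s1 in TOY_SYNSETS[lemma1] for s2 in TOY_SYNSETS[lemma2]),
        key=lambda row: (row[0], row[1]),
    )
    if return_best:
        best = max((r[2] for r in rows if r[2] is not None), default=None)
        rows = [r for r in rows if best is not None and r[2] == best]  # keep all ties
    return rows

print(toy_sim('x', 'y'))        # 4 rows, one per synset pair
print(toy_sim('x', 'y', True))  # single best pair: x1/y1 via their shared parent 'p'
print(toy_sim('y', 'y', True))  # two tied rows, each synset paired with itself
```

With a pandas dataframe `dfSim`, the tie-preserving filter would be the analogous boolean mask `dfSim[dfSim.sim == dfSim.sim.max()]`.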

In [None]:
# COMPLETE THIS CELL
def Sim(sLemma1='teacher', sLemma2='tiger', ReturnBest=False) -> pd.DataFrame:
    '''Computes WordNet's path similarity between all possible pairs of synsets from
        sLemma1 and sLemma2.
    Inputs:
        sLemma1, sLemma2: strings, lemmas
        ReturnBest: Boolean, specifies whether only the rows with the maximum similarity should be returned
    Returns: dataframe with columns ss1 & ss2 indicating names of the synsets relating 
        to sLemma1, sLemma2, respectively. Column sim contains the corresponding similarity score.
        Rows are ordered by ss1, ss2.      '''
    dfSim = pd.DataFrame([], columns=['ss1', 'ss2', 'sim']) # format of returned dataframe
    # YOUR CODE HERE
    raise NotImplementedError()
    return dfSim

Sim()

In [None]:
# RUN CELL TO TEST YOUR CODE
@run_unittest
class Test_Sim(unittest.TestCase):
    def test00(self): ae(type(Sim()), pd.DataFrame)
    def test01(self): ae(Sim().shape, (4,3))
    def test02(self): ae(list(Sim().columns), ['ss1','ss2','sim'])
    def test03(self): ae(Sim('monday','autumn').values.tolist(), 
        [['monday.n.01', 'fall.n.01', 0.14285714285714285]])
    def test04(self): ae(Sim('teacher','autumn').values.tolist(), 
        [['teacher.n.01', 'fall.n.01', 0.07142857142857142],
         ['teacher.n.02', 'fall.n.01', 0.07692307692307693]])
    def test05(self): ae(Sim('cat','dog').shape, (80, 3))
    def test06(self): ae(Sim('fall','break').shape, (3300, 3))
    def test07(self): ae(Sim('fall','break', True).values.tolist(), 
        [['decrease.v.01', 'break.v.56', 0.5], ['decrease.v.01', 'break.v.58', 0.5]])
    def test08(self): ae(Sim('cat','dog').sim.mean().round(4), 0.0881)