# Test Case for Resnik Similarity

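To test your program for computing Resnik similarity, a good way to write a unit test is to work out the expected result by hand for a small pair of synsets, then assert that your code returns it.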

In [3]:
import nltk
nltk.download('wordnet')
nltk.download('wordnet_ic')

[nltk_data] Downloading package wordnet to /Users/rgeorgi/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package wordnet_ic to
[nltk_data]     /Users/rgeorgi/nltk_data...
[nltk_data]   Package wordnet_ic is already up-to-date!


True

Let's load our information content (IC) data, precomputed from the Brown corpus.

In [4]:
from nltk.corpus import wordnet, wordnet_ic
from nltk.corpus.reader.wordnet import information_content
ic_data = wordnet_ic.ic('ic-brown-resnik-add1.dat')
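For intuition about what the loaded table encodes: information content is conventionally defined as IC(c) = -log p(c), where p(c) is the probability of encountering concept c, or any of its descendants, in a corpus. The sketch below illustrates that definition with invented counts rather than the Brown data; `toy_counts` and `toy_information_content` are hypothetical names used only for this example.

```python
import math

# Hypothetical cumulative counts: each concept's count includes its descendants.
toy_counts = {"entity": 1000, "person": 200, "doctor": 5}
TOTAL = toy_counts["entity"]  # the root subsumes every observation

def toy_information_content(concept):
    """Return -log p(concept), written as log(TOTAL / count)."""
    return math.log(TOTAL / toy_counts[concept])

for concept in toy_counts:
    print('{:<8} IC ~{:.3f}'.format(concept, toy_information_content(concept)))
```

Rarer, more specific concepts receive higher values, and the root receives zero because it subsumes everything.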

## Convenience Functions

Let's write one function to look up a synset by the "necktie.n.01" string format. NLTK offers `wordnet.synset()` for lookups by name, but writing our own keeps the logic explicit and returns `None` when nothing matches.

Let's also write a function to look up the information content of a given synset in the data we loaded earlier.

In [5]:
def synset_by_repr(r):
    """
    Given a synset in the 'necktie.n.01' format, return the synset.
    
    :rtype: Synset
    """
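    # Split from the right so lemmas that contain dots still parse.
    lemma, pos, index = r.rsplit('.', 2)
    synsets = wordnet.synsets(lemma, pos)
    sense = int(index) - 1  # sense numbers in the string are 1-based
    if 0 <= sense < len(synsets):
        return synsets[sense]
    return None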

def ic(r):
    """Return the information content of synset r, or None if r is None."""
    if r:
        return information_content(r, ic_data)

def print_ic(r):
    """
    :type r: Synset
    """
    print('Synset <{}> with information content: ~{:.3f}'.format(r.name(), ic(r)))
    
    

necktie_syn = synset_by_repr('necktie.n.01')
print_ic(necktie_syn)

Synset <necktie.n.01> with information content: ~10.517


## Determining LCS

Determining the lowest common subsumer (LCS) is easy enough, but first let's put together a quick way to visualize the ancestors of a given synset.

NLTK makes this somewhat easier with the `hypernym_paths()` method, which gives us every path between a given synset and the root (note that some nodes have multiple parents, and thus multiple paths).

Let's create this function then use it to verify that we are getting the correct LCS.

In [6]:
def print_hypernym_paths(synset):
    """
    :type synset: Synset
    """
    for path in synset.hypernym_paths():
        print('\n \u2191 '.join([path_elt.name() for path_elt in reversed(path)]) + ' *')

doctor_1 = synset_by_repr('doctor.n.01')
nurse_1 = synset_by_repr('nurse.n.01')

print('Hypernyms for doctor.n.01:\n'+'- '*20)
print_hypernym_paths(doctor_1)

print('Hypernyms for nurse.n.01:\n'+'- '*20)
print_hypernym_paths(nurse_1)

Hypernyms for doctor.n.01:
- - - - - - - - - - - - - - - - - - - - 
doctor.n.01
 ↑ medical_practitioner.n.01
 ↑ health_professional.n.01
 ↑ professional.n.01
 ↑ adult.n.01
 ↑ person.n.01
 ↑ causal_agent.n.01
 ↑ physical_entity.n.01
 ↑ entity.n.01 *
doctor.n.01
 ↑ medical_practitioner.n.01
 ↑ health_professional.n.01
 ↑ professional.n.01
 ↑ adult.n.01
 ↑ person.n.01
 ↑ organism.n.01
 ↑ living_thing.n.01
 ↑ whole.n.02
 ↑ object.n.01
 ↑ physical_entity.n.01
 ↑ entity.n.01 *
Hypernyms for nurse.n.01:
- - - - - - - - - - - - - - - - - - - - 
nurse.n.01
 ↑ health_professional.n.01
 ↑ professional.n.01
 ↑ adult.n.01
 ↑ person.n.01
 ↑ causal_agent.n.01
 ↑ physical_entity.n.01
 ↑ entity.n.01 *
nurse.n.01
 ↑ health_professional.n.01
 ↑ professional.n.01
 ↑ adult.n.01
 ↑ person.n.01
 ↑ organism.n.01
 ↑ living_thing.n.01
 ↑ whole.n.02
 ↑ object.n.01
 ↑ physical_entity.n.01
 ↑ entity.n.01 *


## Encoding this into a Test Case

So, looking at our lists above, we should be able to quickly find that the LCS for these synsets is `health_professional.n.01`: it is the deepest node that appears in the paths of both `doctor.n.01` and `nurse.n.01`.
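The same by-eye reasoning can be sketched in code. The example below is a minimal illustration on a simplified, hand-written taxonomy (it omits several of the real WordNet nodes listed above), and every name in it (`toy_hypernyms`, `toy_depth`, `toy_ancestors`, `toy_lowest_common_subsumers`) is hypothetical. It is one possible approach, not necessarily how NLTK computes its result.

```python
# Hypothetical toy taxonomy: each node maps to its list of direct hypernyms.
toy_hypernyms = {
    "entity": [],
    "person": ["entity"],
    "professional": ["person"],
    "health_professional": ["professional"],
    "medical_practitioner": ["health_professional"],
    "doctor": ["medical_practitioner"],
    "nurse": ["health_professional"],
}

def toy_depth(node):
    """Length of the longest hypernym chain from node up to a root."""
    parents = toy_hypernyms[node]
    return 0 if not parents else 1 + max(toy_depth(p) for p in parents)

def toy_ancestors(node):
    """Return node itself plus every hypernym reachable from it."""
    seen = {node}
    for parent in toy_hypernyms[node]:
        seen |= toy_ancestors(parent)
    return seen

def toy_lowest_common_subsumers(a, b):
    """Return the deepest node(s) that subsume both a and b."""
    common = toy_ancestors(a) & toy_ancestors(b)
    max_depth = max(toy_depth(n) for n in common)
    return {n for n in common if toy_depth(n) == max_depth}

print(toy_lowest_common_subsumers("doctor", "nurse"))
```

Returning a set mirrors the test case in this notebook, since more than one subsumer can sit at the same maximal depth when nodes have multiple parents.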

We can encode this into a unit test for finding the LCS in something like this:

In [7]:
from unittest import TestCase, main

def lowest_common_subsumer(syn_1, syn_2):
    """
    Retrieve the lowest common subsumer(s) of two synsets.

    Placeholder: replace this with your own code rather than
    relying on the NLTK implementation!

    :type syn_1: Synset
    :type syn_2: Synset
    :rtype: set
    """
    return set(syn_1.lowest_common_hypernyms(syn_2))

class LCSTests(TestCase):
    def test_nurse_doctor(self):
        self.assertSetEqual({synset_by_repr('health_professional.n.01')},
                           lowest_common_subsumer(nurse_1, doctor_1))

### Adding More Test Cases

Using the test case written above as a model, can you now find the LCS between `dog.n.01` and `fish.n.01`?

In [8]:
dog_syn = synset_by_repr('dog.n.01')
fish_syn = synset_by_repr('fish.n.01')

##### ADD CODE TO EXPLORE HYPERNYMS/SYNSETS HERE ####

## Finding the Most Informative LCS

Now, since we're ultimately interested in finding the appropriate sense of a target word given multiple senses for both the target word and its probe words, let's test a sub-problem: resolving the meaning of ***bowl***, whether a type of container or the sporting event.

The WordNet listing for bowl [can be found here](http://wordnetweb.princeton.edu/perl/webwn?s=bowl&sub=Search+WordNet&o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&h=1112231231223123123123022220).

Let's start with the first sense, `bowl.n.01`, the kitchen utensil, and use `plate` as a probe word.

In [80]:
bowl_sense_1 = synset_by_repr('bowl.n.01')
plate_synsets = wordnet.synsets('plate', pos='n')

# "Plate" has 15 synsets! Let's see what the LCS for each, with bowl.n.01:
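most_informative = (None, 0)
for plate_synset in plate_synsets:
    # TODO: find the LCS of bowl_sense_1 and plate_synset, and store it in
    # most_informative if its information content is the highest seen so far.
    pass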

# Now, what about the fifth sense of "bowl," as in a stadium?
bowl_sense_5 = synset_by_repr('bowl.n.05')
house_synsets = wordnet.synsets('house', pos='n')
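One way to frame the loop above: for each sense of the probe word, collect the subsumers it shares with the target sense, score each by information content, and keep the best one seen so far. The sketch below does this on invented data. The ancestor sets and IC values are made up for illustration and do not reflect WordNet's actual hierarchy, and all names (`toy_ancestor_sets`, `toy_ic`, `most_informative_subsumer`, `toy_most_informative`) are hypothetical.

```python
# Hypothetical data: each sense maps to its ancestors (including itself).
toy_ancestor_sets = {
    "bowl_dish": {"bowl_dish", "container", "artifact", "entity"},
    "plate_dish": {"plate_dish", "container", "artifact", "entity"},
    "plate_sheet": {"plate_sheet", "artifact", "entity"},
    "plate_tectonic": {"plate_tectonic", "entity"},
}
# Hypothetical information content values (higher = more specific).
toy_ic = {
    "entity": 0.0, "artifact": 2.0, "container": 5.5, "bowl_dish": 9.0,
    "plate_dish": 8.5, "plate_sheet": 7.0, "plate_tectonic": 10.0,
}

def most_informative_subsumer(sense_a, sense_b):
    """Return (subsumer, ic) with the highest IC among shared ancestors."""
    common = toy_ancestor_sets[sense_a] & toy_ancestor_sets[sense_b]
    best = max(common, key=toy_ic.get)
    return best, toy_ic[best]

toy_most_informative = (None, 0)
for plate_sense in ("plate_dish", "plate_sheet", "plate_tectonic"):
    subsumer, score = most_informative_subsumer("bowl_dish", plate_sense)
    print('{:<15} -> {} (IC {:.1f})'.format(plate_sense, subsumer, score))
    if score > toy_most_informative[1]:
        toy_most_informative = (subsumer, score)

print('Most informative overall:', toy_most_informative)
```

The winning score is the Resnik similarity between the target sense and the best-matching probe sense, so repeating the loop for each sense of the target word gives a basis for choosing among them.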

### Run the Test Cases

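The following runs all of the unit tests declared above. Passing an explicit `argv` keeps `unittest` from parsing the notebook kernel's own command-line arguments, and `exit=False` stops it from calling `sys.exit()` when the tests finish.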

In [10]:
main(argv=['first-arg-is-ignored'], exit=False)

.
----------------------------------------------------------------------
Ran 1 test in 0.003s

OK


<unittest.main.TestProgram at 0x11b99c358>
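If you would rather run a single test class and inspect the outcome programmatically, the standard library's loader and runner can be used directly. The sketch below uses a hypothetical `ToyTests` class as a stand-in; in this notebook you would pass a class such as `LCSTests` instead.

```python
import unittest

class ToyTests(unittest.TestCase):
    """Hypothetical stand-in for a test class such as LCSTests."""

    def test_addition(self):
        self.assertEqual(2 + 2, 4)

# Build a suite from one class and run it without touching sys.argv.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ToyTests)
result = unittest.TextTestRunner(verbosity=2).run(suite)
print('All passed:', result.wasSuccessful())
```

The returned result object also exposes `failures` and `errors` lists, which can be handy when a notebook needs to react to a failing test.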