## Semantic similarity 

In this notebook I use the similarity metrics available in the NLTK library.
Given a concept from the McRae dataset and the list of all concepts in that dataset, these functions return the concepts closest to the input concept.

### Header and obtaining the list of concepts

In [2]:
import pandas as pd
import nltk

def get_concepts_list():
    """Return a list of strings: the names of the concepts."""
    df = pd.read_excel('../McRaedataset/CONCS_Synset_brm.xlsx')
    # Build a real list (not a lazy map object) so it can be iterated more than once
    return [str(c) for c in df['Concept']]

def get_synset(concept):
    """Given a concept name (string), return its synset name (string)."""
    # DataFrame for the Excel document (re-read on every call)
    df = pd.read_excel('../McRaedataset/CONCS_Synset_brm.xlsx')
    row = df.loc[df['Concept'] == concept]
    return str(list(row['Synset'])[0])

Concepts = get_concepts_list()
#print(Concepts)
#print(get_synset('airplane'))

## Obtaining similarity 

In the next cells we obtain a list of the n concepts "closest" to a given concept, according to several similarity metrics in the NLTK library.
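The ranking step shared by the cells below can be sketched independently of any particular metric. This is a minimal sketch, not the notebook's own code: the helper name `closest_concepts` and the lookup-table similarity `toy_similarity` are hypothetical, introduced only for illustration, and the three scores are copied from the path similarity output for 'airplane'.

```python
import heapq

def closest_concepts(target, candidates, similarity, n=20):
    """Return the n candidates most similar to target as (name, score) pairs.

    `similarity` is any function taking (target, candidate) and returning
    a number, or None when the score is undefined.
    """
    scored = []
    for name in candidates:
        score = similarity(target, name)
        if score is not None:  # skip candidates the metric cannot score
            scored.append((name, score))
    # Keep the n highest scores, best first
    return heapq.nlargest(n, scored, key=lambda pair: pair[1])

# Hypothetical stand-in for a real metric: a fixed lookup table
TOY_SCORES = {'jet': 0.5, 'helicopter': 0.333, 'boat': 0.167}

def toy_similarity(target, candidate):
    return TOY_SCORES.get(candidate)  # None for unknown candidates

top_two = closest_concepts('airplane',
                           ['boat', 'jet', 'ghost', 'helicopter'],
                           toy_similarity, n=2)
print(top_two)
```

Passing the metric in as a function means the same ranking code could be reused for each similarity measure, with only the `similarity` argument changing.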

In [3]:
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

# Input concept
concept = wn.synset(get_synset("airplane"))
print(concept)

Synset('airplane.n.01')


### Path similarity

This metric is based on the shortest path that connects the senses in the is-a (hypernym/hyponym) taxonomy.
The score is in the range 0 to 1, where 1 means the two senses are identical.
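NLTK computes this score as 1 / (d + 1), where d is the number of edges on the shortest path between the two senses. The sketch below illustrates that formula on a small hand-written taxonomy. The taxonomy `TOY_HYPERNYM` and both function names are assumptions made up for this example. It does not call WordNet and is not how NLTK implements the search.

```python
from collections import deque

# Toy is-a taxonomy (child -> parent); hypothetical, for illustration only
TOY_HYPERNYM = {
    'jet': 'airplane',
    'airplane': 'heavier-than-air craft',
    'helicopter': 'heavier-than-air craft',
    'heavier-than-air craft': 'aircraft',
}

def shortest_path_length(a, b, parent_of):
    """Number of is-a edges on the shortest path between a and b, or None."""
    # Treat every is-a link as an undirected edge
    neighbours = {}
    for child, parent in parent_of.items():
        neighbours.setdefault(child, set()).add(parent)
        neighbours.setdefault(parent, set()).add(child)
    # Breadth-first search from a until b is reached
    queue = deque([(a, 0)])
    seen = {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two nodes are not connected

def toy_path_similarity(a, b, parent_of=TOY_HYPERNYM):
    """Path similarity 1 / (d + 1) over the toy taxonomy."""
    dist = shortest_path_length(a, b, parent_of)
    return None if dist is None else 1.0 / (dist + 1)

for other in ['airplane', 'jet', 'helicopter']:
    print(other, round(toy_path_similarity('airplane', other), 3))
```

In this toy taxonomy 'airplane' is 0, 1 and 2 edges away from itself, 'jet' and 'helicopter', which gives 1.0, 0.5 and 0.333.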

In [4]:
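# Score every dataset concept against the input concept
dist_list = []
for c in Concepts:
    c_synset = wn.synset(get_synset(c))
    dist_list.append([c, round(wn.path_similarity(concept, c_synset), 3)])

# Sort by similarity, highest first, and show the 20 closest concepts
dist_list = sorted(dist_list, key=lambda r: r[1], reverse=True)
print(dist_list[:20])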

[['airplane', 1.0], ['jet', 0.5], ['helicopter', 0.333], ['boat', 0.167], ['ship', 0.167], ['sled', 0.167], ['sleigh', 0.167], ['yacht', 0.167], ['bike', 0.143], ['missile', 0.143], ['sailboat', 0.143], ['scooter', 0.143], ['skateboard', 0.143], ['tank_(army)', 0.143], ['trailer', 0.143], ['tricycle', 0.143], ['unicycle', 0.143], ['wagon', 0.143], ['bus', 0.125], ['canoe', 0.125]]
