# WordNet Similarities

* The metrics tested so far only look at the WordNet hierarchy
    * Check whether there are metrics that look at the definition (Information Content, definition)
        * Resnik, [Brown Corpus](https://en.wikipedia.org/wiki/Brown_Corpus)
        * [A Review of Semantic Similarity Measures in WordNet](http://www.cartagena99.com/recursos/alumnos/ejercicios/Article%201.pdf)
    * Sometimes the similar terms are listed under **derivationally related form**
        * Example: [Actor](http://wordnetweb.princeton.edu/perl/webwn?o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&r=1&s=actor&i=3&h=10001000000#c). In this case the LCS-based similarity of actor and casting is quite low, and the same holds even for actor and act
* Find the maximum-similarity clique.
    * Although casting has several senses, the best one to use is: the choice of **actors** to play particular **roles** in a play or **movie**
    * Choose one element (sense) from each partition (word)
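
The clique idea above (one sense per word, maximizing the total pairwise similarity) can be sketched with a brute-force search. This is only a minimal sketch: the function name `best_sense_assignment`, the toy sense labels, and the toy scores are assumptions made for illustration, not part of the notebook or of WordNet.

```python
import itertools

def best_sense_assignment(candidates, similarity):
    """Pick one sense per word so that the sum of pairwise similarities is maximal.

    candidates: dict mapping each word to a list of candidate senses.
    similarity: function (sense_a, sense_b) -> float or None.
    Brute force over the Cartesian product, so only viable for small inputs.
    """
    words = list(candidates)
    best_score, best_choice = float("-inf"), None
    for choice in itertools.product(*(candidates[w] for w in words)):
        # Sum the similarity of every pair of chosen senses; None counts as 0.
        score = sum((similarity(a, b) or 0.0)
                    for a, b in itertools.combinations(choice, 2))
        if score > best_score:
            best_score, best_choice = score, dict(zip(words, choice))
    return best_choice, best_score


# Toy data: hypothetical sense labels and scores, not real WordNet values.
toy_candidates = {
    "casting": ["casting.mold", "casting.fishing", "casting.actors"],
    "role": ["role.function", "role.part_in_play"],
    "movie": ["movie.film"],
}
toy_scores = {
    frozenset(["casting.actors", "role.part_in_play"]): 0.8,
    frozenset(["casting.actors", "movie.film"]): 0.7,
    frozenset(["role.part_in_play", "movie.film"]): 0.6,
    frozenset(["casting.mold", "role.function"]): 0.3,
}

def toy_similarity(a, b):
    # Any pair without an explicit score gets a small default value.
    return toy_scores.get(frozenset([a, b]), 0.1)

choice, score = best_sense_assignment(toy_candidates, toy_similarity)
print(choice, round(score, 2))
```

With real data, `candidates` could be built from `wn.synsets(word, pos=wn.NOUN)` for each schema word and `similarity` could wrap `wup_similarity`. The number of combinations grows multiplicatively with the number of senses per word, so a heuristic search would probably be needed for longer word lists.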

In [1]:
from pprint import pprint as pp
import itertools
import nltk 
from nltk.corpus import stopwords
from nltk.corpus import wordnet as wn
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\pr3ma\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [2]:
schemaWords = ['casting','note','name','char','movie','title','person','role']

In [3]:
for x in wn.synsets('casting'):
    print(x,x.definition())

Synset('cast.n.06') object formed by a mold
Synset('molding.n.01') the act of creating something by casting it in a mold
Synset('casting.n.03') the act of throwing a fishing line out over the water by means of a rod and reel
Synset('casting.n.04') the choice of actors to play particular roles in a play or movie
Synset('project.v.10') put or send forth
Synset('cast.v.02') deposit
Synset('cast.v.03') select to play,sing, or dance a part in a play, movie, musical, opera, or ballet
Synset('hurl.v.01') throw forcefully
Synset('cast.v.05') assign the roles of (a movie or a play) to actors
Synset('roll.v.12') move about aimlessly or without any destination, often in search of food or employment
Synset('cast.v.07') form by pouring (e.g., wax or hot metal) into a cast or mold
Synset('shed.v.01') get rid of
Synset('draw.v.14') choose at random
Synset('frame.v.04') formulate in a particular style or language
Synset('vomit.v.01') eject the contents of the stomach through the mouth


In [5]:
wn.synset('cast.v.05')

Synset('cast.v.05')

In [13]:
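def showSimilarities(wordA, wordB):
    """Print every noun-sense pair of wordA and wordB, ranked by Wu-Palmer similarity."""
    A = set(wn.synsets(wordA, pos=wn.NOUN))
    B = set(wn.synsets(wordB, pos=wn.NOUN))

    pairs = []
    for sense1, sense2 in itertools.product(A, B):
        # wup_similarity can return None when no common ancestor exists; treat that as 0
        # so the sort below never compares None with a float
        sim = sense1.wup_similarity(sense2) or 0
        pairs.append((sense1.definition(), sense2.definition(), sim))
    # Highest similarity first
    pairs.sort(key=lambda pair: pair[2], reverse=True)

    for def1, def2, sim in pairs:
        print(def1, '\n', def2, '\n', sim, '\n\n')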

In [14]:
showSimilarities('act','actor')

something that people do or cause to happen 
 a person who acts and gets things done 
 0.2 


something that people do or cause to happen 
 a theatrical performer 
 0.16666666666666666 


a subdivision of a play or opera or ballet 
 a person who acts and gets things done 
 0.16666666666666666 


a manifestation of insincerity 
 a person who acts and gets things done 
 0.16666666666666666 


a legal document codifying the result of deliberations of a committee or society or legislative body 
 a person who acts and gets things done 
 0.15384615384615385 


a short theatrical performance that is part of a longer program 
 a person who acts and gets things done 
 0.15384615384615385 


a subdivision of a play or opera or ballet 
 a theatrical performer 
 0.14285714285714285 


a manifestation of insincerity 
 a theatrical performer 
 0.14285714285714285 


a legal document codifying the result of deliberations of a committee or society or legislative body 
 a theatrical performer 
 0.13333
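
The low scores above are consistent with how Wu-Palmer similarity is defined: twice the depth of the lowest common subsumer (LCS), divided by the sum of the depths of the two senses. When the only shared ancestor is the root, the score is small no matter how related the words feel. The sketch below reproduces the arithmetic on a hand-simplified taxonomy; the `TOY_PARENT` chains and the node names are assumptions for illustration, not a dump of the real WordNet graph.

```python
# Hand-simplified hypernym chains (child -> parent). Assumed for illustration only.
TOY_PARENT = {
    "abstraction": "entity",
    "psychological_feature": "abstraction",
    "event": "psychological_feature",
    "act": "event",
    "physical_entity": "entity",
    "causal_agent": "physical_entity",
    "person": "causal_agent",
    "doer": "person",
    "entertainer": "person",
    "performer": "entertainer",
    "theatrical_performer": "performer",
}

def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = [node]
    while path[-1] in TOY_PARENT:
        path.append(TOY_PARENT[path[-1]])
    return path

def depth(node):
    """Number of nodes on the path from the root to node (the root has depth 1)."""
    return len(path_to_root(node))

def lowest_common_subsumer(a, b):
    ancestors_of_a = set(path_to_root(a))
    # Walking upward from b, the first node shared with a is the deepest shared ancestor.
    return next(n for n in path_to_root(b) if n in ancestors_of_a)

def wup(a, b):
    lcs = lowest_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))

print(lowest_common_subsumer("act", "doer"))   # entity
print(wup("act", "doer"))                      # 0.2
print(wup("act", "theatrical_performer"))      # 0.16666666666666666
print(wup("doer", "theatrical_performer"))     # shares "person", so it scores higher
```

In this toy tree, "act" and "doer" meet only at the root, which gives 2 * 1 / (5 + 5) = 0.2. Two senses that meet at "person" score much higher, which is why a purely hierarchical metric misses the act/actor relation.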

In [8]:
import nltk
nltk.download('wordnet_ic')
from nltk.corpus import wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')
semcor_ic = wordnet_ic.ic('ic-semcor.dat')

[nltk_data] Downloading package wordnet_ic to
[nltk_data]     C:\Users\pr3ma\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet_ic is already up-to-date!
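
For intuition about what an information content file such as `ic-brown.dat` provides, the following is a minimal sketch of Resnik similarity on a toy taxonomy. The taxonomy, the counts, and the helper names are hypothetical and invented for illustration; real IC values come from corpus frequencies. Each concept's count includes the counts of its descendants, its information content is the negative log of its relative frequency, and the Resnik score of two concepts is the information content of their lowest common subsumer.

```python
import math

# Hypothetical taxonomy (child -> parent) and raw corpus counts, invented for illustration.
IC_PARENT = {
    "person": "entity",
    "event": "entity",
    "actor": "person",
    "director": "person",
    "act": "event",
}
RAW_COUNTS = {"entity": 0, "person": 10, "event": 20, "actor": 5, "director": 5, "act": 10}

def ancestors(node):
    """Return [node, parent, ..., root]."""
    path = [node]
    while path[-1] in IC_PARENT:
        path.append(IC_PARENT[path[-1]])
    return path

def cumulative_counts(raw):
    """Propagate every count to all ancestors, so the root covers the whole corpus."""
    total = {concept: 0 for concept in raw}
    for concept, n in raw.items():
        for a in ancestors(concept):
            total[a] += n
    return total

COUNTS = cumulative_counts(RAW_COUNTS)
N = COUNTS["entity"]  # total number of counted occurrences

def information_content(concept):
    # IC(c) = -log p(c), written as log(N / count) so the root yields 0.0
    return math.log(N / COUNTS[concept])

def resnik(a, b):
    ancestors_of_a = set(ancestors(a))
    lcs = next(n for n in ancestors(b) if n in ancestors_of_a)
    return information_content(lcs)

print(resnik("actor", "director"))  # IC of "person"
print(resnik("actor", "act"))       # IC of the root, i.e. 0.0
```

Unlike the purely hierarchical score, the result depends on how frequent the shared ancestor is in the corpus, so rare shared ancestors count for more than common ones.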


In [9]:
nltk.download('genesis')
from nltk.corpus import genesis
genesis_ic = wn.ic(genesis, False, 0.0)

[nltk_data] Downloading package genesis to
[nltk_data]     C:\Users\pr3ma\AppData\Roaming\nltk_data...
[nltk_data]   Package genesis is already up-to-date!
