# wordnet use case
- WordNet: an ontology that defines relations between English words; roughly the following relations are defined
    - synonym
    - antonym
    - hypernym
    - hyponym
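- It has the drawback of needing continuous maintenance, but it is currently the most well-defined ontology of this kind
    - In Korea, Pusan National University has done similar work, but it has not been fully released, so accessibility is a problem
    - WordNet, by contrast, is integrated into nltk and is relatively easy to use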

## contents
- synset
- synonyms
- similarity
- hypernym
- hyponym
- antonym
- holonym
- meronym

In [44]:
import nltk
from nltk.corpus import wordnet as wn

# synset
- a set of synonyms that share a common meaning

In [45]:
for synset in wn.synsets('car'):
    print("{}: {}".format(synset, synset.definition()))

Synset('car.n.01'): a motor vehicle with four wheels; usually propelled by an internal combustion engine
Synset('car.n.02'): a wheeled vehicle adapted to the rails of railroad
Synset('car.n.03'): the compartment that is suspended from an airship and that carries personnel and the cargo and the power plant
Synset('car.n.04'): where passengers ride up and down
Synset('cable_car.n.01'): a conveyance for passengers or freight on a cable railway


In [46]:
def ordinal_num(i):
    # 11, 12 and 13 take "th" even though they end in 1, 2 and 3
    if 11 <= i % 100 <= 13:
        return str(i) + "th"
    ord_dict = {1: "st", 2: "nd", 3: "rd"}
    return str(i) + ord_dict.get(i % 10, "th")

def all_synsets(word):
    for i, synset in enumerate( wn.synsets(word) ):
        print(ordinal_num(i+1), 'synset')
        print("{}: '{}'".format(synset.name(), synset.definition()))
        if synset.examples()!=[]:
            print('example:')
            for example in synset.examples():
                print(example)
        print()
all_synsets('car')

1st synset
car.n.01: 'a motor vehicle with four wheels; usually propelled by an internal combustion engine'
example:
he needs a car to get to work

2nd synset
car.n.02: 'a wheeled vehicle adapted to the rails of railroad'
example:
three cars had jumped the rails

3rd synset
car.n.03: 'the compartment that is suspended from an airship and that carries personnel and the cargo and the power plant'

4th synset
car.n.04: 'where passengers ride up and down'
example:
the car was on the top floor

5th synset
cable_car.n.01: 'a conveyance for passengers or freight on a cable railway'
example:
they took a cable car to the top of the mountain



- `wn.synsets` does not require the base form of the word as input; inflected forms such as plurals are resolved as well

In [47]:
print( wn.synsets("car") == wn.synsets('cars') )
print( wn.synsets('universities')==wn.synsets('university'))

True
True


- However, the result can differ depending on the word form
    - Below, the synsets of 'have' and 'having' turn out to be different
    - The two results differ in size: 'have' returns both noun and verb synsets, while 'having' returns verb synsets only

In [48]:
print( wn.synsets('have') == wn.synsets('having'))

False


In [49]:
have_set = set(wn.synsets('have'))
having_set = set(wn.synsets('having'))
print(len(have_set), len(having_set))
# for 'having', only the synsets whose pos is 'v' are returned
print( set([x for x in have_set if x.pos() == 'v' ]) == having_set)
print(have_set - having_set)

20 19
True
{Synset('rich_person.n.01')}


- Alternatively, lemmatizing the word (reducing it to its base form) before the lookup gives the same result as passing the base form directly

In [50]:
from nltk.stem.wordnet import WordNetLemmatizer
print( WordNetLemmatizer().lemmatize('having', 'v') )
print( wn.synsets(WordNetLemmatizer().lemmatize('having', 'v')) == wn.synsets('have') )

have
True


# all synsets

In [51]:
all_synset = [synset for synset in wn.all_synsets()]
print(len(all_synset))

117659


- n: noun
- v: verb
- a: adjective
- s: adjective satellite
- r: adverb

In [52]:
all_pos = set([synset.pos() for synset in all_synset])
all_pos

{'a', 'n', 'r', 's', 'v'}

- number of synsets per pos

In [53]:
pos_dict = {pos:[] for pos in all_pos}
for synset in all_synset:
    pos_dict[synset.pos()].append(synset)
for key in pos_dict.keys():
    print(key, ":", len(pos_dict[key]))

a : 7463
v : 13767
r : 3621
s : 10693
n : 82115


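### what is adjective satellite?
- Not entirely sure, but it appears to separate adjectives that have a direct antonym from those that do not
- In WordNet, descriptive adjectives are organized into clusters: a head synset (pos `a`) typically takes part in an antonym pair, and adjectives that cannot be placed in such a pair are attached to a head as satellites (pos `s`) without a direct antonym of their own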

In [54]:
# first five satellite adjectives (pos 's'): the antonym lists are empty
for i, synset in enumerate(filter(lambda x: x.pos() == 's', all_synset)):
    print(synset)
    print(synset.lemmas()[0].antonyms())
    if i > 3:
        break
# first five plain adjectives (pos 'a'): the ones shown each have an antonym
for i, synset in enumerate(filter(lambda x: x.pos() == 'a', all_synset)):
    print(synset)
    print(synset.lemmas()[0].antonyms())
    if i > 3:
        break

Synset('emergent.s.02')
[]
Synset('dissilient.s.01')
[]
Synset('parturient.s.02')
[]
Synset('moribund.s.02')
[]
Synset('last.s.05')
[]
Synset('able.a.01')
[Lemma('unable.a.01.unable')]
Synset('unable.a.01')
[Lemma('able.a.01.able')]
Synset('abaxial.a.01')
[Lemma('adaxial.a.01.adaxial')]
Synset('adaxial.a.01')
[Lemma('abaxial.a.01.abaxial')]
Synset('acroscopic.a.01')
[Lemma('basiscopic.a.01.basiscopic')]


# synonyms
- Let's find words that share the same meaning
- In WordNet the other relations are looked up starting from a synset, but synonyms are found through its lemmas

In [55]:
def same_meaning_word(synset):
    print( synset.name() )
    print( 'definition:', synset.definition() )
    print('synonyms:', [lemma.name() for lemma in synset.lemmas()] )
same_meaning_word(wn.synset('car.n.01'))

car.n.01
definition: a motor vehicle with four wheels; usually propelled by an internal combustion engine
synonyms: ['car', 'auto', 'automobile', 'machine', 'motorcar']


# similarity
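- Besides `wn.synset('dog.n.01').lemma_names()`, related senses can also be found by computing a similarity score
- There are several ways to compute similarity; the details are not covered in depth here, only the following three measures are tried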
    - path_similarity
    - wup_similarity
    - lch_similarity

In [56]:
def similar_words_between(w1, w2, similar_func, sim_threshold, max_threshold):
    # compare every pair of synsets of w1 and w2 that share the same part of speech
    all_combinations = [(synset1, synset2) for synset1 in wn.synsets(w1) for synset2 in wn.synsets(w2)
                    if synset1.pos() == synset2.pos()]
    for syn1, syn2 in all_combinations:
        sim = similar_func(syn1, syn2)
        # a measure may return None when no connecting path is found
        if sim is not None and sim_threshold < sim <= max_threshold:
            print("{}, {}:, {}".format(syn1, syn2, sim))
            print(syn1, syn1.definition())
            print(syn2, syn2.definition())
            print()

### wn.path_similarity
- Return a score denoting how similar two word senses are, 
    - based on the shortest path that connects the senses in the is-a (hypernym/hyponym) taxonomy. 
- The score is in the range 0 to 1. A score of 1 represents identity, i.e. comparing a sense with itself returns 1.
- By default, a fake root node is added for verbs, so cases where previously no path could be found (and None was returned) should now return a value. The old behavior can be restored by setting simulate_root to False.
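
- To make the scoring rule concrete: as far as I understand it, the score is 1 / (number of edges on the shortest is-a path + 1). Below is a minimal sketch on a hand-made taxonomy; the `HYPERNYMS` dict and the `toy_*` / helper names are made up for illustration and are not part of nltk.

```python
from collections import deque

# Hypothetical toy is-a taxonomy: node -> list of direct hypernyms.
HYPERNYMS = {
    "car": ["motor_vehicle"],
    "truck": ["motor_vehicle"],
    "motor_vehicle": ["vehicle"],
    "bicycle": ["vehicle"],
    "vehicle": [],
}

def neighbors(node):
    """Nodes exactly one is-a edge away (direct hypernyms and direct hyponyms)."""
    ups = HYPERNYMS[node]
    downs = [child for child, parents in HYPERNYMS.items() if node in parents]
    return ups + downs

def shortest_path_distance(a, b):
    """Breadth-first search; returns the number of edges, or None if unreachable."""
    queue, seen = deque([(a, 0)]), {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def toy_path_similarity(a, b):
    dist = shortest_path_distance(a, b)
    return None if dist is None else 1.0 / (dist + 1)

print(toy_path_similarity("car", "car"))            # 1.0  (identity)
print(toy_path_similarity("car", "motor_vehicle"))  # 0.5  (one edge apart)
print(toy_path_similarity("car", "bicycle"))        # 0.25 (three edges apart)
```

- Under this rule a score of 0.5 means the two senses are direct neighbors in the taxonomy, which is why a lower threshold of 0.4 only lets identical or adjacent senses through.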

In [57]:
similar_words_between("have", "take", wn.path_similarity, 0.4, 1.0)

Synset('have.v.02'), Synset('carry.v.02'):, 0.5
Synset('have.v.02') have as a feature
Synset('carry.v.02') have with oneself; have on one's person

Synset('experience.v.03'), Synset('take.v.15'):, 0.5
Synset('experience.v.03') go through (mental or physical states or experiences)
Synset('take.v.15') experience or feel or submit to

Synset('consume.v.02'), Synset('consume.v.02'):, 1.0
Synset('consume.v.02') serve oneself to, or consume regularly
Synset('consume.v.02') serve oneself to, or consume regularly

Synset('accept.v.02'), Synset('accept.v.02'):, 1.0
Synset('accept.v.02') receive willingly something given or offered
Synset('accept.v.02') receive willingly something given or offered

Synset('accept.v.02'), Synset('accept.v.05'):, 0.5
Synset('accept.v.02') receive willingly something given or offered
Synset('accept.v.05') admit into a group or community

Synset('take.v.35'), Synset('take.v.35'):, 1.0
Synset('take.v.35') have sex with; archaic use
Synset('take.v.35') have sex with; archaic use

### wn.wup_similarity
- Wu-Palmer Similarity: Return a score denoting how similar two word senses are, 
    - based on the depth of the two senses in the taxonomy and that of their Least Common Subsumer (most specific ancestor node). 
- Note that at this time the scores given do _not_ always agree with those given by Pedersen's Perl implementation of Wordnet Similarity.
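
- The Wu-Palmer idea can be written as 2 * depth(LCS) / (depth(s1) + depth(s2)). Below is a minimal sketch on a hand-made tree where every node has a single parent; the `PARENT` dict and all function names are hypothetical and not nltk's implementation. Real WordNet synsets can have several hypernym paths, so actual scores depend on which depth gets used.

```python
# Hypothetical toy taxonomy: node -> its single direct hypernym (None for the root).
PARENT = {
    "entity": None,
    "vehicle": "entity",
    "motor_vehicle": "vehicle",
    "car": "motor_vehicle",
    "truck": "motor_vehicle",
    "bicycle": "vehicle",
}

def path_to_root(node):
    """Nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))

def least_common_subsumer(a, b):
    """Deepest node that is an ancestor of (or equal to) both a and b."""
    ancestors_of_b = set(path_to_root(b))
    for node in path_to_root(a):  # walks upward, so the first hit is the deepest
        if node in ancestors_of_b:
            return node
    return None

def toy_wup_similarity(a, b):
    lcs = least_common_subsumer(a, b)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))

print(least_common_subsumer("car", "truck"))  # motor_vehicle
print(toy_wup_similarity("car", "truck"))     # 0.75 = 2*3 / (4+4)
print(toy_wup_similarity("car", "bicycle"))   # 2*2 / (4+3)
```

- Unlike the path-based score, this measure rewards pairs whose common ancestor sits deep in the taxonomy, so two specific senses under a specific parent score higher than two senses that only meet near the root.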

In [58]:
print( wn.synsets('car')[0].lemma_names())
similar_words_between("car", "auto", wn.wup_similarity, 0.6, 1.0)

['car', 'auto', 'automobile', 'machine', 'motorcar']
Synset('car.n.01'), Synset('car.n.01'):, 1.0
Synset('car.n.01') a motor vehicle with four wheels; usually propelled by an internal combustion engine
Synset('car.n.01') a motor vehicle with four wheels; usually propelled by an internal combustion engine

Synset('car.n.02'), Synset('car.n.01'):, 0.7272727272727273
Synset('car.n.02') a wheeled vehicle adapted to the rails of railroad
Synset('car.n.01') a motor vehicle with four wheels; usually propelled by an internal combustion engine



In [59]:
similar_words_between("have", "take", wn.wup_similarity, 0.6, 1.0)

Synset('experience.v.03'), Synset('take.v.15'):, 0.8571428571428571
Synset('experience.v.03') go through (mental or physical states or experiences)
Synset('take.v.15') experience or feel or submit to

Synset('experience.v.03'), Synset('take.v.19'):, 0.6666666666666666
Synset('experience.v.03') go through (mental or physical states or experiences)
Synset('take.v.19') accept or undergo, often unwillingly

Synset('accept.v.02'), Synset('accept.v.02'):, 1.0
Synset('accept.v.02') receive willingly something given or offered
Synset('accept.v.02') receive willingly something given or offered

Synset('accept.v.02'), Synset('accept.v.05'):, 0.8
Synset('accept.v.02') receive willingly something given or offered
Synset('accept.v.05') admit into a group or community

Synset('suffer.v.02'), Synset('take.v.15'):, 0.75
Synset('suffer.v.02') undergo (as of injuries and illnesses)
Synset('take.v.15') experience or feel or submit to

Synset('take.v.35'), Synset('take.v.35'):, 1.0
Synset('take.v.35') have sex with; archaic use
Synset('take.v.35') have sex with; archaic use

### wn.lch_similarity
- Return a score denoting how similar two word senses are, 
    - based on the shortest path that connects the senses (as above) and the maximum depth of the taxonomy in which the senses occur. 
- The relationship is given as -log(p/2d) where p is the shortest path length and d the taxonomy depth.
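
- A small worked calculation of the formula. The depth value `VERB_DEPTH = 13` is an assumption inferred from the scores printed in this notebook (log(26) is about 3.258), not something looked up in nltk, and p is taken to count the nodes on the path so that a sense compared with itself has p = 1.

```python
import math

def lch(p, d):
    """-log(p / (2d)), with p the shortest path length and d the taxonomy depth."""
    return -math.log(p / (2.0 * d))

# Assumption: taxonomy depth for verbs, inferred from the outputs in this notebook.
VERB_DEPTH = 13

print(lch(1, VERB_DEPTH))  # about 3.258: a sense compared with itself
print(lch(2, VERB_DEPTH))  # about 2.565: two directly connected senses
print(lch(3, VERB_DEPTH))  # about 2.159: one more step apart
```

- Because the score is not normalized to the range 0 to 1, the thresholds have to be chosen on this logarithmic scale; under the assumption above, the largest possible verb score is log(26), which matches the upper bound passed to `similar_words_between` in the next cell.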

In [60]:
similar_words_between("have", "take", wn.lch_similarity, 2.5, 3.258096538021482)

Synset('have.v.02'), Synset('carry.v.02'):, 2.5649493574615367
Synset('have.v.02') have as a feature
Synset('carry.v.02') have with oneself; have on one's person

Synset('experience.v.03'), Synset('take.v.15'):, 2.5649493574615367
Synset('experience.v.03') go through (mental or physical states or experiences)
Synset('take.v.15') experience or feel or submit to

Synset('consume.v.02'), Synset('consume.v.02'):, 3.258096538021482
Synset('consume.v.02') serve oneself to, or consume regularly
Synset('consume.v.02') serve oneself to, or consume regularly

Synset('accept.v.02'), Synset('accept.v.02'):, 3.258096538021482
Synset('accept.v.02') receive willingly something given or offered
Synset('accept.v.02') receive willingly something given or offered

Synset('accept.v.02'), Synset('accept.v.05'):, 2.5649493574615367
Synset('accept.v.02') receive willingly something given or offered
Synset('accept.v.05') admit into a group or community

Synset('take.v.35'), Synset('take.v.35'):, 3.258096538021482
Synset('take.v.35') have sex with; archaic use
Synset('take.v.35') have sex with; archaic use

# hypernyms 
- Let's find the broader (parent) terms of a word

In [61]:
print( wn.synset('dog.n.01').hypernyms())
print( wn.synset('car.n.01').hypernyms())

[Synset('canine.n.02'), Synset('domestic_animal.n.01')]
[Synset('motor_vehicle.n.01')]


In [62]:
print( wn.synsets('consume') )
print( wn.synset('devour.v.03').hypernyms() )

[Synset('devour.v.03'), Synset('consume.v.02'), Synset('consume.v.03'), Synset('consume.v.04'), Synset('consume.v.05'), Synset('consume.v.06')]
[Synset('eat.v.01')]


- Let's look up not just the hypernym one level up, but the whole chain up to the top-level term
    - A synset can have more than one hypernym, so several paths may come out

In [63]:
for i, hypernym_path in enumerate(wn.synset('car.n.01').hypernym_paths()):
    print( ordinal_num(i+1), 'path')
    print( hypernym_path )
    print()

1st path
[Synset('entity.n.01'), Synset('physical_entity.n.01'), Synset('object.n.01'), Synset('whole.n.02'), Synset('artifact.n.01'), Synset('instrumentality.n.03'), Synset('container.n.01'), Synset('wheeled_vehicle.n.01'), Synset('self-propelled_vehicle.n.01'), Synset('motor_vehicle.n.01'), Synset('car.n.01')]

2nd path
[Synset('entity.n.01'), Synset('physical_entity.n.01'), Synset('object.n.01'), Synset('whole.n.02'), Synset('artifact.n.01'), Synset('instrumentality.n.03'), Synset('conveyance.n.03'), Synset('vehicle.n.01'), Synset('wheeled_vehicle.n.01'), Synset('self-propelled_vehicle.n.01'), Synset('motor_vehicle.n.01'), Synset('car.n.01')]
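
- Why two paths come out: `wheeled_vehicle` has two hypernyms, so every chain through it splits in two. Below is a minimal sketch that rebuilds the lower part of the two paths above from a plain dict; the sense suffixes (`.n.01` etc.) are dropped, `instrumentality` is treated as the root of this fragment, and `toy_hypernym_paths` is a made-up helper, not nltk's method.

```python
# Fragment of the car.n.01 hypernym graph shown above: node -> direct hypernyms.
HYPERNYMS = {
    "instrumentality": [],  # treated as the root of this fragment
    "container": ["instrumentality"],
    "conveyance": ["instrumentality"],
    "vehicle": ["conveyance"],
    "wheeled_vehicle": ["container", "vehicle"],  # two hypernyms -> two paths
    "self-propelled_vehicle": ["wheeled_vehicle"],
    "motor_vehicle": ["self-propelled_vehicle"],
    "car": ["motor_vehicle"],
}

def toy_hypernym_paths(node):
    """All paths from a root down to `node`, one per distinct chain of hypernyms."""
    parents = HYPERNYMS[node]
    if not parents:
        return [[node]]
    return [path + [node]
            for parent in parents
            for path in toy_hypernym_paths(parent)]

for path in toy_hypernym_paths("car"):
    print(" > ".join(path))
```

- The number of paths multiplies at every node with more than one hypernym, which is why path-based measures have to pick one of them (for example the shortest).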



# hyponyms
- Let's find the narrower (child) terms of a word

In [64]:
print( wn.synset('dog.n.01').hyponyms())

[Synset('basenji.n.01'), Synset('corgi.n.01'), Synset('cur.n.01'), Synset('dalmatian.n.02'), Synset('great_pyrenees.n.01'), Synset('griffon.n.02'), Synset('hunting_dog.n.01'), Synset('lapdog.n.01'), Synset('leonberg.n.01'), Synset('mexican_hairless.n.01'), Synset('newfoundland.n.01'), Synset('pooch.n.01'), Synset('poodle.n.01'), Synset('pug.n.01'), Synset('puppy.n.01'), Synset('spitz.n.01'), Synset('toy_dog.n.01'), Synset('working_dog.n.01')]


In [65]:
print( wn.synset('car.n.01').hyponyms())

[Synset('ambulance.n.01'), Synset('beach_wagon.n.01'), Synset('bus.n.04'), Synset('cab.n.03'), Synset('compact.n.03'), Synset('convertible.n.01'), Synset('coupe.n.01'), Synset('cruiser.n.01'), Synset('electric.n.01'), Synset('gas_guzzler.n.01'), Synset('hardtop.n.01'), Synset('hatchback.n.01'), Synset('horseless_carriage.n.01'), Synset('hot_rod.n.01'), Synset('jeep.n.01'), Synset('limousine.n.01'), Synset('loaner.n.02'), Synset('minicar.n.01'), Synset('minivan.n.01'), Synset('model_t.n.01'), Synset('pace_car.n.01'), Synset('racer.n.02'), Synset('roadster.n.01'), Synset('sedan.n.01'), Synset('sport_utility.n.01'), Synset('sports_car.n.01'), Synset('stanley_steamer.n.01'), Synset('stock_car.n.01'), Synset('subcompact.n.01'), Synset('touring_car.n.01'), Synset('used-car.n.01')]


In [66]:
print( wn.synsets('eat') )
eat = wn.synsets('eat')[0]
print( eat, ":", eat.definition())

[Synset('eat.v.01'), Synset('eat.v.02'), Synset('feed.v.06'), Synset('eat.v.04'), Synset('consume.v.05'), Synset('corrode.v.01')]
Synset('eat.v.01') : take in solid food


In [67]:
print( wn.synsets('eat')[0].hyponyms())

[Synset('devour.v.03'), Synset('devour.v.04'), Synset('dunk.v.03'), Synset('eat_up.v.01'), Synset('fare.v.02'), Synset('fill_up.v.04'), Synset('garbage_down.v.01'), Synset('gluttonize.v.01'), Synset('gobble.v.01'), Synset('nibble.v.03'), Synset('peck.v.02'), Synset('pick_at.v.02'), Synset('pitch_in.v.01'), Synset('ruminate.v.01'), Synset('slurp.v.01'), Synset('wash_down.v.01'), Synset('wolf.v.01')]


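- Sometimes the immediate hyponyms are not enough and we want to keep drilling down through the levels below, the way `hypernym_paths` does upward
    - I could not find such a function, so I wrote one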

In [68]:
def hyponym_paths(synset, depth):
    cur_level = [[synset]]
    for i in range(0, depth-1):
        next_level =[]
        for cur in cur_level:
            if cur[-1].hyponyms()!=[]:
                for hypo in cur[-1].hyponyms():
                    next_level.append( cur+[hypo] )
            else:
                next_level.append(cur)
        cur_level = next_level.copy()
    for cur in sorted(cur_level, key=lambda x: len(x)):
        print(cur)
hyponym_paths(wn.synset('car.n.01'), 3)

[Synset('car.n.01'), Synset('bus.n.04')]
[Synset('car.n.01'), Synset('compact.n.03')]
[Synset('car.n.01'), Synset('convertible.n.01')]
[Synset('car.n.01'), Synset('coupe.n.01')]
[Synset('car.n.01'), Synset('electric.n.01')]
[Synset('car.n.01'), Synset('gas_guzzler.n.01')]
[Synset('car.n.01'), Synset('hardtop.n.01')]
[Synset('car.n.01'), Synset('hatchback.n.01')]
[Synset('car.n.01'), Synset('horseless_carriage.n.01')]
[Synset('car.n.01'), Synset('hot_rod.n.01')]
[Synset('car.n.01'), Synset('jeep.n.01')]
[Synset('car.n.01'), Synset('loaner.n.02')]
[Synset('car.n.01'), Synset('minivan.n.01')]
[Synset('car.n.01'), Synset('model_t.n.01')]
[Synset('car.n.01'), Synset('pace_car.n.01')]
[Synset('car.n.01'), Synset('roadster.n.01')]
[Synset('car.n.01'), Synset('sport_utility.n.01')]
[Synset('car.n.01'), Synset('sports_car.n.01')]
[Synset('car.n.01'), Synset('stanley_steamer.n.01')]
[Synset('car.n.01'), Synset('stock_car.n.01')]
[Synset('car.n.01'), Synset('subcompact.n.01')]
[Synset('car.n.01')

In [69]:
hyponym_paths(wn.synset('eat.v.01'), 3)

[Synset('eat.v.01'), Synset('devour.v.03')]
[Synset('eat.v.01'), Synset('devour.v.04')]
[Synset('eat.v.01'), Synset('dunk.v.03')]
[Synset('eat.v.01'), Synset('fare.v.02')]
[Synset('eat.v.01'), Synset('fill_up.v.04')]
[Synset('eat.v.01'), Synset('garbage_down.v.01')]
[Synset('eat.v.01'), Synset('gluttonize.v.01')]
[Synset('eat.v.01'), Synset('gobble.v.01')]
[Synset('eat.v.01'), Synset('nibble.v.03')]
[Synset('eat.v.01'), Synset('peck.v.02')]
[Synset('eat.v.01'), Synset('pick_at.v.02')]
[Synset('eat.v.01'), Synset('pitch_in.v.01')]
[Synset('eat.v.01'), Synset('ruminate.v.01')]
[Synset('eat.v.01'), Synset('slurp.v.01')]
[Synset('eat.v.01'), Synset('wash_down.v.01')]
[Synset('eat.v.01'), Synset('wolf.v.01')]
[Synset('eat.v.01'), Synset('eat_up.v.01'), Synset('tuck_in.v.01')]
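
- The same traversal can also be written as a recursive generator that takes the "get hyponyms" step as a parameter, which makes it easy to try on plain data. A minimal sketch, assuming a hand-made `HYPONYMS` dict whose names are borrowed from the `eat.v.01` output above; `iter_hyponym_paths` is a hypothetical helper, not an nltk function.

```python
# Hypothetical toy hyponym graph: node -> list of direct hyponyms.
HYPONYMS = {
    "eat": ["devour", "eat_up", "nibble"],
    "eat_up": ["tuck_in"],
    "devour": [],
    "nibble": [],
    "tuck_in": [],
}

def iter_hyponym_paths(node, depth, get_hyponyms):
    """Yield paths of at most `depth` nodes, each ending at a leaf or at the depth limit."""
    children = get_hyponyms(node)
    if depth <= 1 or not children:
        yield [node]
        return
    for child in children:
        for tail in iter_hyponym_paths(child, depth - 1, get_hyponyms):
            yield [node] + tail

for path in iter_hyponym_paths("eat", 3, HYPONYMS.__getitem__):
    print(path)
# ['eat', 'devour']
# ['eat', 'eat_up', 'tuck_in']
# ['eat', 'nibble']
```

- With real synsets the same generator should be callable as `iter_hyponym_paths(wn.synset('eat.v.01'), 3, lambda s: s.hyponyms())`, though that call has not been run here.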


# antonyms
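- Antonyms have to be looked up through lemmas rather than synsets
- Some relations are defined by WordNet only over lemmas: antonymy holds between specific word forms, not between whole synsets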

In [70]:
wn.synset('good.a.01').lemmas()[0].antonyms()

[Lemma('bad.a.01.bad')]

In [71]:
wn.synset('beautiful.a.01').lemmas()[0].antonyms()

[Lemma('ugly.a.01.ugly')]

# holonyms
- is a member of
- `wn.synset('human.n.01').member_holonyms()` => human is a member of something

In [72]:
def all_holonyms(word):
    for holonym in wn.synset(word).member_holonyms():
        print(holonym)
        print('definition:', holonym.definition())
all_holonyms('human.n.01')

Synset('genus_homo.n.01')
definition: type genus of the family Hominidae


In [73]:
all_holonyms('dog.n.01')

Synset('canis.n.01')
definition: type genus of the Canidae: domestic and wild dogs; wolves; jackals
Synset('pack.n.06')
definition: a group of hunting animals


In [74]:
all_holonyms('lion.n.01')

Synset('panthera.n.01')
definition: lions; leopards; snow leopards; jaguars; tigers; cheetahs; saber-toothed tigers
Synset('pride.n.04')
definition: a group of lions


# meronyms
- `wn.synset('face.n.01').part_meronyms()` => the things that are part of a face

In [75]:
wn.synset('face.n.01').part_meronyms()

[Synset('beard.n.01'),
 Synset('brow.n.01'),
 Synset('cheek.n.01'),
 Synset('chin.n.01'),
 Synset('eye.n.01'),
 Synset('eyebrow.n.01'),
 Synset('facial.n.01'),
 Synset('facial_muscle.n.01'),
 Synset('facial_vein.n.01'),
 Synset('feature.n.02'),
 Synset('jaw.n.02'),
 Synset('jowl.n.02'),
 Synset('mouth.n.02'),
 Synset('nose.n.01')]

In [76]:
wn.synset('human.n.01').part_meronyms()

[Synset('arm.n.01'),
 Synset('body_hair.n.01'),
 Synset('face.n.01'),
 Synset('foot.n.01'),
 Synset('hand.n.01'),
 Synset('human_body.n.01'),
 Synset('human_head.n.01'),
 Synset('loin.n.02'),
 Synset('mane.n.02')]

# Reference
- https://web.stanford.edu/class/cs124/lec/sem
- https://pythonprogramming.net/wordnet-nltk-tutorial/