WordNet is a lexical database of English. In WordNet, nouns, verbs, adjectives, and adverbs are grouped into sets of synonyms called synsets, each of which represents a distinct concept. These synsets are linked together by semantic relations.

In [30]:
from nltk.corpus import wordnet as wn
wn.synsets('crab')

[Synset('crab.n.01'),
 Synset('crab.n.02'),
 Synset('cancer.n.02'),
 Synset('cancer.n.04'),
 Synset('crab.n.05'),
 Synset('crab_louse.n.01'),
 Synset('crab.n.07'),
 Synset('crab.v.01'),
 Synset('crab.v.02'),
 Synset('crab.v.03'),
 Synset('gripe.v.01')]

In [93]:
synset = wn.synsets('language')[0]
print(synset.definition())
print(synset.examples())
print(synset.lemmas())

hypern = synset
print(hypern)
while hypern.hypernyms():
    hypern = hypern.hypernyms()[0]
    print(hypern)

a systematic means of communicating by the use of sounds or conventional symbols
['he taught foreign languages', 'the language introduced is standard throughout the text', 'the speed with which a program can be executed depends on the language in which it is written']
[Lemma('language.n.01.language'), Lemma('language.n.01.linguistic_communication')]
Synset('language.n.01')
Synset('communication.n.02')
Synset('abstraction.n.06')
Synset('entity.n.01')


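The hierarchical structure of synsets is described by hypernyms and hyponyms. A hypernym is one level up in generality from a given synset; following hypernyms repeatedly eventually reaches the least specific concept (for nouns, `entity.n.01`, as in the chain above). Hyponyms go the other way: they are more specific concepts beneath a given synset. Other relations, such as member meronyms (members of a group), member holonyms (groups the synset belongs to), and lemma antonyms, are queried the same way; for `language.n.01` all three are empty.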

In [51]:
print(synset.hypernyms())
print(synset.hyponyms())
print(synset.member_meronyms())
print(synset.member_holonyms())
print(synset.lemmas()[0].antonyms())

[Synset('communication.n.02')]
[Synset('artificial_language.n.01'), Synset('barrage.n.01'), Synset('dead_language.n.01'), Synset('indigenous_language.n.01'), Synset('lingua_franca.n.01'), Synset('metalanguage.n.01'), Synset('native_language.n.01'), Synset('natural_language.n.01'), Synset('object_language.n.02'), Synset('sign_language.n.01'), Synset('slanguage.n.01'), Synset('source_language.n.01'), Synset('string_of_words.n.01'), Synset('superstrate.n.02'), Synset('usage.n.03'), Synset('words.n.03')]
[]
[]
[]


In [97]:
synset = wn.synsets('travel')
print(synset)
synset = synset[3]
print(synset.definition())
print(synset.examples())
print(synset.lemmas())

hypern = synset
print(hypern)
while hypern.hypernyms():
    hypern = hypern.hypernyms()[0]
    print(hypern)

[Synset('travel.n.01'), Synset('change_of_location.n.01'), Synset('locomotion.n.02'), Synset('travel.v.01'), Synset('travel.v.02'), Synset('travel.v.03'), Synset('travel.v.04'), Synset('travel.v.05'), Synset('travel.v.06')]
change location; move, travel, or proceed, also metaphorically
['How fast does your new car go?', 'We travelled from Rome to Naples by bus', 'The policemen went from door to door looking for the suspect', 'The soldiers moved towards the city in an attempt to take it before night fell', 'news travelled fast']
[Lemma('travel.v.01.travel'), Lemma('travel.v.01.go'), Lemma('travel.v.01.move'), Lemma('travel.v.01.locomote')]
Synset('travel.v.01')


In [61]:
wn.morphy('traveled', wn.VERB)

'travel'

In [66]:
from nltk.wsd import lesk

sun = wn.synset('sun.n.01')
moon = wn.synset('moon.n.01')
print(sun.path_similarity(moon))
print(sun.wup_similarity(moon))
sent = "the sun and the moon"
print(sent)
lesk(sent, 'moon')  # caution: lesk expects a list of tokens; a plain string is split into single characters

0.2
0.75
the sun and the moon


Synset('moon.v.02')

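SentiWordNet builds on WordNet by assigning each synset a positive, a negative, and an objectivity score, which can be used to estimate the sentiment of words. This is useful for tasks such as gauging how customers felt about a product from their written comments when no explicit rating is available.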

In [90]:
from nltk.corpus import sentiwordnet as swn
from nltk.tokenize import word_tokenize
sentiword = list(swn.senti_synsets('Depression'))
for word in sentiword:
    print(str(word))

sent = "I am not feeling very good today"
print(sent)
for word in word_tokenize(sent):
    # first-listed sense only; raises IndexError if a token has no synsets
    print(list(swn.senti_synsets(word))[0])

<depression.n.01: PosScore=0.0 NegScore=0.375>
<depression.n.02: PosScore=0.0 NegScore=0.25>
<natural_depression.n.01: PosScore=0.0 NegScore=0.0>
<depression.n.04: PosScore=0.125 NegScore=0.625>
<depression.n.05: PosScore=0.0 NegScore=0.0>
<low.n.01: PosScore=0.0 NegScore=0.0>
<depressive_disorder.n.01: PosScore=0.0 NegScore=0.125>
<depression.n.08: PosScore=0.0 NegScore=0.0>
<depression.n.09: PosScore=0.0 NegScore=0.0>
<depression.n.10: PosScore=0.0 NegScore=0.25>
I am not feeling very good today
<iodine.n.01: PosScore=0.0 NegScore=0.0>
<americium.n.01: PosScore=0.0 NegScore=0.0>
<not.r.01: PosScore=0.0 NegScore=0.625>
<feeling.n.01: PosScore=0.125 NegScore=0.125>
<very.s.01: PosScore=0.5 NegScore=0.0>
<good.n.01: PosScore=0.5 NegScore=0.0>
<today.n.01: PosScore=0.125 NegScore=0.0>


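In these scores, words that carry logical force or denote feelings tend to move away from neutral, for example "very" and "not", or "good" and "bad". Combining the per-word scores can give an overall sentiment for a sentence. Taking the first-listed sense of each word is crude, though: "I" and "am" map to iodine and americium, and the negative score of "not" is simply added instead of reversing the positive score of "good".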
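A minimal sketch of combining the scores, using the (PosScore, NegScore) pairs printed above for "I am not feeling very good today". The plain sum comes out positive for a negative sentence; the window-based negation flip in `negation_aware_score` is a hypothetical heuristic added for illustration, not something SentiWordNet provides.

```python
# (PosScore, NegScore) for the first-listed sense of each token, as printed above.
scores = {
    "I": (0.0, 0.0),
    "am": (0.0, 0.0),
    "not": (0.0, 0.625),
    "feeling": (0.125, 0.125),
    "very": (0.5, 0.0),
    "good": (0.5, 0.0),
    "today": (0.125, 0.0),
}
tokens = "I am not feeling very good today".split()

NEGATORS = {"not", "no", "never"}  # hypothetical, minimal list


def naive_score(tokens, scores):
    """Sum PosScore - NegScore over all tokens."""
    return sum(scores[t][0] - scores[t][1] for t in tokens)


def negation_aware_score(tokens, scores, window=3):
    """Like naive_score, but a negator contributes nothing itself and
    flips the sign of the next `window` tokens."""
    total = 0.0
    flip_left = 0
    for t in tokens:
        if t in NEGATORS:
            flip_left = window
            continue
        net = scores[t][0] - scores[t][1]
        if flip_left > 0:
            net = -net
            flip_left -= 1
        total += net
    return total


print(naive_score(tokens, scores))           # 0.5
print(negation_aware_score(tokens, scores))  # -0.875
```

The heuristic also flips "very", which is an intensifier rather than a sentiment word; real systems usually treat negation and intensifiers separately.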

Collocations are expressions in which certain words occur together more often than chance would suggest, for example "pretty sure" or "take your time". Because such phrases are conventional in natural speech, people use them often: most people say "pretty sure" rather than "very sure".
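One common way to find collocations is to score adjacent word pairs by pointwise mutual information (PMI), which compares how often two words appear together with how often they would co-occur by chance. NLTK provides `BigramCollocationFinder` with `BigramAssocMeasures.pmi` for this; the pure-Python sketch below only shows the idea on made-up toy phrases, and its formula may differ in small details from NLTK's.

```python
import math
from collections import Counter


def bigram_pmi(phrases, min_count=1):
    """Score adjacent word pairs by PMI.

    PMI(x, y) = log2(c(x, y) * N / (c(x) * c(y))), where c() are counts and N is
    the total number of tokens. Bigrams are only counted inside a phrase.
    Returns (bigram, pmi) pairs sorted from highest to lowest PMI.
    """
    unigrams = Counter()
    bigrams = Counter()
    for phrase in phrases:
        words = phrase.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    n = sum(unigrams.values())
    scores = {
        (x, y): math.log2(c * n / (unigrams[x] * unigrams[y]))
        for (x, y), c in bigrams.items()
        if c >= min_count
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data, made up for illustration: "pretty" always precedes "sure",
# while "very" combines freely with many words.
phrases = ["pretty sure", "very good", "pretty sure", "very nice",
           "very sure", "pretty sure", "very tired"]

ranked = dict(bigram_pmi(phrases))
print(round(ranked[("pretty", "sure")], 3))  # 1.807
print(round(ranked[("very", "sure")], 3))    # -0.193
print(bigram_pmi(phrases, min_count=2))
```

Single-occurrence pairs such as "very good" reach the same PMI as "pretty sure" in this tiny sample, which is why a minimum frequency (`min_count` here, `apply_freq_filter` in NLTK) is usually applied before ranking.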