WordNet satellite adjectives lookup failure in SentiWordnet #1062

Closed
lqkhoo opened this issue Jul 31, 2015 · 6 comments

Comments

@lqkhoo

lqkhoo commented Jul 31, 2015

The problem

WordNet distinguishes satellite adjectives (POS tag 's') from ordinary adjectives (POS tag 'a'). SentiWordNet does not: it files every adjective entry under 'a'. As a result, looking up a satellite-adjective synset through NLTK's SentiWordNet reader fails.

For reference, this occurs with NLTK 3.0.2 and the latest versions of the corpora: WordNet 3.0 and SentiWordNet 3.0.

Bug replication

For example, suppose we want the sentiment of each sense of the word 'amazing'. We first look the word up in WordNet:

from nltk.corpus import wordnet as wn
synsets = wn.synsets('amazing')
print(synsets)

and we get

[Synset('amaze.v.01'), Synset('perplex.v.01'), Synset('amazing.s.01'), Synset('amazing.s.02')]
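
The POS tag is the middle component of each synset name, so the satellite-adjective senses can be picked out directly from the output above. A minimal sketch, working on the name strings only so that it runs without the WordNet corpus (`pos_of` is a hypothetical helper, not an NLTK function):

```python
# Synset names copied from the output above.
names = ['amaze.v.01', 'perplex.v.01', 'amazing.s.01', 'amazing.s.02']

def pos_of(name):
    """Return the POS tag from a '<lemma>.<pos>.<nn>' synset name."""
    # rsplit from the right keeps any dots inside the lemma intact.
    lemma, pos, sense = name.rsplit('.', 2)
    return pos

# Senses tagged 's' are the ones affected by this issue.
satellite = [n for n in names if pos_of(n) == 's']
print(satellite)
```

Both adjective senses of 'amazing' carry the 's' tag, which is why both lookups below fail.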

Suppose we now look up SentiWordNet entries for the sense amazing#1, which corresponds to the entry Synset('amazing.s.01'):

from nltk.corpus import wordnet as wn
from nltk.corpus import sentiwordnet as swn
synsets = wn.synsets('amazing')
target = synsets[2]
senti_synset = swn.senti_synset(target.name())
print(type(senti_synset))

We get

<type 'NoneType'>

We get the same result for amazing#2.

If we look up the word 'amazing' on the SentiWordNet website, we see that it does have two entries:

  • Amazing#1: {p:0.5, o:0.25, n:0.25}
  • Amazing#2: {p:0.875, o:0, n:0.125}

Digging into the SentiWordNet 3.0 corpus file, we confirm the entries in these lines:
Line 13096:
a 02359789 0.5 0.25 astonishing#1 amazing#1 surprising greatly; "she does an amazing amount of work"; "the dog was capable of astonishing tricks"

Line 7023:
a 01282510 0.875 0.125 awing#1 awful#6 awesome#1 awe-inspiring#1 amazing#2 inspiring awe or admiration or wonder; "New York is an amazing city"; "the Grand Canyon is an awe-inspiring sight"; "the awesome complexity of the universe"; "this sea, whose gently awful stirrings seem to speak of some hidden soul beneath"- Melville; "Westminster Hall's awing majesty, so vast, so high, so silent"
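
Each data line of the SentiWordNet 3.0 file has six tab-separated fields: POS, offset, positive score, negative score, synset terms and gloss. The objective score is not stored; it is 1 minus the sum of the other two, which agrees with the values listed above. A minimal sketch of parsing such a line follows. The sample is rebuilt by hand from line 13096 with the tabs restored and the gloss shortened, and `parse_swn_line` is a hypothetical helper, not part of NLTK:

```python
def parse_swn_line(line):
    """Split one SentiWordNet 3.0 data line into a dict of fields."""
    pos, offset, pos_score, neg_score, terms, gloss = line.rstrip('\n').split('\t')
    p, n = float(pos_score), float(neg_score)
    return {
        'key': (pos, int(offset)),   # the (pos, offset) pair identifying the synset
        'pos_score': p,
        'neg_score': n,
        'obj_score': 1.0 - (p + n),  # objectivity is derived, not stored
        'terms': terms.split(),
        'gloss': gloss,
    }

# Hand-built sample: tabs restored, gloss shortened (an assumption about the raw file).
sample = 'a\t02359789\t0.5\t0.25\tastonishing#1 amazing#1\tsurprising greatly'
entry = parse_swn_line(sample)
print(entry['key'], entry['pos_score'], entry['neg_score'], entry['obj_score'])
```

Note that the key carries POS 'a', even though WordNet tags the same synset as 's'.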

Temporary fix

We can fix this by modifying nltk/nltk/corpus/reader/sentiwordnet.py.

The reader loads the corpus into a dictionary keyed by (pos, offset) tuples. The problematic entries are stored under POS 'a', but the lookup builds its key from the WordNet synset, whose POS is 's', so the key is never found.
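
The mismatch can be reproduced without the corpus. The sketch below models the dictionary with the two entries quoted above and shows that a key built with the 's' tag misses, while the remapped key hits. `db` and `lookup` are illustrative stand-ins, not the reader's actual attributes:

```python
# (pos, offset) -> (PosScore, NegScore), taken from the two corpus lines above.
db = {
    ('a', 2359789): (0.5, 0.25),     # amazing#1
    ('a', 1282510): (0.875, 0.125),  # amazing#2
}

def lookup(pos, offset, remap_satellite=False):
    """Return the scores for (pos, offset), or None when the key is absent."""
    if remap_satellite and pos == 's':
        pos = 'a'  # SentiWordNet files satellite adjectives under 'a'
    return db.get((pos, offset))

# Key built from WordNet's tag for amazing.s.01: the lookup misses.
print(lookup('s', 2359789))                        # None
# Same key with the satellite tag remapped: the lookup hits.
print(lookup('s', 2359789, remap_satellite=True))  # (0.5, 0.25)
```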

We can change this part, at lines 75-78:

        else:
            synset = wn.synset(vals[0])
            pos = synset.pos()
            offset = synset.offset()

to

        else:
            synset = wn.synset(vals[0])
            pos = synset.pos()
            # SentiWordNet 3.0 files satellite adjectives ('s') under 'a', so remap before the lookup.
            if pos == 's':
                pos = 'a'
            offset = synset.offset()

Effect

from nltk.corpus import wordnet as wn
from nltk.corpus import sentiwordnet as swn
synsets = wn.synsets('amazing')
target = synsets[2]
senti_synset = swn.senti_synset(target.name())
print(senti_synset)

We now get the following as expected:

<amazing.s.01: PosScore=0.5 NegScore=0.25>

lqkhoo changed the title from "WordNet satellite adjectives lookup failure with SentiWordnet" to "WordNet satellite adjectives lookup failure in SentiWordnet" on Jul 31, 2015

@eugenet12

Any updates on this? Not a huge issue, but would be really convenient if updated to fix this so other people don't have to spend time debugging this as well!

@simoneb

simoneb commented Jan 29, 2016

👍

@ionuthulub

This seems to be fixed: https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/sentiwordnet.py#L75

@alvations
Contributor

alvations commented Jun 9, 2017

@ionuthulub but that's only in sentiwordnet, wordnet doesn't have this.

@ionuthulub

@alvations I'm interested in working on this issue but could you elaborate on what remains to be done?

@alvations
Contributor

alvations commented Jun 9, 2017

Whoop sorry about the misunderstanding. I misunderstood the issue. We can close this issue =)

Yes, this issue is fixed at b99f35e

>>> from nltk.corpus import wordnet as wn
>>> from nltk.corpus import sentiwordnet as swn
>>> synsets = wn.synsets('amazing')
>>> target = synsets[2]
>>> senti_synset = swn.senti_synset(target.name())
>>> print(type(senti_synset))
<class 'nltk.corpus.reader.sentiwordnet.SentiSynset'>
