WordNet satellite adjectives lookup failure in SentiWordnet #1062
Comments
Any updates on this? Not a huge issue, but would be really convenient if updated to fix this so other people don't have to spend time debugging this as well!
👍
This seems to be fixed: https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/sentiwordnet.py#L75
@ionuthulub
@alvations I'm interested in working on this issue but could you elaborate on what remains to be done?
Whoop sorry about the misunderstanding. I misunderstood the issue. We can close this issue =) Yes, this issue is fixed at b99f35e
The problem
The WordNet corpus distinguishes between satellite adjectives (POS tag 's') and normal adjectives (POS tag 'a'). SentiWordNet does not make this distinction, and defines all adjective entries with 'a'. As a result, looking up a satellite adjective synset (POS tag 's') in SentiWordNet fails, because no entry is stored under that tag.
For reference, this occurs with NLTK 3.0.2 and the latest versions of the corpora: WordNet 3.0 and SentiWordNet 3.0.
Bug replication
For example, we are interested in the sentiment of the senses of the word 'amazing'. We look up WordNet
and we get
Suppose we now look up SentiWordNet entries for the sense amazing#1, which corresponds to the entry Synset('amazing.s.01'):
We get
We get a similar result if we do the same with amazing#2 as well.
If we look up the word 'amazing' on SentiWordNet (link), we see that there are actually two entries:
Amazing#1: {p:0.5, o:0.25, n:0.25}
Amazing#2: {p:0.875, o:0, n:0.125}
Digging into the SentiWordNet 3.0 corpus file, we confirm the entries in these lines:
Line 13096:
a 02359789 0.5 0.25 astonishing#1 amazing#1 surprising greatly; "she does an amazing amount of work"; "the dog was capable of astonishing tricks"
Line 7023:
a 01282510 0.875 0.125 awing#1 awful#6 awesome#1 awe-inspiring#1 amazing#2 inspiring awe or admiration or wonder; "New York is an amazing city"; "the Grand Canyon is an awe-inspiring sight"; "the awesome complexity of the universe"; "this sea, whose gently awful stirrings seem to speak of some hidden soul beneath"- Melville; "Westminster Hall's awing majesty, so vast, so high, so silent"
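To make the connection between these corpus lines and the scores above concrete, here is a minimal sketch of how such lines could be parsed into a table keyed by (pos, offset). It assumes the fields are tab-separated in the order POS, ID, PosScore, NegScore, SynsetTerms, Gloss; the helper names (`parse_line`, `build_db`, `objective_score`) are hypothetical and not part of NLTK, and the glosses are abbreviated.

```python
# Sketch only: parse SentiWordNet-style lines into a (pos, offset) -> scores map.
# Assumption: fields are tab-separated as POS, ID, PosScore, NegScore,
# SynsetTerms, Gloss. Glosses are shortened here for readability.
SAMPLE_LINES = [
    "a\t02359789\t0.5\t0.25\tastonishing#1 amazing#1\tsurprising greatly",
    "a\t01282510\t0.875\t0.125\tawing#1 awful#6 awesome#1 awe-inspiring#1 amazing#2"
    "\tinspiring awe or admiration or wonder",
]


def parse_line(line):
    """Split one corpus line into its key, its scores and its sense terms."""
    pos, offset, pos_score, neg_score, terms, _gloss = line.split("\t", 5)
    key = (pos, int(offset))
    scores = (float(pos_score), float(neg_score))
    return key, scores, terms.split()


def build_db(lines):
    """Build the tuple-keyed dictionary: (pos, offset) -> (pos_score, neg_score)."""
    db = {}
    for line in lines:
        key, scores, _terms = parse_line(line)
        db[key] = scores
    return db


def objective_score(pos_score, neg_score):
    """The objective score is whatever remains after positive and negative."""
    return 1.0 - (pos_score + neg_score)


db = build_db(SAMPLE_LINES)
```

Both entries end up under the 'a' tag, so a key built from the WordNet tag 's' (for example `('s', 2359789)`) is simply absent from the table.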
Temporary fix
We can fix this by modifying nltk/nltk/corpus/reader/sentiwordnet.py
The code reads in the corpus and stores the entries in a tuple-keyed dictionary. The tuples are of the form (pos, offset). The issue here is that the POS is indexed as 'a' instead of 's' for the problematic entries. We can change this part between lines 75-78:
to
Effect
We now get the following as expected:
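To illustrate the idea independently of NLTK, here is a minimal sketch that assumes a toy dictionary keyed by (pos, offset). `TOY_DB` and `lookup_scores` are hypothetical names, not part of NLTK, and this is not the actual patch from the repository. It maps the satellite tag 's' to 'a' before the lookup:

```python
# Toy stand-in for the reader's internal table; NOT the real NLTK data structure.
# Keys are (pos, offset); values are (pos_score, neg_score) taken from the
# two corpus lines quoted in this issue.
TOY_DB = {
    ("a", 2359789): (0.5, 0.25),    # amazing#1
    ("a", 1282510): (0.875, 0.125), # amazing#2
}


def lookup_scores(pos, offset, db=TOY_DB):
    """Return (pos_score, neg_score), or None if there is no entry.

    SentiWordNet files every adjective under 'a', so a satellite
    adjective tag 's' coming from WordNet is mapped to 'a' first.
    """
    if pos == "s":
        pos = "a"
    return db.get((pos, offset))


# Without normalization the satellite key is simply missing.
raw_result = TOY_DB.get(("s", 2359789))
# With normalization the entry stored under 'a' is found.
fixed_result = lookup_scores("s", 2359789)
```

With this normalization, `lookup_scores('s', 2359789)` finds the entry stored under `('a', 2359789)`, whereas a direct lookup with the 's' tag finds nothing.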