## Notebook for tagging senses

This notebook is a template for tagging the WordNet `synset` of each ambiguous word in context.

- Once a `synset` is obtained, it can be mapped to its supersense with `synset.lexname()`.
- If no `synset` is a good fit, simply write "other" instead.
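The two rules above can be sketched as a small helper. It is hypothetical, not part of NLTK: `lexname_of` stands in for the WordNet lookup (in this notebook, something like `lambda name: wn.synset(name).lexname()`), and it assumes the supersense label is the part of the lexname after the part-of-speech prefix, which matches values such as `animal` and `artifact` in the `supersense_IO` column.

```python
from typing import Callable


def supersense_for_tag(tag: str, lexname_of: Callable[[str], str]) -> str:
    """Map a tagged synset name (or "other") to a supersense label.

    Hypothetical helper. `lexname_of` returns a synset's lexname, e.g.
    lambda name: wn.synset(name).lexname() when WordNet is loaded.
    Assumes the label is the category after the POS prefix,
    e.g. "noun.animal" -> "animal".
    """
    tag = tag.strip()
    if tag.lower() == "other":
        # No synset fit, so there is no lexname to look up
        return "other"
    lexname = lexname_of(tag)
    # A lexname looks like "<pos>.<category>"; keep only the category
    _, _, category = lexname.partition(".")
    return category
```

Passing the lookup in as a function keeps the helper testable without the WordNet data, for example with a small dictionary of made-up lexnames.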

In [1]:
from nltk.corpus import wordnet as wn
import pandas as pd
from IPython.display import clear_output
import nltk

In [2]:
nltk.download("punkt")
nltk.download("omw-1.4")
nltk.download('wordnet')

[nltk_data] Downloading package punkt to
[nltk_data]     /Users/ifunanyaokoroma/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package omw-1.4 to
[nltk_data]     /Users/ifunanyaokoroma/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/ifunanyaokoroma/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [3]:
### Each synset's lexname, i.e. s.lexname(), is its supersense
### (lexnames look like "noun.animal" or "verb.communication")

## Tagging

In [4]:
### Sentences to tag
df_stims = pd.read_csv("data/raw/rawc_sentences.csv")
df_stims.head(5)

| Unnamed: 0 | word | sentence | context | supersense_IO | comments | Class |
|---|---|---|---|---|---|---|
| 0 | bat | He saw a fruit bat. | M1_b | animal | | N |
| 1 | bat | He saw a furry bat. | M1_a | animal | | N |
| 2 | bat | He saw a baseball bat. | M2_b | artifact | | N |
| 3 | bat | He saw a wooden bat. | M2_a | artifact | | N |
| 4 | act | It was a comedic act. | M2_b | | | N |


In [5]:
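import os

### Which sentences have already been tagged? (one line per tagged sentence)
num_sentences = 0
if os.path.exists("data/output/rawc_tagging.txt"):
    with open("data/output/rawc_tagging.txt", "r") as f:
        num_sentences = len(f.readlines())
print("{x} sentences tagged.".format(x=num_sentences))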

36 sentences tagged.
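Counting lines assumes the output file lines up row for row with `df_stims`; if the stimulus file is ever reordered or edited, new tags silently land on the wrong sentences. A stricter check could compare each tagged sentence against the stimuli. The sketch below is hypothetical (`resume_index` is not an existing function) and assumes each line is a `sentence,word,tag` row written in the same order as `df_stims`.

```python
import csv
from typing import Iterable, List


def resume_index(tagged_lines: Iterable[str], sentences: List[str]) -> int:
    """Return how many leading sentences are already tagged.

    Hypothetical helper. Assumes each line is a "sentence,word,tag" CSV row
    and that rows were written in the same order as `sentences`.
    Raises ValueError when a tagged row does not match the stimulus at
    the same position.
    """
    count = 0
    for row in csv.reader(tagged_lines):
        if not row:
            continue  # skip blank lines instead of counting them
        if count >= len(sentences) or row[0] != sentences[count]:
            raise ValueError(
                "row {n} does not match the stimulus file: {s!r}".format(n=count, s=row[0])
            )
        count += 1
    return count
```

Inside a `with open(...)` block, `resume_index(f, df_stims['sentence'].tolist())` would replace the plain line count and fail loudly instead of mis-aligning tags.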


In [None]:
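import csv

tags = []

### Tag only the sentences that are not in the output file yet
for index, row in df_stims[num_sentences:].iterrows():
    print(row['sentence'])
    print("\n")
    ### Candidate synsets for the word, restricted to its part of speech (e.g. "n" or "v")
    synsets = wn.synsets(row['word'], pos=row['Class'].lower())

    for s in synsets:
        print("{x}: {y}\n".format(x=s.name(), y=s.definition()))

    ### Enter a synset name (e.g. call.v.14), or "other" if none fits
    tag = input("> ")
    tags.append(tag)

    ### Append right away so progress survives an interrupted session;
    ### csv quotes any sentence that contains a comma
    with open("data/output/rawc_tagging.txt", "a", newline="") as f:
        csv.writer(f, lineterminator="\n").writerow([row['sentence'], row['word'], tag])

    clear_output()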

They called the debt.


name.v.01: assign a specified (usually proper) proper name to

call.v.02: ascribe a quality to or give a name of a common noun that reflects a quality

call.v.03: get or try to get into communication (with someone) by telephone

shout.v.02: utter a sudden loud cry

call.v.05: order, request, or command to come

visit.v.03: pay a brief visit

call.v.07: call a meeting; invite or command to meet

call.v.08: read aloud to check for omissions or absentees

call.v.09: send a message or attempt to reach someone by radio, phone, etc.; make a signal to in order to transmit a message

call.v.10: utter a characteristic note or cry

call.v.11: stop or postpone because of adverse conditions, such as bad weather

address.v.06: greet, as with a prescribed form, title, or name

call.v.13: make a stop in a harbour

call.v.14: demand payment of (a loan)

bid.v.04: make a demand, as for a card or a suit or a show of hands

call.v.16: give the calls (to the dancers) for a square d
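The tagging loop accepts whatever is typed at the `> ` prompt, so a typo such as `call.v.41` is written to the file unchecked. A minimal sketch of a guard, assuming a tag should be either one of the printed synset names or "other": `ask_for_tag` is a hypothetical helper that would replace the bare `input("> ")` call, with `candidates` set to `[s.name() for s in synsets]`.

```python
from typing import Callable, List


def ask_for_tag(candidates: List[str], ask: Callable[[str], str] = input) -> str:
    """Prompt until the answer is a listed synset name or "other".

    Hypothetical helper. `ask` defaults to input() but can be swapped
    for a stub when testing.
    """
    allowed = set(candidates) | {"other"}
    while True:
        tag = ask("> ").strip()
        if tag in allowed:
            return tag
        # Reject the answer and prompt again
        print("Unrecognised tag {x!r}; enter a listed synset name or 'other'.".format(x=tag))
```

Restricting answers to the printed list is a design choice; if a tagger may pick a synset belonging to a different lemma, the `allowed` set would need to be widened.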

In [None]:
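### `tags` only holds this session's answers, so read every tag back from the file
df_stims['synset'] = pd.read_csv("data/output/rawc_tagging.txt", header=None)[2]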