## Notebook for tagging senses

This notebook is a template for tagging the WordNet `synsets` of different ambiguous words in context.

- Once a `synset` is chosen, calling `synset.lexname()` on it gives the supersense.
- If no `synset` seems like a good match, enter "other" instead.
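
The mapping described above can be sketched as a small helper. This is a minimal sketch, not part of the original notebook: the name `tag_to_supersense` and its `lookup` parameter are hypothetical, and it assumes each tag is either a full synset name such as `lawsuit.n.01` or the string "other".

```python
def tag_to_supersense(tag, lookup=None):
    """Map a tag such as 'lawsuit.n.01' to its WordNet lexname (supersense).

    The placeholder tag 'other' is passed through unchanged.
    `lookup` is a hypothetical hook: any callable that takes a synset name
    and returns an object with a .lexname() method. It defaults to wn.synset.
    """
    tag = tag.strip()
    if tag.lower() == "other":
        return "other"
    if lookup is None:
        # Imported lazily so the helper can be tried out with a stub lookup.
        from nltk.corpus import wordnet as wn
        lookup = wn.synset
    return lookup(tag).lexname()
```

With the WordNet data downloaded, `tag_to_supersense("lawsuit.n.01")` returns the lexname of that synset, and `tag_to_supersense("other")` returns `"other"` without touching WordNet.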

In [2]:
from nltk.corpus import wordnet as wn
import pandas as pd
from IPython.display import clear_output
import nltk

In [3]:
nltk.download("punkt")
nltk.download("omw-1.4")
nltk.download('wordnet')

[nltk_data] Downloading package punkt to
[nltk_data]     /Users/ifunanyaokoroma/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package omw-1.4 to
[nltk_data]     /Users/ifunanyaokoroma/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/ifunanyaokoroma/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [4]:
### Get the lexname for each synset, i.e., s.lexname().
### The lexname is the supersense.

## Tagging

In [5]:
### Sentences to tag
df_stims = pd.read_csv("data/raw/rawc_sentences.csv")
df_stims.head(5)

Unnamed: 0,word,sentence,context,supersense_IO,comments,Class
0,bat,He saw a fruit bat.,M1_b,animal,,N
1,bat,He saw a furry bat.,M1_a,animal,,N
2,bat,He saw a baseball bat.,M2_b,artifact,,N
3,bat,He saw a wooden bat.,M2_a,artifact,,N
4,act,It was a comedic act.,M2_b,,,N


In [6]:
### Which sentences have already been tagged?
with open("data/output/rawc_tagging.txt", "r") as f:
    num_sentences = len(f.readlines())
print("{x} sentences tagged.".format(x = num_sentences))

60 sentences tagged.
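
The cell above fails if the output file does not exist yet, for example on the very first tagging session. A minimal sketch of a more forgiving count follows. The helper name `count_tagged` is hypothetical, and it assumes one tagged sentence per non-blank line with no header row. Unlike `len(f.readlines())`, it skips blank lines.

```python
import os

def count_tagged(path):
    """Return the number of non-blank lines in `path`, or 0 if the file is missing."""
    if not os.path.exists(path):
        return 0
    with open(path, "r") as f:
        return sum(1 for line in f if line.strip())
```

Calling `num_sentences = count_tagged("data/output/rawc_tagging.txt")` would then work both for a fresh start and for a resumed session.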


In [None]:
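tags = []

### Resume after the sentences that are already in the output file
for index, row in df_stims[num_sentences:].iterrows():
    print(row['sentence'])
    print("\n")
    ### Keep only synsets whose name contains the target part of speech, e.g. ".n."
    pos = ".{x}.".format(x=row['Class'].lower())
    synsets = [i for i in wn.synsets(row['word']) if pos in i.name()]
    
    for s in synsets:
        print("{x}: {y}\n".format(x = s.name(), y = s.definition()))
    
    ### Enter one of the synset names listed above, or "other"
    tag = input("> ").strip()
    tags.append(tag)
    
    ### Append immediately so that progress survives an interrupted session
    with open("data/output/rawc_tagging.txt", "a") as f:
        f.write("{sentence},{word},{tag}\n".format(sentence = row['sentence'],
                                                word = row['word'],
                                                tag = tag))
    
    clear_output()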
    

It was a criminal case.


case.n.01: an occurrence of something

event.n.02: a special set of circumstances

lawsuit.n.01: a comprehensive term for any proceeding in a court of law whereby an individual seeks a legal remedy

case.n.04: the actual state of things

case.n.05: a portable container for carrying several objects

case.n.06: a person requiring professional services

subject.n.06: a person who is subjected to experimental or other observational procedures; someone who is an object of investigation

case.n.08: a problem requiring investigation

case.n.09: a statement of facts and reasons used to support an argument

case.n.10: the quantity contained in a case

case.n.11: nouns or pronouns or adjectives (often marked by inflection) related in some way to other words in a sentence

case.n.12: a specific state of mind that is temporary

character.n.05: a person of a specified kind (usually with many eccentricities)

font.n.01: a specific size and style of type within a type family
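
The tagging loop writes whatever is typed straight to the file, and it joins the fields with plain commas. A mistyped synset name or a sentence containing a comma would therefore corrupt the output silently. The sketch below shows one possible guard for each problem. The helper names `normalize_tag` and `append_tag` are hypothetical, and the sketch assumes that valid tags are exactly the synset names printed for the current word plus "other".

```python
import csv

def normalize_tag(raw, candidate_names):
    """Return a cleaned tag if it is 'other' or one of the listed synset names, else None."""
    tag = raw.strip()
    if tag.lower() == "other":
        return "other"
    return tag if tag in candidate_names else None

def append_tag(path, sentence, word, tag):
    """Append one row; csv.writer quotes any field that contains a comma."""
    with open(path, "a", newline="") as f:
        # lineterminator="\n" matches the line endings the loop already writes.
        csv.writer(f, lineterminator="\n").writerow([sentence, word, tag])
```

Inside the loop, the prompt could be repeated while `normalize_tag(input("> "), [s.name() for s in synsets])` returns `None`, and `append_tag` could replace the manual `f.write` call. For fields without commas or quotes, the rows it writes look the same as the existing ones.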

In [None]:
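### `tags` only covers this session, so reload every tag from the output file (assumes no header row and no commas in sentences)
df_stims['synset'] = pd.read_csv("data/output/rawc_tagging.txt", names=["sentence", "word", "synset"])['synset']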