## Notebook for tagging senses

This notebook is a template for tagging the WordNet `synsets` of different ambiguous words in context.

- Once a `synset` is chosen, `synset.lexname()` gives its supersense (e.g., `noun.artifact`).
- If no `synset` seems like a good choice, simply write "other" instead.
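
The two rules above can be sketched as a small helper. This is a minimal illustration, not part of the original notebook: `tag_to_supersense` and its `get_synset` parameter are hypothetical names, and the helper assumes every tag is either a WordNet synset name such as `bat.n.05` or the literal string "other".

```python
def tag_to_supersense(tag, get_synset=None):
    """Map a tag typed during annotation to a supersense label.

    `tag` is assumed to be either a synset name (e.g. "bat.n.05")
    or the literal string "other".
    """
    tag = tag.strip()
    if tag.lower() == "other":
        # No synset was a good fit, so there is no lexname to look up.
        return "other"
    if get_synset is None:
        # Imported lazily so the "other" path works without WordNet data.
        from nltk.corpus import wordnet as wn
        get_synset = wn.synset
    return get_synset(tag).lexname()
```

With the WordNet corpus installed, `tag_to_supersense("bat.n.05")` should give the same result as calling `wn.synset("bat.n.05").lexname()` directly.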

In [1]:
from nltk.corpus import wordnet as wn
import pandas as pd
from IPython.display import clear_output
import nltk



In [2]:
nltk.download("punkt")
nltk.download("omw-1.4")

[nltk_data] Downloading package punkt to /Users/seantrott/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package omw-1.4 to
[nltk_data]     /Users/seantrott/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


True

In [3]:
### The supersense of a synset is its lexname, i.e., s.lexname()

## Tagging

In [4]:
### Sentences to tag
df_stims = pd.read_csv("data/raw/rawc_sentences.csv")
df_stims.head(5)

Unnamed: 0,word,sentence,context,supersense_IO,comments,Class
0,bat,He saw a fruit bat.,M1_b,animal,,N
1,bat,He saw a furry bat.,M1_a,animal,,N
2,bat,He saw a baseball bat.,M2_b,artifact,,N
3,bat,He saw a wooden bat.,M2_a,artifact,,N
4,act,It was a comedic act.,M2_b,,,N


In [5]:
### How many sentences have already been tagged? (one line per tagged sentence)
with open("data/output/rawc_tagging.txt", "r") as f:
    num_sentences = len(f.readlines())
print("{x} sentences tagged.".format(x = num_sentences))

8 sentences tagged.
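
This count is what lets a tagging session resume where the previous one stopped. On a first run, however, `open(..., "r")` raises `FileNotFoundError` because the output file does not exist yet. One possible guard, sketched with a hypothetical helper `count_tagged` that is not part of the original notebook:

```python
import os

def count_tagged(path):
    """Return the number of lines already written to the tagging file.

    Returns 0 if the file has not been created yet (first session).
    Assumes one line per tagged sentence.
    """
    if not os.path.exists(path):
        return 0
    with open(path, "r") as f:
        return sum(1 for _ in f)
```

In the notebook this would be used as `num_sentences = count_tagged("data/output/rawc_tagging.txt")`.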


In [6]:
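tags = []

### Resume from the first sentence that has not been tagged yet
for _, row in df_stims[num_sentences:].iterrows():
    print(row['sentence'])
    print("\n")
    ### Keep only synsets whose name contains the target part of speech, e.g., ".n."
    pos = ".{x}.".format(x=row['Class'].lower())
    synsets = [i for i in wn.synsets(row['word']) if pos in i.name()]

    ### Show each candidate synset with its definition
    for s in synsets:
        print("{x}: {y}\n".format(x = s.name(), y = s.definition()))

    ### Enter a synset name (e.g., "bat.n.05") or "other"
    tag = input("> ")
    tags.append(tag)

    ### Append after every sentence so that an interrupted session loses nothing
    with open("data/output/rawc_tagging.txt", "a") as f:
        f.write("{sentence},{word},{tag}\n".format(sentence = row['sentence'],
                                                word = row['word'],
                                                tag = tag))

    clear_output()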

It was a hostile atmosphere.


atmosphere.n.01: a particular environment or surrounding influence

standard_atmosphere.n.01: a unit of pressure: the pressure that will support a column of mercury 760 mm high at sea level and 0 degrees centigrade

atmosphere.n.03: the mass of air surrounding the Earth

atmosphere.n.04: the weather or climate at some place

atmosphere.n.05: the envelope of gases surrounding any celestial body

air.n.03: a distinctive but intangible quality surrounding a person or thing



KeyboardInterrupt: Interrupted by user
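
Each record in the loop is written as a plain comma-joined line, so a sentence that itself contains a comma would produce an extra field when the file is read back. A hedged alternative is to let Python's `csv` module handle the quoting. The helper names `append_tag` and `read_tags` are illustrative and do not appear in the original notebook.

```python
import csv

def append_tag(path, sentence, word, tag):
    """Append one (sentence, word, tag) record, quoting fields only when needed."""
    with open(path, "a", newline="") as f:
        csv.writer(f, lineterminator="\n").writerow([sentence, word, tag])

def read_tags(path):
    """Read all records back as [sentence, word, tag] lists."""
    with open(path, "r", newline="") as f:
        return list(csv.reader(f))
```

For sentences without commas or quotation marks, the line written is identical to what the `f.write` call produces, so lines already in the file remain readable.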

In [None]:
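### `tags` only covers sentences tagged in this session; align on those rows (others stay NaN)
df_stims['synset'] = pd.Series(tags, index=df_stims.index[num_sentences:num_sentences + len(tags)])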

In [11]:
s = wn.synset("bat.n.05")
s

Synset('bat.n.05')

In [12]:
s.lexname()

'noun.artifact'
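
The lexname combines a part of speech with a category (`noun.artifact`), whereas the `supersense_IO` column in the stimulus file holds bare labels such as `artifact`. If the two are meant to be compared, which is an assumption here, the prefix can be dropped first. `strip_pos_prefix` is a hypothetical helper, not a WordNet function.

```python
def strip_pos_prefix(lexname):
    """Drop the part-of-speech prefix from a WordNet lexname.

    "noun.artifact" becomes "artifact"; a string without a "."
    (such as the fallback tag "other") is returned unchanged.
    """
    _, sep, category = lexname.partition(".")
    return category if sep else lexname
```

For example, `strip_pos_prefix(s.lexname())` could then be checked against the value in `supersense_IO` for the same sentence.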