## Notebook for tagging senses

This notebook is a template for tagging the WordNet `synsets` of different ambiguous words in context.

- Once a `synset` is obtained, we can map that to `synset.lexname()` to identify the supersense.
- If no `synset` seems like a good match, simply write "other" instead.
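
The mapping described above can be sketched as a small helper. The name `tag_to_supersense` and its `lookup` parameter are hypothetical and not part of this notebook; by default the helper relies on `wn.synset(...).lexname()` from NLTK, and it passes "other" through unchanged.

```python
def tag_to_supersense(tag, lookup=None):
    """Map a tag such as 'bat.n.01' to its supersense; pass 'other' through unchanged.

    `lookup` is a hypothetical hook: any callable that takes a synset name and
    returns an object with a `lexname()` method. It defaults to `wn.synset`.
    """
    tag = tag.strip()
    if tag == "other":
        return "other"
    if lookup is None:
        # Imported lazily so the helper can be exercised without WordNet data.
        from nltk.corpus import wordnet as wn
        lookup = wn.synset
    return lookup(tag).lexname()
```

With the WordNet data downloaded, `tag_to_supersense("bat.n.01")` returns the lexname of that synset, and `tag_to_supersense("other")` returns `"other"`.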

In [1]:
from nltk.corpus import wordnet as wn
import pandas as pd
from IPython.display import clear_output
import nltk

In [2]:
nltk.download("punkt")
nltk.download("omw-1.4")
nltk.download('wordnet')

[nltk_data] Downloading package punkt to
[nltk_data]     /Users/ifunanyaokoroma/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package omw-1.4 to
[nltk_data]     /Users/ifunanyaokoroma/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/ifunanyaokoroma/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [3]:
### Get the lexname for each synset, i.e., s.lexname()
### The lexname is the supersense (e.g., "noun.animal")

## Tagging

In [4]:
### Sentences to tag
df_stims = pd.read_csv("data/raw/rawc_sentences.csv")
df_stims.head(5)

Unnamed: 0,word,sentence,context,supersense_IO,comments,Class
0,bat,He saw a fruit bat.,M1_b,animal,,N
1,bat,He saw a furry bat.,M1_a,animal,,N
2,bat,He saw a baseball bat.,M2_b,artifact,,N
3,bat,He saw a wooden bat.,M2_a,artifact,,N
4,act,It was a comedic act.,M2_b,,,N


In [5]:
### Which sentences have already been tagged?
with open("data/output/rawc_tagging.txt", "r") as f:
    num_sentences = len(f.readlines())
print("{x} sentences tagged.".format(x = num_sentences))

32 sentences tagged.
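
The cell above assumes that the tagging file already exists. On a first run it does not, so a slightly more defensive count may help. This is a minimal sketch; the helper name `count_tagged` is hypothetical, and skipping blank lines is an added assumption rather than something the notebook does.

```python
import os


def count_tagged(path):
    """Return the number of non-empty lines in the tagging file, or 0 if it does not exist yet."""
    if not os.path.exists(path):
        return 0
    with open(path, "r") as f:
        # Blank lines are skipped so a stray trailing newline is not counted as a tagged sentence.
        return sum(1 for line in f if line.strip())
```

It could stand in for the `len(f.readlines())` call, e.g. `num_sentences = count_tagged("data/output/rawc_tagging.txt")`.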


In [None]:
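import csv

tags = []

# Resume after the sentences that are already in the output file
for index, row in df_stims[num_sentences:].iterrows():
    print(row['sentence'])
    print("\n")
    # Keep only synsets whose name carries the target part of speech, e.g. ".n."
    pos = ".{x}.".format(x=row['Class'].lower())
    synsets = [i for i in wn.synsets(row['word']) if pos in i.name()]
    
    for s in synsets:
        print("{x}: {y}\n".format(x = s.name(), y = s.definition()))
    
    tag = input("> ")    
    tags.append(tag)
    
    # Append immediately so progress survives an interruption;
    # csv.writer quotes any sentence that contains a comma
    with open("data/output/rawc_tagging.txt", "a", newline="") as f:
        csv.writer(f, lineterminator="\n").writerow([row['sentence'], row['word'], tag])
    
    clear_output()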
    

She had a mental block.


block.n.01: a solid piece of something (usually having flat rectangular sides)

block.n.02: a rectangular area in a city surrounded by streets and usually containing several buildings

block.n.03: a three-dimensional shape with six square or rectangular sides

block.n.04: a number or quantity of related things dealt with as a unit

block.n.05: housing in a large building that is divided into separate units

block.n.06: (computer science) a sector or group of sectors that function as the smallest data unit permitted

block.n.07: an inability to remember or think of something you normally can do; often caused by emotional tension

pulley.n.01: a simple machine consisting of a wheel with a groove in which a rope can run to change the direction or point of application of a force applied to the rope

engine_block.n.01: a metal casting containing the cylinders and cooling ducts of an engine

blockage.n.02: an obstruction in a pipe or tube

auction_block.n.01: a plat

In [None]:
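# `tags` only covers the rows tagged in this session, so align it with those rows
df_stims.loc[df_stims.index[num_sentences:num_sentences + len(tags)], 'synset'] = tags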
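
Because `tags` only holds the tags entered in the current session, the complete set of tags lives in the output file. A minimal sketch of reading them back follows; the helper name `load_tags` is hypothetical, and it assumes each line has the form `sentence,word,tag` with the tag as the last field.

```python
import csv


def load_tags(path):
    """Read (sentence, word, tag) rows and return the tags in file order."""
    with open(path, newline="") as f:
        # The tag is taken as the last field, which tolerates an unquoted comma inside the sentence.
        return [row[-1] for row in csv.reader(f) if row]
```

Once every sentence has been tagged, `df_stims['synset'] = load_tags("data/output/rawc_tagging.txt")` would fill the whole column, provided the file has exactly one line per row of `df_stims`.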