# Individual Assignment - Martín Pucheu

## Helper Functions

In [1]:
import pandas as pd
import functools
import spacy
from pathlib import Path

In [33]:
def build_dataframe(file_path):
    data = []
    with open(file_path, 'r') as file:
        for line in file:
            data.append(line.strip().split('\t'))
    df = pd.DataFrame(data)
    return df

def set_features_values(row,spacy_tokenized,idx):
    """Copy the dependency-parse features of the spaCy token at position idx into row."""
    token = spacy_tokenized[idx]

    # Dependency Parser
    row['lemma']           = token.lemma_
    row['Dependency']      = token.dep_
    row['Head']            = token.head.text
    # Number of ancestors = number of arcs between the token and ROOT
    row['token-ROOT_path'] = len(list(token.ancestors))

    return row

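def add_token_features(df,spacy_tokenized,offset=1):
    """
    Fill the feature columns assuming both tokenizations match one-to-one.
    `offset` is unused; it is kept for signature parity with add_token_features2.
    """
    i=0
    for idx,row in df.iterrows():
        
        # Both tokenizations match. iterrows() yields copies, so the updated
        # row has to be written back or the changes would be lost.
        df.loc[idx] = set_features_values(row,spacy_tokenized,i)
        i += 1
        
    return df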
    
def add_token_features2(sentence_df,spacy_tokenized,offset=1):
    """Fill the feature columns, tolerating small misalignments between tokenizations."""
    rows = []
    n = len(spacy_tokenized)
    for i,(idx,row) in enumerate(sentence_df.iterrows()):
        # Try the same position first (both tokenizations match), then the
        # neighbours at +/- offset, then one position further.
        for j in (i, i-offset, i+offset, i-offset+1, i+offset+1):
            # Bounds check: avoids an IndexError at the end of the sentence and
            # negative indices wrapping around to its last tokens.
            if 0 <= j < n and row['Token'] == spacy_tokenized[j].text:
                row = set_features_values(row,spacy_tokenized,j)
                break
        rows.append(row)

    # DataFrame.append was removed in pandas 2.0, so the frame is built
    # from the collected rows instead.
    return pd.DataFrame(rows,columns=sentence_df.columns).reset_index(drop=True)

def applySentenceGroupBy(sentence_df):
    """Parse one sentence with spaCy and fill its feature columns."""

    # Reconstruct the sentence string from the original df
    string = ' '.join(sentence_df['Token'])
    print("Sentence: ",string)

    # Use spaCy to tokenize and extract dependencies, etc.
    # `nlp` is the global pipeline loaded in the Practical Component.
    spacy_tokenized = nlp(string)

    print("Tokenization: ",list(spacy_tokenized))
    sentence_df = add_token_features2(sentence_df,spacy_tokenized,offset=1)

    # Render the dependency tree (displayed inline; nothing is saved to disk)
    spacy.displacy.render(spacy_tokenized, style="dep")

    return sentence_df
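
The `token-ROOT_path` feature set in `set_features_values` is the number of ancestors of a token, that is, the number of arcs between the token and the ROOT. The sketch below reproduces that count without spaCy. It assumes a hypothetical `heads` list in which each entry is the index of the token's head and the root is its own head (the convention spaCy uses for `token.head`). The toy parse is hand-written for illustration and is not parser output.

```python
def root_path_length(heads, idx):
    """Count the arcs between token `idx` and the root of the tree."""
    steps = 0
    while heads[idx] != idx:  # the root is its own head
        idx = heads[idx]
        steps += 1
        if steps > len(heads):  # guard against malformed input
            raise ValueError("cycle in head indices")
    return steps


# Hypothetical toy parse of "it ended in robbery" (hand-written, not parser output)
tokens = ["it", "ended", "in", "robbery"]
heads = [1, 1, 1, 2]  # "ended" is the root; "robbery" attaches to "in"

depths = [root_path_length(heads, i) for i in range(len(tokens))]
for token, depth in zip(tokens, depths):
    print(token, depth)
```

Under these assumptions the root gets a path length of 0 and every other token gets one more than its head.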


## Practical Component

### Reading the Dataset and selecting a sentence

The selected sentence is:
* That was grotesque enough in the outset , and yet it ended in a desperate attempt at robbery.


In [34]:
nlp = spacy.load("en_core_web_sm")

df = build_dataframe("corpus/SEM-2012-SharedTask-CD-SCO-dev-simple.v2.txt")
df = df.rename(columns={0: 'Chapter', 1: 'Sent_id', 2: 'Token_id', 3: 'Token', 4: 'Gold Label'})

df.insert(len(df.columns)-1, 'lemma', None)
df.insert(len(df.columns)-1, 'Dependency', None)
df.insert(len(df.columns)-1, 'Head', None)
df.insert(len(df.columns)-1, 'token-ROOT_path', None)

sent_df = df[(df['Sent_id']=='12') & (df['Chapter']== 'wisteria01')]


The partial dataframe below contains the rows corresponding to the selected sentence. At this point the columns 'lemma', 'Dependency', 'Head', and 'token-ROOT_path' are still empty; the next cell fills them:

In [35]:
sent_df = sent_df.groupby(['Sent_id']).apply(applySentenceGroupBy)
sent_df


Sentence:  That was grotesque enough in the outset , and yet it ended in a desperate attempt at robbery .
Tokenization:  [That, was, grotesque, enough, in, the, outset, ,, and, yet, it, ended, in, a, desperate, attempt, at, robbery, .]


| Sent_id |  | Chapter | Sent_id | Token_id | Token | lemma | Dependency | Head | token-ROOT_path | Gold Label |
|---|---|---|---|---|---|---|---|---|---|---|
| 12 | 0 | wisteria01 | 12 | 0 | That | that | nsubj | was | 2 | O |
| 12 | 1 | wisteria01 | 12 | 1 | was | be | auxpass | grotesque | 1 | O |
| 12 | 2 | wisteria01 | 12 | 2 | grotesque | grotesque | ROOT | grotesque | 0 | O |
| 12 | 3 | wisteria01 | 12 | 3 | enough | enough | advmod | grotesque | 1 | O |
| 12 | 4 | wisteria01 | 12 | 4 | in | in | prep | grotesque | 1 | O |
| 12 | 5 | wisteria01 | 12 | 5 | the | the | det | outset | 3 | O |
| 12 | 6 | wisteria01 | 12 | 6 | outset | outset | pobj | in | 2 | O |
| 12 | 7 | wisteria01 | 12 | 7 | , | , | punct | grotesque | 1 | O |
| 12 | 8 | wisteria01 | 12 | 8 | and | and | cc | grotesque | 1 | O |
| 12 | 9 | wisteria01 | 12 | 9 | yet | yet | advmod | ended | 2 | O |


## Theoretical Component

Task:


Answer:

One such task is Negation Scope Identification. Dependency-based features are useful for this task because they capture the relationships between the words in a sentence and can help identify which words are affected by a negation cue [1]. Jiménez-Zafra et al. [2] implemented a set of dependency-based features for negation scope identification that consider the type and direction of the dependency between each token and its closest negation cue. Another task that could benefit from dependency-based features is Semantic Role Labeling, since dependency relations can indicate the roles that different words play with respect to a predicate. Finally, Named Entity Recognition could also benefit, as dependency-based features describe how entities relate to their context in a sentence. Overall, dependency-based features have the potential to improve performance on these NLP tasks.
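
To make the idea of a feature based on the "type and direction of the dependency" between a token and a negation cue concrete, here is a minimal sketch. It is an illustration only, not the implementation used in [1] or [2]. The function names `ancestor_chain` and `cue_path`, the `up:`/`down:` label format, and the hand-written toy parse are all assumptions made for this example.

```python
def ancestor_chain(heads, idx):
    """Return [idx, head(idx), ..., root]; the root is its own head."""
    chain = [idx]
    while heads[idx] != idx:
        idx = heads[idx]
        chain.append(idx)
    return chain


def cue_path(heads, deps, token, cue):
    """Dependency path from `token` to `cue` as a list of labelled steps.

    Arcs climbed towards the lowest common ancestor are marked 'up:',
    arcs descended from it towards the cue are marked 'down:'.
    """
    up = ancestor_chain(heads, token)
    down = ancestor_chain(heads, cue)
    common = next(i for i in up if i in down)  # lowest common ancestor
    climb = ["up:" + deps[i] for i in up[:up.index(common)]]
    descend = ["down:" + deps[i] for i in reversed(down[:down.index(common)])]
    return climb + descend


# Hypothetical toy parse of "She did not go to school" (hand-written, not parser output)
tokens = ["She", "did", "not", "go", "to", "school"]
heads = [3, 3, 3, 3, 3, 4]
deps = ["nsubj", "aux", "neg", "ROOT", "prep", "pobj"]
cue = tokens.index("not")

path = cue_path(heads, deps, tokens.index("school"), cue)
print(path)  # ['up:pobj', 'up:prep', 'down:neg']
```

The path itself, or its length, could then be added as one more column next to 'Dependency' and 'Head' for each token.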



[1] Emanuele Lapponi et al. “UiO2: Sequence-labeling negation using dependency features”. In: *SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012). 2012, pp. 319–327.

[2] Salud María Jiménez-Zafra et al. “Detecting negation cues and scopes in Spanish”. In: Proceedings of the Twelfth Language Resources and Evaluation Conference. 2020, pp. 6902–6911.
