# Assignment 3: Dependency Feature Extraction

The goal of this assignment is to extract three dependency features for a sentence in the development set of the SEM2012 dataset. You will use spaCy to extract the features. Please read about spaCy's dependency parser here: https://SpaCy.io/api/dependencyparser.
For more information on what the different dependency abbreviations mean, see the glossary in the spaCy GitHub repository: github.com/explosion/spaCy/blob/master/spacy/glossary.py

In [1]:
# libraries
import warnings
warnings.filterwarnings("ignore")

import numpy as np
import pandas as pd
import csv

import spacy
from spacy import displacy
nlp = spacy.load("en_core_web_sm")

In [2]:
# Read the contents of the file
df = pd.read_csv('datas/SEM-2012-SharedTask-CD-SCO-dev-simple.v2.txt', delimiter='\t',
                 names=['Chapter', 'Sent_id', 'Token_id', 'Token', 'Gold_Label'])
df.head()

| | Chapter | Sent_id | Token_id | Token | Gold_Label |
|---|---|---|---|---|---|
| 0 | wisteria01 | 0 | 0 | 1. | O |
| 1 | wisteria01 | 0 | 1 | The | O |
| 2 | wisteria01 | 0 | 2 | Singular | O |
| 3 | wisteria01 | 0 | 3 | Experience | O |
| 4 | wisteria01 | 0 | 4 | of | O |


In [3]:
# Selecting a sentence from the dev data
# Chosen sentence is from wisteria02 with id 209
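# .copy() makes `data` an independent frame, so new columns can be added later without writing to a view of df
data = df.iloc[9240:9273].copy()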
data

| | Chapter | Sent_id | Token_id | Token | Gold_Label |
|---|---|---|---|---|---|
| 9240 | wisteria02 | 209 | 0 | I | O |
| 9241 | wisteria02 | 209 | 1 | managed | O |
| 9242 | wisteria02 | 209 | 2 | to | O |
| 9243 | wisteria02 | 209 | 3 | see | O |
| 9244 | wisteria02 | 209 | 4 | him | O |
| 9245 | wisteria02 | 209 | 5 | on | O |
| 9246 | wisteria02 | 209 | 6 | a | O |
| 9247 | wisteria02 | 209 | 7 | plausible | O |
| 9248 | wisteria02 | 209 | 8 | pretext | O |
| 9249 | wisteria02 | 209 | 9 | , | O |
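
Instead of hard-coding row positions, the same sentence could be selected by its chapter and sentence id. The sketch below uses a small hand-made frame in place of the real file and assumes that `Sent_id` is unique within a chapter; `select_sentence` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

def select_sentence(df, chapter, sent_id):
    """Return a copy of the rows that belong to one sentence."""
    mask = (df["Chapter"] == chapter) & (df["Sent_id"] == sent_id)
    return df.loc[mask].copy()

# Toy stand-in for the dev file (same column names as above)
toy = pd.DataFrame({
    "Chapter": ["wisteria01", "wisteria01", "wisteria02", "wisteria02"],
    "Sent_id": [0, 0, 209, 209],
    "Token_id": [0, 1, 0, 1],
    "Token": ["1.", "The", "I", "managed"],
    "Gold_Label": ["O", "O", "O", "O"],
})

sent = select_sentence(toy, "wisteria02", 209)
print(" ".join(sent["Token"]))
```

Selecting by value keeps the cell correct even if the file is re-read with a different row order or index.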


In [4]:
# Extracting the sentence from data
words_list = data['Token'].tolist()

sentence = " ".join(words_list)
print(sentence)

I managed to see him on a plausible pretext , but I seemed to read in his dark , deepset , brooding eyes that he was perfectly aware of my true business .


### The dependency label for each token

In [5]:
#http://mlreference.com/dependency-tree-spacy

doc = nlp(sentence)

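# iterate over the tokens and collect the dependency label of each token
dependency = []
for token in doc:
    print(token.text, token.dep_)
    dependency.append(token.dep_)

# spaCy re-tokenizes the joined string, so check that its tokens still line up with the dataset rows
assert len(doc) == len(data), "spaCy tokenization does not match the dataset tokens"

# assign the labels to the dataframe
data["Dependency"] = dependency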

I nsubj
managed ROOT
to aux
see xcomp
him dobj
on prep
a det
plausible amod
pretext pobj
, punct
but cc
I nsubj
seemed conj
to aux
read xcomp
in prep
his poss
dark nmod
, punct
deepset pobj
, punct
brooding advcl
eyes dobj
that mark
he nsubj
was ccomp
perfectly advmod
aware acomp
of prep
my poss
true amod
business pobj
. punct


In [6]:
displacy.render(doc, style="dep", jupyter=True)

### The head of each token

In [7]:
# collect the text of each token's syntactic head in the dependency parse tree
# (the ROOT token is its own head)
head = []
for token in doc:
    print(token.text, "-->", token.head.text)
    head.append(token.head.text)

data["Head"] = head

I --> managed
managed --> managed
to --> see
see --> managed
him --> see
on --> see
a --> pretext
plausible --> pretext
pretext --> on
, --> managed
but --> managed
I --> seemed
seemed --> managed
to --> read
read --> seemed
in --> read
his --> deepset
dark --> deepset
, --> deepset
deepset --> in
, --> read
brooding --> seemed
eyes --> brooding
that --> was
he --> was
was --> brooding
perfectly --> aware
aware --> was
of --> aware
my --> business
true --> business
business --> of
. --> seemed
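
Head texts alone can be ambiguous, because a sentence may contain the same word more than once (in this sentence "I", "to" and "," all repeat). In spaCy the position of the head is available as `token.head.i`. The sketch below shows the idea on plain lists so that it runs without a model; the head indices are copied by hand from the output above for the first ten tokens, and `head_labels` is a hypothetical helper.

```python
def head_labels(tokens, heads):
    """Pair each token with its head as 'text-index' so repeated words stay distinct."""
    return [f"{tokens[h]}-{h}" for h in heads]

# First ten tokens of the sentence and the index of each token's head
# (the ROOT "managed" points to itself, as in spaCy)
tokens = ["I", "managed", "to", "see", "him", "on", "a", "plausible", "pretext", ","]
heads = [1, 1, 3, 1, 3, 3, 8, 8, 5, 1]

for tok, label in zip(tokens, head_labels(tokens, heads)):
    print(tok, "-->", label)
```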


### The length of the path connecting each token with the ROOT of the sentence

In [8]:
#for token in doc:
#    print(f"Token: {token.text}, Path Length: {len(list(token.ancestors)) + 1}")

In [9]:
#for token in doc:
#    path_length = sum(1 for _ in token.ancestors)
#    print(token.text, "-->", path_length)

In [10]:
# https://stackoverflow.com/questions/67111226/extract-a-path-of-dependency-relations-from-the-root-to-a-token-spacy
# Path length (number of dependency arcs) from each token up to the ROOT
path_lengths = []

for token in doc:
    path_length = 0
    current = token
    # Walk up the tree; in spaCy the ROOT token is its own head
    while current.head != current:
        path_length += 1
        current = current.head
    path_lengths.append(path_length)

data["token-ROOT_path"] = path_lengths
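
The loop above walks to the ROOT once per token. As an alternative, each depth can be derived from the depth of the head and cached. The sketch below works on a list of head indices rather than spaCy tokens (an assumption made so that it runs without a model); the indices are the heads of the first ten tokens as printed earlier, and `root_path_lengths` is a hypothetical helper.

```python
def root_path_lengths(heads):
    """Number of arcs from each token to the ROOT, given head indices.

    The ROOT is the token whose head index equals its own index.
    """
    depths = [None] * len(heads)

    def depth(i):
        # Compute each depth once and reuse it for all descendants
        if depths[i] is None:
            depths[i] = 0 if heads[i] == i else depth(heads[i]) + 1
        return depths[i]

    return [depth(i) for i in range(len(heads))]

heads = [1, 1, 3, 1, 3, 3, 8, 8, 5, 1]
print(root_path_lengths(heads))  # [1, 0, 2, 1, 2, 2, 4, 4, 3, 1]
```

The printed values agree with the `token-ROOT_path` column for the first ten tokens.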

In [11]:
data = data.reindex(columns=["Chapter","Sent_id","Token_id","Token","Dependency","Head","token-ROOT_path","Gold_Label"])
data

| | Chapter | Sent_id | Token_id | Token | Dependency | Head | token-ROOT_path | Gold_Label |
|---|---|---|---|---|---|---|---|---|
| 9240 | wisteria02 | 209 | 0 | I | nsubj | managed | 1 | O |
| 9241 | wisteria02 | 209 | 1 | managed | ROOT | managed | 0 | O |
| 9242 | wisteria02 | 209 | 2 | to | aux | see | 2 | O |
| 9243 | wisteria02 | 209 | 3 | see | xcomp | managed | 1 | O |
| 9244 | wisteria02 | 209 | 4 | him | dobj | see | 2 | O |
| 9245 | wisteria02 | 209 | 5 | on | prep | see | 2 | O |
| 9246 | wisteria02 | 209 | 6 | a | det | pretext | 4 | O |
| 9247 | wisteria02 | 209 | 7 | plausible | amod | pretext | 4 | O |
| 9248 | wisteria02 | 209 | 8 | pretext | pobj | on | 3 | O |
| 9249 | wisteria02 | 209 | 9 | , | punct | managed | 1 | O |


## THEORETICAL COMPONENT (7 points)
Write a small paragraph suggesting 3 Natural Language Processing tasks, for which you think dependency features could be useful. Elaborate on one of the tasks providing a motivation (i.e., why, in your opinion, dependency features are useful for this task). You can consult the corresponding literature. Include your answer at the bottom of your notebook in a markdown cell.

Dependency features can be useful for many different NLP tasks. They could be especially helpful in syntactic parsing, which analyzes the grammatical structure of a sentence. Dependency features identify the relations between the words in a sentence, such as which word is the subject and which is the object, and this could increase the accuracy of a parser. Sentiment analysis is another task in which dependency features may help: determining the connections between the words and phrases in a text can help establish its overall sentiment.(1)(2) Dependency features may also be helpful in information extraction, where identifying the relations between entities and events in a text can be used to extract structured information from unstructured text.(3)


Syntactic parsing is one task where dependency features could be extremely helpful. Syntactic parsing is the process of analyzing a sentence's grammatical structure, and it is essential to many NLP applications, including text summarization, information extraction, and machine translation. In dependency parsing in particular, a sentence is represented as a directed acyclic graph in which each node is a word and each edge is a grammatical relation between two words, such as subject-verb or verb-object.*(4)*

Dependency features are helpful in sentiment analysis because they expose the relations between words and the grammatical structure of a sentence. Dependency parsing can be used to determine the sentiment of the individual clauses in a sentence, as well as to find the words that affect the sentiment of an expression, such as negations and similar expressions.*(5)* This gives a more fine-grained picture of a sentence's sentiment and can increase the accuracy of sentiment analysis. For instance, the sentiment of a word should be inverted if the word is negated. However, it should not be inverted if the negation word "not" is not syntactically connected to the sentiment word. Dependency parsing can be used to detect such cases accurately and so improve the precision of sentiment analysis.*(5)* The ability to identify compound sentiments within a sentence is another example of how dependency features are used in sentiment analysis. The words in a sentence may express more than one sentiment, and the overall sentiment of the sentence is the aggregate of the individual ones. Dependency parsing can be used to determine the overall sentiment of a sentence as well as the sentiments of the clauses inside it.*(6)* In short, dependency features are effective in sentiment analysis because they provide insight into the relations between words and the grammatical structure of a sentence. By identifying modifying words and compound sentiments, this information can be used to increase the precision of sentiment analysis and to better capture the sentiment of the individual clauses within a sentence.<br>
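
The negation case can be illustrated with a small sketch. It assumes a toy sentiment lexicon and hand-written dependency parses (token text, head index, dependency label) rather than real parser output; `negated` and `word_polarities` are hypothetical helpers. A sentiment word is flipped only when a `neg` dependent is attached to it.

```python
# Toy lexicon: +1 positive, -1 negative (illustrative values only)
LEXICON = {"like": 1, "hate": -1}

def negated(index, parse):
    """True if some token labeled 'neg' has the token at `index` as its head."""
    return any(dep == "neg" and head == index for _, head, dep in parse)

def word_polarities(parse, lexicon=LEXICON):
    """Map each sentiment word's index to its polarity, flipped if syntactically negated."""
    result = {}
    for i, (text, _, _) in enumerate(parse):
        if text in lexicon:
            polarity = lexicon[text]
            result[i] = -polarity if negated(i, parse) else polarity
    return result

# "I do not like it": "not" attaches to "like", so the polarity is flipped
s1 = [("I", 3, "nsubj"), ("do", 3, "aux"), ("not", 3, "neg"),
      ("like", 3, "ROOT"), ("it", 3, "dobj")]

# "I did not go but I like it": "not" attaches to "go", so "like" stays positive
s2 = [("I", 3, "nsubj"), ("did", 3, "aux"), ("not", 3, "neg"), ("go", 3, "ROOT"),
      ("but", 3, "cc"), ("I", 6, "nsubj"), ("like", 3, "conj"), ("it", 6, "dobj")]

print(word_polarities(s1))  # {3: -1}
print(word_polarities(s2))  # {6: 1}
```

A rule that flips every sentiment word following "not" would wrongly flip "like" in the second sentence; checking the dependency arc avoids that.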


A previous study **(Agarwal et al., 2015)** notes that the performance of sentiment analysis depends heavily on the quality of the feature extraction process. The study presents a new feature extraction method that uses the dependency relations between words to extract features from text. As a direction for future work, the authors emphasize the need to explore more useful dependency relations for extracting concepts.*(7)*
The article by **(Jin et al., 2022)** presents a multi-feature hierarchical graph attention model for sentiment analysis based on dependency and co-occurrence graphs. The proposed method offers a feasible solution to the problem of incomplete information in sentiment analysis. The authors combine several features, including features related to place, part of speech, and structure. For future work, they plan to reduce the reliance on word segmentation and to investigate a more effective technique for generating co-occurrence graphs.*(8)*


*References*
<br>
(1) Miyao, Y., Sætre, R., Sagae, K., Matsuzaki, T., & Tsujii, J. I. (2008, June). Task-oriented evaluation of syntactic parsers and their representations. In Proceedings of ACL-08: HLT (pp. 46-54).<br>
(2) McDonald, R., & Nivre, J. (2011). Analyzing and integrating dependency parsers. Computational Linguistics, 37(1), 197-230.<br>
(3) Attardi, G., & Simi, M. (2014). Dependency parsing techniques for information extraction. Dependency Parsing Techniques for Information Extraction, 9-14.<br>
*(4)* Alonso, M. A., Gómez-Rodríguez, C., & Vilares, J. (2021). On the use of parsing for named entity recognition. Applied Sciences, 11(3), 1090.<br>
*(5)* Di Caro, L., & Grella, M. (2013). Sentiment analysis via dependency parsing. Computer Standards & Interfaces, 35(5), 442-453.<br>
*(6)* Meena, A., & Prabhakar, T. V. (2007). Sentence level sentiment analysis in the presence of conjuncts using linguistic analysis. In Advances in Information Retrieval: 29th European Conference on IR Research, ECIR 2007, Rome, Italy, April 2-5, 2007. Proceedings 29 (pp. 573-580). Springer Berlin Heidelberg.<br>
*(7)* Agarwal, B., Poria, S., Mittal, N., Gelbukh, A., & Hussain, A. (2015). Concept-level sentiment analysis with dependency-based semantic parsing: a novel approach. Cognitive Computation, 7, 487-499.<br>
*(8)* Jin, Z., Tao, M., Zhao, X., & Hu, Y. (2022). Social Media Sentiment Analysis Based on Dependency Graph and Co-occurrence Graph. Cognitive Computation, 14(3), 1039-1054.