# Relation Extraction

### Syntactic Dependency Structure
We need the "benepar" Python package for this practice.  
Install it with `pip install benepar`.  
The examples below parse sentences with spaCy and its `en_core_web_sm` English model.  

The **Dependent tag** represents the grammatical relationship between a word and its head in a sentence. 
Here's a brief list of the tags:

| Term         | Description                                                                                        |
|--------------|----------------------------------------------------------------------------------------------------|
| **compound** | Represents a compound word, where two or more words are combined to form a single meaning.         |
| **nsubjpass**| Passive nominal subject. The subject is the receiver of the action (in passive voice).             |
| **auxpass**  | Passive auxiliary verb, typically forms part of a passive construction (e.g., "was", "is").        |
| **aux**      | Auxiliary verb. Helps form tense, mood, or aspect (e.g., "have" in "have given").                  |
| **ROOT**     | The root of the sentence. The main verb or the main action in the sentence.                        |
| **prep**     | Prepositional modifier. Links a noun to another word (e.g., "on", "in", "by").                     |
| **nmod**     | Nominal modifier. Modifies a noun (can be a noun, prepositional phrase, etc.).                     |
| **punct**    | Punctuation. Marks the presence of punctuation in the text.                                        |
| **pobj**     | Object of a preposition. The noun or pronoun governed by a preposition.                            |
| **cc**       | Coordinating conjunction. Connects words, phrases, or clauses (e.g., "and", "but").                |
| **conj**     | Conjunct. The second element in a conjunction, typically connected by a coordinating conjunction.  |
| **preconj**  | Preconjunct. A word that appears before a conjunction (e.g., "either" in "either...or").           |
| **dobj**     | Direct object. The noun or pronoun that directly receives the action of the verb.                  |
| **nsubj**    | Nominal subject. The subject of the sentence performing the action.                                |
| **mark**     | Marker. Introduces a subordinate clause (e.g., "that", "if", "because").                           |
| **ccomp**    | Clausal complement. A clause that serves as the complement or object of a verb.                    |
| **amod**     | Adjectival modifier. An adjective that modifies a noun.                                            |

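To make the head and dependent terminology concrete, here is a minimal sketch in plain Python, with no parser involved. It stores one sentence as `(word, tag, head index)` rows, using the tags and heads that the spaCy parse in this notebook reports for "Ezhil cooks and distributes homemade foods". The row layout and the helper names (`dependents`, `left_dependents`, `right_dependents`, `words`) are assumptions made for illustration and are not part of spaCy.

```python
# Each row is (word, dependency tag, index of the head word).
# The ROOT token points at itself, mirroring how spaCy reports it.
ROWS = [
    ("Ezhil", "compound", 1),
    ("cooks", "ROOT", 1),
    ("and", "cc", 1),
    ("distributes", "conj", 1),
    ("homemade", "amod", 5),
    ("foods", "dobj", 3),
]


def dependents(rows, i):
    """Indices of all tokens whose head is token i (excluding i itself)."""
    return [j for j, (_, _, head) in enumerate(rows) if head == i and j != i]


def left_dependents(rows, i):
    """Dependents that appear before token i in the sentence."""
    return [j for j in dependents(rows, i) if j < i]


def right_dependents(rows, i):
    """Dependents that appear after token i in the sentence."""
    return [j for j in dependents(rows, i) if j > i]


def words(rows, indices):
    """Map token indices back to their surface words."""
    return [rows[j][0] for j in indices]


if __name__ == "__main__":
    for i, (word, tag, head) in enumerate(ROWS):
        print(
            word, tag, ROWS[head][0],
            words(ROWS, left_dependents(ROWS, i)),
            words(ROWS, right_dependents(ROWS, i)),
        )
```

Left and right are decided purely by position relative to the head: "Ezhil" precedes "cooks", so it is a left dependent, while "and" and "distributes" follow it and are right dependents.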

In [1]:
import spacy
from spacy.cli.download import download
download(model="en_core_web_sm")

Collecting en-core-web-sm==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


In [4]:
import pandas as pd
import spacy
import en_core_web_sm
from spacy import displacy # import visualization tool
from IPython.display import display

# load English language model
nlp = en_core_web_sm.load()
# Set up the visualization options
options = {"compact": True, "bg": "#ffffff",
           "color": "black", "font": "Source Sans Pro", "distance": 100}

TEXT_SAMPLE = ["I recall that you have given Naksatra birthday gifts",
               "Ezhil cooks and distributes homemade foods",
               "Victoria University is located in Melbourne, Australia and it has both bachelor and master students"] 

for text in TEXT_SAMPLE:
    # Parse the text to get a spaCy Doc object
    doc = nlp(text)

    # Draw the dependency tree with the displacy.render method
    displacy.render(doc, style="dep", options=options)

    # Collect the head and dependents of each word, one row per token
    rows = []
    for tok in doc:
        rows.append({
            "Word": tok.text,
            "Dependent tag": tok.dep_,
            "Head": tok.head,
            "Dependents": list(tok.children),
            "Left dependents": list(tok.lefts),    # children before the token
            "Right dependents": list(tok.rights),  # children after the token
        })

    # Build the dataframe once and show it in a readable format
    df = pd.DataFrame(rows)
    display(df)

|   | Word     | Dependent tag | Head   | Dependents               | Left dependents     | Right dependents |
|---|----------|---------------|--------|--------------------------|---------------------|------------------|
| 0 | I        | nsubj         | recall | []                       | []                  | []               |
| 1 | recall   | ROOT          | recall | [I, given]               | [I]                 | [given]          |
| 2 | that     | mark          | given  | []                       | []                  | []               |
| 3 | you      | nsubj         | given  | []                       | []                  | []               |
| 4 | have     | aux           | given  | []                       | []                  | []               |
| 5 | given    | ccomp         | recall | [that, you, have, gifts] | [that, you, have]   | [gifts]          |
| 6 | Naksatra | compound      | gifts  | []                       | []                  | []               |
| 7 | birthday | compound      | gifts  | []                       | []                  | []               |
| 8 | gifts    | dobj          | given  | [Naksatra, birthday]     | [Naksatra, birthday] | []              |


|   | Word        | Dependent tag | Head        | Dependents                | Left dependents | Right dependents   |
|---|-------------|---------------|-------------|---------------------------|-----------------|--------------------|
| 0 | Ezhil       | compound      | cooks       | []                        | []              | []                 |
| 1 | cooks       | ROOT          | cooks       | [Ezhil, and, distributes] | [Ezhil]         | [and, distributes] |
| 2 | and         | cc            | cooks       | []                        | []              | []                 |
| 3 | distributes | conj          | cooks       | [foods]                   | []              | [foods]            |
| 4 | homemade    | amod          | foods       | []                        | []              | []                 |
| 5 | foods       | dobj          | distributes | [homemade]                | [homemade]      | []                 |


|   | Word       | Dependent tag | Head       | Dependents                | Left dependents  | Right dependents |
|---|------------|---------------|------------|---------------------------|------------------|------------------|
| 0 | Victoria   | compound      | University | []                        | []               | []               |
| 1 | University | nsubjpass     | located    | [Victoria]                | [Victoria]       | []               |
| 2 | is         | auxpass       | located    | []                        | []               | []               |
| 3 | located    | ROOT          | located    | [University, is, in, has] | [University, is] | [in, has]        |
| 4 | in         | prep          | located    | [Australia]               | []               | [Australia]      |
| 5 | Melbourne  | nmod          | Australia  | [,]                       | []               | [,]              |
| 6 | ,          | punct         | Melbourne  | []                        | []               | []               |
| 7 | Australia  | pobj          | in         | [Melbourne, and, it]      | [Melbourne]      | [and, it]        |
| 8 | and        | cc            | Australia  | []                        | []               | []               |
| 9 | it         | conj          | Australia  | []                        | []               | []               |

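Once each word has a tag and a head, simple relations can be read off the parse. The sketch below is one possible heuristic, not a standard algorithm: it pairs an `nsubj` with a `dobj` under the same head to form a (subject, verb, object) triple, and follows `nsubjpass` → `prep` → `pobj` for passive sentences. The rows are copied from the parses above (the third sentence only up to "it"); the `(word, tag, head index)` layout and the function name `extract_relations` are assumptions for illustration. It ignores compounds and conjunctions, so "Victoria University" is reduced to "University".

```python
# Rows are (word, dependency tag, index of the head word); ROOT points at itself.
SENTENCE_1 = [
    ("I", "nsubj", 1), ("recall", "ROOT", 1), ("that", "mark", 5),
    ("you", "nsubj", 5), ("have", "aux", 5), ("given", "ccomp", 1),
    ("Naksatra", "compound", 8), ("birthday", "compound", 8),
    ("gifts", "dobj", 5),
]

SENTENCE_3 = [
    ("Victoria", "compound", 1), ("University", "nsubjpass", 3),
    ("is", "auxpass", 3), ("located", "ROOT", 3), ("in", "prep", 3),
    ("Melbourne", "nmod", 7), (",", "punct", 5), ("Australia", "pobj", 4),
    ("and", "cc", 7), ("it", "conj", 7),
]


def extract_relations(rows):
    """Return (subject, predicate, object) triples found by two tag patterns."""
    # Group the children of every head by their dependency tag.
    by_head = {}
    for i, (_, dep, head) in enumerate(rows):
        if head != i:  # skip the ROOT's self-reference
            by_head.setdefault(head, {}).setdefault(dep, []).append(i)

    triples = []
    for head, deps in by_head.items():
        verb = rows[head][0]
        # Active pattern: nsubj <- verb -> dobj
        for s in deps.get("nsubj", []):
            for o in deps.get("dobj", []):
                triples.append((rows[s][0], verb, rows[o][0]))
        # Passive pattern: nsubjpass <- verb -> prep -> pobj
        for s in deps.get("nsubjpass", []):
            for p in deps.get("prep", []):
                for o in by_head.get(p, {}).get("pobj", []):
                    predicate = f"{verb} {rows[p][0]}"
                    triples.append((rows[s][0], predicate, rows[o][0]))
    return triples


if __name__ == "__main__":
    print(extract_relations(SENTENCE_1))
    print(extract_relations(SENTENCE_3))
```

Note that "recall" yields no triple in the first sentence: it has a subject ("I") but its complement is a clause (`ccomp`), not a direct object, so only the embedded clause "you ... given ... gifts" matches.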

Reference:
    http://datamine.unc.edu/jupyter/notebooks/Text%20Mining%20Modules/(3)%20Information%20Extraction.ipynb
        