# Named Entity Recognition
This notebook applies named entity recognition (NER) to the free-text notes in the GTD database.

In [1]:
import pandas as pd
import spacy
from spacy import displacy
import warnings
warnings.filterwarnings('ignore')
NER = spacy.load("en_core_web_sm")

In [2]:
df = pd.read_csv('../data/data.csv', encoding="latin-1")
df.head()

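| | eventid | iyear | imonth | iday | approxdate | extended | resolution | country | country_txt | region | ... | addnotes | scite1 | scite2 | scite3 | dbsource | INT_LOG | INT_IDEO | INT_MISC | INT_ANY | related |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 197000000001 | 1970 | 7 | 2 | | 0 | | 58 | Dominican Republic | 2 | ... | | | | | PGIS | 0 | 0 | 0 | 0 | |
| 1 | 197000000002 | 1970 | 0 | 0 | | 0 | | 130 | Mexico | 1 | ... | | | | | PGIS | 0 | 1 | 1 | 1 | |
| 2 | 197001000001 | 1970 | 1 | 0 | | 0 | | 160 | Philippines | 5 | ... | | | | | PGIS | -9 | -9 | 1 | 1 | |
| 3 | 197001000002 | 1970 | 1 | 0 | | 0 | | 78 | Greece | 8 | ... | | | | | PGIS | -9 | -9 | 1 | 1 | |
| 4 | 197001000003 | 1970 | 1 | 0 | | 0 | | 101 | Japan | 4 | ... | | | | | PGIS | -9 | -9 | 1 | 1 | |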


# Language data associated with bombings/explosions
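We'll take a small subset of the data associated with a single attack type, such as bombings and explosions. The goal is to spot keywords associated with attacks of that type. The helper below filters on any value of `attacktype1_txt`; the run shown in this notebook uses "Assassination".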

In [3]:
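def get_language_data(dataframe, column_filter):
    # Keep only rows whose primary attack type matches column_filter.
    subset = dataframe[dataframe["attacktype1_txt"] == column_filter]
    # Keep the attack type and the free-text notes; drop rows with no notes.
    return subset[["attacktype1_txt", "addnotes"]].dropna()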
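
To make the filtering step concrete, here is a minimal sketch on a made-up frame. The rows, the `toy` name, and the `notes_for_attack_type` helper are illustrative assumptions and are not taken from the GTD data; only the column names come from the notebook.

```python
import pandas as pd

# Made-up rows for illustration only; not real GTD records.
toy = pd.DataFrame({
    "attacktype1_txt": ["Assassination", "Bombing/Explosion",
                        "Assassination", "Bombing/Explosion"],
    "addnotes": ["Note A.", None, None, "Note B."],
    "iyear": [1970, 1970, 1971, 1971],
})

def notes_for_attack_type(frame, attack_type):
    """Hypothetical helper: rows of one attack type that have a note."""
    mask = frame["attacktype1_txt"] == attack_type
    # Select the two columns of interest, then drop rows with missing notes.
    return frame.loc[mask, ["attacktype1_txt", "addnotes"]].dropna()

subset = notes_for_attack_type(toy, "Bombing/Explosion")
print(subset)
```

Only the row that matches the attack type and has a non-missing note survives, and the original index label is preserved, which is why the filtered output in this notebook shows non-consecutive index values.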

In [4]:
def sentence_concatination(df):
    # Join every note into one string. No separator is added,
    # so adjacent notes run together (e.g. "(MIRA).The police").
    return "".join(df["addnotes"])

In [5]:
result = get_language_data(df, "Assassination")
result

| | attacktype1_txt | addnotes |
|---|---|---|
| 111 | Assassination | Conflicting reports attribute the incident to ... |
| 142 | Assassination | The police were unsure if this incident was an... |
| 225 | Assassination | Huang and Cheng Tzu-tsai were arrested but jum... |
| 319 | Assassination | It is believed that a Chicago businessman was ... |
| 339 | Assassination | The White mayoral candidate was Hugh Addonizio... |
| ... | ... | ... |
| 181038 | Assassination | The victims included Shyamkrishna Shrestha, Su... |
| 181228 | Assassination | The victims included Karun Mahanta. |
| 181433 | Assassination | Kyaw Lin was also a contributor to the Democra... |
| 181459 | Assassination | The victims included the driver for Albaran, B... |


## Preprocess sentence phrases
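We concatenate the first ten notes into one string and pass it through the spaCy pipeline, which splits the text into tokens, tags each token with its part of speech, and marks named entities.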

In [6]:
sample = result.head(10)
sent = sentence_concatination(sample)

In [7]:
sent

'Conflicting reports attribute the incident to the Armed Commandos of Liberation and the Armed Revolutionary Independence Movement (MIRA).The police were unsure if this incident was an attempted robbery, with the perpetrators too frightened to take the targets money, or if it was an assassination.Huang and Cheng Tzu-tsai were arrested but jumped bail and fled the United States.It is believed that a Chicago businessman was angry at Barr for attempting to introduce reform legislation concerning mental health issues in the Illinois General Assembly.  It is suspected that the businessman contacted Silas Jayne to hire the hit men.The White mayoral candidate was Hugh Addonizio and the Black mayoral candidate was Kenneth Gibson.Allan Daly survived two weeks before finally succumbing to his wounds.  Frank Thurber belonged to the San Francisco Mailers Union.  Thurber offered Richard Wamsley and Larry Rutherford $300 to "rough up" Daly.  Thurber provided them with a weapon and drove the to Daly\
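
In the string above, adjacent notes run together with no space between them (for example `(MIRA).The police`), which can blur sentence boundaries for downstream processing. A minimal sketch of one alternative, joining the notes with a space; `join_notes` is a hypothetical helper, not part of the notebook, and the two notes are copied from the output above.

```python
def join_notes(notes, sep=" "):
    """Join note strings with a separator so sentence boundaries survive."""
    # strip() removes stray leading/trailing whitespace from each note.
    return sep.join(note.strip() for note in notes)

notes = [
    "Huang and Cheng Tzu-tsai were arrested but jumped bail and fled the United States.",
    "The White mayoral candidate was Hugh Addonizio and the Black mayoral candidate was Kenneth Gibson.",
]

joined = join_notes(notes)
print(joined)
```

With a separator in place, the period that ends one note is followed by whitespace before the next note begins.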

In [14]:
text1 = NER(sent)

In [15]:
displacy.render(text1,style="ent",jupyter=True)

### Extracting people/subjects
Our original dataset has only vague information about the targets. With named entity recognition, we can extract specific entities such as locations and people.

In [11]:
def extract_entity(text, column_filter):
    # Collect the text of every entity in the parsed document `text`
    # whose label (e.g. "NORP", "PERSON") matches column_filter.
    result = []
    for word in text.ents:
        if word.label_ == column_filter:
            result.append(word.text)
    return result

In [12]:
entity = extract_entity(text1, "NORP")
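
`NORP` is spaCy's label for nationalities or religious or political groups. To pull out several labels in one pass (people and places, say), the same filter can be generalized. The sketch below is an assumption-laden illustration: `Ent` is a stand-in for spaCy's entity spans that carries only `text` and `label_`, `group_by_label` is a hypothetical helper, and the labels attached to the sample entities are hand-assigned for the example, not actual model output.

```python
from collections import defaultdict, namedtuple

# Stand-in for spaCy entity spans: only the two attributes used here.
Ent = namedtuple("Ent", ["text", "label_"])

def group_by_label(ents, labels):
    """Hypothetical helper: map each wanted label to its entity texts."""
    grouped = defaultdict(list)
    for ent in ents:
        if ent.label_ in labels:
            grouped[ent.label_].append(ent.text)
    return dict(grouped)

# Hand-labelled examples; a real run would pass text1.ents instead.
ents = [
    Ent("Huang", "PERSON"),
    Ent("the United States", "GPE"),
    Ent("Silas Jayne", "PERSON"),
    Ent("Chicago", "GPE"),
    Ent("two weeks", "DATE"),
]

grouped = group_by_label(ents, {"PERSON", "GPE"})
print(grouped)
```

Because the helper only reads `text` and `label_`, it should work unchanged on the entities of a parsed spaCy document.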

In [13]:
# Collect the extracted entities into a single-column DataFrame.
my_data = pd.DataFrame({"entity_result": entity})
my_data.head(10)

| | entity_result |
|---|---|
| 0 | Nazi |
| 1 | Black Muslim |
| 2 | Black Muslims |
| 3 | Black Muslim |
| 4 | Black Muslim |
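
The entity list above contains repeats ("Black Muslim" appears three times), so a frequency count is a natural next step when looking for keywords. A minimal sketch using the standard library's `collections.Counter`; the list is copied from the output above, and the variable names are illustrative.

```python
from collections import Counter

# Entities copied from the notebook output above.
entities = ["Nazi", "Black Muslim", "Black Muslims", "Black Muslim", "Black Muslim"]

# Count how often each entity string occurs, most frequent first.
counts = Counter(entities)
for name, n in counts.most_common():
    print(f"{name}: {n}")
```

Note that "Black Muslim" and "Black Muslims" are counted separately here; merging such variants would need an extra normalization step.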
