<a href="https://colab.research.google.com/github/makdatascience/Global_Terrorism_Analysis/blob/main/Capstone_project_EDA_GTD.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## <b> The Global Terrorism Database (GTD) is an open-source database including information on terrorist attacks around the world from 1970 through 2017. The GTD includes systematic data on domestic as well as international terrorist incidents that have occurred during this time period and now includes more than 180,000 attacks. The database is maintained by researchers at the National Consortium for the Study of Terrorism and Responses to Terrorism (START), headquartered at the University of Maryland.</b>

# <b> Explore and analyze the data to discover key findings pertaining to terrorist activities. </b>


In [11]:
# importing libraries
import pandas as pd
import numpy as np
import plotly.express as px
import matplotlib.pyplot as plt
import seaborn as sns
# to suppress warnings
import warnings
warnings.filterwarnings("ignore")
%matplotlib inline

<b>1)  GTD Handbook Link :</b> https://drive.google.com/file/d/1VG7Mo7Zh5D0oNxsX1Iz5b4NIMCCB-Uu4/view?usp=sharing

<b>2) Dataset link : </b> https://drive.google.com/file/d/1Z1tOCLgiOXQjNNBxp7SdVwZ3s1sQM3LD/view?usp=sharing

In [12]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [13]:
# loading dataframe from csv file
df=pd.read_csv("/content/drive/MyDrive/Colab Notebooks/Capstone projects/Capstone Project_1_EDA/Global_Terrorism_Data.csv", encoding="ISO-8859-1")

In [14]:
df.head()

Unnamed: 0,eventid,iyear,imonth,iday,approxdate,extended,resolution,country,country_txt,region,...,addnotes,scite1,scite2,scite3,dbsource,INT_LOG,INT_IDEO,INT_MISC,INT_ANY,related
0,197000000001,1970,7,2,,0,,58,Dominican Republic,2,...,,,,,PGIS,0,0,0,0,
1,197000000002,1970,0,0,,0,,130,Mexico,1,...,,,,,PGIS,0,1,1,1,
2,197001000001,1970,1,0,,0,,160,Philippines,5,...,,,,,PGIS,-9,-9,1,1,
3,197001000002,1970,1,0,,0,,78,Greece,8,...,,,,,PGIS,-9,-9,1,1,
4,197001000003,1970,1,0,,0,,101,Japan,4,...,,,,,PGIS,-9,-9,1,1,


In [15]:
# columns in which more than 35% of the values are null;
# "nkillter" is removed from this list so that it is kept in the dataframe
drop_column=(df.columns[(df.isnull().sum()/df.shape[0])>0.35]).drop("nkillter")
len(drop_column), drop_column

(86, Index(['approxdate', 'resolution', 'location', 'summary', 'alternative',
        'alternative_txt', 'attacktype2', 'attacktype2_txt', 'attacktype3',
        'attacktype3_txt', 'targtype2', 'targtype2_txt', 'targsubtype2',
        'targsubtype2_txt', 'corp2', 'target2', 'natlty2', 'natlty2_txt',
        'targtype3', 'targtype3_txt', 'targsubtype3', 'targsubtype3_txt',
        'corp3', 'target3', 'natlty3', 'natlty3_txt', 'gsubname', 'gname2',
        'gsubname2', 'gname3', 'gsubname3', 'motive', 'guncertain2',
        'guncertain3', 'nperps', 'nperpcap', 'claimed', 'claimmode',
        'claimmode_txt', 'claim2', 'claimmode2', 'claimmode2_txt', 'claim3',
        'claimmode3', 'claimmode3_txt', 'compclaim', 'weaptype2',
        'weaptype2_txt', 'weapsubtype2', 'weapsubtype2_txt', 'weaptype3',
        'weaptype3_txt', 'weapsubtype3', 'weapsubtype3_txt', 'weaptype4',
        'weaptype4_txt', 'weapsubtype4', 'weapsubtype4_txt', 'weapdetail',
        'nkillus', 'nwoundus', 'nwoundte', 'p
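
The threshold-based selection above can be tried on a small example. This is a minimal sketch on a made-up frame: `toy`, its column names, and `keep_me` (standing in for `nkillter`) are hypothetical and are not taken from the GTD.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical data, not from the GTD)
toy = pd.DataFrame({
    "a": [1, 2, 3, 4],                       # 0% null
    "b": [np.nan, np.nan, 3, 4],             # 50% null
    "c": [np.nan, 2, 3, 4],                  # 25% null
    "keep_me": [np.nan, np.nan, np.nan, 4],  # 75% null, kept on purpose
})

# Fraction of null values per column (same as isnull().sum() / number of rows)
null_share = toy.isnull().mean()

# Columns above the 35% threshold, minus the one we want to keep
sparse_cols = null_share[null_share > 0.35].index.drop("keep_me")
print(list(sparse_cols))

reduced = toy.drop(columns=sparse_cols)
print(reduced.shape)
```

Printing `null_share` before dropping anything makes it easier to judge whether 35% is a sensible cut-off for a given dataset.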

In [16]:
# dropping the sparsely populated columns
df.drop(columns=drop_column,inplace=True)
df.shape


(181691, 49)

In [17]:
# using eventid as the index, since it is unique for every event
df=df.set_index("eventid")
df.shape

(181691, 48)
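
Before relying on a column as the index, its uniqueness can be checked explicitly. The following is a small sketch on hypothetical rows (the `toy` frame is an assumption, not GTD data); `is_unique` and the `verify_integrity` argument of `set_index` are standard pandas features.

```python
import pandas as pd

# Toy frame (hypothetical rows, not from the GTD)
toy = pd.DataFrame({
    "eventid": [101, 102, 103],
    "iyear": [1970, 1970, 1971],
})

# True only if no event id is repeated
print(toy["eventid"].is_unique)

# verify_integrity=True raises a ValueError if the new index has duplicates
indexed = toy.set_index("eventid", verify_integrity=True)

# The column count drops by one because eventid moves into the index
print(toy.shape, indexed.shape)
```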

In [18]:
# null values in each column
df.isnull().sum()
# for percentages sorted in descending order, use instead:
# (df.isnull().sum()*100/df.shape[0]).sort_values(ascending=False)

iyear                   0
imonth                  0
iday                    0
extended                0
country                 0
country_txt             0
region                  0
region_txt              0
provstate             421
city                  434
latitude             4556
longitude            4557
specificity             6
vicinity                0
crit1                   0
crit2                   0
crit3                   0
doubtterr               1
multiple                1
success                 0
suicide                 0
attacktype1             0
attacktype1_txt         0
targtype1               0
targtype1_txt           0
targsubtype1        10373
targsubtype1_txt    10373
corp1               42550
target1               636
natlty1              1559
natlty1_txt          1559
gname                   0
guncertain1           380
individual              0
weaptype1               0
weaptype1_txt           0
weapsubtype1        20768
weapsubtype1_txt    20768
nkill       

In [19]:
# df["weapsubtype1_txt"].value_counts()

In [20]:
# df["specificity"].value_counts()

<b>1) "specificity" has the value 5 in 4550 rows and is null in 6 rows, which together match the 4556 null values in latitude (longitude has 4557), so we will fill the null values in "specificity" with 5.</b>
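
One way to check this kind of relationship is to count how many null coordinates fall on rows where specificity is 5 or missing. This is a minimal sketch on made-up rows: the `toy` frame and its values are hypothetical, and only the column names come from the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values, not from the GTD)
toy = pd.DataFrame({
    "specificity": [1, 5, 5, np.nan, 2],
    "latitude": [18.4, np.nan, np.nan, np.nan, 35.6],
})

null_lat = toy["latitude"].isnull()
spec_is_5 = toy["specificity"] == 5
spec_is_null = toy["specificity"].isnull()

# Null latitudes that sit on rows with specificity 5 or missing specificity
explained = (null_lat & (spec_is_5 | spec_is_null)).sum()
print(null_lat.sum(), explained)

# If the two counts agree, filling missing specificity with 5 is consistent
toy["specificity"] = toy["specificity"].fillna(5)
```

If the two printed counts differ, some null coordinates have another cause and the fill value deserves a second look.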

In [21]:
# handling null values
# text columns: label missing values as "Unknown"
text_cols=["provstate","city","targsubtype1_txt","natlty1_txt","target1","corp1"]
df[text_cols]=df[text_cols].fillna("Unknown")
df[["weapsubtype1_txt"]]=df[["weapsubtype1_txt"]].fillna("Unknown Weapon Type")
# numeric code columns: use -1 as a placeholder for a missing code
df[["targsubtype1","weapsubtype1","natlty1"]]=df[["targsubtype1","weapsubtype1","natlty1"]].fillna(-1)
# missing specificity is set to 5 (see the note above)
df[["specificity"]]=df[["specificity"]].fillna(5)
# flag columns: fill missing values with -9
df[["doubtterr","ishostkid"]]=df[["doubtterr","ishostkid"]].fillna(-9)
# count columns: treat missing values as 0
df[["multiple","nkill","nkillter","nwound"]]=df[["multiple","nkill","nkillter","nwound"]].fillna(0)

# guncertain1 is dropped rather than filled
df.drop(columns=["guncertain1"],inplace=True)
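
The same per-column filling can also be written as a single `fillna` call with a dictionary that maps column names to fill values. This is an alternative sketch, not the approach used in the cell above; the `toy` frame and the `fill_values` mapping are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values, not from the GTD)
toy = pd.DataFrame({
    "city": ["Athens", None],
    "weapsubtype1": [np.nan, 16.0],
    "nkill": [np.nan, 2.0],
})

# One fill value per column; columns not listed are left untouched
fill_values = {"city": "Unknown", "weapsubtype1": -1, "nkill": 0}
filled = toy.fillna(value=fill_values)

print(filled)
print(filled.isnull().sum())
```

Keeping the fill values in one mapping makes the chosen defaults easy to review in a single place.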

In [22]:
(df.isnull().sum())

iyear                  0
imonth                 0
iday                   0
extended               0
country                0
country_txt            0
region                 0
region_txt             0
provstate              0
city                   0
latitude            4556
longitude           4557
specificity            0
vicinity               0
crit1                  0
crit2                  0
crit3                  0
doubtterr              0
multiple               0
success                0
suicide                0
attacktype1            0
attacktype1_txt        0
targtype1              0
targtype1_txt          0
targsubtype1           0
targsubtype1_txt       0
corp1                  0
target1                0
natlty1                0
natlty1_txt            0
gname                  0
individual             0
weaptype1              0
weaptype1_txt          0
weapsubtype1           0
weapsubtype1_txt       0
nkill                  0
nkillter               0
nwound                 0
