# Data Science and Business Analytics Intern @ The Sparks Foundation

### Author : Gokul krishnan 

#### Topic : Exploratory Data Analysis (EDA) - Terrorism

##### Dataset : globalterrorismdb_0718dist.csv [https://bit.ly/2TK5Xn5]

In [1]:
#importing relevant packages 
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")

In [2]:
data = pd.read_csv("globalterrorismdb_0718dist.csv", encoding='latin1')
data.head()

Unnamed: 0,eventid,iyear,imonth,iday,approxdate,extended,resolution,country,country_txt,region,...,addnotes,scite1,scite2,scite3,dbsource,INT_LOG,INT_IDEO,INT_MISC,INT_ANY,related
0,197000000001,1970,7,2,,0,,58,Dominican Republic,2,...,,,,,PGIS,0,0,0,0,
1,197000000002,1970,0,0,,0,,130,Mexico,1,...,,,,,PGIS,0,1,1,1,
2,197001000001,1970,1,0,,0,,160,Philippines,5,...,,,,,PGIS,-9,-9,1,1,
3,197001000002,1970,1,0,,0,,78,Greece,8,...,,,,,PGIS,-9,-9,1,1,
4,197001000003,1970,1,0,,0,,101,Japan,4,...,,,,,PGIS,-9,-9,1,1,


In [3]:
data.columns.values

array(['eventid', 'iyear', 'imonth', 'iday', 'approxdate', 'extended',
       'resolution', 'country', 'country_txt', 'region', 'region_txt',
       'provstate', 'city', 'latitude', 'longitude', 'specificity',
       'vicinity', 'location', 'summary', 'crit1', 'crit2', 'crit3',
       'doubtterr', 'alternative', 'alternative_txt', 'multiple',
       'success', 'suicide', 'attacktype1', 'attacktype1_txt',
       'attacktype2', 'attacktype2_txt', 'attacktype3', 'attacktype3_txt',
       'targtype1', 'targtype1_txt', 'targsubtype1', 'targsubtype1_txt',
       'corp1', 'target1', 'natlty1', 'natlty1_txt', 'targtype2',
       'targtype2_txt', 'targsubtype2', 'targsubtype2_txt', 'corp2',
       'target2', 'natlty2', 'natlty2_txt', 'targtype3', 'targtype3_txt',
       'targsubtype3', 'targsubtype3_txt', 'corp3', 'target3', 'natlty3',
       'natlty3_txt', 'gname', 'gsubname', 'gname2', 'gsubname2',
       'gname3', 'gsubname3', 'motive', 'guncertain1', 'guncertain2',
       'guncertain3', 'in

In [4]:
#renaming the column names for better readability
#(columns whose names stay the same, such as latitude, longitude and summary, need no entry)

data.rename(columns={'iyear':'Year','imonth':'Month','iday':'day','gname':'Group','country_txt':'Country','region_txt':'Region','provstate':'State','city':'City',
    'attacktype1_txt':'Attacktype','targtype1_txt':'Targettype','weaptype1_txt':'Weapon','nkill':'kill',
    'nwound':'Wound'},inplace=True)

In [5]:
# selecting the relevant columns (.copy() gives an independent DataFrame, so later column assignments are unambiguous)
data = data[['Year','Month','day','Country','State','Region','City','latitude','longitude',"Attacktype",'kill',
               'Wound','target1','summary','Group','Targettype','Weapon','motive']].copy()
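
As an aside, the rename-then-select steps could also be folded into the load itself, which avoids reading the full set of columns into memory. The sketch below is only an illustration: it reads a tiny made-up CSV from a string instead of the real file, and the `wanted` mapping covers just a few of the columns used in this notebook.

```python
import io
import pandas as pd

# Made-up stand-in for the real CSV, with a few of the original column names.
raw_csv = io.StringIO(
    "eventid,iyear,imonth,iday,country_txt,nkill,nwound,dbsource\n"
    "1,1970,7,2,A,1,0,X\n"
    "2,1970,0,0,B,0,0,X\n"
)

# Original column name -> name used in the analysis (hypothetical subset).
wanted = {
    "iyear": "Year",
    "imonth": "Month",
    "iday": "day",
    "country_txt": "Country",
    "nkill": "kill",
    "nwound": "Wound",
}

# usecols restricts parsing to the listed columns; rename applies the new names.
subset = pd.read_csv(raw_csv, usecols=list(wanted)).rename(columns=wanted)
print(subset)
```

With the real file, `raw_csv` would be replaced by the file path and the `encoding='latin1'` argument used above.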

In [6]:
data.head()

Unnamed: 0,Year,Month,day,Country,State,Region,City,latitude,longitude,Attacktype,kill,Wound,target1,summary,Group,Targettype,Weapon,motive
0,1970,7,2,Dominican Republic,,Central America & Caribbean,Santo Domingo,18.456792,-69.951164,Assassination,1.0,0.0,Julio Guzman,,MANO-D,Private Citizens & Property,Unknown,
1,1970,0,0,Mexico,Federal,North America,Mexico city,19.371887,-99.086624,Hostage Taking (Kidnapping),0.0,0.0,"Nadine Chaval, daughter",,23rd of September Communist League,Government (Diplomatic),Unknown,
2,1970,1,0,Philippines,Tarlac,Southeast Asia,Unknown,15.478598,120.599741,Assassination,1.0,0.0,Employee,,Unknown,Journalists & Media,Unknown,
3,1970,1,0,Greece,Attica,Western Europe,Athens,37.99749,23.762728,Bombing/Explosion,,,U.S. Embassy,,Unknown,Government (Diplomatic),Explosives,
4,1970,1,0,Japan,Fukouka,East Asia,Fukouka,33.580412,130.396361,Facility/Infrastructure Attack,,,U.S. Consulate,,Unknown,Government (Diplomatic),Incendiary,
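
Several rows above have `Month` or `day` equal to 0, which appears to be the dataset's placeholder for an unknown month or day. If a proper date column is needed later, one possible approach is to build it only for rows with a usable month and day and leave the rest as `NaT`. This is a sketch on made-up rows, not part of the original analysis.

```python
import pandas as pd

# Made-up rows; a Month/day of 0 stands in for "unknown".
toy = pd.DataFrame({"Year": [1970, 1970, 2014],
                    "Month": [7, 1, 6],
                    "day": [2, 0, 10]})

# Keep only rows whose month and day fall in a plausible range.
has_date = toy["Month"].between(1, 12) & toy["day"].between(1, 31)

# pd.to_datetime can assemble dates from year/month/day columns.
parts = (toy.loc[has_date, ["Year", "Month", "day"]]
            .rename(columns={"Year": "year", "Month": "month"}))
dates = pd.to_datetime(parts, errors="coerce")  # impossible dates (e.g. 30 Feb) become NaT

# Rows that were filtered out get NaT after reindexing.
toy["date"] = dates.reindex(toy.index)
print(toy)
```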


In [7]:
data.shape

(181691, 18)

In [8]:
# to find null data 
data.isnull().sum()

Year               0
Month              0
day                0
Country            0
State            421
Region             0
City             434
latitude        4556
longitude       4557
Attacktype         0
kill           10313
Wound          16311
target1          636
summary        66129
Group              0
Targettype         0
Weapon             0
motive        131130
dtype: int64
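
Raw null counts are easier to judge as a share of all rows. A small helper along these lines could report both; `missing_summary` is a hypothetical name and the frame below is made up for illustration.

```python
import numpy as np
import pandas as pd

def missing_summary(df):
    """Return the null count and null percentage per column, largest first."""
    counts = df.isnull().sum()
    percent = (counts / len(df) * 100).round(2)
    return (pd.DataFrame({"missing": counts, "percent": percent})
              .sort_values("missing", ascending=False))

# Made-up frame with some gaps.
toy = pd.DataFrame({
    "kill": [1.0, np.nan, 0.0, np.nan],
    "Wound": [0.0, 2.0, np.nan, 1.0],
    "Country": ["A", "B", "C", "D"],
})

summary = missing_summary(toy)
print(summary)
```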

In [9]:
#replacing the null values in Wound and kill with 0

data['Wound'] = data['Wound'].fillna(0)
data['kill'] = data['kill'].fillna(0)

In [10]:
data['Casualities'] = data['kill'] + data['Wound']
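
Filling nulls with 0 treats "not reported" the same as "no casualties". If that distinction matters, one alternative (a sketch on made-up rows, not what this notebook does) is to sum the two columns while leaving the total missing when neither figure was reported. The `casualties_known` column is a hypothetical helper flag.

```python
import numpy as np
import pandas as pd

# Made-up rows: fully reported, partly reported, not reported at all.
toy = pd.DataFrame({"kill": [1.0, np.nan, np.nan],
                    "Wound": [0.0, 3.0, np.nan]})

# True where at least one of the two figures was reported.
toy["casualties_known"] = toy[["kill", "Wound"]].notna().any(axis=1)

# min_count=1 keeps the total as NaN when both inputs are missing,
# while still treating a single missing value as 0.
toy["Casualities"] = toy[["kill", "Wound"]].sum(axis=1, min_count=1)
print(toy)
```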

In [11]:
data.describe()

Unnamed: 0,Year,Month,day,latitude,longitude,kill,Wound,Casualities
count,181691.0,181691.0,181691.0,177135.0,177134.0,181691.0,181691.0,181691.0
mean,2002.638997,6.467277,15.505644,23.498343,-458.6957,2.26686,2.883296,5.150156
std,13.25943,3.388303,8.814045,18.569242,204779.0,11.227057,34.309747,40.555416
min,1970.0,0.0,0.0,-53.154613,-86185900.0,0.0,0.0,0.0
25%,1991.0,4.0,8.0,11.510046,4.54564,0.0,0.0,0.0
50%,2009.0,6.0,15.0,31.467463,43.24651,0.0,0.0,1.0
75%,2014.0,9.0,23.0,34.685087,68.71033,2.0,2.0,4.0
max,2017.0,12.0,31.0,74.633553,179.3667,1570.0,8191.0,9574.0
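
The `longitude` minimum above (about -86 million) lies far outside the valid range of -180 to 180, and it distorts the column's mean and standard deviation. Before mapping the data, rows like this could be flagged with a range check. The helper name `valid_coordinates` and the toy rows are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def valid_coordinates(df, lat_col="latitude", lon_col="longitude"):
    """Boolean mask: True where both coordinates are present and in range."""
    lat_ok = df[lat_col].between(-90, 90)      # NaN compares as False
    lon_ok = df[lon_col].between(-180, 180)
    return lat_ok & lon_ok

# Made-up rows: one valid, one with an out-of-range longitude, one missing latitude.
toy = pd.DataFrame({
    "latitude": [18.456792, 37.99749, np.nan],
    "longitude": [-69.951164, -86185900.0, 23.762728],
})

mask = valid_coordinates(toy)
clean = toy[mask]
print(clean)
```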


In [12]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 181691 entries, 0 to 181690
Data columns (total 19 columns):
 #   Column       Non-Null Count   Dtype  
---  ------       --------------   -----  
 0   Year         181691 non-null  int64  
 1   Month        181691 non-null  int64  
 2   day          181691 non-null  int64  
 3   Country      181691 non-null  object 
 4   State        181270 non-null  object 
 5   Region       181691 non-null  object 
 6   City         181257 non-null  object 
 7   latitude     177135 non-null  float64
 8   longitude    177134 non-null  float64
 9   Attacktype   181691 non-null  object 
 10  kill         181691 non-null  float64
 11  Wound        181691 non-null  float64
 12  target1      181055 non-null  object 
 13  summary      115562 non-null  object 
 14  Group        181691 non-null  object 
 15  Targettype   181691 non-null  object 
 16  Weapon       181691 non-null  object 
 17  motive       50561 non-null   object 
 18  Casualities  181691 non-

In [None]:
data.to_csv("dataterror.csv", index=False)  # index=False avoids writing the row index as an extra unnamed column

#The data is cleaned and exported to Power BI for visualization. The conclusions below are based on those visualizations.

# Conclusion:

From the above analysis, we can conclude that:

1. Iraq was the country with the highest number of attacks.

2. The highest number of attacks took place in 2014.

3. Private Citizens & Property was the most frequently targeted category.

4. The Islamic State of Iraq and the Levant (ISIL) killed the most people.

5. The Middle East & North Africa region faced the highest number of attacks.

6. Bombing/Explosion attacks wounded the most people.

7. The majority of terrorism deaths were due to Armed Assault and Bombing/Explosion.
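
The conclusions above come from the Power BI visuals. A rough way to cross-check rankings like these directly in pandas is sketched below, using the column names from this notebook. The rows are made up, so the printed results say nothing about the real dataset.

```python
import pandas as pd

# Made-up rows using the notebook's column names.
toy = pd.DataFrame({
    "Country": ["A", "A", "B", "A"],
    "Year": [2014, 2014, 2013, 2015],
    "Attacktype": ["Bombing/Explosion", "Armed Assault",
                   "Bombing/Explosion", "Bombing/Explosion"],
    "Group": ["G1", "G2", "G1", "Unknown"],
    "kill": [5.0, 2.0, 1.0, 0.0],
    "Wound": [10.0, 0.0, 3.0, 4.0],
})

# Number of attacks per country and per year (one row = one attack).
attacks_by_country = toy["Country"].value_counts()
attacks_by_year = toy["Year"].value_counts()

# Total killed per group and total wounded per attack type, largest first.
kills_by_group = toy.groupby("Group")["kill"].sum().sort_values(ascending=False)
wounded_by_attack = (toy.groupby("Attacktype")["Wound"].sum()
                        .sort_values(ascending=False))

print(attacks_by_country.head(1))
print(attacks_by_year.head(1))
print(kills_by_group.head(1))
print(wounded_by_attack.head(1))
```

The same pattern (`value_counts` for attack counts, `groupby(...).sum()` for totals) would apply to `Region` and `Targettype`.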

# Thank You For Having A Look At This Notebook!