# COGS 108 - Data Checkpoint

# Names

- Tianyu Yu
- Hanjie Zhan
- Shaolong Li
- Fengdi Liu


<a id='research_question'></a>
# Research Question

Is the frequency of terrorist attacks related to a country's GDP, location, government type, or CO2 emissions? Do terrorist attacks happen more often in certain kinds of places, such as plazas or bathrooms?

# Dataset(s)

### Dataset 1
- Dataset Name: Global Terrorism Database
- Link to the dataset: https://www.kaggle.com/START-UMD/gtd
- Number of observations: 181691

The first dataset records terrorist attacks from 1970 to 2017. It is large because it includes detailed information about each attack, such as the city, date, location, and weapon type.

This data has 135 columns, far more than we need. Many columns are empty or redundant, so we clean the data to make it as concise as possible. After cleaning, the resulting dataset contains 24 columns; we define the less obvious ones below.

Location: where the attack happened, in detail (for example, an airport).

Summary: a brief summary of what happened in the attack.

Attacktype: the type of attack, such as kidnapping or armed assault.

Targtype: the type of target, such as military.

Targsubtype: a more detailed target type, such as Military Checkpoint.

Corp: the corporation or organization that was attacked.

Groupname: the name of the group the attackers belong to.

Weaptype: the kind of weapon the attackers used.

### Dataset 2
- Dataset Name: co2-emission-dataset
- Link to the dataset: https://www.kaggle.com/chavansumit/co2emissiondataset
- Number of observations: 63180

The second dataset records annual CO2 emissions from 1750 to 2019 for all countries in the world.

This dataset contains 4 columns: Entity, Code, Year, and Annual CO2 emissions. The Entity column records each country's name, and the Code column holds the country code. Since Code and Entity encode the same information, we deleted the Code column and renamed Entity to Country to make it more intuitive. The dataset has 63180 observations, far more than we need: since we are only interested in emissions from 1970 to 2017, we deleted all other years for every country. After cleaning, the resulting dataset contains 3 columns and 11232 observations; we define the columns below.

Country: The name of the country that produced the CO2 emissions.

Year: The year in which the CO2 emissions were produced.

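Annual CO2 emissions: The total amount of CO2 emitted by that country in that year. We recorded the unit as kilotons, although the magnitudes (for example, 12170460 for Zimbabwe in 2015) suggest the values may be in tonnes.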

### Dataset 3
- Dataset Name: Life Expectancy (WHO)
- Link to the dataset: https://www.kaggle.com/kumarajarshi/life-expectancy-who
- Number of observations: 2938

The third dataset records life expectancy for all countries from 2000 to 2015, along with other factors that may affect life expectancy.

The most important columns in this dataset are Country, Year, and Life expectancy (in years). The dataset also contains columns with relevant information that may affect life expectancy, including the country's development status, infant and adult mortality rates, and immunization coverage. Since this data is already fairly clean, we did not delete or change anything. The dataset contains 22 columns and 2938 observations; we define the less obvious columns below.

Country: The name of the country

Year: The year in which life expectancy was measured

Status: The country's development status, either Developed or Developing

Life expectancy: Life expectancy for that country in that year, measured in years

Adult Mortality: Adult mortality rate for both men and women, calculated as the number of deaths between ages 15 and 60 per 1000 population

Infant deaths: Number of infant deaths per 1000 population

Alcohol: Recorded alcohol consumption per capita, in litres of pure alcohol

percentage expenditure: Expenditure on health for that country in that year, calculated as a percentage of GDP

Hepatitis B: Hepatitis B (HepB) immunization coverage among 1-year-olds (percentage)

Measles: Number of reported measles cases per 1000 population

BMI: Average body mass index of the entire population

under-five deaths: Number of deaths of children under five per 1000 population

Polio: Polio (Pol3) immunization coverage among 1-year-olds (percentage)

Total expenditure: Government expenditure on health as a percentage of total government expenditure

Diphtheria: Diphtheria tetanus toxoid and pertussis (DTP3) immunization coverage among 1-year-olds (percentage)

HIV/AIDS: Deaths per 1000 live births due to HIV/AIDS (0-4 years old)

GDP: Gross Domestic Product per capita (USD)

Population: Population of the country

thinness 1-19 years: Prevalence of thinness among people aged 10 to 19 (percentage)

thinness 5-9 years: Prevalence of thinness among children aged 5 to 9 (percentage)

Income composition of resources: Human Development Index in terms of income composition of resources (value ranging from 0 to 1)

Schooling: Number of years of schooling


# Setup

In [1]:
%matplotlib inline

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import math
import seaborn as sns
sns.set()
sns.set_context('talk')

import warnings
warnings.filterwarnings('ignore')
pd.set_option("display.max_columns", 104)
import patsy
import statsmodels.api as sm
import scipy.stats as stats
from scipy.stats import ttest_ind, chisquare, normaltest

# Data Cleaning

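For each dataset, we inspect the raw data, drop columns that are empty, redundant, or irrelevant to our question, and rename the remaining columns. For the CO2 data we also keep only the years 1970 to 2017. Finally, we merge the datasets on Country and Year.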

## Dataset 1: Global Terrorism Database

### Take a look at our dataset

In [2]:
df = pd.read_csv('globalterrorism.csv')
df

Unnamed: 0,eventid,iyear,imonth,iday,approxdate,extended,resolution,country,country_txt,region,region_txt,provstate,city,latitude,longitude,specificity,vicinity,location,summary,crit1,crit2,crit3,doubtterr,alternative,alternative_txt,multiple,success,suicide,attacktype1,attacktype1_txt,attacktype2,attacktype2_txt,attacktype3,attacktype3_txt,targtype1,targtype1_txt,targsubtype1,targsubtype1_txt,corp1,target1,natlty1,natlty1_txt,targtype2,targtype2_txt,targsubtype2,targsubtype2_txt,corp2,target2,natlty2,natlty2_txt,targtype3,targtype3_txt,...,weapsubtype1,weapsubtype1_txt,weaptype2,weaptype2_txt,weapsubtype2,weapsubtype2_txt,weaptype3,weaptype3_txt,weapsubtype3,weapsubtype3_txt,weaptype4,weaptype4_txt,weapsubtype4,weapsubtype4_txt,weapdetail,nkill,nkillus,nkillter,nwound,nwoundus,nwoundte,property,propextent,propextent_txt,propvalue,propcomment,ishostkid,nhostkid,nhostkidus,nhours,ndays,divert,kidhijcountry,ransom,ransomamt,ransomamtus,ransompaid,ransompaidus,ransomnote,hostkidoutcome,hostkidoutcome_txt,nreleased,addnotes,scite1,scite2,scite3,dbsource,INT_LOG,INT_IDEO,INT_MISC,INT_ANY,related
0,197000000001,1970,7,2,,0,,58,Dominican Republic,2,Central America & Caribbean,,Santo Domingo,18.456792,-69.951164,1.0,0,,,1,1,1,0.0,,,0.0,1,0,1,Assassination,,,,,14,Private Citizens & Property,68.0,Named Civilian,,Julio Guzman,58.0,Dominican Republic,,,,,,,,,,,...,,,,,,,,,,,,,,,,1.0,,,0.0,,,0,,,,,0.0,,,,,,,0.0,,,,,,,,,,,,,PGIS,0,0,0,0,
1,197000000002,1970,0,0,,0,,130,Mexico,1,North America,Federal,Mexico city,19.371887,-99.086624,1.0,0,,,1,1,1,0.0,,,0.0,1,0,6,Hostage Taking (Kidnapping),,,,,7,Government (Diplomatic),45.0,"Diplomatic Personnel (outside of embassy, cons...",Belgian Ambassador Daughter,"Nadine Chaval, daughter",21.0,Belgium,,,,,,,,,,,...,,,,,,,,,,,,,,,,0.0,,,0.0,,,0,,,,,1.0,1.0,0.0,,,,Mexico,1.0,800000.0,,,,,,,,,,,,PGIS,0,1,1,1,
2,197001000001,1970,1,0,,0,,160,Philippines,5,Southeast Asia,Tarlac,Unknown,15.478598,120.599741,4.0,0,,,1,1,1,0.0,,,0.0,1,0,1,Assassination,,,,,10,Journalists & Media,54.0,Radio Journalist/Staff/Facility,Voice of America,Employee,217.0,United States,,,,,,,,,,,...,,,,,,,,,,,,,,,,1.0,,,0.0,,,0,,,,,0.0,,,,,,,0.0,,,,,,,,,,,,,PGIS,-9,-9,1,1,
3,197001000002,1970,1,0,,0,,78,Greece,8,Western Europe,Attica,Athens,37.997490,23.762728,1.0,0,,,1,1,1,0.0,,,0.0,1,0,3,Bombing/Explosion,,,,,7,Government (Diplomatic),46.0,Embassy/Consulate,,U.S. Embassy,217.0,United States,,,,,,,,,,,...,16.0,Unknown Explosive Type,,,,,,,,,,,,,Explosive,,,,,,,1,,,,,0.0,,,,,,,0.0,,,,,,,,,,,,,PGIS,-9,-9,1,1,
4,197001000003,1970,1,0,,0,,101,Japan,4,East Asia,Fukouka,Fukouka,33.580412,130.396361,1.0,0,,,1,1,1,-9.0,,,0.0,1,0,7,Facility/Infrastructure Attack,,,,,7,Government (Diplomatic),46.0,Embassy/Consulate,,U.S. Consulate,217.0,United States,,,,,,,,,,,...,,,,,,,,,,,,,,,Incendiary,,,,,,,1,,,,,0.0,,,,,,,0.0,,,,,,,,,,,,,PGIS,-9,-9,1,1,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
181686,201712310022,2017,12,31,,0,,182,Somalia,11,Sub-Saharan Africa,Middle Shebelle,Ceelka Geelow,2.359673,45.385034,2.0,0,The incident occurred near the town of Balcad.,12/31/2017: Assailants opened fire on a Somali...,1,1,0,1.0,1.0,Insurgency/Guerilla Action,0.0,1,0,2,Armed Assault,,,,,4,Military,36.0,Military Checkpoint,Somali National Army (SNA),Checkpoint,182.0,Somalia,,,,,,,,,,,...,5.0,Unknown Gun Type,,,,,,,,,,,,,,1.0,0.0,0.0,2.0,0.0,0.0,-9,,,,,0.0,,,,,,,,,,,,,,,,,"""Somalia: Al-Shabaab Militants Attack Army Che...","""Highlights: Somalia Daily Media Highlights 2 ...","""Highlights: Somalia Daily Media Highlights 1 ...",START Primary Collection,0,0,0,0,
181687,201712310029,2017,12,31,,0,,200,Syria,10,Middle East & North Africa,Lattakia,Jableh,35.407278,35.942679,1.0,1,The incident occurred at the Humaymim Airport.,12/31/2017: Assailants launched mortars at the...,1,1,0,1.0,1.0,Insurgency/Guerilla Action,0.0,1,0,3,Bombing/Explosion,,,,,4,Military,27.0,Military Barracks/Base/Headquarters/Checkpost,Russian Air Force,Hmeymim Air Base,167.0,Russia,,,,,,,,,,,...,11.0,"Projectile (rockets, mortars, RPGs, etc.)",,,,,,,,,,,,,Mortars were used in the attack.,2.0,0.0,0.0,7.0,0.0,0.0,1,4.0,Unknown,-99.0,Seven military planes were damaged in this att...,0.0,,,,,,,,,,,,,,,,,"""Putin's 'victory' in Syria has turned into a ...","""Two Russian soldiers killed at Hmeymim base i...","""Two Russian servicemen killed in Syria mortar...",START Primary Collection,-9,-9,1,1,
181688,201712310030,2017,12,31,,0,,160,Philippines,5,Southeast Asia,Maguindanao,Kubentog,6.900742,124.437908,2.0,0,The incident occurred in the Datu Hoffer distr...,12/31/2017: Assailants set fire to houses in K...,1,1,1,0.0,,,0.0,1,0,7,Facility/Infrastructure Attack,,,,,14,Private Citizens & Property,76.0,House/Apartment/Residence,Not Applicable,Houses,160.0,Philippines,,,,,,,,,,,...,18.0,Arson/Fire,,,,,,,,,,,,,,0.0,0.0,0.0,0.0,0.0,0.0,1,4.0,Unknown,-99.0,Houses were damaged in this attack.,0.0,,,,,,,,,,,,,,,,,"""Maguindanao clashes trap tribe members,"" Phil...",,,START Primary Collection,0,0,0,0,
181689,201712310031,2017,12,31,,0,,92,India,6,South Asia,Manipur,Imphal,24.798346,93.940430,1.0,0,The incident occurred in the Mantripukhri neig...,12/31/2017: Assailants threw a grenade at a Fo...,1,1,1,0.0,,,0.0,0,0,3,Bombing/Explosion,,,,,2,Government (General),21.0,Government Building/Facility/Office,Forest Department Manipur,Office,92.0,India,,,,,,,,,,,...,7.0,Grenade,,,,,,,,,,,,,A thrown grenade was used in the attack.,0.0,0.0,0.0,0.0,0.0,0.0,-9,,,,,0.0,,,,,,,,,,,,,,,,,"""Trader escapes grenade attack in Imphal,"" Bus...",,,START Primary Collection,-9,-9,0,-9,


### Check the shape of our dataset

In [3]:
df.shape

(181691, 135)
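
With 135 columns, one way to decide which ones are empty is to compute the share of missing values per column rather than reading the table by eye. The sketch below does this on a small made-up frame; the values are toy data for illustration, not rows from the real file, and the names `toy`, `missing_share`, and `all_nan_cols` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the raw data (made-up values, not real rows).
toy = pd.DataFrame({
    "iyear": [1970, 1970, 2017],
    "approxdate": [np.nan, np.nan, np.nan],
    "location": [np.nan, np.nan, "near the town"],
})

# Share of missing values per column, largest first.
missing_share = toy.isna().mean().sort_values(ascending=False)

# Columns that contain no usable values at all.
all_nan_cols = missing_share[missing_share == 1.0].index.tolist()

print(missing_share)
print(all_nan_cols)
```

Running the same two lines on `df` would show whether a column is completely empty or only mostly empty before it is dropped.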

### Delete the approxdate column, since it only contains NaN

In [4]:
df.drop('approxdate', inplace=True, axis=1)
df.head()

Unnamed: 0,eventid,iyear,imonth,iday,extended,resolution,country,country_txt,region,region_txt,provstate,city,latitude,longitude,specificity,vicinity,location,summary,crit1,crit2,crit3,doubtterr,alternative,alternative_txt,multiple,success,suicide,attacktype1,attacktype1_txt,attacktype2,attacktype2_txt,attacktype3,attacktype3_txt,targtype1,targtype1_txt,targsubtype1,targsubtype1_txt,corp1,target1,natlty1,natlty1_txt,targtype2,targtype2_txt,targsubtype2,targsubtype2_txt,corp2,target2,natlty2,natlty2_txt,targtype3,targtype3_txt,targsubtype3,...,weapsubtype1,weapsubtype1_txt,weaptype2,weaptype2_txt,weapsubtype2,weapsubtype2_txt,weaptype3,weaptype3_txt,weapsubtype3,weapsubtype3_txt,weaptype4,weaptype4_txt,weapsubtype4,weapsubtype4_txt,weapdetail,nkill,nkillus,nkillter,nwound,nwoundus,nwoundte,property,propextent,propextent_txt,propvalue,propcomment,ishostkid,nhostkid,nhostkidus,nhours,ndays,divert,kidhijcountry,ransom,ransomamt,ransomamtus,ransompaid,ransompaidus,ransomnote,hostkidoutcome,hostkidoutcome_txt,nreleased,addnotes,scite1,scite2,scite3,dbsource,INT_LOG,INT_IDEO,INT_MISC,INT_ANY,related
0,197000000001,1970,7,2,0,,58,Dominican Republic,2,Central America & Caribbean,,Santo Domingo,18.456792,-69.951164,1.0,0,,,1,1,1,0.0,,,0.0,1,0,1,Assassination,,,,,14,Private Citizens & Property,68.0,Named Civilian,,Julio Guzman,58.0,Dominican Republic,,,,,,,,,,,,...,,,,,,,,,,,,,,,,1.0,,,0.0,,,0,,,,,0.0,,,,,,,0.0,,,,,,,,,,,,,PGIS,0,0,0,0,
1,197000000002,1970,0,0,0,,130,Mexico,1,North America,Federal,Mexico city,19.371887,-99.086624,1.0,0,,,1,1,1,0.0,,,0.0,1,0,6,Hostage Taking (Kidnapping),,,,,7,Government (Diplomatic),45.0,"Diplomatic Personnel (outside of embassy, cons...",Belgian Ambassador Daughter,"Nadine Chaval, daughter",21.0,Belgium,,,,,,,,,,,,...,,,,,,,,,,,,,,,,0.0,,,0.0,,,0,,,,,1.0,1.0,0.0,,,,Mexico,1.0,800000.0,,,,,,,,,,,,PGIS,0,1,1,1,
2,197001000001,1970,1,0,0,,160,Philippines,5,Southeast Asia,Tarlac,Unknown,15.478598,120.599741,4.0,0,,,1,1,1,0.0,,,0.0,1,0,1,Assassination,,,,,10,Journalists & Media,54.0,Radio Journalist/Staff/Facility,Voice of America,Employee,217.0,United States,,,,,,,,,,,,...,,,,,,,,,,,,,,,,1.0,,,0.0,,,0,,,,,0.0,,,,,,,0.0,,,,,,,,,,,,,PGIS,-9,-9,1,1,
3,197001000002,1970,1,0,0,,78,Greece,8,Western Europe,Attica,Athens,37.99749,23.762728,1.0,0,,,1,1,1,0.0,,,0.0,1,0,3,Bombing/Explosion,,,,,7,Government (Diplomatic),46.0,Embassy/Consulate,,U.S. Embassy,217.0,United States,,,,,,,,,,,,...,16.0,Unknown Explosive Type,,,,,,,,,,,,,Explosive,,,,,,,1,,,,,0.0,,,,,,,0.0,,,,,,,,,,,,,PGIS,-9,-9,1,1,
4,197001000003,1970,1,0,0,,101,Japan,4,East Asia,Fukouka,Fukouka,33.580412,130.396361,1.0,0,,,1,1,1,-9.0,,,0.0,1,0,7,Facility/Infrastructure Attack,,,,,7,Government (Diplomatic),46.0,Embassy/Consulate,,U.S. Consulate,217.0,United States,,,,,,,,,,,,...,,,,,,,,,,,,,,,Incendiary,,,,,,,1,,,,,0.0,,,,,,,0.0,,,,,,,,,,,,,PGIS,-9,-9,1,1,


### Delete more columns that contain only NaN

In [5]:
df.drop(['resolution', 'attacktype2', 'attacktype2_txt', 'attacktype3', 'attacktype3_txt'], inplace=True, axis=1)
df.head()

Unnamed: 0,eventid,iyear,imonth,iday,extended,country,country_txt,region,region_txt,provstate,city,latitude,longitude,specificity,vicinity,location,summary,crit1,crit2,crit3,doubtterr,alternative,alternative_txt,multiple,success,suicide,attacktype1,attacktype1_txt,targtype1,targtype1_txt,targsubtype1,targsubtype1_txt,corp1,target1,natlty1,natlty1_txt,targtype2,targtype2_txt,targsubtype2,targsubtype2_txt,corp2,target2,natlty2,natlty2_txt,targtype3,targtype3_txt,targsubtype3,targsubtype3_txt,corp3,target3,natlty3,natlty3_txt,...,weapsubtype1,weapsubtype1_txt,weaptype2,weaptype2_txt,weapsubtype2,weapsubtype2_txt,weaptype3,weaptype3_txt,weapsubtype3,weapsubtype3_txt,weaptype4,weaptype4_txt,weapsubtype4,weapsubtype4_txt,weapdetail,nkill,nkillus,nkillter,nwound,nwoundus,nwoundte,property,propextent,propextent_txt,propvalue,propcomment,ishostkid,nhostkid,nhostkidus,nhours,ndays,divert,kidhijcountry,ransom,ransomamt,ransomamtus,ransompaid,ransompaidus,ransomnote,hostkidoutcome,hostkidoutcome_txt,nreleased,addnotes,scite1,scite2,scite3,dbsource,INT_LOG,INT_IDEO,INT_MISC,INT_ANY,related
0,197000000001,1970,7,2,0,58,Dominican Republic,2,Central America & Caribbean,,Santo Domingo,18.456792,-69.951164,1.0,0,,,1,1,1,0.0,,,0.0,1,0,1,Assassination,14,Private Citizens & Property,68.0,Named Civilian,,Julio Guzman,58.0,Dominican Republic,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,1.0,,,0.0,,,0,,,,,0.0,,,,,,,0.0,,,,,,,,,,,,,PGIS,0,0,0,0,
1,197000000002,1970,0,0,0,130,Mexico,1,North America,Federal,Mexico city,19.371887,-99.086624,1.0,0,,,1,1,1,0.0,,,0.0,1,0,6,Hostage Taking (Kidnapping),7,Government (Diplomatic),45.0,"Diplomatic Personnel (outside of embassy, cons...",Belgian Ambassador Daughter,"Nadine Chaval, daughter",21.0,Belgium,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,0.0,,,0.0,,,0,,,,,1.0,1.0,0.0,,,,Mexico,1.0,800000.0,,,,,,,,,,,,PGIS,0,1,1,1,
2,197001000001,1970,1,0,0,160,Philippines,5,Southeast Asia,Tarlac,Unknown,15.478598,120.599741,4.0,0,,,1,1,1,0.0,,,0.0,1,0,1,Assassination,10,Journalists & Media,54.0,Radio Journalist/Staff/Facility,Voice of America,Employee,217.0,United States,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,1.0,,,0.0,,,0,,,,,0.0,,,,,,,0.0,,,,,,,,,,,,,PGIS,-9,-9,1,1,
3,197001000002,1970,1,0,0,78,Greece,8,Western Europe,Attica,Athens,37.99749,23.762728,1.0,0,,,1,1,1,0.0,,,0.0,1,0,3,Bombing/Explosion,7,Government (Diplomatic),46.0,Embassy/Consulate,,U.S. Embassy,217.0,United States,,,,,,,,,,,,,,,,,...,16.0,Unknown Explosive Type,,,,,,,,,,,,,Explosive,,,,,,,1,,,,,0.0,,,,,,,0.0,,,,,,,,,,,,,PGIS,-9,-9,1,1,
4,197001000003,1970,1,0,0,101,Japan,4,East Asia,Fukouka,Fukouka,33.580412,130.396361,1.0,0,,,1,1,1,-9.0,,,0.0,1,0,7,Facility/Infrastructure Attack,7,Government (Diplomatic),46.0,Embassy/Consulate,,U.S. Consulate,217.0,United States,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,Incendiary,,,,,,,1,,,,,0.0,,,,,,,0.0,,,,,,,,,,,,,PGIS,-9,-9,1,1,


### Get all column names in the dataset as a list

In [6]:
# Collect the column names so we can decide which ones to drop
columns = df.keys()
print(list(columns))

['eventid', 'iyear', 'imonth', 'iday', 'extended', 'country', 'country_txt', 'region', 'region_txt', 'provstate', 'city', 'latitude', 'longitude', 'specificity', 'vicinity', 'location', 'summary', 'crit1', 'crit2', 'crit3', 'doubtterr', 'alternative', 'alternative_txt', 'multiple', 'success', 'suicide', 'attacktype1', 'attacktype1_txt', 'targtype1', 'targtype1_txt', 'targsubtype1', 'targsubtype1_txt', 'corp1', 'target1', 'natlty1', 'natlty1_txt', 'targtype2', 'targtype2_txt', 'targsubtype2', 'targsubtype2_txt', 'corp2', 'target2', 'natlty2', 'natlty2_txt', 'targtype3', 'targtype3_txt', 'targsubtype3', 'targsubtype3_txt', 'corp3', 'target3', 'natlty3', 'natlty3_txt', 'gname', 'gsubname', 'gname2', 'gsubname2', 'gname3', 'gsubname3', 'motive', 'guncertain1', 'guncertain2', 'guncertain3', 'individual', 'nperps', 'nperpcap', 'claimed', 'claimmode', 'claimmode_txt', 'claim2', 'claimmode2', 'claimmode2_txt', 'claim3', 'claimmode3', 'claimmode3_txt', 'compclaim', 'weaptype1', 'weaptype1_txt',

### Check the shape of the dataset before further column drops

In [7]:
df.shape

(181691, 129)

### Drop all columns that we are not interested in

In [8]:
df = df.drop(labels=['eventid', 'country', 'specificity', 'attacktype1', 'natlty1', 'targtype1', 'targsubtype1', 'region', 'targtype2', 'targtype2_txt', 'targsubtype2', 'targsubtype2_txt', 'corp2', 'target2', 'natlty2', 'natlty2_txt', 'targtype3', 'targtype3_txt', 'targsubtype3', 'targsubtype3_txt', 'corp3', 'target3', 'natlty3', 'natlty3_txt', 'gsubname', 'gname2', 'gsubname2', 'gname3', 'gsubname3', 'weaptype2', 'weaptype2_txt', 'weapsubtype2', 'weapsubtype2_txt', 'weaptype3', 'weaptype3_txt', 'weapsubtype3', 'weapsubtype3_txt', 'weaptype4', 'weaptype4_txt', 'extended', 'vicinity', 'crit1', 'crit2', 'crit3', 'doubtterr', 'alternative', 'alternative_txt', 'multiple', 'guncertain1', 'guncertain2', 'guncertain3', 'individual', 'nperps', 'nperpcap', 'claimed', 'claimmode', 'claimmode_txt', 'claim2', 'claimmode2', 'claimmode2_txt', 'claim3', 'claimmode3', 'claimmode3_txt', 'compclaim', 'weaptype1', 'weapsubtype1', 'weapsubtype1_txt', 'weaptype2', 'weaptype2_txt', 'weapsubtype2', 'weapsubtype2_txt', 'weaptype3', 'weaptype3_txt', 'weapsubtype3', 'weapsubtype3_txt', 'weaptype4', 'weaptype4_txt', 'weapsubtype4', 'weapsubtype4_txt', 'weapdetail', 'nkillus', 'nkillter', 'nwoundus', 'nwoundte', 'property', 'propextent', 'propextent_txt', 'propvalue', 'propcomment', 'ishostkid', 'nhostkid', 'nhostkidus', 'nhours', 'ndays', 'divert', 'kidhijcountry', 'ransom', 'ransomamt', 'ransomamtus', 'ransompaid', 'ransompaidus', 'ransomnote', 'hostkidoutcome', 'hostkidoutcome_txt', 'nreleased', 'addnotes', 'scite1', 'scite2', 'scite3', 'dbsource', 'INT_LOG', 'INT_IDEO', 'INT_MISC', 'INT_ANY', 'related'], axis=1)
df

Unnamed: 0,iyear,imonth,iday,country_txt,region_txt,provstate,city,latitude,longitude,location,summary,success,suicide,attacktype1_txt,targtype1_txt,targsubtype1_txt,corp1,target1,natlty1_txt,gname,motive,weaptype1_txt,nkill,nwound
0,1970,7,2,Dominican Republic,Central America & Caribbean,,Santo Domingo,18.456792,-69.951164,,,1,0,Assassination,Private Citizens & Property,Named Civilian,,Julio Guzman,Dominican Republic,MANO-D,,Unknown,1.0,0.0
1,1970,0,0,Mexico,North America,Federal,Mexico city,19.371887,-99.086624,,,1,0,Hostage Taking (Kidnapping),Government (Diplomatic),"Diplomatic Personnel (outside of embassy, cons...",Belgian Ambassador Daughter,"Nadine Chaval, daughter",Belgium,23rd of September Communist League,,Unknown,0.0,0.0
2,1970,1,0,Philippines,Southeast Asia,Tarlac,Unknown,15.478598,120.599741,,,1,0,Assassination,Journalists & Media,Radio Journalist/Staff/Facility,Voice of America,Employee,United States,Unknown,,Unknown,1.0,0.0
3,1970,1,0,Greece,Western Europe,Attica,Athens,37.997490,23.762728,,,1,0,Bombing/Explosion,Government (Diplomatic),Embassy/Consulate,,U.S. Embassy,United States,Unknown,,Explosives,,
4,1970,1,0,Japan,East Asia,Fukouka,Fukouka,33.580412,130.396361,,,1,0,Facility/Infrastructure Attack,Government (Diplomatic),Embassy/Consulate,,U.S. Consulate,United States,Unknown,,Incendiary,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
181686,2017,12,31,Somalia,Sub-Saharan Africa,Middle Shebelle,Ceelka Geelow,2.359673,45.385034,The incident occurred near the town of Balcad.,12/31/2017: Assailants opened fire on a Somali...,1,0,Armed Assault,Military,Military Checkpoint,Somali National Army (SNA),Checkpoint,Somalia,Al-Shabaab,,Firearms,1.0,2.0
181687,2017,12,31,Syria,Middle East & North Africa,Lattakia,Jableh,35.407278,35.942679,The incident occurred at the Humaymim Airport.,12/31/2017: Assailants launched mortars at the...,1,0,Bombing/Explosion,Military,Military Barracks/Base/Headquarters/Checkpost,Russian Air Force,Hmeymim Air Base,Russia,Muslim extremists,,Explosives,2.0,7.0
181688,2017,12,31,Philippines,Southeast Asia,Maguindanao,Kubentog,6.900742,124.437908,The incident occurred in the Datu Hoffer distr...,12/31/2017: Assailants set fire to houses in K...,1,0,Facility/Infrastructure Attack,Private Citizens & Property,House/Apartment/Residence,Not Applicable,Houses,Philippines,Bangsamoro Islamic Freedom Movement (BIFM),,Incendiary,0.0,0.0
181689,2017,12,31,India,South Asia,Manipur,Imphal,24.798346,93.940430,The incident occurred in the Mantripukhri neig...,12/31/2017: Assailants threw a grenade at a Fo...,0,0,Bombing/Explosion,Government (General),Government Building/Facility/Office,Forest Department Manipur,Office,India,Unknown,,Explosives,0.0,0.0


### Check the shape again to make sure dropping works as expected

In [9]:
df.shape

(181691, 24)
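
Since only 24 columns survive, an alternative to a long drop list is to name the columns to keep. The sketch below shows the idea on a toy frame; the data and the names `toy`, `keep`, and `subset` are made up for illustration, and this is a possible variant rather than the approach used above.

```python
import pandas as pd

# Toy frame with a few of the raw column names (made-up values).
toy = pd.DataFrame({
    "eventid": [1, 2],
    "iyear": [1970, 2017],
    "country": [58, 92],
    "country_txt": ["Dominican Republic", "India"],
    "nkill": [1.0, 0.0],
})

# Columns we want to keep, in the order we want them.
keep = ["iyear", "country_txt", "nkill"]

# Selecting by label raises a KeyError if any name in `keep` is misspelled.
subset = toy[keep].copy()

print(subset.shape)
```

A keep list is shorter to review and cannot silently contain the same label twice with any effect on the result.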

### Get the remaining column names as a list

In [10]:
before = df.keys()
print(list(before))

['iyear', 'imonth', 'iday', 'country_txt', 'region_txt', 'provstate', 'city', 'latitude', 'longitude', 'location', 'summary', 'success', 'suicide', 'attacktype1_txt', 'targtype1_txt', 'targsubtype1_txt', 'corp1', 'target1', 'natlty1_txt', 'gname', 'motive', 'weaptype1_txt', 'nkill', 'nwound']


### Rename columns to make them more intuitive

In [11]:
# New column names, listed in the same order as the current columns
after = ['Year','Month','Day',
 'Country','Region', 'State',
 'City', 'Latitude','Longitude',
 'Location', 'Summary', 'Success', 'Suicide',
'Attacktype', 'Targtype', 'Targsubtype', 'Corp',
'Target', 'Nationality', 'Groupname', 'Motive',
'Weaptype', '#kill', '#wound']
rename_dict = dict(zip(list(df.keys()), after))
df = df.rename(columns = rename_dict)
df

Unnamed: 0,Year,Month,Day,Country,Region,State,City,Latitude,Longitude,Location,Summary,Success,Suicide,Attacktype,Targtype,Targsubtype,Corp,Target,Nationality,Groupname,Motive,Weaptype,#kill,#wound
0,1970,7,2,Dominican Republic,Central America & Caribbean,,Santo Domingo,18.456792,-69.951164,,,1,0,Assassination,Private Citizens & Property,Named Civilian,,Julio Guzman,Dominican Republic,MANO-D,,Unknown,1.0,0.0
1,1970,0,0,Mexico,North America,Federal,Mexico city,19.371887,-99.086624,,,1,0,Hostage Taking (Kidnapping),Government (Diplomatic),"Diplomatic Personnel (outside of embassy, cons...",Belgian Ambassador Daughter,"Nadine Chaval, daughter",Belgium,23rd of September Communist League,,Unknown,0.0,0.0
2,1970,1,0,Philippines,Southeast Asia,Tarlac,Unknown,15.478598,120.599741,,,1,0,Assassination,Journalists & Media,Radio Journalist/Staff/Facility,Voice of America,Employee,United States,Unknown,,Unknown,1.0,0.0
3,1970,1,0,Greece,Western Europe,Attica,Athens,37.997490,23.762728,,,1,0,Bombing/Explosion,Government (Diplomatic),Embassy/Consulate,,U.S. Embassy,United States,Unknown,,Explosives,,
4,1970,1,0,Japan,East Asia,Fukouka,Fukouka,33.580412,130.396361,,,1,0,Facility/Infrastructure Attack,Government (Diplomatic),Embassy/Consulate,,U.S. Consulate,United States,Unknown,,Incendiary,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
181686,2017,12,31,Somalia,Sub-Saharan Africa,Middle Shebelle,Ceelka Geelow,2.359673,45.385034,The incident occurred near the town of Balcad.,12/31/2017: Assailants opened fire on a Somali...,1,0,Armed Assault,Military,Military Checkpoint,Somali National Army (SNA),Checkpoint,Somalia,Al-Shabaab,,Firearms,1.0,2.0
181687,2017,12,31,Syria,Middle East & North Africa,Lattakia,Jableh,35.407278,35.942679,The incident occurred at the Humaymim Airport.,12/31/2017: Assailants launched mortars at the...,1,0,Bombing/Explosion,Military,Military Barracks/Base/Headquarters/Checkpost,Russian Air Force,Hmeymim Air Base,Russia,Muslim extremists,,Explosives,2.0,7.0
181688,2017,12,31,Philippines,Southeast Asia,Maguindanao,Kubentog,6.900742,124.437908,The incident occurred in the Datu Hoffer distr...,12/31/2017: Assailants set fire to houses in K...,1,0,Facility/Infrastructure Attack,Private Citizens & Property,House/Apartment/Residence,Not Applicable,Houses,Philippines,Bangsamoro Islamic Freedom Movement (BIFM),,Incendiary,0.0,0.0
181689,2017,12,31,India,South Asia,Manipur,Imphal,24.798346,93.940430,The incident occurred in the Mantripukhri neig...,12/31/2017: Assailants threw a grenade at a Fo...,0,0,Bombing/Explosion,Government (General),Government Building/Facility/Office,Forest Department Manipur,Office,India,Unknown,,Explosives,0.0,0.0
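
The renaming above pairs old and new names by position, so it depends on the column order staying the same. A minimal sketch of an order-independent variant, using an explicit old-to-new mapping on a toy frame (the data and the names `toy`, `rename_map`, and `renamed` are hypothetical):

```python
import pandas as pd

# Toy frame with three of the original column names (made-up row).
toy = pd.DataFrame({
    "iyear": [1970],
    "country_txt": ["Greece"],
    "nkill": [1.0],
})

# Explicit old -> new mapping; the order of the keys does not matter.
rename_map = {
    "nkill": "#kill",
    "iyear": "Year",
    "country_txt": "Country",
}

# errors="raise" makes pandas complain if a key is not an existing column.
renamed = toy.rename(columns=rename_map, errors="raise")

print(list(renamed.columns))
```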


In [12]:
## export the modified data
## df.to_csv("attack data final.csv", sep=',')

## Dataset 2: co2-emission-dataset

### Take a look at the dataset

In [13]:
df2 = pd.read_csv('co2.csv')
df2

Unnamed: 0,Entity,Code,Year,Annual CO2 emissions
0,Afghanistan,AFG,1750,0
1,Afghanistan,AFG,1751,0
2,Afghanistan,AFG,1752,0
3,Afghanistan,AFG,1753,0
4,Afghanistan,AFG,1754,0
...,...,...,...,...
63175,Zimbabwe,ZWE,2015,12170460
63176,Zimbabwe,ZWE,2016,10814761
63177,Zimbabwe,ZWE,2017,10246841
63178,Zimbabwe,ZWE,2018,11340575


### Check the shape of our dataset

In [14]:
df2.shape

(63180, 4)

### Delete the column we do not need: we drop the "Code" column since it encodes the same information as the "Entity" column

In [15]:
df2.drop('Code', inplace=True, axis=1)
df2

Unnamed: 0,Entity,Year,Annual CO2 emissions
0,Afghanistan,1750,0
1,Afghanistan,1751,0
2,Afghanistan,1752,0
3,Afghanistan,1753,0
4,Afghanistan,1754,0
...,...,...,...
63175,Zimbabwe,2015,12170460
63176,Zimbabwe,2016,10814761
63177,Zimbabwe,2017,10246841
63178,Zimbabwe,2018,11340575


### Rename the "Entity" column to make it more intuitive

In [16]:
df2 = df2.rename(columns = {'Entity':'Country'})
df2

Unnamed: 0,Country,Year,Annual CO2 emissions
0,Afghanistan,1750,0
1,Afghanistan,1751,0
2,Afghanistan,1752,0
3,Afghanistan,1753,0
4,Afghanistan,1754,0
...,...,...,...
63175,Zimbabwe,2015,12170460
63176,Zimbabwe,2016,10814761
63177,Zimbabwe,2017,10246841
63178,Zimbabwe,2018,11340575


### We are only interested in the years 1970-2017, so we drop all other years

In [17]:
# Keep only rows with 1970 <= Year <= 2017
df2 = df2.loc[(df2['Year'] >= 1970) & (df2['Year'] <= 2017)]
df2

Unnamed: 0,Country,Year,Annual CO2 emissions
220,Afghanistan,1970,1670397
221,Afghanistan,1971,1893581
222,Afghanistan,1972,1530408
223,Afghanistan,1973,1635586
224,Afghanistan,1974,1913339
...,...,...,...
63173,Zimbabwe,2013,11616551
63174,Zimbabwe,2014,11972604
63175,Zimbabwe,2015,12170460
63176,Zimbabwe,2016,10814761
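
The range 1970 through 2017 spans 48 calendar years, so 11232 rows would correspond to 234 entities if every entity had a complete record (11232 / 48 = 234). The sketch below shows one way to check the year range and the number of years per country after filtering; it uses a toy frame with made-up values, and the names `toy`, `filtered`, and `years_per_country` are hypothetical.

```python
import pandas as pd

# Toy emissions table (made-up values) with years inside and outside the window.
toy = pd.DataFrame({
    "Country": ["A", "A", "A", "A", "B", "B", "B"],
    "Year": [1969, 1970, 2017, 2018, 1970, 1971, 2019],
    "Annual CO2 emissions": [1, 2, 3, 4, 5, 6, 7],
})

# Keep 1970..2017; Series.between is inclusive on both ends by default.
filtered = toy[toy["Year"].between(1970, 2017)]

# Number of distinct years that remain for each country.
years_per_country = filtered.groupby("Country")["Year"].nunique()

print(filtered["Year"].min(), filtered["Year"].max())
print(years_per_country)
```

On the real data, countries with fewer than 48 years in `years_per_country` would have gaps that carry over into the merge.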


In [18]:
##df2.to_csv("CO2 emission final.csv", sep=',')

## Dataset 3: Life Expectancy (WHO)

In [19]:
df3 = pd.read_csv('Life Expectancy Data.csv')
df3

Unnamed: 0,Country,Year,Status,Life expectancy,Adult Mortality,infant deaths,Alcohol,percentage expenditure,Hepatitis B,Measles,BMI,under-five deaths,Polio,Total expenditure,Diphtheria,HIV/AIDS,GDP,Population,thinness 1-19 years,thinness 5-9 years,Income composition of resources,Schooling
0,Afghanistan,2015,Developing,65.0,263.0,62,0.01,71.279624,65.0,1154,19.1,83,6.0,8.16,65.0,0.1,584.259210,33736494.0,17.2,17.3,0.479,10.1
1,Afghanistan,2014,Developing,59.9,271.0,64,0.01,73.523582,62.0,492,18.6,86,58.0,8.18,62.0,0.1,612.696514,327582.0,17.5,17.5,0.476,10.0
2,Afghanistan,2013,Developing,59.9,268.0,66,0.01,73.219243,64.0,430,18.1,89,62.0,8.13,64.0,0.1,631.744976,31731688.0,17.7,17.7,0.470,9.9
3,Afghanistan,2012,Developing,59.5,272.0,69,0.01,78.184215,67.0,2787,17.6,93,67.0,8.52,67.0,0.1,669.959000,3696958.0,17.9,18.0,0.463,9.8
4,Afghanistan,2011,Developing,59.2,275.0,71,0.01,7.097109,68.0,3013,17.2,97,68.0,7.87,68.0,0.1,63.537231,2978599.0,18.2,18.2,0.454,9.5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2933,Zimbabwe,2004,Developing,44.3,723.0,27,4.36,0.000000,68.0,31,27.1,42,67.0,7.13,65.0,33.6,454.366654,12777511.0,9.4,9.4,0.407,9.2
2934,Zimbabwe,2003,Developing,44.5,715.0,26,4.06,0.000000,7.0,998,26.7,41,7.0,6.52,68.0,36.7,453.351155,12633897.0,9.8,9.9,0.418,9.5
2935,Zimbabwe,2002,Developing,44.8,73.0,25,4.43,0.000000,73.0,304,26.3,40,73.0,6.53,71.0,39.8,57.348340,125525.0,1.2,1.3,0.427,10.0
2936,Zimbabwe,2001,Developing,45.3,686.0,25,1.72,0.000000,76.0,529,25.9,39,76.0,6.16,75.0,42.1,548.587312,12366165.0,1.6,1.7,0.427,9.8


### This data is already in good shape, so we do not make any changes

In [20]:
df3.shape

(2938, 22)
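
Even when no cleaning is applied, two quick checks are cheap: normalizing the column labels and counting missing values per column. The sketch below uses a toy frame; the stray whitespace in the labels and the missing value are assumptions made for illustration and are not confirmed for the real file. The names `toy` and `missing_counts` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy frame; the padded labels and the NaN are illustrative assumptions.
toy = pd.DataFrame({
    "Country": ["Afghanistan", "Zimbabwe"],
    "Life expectancy ": [65.0, np.nan],
    " BMI ": [19.1, 27.1],
})

# Remove leading and trailing whitespace from every column label.
toy.columns = toy.columns.str.strip()

# Count missing values per column.
missing_counts = toy.isna().sum()

print(list(toy.columns))
print(missing_counts)
```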

In [21]:
# df3.to_csv("foodsecurity final.csv", sep=',')

# Merging all datasets

### We merge all three datasets to make future manipulation more convenient

In [22]:
df.head()

Unnamed: 0,Year,Month,Day,Country,Region,State,City,Latitude,Longitude,Location,Summary,Success,Suicide,Attacktype,Targtype,Targsubtype,Corp,Target,Nationality,Groupname,Motive,Weaptype,#kill,#wound
0,1970,7,2,Dominican Republic,Central America & Caribbean,,Santo Domingo,18.456792,-69.951164,,,1,0,Assassination,Private Citizens & Property,Named Civilian,,Julio Guzman,Dominican Republic,MANO-D,,Unknown,1.0,0.0
1,1970,0,0,Mexico,North America,Federal,Mexico city,19.371887,-99.086624,,,1,0,Hostage Taking (Kidnapping),Government (Diplomatic),"Diplomatic Personnel (outside of embassy, cons...",Belgian Ambassador Daughter,"Nadine Chaval, daughter",Belgium,23rd of September Communist League,,Unknown,0.0,0.0
2,1970,1,0,Philippines,Southeast Asia,Tarlac,Unknown,15.478598,120.599741,,,1,0,Assassination,Journalists & Media,Radio Journalist/Staff/Facility,Voice of America,Employee,United States,Unknown,,Unknown,1.0,0.0
3,1970,1,0,Greece,Western Europe,Attica,Athens,37.99749,23.762728,,,1,0,Bombing/Explosion,Government (Diplomatic),Embassy/Consulate,,U.S. Embassy,United States,Unknown,,Explosives,,
4,1970,1,0,Japan,East Asia,Fukouka,Fukouka,33.580412,130.396361,,,1,0,Facility/Infrastructure Attack,Government (Diplomatic),Embassy/Consulate,,U.S. Consulate,United States,Unknown,,Incendiary,,


In [23]:
df2.head()

Unnamed: 0,Country,Year,Annual CO2 emissions
220,Afghanistan,1970,1670397
221,Afghanistan,1971,1893581
222,Afghanistan,1972,1530408
223,Afghanistan,1973,1635586
224,Afghanistan,1974,1913339


### Merging dataset 1 and dataset 2 on "Country" and "Year"

In [24]:
new = pd.merge(df, df2, on=["Country", "Year"])
new

Unnamed: 0,Year,Month,Day,Country,Region,State,City,Latitude,Longitude,Location,Summary,Success,Suicide,Attacktype,Targtype,Targsubtype,Corp,Target,Nationality,Groupname,Motive,Weaptype,#kill,#wound,Annual CO2 emissions
0,1970,7,2,Dominican Republic,Central America & Caribbean,,Santo Domingo,18.456792,-69.951164,,,1,0,Assassination,Private Citizens & Property,Named Civilian,,Julio Guzman,Dominican Republic,MANO-D,,Unknown,1.0,0.0,3105081
1,1970,3,24,Dominican Republic,Central America & Caribbean,National,Santo Domingo,18.456792,-69.951164,,,1,0,Hostage Taking (Kidnapping),Military,"Military Personnel (soldiers, troops, officers...",U.S. Air force,"Lt. Col. Donal J. Crowley, U.S. Air attache",United States,Dominican Popular Movement (MPD),,Unknown,0.0,0.0,3105081
2,1970,0,0,Mexico,North America,Federal,Mexico city,19.371887,-99.086624,,,1,0,Hostage Taking (Kidnapping),Government (Diplomatic),"Diplomatic Personnel (outside of embassy, cons...",Belgian Ambassador Daughter,"Nadine Chaval, daughter",Belgium,23rd of September Communist League,,Unknown,0.0,0.0,113950680
3,1970,8,25,Mexico,North America,Federal,Mexico city,19.371887,-99.086624,,,1,0,Hostage Taking (Kidnapping),Government (Diplomatic),"Diplomatic Personnel (outside of embassy, cons...",Belgium government,"Jalques Groothaert, ambassador to Mexico",Belgium,Unknown,,Unknown,0.0,0.0,113950680
4,1970,1,0,Philippines,Southeast Asia,Tarlac,Unknown,15.478598,120.599741,,,1,0,Assassination,Journalists & Media,Radio Journalist/Staff/Facility,Voice of America,Employee,United States,Unknown,,Unknown,1.0,0.0,24751417
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
177073,2017,10,18,Liberia,Sub-Saharan Africa,Montserrado,Monrovia,6.313432,-10.801395,,10/18/2017: Assailants threw a petrol bomb at ...,0,0,Assassination,Journalists & Media,Radio Journalist/Staff/Facility,OK FM,Residence of Journalist: Smith Toby,Liberia,Unknown,"The specific motive is unknown; however, sourc...",Incendiary,0.0,0.0,1215203
177074,2017,10,19,Georgia,Central Asia,Kvemo Kartli,Kizilajlo,41.446525,44.172998,The incident occurred in the Marneuli district.,10/19/2017: Assailants opened fire on the Geor...,0,0,Assassination,Private Citizens & Property,Political Party Member/Rally,Georgian Dream - Democratic Georgia,Candidate: Jeyhun Chovdarov,Georgia,Unknown,"The specific motive is unknown; however, sourc...",Firearms,0.0,5.0,9831913
177075,2017,11,27,Malawi,Sub-Saharan Africa,Northern,Mzuzu,-11.407376,33.987129,The incident occurred in the Lupaso neighborhood.,11/27/2017: Assailants set fire to a Seventh D...,1,0,Facility/Infrastructure Attack,Religious Figures/Institutions,Place of Worship,Seventh Day Adventist Church,Church,Malawi,Unknown,,Incendiary,0.0,0.0,1402061
177076,2017,12,7,Netherlands,Western Europe,North Holland,Amsterdam,52.365857,4.895405,The incident occurred in southern Amsterdam.,12/07/2017: An assailant armed with a bat atta...,1,0,Facility/Infrastructure Attack,Business,Restaurant/Bar/Café,Unknown,Kosher Restaurant,Netherlands,Palestinian Extremists,A Palestinian extremist claimed responsibility...,Melee,0.0,0.0,164444774
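
`pd.merge` performs an inner join by default, so any attack whose ("Country", "Year") pair has no match in the CO2 table is silently dropped. That may be one reason the merged frame (last index 177076) has fewer rows than the 181691 observations in the raw data. The sketch below shows one way to list the unmatched pairs with a left merge and `indicator=True`. The toy frames and the country names in them are made up for illustration.

```python
import pandas as pd

# Hypothetical toy frames standing in for df (attacks) and df2 (CO2).
attacks = pd.DataFrame({
    "Country": ["Mexico", "Mexico", "Exampleland"],
    "Year": [1970, 1970, 1970],
    "Attacktype": ["Assassination", "Bombing/Explosion", "Armed Assault"],
})
co2 = pd.DataFrame({
    "Country": ["Mexico", "Example Land"],  # spelled differently on purpose
    "Year": [1970, 1970],
    "Annual CO2 emissions": [113950680, 1000],  # second value is illustrative
})

# A left merge keeps every attack; `_merge` records whether a CO2 row matched.
checked = pd.merge(attacks, co2, on=["Country", "Year"],
                   how="left", indicator=True)

# Rows marked "left_only" would have been dropped by the default inner join.
unmatched = (
    checked.loc[checked["_merge"] == "left_only", ["Country", "Year"]]
    .drop_duplicates()
)
print(unmatched)
```

If the unmatched list turns out to be mostly spelling differences in country names, renaming them before the merge would keep those rows.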


In [25]:
df3

Unnamed: 0,Country,Year,Status,Life expectancy,Adult Mortality,infant deaths,Alcohol,percentage expenditure,Hepatitis B,Measles,BMI,under-five deaths,Polio,Total expenditure,Diphtheria,HIV/AIDS,GDP,Population,thinness 1-19 years,thinness 5-9 years,Income composition of resources,Schooling
0,Afghanistan,2015,Developing,65.0,263.0,62,0.01,71.279624,65.0,1154,19.1,83,6.0,8.16,65.0,0.1,584.259210,33736494.0,17.2,17.3,0.479,10.1
1,Afghanistan,2014,Developing,59.9,271.0,64,0.01,73.523582,62.0,492,18.6,86,58.0,8.18,62.0,0.1,612.696514,327582.0,17.5,17.5,0.476,10.0
2,Afghanistan,2013,Developing,59.9,268.0,66,0.01,73.219243,64.0,430,18.1,89,62.0,8.13,64.0,0.1,631.744976,31731688.0,17.7,17.7,0.470,9.9
3,Afghanistan,2012,Developing,59.5,272.0,69,0.01,78.184215,67.0,2787,17.6,93,67.0,8.52,67.0,0.1,669.959000,3696958.0,17.9,18.0,0.463,9.8
4,Afghanistan,2011,Developing,59.2,275.0,71,0.01,7.097109,68.0,3013,17.2,97,68.0,7.87,68.0,0.1,63.537231,2978599.0,18.2,18.2,0.454,9.5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2933,Zimbabwe,2004,Developing,44.3,723.0,27,4.36,0.000000,68.0,31,27.1,42,67.0,7.13,65.0,33.6,454.366654,12777511.0,9.4,9.4,0.407,9.2
2934,Zimbabwe,2003,Developing,44.5,715.0,26,4.06,0.000000,7.0,998,26.7,41,7.0,6.52,68.0,36.7,453.351155,12633897.0,9.8,9.9,0.418,9.5
2935,Zimbabwe,2002,Developing,44.8,73.0,25,4.43,0.000000,73.0,304,26.3,40,73.0,6.53,71.0,39.8,57.348340,125525.0,1.2,1.3,0.427,10.0
2936,Zimbabwe,2001,Developing,45.3,686.0,25,1.72,0.000000,76.0,529,25.9,39,76.0,6.16,75.0,42.1,548.587312,12366165.0,1.6,1.7,0.427,9.8


### Merging dataset 3 into the dataset we have just created

In [29]:
final = pd.merge(new, df3, on=["Country", "Year"])
final

Unnamed: 0,Year,Month,Day,Country,Region,State,City,Latitude,Longitude,Location,Summary,Success,Suicide,Attacktype,Targtype,Targsubtype,Corp,Target,Nationality,Groupname,Motive,Weaptype,#kill,#wound,Annual CO2 emissions,Status,Life expectancy,Adult Mortality,infant deaths,Alcohol,percentage expenditure,Hepatitis B,Measles,BMI,under-five deaths,Polio,Total expenditure,Diphtheria,HIV/AIDS,GDP,Population,thinness 1-19 years,thinness 5-9 years,Income composition of resources,Schooling
0,2000,1,1,Namibia,Sub-Saharan Africa,Kavango,Muitjiku,-17.910812,19.988303,,01/01/2000: In the first of two related incide...,1,0,Armed Assault,Business,Restaurant/Bar/Café,,A tavern in Muitjiku,Namibia,National Union for the Total Independence of A...,Unknown,Firearms,0.0,7.0,1641472,Developing,57.4,41.0,3,5.73,35.809785,,469,24.5,4,8.0,6.11,79.0,22.8,257.995570,1899257.0,15.7,15.9,0.559,11.5
1,2000,1,1,Namibia,Sub-Saharan Africa,Kavango,Muitjiku,-17.910812,19.988303,,01/01/2000: In the second of two related incid...,1,0,Hostage Taking (Kidnapping),Business,Entertainment/Cultural/Stadium/Casino,Bush Babies nightclub,The Bush Babies nightclub in Muitjiku,Namibia,National Union for the Total Independence of A...,Unknown,Firearms,0.0,7.0,1641472,Developing,57.4,41.0,3,5.73,35.809785,,469,24.5,4,8.0,6.11,79.0,22.8,257.995570,1899257.0,15.7,15.9,0.559,11.5
2,2000,1,3,Namibia,Sub-Saharan Africa,Caprivi,Katima Mulilio,-17.503986,24.279230,This incident occurred on the road between Kat...,01/03/2000: Three French children were killed ...,1,0,Armed Assault,Tourists,Tour Bus/Van,Civilians,A bus of tourists traveling in Namibia,France,National Union for the Total Independence of A...,Unknown,Firearms,3.0,4.0,1641472,Developing,57.4,41.0,3,5.73,35.809785,,469,24.5,4,8.0,6.11,79.0,22.8,257.995570,1899257.0,15.7,15.9,0.559,11.5
3,2000,1,9,Namibia,Sub-Saharan Africa,Kavango,Nkonke,-17.800000,18.850000,,01/09/2000: Two civilians were killed and one ...,1,0,Armed Assault,Private Citizens & Property,House/Apartment/Residence,Civilians,"A private home in Nkonke, Namibia",Namibia,National Union for the Total Independence of A...,Unknown,Firearms,2.0,1.0,1641472,Developing,57.4,41.0,3,5.73,35.809785,,469,24.5,4,8.0,6.11,79.0,22.8,257.995570,1899257.0,15.7,15.9,0.559,11.5
4,2000,1,14,Namibia,Sub-Saharan Africa,Caprivi,Omega,-17.819342,23.953647,This incident occurred on the Omega Divundu ro...,01/14/2000: Four people were killed and five w...,1,0,Armed Assault,Private Citizens & Property,Unnamed Civilian/Unspecified,Civilians,People traveling on the Omega Divindu road,Namibia,National Union for the Total Independence of A...,Unknown,Firearms,4.0,5.0,1641472,Developing,57.4,41.0,3,5.73,35.809785,,469,24.5,4,8.0,6.11,79.0,22.8,257.995570,1899257.0,15.7,15.9,0.559,11.5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
80894,2015,9,28,Uzbekistan,Central Asia,Tashkent,Tashkent,41.367161,69.272486,,09/28/2015: An assailant threw two incendiary ...,0,0,Facility/Infrastructure Attack,Government (Diplomatic),Embassy/Consulate,United States Department of State,Embassy,United States,Unknown,"The specific motive is unknown; however, sourc...",Incendiary,0.0,0.0,101791340,Developing,69.4,184.0,15,,0.000000,99.0,22,44.7,17,99.0,,99.0,0.1,2137.576852,312989.0,3.0,3.1,0.697,12.1
80895,2015,11,5,Morocco,Middle East & North Africa,Grand Casablanca,Casablanca,33.573110,-7.589843,,11/05/2015: Assailants abducted a Moroccan fil...,1,0,Hostage Taking (Kidnapping),Private Citizens & Property,Laborer (General)/Occupation Identified,Not Applicable,Actress: Loubna Abidar,Morocco,Unknown,"The specific motive is unknown; however, sourc...",Melee,0.0,,61027926,Developing,74.3,95.0,17,,0.000000,99.0,17,58.5,20,99.0,,99.0,0.1,2847.285569,3483322.0,6.4,6.2,0.645,12.1
80896,2015,11,23,Argentina,South America,Ciudad de Buenos Aires,Buenos Aires,-34.617680,-58.444435,The incident occurred in the Once neighborhood.,11/23/2015: An explosive device was discovered...,0,0,Bombing/Explosion,Private Citizens & Property,Labor Union Related,Argentine Jewish Mutual Aid Society,Headquarters,Argentina,Unknown,,Explosives,0.0,0.0,192365656,Developing,76.3,116.0,8,,0.000000,94.0,0,62.8,9,93.0,,94.0,0.1,13467.123600,43417765.0,1.0,0.9,0.826,17.3
80897,2015,12,21,Djibouti,Sub-Saharan Africa,Djibouti,Djibouti,11.588561,43.145091,The incident occurred in the Buldhoqo area.,12/21/2015: Assailants attacked security force...,1,0,Armed Assault,Military,"Military Personnel (soldiers, troops, officers...",Djibouti Armed Forces,Soldiers,Djibouti,Unknown,,Firearms,,,445770,Developing,63.5,241.0,1,,0.000000,84.0,47,35.0,1,84.0,,84.0,2.1,1862.167274,927414.0,5.6,5.4,0.470,6.3


In [27]:
#final.to_csv("merged data.csv", sep=',')

We merged all three datasets because a single table makes it easier to analyze the data and look for correlations later. We merge on "Country" and "Year" because these columns are shared by all three datasets. The "Year" columns of the three datasets cover different ranges, and dataset 3 only covers 2000 - 2015, so we limit the years to 2000 - 2015 to avoid a large amount of missing data in the final dataset.
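
The year restriction described above can be sketched as taking the overlap of the year ranges of the three tables and filtering before merging. This is a minimal sketch, not the code we ran: the `common_year_range` helper and the toy frames are hypothetical, and `validate="many_to_one"` is an extra check we add here to make sure each ("Country", "Year") pair appears only once in the country-level tables.

```python
import pandas as pd


def common_year_range(*frames, col="Year"):
    """Return the (first, last) year that lies inside every frame's span."""
    first = max(int(f[col].min()) for f in frames)
    last = min(int(f[col].max()) for f in frames)
    if first > last:
        raise ValueError("the frames share no common years")
    return first, last


# Toy frames whose year spans mimic the three datasets (values illustrative).
attacks = pd.DataFrame({"Country": ["A"] * 3, "Year": [1970, 2000, 2017]})
co2 = pd.DataFrame({"Country": ["A"] * 3, "Year": [1970, 2000, 2019],
                    "Annual CO2 emissions": [1, 2, 3]})
health = pd.DataFrame({"Country": ["A"] * 2, "Year": [2000, 2015],
                       "GDP": [10.0, 20.0]})

first, last = common_year_range(attacks, co2, health)

# Keep only attacks inside the shared span, then merge on the shared keys.
in_range = attacks[attacks["Year"].between(first, last)]
merged = (
    in_range
    .merge(co2, on=["Country", "Year"], validate="many_to_one")
    .merge(health, on=["Country", "Year"], validate="many_to_one")
)
print(first, last, len(merged))
```

An inner merge on "Year" already drops years that are missing from any table, so the explicit filter mainly documents the intended range.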