# **Exploratory Data Analysis - Terrorism**

Exploratory data analysis of the Global Terrorism dataset using Python and its libraries.

This notebook analyzes the rise and fall of terrorism activities around the world.

#### **By - Pranjal Rawat**

**Importing required Libraries**

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt

pd.set_option('display.max_columns', None)
pd.set_option("display.max_rows", None)

Ignoring warnings

In [None]:
import warnings
warnings.filterwarnings('ignore')

**Loading and reading Terrorism Dataset**

In [None]:
df = pd.read_csv('../input/terrorism/globalterrorismdb_0718dist.csv')

**First five rows of the dataset**

In [None]:
df.head()

**Last five rows of the dataset**

In [None]:
df.tail()

**Checking the shape of the dataset**

In [None]:
df.shape

**Removing unwanted columns, just keeping required and important columns**

In [None]:
df = df[['eventid','iyear', 'imonth', 'iday', 'region_txt', 'country_txt', 'provstate', 'city', 'latitude', 'longitude', 'weaptype1_txt', 'targtype1_txt']]

**Checking the head of the dataset after the changes**

In [None]:
df.head()

**Renaming the columns names for easy understanding and retrieval**

In [None]:
df.rename(columns={'region_txt':'region', 'country_txt': 'country', 'provstate':'state', 'weaptype1_txt':'weapon', 'targtype1_txt':'target'}, inplace = True)

**Dropping duplicate rows from the dataset**

In [None]:
# keep=False would discard every copy of a duplicated row; the default keeps the first one
df = df.drop_duplicates()
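
The `keep` argument of `drop_duplicates` decides which copies of a duplicated row survive. The sketch below uses a small made-up frame (the `toy` data is hypothetical, not taken from the dataset) to show how the default differs from `keep=False`.

```python
import pandas as pd

# Hypothetical toy frame: rows 0 and 1 are exact duplicates.
toy = pd.DataFrame({'country': ['India', 'India', 'Iraq'],
                    'iyear': [2001, 2001, 2003]})

# Default (keep='first'): one copy of each duplicated row is retained.
kept_first = toy.drop_duplicates()

# keep=False: every row that has a duplicate is removed entirely.
dropped_all = toy.drop_duplicates(keep=False)

print(len(kept_first), len(dropped_all))
```

With the default, the duplicated India row is still represented once; with `keep=False`, only the Iraq row remains.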

In [None]:
df.head()

**Summarize the numerical columns**

In [None]:
df.describe()

**Checking for null/missing values in each column**

In [None]:
df.isna().sum()

**Dropping rows with missing values in the longitude column**

In [None]:
df = df.dropna(subset=['longitude'])

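**Replacing the 'Unknown' value with NaN in the city and state columns**

In [None]:
# Assign the result back; in-place replace on an attribute-accessed column is unreliable in recent pandas
df['state'] = df['state'].replace(to_replace='Unknown', value=np.nan)
df['city'] = df['city'].replace(to_replace='Unknown', value=np.nan)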

**Filling the NaN/missing values in state from the country column**

In [None]:
df['state'] = df['state'].fillna(df['country'])

**Filling the NaN/missing values in city from the state column**

In [None]:
df['city'] = df['city'].fillna(df['state'])
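
The order of these two fills matters: state is completed from country first, so that city can then fall back on a state value that is never missing. A minimal sketch on a made-up frame (the `toy` rows are hypothetical) illustrates the cascade.

```python
import numpy as np
import pandas as pd

# Hypothetical rows with 'Unknown' placeholders in state and city.
toy = pd.DataFrame({
    'country': ['India', 'Iraq', 'Peru'],
    'state':   ['Punjab', 'Unknown', 'Unknown'],
    'city':    ['Unknown', 'Baghdad', 'Unknown'],
})

# Treat the literal string 'Unknown' as missing.
toy[['state', 'city']] = toy[['state', 'city']].replace('Unknown', np.nan)

# Fill state from country first, then city from the now-complete state.
toy['state'] = toy['state'].fillna(toy['country'])
toy['city'] = toy['city'].fillna(toy['state'])

print(toy)
```

In the last row both fields were unknown, so the country name ends up in both state and city.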

**Checking all the Unique regions in the dataset**

In [None]:
df.region.unique()

**Grouping the regions into six continents based on their locations**

In [None]:
# Note: 'Middle East & North Africa' is counted under Africa in this coarse grouping
df['region'] = df['region'].replace(['Central America & Caribbean'], 'North America')
df['region'] = df['region'].replace(['Southeast Asia', 'East Asia', 'South Asia', 'Central Asia'], 'Asia')
df['region'] = df['region'].replace(['Western Europe', 'Eastern Europe'], 'Europe')
df['region'] = df['region'].replace(['Middle East & North Africa', 'Sub-Saharan Africa'], 'Africa')
df['region'] = df['region'].replace(['Australasia & Oceania'], 'Australia')

In [None]:
df.region.unique()
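
The same grouping can also be written as a single lookup table, which makes the region-to-continent assignment easier to audit. This is only a sketch: the dictionary name `REGION_TO_CONTINENT` and the sample `regions` series are made up for illustration, while the mapping itself mirrors the replacements used in this notebook.

```python
import pandas as pd

# Mapping mirrors the notebook's replacements; unlisted regions keep their name.
REGION_TO_CONTINENT = {
    'Central America & Caribbean': 'North America',
    'Southeast Asia': 'Asia',
    'East Asia': 'Asia',
    'South Asia': 'Asia',
    'Central Asia': 'Asia',
    'Western Europe': 'Europe',
    'Eastern Europe': 'Europe',
    'Middle East & North Africa': 'Africa',
    'Sub-Saharan Africa': 'Africa',
    'Australasia & Oceania': 'Australia',
}

# Hypothetical sample standing in for df['region'].
regions = pd.Series(['South Asia', 'Western Europe', 'South America',
                     'Middle East & North Africa'])

continents = regions.replace(REGION_TO_CONTINENT)
print(continents.tolist())
```

Because 'Middle East & North Africa' maps to Africa, Middle Eastern countries are counted under Africa in every region-level plot.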

**Creating a series containing the count of terrorism activities per year**

In [None]:
year_count  = df['iyear'].value_counts()
year_count.head()

**Plotting a line plot**

In [None]:
plt.figure(figsize = (25,8))
# Recent seaborn versions require x and y to be passed as keywords
sns.lineplot(x=year_count.index, y=year_count.values, alpha=0.9)
plt.xlabel('Years')
plt.ylabel('Number of Terrorism activities')
plt.title('Rise and Fall of Terrorism in Whole World in Recent Years')
plt.suptitle('Line Plot for Terrorism')
plt.show()

The line plot above shows the rise and fall of terrorism activities from 1970 to 2017. There is a steep rise in terrorism activities from 2011 onward.
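
One way to put a number on how fast the counts grow is the year-over-year percentage change. The sketch below uses a made-up list of incident years (the `years` series is hypothetical and stands in for `df['iyear']`); it is not a result from the dataset.

```python
import pandas as pd

# Hypothetical incident years standing in for df['iyear'].
years = pd.Series([2010, 2010,
                   2011, 2011, 2011,
                   2012, 2012, 2012, 2012, 2012, 2012])

# value_counts() orders by frequency, so sort by year before comparing neighbours.
per_year = years.value_counts().sort_index()

# Percentage change relative to the previous year (first year has no predecessor).
growth = per_year.pct_change().mul(100).round(1)

print(pd.DataFrame({'count': per_year, 'growth_pct': growth}))
```

Sorting the index first matters because `pct_change` compares each value with the one before it in the series.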

**Bar plot for the rise of terrorism activities**

In [None]:
plt.figure(figsize = (25,8))
sns.barplot(x=year_count.index, y=year_count.values, alpha=0.9)
plt.xlabel('Years')
plt.ylabel('Number of Terrorism activities')
plt.title('Rise and Fall of Terrorism in Whole World in Recent Years')
plt.suptitle('Bar Plot for Terrorism')
plt.show()

**Creating a series containing the count of terrorism activities per continent**

In [None]:
ucontinent = df.iyear.groupby(df.region).count()
ucontinent.sort_values(ascending=True,inplace=True)
ucontinent

**Bar graph for the terrorism activities per continent**

In [None]:
fig = plt.figure(figsize=(10,6))
sns.barplot(x=ucontinent.index, y=ucontinent.values)
plt.title('Total terrorism activities per continent')
plt.show()

From the bar plot above, we can conclude that Africa (which, under the grouping used here, includes the Middle East) has the largest number of terrorism activities, at more than 60,000, followed by Asia with more than 55,000.

Australia has the lowest number of terrorism activities, roughly 200 to 300 in total.

**Line plot and bar graph for the rise and fall of terrorism activities in different continents over time**

In [None]:
a = 6  # number of rows
b = 2  # number of columns
c = 1  # initialize plot counter

fig = plt.figure(figsize=(24,42))

for i in df['region'].unique():
    plt.suptitle('Rise and Fall of Terrorism Around various Continents',fontsize=24,y=.92)

    plt.subplot(a, b, c)
    year_count  = df[df['region']==i]['iyear'].value_counts()
    sns.lineplot(x=year_count.index, y=year_count.values, alpha=0.9)
    plt.xlabel('Years',fontsize=14)
    plt.ylabel('Number of Terrorism activities',fontsize=14)
    plt.title('{}'.format(i),fontsize=16)
    c = c + 1
    
    plt.subplot(a, b, c)
    sns.barplot(x=year_count.index, y=year_count.values, alpha=0.9)
    plt.xlabel('Years',fontsize=14)
    plt.xticks(rotation=90)
    plt.ylabel('Number of Terrorism activities',fontsize=14)
    plt.title('{}'.format(i),fontsize=16)
    c = c + 1
    
plt.subplots_adjust(wspace=0.2,hspace=0.4)
plt.show()

From the plots above, we can conclude that **North America** and **South America** saw a rise in terrorism activities from 1980 to 2000, but in recent years terrorism activity in those regions has been quite low.

In contrast, terrorism activities in **Asia** and **Africa** have risen steeply in recent years, which is a worrying trend.

Terrorism activity in **Europe** is moderate throughout the period from 1970 to 2013, but there is a spike from 2014 onward.

Of all the regions, **Australia** has the fewest terrorism activities, roughly 200 to 300 in total from 1970 to 2017. This can be considered a good sign.

In [None]:
df.head()

**Replacing the 'Unknown' value in the weapon column with NaN**

In [None]:
df['weapon'] = df['weapon'].replace(to_replace='Unknown', value=np.nan)

**Counting Missing values in Weapon Column**

In [None]:
df['weapon'].isna().value_counts()

**Filling missing values in the *weapon* column with the most common weapon (mode) of that year**

In [None]:
# transform keeps the original row index, so the result aligns with df on assignment
df['weapon'] = df.groupby(['iyear'], sort=False)['weapon'].transform(lambda x: x.fillna(x.mode().iloc[0]))
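
Group-wise mode imputation fills each missing value with the most frequent value inside its own group. The sketch below shows the idea on a made-up frame using `groupby(...).transform`, which returns a result aligned with the original rows. The helper `fill_with_mode` and the `toy` data are hypothetical, and the guard for groups with no observed value is an addition of this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing weapon per year.
toy = pd.DataFrame({
    'iyear':  [2000, 2000, 2000, 2001, 2001, 2001],
    'weapon': ['Firearms', 'Firearms', np.nan,
               'Explosives', np.nan, 'Explosives'],
})

def fill_with_mode(group: pd.Series) -> pd.Series:
    """Fill NaNs with the group's most frequent value, if there is one."""
    modes = group.mode()
    # A group that is entirely NaN has no mode; leave it untouched.
    return group.fillna(modes.iloc[0]) if not modes.empty else group

toy['weapon'] = toy.groupby('iyear')['weapon'].transform(fill_with_mode)
print(toy)
```

Note that this kind of imputation inflates the count of whichever weapon is already the most common in each year, so totals computed afterwards include imputed rows.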

In [None]:
df.head(3)

In [None]:
df['weapon'].isna().value_counts()

**Renaming one of the weapon values**

In [None]:
df['weapon'] = df['weapon'].replace(to_replace='Vehicle (not to include vehicle-borne explosives, i.e., car or truck bombs)', value='Vehicle Excluding Explosives')

**Creating a series of weapon types and the number of times each type is used**

In [None]:
nweapon = df.weapon.value_counts().sort_values(ascending=False)
nweapon

**Line and Bar plot for the use of weapon types from 1970 to 2017**

In [None]:
a = 11  # number of rows
b = 2  # number of columns
c = 1  # initialize plot counter

fig = plt.figure(figsize=(24,50))

for i in nweapon.index:
    plt.suptitle('Use of weapons throughout 1970 - 2017',fontsize=24,y=.9)

    plt.subplot(a, b, c)
    year_count  = df[df['weapon']==i]['iyear'].value_counts()
    sns.lineplot(x=year_count.index, y=year_count.values, alpha=0.9)
    plt.xlabel('Years',fontsize=14)
    plt.ylabel('Number of {} used'.format(i),fontsize=13)
    plt.title('{} (Total Count - {})'.format(i,nweapon[i]),fontsize=16)
    c = c + 1
    
    plt.subplot(a, b, c)
    sns.barplot(x=year_count.index, y=year_count.values, alpha=0.9)
    plt.xlabel('Years',fontsize=14)
    plt.xticks(rotation=90)
    plt.ylabel('Number of {} used'.format(i),fontsize=13)
    plt.title('{} (Total Count - {})'.format(i,nweapon[i]),fontsize=16)
    c = c + 1
    
    
plt.subplots_adjust(wspace=0.2,hspace=0.5)
plt.show()


From the plots above, we can conclude that **Explosives**, **Firearms**, and **Incendiary** weapons have been the weapons most commonly used by terrorists since **2010**.

In [None]:
df.head()

**Creating series for the target of terrorism**

In [None]:
ntarget = df.target.value_counts()
ntarget

**Replacing 'Unknown' values in the target column with NaN**

In [None]:
df['target'] = df['target'].replace(to_replace='Unknown', value=np.nan)

**Filling missing values in target with the most common target among rows using the same weapon**

In [None]:
# transform keeps the original row index, so the result aligns with df on assignment
df['target'] = df.groupby(['weapon'], sort=False)['target'].transform(lambda x: x.fillna(x.mode().iloc[0]))

In [None]:
df.target.unique()

In [None]:
df.target.count()

In [None]:
ntarget = df.target.value_counts()
ntarget

**Plotting a bar plot of the number of attacks per target type**

In [None]:
plt.figure(figsize = (25,8))
sns.barplot(x=ntarget.index, y=ntarget.values)
plt.xticks(rotation=90,fontsize=14)
plt.show()

**Creating a list containing the counts of the most common targets of terrorists**

In [None]:
utarget = []
utarget.append(df[df.target == 'Private Citizens & Property']['target'].count())
utarget.append(df[df.target == 'Military']['target'].count())
utarget.append(df[df.target == 'Police']['target'].count())
utarget.append(df[df.target == 'Business']['target'].count())
utarget.append(18192+3114)
utarget.append(df.target.count()-sum(utarget))
utarget
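
The list above mixes counts read from the data with a hard-coded sum. A sketch of how all six pie slices could instead be derived from `value_counts` is shown below. The `target` series is made up, and the assumption that the government categories all start with the word 'Government' should be checked against the actual labels in the dataset.

```python
import pandas as pd

# Hypothetical target labels standing in for df['target'].
target = pd.Series(['Police', 'Military', 'Business', 'Police',
                    'Government (General)', 'Government (Diplomatic)',
                    'Private Citizens & Property', 'Transportation'])

counts = target.value_counts()
main = ['Private Citizens & Property', 'Military', 'Police', 'Business']

# One slice per main category (0 if the label is absent).
sizes = [int(counts.get(name, 0)) for name in main]

# Assumption: every government-related label starts with 'Government'.
is_government = counts.index.str.startswith('Government')
sizes.append(int(counts[is_government].sum()))

# Whatever is left goes into a single 'Others' slice.
sizes.append(int(counts.sum()) - sum(sizes))

print(sizes)
```

Deriving the numbers this way keeps the pie chart consistent if the cleaning steps earlier in the notebook change.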

**Pie chart for the most common targets of terrorists**

In [None]:
labels = ['Private', 'Military', 'Police', 'Business','Government','Others']
explode = (0.07, 0, 0, 0,0,0)
fig1, ax1 = plt.subplots()
ax1.pie(utarget, explode=explode, labels=labels, autopct='%1.1f%%',
        shadow=True, startangle=90)
# Equal aspect ratio ensures that pie is drawn as a circle
ax1.axis('equal')  
plt.tight_layout()
plt.title('Share of terrorist attacks by target sector')
plt.show()

In [None]:
df.head(10)

**Creating a series for the total number of terrorism activities per country**

In [None]:
ncountry = df.eventid.groupby(df.country).count()
nother = ncountry.values.sum()
ucountry = ncountry.sort_values(ascending=False).head(10)
uother = ucountry.values.sum()
ucountry

In [None]:
ucountry['Others'] = nother-uother
ucountry
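
The "top N plus Others" pattern used here can be wrapped in a small helper, which also makes it easy to express each entry as a share of the total. This is a sketch on made-up data: the function name `top_n_with_others` and the `country` series are hypothetical.

```python
import pandas as pd

# Hypothetical country labels standing in for df['country'].
country = pd.Series(['Iraq'] * 5 + ['Pakistan'] * 3 + ['India'] * 2
                    + ['Peru', 'Chile'])

def top_n_with_others(values: pd.Series, n: int) -> pd.Series:
    """Counts of the n most frequent values plus one 'Others' bucket."""
    counts = values.value_counts()      # sorted from most to least frequent
    top = counts.head(n).copy()
    top['Others'] = counts.iloc[n:].sum()
    return top

top = top_n_with_others(country, n=3)

# Share of each entry in percent of all events.
share = (top / top.sum() * 100).round(1)
print(pd.DataFrame({'count': top, 'share_pct': share}))
```

The shares computed this way are the same percentages that a pie chart of `top` would display.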

**Bar plot for the top 10 countries with the maximum number of terrorism activities**

In [None]:
fig, ax = plt.subplots(figsize=(15,6))
sns.barplot(x=ucountry.index, y=ucountry.values)
plt.title('Top 10 countries with maximum number of terrorism activities from 1970 - 2017')
plt.xticks(rotation=90)
plt.ylabel('Number of Terrorism activities')
plt.show()

From the plot above, it is clear that Iraq has the largest number of terrorism activities, followed by Pakistan.

We can also conclude that the top 4 countries by number of terrorism activities are Asian countries.

**Pie Chart for the top 10 countries with maximum number of terrorism activities.**

In [None]:
labels = ucountry.index
explode = (0, 0, 0, 0,0,0,0,0,0,0,0.04)
fig1, ax1 = plt.subplots(figsize=(20,10))
ax1.pie(ucountry.values, explode=explode, labels=labels, autopct='%1.1f%%',
        shadow=True, startangle=200,textprops={'fontsize': 14})
# Equal aspect ratio ensures that pie is drawn as a circle
ax1.axis('equal')  
plt.tight_layout()
plt.title('Percentage of terrorist attacks by country')
plt.show()

**Importing Folium and its plugins**

In [None]:
import folium
from folium.plugins import FastMarkerCluster, Fullscreen, MiniMap, HeatMap, HeatMapWithTime

**Time series heatmap of the world from 1970 to 2017 showing the rise of terrorism activities**

In [None]:
year_list = []
for year in df['iyear'].sort_values().unique():
    data = df.query('iyear == @year')
    data = data.groupby(by=['latitude', 'longitude'], 
                        as_index=False).count().sort_values(by='eventid', ascending=False).iloc[:, :3]
    year_list.append(data.values.tolist())

m = folium.Map(
    location=[0, 0], 
    zoom_start=2, 
    width='100%',
    height='80%',
)

HeatMapWithTime(
    name='Terrorism Heatmap',
    data=year_list,
    radius=3,
    auto_play=True,
    index=list(df['iyear'].sort_values().unique())
).add_to(m)

m
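
The time-based heatmap is fed one list per year, where each inner entry is `[latitude, longitude, weight]` and the weight is the number of events at that point. The sketch below builds that structure from a made-up frame using only pandas, so the shape of the data can be inspected without rendering a map. The `toy` coordinates are hypothetical.

```python
import pandas as pd

# Hypothetical events: two at the same point in 2000, one elsewhere, one in 2001.
toy = pd.DataFrame({
    'iyear':     [2000, 2000, 2000, 2001],
    'latitude':  [28.6, 28.6, 19.1, 33.3],
    'longitude': [77.2, 77.2, 72.9, 44.4],
})

frames = []
for year in sorted(toy['iyear'].unique()):
    per_point = (toy[toy['iyear'] == year]
                 .groupby(['latitude', 'longitude'])
                 .size()                        # events per coordinate pair
                 .reset_index(name='count')
                 .sort_values('count', ascending=False))
    # One frame per year: a list of [lat, lon, count] triples.
    frames.append(per_point.values.tolist())

print(frames)
```

The number of frames has to match the length of the `index` list passed to the heatmap, which is why both are derived from the same sorted set of years.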

**Time series heatmap of India from 1970 to 2017 showing the rise of terrorism activities**

In [None]:
year_list = []
for year in df['iyear'].sort_values().unique():
    # Filter to India before grouping so the third column is the event count, not the country name
    data = df.query('iyear == @year and country == "India"')
    data = data.groupby(by=['latitude', 'longitude'], 
                        as_index=False).count().sort_values(by='eventid', ascending=False).iloc[:, :3]
    year_list.append(data.values.tolist())

m = folium.Map(
    location=[20.5937, 78.9629], 
    zoom_start=4,
    tiles='StamenToner',
    width='90%',
    height='90%',
    max_zoom=4,
    min_zoom=4,
)

HeatMapWithTime(
    name='Terrorism Heatmap in India',
    data=year_list,
    radius=5,
    auto_play=True,
    index=list(df['iyear'].sort_values().unique())
).add_to(m)

m

From the data and visualizations, we conclude that Asian and African countries had the largest number of terrorism activities, and that the numbers in these countries have risen steeply in recent years. The top 4 countries by total number of terrorism activities are Asian countries. The weapons most commonly used by terrorists in recent years are explosives and firearms. The main targets of terrorists are private citizens and property, businesses, the military, the police, and government.

Around 25% of terrorism activities took place in 4 countries (**Iraq, Pakistan, Afghanistan and India**).

The top 10 countries account for around 56% of all terrorism activities around the world.

**Thank You for the walkthrough to this notebook.**