In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
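# List the input directory without changing the working directory,
# so that outputs such as animation.gif are still written to /kaggle/working/
os.listdir("../input")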

## **Exploratory Data Analysis - Terrorism**


* In this notebook, "Exploratory Data Analysis" was performed on the "Global Terrorism" dataset.
### Problems
* Perform ‘Exploratory Data Analysis’ on dataset ‘Global Terrorism’
* As a security/defense analyst, try to find out the hot zone of terrorism.
* What security issues and insights can you derive from EDA?

# Author: Muhammet Varlı

## **1. The Story of the Dataset**
### **Information about some variables used in the Data Set.**
For detailed information about the data set: https://www.start.umd.edu/gtd/downloads/Codebook.pdf
* iyear: This field contains the year in which the incident occurred.
* imonth: This field contains the month in which the incident occurred.
* iday: This field contains the day on which the incident occurred.
* country_txt: This field identifies the country or location where the incident occurred (categorical).
* region_txt: This field identifies the region in which the incident occurred (categorical).
* provstate: This variable records the name (at the time of event) of the 1st order subnational administrative region in which the event occurs.
* city: This field contains the name of the city, village, or town in which the incident occurred
* latitude: This field records the latitude (based on WGS1984 standards) of the city in which the event occurred.
* longitude: This field records the longitude (based on WGS1984 standards) of the city in which the event occurred.
* attacktype1_txt: The general method of attack. Categories: Assassination, Armed Assault, Bombing/Explosion, Hijacking, Hostage Taking (Barricade Incident), Hostage Taking (Kidnapping), Facility/Infrastructure Attack, Unarmed Assault, Unknown.
* targtype1_txt: The target/victim type field captures the general type of target/victim. This variable has 22 categories.
* target1: This is the specific person, building, installation, etc., that was targeted and/or victimized and is a part of the entity named above.
* gname: This field contains the name of the group that carried out the attack.
* weaptype1_txt: The general type of the first weapon used in the incident. Up to four weapon types are recorded for each incident.
* nkill: This field stores the number of total confirmed fatalities for the incident.
* nwound: This field records the number of confirmed non-fatal injuries to both perpetrators and victims.

In [None]:
from warnings import filterwarnings
filterwarnings('ignore')

In [None]:
# conda install basemap

In [None]:
# Some Libraries Imported
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from mpl_toolkits.basemap import Basemap
from matplotlib import animation,rc
import io
import base64
from IPython.display import HTML, display

## **2. Data Read**

In [None]:
df = pd.read_csv('../input/global-terrorism/globalterrorismdb_0718dist.csv', encoding="latin-1")
pd.set_option('display.max_rows', df.shape[0]+1)
pd.set_option('display.max_columns', df.shape[1]+1)
# Let's have an overview of the data set
df.head()

* Some features are renamed for clarity.

In [None]:
df=df.rename(columns={"provstate": "State","region_txt": "Region","country_txt": "Country",
                      "iyear": "Year","imonth": "Month","iday": "Day",
                      "attacktype1_txt": "Attack_Type","nkill": "Killed",
                      "nwound": "Wounded","targtype1_txt": "Target_Type","weaptype1_txt": "Weap_Type",
                      "gname": "Group_Name","target1": "Target_Name"})

* Only the necessary features are selected.

In [None]:
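# .copy() makes the selection an independent dataframe, so the column assignments below are safe
df = df[['State', 'Region', 'city', 'latitude', 'longitude','Country',
         'Attack_Type','Year','Month','Day','Killed', 'Wounded', 'Target_Type',
         'Group_Name', 'Target_Name','Weap_Type']].copy()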

df['Wounded'] = df['Wounded'].fillna(0).astype(int)
df['Killed'] = df['Killed'].fillna(0).astype(int)
df['Affected']=df['Killed']+df['Wounded']
# Let's have an overview of the data set
df.head()

In [None]:
df.info()

In [None]:
df.describe().T

* The correlation matrix of the numeric features in the data set is shown below.

In [None]:
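# Correlation is only defined for numeric columns, so the text columns are excluded first
corrmat = df.select_dtypes(include='number').corr()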
f, ax = plt.subplots(figsize=(10, 10))
sns.heatmap(corrmat, vmax=1, square=True);
plt.show()

In [None]:
#Percentage of NAN Values 
per_Nan = [(c, df[c].isna().mean()*100) for c in df]
per_Nan = pd.DataFrame(per_Nan, columns=["column_name", "Percentage"])

In [None]:
per_Nan

* We observe, by region, the number of deaths, the number of injuries, and the number of affected people (the sum of deaths and injuries) due to terrorism.

In [None]:
# Death, Wounded and Affected by Region
number_of_affected = df[['Region','Killed','Wounded','Affected']]
number_of_affected = number_of_affected.groupby(by=['Region']).sum().reset_index().sort_values(by=['Affected'], ascending = False)
number_of_affected
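
* As a small, self-contained illustration of the aggregation above, the sketch below applies the same steps to a hypothetical toy table. The `toy` rows and the region labels "A", "B", "C" are made up for illustration and are not taken from the dataset. Missing casualty counts are treated as 0, `Affected` is the sum of `Killed` and `Wounded`, and regions are ranked by `Affected`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy incidents (not real data); NaN marks an unreported count
toy = pd.DataFrame({
    "Region": ["A", "A", "B", "B", "C"],
    "Killed": [1, np.nan, 5, 2, 0],
    "Wounded": [3, 2, np.nan, 1, 0],
})

# Same cleaning as in the notebook: unreported counts become 0
toy["Killed"] = toy["Killed"].fillna(0).astype(int)
toy["Wounded"] = toy["Wounded"].fillna(0).astype(int)
toy["Affected"] = toy["Killed"] + toy["Wounded"]

# Sum per region and rank by the number of affected people
by_region = (toy.groupby("Region")[["Killed", "Wounded", "Affected"]]
                .sum()
                .reset_index()
                .sort_values(by="Affected", ascending=False))
print(by_region)
```

* Because unreported counts are replaced with 0, the totals should be read as lower bounds rather than exact figures.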

* List the 20 countries most affected by terrorism.

In [None]:
# Death, Wounded and Affected by Country(top 20)
affected_country = df[['Country','Killed','Wounded','Affected']]
affected_country = affected_country.groupby(by=['Country']).sum().reset_index().sort_values(by=['Affected'], ascending = False)
affected_country[:20]

## **3. Data Visualization**

### Global Terror Attacks

* The number of terrorist attacks per year is shown.

In [None]:
plt.subplots(figsize=(15,6))
# Recent seaborn versions require the column to be passed by keyword (x=...)
sns.countplot(x='Year',data=df,palette='rocket_r',edgecolor=sns.color_palette('dark',7))
plt.xticks(rotation=90)
plt.title('Number Of Terrorist Activities Each Year')
plt.show()

* The locations of terrorist incidents around the world are shown on the world map with dots.
* The map includes all terrorist incidents that took place between 1970 and 2017.

In [None]:
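# Sorted so that each region keeps the same color on every run
regions = sorted(df.Region.unique())
# One distinct color per region (no repeated colors)
colors = ['yellow', 'red', 'lime','fuchsia', 'purple', 'green', 'orange', 'brown',\
          'aqua','navy', 'black', 'lightgreen']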


In [None]:
plt.figure(figsize=(15,8))
m = Basemap(projection='mill',llcrnrlat=-80,urcrnrlat=80, llcrnrlon=-180,urcrnrlon=180,lat_ts=20,resolution='c')
m.drawcoastlines()
m.drawcountries()
m.fillcontinents(color='white',lake_color='lightblue', zorder = 1)
m.drawmapboundary(fill_color='lightblue')

def pltpoints(region, color = None, label = None):
    x, y = m(list(df.longitude[df.Region == region].astype("float")),\
            (list(df.latitude[df.Region == region].astype("float"))))
    points = m.plot(x, y, "o", markersize = 4, color = color, label = label, alpha = .5)
    return(points)

for i, region in enumerate(regions):
    pltpoints(region, color = colors[i], label = region)  
    
plt.title("Global Terrorism (1970 - 2017)")
plt.legend(loc ='lower left', prop= {'size':11})
plt.show()    
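
* The map shows where incidents cluster, but the densest areas can also be found numerically. The sketch below is one possible approach, not the method used in this notebook: it bins coordinates into latitude/longitude grid cells and counts incidents per cell. The function name `hot_cells`, the `cell_deg` parameter, and the `toy` coordinates are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

def hot_cells(frame, cell_deg=1.0, top=3):
    """Count incidents per cell_deg x cell_deg grid cell and return the densest cells."""
    # Rows without coordinates cannot be assigned to a cell
    pts = frame.dropna(subset=["latitude", "longitude"])
    # Each cell is identified by its lower-left (south-west) corner
    cells = pd.DataFrame({
        "lat_cell": np.floor(pts["latitude"] / cell_deg) * cell_deg,
        "lon_cell": np.floor(pts["longitude"] / cell_deg) * cell_deg,
    })
    counts = cells.groupby(["lat_cell", "lon_cell"]).size()
    return counts.sort_values(ascending=False).head(top)

# Hypothetical coordinates (not real data)
toy = pd.DataFrame({
    "latitude": [33.3, 33.4, 33.9, 24.8, 24.9, np.nan, -12.0],
    "longitude": [44.4, 44.3, 44.1, 67.0, 67.1, 10.0, -77.0],
})
top_cells = hot_cells(toy)
print(top_cells)
```

* With the notebook's dataframe, the call would look like `hot_cells(df, cell_deg=1.0, top=10)`, assuming the `latitude` and `longitude` columns selected earlier.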

## **Animation Of Terrorist Activities**

* An animated representation of the terrorist attacks that took place around the world between 1970 and 2017.

In [None]:
fig = plt.figure(figsize=(15,8))

def animate(Year):
    # Reuse the figure's single axes instead of adding a new one for every frame
    ax = fig.gca()
    ax.clear()
    ax.set_title('Animation Of Terrorist Activities'+'\n'+'Year:' +str(Year),fontsize=20)
    m6 = Basemap(projection='mill',llcrnrlat=-80,urcrnrlat=80, llcrnrlon=-180,urcrnrlon=180,lat_ts=20,resolution='c',ax=ax)
    # Select the year's incidents once; rows without coordinates cannot be plotted
    year_df = df[df['Year']==Year].dropna(subset=['latitude','longitude'])
    x6,y6=m6(list(year_df.longitude),list(year_df.latitude))
    # Marker size is proportional to the number of people killed or wounded
    m6.scatter(x6, y6,s=[(Killed+Wounded)*0.3 for Killed,Wounded in zip(year_df.Killed,year_df.Wounded)],color = 'r')
    m6.drawcoastlines()
    m6.drawcountries()
    m6.fillcontinents(color='coral',lake_color='aqua', zorder = 1,alpha=0.4)
    m6.drawmapboundary(fill_color='aqua')
ani = animation.FuncAnimation(fig,animate,list(df.Year.unique()), interval = 1500)
ani.save('animation.gif', writer='pillow', fps=1)
plt.close(fig)
filename = 'animation.gif'
# Read the saved GIF as bytes so it can be embedded in the notebook output
video = io.open(filename, 'rb').read()
encoded = base64.b64encode(video)
HTML(data='''<img src="data:image/gif;base64,{0}" type="gif" />'''.format(encoded.decode('ascii')))

In [None]:
df['Killed'].sum()

* The chart below shows the number of people killed per year in each region. The Middle East & North Africa stands out as the region most affected by terrorist incidents.

In [None]:
pd.pivot_table(data=df, index=df.Year, columns='Region', values='Killed', aggfunc='sum')\
    .plot.line(figsize=(15,5), colormap='Dark2').legend(title=None)

* As the graph shows, Iraq is the country with the highest number of terrorist incidents.

In [None]:
top_country_10=df[df['Country'].isin(df['Country'].value_counts()[:10].index)]
pd.crosstab(top_country_10.Year,top_country_10.Country).plot(color=sns.color_palette('bright',10))
fig=plt.gcf()
fig.set_size_inches(18,6)
plt.show()

* Top Countries affected by Terror Attacks

In [None]:
plt.subplots(figsize=(15,6))
# Count once, then pass x and y by keyword (required by recent seaborn versions)
top_countries = df['Country'].value_counts()[:15]
sns.barplot(x=top_countries.index,y=top_countries.values,palette='inferno')
plt.title('Top Countries Affected')
plt.xlabel('Countries')
plt.ylabel('Count')
plt.xticks(rotation= 90)
plt.show()

* Attacks vs Killed

In [None]:
count_terror=df['Country'].value_counts()[:15].to_frame()
count_terror.columns=['Attacks']
count_kill=df.groupby('Country')['Killed'].sum().to_frame()
count_terror.merge(count_kill,left_index=True,right_index=True,how='left').plot.bar(width=0.9)
fig=plt.gcf()
fig.set_size_inches(18,6)
plt.show()
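
* The bar chart compares raw counts of attacks and deaths. A ratio can make the comparison easier to read. The sketch below is a minimal illustration on hypothetical toy data, not a result from the dataset: it counts attacks and sums deaths per country, then computes the number of people killed per attack. The `toy` rows, the country labels "X", "Y", "Z", and the column name `Killed_per_Attack` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy incidents (not real data): one row per attack
toy = pd.DataFrame({
    "Country": ["X", "X", "X", "Y", "Y", "Z"],
    "Killed": [0, 2, 4, 10, 0, 1],
})

# Named aggregation: number of rows (attacks) and total deaths per country
summary = toy.groupby("Country").agg(
    Attacks=("Killed", "size"),
    Killed=("Killed", "sum"),
)
summary["Killed_per_Attack"] = summary["Killed"] / summary["Attacks"]
summary = summary.sort_values("Killed_per_Attack", ascending=False)
print(summary)
```

* In this toy example, country "Y" has fewer attacks than "X" but a higher number of deaths per attack, which is the kind of difference the raw counts alone can hide.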

* Top Cities affected by Terror Attacks

In [None]:
plt.subplots(figsize=(15,6))
# [1:15] skips the most frequent label, which is expected to be 'Unknown'
top_cities = df['city'].value_counts()[1:15]
sns.barplot(x=top_cities.index,y=top_cities.values,palette='inferno')
plt.title('Top Cities Affected')
plt.xlabel('Cities')
plt.ylabel('Count')
plt.xticks(rotation= 90)
plt.show()

#### **Activity of Top Terrorist Groups**
* 'Unknown' is the most frequent group name in the data, so the chart shows the top 10 groups among attacks whose perpetrators are known.
* As the graph shows, terrorist activity surged after 2010, and the majority of these attacks were carried out by ISIL.

In [None]:
top_groups10=df[df['Group_Name'].isin(df['Group_Name'].value_counts()[1:11].index)]
pd.crosstab(top_groups10.Year,top_groups10.Group_Name).plot(color=sns.color_palette('Paired',10))
fig=plt.gcf()
fig.set_size_inches(18,6)
plt.show()
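
* The slice `[1:11]` above assumes that 'Unknown' is the single most frequent group name. A slightly more defensive alternative is to filter the label out by name before counting. The sketch below shows the idea on hypothetical toy data; the helper `top_known`, its parameters, and the group labels "G1", "G2", "G3" are made up for illustration.

```python
import pandas as pd

def top_known(series, n=10, unknown_label="Unknown"):
    """Return the n most frequent values, ignoring rows equal to unknown_label."""
    counts = series[series != unknown_label].value_counts()
    return counts.head(n)

# Hypothetical group names (not real data)
toy = pd.Series(["Unknown", "G1", "G2", "G1", "Unknown",
                 "G3", "G1", "G2", "Unknown", "Unknown"])
top2 = top_known(toy, n=2)
print(top2)
```

* With the notebook's dataframe, the call would look like `top_known(df['Group_Name'], n=10)`, and the resulting index could be passed to `isin` in the same way as above.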

* Top 15 most active terrorist groups.

In [None]:
# Skip 'Unknown' (the most frequent label) and keep the next 15 groups
top_groups = df['Group_Name'].value_counts()[1:16]
sns.barplot(x=top_groups.values,y=top_groups.index,palette='dark')
plt.xticks(rotation=90)
fig=plt.gcf()
fig.set_size_inches(10,8)
plt.title('Terrorist Groups with Highest Terror Attacks')
plt.show()

#### **Attack Type**

* The number of terrorist attacks worldwide by attack type.

In [None]:
plt.subplots(figsize=(15,6))
sns.countplot(x='Attack_Type',data=df,palette='inferno',order=df['Attack_Type'].value_counts().index)
plt.xticks(rotation=90)
plt.title('Attacking Methods by Terrorists')
plt.show()

#### **AttackType vs Region**

In [None]:
# Use one color per attack type so that no two types share a color
pd.crosstab(df.Region,df.Attack_Type).plot.barh(stacked=True,width=1,color=sns.color_palette('hls',df.Attack_Type.nunique()))
fig=plt.gcf()
fig.set_size_inches(12,8)
plt.show()


#### **Target Type** 

* The number of terrorist attacks by target type is shown in the bar chart below.

In [None]:
plt.subplots(figsize=(15,6))
sns.countplot(x='Target_Type',data=df,palette='inferno',order=df['Target_Type'].value_counts().index)
plt.xticks(rotation=90)
plt.title('Type of Target Attacked by Terrorists')
plt.show()

#### **Weapon Type** 

In [None]:
df["Weap_Type"].value_counts()

* The number of terrorist attacks by weapon type is shown in the bar chart below.

In [None]:
plt.subplots(figsize=(15,6))
sns.countplot(x='Weap_Type',data=df,palette='inferno',order=df['Weap_Type'].value_counts().index)
plt.xticks(rotation=90)
plt.title('Type of Weapon Used by Terrorists')
plt.show()

## **Results**

* Iraq is observed to be the hot zone of terrorism.
* The countries with the most terrorist attacks are listed as follows:
1. Iraq
2. Pakistan
3. Afghanistan 
4. India 
5. Colombia 
6. Philippines 
7. Peru 
8. El Salvador 
9. United Kingdom 
10. Turkey
* The city most affected by terrorist attacks is Baghdad (Iraq).
* The cities with the most terrorist attacks are listed as follows:
1. Baghdad 
2. Karachi 
3. Lima 
4. Mosul 
5. Belfast
* Terrorist groups with the most terrorist activities are listed as follows:
1. Taliban
2. Islamic State of Iraq and the Levant (ISIL) 
3. Shining Path (SL) 
4. Farabundo Marti National Liberation Front (FMLN) 
5. Al-Shabaab 
6. New People's Army (NPA) 
7. Irish Republican Army (IRA) 
8. Revolutionary Armed Forces of Colombia (FARC) 
9. Boko Haram 
10. Kurdistan Workers' Party (PKK)
* The most common types of terrorist attacks are listed as follows:
1. Bombing / Explosion
2. Armed Assault 
3. Assassination 
4. Hostage Taking (Kidnapping) 
5. Facility / Infrastructure Attack
* The most common target types of terrorist attacks are listed as follows:
1. Private Citizens & Property
2. Military 
3. Police 
4. Government (General) 
5. Business
* The most common types of weapons used in terrorist attacks are listed as follows:
1. Explosives 
2. Firearms 
3. Unknown 
4. Incendiary 
5. Melee
* Terrorist attacks peaked in the 2010s. Attacks have been decreasing in the most recent years of the data, albeit slowly.
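
* The trend in the last observation can also be checked numerically. The sketch below is a minimal illustration on hypothetical toy data, not a result from the dataset: it counts attacks per year, finds the peak year, and computes the year-over-year percentage change. The `toy` rows and the names `per_year`, `peak_year`, and `yoy_change` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy incidents (not real data): one row per attack
toy = pd.DataFrame({"Year": [2012, 2013, 2013, 2014, 2014, 2014, 2015, 2015, 2016]})

# Attacks per year, in chronological order
per_year = toy["Year"].value_counts().sort_index()

# Year with the most attacks
peak_year = per_year.idxmax()

# Percentage change relative to the previous year (NaN for the first year)
yoy_change = per_year.pct_change() * 100

print(per_year)
print("Peak year:", peak_year)
print(yoy_change.round(1))
```

* With the notebook's dataframe, `toy["Year"]` would be replaced by `df["Year"]`. Negative values of `yoy_change` after the peak year would indicate a decline.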
