***Lets Grow More***

**2) Intermediate level task: Sub task 01**

**Title:** Exploratory Data Analysis on Dataset - Terrorism 




**Introduction:**
As a security/defense analyst, the goal is to find the hot zones of terrorism. Here we apply exploratory data analysis, look for patterns and explanations in context, and present the conclusions in a dynamic, visual way.

We will use libraries such as **Folium**, **Seaborn**, **Matplotlib** and other useful tools to explore the data.

In [None]:
#importing required libraries
import numpy as np
import pandas as pd
import folium 
from folium.plugins import FastMarkerCluster, Fullscreen, MiniMap, HeatMap, HeatMapWithTime
from branca.colormap import LinearColormap
import os
import json
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set_style('darkgrid')
import re
import warnings
warnings.filterwarnings('ignore')

In [None]:
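# The GTD export is not UTF-8 encoded; ISO-8859-1 avoids a UnicodeDecodeError
data=pd.read_csv('globalterrorismdb_0718dist.csv', encoding='ISO-8859-1', low_memory=False)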


In [None]:
#Making copies of dataset for further use
data_ter=data.copy()#for plots
data_map=data.copy()#for maps

In [None]:
data.head(2)

In [None]:
data.tail(2)

In [None]:
# Columns in the dataframe
data.columns

In [None]:
# Selecting Wanted columns
usecols = [1, 5, 8, 10, 11, 12, 13, 14, 25, 26, 27, 29, 35, 58, 69, 71, 82, 98, 100, 101, 103, 104, 106]
renamecols = {
    'latitude': 'lat',
    'longitude': 'lon',
    'iyear': u'year',
    'country_txt': u'country',
    'region_txt': u'region',
    'provstate': u'state',
    'attacktype1_txt': u'attacktype',
    'targtype1_txt': u'targettype',
    'weaptype1_txt': u'weapontype',
    'nperps': u'nter',
    'nkill': u'nkilled',
    'nkillter': u'nkilledter',
    'nwound': u'nwounded',
    'nwoundte': u'nwoundedter',
    'propextent_txt': u'propertyextent'
}

In [None]:
data.rename(columns=renamecols, inplace=True)

In [None]:
# Removing unknown values in the coordinates
data = data[pd.notnull(data.lat)]
data = data[pd.notnull(data.lon)]
print("Unknown values in the coordinates were removed successfully")

In [None]:
# Unknowns in numeric columns
exclude_cols = ['year', 'lat', 'lon']
float_cols = [c for c in data.select_dtypes(include=[float]).columns.tolist() if c not in exclude_cols]

In [None]:
data[float_cols] = data[float_cols].fillna(0).astype(int)
data[float_cols] = data[float_cols].mask(data[float_cols] < 0, 0)

In [None]:
# Limit Long strings
data['weapontype'] = data['weapontype'].replace(u'Vehicle (not to include vehicle-borne explosives, i.e., car or truck bombs)', 'Vehicle')
data['propertyextent'] = data['propertyextent'].replace(u'Minor (likely < $1 million)', u'Minor (< $1 million)')
data['propertyextent'] = data['propertyextent'].replace(u'Major (likely > $1 million but < $1 billion)', u'Major (< $1 billion)')
data['propertyextent'] = data['propertyextent'].replace(u'Catastrophic (likely > $1 billion)', u'Catastrophic (> $1 billion)')

In [None]:
# Number of duplicate rows
data.duplicated().sum()

In [None]:
# Datatypes in the Dataframe
data.dtypes

In [None]:
# Summary of Dataset
data.info()

In [None]:
# Number of unique values in each column of the DataFrame
data.nunique()

In [None]:
# Years in the DataFrame
data["year"].unique()

In [None]:
# Count of each year in the Data Frame
data["year"].value_counts()

In [None]:
# Regions in the DataFrame
data['region'].unique()

In [None]:
# Number of attacks per group (gname) in the DataFrame
data['gname'].value_counts()

In [None]:
# Number of attacks per city in the DataFrame
data['city'].value_counts()

In [None]:
# Total count of Attack type in the DataFrame
data['attacktype'].value_counts()

In [None]:
# Total count of Target type in the DataFrame
data['targettype'].value_counts()

In [None]:
# Statistical summary of the data
data.describe()

# Countries and Terrorism

Let's change the scenery and see the effects of terrorism in specific countries. First of all, let's take a look at the main countries affected by terrorism.

For the plots we use a copy of the original dataset, `data_ter`.
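
A minimal sketch of why the copy matters, using a two-column toy frame rather than the real dataset (the names `original` and `plot_copy` are illustrative only): an in-place rename on one frame leaves an earlier `.copy()` untouched, which is why `data_ter` still carries the raw GTD column names at this point.

```python
import pandas as pd

# Toy stand-in for the raw dataset (illustrative values only).
original = pd.DataFrame({'iyear': [2017], 'latitude': [33.3]})

# Independent copy taken before any cleaning, like data_ter in the notebook.
plot_copy = original.copy()

# In-place rename on the original frame only.
original.rename(columns={'iyear': 'year', 'latitude': 'lat'}, inplace=True)

print(list(original.columns))   # renamed columns
print(list(plot_copy.columns))  # raw column names are preserved
```

This is also why the cells below rename the columns of `data_ter` again before plotting.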

In [None]:
data_ter.head()

In [None]:
data_ter.columns

In [None]:
# Renaming wanted columns
data_ter.rename(columns = {
    'latitude': 'lat',
    'longitude': 'lon',
    'iyear': 'year',
    'country_txt': 'country',
    'provstate': 'state',
    'attacktype1_txt': 'attacktype',
    'targtype1_txt': 'targettype',
    'weaptype1_txt': 'weapontype',
    'nperps': 'nter',
    'nkill': 'nkilled',
    'nkillter': 'nkilledter',
    'nwound': 'nwounded',
    'nwoundte': 'nwoundedter',
    'propextent_txt': 'propertyextent'
},inplace=True)

In [None]:
# Terrorist Attack Year VS Region
df_region=pd.crosstab(data_ter.year,data_ter.region_txt)
df_region.plot(color=sns.color_palette('Set2',12))
fig=plt.gcf()
plt.title("Terrorist Attack Year VS Region", fontsize=20)
fig.set_size_inches(15,8)
plt.show()

In [None]:
# Terrorist Attack Year VS Target Type
df_region=pd.crosstab(data_ter.year,data_ter.targettype)
df_region.plot(color=sns.color_palette('Set2',12))
fig=plt.gcf()
plt.title("Terrorist Attack Year VS Target Type", fontsize=20)
fig.set_size_inches(15,8)
plt.show()

In [None]:
# Wounded VS Year
d=data_ter.groupby(['year','region_txt'])['nwounded'].sum()
plot_df_terrorism = d.unstack('region_txt').loc[:]
plot_df_terrorism.index = pd.PeriodIndex(plot_df_terrorism.index.tolist(),freq='A')
plot_df_terrorism.plot(figsize=(15,8),color=sns.color_palette('Set2',12))
plt.title("Wounded Vs Year", fontsize=20)
plt.xlabel("Year")
plt.ylabel("Wounded")

As we have already seen in our first geographical plot, the highest concentration of recorded incidents is in the Middle East & North Africa. The region represents 27.8% of all records between 1970 and 2017.
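
A regional share like the one quoted above can be checked with a normalized value count. The sketch below uses a small toy frame in place of `data_ter`, so the printed percentages are illustrative and not the real figures.

```python
import pandas as pd

# Toy stand-in for data_ter; only the region column is needed here.
toy = pd.DataFrame({
    'region_txt': ['Middle East & North Africa', 'South Asia',
                   'Middle East & North Africa', 'South America'],
})

# Share of records per region, as a percentage rounded to one decimal.
region_share = (toy['region_txt'].value_counts(normalize=True) * 100).round(1)
print(region_share)
```

Running the same expression on `data_ter['region_txt']` would give the share for each region over the full period.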

In the next plot, we compare this historical data with the 2017 data, this time looking at the top 10 countries with the highest number of victims.

In [None]:

def format_spines(ax, right_border=True):
    ax.spines['bottom'].set_color('#CCCCCC')
    ax.spines['left'].set_color('#CCCCCC')
    ax.spines['top'].set_visible(False)
    if right_border:
        ax.spines['right'].set_color('#CCCCCC')
    else:
        ax.spines['right'].set_color('#FFFFFF')
    ax.patch.set_facecolor('#FFFFFF')
    
def count_plot(feature, df, colors='Blues_d', hue=False, ax=None, title=''):
    ncount = len(df)
    if hue != False:
        ax = sns.countplot(x=feature, data=df, palette=colors, hue=hue, ax=ax, 
                           order=df[feature].value_counts().index)
    else:
        ax = sns.countplot(x=feature, data=df, palette=colors, ax=ax,
                           order=df[feature].value_counts().index)

    
    ax2=ax.twinx()

    
    ax2.yaxis.tick_left()
    ax.yaxis.tick_right()

    
    ax.yaxis.set_label_position('right')
    ax2.yaxis.set_label_position('left')
    ax2.set_ylabel('Frequency [%]')
    frame1 = plt.gca()
    frame1.axes.get_yaxis().set_ticks([])

    
    format_spines(ax)
    format_spines(ax2)

    
    for p in ax.patches:
        x=p.get_bbox().get_points()[:,0]
        y=p.get_bbox().get_points()[1,1]
        ax.annotate('{:.1f}%'.format(100.*y/ncount), (x.mean(), y), 
                ha='center', va='bottom') 
    
    
    if not hue:
        ax.set_title(df[feature].describe().name + ' Counting plot', size=13, pad=15)
    else:
        ax.set_title(df[feature].describe().name + ' Counting plot by ' + hue, size=13, pad=15)  
    if title != '':
        ax.set_title(title)       
    plt.tight_layout()
    
def country_analysis(country_name, data, palette, colors_plot2, color_lineplot):
    
    country = data.query('country_txt == @country_name')
    if len(country) == 0:
        print('Country does not exist in the dataset')
        return 
    country_cities = country.groupby(by='city', as_index=False).count().sort_values('eventid', ascending=False).iloc[:5, :2]
    suicide_size = country['suicide'].sum() / len(country)
    labels = ['Suicide', 'Not Suicide']
    colors = colors_plot2
    # Sum only nkill: summing every column fails on text columns in recent pandas
    country_year = country.groupby(by='iyear', as_index=False)['nkill'].sum()
    country_weapon = country.groupby(by='weaptype1_txt', as_index=False).count().sort_values(by='eventid',ascending=False).iloc[:,:2]
    # Dashboard
    fig, axs = plt.subplots(nrows=2, ncols=2, figsize=(15, 10))
    
    # Plot 1 - Top 5 terrorism cities
    sns.barplot(x='eventid', y='city', data=country_cities, ci=None, palette=palette, ax=axs[0, 0])
    format_spines(axs[0, 0], right_border=False)
    axs[0, 0].set_title(f'Top 5 {country_name} Cities With Most Terrorism Occurrences')
    """for p in axs[0, 0].patches:
        width = p.get_width()
        axs[0, 0].text(width-290, p.get_y() + p.get_height() / 2. + 0.10, '{}'.format(int(width)), 
                ha="center", color='white')"""
    axs[0, 0].set_ylabel('City')
    axs[0, 0].set_xlabel('Incidents')
    
    # Plot 2 - Suicide Rate
    center_circle = plt.Circle((0,0), 0.75, color='white')
    axs[0, 1].pie((suicide_size, 1-suicide_size), labels=labels, colors=colors_plot2, autopct='%1.1f%%')
    axs[0, 1].add_artist(center_circle)
    format_spines(axs[0, 1], right_border=False)
    axs[0, 1].set_title(f'{country_name} Terrorism Suicide Rate')
    axs[0, 1].set_ylabel('')
    
    # Plot 3 - Victims through the years
    sns.lineplot(x='iyear', y='nkill', data=country_year, ax=axs[1, 0], color=color_lineplot)
    format_spines(axs[1, 0], right_border=False)
    axs[1, 0].set_xlim([1970, 2017])
    axs[1, 0].set_title(f'{country_name} Number of Victims Over Time')
    axs[1, 0].set_ylabel('Victims')
    
    # Plot 4 - Terrorism Weapons
    sns.barplot(x='weaptype1_txt', y='eventid', data=country_weapon, ci=None, palette=palette, ax=axs[1, 1])
    axs[1, 1].set_xticklabels(axs[1, 1].get_xticklabels(), rotation=90)
    axs[1, 1].set_xlabel('')
    axs[1, 1].set_ylabel('Count')
    format_spines(axs[1, 1], right_border=False)
    axs[1, 1].set_title(f'{country_name} Weapons Used in Attacks')
    
    plt.suptitle(f'Terrorism Analysis in {country_name} between 1970 and 2017', size=16)    
    plt.tight_layout()
    plt.subplots_adjust(top=0.90)
    plt.show()

In [None]:
fig, ax = plt.subplots(figsize=(12, 6))
count_plot('region_txt', data_ter, ax=ax, colors='rainbow_r')
ax.set_xticklabels(ax.get_xticklabels(), rotation=90)
ax.set_title('Distribution of Attacks per Region (1970-2017)', size=15)
plt.show()

In [None]:
# Sum only nkill per country; summing every column fails on text columns in recent pandas
country_victims = data_map.groupby(by='country_txt', as_index=False)['nkill'].sum().sort_values(by='nkill', ascending=False)
country_victims = country_victims.iloc[:10, :]

terr_data_2017 = data_map.query('iyear == 2017')
country_victims_2017 = terr_data_2017.groupby(by='country_txt', as_index=False)['nkill'].sum().sort_values(by='nkill', ascending=False)
country_victims_2017 = country_victims_2017.iloc[:10, :]
# Shorten long country names by value instead of by hard-coded row label
country_victims_2017['country_txt'] = country_victims_2017['country_txt'].replace({
    'Central African Republic': 'Central African Rep.',
    'Democratic Republic of the Congo': 'Dem. Rep. Congo'
})

fig, axs = plt.subplots(nrows=1, ncols=2, figsize=(15, 7))

sns.barplot(x='nkill', y='country_txt', data=country_victims, ci=None,
                 palette='twilight', ax=axs[0])
sns.barplot(x='nkill', y='country_txt', data=country_victims_2017, ci=None,
                 palette='twilight_r', ax=axs[1])

format_spines(axs[0], right_border=False)
format_spines(axs[1], right_border=False)
axs[0].set_title('Top 10 - Total Victims by Country (1970-2017)')
axs[1].set_title('Top 10 - Total Victims by Country (2017)')
axs[0].set_ylabel('')
axs[1].set_ylabel('')

for p in axs[0].patches:
    width = p.get_width()
    axs[0].text(width-4000, p.get_y() + p.get_height() / 2. + 0.10, '{}'.format(int(width)), 
            ha="center", color='white')

for p in axs[1].patches:
    width = p.get_width()
    axs[1].text(width-300, p.get_y() + p.get_height() / 2. + 0.10, '{}'.format(int(width)), 
            ha="center", color='white')

plt.show()

From the graph above we can see that Iraq and Afghanistan are the countries with the most victims in 2017 (and also over the whole period). Colombia, Peru and El Salvador appear in the historical data but not in the 2017 data, possibly because their conflicts belong to the past. Let's analyze some countries in more detail.
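
The comparison between the two rankings can also be made explicit with set operations. This is a sketch on toy data with the original GTD column names; `top_countries` is a hypothetical helper, not part of the notebook, and the numbers are made up.

```python
import pandas as pd

# Toy stand-in for data_map (illustrative values only).
toy = pd.DataFrame({
    'iyear': [1985, 1990, 2017, 2017, 2017, 2005],
    'country_txt': ['Colombia', 'Peru', 'Iraq', 'Afghanistan', 'Iraq', 'Iraq'],
    'nkill': [10, 8, 20, 15, 5, 30],
})

def top_countries(df, n=10):
    """Return the set of the n countries with the most recorded deaths."""
    totals = df.groupby('country_txt')['nkill'].sum()
    return set(totals.nlargest(n).index)

top_all = top_countries(toy, n=3)
top_2017 = top_countries(toy[toy['iyear'] == 2017], n=3)

print('In both rankings:', sorted(top_all & top_2017))
print('Historical only: ', sorted(top_all - top_2017))
```

With the real data, `top_all - top_2017` would list the countries that rank in the top 10 historically but not in 2017.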

In [None]:
country_analysis(country_name='Iraq', data=data_map, palette='Wistia_r', 
                 colors_plot2=['cyan', 'orange'], color_lineplot='cyan')

In [None]:
country_analysis(country_name='United States', data=data_map, palette='ocean', 
                 colors_plot2=['black', 'navy'], color_lineplot='navy')

In [None]:
country_analysis(country_name='Nigeria', data=data_map, palette='mako', 
                 colors_plot2=['aquamarine', 'indigo'], color_lineplot='indigo')

In [None]:
country_analysis(country_name='Colombia', data=data_map, palette='hot', 
                 colors_plot2=['crimson', 'gold'], color_lineplot='brown')

In [None]:
country_analysis(country_name='Egypt', data=data_map, palette='Pastel1',
                 colors_plot2=['lavender', 'pink'], color_lineplot='pink')

In [None]:
heat_data = data_map.groupby(by=['latitude', 'longitude'], 
                                 as_index=False).count().sort_values(by='eventid', ascending=False).iloc[:, :3]

m = folium.Map(
    location=[33.312805, 44.361488], 
    zoom_start=2.5, 
    tiles='CartoDB positron'  # 'Stamen Toner' is no longer bundled with recent folium releases
)

HeatMap(
    name='Heat Map',
    data=heat_data,
    radius=10,
    max_zoom=13
).add_to(m)

Fullscreen(
    position='topright',
    title='Expand me',
    title_cancel='Exit me',
    force_separate_button=True
).add_to(m)

m.save('terrorism_density.html')
m

In [None]:
year_list = []
for year in data_map['iyear'].sort_values().unique():
    # Filter to a single year first, then count incidents per coordinate pair
    year_df = data_map.query('iyear == @year')
    year_df = year_df.groupby(by=['latitude', 'longitude'], 
                        as_index=False).count().sort_values(by='eventid', ascending=False).iloc[:, :3]
    year_list.append(year_df.values.tolist())

m = folium.Map(
    location=[0, 0], 
    zoom_start=2.0, 
    tiles='CartoDB positron'  # 'Stamen Toner' is no longer bundled with recent folium releases
)

HeatMapWithTime(
    name='Terrorism Heatmap',
    data=year_list,
    radius=9,
    index=[str(y) for y in sorted(data_map['iyear'].unique())]  # plain strings as time labels
).add_to(m)

m

In [None]:
month_index = [
    'jan/2017',
    'feb/2017',
    'mar/2017',
    'apr/2017',
    'may/2017',
    'jun/2017',
    'jul/2017',
    'aug/2017',
    'sep/2017',
    'oct/2017',
    'nov/2017',
    'dec/2017'
]

# Restrict to 2017 once, then aggregate deaths per coordinate pair for each month
data_2017 = data_map.query('iyear == 2017')
month_list = []
for month in range(1, 13):  # one entry per label in month_index
    month_df = data_2017.query('imonth == @month')
    month_df = month_df.groupby(by=['latitude', 'longitude'], 
                                as_index=False)['nkill'].sum()
    month_list.append(month_df.values.tolist())

m = folium.Map(
    location=[0, 0], 
    zoom_start=1.5, 
    tiles='CartoDB positron'  # 'Stamen Toner' is no longer bundled with recent folium releases
)

HeatMapWithTime(
    name='Heat Map',
    data=month_list,
    radius=4,
    index=month_index
).add_to(m)

m

The most recent data we have is from 2017. The global heatmap above shows incidents across the months of 2017. Using the selection bar at the bottom, we can follow the concentration of terrorism from January to December 2017.
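
As a numeric companion to the animated map, a per-month table for 2017 can be built with a grouped aggregation. The sketch below assumes the original GTD column names (`iyear`, `imonth`, `nkill`) and uses toy values instead of `data_map`.

```python
import pandas as pd

# Toy stand-in for data_map (illustrative values only).
toy = pd.DataFrame({
    'iyear': [2017, 2017, 2017, 2016],
    'imonth': [1, 1, 5, 1],
    'nkill': [3, None, 7, 2],
})

# Incidents and deaths per month of 2017; missing nkill values are skipped by sum.
monthly = (toy[toy['iyear'] == 2017]
           .groupby('imonth')
           .agg(incidents=('nkill', 'size'), killed=('nkill', 'sum')))
print(monthly)
```

Here `size` counts every record, including those with an unknown death toll, while `sum` ignores the missing values.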

# Conclusion: Terrorism Around the World

Well, here we can see clearly that Iraq is the country with the highest number of incidents recorded. The map also shows tooltips with the name of the country, number of incidents and total of victims recorded. Looking at the map, we can also say that the Middle East and South Asia are the regions with the highest number of recorded attacks between 1970 and 2017.
**Details:**
<br>- Country with the highest number of terrorist attacks: **Iraq**  
<br>- Region with the highest number of terrorist attacks: **Middle East & North Africa**  
<br>- Highest number of people killed in a single terrorist attack: **1570 people**, in Iraq  
<br>- Year with the most attacks: **2014**  
<br>- Month with the most attacks: **5** (May)  
<br>- Group with the most attacks: **Taliban**  
<br>- Most common attack type: **Bombing/Explosion**
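
Summary figures of this kind can be derived with `value_counts` and `idxmax`. The following is a sketch under the assumption that the original GTD column names are used; the toy frame holds placeholder values, and `most_common` is a hypothetical helper rather than part of the notebook.

```python
import pandas as pd

# Toy stand-in for the GTD frame (placeholder values, original column names).
toy = pd.DataFrame({
    'iyear': [2014, 2014, 2013],
    'imonth': [5, 5, 7],
    'country_txt': ['Country A', 'Country A', 'Country B'],
    'gname': ['Group A', 'Group A', 'Group B'],
    'attacktype1_txt': ['Bombing/Explosion', 'Bombing/Explosion', 'Armed Assault'],
    'nkill': [12, 3, 5],
})

def most_common(series):
    """Return the most frequent value of a column."""
    return series.value_counts().idxmax()

summary = {
    'country': most_common(toy['country_txt']),
    'year': most_common(toy['iyear']),
    'month': most_common(toy['imonth']),
    'group': most_common(toy['gname']),
    'attack_type': most_common(toy['attacktype1_txt']),
}

# Row of the single deadliest attack.
deadliest = toy.loc[toy['nkill'].idxmax()]
summary['max_killed'] = int(deadliest['nkill'])
summary['max_killed_country'] = deadliest['country_txt']

for key, value in summary.items():
    print(f'{key}: {value}')
```

Applied to `data_map`, the same calls would reproduce each line of the list above from the data itself.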