# 1. Introduction

Terrorism. A word that most of us associate with terrible memories and experiences. Every day we are reminded, by the news or by first-hand experience, that this kind of violence seems to occur more often than ever before. This raises the question: has the number of terrorist attacks increased over the years?

So, with this in mind, I'm going to explore the data and see if any trends emerge. I'll mainly try to answer the following questions:
- Has the number of attacks increased over the years?
- Which regions are most affected with regard to:
    - Number of attacks
    - Casualties
    - Successful attacks
- Which type of attack has been the most common and killed most people over the years?

Terrorism can be a controversial subject, especially given the current media climate and all the talk about fake news. Therefore, I'll let the data speak for itself and be explicit whenever I deviate from this to share my own thoughts.

The dataset used in this project is the [Global Terrorism Database](https://www.kaggle.com/START-UMD/gtd), which contains more than 180,000 terrorist attacks worldwide between 1970 and 2017.

# 2. Data Exploration

The first step will be to import all the necessary libraries and files.

In [None]:
import pandas as pd
import numpy as np
import os
import io
import base64

import matplotlib.pyplot as plt
from plotly import tools
import plotly.graph_objs as go
import plotly
import plotly.offline as py
import cartopy.crs as ccrs

from IPython.display import HTML

py.init_notebook_mode(connected=True)
%matplotlib inline

In [None]:
df_raw = pd.read_csv('globalterrorismdb_0718dist.csv',encoding='ISO-8859-1', low_memory=False)
df_raw.head(1)

In [None]:
def display_all(df, rows=1000, columns=1000):
    """
    Displays specified/default number of rows and columns
    
    Args:
        df (DataFrame): DataFrame to display
        rows (int): Number of rows
        columns (int): Number of columns
    """
    with pd.option_context("display.max_rows", rows, "display.max_columns", columns):
        display(df)

In [None]:
display_all(df_raw.head(3))

In [None]:
rows = df_raw.shape[0]
columns = df_raw.shape[1]
print(f'Rows in dataset: {rows}')
print(f'Columns in dataset: {columns}')

The dataset contains 181,691 rows (attacks) and 135 columns (features). The next step is to remove the features that don't add value to this analysis and to rename the remaining ones. I'll also create a new feature, "Casualties", which is the sum of nkill (killed) and nwound (injured).

After working on the dataset for a while, I discovered that it appears to contain duplicated data, including exact duplicates. A possible explanation has to do with how the data is collected: if the sources are in agreement, there is a risk that exact duplicates make it into the dataset by mistake.

In [None]:
# compare rows on every column except eventid (the first column, unique for each row)
dedup_columns = list(df_raw.columns)[1:]
df_raw = df_raw.drop_duplicates(subset=dedup_columns)
new_rows = df_raw.shape[0]
print(f'Rows in dataset after exact duplicates removed: {new_rows}')
print(f'Rows removed: {rows - new_rows}')

In [None]:
df_raw.rename(columns={
    'iyear': 'Year',
    'imonth': 'Month',
    'iday': 'Day',
    'country_txt': 'Country',
    'city': 'City',
    'region_txt': 'Region',
    'latitude': 'Latitude',
    'longitude': 'Longitude',
    'attacktype1_txt': 'AttackType',
    'target1': 'Target',
    'targtype1_txt': 'TargetType',
    'nkill': 'Killed',
    'nwound': 'Injured',
    'success': 'Success',
    'gname': 'Group',   
}, inplace=True)
df_raw['Casualties'] = df_raw['Killed'] + df_raw['Injured']

In [None]:
df = df_raw[['Year', 'Month', 'Day', 'Country', 'City', 'Region', 'Latitude', 'Longitude', 'AttackType', 'Target', 'TargetType', 'Killed', 'Injured', 'Success', 'Casualties', 'Group']]

In [None]:
df.head(3)

Those steps made the dataset less noisy and easier to work with. This is also a good point to start some basic analysis. Let's assess how many missing values the dataset contains and which kinds of features it has.

## 2.1 Basic Analysis

In [None]:
# select_dtypes is the public API for picking columns by dtype
num_cols = df.select_dtypes(include='number').columns
cat_cols = df.select_dtypes(include=['object']).columns

In [None]:
num_cols

In [None]:
cat_cols

In [None]:
df.describe()

In [None]:
df.info()

In [None]:
df.isnull().sum().sort_values()

Most of the features that are important to this investigation have no or only a few missing values. One observation that could be a problem further down the road is if an attack has missing values in both the city and longitude/latitude columns. If this is the case, it will be hard to pinpoint the exact location of the attack. However, if only one of those is missing it should be reasonably straightforward to find the location and then update the attack with the correct value.
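
To get a feel for how many attacks fall into that hard-to-locate group, a quick check could look like the sketch below. It runs on a tiny made-up DataFrame that only borrows the column names from `df`; the helper name `count_unlocatable` is my own and not part of pandas or the dataset.

```python
import numpy as np
import pandas as pd

def count_unlocatable(frame):
    """Count rows that have no city and no complete coordinate pair."""
    no_city = frame['City'].isnull()
    no_coords = frame['Latitude'].isnull() | frame['Longitude'].isnull()
    return int((no_city & no_coords).sum())

# toy data for illustration only, not taken from the GTD
toy = pd.DataFrame({
    'City': ['Baghdad', None, None, 'Lima'],
    'Latitude': [33.3, np.nan, 12.1, np.nan],
    'Longitude': [44.4, np.nan, -86.2, -77.0],
})

print(count_unlocatable(toy))
```

Only the second toy row counts here: it has neither a city nor coordinates, while the other rows can still be located from one of the two.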

# 3. Data Preparation

As noted earlier, most of the columns that are important to this analysis have zero or only a few missing values.

To plot each attack on a map, it's necessary to have its longitude and latitude, but some attacks have missing values in those columns. I had planned to use the Google Maps Geocoding API to retrieve the coordinates for each row that has a value in the "City" column. I won't take this approach, since Google has moved from a free usage limit to pay-as-you-go pricing. As a result, those attacks will simply not be plotted on the map.

Because this analysis will not make use of any machine learning model, no missing data will be imputed. The fact that some attacks have missing values for killed/injured should not make much difference to the totals and averages, because pandas skips missing values in sum() and mean() by default. Note, however, that "Casualties" is missing whenever either value is missing, since adding a missing value yields a missing value. If I were to use a machine learning model, I would start by trying to figure out why the data is missing. Are the values missing at random, or do the missing values convey information about the attack? After that, I would either remove the rows with missing values or impute the mean value.
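
The way pandas treats missing values is easy to check on a small example. The sketch below uses made-up numbers with the same column names as the dataset. It shows that `sum()` and `mean()` skip missing values, while a plain `Killed + Injured` (the way "Casualties" is built above) turns into NaN as soon as one side is missing. The `add(..., fill_value=0)` variant is only shown as a possible alternative, not something this analysis relies on.

```python
import numpy as np
import pandas as pd

# toy data for illustration only
toy = pd.DataFrame({
    'Killed': [2.0, np.nan, 1.0],
    'Injured': [3.0, 4.0, np.nan],
})

# sum() and mean() skip missing values by default (skipna=True)
total_killed = toy['Killed'].sum()    # 2 + 1
mean_killed = toy['Killed'].mean()    # (2 + 1) / 2

# element-wise addition propagates NaN: a row with either value missing becomes NaN
casualties_plain = toy['Killed'] + toy['Injured']

# add(..., fill_value=0) treats a single missing side as 0;
# the result is NaN only when both sides are missing
casualties_filled = toy['Killed'].add(toy['Injured'], fill_value=0)

print(total_killed, mean_killed)
print(casualties_plain.tolist())
print(casualties_filled.tolist())
```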

No encoding/creating dummy variables will be applied to the categorical variables in the dataset (same argument as above).

# 4. Analysis

This analysis will try to answer my three main questions as well as any sub-questions. First up is the question of whether the number of attacks has increased over the years. After that comes an investigation at the regional level, and lastly, I'll try to get an understanding of the most common attack types.

## 4.1 Attacks on global level

In [None]:
tickvals = np.arange(df['Year'].min(), df['Year'].max() + 1)

data =  [go.Bar(
        y = df['Year'].value_counts(sort=False),
        x = df['Year'].value_counts(sort=False).index,
        marker = dict(
            color='rgba(222,45,38,0.8)'
        )
)]

layout = go.Layout(
    title="Number of Terrorist Attacks by Year",
    xaxis=dict(
        title='Year',
        tickvals=tickvals
    ),
    yaxis=dict(
        title='Number of Attacks'
    )
)


figure = go.Figure(data=data, layout=layout)
py.iplot(figure)

No data is available for the year 1993. According to the GTD documentation, the records for that year were lost before the database was compiled. More information about this can be found [here](https://www.start.umd.edu/gtd/faq/). From here on, I'll exclude 1993 from the visuals and analysis.

By looking at the plot, one might come to the shocking conclusion that there has been a drastic increase in the number of attacks in the last 5-6 years. It's important to note that this could be due to the change that was made in 2012 to the data collection methodology.
In this [article](https://www.start.umd.edu/news/discussion-point-benefits-and-drawbacks-methodological-advancements-data-collection-and-coding), Michael Jensen writes: _"While there is no simple answer to this question, what is certain is that by the start of the 2012 collection effort, the staff working on the GTD had become better than ever at identifying terrorist attacks, regardless of where they happened to occur."_ Later in the same article, he writes: _"With that said, the GTD team believes that some portion of the observable increase in terrorist activity since 2011 is the result of new advancements in collection methodology."_

This suggests that there may well be a connection between the new data collection methodology and the increase in reported attacks. Therefore, the results shown from here on should be interpreted with care.

In [None]:
# list of years between 1970 and 2017, excluding 1993 (no data)
years = [x for x in range(1970, 2018) if x != 1993]

In [None]:
avg_killed = df.groupby(['Year'])['Killed'].mean()
avg_injured = df.groupby(['Year'])['Injured'].mean()

trace0 = go.Scatter(
    x=years,
    y=avg_killed,
    name='Killed',
    marker=dict(
        color= 'rgba(222,45,38,0.8)'
    ),
    line=dict(),
    opacity = 0.8
)

trace1 = go.Scatter(
    x=years,
    y=avg_injured,
    name='Injured',
    line=dict(),
    marker=dict(
        color='rgb(49,130,189)'
    ),
    opacity = 0.8
)

layout = dict(
    title = 'Average Killed/Injured per Attack by Year',
    xaxis = dict(
        title="Year"
    ),
    yaxis = dict(
        title="Count"
    )
)

data = [trace0, trace1]

fig = dict(data=data, layout=layout)
py.iplot(fig)

In [None]:
# yearly totals (not averages), so the variables are named accordingly
total_killed = df.groupby(['Year'])['Killed'].sum()
total_injured = df.groupby(['Year'])['Injured'].sum()

trace0 = go.Scatter(
    x=years,
    y=total_killed,
    name='Killed',
    marker=dict(
        color= 'rgba(222,45,38,0.8)'
    ),
    line=dict(),
    opacity = 0.8
)

trace1 = go.Scatter(
    x=years,
    y=total_injured,
    name='Injured',
    line=dict(),
    marker=dict(
        color='rgb(49,130,189)'
    ),
    opacity = 0.8
)

layout = dict(
    title = 'Total Killed/Injured by Year',
    xaxis = dict(
        title="Year"
    ),
    yaxis = dict(
        title="Count"
    )
)

data = [trace0, trace1]

fig = dict(data=data, layout=layout)
py.iplot(fig)

When looking at the latter plot, an expected trend emerges: a steady increase in the number of people killed and injured between 2003 and 2015, followed by a dip. This is probably related to the rise in attacks observed earlier for the same period. In only a few years (the mid-1980s, 1997 & 2014) were more people killed than injured.

The big spike in 2001 is reasonably explained by the September 11 attacks.

Between 1995 and 2007, there appear to have been a few attacks with a much larger number of people killed/injured than usual, hence the higher averages. Since then, the situation seems to have stabilized.
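
A simple way to see how much a few very large attacks pull the yearly average is to put the median next to the mean. The sketch below does this on made-up numbers (one extreme attack in the second year); it only illustrates the effect and is not a result from the GTD.

```python
import pandas as pd

# toy data for illustration only: one extreme attack in 2001
toy = pd.DataFrame({
    'Year':   [2000, 2000, 2000, 2001, 2001, 2001],
    'Killed': [1,    0,    2,    1,    0,    1500],
})

# the mean reacts strongly to the single large value, the median does not
summary = toy.groupby('Year')['Killed'].agg(['mean', 'median', 'max'])
print(summary)
```

If the mean and the median of a year are far apart, that year's average is dominated by a small number of attacks rather than by a general shift.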

## 4.2 Attacks on a regional level

Let's investigate each region.

After some basic plots, I'll try to combine the insights gained so far to create a video that shows the attacks on a world map. I think that this will illustrate the evolution throughout the years in a good way.

In [None]:
def plot_attacks_per_location(location, top=None):
    """
    Plots total number of terrorist attacks per location.
    
    Args:
        location (string): Name of the column (region/country/city)
        top (int): Default None. Number of locations to show.
    """
    # value_counts() sorts descending; slicing with top=None keeps every location
    counts = df[location].value_counts().iloc[:top]

    data =  [go.Bar(
            y = counts,
            x = counts.index,
            marker=dict(
                color='rgba(222,45,38,0.8)')
    )]

    layout = go.Layout(
        title=f'Number of Terrorist Attacks per {location} (1970 - 2017)',
        xaxis=dict(
            title=location,
        ),
        yaxis=dict(
            title='Number of Attacks'
        )
    )

    figure = go.Figure(data=data, layout=layout)
    py.iplot(figure)

In [None]:
plot_attacks_per_location('Region')

In [None]:
def attacks_region(region):
    """
    Returns the total amount of attacks for specified region
    
    Args:
        region (string): Region for which we want the total number of attacks
    Returns:
        attacks (int): Total number of attacks for region
    """
    attacks = len(df[df['Region'] == region])
    return attacks
    

In [None]:
# per-year, per-region summary computed in a single groupby pass
df_temp = (
    df.groupby(['Year', 'Region'])
      .agg(
          NumberOfAttacks=('Success', 'size'),
          TotalCasualties=('Casualties', 'sum'),
          # Success is a 0/1 flag, so its sum is the number of successful attacks
          Successful=('Success', 'sum'),
          Failed=('Success', lambda s: (s == 0).sum()),
      )
      .reset_index()
)

In [None]:
total_attacks = df.shape[0]

middle_east_north_africa_attacks= attacks_region('Middle East & North Africa')
south_asia_attacks = attacks_region('South Asia')
central_asia_attacks = attacks_region('Central Asia')
australasia_oceania_attacks = attacks_region('Australasia & Oceania')

print('% of total attacks in Middle East & North Africa: {:0.2f} %'.format((middle_east_north_africa_attacks/total_attacks) * 100))
print('% of total attacks in South Asia: {:0.2f} %'.format((south_asia_attacks/total_attacks) * 100))
print('% of total attacks in Central Asia: {:0.2f} %'.format((central_asia_attacks/total_attacks) * 100))
print('% of total attacks in Australasia & Oceania: {:0.2f} %'.format((australasia_oceania_attacks/total_attacks) * 100))

It's quite clear which regions have endured the most attacks over the years. The Middle East & North Africa, together with South Asia, account for over 52 % of all attacks between 1970 and 2017. Compared with the regions that have had the fewest attacks, the skew becomes obvious.
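
The same kind of share can be computed for every region at once with `value_counts(normalize=True)`. The sketch below uses a tiny made-up `Region` column, so the percentages are illustrative only.

```python
import pandas as pd

# toy data for illustration only
toy = pd.DataFrame({
    'Region': ['South Asia', 'South Asia', 'Central Asia', 'South Asia'],
})

# normalize=True returns fractions of the total instead of raw counts
share = toy['Region'].value_counts(normalize=True) * 100
print(share.round(2))
```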

Let's investigate the number of attacks per region on a year to year basis.

In [None]:
traces= []
regions = list(df['Region'].unique())

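for region in regions:
    df_region = df_temp[df_temp['Region'] == region]
    trace = go.Scatter(
        # use the region's own years; not every region has attacks in every year
        x=df_region['Year'],
        y=df_region['NumberOfAttacks'],
        name=region,
        line=dict(),
        opacity=0.8
    )
    traces.append(trace)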

data = traces
layout = dict(
    title = 'Attacks within each Region by Year',
    xaxis = dict(
        title="Year"
    ),
    yaxis = dict(
        title="Number of Attacks"
    )
)
fig = dict(data=data, layout=layout)
py.iplot(fig)

This plot shows the same trend as above. The question is whether the increase is due to more sophisticated data collection methods and communication or to an actual increase in the number of attacks.

An observation from this plot is that the two regions with the highest number of attacks seem to be the only regions to have had a substantial increase in attacks around 2003. A deeper dive into this would be interesting, but it is beyond the scope of this analysis. Another interesting phenomenon, which all regions share, is the noticeable drop in attacks in 1998.
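
One thing to watch when plotting per-region series against a shared list of years is that not every region necessarily has attacks in every year, so the series can have different lengths. A minimal sketch of building an aligned Year x Region table, with 0 for years without attacks, could look like this. The data and the name `all_years` are made up for illustration.

```python
import pandas as pd

# toy data for illustration only; region 'B' has no attacks in 1971
toy = pd.DataFrame({
    'Year':   [1970, 1970, 1971, 1972, 1972],
    'Region': ['A',  'B',  'A',  'A',  'B'],
})

all_years = range(1970, 1973)

counts = (
    toy.groupby(['Year', 'Region']).size()
       .unstack(fill_value=0)             # one column per region, 0 where no attacks
       .reindex(all_years, fill_value=0)  # add years in which no region had attacks
)
print(counts)
```

With the real data, a year without any records at all (such as 1993) would show up as 0 rather than as missing, so it should be left out of `all_years`.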

The number of attacks is one metric, but it could also be of value to look at the number of successful/failed attacks and the number of people killed and injured.

In [None]:
# value_counts() sorts by count (descending), so its index is the sorted list of regions
# (Series.iteritems() was removed in pandas 2.0)
regions_sorted = list(df['Region'].value_counts().index)

In [None]:
# these lists will be used inside plots down below
number_killed_regions = []
number_injured_regions = []
success_regions = []
failed_regions = []

for region in regions_sorted:
    killed = df[df['Region'] == region]['Killed'].sum()
    injured = df[df['Region'] == region]['Injured'].sum()
    successful = len(df[(df['Region'] == region) & (df['Success'] == 1)])
    failed = len(df[(df['Region'] == region) & (df['Success'] == 0)])

    
    number_killed_regions.append(killed)
    number_injured_regions.append(injured)
    success_regions.append(successful)
    failed_regions.append(failed)

In [None]:
def align_arrays(regions, feat1, feat2):
    """
    Takes three lists and sorts them all based on the values of feat1 in descending order
    
    Args:
        regions (list): List with region names
        feat1 (list): List with integers that we want to sort by descending
        feat2 (list): List with integers
    
    Returns:
        regions_sorted (list): Regions sorted based on feat1_sorted
        feat1_sorted (list): feat1 sorted by descending
        feat2_sorted (list): feat2 sorted based on feat1_sorted
    """
    # sort index positions by feat1 (descending); this also handles repeated
    # values in feat1, which a lookup with feat1.index(val) would not
    order = sorted(range(len(feat1)), key=lambda i: feat1[i], reverse=True)
    regions_sorted = [regions[i] for i in order]
    feat1_sorted = [feat1[i] for i in order]
    feat2_sorted = [feat2[i] for i in order]
    
    return regions_sorted, feat1_sorted, feat2_sorted


In [None]:
def plot_two_bars(names, yaxis, xaxis, colors=None):
    """
    Plots grouped bar chart for y values and x.
    
    Args:
        names (list): List of names, one per bar series
        yaxis (list): List with one list of integers per bar series
        xaxis (list): List containing strings for xaxis
        colors (list): List of colors to be used. Defaults to red and grey.
    """
    traces = []
    # only fall back to the default colors when none are passed in
    if colors is None:
        colors = ['rgba(222,45,38,0.8)', 'rgb(204,204,204)']
    
    for name, y, color in zip(names, yaxis, colors):
        trace = go.Bar(
            x=xaxis,
            y=y,
            name=name,
            marker=dict(
                color=color
            )
        )
        traces.append(trace)

    data = traces
    layout = go.Layout(
        title=f'{names[0]} and {names[1]} by Region',
        xaxis=go.layout.XAxis(
            automargin=True,
            tickangle=-50,
            title='Region',
        ),
        yaxis=dict(
            title="Count"
        ),
        barmode='group',
    )

    fig = go.Figure(data=data, layout=layout)
    py.iplot(fig)

In [None]:
reg, killed, injured = align_arrays(regions_sorted, number_killed_regions, number_injured_regions)

plot_two_bars(['Killed', 'Injured'], [killed, injured], reg)

The plot above is sorted by the total number of people killed. Here we see a slight shift: regions such as Sub-Saharan Africa and Central America & Caribbean look to have endured attacks that are more deadly, with more casualties in proportion to the number of attacks.

In [None]:
plot_two_bars(['Successful', 'Failed'], [success_regions, failed_regions], regions_sorted)

Sadly, a large proportion of the attacks carried out over the years appear to have been successful.

In [None]:
num_attacks = df.groupby(['Year']).size()
num_success = df.groupby(['Year'])['Success'].sum()

trace = go.Scatter(
    x=years,
    y=np.divide(num_success, num_attacks),
    name='Proportions',
    line=dict(),
    opacity = 0.8
)

layout = dict(
    title = 'Proportions of Successful Attacks by Year',
    xaxis = dict(
        title="Year"
    ),
    yaxis = dict(
        title="Proportion"
    )
)

data = [trace]

fig = dict(data=data, layout=layout)
py.iplot(fig)

Finally, a positive sign. From 2007 onward, we observe a steady decline in the number of successful attacks in proportion to the total number of attacks.
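
Since `Success` is a 0/1 flag, the proportion of successful attacks per year is simply the yearly mean of that column. The sketch below shows this on made-up data; it is an equivalent shortcut to dividing the yearly sum by the yearly count, not a different metric.

```python
import pandas as pd

# toy data for illustration only
toy = pd.DataFrame({
    'Year':    [2006, 2006, 2007, 2007, 2007, 2007],
    'Success': [1,    1,    1,    0,    1,    0],
})

# for a 0/1 column, the mean is the share of ones
success_rate = toy.groupby('Year')['Success'].mean()

# same result as sum / size, the approach used for the plot above
by_division = toy.groupby('Year')['Success'].sum() / toy.groupby('Year').size()
print(success_rate)
```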

In [None]:
# set path to location where world map file is located
path = os.getcwd()
# os.path.join avoids the invalid '\B' escape and works on every operating system
os.environ["CARTOPY_USER_BACKGROUNDS"] = os.path.join(path, 'BG')

In [None]:
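def create_map(date, data, ax=None, resolution='low'):
    """
    Draws one map frame with the attacks for a given year.
    
    Args:
        date (int): Year to plot
        data (DataFrame): DataFrame with Year, Longitude, Latitude and Killed columns
        ax (GeoAxes): Default None. Axes to draw on; created if not given
        resolution (string): Resolution key of the background image
    Returns:
        ax (GeoAxes): Axes with the map drawn
    """
    if ax is None: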
        fig = plt.figure(figsize=(19.2, 10.8))
        ax = plt.axes(projection=ccrs.Mercator(min_latitude=-65,
                                               max_latitude=70))
    
    ax.background_img(name='BM', resolution=resolution)
    ax.set_extent([-170, 179, -65, 70], crs=ccrs.PlateCarree())
    
    attacks = data[data['Year'] == date]
    attacks = attacks.groupby(['Longitude', 'Latitude'])['Killed'].agg('sum').rename('Killed').reset_index()
    
    # marker colors: 'High' for more than 10 people killed at a location, 'Low' otherwise
    colors = {'High': '#f95c3c',
              'Low': '#ecc81a'}

    for idx, row in attacks.iterrows():
        longs = row['Longitude']
        lats = row['Latitude']
        sizes = row['Killed'] ** 0.555 * 8
        if row['Killed'] > 10:
            color = colors['High']
        else:
            color = colors['Low']

        ax.scatter(longs, lats, s=sizes, color=color, alpha=0.8, transform=ccrs.PlateCarree())
    
    fontsize = 28
    # Positions for the date and counter
    date_x = -53
    date_y = -50
    date_spacing = 90
    name_x = -70
    name_y = -60      
    name_spacing = {'High': 55,
                    'Low': 1.9*55}
    
    ax.text(date_x, date_y, 
            "YEAR: ", 
            color='white',
            fontsize=fontsize,
            transform=ccrs.PlateCarree()) 
    
    ax.text(date_x + 23, date_y, 
            f"{date}", 
            color='white',
            fontsize=fontsize*1.3,
            transform=ccrs.PlateCarree())

    ax.text(date_x + date_spacing, date_y, 
            "KILLED:", color='white',
            fontsize=fontsize,
            transform=ccrs.PlateCarree())
    ax.text(date_x + date_spacing*1.33, date_y, 
            f"{data[data['Year'] == date]['Killed'].sum()}",
            color='white', ha='left',
            fontsize=fontsize*1.3,
            transform=ccrs.PlateCarree())
    for val in ['High', 'Low']:
        ax.text(name_x + name_spacing[val], 
                name_y, 
                val.upper(), ha='center',
                fontsize=fontsize*1.1,
                color=colors[val],
                transform=ccrs.PlateCarree())
    
    return ax

In [None]:
fig = plt.figure(figsize=(19.2, 10.8))
ax = plt.axes(projection=ccrs.Mercator(min_latitude=-65,
                                       max_latitude=70))

start_date = 2014
end_date = 2017

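# make sure the output directory exists before saving frames
os.makedirs('frames', exist_ok=True)

for ii, date in enumerate(range(start_date, end_date + 1)):
    if date == 1993:
        continue
    ax = create_map(date, df, ax=ax, resolution='high')
    # offset so the numbering continues from earlier frames (44 = 2014 - 1970)
    num = 44 + ii
    fig.tight_layout(pad=-0.5)
    # the frameon argument of savefig is no longer accepted by newer Matplotlib versions
    fig.savefig(f"frames/frame_{num:04d}.png", dpi=300, facecolor='black')
    ax.clear()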

In [None]:
# read the video as bytes; the with-block closes the file afterwards
with open('attacks3.mp4', 'rb') as f:
    video = f.read()
encoded = base64.b64encode(video)
HTML(data='''<video alt="test" controls>
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii')))

The points on the map represent the locations of terrorist attacks in which at least one person was killed. The number of people killed at a location determines the size and color of its point.
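
The sizing and coloring rule can be pulled out into a small function to make it easier to reason about. The sketch below mirrors the expressions used in `create_map` (size `Killed ** 0.555 * 8`, threshold of 10 for the color); the function name `marker_style` is my own and only there for illustration.

```python
def marker_style(killed, threshold=10):
    """Return (size, color) for a location, mirroring the rules used in create_map."""
    # an exponent below 1 makes the marker grow more slowly than the death toll
    size = killed ** 0.555 * 8
    color = '#f95c3c' if killed > threshold else '#ecc81a'
    return size, color

for killed in [0, 1, 10, 100]:
    size, color = marker_style(killed)
    print(killed, round(size, 1), color)
```

A location with zero people killed gets a size of 0 and is therefore not visible, which is why only attacks with fatalities show up on the map.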

As with the other plots in this analysis, there seems to be a trend pointing to an increase in attacks as well as in the number of people affected. This is most evident in the Middle East, Sub-Saharan Africa, and South Asia.

Large areas around the globe (Russia, Greenland, Canada & Australia) look to have been affected by only a few attacks over the years. What these areas have in common is that they are sparsely populated.

## 4.3 Different Attack types

In [None]:
# note: 'count' only counts attacks where Killed is not missing
df_attack_type = df.groupby('AttackType')['Killed'].agg(['sum', 'count']).reset_index().rename(columns={'sum': 'Killed', 'count': 'Count'})

In [None]:
df_attack_type_count_sort = df_attack_type.sort_values('Count', ascending=False)

trace = go.Bar(
    x=df_attack_type_count_sort['AttackType'],
    y=df_attack_type_count_sort['Count'],
    name='Type',
    marker=dict(
        color= 'rgba(222,45,38,0.8)'
    ),
    opacity = 0.8
)

layout = dict(
    title = 'Most common attack types',
    xaxis=go.layout.XAxis(
        automargin=True,
        tickangle=-20,
        title='Type',
    ),
    yaxis = dict(
        title="Number of attacks"
    )
)

data = [trace]

fig = dict(data=data, layout=layout)
py.iplot(fig)

In [None]:
df_attack_type['AvgKilled'] = df_attack_type['Killed'] / df_attack_type['Count']
df_attack_type_avgkilled_sorted = df_attack_type.sort_values('AvgKilled', ascending=True)

trace = go.Bar(
    x=df_attack_type_avgkilled_sorted['AvgKilled'],
    y=df_attack_type_avgkilled_sorted['AttackType'],
    name='Type',
    marker=dict(
        color= 'rgba(222,45,38,0.8)'
    ),
    opacity = 0.8,
    orientation = 'h'
)

layout = dict(
    title = 'Average people killed per Attack type',
    xaxis = dict(
        title="Average killed"
    ),
    yaxis=go.layout.YAxis(
        automargin=True,
    ),
)

data = [trace]

fig = dict(data=data, layout=layout)
py.iplot(fig)

In [None]:
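amount_killed_11sep = df[(df['Year'] == 2001) & (df['AttackType'] == 'Hijacking') & (df['Region'] == "North America")]['Killed'].sum()
# 604 is the hard-coded number of hijacking attacks used as the denominator
n_hijackings = 604
hypo_avg = amount_killed_11sep / n_hijackings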

The first chart shows that a large portion of the terrorist attacks are classified as either bombings or armed assaults. One possible explanation is that these forms of attack demand less preparation than some of the others. The accessibility of weapons and explosives could also be a factor.

Hijacking is the form of attack that has killed the largest number of people per attack. This may be because this type is relatively uncommon, combined with the high number of people killed in the September 11 attacks. If we imagine that the people who died in those attacks (3,001 people) were the only ones killed by hijacking, the average number killed per hijacking would be 4.969, which would still be the 4th highest.

In [None]:
hostage = df[(df['AttackType'] == 'Hostage Taking (Barricade Incident)') & (df['Killed'] > 50)]
print(f'Hostage attacks with killed over 50: {len(hostage)}')

hijacking = df[(df['AttackType'] == 'Hijacking') & (df['Killed'] > 50)]
print(f'Hijacking attacks with killed over 50: {len(hijacking)}')

In [None]:
df_attack_type_by_year = df.groupby(['Year', 'AttackType'])['Killed'].agg(['sum', 'count']).reset_index().rename(columns={'sum': 'Killed', 'count': 'Count'})

In [None]:
traces= []
types = list(df_attack_type_by_year['AttackType'].unique())

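for attack_type in types:
    df_type = df_attack_type_by_year[df_attack_type_by_year['AttackType'] == attack_type]
    trace = go.Scatter(
        # use the years in which this attack type occurs; not every type appears every year
        x=df_type['Year'],
        y=df_type['Count'],
        name=attack_type,
        line=dict(),
        opacity=0.8
    )
    traces.append(trace)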

data = traces
layout = dict(
    title = 'Number of attacks for each Attack Type by Year',
    xaxis = dict(
        title="Year"
    ),
    yaxis = dict(
        title="Number of Attacks"
    )
)
fig = dict(data=data, layout=layout)
py.iplot(fig)

In [None]:
traces= []
types = list(df_attack_type_by_year['AttackType'].unique())

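for attack_type in types:
    df_type = df_attack_type_by_year[df_attack_type_by_year['AttackType'] == attack_type]
    trace = go.Scatter(
        # use the years in which this attack type occurs; not every type appears every year
        x=df_type['Year'],
        y=df_type['Killed'],
        name=attack_type,
        line=dict(),
        opacity=0.8
    )
    traces.append(trace)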

data = traces
layout = dict(
    title = 'Number of people killed by each Attack Type Yearly',
    xaxis = dict(
        title="Year"
    ),
    yaxis = dict(
        title="People Killed"
    )
)
fig = dict(data=data, layout=layout)
py.iplot(fig)

In [None]:
df_attack_type

With 81,976 attacks and 156,924 people killed, bombing has without a doubt been the most frequently used method. It is also the second most deadly attack type, behind armed assault, which has killed 159,799 people. Hijacking has the highest average number of people killed per attack.
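
One detail worth keeping in mind when reading these counts: in a pandas aggregation, `'count'` only counts rows where the value is not missing, whereas `'size'` counts all rows in the group. The sketch below shows the difference on made-up data. Whether the two differ noticeably in the real dataset depends on how many attacks have a missing `Killed` value, which I have not checked here.

```python
import numpy as np
import pandas as pd

# toy data for illustration only; one bombing has no recorded death toll
toy = pd.DataFrame({
    'AttackType': ['Bombing', 'Bombing', 'Bombing', 'Hijacking'],
    'Killed':     [3.0,       np.nan,    1.0,       2.0],
})

by_type = toy.groupby('AttackType')['Killed'].agg(['sum', 'count', 'size'])

# average over attacks with a known death toll only
by_type['mean_known'] = by_type['sum'] / by_type['count']
print(by_type)
```

Dividing by `'count'` gives the average over attacks with a known number of people killed, which matches what `mean()` would return.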

# 5. Conclusion

See separate blog post [here](https://medium.com/@michel.naslund/a-journey-through-the-global-terrorism-database-ce35076d5289).