# Analyzing the Global Terrorism Database with Pandas

The [Global Terrorism Database](https://www.start.umd.edu/gtd/) (GTD) is an open-source database with information on terrorist events around the world from 1970 through 2015 (with annual updates planned for the future). It is publicly available, and anyone can access it by [registering with their website](https://www.start.umd.edu/gtd/NewUser.aspx). In this exercise, we'll use a shortened version of the dataset that I've filtered to contain only events in Nigeria from 2012 to 2015.

Let's start with a cool IPython trick. We can preview a webpage inside our IPython Notebook by importing the `IFrame` class and displaying the page in it:

In [None]:
from IPython.display import IFrame  
url = 'https://www.start.umd.edu/gtd/'
IFrame(url, width='100%', height=500)

That's the GTD landing page. Pretty cool, right?

In this exercise, you will also be introduced to [_pandas_](http://pandas.pydata.org/), a Python library for fast and efficient data analysis. Let's start by importing the library.

In [None]:
import pandas as pd

A nice feature of IPython and Jupyter notebooks is the `%cd` magic, which changes your working directory just like the _cd_ shell command, on Windows or any other operating system. Let's change our working directory to the folder that holds our nigeria_gtd_long.csv file.

In [None]:
%cd C:\Users\greg6750\Documents\IPython Notebooks\Python_for_GIS_and_RS\Week_13\data
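The `%cd` magic only exists inside IPython. If you later move this analysis into a plain Python script, a minimal sketch of the equivalent uses `os.chdir`. The helper name `change_to` is just an illustrative choice, and the commented-out path is a placeholder for wherever your copy of the data lives.

```python
import os
from pathlib import Path

def change_to(folder):
    """Hypothetical helper: switch the working directory like %cd, from plain Python."""
    folder = Path(folder).expanduser()
    if not folder.is_dir():
        raise FileNotFoundError(f"No such directory: {folder}")
    os.chdir(folder)
    return Path.cwd()

# Placeholder path: replace it with the folder that holds nigeria_gtd_long.csv
# change_to(r"C:\path\to\Week_13\data")
```

Alternatively, you can skip changing directories entirely and pass the full path of the CSV to `pd.read_csv`.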

Using pandas, we can read the file in with `read_csv`.

In [None]:
df = pd.read_csv('nigeria_gtd_long.csv')
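`read_csv` uses the first row of the file as the column names and turns empty cells into `NaN`. The snippet below illustrates both on a tiny in-memory CSV; the rows are made-up values, not real GTD records, although `iyear`, `nkill` and `nwound` are actual GTD field names. `low_memory=False` is optional: it makes pandas infer each column's type from the whole file at once, which can avoid a `DtypeWarning` on wide files with mixed-type columns.

```python
import io
import pandas as pd

# Made-up rows for illustration only; the empty nkill cell stands for an unknown count
sample_csv = io.StringIO(
    "iyear,nkill,nwound\n"
    "2014,5,12\n"
    "2014,,3\n"
    "2015,2,0\n"
)

sample = pd.read_csv(sample_csv, low_memory=False)
print(sample.columns.tolist())   # ['iyear', 'nkill', 'nwound']
print(sample["nkill"].tolist())  # [5.0, nan, 2.0]
```

If reading your copy of the file raises a `UnicodeDecodeError`, the file is probably not UTF-8 encoded; passing an explicit encoding such as `encoding='latin-1'` is a common fix, though the right value depends on how the file was saved.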

Let's use the `head` method to show the column names and the first 5 rows of the data frame.

In [None]:
df.head()
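By default `head()` returns five rows. A few related calls are handy for sanity-checking that the filtered file has the size you expect; `frame` below is a made-up stand-in for `df`.

```python
import pandas as pd

frame = pd.DataFrame({"nkill": range(8)})  # made-up stand-in for df

print(frame.shape)                      # (8, 1): (number of rows, number of columns)
print(len(frame.head(3)))               # 3: head(n) returns the first n rows
print(frame.tail(2)["nkill"].tolist())  # [6, 7]: tail(n) returns the last n rows
```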

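I want to ensure that every column has the correct data type. I can do this by iterating over the columns and converting each column whose values parse as numbers into a numeric type. (Older code uses `convert_objects(convert_numeric=True)` for this, but that method has been removed from pandas; `pd.to_numeric` replaces it.)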

In [None]:
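for val in df.columns:
    converted = pd.to_numeric(df[val], errors='coerce')  # unparseable values become NaN
    df[val] = converted if converted.notna().any() else df[val]  # leave all-text columns alone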

Let's see what data type each column is.

In [None]:
df.dtypes
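With many columns, the full `dtypes` listing is long. `select_dtypes` can split the columns into those pandas now treats as numbers and those it left as text, which is a quick way to spot a field that should be numeric but didn't convert. The small frame below is a made-up stand-in for `df`.

```python
import pandas as pd

frame = pd.DataFrame({          # made-up stand-in for df after the conversion
    "iyear": [2014, 2015],
    "nkill": [3.0, None],
    "city": ["A", "B"],
})

numeric_cols = frame.select_dtypes(include="number").columns.tolist()
other_cols = frame.select_dtypes(exclude="number").columns.tolist()
print(numeric_cols)  # ['iyear', 'nkill']
print(other_cols)    # ['city']
```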

Let's look at the data. We can compute the sum and summary statistics of any field in the data frame. Let's look at the 'nkill' field, which holds the number of confirmed deaths (fatalities) in each terrorist attack.

In [None]:
df['nkill'].sum()
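Note that `sum()` skips missing values by default. Events whose death toll was not recorded appear as `NaN` when the cell is blank in the CSV, so the total counts only events with a known figure. A minimal illustration with made-up numbers:

```python
import numpy as np
import pandas as pd

# Made-up fatality counts; NaN marks an event where the death toll is unknown
nkill = pd.Series([3, np.nan, 10, 0, np.nan])

print(nkill.sum())              # 13.0: NaN rows are skipped by default
print(nkill.sum(skipna=False))  # nan: any missing value propagates
print(nkill.isna().sum())       # 2 events with no recorded count
```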

We can break down the death toll statistically, finding the mean number of deaths per event, the minimum, the maximum, and more, using the `describe` method.

In [None]:
df['nkill'].describe()
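`describe()` returns a Series indexed by `count`, `mean`, `std`, `min`, the 25th/50th/75th percentiles and `max`, so you can pull out a single statistic by label. Because `count` excludes `NaN`, comparing it with `len(df)` tells you how many events have no recorded death toll. Made-up values again:

```python
import pandas as pd

nkill = pd.Series([0, 1, 2, 3, 4, None], dtype="float64")  # made-up counts, one missing

summary = nkill.describe()
print(summary.index.tolist())  # ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(summary["50%"])          # 2.0, the median
print(len(nkill) - int(summary["count"]))  # 1 event with no recorded count
```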

## matplotlib

Using [matplotlib](http://matplotlib.org/), we can plot data straight from the data frame. The `%matplotlib inline` magic makes the plots show up inside the notebook. Then we import matplotlib's `pyplot` module.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

## Let's plot a histogram of the number of people killed

In [None]:
plt.figure()
ax = df['nkill'].hist(bins=20, range=[0,20])
ax.set_xlabel("nkill")
ax.set_ylabel("Number of Events")
plt.show()
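It is worth knowing what `bins=20, range=[0,20]` does. `Series.hist` passes these to matplotlib's `hist`, which bins data the same way as `numpy.histogram`: the range is cut into 20 bins of width 1, each bin includes its left edge, the last bin also includes 20, and any value outside the range is dropped. In other words, events with more than 20 deaths do not appear in this histogram at all. A quick check with made-up values:

```python
import numpy as np

values = np.array([0, 0, 1, 3, 3, 3, 19.5, 20, 45])  # made-up nkill values
counts, edges = np.histogram(values, bins=20, range=(0, 20))

print(edges[:3])     # [0. 1. 2.]: every bin is 1 unit wide
print(counts[19])    # 2: the last bin [19, 20] includes its right edge
print(counts.sum())  # 8: the 45 lies outside range and is not counted
```

If the long tail matters for your question, you could widen `range` or drop it and let matplotlib use the data's own minimum and maximum.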

## We can do the same for the number of people wounded

In [None]:
ax1 = df['nwound'].hist(bins=20, range=[0,20])
ax1.set_xlabel("nwound")
ax1.set_ylabel("Number of Events")
plt.show()
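Plotting the two histograms side by side makes them easier to compare. Below is a sketch using `plt.subplots` with a shared y-axis; the function name `plot_killed_and_wounded` is hypothetical, and it assumes `df` has the `nkill` and `nwound` columns used above.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_killed_and_wounded(frame, bins=20, upper=20):
    """Hypothetical helper: draw nkill and nwound histograms side by side."""
    fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
    for ax, col in zip(axes, ["nkill", "nwound"]):
        frame[col].hist(ax=ax, bins=bins, range=[0, upper])  # same binning as above
        ax.set_xlabel(col)
    axes[0].set_ylabel("Number of Events")
    return fig

# Usage with the notebook's data frame:
# fig = plot_killed_and_wounded(df)
# plt.show()
```

Sharing the y-axis keeps the bar heights directly comparable; drop `sharey=True` if one distribution dwarfs the other.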

This concludes our intro to pandas. There is _soooo_ much more we can do with pandas. Perhaps we will touch on it more next week.