# Mapping Arsons in Durham

Durham County, NC, makes its crime data available as part of the city's OpenData initiative. This notebook describes how to download and use some of this data to find arson occurrences near my house.

## Importing the tools

We will need several Python modules for this task:


In [61]:
import pandas as pd
from datetime import date
from matplotlib import pyplot as plt
from mpl_toolkits.basemap import Basemap


## Getting the data

I have already downloaded the basic data as a CSV file.

In [62]:
datafile = "durham-police-crime-reports.csv"

I can read the data in with Pandas. This will create a _dataframe_.

I'm using the read_csv() function. It takes a large number of keyword options. I'll describe the ones I'm using:

    header=0      column headers are in the first line
    sep=';'       fields are separated by ';'
    index_col=1   use the second of the selected columns (INCI_ID) as the row index
    parse_dates=[...]
                  convert the listed columns into date objects
    usecols=[...]
                  read only the listed columns

We don't need all the columns, so we only read the ones we need. This saves a lot of memory.

_One of the advantages of Jupyter, versus running a separate script, is that I don't have to reload the 50MB of data every time I want to make a small change in how I'm using the data._

In [63]:
df = pd.read_csv(datafile,
                 usecols=[0, 2, 7, 15],
                 header=0, sep=';', index_col=1,
                 parse_dates=[1])

print(df.columns)

Index(['Geo Point', 'DATE_OCCU', 'CHRGDESC'], dtype='object')
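To see how these options interact, here is a minimal sketch on a tiny in-memory CSV. The rows, the EXTRA column, and the variable names are made up for illustration and are not taken from the Durham file. It selects columns by name instead of by position, which avoids having to work out whether a position counts from the full file or from the selected subset.

```python
import io
import pandas as pd

# Made-up sample in the same ';'-separated layout; not real Durham records
sample_csv = io.StringIO(
    "Geo Point;EXTRA;INCI_ID;DATE_OCCU;CHRGDESC\n"
    "35.94, -78.90;x;1;2014-11-02 10:00:00;LARCENY\n"
    "36.01, -78.92;y;2;2015-05-27 20:00:00;ARSON\n"
)

sample = pd.read_csv(
    sample_csv,
    header=0,                    # column names are on the first line
    sep=';',                     # fields are separated by ';'
    usecols=['Geo Point', 'INCI_ID', 'DATE_OCCU', 'CHRGDESC'],  # skip EXTRA
    index_col='INCI_ID',         # use the incident id as the row index
    parse_dates=['DATE_OCCU'],   # convert this column to datetimes
)

print(sample.dtypes)
```

Naming the columns makes it easier to check that the date column, and not the index column, is the one being parsed.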


I can get the number of records in the dataframe:

In [64]:
print(len(df))


116946


I can print the column labels:

In [65]:
print(df.columns)

Index(['Geo Point', 'DATE_OCCU', 'CHRGDESC'], dtype='object')


Or the row index:

In [66]:
print(df.index)

Int64Index([11000003, 11000069, 11000047, 11000042, 11000090, 11000026,
            11000014, 15020064, 15020069, 15020072, 
            ...
            15009727, 15009729, 15009749, 15009752, 15009780, 15009810,
            15009795, 15009816, 15009819, 15009823],
           dtype='int64', name='INCI_ID', length=116946)


I'm only interested in crimes on or after January 1, 2015:

In [67]:
crimes_2015 = df[df['DATE_OCCU'] >= '2015-01-01']

Let's look at the first row to see if we're on track:

In [68]:
print(crimes_2015.iloc[0])

Geo Point    35.9398002674, -78.8968435563
DATE_OCCU        2015-05-27T20:00:00-04:00
CHRGDESC             FRAUD - IMPERSONATION
Name: 15020064, dtype: object
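The first row shows that DATE_OCCU is still an ISO-formatted string, so the filter above compares text rather than dates. That works here because ISO timestamps sort in date order, but a possible alternative is to convert the column to real datetimes first. The following is only a sketch on made-up rows (the LARCENY record and the names `sample`, `occurred`, and `cutoff` are assumptions for illustration):

```python
import pandas as pd

# Made-up rows using the same timestamp format as the first row above
sample = pd.DataFrame(
    {
        'DATE_OCCU': ['2014-11-02T10:00:00-05:00', '2015-05-27T20:00:00-04:00'],
        'CHRGDESC': ['LARCENY', 'FRAUD - IMPERSONATION'],
    }
)

# Parse the strings; utc=True puts the mixed UTC offsets on one common clock
occurred = pd.to_datetime(sample['DATE_OCCU'], utc=True)

# Keep rows on or after the start of 2015 (midnight UTC, an assumption)
cutoff = pd.Timestamp('2015-01-01', tz='UTC')
recent = sample[occurred >= cutoff]

print(recent)
```

Note that a UTC cutoff differs from local midnight in Durham by a few hours, which only matters for incidents right at the boundary.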


Now we only want the arsons:

In [69]:
arsons = crimes_2015[crimes_2015.CHRGDESC == 'ARSON']
print(len(arsons))


28


In [70]:
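# Split each "lat, lon" string into a two-element list
lat_lons = arsons['Geo Point'].apply(str.split, args=(', ',))
lats = pd.Series(float(lat) for lat, lon in lat_lons)
lons = pd.Series(float(lon) for lat, lon in lat_lons)
# arsons is a filtered slice of crimes_2015, so this assignment triggers the
# SettingWithCopyWarning shown in the output. lats is also indexed 0..27
# rather than by INCI_ID, so its values do not line up with the rows of arsons.
arsons['Latitude'] = lats
print(lats)
print(lons)
print(arsons.iloc[0:5])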

0     35.943571
1     36.056275
2     36.076475
3     35.983451
4     36.076248
5     35.885440
6     36.062965
7     36.010890
8     35.993909
9     35.941658
10    36.010873
11    35.979421
12    36.047766
13    36.008462
14    35.919957
15    35.974028
16    36.079178
17    35.974028
18    35.990829
19    35.975125
20    35.977335
21    35.983451
22    35.978522
23    35.964627
24    36.010890
25    35.996845
26    35.995547
27    36.004578
dtype: float64
0    -78.917818
1    -78.884244
2    -78.908978
3    -78.904418
4    -78.910249
5    -78.887407
6    -78.917661
7    -78.927820
8    -78.853752
9    -78.911315
10   -78.881733
11   -78.870213
12   -78.927406
13   -78.853214
14   -78.957461
15   -78.849155
16   -78.923307
17   -78.849155
18   -78.908064
19   -78.849604
20   -78.930452
21   -78.904418
22   -78.874452
23   -78.914600
24   -78.927820
25   -78.869402
26   -78.879007
27   -78.852054
dtype: float64
                              Geo Point                  DATE_OCCU CHRGDES

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
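The warning above is raised because `arsons` was created by filtering another dataframe, and pandas cannot tell whether the assignment should affect the original. One way to avoid it, sketched here under my own assumptions, is to take an explicit copy and build the new columns from a split that keeps the original row index. The INCI_ID values and the name `arsons_demo` are made up for illustration; the coordinates are the first two pairs printed above.

```python
import pandas as pd

# Small stand-in for the arsons dataframe; the INCI_ID values are invented
crimes_demo = pd.DataFrame(
    {
        'Geo Point': ['35.943571, -78.917818', '36.056275, -78.884244'],
        'CHRGDESC': ['ARSON', 'ARSON'],
    },
    index=pd.Index([15000001, 15000002], name='INCI_ID'),
)

# .copy() makes the filtered result an independent dataframe
arsons_demo = crimes_demo[crimes_demo.CHRGDESC == 'ARSON'].copy()

# expand=True returns one column per piece and keeps the INCI_ID index,
# so the new columns align with the rows they came from
parts = arsons_demo['Geo Point'].str.split(', ', expand=True).astype(float)
arsons_demo['Latitude'] = parts[0]
arsons_demo['Longitude'] = parts[1]

print(arsons_demo[['Latitude', 'Longitude']])
```

Because `parts` shares its index with `arsons_demo`, the values land on the right rows instead of being matched against positions 0 through 27.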


Now we know that there were 28 arsons from January 1, 2015 through the end of the data. The latitude and longitude of each occurrence are stored as a comma-separated string in the 'Geo Point' column. We need those as floats, so let's add two new columns from that data.