# Weekly shooting numbers

We need to run the script below to get the numbers to update [this page](http://www.chicagotribune.com/news/data/ct-shooting-victims-map-charts-htmlstory.html). When this script is run from the same directory as the downloaded shootings csv, it prints the numbers you need to update the shootings page to your command line.

You should follow along with [this ticket](https://tribune.unfuddle.com/a#/projects/46/tickets/by_number/1262) to fully understand what we're doing here.

### Step 1: download shootings csv from newsroomdb. Make sure it is saved on your Desktop (or just in the same directory as wherever this script will live).

### Step 2: Let's make sure we can take a look at the data.

In [169]:
# Import the Python libraries needed to do this analysis.
import pandas as pd
import numpy as np

# This assigns the variable 'shootings' to the appropriate csv, which you should have downloaded already.
# You can edit the following line to make sure it points to wherever the shootings csv lives on your machine.
shootings = pd.read_csv('../Desktop/shootings.csv')
# This allows you to look at the first 3 rows of the data.
shootings[:3]

Unnamed: 0,RD Number,Date,Day,Time,UCR,Last Name,First Name,Sex,Age,DOB,...,Geocode Override,Shooting Specificity,District,Hospital 1,Hospital 2,Link,Link 2,Link 3,Notes,Computed time
0,,2011-09-13,Tuesday,15:40,0110,Varner,Devon,M,18,,...,"(41.783858, -87.615722)",Block,3,,,,,,,940
1,,2011-09-21,Wednesday,21:15,041A,Doe,Jane,F,25,,...,"(41.7784, -87.615612)",Block,3,,,,,,,1275
2,,2011-09-21,Wednesday,21:15,041A,Doe,John,M,27,,...,"(41.7784, -87.615612)",Block,3,,,,,,,1275


In [170]:
# Let's look just at the first 10 rows of the 'Date' column.
shootings['Date'][:10]

0    2011-09-13
1    2011-09-21
2    2011-09-21
3    2011-09-23
4    2011-09-24
5    2011-09-24
6    2011-10-04
7    2011-10-04
8    2011-10-04
9    2011-09-25
Name: Date, dtype: object

### Step 3: Let's see how many rows we have in the 'Date' column.

In [171]:
# The 'count' method tells us how many non-null values a column has (rows with a missing value are skipped).
# Our analyses for the shootings page focus on dates, so let's look at how many rows have a Date.
shootings['Date'].count()

16495

### Step 4: Let's check out total shootings in 2016.

In [172]:
# Let's focus in on shootings from 2016. The 'startswith' method allows us to zero in on rows that start with 
# the year we want, and assign them to the variable 'shootings_2016'.
shootings_2016 = shootings[shootings['Date'].str.startswith('2016', na=False)]
# Now, let's count how many rows there are of shootings with 2016 dates.
print "There were", shootings_2016['Date'].count(), "shootings in 2016."

There were 4368 shootings in 2016.
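To see what `na=False` is doing in the filter above, here is a minimal sketch on a few made-up date strings (illustrative values, not rows from the shootings csv). A missing date would otherwise produce a missing value in the mask; with `na=False` it is simply treated as "does not match".

```python
import pandas as pd

# Made-up values standing in for the 'Date' column, including one missing date.
dates = pd.Series(['2016-03-01', '2017-01-15', None, '2016-12-31'])

# True where the text starts with '2016'; the missing value becomes False.
mask = dates.str.startswith('2016', na=False)

print(mask.tolist())
print(dates[mask].count())
```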


### Step 5: Let's check out total shootings so far in 2017.

In [173]:
# Now let's do the same thing for shootings in 2017.
shootings_2017 = shootings[shootings['Date'].str.startswith('2017', na=False)]
print "There have been", shootings_2017['Date'].count(), "shootings in 2017 so far."

There have been 1366 shootings in 2017 so far.


### For the shootings page, we want to focus on how many shootings happened _within a certain time frame_ (January 1 until the present day), which we now have for 2017.

### Step 6: Let's find the shootings in each year before 2017, counting only from 1/1 to the present day.

In [174]:
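# Every time you update this data, you will need to change the ending date on each line below (for example, '2016-05-30')
# to the current month and day.
# Note: '>' leaves out shootings dated January 1 itself; use '>=' if those should be counted.
# The filter is applied to 'shootings' (not 'shootings_2016') so the mask and the data share the same index.
shootings_in_range_2016 = shootings[(shootings['Date'] > '2016-01-01') & (shootings['Date'] <= '2016-05-30')]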
shootings_in_range_2015 = shootings[(shootings['Date'] > '2015-01-01') & (shootings['Date'] <= '2015-05-30')]
shootings_in_range_2014 = shootings[(shootings['Date'] > '2014-01-01') & (shootings['Date'] <= '2014-05-30')]
shootings_in_range_2013 = shootings[(shootings['Date'] > '2013-01-01') & (shootings['Date'] <= '2013-05-30')]
shootings_in_range_2012 = shootings[(shootings['Date'] > '2012-01-01') & (shootings['Date'] <= '2012-05-30')]

print "There were", shootings_in_range_2016['Date'].count(), "shootings between January 1 and present day in 2016."
print "There were", shootings_in_range_2015['Date'].count(), "shootings between January 1 and present day in 2015."
print "There were", shootings_in_range_2014['Date'].count(), "shootings between January 1 and present day in 2014."
print "There were", shootings_in_range_2013['Date'].count(), "shootings between January 1 and present day in 2013."
print "There were", shootings_in_range_2012['Date'].count(), "shootings between January 1 and present day in 2012."

There were 1504 shootings between January 1 and present day in 2016.
There were 965 shootings between January 1 and present day in 2015.
There were 838 shootings between January 1 and present day in 2014.
There were 745 shootings between January 1 and present day in 2013.
There were 896 shootings between January 1 and present day in 2012.
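Editing five ending dates by hand on every update is easy to get wrong. One possible alternative, sketched below, builds the date strings from today's date using only the standard library. The helper name `year_to_date_bounds` is made up for this example and is not part of the original script. The string comparisons in the cell above work because 'YYYY-MM-DD' text sorts in the same order as the dates themselves.

```python
from datetime import date

def year_to_date_bounds(today, years):
    """Return {year: (start, end)} as 'YYYY-MM-DD' strings, covering
    January 1 through today's month and day in each given year."""
    bounds = {}
    for year in years:
        try:
            end = today.replace(year=year)
        except ValueError:
            # February 29 does not exist in non-leap years; fall back to February 28.
            end = today.replace(year=year, day=28)
        bounds[year] = (date(year, 1, 1).isoformat(), end.isoformat())
    return bounds

# A fixed "today" keeps the example reproducible; date.today() would be used in practice.
bounds = year_to_date_bounds(date(2017, 5, 30), range(2012, 2017))
for year in sorted(bounds):
    start, end = bounds[year]
    print("{}: {} through {}".format(year, start, end))
```

The resulting strings could then stand in for the hard-coded dates in the filters, for example `shootings['Date'] <= end`.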




### Step 7: The next thing we want to do is find out how many shootings there were _per month_ in 2017. 

In [175]:
# "Date" needs some formatting. Some dates are 'None' or 'NaN'; errors='coerce' turns those into NaT (missing)
# instead of raising an error, so no try/except is needed.

shootings['Date'] = pd.to_datetime(shootings['Date'], errors='coerce')
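To see how `errors='coerce'` treats unusable dates, here is a small sketch on made-up values (not rows from the shootings csv). Anything that cannot be parsed becomes `NaT`, pandas' marker for a missing date, and `count()` leaves those out.

```python
import pandas as pd

# Made-up values: two valid dates, one unparseable string and one missing value.
raw = pd.Series(['2017-01-05', '2017-02-11', 'None', None])

# Unparseable or missing entries become NaT instead of raising an error.
parsed = pd.to_datetime(raw, errors='coerce')

print(parsed.isnull().sum())  # number of entries that ended up as NaT
print(parsed.count())         # number of usable dates
```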

In [176]:
# The next thing we need for the shootings page is an updated shootings breakdown by month for 2017.
# One way to check that this is right is to go to the current shootings page (linked at the top of this notebook)
# and see if the numbers for January through May match, since those were done manually.
shootings_by_month_2017 = shootings_2017.groupby([shootings['Date'].dt.year, shootings['Date'].dt.month])['Date'].count()
# Below is how we change from numerals to months. As we progress through the months of 2017, you should add more months.
shootings_by_month_2017.index = ['January','February','March','April','May']
print "Here are 2017 shootings by month:", shootings_by_month_2017

Here are 2017 shootings by month: January     312
February    213
March       210
April       319
May         312
Name: Date, dtype: int64
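The hard-coded list of month names has to be extended by hand each month, and the assignment fails if its length does not match the number of months in the data. A possible alternative, sketched here with a made-up helper called `month_labels`, looks the names up from the month numbers with the standard library's `calendar.month_name`. Those names follow the current locale, which is assumed to be English here.

```python
import calendar

def month_labels(month_numbers):
    """Map month numbers (1-12) to month names, e.g. 1 -> 'January'."""
    return [calendar.month_name[int(m)] for m in month_numbers]

print(month_labels([1, 2, 3, 4, 5]))
```

In the notebook, the month numbers could likely be taken from the second level of the grouped index, for example `shootings_by_month_2017.index.get_level_values(1)`, before the index is overwritten.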
