## Answering Data Science Questions

Time for a case study to reinforce all of your learning so far! You'll use all the containers and data types you've learned about to answer several real world questions about a dataset containing information about crime in Chicago. Have fun!

### Reading your data with CSV Reader and Establishing your Data Containers
Let's get started! The exercises in this chapter are intentionally more challenging, to give you a chance to really solidify your knowledge. Don't lose heart if you find yourself stuck; think back to the concepts you've learned in previous chapters and how you can apply them to this crime dataset. Good luck!

Your data file, Chicago crime.csv contains the date (1st column), block where it occurred (2nd column), primary type of the crime (3rd), description of the crime (4th), description of the location (5th), if an arrest was made (6th), was it a domestic case (7th), and city district (8th).

Here, however, you'll focus only 4 columns: The date, type of crime, location, and whether or not the crime resulted in an arrest.

Your job in this exercise is to use a CSV Reader to load up a list to hold the data you're going to analyze.

In [98]:
# Import the csv module
import csv

# Create the file object: csvfile
csvfile = open('Chicago crime.csv', 'r')

# Create an empty list: crime_data
crime_data = list()

# Loop over a csv reader on the file object
for row in csv.reader(csvfile):

    # Append the date, type of crime, location description, and arrest
    crime_data.append((row[0], row[2], row[4], row[6]))
    
# Remove the first element from crime_data
crime_data.pop(0)

# Print the first 10 records
print(crime_data[:10])

[('5/23/2016 17:35', 'ASSAULT', 'STREET', 'TRUE'), ('3/26/2016 20:20', 'BURGLARY', 'SMALL RETAIL STORE', 'FALSE'), ('4/25/2016 15:05', 'THEFT', 'DEPARTMENT STORE', 'FALSE'), ('4/26/2016 17:30', 'BATTERY', 'SIDEWALK', 'FALSE'), ('6/19/2016 1:15', 'BATTERY', 'SIDEWALK', 'FALSE'), ('5/28/2016 20:00', 'BATTERY', 'GAS STATION', 'TRUE'), ('7/3/2016 15:43', 'THEFT', 'OTHER', 'FALSE'), ('6/11/2016 18:55', 'PUBLIC PEACE VIOLATION', 'STREET', 'FALSE'), ('10/4/2016 10:20', 'BATTERY', 'STREET', 'FALSE'), ('2/14/2017 21:00', 'CRIMINAL DAMAGE', 'PARK PROPERTY', 'FALSE')]


### Find the Months with the Highest Number of Crimes
Using the crime_data list from the prior exercise, you'll answer a common question that arises when dealing with crime data: How many crimes are committed each month?

Feel free to use the IPython Shell to explore the crime_data list - it has been pre-loaded for you. For example, crime_data[0][0] will show you the first column of the first row which, in this case, is the date and time time that the crime occurred.

In [88]:
# Import necessary modules
from collections import Counter
from datetime import datetime

print(crime_data[0][0])

# Create a Counter Object: crimes_by_month
crimes_by_month = Counter()

# Loop over the crime_data list
for row in crime_data:
    
    # Convert the first element of each item into a Python Datetime Object
    date = datetime.strptime(row[0], '%m/%d/%Y %H:%M')
    
    # Increment the counter for the month of the row by one
    crimes_by_month[date.month] += 1
    
# Print the 3 most common months for crime
print(crimes_by_month.most_common(3))

5/23/2016 17:35
[(1, 1893), (2, 1808), (7, 1245)]


### Transforming your Data Containers to Month and Location
Now let's flip your crime_data list into a dictionary keyed by month with a list of location values for each month, and filter down to the records for the year 2016. Remember you can use the shell to look at the crime_data list, such as crime_data[1][2] to see the location of the crime in the second item of the list (since lists start at 0).

In [104]:
# Import necessary modules
from collections import defaultdict
from datetime import datetime

# Create a dictionary that defaults to a list: locations_by_month
locations_by_month =  defaultdict(list)

# Loop over the crime_data list
for row in crime_data:
    # Convert the first element to a date object
    date = datetime.strptime(row[0], '%m/%d/%Y %H:%M')
    
    #If the year is 2016 
    if date.year == 2016:
        # Set the dictionary key to the month and append the location (third element) to the values list
        locations_by_month[date.month].append(row[2]) 
    
# Print the dictionary
print(locations_by_month)

('3/26/2016 20:20', 'BURGLARY', 'SMALL RETAIL STORE', 'FALSE')
defaultdict(<class 'list'>, {5: ['STREET', 'GAS STATION', '', 'PARKING LOT/GARAGE(NON.RESID.)', 'RESIDENCE', 'STREET', 'RESTAURANT', 'SMALL RETAIL STORE', 'STREET', 'APARTMENT', 'SIDEWALK', 'PARKING LOT/GARAGE(NON.RESID.)', 'DEPARTMENT STORE', 'PARKING LOT/GARAGE(NON.RESID.)', 'SMALL RETAIL STORE', 'RESIDENCE', 'STREET', 'RESIDENCE', 'APARTMENT', 'RESIDENCE-GARAGE', 'APARTMENT', 'ALLEY', 'HIGHWAY/EXPRESSWAY', 'SIDEWALK', 'POLICE FACILITY/VEH PARKING LOT', 'RESIDENCE', 'STREET', 'APARTMENT', 'RESIDENCE PORCH/HALLWAY', 'STREET', 'RESIDENCE', 'SMALL RETAIL STORE', 'SIDEWALK', 'STREET', 'APARTMENT', 'STREET', 'SIDEWALK', 'SMALL RETAIL STORE', 'ALLEY', 'OTHER', 'APARTMENT', 'STREET', 'RESIDENCE', 'GROCERY FOOD STORE', 'SIDEWALK', 'APARTMENT', 'APARTMENT', 'PARKING LOT/GARAGE(NON.RESID.)', 'RESIDENCE', 'STREET', 'APARTMENT', 'APARTMENT', 'CURRENCY EXCHANGE', 'RESIDENTIAL YARD (FRONT/BACK)', 'ALLEY', 'CTA TRAIN', 'RESIDENCE', 'RES

### Find the Most Common Crimes by Location Type by Month in 2016
Using the locations_by_month dictionary from the prior exercise, you'll now determine common crimes by month and location type. Because your dataset is so large, it's a good idea to use Counter to look at an aspect of it in an easier to manageable size and learn more about it.

In [106]:
# Import Counter from collections
from collections import Counter

# Loop over the items from locations_by_month using tuple expansion of the month and locations
for month, locations in locations_by_month.items():
    # Make a Counter of the locations
    location_count = Counter(locations)
    # Print the month 
    print(month)
    # Print the most common location
    print(location_count.most_common(5))

5
[('STREET', 240), ('RESIDENCE', 175), ('APARTMENT', 128), ('SIDEWALK', 111), ('OTHER', 41)]
3
[('STREET', 237), ('RESIDENCE', 190), ('APARTMENT', 139), ('SIDEWALK', 99), ('OTHER', 50)]
4
[('STREET', 212), ('RESIDENCE', 171), ('APARTMENT', 152), ('SIDEWALK', 96), ('OTHER', 40)]
6
[('STREET', 244), ('RESIDENCE', 164), ('APARTMENT', 159), ('SIDEWALK', 123), ('PARKING LOT/GARAGE(NON.RESID.)', 44)]
7
[('STREET', 306), ('RESIDENCE', 177), ('APARTMENT', 166), ('SIDEWALK', 125), ('OTHER', 47)]
10
[('STREET', 245), ('RESIDENCE', 206), ('APARTMENT', 122), ('SIDEWALK', 92), ('OTHER', 62)]
12
[('STREET', 207), ('RESIDENCE', 158), ('APARTMENT', 136), ('OTHER', 46), ('SIDEWALK', 46)]
1
[('STREET', 195), ('RESIDENCE', 160), ('APARTMENT', 153), ('SIDEWALK', 72), ('PARKING LOT/GARAGE(NON.RESID.)', 43)]
9
[('STREET', 274), ('RESIDENCE', 183), ('APARTMENT', 144), ('SIDEWALK', 121), ('OTHER', 38)]
11
[('STREET', 233), ('RESIDENCE', 182), ('APARTMENT', 154), ('SIDEWALK', 75), ('OTHER', 41)]
8
[('STREET',

### Reading your Data with DictReader and Establishing your Data Containers
Your data file, Chicago crime.csv contains in positional order: the date, block where it occurred, primary type of the crime, description of the crime, description of the location, if an arrest was made, was it a domestic case, and city district.

You'll now use a DictReader to load up a dictionary to hold your data with the district as the key and the rest of the data in a list. The csv, defaultdict, and datetime modules have already been imported for you.

In [71]:
# import defaultdict
from collections import defaultdict

# Create the CSV file: csvfile
csvfile = open('Chicago crime.csv', 'r')

# Create a dictionary that defaults to a list: crimes_by_district
crimes_by_district = defaultdict(list)

# Loop over a DictReader of the CSV file
for row in csv.DictReader(csvfile):
    # Pop the district from each row: district
    district = row.pop('District')
    # Append the rest of the data to the list for proper district in crimes_by_district
    crimes_by_district[district].append(row)
    
crimes_by_district

defaultdict(list,
            {'14': [{'Date': '5/23/2016 17:35',
               'Block': '024XX W DIVISION ST',
               'Primary Type': 'ASSAULT',
               'Description': 'SIMPLE',
               'Location Description': 'STREET',
               'Arrest': 'FALSE',
               'Domestic': 'TRUE'},
              {'Date': '9/22/2016 15:00',
               'Block': '027XX N SPAULDING AVE',
               'Primary Type': 'THEFT',
               'Description': 'FROM BUILDING',
               'Location Description': 'APARTMENT',
               'Arrest': 'FALSE',
               'Domestic': 'FALSE'},
              {'Date': '8/24/2016 5:13',
               'Block': '033XX W BARRY AVE',
               'Primary Type': 'CRIMINAL DAMAGE',
               'Description': 'TO VEHICLE',
               'Location Description': 'STREET',
               'Arrest': 'FALSE',
               'Domestic': 'FALSE'},
              {'Date': '5/20/2016 17:00',
               'Block': '020XX W le moyne s

### Determine the Arrests by District by Year
Using your crimes_by_district dictionary from the previous exercise, you'll now determine the number arrests in each City district for each year. Counter is already imported for you. You'll want to use the IPython Shell to explore the crimes_by_district dictionary to determine how to check if an arrest was made.

In [82]:
# Loop over the crimes_by_district using expansion as district and crimes
for disctrict, crimes in crimes_by_district.items():
    # Print the district
    print(district)
    
    # Create an empty Counter object: year_count
    year_count = Counter()
    
    # Loop over the crimes:
    for crime in crimes:
        # If there was an arrest
        if crime['Arrest'] != True:
            # Convert the Date to a datetime and get the year
            year = datetime.strptime(crime['Date'],'%m/%d/%Y %H:%M').year
            # Increment the Counter for the year
            year_count[year] += 1
            
  # Print the counter
    print(year_count)            

11
Counter({2016: 510, 2017: 79})
11
Counter({2016: 349, 2017: 58})
11
Counter({2016: 782, 2017: 130})
11
Counter({2016: 558, 2017: 63})
11
Counter({2016: 604, 2017: 115})
11
Counter({2016: 723, 2017: 110})
11
Counter({2016: 627, 2017: 110})
11
Counter({2016: 855, 2017: 146})
11
Counter({2016: 615, 2017: 106})
11
Counter({2016: 381, 2017: 62})
11
Counter({2016: 571, 2017: 93})
11
Counter({2016: 419, 2017: 60})
11
Counter({2016: 789, 2017: 141})
11
Counter({2016: 545, 2017: 89})
11
Counter({2016: 533, 2017: 100})
11
Counter({2016: 604, 2017: 94})
11
Counter({2016: 615, 2017: 97})
11
Counter({2016: 665, 2017: 91})
11
Counter({2016: 367, 2017: 53})
11
Counter({2016: 207, 2017: 25})
11
Counter({2016: 695, 2017: 113})
11
Counter({2016: 592, 2017: 92})
11
Counter({2016: 1})
