# Clean San Francisco Crime data 
- Jim Haskin

- GA-Data Science
- Dec 2015

- 2/17/2016

## Method
- I collected the incident reports of the San Francisco Police Department from the SF OpenData website (https://data.sfgov.org/data?category=Public%20Safety). The records run from January 2003 until the beginning of 2016.
- I cleaned and reformatted the fields.
- I summarized the reports into a daily record with the number of incidents and a second measure I am calling Crime Level. Each incident is given a score based on how violent it is. Murders and assaults are rated high. Traffic violations and non-criminal incidents are rated low. These scores are summed per day and then normalized to a scale of 0 to 10.

# Working Notes
## features to add or create
- shift - when during the day the event occurred
- crime_level - how severe the crime is - use category and description features
- month
- day of month

## consolidate to daily totals
- date
- day of week
- month
- day
- total_crime_level - daily total
- shift_crime_level - daily total for each shift

## merge with weather info by date
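
The weather merge is only planned in these notes, so the following is a minimal sketch of how it might look, not the notebook's actual code. The `weather` table, its `temp_max_f` column, and its values are hypothetical; the crime totals are the first three daily sums shown later in the notebook.

```python
import pandas as pd

# Daily crime totals indexed by date (first three daily sums from the notebook output)
crime = pd.DataFrame(
    {'crime_level_sum': [1254, 750, 799]},
    index=pd.to_datetime(['2003-01-01', '2003-01-02', '2003-01-03']),
)
crime.index.name = 'date'

# Hypothetical weather table: column name and values are made up for illustration
weather = pd.DataFrame({
    'date': pd.to_datetime(['2003-01-01', '2003-01-02', '2003-01-04']),
    'temp_max_f': [58.0, 61.0, 55.0],
})

# A left join keeps every crime day and leaves NaN where no weather row exists
merged = crime.reset_index().merge(weather, on='date', how='left')
print(merged)
```

A left join is used here so that a gap in the weather data shows up as `NaN` instead of silently dropping a crime day.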


## Data source
- Data downloaded from SF Open Data site. File includes incidents from 1/1/2003 until the present 
- SFPD_Incidents_-_from_1_January_2003.csv
- https://data.sfgov.org/data?category=Public%20Safety


FieldName|Type|Description                             
---------------|------------|---------------------
IncidntNum|string|Police assigned number
Category|string|General Crime category
Descript|string|Secondary category/details
DayOfWeek|string|Day of week event occurred
Date|string|Date in format : 01/18/2016
Time|string|Time in format : 23:52
PdDistrict|string|Police District that event occurred in
Resolution|string|How case was resolved
Address|string|Address of event
X|float|Longitude 
Y|float|Latitude
Location|string|Latitude,Longitude in character pair
PdId|int|Police Department ID number


In [1]:
import pandas as pd
import numpy as np
import seaborn as sb
%matplotlib inline

In [2]:
! head -2 SFPD_Incidents_-_from_1_January_2003.csv


IncidntNum,Category,Descript,DayOfWeek,Date,Time,PdDistrict,Resolution,Address,X,Y,Location,PdId
160051264,WARRANTS,WARRANT ARREST,Monday,01/18/2016,23:52,CENTRAL,"ARREST, BOOKED",400 Block of POWELL ST,-122.408568445228,37.7887594214703,"(37.7887594214703, -122.408568445228)",16005126463010


In [3]:
! tail -2 SFPD_Incidents_-_from_1_January_2003.csv



031353484,OTHER OFFENSES,OBSCENE PHONE CALLS(S),Wednesday,01/01/2003,00:01,TARAVAL,NONE,1500 Block of 41ST AV,-122.5003001196,37.7578465298467,"(37.7578465298467, -122.5003001196)",3135348419050
030320997,SUSPICIOUS OCC,SUSPICIOUS OCCURRENCE,Wednesday,01/01/2003,00:01,SOUTHERN,NONE,0 Block of LAFAYETTE ST,-122.416608653757,37.7725681063387,"(37.7725681063387, -122.416608653757)",3032099764070


### Read in Crime data

In [4]:
sf_data = pd.read_csv('SFPD_Incidents_-_from_1_January_2003.csv', index_col=0)    # has header, commas, index
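
Reading `IncidntNum` as an integer index drops leading zeros: the file's last row is `030320997`, but `info()` reports it as `30320997`. If the original identifiers matter, one option is to read that column as a string. This is a sketch on a two-row in-memory sample, not a change to the notebook's own read.

```python
import io
import pandas as pd

# Two columns only, with incident numbers taken from the tail of the file
sample = io.StringIO(
    "IncidntNum,Category\n"
    "031353484,OTHER OFFENSES\n"
    "030320997,SUSPICIOUS OCC\n"
)

# dtype=str keeps the leading zero that an integer parse would drop
df = pd.read_csv(sample, dtype={'IncidntNum': str}).set_index('IncidntNum')
print(df.index.tolist())
```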

### Convert to lower case
- Feature names
- Feature values that I'm working with

In [5]:
sf_data.columns = sf_data.columns.str.lower()
sf_data['category'] = sf_data['category'].str.lower()
sf_data['descript'] = sf_data['descript'].str.lower()
sf_data['dayofweek'] = sf_data['dayofweek'].str.lower()
sf_data.head(2)

IncidntNum,category,descript,dayofweek,date,time,pddistrict,resolution,address,x,y,location,pdid
160051264,warrants,warrant arrest,monday,01/18/2016,23:52,CENTRAL,"ARREST, BOOKED",400 Block of POWELL ST,-122.408568,37.788759,"(37.7887594214703, -122.408568445228)",16005126463010
160051242,robbery,"robbery, bodily force",monday,01/18/2016,23:40,TENDERLOIN,NONE,100 Block of STOCKTON ST,-122.406428,37.787109,"(37.78710945429, -122.40642786236)",16005124203074


### Investigate data

In [6]:
sf_data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1866570 entries, 160051264 to 30320997
Data columns (total 12 columns):
category      object
descript      object
dayofweek     object
date          object
time          object
pddistrict    object
resolution    object
address       object
x             float64
y             float64
location      object
pdid          int64
dtypes: float64(2), int64(1), object(9)
memory usage: 185.1+ MB


### Observations
- 1,866,570 records 
- date and time in string format
- other fields look appropriate

## Clean features

### Convert date to datetime

In [7]:
sf_data['date'] = pd.to_datetime(sf_data['date'])
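
Since the dates all follow the `01/18/2016` pattern listed in the field table, passing an explicit `format` is a possible refinement. It avoids per-value format guessing and usually parses large columns faster. A small sketch on two dates from the file:

```python
import pandas as pd

# Two date strings in the file's MM/DD/YYYY layout
dates = pd.Series(['01/18/2016', '01/01/2003'])

# An explicit format makes the month/day order unambiguous
parsed = pd.to_datetime(dates, format='%m/%d/%Y')
print(parsed.dt.year.tolist(), parsed.dt.day.tolist())
```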

## New Features

### Add the hour as numeric

In [8]:
sf_data['hour'] = sf_data['time'].str[0:2].astype(int)

### Add month, day and year features

In [9]:
# Derive calendar fields from the parsed datetime column
sf_data['month'] = sf_data['date'].dt.month
sf_data['day'] = sf_data['date'].dt.day
sf_data['year'] = sf_data['date'].dt.year
sf_data.head(2)

IncidntNum,category,descript,dayofweek,date,time,pddistrict,resolution,address,x,y,location,pdid,hour,month,day,year
160051264,warrants,warrant arrest,monday,2016-01-18,23:52,CENTRAL,"ARREST, BOOKED",400 Block of POWELL ST,-122.408568,37.788759,"(37.7887594214703, -122.408568445228)",16005126463010,23,1,18,2016
160051242,robbery,"robbery, bodily force",monday,2016-01-18,23:40,TENDERLOIN,NONE,100 Block of STOCKTON ST,-122.406428,37.787109,"(37.78710945429, -122.40642786236)",16005124203074,23,1,18,2016


In [10]:
sf_data['year'].describe()

count    1866570.000000
mean        2009.062880
std            3.821668
min         2003.000000
25%         2006.000000
50%         2009.000000
75%         2012.000000
max         2016.000000
Name: year, dtype: float64

In [11]:
sf_data['year'].value_counts()

2015    153879
2013    152812
2014    150161
2003    149176
2004    148148
2005    142186
2008    141311
2012    140858
2009    139861
2006    137853
2007    137639
2010    133525
2011    132699
2016      6462
Name: year, dtype: int64

### Create shift feature
- For more detailed analysis or workforce planning, add a feature that records the shift in which the event occurred.
- 3rd shift - Midnight to 7:59am
- 1st shift - 8:00am - 3:59pm
- 2nd shift - 4:00pm - 11:59pm

In [12]:
def calc_shift(hour):
    shift = hour//8
    if shift == 0:
        shift = 3
    return 'shift_' + str(shift)
        

In [13]:
sf_data['shift'] = sf_data['hour'].apply(calc_shift)
# or leave shift as hour//8. so that it sorts into time order, but label shift0 as third shift
#sf_data['shift'] = sf_data['hour'].apply(lambda x : x//8)
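
As an alternative to `apply`, the same shift labels can be computed with vectorized operations, which tends to be faster on 1.8 million rows. This is a sketch that assumes the shift boundaries listed above. The `hours` sample is made up to cover both edges of each shift.

```python
import pandas as pd

# Sample hours covering the first and last hour of each 8-hour block
hours = pd.Series([0, 7, 8, 15, 16, 23])

block = hours // 8                  # 0 = midnight block, 1 = morning, 2 = evening
shift_num = block.replace(0, 3)     # the midnight block is labelled third shift
shift = 'shift_' + shift_num.astype(str)
print(shift.tolist())
```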

In [14]:
sf_data[sf_data['shift']=='shift_2'].tail(2)

IncidntNum,category,descript,dayofweek,date,time,pddistrict,resolution,address,x,y,location,pdid,hour,month,day,year,shift
30005882,vehicle theft,"vehicle, recovered, auto",wednesday,2003-01-01,16:00,TARAVAL,NONE,1700 Block of 48TH AV,-122.507539,37.753788,"(37.7537879722664, -122.507539443431)",3000588207041,16,1,1,2003,shift_2
30005882,vehicle theft,stolen automobile,wednesday,2003-01-01,16:00,TARAVAL,NONE,1700 Block of 48TH AV,-122.507539,37.753788,"(37.7537879722664, -122.507539443431)",3000588207021,16,1,1,2003,shift_2


### Create crime_level from category and description
Instead of only counting incidents, create a crime_level feature that weights each incident by the severity/violence of the crime.
- Could be used to help with workforce planning.
- This is very subjective...

In [15]:
# temp 
#sf_data['crime_level'] = x : np.random.random_integers(1,6)
#sf_data['crime_level'] = np.random.choice(range(1, 6), sf_data.shape[0])

sf_data.head(3)

IncidntNum,category,descript,dayofweek,date,time,pddistrict,resolution,address,x,y,location,pdid,hour,month,day,year,shift
160051264,warrants,warrant arrest,monday,2016-01-18,23:52,CENTRAL,"ARREST, BOOKED",400 Block of POWELL ST,-122.408568,37.788759,"(37.7887594214703, -122.408568445228)",16005126463010,23,1,18,2016,shift_2
160051242,robbery,"robbery, bodily force",monday,2016-01-18,23:40,TENDERLOIN,NONE,100 Block of STOCKTON ST,-122.406428,37.787109,"(37.78710945429, -122.40642786236)",16005124203074,23,1,18,2016,shift_2
160051305,disorderly conduct,committing public nuisance,monday,2016-01-18,23:30,MISSION,"ARREST, BOOKED",500 Block of SHOTWELL ST,-122.415922,37.759612,"(37.7596123168685, -122.415921585459)",16005130519010,23,1,18,2016,shift_2


In [16]:
sf_data['category'].value_counts()

larceny/theft                  379973
other offenses                 266265
non-criminal                   198314
assault                        163366
vehicle theft                  113270
drug/narcotic                  110787
vandalism                       95234
warrants                        88857
burglary                        78010
suspicious occ                  66959
missing person                  54917
robbery                         48291
fraud                           35588
forgery/counterfeiting          21707
secondary codes                 21368
weapon laws                     18318
trespass                        15570
prostitution                    15490
stolen property                  9960
sex offenses, forcible           9418
drunkenness                      8963
disorderly conduct               8912
recovered vehicle                6346
driving under the influence      4905
kidnapping                       4818
runaway                          3972
liquor laws 

In [17]:
sf_data[sf_data['category']=='larceny/theft']['descript'].value_counts()

grand theft from locked auto                               132525
petty theft from locked auto                                42584
petty theft of property                                     35267
grand theft of property                                     23804
petty theft from a building                                 21582
petty theft shoplifting                                     20400
grand theft from a building                                 19993
grand theft from person                                     14965
grand theft pickpocket                                      11845
grand theft from unlocked auto                              10136
petty theft with prior                                       8196
petty theft from unlocked auto                               5580
grand theft bicycle                                          5349
attempted theft from locked vehicle                          4695
grand theft shoplifting                                      4564
petty thef

In [18]:
sf_data[sf_data['category']=='other offenses']['descript'].value_counts()

drivers license, suspended or revoked                                        56931
traffic violation                                                            34099
resisting arrest                                                             18531
miscellaneous investigation                                                  17488
probation violation                                                          16314
lost/stolen license plate                                                    13961
violation of restraining order                                               12076
traffic violation arrest                                                     11468
parole violation                                                             10190
conspiracy                                                                    6472
false personation to receive money or property                                5553
obscene phone calls(s)                                                        5200
viol

In [19]:
sf_data[sf_data['category']=='non-criminal']['descript'].value_counts()

lost property                                         67112
aided case, mental disturbed                          46243
found property                                        26490
aided case                                            11611
death report, cause unknown                            9114
case closure                                           5171
stay away or court order, non-dv related               3427
aided case, dog bite                                   2913
civil sidewalks, citation                              2668
property for identification                            2558
aided case, injured person                             2221
death report, natural causes                           2052
courtesy report                                        2019
aided case -property for destruction                   1846
fire report                                            1690
located property                                       1572
tarasoff report                         

In [20]:
sf_data[sf_data['category']=='weapon laws']['descript'].value_counts()

poss of loaded firearm                                     4113
carrying a concealed weapon                                2017
exhibiting deadly weapon in a threating manner             2016
poss of firearm by convicted felon/addict/alien            1538
poss of prohibited weapon                                  1399
discharge firearm at an inhabited dwelling                 1181
possession of air gun                                       868
loitering while carrying concealed weapon                   735
discharge firearm within city limits                        631
poss of deadly weapon with intent to assault                530
firearm, loaded, in vehicle, possession or use              502
carrying of concealed weapon by convicted felon             346
ammunition, poss. by prohibited person                      325
weapon, possess or bring other on school grounds            230
switchblade knife, possession                               190
firearm, armed while possessing controll

In [21]:
sf_data[sf_data['category']=='secondary codes']['descript'].value_counts()

domestic violence                         15743
juvenile involved                          1891
gang activity                              1643
prejudice-based incident                   1404
atm related crime                           588
battery by juvenile suspect                  53
weapons possession by juvenile suspect       26
assault by juvenile suspect                  18
shooting by juvenile suspect                  2
Name: descript, dtype: int64

In [22]:
sf_data[sf_data['category']=='family offenses']['descript'].value_counts()

desertion of child                                        269
children, abandonment & neglect of (general)              232
minor without proper parental care                        215
abandonment of child                                      201
failure to provide for child                               84
immoral acts or drunk in presence of child                 38
concealment/removal of child without consent               32
failure to provide for parents                              3
harassing child or ward because of person's employment      1
Name: descript, dtype: int64

### Assign a Crime level to each category

In [23]:
levels = {'larceny/theft' : 2,                
          'other offenses' : 1,                 
          'non-criminal' : 1,
          'assault' :  4,                        
          'vehicle theft' : 2,                 
          'drug/narcotic' : 2,                 
          'vandalism' : 2,                       
          'warrants' : 1,                        
          'burglary' : 2,                        
          'suspicious occ' : 2,                 
          'missing person' : 1,                 
          'robbery' : 2,                         
          'fraud' : 2,                          
          'forgery/counterfeiting' : 2,         
          'secondary codes' :  4,              
          'weapon laws' :  3,                    
          'trespass' :  2,                       
          'prostitution' :  2,                  
          'stolen property' :  2,                 
          'sex offenses, forcible' : 4,          
          'drunkenness' :  1,                     
          'disorderly conduct'  : 1,              
          'recovered vehicle' :  1,              
          'driving under the influence' :  1,      
          'kidnapping' :  3,                      
          'runaway' :  1,                          
          'liquor laws' : 1,                     
          'arson' : 3,                           
          'embezzlement' : 1,                    
          'loitering' : 1,                      
          'suicide' :  1,                         
          'family offenses' : 3,                  
          'bad checks' : 1,                 
          'bribery' : 1,                          
          'extortion' : 2,                        
          'sex offenses, non forcible' : 2,       
          'gambling' : 1,                          
          'pornography/obscene mat' : 2,          
          'trea' : 1}

In [24]:
sf_data['crime_level'] = sf_data['category'].map(lambda x : levels[x])
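
The lambda raises a `KeyError` if a category is missing from `levels`. One way to see which categories are uncovered is to pass the dict to `map` directly and look for `NaN`. This is a sketch using a two-entry subset of the dict. The name `levels_subset` and the `'new category'` value are hypothetical.

```python
import pandas as pd

# Subset of the notebook's levels dict, for illustration only
levels_subset = {'assault': 4, 'warrants': 1}

# The last value is a hypothetical category that has no assigned level
cats = pd.Series(['assault', 'warrants', 'new category'])

# Mapping with a dict yields NaN for missing keys instead of raising
mapped = cats.map(levels_subset)
unmapped = cats[mapped.isna()].unique()
print(list(unmapped))
```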


### Was a Gun Used
- Just wanted to know
- May be useful for other analysis

In [25]:
sf_data['gun'] = sf_data['descript'].apply(lambda x: x.find('gun') != -1 )
sf_data['gun'].value_counts()


False    1850561
True       16009
Name: gun, dtype: int64
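
The same flag can be computed with the vectorized string method `str.contains`. This sketch uses three descriptions that appear in the outputs above. Note that a plain substring test also matches entries such as "possession of air gun", so the flag means "the description mentions a gun" more than "a firearm was used".

```python
import pandas as pd

# Descriptions taken from the notebook's outputs
descript = pd.Series([
    'aggravated assault with a gun',
    'possession of air gun',
    'warrant arrest',
])

# regex=False treats 'gun' as a literal substring
has_gun = descript.str.contains('gun', regex=False)
print(has_gun.tolist())
```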

In [26]:
sf_data.head()

IncidntNum,category,descript,dayofweek,date,time,pddistrict,resolution,address,x,y,location,pdid,hour,month,day,year,shift,crime_level,gun
160051264,warrants,warrant arrest,monday,2016-01-18,23:52,CENTRAL,"ARREST, BOOKED",400 Block of POWELL ST,-122.408568,37.788759,"(37.7887594214703, -122.408568445228)",16005126463010,23,1,18,2016,shift_2,1,False
160051242,robbery,"robbery, bodily force",monday,2016-01-18,23:40,TENDERLOIN,NONE,100 Block of STOCKTON ST,-122.406428,37.787109,"(37.78710945429, -122.40642786236)",16005124203074,23,1,18,2016,shift_2,2,False
160051305,disorderly conduct,committing public nuisance,monday,2016-01-18,23:30,MISSION,"ARREST, BOOKED",500 Block of SHOTWELL ST,-122.415922,37.759612,"(37.7596123168685, -122.415921585459)",16005130519010,23,1,18,2016,shift_2,1,False
160051258,robbery,attempted robbery on the street with a gun,monday,2016-01-18,23:30,BAYVIEW,NONE,BANCROFT AV / KEITH ST,-122.392791,37.725605,"(37.7256051449087, -122.392791275294)",16005125803411,23,1,18,2016,shift_2,2,True
160051258,assault,aggravated assault with a gun,monday,2016-01-18,23:30,BAYVIEW,NONE,BANCROFT AV / KEITH ST,-122.392791,37.725605,"(37.7256051449087, -122.392791275294)",16005125804011,23,1,18,2016,shift_2,4,True


## Consolidate into daily records
Group the incidents by day, then count the number of incidents and sum the crime_level.

### Group by the day

To group by day and shift, uncomment code marked #SHIFT

In [27]:
day_group = sf_data.groupby(['date'])[['crime_level']].agg(['sum', 'count'])
#SHIFT day_group = sf_data.groupby(['date','shift'])[['crime_level']].agg(['sum', 'count'])
day_group.head()

date,"(crime_level, sum)","(crime_level, count)"
2003-01-01,1254,622
2003-01-02,750,411
2003-01-03,799,440
2003-01-04,674,347
2003-01-05,755,377


In [28]:
# unstack to bring shift from rows to columns
#SHIFT day_group = day_group.unstack(level=-1)
#SHIFT day_group.head()

In [29]:
day_group.columns.values

array([('crime_level', 'sum'), ('crime_level', 'count')], dtype=object)

In [30]:
day_group.columns = ['_'.join(col).strip() for col in day_group.columns.values]
day_group.head(2)


date,crime_level_sum,crime_level_count
2003-01-01,1254,622
2003-01-02,750,411


### Create a crime level that is on the scale of 0 to 10

In [39]:
high_crime = day_group['crime_level_sum'].max()
low_crime = day_group['crime_level_sum'].min()
day_group['crime_level'] = (day_group['crime_level_sum'] - low_crime) * 10 / (high_crime - low_crime)
day_group.head()

date,crime_level_sum,crime_level_count,crime_level
2003-01-01,1254,622,10.0
2003-01-02,750,411,5.974441
2003-01-03,799,440,6.365815
2003-01-04,674,347,5.367412
2003-01-05,755,377,6.014377
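
As a worked check of the scaling, the sketch below plugs in the extremes of `crime_level_sum`: the maximum of 1254 from the first row above and the minimum of 2 reported by `describe()` later in the notebook. The helper name `scale_0_10` is made up for illustration.

```python
high_crime = 1254   # largest daily crime_level_sum (2003-01-01)
low_crime = 2       # smallest daily crime_level_sum, per the later describe() output

def scale_0_10(total):
    """Min-max scale a daily total onto the 0 to 10 range."""
    return (total - low_crime) * 10 / (high_crime - low_crime)

# 2003-01-02 has a daily sum of 750
print(round(scale_0_10(750), 6))   # 5.974441, matching the table above
```

Because the minimum comes from a day that the notebook later treats as missing data, computing the scale after removing those days would shift the low end. That is a possible refinement, not what the notebook does.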


In [31]:
# Sum up the 3 shift info into day totals
#SHIFT day_group['crime_level_sum_day'] = day_group['crime_level_sum_shift_1'] + 
#                                   day_group['crime_level_sum_shift_2'] + 
#                                   day_group['crime_level_sum_shift_3']
#day_group['crime_level_count_day'] = day_group['crime_level_count_shift_1'] + 
#                                     day_group['crime_level_count_shift_2'] + 
#                                     day_group['crime_level_count_shift_3']        
#day_group.head(2)
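
For reference, here is a minimal end-to-end sketch of the per-shift variant on toy data: group by date and shift, unstack the shift level into columns, flatten the column names, and sum the shifts back into a day total. The `toy` frame and its values are invented. The resulting column names follow the pattern used in the commented code above.

```python
import pandas as pd

# Toy incidents: one row per (date, shift) with an invented crime_level
toy = pd.DataFrame({
    'date': pd.to_datetime(['2003-01-01'] * 3 + ['2003-01-02'] * 3),
    'shift': ['shift_1', 'shift_2', 'shift_3'] * 2,
    'crime_level': [1, 2, 4, 2, 2, 1],
})

per_shift = toy.groupby(['date', 'shift'])[['crime_level']].agg(['sum', 'count'])
per_shift = per_shift.unstack(level=-1)          # move shift from rows to columns

# Flatten the 3-level column index, e.g. crime_level_sum_shift_1
per_shift.columns = ['_'.join(col) for col in per_shift.columns]

# Add the shift sums back up into a day total
sum_cols = [c for c in per_shift.columns if c.startswith('crime_level_sum_')]
per_shift['crime_level_sum_day'] = per_shift[sum_cols].sum(axis=1)
print(per_shift)
```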

### Add in the other fields that are not crime rate
Features that are needed for further analysis
- day, month, year and dayofweek

In [40]:
day_group_static = sf_data.groupby(['date'])[['dayofweek','day', 'month', 'year']].min()
day_group_static.head()

date,dayofweek,day,month,year
2003-01-01,wednesday,1,1,2003
2003-01-02,thursday,2,1,2003
2003-01-03,friday,3,1,2003
2003-01-04,saturday,4,1,2003
2003-01-05,sunday,5,1,2003
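
Since these fields are all functions of the date, they could also be derived directly from the dates instead of taking a group-wise `min()`. A sketch under that assumption, using the first two dates from the output above:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2003-01-01', '2003-01-02']))

# Each calendar field is computed from the date itself
static = pd.DataFrame({
    'dayofweek': dates.dt.day_name().str.lower(),
    'day': dates.dt.day,
    'month': dates.dt.month,
    'year': dates.dt.year,
})
static.index = pd.DatetimeIndex(dates, name='date')
print(static)
```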


### merge crimelevel df with other fields
- day_group
- day_group_static

In [41]:
# join_axes was removed in pandas 1.0; reindex keeps day_group's row order
data = pd.concat([day_group, day_group_static], axis=1).reindex(day_group.index)
data.head()

date,crime_level_sum,crime_level_count,crime_level,dayofweek,day,month,year
2003-01-01,1254,622,10.0,wednesday,1,1,2003
2003-01-02,750,411,5.974441,thursday,2,1,2003
2003-01-03,799,440,6.365815,friday,3,1,2003
2003-01-04,674,347,5.367412,saturday,4,1,2003
2003-01-05,755,377,6.014377,sunday,5,1,2003


## Review new data

In [42]:
data.describe()

,crime_level_sum,crime_level_count,crime_level,day,month,year
count,4765.0,4765.0,4765.0,4765.0,4765.0,4765.0
mean,733.63106,391.725079,5.843699,15.706611,6.502413,2009.025813
std,92.795111,49.040628,0.741175,8.798708,3.45947,3.759785
min,2.0,2.0,0.0,1.0,1.0,2003.0
25%,673.0,360.0,5.359425,8.0,4.0,2006.0
50%,730.0,391.0,5.814696,16.0,7.0,2009.0
75%,788.0,422.0,6.277955,23.0,10.0,2012.0
max,1254.0,650.0,10.0,31.0,12.0,2016.0


### Observations
- The minimum crime_level_count is 2. It seems unreasonable that a day would have only 2 incidents.

In [43]:
data.sort_values('crime_level_count', ascending=True).head(10)

date,crime_level_sum,crime_level_count,crime_level,dayofweek,day,month,year
2007-12-16,2,2,0.0,sunday,16,12,2007
2008-08-01,10,8,0.063898,friday,1,8,2008
2013-12-25,314,152,2.492013,wednesday,25,12,2013
2011-12-25,327,168,2.595847,sunday,25,12,2011
2013-12-24,308,173,2.444089,tuesday,24,12,2013
2010-12-25,355,175,2.819489,saturday,25,12,2010
2013-12-23,379,187,3.011182,monday,23,12,2013
2008-12-25,403,199,3.202875,thursday,25,12,2008
2007-12-25,395,212,3.138978,tuesday,25,12,2007
2012-12-25,403,217,3.202875,tuesday,25,12,2012


### Remove the two days that appear to be missing data
- there were 2 days that only had 2 and 8 incidents. 
- remove them since there must be missing data from those 2 days.

In [45]:
data = data[data['crime_level_count'] > 10]
data.shape

(4763, 7)
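
Beyond days with suspiciously few incidents, a related check is whether any calendar day is absent from the index altogether. The sketch below shows one way to test for that on a toy frame with a deliberate gap. It is not a claim about what the real data contains.

```python
import pandas as pd

# Toy daily table with 2003-01-03 deliberately left out
idx = pd.to_datetime(['2003-01-01', '2003-01-02', '2003-01-04'])
daily = pd.DataFrame({'crime_level_count': [622, 411, 347]}, index=idx)

# Every calendar day between the first and last date
full_range = pd.date_range(daily.index.min(), daily.index.max(), freq='D')

# Dates in the full range that have no row in the daily table
missing_days = full_range.difference(daily.index)
print(missing_days)
```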

## Write final data to file

In [46]:
data.to_csv('sf_crime_clean.csv')
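
When the file is read back, the `date` index comes in as plain strings unless it is parsed explicitly. The sketch below shows a round trip, using an in-memory buffer in place of `sf_crime_clean.csv` and two rows from the output above.

```python
import io
import pandas as pd

daily = pd.DataFrame(
    {'crime_level_count': [622, 411]},
    index=pd.to_datetime(['2003-01-01', '2003-01-02']),
)
daily.index.name = 'date'

# Write to an in-memory buffer instead of a file on disk
buf = io.StringIO()
daily.to_csv(buf)
buf.seek(0)

# parse_dates restores the datetime index on the way back in
restored = pd.read_csv(buf, index_col='date', parse_dates=['date'])
print(restored.index.dtype)
```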