# Introduction

The script below generates beat values for the Stop and Search file. We load data from the Calls_for_Service files and merge it with data from the Stop_and_Search file on the NOPD_Item column.

The following constraint is applied:
- Stop and Search records are limited to the years 2014 to 2016

# Index

- [1. Libraries](#1.-Libraries)
- [2. Data Loading](#2.-Data-Loading)
    - [A. Load Calls-for-Service (CFS) Data](#A.-Load-Calls-for-Service-Data)    
    - [B. Load Stop and Search Data](#B.-Load-Stop-and-Search-Data)
    - [C. Filter Records for Year 2014 to 2016](#C.-Filter-Records-for-Year-2014-to-2016)  
- [3. Get beat for SNS](#3.-Get-beat-for-SNS)

# 1. Libraries

The following libraries are required for this code to run successfully.

In [1]:
import os
import csv
import zipfile
import string
import datetime
import pandas as pd

# 2. Data Loading

In this section we load the following files:
- Calls_for_Service Data
- Stop_and_Search__Field\_Interviews_ Data

We then perform the required data manipulation steps.

## A. Load Calls-for-Service Data

In [3]:
# Set location of file
path = os.path.join("..\\Datasets\\Raw_Data\\Calls_for_Service\\")
path

'..\\Datasets\\Raw_Data\\Calls_for_Service\\'

In [4]:
# Get filenames
filenames = os.listdir(path)
filenames

['Calls_for_Service_2012.zip',
 'Calls_for_Service_2013.zip',
 'Calls_for_Service_2014.zip',
 'Calls_for_Service_2015.zip',
 'Calls_for_Service_2016.zip']

In [5]:
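# Load the CSV inside each zip archive into a DataFrame
dfs = []
for f in filenames:
    # Close each archive once its CSV has been read
    with zipfile.ZipFile(os.path.join(path, f)) as zf:
        # The CSV inside shares the archive's base name
        dfs.append(pd.read_csv(zf.open(f.replace('.zip', '.csv'))))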

# Merge all df in list
cfs_df = pd.concat(dfs, ignore_index=True)

# Change datatype of column Type_ to String
cfs_df.Type_ = cfs_df.Type_.apply(str)
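The loading step above can be sketched as a reusable function. This is a minimal sketch, not the notebook's own code: the helper name `load_zipped_csvs` and the two tiny demo archives are hypothetical, and it assumes each archive holds one CSV with the same base name. It also reads `Type_` as a string at load time through `dtype`, which differs slightly from `apply(str)`: missing values stay missing instead of becoming the text `'nan'`.

```python
import os
import tempfile
import zipfile

import pandas as pd


def load_zipped_csvs(folder):
    """Read the same-named CSV inside every .zip in `folder` and stack them."""
    frames = []
    for name in sorted(os.listdir(folder)):
        if not name.endswith(".zip"):
            continue  # skip anything that is not an archive
        member = name[:-len(".zip")] + ".csv"
        with zipfile.ZipFile(os.path.join(folder, name)) as zf:
            with zf.open(member) as fh:
                # Keep call type codes such as "62A" and "94" as text
                frames.append(pd.read_csv(fh, dtype={"Type_": str}))
    return pd.concat(frames, ignore_index=True)


# Hypothetical two-year example written to a temporary folder
demo_dir = tempfile.mkdtemp()
for year, row in [(2012, "A0000112,62A\n"), (2013, "A0000113,94\n")]:
    base = "Calls_for_Service_%d" % year
    with zipfile.ZipFile(os.path.join(demo_dir, base + ".zip"), "w") as zf:
        zf.writestr(base + ".csv", "NOPD_Item,Type_\n" + row)

demo_cfs = load_zipped_csvs(demo_dir)
print(demo_cfs)
```

Using `os.path.join` with separate path components, as in the helper, also keeps the code independent of the Windows-style separators used in the cells above.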

In [6]:
# Display top 5 rows
cfs_df.head()

Unnamed: 0,NOPD_Item,Type_,TypeText,Priority,InitialType,InitialTypeText,InitialPriority,MapX,MapY,TimeCreate,...,TimeArrive,TimeClosed,Disposition,DispositionText,SelfInitiated,Beat,BLOCK_ADDRESS,Zip,PoliceDistrict,Location
0,A0000112,62A,"BURGLAR ALARM, SILEN",2C,,,,3683627,532625,1/1/2012 0:00,...,,1/1/2012 0:33,NAT,NECESSARY ACTION TAKEN,,,009XX Decatur St,70116.0,8,"(29.958469303316875, -90.0613152964016)"
1,A0000412,94,DISCHARGING FIREARMS,2B,,,,3732996,562418,1/1/2012 0:00,...,1/1/2012 0:16,1/1/2012 0:30,UNF,UNFOUNDED,,,147XX Chef Menteur Hwy,70129.0,7,"(30.038788769111676, -89.90425047516077)"
2,A0000212,103,DISTURBANCE (OTHER),1C,,,,3687688,548824,1/1/2012 0:01,...,1/1/2012 0:01,1/1/2012 0:19,NAT,NECESSARY ACTION TAKEN,,,038XX Gentilly Blvd,70122.0,3,"(30.002886229898206, -90.04791794333323)"
3,A0000712,21,COMPLAINT OTHER,1H,,,,3670776,521242,1/1/2012 0:01,...,,1/1/2012 0:20,NAT,NECESSARY ACTION TAKEN,,,Carondelet St & Napoleon Ave,70115.0,2,"(29.927555772946167, -90.10228161624175)"
4,A0000512,62A,"BURGLAR ALARM, SILEN",2C,,,,3665739,549621,1/1/2012 0:01,...,1/1/2012 0:09,1/1/2012 1:55,NAT,NECESSARY ACTION TAKEN,,,002XX W Harrison Ave,70124.0,3,"(30.005736477457617, -90.11723146931276)"


In [7]:
# Number of records
len(cfs_df)

2252907

In [8]:
# Column Names
cfs_df.columns

Index([u'NOPD_Item', u'Type_', u'TypeText', u'Priority', u'InitialType',
       u'InitialTypeText', u'InitialPriority', u'MapX', u'MapY', u'TimeCreate',
       u'TimeDispatch', u'TimeArrive', u'TimeClosed', u'Disposition',
       u'DispositionText', u'SelfInitiated', u'Beat', u'BLOCK_ADDRESS', u'Zip',
       u'PoliceDistrict', u'Location'],
      dtype='object')

In [9]:
# Select required columns
cfs_beat = cfs_df[['NOPD_Item','Beat']]

In [10]:
# Display top 5 rows
cfs_beat.head()

Unnamed: 0,NOPD_Item,Beat
0,A0000112,
1,A0000412,
2,A0000212,
3,A0000712,
4,A0000512,


In [11]:
# Keep only records with a non-null beat
cfs_beat = cfs_beat[cfs_beat.Beat.notnull()]

In [12]:
# Total rows
len(cfs_beat.NOPD_Item)

1283208

In [13]:
# Unique NOPD Item
len(set(cfs_beat.NOPD_Item))

1283194
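The row count (1283208) exceeds the unique item count (1283194) by 14, so a few NOPD_Item values seem to repeat in cfs_beat. In a left merge, every repeated key on the right side multiplies the matching rows on the left, so it may be worth inspecting these before merging. The sketch below uses a hypothetical toy frame, and keeping the first beat per item is only one possible policy, not something this notebook does.

```python
import pandas as pd

# Hypothetical CFS extract in which one item number carries two beat rows
toy_cfs_beat = pd.DataFrame({
    "NOPD_Item": ["A0000114", "A0000214", "A0000214", "A0000314"],
    "Beat": ["3Q01", "8G04", "8G04", "5D02"],
})

# Rows whose NOPD_Item occurs more than once (keep=False flags every copy)
dupes = toy_cfs_beat[toy_cfs_beat.duplicated("NOPD_Item", keep=False)]
print(dupes)

# One possible policy (an assumption): keep the first beat seen for each item
toy_unique = toy_cfs_beat.drop_duplicates("NOPD_Item", keep="first")
print(len(toy_cfs_beat), len(toy_unique))
```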

## B. Load Stop and Search Data

In [14]:
# Set location of file
sns_file_path = "..\\Datasets\\Raw_Data\\Stop_and_Search\\Stop_and_Search__Field_Interviews_.csv"

In [15]:
# Load the SNS Data
sns_df = pd.read_csv(sns_file_path) 


In [16]:
# Display top 5 rows
sns_df.head()

Unnamed: 0,FieldInterviewID,NOPD_Item,EventDate,District,Zone,OfficerAssignment,StopDescription,ActionsTaken,VehicleYear,VehicleMake,...,SubjectWeight,SubjectEyeColor,SubjectHairColor,SubjectDriverLicState,CreatedDateTime,LastModifiedDateTime,Longitude,Latitude,Zip,BlockAddress
0,17415,,01/01/2010 01:11:00 AM,6,E,6th District,TRAFFIC VIOLATION,,2005.0,DODGE,...,160.0,Brown,Black,LA,01/01/2010 01:26:26 AM,,0.0,0.0,,
1,17416,,01/01/2010 02:06:00 AM,5,D,5th District,CALL FOR SERVICE,,,,...,140.0,Brown,Black,,01/01/2010 02:27:38 AM,,0.0,0.0,,
2,17416,,01/01/2010 02:06:00 AM,5,D,5th District,CALL FOR SERVICE,,,,...,145.0,Brown,Black,,01/01/2010 02:27:38 AM,,0.0,0.0,,
3,17416,,01/01/2010 02:06:00 AM,5,D,5th District,CALL FOR SERVICE,,,,...,140.0,Brown,Black,,01/01/2010 02:27:38 AM,,0.0,0.0,,
4,17416,,01/01/2010 02:06:00 AM,5,D,5th District,CALL FOR SERVICE,,,,...,140.0,Brown,Black,,01/01/2010 02:27:38 AM,,0.0,0.0,,


In [18]:
sns_df.columns

Index([u'FieldInterviewID', u'NOPD_Item', u'EventDate', u'District', u'Zone',
       u'OfficerAssignment', u'StopDescription', u'ActionsTaken',
       u'VehicleYear', u'VehicleMake', u'VehicleModel', u'VehicleStyle',
       u'VehicleColor', u'SubjectID', u'SubjectRace', u'SubjectGender',
       u'SubjectAge', u'SubjectHasPhotoID', u'SubjectHeight', u'SubjectWeight',
       u'SubjectEyeColor', u'SubjectHairColor', u'SubjectDriverLicState',
       u'CreatedDateTime', u'LastModifiedDateTime', u'Longitude', u'Latitude',
       u'Zip', u'BlockAddress'],
      dtype='object')

In [19]:
len(sns_df.NOPD_Item)

430607

## C. Filter Records for Year 2014 to 2016

In [21]:
# Convert column type to datetime
sns_df.EventDate = pd.to_datetime(sns_df.EventDate)

In [22]:
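# Apply filter for year range 2014-2016 (start inclusive, end exclusive)
# pd.Timestamp bounds compare cleanly with a datetime64 column
sns_df = sns_df[(sns_df.EventDate >= pd.Timestamp('2014-01-01'))
                & (sns_df.EventDate < pd.Timestamp('2017-01-01'))]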

In [23]:
max(sns_df.EventDate)

Timestamp('2016-12-31 23:25:00')

In [24]:
min(sns_df.EventDate)

Timestamp('2014-01-01 00:20:00')
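The minimum and maximum above confirm that the filter kept only 2014 to 2016. The same selection can also be written against the year component of the date. The sketch below runs on hypothetical rows placed right at the boundaries, and it assumes EventDate strings follow the `MM/DD/YYYY hh:mm:ss AM/PM` pattern seen in the head output, which lets `pd.to_datetime` parse with an explicit format.

```python
import pandas as pd

# Hypothetical stops sitting on either side of the 2014-2016 window
toy_sns = pd.DataFrame({
    "NOPD_Item": ["L0000113", "A0000114", "L9999916", "A0000117"],
    "EventDate": ["12/31/2013 11:59:00 PM", "01/01/2014 12:20:00 AM",
                  "12/31/2016 11:25:00 PM", "01/01/2017 12:00:00 AM"],
})

# An explicit format avoids per-row format guessing (assumed pattern)
toy_sns["EventDate"] = pd.to_datetime(toy_sns["EventDate"],
                                      format="%m/%d/%Y %I:%M:%S %p")

# between() is inclusive on both ends by default, so 2014 and 2016 are kept
in_range = toy_sns[toy_sns["EventDate"].dt.year.between(2014, 2016)]
print(in_range)
print(in_range["EventDate"].min(), in_range["EventDate"].max())
```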

In [25]:
# Unique NOPD Item
tot_sns = len(set(sns_df.NOPD_Item))
tot_sns

132791

# 3. Get beat for SNS

In [26]:
# Merge SNS and CFS Data
cfs_sns_df = pd.merge(sns_df, cfs_beat , how='left', on='NOPD_Item')

In [27]:
cfs_sns_df.head()

Unnamed: 0,FieldInterviewID,NOPD_Item,EventDate,District,Zone,OfficerAssignment,StopDescription,ActionsTaken,VehicleYear,VehicleMake,...,SubjectEyeColor,SubjectHairColor,SubjectDriverLicState,CreatedDateTime,LastModifiedDateTime,Longitude,Latitude,Zip,BlockAddress,Beat
0,148175,B2251112,2015-12-22 23:41:00,8,F,8th District,TRAFFIC VIOLATION,,,,...,Brown,Brown,,02/16/2012 12:29:13 AM,,0.0,0.0,,N Rampart & St Roch,
1,232445,A2171314,2014-01-17 16:15:00,5,C,Traffic,TRAFFIC VIOLATION,,2005.0,DODGE,...,,,,01/21/2014 02:03:14 PM,,-90.060016,30.003164,70122.0,044XX Elysian Fields Ave,3Q01
2,246870,F2697114,2014-06-20 22:40:00,8,D,8th District,OTHER,,,,...,Brown,Brown,LA,06/20/2014 11:54:50 PM,,-90.067523,29.951844,70130.0,005XX Canal St,8G04
3,246870,F2697114,2014-06-20 22:40:00,8,D,8th District,OTHER,,,,...,Brown,Brown,LA,06/20/2014 11:54:50 PM,,-90.067523,29.951844,70130.0,005XX Canal St,8G04
4,246870,F2697114,2014-06-20 22:40:00,8,D,8th District,OTHER,,,,...,Brown,Brown,LA,06/20/2014 11:54:50 PM,,-90.067523,29.951844,70130.0,005XX Canal St,8G04
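A left merge keeps every Stop and Search row and fills Beat only where the NOPD_Item also appears in the CFS data. The sketch below, on hypothetical toy frames, shows two optional `pd.merge` arguments that can make this step easier to audit: `indicator=True` adds a `_merge` column that records where each row found a match, and `validate="many_to_one"` raises an error if the right-hand keys are not unique. Because the row and unique counts in section 2.A differ, `validate` would presumably fail on the real cfs_beat until the repeated items are resolved.

```python
import pandas as pd

# Hypothetical stops; the third item number has no CFS record
toy_sns = pd.DataFrame({
    "FieldInterviewID": [1, 2, 3],
    "NOPD_Item": ["A0000114", "B0000214", "C0000314"],
})
toy_cfs_beat = pd.DataFrame({
    "NOPD_Item": ["A0000114", "B0000214"],
    "Beat": ["3Q01", "8G04"],
})

merged = pd.merge(toy_sns, toy_cfs_beat, how="left", on="NOPD_Item",
                  validate="many_to_one", indicator=True)
print(merged)

# "both" = beat found, "left_only" = stop with no matching CFS record
print(merged["_merge"].value_counts())
```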


In [28]:
# Unique NOPD Item
tot_sns_beat = len(set(cfs_sns_df.NOPD_Item[cfs_sns_df.Beat.notnull()]))
tot_sns_beat

131370

In [29]:
# Calculate % of unique NOPD items that have a beat
sns_beat_available = tot_sns_beat*100.0/tot_sns

In [30]:
# % of unique NOPD items missing a beat
100 - sns_beat_available

1.0701026424983553
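In other words, 132791 - 131370 = 1421 of the 132791 unique item numbers have no beat, which is about 1.07%. The same calculation can be sketched with `nunique()` on a hypothetical toy frame. One difference to keep in mind is that `nunique()` ignores missing item numbers by default, whereas `len(set(...))` may count them, so the two approaches can disagree slightly on real data.

```python
import pandas as pd

# Hypothetical merged result: four distinct items, one of them without a beat
toy_merged = pd.DataFrame({
    "NOPD_Item": ["A0000114", "A0000114", "B0000214", "C0000314", "D0000414"],
    "Beat": ["3Q01", "3Q01", "8G04", None, "5D02"],
})

# Count distinct item numbers overall and among rows that have a beat
total_items = toy_merged["NOPD_Item"].nunique()
items_with_beat = toy_merged.loc[toy_merged["Beat"].notnull(),
                                 "NOPD_Item"].nunique()

# Share of distinct items with no beat, as a percentage
pct_missing = 100.0 * (total_items - items_with_beat) / total_items
print(total_items, items_with_beat, pct_missing)
```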