This Jupyter notebook analyzes shotspotter events in Chicago Police District 3 (D3). In particular, it predicts whether a shooting incident can be linked to a shotspotter event based on time and position.

<h3> Model</h3>
The Chicago Data Portal contains detailed information about both shotspotter and shooting incidents. However, the two data sets are completely separate, so determining which shotspotter events correspond to which shooting incidents is not straightforward.

To match Chicago Data Portal shotspotter and shooting incidents, we need to match both spatial and temporal data. The method we propose is called Proximity-Temporal-Linking (PTL), outlined as follows:


<h4>STEP ONE: Construct the P Matrix </h4> First, construct the $n_1 \times n_2$ $proximity \,matrix$ <b>P</b>, where the first data set has $n_1$ shooting incident locations and the second data set has $n_2$ shotspotter incident locations. The $p_{ij}$ entry of <b>P</b> gives the distance (in km) from the $i^{th}$ shooting incident ($1\le i \le n_1$) to the $j^{th}$ shotspotter incident ($1\le j\le n_2$).

<h4> STEP TWO: Construct the T Matrix</h4>
Next, we construct the $n_1 \times n_2$ $temporal\, matrix$ <b>T</b> whose $t_{ij}$ entry gives the time of the $j^{th}$ shotspotter incident ($1\le j\le n_2$) minus the time of the $i^{th}$ shooting incident ($1\le i \le n_1$), in minutes. The sign of $t_{ij}$ is important: $t_{ij}\ge 0$ means the shotspotter incident is time-stamped at or after the recorded time of the shooting incident, and the threshold model below links only such pairs.

<h4> STEP THREE: Compute the linkage matrix L</h4>  The linkage matrix <b>L</b> is a binary matrix whose $l_{ij}$ entry equals 1 if the $i^{th}$ shooting incident ($1\le i\le n_1$) is predicted by the model to correspond to the $j^{th}$ shotspotter incident ($1\le j \le n_2$), and 0 otherwise.  Note that
\begin{equation}
    L=f(P,T)
\end{equation}
where different choices are possible for the function $f$.  

The simplest choice of $f(P,T)$ is a deterministic $threshold \, model$ which sets a maximum distance $\delta$ and time separation $\tau$ for a shotspotter and shooting incident to be linked.

\begin{equation}
    l_{ij} = 1 \iff \left[\, p_{ij}<\delta \ \text{ and } \ 0\le t_{ij}<\tau \,\right]
\end{equation}

The $linking\, distribution$ $\mathcal{L}(\delta,\tau)$ gives the distribution of the number of shotspotter incidents linked to a single shooting incident.
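
As a minimal sketch of the threshold model, assuming small made-up matrices `P_toy` and `T_toy` (hypothetical values, not taken from the data portal), the linkage matrix and the linking distribution could be computed with NumPy as follows. The helper name `threshold_linkage` is also an assumption made for illustration.

```python
import numpy as np

# Hypothetical toy data: 3 shooting incidents (rows) x 4 shotspotter incidents (columns).
P_toy = np.array([[0.2, 0.9, 0.1, 5.0],
                  [3.0, 0.4, 2.5, 0.3],
                  [1.5, 2.0, 4.0, 0.7]])        # distances in km
T_toy = np.array([[ 5.0,  10.0,  90.0,  3.0],
                  [-2.0,  30.0,  15.0, 59.0],
                  [ 1.0, -40.0, 200.0, 10.0]])  # minutes, shotspotter time minus shooting time

def threshold_linkage(P, T, delta, tau):
    """Return the binary matrix L with l_ij = 1 iff p_ij < delta and 0 <= t_ij < tau."""
    return ((P < delta) & (T >= 0) & (T < tau)).astype(int)

L_toy = threshold_linkage(P_toy, T_toy, delta=0.6, tau=60)

# Row sums give the number of shotspotter incidents linked to each shooting.
links_per_shooting = L_toy.sum(axis=1)

# Linking distribution: how many shootings have 0, 1, 2, ... linked shotspotter incidents.
values, counts = np.unique(links_per_shooting, return_counts=True)
linking_distribution = dict(zip(values.tolist(), counts.tolist()))

print(L_toy)
print(linking_distribution)
```

In this toy case each of the three shootings has a different number of links (0, 1 and 2), so the linking distribution has one shooting in each class.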

1) Import Libraries

In [1]:
import numpy as np
import pandas as pd

2) Import the Chicago shotspotter data and filter for D3.

In [67]:
shot_data_raw = pd.read_csv('Shot_data.csv')
shot_data_raw=shot_data_raw[shot_data_raw["DISTRICT"]==3.0]

shot_data_raw=shot_data_raw.reset_index(drop=True)
shot_data_raw.head(5)

Unnamed: 0,DATE,BLOCK,ZIP_CODE,WARD,COMMUNITY_AREA,AREA,DISTRICT,BEAT,STREET_OUTREACH_ORGANIZATION,UNIQUE_ID,MONTH,DAY_OF_WEEK,HOUR,INCIDENT_TYPE_DESCRIPTION,ROUNDS,ILLINOIS_HOUSE_DISTRICT,ILLINOIS_SENATE_DISTRICT,LATITUDE,LONGITUDE,LOCATION
0,09/24/2021 09:02:22 PM,7300 S COLES AVE,60649.0,7.0,SOUTH SHORE,1.0,3.0,334.0,Claretian Associates South Shore,SST-346981,9,6,21,MULTIPLE GUNSHOTS,2,25.0,13.0,41.762513,-87.560759,POINT (-87.560759406201 41.762512616858)
1,09/24/2021 12:26:46 AM,6900 S EAST END,60649.0,5.0,SOUTH SHORE,1.0,3.0,332.0,Claretian Associates South Shore,SST-346876,9,6,0,SINGLE GUNSHOT,1,26.0,13.0,41.768989,-87.583176,POINT (-87.583175990799 41.768988689342)
2,09/24/2021 12:43:56 AM,6900 S EAST END,60649.0,5.0,SOUTH SHORE,1.0,3.0,332.0,Claretian Associates South Shore,SST-346878,9,6,0,MULTIPLE GUNSHOTS,5,26.0,13.0,41.76987,-87.583616,POINT (-87.58361593052 41.769869904558)
3,09/24/2021 07:46:20 AM,1400 E 69TH ST,60637.0,5.0,SOUTH SHORE,1.0,3.0,332.0,Claretian Associates South Shore,SST-346896,9,6,7,SINGLE GUNSHOT,1,26.0,13.0,41.76957,-87.589931,POINT (-87.589931280599 41.769569552942)
4,09/24/2021 11:10:55 PM,6800 S DANTE AVE,60637.0,5.0,SOUTH SHORE,1.0,3.0,332.0,Claretian Associates South Shore,SST-347006,9,6,23,SINGLE GUNSHOT,1,26.0,13.0,41.769703,-87.589315,POINT (-87.589314948699 41.769703449242)


3) Separate the AM/PM marker from the date and time.

In [68]:
# Split "MM/DD/YYYY HH:MM:SS AM" at the last space into the date-time part and the AM/PM marker.
parts = shot_data_raw["DATE"].str.rsplit(" ", n=1, expand=True)
shot_data_raw["DT"] = parts[0]
shot_data_raw["AMPM"] = parts[1]
shot_data_raw.head(2)

Unnamed: 0,DATE,BLOCK,ZIP_CODE,WARD,COMMUNITY_AREA,AREA,DISTRICT,BEAT,STREET_OUTREACH_ORGANIZATION,UNIQUE_ID,...,HOUR,INCIDENT_TYPE_DESCRIPTION,ROUNDS,ILLINOIS_HOUSE_DISTRICT,ILLINOIS_SENATE_DISTRICT,LATITUDE,LONGITUDE,LOCATION,DT,AMPM
0,09/24/2021 09:02:22 PM,7300 S COLES AVE,60649.0,7.0,SOUTH SHORE,1.0,3.0,334.0,Claretian Associates South Shore,SST-346981,...,21,MULTIPLE GUNSHOTS,2,25.0,13.0,41.762513,-87.560759,POINT (-87.560759406201 41.762512616858),09/24/2021 09:02:22,PM
1,09/24/2021 12:26:46 AM,6900 S EAST END,60649.0,5.0,SOUTH SHORE,1.0,3.0,332.0,Claretian Associates South Shore,SST-346876,...,0,SINGLE GUNSHOT,1,26.0,13.0,41.768989,-87.583176,POINT (-87.583175990799 41.768988689342),09/24/2021 12:26:46,AM


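4) Streamline the dataframe: keep the needed columns, split DT into month, day, year and time, and keep only the 2019 incidents.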

In [69]:
shot_data_raw=shot_data_raw[["DATE","BLOCK","BEAT","HOUR","ROUNDS","LATITUDE","LONGITUDE","LOCATION","DT","AMPM"]]
for i in shot_data_raw.index:
    x=shot_data_raw.loc[i,"DT"]
    x1=x.split(' ')
    x2=x1[0].split('/')
    shot_data_raw.loc[i,"time"]=x1[1]
    shot_data_raw.loc[i,'month'] =x2[0]
    shot_data_raw.loc[i,"day"]=x2[1]
    shot_data_raw.loc[i,"year"]=x2[2]
shot_data_raw = shot_data_raw[shot_data_raw["year"] == '2019']
shot_data = shot_data_raw[["DATE", "BLOCK", "BEAT", "DT", "year", "month", "day", "time", "AMPM", "HOUR", "ROUNDS", "LATITUDE", "LONGITUDE", "LOCATION"]]
shot_data = shot_data.reset_index(drop=True)
shot_data.head(2)

Unnamed: 0,DATE,BLOCK,BEAT,DT,year,month,day,time,AMPM,HOUR,ROUNDS,LATITUDE,LONGITUDE,LOCATION
0,09/10/2019 03:57:43 PM,6700 S SAINT LAWRENCE,321.0,09/10/2019 03:57:43,2019,9,10,03:57:43,PM,15,2,41.771773,-87.610178,POINT (-87.610178153001 41.771772910458)
1,11/29/2019 11:57:40 PM,7100 S RHODES AVE,323.0,11/29/2019 11:57:40,2019,11,29,11:57:40,PM,23,1,41.764712,-87.611835,POINT (-87.611834809 41.764711658)
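
Steps 3 and 4 could also be sketched with `pd.to_datetime`, which parses the 12-hour timestamp in one pass. This is only an illustrative alternative on two sample rows copied from the output above; the frame `sample` and the column name `timestamp` are assumptions, not part of the notebook.

```python
import pandas as pd

# Two DATE values copied from the shotspotter output above.
sample = pd.DataFrame({"DATE": ["09/10/2019 03:57:43 PM", "11/29/2019 11:57:40 PM"]})

# Parse the full timestamp, including the AM/PM marker, with an explicit format.
sample["timestamp"] = pd.to_datetime(sample["DATE"], format="%m/%d/%Y %I:%M:%S %p")

# Derive the calendar fields as integers instead of strings.
sample["year"] = sample["timestamp"].dt.year
sample["month"] = sample["timestamp"].dt.month
sample["day"] = sample["timestamp"].dt.day
sample["hour"] = sample["timestamp"].dt.hour

print(sample[["timestamp", "year", "month", "day", "hour"]])
```

With this approach the year is an integer, so a filter would compare against `2019` rather than the string `'2019'`.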


5) Read in the shooting event data.

In [70]:
hom_data_raw = pd.read_csv('hom_data_raw.csv')
hom_data_raw.columns

Index(['Unnamed: 0', 'CASE_NUMBER', 'DATE', 'BLOCK', 'VICTIMIZATION_PRIMARY',
       'INCIDENT_PRIMARY', 'GUNSHOT_INJURY_I', 'UNIQUE_ID', 'ZIP_CODE', 'WARD',
       'COMMUNITY_AREA', 'STREET_OUTREACH_ORGANIZATION', 'AREA', 'DISTRICT',
       'BEAT', 'AGE', 'SEX', 'RACE', 'VICTIMIZATION_FBI_CD', 'INCIDENT_FBI_CD',
       'VICTIMIZATION_FBI_DESCR', 'INCIDENT_FBI_DESCR',
       'VICTIMIZATION_IUCR_CD', 'INCIDENT_IUCR_CD',
       'VICTIMIZATION_IUCR_SECONDARY', 'INCIDENT_IUCR_SECONDARY',
       'HOMICIDE_VICTIM_FIRST_NAME', 'HOMICIDE_VICTIM_MI',
       'HOMICIDE_VICTIM_LAST_NAME', 'MONTH', 'DAY_OF_WEEK', 'HOUR',
       'LOCATION_DESCRIPTION', 'STATE_HOUSE_DISTRICT', 'STATE_SENATE_DISTRICT',
       'UPDATED', 'LATITUDE', 'LONGITUDE', 'LOCATION', 'time', 'day', 'year'],
      dtype='object')

6) Filter the shooting data for D3 and 2019, and create a column "spotted" that counts how many shotspotter incidents are linked to each shooting incident (a shooting incident may be linked to several shotspotter incidents, depending on the threshold values).

In [71]:
hom_data = hom_data_raw[['DATE','GUNSHOT_INJURY_I','DISTRICT','MONTH','LATITUDE', 'LONGITUDE','LOCATION','time','day','year']]
hom_data = hom_data[hom_data['year']==2019]
hom_data = hom_data[hom_data['DISTRICT'] == 3]
hom_data = hom_data.reset_index(drop=True)
hom_data['spotted'] = 0
hom_data.head(5)

Unnamed: 0,DATE,GUNSHOT_INJURY_I,DISTRICT,MONTH,LATITUDE,LONGITUDE,LOCATION,time,day,year,spotted
0,02/03/2019 02:01:00 AM,YES,3.0,2,41.758552,-87.601815,POINT (-87.60181534612 41.758551654142),02:01:00,3,2019,0
1,02/03/2019 02:01:00 AM,YES,3.0,2,41.758552,-87.601815,POINT (-87.60181534612 41.758551654142),02:01:00,3,2019,0
2,02/03/2019 05:24:00 AM,YES,3.0,2,41.783391,-87.62234,POINT (-87.62234 41.7833905),05:24:00,3,2019,0
3,02/10/2019 08:21:00 PM,YES,3.0,2,41.767022,-87.581545,POINT (-87.58154534612 41.767021654142),08:21:00,10,2019,0
4,04/02/2019 11:52:00 PM,YES,3.0,4,41.783653,-87.616109,POINT (-87.616109058699 41.783653095858),11:52:00,2,2019,0


7) Import libraries used to link positions and times of shotspotter and shooting incidents.

In [7]:
%pip install geopy
from geopy import distance
import datetime
import re

8) As a check, create variables 'strdate1' and 'strdate2' with the date and time of the shooting incident at index 90 and the shotspotter incident at index 181, and print their time difference in minutes.

In [8]:
strdate1 = hom_data.loc[90,"DATE"]
strdate2 = shot_data.loc[181,"DATE"]
date1 = datetime.datetime.strptime(strdate1, "%m/%d/%Y %I:%M:%S %p")
date2 = datetime.datetime.strptime(strdate2, "%m/%d/%Y %I:%M:%S %p")
dt = date2-date1
print(date1)
print(date2)
print(int(dt.total_seconds()/60))

2019-08-13 22:22:00
2019-08-24 01:55:44
14613
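
One detail of `int(dt.total_seconds()/60)` is worth illustrating: `int` truncates toward zero, so a shotspotter incident that is up to 59 seconds earlier than a shooting incident gets a time difference of 0 minutes rather than a negative one. The sketch below uses made-up timestamps (hypothetical, not from the data) and shows `math.floor` as one possible alternative.

```python
import datetime
import math

fmt = "%m/%d/%Y %I:%M:%S %p"

# Hypothetical timestamps: the shotspotter alert is 30 seconds BEFORE the shooting time.
shooting = datetime.datetime.strptime("08/13/2019 10:22:00 PM", fmt)
alert = datetime.datetime.strptime("08/13/2019 10:21:30 PM", fmt)

minutes = (alert - shooting).total_seconds() / 60  # exact difference in minutes

truncated = int(minutes)       # truncates toward zero
floored = math.floor(minutes)  # rounds toward negative infinity

print(minutes, truncated, floored)
```

With truncation this pair would pass a test such as `T >= 0`, while with `math.floor` it would not. Which behavior is appropriate depends on how precisely the shooting times are recorded.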


9) Compute the P and T matrices as described in the introduction.

In [9]:
P = np.zeros((hom_data.shape[0], shot_data.shape[0]))
T = np.zeros((hom_data.shape[0], shot_data.shape[0]))
for i in hom_data.index:
    for j in shot_data.index:
        #if int(shot_data.loc[j,"year"])==hom_data.loc[i,"year"]&int(shot_data.loc[j,"month"])==hom_data.loc[i,"MONTH"]&int(shot_data.loc[j,"day"])==hom_data.loc[i,"day"]:
        pt1 = [hom_data.loc[i,"LATITUDE"],hom_data.loc[i,"LONGITUDE"]]
        pt2 = [shot_data.loc[j,"LATITUDE"],shot_data.loc[j,"LONGITUDE"]]
        P[i][j] = distance.distance(pt1, pt2).km
        strdate1 = hom_data.loc[i,"DATE"]
        strdate2 = shot_data.loc[j,"DATE"]
        date1 = datetime.datetime.strptime(strdate1, "%m/%d/%Y %I:%M:%S %p")
        date2 = datetime.datetime.strptime(strdate2, "%m/%d/%Y %I:%M:%S %p")
        dt = date2-date1
        T[i][j] = int(dt.total_seconds() / 60)
#        else:
#            P[i][j] = -1
#            T[i][j] = -1

In [10]:
print("The distance (in km) between shooting incident 2 and shotspotter incident 3 is ", P[2][3])
print("The time (in minutes) from shooting incident 2 to shotspotter incident 3 is ", T[2][3])

The distance (in km) between shooting incident 2 and shotspotter incident 3 is  4.132770652859228
The time (in minutes) from shooting incident 2 to shotspotter incident 3 is  375555.0
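
The double loop above calls `strptime` and `geopy` once per pair, which grows quickly with the number of incidents. A possible vectorized sketch is shown below. It assumes the haversine great-circle formula with a mean Earth radius of 6371.0088 km, which is an approximation and will differ slightly from the ellipsoidal distance returned by `geopy`'s `distance.distance`. The helper names `haversine_matrix` and `minutes_matrix` are hypothetical, and the sample rows are copied from the tables above.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumption of the haversine model)

def haversine_matrix(lat1, lon1, lat2, lon2):
    """Pairwise great-circle distances in km; rows follow (lat1, lon1), columns (lat2, lon2)."""
    lat1 = np.radians(np.asarray(lat1, dtype=float))[:, None]
    lon1 = np.radians(np.asarray(lon1, dtype=float))[:, None]
    lat2 = np.radians(np.asarray(lat2, dtype=float))[None, :]
    lon2 = np.radians(np.asarray(lon2, dtype=float))[None, :]
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def minutes_matrix(shooting_dates, shot_dates, fmt="%m/%d/%Y %I:%M:%S %p"):
    """Pairwise (shotspotter time - shooting time) in minutes, as floats (not truncated)."""
    t1 = pd.to_datetime(pd.Series(shooting_dates), format=fmt).to_numpy()[:, None]
    t2 = pd.to_datetime(pd.Series(shot_dates), format=fmt).to_numpy()[None, :]
    return (t2 - t1) / np.timedelta64(1, "m")

# Sample rows copied from the tables above (two shootings, two shotspotter incidents).
hom_sample = pd.DataFrame({"DATE": ["02/03/2019 02:01:00 AM", "02/10/2019 08:21:00 PM"],
                           "LATITUDE": [41.758552, 41.767022],
                           "LONGITUDE": [-87.601815, -87.581545]})
shot_sample = pd.DataFrame({"DATE": ["09/10/2019 03:57:43 PM", "11/29/2019 11:57:40 PM"],
                            "LATITUDE": [41.771773, 41.764712],
                            "LONGITUDE": [-87.610178, -87.611835]})

P_fast = haversine_matrix(hom_sample["LATITUDE"], hom_sample["LONGITUDE"],
                          shot_sample["LATITUDE"], shot_sample["LONGITUDE"])
T_fast = minutes_matrix(hom_sample["DATE"], shot_sample["DATE"])
print(P_fast.shape, T_fast.shape)
```

Unlike the loop, `minutes_matrix` keeps fractional minutes; applying `np.trunc` to its result would mimic the `int(...)` conversion used in the notebook.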


10) Create a function which is used to fill the 'spotted' column of hom_data

In [11]:
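def linked(P, T, delta, tau):
    """Count, for each shooting k, the shotspotter incidents q with P[k][q] < delta and 0 <= T[k][q] < tau."""
    # Updates the global hom_data in place; reset 'spotted' first so re-running does not double count.
    hom_data['spotted'] = 0
    for k in range(P.shape[0]):
        for q in range(P.shape[1]):
            if P[k][q] < delta and 0 <= T[k][q] < tau:
                hom_data.loc[k, 'spotted'] = hom_data.loc[k, 'spotted'] + 1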

11) Check how many rows (shooting incidents) and columns (shotspotter incidents) the P matrix has.

In [None]:
P.shape
#for i in np.arange(P.shape[0]):
#    for k in np.arange(P.shape[1]):
#        print(T[i][k])

12) Apply the function for the threshold values $\delta=.6$ (km) and $\tau=60$ (minutes).

In [12]:
delta = .6
tau = 60
linked(P,T,delta,tau)

In [20]:
hom_data['GUNSHOT_INJURY_I'].value_counts()

YES    148
NO       5
Name: GUNSHOT_INJURY_I, dtype: int64

13) Print the incidents which are linked to shot spotter events.

In [14]:
for i in hom_data.index:
    if hom_data.loc[i,'spotted'] >= 1:
        print(hom_data.loc[i])

DATE                02/03/2019 02:01:00 AM
GUNSHOT_INJURY_I                       YES
DISTRICT                                 3
MONTH                                    2
LATITUDE                           41.7586
LONGITUDE                         -87.6018
time                              02:01:00
day                                      3
year                                  2019
spotted                                  2
Name: 0, dtype: object
DATE                02/03/2019 02:01:00 AM
GUNSHOT_INJURY_I                       YES
DISTRICT                                 3
MONTH                                    2
LATITUDE                           41.7586
LONGITUDE                         -87.6018
time                              02:01:00
day                                      3
year                                  2019
spotted                                  2
Name: 1, dtype: object
DATE                04/08/2019 12:15:00 AM
GUNSHOT_INJURY_I                       YES
DISTRICT

14) Compute the linking distribution $\mathcal{L}(\delta,\tau)$, which gives the number of shooting incidents linked to 0, 1, 2, ... shotspotter incidents.

In [15]:
hom_data['spotted'].value_counts()

0    84
1    36
3    17
2    11
4     4
5     1
Name: spotted, dtype: int64
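
As a small worked example, the distribution printed above can be summarized by the share of shooting incidents linked to at least one shotspotter incident. The counts below are copied from the output above; the variable names are illustrative.

```python
import pandas as pd

# Linking distribution for D3 in 2019 with delta = 0.6 km and tau = 60 min
# (counts copied from the output above).
dist = pd.Series({0: 84, 1: 36, 3: 17, 2: 11, 4: 4, 5: 1}, name="spotted")

total = dist.sum()                      # number of shooting incidents
linked_total = total - dist.get(0, 0)   # incidents with at least one linked shotspotter event
share_linked = linked_total / total

print(total, linked_total, round(share_linked, 3))
```

Here 69 of the 153 shooting incidents have at least one link, which is about 45%.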

15) Create a function which takes the threshold values $\delta$ and $\tau$ as inputs and returns the linking distribution $\mathcal{L}(\delta,\tau)$ for a given year and month. (The function filters for D3 and assumes geopy has been installed.)

In [72]:
def LD(hom_data_raw,shot_data,year,month,delta,tau):
    #import libraries
    from geopy import distance
    import datetime
    import re
    hom_data = hom_data_raw[['DATE','GUNSHOT_INJURY_I','DISTRICT','MONTH','LATITUDE', 'LONGITUDE','LOCATION','time','day','year']]
    hom_data = hom_data[hom_data['year']==year]
    hom_data = hom_data[hom_data['MONTH']==month]
    hom_data = hom_data[hom_data['DISTRICT'] == 3]
    hom_data = hom_data.reset_index(drop=True)
    hom_data['spotted'] = 0
    #define the function to determine linkage of shotspotter and shooting incidents based on the threshold values
    def linked(P, T, delta, tau):
        for k in range(P.shape[0]):
            for q in range(P.shape[1]):
                if P[k][q] < delta and T[k][q] < tau and T[k][q] >= 0:
                    hom_data.loc[k, 'spotted'] = hom_data.loc[k,'spotted']+ 1
    #Create the P and T matrices
    P = np.zeros((hom_data.shape[0], shot_data.shape[0]))
    T = np.zeros((hom_data.shape[0], shot_data.shape[0]))
    for i in hom_data.index:
        for j in shot_data.index:
            #if int(shot_data.loc[j,"year"])==hom_data.loc[i,"year"]&int(shot_data.loc[j,"month"])==hom_data.loc[i,"MONTH"]&int(shot_data.loc[j,"day"])==hom_data.loc[i,"day"]:
            location1 = hom_data.loc[i,"LOCATION"]
            location2 = shot_data.loc[j,"LOCATION"]
            a1 = location1.split(' ')
            a2 = location2.split(' ')
            pt1 = [float(re.sub('[^0-9.-]','', a1[2])),float(re.sub('[^0-9.-]','', a1[1]))]
            pt2 = [float(re.sub('[^0-9.-]','', a2[2])),float(re.sub('[^0-9.-]','', a2[1]))]
            #pt1 = [hom_data.loc[i,"LATITUDE"],hom_data.loc[i,"LONGITUDE"]]
            #pt2 = [shot_data.loc[j,"LATITUDE"],shot_data.loc[j,"LONGITUDE"]]
            P[i][j] = distance.distance(pt1, pt2).km
            strdate1 = hom_data.loc[i,"DATE"]
            strdate2 = shot_data.loc[j,"DATE"]
            date1 = datetime.datetime.strptime(strdate1, "%m/%d/%Y %I:%M:%S %p")
            date2 = datetime.datetime.strptime(strdate2, "%m/%d/%Y %I:%M:%S %p")
            dt = date2-date1
            T[i][j] = int(dt.total_seconds() / 60)
    #Apply the function linked to the P and T matrices
    linked(P,T,delta,tau)
    #return the linking distribution
    return hom_data['spotted'].value_counts()
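
The `LOCATION` strings have the form `POINT (longitude latitude)`, which is why the function reads the latitude from the third field and the longitude from the second. The parsing step could be isolated in a small helper; `parse_point` is a hypothetical name used here for illustration and is not part of the notebook.

```python
import re

# Matches "POINT (<lon> <lat>)" and captures the two signed decimal numbers.
_POINT_RE = re.compile(r"POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)")

def parse_point(location):
    """Return (latitude, longitude) from a 'POINT (lon lat)' string."""
    match = _POINT_RE.fullmatch(location.strip())
    if match is None:
        raise ValueError(f"unrecognized LOCATION value: {location!r}")
    lon, lat = float(match.group(1)), float(match.group(2))
    return lat, lon

# Example string copied from the shotspotter output above.
print(parse_point("POINT (-87.560759406201 41.762512616858)"))
```

Raising an error on unexpected input makes malformed `LOCATION` values visible instead of silently producing a wrong coordinate.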

16) Find the linking distribution for September 2019

In [73]:
Sept19=LD(hom_data,shot_data,2019,9,.6,60)

In [74]:
for i, n in Sept19.sort_index().items():  #iterate over the link counts that actually occur
    print("In Sept 19, the number of shooting incidents linked to " + str(i) + " shot spotter incidents was ",n)

In Sept 19, the number of shooting incidents linked to 0 shot spotter incidents was  8
In Sept 19, the number of shooting incidents linked to 1 shot spotter incidents was  5
In Sept 19, the number of shooting incidents linked to 2 shot spotter incidents was  2
In Sept 19, the number of shooting incidents linked to 3 shot spotter incidents was  5


17) Find the linking distribution for all months in 2019. L19[0] stores the distribution for Jan 2019, L19[1] for Feb 2019 etc.

In [75]:
L19 = [None] * 12
for mo in np.arange(0,12,1):
    L19[mo]=LD(hom_data,shot_data,2019,mo+1,.6,60)
    for i, n in L19[mo].sort_index().items():  #skips link counts that do not occur in the month
        print("In "+str(mo+1)+'/19', "the number of shooting incidents linked to " + str(i) + " shot spotter incidents was ",n)

In 1/19 the number of shooting incidents linked to 0 shot spotter incidents was  10
In 2/19 the number of shooting incidents linked to 0 shot spotter incidents was  5
In 2/19 the number of shooting incidents linked to 1 shot spotter incidents was  4
In 2/19 the number of shooting incidents linked to 2 shot spotter incidents was  2
In 3/19 the number of shooting incidents linked to 0 shot spotter incidents was  4
In 3/19 the number of shooting incidents linked to 1 shot spotter incidents was  2
In 3/19 the number of shooting incidents linked to 2 shot spotter incidents was  2
In 3/19 the number of shooting incidents linked to 3 shot spotter incidents was  2
In 4/19 the number of shooting incidents linked to 0 shot spotter incidents was  10
In 4/19 the number of shooting incidents linked to 1 shot spotter incidents was  2
In 4/19 the number of shooting incidents linked to 2 shot spotter incidents was  1
In 5/19 the number of shooting incidents linked to 0 shot spotter incidents was  7
In

<h3>NEXT STEPS</h3>

1) For D3, make a month by month graph of the linkage ratio $\lambda$, that is, the proportion of shooting incidents in each month that are linked to at least 1 shotspotter event.

2) Explore how the linking distribution changes with the choice of thresholds $\delta$ and $\tau$.  When does the linking distribution become binary (0 or 1)?

3) Find the police district for La Villita (where Adam Toledo was shot) and do the same month by month analysis as GP 1 is doing for D3.
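
Next steps 1 and 2 could be prototyped along the following lines. This is only a sketch: `links_per_shooting`, `linkage_ratio` and `sweep_thresholds` are hypothetical helpers, and the matrices are made-up toy values standing in for the `P` and `T` matrices computed earlier in the notebook.

```python
import numpy as np

def links_per_shooting(P, T, delta, tau):
    """Number of shotspotter incidents linked to each shooting under the threshold model."""
    return ((P < delta) & (T >= 0) & (T < tau)).sum(axis=1)

def linkage_ratio(P, T, delta, tau):
    """Proportion of shootings linked to at least one shotspotter incident."""
    return float(np.mean(links_per_shooting(P, T, delta, tau) >= 1))

def sweep_thresholds(P, T, deltas, taus):
    """Return {(delta, tau): (linkage ratio, True if no shooting has more than one link)}."""
    results = {}
    for delta in deltas:
        for tau in taus:
            counts = links_per_shooting(P, T, delta, tau)
            results[(delta, tau)] = (float(np.mean(counts >= 1)), bool(counts.max() <= 1))
    return results

# Toy matrices (hypothetical): 3 shootings x 3 shotspotter incidents.
P_toy = np.array([[0.1, 0.5, 2.0],
                  [0.3, 1.5, 0.2],
                  [4.0, 3.0, 2.5]])      # km
T_toy = np.array([[ 2.0, 20.0,  5.0],
                  [-5.0, 10.0, 45.0],
                  [ 1.0,  2.0,  3.0]])   # minutes

sweep = sweep_thresholds(P_toy, T_toy, deltas=[0.2, 0.6], taus=[10, 60])
for key, value in sweep.items():
    print(key, value)
```

In the toy data the distribution stays binary for the tighter thresholds and stops being binary at $\delta=0.6$, $\tau=60$, where one shooting picks up two links. Running the same sweep on the real matrices, one month at a time, would give the inputs for a month by month plot of $\lambda$.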


OTHER: