# Offensive Prescriptions: Are some medications more correlated with criminal offenses?

Here we examine the police offenses and prescription data from England to try to find out.


### Let's start by importing a subset of the police data and expand from there

In [1]:
import pandas as pd
import time
import seaborn as sb
import matplotlib.pyplot as plt

In [2]:
police = pd.read_csv("./police-data/2016-05/2016-05-metropolitan-street.csv")

In [3]:
# Build one path per month for 2016-01 through 2017-12, then keep
# 2016-06 through 2017-05 (2016-05 is already loaded above).
paths = []
for year in range(2016, 2018):
    for month in range(1, 13):
        ym = f'{year}-{month:02d}'
        paths.append(f'./police-data/{ym}/{ym}-metropolitan-street.csv')
        #paths.append(f'./201703to202002police/{ym}/{ym}-city-of-london-street.csv')
paths = paths[5:17]

In [4]:
police.size

1042980

In [5]:
# Read the remaining months and append them in a single concat.
police = pd.concat([police] + [pd.read_csv(path) for path in paths])

In [6]:
police.tail(100)

Unnamed: 0,Crime ID,Month,Reported by,Falls within,Longitude,Latitude,Location,LSOA code,LSOA name,Crime type,Last outcome category,Context
90923,3bc94717716605bdc79459ffbb691a6cd51bf5db774a16...,2017-05,Metropolitan Police Service,Metropolitan Police Service,,,No Location,,,Violence and sexual offences,Under investigation,
90924,d22310f14472396d9a3df805dcc403bd4a4cd41330f123...,2017-05,Metropolitan Police Service,Metropolitan Police Service,,,No Location,,,Violence and sexual offences,Under investigation,
90925,558239a12641f4d50d3cc096bc26a755151909b0562c9c...,2017-05,Metropolitan Police Service,Metropolitan Police Service,,,No Location,,,Violence and sexual offences,Under investigation,
90926,5666d161ace624eefd199bb42c0d5dce0a009e83e92ce6...,2017-05,Metropolitan Police Service,Metropolitan Police Service,,,No Location,,,Violence and sexual offences,Under investigation,
90927,6cef642c5ab4d35401b1263c73bc3a30940a9d052619e3...,2017-05,Metropolitan Police Service,Metropolitan Police Service,,,No Location,,,Violence and sexual offences,Under investigation,
90928,bd9ca56fd2e9dc8a103378682226d39b4edb1294bf8234...,2017-05,Metropolitan Police Service,Metropolitan Police Service,,,No Location,,,Violence and sexual offences,Under investigation,
90929,dc2475281e66c472c8f24bbf3c5ea5575233e3395f1b6b...,2017-05,Metropolitan Police Service,Metropolitan Police Service,,,No Location,,,Violence and sexual offences,Under investigation,
90930,c03df1521a386f00b2cb9143f772c478c9da183821fd3c...,2017-05,Metropolitan Police Service,Metropolitan Police Service,,,No Location,,,Violence and sexual offences,Under investigation,
90931,2d4c2ec1e536fefcea0bd1fdf5711ee14d66a9473b1118...,2017-05,Metropolitan Police Service,Metropolitan Police Service,,,No Location,,,Violence and sexual offences,Under investigation,
90932,a29cc0aaacd6e235af135fc0e501531ed4bb71e3d3e0e7...,2017-05,Metropolitan Police Service,Metropolitan Police Service,,,No Location,,,Violence and sexual offences,Under investigation,


In [7]:
police['count']=1

In [8]:
police.groupby(['Crime type']).sum(numeric_only=True)

Crime type,Longitude,Latitude,Context,count
Anti-social behaviour,-32869.013786,13793470.0,0.0,267869
Bicycle theft,-2799.342696,1085292.0,0.0,21331
Burglary,-9287.008595,3884668.0,0.0,75590
Criminal damage and arson,-7730.51013,3488120.0,0.0,68174
Drugs,-4218.291204,1778344.0,0.0,35425
Other crime,-1695.772136,552690.9,0.0,11374
Other theft,-14615.782048,6005290.0,0.0,118058
Possession of weapons,-633.580909,295654.3,0.0,5808
Public order,-6033.604175,2575669.0,0.0,51001
Robbery,-2728.409722,1382207.0,0.0,27290


### The GP prescription data needs a lot of work

In particular, the location of each practice (preferably as latitude and longitude) needs to be determined from its postal code.

In [9]:
practice_header = ['practice','name','location_1','location_2','location_3','location_4','postal_code']
practice = pd.read_csv('./general-practice-prescribing-data/practices.csv',names = practice_header).dropna(subset=['postal_code'])

In [10]:
practice.head()

Unnamed: 0,practice,name,location_1,location_2,location_3,location_4,postal_code
0,A81001,THE DENSHAM SURGERY,THE HEALTH CENTRE,LAWSON STREET,STOCKTON ON TEES,CLEVELAND,TS18 1HU
1,A81002,QUEENS PARK MEDICAL CENTRE,QUEENS PARK MEDICAL CTR,FARRER STREET,STOCKTON ON TEES,CLEVELAND,TS18 2AW
2,A81003,VICTORIA MEDICAL PRACTICE,THE HEALTH CENTRE,VICTORIA ROAD,HARTLEPOOL,CLEVELAND,TS26 8DB
3,A81004,WOODLANDS ROAD SURGERY,6 WOODLANDS ROAD,,MIDDLESBROUGH,CLEVELAND,TS1 3BE
4,A81005,SPRINGWOOD SURGERY,SPRINGWOOD SURGERY,RECTORY LANE,GUISBOROUGH,,TS14 7DJ


### Let's try a lookup method of determining coordinates from name and postal_code

In [11]:
from geopy.geocoders import Nominatim

In [12]:
practice_location = practice.drop(['practice','location_1','location_2','location_3','location_4'], axis=1)

In [13]:
t = practice_location.iloc[1].to_list()
t = t[0]+' '+t[1]
t

'QUEENS PARK MEDICAL CENTRE TS18 2AW'

In [14]:
st = time.time()
geolocator = Nominatim(user_agent="Offensive Prescriptions")
location = geolocator.geocode(t)
print(time.time() - st)
print(location)
print((location.latitude, location.longitude))

0.5826442241668701
Queens Park Medical Centre, Farrer Street, Portrack, Stockton-on-Tees, North East England, England, TS18, United Kingdom
(54.56917125, -1.313866888882218)


It works, but each lookup takes about 0.5 seconds. That means roughly 12 hours to look up all 84140 practices. Normally that would be no problem, but time is limited now.
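The arithmetic behind that estimate, as a quick check. The one-request-per-second figure is the limit requested by the public Nominatim usage policy; treating it as the effective rate is an assumption about how the lookups would be run.

```python
SECONDS_PER_LOOKUP = 0.5   # measured above for a single query
N_PRACTICES = 84140        # number of practices quoted above

# Time at the measured speed
total_hours = SECONDS_PER_LOOKUP * N_PRACTICES / 3600
print(f"{total_hours:.1f} hours at the measured speed")

# Time if requests are throttled to one per second (assumed policy limit)
throttled_hours = 1.0 * N_PRACTICES / 3600
print(f"{throttled_hours:.1f} hours at one request per second")
```

Either way the full lookup is an overnight job rather than something to run interactively.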

In [15]:
t = practice_location.iloc[2].to_list()
t = t[0]+' '+t[1]
t

'VICTORIA MEDICAL PRACTICE TS26 8DB'

In [16]:
st = time.time()
geolocator = Nominatim(user_agent="Offensive Prescriptions")
location = geolocator.geocode(t)
print(time.time() - st)
print(location)
if location:
    print((location.latitude, location.longitude))

0.5243399143218994
None


Sometimes the lookup fails, and until we process all of the practices we won't know how many will fail. Google found this practice without a problem, but OpenStreetMap (via Nominatim) could not.
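One possible way to soften these failures is to retry with a less specific query, for example the postal code alone. The sketch below is only an illustration: `geocode_with_fallback` and `fake_geocode` are hypothetical names, and whether Nominatim actually resolves a bare postcode for a given practice is an assumption that would need checking.

```python
def geocode_with_fallback(geocode, queries):
    """Try each query string in turn and return the first non-None result."""
    for query in queries:
        result = geocode(query)
        if result is not None:
            return result
    return None  # every query failed


# Offline stand-in for geolocator.geocode so the sketch runs without a network
def fake_geocode(query):
    known = {"TS26 8DB": "postcode-level match"}
    return known.get(query)


name, postcode = "VICTORIA MEDICAL PRACTICE", "TS26 8DB"
# Most specific query first, postcode alone as the fallback
result = geocode_with_fallback(fake_geocode, [f"{name} {postcode}", postcode])
print(result)
```

With the real geocoder, `geolocator.geocode` would be passed in place of `fake_geocode`.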

### For now let's use a table of latitude and longitude for the postal codes.

This will be less accurate, but may be good enough to get started.

In [17]:
postcodes = pd.read_csv('./ukpostcodes.csv').drop(['id'], axis = 1)

In [18]:
postcodes.head()

Unnamed: 0,postcode,latitude,longitude
0,AB10 1XG,57.144165,-2.114848
1,AB10 6RN,57.13788,-2.121487
2,AB10 7JB,57.124274,-2.12719
3,AB11 5QN,57.142701,-2.093295
4,AB11 6UL,57.137547,-2.112233


In [19]:
postcodes.latitude[postcodes.postcode == 'AB10 1XG']

0    57.144165
Name: latitude, dtype: float64

In [20]:
postcodes = postcodes.set_index('postcode')
practice = practice.set_index('postal_code')
practice = practice.join(postcodes)

In [21]:
postcodes = None

In [22]:
practice = practice.reset_index()

In [23]:
practice['index'].iloc[1]

'AL1 3JB'
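The join above silently leaves missing coordinates for any practice whose postal code is not in the lookup table. A minimal sketch of counting those cases with `merge(..., indicator=True)` follows; the toy frames and the practice codes `P1` to `P3` are made up for illustration, while the two postcodes with coordinates are copied from the `postcodes.head()` output above.

```python
import pandas as pd

# Toy stand-ins for the practice and postcode tables
practice_demo = pd.DataFrame({
    "practice": ["P1", "P2", "P3"],
    "postal_code": ["AB10 1XG", "AB10 6RN", "ZZ99 9ZZ"],  # last one is not in the lookup
})
postcodes_demo = pd.DataFrame({
    "postcode": ["AB10 1XG", "AB10 6RN"],
    "latitude": [57.144165, 57.13788],
    "longitude": [-2.114848, -2.121487],
})

# A left merge keeps every practice; the _merge column records whether a match was found
merged = practice_demo.merge(
    postcodes_demo, how="left", left_on="postal_code", right_on="postcode", indicator=True
)
unmatched = merged[merged["_merge"] == "left_only"]
print(len(unmatched), "practice(s) without coordinates")
```

Unmatched rows could then be passed to the slower geocoder lookup, or dropped.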

### Now we want to make a new giant dataframe containing the distance from each practice to each crime.

Ultimately this will be even bigger because we will want to know this data on a per medication basis for the top N medications.

Since we are currently looking only at London data, let's limit the practices to that geographical area.

In [24]:
police = police.dropna(subset=['Latitude',"Longitude"])

In [25]:
print(police.Latitude.min(),police.Longitude.min())
print(police.Latitude.max(),police.Longitude.max())

50.120106 -5.534661
55.79064399999999 1.732103
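These bounds are much wider than London, so the police extent alone would not restrict the practices very much. A minimal sketch of filtering by an explicit box is shown below. The function name `within_box` and the rough Greater London limits are assumptions introduced here, not values from the original analysis.

```python
import pandas as pd


def within_box(df, lat_col, lon_col, lat_range, lon_range):
    """Return the rows of df whose coordinates fall inside the given box."""
    lat_ok = df[lat_col].between(*lat_range)
    lon_ok = df[lon_col].between(*lon_range)
    return df[lat_ok & lon_ok]


# Rough, assumed bounds for Greater London; adjust as needed
LONDON_LAT = (51.28, 51.70)
LONDON_LON = (-0.52, 0.34)

# Toy stand-in for the practice table: one point in Aberdeen, one in central London
demo = pd.DataFrame({
    "practice": ["A", "B"],
    "latitude": [57.144165, 51.5],
    "longitude": [-2.114848, -0.12],
})
london_demo = within_box(demo, "latitude", "longitude", LONDON_LAT, LONDON_LON)
print(london_demo)
```

Rows with missing coordinates fail both `between` checks and are dropped as well.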


In [26]:
jointable = pd.read_json('./general-practice-prescribing-data/column_remapping.json').drop(['bnf_code','bnf_name'], axis = 1)

In [27]:
prescribe = pd.read_csv("./general-practice-prescribing-data/T201605PDPI+BNFT.csv")

In [28]:
prescribe['month'] = 201605

In [29]:
prescribe.size

80769704

In [30]:
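# Append further months to the May data.
# NOTE: the 201606 file is read and appended twice here. The second read was
# probably meant to be the following month (an assumption); the outputs below
# reflect the code as written.
preappend = pd.read_csv("./general-practice-prescribing-data/T201606PDPI+BNFT.csv")
preappend['month'] = 201606
prescribe = pd.concat([prescribe,preappend])
preappend = pd.read_csv("./general-practice-prescribing-data/T201606PDPI+BNFT.csv")
preappend['month'] = 201606
prescribe = pd.concat([prescribe,preappend])
preappend = None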

In [31]:
prescribe.size

244610552

In [32]:
# paths = []
# for j in range(2016,2018):
#     j = str(j)
#     for i in range(1,13):
#         i = str(i)
#         if len(i)==1:
#             i = '0'+i
#         paths.append('./general-practice-prescribing-data/T'+j+i+'PDPI+BNFT.csv')
#         #paths.append('./201703to202002police/'+j+'-'+i+'/'+j+'-'+i+'-city-of-london-street.csv')
# paths = paths[5:4+13]
# paths

Loading all the months is too much for my toaster to handle.

In [33]:
# for i in paths:
#     t = pd.read_csv(i)
#     t['month'] = i[37:43]  # the YYYYMM part of the file name
#     prescribe = pd.concat([prescribe, t])

Bringing it all together in the prescribe dataframe:

In [34]:
prescribe = prescribe.set_index('practice').join(jointable)

In [35]:
prescribe = prescribe.set_index('practice')

In [36]:
prescribe = prescribe.join(practice.set_index('practice'))

In [37]:
prescribe = prescribe.reset_index()

### Let's try to reduce the prescriber data by focusing on the most-prescribed medications

In [39]:
t = prescribe[['bnf_code',"quantity"]].groupby('bnf_code').sum()

In [40]:
t=t.reset_index()

In [41]:
tcode = t.bnf_code.tolist()
t = None

In [42]:
p = prescribe[prescribe.bnf_code.isin(tcode)]

In [43]:
p.size

540667760

In [44]:
p = p.drop(['name','location_1','location_2','location_3','location_4'],axis=1)

### Now we need to give each crime a medication weight, something like the sum of items/(distance from crime to prescriber)

So we need to do this for each item in police...
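Written out, one reading of this weight is the following, where $c$ is a crime location, $m$ a medication and $p$ a practice. The symbols are notation introduced here for illustration, and the distance is the plain Euclidean distance in degrees that the `distance` function in this notebook computes.

```latex
w_{c,m} = \sum_{p \,:\, d(c,p) \neq 0} \frac{\mathrm{items}_{p,m}}{d(c,p)},
\qquad
d(c,p) = \sqrt{(\mathrm{lat}_c - \mathrm{lat}_p)^2 + (\mathrm{lon}_c - \mathrm{lon}_p)^2}
```

The loops later in this notebook additionally multiply each term by the number of crimes recorded at location $c$.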

In [45]:
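def distance(la1,lo1,la2,lo2):
    # Straight-line (Euclidean) distance between two (lat, lon) points, in degrees.
    return ((la1-la2)**2+(lo1-lo2)**2)**.5
def weight(items, dist):
    # Inverse-distance weight; dist must be non-zero.
    return items/dist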

In [46]:
# for i in tcode:
#     police[str(i)] = p[p.bnf_code == i]

I don't yet know how to construct this dataframe using pandas, so I'm going to try building a dictionary instead.

In [47]:
# st = time.time()
# weight_sum = 0
# for j in p.iterrows():
#         #w[str(i)] = 
#         count+=1
# print(time.time()-st)

This takes about a minute to run. I would have to run a similar loop once for every crime, which would take the better part of a year. This is too slow.

In [49]:
police['coords'] = list(zip(police['Latitude'],police['Longitude']))

In [116]:
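#police.groupby(['coords','Crime type']).sum()
# numeric_only=True keeps only the numeric columns, matching the output shown below.
pol = police.groupby('coords').sum(numeric_only=True)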

In [117]:
pol = pol[pol['count']>100]

In [118]:
pol['Latitude'],pol['Longitude'] = zip(*pol.index.tolist())

In [119]:
pol = pol.reset_index()

In [120]:
pol.size

5555

Now that I've cut down the police data by grouping coordinates together, which seemed justified considering the coordinate anonymizing done by the police, I think I can construct my dictionary.

In [121]:
ptemp = p.groupby('bnf_code').sum(numeric_only=True)

In [122]:
ptemp = ptemp[ptemp.quantity>160000000]
ptemp = ptemp.index.tolist()

In [123]:
p = p[p.bnf_code.isin(ptemp)]

In [124]:
p.size

1097630

In [125]:
p['coords'] = list(zip(p['latitude'],p['longitude']))

In [126]:
p = p.groupby(['coords','bnf_code']).sum(numeric_only=True).reset_index()

In [127]:
p['latitude'],p['longitude'] = zip(*p.coords.tolist())

Now I've also cut down the prescriptions by grouping them by bnf_code and practice coordinates.

In [128]:
p.head()

Unnamed: 0,coords,bnf_code,bnf_name,items,nic,act_cost,quantity,month,latitude,longitude
0,"(49.9126243289637, -6.30890191279307)",4234,21539,1,6.32,5.86,500,201605,49.912624,-6.308902
1,"(49.9126243289637, -6.30890191279307)",6079,65061,187,650.93,606.97,27296,604817,49.912624,-6.308902
2,"(49.9126243289637, -6.30890191279307)",8265,70140,148,107.47,101.31,4284,604817,49.912624,-6.308902
3,"(49.9126243289637, -6.30890191279307)",9201,1125,15,90.6,84.05,7500,604817,49.912624,-6.308902
4,"(49.9126243289637, -6.30890191279307)",10242,55935,7,56.32,52.21,5500,604817,49.912624,-6.308902


In [129]:
pol.head()

Unnamed: 0,coords,Longitude,Latitude,Context,count
0,"(51.314582, 0.033772)",0.033772,51.314582,0.0,133
1,"(51.349185, -0.310769)",-0.310769,51.349185,0.0,101
2,"(51.356217, -0.119475)",-0.119475,51.356217,0.0,164
3,"(51.360101, -0.193175)",-0.193175,51.360101,0.0,157
4,"(51.361186, -0.191236)",-0.191236,51.361186,0.0,166


In [130]:
poltest = pol.head().copy()
ptest = p.head().copy()

# One weight column per medication, initialised as float so weights can accumulate.
for code in ptest.bnf_code.unique():
    poltest[str(code)] = 0.0

# Counts crime/practice pairs at identical coordinates, where items/distance is undefined.
infinite_weight_count = 0

st = time.time()
for idx, crime in poltest.iterrows():
    for _, rx in ptest.iterrows():
        d = distance(crime['Latitude'], crime['Longitude'], rx['latitude'], rx['longitude'])
        if d != 0:
            # Weight = items prescribed * crimes at this location / distance.
            w = rx['items'] * crime['count'] / d
            # .loc writes to poltest directly; chained indexing triggers SettingWithCopyWarning.
            poltest.loc[idx, str(rx['bnf_code'])] += w
        else:
            infinite_weight_count += 1

In [131]:
poltest

Unnamed: 0,coords,Longitude,Latitude,Context,count,4234,6079,8265,9201,10242
0,"(51.314582, 0.033772)",0.033772,51.314582,0.0,133,20.47487,3828.800728,3030.28079,307.123053,143.324091
1,"(51.349185, -0.310769)",-0.310769,51.349185,0.0,101,16.375466,3062.212161,2423.568983,245.631992,114.628263
2,"(51.356217, -0.119475)",-0.119475,51.356217,0.0,164,25.804234,4825.391849,3819.026704,387.063517,180.629641
3,"(51.360101, -0.193175)",-0.193175,51.360101,0.0,157,24.981351,4671.512664,3697.239969,374.720267,174.869458
4,"(51.361186, -0.191236)",-0.191236,51.361186,0.0,166,26.404426,4937.627589,3907.85499,396.066384,184.830979
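The nested loops above can in principle be replaced by array operations. The following is a minimal sketch of that idea using NumPy broadcasting, not the method used in this notebook. `medication_weights` and the `_demo` frames are hypothetical names, and the toy rows are copied from the `pol.head()` and `p.head()` outputs above.

```python
import numpy as np
import pandas as pd


def medication_weights(crimes, scripts):
    """Inverse-distance weights per crime location (rows) and medication (columns).

    crimes needs Latitude, Longitude and count columns;
    scripts needs latitude, longitude, items and bnf_code columns.
    """
    # Pairwise coordinate differences via broadcasting: shape (n_crimes, n_scripts)
    dlat = crimes["Latitude"].to_numpy()[:, None] - scripts["latitude"].to_numpy()[None, :]
    dlon = crimes["Longitude"].to_numpy()[:, None] - scripts["longitude"].to_numpy()[None, :]
    dist = np.sqrt(dlat ** 2 + dlon ** 2)
    # Pairs at zero distance contribute nothing, like the loop's `if d != 0` branch
    with np.errstate(divide="ignore"):
        inv = np.where(dist > 0, 1.0 / dist, 0.0)
    pair_w = inv * scripts["items"].to_numpy()[None, :] * crimes["count"].to_numpy()[:, None]
    # Label each column with its medication code, then add up columns sharing a code
    codes = scripts["bnf_code"].astype(str).to_numpy()
    wide = pd.DataFrame(pair_w, index=crimes.index, columns=codes)
    return wide.T.groupby(level=0).sum().T


# First crime location and first two prescription rows from the tables above
crimes_demo = pd.DataFrame({"Latitude": [51.314582], "Longitude": [0.033772], "count": [133]})
scripts_demo = pd.DataFrame({
    "bnf_code": [4234, 6079],
    "items": [1, 187],
    "latitude": [49.9126243289637, 49.9126243289637],
    "longitude": [-6.30890191279307, -6.30890191279307],
})
weights_demo = medication_weights(crimes_demo, scripts_demo)
print(weights_demo)
```

The two values should come out close to the 20.47 and 3828.80 shown in the first row of `poltest`. The intermediate arrays hold one float per crime/prescription pair, so on the full tables this may need to be run in chunks.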


Now that the code is working on the test dataset, let's try it on the bigger dataset.

In [132]:
# One weight column per medication, initialised as float.
for code in p.bnf_code.unique():
    pol[str(code)] = 0.0

infinite_weight_count = 0

st = time.time()
for idx, crime in pol.iterrows():
    print(time.time()-st)  # elapsed seconds, printed once per crime location
    # NOTE: this still iterates over ptest (the first five rows of p), which is why
    # only the first five medication columns are non-zero in the output below.
    # Iterate over p.iterrows() to use the full prescription table.
    for _, rx in ptest.iterrows():
        d = distance(crime['Latitude'], crime['Longitude'], rx['latitude'], rx['longitude'])
        if d != 0:
            w = rx['items'] * crime['count'] / d
            pol.loc[idx, str(rx['bnf_code'])] += w
        else:
            infinite_weight_count += 1

0.0032126903533935547
0.35091686248779297
0.6848897933959961
...
313.20533871650696
313.5308668613434

In [134]:
pol

Unnamed: 0,coords,Longitude,Latitude,Context,count,4234,6079,8265,9201,10242,...,17987,19544,21005,23911,25187,26512,26861,9063,133,8687
0,"(51.314582, 0.033772)",0.033772,51.314582,0.0,133,20.474870,3828.800728,3030.280790,307.123053,143.324091,...,0,0,0,0,0,0,0,0,0,0
1,"(51.349185, -0.310769)",-0.310769,51.349185,0.0,101,16.375466,3062.212161,2423.568983,245.631992,114.628263,...,0,0,0,0,0,0,0,0,0,0
2,"(51.356217, -0.119475)",-0.119475,51.356217,0.0,164,25.804234,4825.391849,3819.026704,387.063517,180.629641,...,0,0,0,0,0,0,0,0,0,0
3,"(51.360101, -0.193175)",-0.193175,51.360101,0.0,157,24.981351,4671.512664,3697.239969,374.720267,174.869458,...,0,0,0,0,0,0,0,0,0,0
4,"(51.361186, -0.191236)",-0.191236,51.361186,0.0,166,26.404426,4937.627589,3907.854990,396.066384,184.830979,...,0,0,0,0,0,0,0,0,0,0
5,"(51.362295, -0.19195299999999998)",-0.191953,51.362295,0.0,107,17.020917,3182.911469,2519.095708,255.313754,119.146419,...,0,0,0,0,0,0,0,0,0,0
6,"(51.36428, 0.110818)",0.110818,51.364280,0.0,120,18.232091,3409.401026,2698.349475,273.481366,127.624637,...,0,0,0,0,0,0,0,0,0,0
7,"(51.365714000000004, 0.059611000000000004)",0.059611,51.365714,0.0,111,16.992786,3177.650900,2514.932263,254.891783,118.949499,...,0,0,0,0,0,0,0,0,0,0
8,"(51.370607, -0.100093)",-0.100093,51.370607,0.0,148,23.205868,4339.497334,3434.468478,348.088021,162.441077,...,0,0,0,0,0,0,0,0,0,0
9,"(51.371703000000004, -0.099473)",-0.099473,51.371703,0.0,111,17.402070,3254.187149,2575.506406,261.031055,121.814492,...,0,0,0,0,0,0,0,0,0,0


In [137]:
import folium
from folium.plugins import HeatMap

In [138]:
plotdata = list(zip(pol.Latitude,pol.Longitude,pol['8265']))

In [148]:
m = folium.Map(location=[51.45, 0],zoom_start=10)
HeatMap(data = plotdata,radius=8, max_zoom=13).add_to(m)
m