---

# Crime <a name="crime"></a>

These datasets (crime and AHAH) are no longer used in the choropleth visualisation, so their code cells have been converted to markdown.

In [16]:
# all of my crime data is stored across 44 datasets per month, so the 12 months of 2020 give 528 datasets in total

# for each dataset I will be extracting the number of crimes committed in each LSOA code
# the most natural ways of doing this, in my opinion, would be to either
# a) load all tables, convert to dataframes, append all dataframes into one, drop all unwanted columns,
# then count by LSOA... or...
# b) load all tables, convert to dataframes, and for each dataframe: i) drop unwanted columns, ii) count by LSOA;
# then iii) append all the per-file counts, iv) sum them by LSOA

# a) looks like the simplest, so let's go for it. Of course, the most challenging step is finding a way to
# automate the loading and conversion of all 528 files

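import os

import pandas as pd

# wdir (the working directory) is defined earlier in the notebook;
# os.path.join avoids the invalid escape sequences in '\datasets\Police_Data'
rootdir = os.path.join(wdir, 'datasets', 'Police_Data')

li = []

# walk every monthly sub-folder and read each CSV file into a dataframe
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        if not file.endswith('.csv'):
            continue  # skip any non-CSV files that may sit alongside the data
        df = pd.read_csv(os.path.join(subdir, file), index_col=None, header=0)
        li.append(df)

# approach a): stack all monthly files into one dataframe
frame = pd.concat(li, axis=0, ignore_index=True)

frame.columns

# keep only 'LSOA code' and 'Crime type'
frame.drop(columns = ['Crime ID', 'Month', 'Reported by', 'Falls within', 'Longitude',
       'Latitude', 'Location', 'Last outcome category', 'LSOA name', 'Context'], inplace = True)
frame

# count the crimes recorded in each LSOA (rows with no LSOA code are excluded by groupby)
crime_by_LSOA = frame.groupby('LSOA code').count()

crime_by_LSOA.rename(columns = {'Crime type' : 'Crime Count'}, inplace = True)

crime_by_LSOA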
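To check that approaches a) and b) from the comments above give the same per-LSOA counts, here is a minimal sketch on two made-up monthly extracts (the LSOA codes and crime types are illustrative, not real police data). It also shows that a row with a missing LSOA code is silently dropped by `groupby`.

```python
import pandas as pd

# Two hypothetical monthly extracts; column names follow the police CSVs, values are invented.
jan = pd.DataFrame({'LSOA code': ['E01000001', 'E01000001', 'E01000002', None],
                    'Crime type': ['Burglary', 'Vehicle crime', 'Robbery', 'Drugs']})
feb = pd.DataFrame({'LSOA code': ['E01000002', 'E01000003'],
                    'Crime type': ['Burglary', 'Shoplifting']})

# Approach a): concatenate everything, then count once per LSOA.
counts_a = (pd.concat([jan, feb], ignore_index=True)
              .groupby('LSOA code')['Crime type'].count())

# Approach b): count per file, then sum the partial counts by LSOA.
partials = [df.groupby('LSOA code')['Crime type'].count() for df in (jan, feb)]
counts_b = pd.concat(partials).groupby(level=0).sum()

# The row with no LSOA code is not counted in either approach.
print(counts_a.to_dict())
print(counts_b.to_dict())
```

Approach a) needs every file in memory at once, while b) only keeps one file plus the running partial counts, which matters if the monthly files are large.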

#### Dealing with outliers

boxplot = crime_by_LSOA.reset_index().plot(kind='box', x='LSOA code', y='Crime Count', figsize=(10, 6))

The boxplot above shows that the median number of crimes per LSOA is around 100, with the upper whisker at roughly 450. LSOAs with counts beyond the upper whisker (by default, more than 1.5 × the interquartile range above the upper quartile) are classed as outliers and drawn individually as circles.
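The whisker rule can be reproduced directly with NumPy. The sketch below defines a hypothetical helper, `tukey_whiskers`, that follows the same 1.5 × IQR rule as matplotlib's default (`whis=1.5`): each whisker stops at the most extreme data point still inside the fence, and anything beyond is an outlier. The input values are invented for illustration.

```python
import numpy as np

def tukey_whiskers(values, whis=1.5):
    """Hypothetical helper: return (lower_whisker, upper_whisker, outliers) for 1-D data."""
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])   # linear interpolation, as np.percentile does by default
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    # Whiskers end at the most extreme observations that lie inside the fences.
    lower = x[x >= lo_fence].min()
    upper = x[x <= hi_fence].max()
    outliers = x[(x < lo_fence) | (x > hi_fence)]
    return lower, upper, outliers

counts = [40, 60, 80, 95, 100, 105, 120, 140, 160, 600]  # made-up crime counts
lower, upper, outliers = tukey_whiskers(counts)
print(lower, upper, outliers)
```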

How many outliers are there?

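import matplotlib.pyplot as plt
import numpy as np

counts = crime_by_LSOA['Crime Count']
boxplot = plt.boxplot(counts)

# extract the median and the whisker ends: matplotlib returns two whisker lines per box
# (lower first, then upper), and point [1] of each line is the whisker's outer end
median = np.median(counts)
whiskers = [item.get_ydata()[1] for item in boxplot['whiskers']]
minimum = whiskers[0]
maximum = whiskers[1]

print(median)
print(minimum)
print(maximum)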

# number of LSOAs whose crime count lies above the upper whisker (i.e. outliers)
len(crime_by_LSOA[crime_by_LSOA['Crime Count'] > maximum])

To limit the influence of these extreme values, the counts can be split into 10 quantile-based tiers (deciles), each containing roughly the same number of LSOAs.

bin_labels = list(range(1, 11))
crime_by_LSOA['Tier'] = pd.qcut(crime_by_LSOA['Crime Count'], q=10, precision=0, labels=bin_labels)
crime_by_LSOA
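To see why quantile-based tiers cope better with outliers than equal-width bins, the sketch below bins 20 made-up counts (one of them extreme) with `pd.qcut` and, for comparison, with `pd.cut`.

```python
import pandas as pd

# 20 invented crime counts, deliberately skewed by one very large value.
counts = pd.Series([3, 5, 8, 9, 12, 14, 15, 18, 21, 25,
                    27, 30, 33, 36, 40, 44, 50, 58, 70, 900])
labels = list(range(1, 11))

# Quantile-based binning: each tier receives (about) the same number of values,
# so the single extreme value only affects the top tier.
tiers, edges = pd.qcut(counts, q=10, labels=labels, retbins=True)

# Equal-width binning for comparison: the outlier stretches the range,
# so almost every value lands in the first bin.
equal_width = pd.cut(counts, bins=10, labels=labels)

print(tiers.value_counts().sort_index().tolist())
print(equal_width.value_counts().sort_index().tolist())
```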

The lower threshold of each tier, and the number of LSOAs in it, can be tabulated as follows:

# number of LSOAs in each tier (gives 'Tier' and 'count' columns in both pandas 1.x and 2.x)
tiercount = crime_by_LSOA['Tier'].value_counts().rename_axis('Tier').reset_index(name='count')

# retbins=True also returns the 11 decile edges
results, bin_edges = pd.qcut(crime_by_LSOA['Crime Count'],
                             q=10, precision=0,
                             labels=bin_labels, retbins=True)

# zip pairs each tier with its lower edge (the final, upper edge is left over)
thresholds = pd.DataFrame(zip(bin_edges, bin_labels),
                          columns=['Threshold', 'Tier'])

results_table = thresholds.merge(tiercount, how = 'outer', on = 'Tier').set_index('Tier')

display(results_table)
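Because `retbins=True` returns `q + 1` edges, pairing them with the 10 labels via `zip` keeps only each tier's lower edge. A minimal sketch, on the invented values 1 to 100, of a table that keeps both bounds of every tier:

```python
import pandas as pd

counts = pd.Series(range(1, 101))  # invented stand-in for 'Crime Count'
labels = list(range(1, 11))
tiers, edges = pd.qcut(counts, q=10, labels=labels, retbins=True)

# edges has 11 values: tier k spans edges[k - 1] to edges[k].
table = pd.DataFrame({'Tier': labels,
                      'Lower bound': edges[:-1],
                      'Upper bound': edges[1:]})
# value_counts on the categorical, sorted into tier order 1..10
table['count'] = tiers.value_counts().sort_index().to_numpy()
table = table.set_index('Tier')

print(table)
```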

# Community Health and Engagement <a name="CHaE"></a>

ahahcsv = os.path.join(wdir, 'datasets', 'AccessToHealthyAssets&Hazards.csv')

AHAH = pd.read_csv(ahahcsv, index_col=None, header=0)

AHAH.columns

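AHAH.rename(columns = {'lsoa11' : 'LSOA code', 'd_ahah' : 'AHAH Decile'}, inplace = True)
# preview indexed by LSOA code; this returns a new dataframe, so 'LSOA code' stays a column of AHAH
AHAH.set_index('LSOA code')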

AHAHselected = AHAH.drop(columns = ['r_rank', 'h_rank', 'g_rank', 'e_rank', 'r_exp', 'h_exp',
       'g_exp', 'e_exp', 'ahah', 'r_ahah', 'r_dec', 'h_dec', 'g_dec',
       'e_dec'])

display(AHAHselected)

pseudo_community_strength = AHAH.merge(crime_by_LSOA, how = 'outer', on = 'LSOA code')

pseudo_community_strength_selected = AHAHselected.merge(crime_by_LSOA, how = 'outer', on = 'LSOA code')

pseudo_community_strength_selected
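An outer merge keeps LSOAs that appear in only one of the two tables, filling the missing columns with `NaN`. Passing `indicator=True` to `DataFrame.merge` adds a `_merge` column that makes those rows easy to count. The sketch below uses two tiny hypothetical tables shaped like `AHAH` (key as a column) and `crime_by_LSOA` (key as the index); the codes and values are made up.

```python
import pandas as pd

# Hypothetical stand-ins: AHAH-like table keyed by a column,
# crime-count table keyed by its index (as crime_by_LSOA is).
ahah = pd.DataFrame({'LSOA code': ['E01000001', 'E01000002', 'E01000004'],
                     'AHAH Decile': [3, 7, 10]})
crime = pd.DataFrame({'Crime Count': [12, 40, 5]},
                     index=pd.Index(['E01000001', 'E01000002', 'W01000001'], name='LSOA code'))

# 'on' may name a column on one side and an index level on the other.
merged = ahah.merge(crime, how='outer', on='LSOA code', indicator=True)

# 'left_only' / 'right_only' rows had no partner and carry NaNs.
print(merged['_merge'].value_counts())
```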

### Preparing data to be mapped


...continued in 'choropleth_mapping.ipynb'

In [None]:
import os

import pandas as pd

# paths to the selected indicators and the 2011 LSOA boundary file
selected_csv = os.path.join(wdir, 'datasets', 'selecteddata.csv')
selecteddata = pd.read_csv(selected_csv)
lsoa_boundaries = os.path.join(wdir, 'datasets', 'LSOA-2011-GeoJSON', 'lsoa.geojson')

In [None]:
import json

with open(lsoa_boundaries) as lsoa_file:
    lsoa_json = json.load(lsoa_file)

In [None]:
# collect each feature's LSOA code and its polygon geometry
json_codes = []
json_shapes = []
for feature in lsoa_json['features']:
    json_codes.append(feature['properties']['LSOA11CD'])
    json_shapes.append(feature['geometry'])

In [None]:
data = pd.DataFrame(selecteddata)
data.set_index('LSOA code', inplace = True)
# keep English LSOAs only: drop codes starting with 'W' (Wales) or 'S' (Scotland)
data = data[~data.index.str.match(r'[WS]')]

In [None]:
# one row per LSOA code, holding its GeoJSON geometry
shapes = pd.DataFrame({'LSOA': json_codes, 'shapefile': json_shapes})
shapes.set_index('LSOA', inplace = True)

The cell below creates a single dataframe containing all the useful indices together with the polygon geometry for each LSOA code present in both tables.

In [None]:
#datamerged = pd.merge(data, shapes, left_index = True, right_index = True, how = 'inner')
#datamerged.to_csv('selecteddata_shapes.csv')
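The mapping-preparation steps (read each feature's `LSOA11CD` and geometry, keep English LSOAs, inner-join on the LSOA code) can be exercised end to end on a tiny inline FeatureCollection. This is a sketch only: the two features, their square geometries and the `Crime Count` values are invented; only the `LSOA11CD` property name comes from the notebook.

```python
import json

import pandas as pd

# A made-up two-feature FeatureCollection; real LSOA boundaries are far larger.
geojson_text = '''{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"LSOA11CD": "E01000001"},
   "geometry": {"type": "Polygon", "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}},
  {"type": "Feature", "properties": {"LSOA11CD": "W01000001"},
   "geometry": {"type": "Polygon", "coordinates": [[[2, 2], [3, 2], [3, 3], [2, 2]]]}}
]}'''
lsoa_json = json.loads(geojson_text)

# One row per LSOA code, with its geometry dict in the 'shapefile' column.
shapes = pd.DataFrame({
    'LSOA': [f['properties']['LSOA11CD'] for f in lsoa_json['features']],
    'shapefile': [f['geometry'] for f in lsoa_json['features']],
}).set_index('LSOA')

# Hypothetical indicator table; drop Welsh/Scottish codes as the notebook does.
data = pd.DataFrame({'Crime Count': [12, 30]},
                    index=pd.Index(['E01000001', 'W01000001'], name='LSOA code'))
data = data[~data.index.str.match(r'[WS]')]

# Inner join on the index keeps only LSOAs present in both tables.
datamerged = pd.merge(data, shapes, left_index=True, right_index=True, how='inner')
print(datamerged)
```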