# Stroke unit locations and services

## Data sources

Office for National Statistics:

+ Boundaries (shapes) file: `LSOA_(Dec_2011)_Boundaries_Super_Generalised_Clipped_(BSC)_EW_V3.geojson`

Our custom data:

+ Source unknown: `stroke_hospitals_2022.csv`
+ Made in `regions.ipynb` notebook in this folder: `regions_lsoa_ew.csv`, `regions_ew.csv`

Made in this notebook:
+ `unit_postcodes_coords.csv`



## Notebook setup

In [1]:
import pandas as pd
import geopandas
import os

dir_geojson = '../../data_geojson'
dir_data = 'data_input'
dir_ons = 'ons_data'

file_unit_services = 'stroke_hospitals_2022.csv'
file_unit_coords = 'unit_postcodes_coords.csv'
file_lsoa_geojson = 'LSOA_(Dec_2011)_Boundaries_Super_Generalised_Clipped_(BSC)_EW_V3.geojson'
file_regions = 'regions_ew.csv'
file_lsoa_regions = 'regions_lsoa_ew.csv'

file_output = 'stroke_units_regions.csv'

## Services

This information is contained in `stroke_hospitals_2022.csv`, a file made elsewhere. Its origin is unknown.

In [2]:
path_to_unit_services = os.path.join(dir_data, file_unit_services)

df_unit_services = pd.read_csv(path_to_unit_services)

In [3]:
df_unit_services.head().T

Unnamed: 0,0,1,2,3,4
Postcode,RM70AG,E11BB,SW66SX,SE59RW,BR68ND
Hospital_name,RM70AG,E11BB,SW66SX,SE59RW,BR68ND
Use_IVT,1,1,1,1,1
Use_MT,1,1,1,1,0
Use_MSU,1,1,1,1,0
Country,England,England,England,England,England
Strategic Clinical Network,London SCN,London SCN,London SCN,London SCN,London SCN
Health Board / Trust,Barking,Barts Health NHS Trust,Imperial College Healthcare NHS Trust,King's College Hospital NHS Foundation Trust,King's College Hospital NHS Foundation Trust
Stroke Team,Havering and Redbridge University Hospitals N...,The Royal London Hospital,"Charing Cross Hospital, London","King's College Hospital, London",Princess Royal University Hospital
SSNAP name,Queens Hospital Romford HASU,Royal London Hospital HASU,Charing Cross Hospital HASU,King's College Hospital HASU,Princess Royal University Hospital HASU


Useful data:
+ Postcode
+ Use_IVT / Use_MT / Use_MSU
+ Stroke Team
+ SSNAP name

The file also contains coordinates (Easting, Northing, long, lat). These are useful, but we instead use the "locations" data below.

In [4]:
cols_to_keep = ['Postcode', 'Use_IVT', 'Use_MT', 'Use_MSU', 'Stroke Team', 'SSNAP name']

df_unit_services = df_unit_services[cols_to_keep]

In [5]:
df_unit_services.head()

Unnamed: 0,Postcode,Use_IVT,Use_MT,Use_MSU,Stroke Team,SSNAP name
0,RM70AG,1,1,1,Havering and Redbridge University Hospitals N...,Queens Hospital Romford HASU
1,E11BB,1,1,1,The Royal London Hospital,Royal London Hospital HASU
2,SW66SX,1,1,1,"Charing Cross Hospital, London",Charing Cross Hospital HASU
3,SE59RW,1,1,1,"King's College Hospital, London",King's College Hospital HASU
4,BR68ND,1,0,0,Princess Royal University Hospital,Princess Royal University Hospital HASU


## Locations

The services file already contains the location information (Easting, Northing, Latitude, Longitude). Had it not, we could have used the following method, which is how the coordinates file imported below was made.

Take the following list of postcodes of stroke units...

RM70AG
E11BB
SW66SX
SE59RW
BR68ND
HA13UJ
SW170QT
NW12BU
DE223NE
NN15BD
NG72UH
NG51PB
NG174JL
LN25QY
PE219QS
LE15WW
SS165NL
MK429DJ
CB20QQ
CO45JL
SG14AB
IP45PD
NR316LA
LU40DZ
CM17ET
NR47UY
PE39GZ
PE304ET
SS00RY
WD180HB
IP332QZ
DE130RB
DY12HQ
CV107DJ
B95SS
WV100QP
B714HJ
TF16TF
CV345BW
B152TH
CV22DX
ST46QG
WS29PS
WR51DD
HR12ER
L97AL
CH21UL
CW14QJ
L78XP
PR86PN
L355DR
WA51QG
CH495PE
FY38NR
BB23HH
PR29HT
BL97TD
M68HD
SK27JE
LA144LF
LA14RP
SR47TP
DH15TW
NE96SX
NE14LP
CA27HY
CA288JG
TS198PE
NE236NZ
TS43BW
NE340PL
S752EP
BD96RJ
HX30PW
S445BL
DN25LT
HG27SX
HU32JZ
LS13EX
WF14DG
DN157BH
S602UD
S102JF
YO318HE
KT160PZ
RH164EX
BN25BE
DA28DA
CT13NG
CT94AN
TN240LZ
BN212UD
KT187EG
GU167UJ
ME169QQ
TN24QJ
ME75NY
GU27XX
RH15RH
PO196SE
BN112DH
GL13NN
SN36BB
BS105NB
EX314JB
PL68DH
TR13LQ
EX25DW
BA13NG
SP28BJ
TA15DA
TQ27AA
BS28HW
BS234TQ
BA214AT
HP112TT
SL24HL
MK65LD
OX169AL
OX39DU
RG15AN
DT12JY
SO225DG
PO305TG
BH152JB
PO63LY
BH77DW
SO166YD
NP448YN
LL185UJ
LL137TD
LL572PW
CF144XW
CF479DT
CF311RQ
SY231ER
SA148QF
SA312AF
SA612PZ
SA66NL
NP202UB

... and pass it into [the Grid Reference Finder batch postcode converter](https://gridreferencefinder.com/postcodeBatchConverter/). For each postcode, this generates its Grid Reference, Easting, Northing, Latitude and Longitude.

Save the output to a .csv and import it here:

In [6]:
path_to_unit_coords = os.path.join(dir_data, file_unit_coords)

df_unit_locations = pd.read_csv(path_to_unit_coords)

In [7]:
df_unit_locations.head()

Unnamed: 0,Postcode,Description,Grid Reference,X (easting),Y (northing),Latitude,Longitude
0,RM70AG,RM70AG,TQ 51110 87777,551110,187777,51.568622,0.178914
1,E11BB,E11BB,TQ 34833 81799,534833,181799,51.519026,-0.058075
2,SW66SX,SW66SX,TQ 24227 76487,524227,176487,51.473716,-0.212722
3,SE59RW,SE59RW,TQ 32536 76228,532536,176228,51.469505,-0.093252
4,BR68ND,BR68ND,TQ 43478 64974,543478,164974,51.365713,0.059625


Useful data:
+ Postcode
+ X (easting)
+ Y (northing)
+ Latitude
+ Longitude
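
Before combining the two files, it can help to confirm that every unit in the services data also has coordinates. The following is a minimal sketch using small made-up frames in place of `df_unit_services` and `df_unit_locations`. The helper name `find_missing_postcodes` and the postcode `ZZ99ZZ` are hypothetical and not part of the original notebook.

```python
import pandas as pd


def find_missing_postcodes(df_services, df_locations, col='Postcode'):
    """Return sorted postcodes in df_services that are absent from df_locations."""
    missing = set(df_services[col]) - set(df_locations[col])
    return sorted(missing)


# Toy stand-ins for the real data ('ZZ99ZZ' is made up for illustration):
df_services_toy = pd.DataFrame({'Postcode': ['RM70AG', 'E11BB', 'ZZ99ZZ']})
df_locations_toy = pd.DataFrame({'Postcode': ['RM70AG', 'E11BB']})

missing_postcodes = find_missing_postcodes(df_services_toy, df_locations_toy)
print(missing_postcodes)
```

An empty list would mean that every unit in the services data can be given a location.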

## Regions

### Link units to LSOA

We already have data linking LSOAs to larger regions. To use that here, we first need to link each stroke unit location to the LSOA containing it.

Data source: Open Geography Portal, LSOA boundaries. `LSOA_(Dec_2011)_Boundaries_Super_Generalised_Clipped_(BSC)_EW_V3.geojson`

Load in the LSOA shapes:

In [8]:
path_to_lsoa_geojson = os.path.join(dir_geojson, file_lsoa_geojson)

gdf_lsoa = geopandas.read_file(path_to_lsoa_geojson)

# Convert the coordinate system to British National Grid:
gdf_lsoa = gdf_lsoa.to_crs('EPSG:27700')

In [9]:
gdf_lsoa.head().T

Unnamed: 0,0,1,2,3,4
OBJECTID,1,2,3,4,5
LSOA11CD,E01000001,E01000002,E01000003,E01000005,E01000006
LSOA11NM,City of London 001A,City of London 001B,City of London 001C,City of London 001E,Barking and Dagenham 016A
LSOA11NMW,City of London 001A,City of London 001B,City of London 001C,City of London 001E,Barking and Dagenham 016A
BNG_E,532129,532480,532245,533581,544994
BNG_N,181625,181699,182036,181265,184276
LONG,-0.09706,-0.09197,-0.09523,-0.07628,0.089318
LAT,51.5181,51.51868,51.52176,51.51452,51.53876
Shape__Area,157794.481079,164882.427628,42219.805717,212682.404259,130551.387161
Shape__Length,1685.391778,1804.828196,909.223277,2028.654904,1716.896118


Next convert the unit locations to a GeoDataFrame.

We use crs (coordinate reference system) EPSG:27700. This is the British National Grid. By definition, Easting and Northing use this grid.

In [10]:
gdf_unit_locations = df_unit_locations.copy()

x = gdf_unit_locations['X (easting)']
y = gdf_unit_locations['Y (northing)']

gdf_unit_locations['geometry'] = geopandas.points_from_xy(x, y)

In [11]:
gdf_unit_locations = geopandas.GeoDataFrame(
    gdf_unit_locations, geometry='geometry', crs='EPSG:27700')

In [12]:
gdf_unit_locations

Unnamed: 0,Postcode,Description,Grid Reference,X (easting),Y (northing),Latitude,Longitude,geometry
0,RM70AG,RM70AG,TQ 51110 87777,551110,187777,51.568622,0.178914,POINT (551110.000 187777.000)
1,E11BB,E11BB,TQ 34833 81799,534833,181799,51.519026,-0.058075,POINT (534833.000 181799.000)
2,SW66SX,SW66SX,TQ 24227 76487,524227,176487,51.473716,-0.212722,POINT (524227.000 176487.000)
3,SE59RW,SE59RW,TQ 32536 76228,532536,176228,51.469505,-0.093252,POINT (532536.000 176228.000)
4,BR68ND,BR68ND,TQ 43478 64974,543478,164974,51.365713,0.059625,POINT (543478.000 164974.000)
...,...,...,...,...,...,...,...,...
136,SA148QF,SA148QF,SN 52459 01367,252459,201367,51.691612,-4.135974,POINT (252459.000 201367.000)
137,SA312AF,SA312AF,SN 42792 21240,242792,221240,51.867524,-4.284723,POINT (242792.000 221240.000)
138,SA612PZ,SA612PZ,SM 95710 16835,195710,216835,51.812716,-4.965072,POINT (195710.000 216835.000)
139,SA66NL,SA66NL,SN 66311 00206,266311,200206,51.684652,-3.935252,POINT (266311.000 200206.000)


Now find which LSOA polygon each unit point lies in.

In [13]:
gdf_units_lsoa = gdf_lsoa.sjoin(gdf_unit_locations)
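
To illustrate what the spatial join does, here is a small sketch with two made-up square areas and three made-up points. All names and coordinates are hypothetical. With its default settings, `sjoin` performs an inner join on intersecting geometries, so a point outside every polygon is dropped from the result, and a point exactly on a shared boundary would match both neighbouring polygons.

```python
import geopandas
from shapely.geometry import Point, box

# Two hypothetical square "areas" side by side, in British National Grid:
gdf_areas_toy = geopandas.GeoDataFrame(
    {'area_name': ['west', 'east']},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10)],
    crs='EPSG:27700'
)
# Three hypothetical points; the last one lies outside both squares:
gdf_points_toy = geopandas.GeoDataFrame(
    {'point_name': ['a', 'b', 'c']},
    geometry=[Point(5, 5), Point(15, 5), Point(50, 50)],
    crs='EPSG:27700'
)

# Default inner join: keep only the area-point pairs that intersect.
gdf_joined_toy = gdf_areas_toy.sjoin(gdf_points_toy)
pairs = sorted(zip(gdf_joined_toy['area_name'], gdf_joined_toy['point_name']))
print(pairs)

# Points that matched no area are missing from the joined result:
unmatched = sorted(
    set(gdf_points_toy['point_name']) - set(gdf_joined_toy['point_name']))
print(unmatched)
```

The same comparison of postcodes before and after the join could be used on the real data to spot any stroke unit that fell outside every LSOA polygon.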

In [14]:
gdf_units_lsoa.head().T

Unnamed: 0,557,736,836,1859,2197
OBJECTID,558,737,837,1860,2198
LSOA11CD,E01000568,E01000751,E01000854,E01001906,E01002248
LSOA11NM,Brent 008D,Bromley 036C,Camden 026D,Hammersmith and Fulham 022C,Havering 017C
LSOA11NMW,Brent 008D,Bromley 036C,Camden 026D,Hammersmith and Fulham 022C,Havering 017C
BNG_E,516615,543288,529410,524352,550391
BNG_N,187769,164244,182122,176663,187661
LONG,-0.31851,0.056632,-0.13604,-0.21083,0.168526
LAT,51.57673,51.3592,51.52319,51.47527,51.56777
Shape__Area,819395.407837,2088044.694969,184407.264679,65715.775612,562284.406448
Shape__Length,3514.756622,6307.16394,1755.087262,1123.749426,3669.243855


Check that this worked using some stroke units we're familiar with:

In [15]:
postcodes_to_check = ['EX314JB', 'PL68DH', 'TR13LQ', 'EX25DW', 'TQ27AA']
mask_check = gdf_units_lsoa['Postcode'].isin(postcodes_to_check)

gdf_units_lsoa.loc[mask_check, ['Postcode', 'LSOA11NM']].T

Unnamed: 0,14636,14794,18316,19458,19577
Postcode,PL68DH,TQ27AA,TR13LQ,EX25DW,EX314JB
LSOA11NM,Plymouth 005C,Torbay 003C,Cornwall 043C,Exeter 013C,North Devon 007C


Useful data:
+ LSOA11CD
+ LSOA11NM

In [16]:
df_units_lsoa = gdf_units_lsoa[['Postcode', 'LSOA11NM', 'LSOA11CD']]

### Link units to health areas

Next we will link each stroke unit to the larger regions containing it.

Load in the data linking each LSOA with its SICBL (Sub Integrated Care Board Location, England) or LHB (Local Health Board, Wales):

In [17]:
path_to_lsoa_regions = os.path.join('.', file_lsoa_regions)

df_lsoa_regions = pd.read_csv(path_to_lsoa_regions)

In [18]:
df_lsoa_regions.head()

Unnamed: 0,lsoa,lsoa_code,region,region_code,region_type
0,Halton 007A,E01012367,NHS Cheshire and Merseyside ICB - 01F,E38000068,SICBL
1,Halton 003A,E01012368,NHS Cheshire and Merseyside ICB - 01F,E38000068,SICBL
2,Halton 005A,E01012369,NHS Cheshire and Merseyside ICB - 01F,E38000068,SICBL
3,Halton 007B,E01012370,NHS Cheshire and Merseyside ICB - 01F,E38000068,SICBL
4,Halton 016A,E01012371,NHS Cheshire and Merseyside ICB - 01F,E38000068,SICBL


Keep only the information for LSOAs containing stroke units:

In [19]:
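# Attach the region information to each unit, matching on LSOA code:
df_units_regions = pd.merge(
    df_units_lsoa, df_lsoa_regions,
    left_on='LSOA11CD', right_on='lsoa_code', how='left'
)
# Drop the LSOA columns that duplicate 'lsoa' and 'lsoa_code':
df_units_regions = df_units_regions.drop(['LSOA11NM', 'LSOA11CD'], axis='columns')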
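
A left merge keeps every unit even when its LSOA code has no match in the regions data, leaving empty region columns that are easy to miss. As a hedged sketch, pandas' `indicator` and `validate` options can surface such rows. The frames below are toy stand-ins, and the postcode `ZZ99ZZ` and code `X00000000` are made up so that one row fails to match.

```python
import pandas as pd

# Toy stand-ins (the last row is made up so that it has no match):
df_units_lsoa_toy = pd.DataFrame({
    'Postcode': ['HA13UJ', 'BR68ND', 'ZZ99ZZ'],
    'LSOA11CD': ['E01000568', 'E01000751', 'X00000000'],
})
df_lsoa_regions_toy = pd.DataFrame({
    'lsoa_code': ['E01000568', 'E01000751'],
    'region_code': ['E38000256', 'E38000244'],
})

df_merged_toy = pd.merge(
    df_units_lsoa_toy, df_lsoa_regions_toy,
    left_on='LSOA11CD', right_on='lsoa_code', how='left',
    validate='many_to_one',  # raise if an LSOA code is repeated on the right
    indicator=True           # add a '_merge' column describing each row's origin
)

# Rows marked 'left_only' found no region:
mask_unmatched = df_merged_toy['_merge'] == 'left_only'
unmatched_postcodes = df_merged_toy.loc[mask_unmatched, 'Postcode'].tolist()
print(unmatched_postcodes)
```

If this approach were used on the real data, the `_merge` column would need dropping again before saving.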

In [20]:
df_units_regions.head()

Unnamed: 0,Postcode,lsoa,lsoa_code,region,region_code,region_type
0,HA13UJ,Brent 008D,E01000568,NHS North West London ICB - W2U3Z,E38000256,SICBL
1,BR68ND,Bromley 036C,E01000751,NHS South East London ICB - 72Q,E38000244,SICBL
2,NW12BU,Camden 026D,E01000854,NHS North Central London ICB - 93C,E38000240,SICBL
3,SW66SX,Hammersmith and Fulham 022C,E01001906,NHS North West London ICB - W2U3Z,E38000256,SICBL
4,RM70AG,Havering 017C,E01002248,NHS North East London ICB - A3A8R,E38000255,SICBL


Load in the data linking SICBL and LHB with larger regions:

In [21]:
path_to_regions = os.path.join('.', file_regions)

df_regions = pd.read_csv(path_to_regions)

In [22]:
df_regions.head()

Unnamed: 0,region,region_code,region_type,country,icb,icb_code,isdn
0,NHS Cheshire and Merseyside ICB - 01F,E38000068,SICBL,England,NHS Cheshire and Merseyside Integrated Care Board,E54000008,Cheshire and Merseyside
1,NHS Cheshire and Merseyside ICB - 02E,E38000194,SICBL,England,NHS Cheshire and Merseyside Integrated Care Board,E54000008,Cheshire and Merseyside
2,NHS Cheshire and Merseyside ICB - 27D,E38000233,SICBL,England,NHS Cheshire and Merseyside Integrated Care Board,E54000008,Cheshire and Merseyside
3,NHS Cheshire and Merseyside ICB - 01J,E38000091,SICBL,England,NHS Cheshire and Merseyside Integrated Care Board,E54000008,Cheshire and Merseyside
4,NHS Cheshire and Merseyside ICB - 99A,E38000101,SICBL,England,NHS Cheshire and Merseyside Integrated Care Board,E54000008,Cheshire and Merseyside


Merge in the larger region data:

In [23]:
df_units_regions = pd.merge(
    df_units_regions,
    df_regions.drop(['region', 'region_type'], axis='columns'),
    on='region_code', how='left'
)

In [24]:
df_units_regions.head()

Unnamed: 0,Postcode,lsoa,lsoa_code,region,region_code,region_type,country,icb,icb_code,isdn
0,HA13UJ,Brent 008D,E01000568,NHS North West London ICB - W2U3Z,E38000256,SICBL,England,NHS North West London Integrated Care Board,E54000027,London
1,BR68ND,Bromley 036C,E01000751,NHS South East London ICB - 72Q,E38000244,SICBL,England,NHS South East London Integrated Care Board,E54000030,London
2,NW12BU,Camden 026D,E01000854,NHS North Central London ICB - 93C,E38000240,SICBL,England,NHS North Central London Integrated Care Board,E54000028,London
3,SW66SX,Hammersmith and Fulham 022C,E01001906,NHS North West London ICB - W2U3Z,E38000256,SICBL,England,NHS North West London Integrated Care Board,E54000027,London
4,RM70AG,Havering 017C,E01002248,NHS North East London ICB - A3A8R,E38000255,SICBL,England,NHS North East London Integrated Care Board,E54000029,London


## Link services and regions

In [25]:
df_units = pd.merge(df_unit_services, df_units_regions, on='Postcode')

In [26]:
df_units.head()

Unnamed: 0,Postcode,Use_IVT,Use_MT,Use_MSU,Stroke Team,SSNAP name,lsoa,lsoa_code,region,region_code,region_type,country,icb,icb_code,isdn
0,RM70AG,1,1,1,Havering and Redbridge University Hospitals N...,Queens Hospital Romford HASU,Havering 017C,E01002248,NHS North East London ICB - A3A8R,E38000255,SICBL,England,NHS North East London Integrated Care Board,E54000029,London
1,E11BB,1,1,1,The Royal London Hospital,Royal London Hospital HASU,Tower Hamlets 017A,E01004322,NHS North East London ICB - A3A8R,E38000255,SICBL,England,NHS North East London Integrated Care Board,E54000029,London
2,SW66SX,1,1,1,"Charing Cross Hospital, London",Charing Cross Hospital HASU,Hammersmith and Fulham 022C,E01001906,NHS North West London ICB - W2U3Z,E38000256,SICBL,England,NHS North West London Integrated Care Board,E54000027,London
3,SE59RW,1,1,1,"King's College Hospital, London",King's College Hospital HASU,Lambeth 014C,E01003076,NHS South East London ICB - 72Q,E38000244,SICBL,England,NHS South East London Integrated Care Board,E54000030,London
4,BR68ND,1,0,0,Princess Royal University Hospital,Princess Royal University Hospital HASU,Bromley 036C,E01000751,NHS South East London ICB - 72Q,E38000244,SICBL,England,NHS South East London Integrated Care Board,E54000030,London


Add a new column for the pathway. This sets the transfer unit postcode of every unit to the value `'nearest'`:

In [27]:
df_units['Transfer unit postcode'] = 'nearest'

Change the order of the columns:

In [28]:
cols_order = [
    'Postcode', 'Stroke Team', 'SSNAP name',
    'Use_IVT', 'Use_MT', 'Use_MSU', 'Transfer unit postcode',
    'lsoa', 'lsoa_code', 'region', 'region_code', 'region_type',
    'country', 'icb', 'icb_code', 'isdn'
]
df_units = df_units[cols_order]

Rename columns:

In [29]:
cols_dict = dict()
for col in df_units.columns:
    # Change to lower case:
    new_name = col.casefold()
    # Replace spaces with underscores:
    new_name = new_name.replace(' ', '_')
    # Add to the dictionary:
    cols_dict[col] = new_name

df_units = df_units.rename(columns=cols_dict)
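
As an aside, the same renaming can be written without an explicit loop by using the string methods on the column index. This is an alternative sketch on a toy frame, not the notebook's own method, and `df_toy` is a hypothetical name.

```python
import pandas as pd

# Empty toy frame with a few of the column names used above:
df_toy = pd.DataFrame(
    columns=['Postcode', 'Stroke Team', 'SSNAP name', 'Use_IVT'])

# Lower-case every column name, then replace spaces with underscores:
df_toy.columns = (
    df_toy.columns.str.casefold().str.replace(' ', '_', regex=False))

new_names = list(df_toy.columns)
print(new_names)
```

The loop version has the advantage of leaving a `cols_dict` mapping that can be inspected or reused. The vectorised version is shorter when only the renamed frame is needed.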

In [30]:
df_units.head(2).T

Unnamed: 0,0,1
postcode,RM70AG,E11BB
stroke_team,Havering and Redbridge University Hospitals N...,The Royal London Hospital
ssnap_name,Queens Hospital Romford HASU,Royal London Hospital HASU
use_ivt,1,1
use_mt,1,1
use_msu,1,1
transfer_unit_postcode,nearest,nearest
lsoa,Havering 017C,Tower Hamlets 017A
lsoa_code,E01002248,E01004322
region,NHS North East London ICB - A3A8R,NHS North East London ICB - A3A8R


Save this file:

In [31]:
df_units.to_csv(file_output, index=False)