# Finding regions for each LSOA

Match LSOAs to the regions that contain them.

The final file will contain this information:

| Column name | Description | Usage |
| --- | --- | --- |
| LSOA11 CD / NM / NMW | LSOA 2011 codes / names / Welsh names (same as the English names for English LSOAs) | England & Wales |
| LSOA11 LONG / LAT | Longitude and latitude of the centroid of the LSOA | England & Wales |
| LSOA11 BNG_E / BNG_N | British National Grid easting and northing of the LSOA centroid | England & Wales |
| MSOA11 CD / NM | Middle Layer Super Output Area 2011 codes / names | England & Wales |
| CCG19 CD / NM | Clinical Commissioning Groups 2019 codes / names | England |
| ICB22 CD / NM | Integrated Care Board 2022 codes / names. Replacement for CCGs. | England |
| STP19 CD / NM | Sustainability and Transformation Partnerships 2019 codes / names (roughly similar to counties) | England |
| LHB20 CD / NM / NMW | Local Health Boards 2020 codes / names / Welsh names | Wales |
| RGN11 CD / NM | Region codes / names. Gives specific regions within England, but Welsh LSOAs just get the country name. | England & Wales |
| LAD17 CD / NM | Local Authority District 2017 codes / names | England & Wales |
| SCN17 CD / NM | Strategic Clinical Network 2017 codes / names | England |
| CTRY11 NM | Country names | England & Wales |

To match the LSOAs to regions, we use several lookup files from the Office for National Statistics (ONS):

| File | Provides |
| --- | --- |
| LSOA (2011) to Clinical Commissioning Groups to Sustainability and Transformation Partnerships (April 2019) Lookup in England | CCG 2019 names and codes, STP 2019 names and codes |
| LSOA (2011) to Sub ICB Locations to Integrated Care Boards to Local Authority Districts (July 2022) Lookup in England | ICB 2022 names and codes |
| Output Areas (2011) to Local Health Boards (December 2020) Lookup in Wales | OA 2011 codes, LHB codes, names, and Welsh names |
| Output Area to LSOA to MSOA to Local Authority District (December 2017) Lookup with Area Classifications in Great Britain | OA 2011 codes, LSOA 2011 codes and names, MSOA 2011 codes and names, LAD 2017 codes and names, region codes and names, country names |
| Local Authority District to Strategic Clinical Network (December 2017) Lookup in England | SCN 2017 codes, names |
| (GeoJSON) LSOA (Dec 2011) Boundaries Super Generalised Clipped (BSC) EW V3 | LSOA centroids in longitude / latitude and British National Grid coordinates, Welsh LSOA names |

Two files are needed to match the Welsh LSOAs with Local Health Boards: one file links LSOAs to OAs and the other links OAs to LHBs.

Similarly, two files are needed to match the English LSOAs with SCNs: one file links LSOAs to LADs and the other links LADs to SCNs.
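Chaining two lookups like this amounts to a pair of merges on the shared code. Below is a minimal sketch of the LSOA to LAD to SCN case using hypothetical toy tables; the codes and names are placeholders, not real ONS values.

```python
import pandas as pd

# Hypothetical toy lookups standing in for the two ONS files.
toy_lsoa_lad = pd.DataFrame({
    'LSOA11CD': ['LSOA_1', 'LSOA_2', 'LSOA_3'],
    'LAD17CD': ['LAD_1', 'LAD_1', 'LAD_2'],
})
toy_lad_scn = pd.DataFrame({
    'LAD17CD': ['LAD_1'],
    'SCN17CD': ['SCN_1'],
    'SCN17NM': ['SCN A'],
})

# The first table gives each LSOA its LAD; merging on that LAD code
# then pulls in the SCN. how='left' keeps every LSOA, so an LSOA whose
# LAD is missing from the second table (LSOA_3 here) gets NaN.
toy_lsoa_scn = pd.merge(toy_lsoa_lad, toy_lad_scn, on='LAD17CD', how='left')
print(toy_lsoa_scn)
```

The `toy_` prefix keeps these names from overwriting the real DataFrames if the sketch is run inside the notebook.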

## Setup

In [1]:
# For handling the tabular data:
import pandas as pd

# For importing geojson:
import geopandas

In [2]:
dir_tabular = '../data_tabular/'
dir_ons_tabular = '../data_tabular/ons_data/'
dir_ons_geojson = '../data_geojson/ons_data/'

## Import data files

### English CCGs and STPs from LSOA:

In [3]:
df_lsoa_ccg_stp = pd.read_csv(f'{dir_ons_tabular}LSOA_(2011)_to_Clinical_Commissioning_Groups_to_Sustainability_and_Transformation_Partnerships_(April_2019)_Lookup_in_England.csv')

df_lsoa_ccg_stp.head()

|  | FID | LSOA11CD | LSOA11NM | CCG19CD | CCG19CDH | CCG19NM | STP19CD | STP19NM | LAD19CD | LAD19NM |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | E01010650 | Bradford 027B | E38000019 | 02R | NHS Bradford Districts CCG | E54000005 | West Yorkshire and Harrogate (Health and Care ... | E08000032 | Bradford |
| 1 | 2 | E01010651 | Bradford 032B | E38000019 | 02R | NHS Bradford Districts CCG | E54000005 | West Yorkshire and Harrogate (Health and Care ... | E08000032 | Bradford |
| 2 | 3 | E01010652 | Bradford 026A | E38000019 | 02R | NHS Bradford Districts CCG | E54000005 | West Yorkshire and Harrogate (Health and Care ... | E08000032 | Bradford |
| 3 | 4 | E01010653 | Bradford 026B | E38000019 | 02R | NHS Bradford Districts CCG | E54000005 | West Yorkshire and Harrogate (Health and Care ... | E08000032 | Bradford |
| 4 | 5 | E01010654 | Bradford 027C | E38000019 | 02R | NHS Bradford Districts CCG | E54000005 | West Yorkshire and Harrogate (Health and Care ... | E08000032 | Bradford |


### English ICBs from LSOA:

In [4]:
df_lsoa_icb = pd.read_csv(f'{dir_ons_tabular}LSOA11_LOC22_ICB22_LAD22_EN_LU.csv')

df_lsoa_icb.head()

|  | LSOA11CD | LSOA11NM | LOC22CD | LOC22CDH | LOC22NM | ICB22CD | ICB22CDH | ICB22NM | LAD22CD | LAD22NM |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | E01012367 | Halton 007A | E38000068 | 01F | NHS Cheshire and Merseyside ICB - 01F | E54000008 | QYG | NHS Cheshire and Merseyside Integrated Care Board | E06000006 | Halton |
| 1 | E01012368 | Halton 003A | E38000068 | 01F | NHS Cheshire and Merseyside ICB - 01F | E54000008 | QYG | NHS Cheshire and Merseyside Integrated Care Board | E06000006 | Halton |
| 2 | E01012369 | Halton 005A | E38000068 | 01F | NHS Cheshire and Merseyside ICB - 01F | E54000008 | QYG | NHS Cheshire and Merseyside Integrated Care Board | E06000006 | Halton |
| 3 | E01012370 | Halton 007B | E38000068 | 01F | NHS Cheshire and Merseyside ICB - 01F | E54000008 | QYG | NHS Cheshire and Merseyside Integrated Care Board | E06000006 | Halton |
| 4 | E01012371 | Halton 016A | E38000068 | 01F | NHS Cheshire and Merseyside ICB - 01F | E54000008 | QYG | NHS Cheshire and Merseyside Integrated Care Board | E06000006 | Halton |


### Welsh OAs and LHBs:

In [5]:
df_oa_lhb = pd.read_csv(f'{dir_ons_tabular}Output_Areas_(2011)_to_Local_Health_Boards_(December_2020)_Lookup_in_Wales.csv')

df_oa_lhb.head()

|  | FID | OA11CD | LHB20CD | LHB20NM | LHB20NMW |
| --- | --- | --- | --- | --- | --- |
| 0 | 1 | W00000052 | W11000023 | Betsi Cadwaladr University Health Board | Bwrdd Iechyd Prifysgol Betsi Cadwaladr |
| 1 | 2 | W00000104 | W11000023 | Betsi Cadwaladr University Health Board | Bwrdd Iechyd Prifysgol Betsi Cadwaladr |
| 2 | 3 | W00000053 | W11000023 | Betsi Cadwaladr University Health Board | Bwrdd Iechyd Prifysgol Betsi Cadwaladr |
| 3 | 4 | W00000054 | W11000023 | Betsi Cadwaladr University Health Board | Bwrdd Iechyd Prifysgol Betsi Cadwaladr |
| 4 | 5 | W00000105 | W11000023 | Betsi Cadwaladr University Health Board | Bwrdd Iechyd Prifysgol Betsi Cadwaladr |


### English, Welsh, and Scottish regions of various sizes

+ LSOAs to regions
+ LSOAs to OAs (for use with the other Welsh OA and LHB file)

In [6]:
df_oa_lsoa_region = pd.read_csv(f'{dir_ons_tabular}Output_Area_to_LSOA_to_MSOA_to_Local_Authority_District_(December_2017)_Lookup_with_Area_Classifications_in_Great_Britain.csv')

df_oa_lsoa_region.head()

|  | OA11CD | OAC11CD | OAC11NM | LSOA11CD | LSOA11NM | SOAC11CD | SOAC11NM | MSOA11CD | MSOA11NM | LAD17CD | LAD17NM | LACCD | LACNM | RGN11CD | RGN11NM | CTRY11CD | CTRY11NM | FID |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | E00060343 | 7d1 | Ageing Communities and Families | E01011966 | Hartlepool 006B | 5b | Aspiring urban households | E02002488 | Hartlepool 006 | E06000001 | Hartlepool | 6a2r | Mining Legacy | E12000001 | North East | E92000001 | England | 1 |
| 1 | E00174083 | 7d1 | Ageing Communities and Families | E01011974 | Hartlepool 005B | 4b | Constrained renters | E02002487 | Hartlepool 005 | E06000001 | Hartlepool | 6a2r | Mining Legacy | E12000001 | North East | E92000001 | England | 2 |
| 2 | E00060349 | 6a4 | Ageing in Suburbia | E01011965 | Hartlepool 006A | 8b | Ageing suburbanites | E02002488 | Hartlepool 006 | E06000001 | Hartlepool | 6a2r | Mining Legacy | E12000001 | North East | E92000001 | England | 3 |
| 3 | E00060418 | 6a4 | Ageing in Suburbia | E01011983 | Hartlepool 006C | 8a | Affluent communities | E02002488 | Hartlepool 006 | E06000001 | Hartlepool | 6a2r | Mining Legacy | E12000001 | North East | E92000001 | England | 4 |
| 4 | E00060255 | 8c1 | Ageing Industrious Workers | E01011950 | Hartlepool 008A | 4a | Challenged white communities | E02002490 | Hartlepool 008 | E06000001 | Hartlepool | 6a2r | Mining Legacy | E12000001 | North East | E92000001 | England | 5 |
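This lookup has one row per Output Area, so each LSOA appears once for every OA it contains. Once the OA column has served its purpose and been dropped, those repeated rows are identical and can be collapsed to one row per LSOA. A minimal sketch with hypothetical placeholder codes:

```python
import pandas as pd

# Hypothetical OA-level lookup: two OAs in LSOA_1, one OA in LSOA_2.
toy_oa = pd.DataFrame({
    'OA11CD': ['OA_1', 'OA_2', 'OA_3'],
    'LSOA11CD': ['LSOA_1', 'LSOA_1', 'LSOA_2'],
    'LAD17CD': ['LAD_1', 'LAD_1', 'LAD_2'],
})

# Dropping the OA column leaves one (now identical) row per OA...
toy_lsoa = toy_oa.drop(columns='OA11CD')
print(len(toy_lsoa))  # 3 rows, but only 2 LSOAs

# ...so remove the exact duplicates to get one row per LSOA.
toy_lsoa = toy_lsoa.drop_duplicates().reset_index(drop=True)

# Guard: this fails loudly if an LSOA still maps to more than one area.
assert toy_lsoa['LSOA11CD'].is_unique
```

Calling `drop_duplicates()` without a `subset` only removes rows that match in every column, so an LSOA that genuinely mapped to two different areas would survive as two rows and trip the uniqueness check rather than being silently discarded.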


### English LAD to SCN

Local Authority District (e.g. Leicester) to Strategic Clinical Network (e.g. East Midlands). 

In [7]:
df_lad_scn = pd.read_csv(f'{dir_ons_tabular}Local_Authority_District_to_Strategic_Clinical_Network_(December_2017)_Lookup_in_England_.csv')

df_lad_scn.head()

|  | LAD17CD | LAD17NM | SCN17CD | SCN17NM | FID |
| --- | --- | --- | --- | --- | --- |
| 0 | E07000201 | Forest Heath | E55000006 | East of England | 1 |
| 1 | E07000202 | Ipswich | E55000006 | East of England | 2 |
| 2 | E07000203 | Mid Suffolk | E55000006 | East of England | 3 |
| 3 | E07000204 | St Edmundsbury | E55000006 | East of England | 4 |
| 4 | E07000205 | Suffolk Coastal | E55000006 | East of England | 5 |
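Because this file is joined on LAD codes, both lookups must use the same LAD vintage (December 2017 here). One way to check for LADs that fail to match is the `indicator=True` option of `pd.merge`, which adds a `_merge` column recording where each row came from. A sketch with hypothetical placeholder codes:

```python
import pandas as pd

# Hypothetical LAD codes from the OA lookup and a partial SCN lookup.
toy_lads = pd.DataFrame({'LAD17CD': ['LAD_1', 'LAD_2', 'LAD_3']})
toy_scn_lookup = pd.DataFrame({
    'LAD17CD': ['LAD_1', 'LAD_2'],
    'SCN17NM': ['SCN A', 'SCN A'],
})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'.
toy_check = pd.merge(toy_lads, toy_scn_lookup, on='LAD17CD',
                     how='left', indicator=True)
unmatched = toy_check.loc[toy_check['_merge'] == 'left_only', 'LAD17CD']
print(unmatched.tolist())
```

With the real data, Welsh and Scottish LADs would be expected to show up as unmatched, since this lookup covers England only; an unmatched English LAD would point to a code mismatch.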


### GeoJSON for coordinates

In [8]:
df_geojson = geopandas.read_file(f'{dir_ons_geojson}LSOA_(Dec_2011)_Boundaries_Super_Generalised_Clipped_(BSC)_EW_V3.geojson')

In [9]:
df_geojson.head()

|  | OBJECTID | LSOA11CD | LSOA11NM | LSOA11NMW | BNG_E | BNG_N | LONG | LAT | Shape__Area | Shape__Length | GlobalID | geometry |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | E01000001 | City of London 001A | City of London 001A | 532129 | 181625 | -0.09706 | 51.5181 | 157794.481079 | 1685.391778 | b12173a3-5423-4672-a5eb-f152d2345f96 | POLYGON ((-0.09474 51.52060, -0.09546 51.51544... |
| 1 | 2 | E01000002 | City of London 001B | City of London 001B | 532480 | 181699 | -0.09197 | 51.51868 | 164882.427628 | 1804.828196 | 90274dc4-f785-4afb-95cd-7cc1fc9a2cad | POLYGON ((-0.08810 51.51941, -0.09546 51.51544... |
| 2 | 3 | E01000003 | City of London 001C | City of London 001C | 532245 | 182036 | -0.09523 | 51.52176 | 42219.805717 | 909.223277 | 7e89d0ba-f186-45fb-961c-8f5ffcd03808 | POLYGON ((-0.09453 51.52205, -0.09274 51.52139... |
| 3 | 4 | E01000005 | City of London 001E | City of London 001E | 533581 | 181265 | -0.07628 | 51.51452 | 212682.404259 | 2028.654904 | a14c307a-874c-4862-828a-3b1486cc21ea | POLYGON ((-0.07589 51.51590, -0.07394 51.51445... |
| 4 | 5 | E01000006 | Barking and Dagenham 016A | Barking and Dagenham 016A | 544994 | 184276 | 0.089318 | 51.53876 | 130551.387161 | 1716.896118 | 65121a2d-3d2b-4935-9712-690f2993cfd2 | POLYGON ((0.09328 51.53787, 0.09363 51.53767, ... |
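Only the attribute columns (codes, names and coordinates) are used from this file, not the boundary polygons. If geopandas is not available, a lighter alternative is to read the GeoJSON feature `properties` directly with the standard `json` module. This is a sketch, assuming the file is a standard GeoJSON FeatureCollection whose properties hold columns such as LSOA11CD, LONG and LAT; the tiny collection below is hypothetical.

```python
import json

import pandas as pd

# A tiny hypothetical FeatureCollection standing in for the ONS file.
geojson_text = '''
{"type": "FeatureCollection",
 "features": [
  {"type": "Feature",
   "properties": {"LSOA11CD": "LSOA_1", "LONG": -0.1, "LAT": 51.5},
   "geometry": null},
  {"type": "Feature",
   "properties": {"LSOA11CD": "LSOA_2", "LONG": 0.09, "LAT": 51.54},
   "geometry": null}
 ]}
'''

collection = json.loads(geojson_text)
# Keep only each feature's attribute table; the geometries are ignored.
toy_props = pd.DataFrame(
    [feature['properties'] for feature in collection['features']])
print(toy_props)
```

For a file on disk, `json.load` on an open file handle replaces `json.loads` on the string.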


## Create new dataframe

Merge all of these DataFrames into one combined DataFrame with all of the useful information.

At each merge step, merge in only the relevant columns from the right-hand DataFrame. Otherwise every non-key column that appears in both DataFrames is duplicated with the suffixes _x and _y. For example, LSOA11NM is in both the Great Britain-wide OA lookup and the England-only CCG lookup, so merging the whole CCG table would create LSOA11NM_x and LSOA11NM_y, and the England-only copy would be blank for Welsh and Scottish LSOAs. Generally we want to keep only the column that has complete information.
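A small sketch of the suffix problem, using hypothetical toy tables rather than the real ONS data:

```python
import pandas as pd

# Hypothetical left table covering more areas, and a right table
# that also carries LSOA11NM but only for some of them.
toy_left = pd.DataFrame({
    'LSOA11CD': ['LSOA_1', 'LSOA_2'],
    'LSOA11NM': ['Area 1', 'Area 2'],
})
toy_right = pd.DataFrame({
    'LSOA11CD': ['LSOA_1'],
    'LSOA11NM': ['Area 1'],
    'CCG19CD': ['CCG_1'],
})

# Merging the whole right table: the shared non-key column is duplicated.
toy_all = pd.merge(toy_left, toy_right, on='LSOA11CD', how='left')
print(toy_all.columns.tolist())
# ['LSOA11CD', 'LSOA11NM_x', 'LSOA11NM_y', 'CCG19CD']

# Selecting only the key and the new columns avoids the duplicate.
toy_some = pd.merge(toy_left, toy_right[['LSOA11CD', 'CCG19CD']],
                    on='LSOA11CD', how='left')
print(toy_some.columns.tolist())
# ['LSOA11CD', 'LSOA11NM', 'CCG19CD']
```

In `toy_all`, LSOA11NM_y is NaN for LSOA_2, which is the incomplete copy we would otherwise have to clean up by hand.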

In [20]:
# Start with this dataframe because it contains all
# English, Welsh, and Scottish LSOAs.
df_regions = df_oa_lsoa_region.copy()
# Reduce number of columns:
cols_to_keep = [
    'LSOA11CD',
    'LSOA11NM',
    'MSOA11CD',
    'MSOA11NM',
    'LAD17CD',
    'LAD17NM',
    'RGN11CD',
    'RGN11NM',
    'CTRY11NM',
    'OA11CD'     # Keep for merging in Welsh LHBs.
    ]
df_regions = df_regions[cols_to_keep]

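# Merge in Welsh LHBs:
df_regions = pd.merge(
    df_regions,
    df_oa_lhb[['OA11CD', 'LHB20CD', 'LHB20NM', 'LHB20NMW']],
    on='OA11CD', how='left')
# Remove the Output Area column now that it has served its purpose:
df_regions = df_regions.drop('OA11CD', axis=1)
# The starting lookup has one row per OA, so each LSOA is repeated once
# for every OA it contains. Without the OA column these repeats are
# identical, so keep a single row per LSOA:
df_regions = df_regions.drop_duplicates()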

# Merge in English SCNs:
df_regions = pd.merge(
    df_regions,
    df_lad_scn[['LAD17CD', 'SCN17CD', 'SCN17NM']],
    on='LAD17CD', how='left'
)

# Merge in CCGs, STPs:
df_regions = pd.merge(
    df_regions,
    df_lsoa_ccg_stp[['LSOA11CD', 'CCG19CD', 'CCG19NM', 'STP19CD', 'STP19NM']],
    on='LSOA11CD', how='left'
)

# Merge in ICBs:
df_regions = pd.merge(
    df_regions,
    df_lsoa_icb[['LSOA11CD', 'ICB22CD', 'ICB22NM']],
    on='LSOA11CD', how='left'
)

# Merge in geojson data:
df_regions = pd.merge(
    df_regions,
    df_geojson[['LSOA11CD', 'LSOA11NMW', 'BNG_E', 'BNG_N', 'LONG', 'LAT']],
    on='LSOA11CD', how='left'
)
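Each left merge above assumes the right-hand table has at most one row per key. If a lookup listed an LSOA (or OA, or LAD) twice, the merge would silently duplicate that LSOA's rows. pandas can enforce the assumption through the `validate` argument of `pd.merge`; a sketch with a hypothetical faulty lookup:

```python
import pandas as pd

toy_left = pd.DataFrame({'LSOA11CD': ['LSOA_1', 'LSOA_2']})

# Hypothetical faulty lookup that lists LSOA_1 twice.
toy_lookup = pd.DataFrame({
    'LSOA11CD': ['LSOA_1', 'LSOA_1', 'LSOA_2'],
    'ICB22CD': ['ICB_1', 'ICB_2', 'ICB_1'],
})

# Without validation the merge quietly returns 3 rows for 2 LSOAs.
toy_unchecked = pd.merge(toy_left, toy_lookup, on='LSOA11CD', how='left')
print(len(toy_unchecked))

# validate='many_to_one' raises MergeError if the right-hand keys repeat.
try:
    pd.merge(toy_left, toy_lookup, on='LSOA11CD', how='left',
             validate='many_to_one')
    duplicate_keys_found = False
except pd.errors.MergeError as err:
    duplicate_keys_found = True
    print('Lookup has repeated keys:', err)
```

Adding `validate='many_to_one'` to each of the real merges would turn a silent row explosion into an immediate error.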

In [21]:
df_regions.head(5).T

|  | 0 | 1 | 2 | 3 | 4 |
| --- | --- | --- | --- | --- | --- |
| LSOA11CD | E01011966 | E01011974 | E01011965 | E01011983 | E01011950 |
| LSOA11NM | Hartlepool 006B | Hartlepool 005B | Hartlepool 006A | Hartlepool 006C | Hartlepool 008A |
| MSOA11CD | E02002488 | E02002487 | E02002488 | E02002488 | E02002490 |
| MSOA11NM | Hartlepool 006 | Hartlepool 005 | Hartlepool 006 | Hartlepool 006 | Hartlepool 008 |
| LAD17CD | E06000001 | E06000001 | E06000001 | E06000001 | E06000001 |
| LAD17NM | Hartlepool | Hartlepool | Hartlepool | Hartlepool | Hartlepool |
| RGN11CD | E12000001 | E12000001 | E12000001 | E12000001 | E12000001 |
| RGN11NM | North East | North East | North East | North East | North East |
| CTRY11NM | England | England | England | England | England |
| LHB20CD |  |  |  |  |  |


Limit to England and Wales:

In [14]:
df_regions = df_regions[df_regions['CTRY11NM'].isin(['England', 'Wales'])]
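After filtering, a quick way to confirm that the merges worked is to measure, for each country, what share of rows has each country-specific column filled in. English rows should have CCGs and Welsh rows should have LHBs, so gaps point at failed matches. A sketch on a few hypothetical rows:

```python
import pandas as pd

# Hypothetical slice of the combined table after filtering.
toy_regions = pd.DataFrame({
    'LSOA11CD': ['E_1', 'E_2', 'W_1'],
    'CTRY11NM': ['England', 'England', 'Wales'],
    'CCG19CD': ['CCG_1', None, None],
    'LHB20CD': [None, None, 'LHB_1'],
})

# notna() gives True/False per cell; the mean per country is the
# fraction of rows in that country with a value.
coverage = (
    toy_regions[['CCG19CD', 'LHB20CD']]
    .notna()
    .groupby(toy_regions['CTRY11NM'])
    .mean()
)
print(coverage)
```

Here England's CCG coverage of 0.5 would flag the English LSOA that did not match a CCG.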

Reorder columns:

In [15]:
cols_order = [
    'LSOA11CD', 'LSOA11NM', 'LSOA11NMW',
    'BNG_E', 'BNG_N', 'LONG', 'LAT',
    'MSOA11CD', 'MSOA11NM', 
    'CCG19CD', 'CCG19NM',
    'ICB22CD', 'ICB22NM',
    'LAD17CD', 'LAD17NM',
    'STP19CD', 'STP19NM',
    'LHB20CD', 'LHB20NM', 'LHB20NMW',
    'SCN17CD', 'SCN17NM',
    'RGN11CD', 'RGN11NM',
    'CTRY11NM',
]
df_regions = df_regions[cols_order]

Rename the coordinate columns so that their names show they belong to the LSOA:

In [16]:
df_regions = df_regions.rename(columns={
    'LONG': 'LSOA11LONG',
    'LAT': 'LSOA11LAT',
    'BNG_E': 'LSOA11BNG_E',
    'BNG_N': 'LSOA11BNG_N',
})

Check the results for the first few entries:

In [17]:
df_regions.head(5).T

|  | 0 | 1 | 2 | 3 | 4 |
| --- | --- | --- | --- | --- | --- |
| LSOA11CD | E01011966 | E01011974 | E01011965 | E01011983 | E01011950 |
| LSOA11NM | Hartlepool 006B | Hartlepool 005B | Hartlepool 006A | Hartlepool 006C | Hartlepool 008A |
| LSOA11NMW | Hartlepool 006B | Hartlepool 005B | Hartlepool 006A | Hartlepool 006C | Hartlepool 008A |
| LSOA11BNG_E | 449564.0 | 451274.0 | 449780.0 | 449025.0 | 450438.0 |
| LSOA11BNG_N | 532844.0 | 533055.0 | 532227.0 | 532832.0 | 531499.0 |
| LSOA11LONG | -1.23268 | -1.20612 | -1.22943 | -1.24104 | -1.21935 |
| LSOA11LAT | 54.68822 | 54.68994 | 54.68265 | 54.68816 | 54.67604 |
| MSOA11CD | E02002488 | E02002487 | E02002488 | E02002488 | E02002490 |
| MSOA11NM | Hartlepool 006 | Hartlepool 005 | Hartlepool 006 | Hartlepool 006 | Hartlepool 008 |
| CCG19CD | E38000075 | E38000075 | E38000075 | E38000075 | E38000075 |


Save the dataframe to file:

In [18]:
df_regions.to_csv('LSOA_regions.csv', index=False)
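When the file is read back later, two things are worth checking: that `index=False` kept the pandas index out of the CSV (otherwise reading it gives an extra "Unnamed: 0" column, like the one in the raw outputs above), and that each LSOA appears only once. A minimal sketch using an in-memory buffer in place of `LSOA_regions.csv`, with hypothetical values:

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for the final table.
toy_final = pd.DataFrame({
    'LSOA11CD': ['LSOA_1', 'LSOA_2'],
    'LSOA11LONG': [-0.1, -3.0],
})

# Saved without the index, as in the cell above.
buffer = io.StringIO()
toy_final.to_csv(buffer, index=False)
buffer.seek(0)
toy_loaded = pd.read_csv(buffer)
assert toy_loaded['LSOA11CD'].is_unique

# Saved with the default index=True, for comparison.
buffer_with_index = io.StringIO()
toy_final.to_csv(buffer_with_index)
buffer_with_index.seek(0)
cols_with_index = pd.read_csv(buffer_with_index).columns.tolist()
print(cols_with_index)  # ['Unnamed: 0', 'LSOA11CD', 'LSOA11LONG']
```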