# Finding regions for each LSOA

Match each Lower Layer Super Output Area (LSOA) to the regions that contain it.

The final file will contain this information:

| Column name | Description | Usage |
| --- | --- | --- | 
| LSOA11 CD / NM | LSOA 2011 codes / names | England & Wales |
| LSOA11 LONG / LAT | Longitude and latitude of the centroid of the LSOA | England & Wales |
| CCG19 CD / NM | Clinical Commissioning Groups 2019 codes / names | England |
| STP19 CD / NM | Sustainability and Transformation Partnerships 2019 codes / names (roughly comparable in size to counties) | England |
| LHB20 CD / NM / NMW | Local Health Boards 2020 codes / names / Welsh names | Wales |
| RGN11 CD / NM | Region codes / names. England is divided into named regions; for Wales and Scotland the region name is simply the country name. | England & Wales |
| LAD17 CD / NM | Local Authority District 2017 codes / names | England & Wales |
| SCN17 CD / NM | Strategic Clinical Network 2017 codes / names | England |

To match the LSOAs to regions, the following lookup files from the Office for National Statistics (ONS) are cross-matched:

| File | Provides |
| --- | --- |
| LSOA (2011) to Clinical Commissioning Groups to Sustainability and Transformation Partnerships (April 2019) Lookup in England | CCG 2019 names and codes, STP 2019 names and codes |
| Output Areas (2011) to Local Health Boards (December 2020) Lookup in Wales | OA 2011 codes, LHB codes, names, and Welsh names |
| Output Area to LSOA to MSOA to Local Authority District (December 2017) Lookup with Area Classifications in Great Britain | OA 2011 codes, LSOA 2011 codes and names, region codes and names, LAD codes and names |
| Local Authority District to Strategic Clinical Network (December 2017) Lookup in England | SCN 2017 codes, names |
| (geojson) LSOA (Dec 2011) Boundaries Super Generalised Clipped (BSC) EW V3 | LSOA centroids in longitude / latitude |

Two files are needed to match the Welsh LSOAs to Local Health Boards (LHBs), because one file links LSOAs to Output Areas (OAs) and the other links OAs to LHBs.

Two files are also needed to match the English LSOAs to SCNs, because one file links LSOAs to LADs and the other links LADs to SCNs.
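A two-step lookup like this can also be expressed as a table join. The sketch below is only an illustration on toy data: the rows are copied from outputs shown later in this notebook, and the names `df_lsoa_lad`, `df_lad_scn_toy` and `df_chain` are made up for the example. It is not the method used in the rest of the notebook, which loops over the LSOAs instead.

```python
import pandas as pd

# Toy LSOA-to-LAD rows (values copied from outputs in this notebook).
df_lsoa_lad = pd.DataFrame({
    'LSOA11CD': ['E01031349', 'E01031350'],
    'LSOA11NM': ['Adur 001A', 'Adur 001B'],
    'LAD17CD': ['E07000223', 'E07000223'],
})

# Toy LAD-to-SCN row.
df_lad_scn_toy = pd.DataFrame({
    'LAD17CD': ['E07000223'],
    'SCN17CD': ['E55000010'],
    'SCN17NM': ['South East Coast'],
})

# A left join keeps every LSOA row, even one whose LAD has no SCN entry
# (in which case the SCN columns would be NaN).
df_chain = df_lsoa_lad.merge(df_lad_scn_toy, on='LAD17CD', how='left')

print(df_chain[['LSOA11NM', 'SCN17CD', 'SCN17NM']])
```

The shared `LAD17CD` column plays the role of the intermediate key, in the same way that `OA11CD` links the two files used for Wales.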

## Setup

In [1]:
# For handling the tabular data:
import pandas as pd

# For importing geojson:
import json

In [None]:
dir_tabular = '../data_tabular/'
dir_ons_tabular = '../data_tabular/ons_data/'
dir_ons_geojson = '../data_geojson/ons_data/'

## Import data files

### LSOA names used in the matrix:

In [2]:
df_travel_matrix = pd.read_csv(f'{dir_tabular}lsoa_travel_time_matrix_calibrated.csv')

In [3]:
LSOA11NM = df_travel_matrix['LSOA']

LSOA11NM

0        Adur 001A
1        Adur 001B
2        Adur 001C
3        Adur 001D
4        Adur 001E
           ...    
34747    York 024B
34748    York 024C
34749    York 024D
34750    York 024E
34751    York 024F
Name: LSOA, Length: 34752, dtype: object

### English CCGs and STPs from LSOA:

In [4]:
df_lsoa_ccg_stp = pd.read_csv(f'{dir_ons_tabular}LSOA_(2011)_to_Clinical_Commissioning_Groups_to_Sustainability_and_Transformation_Partnerships_(April_2019)_Lookup_in_England.csv')

df_lsoa_ccg_stp.head()

Unnamed: 0,FID,LSOA11CD,LSOA11NM,CCG19CD,CCG19CDH,CCG19NM,STP19CD,STP19NM,LAD19CD,LAD19NM
0,1,E01010650,Bradford 027B,E38000019,02R,NHS Bradford Districts CCG,E54000005,West Yorkshire and Harrogate (Health and Care ...,E08000032,Bradford
1,2,E01010651,Bradford 032B,E38000019,02R,NHS Bradford Districts CCG,E54000005,West Yorkshire and Harrogate (Health and Care ...,E08000032,Bradford
2,3,E01010652,Bradford 026A,E38000019,02R,NHS Bradford Districts CCG,E54000005,West Yorkshire and Harrogate (Health and Care ...,E08000032,Bradford
3,4,E01010653,Bradford 026B,E38000019,02R,NHS Bradford Districts CCG,E54000005,West Yorkshire and Harrogate (Health and Care ...,E08000032,Bradford
4,5,E01010654,Bradford 027C,E38000019,02R,NHS Bradford Districts CCG,E54000005,West Yorkshire and Harrogate (Health and Care ...,E08000032,Bradford


In [5]:
# LSOAs:
LSOA11CD_England = df_lsoa_ccg_stp['LSOA11CD']
LSOA11NM_England = df_lsoa_ccg_stp['LSOA11NM']
# CCGs:
CCG19CD_England = df_lsoa_ccg_stp['CCG19CD']
CCG19NM_England = df_lsoa_ccg_stp['CCG19NM']
# STPs:
STP19CD_England = df_lsoa_ccg_stp['STP19CD']
STP19NM_England = df_lsoa_ccg_stp['STP19NM']

### Welsh OAs and LHBs:

In [6]:
df_oa_lhb = pd.read_csv(f'{dir_ons_tabular}Output_Areas_(2011)_to_Local_Health_Boards_(December_2020)_Lookup_in_Wales.csv')

df_oa_lhb.head()

Unnamed: 0,FID,OA11CD,LHB20CD,LHB20NM,LHB20NMW
0,1,W00000052,W11000023,Betsi Cadwaladr University Health Board,Bwrdd Iechyd Prifysgol Betsi Cadwaladr
1,2,W00000104,W11000023,Betsi Cadwaladr University Health Board,Bwrdd Iechyd Prifysgol Betsi Cadwaladr
2,3,W00000053,W11000023,Betsi Cadwaladr University Health Board,Bwrdd Iechyd Prifysgol Betsi Cadwaladr
3,4,W00000054,W11000023,Betsi Cadwaladr University Health Board,Bwrdd Iechyd Prifysgol Betsi Cadwaladr
4,5,W00000105,W11000023,Betsi Cadwaladr University Health Board,Bwrdd Iechyd Prifysgol Betsi Cadwaladr


In [7]:
# Output areas:
OA11CD_Wales = df_oa_lhb['OA11CD']
# Local Health Boards:
LHB20CD_Wales = df_oa_lhb['LHB20CD']
LHB20NM_Wales = df_oa_lhb['LHB20NM']
LHB20NMW_Wales = df_oa_lhb['LHB20NMW']

### English and Welsh

+ LSOAs to regions
+ LSOAs to OAs (for use with the other Welsh OA and LHB file)

In [8]:
df_oa_lsoa_region = pd.read_csv(f'{dir_ons_tabular}Output_Area_to_LSOA_to_MSOA_to_Local_Authority_District_(December_2017)_Lookup_with_Area_Classifications_in_Great_Britain.csv')

df_oa_lsoa_region.head()

Unnamed: 0,OA11CD,OAC11CD,OAC11NM,LSOA11CD,LSOA11NM,SOAC11CD,SOAC11NM,MSOA11CD,MSOA11NM,LAD17CD,LAD17NM,LACCD,LACNM,RGN11CD,RGN11NM,CTRY11CD,CTRY11NM,FID
0,E00060343,7d1,Ageing Communities and Families,E01011966,Hartlepool 006B,5b,Aspiring urban households,E02002488,Hartlepool 006,E06000001,Hartlepool,6a2r,Mining Legacy,E12000001,North East,E92000001,England,1
1,E00174083,7d1,Ageing Communities and Families,E01011974,Hartlepool 005B,4b,Constrained renters,E02002487,Hartlepool 005,E06000001,Hartlepool,6a2r,Mining Legacy,E12000001,North East,E92000001,England,2
2,E00060349,6a4,Ageing in Suburbia,E01011965,Hartlepool 006A,8b,Ageing suburbanites,E02002488,Hartlepool 006,E06000001,Hartlepool,6a2r,Mining Legacy,E12000001,North East,E92000001,England,3
3,E00060418,6a4,Ageing in Suburbia,E01011983,Hartlepool 006C,8a,Affluent communities,E02002488,Hartlepool 006,E06000001,Hartlepool,6a2r,Mining Legacy,E12000001,North East,E92000001,England,4
4,E00060255,8c1,Ageing Industrious Workers,E01011950,Hartlepool 008A,4a,Challenged white communities,E02002490,Hartlepool 008,E06000001,Hartlepool,6a2r,Mining Legacy,E12000001,North East,E92000001,England,5


In [9]:
# Output areas:
OA11CD_EnglandWales = df_oa_lsoa_region['OA11CD']
# LSOAs:
LSOA11CD_EnglandWales = df_oa_lsoa_region['LSOA11CD']
LSOA11NM_EnglandWales = df_oa_lsoa_region['LSOA11NM']
# Regions
RGN11CD_EnglandWales = df_oa_lsoa_region['RGN11CD']
RGN11NM_EnglandWales = df_oa_lsoa_region['RGN11NM']
# LADs:
LAD17CD_EnglandWales = df_oa_lsoa_region['LAD17CD']
LAD17NM_EnglandWales = df_oa_lsoa_region['LAD17NM']
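
This lookup has one row per Output Area, so an LSOA that contains several OAs appears on several rows. The matching loop later in this notebook copes with that by taking the first matching row. As a rough sketch of an alternative, the table could be reduced to one row per LSOA first. The data below is a toy example: the second OA code is invented for illustration, and `df_oa_toy` and `df_lsoa_toy` are hypothetical names.

```python
import pandas as pd

df_oa_toy = pd.DataFrame({
    # 'E00000000' is a made-up OA code used to create a repeated LSOA.
    'OA11CD': ['E00060343', 'E00000000', 'E00060349'],
    'LSOA11CD': ['E01011966', 'E01011966', 'E01011965'],
    'LSOA11NM': ['Hartlepool 006B', 'Hartlepool 006B', 'Hartlepool 006A'],
    'RGN11NM': ['North East', 'North East', 'North East'],
})

# Keep the first row for each LSOA and drop the OA-level column.
df_lsoa_toy = (
    df_oa_toy
    .drop_duplicates(subset='LSOA11CD')
    .drop(columns='OA11CD')
    .reset_index(drop=True)
)

print(df_lsoa_toy)
```

Dropping the OA column is only appropriate for the English lookups; the Welsh LHB match still needs an OA code for each LSOA.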

### English LAD to SCN

Local Authority District (e.g. Leicester) to Strategic Clinical Network (e.g. East Midlands). 

In [10]:
df_lad_scn = pd.read_csv(f'{dir_ons_tabular}Local_Authority_District_to_Strategic_Clinical_Network_(December_2017)_Lookup_in_England_.csv')

df_lad_scn.head()

Unnamed: 0,LAD17CD,LAD17NM,SCN17CD,SCN17NM,FID
0,E07000201,Forest Heath,E55000006,East of England,1
1,E07000202,Ipswich,E55000006,East of England,2
2,E07000203,Mid Suffolk,E55000006,East of England,3
3,E07000204,St Edmundsbury,E55000006,East of England,4
4,E07000205,Suffolk Coastal,E55000006,East of England,5


In [11]:
# LADs:
LAD17CD_England = df_lad_scn['LAD17CD']
LAD17NM_England = df_lad_scn['LAD17NM']
# SCNs:
SCN17CD_England = df_lad_scn['SCN17CD']
SCN17NM_England = df_lad_scn['SCN17NM']

### Geojson for coordinates

In [12]:
with open(f'{dir_ons_geojson}LSOA_(Dec_2011)_Boundaries_Super_Generalised_Clipped_(BSC)_EW_V3.geojson') as f:
    geojson_ew = json.load(f)

In [13]:
big_geojson_order = []
LSOA_lats = []
LSOA_longs = []

for feature in geojson_ew['features']:
    properties = feature['properties']
    big_geojson_order.append(properties['LSOA11CD'])
    LSOA_lats.append(properties['LAT'])
    LSOA_longs.append(properties['LONG'])
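
The centroids could also be held in a dictionary keyed by LSOA code, which avoids searching a list for every LSOA. This is a minimal sketch on a toy stand-in for the loaded geojson, assuming the same `LSOA11CD`, `LONG` and `LAT` property names as above; `geojson_toy` and `centroids` are hypothetical names and the coordinates are copied from the final table in this notebook.

```python
# Toy stand-in with the same nesting as the real geojson file.
geojson_toy = {
    'features': [
        {'properties': {'LSOA11CD': 'E01031349', 'LONG': -0.22737, 'LAT': 50.83651}},
        {'properties': {'LSOA11CD': 'E01031350', 'LONG': -0.22842, 'LAT': 50.84244}},
    ]
}

# Map each LSOA code to its (longitude, latitude) pair.
centroids = {
    feature['properties']['LSOA11CD']: (
        feature['properties']['LONG'],
        feature['properties']['LAT'],
    )
    for feature in geojson_toy['features']
}

# Look up one LSOA, falling back to placeholder values if it is missing.
lsoa_long, lsoa_lat = centroids.get('E01031350', (0.0, 0.0))
print(lsoa_long, lsoa_lat)
```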

## Create new dataframe

Build one list per output column, in the same order as the LSOAs in the travel matrix. These lists will become the columns of the new dataframe:

In [14]:
# Fill these lists with info from all of England and Wales
# in the same order as our starting LSOA list.
LSOA11CD = []
LSOA11LONG = []
LSOA11LAT = []
CCG19CD = []
CCG19NM = []
STP19CD = []
STP19NM = []
LHB20CD = []
LHB20NM = []
LHB20NMW = []
LAD17CD = []
LAD17NM = []
SCN17CD = []
SCN17NM = []
RGN11CD = []
RGN11NM = []

for LSOA in LSOA11NM:
    # First find where this name is in the England and Wales file:
    i3 = LSOA11NM_EnglandWales == LSOA

    try:
        # Pull out the matching LSOA code, region and LAD.
        # i3 is a boolean mask. If no row matches this name,
        # .iloc[0] raises IndexError and the "except" clause runs.
        LSOA_code = LSOA11CD_EnglandWales[i3].iloc[0]
        region_name = RGN11NM_EnglandWales[i3].iloc[0]
        region_code = RGN11CD_EnglandWales[i3].iloc[0]
        LAD_code = LAD17CD_EnglandWales[i3].iloc[0]
        LAD_name = LAD17NM_EnglandWales[i3].iloc[0]

        # Find longitude and latitude from the geojson data.
        # list.index() raises ValueError if the code is missing.
        i5 = big_geojson_order.index(LSOA_code)
        LSOA_long = LSOA_longs[i5]
        LSOA_lat = LSOA_lats[i5]
    except (IndexError, TypeError, ValueError):
        # Unexpectedly, this LSOA isn't in the lookup file or the geojson.
        print(f'Problem with {LSOA}')
        # Set everything to placeholder values:
        LSOA_code = ''
        LSOA_long = 0.0
        LSOA_lat = 0.0
        CCG_code = ''
        CCG_name = ''
        STP_code = ''
        STP_name = ''
        LHB_code = ''
        LHB_name = ''
        LHB_nameW = ''
        region_code = ''
        region_name = ''
        LAD_code = ''
        LAD_name = ''
        SCN_code = ''
        SCN_name = ''

    if region_name == 'Wales':
        # This LSOA is in Wales.
        # Find the Output Area code:
        OA_code = OA11CD_EnglandWales[i3].iloc[0]
        # Find where this OA is in the LHB file:
        i2 = OA11CD_Wales == OA_code
        # Find the LHB info for this OA:
        LHB_code = LHB20CD_Wales[i2].iloc[0]
        LHB_name = LHB20NM_Wales[i2].iloc[0]
        LHB_nameW = LHB20NMW_Wales[i2].iloc[0]

        # Set the English variables to placeholder values:
        CCG_code = ''
        CCG_name = ''
        STP_code = ''
        STP_name = ''
        SCN_code = ''
        SCN_name = ''

    elif region_name != '':
        # This LSOA is in England.
        # Find where the LSOA is in the CCG/STP file:
        i1 = LSOA11NM_England == LSOA
        # Find the info for this LSOA:
        CCG_code = CCG19CD_England[i1].iloc[0]
        CCG_name = CCG19NM_England[i1].iloc[0]
        STP_code = STP19CD_England[i1].iloc[0]
        STP_name = STP19NM_England[i1].iloc[0]
        
        # Find where the LAD is in the LAD/SCN file:
        i4 = LAD17CD_England == LAD_code
        SCN_code = SCN17CD_England[i4].iloc[0]
        SCN_name = SCN17NM_England[i4].iloc[0]

        # Set the Welsh variables to placeholder values:
        LHB_code = ''
        LHB_name = ''
        LHB_nameW = ''

    # Update the various lists:
    # LSOA code:
    LSOA11CD.append(LSOA_code)
    # LSOA coordinates:
    LSOA11LONG.append(LSOA_long)
    LSOA11LAT.append(LSOA_lat)
    # CCGs:
    CCG19CD.append(CCG_code)
    CCG19NM.append(CCG_name)
    # STPs:
    STP19CD.append(STP_code)
    STP19NM.append(STP_name)
    # LHBs:
    LHB20CD.append(LHB_code)
    LHB20NM.append(LHB_name)
    LHB20NMW.append(LHB_nameW)
    # LADs:
    LAD17CD.append(LAD_code)
    LAD17NM.append(LAD_name)
    # SCNs:
    SCN17CD.append(SCN_code)
    SCN17NM.append(SCN_name)
    # Regions:
    RGN11CD.append(region_code)
    RGN11NM.append(region_name)

Add these lists to the final dataframe:

In [15]:
df_regions = pd.DataFrame()

df_regions['LSOA11CD'] = LSOA11CD
df_regions['LSOA11NM'] = LSOA11NM
df_regions['LSOA11LONG'] = LSOA11LONG
df_regions['LSOA11LAT'] = LSOA11LAT
df_regions['CCG19CD'] = CCG19CD
df_regions['CCG19NM'] = CCG19NM
df_regions['STP19CD'] = STP19CD
df_regions['STP19NM'] = STP19NM
df_regions['LHB20CD'] = LHB20CD
df_regions['LHB20NM'] = LHB20NM
df_regions['LHB20NMW'] = LHB20NMW
df_regions['LAD17CD'] = LAD17CD
df_regions['LAD17NM'] = LAD17NM
df_regions['SCN17CD'] = SCN17CD
df_regions['SCN17NM'] = SCN17NM
df_regions['RGN11CD'] = RGN11CD
df_regions['RGN11NM'] = RGN11NM

df_regions.head()

Unnamed: 0,LSOA11CD,LSOA11NM,LSOA11LONG,LSOA11LAT,CCG19CD,CCG19NM,STP19CD,STP19NM,LHB20CD,LHB20NM,LHB20NMW,LAD17CD,LAD17NM,SCN17CD,SCN17NM,RGN11CD,RGN11NM
0,E01031349,Adur 001A,-0.22737,50.83651,E38000213,NHS Coastal West Sussex CCG,E54000033,Sussex and East Surrey,,,,E07000223,Adur,E55000010,South East Coast,E12000008,South East
1,E01031350,Adur 001B,-0.22842,50.84244,E38000213,NHS Coastal West Sussex CCG,E54000033,Sussex and East Surrey,,,,E07000223,Adur,E55000010,South East Coast,E12000008,South East
2,E01031351,Adur 001C,-0.253,50.85845,E38000213,NHS Coastal West Sussex CCG,E54000033,Sussex and East Surrey,,,,E07000223,Adur,E55000010,South East Coast,E12000008,South East
3,E01031352,Adur 001D,-0.23812,50.8429,E38000213,NHS Coastal West Sussex CCG,E54000033,Sussex and East Surrey,,,,E07000223,Adur,E55000010,South East Coast,E12000008,South East
4,E01031370,Adur 001E,-0.24649,50.83958,E38000213,NHS Coastal West Sussex CCG,E54000033,Sussex and East Surrey,,,,E07000223,Adur,E55000010,South East Coast,E12000008,South East


Save the dataframe to file:

In [16]:
df_regions.to_csv('LSOA_regions.csv', index=False)
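
When the saved file is read back, pandas by default parses the empty placeholder fields as NaN. The sketch below shows one way to keep them as empty strings and run a quick consistency check. It uses an in-memory toy CSV rather than the real file; the Welsh LSOA code is invented for illustration, and `df_check` and the counter names are hypothetical.

```python
import io

import pandas as pd

# Toy CSV with one English row, one Welsh row and one unmatched row.
csv_text = (
    'LSOA11CD,CCG19CD,LHB20CD\n'
    'E01031349,E38000213,\n'
    'W01000000,,W11000023\n'  # made-up Welsh LSOA code
    ',,\n'
)

# keep_default_na=False leaves empty fields as '' instead of NaN.
df_check = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

has_ccg = df_check['CCG19CD'] != ''
has_lhb = df_check['LHB20CD'] != ''
unmatched = df_check['LSOA11CD'] == ''

# A matched LSOA should have either a CCG (England) or an LHB (Wales).
n_both = int((has_ccg & has_lhb).sum())
n_neither = int((~has_ccg & ~has_lhb & ~unmatched).sum())
n_unmatched = int(unmatched.sum())

print(n_both, n_neither, n_unmatched)
```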