*Read BestFit Data*
- Read best fit spreadsheet
- Subset to London boroughs

*Read Data*
- Read 2011 data 
- Read 2021 data (e.g. census2021-ts001-lsoa.csv)
- Read 2021 geometries (london-2021-lsoa.shp)
    - Calculate polygon area

*Merge Data*
- Merge 2011 data to bestfit on LSOA11CD 
- Merge 2021 data to bestfit on LSOA21CD
- Merge 2021 areas on LSOA21CD

*Calc props*
- Create '2021 area prop' column:
    - Where CHGIND is U, set value 1
    - Where CHGIND is M, set value 2
    - Where CHGIND is S: calculate proportion from sum of areas for LSOAs with identical LSOA11CD 

*Calc 2021 merged*
- Create 'merged 2011' column
    - Where CHGIND is U, use original 2011 data value 
    - Where CHGIND is M, use sum of 2011 LSOAs with the LSOA21CD for M
    - Where CHGIND is S, multiply original 2011 data value by '2021 area prop'

In [1]:
from datetime import date
print(f'Last tested: {date.today()}')

Last tested: 2023-08-16


In [2]:
import pyproj
import geopandas as gpd
import shapely
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path

*Read BestFit Data*
- Read best fit spreadsheet
- Subset to London boroughs

In [3]:
census_ipath = Path("../data/inputs/census/")
bestfit = pd.read_csv(census_ipath / "Lookup-ExactFit-LSOA11_to_LSOA21_to_LAD22_EW_Version_2.csv",
                      usecols = list(range(7)))

In [4]:
bestfit.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35796 entries, 0 to 35795
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   LSOA11CD  35796 non-null  object
 1   LSOA11NM  35796 non-null  object
 2   LSOA21CD  35796 non-null  object
 3   LSOA21NM  35796 non-null  object
 4   CHGIND    35796 non-null  object
 5   LAD22CD   35796 non-null  object
 6   LAD22NM   35796 non-null  object
dtypes: object(7)
memory usage: 1.9+ MB


In [5]:
lads = pd.unique(bestfit.LAD22NM)
lads

array(['City of London', 'Barking and Dagenham', 'Bexley', 'Barnet',
       'Brent', 'Bromley', 'Croydon', 'Camden', 'Ealing', 'Enfield',
       'Greenwich', 'Hackney', 'Haringey', 'Hammersmith and Fulham',
       'Hillingdon', 'Harrow', 'Havering', 'Islington', 'Hounslow',
       'Kensington and Chelsea', 'Kingston upon Thames', 'Lambeth',
       'Lewisham', 'Merton', 'Redbridge', 'Newham', 'Sutton', 'Southwark',
       'Richmond upon Thames', 'Tower Hamlets', 'Waltham Forest',
       'Wandsworth', 'Bury', 'Bolton', 'Westminster', 'Manchester',
       'Oldham', 'Rochdale', 'Salford', 'Stockport', 'Trafford', 'Wigan',
       'Tameside', 'Knowsley', 'Liverpool', 'Sefton', 'Wirral',
       'St. Helens', 'Doncaster', 'Barnsley', 'Rotherham', 'Sheffield',
       'Gateshead', 'Newcastle upon Tyne', 'North Tyneside', 'Sunderland',
       'South Tyneside', 'Birmingham', 'Coventry', 'Dudley', 'Sandwell',
       'Solihull', 'Walsall', 'Wolverhampton', 'Bradford', 'Calderdale',
       'Leeds', '

From this we can see that the London boroughs _are_ listed first, with one wrinkle: Bury and Bolton appear before Westminster and will need to be dropped. That means we can get a list of borough names relatively easily:

In [6]:
lads = list(lads)
london_lads = lads[:lads.index('Westminster')+1]
london_lads.remove("Bury")
london_lads.remove("Bolton")
print(london_lads)
print(len(london_lads))

['City of London', 'Barking and Dagenham', 'Bexley', 'Barnet', 'Brent', 'Bromley', 'Croydon', 'Camden', 'Ealing', 'Enfield', 'Greenwich', 'Hackney', 'Haringey', 'Hammersmith and Fulham', 'Hillingdon', 'Harrow', 'Havering', 'Islington', 'Hounslow', 'Kensington and Chelsea', 'Kingston upon Thames', 'Lambeth', 'Lewisham', 'Merton', 'Redbridge', 'Newham', 'Sutton', 'Southwark', 'Richmond upon Thames', 'Tower Hamlets', 'Waltham Forest', 'Wandsworth', 'Westminster']
33


In [7]:
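# Keep rows whose LSOA21NM contains any borough name (regex alternation, from https://stackoverflow.com/a/71399966).
# This is a substring match, so 'Brent' also matches 'Brentwood'; that is dealt with below.
bestfit = bestfit[bestfit['LSOA21NM'].str.contains("|".join(london_lads))]
bestfit = bestfit.copy(deep=False)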
bestfit.head()

Unnamed: 0,LSOA11CD,LSOA11NM,LSOA21CD,LSOA21NM,CHGIND,LAD22CD,LAD22NM
0,E01000001,City of London 001A,E01000001,City of London 001A,U,E09000001,City of London
1,E01000002,City of London 001B,E01000002,City of London 001B,U,E09000001,City of London
2,E01000003,City of London 001C,E01000003,City of London 001C,U,E09000001,City of London
3,E01000005,City of London 001E,E01000005,City of London 001E,U,E09000001,City of London
4,E01000006,Barking and Dagenham 016A,E01000006,Barking and Dagenham 016A,U,E09000002,Barking and Dagenham


In [8]:
pd.unique(bestfit.LAD22NM)

array(['City of London', 'Barking and Dagenham', 'Bexley', 'Barnet',
       'Brent', 'Bromley', 'Croydon', 'Camden', 'Ealing', 'Enfield',
       'Greenwich', 'Hackney', 'Haringey', 'Hammersmith and Fulham',
       'Hillingdon', 'Harrow', 'Havering', 'Islington', 'Hounslow',
       'Kensington and Chelsea', 'Kingston upon Thames', 'Lambeth',
       'Lewisham', 'Merton', 'Redbridge', 'Newham', 'Sutton', 'Southwark',
       'Richmond upon Thames', 'Tower Hamlets', 'Waltham Forest',
       'Wandsworth', 'Westminster', 'Brentwood'], dtype=object)

In [9]:
bestfit = bestfit[~bestfit['LSOA21NM'].str.contains('Brentwood')]

In [10]:
pd.unique(bestfit.LAD22NM)

array(['City of London', 'Barking and Dagenham', 'Bexley', 'Barnet',
       'Brent', 'Bromley', 'Croydon', 'Camden', 'Ealing', 'Enfield',
       'Greenwich', 'Hackney', 'Haringey', 'Hammersmith and Fulham',
       'Hillingdon', 'Harrow', 'Havering', 'Islington', 'Hounslow',
       'Kensington and Chelsea', 'Kingston upon Thames', 'Lambeth',
       'Lewisham', 'Merton', 'Redbridge', 'Newham', 'Sutton', 'Southwark',
       'Richmond upon Thames', 'Tower Hamlets', 'Waltham Forest',
       'Wandsworth', 'Westminster'], dtype=object)
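
The Brentwood rows slipped through because the filter matched borough names as substrings of `LSOA21NM`. As a minimal sketch of an alternative (not what this notebook does), an exact match on `LAD22NM` with `isin` would avoid the follow-up clean-up. The toy rows and the names `lookup` and `boroughs` are invented for illustration.

```python
import pandas as pd

# Toy lookup rows (invented for illustration) in the same shape as `bestfit`
lookup = pd.DataFrame({
    "LSOA21NM": ["Brent 001A", "Brentwood 001A", "Camden 001A", "Bury 001A"],
    "LAD22NM": ["Brent", "Brentwood", "Camden", "Bury"],
})
boroughs = ["Brent", "Camden"]  # hypothetical stand-in for `london_lads`

# Substring match on the LSOA name: 'Brent' also matches 'Brentwood 001A'
by_substring = lookup[lookup["LSOA21NM"].str.contains("|".join(boroughs))]

# Exact match on the local authority name: only the listed boroughs survive
by_exact = lookup[lookup["LAD22NM"].isin(boroughs)]

print(by_substring["LAD22NM"].tolist())
print(by_exact["LAD22NM"].tolist())
```

The `LAD22CD` values shown above for London rows start with `E09`, so a prefix filter on that column might be another option, assuming that prefix is not shared with authorities outside London in this file.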

*Read Data*
- Read 2011 data 
- Read 2021 data (e.g. census2021-ts001-lsoa.csv)
- Read 2021 geometries (london-2021-lsoa.shp)
    - Calculate polygon area

In [11]:
ts11 = pd.read_csv(census_ipath / "KS101EWDATA06.CSV")

In [12]:
ts11.head()

Unnamed: 0,GeographyCode,KS101EW0001,KS101EW0002,KS101EW0003,KS101EW0004,KS101EW0005,KS101EW0006,KS101EW0007,KS101EW0008,KS101EW0009,KS101EW0010,KS101EW0011,KS101EW0012
0,E01000001,1465,767,698,1465,0,21,12.98,112.865948,52.354949,47.645051,100.0,0.0
1,E01000002,1436,767,669,1436,0,22,22.84,62.872154,53.412256,46.587744,100.0,0.0
2,E01000003,1346,714,632,1250,96,12,5.91,227.749577,53.046062,46.953938,92.867756,7.132244
3,E01000005,985,528,457,985,0,5,18.96,51.951477,53.604061,46.395939,100.0,0.0
4,E01000006,1703,866,837,1699,4,16,14.66,116.166439,50.851439,49.148561,99.76512,0.23488


In [13]:
ts11 = ts11[['GeographyCode', 'KS101EW0001']]
ts11 = ts11.set_axis(['LSOA11CD', 'TotalRes11'], axis=1)

In [14]:
ts11.head()

Unnamed: 0,LSOA11CD,TotalRes11
0,E01000001,1465
1,E01000002,1436
2,E01000003,1346
3,E01000005,985
4,E01000006,1703


In [15]:
ts21 = pd.read_csv(census_ipath / "census2021-ts001-lsoa.csv")

In [16]:
ts21.head()

Unnamed: 0,date,geography,geography code,Residence type: Total; measures: Value,Residence type: Lives in a household; measures: Value,Residence type: Lives in a communal establishment; measures: Value
0,2021,Hartlepool 001A,E01011954,2284,2284,0
1,2021,Hartlepool 001B,E01011969,1344,1344,0
2,2021,Hartlepool 001C,E01011970,1070,1070,0
3,2021,Hartlepool 001D,E01011971,1323,1323,0
4,2021,Hartlepool 001F,E01033465,1955,1955,0


In [17]:
ts21 = ts21[['geography code', 'Residence type: Total; measures: Value']]
ts21 = ts21.set_axis(['LSOA21CD', 'TotalRes21'], axis=1)

In [18]:
ts21.head()

Unnamed: 0,LSOA21CD,TotalRes21
0,E01011954,2284
1,E01011969,1344
2,E01011970,1070
3,E01011971,1323
4,E01033465,1955


In [19]:
census_gpath = Path("../data/geographies/census/")
boundaries = gpd.read_file(census_gpath / "london-2021-lsoa.shp").set_index('LSOA21NM')

In [20]:
print(boundaries.crs)

epsg:27700


In [21]:
boundaries['sqkm21'] = boundaries['geometry'].area / 10**6

In [22]:
boundaries.head()

LSOA21NM,LSOA21CD,MSOA21CD,MSOA21NM,LAD22CD,LAD22NM,geometry,sqkm21
City of London 001A,E01000001,E02000001,City of London 001,E09000001,City of London,"POLYGON ((532151.537 181867.433, 532152.500 18...",0.129865
City of London 001B,E01000002,E02000001,City of London 001,E09000001,City of London,"POLYGON ((532634.497 181926.016, 532632.048 18...",0.22842
City of London 001C,E01000003,E02000001,City of London 001,E09000001,City of London,"POLYGON ((532153.703 182165.155, 532158.250 18...",0.059054
City of London 001E,E01000005,E02000001,City of London 001,E09000001,City of London,"POLYGON ((533619.062 181402.364, 533639.868 18...",0.189578
Barking and Dagenham 016A,E01000006,E02000017,Barking and Dagenham 016,E09000002,Barking and Dagenham,"POLYGON ((545126.852 184310.838, 545145.213 18...",0.146537


In [23]:
areas = boundaries[['LSOA21CD', 'sqkm21']]

*Merge Data*
- Merge 2011 data to bestfit on LSOA11CD 
- Merge 2021 data to bestfit on LSOA21CD
- Merge 2021 areas on LSOA21CD

In [24]:
merge_bestfit = pd.merge(bestfit, ts11, how='left', on='LSOA11CD')

In [25]:
merge_bestfit = pd.merge(merge_bestfit, ts21, how='left', on='LSOA21CD')

In [26]:
merge_bestfit = pd.merge(merge_bestfit, areas, how='left', on='LSOA21CD')
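
Left merges silently produce missing values when a code has no match, and silently duplicate rows when the right-hand table repeats a key. One way to check for both, sketched here on invented toy data rather than the real tables, is to use the `validate` and `indicator` arguments of `pd.merge`. The frames `lookup` and `counts11` are hypothetical stand-ins for `bestfit` and `ts11`.

```python
import pandas as pd

# Toy stand-ins (invented) for the lookup table and the 2011 counts
lookup = pd.DataFrame({
    "LSOA11CD": ["A1", "A2", "A3"],
    "LSOA21CD": ["B1", "B2", "B3"],
})
counts11 = pd.DataFrame({
    "LSOA11CD": ["A1", "A2"],
    "TotalRes11": [100, 200],
})

# validate="many_to_one" raises MergeError if counts11 repeats an LSOA11CD;
# indicator=True adds a `_merge` column saying where each row came from
checked = pd.merge(lookup, counts11, how="left", on="LSOA11CD",
                   validate="many_to_one", indicator=True)

# Codes present in the lookup but absent from the 2011 counts
unmatched = checked.loc[checked["_merge"] == "left_only", "LSOA11CD"].tolist()
print(unmatched)
```

If `unmatched` is empty for each of the three merges, every lookup row found a partner.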

*Calc props*
- Create '2021 area prop' column:
    - Where CHGIND is U, set value 1
    - Where CHGIND is M, set value 2
    - Where CHGIND is S: calculate proportion from sum of areas for LSOAs with identical LSOA11CD 

In [27]:
merge_bestfit['prop21'] = merge_bestfit['sqkm21'] / merge_bestfit.groupby('LSOA11CD')['sqkm21'].transform('sum')

In [28]:
merge_bestfit.loc[merge_bestfit['CHGIND'] == 'M', 'prop21'] = 2
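
To see what the two cells above produce, here is the same pair of operations on a small invented table. The LSOA codes and areas are made up; the column names follow `merge_bestfit`.

```python
import pandas as pd

# Invented rows: one unchanged LSOA, one 2011 LSOA split in two,
# and two 2011 LSOAs merged into a single 2021 LSOA
toy = pd.DataFrame({
    "LSOA11CD": ["A1", "A2", "A2", "A3", "A4"],
    "LSOA21CD": ["B1", "B2", "B3", "B4", "B4"],
    "CHGIND":   ["U",  "S",  "S",  "M",  "M"],
    "sqkm21":   [0.5,  0.75, 0.25, 0.8,  0.8],
})

# Share of each 2011 LSOA's total 2021 area taken by each 2021 LSOA
toy["prop21"] = toy["sqkm21"] / toy.groupby("LSOA11CD")["sqkm21"].transform("sum")

# Merged rows get the fixed value 2, as in the notebook
toy.loc[toy["CHGIND"] == "M", "prop21"] = 2

print(toy)
```

The unchanged row comes out as 1 without special handling, because it is the only row in its `LSOA11CD` group. The split rows get 0.75 and 0.25, which sum to 1 within the original 2011 LSOA.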

In [29]:
census_opath = Path("../data/census/")
merge_bestfit.to_csv(census_opath / "merge_bestfit.csv")

*Calc 2021 merged*
- Create 'merged 2011' column
    - Where CHGIND is U, use original 2011 data value 
    - Where CHGIND is M, use sum of 2011 LSOAs with the LSOA21CD for M
    - Where CHGIND is S, multiply original 2011 data value by '2021 area prop'
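
The three rules above could be sketched as follows. This is one possible reading on invented toy data, not the notebook's own implementation; the output column name `merged11` and the helper `summed_by_21` are hypothetical, and rows with any other `CHGIND` value are left as missing.

```python
import numpy as np
import pandas as pd

# Invented rows in the shape of `merge_bestfit` after the proportion step
toy = pd.DataFrame({
    "LSOA11CD":   ["A1", "A2", "A2", "A3", "A4"],
    "LSOA21CD":   ["B1", "B2", "B3", "B4", "B4"],
    "CHGIND":     ["U",  "S",  "S",  "M",  "M"],
    "TotalRes11": [1000, 2000, 2000, 1500, 1200],
    "prop21":     [1.0,  0.75, 0.25, 2.0,  2.0],
})

# Sum of 2011 values over all rows that share a 2021 LSOA (used for M)
summed_by_21 = toy.groupby("LSOA21CD")["TotalRes11"].transform("sum")

toy["merged11"] = np.select(
    [toy["CHGIND"] == "U", toy["CHGIND"] == "M", toy["CHGIND"] == "S"],
    [toy["TotalRes11"],             # U: original 2011 value
     summed_by_21,                  # M: sum of the merged 2011 LSOAs
     toy["TotalRes11"] * toy["prop21"]],  # S: 2011 value scaled by area share
    default=np.nan,
)

# Merged LSOAs appear once per 2011 parent, so keep one row per 2021 LSOA
per_2021 = toy.drop_duplicates("LSOA21CD")[["LSOA21CD", "merged11"]]
print(per_2021)
```

Scaling split LSOAs by area assumes residents are spread evenly across the original 2011 polygon, which is the assumption already built into the '2021 area prop' column.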