- The geography tree is: local authority districts (LADs), metropolitan counties, counties and combined authorities => regions => England/Wales/Scotland/NI.
- Lookups for the most recent names and codes of these geographies have been downloaded into `metadata/lookups`.
- To generate a list of active geographies, we combine all of these lookups into a single file with the columns `geography_code` and `geography_name`. This is temporarily stored in `metadata/temp`.
- Each dataset may contain some, all or none of these geographies. For each dataset, we iterate through the file's unique geography codes and check whether each one is in the list of active codes. Codes that are not are added to a list of inactive codes stored in `metadata/temp`.
- For every geography, we determine when data was first and last published in each dataset.
- The result is stored as JSON in `src/data/areas/place-page/_data/metadata.json` and used to generate the site.
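
The first and last published step in the list above can be sketched as follows. This is a minimal illustration in plain Python, not the project's code: the function name `publication_span`, the `(geography_code, year)` record shape and the `first_published`/`last_published` keys are assumptions, and the sample years are invented.

```python
def publication_span(records):
    """Map each geography code to the first and last year it appears in.

    `records` is an iterable of (geography_code, year) pairs taken from
    one dataset. The record shape is an assumption made for this sketch.
    """
    spans = {}
    for code, year in records:
        if code not in spans:
            # First sighting of this code: it is both the earliest and latest year so far.
            spans[code] = {"first_published": year, "last_published": year}
        else:
            span = spans[code]
            span["first_published"] = min(span["first_published"], year)
            span["last_published"] = max(span["last_published"], year)
    return spans


# Invented rows for illustration only; real rows would come from a dataset file.
sample = [
    ("E06000001", 2019),
    ("E06000001", 2022),
    ("E06000028", 2004),
    ("E06000028", 2018),
    ("E06000001", 2004),
]
spans = publication_span(sample)
```

With a pandas frame that has `AreaCode` and `Year` columns, as the vacant-homes data does, the same summary could be obtained with `groupby('AreaCode')['Year'].agg(['min', 'max'])`.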

Import modules and set up paths for reading and writing files.

In [7]:
from pathlib import Path
import pandas as pd
ROOT = Path('../..')
ROOT.resolve()

PosixPath('/Users/lukestrange/Code/housing')

In [8]:
# Lookup files, one per geography type.
paths = [
    "metadata/lookups/Local_Authority_Districts_(April_2023)_Names_and_Codes_in_the_United_Kingdom.csv",
    "metadata/lookups/Metropolitan_Counties_(December_2023)_Names_and_Codes_in_EN.csv",
    "metadata/lookups/Regions_(December_2023)_Names_and_Codes_in_EN.csv",
    "metadata/lookups/Combined_Authorities_(May_2024)_Names_and_Codes_in_England.csv",
    "metadata/lookups/Counties_(April_2023)_Names_and_Codes_in_EN.csv",
]

frames = []
for path in paths:
    data = pd.read_csv(ROOT / path)
    # Column names differ between lookups, so find the code and name columns
    # by their 'CD' and 'NM' suffixes.
    code_col = data.columns[data.columns.str.endswith('CD')][0]
    name_col = data.columns[data.columns.str.endswith('NM')][0]
    data = data.rename(columns={code_col: 'geography_code', name_col: 'geography_name'})
    frames.append(data[['geography_code', 'geography_name']])

# Stack all lookups into a single table.
frame = pd.concat(frames, ignore_index=True)

# Drop codes starting with W, S or N (Wales, Scotland, Northern Ireland),
# flag the rest as active and index by code.
frame = (
    frame[~frame['geography_code'].str.startswith(('W', 'S', 'N'))]
    .assign(active='true')  # stored as the string 'true', not a boolean
    .set_index('geography_code')
)
frame.to_json(ROOT / 'metadata/temp/active_geographies.json', orient='index', indent=4)

len(frame.geography_name.unique())

336
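
The loop above relies on each lookup having a column ending in `CD` and one ending in `NM`, and silently takes the first match. A hypothetical helper (`find_column` is not part of the notebook) could make that assumption explicit and fail loudly when it does not hold. The column names below are illustrative and are not read from the lookup files.

```python
def find_column(columns, suffix):
    """Return the single column name ending in `suffix`, or raise ValueError."""
    matches = [name for name in columns if name.endswith(suffix)]
    if len(matches) != 1:
        raise ValueError(f"expected one column ending in {suffix!r}, found {matches}")
    return matches[0]


# Illustrative header in the style of a names-and-codes lookup.
columns = ["LAD23CD", "LAD23NM", "ObjectId"]
code_col = find_column(columns, "CD")
name_col = find_column(columns, "NM")
```

Passing `data.columns` from a loaded lookup in place of the sample list would work the same way, since the helper only needs an iterable of strings.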

In [9]:
p = pd.read_csv(ROOT / 'data/vacant-homes/absolute.csv')
# Keep only the most recent year of data.
p = p[p.Year == p.Year.max()]

In [10]:
# Print every area code in the latest vacant-homes data that is not in the
# list of active geographies, i.e. the inactive codes for this dataset.
active_codes = set(frame.index)
for code in p.AreaCode.unique():
    if code not in active_codes:
        print(code)

E06000028
E06000029
E07000004
E07000005
E07000006
E07000007
E07000026
E07000027
E07000028
E07000029
E07000030
E07000031
E07000048
E07000049
E07000050
E07000051
E07000052
E07000053
E07000150
E07000151
E07000152
E07000153
E07000154
E07000155
E07000156
E07000163
E07000164
E07000165
E07000166
E07000167
E07000168
E07000169
E07000187
E07000188
E07000189
E07000190
E07000191
E07000201
E07000204
E07000205
E07000206
E07000246
E10000002
E10000006
E10000009
E10000021
E10000023
E10000027
E11000004
E92000001
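
Rather than printing, the inactive codes could be collected and serialised, as the overview describes for `metadata/temp`. The following is a minimal sketch under stated assumptions: `split_codes` is a hypothetical helper, the JSON layout is invented because the project's actual file format is not shown here, and `active_codes` is a small stand-in for the full active lookup.

```python
import json


def split_codes(dataset_codes, active_codes):
    """Split a dataset's codes into (active, inactive) lists.

    Order of first appearance is preserved and repeated codes are dropped.
    """
    active, inactive = [], []
    for code in dict.fromkeys(dataset_codes):  # de-duplicate, keep first-seen order
        (active if code in active_codes else inactive).append(code)
    return active, inactive


# Illustrative inputs: a stand-in for the active lookup and one dataset's codes.
active_codes = {"E06000001", "E12000001"}
dataset_codes = ["E06000001", "E06000028", "E06000028", "E92000001"]

active, inactive = split_codes(dataset_codes, active_codes)

# Hypothetical layout for the inactive list.
inactive_json = json.dumps({"inactive": inactive}, indent=4)
```

Running the same split for each dataset and merging the `inactive` lists would give one combined list of inactive codes.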
