In [1]:
import pandas as pd
import geopandas as gpd

from params import DATA_DIR

Define a function that loads the GeoJSON file for a given geography type and indexes it by ONS code.

In [2]:
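def load_geojson(geo_type):
    path = f'{DATA_DIR}/geo/{geo_type}.geojson'
    # The code column is named after the type, e.g. 'lad22' -> 'LAD22CD'.
    key = geo_type.upper() + 'CD'
    data = gpd.read_file(path)
    if geo_type == 'panrgn22':
        # The pan-region file is keyed on the 2021 region code column;
        # 'The North' is assigned the code 'E12999901'.
        key = 'RGN21CD'
        data.loc[data.RGN21NM == 'The North', 'RGN21CD'] = 'E12999901'
    # Index by ONS code, and also keep the code and its source column as data.
    data.index = pd.Index(data[key], name='ons_code')
    data['ons_code'] = data.index.to_series()
    data['ons_type'] = key
    return data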
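
The column-naming rule used by the function above can be looked at on its own. The sketch below is only an illustration: `ons_key_column` is a hypothetical helper that is not part of the notebook, and it assumes the rule is exactly "upper-case the type and append `CD`", with `panrgn22` as the single exception.

```python
def ons_key_column(geo_type: str) -> str:
    """Return the name of the ONS code column for a geography type.

    Hypothetical helper that mirrors the naming rule in load_geojson.
    """
    # The pan-region file is keyed on the 2021 region code column.
    if geo_type == 'panrgn22':
        return 'RGN21CD'
    # Otherwise the column is the upper-cased type plus a 'CD' suffix.
    return geo_type.upper() + 'CD'


geo_types = ['panrgn22', 'rgn22', 'cauth22', 'cty22', 'lad22', 'wd22']
key_columns = {geo_type: ons_key_column(geo_type) for geo_type in geo_types}
```

The value stored in `ons_type` for each row is this column name, so it records which code column a row's `ons_code` was taken from.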

Load all the required types into a single `GeoDataFrame`.

In [3]:
data = pd.concat([load_geojson(type) for type in ['panrgn22', 'rgn22', 'cauth22', 'cty22', 'lad22', 'wd22']])
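
Because each geography type has its own code column, concatenating the frames produces the union of all columns, with missing values wherever a row's source file lacked a column. A minimal sketch with plain `DataFrame`s, assuming made-up codes (`X1`, `X2`, `Y1`) and toy frames in place of the real files:

```python
import pandas as pd

# Toy stand-ins for two loaded geography types; the codes are made up.
regions = pd.DataFrame(
    {'RGN22CD': ['X1', 'X2'], 'ons_type': 'RGN22CD'},
    index=pd.Index(['X1', 'X2'], name='ons_code'),
)
wards = pd.DataFrame(
    {'WD22CD': ['Y1'], 'ons_type': 'WD22CD'},
    index=pd.Index(['Y1'], name='ons_code'),
)

# Rows are stacked; columns become the union of both frames' columns.
combined = pd.concat([regions, wards])

# The ward row has no region code, so that cell is missing (NaN).
ward_region_code = combined.loc['Y1', 'RGN22CD']
```

Only `ons_code`, `ons_type`, and `geometry` are shared by every type, which is why those are the columns worth keeping downstream.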

Check for duplicate ONS codes in the index, then drop any repeats, keeping the first occurrence.

In [4]:
data[data.index.duplicated()]

Empty result (0 rows). Index: ons_code. Columns shown: RGN21CD, RGN21NM, geometry, ons_code, ons_type, RGN22CD, RGN22NM


In [5]:
data = data[~data.index.duplicated()]
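
The check and the removal above both rely on `Index.duplicated()`, which by default flags every occurrence of a label except the first. A small sketch of that behaviour, assuming a toy frame with made-up codes in which `X1` appears twice:

```python
import pandas as pd

# Toy frame: the made-up code 'X1' appears under two different types.
frame = pd.DataFrame(
    {'ons_type': ['RGN21CD', 'RGN22CD', 'LAD22CD']},
    index=pd.Index(['X1', 'X1', 'Z9'], name='ons_code'),
)

# Boolean mask: True for every repeat after the first occurrence.
is_repeat = frame.index.duplicated()

# Inverting the mask keeps the first row for each code.
deduplicated = frame[~is_repeat]
```

Since the first occurrence wins, the order of the types in the concatenation decides which row survives when a code appears in more than one file.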

Save the codes, types, and geometries to a Parquet file for later processing. The same columns could instead be written to a GeoJSON file as follows:

```py
with open('places.geojson', 'w') as f:
    f.write(data[['ons_code', 'ons_type', 'geometry']].to_json())
```

In [6]:
data[['ons_code', 'ons_type', 'geometry']].to_parquet(f'{DATA_DIR}/interim/shapes.parquet')