# NomisWeb

It was a massive PITA tracking down these codes...

- TYPE150: 2021 output areas within England and Wales
- TYPE151: 2021 super output areas - lower layer within England and Wales
- TYPE152: 2021 super output areas - middle layer within England and Wales
- TYPE153: 2022 wards within England and Wales
- TYPE154: 2022 local authorities: districts within England and Wales
- TYPE155: 2022 local authorities: counties within England and Wales
- TYPE168: 2021 national parks within England and Wales
- TYPE423: local authorities: county / unitary (as of April 2023) within England and Wales
- TYPE424: local authorities: district / unitary (as of April 2023) within England and Wales
- TYPE459: local enterprise partnerships (as of April 2021) within England and Wales
- TYPE480: regions within England and Wales


In [38]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [49]:
from pathlib import Path

import numpy as np
import pandas as pd

from nomisweb import FieldMetadata, TableMetadata, build_geog_query, fetch, fetch_table
from utils import extract_crime_data

In [None]:
table_name = "NM_2041_1"  # random 2021 census table - age, ethnicity, sex

top_level_geogs = FieldMetadata(**fetch(f"dataset/{table_name}/geography.def.sdmx.json"))
top_level_geogs.to_dataframe()

Unnamed: 0_level_0,NomisCode,TypeName,TypeCode
GeogCode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
K04000001,2092957703,countries,499
E92000001,2092957699,countries,499
W92000004,2092957700,countries,499


In [None]:
# list the available geography types
ew_geog_types = FieldMetadata(**fetch(f"dataset/{table_name}/geography/2092957703.def.sdmx.json"))
ew_geog_types.to_dataframe()

Unnamed: 0_level_0,NomisCode,TypeName,TypeCode,IsAbstractCode,ParentCode,ChildCount
GeogCode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
K04000001,2092957703,countries,499,,,
,2092957703TYPE150,2021 output areas,150,True,2092958000.0,188880.0
,2092957703TYPE151,2021 super output areas - lower layer,151,True,2092958000.0,35672.0
,2092957703TYPE152,2021 super output areas - middle layer,152,True,2092958000.0,7264.0
,2092957703TYPE153,2022 wards,153,True,2092958000.0,7638.0
,2092957703TYPE154,2022 local authorities: districts,154,True,2092958000.0,331.0
,2092957703TYPE155,2022 local authorities: counties,155,True,2092958000.0,174.0
,2092957703TYPE168,2021 national parks,168,True,2092958000.0,13.0
,2092957703TYPE423,local authorities: county / unitary (as of Apr...,423,True,2092958000.0,175.0
,2092957703TYPE424,local authorities: district / unitary (as of A...,424,True,2092958000.0,318.0


In [None]:
# list specific geographies of a given type
ew_lads = FieldMetadata(**fetch(f"dataset/{table_name}/geography/2092957703TYPE154.def.sdmx.json"))
ew_lads.to_dataframe()

Unnamed: 0_level_0,NomisCode,TypeName,TypeCode
GeogCode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
E06000001,645922819,2022 local authorities: districts,154
E06000002,645922820,2022 local authorities: districts,154
E06000003,645922822,2022 local authorities: districts,154
E06000004,645922823,2022 local authorities: districts,154
E06000005,645922817,2022 local authorities: districts,154
...,...,...,...
W06000020,645923145,2022 local authorities: districts,154
W06000021,645923146,2022 local authorities: districts,154
W06000022,645923147,2022 local authorities: districts,154
W06000023,645923132,2022 local authorities: districts,154


In [None]:
# codelist endpoint will give every supported geography for a given table (cached as its large)
geog_df = Path("./data/census2021geographies.parquet")
if not geog_df.exists():
    # seems like codelist endpoints dont like api keys
    all_geogs = FieldMetadata(**fetch("codelist/CL_2041_1_GEOGRAPHY.def.sdmx.json")).to_dataframe()
    all_geogs.to_parquet(geog_df)
else:
    all_geogs = pd.read_parquet(geog_df)

all_geogs  # .TypeName.value_counts()

Unnamed: 0_level_0,NomisCode,TypeName,TypeCode
GeogCode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
E00060274,629202434,2021 output areas,150
E00060275,629202435,2021 output areas,150
E00060276,629202436,2021 output areas,150
E00060277,629202437,2021 output areas,150
E00060279,629202439,2021 output areas,150
...,...,...,...
E12000009,2013265929,regions,480
W92000004,2013265930,regions,480
K04000001,2092957703,countries,499
E92000001,2092957699,countries,499


## Get table metadata

In [50]:
metadata = TableMetadata(**fetch(f"dataset/{table_name}.def.sdmx.json"))

# table info
for a in metadata.structure.keyfamilies.keyfamily[0].annotations.annotation:
    print(a.annotationtitle, a.annotationtext)

Status Current (being actively updated)
Keywords Sex,Age,Ethnic group
Units Persons
contenttype/sources census_2021_rm
contenttype/geoglevel oa2021,lsoa2021,msoa2021,la2021,ward2021
SubDescription All usual residents
Mnemonic c2021rm032
MetadataTitle0 About this dataset
MetadataText0 This dataset provides Census 2021 estimates that classify usual residents in England and Wales by ethnic group, by sex, and by age. The estimates are as at Census Day, 21 March 2021.
MetadataTitle1 Protecting personal data
MetadataText1 Sometimes we need to make changes to data if it is possible to identify individuals. This is known as statistical disclosure control. In Census 2021, we:

* Swapped records (targeted record swapping), for example, if a household was likely to be identified in datasets because it has unusual characteristics, we swapped the record with a similar one from a nearby small area. Very unusual households could be swapped with one in a nearby local authority.
* Added small changes t

In [51]:
fields = {
    a.conceptref: {"codelist": a.codelist} for a in metadata.structure.keyfamilies.keyfamily[0].components.dimension
}
fields

{'GEOGRAPHY': {'codelist': 'CL_2132_1_GEOGRAPHY'},
 'C2021_ETH_20': {'codelist': 'CL_2132_1_C2021_ETH_20'},
 'C2021_AGE_6': {'codelist': 'CL_2132_1_C2021_AGE_6'},
 'C_SEX': {'codelist': 'CL_2132_1_C_SEX'},
 'MEASURES': {'codelist': 'CL_2132_1_MEASURES'},
 'FREQ': {'codelist': 'CL_2132_1_FREQ'}}

In [None]:
# get metadata for each field
for field_name, metadata in fields.items():
    if field_name == "GEOGRAPHY":
        continue

    metadata["values"] = FieldMetadata(**fetch(f"codelist/{metadata['codelist']}.def.sdmx.json")).to_dataframe()

In [None]:
fields["C2021_ETH_20"]["values"]

Unnamed: 0_level_0,Description,ChildCount,FirstChildCode,FirstChildTypeCode,IsTotal,IsDefault,TypeName,TypeCode,Level,isDerived,DisplayName
NomisCode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
0,Total,5.0,1001.0,1000000.0,True,True,Ethnic group,1000000,0,,
1001,"Asian, Asian British or Asian Welsh",5.0,1.0,1000000.0,,,Ethnic group,1000000,1,True,
1,"Asian, Asian British or Asian Welsh: Bangladeshi",,,,,,Ethnic group,1000000,2,,Bangladeshi
2,"Asian, Asian British or Asian Welsh: Chinese",,,,,,Ethnic group,1000000,2,,Chinese
3,"Asian, Asian British or Asian Welsh: Indian",,,,,,Ethnic group,1000000,2,,Indian
4,"Asian, Asian British or Asian Welsh: Pakistani",,,,,,Ethnic group,1000000,2,,Pakistani
5,"Asian, Asian British or Asian Welsh: Other Asian",,,,,,Ethnic group,1000000,2,,Other Asian
1002,"Black, Black British, Black Welsh, Caribbean o...",3.0,6.0,1000000.0,,,Ethnic group,1000000,1,True,
6,"Black, Black British, Black Welsh, Caribbean o...",,,,,,Ethnic group,1000000,2,,African
7,"Black, Black British, Black Welsh, Caribbean o...",,,,,,Ethnic group,1000000,2,,Caribbean


## Get some crime data and the LSOAs they occur in 

In [14]:
crime_data = extract_crime_data("./data/wy202204-202503.zip")

In [None]:
available_lsoas = FieldMetadata(
    **fetch(f"dataset/{table_name}/geography/2092957703TYPE151.def.sdmx.json")
).to_dataframe()["GeogCode"]
lsoas = crime_data["LSOA code"].unique()
# FFS crime data has some 2011 LSOAs
lsoas = np.intersect1d(lsoas, available_lsoas)
lsoas

array(['E01005410', 'E01005414', 'E01005448', ..., 'E01035052',
       'E01035053', 'E01035054'], shape=(1433,), dtype=object)

In [99]:
table_name = "NM_2132_1"
nomis_area_codes = available_lsoas[available_lsoas.isin(lsoas)].index.to_list()

# TODO? select only using codes and use metadata as lookup?
selections = ",".join(
    (
        "GEOGRAPHY_CODE",
        *(f"{field}_NAME" for field in fields if field not in ["GEOGRAPHY", "FREQ", "MEASURES"]),
        "OBS_VALUE",
    )
)

params = {
    "date": "latest",
    "geography": build_geog_query(nomis_area_codes),
    "c2021_eth_20": "1001...1005",
    "c2021_age_6": "1...5",
    "c_sex": "1,2",
    "select": selections,
}

data = fetch_table(table_name, **params)

In [101]:
data  # .C2021_ETH_20_NAME.unique()

Unnamed: 0,GEOGRAPHY_CODE,C2021_ETH_20_NAME,C2021_AGE_6_NAME,C_SEX_NAME,OBS_VALUE
0,E01005410,"Asian, Asian British or Asian Welsh",Aged 24 years and under,Female,1
1,E01005410,"Asian, Asian British or Asian Welsh",Aged 24 years and under,Male,1
2,E01005410,"Asian, Asian British or Asian Welsh",Aged 25 to 34 years,Female,0
3,E01005410,"Asian, Asian British or Asian Welsh",Aged 25 to 34 years,Male,0
4,E01005410,"Asian, Asian British or Asian Welsh",Aged 35 to 49 years,Female,4
...,...,...,...,...,...
71645,E01035054,Other ethnic group,Aged 35 to 49 years,Male,13
71646,E01035054,Other ethnic group,Aged 50 to 64 years,Female,2
71647,E01035054,Other ethnic group,Aged 50 to 64 years,Male,1
71648,E01035054,Other ethnic group,Aged 65 years and over,Female,0


In [88]:
# e.g. compare proportion of Black people in community to stop-and-search incidences

data["is_black"] = data.C2021_ETH_20_NAME.str.contains("Black")
lsoa_totals = data.groupby(["GEOGRAPHY_CODE", "GEOGRAPHY_NAME", "is_black"]).OBS_VALUE.sum().unstack(level="is_black")

proportion = lsoa_totals.apply(lambda r: r[True] / r.sum(), axis=1)
proportion

GEOGRAPHY_CODE  GEOGRAPHY_NAME 
E01005410       Oldham 006A        0.000668
E01005414       Oldham 006C        0.001498
E01005448       Oldham 012A        0.057352
E01005561       Rochdale 014D      0.004954
E01006881       St. Helens 012E    0.010497
                                     ...   
E01035050       Leeds 105G         0.090965
E01035051       Leeds 105H         0.045877
E01035052       Leeds 105I         0.037216
E01035053       Leeds 105J         0.016110
E01035054       Leeds 112F         0.039875
Length: 1433, dtype: float64