# NomisWeb

These geography type codes were a pain to track down, so they are recorded here:

- TYPE150: 2021 output areas within England and Wales
- TYPE151: 2021 super output areas - lower layer within England and Wales
- TYPE152: 2021 super output areas - middle layer within England and Wales
- TYPE153: 2022 wards within England and Wales
- TYPE154: 2022 local authorities: districts within England and Wales
- TYPE155: 2022 local authorities: counties within England and Wales
- TYPE168: 2021 national parks within England and Wales
- TYPE423: local authorities: county / unitary (as of April 2023) within England and Wales
- TYPE424: local authorities: district / unitary (as of April 2023) within England and Wales
- TYPE459: local enterprise partnerships (as of April 2021) within England and Wales
- TYPE480: regions within England and Wales
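
Each type code is used by appending `TYPE<code>` to the NomisCode of a parent geography, as the cells below do with `2092957703TYPE154`. A minimal sketch of building those request paths follows; `geography_def_path` is a hypothetical helper for illustration and is not part of the `nomisweb` module imported in this notebook.

```python
from typing import Optional

# NomisCode for England and Wales (K04000001), as listed in the notebook output
ENGLAND_AND_WALES = 2092957703


def geography_def_path(table_name: str, parent_code: int, type_code: Optional[int] = None) -> str:
    """Build the relative path for a geography definition request (hypothetical helper).

    Without type_code the path lists the geography types under parent_code;
    with one, it lists the individual areas of that type.
    """
    suffix = "" if type_code is None else f"TYPE{type_code}"
    return f"dataset/{table_name}/geography/{parent_code}{suffix}.def.sdmx.json"


# 2021 LSOAs within England and Wales
lsoa_path = geography_def_path("NM_2041_1", ENGLAND_AND_WALES, 151)
print(lsoa_path)
```

The resulting string could be passed to `fetch` in the same way as the hand-written f-strings in the cells that follow.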


In [4]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [5]:
from pathlib import Path

import numpy as np
import pandas as pd

from nomisweb import FieldMetadata, TableMetadata, build_geog_query, fetch, fetch_table
from utils import extract_crime_data

In [6]:
table_name = "NM_2041_1"  # 2021 census table TS021 (mnemonic c2021ts021): ethnic group

top_level_geogs = FieldMetadata(**fetch(f"dataset/{table_name}/geography.def.sdmx.json"))
top_level_geogs.to_dataframe()

NomisCode,Description,TypeName,TypeCode,GeogCode
2092957703,England and Wales,countries,499,K04000001
2092957699,England,countries,499,E92000001
2092957700,Wales,countries,499,W92000004


In [7]:
# list the available geography types
ew_geog_types = FieldMetadata(**fetch(f"dataset/{table_name}/geography/2092957703.def.sdmx.json"))
ew_geog_types.to_dataframe()

NomisCode,Description,TypeName,TypeCode,GeogCode,IsAbstractCode,ParentCode,ChildCount
2092957703,England and Wales,countries,499,K04000001,,,
2092957703TYPE150,2021 output areas within England and Wales,2021 output areas,150,,True,2092958000.0,188880.0
2092957703TYPE151,2021 super output areas - lower layer within E...,2021 super output areas - lower layer,151,,True,2092958000.0,35672.0
2092957703TYPE152,2021 super output areas - middle layer within ...,2021 super output areas - middle layer,152,,True,2092958000.0,7264.0
2092957703TYPE153,2022 wards within England and Wales,2022 wards,153,,True,2092958000.0,7638.0
2092957703TYPE154,2022 local authorities: districts within Engla...,2022 local authorities: districts,154,,True,2092958000.0,331.0
2092957703TYPE155,2022 local authorities: counties within Englan...,2022 local authorities: counties,155,,True,2092958000.0,174.0
2092957703TYPE168,2021 national parks within England and Wales,2021 national parks,168,,True,2092958000.0,13.0
2092957703TYPE423,local authorities: county / unitary (as of Apr...,local authorities: county / unitary (as of Apr...,423,,True,2092958000.0,175.0
2092957703TYPE424,local authorities: district / unitary (as of A...,local authorities: district / unitary (as of A...,424,,True,2092958000.0,318.0


In [8]:
# list specific geographies of a given type
ew_lads = FieldMetadata(**fetch(f"dataset/{table_name}/geography/2092957703TYPE154.def.sdmx.json"))
ew_lads.to_dataframe()

NomisCode,Description,TypeName,TypeCode,GeogCode
645922819,Hartlepool,2022 local authorities: districts,154,E06000001
645922820,Middlesbrough,2022 local authorities: districts,154,E06000002
645922822,Redcar and Cleveland,2022 local authorities: districts,154,E06000003
645922823,Stockton-on-Tees,2022 local authorities: districts,154,E06000004
645922817,Darlington,2022 local authorities: districts,154,E06000005
...,...,...,...,...
645923145,Torfaen,2022 local authorities: districts,154,W06000020
645923146,Monmouthshire,2022 local authorities: districts,154,W06000021
645923147,Newport,2022 local authorities: districts,154,W06000022
645923132,Powys,2022 local authorities: districts,154,W06000023


In [9]:
# the codelist endpoint returns every supported geography for a given table (cached because it's large)
geog_df = Path("./data/census2021geographies.parquet")
if not geog_df.exists():
    # it seems the codelist endpoints don't like API keys
    all_geogs = FieldMetadata(**fetch("codelist/CL_2041_1_GEOGRAPHY.def.sdmx.json")).to_dataframe()
    all_geogs.to_parquet(geog_df)
else:
    all_geogs = pd.read_parquet(geog_df)

all_geogs  # .TypeName.value_counts()

GeogCode,NomisCode,TypeName,TypeCode
E00060274,629202434,2021 output areas,150
E00060275,629202435,2021 output areas,150
E00060276,629202436,2021 output areas,150
E00060277,629202437,2021 output areas,150
E00060279,629202439,2021 output areas,150
...,...,...,...
E12000009,2013265929,regions,480
W92000004,2013265930,regions,480
K04000001,2092957703,countries,499
E92000001,2092957699,countries,499
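
The cell above follows a fetch-once, then read-from-disk pattern. A generic sketch of that pattern is shown here using JSON from the standard library instead of parquet, so that it runs without pandas; `load_or_fetch` and `fake_fetch` are hypothetical names introduced only for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable


def load_or_fetch(cache_file: Path, fetch_fn: Callable[[], Any]) -> Any:
    """Return the cached value if cache_file exists, otherwise fetch and cache it."""
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    value = fetch_fn()
    # Create the cache directory on first use so the write cannot fail on a fresh checkout
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(value))
    return value


calls = []


def fake_fetch() -> dict:
    # Stand-in for the slow codelist request; records each invocation
    calls.append(1)
    return {"E00060274": 629202434}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "data" / "geographies.json"
    first = load_or_fetch(cache, fake_fetch)   # fetches and writes the cache
    second = load_or_fetch(cache, fake_fetch)  # served from the cache file

print(first == second, len(calls))
```

Creating the parent directory before writing is a small addition over the original cell, which assumes `./data` already exists.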


## Get table metadata

In [10]:
metadata = TableMetadata(**fetch(f"dataset/{table_name}.def.sdmx.json"))

# table info
for a in metadata.structure.keyfamilies.keyfamily[0].annotations.annotation:
    print(a.annotationtitle, a.annotationtext)

Status Current (being actively updated)
Keywords Ethnic group
Units Persons
contenttype/sources census_2021_ts
contenttype/geoglevel oa2021,lsoa2021,msoa2021,la2021,ward2021,oa,msoa,la
SubDescription All usual residents
Mnemonic c2021ts021
FirstReleased 2022-11-29 09:30:00
LastUpdated 2022-11-29 11:30:00
LastRevised 2022-11-29 11:30:00
MetadataTitle0 About this dataset
MetadataText0 This dataset provides Census 2021 estimates that classify usual residents in England and Wales by ethnic group. The estimates are as at Census Day, 21 March 2021.
   
   National Park data are created by plotting unique properties as identified by their Unique Property Reference Number or postcodes into National Park boundaries current at December 2022. This differs from the OA best fit methodology used for other geographic level data.
MetadataTitle1 Protecting personal data
MetadataText1 Sometimes we need to make changes to data if it is possible to identify individuals. This is known as statistical disclo

In [11]:
fields = {
    a.conceptref: {"codelist": a.codelist} for a in metadata.structure.keyfamilies.keyfamily[0].components.dimension
}
fields

{'GEOGRAPHY': {'codelist': 'CL_2041_1_GEOGRAPHY'},
 'C2021_ETH_20': {'codelist': 'CL_2041_1_C2021_ETH_20'},
 'MEASURES': {'codelist': 'CL_2041_1_MEASURES'},
 'FREQ': {'codelist': 'CL_2041_1_FREQ'}}

In [12]:
# get metadata for each field (GEOGRAPHY is skipped: its codelist is large and is handled above)
for field_name, field_meta in fields.items():
    if field_name == "GEOGRAPHY":
        continue

    field_meta["values"] = FieldMetadata(**fetch(f"codelist/{field_meta['codelist']}.def.sdmx.json")).to_dataframe()

In [13]:
fields["C2021_ETH_20"]["values"]

NomisCode,Description,ChildCount,FirstChildCode,FirstChildTypeCode,IsTotal,IsDefault,TypeName,TypeCode,Level,isDerived,DisplayName
0,Total: All usual residents,5.0,1001.0,1000000.0,True,True,Ethnic group,1000000,0,,
1001,"Asian, Asian British or Asian Welsh",5.0,12.0,1000000.0,,,Ethnic group,1000000,1,True,
12,"Asian, Asian British or Asian Welsh: Bangladeshi",,,,,,Ethnic group,1000000,2,,Bangladeshi
13,"Asian, Asian British or Asian Welsh: Chinese",,,,,,Ethnic group,1000000,2,,Chinese
10,"Asian, Asian British or Asian Welsh: Indian",,,,,,Ethnic group,1000000,2,,Indian
11,"Asian, Asian British or Asian Welsh: Pakistani",,,,,,Ethnic group,1000000,2,,Pakistani
14,"Asian, Asian British or Asian Welsh: Other Asian",,,,,,Ethnic group,1000000,2,,Other Asian
1002,"Black, Black British, Black Welsh, Caribbean o...",3.0,16.0,1000000.0,,,Ethnic group,1000000,1,True,
16,"Black, Black British, Black Welsh, Caribbean o...",,,,,,Ethnic group,1000000,2,,African
15,"Black, Black British, Black Welsh, Caribbean o...",,,,,,Ethnic group,1000000,2,,Caribbean


## Get some crime data and the LSOAs it occurs in

In [14]:
crime_data = extract_crime_data("west-yorkshire")
crime_data

Crime ID,Month,Reported by,Falls within,Location,LSOA code,LSOA name,Crime type,geometry
5595af42450784687d49fbac869eea6fcf80cc0b9216cdf52b5ba29493e70d5b,2022-06,West Yorkshire Police,West Yorkshire Police,On or near Park/Open Space,E01007418,Barnsley 016A,Public order,POINT (428510.985 412445.001)
0a55896e95eae6efc9f1d96e2338b7ae58c8c2e903ce55cd180ad85772173791,2022-06,West Yorkshire Police,West Yorkshire Police,On or near Cocking Lane,E01010646,Bradford 001A,Criminal damage and arson,POINT (407795.98 448637.999)
6ef8495fb8063b21759a66268016972e7ac403e1a4d3d0ee394eaaeb99edbbf4,2022-06,West Yorkshire Police,West Yorkshire Police,On or near Beacon Street,E01010646,Bradford 001A,Other theft,POINT (408373.007 449758.96)
ed7bc5cf1e0d6d98e657d84d9145e28e1741257afd40893d0397350a7d97728e,2022-06,West Yorkshire Police,West Yorkshire Police,On or near Beacon Street,E01010646,Bradford 001A,Other theft,POINT (408373.007 449758.96)
d39b91fd1beb6d981ca79be112354879477f97ee610684c95298d6d90c499e1b,2022-06,West Yorkshire Police,West Yorkshire Police,On or near Cross End Fold,E01010646,Bradford 001A,Public order,POINT (408037.995 449751.034)
...,...,...,...,...,...,...,...,...
3936ab0e2fd239fb68cbb3749d2db27e73aeb1c1265abef4ee4e9bcf8f111805,2025-05,West Yorkshire Police,West Yorkshire Police,On or near Mill Street,E01011872,Wakefield 045D,Violence and sexual offences,POINT (444424.007 410686.031)
2d4730ccf43dbdf19f614cd34084afe6ded3e7dad5a5e79b70c61b1acfc247b9,2025-05,West Yorkshire Police,West Yorkshire Police,On or near Mill Lane Villas,E01011872,Wakefield 045D,Violence and sexual offences,POINT (444518.988 410582.003)
b987d5101a01108a62245d0aa788817221f5b3274e403305bc6e83e935061b94,2025-05,West Yorkshire Police,West Yorkshire Police,On or near Holmsley Mount,E01011872,Wakefield 045D,Violence and sexual offences,POINT (444333.024 410802.003)
deaf6252399b6e42b3c2de28076c65c0b539236afa2d9eb4d224b5c9691232d9,2025-05,West Yorkshire Police,West Yorkshire Police,On or near Holmsley Mount,E01011872,Wakefield 045D,Violence and sexual offences,POINT (444333.024 410802.003)


In [15]:
available_lsoas = FieldMetadata(
    **fetch(f"dataset/{table_name}/geography/2092957703TYPE151.def.sdmx.json")
).to_dataframe()["GeogCode"]
lsoas = crime_data["LSOA code"].unique()
# annoyingly, the crime data includes some 2011 LSOA codes, so keep only those in the 2021 list
lsoas = np.intersect1d(lsoas, available_lsoas)
lsoas

array(['E01005410', 'E01005414', 'E01005448', ..., 'E01035052',
       'E01035053', 'E01035054'], shape=(1447,), dtype=object)
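
`np.intersect1d` returns the sorted unique codes common to both inputs, so any crime record tagged with a code missing from the 2021 list is dropped without notice. A dependency-free sketch of the same filtering that also reports what was dropped is given below; `split_by_availability` is a hypothetical helper and the input values are illustrative only.

```python
def split_by_availability(codes, available):
    """Return (kept, dropped): sorted unique codes found / not found in available."""
    unique_codes = set(codes)
    available_set = set(available)
    kept = sorted(unique_codes & available_set)
    dropped = sorted(unique_codes - available_set)
    return kept, dropped


# Illustrative values only; "OLD_2011_CODE" stands in for a retired 2011 LSOA code
crime_lsoas = ["E01005414", "E01005410", "OLD_2011_CODE", "E01005410"]
census_lsoas = ["E01005410", "E01005414", "E01005448"]

kept, dropped = split_by_availability(crime_lsoas, census_lsoas)
print(kept)
print(dropped)
```

Inspecting `dropped` gives a quick sense of how many crime records would be excluded from any later comparison with census data.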

In [16]:
table_name = "NM_2132_1"  # a different table: the query below also filters on age and sex
nomis_area_codes = available_lsoas[available_lsoas.isin(lsoas)].index.to_list()

# TODO? select only using codes and use metadata as lookup?
selections = ",".join(
    (
        "GEOGRAPHY_CODE",
        *(f"{field}_NAME" for field in fields if field not in ["GEOGRAPHY", "FREQ", "MEASURES"]),
        "OBS_VALUE",
    )
)

params = {
    "date": "latest",
    "geography": build_geog_query(nomis_area_codes),
    "c2021_eth_20": "1001...1005",
    "c2021_age_6": "1...5",
    "c_sex": "1,2",
    "select": selections,
}

data = fetch_table(table_name, **params)
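
The implementation of `build_geog_query` is not shown in this notebook. One plausible approach, given the `1001...1005` range syntax used for the ethnicity parameter, is to collapse runs of consecutive codes into ranges so that the request URL stays short. The sketch below assumes the geography parameter accepts the same syntax; `compress_codes` is a hypothetical name and is not the real function.

```python
from typing import Iterable


def compress_codes(codes: Iterable[int]) -> str:
    """Collapse integer codes into comma-separated values and "a...b" ranges."""
    ordered = sorted(set(codes))
    if not ordered:
        return ""
    parts = []
    start = prev = ordered[0]
    for code in ordered[1:]:
        if code == prev + 1:
            # Still inside a consecutive run: extend it
            prev = code
            continue
        # Run ended: emit it as a single value or a range
        parts.append(str(start) if start == prev else f"{start}...{prev}")
        start = prev = code
    parts.append(str(start) if start == prev else f"{start}...{prev}")
    return ",".join(parts)


eth_range = compress_codes([1001, 1002, 1003, 1004, 1005])
geog_range = compress_codes([629202434, 629202435, 629202437])
print(eth_range)
print(geog_range)
```

With well over a thousand LSOA codes in the query, some form of compression like this is likely needed to keep the URL within practical length limits.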

In [17]:
data  # .C2021_ETH_20_NAME.unique()

,GEOGRAPHY_CODE,C2021_ETH_20_NAME,OBS_VALUE
0,E01005410,"Asian, Asian British or Asian Welsh",1
1,E01005410,"Asian, Asian British or Asian Welsh",1
2,E01005410,"Asian, Asian British or Asian Welsh",0
3,E01005410,"Asian, Asian British or Asian Welsh",0
4,E01005410,"Asian, Asian British or Asian Welsh",4
...,...,...,...
72345,E01035054,Other ethnic group,13
72346,E01035054,Other ethnic group,2
72347,E01035054,Other ethnic group,1
72348,E01035054,Other ethnic group,0
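
The result has a single `_NAME` column even though the query also filters on age and sex. This is because the `select` string is derived from `fields`, which was built from the dimensions of `NM_2041_1`. The sketch below reproduces that derivation; `build_select` is a hypothetical helper, and the age and sex dimension names are assumptions inferred from the lowercase parameter keys.

```python
def build_select(fields, skip=("GEOGRAPHY", "FREQ", "MEASURES")):
    """Return the comma-separated select list for an iterable of dimension names."""
    name_columns = [f"{field}_NAME" for field in fields if field not in skip]
    return ",".join(["GEOGRAPHY_CODE", *name_columns, "OBS_VALUE"])


# Dimensions of NM_2041_1 as printed earlier in the notebook
eth_only = ["GEOGRAPHY", "C2021_ETH_20", "MEASURES", "FREQ"]
select_eth = build_select(eth_only)
print(select_eth)

# Assumed dimension names for a table that also carries age and sex
with_age_sex = ["GEOGRAPHY", "C2021_ETH_20", "C2021_AGE_6", "C_SEX", "MEASURES", "FREQ"]
select_all = build_select(with_age_sex)
print(select_all)
```

Rebuilding `fields` from the metadata of the table actually being queried would presumably add the age and sex labels to each row, which explains the apparently duplicated rows in the output above.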


In [18]:
# e.g. compare the proportion of Black residents in each LSOA with stop-and-search incidents

data["is_black"] = data.C2021_ETH_20_NAME.str.contains("Black")
lsoa_totals = data.groupby(["GEOGRAPHY_CODE", "is_black"]).OBS_VALUE.sum().unstack(level="is_black")

proportion = lsoa_totals[True] / lsoa_totals.sum(axis=1)
proportion

GEOGRAPHY_CODE
E01005410    0.000668
E01005414    0.001498
E01005448    0.057352
E01005561    0.004954
E01006881    0.010497
               ...   
E01035050    0.090965
E01035051    0.045877
E01035052    0.037216
E01035053    0.016110
E01035054    0.039875
Length: 1447, dtype: float64
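
As a worked restatement of the arithmetic, the sketch below computes the same per-LSOA proportion in plain Python and makes the handling of an LSOA with zero recorded population explicit. The rows are illustrative and are not real census values.

```python
from collections import defaultdict

# Illustrative rows: (LSOA code, ethnic group name, count)
rows = [
    ("A", "Black, Black British, Black Welsh, Caribbean or African", 5),
    ("A", "White", 90),
    ("A", "Other ethnic group", 5),
    ("B", "White", 40),
    ("C", "White", 0),
]

# Sum counts per LSOA, split by whether the group name contains "Black"
totals = defaultdict(lambda: {True: 0, False: 0})
for lsoa, group, count in rows:
    totals[lsoa]["Black" in group] += count

proportion = {}
for lsoa, split in totals.items():
    population = split[True] + split[False]
    # Avoid dividing by zero for an LSOA with no recorded residents
    proportion[lsoa] = split[True] / population if population else float("nan")

print(proportion)
```

Matching on the substring "Black" works here because only one of the five top-level group names contains it; a lookup on the numeric code would be more robust if the labels ever changed.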