# Auto NUTS

There have been several versions of the NUTS geocode standard: 2003, 2006, 2010, 2013 and 2016.

Each of these versions has an [associated enforcement date](https://ec.europa.eu/eurostat/web/nuts/history), which can lag the date of introduction by around two years.

Organisations releasing data aggregated at the NUTS geographies are not required to use the latest version until the enforcement date, leaving a two-year period during which it is unclear which version an organisation is using.
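As a rough illustration of that ambiguity, the sketch below lists which versions could plausibly be in use for data published in a given year. The enforcement years are the ones assumed later in this notebook (introduction year plus two). The names `candidate_versions` and `publication_year` are hypothetical, and the rule itself (a version is ruled out once its immediate successor is enforced) is a simplifying assumption rather than an official Eurostat rule.

```python
# Introduction year -> enforcement year (assumed: introduction + 2 years).
NUTS_YEARS = [2003, 2006, 2010, 2013, 2016]
NUTS_ENFORCED = {2006: 2008, 2010: 2012, 2013: 2015, 2016: 2018}


def candidate_versions(publication_year):
    """Return the NUTS versions that data published in `publication_year` might use."""
    candidates = []
    for idx, year in enumerate(NUTS_YEARS):
        if year > publication_year:
            continue  # version not yet introduced
        successors = NUTS_YEARS[idx + 1:]
        # Assumption: a version is no longer allowed once its successor is enforced.
        if successors and NUTS_ENFORCED[successors[0]] <= publication_year:
            continue
        candidates.append(year)
    return candidates


print(candidate_versions(2011))  # [2006, 2010]
print(candidate_versions(2012))  # [2010]
```

Under these assumptions, any publication year that falls between the introduction of a version and its enforcement yields two candidates, which is the case the rest of this notebook tries to resolve from the region codes themselves.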

Here we will create a function that takes a dataset with specified NUTS regions and automatically infers the version year.

In [None]:
%run ../notebook_preamble.ipy

In [None]:
from collections import defaultdict
import geopandas as gpd
import pandas as pd  # used for the pivot tables below (may already be loaded by the preamble)
import os
from itertools import chain

nuts_years = [2003, 2006, 2010, 2013, 2016]

Modelled air pollution data for the UK is compiled by DEFRA. The values are obtained by taking measurements from monitoring stations and using atmospheric modelling to interpolate them onto a 1 km by 1 km grid covering the whole country.

In [None]:
nuts_ids = {}

for nuts_year in nuts_years:
    # Level 2 boundaries for this NUTS version (EPSG:4326, 1:1M scale)
    file = f'{data_path}/raw/gis/eurostat/NUTS_RG_01M_{nuts_year}_4326_LEVL_2.shp/NUTS_RG_01M_{nuts_year}_4326_LEVL_2.shp'
    eu_regions = gpd.read_file(file)
    # Keep only the IDs of the UK regions
    nuts_ids[nuts_year] = set(eu_regions[eu_regions['CNTR_CODE'] == 'UK']['NUTS_ID'].values)

In [None]:
def jaccard(a, b):
    """Jaccard similarity of two sets: size of the intersection over size of the union."""
    u = a.union(b)
    i = a.intersection(b)
    return len(i) / len(u)

d = defaultdict(list)

for i, u in nuts_ids.items():
    for j, v in nuts_ids.items():
        d['s'].append(i)
        d['t'].append(j)
        d['v'].append(jaccard(u, v))

In [None]:
pd.DataFrame(d).pivot(columns='s', index='t', values='v')

We can see that each NUTS version has a unique set of regions.
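To give a feel for the scale of these similarity values, here is a small worked example on made-up data. The region codes are hypothetical placeholders rather than real NUTS IDs. The sizes (37 regions per version, two of them replaced) are chosen to mirror the kind of change summarised at the end of this notebook.

```python
def jaccard(a, b):
    """Jaccard similarity: |a & b| / |a | b|."""
    return len(a & b) / len(a | b)


# Hypothetical codes: 35 regions shared by both versions...
shared = {f'R{n:02d}' for n in range(35)}
# ...plus two regions that exist only in one version or the other.
version_a = shared | {'OLD1', 'OLD2'}
version_b = shared | {'NEW1', 'NEW2'}

print(len(version_a), len(version_b))            # 37 37
print(round(jaccard(version_a, version_b), 3))   # 0.897  (35 shared / 39 in total)
print(jaccard(version_a, version_a))             # 1.0    (identical sets)
```

A value below 1 for every pair of distinct versions is therefore enough to say the sets differ, even when only a couple of regions have been swapped.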

In [None]:
d = defaultdict(list)

for i, u in nuts_ids.items():
    for j, v in nuts_ids.items():
        d['s'].append(i)
        d['t'].append(j)
        d['v'].append(u.difference(v))
        
pd.DataFrame(d).pivot(columns='s', index='t', values='v')

In [None]:
for y in nuts_years:
    s = nuts_ids[y]
    t = [v for k, v in nuts_ids.items() if k > y]
    diff_post = s.difference(*t)
    t = [v for k, v in nuts_ids.items() if k < y]
    diff_prev = s.difference(*t)
    print(y, len(s))
    if y > 2003:
        print('Difference to previous:', diff_prev)
    if y < 2016:
        print('Difference to following:', diff_post)

In [None]:
s = nuts_ids[2003]

In [None]:
# Regions present in each version that are dropped by the following version
nuts_2_deprecating = {
    2003: ['UKM1', 'UKM4'],
    2006: ['UKD2', 'UKD5'],
    2010: ['UKI1', 'UKI2'],
    2013: ['UKM3', 'UKM2'],
    2016: []
}

nuts_2_introduced = {
    2003: [],
    2006: ['UKM6', 'UKM5'],
    2010: ['UKD7', 'UKD6'],
    2013: ['UKI3', 'UKI6', 'UKI7', 'UKI4', 'UKI5'],
    2016: ['UKM9', 'UKM7', 'UKM8']
}

nuts_enforced = {
    2006: 2008,
    2010: 2012,
    2013: 2015,
    2016: 2018
}

# def check_deprecating(nuts_ids, year):
#     deprecating = nuts_2_deprecating[year]
#     deprecating_in_ids = [True if d in nuts_ids else False for d in deprecating]
#     contains_deprecating = any(deprecating_in_ids)
#     print(deprecating_in_ids)
#     return contains_deprecating

def check_subsequent(ids, year):
    """Return True if `ids` contains none of the regions introduced after `year`."""
    subsequent = [v for k, v in nuts_2_introduced.items() if k > year]
    subsequent = set(chain(*subsequent))
    return subsequent.isdisjoint(ids)

def check_deprecated(ids, year):
    """Return True if `ids` contains none of the regions deprecated before `year`."""
    deprecated = [v for k, v in nuts_2_deprecating.items() if k < year]
    deprecated = set(chain(*deprecated))
    # Regions dropped by an earlier version must not appear in the dataset
    return deprecated.isdisjoint(ids)

def is_nuts_year(ids, year):
    print(ids)
#     contains_deprecating = check_deprecating(nuts_ids, year)
    omits_deprecated = check_deprecated(ids, year)
    omits_subsequent = check_subsequent(ids, year)
    return omits_deprecated, omits_subsequent

In [None]:
for i in ['UKI3', 'UKI6', 'UKI7', 'UKI4', 'UKI5']:
    print(i in nuts_ids[2016])

In [None]:
for i in ['UKM1', 'UKM2']:
    print(i in nuts_ids[2013])

In [None]:
is_nuts_year(nuts_ids[2003], 2013)

**NUTS Level 2 Properties:**

- 2003
  - n_regions: 37
  - deprecating: 'UKM4', 'UKM1'
- 2006
  - n_regions: 37
  - new: 'UKM6', 'UKM5'
  - deprecating: 'UKD2', 'UKD5'
  - enforced: 2008
- 2010
  - n_regions: 37
  - new: 'UKD7', 'UKD6'
  - deprecating: 'UKI1', 'UKI2'
  - enforced: 2012
- 2013
  - n_regions: 40
  - new: 'UKI3', 'UKI6', 'UKI7', 'UKI4', 'UKI5'
  - deprecating: 'UKM3', 'UKM2'
  - enforced: 2015
- 2016
  - n_regions: 41
  - new: 'UKM9', 'UKM7', 'UKM8'
  - enforced: 2018
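The properties above suggest one way the inference could work: a set of IDs is compatible with a version if it contains no region introduced by a later version and no region dropped by an earlier one. The sketch below is a minimal, self-contained illustration of that idea, not a finished implementation. The function name `infer_nuts_years` and the constant names are hypothetical, the marker codes are copied from the list above, and it only checks which codes are present, not whether the dataset is complete.

```python
# Level 2 codes introduced by each version (from the summary above).
INTRODUCED = {
    2003: set(),
    2006: {'UKM5', 'UKM6'},
    2010: {'UKD6', 'UKD7'},
    2013: {'UKI3', 'UKI4', 'UKI5', 'UKI6', 'UKI7'},
    2016: {'UKM7', 'UKM8', 'UKM9'},
}
# Level 2 codes present in each version but dropped by the next one.
DEPRECATING = {
    2003: {'UKM1', 'UKM4'},
    2006: {'UKD2', 'UKD5'},
    2010: {'UKI1', 'UKI2'},
    2013: {'UKM2', 'UKM3'},
    2016: set(),
}


def infer_nuts_years(ids):
    """Return every NUTS version year that is compatible with the given IDs."""
    ids = set(ids)
    matches = []
    for year in sorted(INTRODUCED):
        # Codes that only exist in later versions.
        later = set().union(*(v for k, v in INTRODUCED.items() if k > year))
        # Codes that were already dropped before this version.
        earlier = set().union(*(v for k, v in DEPRECATING.items() if k < year))
        if ids.isdisjoint(later) and ids.isdisjoint(earlier):
            matches.append(year)
    return matches


print(infer_nuts_years(['UKM5', 'UKD2', 'UKI1', 'UKM2']))  # [2006]
print(infer_nuts_years(['UKM7', 'UKI3', 'UKD6', 'UKM5']))  # [2016]
```

Note that a dataset containing none of the marker codes is compatible with every version, so the function returns a list of candidates rather than a single year. Combining it with the enforcement dates is one possible way to narrow that list down.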

In [None]:
len(nuts_ids[2016])

In [None]:
u.difference??