# Linking UK SOC and ISCO-08 frameworks

In the following, we derive two options for crosswalking labour market data associated with UK SOC codes to ESCO occupations, and check which option better covers the full set of ESCO occupations and their four-digit ISCO codes.


# 0. Import dependencies and inputs

In [7]:
%run ../notebook_preamble.ipy

In [22]:
# Import all ESCO occupations
occupations = pd.read_csv(data_folder + 'processed/ESCO_occupational_hierarchy/ESCO_occupational_hierarchy.csv')

# Import the subset of ESCO occupations analysed in the project report
occupations_risk = pd.read_csv(data_folder + 'processed/ESCO_automation_risk.csv')

# Import the ONS crosswalk from SOC2010 to ISCO-08
soc_to_isco_v1 = pd.read_excel(
    data_folder + 'raw/ons/soc2010_to_isco08.xlsx',
    sheet_name='SOC2010')

# Import the latest UK SOC coding index (at the time of doing the analysis)
soc_to_isco_v2 = pd.read_excel(
    data_folder + 'raw/ons/soc2020volume2thecodingindex140220.xlsx',
    sheet_name = 'SOC2020 coding index')

# Import ISCO occupational category titles
isco_titles = pd.read_csv(f'{data_folder}processed/ISCO_occupation_titles.csv')

# 1. Inspect options for crosswalking data linked to UK SOC to ESCO

## 1.1 ONS official crosswalk between UK SOC 2010 and ISCO-08

The first option uses the official crosswalk between UK SOC 2010 and ISCO-08 classifications, obtained from [here](https://www.ons.gov.uk/methodology/classificationsandstandards/standardoccupationalclassificationsoc/soc2010). 


In [9]:
# Organise the crosswalk table
soc_to_isco_v1_concat = pd.concat([
    soc_to_isco_v1[['SOC2010','ISCO08_A', '% A']].rename(columns={'SOC2010': 'soc10', 'ISCO08_A': 'isco', '% A': 'prop'}),
    soc_to_isco_v1[['SOC2010','ISCO08_B', '% B']].rename(columns={'SOC2010': 'soc10', 'ISCO08_B': 'isco', '% B': 'prop'}),
], axis=0)
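# Drop rows without an ISCO code (boolean masks are inverted with ~, not -) and convert the codes to integers
soc_to_isco_v1_concat = soc_to_isco_v1_concat[~soc_to_isco_v1_concat.isco.isnull()].copy()
soc_to_isco_v1_concat.isco = soc_to_isco_v1_concat.isco.astype(int)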

# Crosswalk dataframe
soc_to_isco_v1_concat.head(3)


Unnamed: 0,soc10,isco,prop
0,1115,1120,1.0
1,1116,1111,1.0
2,1121,1321,1.0


In [36]:
soc_to_isco_v1_concat.to_csv(data_folder + 'processed/linked_data/uksoc2010_to_isco08_official.csv', index=False)

Note: The `prop` column indicates the proportion of workers with the ISCO code that have been mapped to the UK SOC code (the vast majority of the mappings are one-to-one).
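
The one-to-one claim can be checked directly on the crosswalk table. The following is a minimal sketch on a toy table with the same columns as `soc_to_isco_v1_concat`; the values (including the split of SOC 1121) are made up for illustration and are not taken from the ONS file.

```python
import pandas as pd

# Toy crosswalk in the same layout as soc_to_isco_v1_concat (illustrative values only)
toy_crosswalk = pd.DataFrame({
    "soc10": [1115, 1116, 1121, 1121],
    "isco": [1120, 1111, 1321, 1322],
    "prop": [1.0, 1.0, 0.6, 0.4],
})

# Number of distinct ISCO codes attached to each SOC code
isco_per_soc = toy_crosswalk.groupby("soc10")["isco"].nunique()

# Share of SOC codes that map to exactly one ISCO code
share_one_to_one = (isco_per_soc == 1).mean()

# Rows where a SOC code is split across more than one ISCO code
split_rows = toy_crosswalk[toy_crosswalk["prop"] < 1]

print(f"Share of one-to-one SOC codes: {share_one_to_one:.2f}")
print(split_rows)
```

Running the same three steps on the real table would show how many SOC codes are split and would therefore need the `prop` column when data are transferred.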

### Check the coverage of the first crosswalk

In [10]:
# Unique four-digit ISCO codes in the crosswalk
set_of_isco_codes_v1 = set(soc_to_isco_v1_concat.isco.unique())

In [11]:
def check_coverage(set_of_isco_codes):
    
    """ Helper function to check coverage of the ISCO and ESCO occupations by the provided set of ISCO codes """
    
    # Fraction of the four-digit ISCO codes of all ESCO occupations that are covered by the provided set of codes
    esco_set_all = set(occupations.isco_level_4.unique())
    print(f"Fraction of all ESCO occupations' ISCO codes: {np.round(len(esco_set_all.intersection(set_of_isco_codes))/len(esco_set_all),2)}")

    # Fraction of all four-digit ISCO codes for 'top level' ESCO occupations that were analysed in the report
    esco_set_top = set(occupations_risk.isco_code.unique())
    print(f"Fraction of top level occupations' ISCO codes: {np.round(len(esco_set_top.intersection(set_of_isco_codes))/len(esco_set_top),2)}")

    print('----')

    # Fraction of all ESCO occupations that are covered
    print(f'Fraction of all ESCO occupations: {np.round(np.sum(occupations.isco_level_4.isin(list(set_of_isco_codes))) / len(occupations),2)}')

    # Fraction of the 'top level' ESCO occupations analysed in the report that are covered
    print(f'Fraction of top level ESCO occupations: {np.round(np.sum(occupations_risk.isco_code.isin(list(set_of_isco_codes))) / len(occupations_risk),2)}')


In [12]:
check_coverage(set_of_isco_codes_v1)

Fraction of all ESCO occupations' ISCO codes: 0.68
Fraction of top level occupations' ISCO codes: 0.68
----
Fraction of all ESCO occupations: 0.8
Fraction of top level ESCO occupations: 0.75


Note: This crosswalk covers about two-thirds (68%) of the four-digit ISCO codes in the ESCO dataset. As a result, we would miss labour market data on around 400 'top level' ESCO occupations. Hence, we explore an alternative approach below, using the coding index.
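
To see which occupations fall into this gap, the uncovered ISCO codes can be listed explicitly. The sketch below uses a toy occupations table and a toy set of crosswalk codes; the `occupation` labels and code values are hypothetical, and only the `isco_level_4` column name is borrowed from the `occupations` dataframe.

```python
import pandas as pd

# Toy stand-in for the `occupations` table (labels and codes are illustrative only)
toy_occupations = pd.DataFrame({
    "occupation": ["a", "b", "c", "d", "e"],
    "isco_level_4": [1120, 1321, 2113, 2113, 2263],
})

# Toy stand-in for the set of ISCO codes present in a crosswalk
crosswalk_codes = {1120, 1111, 1321}

# Occupations whose four-digit ISCO code does not appear in the crosswalk
uncovered = toy_occupations[~toy_occupations["isco_level_4"].isin(crosswalk_codes)]

missing_codes = sorted(uncovered["isco_level_4"].unique())
n_uncovered = len(uncovered)

print(f"Missing ISCO codes: {missing_codes}")
print(f"Occupations without a match: {n_uncovered}")
```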

## 1.2 UK SOC coding index

The second crosswalking option leverages the correspondence between various UK SOC and ISCO-08 codes found in the UK SOC coding index, obtained from [here](https://www.ons.gov.uk/methodology/classificationsandstandards/standardoccupationalclassificationsoc/soc2010/soc2010volume2thestructureandcodingindex).

In [13]:
# Select relevant columns and rename
soc_to_isco = soc_to_isco_v2[['SOC\n2010', 'INDEXOCC', 'ISCO-08 code based on SOC2020']].copy()
soc_to_isco.rename(columns={
    'SOC\n2010': 'soc10',
    'ISCO-08 code based on SOC2020': 'isco',
    'INDEXOCC': 'title'}, inplace=True)

# Remove rows with a missing ISCO code or with the '}}}}' marker in place of a SOC code
soc_to_isco = soc_to_isco[~soc_to_isco.isco.isnull()]
soc_to_isco = soc_to_isco[soc_to_isco.soc10!='}}}}'].copy()

# Convert codes to integers
soc_to_isco.isco = soc_to_isco.isco.astype(int)
soc_to_isco.soc10 = soc_to_isco.soc10.astype(int)

# Remove duplicates
soc_to_isco.drop_duplicates(inplace=True)
print(len(soc_to_isco))

27248


In [23]:
isco_titles.sample()

Unnamed: 0,isco,isco_title,level
198,1344,Social welfare managers,4


In [28]:
# Add ISCO occupational category titles to the crosswalk table
soc_to_isco_title = soc_to_isco.merge(isco_titles[['isco', 'isco_title']], on='isco', how='left')

In [29]:
soc_to_isco_title.sample(10)

Unnamed: 0,soc10,title,isco,isco_title
23827,6219,"Steward, chief",5111,Travel attendants and travel stewards
19972,9272,"Orderly, kitchen",9412,Kitchen helpers
20913,9260,"Porter, food",9333,Freight handlers
22710,4114,"Secretary, financial",4120,Secretaries (general)
26243,9260,"Waterman, dock",9333,Freight handlers
11984,5330,"Inspector, line, pipe",3123,Construction supervisors
10867,5221,"Grinder, tool and cutter",7224,"Metal polishers, wheel grinders and tool sharp..."
11595,8127,"Helper, printer's",7322,Printers
24462,6211,"Supervisor, services, visitor",5113,Travel guides
25775,5223,"Tuner, loom",8152,Weaving and knitting machine operators


### Check the coverage of the second "crosswalk" (inferred from the coding index)

In [30]:
set_of_isco_codes_v2 = set(soc_to_isco.isco.unique())

In [31]:
check_coverage(set_of_isco_codes_v2)

Fraction of all ESCO occupations' ISCO codes: 0.99
Fraction of top level occupations' ISCO codes: 0.99
----
Fraction of all ESCO occupations: 1.0
Fraction of top level ESCO occupations: 1.0


Note: This exercise shows that the approach using the coding index provides practically full coverage of the ESCO occupations and their ISCO codes. Hence, we use this approach in the following.

That said, this approach has limited precision, as many SOC codes can be mapped to the same ISCO code and vice versa (see below). This crosswalk does not account for the relative contribution of each particular UK SOC code to the make-up of the ISCO occupations. With that in mind, the derived earnings and hours estimates of ESCO occupations should be seen as indicative approximations.
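
To make the limitation concrete, here is a minimal sketch of an unweighted transfer of SOC-level figures to ISCO codes. It assumes that every SOC code linked to an ISCO code counts equally, which is one simple reading of "no relative contribution" and not necessarily the exact procedure used downstream; the earnings figures and the `toy_` names are made up.

```python
import pandas as pd

# Toy SOC-to-ISCO pairs, de-duplicated like the coding-index table
toy_pairs = pd.DataFrame({
    "soc10": [2113, 2113, 2219, 3567],
    "isco": [2114, 2263, 2263, 2263],
})

# Hypothetical SOC-level annual earnings (made-up numbers)
toy_earnings = pd.DataFrame({
    "soc10": [2113, 2219, 3567],
    "earnings": [40000.0, 30000.0, 35000.0],
})

# Every SOC code linked to an ISCO code contributes equally (no weighting)
isco_earnings = (
    toy_pairs.merge(toy_earnings, on="soc10", how="left")
    .groupby("isco")["earnings"]
    .mean()
)

print(isco_earnings)
```

In this toy case ISCO 2263 receives the plain average of three SOC codes, regardless of how many workers each of them actually contributes, which is why the resulting estimates are only indicative.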

Further work could combine both crosswalking approaches, using the coding index to fill the gaps left by the official crosswalk. Alternatively, in the longer term, real-time salary estimates could be derived from online job posting data by linking job postings to ESCO occupations.
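
One possible way to combine the two approaches is sketched below. This is an assumption about how the gap-filling could work, not something implemented in this notebook: coding-index pairs are kept only for ISCO codes that the official crosswalk lacks, and a hypothetical `source` column records where each row came from. All values are toy data.

```python
import pandas as pd

# Toy official crosswalk (with proportions) and toy coding-index pairs
official = pd.DataFrame({
    "soc10": [1115, 1116],
    "isco": [1120, 1111],
    "prop": [1.0, 1.0],
})
coding_index = pd.DataFrame({
    "soc10": [1115, 2113, 2219],
    "isco": [1120, 2263, 2263],
})

# Keep coding-index pairs only for ISCO codes absent from the official crosswalk
gap_fill = (
    coding_index[~coding_index["isco"].isin(official["isco"])]
    .drop_duplicates(["soc10", "isco"])
)

# Stack both tables; `prop` stays missing for rows taken from the coding index
combined = pd.concat(
    [official.assign(source="official"), gap_fill.assign(source="coding_index")],
    ignore_index=True,
)

print(combined)
```

Rows from the coding index carry no `prop` value, so any weighting scheme would still need a rule for them, such as equal weights.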

In [33]:
# Note that the same SOC 2010 code can be mapped to different ISCO codes, and vice versa
soc_to_isco_title[soc_to_isco_title.soc10==2113].drop_duplicates('isco')

Unnamed: 0,soc10,title,isco,isco_title
326,2113,"Adviser, geological",2114,Geologists and geophysicists
384,2113,"Adviser, protection, radiation",2263,Environmental and occupational health and hygi...
473,2113,Aerodynamicist,2111,Physicists and astronomers
1736,2113,"Assistant, meteorological",2112,Meteorologists
3188,2113,Biophysicist,2131,"Biologists, botanists, zoologists and related ..."
6103,2113,Crystallographer,2113,Chemists
16440,2113,"Manager, research",1223,Research and development managers
20491,2113,"Physicist, medical",2269,Health professionals not elsewhere classified


In [34]:
soc_to_isco_title[soc_to_isco_title.isco==2263].drop_duplicates('soc10')

Unnamed: 0,soc10,title,isco,isco_title
313,3567,"Adviser, EHS",2263,Environmental and occupational health and hygi...
328,2219,"Adviser, health, occupational",2263,Environmental and occupational health and hygi...
384,2113,"Adviser, protection, radiation",2263,Environmental and occupational health and hygi...
16056,2463,"Manager, health, environmental",2263,Environmental and occupational health and hygi...
16483,1259,"Manager, safety, crowd",2263,Environmental and occupational health and hygi...
18910,3563,"Officer, safety and training",2263,Environmental and occupational health and hygi...


## 1.3 Export the UK SOC to ISCO correspondence table

In [37]:
# Export the full coding-index based crosswalk
soc_to_isco_title.to_csv(data_folder + 'processed/linked_data/UKSOC2010_to_ISCO08_coding_index.csv', index=False)