# Crosswalk between O\*NET and ESCO occupations

This notebook generates a pre-validated crosswalk from the occupations described in O\*NET to ESCO occupations. 

The automated mapping strategy primarily involved applying natural language processing (NLP) techniques to identify occupations with similar job titles and descriptions. Similarity in this context refers to semantic similarity and was calculated as the cosine similarity between sentence embeddings generated by [Sentence-BERT](https://github.com/UKPLab/sentence-transformers) (Reimers and Gurevych 2019), a recent neural network model that outputs high-quality, semantically meaningful numerical representations of text.
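
As a minimal illustration of the similarity measure, the snippet below computes cosine similarity directly and checks that it equals one minus SciPy's cosine *distance*. The helper functions defined below use that `1 - cosine(...)` form. The three-dimensional vectors are made-up stand-ins for Sentence-BERT embeddings, and the function name `cosine_similarity` is hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cosine


def cosine_similarity(u, v):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


# Toy vectors standing in for sentence embeddings (hypothetical values)
esco_vec = [0.2, 0.9, 0.1]
onet_vec = [0.25, 0.8, 0.0]

sim_manual = cosine_similarity(esco_vec, onet_vec)
# scipy's `cosine` returns a distance, so similarity = 1 - distance
sim_scipy = 1 - cosine(esco_vec, onet_vec)

print(round(sim_manual, 4), round(sim_scipy, 4))
```

Values close to 1 indicate texts whose embeddings point in nearly the same direction; the measure ignores vector length.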

The resulting automated mapping was subsequently manually validated by the authors. See also the Appendix of the Mapping Career Causeways report for further details.

# 1. Set up dependencies and helper functions

In [1]:
import os
import pandas as pd
import numpy as np
import pickle
import collections
import seaborn as sns
from scipy.spatial.distance import pdist, squareform, cosine
import itertools
from time import time

from sentence_transformers import SentenceTransformer
model = SentenceTransformer('bert-base-nli-mean-tokens')

In [2]:
def get_title_sim(some_esco_code, some_onet_code):
    '''
    Measure pairwise cosine similarities of ESCO and O*NET job titles; identify common job titles
    '''
    esco_options = esco_alt_titles[some_esco_code]
    onet_options = onet_alt_titles[some_onet_code]
    # Cosine similarity (1 - cosine distance) for every (ESCO title, O*NET title) pair
    title_combo = list(itertools.product(esco_options, onet_options))
    cosines = [1 - cosine(esco_titles_dict[esco_t], onet_titles_dict[onet_t])
               for esco_t, onet_t in title_combo]
    # Job titles that appear verbatim in both occupations; 'na' if there are none
    common = [elem for elem in esco_options if elem in onet_options]
    shared_titles = common if common else 'na'
    return (cosines, shared_titles)

In [3]:
def get_desc_sim(some_esco_code, some_onet_code):
    '''
    Measure cosine similarity of occupation descriptions
    '''
    esco_desc_embed = esco_desc_dict[some_esco_code]
    onet_desc_embed = onet_desc_dict[some_onet_code]
    desc_sim = 1- cosine(esco_desc_embed, onet_desc_embed)
    return desc_sim


In [4]:
def eval_onet_match(some_esco_code, some_onet_code):
    '''
    Calculate various measures of similarity; return for evaluating the quality of the matches
    '''
    onet_title = onet_off_titles[some_onet_code]
    title_sims, shared_titles = get_title_sim(some_esco_code, some_onet_code)
    desc_sim = get_desc_sim(some_esco_code, some_onet_code)
    if len(title_sims):
        res = (onet_title, max(title_sims), np.median(title_sims), desc_sim, shared_titles)
    else:
        res = (onet_title, 0, 0, desc_sim, shared_titles)
    return res


# 2. Read in various lookups and existing crosswalks

First, we compared a given ESCO occupation with its most likely O\*NET matches. These are also referred to as ‘constrained’ matches and were derived by extrapolating from existing crosswalks between the US 2010 Standard Occupational Classification (SOC) and the ISCO-08.

The logic of this initial mapping was as follows: ESCO occupations with an immediate ISCO parent code (i.e. so-called level 5 ESCO occupations) &rarr; 4-digit ISCO code &rarr; US 2010 SOC &rarr; O\*NET occupations.
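
The chain of lookups can be sketched with plain dictionaries. The codes below are a small illustrative excerpt (they mirror rows shown later in this notebook, but the dictionaries are deliberately incomplete), and the helper `onet_candidates` is hypothetical rather than part of the notebook.

```python
# Toy lookups (illustrative excerpt, not the full crosswalks)
esco_to_isco = {'3': '3155', '7': '5312'}                   # ESCO id -> 4-digit ISCO-08
isco_to_soc = {'3155': ['17-3023'], '5312': ['25-9041']}    # ISCO-08 -> US 2010 SOC
soc_to_onet = {                                             # US 2010 SOC -> O*NET
    '17-3023': ['17-3023.00', '17-3023.01', '17-3023.03'],
    '25-9041': ['25-9041.00'],
}


def onet_candidates(esco_id):
    """Follow ESCO -> ISCO -> SOC -> O*NET and return a flat list of O*NET codes."""
    isco = esco_to_isco.get(esco_id)
    socs = isco_to_soc.get(isco, [])
    return [onet for soc in socs for onet in soc_to_onet.get(soc, [])]


for esco_id in ['3', '7', 'unknown']:
    print(esco_id, onet_candidates(esco_id))
```

An unmapped code yields an empty list rather than a `KeyError`; the notebook's `defaultdict(list)` lookups behave the same way for missing keys.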

In [5]:
base_dir = ''

## 2.1 O\*NET to US 2010 SOC

The crosswalk from O\*NET to US 2010 SOC was obtained from the [O\*NET website](https://www.onetcenter.org/taxonomy/2010/soc.html/2010_to_SOC_Crosswalk.xls) in February 2020.


In [6]:
# Import O*NET to US 2010 SOC
onet_us2010soc = pd.read_excel(os.path.join(base_dir, 'lookups', 'ONET_to_US2010SOC.xlsx'))

In [7]:
onet_us2010soc.head(10)

| | O\*NET-SOC 2010 Code | O\*NET-SOC 2010 Title | 2010 SOC Code | 2010 SOC Title |
|---|---|---|---|---|
| 0 | 11-1011.00 | Chief Executives | 11-1011 | Chief Executives |
| 1 | 11-1011.03 | Chief Sustainability Officers | 11-1011 | Chief Executives |
| 2 | 11-1021.00 | General and Operations Managers | 11-1021 | General and Operations Managers |
| 3 | 11-1031.00 | Legislators | 11-1031 | Legislators |
| 4 | 11-2011.00 | Advertising and Promotions Managers | 11-2011 | Advertising and Promotions Managers |
| 5 | 11-2011.01 | Green Marketers | 11-2011 | Advertising and Promotions Managers |
| 6 | 11-2021.00 | Marketing Managers | 11-2021 | Marketing Managers |
| 7 | 11-2022.00 | Sales Managers | 11-2022 | Sales Managers |
| 8 | 11-2031.00 | Public Relations and Fundraising Managers | 11-2031 | Public Relations and Fundraising Managers |
| 9 | 11-3011.00 | Administrative Services Managers | 11-3011 | Administrative Services Managers |


In [8]:
# Map each US 2010 SOC code to its O*NET occupation codes
onet_soc_lookup = collections.defaultdict(list)
for name, group in onet_us2010soc.groupby('2010 SOC Code'):
    options = group['O*NET-SOC 2010 Code']
    for option in options:
        onet_soc_lookup[name].append(option)

In [9]:
# Map ONET codes to occupation titles
onet_titles = {}
for ix, row in onet_us2010soc.iterrows():
    onet_titles[row['O*NET-SOC 2010 Code']] = row['O*NET-SOC 2010 Title']

## 2.2 ESCO (directly associated with an ISCO code) to ISCO

The mapping of ESCO to ISCO was obtained using the ESCO API in February 2020.  
Note that the structure of ESCO is not straightforward, as at each level there is a combination of nested and leaf nodes.

In [10]:
# Import ESCO to ISCO-08
esco_occup_level5 = pd.read_csv(os.path.join(base_dir, 'lookups', 'esco_occup_level5.csv'), encoding = 'utf-8')

In [11]:
esco_occup_level5.head()

| | id | concept_type | concept_uri | preferred_label | alt_labels | description | isco_group | isco_level_1 | isco_level_2 | isco_level_3 | isco_level_4 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Occupation | http://data.europa.eu/esco/occupation/000e93a3... | metal drawing machine operator | metal drawing machine technician\nmetal drawin... | Metal drawing machine operators set up and ope... | 8121 | 8 | 81 | 812 | 8121 |
| 1 | 3 | Occupation | http://data.europa.eu/esco/occupation/0022f466... | air traffic safety technician | air traffic safety electronics hardware specia... | Air traffic safety technicians provide technic... | 3155 | 3 | 31 | 315 | 3155 |
| 2 | 4 | Occupation | http://data.europa.eu/esco/occupation/002da35b... | hospitality revenue manager | hospitality revenues manager\nyield manager\nh... | Hospitality revenue managers maximise revenue ... | 2431 | 2 | 24 | 243 | 2431 |
| 3 | 5 | Occupation | http://data.europa.eu/esco/occupation/0044c991... | medical laboratory assistant | medical laboratory research assistant\nbiomedi... | Medical laboratory assistants work under super... | 3212 | 3 | 32 | 321 | 3212 |
| 4 | 7 | Occupation | http://data.europa.eu/esco/occupation/00674f21... | primary school teaching assistant | primary school teaching aide\nteaching assista... | Primary school teaching assistants provide ins... | 5312 | 5 | 53 | 531 | 5312 |


## 2.3 US 2010 SOC to ISCO-08

The mapping between ISCO-08 and US 2010 SOC was obtained from the [BLS website](https://www.bls.gov/soc/soccrosswalks.htm) on February 28, 2020.

In [12]:
#US 2010 SOC to ISCO-08
isco_us2010soc = pd.read_excel(os.path.join(base_dir, 'lookups', 'ISCO_SOC_Crosswalk.xls'),
                               dtype = 'object',
                               skiprows = range(0,6))

In [13]:
isco_us2010soc.head()

| | ISCO-08 Code | ISCO-08 Title EN | part | 2010 SOC Code | 2010 SOC Title | Comment 8/17/11 |
|---|---|---|---|---|---|---|
| 0 | 110 | Commissioned armed forces officers | \* | 55-1011 | Air Crew Officers | |
| 1 | 110 | Commissioned armed forces officers | \* | 55-1012 | Aircraft Launch and Recovery Officers | |
| 2 | 110 | Commissioned armed forces officers | \* | 55-1013 | Armored Assault Vehicle Officers | |
| 3 | 110 | Commissioned armed forces officers | \* | 55-1014 | Artillery and Missile Officers | |
| 4 | 110 | Commissioned armed forces officers | \* | 55-1015 | Command and Control Center Officers | |


In [14]:
# Map each ISCO-08 code to its US 2010 SOC options
isco_soc_lookup = collections.defaultdict(list)
for name, group in isco_us2010soc.groupby('ISCO-08 Code'):
    options = group['2010 SOC Code']
    for option in options:
        isco_soc_lookup[name].append(option)

# 3. Initial mapping

ESCO level 5 (ESCO occupations that have an immediate ISCO parent code) &rarr; 4-digit ISCO code &rarr; US 2010 SOC &rarr; O\*NET occupation

In [15]:
#Retrieve US 2010 SOC options for each ESCO occupation using its corresponding 4-digit ISCO-08 code
us_soc = esco_occup_level5['isco_group'].apply(lambda x: isco_soc_lookup[str(x)])  
us_soc = us_soc.apply(lambda x: [elem.strip() for elem in x])

#Generate more granular O*NET options from US 2010 SOC
onet_options = us_soc.apply(lambda x: [onet_soc_lookup[elem] for elem in x])

#Create a flat list of O*NET codes
onet_options_flat = onet_options.apply(lambda x: [item for sublist in x for item in sublist])

#Generate a flat list of O*NET titles corresponding to the codes above
onet_names_flat = onet_options_flat.apply(lambda x: [onet_titles[elem] for elem in x])

In [16]:
lens = onet_names_flat.apply(lambda x: len(x))

We can now produce an initial mapping of ESCO occupations to their possible O\*NET codes.

In [17]:
mini_esco = esco_occup_level5[['id', 'preferred_label', 'alt_labels', 'description', 'isco_group']].copy()
mini_esco['onet_codes'] = onet_options_flat
mini_esco['onet_titles'] = onet_names_flat
mini_esco['lens'] = lens

In [18]:
mini_esco.head()

| | id | preferred_label | alt_labels | description | isco_group | onet_codes | onet_titles | lens |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | metal drawing machine operator | metal drawing machine technician\nmetal drawin... | Metal drawing machine operators set up and ope... | 8121 | [51-4021.00, 51-4023.00, 51-4051.00, 51-4052.0... | [Extruding and Drawing Machine Setters, Operat... | 5 |
| 1 | 3 | air traffic safety technician | air traffic safety electronics hardware specia... | Air traffic safety technicians provide technic... | 3155 | [17-3023.00, 17-3023.01, 17-3023.03] | [Electrical and Electronic Engineering Technic... | 3 |
| 2 | 4 | hospitality revenue manager | hospitality revenues manager\nyield manager\nh... | Hospitality revenue managers maximise revenue ... | 2431 | [13-1161.00, 27-3043.00, 27-3043.04, 27-3043.05] | [Market Research Analysts and Marketing Specia... | 4 |
| 3 | 5 | medical laboratory assistant | medical laboratory research assistant\nbiomedi... | Medical laboratory assistants work under super... | 3212 | [29-2011.00, 29-2011.01, 29-2011.02, 29-2011.0... | [Medical and Clinical Laboratory Technologists... | 5 |
| 4 | 7 | primary school teaching assistant | primary school teaching aide\nteaching assista... | Primary school teaching assistants provide ins... | 5312 | [25-9041.00] | [Teacher Assistants] | 1 |


### Quick summary of the first intermediate mapping

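Out of 1,701 ESCO level 5 occupations:

- 21 have no matches (these are military occupations; their ISCO codes need to be padded with leading zeros)

- 341 have exactly one O\*NET option (1-1 matches)

- 312 have exactly two O\*NET options (1-2 matches)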
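
One way to reproduce counts like these is to tabulate the number of O\*NET options per ESCO occupation (the `lens` column). The sketch below uses a small made-up series rather than the real data, so its numbers are not the figures above. It also shows zero-padding with `str.zfill`, on the assumption that the lookup expects 4-digit ISCO strings such as `'0110'`; the actual key format should be checked before relying on this.

```python
import pandas as pd

# Hypothetical number of O*NET options per ESCO occupation (stand-in for `lens`)
lens = pd.Series([0, 1, 1, 2, 5, 3, 2, 1, 0])

n_total = len(lens)
n_none = int((lens == 0).sum())  # no O*NET options found
n_one = int((lens == 1).sum())   # exactly one option (1-1 match)
n_two = int((lens == 2).sum())   # exactly two options (1-2 match)
print(f'{n_total} occupations: {n_none} none, {n_one} one-to-one, {n_two} one-to-two')

# Full distribution of option counts
print(lens.value_counts().sort_index())

# Zero-pad a 3-digit ISCO code to 4 digits (assumed key format)
print(str(110).zfill(4))
```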

### Next steps

- Calculate the semantic similarity of ESCO occupations with potential O\*NET options
- Identify the most similar O\*NET occupation
- Manually review the results

# 4. Measure semantic similarity of mapping options

## 4.1 Collect all known job titles for ESCO and O\*NET occupations

To analyse the semantic similarity of ESCO occupations to their O\*NET options, we collect the available occupation descriptions and known job titles. We use a composite similarity metric, based on the cosine similarity of Sentence-BERT embeddings, which comprises:
- Highest pairwise similarity between the ESCO and O\*NET job titles (40%)
- Median pairwise similarity between the ESCO and O\*NET job titles (30%)
- Similarity of the occupation descriptions (30%)
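
The weighted score for a single ESCO/O\*NET pair can be sketched as a small function. The name `composite_score` is hypothetical; it mirrors the 0.4/0.3/0.3 weights applied when the results are condensed in section 4.4, and the fallback in `eval_onet_match`, which sets both title terms to 0 when there are no title pairs to compare.

```python
import numpy as np


def composite_score(title_sims, desc_sim, weights=(0.4, 0.3, 0.3)):
    """Weighted mix of max title similarity, median title similarity and description similarity.

    `title_sims` holds the cosine similarity of every (ESCO title, O*NET title) pair;
    if it is empty, both title terms fall back to 0.
    """
    w_max, w_median, w_desc = weights
    if len(title_sims):
        best, middle = max(title_sims), float(np.median(title_sims))
    else:
        best, middle = 0.0, 0.0
    return w_max * best + w_median * middle + w_desc * desc_sim


# Hypothetical similarities for one ESCO/O*NET pair
print(composite_score([0.9, 0.7, 0.5], desc_sim=0.8))  # 0.4*0.9 + 0.3*0.7 + 0.3*0.8
print(composite_score([], desc_sim=0.8))               # only the description term remains
```

Giving the single best title pair the largest weight rewards pairs that share at least one near-identical job title, while the median damps the effect of one lucky match.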


In [19]:
# Collect all titles for ESCO
mini_esco['isco_group'] = mini_esco['isco_group'].astype('str')
mini_esco['id'] = mini_esco['id'].astype('str')
mini_esco = mini_esco.fillna('na')
esco_alt_titles = collections.defaultdict(list)
for ix, row in mini_esco.iterrows():
    esco_code = row['id']
    esco_off_title = row['preferred_label']
    esco_alt_titles[esco_code].append(esco_off_title)
    esco_alt_title = row['alt_labels']
    if esco_alt_title != 'na':
        esco_alt_title = esco_alt_title.split('\n')
        esco_alt_titles[esco_code].extend(esco_alt_title)

In [20]:
# Collect job titles for O*NET
onet_titles_df = pd.read_excel(os.path.join(base_dir, 'lookups', 'Alternate Titles.xlsx'))

In [21]:
onet_alt_titles = collections.defaultdict(list)
for code, group in onet_titles_df.groupby('O*NET-SOC Code'):
    onet_off_title = group.iloc[0]['Title'].lower()
    onet_alt_title = list(group['Alternate Title'].values)
    onet_alt_title = [elem.lower() for elem in onet_alt_title]
    onet_alt_titles[code].append(onet_off_title)
    onet_alt_titles[code].extend(onet_alt_title)

## 4.2 Collect occupation descriptions for ESCO and O\*NET

In [22]:
# Collect occupation descriptions for ESCO
esco_desc = {}
for ix, row in mini_esco.iterrows():
    esco_code = row['id']
    esco_occ_desc = row['description'].lower()
    esco_desc[esco_code] = esco_occ_desc

In [23]:
# Collect occupation descriptions for O*NET
onet_occ_info = pd.read_excel(os.path.join(base_dir, 'lookups', 'Occupation Data.xlsx'))

In [24]:
onet_desc = {}
for ix, row in onet_occ_info.iterrows():
    onet_code = row['O*NET-SOC Code']
    onet_occ_desc = row['Description'].lower()
    onet_desc[onet_code] = onet_occ_desc

In [25]:
# Add official job titles
onet_off_titles = {}
for ix, row in onet_occ_info.iterrows():
    onet_code = row['O*NET-SOC Code']
    onet_occ_title = row['Title'].lower()
    onet_off_titles[onet_code] = onet_occ_title

In [26]:
#Save all description and title dicts
with open(os.path.join(base_dir, 'outputs', 'onet_desc.pkl'), 'wb') as f:
    pickle.dump(onet_desc, f)
    
with open(os.path.join(base_dir, 'outputs', 'esco_desc.pkl'), 'wb') as f:
    pickle.dump(esco_desc, f)
    
with open(os.path.join(base_dir, 'outputs', 'onet_alt_titles.pkl'), 'wb') as f:
    pickle.dump(onet_alt_titles, f)
    
with open(os.path.join(base_dir, 'outputs', 'esco_alt_titles.pkl'), 'wb') as f:
    pickle.dump(esco_alt_titles, f)

## 4.3 Calculate sentence embeddings for job titles and occupation descriptions

In [29]:
# WARNING: If you run this in a notebook (approx. 30 mins), the kernel might hang; suggestion is to run as a script
# Alternatively, you could skip this cell and read in the pre-computed embeddings if available.

start_time = time()

#ONET description embeddings
onet_desc_sentences = list(onet_desc.values())
onet_desc_embeddings = model.encode(onet_desc_sentences)
onet_desc_dict = {occup: embed for occup, embed in zip(list(onet_desc.keys()), onet_desc_embeddings)}

#ESCO description embeddings
esco_desc_sentences = list(esco_desc.values())
esco_desc_embeddings = model.encode(esco_desc_sentences)
esco_desc_dict = {occup: embed for occup, embed in zip(list(esco_desc.keys()), esco_desc_embeddings)}

#ONET title embeddings
all_onet_titles = [item for sublist in list(onet_alt_titles.values()) for item in sublist]
flat_onet_titles = list(set(all_onet_titles))
onet_title_embeddings = model.encode(flat_onet_titles)
onet_titles_dict = {title: embed for title, embed in zip(flat_onet_titles, onet_title_embeddings)}

#ESCO title embeddings
all_esco_titles = [item for sublist in list(esco_alt_titles.values()) for item in sublist]
flat_esco_titles = list(set(all_esco_titles))
esco_title_embeddings = model.encode(flat_esco_titles)
esco_titles_dict = {title: embed for title, embed in zip(flat_esco_titles, esco_title_embeddings)}

print(f'Done in {np.round(time() - start_time) / 60:.2f} minutes!')

#Save outputs
with open(os.path.join(base_dir, 'outputs', 'onet_desc_embed.pkl'), 'wb') as f:
    pickle.dump(onet_desc_dict, f)
    
with open(os.path.join(base_dir, 'outputs', 'esco_desc_embed.pkl'), 'wb') as f:
    pickle.dump(esco_desc_dict, f)
    
with open(os.path.join(base_dir, 'outputs', 'onet_title_embed.pkl'), 'wb') as f:
    pickle.dump(onet_titles_dict, f)
    
with open(os.path.join(base_dir, 'outputs', 'esco_title_embed.pkl'), 'wb') as f:
    pickle.dump(esco_titles_dict, f)


Done in 27.05 minutes!


Read in the pre-computed embeddings, if available (see instructions for downloading the embeddings in the readme.md document).

In [30]:
# Read in computed embeddings
with open(os.path.join(base_dir, 'outputs', 'onet_desc_embed.pkl'), 'rb') as f:
    onet_desc_dict = pickle.load(f)
    
with open(os.path.join(base_dir, 'outputs', 'esco_desc_embed.pkl'), 'rb') as f:
    esco_desc_dict = pickle.load(f)
    
with open(os.path.join(base_dir, 'outputs', 'onet_title_embed.pkl'), 'rb') as f:
    onet_titles_dict = pickle.load(f)
    
with open(os.path.join(base_dir, 'outputs', 'esco_title_embed.pkl'), 'rb') as f:
    esco_titles_dict = pickle.load(f)

## 4.4 Measure similarity of ESCO occupations against most likely O\*NET occupations

In [31]:
# Calculate similarities (approx. 5 mins);
# Alternatively, can skip two cells ahead if pre-computed results are available
start_time = time()

esco_onet_dict = collections.defaultdict(dict)
for ix, row in mini_esco.iterrows():
    esco_code = row['id']
    onet_codes = row['onet_codes']
    isco_code = row['isco_group']
    for code in onet_codes:
        res = eval_onet_match(esco_code, code)
        esco_onet_dict[esco_code][code] = res+(isco_code,)
        
print(f'Done in {np.round(time() - start_time) / 60:.2f} minutes!')

Done in 5.37 minutes!


In [32]:
# Uncomment if saving the `esco_onet_dict` dictionary
# with open(os.path.join(base_dir, 'outputs', 'esco_onet_dict.pkl'), 'wb') as f:
#     pickle.dump(esco_onet_dict, f)

In [33]:
# If pre-computed results available, can skip to here
with open(os.path.join(base_dir, 'outputs', 'esco_onet_dict.pkl'), 'rb') as f:
    esco_onet_dict = pickle.load(f)

In [34]:
# Condense the dict above and calculate single similarity value as a weighted average
compressed_esco_onet_dict = dict()
for k, v in esco_onet_dict.items():
    new_values = []
    for k2,v2 in v.items():
        score = v2[1]*0.4 + v2[2]*0.3 + v2[3]*0.3
        new_values.append((k2, v2[0], score, v2[4], v2[5]))
    new_values = sorted(new_values, key = lambda x: x[2], reverse = True)
    compressed_esco_onet_dict[k] = new_values

In [35]:
# Check
compressed_esco_onet_dict['956']

[('19-4021.00',
  'biological technicians',
  0.8532076448202133,
  ['laboratory assistant'],
  '3141'),
 ('19-4091.00',
  'environmental science and protection technicians, including health',
  0.7960260331630707,
  ['laboratory assistant'],
  '3141')]

In [36]:
esco_onet_df = pd.DataFrame.from_dict(compressed_esco_onet_dict, orient = 'index')
esco_onet_df['id'] = esco_onet_df.index
esco_onet_df['esco_title'] = esco_onet_df['id'].apply(lambda x: esco_alt_titles[x][0])

In [37]:
esco_onet_df.head(3)

| | 0 | 1 | 2 | 3 | 4 | 5 ... 36 | id | esco_title |
|---|---|---|---|---|---|---|---|---|
| 1 | (51-4021.00, extruding and drawing machine set... | (51-4023.00, rolling machine setters, operator... | (51-4052.00, pourers and casters, metal, 0.737... | (51-4051.00, metal-refining furnace operators ... | (51-4191.00, heat treating equipment setters, ... | | 1 | metal drawing machine operator |
| 3 | (17-3023.00, electrical and electronic enginee... | (17-3023.01, electronics engineering technicia... | (17-3023.03, electrical engineering technician... | | | | 3 | air traffic safety technician |
| 4 | (13-1161.00, market research analysts and mark... | (27-3043.04, copy writers, 0.6808782815933228,... | (27-3043.00, writers and authors, 0.5433969378... | (27-3043.05, poets, lyricists and creative wri... | | | 4 | hospitality revenue manager |


In [38]:
# This file was used for the first sweep of manual review
esco_onet_df.to_csv(os.path.join(base_dir, 'outputs', 'esco_onet_df.csv'))

# 5. First sweep of manual review

In the first sweep of the manual review, the 'constrained' matches were reviewed and, where the reviewer was confident, the most suitable match was recorded. The recommendations from the first sweep are saved in `reviews/esco_onet_recommended.csv`.

# 6. Measure similarity of ESCO occupations against all O\*NET occupations

In addition to evaluating the 'constrained' most likely matches, we also measured similarity of an ESCO occupation to all O\*NET occupations in case the best matching O\*NET occupation was not included in the set of 'constrained' O\*NET matches. 
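
The cell below loops over every ESCO/O\*NET pair, which is why it can take hours. A possible alternative, not the approach used here, is to normalise the embeddings once and obtain all pairwise cosine similarities from a single matrix product. The sketch covers only the description term and uses random toy matrices in place of `esco_desc_dict` and `onet_desc_dict`; the title terms would additionally need the title embeddings grouped per occupation. The helper name `pairwise_cosine` is hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cosine

rng = np.random.default_rng(0)
# Toy stand-ins: rows are occupations, columns are embedding dimensions
esco_emb = rng.normal(size=(4, 8))
onet_emb = rng.normal(size=(6, 8))


def pairwise_cosine(a, b):
    """All pairwise cosine similarities between rows of `a` and rows of `b`."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T  # shape: (n_esco, n_onet)


desc_sim = pairwise_cosine(esco_emb, onet_emb)

# Agrees with the loop-based `1 - cosine(...)` for any single pair
print(desc_sim[1, 2], 1 - cosine(esco_emb[1], onet_emb[2]))

# Index of the most similar O*NET row for each ESCO row
print(desc_sim.argmax(axis=1))
```

`scipy.spatial.distance.cdist(esco_emb, onet_emb, 'cosine')` returns the corresponding matrix of cosine distances, if a SciPy-only route is preferred.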

In [40]:
# Score each ESCO occupation against all O*NET codes (may take several hours)
# Alternatively, can skip two cells ahead if pre-computed results are available
start_time = time()
esco_onet_best_dict = collections.defaultdict(dict)
for ix, row in mini_esco.iterrows():
    esco_code = row['id']
    onet_codes = onet_off_titles.keys()
    isco_code = row['isco_group']
    for code in onet_codes:
        res = eval_onet_match(esco_code, code)
        esco_onet_best_dict[esco_code][code] = res+(isco_code,)
print(f'Done in {np.round(time() - start_time) / 60:.2f} minutes!')        

In [None]:
# Uncomment if saving the `esco_onet_best_dict` dictionary
# with open(os.path.join(base_dir, 'outputs', 'esco_onet_best_dict.pkl'), 'wb') as f:
#     pickle.dump(esco_onet_best_dict, f)

In [45]:
# If pre-computed results available, can skip to here
with open(os.path.join(base_dir, 'outputs', 'esco_onet_best_dict.pkl'), 'rb') as f:
    esco_onet_best_dict = pickle.load(f)

In [49]:
compressed_esco_onet_best_dict = dict()
for k, v in esco_onet_best_dict.items():
    new_values = []
    for k2,v2 in v.items():
        score = v2[1]*0.4 + v2[2]*0.3 + v2[3]*0.3
        new_values.append((k2, v2[0], score, v2[4], v2[5]))
    new_values = sorted(new_values, key = lambda x: x[2], reverse = True)
    compressed_esco_onet_best_dict[k] = new_values[0]

# 7. Second sweep of manual review

The most likely 'constrained' matches, the recommendations from the first sweep of review, and the best matching options across all O\*NET occupations were combined in `esco_onet_merged.xlsx` and again manually reviewed.

In [50]:
# Read in recommendations from the first manual review
recommendations = pd.read_csv(os.path.join(base_dir, 'review','esco_onet_recommended.csv'), encoding = 'utf-8')
recommendations['id'] = recommendations['id'].astype(str)

In [51]:
recommendations.head()

| | id | esco_title | Recommended option |
|---|---|---|---|
| 0 | 7 | primary school teaching assistant | ('25-9041.00', 'teacher assistants', 0.8063072... |
| 1 | 29 | legal guardian | |
| 2 | 30 | weaver | |
| 3 | 35 | legal administrative assistant | ('43-6012.00', 'legal secretaries', 0.85337967... |
| 4 | 41 | rooms division manager | ('11-9081.00', 'lodging managers', 0.788788807... |


In [52]:
# Combine the recommendation with the 'constrained' matches and the overall most similar option
merged = esco_onet_df.merge(recommendations[['id', 'esco_title', 'Recommended option']], 
                            how = 'inner', 
                            left_on = 'id',
                            right_on = 'id')

merged['most_similar_overall'] = merged['id'].apply(lambda x: compressed_esco_onet_best_dict[str(x)])

# This file was used to create 'esco_onet_merged.xlsx', which was then used 
# for the second sweep of manual reviews and independent validation
merged.to_csv(os.path.join(base_dir, 'outputs', 'esco_onet_merged.csv'), index = False)


# 8. Final crosswalk

**The final validated mapping between O\*NET and ESCO is saved in `esco_onet_crosswalk_Nov2020.csv`**  

For a number of occupations, additional research was required. This involved reading occupation descriptions and job requirements. We used the following considerations to decide between multiple potential matches: 

- ‘Constrained’ occupations (i.e. occupations consistent with the existing O\*NET-to-ISCO mapping) were given preference. 
- A higher number of shared job titles was taken to indicate a better match between occupations. 
- General O\*NET occupation codes (e.g. 11-9039.00 ‘...all other’) were avoided where possible. 
- We attempted to take into account the ISCO-08 skill level (i.e. the first digit of the ISCO code, which ranks occupations from managerial to elementary) when assigning the corresponding O\*NET occupations. 
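
These considerations were applied by the reviewers by hand. Purely as an illustration, the first three could be encoded as a sort key over candidate matches; the skill-level consideration is left out. The function `preference_key`, the dictionary fields and the example candidates are all hypothetical, and the 'all other' check simply looks for that phrase in the O\*NET title.

```python
def preference_key(candidate):
    """Sort key: larger tuples are preferred.

    Order of precedence: constrained matches first, then more shared job titles,
    then specific (non 'all other') O*NET codes, then the similarity score.
    """
    shared = candidate['shared_titles']
    # In this notebook, 'na' marks "no shared titles", so it must not count as 2 items
    n_shared = 0 if shared == 'na' else len(shared)
    is_generic = 'all other' in candidate['title'].lower()
    return (candidate['constrained'], n_shared, not is_generic, candidate['score'])


# Hypothetical candidates for one ESCO occupation (codes, titles and scores are made up)
candidates = [
    {'code': 'A', 'title': 'Example Workers, All Other', 'constrained': True,
     'shared_titles': ['example worker'], 'score': 0.80},
    {'code': 'B', 'title': 'Example Specialists', 'constrained': True,
     'shared_titles': ['example worker'], 'score': 0.78},
    {'code': 'C', 'title': 'Unrelated Technicians', 'constrained': False,
     'shared_titles': 'na', 'score': 0.85},
]

ranked = sorted(candidates, key=preference_key, reverse=True)
print([c['code'] for c in ranked])
```

Because Python compares tuples element by element, the similarity score only breaks ties left by the earlier criteria, so candidate C ranks last despite having the highest score.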

The crosswalk also contains information about our level of confidence in the assigned match. There are three levels of confidence: 

- A score of 2 indicates that the best ‘constrained’ O\*NET occupation was also the most semantically similar across all O\*NET occupations (31 per cent of matches). 
- A score of 1 indicates that the two automatically identified options disagreed but the reviewers agreed on the best O\*NET match after two rounds of manual review (65 per cent). 
- A score of 0.5 indicates that a second reviewer disagreed with the initial reviewer’s assignment and no consensus was reached on the most suitable O\*NET match (4 per cent of the cases). In these cases, the ESCO occupation was assigned to the O\*NET occupation suggested by the second reviewer. 
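
The three confidence levels can be restated as a small decision function. This is a hypothetical summary of the rules above, not code from the project; the inputs (the best constrained O\*NET code, the best overall O\*NET code, and whether the reviewers reached consensus) are assumed to be available as plain values.

```python
def confidence_score(best_constrained, best_overall, reviewers_agreed):
    """Return 2, 1 or 0.5 following the three confidence levels described above."""
    if best_constrained == best_overall:
        return 2    # both automatic options point to the same O*NET occupation
    if reviewers_agreed:
        return 1    # automatic options disagree, reviewers agreed after two rounds
    return 0.5      # no consensus; the second reviewer's suggestion is used


# Hypothetical cases
print(confidence_score('25-9041.00', '25-9041.00', reviewers_agreed=False))
print(confidence_score('13-1161.00', '11-9081.00', reviewers_agreed=True))
print(confidence_score('13-1161.00', '11-9081.00', reviewers_agreed=False))
```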