# Link all ESCO occupations to ONET codes

The crosswalk between ESCO and ONET links the 6-digit ONET occupations to the approximately 1700 top-level ESCO occupations (at the so-called Level 5 with respect to ISCO codes). In this notebook, we link the remaining, lower-level ESCO occupations to ONET occupations by assigning each of them the ONET code of its top-level parent occupation.

For example, ESCO `logistics and distribution manager`, which is crosswalked to ONET `11-3071.03: logistics managers`, is the top-level parent of 83 other, more specialised occupations, such as `fish, crustaceans and molluscs distribution manager`, `warehouse manager`, `purchasing manager` and `china and glassware distribution manager`. We assign the same ONET code `11-3071.03` to all 83 of these occupations.

Note that in the Mapping Career Causeways [report](https://www.nesta.org.uk/report/mapping-career-causeways-supporting-workers-risk/) we focused on a subset of transitions, between 1627 Level 5 ESCO occupations. By linking all 2942 ESCO occupations to data in the O\*NET database, we can generate even more granular transition recommendations.

In [1]:
%run ../notebook_preamble.ipy

In [2]:
# ESCO occupations
occupations = pd.read_csv(f'{data_folder}processed/ESCO_occupational_hierarchy/ESCO_occupational_hierarchy.csv')

# Crosswalk between ESCO and O*NET
esco_onet_xwalk = pd.read_csv(useful_paths.crosswalk_dir + 'esco_onet_crosswalk_MCC.csv')


In [3]:
print(f'The crosswalk has {len(esco_onet_xwalk)} top-level ESCO occupations.')
esco_onet_xwalk.head(2)

The crosswalk has 1680 top-level ESCO occupations.


Unnamed: 0,id,esco_occupation,isco_code,onet_code,onet_occupation,matching_job_titles,semantic_similarity,confidence,concept_uri
0,188,secretary of state,1111,11-1011.00,chief executives,['secretary of state'],0.810762,1.0,http://data.europa.eu/esco/occupation/0ee67d5e...
1,775,government minister,1111,11-1011.00,chief executives,['secretary of state'],0.806006,1.0,http://data.europa.eu/esco/occupation/404a50e9...
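
Because the crosswalk is merged onto the occupations table on `id`, it may be worth confirming that each `id` appears only once in the crosswalk; otherwise a left merge would silently duplicate occupation rows. The sketch below illustrates one way to check this on toy data. The frames `occ` and `xwalk` and their values are made up for illustration and are not the notebook's actual tables.

```python
import pandas as pd

# Hypothetical stand-ins for `occupations` and `esco_onet_xwalk`
occ = pd.DataFrame({'id': [1, 2, 3], 'preferred_label': ['a', 'b', 'c']})
xwalk = pd.DataFrame({'id': [1, 3], 'onet_code': ['11-1011.00', '47-2152.02']})

# Duplicate ids in the crosswalk would duplicate occupation rows in a left merge
n_duplicated = int(xwalk['id'].duplicated().sum())

# validate='one_to_one' makes pandas raise a MergeError if either side has duplicate keys
merged = occ.merge(xwalk, on='id', how='left', validate='one_to_one')
print(n_duplicated, len(merged))
```

With `validate='one_to_one'`, the merge fails loudly instead of returning a longer table than expected.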


# 1. Assign ONET codes to all ESCO occupations

In [4]:
# Link ONET codes to ESCO (top level occupations)
occupations_onet = occupations.merge(
    esco_onet_xwalk[['id', 'onet_code', 'onet_occupation']], on='id', how='left')

# Assign ONET codes to the narrower occupations, based on the code of their top-level occupation
# (the .loc lookups below assume that the index labels coincide with the occupation 'id')
x = occupations_onet.is_second_level | occupations_onet.is_third_level | occupations_onet.is_fourth_level
parent_ids = occupations_onet.loc[x, 'top_level_parent_id'].to_list()
occupations_onet.loc[x, 'onet_code'] = occupations_onet.loc[parent_ids, 'onet_code'].to_list()
occupations_onet.loc[x, 'onet_occupation'] = occupations_onet.loc[parent_ids, 'onet_occupation'].to_list()
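
The cell above looks up parent rows by index label, which works as long as the DataFrame index matches the occupation `id`. As a hedged alternative, the inheritance step can be written with an explicit `id`-to-code lookup, which does not depend on the index. The sketch below uses a toy table whose index deliberately differs from `id`. All values are illustrative, and setting `top_level_parent_id` to the occupation's own id for top-level rows is an assumption made only for this example.

```python
import pandas as pd

# Toy stand-in for `occupations_onet` after the merge (hypothetical values)
toy = pd.DataFrame({
    'id': [10, 20, 30, 40],
    'preferred_label': ['logistics and distribution manager', 'warehouse manager',
                        'purchasing manager', 'plumber'],
    'is_top_level': [True, False, False, True],
    'top_level_parent_id': [10, 10, 10, 40],  # assumption: top-level rows point to themselves
    'onet_code': ['11-3071.03', None, None, '47-2152.02'],
}, index=[100, 101, 102, 103])  # index deliberately differs from 'id'

# Lookup table: top-level occupation id -> ONET code
code_by_id = toy.loc[toy.is_top_level].set_index('id')['onet_code']

# Narrower occupations inherit the code of their top-level parent
narrower = ~toy.is_top_level
toy.loc[narrower, 'onet_code'] = toy.loc[narrower, 'top_level_parent_id'].map(code_by_id)
print(toy[['preferred_label', 'onet_code']])
```

The same pattern would apply to `onet_occupation` by building a second lookup from that column.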


In [5]:
occupations_onet.sample(15)[['preferred_label','onet_code','onet_occupation']]

Unnamed: 0,preferred_label,onet_code,onet_occupation
1963,agricultural equipment design engineer,17-2141.00,mechanical engineers
1262,structural ironworker,47-2221.00,structural iron and steel workers
511,construction engineer,17-2051.00,civil engineers
829,drain technician,47-4071.00,septic tank servicers and sewer pipe cleaners
2153,law lecturer,25-1125.00,"history teachers, postsecondary"
2836,second-hand shop manager,41-1011.00,first-line supervisors of retail sales workers
1344,data quality specialist,15-1199.01,software quality assurance engineers and testers
1422,dental technician,51-9081.00,dental laboratory technicians
934,rental service representative in water transpo...,41-2021.00,counter and rental clerks
2744,plumber,47-2152.02,plumbers


## 1.1 Check null values

In [6]:
# Number of occupations without an ONET crosswalk
len(occupations_onet[occupations_onet.onet_code.isnull()])

21

In [7]:
# Occupations without ONET code
occupations_onet[occupations_onet.onet_code.isnull()][['preferred_label', 'isco_level_1']]

Unnamed: 0,preferred_label,isco_level_1
140,fleet commander,0
365,special forces officer,0
468,navy officer,0
652,intelligence communications interceptor,0
882,army corporal,0
1017,military engineer,0
1418,artillery officer,0
1612,warfare specialist,0
1864,army captain,0
2103,lieutenant,0
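
The rows displayed above all have `isco_level_1` equal to 0, which in ISCO-08 is the major group for armed forces occupations. To check whether that holds for every occupation without an ONET code, the missing rows can be counted per ISCO major group. The following is a minimal sketch on made-up data; `toy` is a hypothetical stand-in for `occupations_onet`.

```python
import pandas as pd

# Hypothetical stand-in for `occupations_onet`
toy = pd.DataFrame({
    'preferred_label': ['navy officer', 'army captain', 'plumber', 'dental technician'],
    'isco_level_1': [0, 0, 7, 3],
    'onet_code': [None, None, '47-2152.02', '51-9081.00'],
})

# Occupations without an ONET code, counted per ISCO major group
missing = toy[toy.onet_code.isnull()]
missing_by_group = missing.groupby('isco_level_1').size()

# True only if every occupation without a code sits in ISCO major group 0
all_armed_forces = bool((missing.isco_level_1 == 0).all())
print(missing_by_group)
print(all_armed_forces)
```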


In [8]:
occupations_onet.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2942 entries, 0 to 2941
Data columns (total 18 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   id                    2942 non-null   int64  
 1   concept_type          2942 non-null   object 
 2   concept_uri           2942 non-null   object 
 3   preferred_label       2942 non-null   object 
 4   alt_labels            2903 non-null   object 
 5   description           2942 non-null   object 
 6   isco_level_1          2942 non-null   int64  
 7   isco_level_2          2942 non-null   int64  
 8   isco_level_3          2942 non-null   int64  
 9   isco_level_4          2942 non-null   int64  
 10  is_top_level          2942 non-null   bool   
 11  parent_occupation_id  1241 non-null   float64
 12  is_second_level       2942 non-null   bool   
 13  is_third_level        2942 non-null   bool   
 14  is_fourth_level       2942 non-null   bool   
 15  top_level_parent_id  

# 2. Export the final table

In [9]:
occupations_onet[['id','concept_uri','preferred_label','isco_level_4',
                  'onet_code','onet_occupation']].to_csv(
    f'{data_folder}processed/ESCO_ONET_xwalk_full.csv', index=False)
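
After exporting, a quick round-trip check can confirm that the file contains one row per occupation and that the missing ONET codes survive as nulls. The sketch below writes a toy table to an in-memory buffer rather than to the real output path; the table and its values are hypothetical.

```python
import io
import pandas as pd

# Hypothetical stand-in for the exported columns of `occupations_onet`
toy = pd.DataFrame({
    'id': [0, 1],
    'preferred_label': ['plumber', 'navy officer'],
    'onet_code': ['47-2152.02', None],
})

# Write to an in-memory buffer instead of a file, then read it back
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

same_shape = reloaded.shape == toy.shape
n_missing = int(reloaded['onet_code'].isna().sum())
print(same_shape, n_missing)
```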
