# Linking ESCO occupations to UK employment statistics

Here, we use employment estimates from the EU Labour Force Survey (EU LFS) to derive rough estimates of the number of workers employed in each 'top level' ESCO occupation. From the EU LFS data, we have derived UK employment estimates at the level of three-digit ISCO occupational groups (minor groups). We then redistribute the number of employed workers uniformly across all 'top level' ESCO occupations belonging to the respective ISCO three-digit group. Here, 'top level' refers to level 5 of the ESCO hierarchy, which sits immediately below the ISCO four-digit unit groups.
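To make the uniform redistribution concrete, here is a minimal sketch on made-up numbers. The table names `isco_counts` and `esco_top`, the column `employed_thousands` and the occupation labels are hypothetical; only `isco_level_3` mirrors a column used later in this notebook. Each ISCO three-digit total (in thousands) is divided by the number of top level ESCO occupations in that group.

```python
import pandas as pd

# Hypothetical toy inputs (not EU LFS figures): employment per ISCO
# three-digit group, in thousands of workers
isco_counts = pd.DataFrame({"isco_code": [221, 222], "employed_thousands": [120.0, 300.0]})

# Hypothetical 'top level' ESCO occupations and their ISCO three-digit parent
esco_top = pd.DataFrame({
    "preferred_label": ["occupation A", "occupation B", "occupation C",
                        "occupation D", "occupation E"],
    "isco_level_3": [221, 221, 222, 222, 222],
})

# Attach each group's total to every occupation in that group
merged = esco_top.merge(isco_counts, left_on="isco_level_3", right_on="isco_code", how="left")

# Count the top level occupations per group and split the total evenly,
# converting from thousands to numbers of workers
n_in_group = merged.groupby("isco_level_3")["preferred_label"].transform("count")
merged["employment_count"] = merged["employed_thousands"] / n_in_group * 1000

print(merged[["preferred_label", "isco_level_3", "employment_count"]])
```

Because each group total is divided by its own occupation count, summing the occupation-level counts back up recovers the group totals (here 420,000 workers in all), which is the property the sanity checks below rely on.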


# 0. Import dependencies and inputs

In [1]:
%run ../notebook_preamble.ipy

In [2]:
# Import all ESCO occupations
occ = pd.read_csv(data_folder + 'processed/ESCO_occupational_hierarchy/ESCO_occupational_hierarchy.csv')

# Import EU LFS estimates of UK employment
file_path = useful_paths.project_dir + '/supplementary_online_data/demographic_analysis/national_count_isco/uk_breakdown_by_isco_w_risk.csv'
lfs_estimates = pd.read_csv(file_path)

# Which year to use
year = '2018'

# Total number of workers in employment
n_total = lfs_estimates[year].sum()
print(f'Total number of employed workers in {year}: {n_total/1e+3} million')

lfs_estimates.head(3)


Total number of employed workers in 2018: 32.151679 million


| | isco_code | 2014 | 2015 | 2016 | 2017 | 2018 | isco_minor_group | RISK_CAT |
|---|---|---|---|---|---|---|---|---|
| 0 | 111 | 6.636 | 8.115 | 10.438 | 7.273 | 14.537 | Legislators and senior officials | 2 |
| 1 | 112 | 74.033 | 84.088 | 76.883 | 84.772 | 118.716 | Managing directors and chief executives | 2 |
| 2 | 121 | 503.408 | 538.177 | 652.108 | 690.44 | 662.359 | Business services and administration managers | 2 |


# 1. Redistribute workers from 3-digit ISCO to ESCO occupations


In [3]:
# Distribute equally the number of workers across all lower level occupations
occupations_employment = occ.copy()

# Note: we only do this for the 'top level' ESCO occupations
occupations_employment = occupations_employment[occupations_employment.is_top_level==True]

occupations_employment['employment_share'] = np.nan
occupations_employment['employment_count'] = np.nan

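for _, row in lfs_estimates.iterrows():
    # Index labels of the top level occupations in this ISCO three-digit group
    occ_rows = occupations_employment.index[occupations_employment.isco_level_3 == row.isco_code]
    if len(occ_rows) == 0:
        continue  # no top level ESCO occupation maps to this ISCO group
    occupations_employment.loc[occ_rows, 'employment_share'] = (row[year] / n_total) / len(occ_rows)
    occupations_employment.loc[occ_rows, 'employment_count'] = row[year] / len(occ_rows)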

# The EU LFS figures are in thousands: convert to numbers of workers and round
occupations_employment['employment_count'] = np.round(occupations_employment['employment_count'] * 1000)

# Check that we're still in the right range of total employment
print(occupations_employment.employment_count.sum())


32151777.0
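The `iterrows` loop above can also be written as a vectorised column operation. The sketch below is one possible alternative, not the notebook's method; it runs on hypothetical toy stand-ins for `occ` and `lfs_estimates` and assumes `isco_code` is unique in the LFS table (otherwise `Series.map` with a Series cannot reindex). It also avoids relying on the DataFrame index coinciding with the `id` column.

```python
import numpy as np
import pandas as pd

year = "2018"

# Hypothetical stand-ins for `occ` and `lfs_estimates` (toy numbers, in thousands)
occ = pd.DataFrame({
    "id": [0, 1, 2, 3, 4],
    "isco_level_3": [111, 111, 121, 121, 999],
    "is_top_level": [True, True, True, True, True],
})
lfs_estimates = pd.DataFrame({"isco_code": [111, 121], year: [10.0, 30.0]})
n_total = lfs_estimates[year].sum()

top = occ[occ["is_top_level"]].copy()

# Employment (in thousands) of each occupation's ISCO three-digit group;
# groups absent from the LFS table map to NaN
group_employment = top["isco_level_3"].map(lfs_estimates.set_index("isco_code")[year])

# Number of top level occupations sharing each ISCO three-digit code
# ('id' is never null, so 'count' equals the group size)
group_size = top.groupby("isco_level_3")["id"].transform("count")

# Split each group's total evenly across its occupations
top["employment_share"] = group_employment / n_total / group_size
top["employment_count"] = np.round(group_employment / group_size * 1000)

print(top[["id", "isco_level_3", "employment_share", "employment_count"]])
```

As in the loop, occupations whose ISCO group is missing from the LFS table (code 999 in this toy example) keep `NaN` for both columns.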


In [4]:
# Sanity check
print(occupations_employment.employment_share.sum())

0.9999999999999999
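The sum of shares printing `0.9999999999999999` rather than `1.0` is expected floating-point rounding, not a lost occupation. A small illustration using only the standard library: comparing with a tolerance, or summing with `math.fsum`, avoids a spurious failure.

```python
import math

# Ten equal shares of 0.1 should add up to 1, but 0.1 has no exact binary
# floating-point representation, so the naive sum falls just short
shares = [0.1] * 10
total = sum(shares)
print(total)                     # 0.9999999999999999
print(total == 1.0)              # False
print(math.isclose(total, 1.0))  # True

# math.fsum tracks the lost low-order bits and returns the correctly rounded sum
print(math.fsum(shares) == 1.0)  # True
```

In the notebook, the same idea would be something like `np.isclose(occupations_employment.employment_share.sum(), 1.0)`; `Series.sum` skips the `NaN` shares by default.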


This total can be compared with the UK employment figures published by the Office for National Statistics (ONS) (e.g. see [here](https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/timeseries/mgrz/lms)).

In [5]:
# Largest ESCO occupations
occupations_employment.sort_values('employment_count', ascending=False).head(10)

| | id | concept_type | concept_uri | preferred_label | isco_level_1 | isco_level_2 | isco_level_3 | isco_level_4 | is_top_level | is_second_level | is_third_level | is_fourth_level | parent_occupation_id | top_level_parent_id | employment_share | employment_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1134 | 1134 | Occupation | http://data.europa.eu/esco/occupation/5efb2c08... | secondary school teacher | 2 | 23 | 233 | 2330 | True | False | False | False | | 1134 | 0.011984 | 385291.0 |
| 1933 | 1933 | Occupation | http://data.europa.eu/esco/occupation/a30734bb... | hospital porter | 5 | 53 | 532 | 5329 | True | False | False | False | | 1933 | 0.007772 | 249877.0 |
| 1339 | 1339 | Occupation | http://data.europa.eu/esco/occupation/7005c266... | sterile services technician | 5 | 53 | 532 | 5329 | True | False | False | False | | 1339 | 0.007772 | 249877.0 |
| 243 | 243 | Occupation | http://data.europa.eu/esco/occupation/1391bdf9... | home care aide | 5 | 53 | 532 | 5322 | True | False | False | False | | 243 | 0.007772 | 249877.0 |
| 2486 | 2486 | Occupation | http://data.europa.eu/esco/occupation/d38c2107... | healthcare assistant | 5 | 53 | 532 | 5321 | True | False | False | False | | 2486 | 0.007772 | 249877.0 |
| 1133 | 1133 | Occupation | http://data.europa.eu/esco/occupation/5ee264b2... | phlebotomist | 5 | 53 | 532 | 5329 | True | False | False | False | | 1133 | 0.007772 | 249877.0 |
| 1290 | 1290 | Occupation | http://data.europa.eu/esco/occupation/6c999fc7... | office clerk | 4 | 41 | 411 | 4110 | True | False | False | False | | 1290 | 0.005963 | 191719.0 |
| 658 | 658 | Occupation | http://data.europa.eu/esco/occupation/36494988... | advanced nurse practitioner | 2 | 22 | 222 | 2221 | True | False | False | False | | 658 | 0.005258 | 169057.0 |
| 303 | 303 | Occupation | http://data.europa.eu/esco/occupation/18e14e61... | specialist nurse | 2 | 22 | 222 | 2221 | True | False | False | False | | 303 | 0.005258 | 169057.0 |
| 1695 | 1695 | Occupation | http://data.europa.eu/esco/occupation/8d3e8aaa... | nurse responsible for general care | 2 | 22 | 222 | 2221 | True | False | False | False | | 1695 | 0.005258 | 169057.0 |
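The ties in this ranking follow directly from the uniform redistribution: every top level occupation in the same ISCO three-digit group receives the same count, so the largest ESCO occupations are those in large ISCO groups that contain few top level occupations. A minimal sketch of checking this, on a hypothetical toy table shaped like `occupations_employment`, verifies one distinct count per group and re-aggregates to the group totals.

```python
import pandas as pd

# Hypothetical toy table shaped like `occupations_employment`
occupations_employment = pd.DataFrame({
    "preferred_label": ["a", "b", "c", "d", "e", "f"],
    "isco_level_3": [532, 532, 532, 411, 233, 233],
    "employment_count": [100.0, 100.0, 100.0, 50.0, 80.0, 80.0],
})

by_group = occupations_employment.groupby("isco_level_3")["employment_count"]

# Uniform redistribution should leave exactly one distinct count per ISCO group
assert (by_group.nunique() == 1).all()

# Summing back up recovers the ISCO three-digit totals (up to rounding),
# which gives a group-level ranking free of the ties seen above
group_totals = by_group.sum().sort_values(ascending=False)
print(group_totals)
```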


## 1.1 Check null values

Note that some ISCO three-digit codes are missing from the EU LFS estimates. Hence, some 'top level' ESCO occupations do not receive an employment estimate: in the output below, `employment_share` is non-null for only 1627 of the 1701 top level occupations.

In [6]:
occupations_employment.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1701 entries, 1 to 2941
Data columns (total 16 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   id                    1701 non-null   int64  
 1   concept_type          1701 non-null   object 
 2   concept_uri           1701 non-null   object 
 3   preferred_label       1701 non-null   object 
 4   isco_level_1          1701 non-null   int64  
 5   isco_level_2          1701 non-null   int64  
 6   isco_level_3          1701 non-null   int64  
 7   isco_level_4          1701 non-null   int64  
 8   is_top_level          1701 non-null   bool   
 9   is_second_level       1701 non-null   bool   
 10  is_third_level        1701 non-null   bool   
 11  is_fourth_level       1701 non-null   bool   
 12  parent_occupation_id  0 non-null      float64
 13  top_level_parent_id   1701 non-null   int64  
 14  employment_share      1627 non-null   float64
 15  employment_count     
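To see which ISCO codes are responsible for the missing values, one option is to list the top level occupations whose `isco_level_3` does not appear in the LFS table. The sketch below uses hypothetical toy stand-ins for `occupations_employment` and `lfs_estimates`; in the notebook itself, filtering on `occupations_employment.employment_share.isna()` should select the same rows.

```python
import pandas as pd

# Hypothetical stand-ins for the notebook's tables
occupations_employment = pd.DataFrame({
    "id": [10, 11, 12, 13],
    "isco_level_3": [111, 112, 312, 312],
})
lfs_estimates = pd.DataFrame({"isco_code": [111, 112, 121]})

# Occupations whose ISCO three-digit code does not appear in the LFS table
no_estimate = occupations_employment[
    ~occupations_employment["isco_level_3"].isin(lfs_estimates["isco_code"])
]

# How many top level occupations each missing ISCO code accounts for
print(no_estimate["isco_level_3"].value_counts())
```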

# 2. Export 

In [7]:
occupations_employment[[
    'id', 'concept_uri', 'preferred_label', 'isco_level_3', 'isco_level_4', 'is_top_level',
    'employment_share', 'employment_count']].to_csv(
    data_folder + 'processed/linked_data/ESCO_top_occupations_UK_employment.csv', index=False)
