Last updated: 14th October 2020.
Licence.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
This GitHub repository contains supplementary material for Nesta's report "Developing experimental estimates of skill demand", produced as part of a research project funded by the Economic Statistics Centre of Excellence (ESCoE).
Stef Garasto (Nesta), Jyldyz Djumalieva (Nesta), Karlis Kanders (Nesta), Rachel Wilcock (Nesta) and Cath Sleeman (Nesta).
This repository contains three folders.
One ('report_pdf') has a PDF copy of the report. This is the version submitted for the ESCoE discussion paper, and it has not been reviewed yet. If needed, this repository will be updated with a link to (or a copy of) the published version [ADD LINK].
The other two folders contain supplementary tables with the stock of vacancies and samples of the code used for the report. Because of legal restrictions, none of the underlying online job adverts data can be published in this repository.
In the folder 'tables_report' there are files storing the annual stock of vacancies for the five years from 2015 to 2019. Each file breaks the stock of vacancies down by a single variable (for example, only by location or only by industry). The breakdown variables are:
- Occupations (at all levels of granularity of the Standard Occupational Classification, SOC2010).
- Industry (broad categories from the Standard Industrial Classification, SIC 2007).
- Location (Travel To Work Areas).
- Skill categories (at all levels of Nesta's skills taxonomy).
The stock of vacancies by Travel To Work Area (TTWA) is expressed per 100 economically active residents (source: Annual Population Survey), and estimates are shown only for TTWAs with at least 40,000 economically active residents aged 16 and over. For all the other variables the estimates are expressed as percentages (they sum to 100 for each year).
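As an illustration of the two ways of expressing the estimates described above, here is a minimal Python sketch. The function names and the example numbers are hypothetical and are not taken from the report or from the code in this repository.

```python
def rate_per_100(vacancy_stock, active_residents, min_residents=40_000):
    """Vacancies per 100 economically active residents.

    Returns None when the area falls below the population threshold,
    mirroring the rule that only larger TTWAs are reported.
    """
    if active_residents < min_residents:
        return None
    return 100.0 * vacancy_stock / active_residents


def percentage_shares(stock_by_category):
    """Convert a {category: stock} mapping into percentages that sum to 100."""
    total = sum(stock_by_category.values())
    return {name: 100.0 * value / total for name, value in stock_by_category.items()}


# Hypothetical numbers, for illustration only.
example_rate = rate_per_100(vacancy_stock=1_500, active_residents=60_000)
example_shares = percentage_shares({"A": 30, "B": 50, "C": 20})
```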
Each file is provided in two formats:
- '.csv' - machine readable format. The first column contains the value of the breakdown variable, then there is one column per year.
- '.xlsx' - less machine readable, but contains more human readable information on the data provided.
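The '.csv' layout described above (one row per value of the breakdown variable, one column per year) could be read with the Python standard library roughly as follows. This is a sketch only: the sample content, the category names and the assumption that the header row holds the years as plain integers are all hypothetical, so check the header of the actual files before relying on it.

```python
import csv
import io

# Hypothetical file content that mimics the described layout;
# the category names and values are made up.
SAMPLE_CSV = """category,2015,2016,2017,2018,2019
Category A,40.0,41.0,39.5,38.0,37.5
Category B,60.0,59.0,60.5,62.0,62.5
"""


def read_breakdown(handle):
    """Return {category: {year: value}} from a breakdown table."""
    reader = csv.reader(handle)
    header = next(reader)
    # Assumes every column after the first is labelled with a year.
    years = [int(label) for label in header[1:]]
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        table[row[0]] = {year: float(cell) for year, cell in zip(years, row[1:])}
    return table


table = read_breakdown(io.StringIO(SAMPLE_CSV))
```

For a file on disk, the same function can be given a handle opened with `open(path, newline="")`.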
The dataset of online job adverts on which these estimates are based was provided by Textkernel.
In the folder 'sample_code' there are pieces of code underlying key elements of the methods used to produce the estimates of skill demand (more details can be found in the report). Specifically, we provide code showing how we:
- Built a crosswalk from SOC2010 to SIC2007 (01-SIC-REW.ipynb).
- Computed the most representative skill cluster for each job advert (compute_top_clusters.py).
- Converted the flow of job adverts into a stock (flow_to_stock_funcs.py and flow_to_stock_model_by_sic_averaged_Sep20.ipynb).
- Aligned the stock of online job adverts to the stock of ONS vacancies from the ONS Vacancy survey (flow_to_stock_model_by_sic_averaged_Sep20.ipynb).
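To give a rough idea of what the last two steps involve, the sketch below turns a flow of posting dates into a stock on a given day and computes per-industry scaling factors against a reference stock. It is not the method implemented in the files listed above: the fixed 30-day advert duration, the function names and all the numbers are assumptions made purely for illustration, and the report describes the approach actually used.

```python
from datetime import date, timedelta


def stock_on(day, posting_dates, duration_days=30):
    """Count adverts assumed to be open on `day`.

    Assumption: every advert stays open for `duration_days` days
    from its posting date (a simplification, not the report's model).
    """
    window = timedelta(days=duration_days)
    return sum(1 for posted in posting_dates if posted <= day < posted + window)


def alignment_factors(online_stock, survey_stock):
    """Per-industry ratio of survey stock to online stock.

    Industries with no online stock, or missing from the survey, are skipped.
    """
    return {
        industry: survey_stock[industry] / online_stock[industry]
        for industry in online_stock
        if online_stock[industry] > 0 and industry in survey_stock
    }


# Hypothetical posting dates and stocks, for illustration only.
postings = [date(2019, 1, 1), date(2019, 1, 20), date(2019, 2, 25)]
stock = stock_on(date(2019, 2, 1), postings)
factors = alignment_factors({"C": 200, "F": 50}, {"C": 100, "F": 100})
```

Multiplying the online stock of each industry by its factor reproduces the reference stock for that industry, which is the sense of "alignment" used in this sketch.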
The code is provided "as is".