ronaldahmed/labor-market-demand-analysis: Peruvian labor market demand analysis using natural language processing and machine learning tools

This repository contains the code and data necessary to reproduce the study presented in:

R.A. Cardenas, K.S. Bello. "Labor market demand analysis for engineering majors in Peru using Shallow Parsing and Topic Modeling". In Poster Session of the Machine Learning Summer School Kyoto 2015, Kyoto, Japan.

The dataset used for this study consisted of more than 200,000 job ads extracted from several job-search websites in Peru. Data for other Latin American countries is also available but was not included in the analysis.

Each dataset used by the models is either available in this repository or available upon request, as described below.

Tokenized job ads: more than 900k job ads extracted from Latin American websites. The NLTK tokenizer was extended to capture technical terms typical of this kind of advertisement (see the preprocessing folder). [available upon request]
Shallow parsing models [annotated data.zip]: 800 job ads, each tokenized and manually annotated with POS tags (EAGLES format for Spanish) and entity labels in BIO format. The Object_id (MongoDB primary key) of each annotated ad is listed in the file [annot_data_database_ids].
Topic models: nearly 9,000 job ads sampled from the database, tokenized and filtered to remove low-frequency words and tokens of no interest (phone numbers, salaries, office hours, emails, URLs). The shallow parsers are then used to extract the relevant phrases.
- [filtered_FULL_TEXT_data.zip]: Tokenized and filtered complete ads.
- [filtered_CHUNKS_TEXT_data.zip]: Text extracted by the shallow parsers from tokenized and filtered complete ads.
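The tokenizer extension and the filtering step described above can be illustrated with a minimal, standard-library sketch. It is not the repository's preprocessing code: the regular expressions, the `tokenize`, `is_noise` and `filter_docs` helpers, the `min_count` threshold and the sample ads are all hypothetical, chosen only to show how technical terms such as "C++" or ".NET" can be kept as single tokens, and how emails, URLs, numbers and rare words can be dropped before topic modeling.

```python
import re
from collections import Counter

# Hypothetical token patterns (not the repository's actual rules).
URL = r"(?:https?://|www\.)\S+"
EMAIL = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"
# Technical terms a plain word tokenizer would split: C++, C#, .NET, TCP/IP.
TECH = r"[A-Za-z]+\+\+|[A-Za-z]+#|\.NET|[A-Za-z]+/[A-Za-z]+"
WORD = r"\w+(?:[-']\w+)*"
PUNCT = r"[^\w\s]"

# Alternatives are tried left to right, so URLs, emails and technical
# terms win over the generic word and punctuation patterns.
TOKEN_RE = re.compile("|".join([URL, EMAIL, TECH, WORD, PUNCT]))
URL_RE = re.compile(URL)
EMAIL_RE = re.compile(EMAIL)


def tokenize(text):
    """Split a job ad into tokens, keeping technical terms intact."""
    return TOKEN_RE.findall(text)


def is_noise(token):
    """True for URLs, emails and tokens without letters (phones, salaries, hours, punctuation)."""
    if URL_RE.fullmatch(token) or EMAIL_RE.fullmatch(token):
        return True
    return not any(ch.isalpha() for ch in token)


def filter_docs(docs, min_count=2):
    """Lowercase, drop noise tokens, then drop tokens seen fewer than min_count times corpus-wide."""
    cleaned = [[t.lower() for t in doc if not is_noise(t)] for doc in docs]
    counts = Counter(t for doc in cleaned for t in doc)
    return [[t for t in doc if counts[t] >= min_count] for doc in cleaned]


# Hypothetical sample ads.
ads = [
    "Se requiere ingeniero con C++ y .NET, enviar CV a rrhh@empresa.pe o llamar al 987-654-321",
    "Ingeniero industrial con C++ y Excel, de 8:00 a 17:00, sueldo 2500",
]
docs = [tokenize(ad) for ad in ads]
print(filter_docs(docs))
# [['ingeniero', 'con', 'c++', 'y', 'a'], ['ingeniero', 'con', 'c++', 'y', 'a']]
```

Dropping every token without letters is a coarse shortcut that removes phone numbers, salary figures and office hours in one rule; a real pipeline would also remove stopwords and tune `min_count` to the corpus size.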

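The BIO entity labels in the annotated data can be grouped back into phrases, which is the kind of phrase-level output the shallow parsers produce for the chunk-based topic-model data. The sketch below is an illustration only: it assumes a hypothetical CoNLL-style layout (one `token<TAB>POS<TAB>BIO` line per token, blank lines between ads) and hypothetical entity labels `MAJOR` and `SKILL`; the actual layout and label set of annotated data.zip may differ.

```python
def read_annotations(lines):
    """Yield (tokens, pos_tags, bio_tags) for each ad.

    Assumes one 'token<TAB>POS<TAB>BIO' line per token and blank lines between ads.
    """
    tokens, pos, bio = [], [], []
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line closes the current ad.
            if tokens:
                yield tokens, pos, bio
                tokens, pos, bio = [], [], []
            continue
        tok, tag, label = line.split("\t")
        tokens.append(tok)
        pos.append(tag)
        bio.append(label)
    if tokens:
        yield tokens, pos, bio


def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (label, phrase) spans."""
    spans = []
    label, words = None, []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # A new entity starts; a stray I- tag is treated like a B- tag.
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = tag[2:], [tok]
        elif tag.startswith("I-"):
            # Continuation of the open entity.
            words.append(tok)
        else:
            # "O" closes any open entity.
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = None, []
    if label is not None:
        spans.append((label, " ".join(words)))
    return spans


# Hypothetical sample with EAGLES-style POS tags and invented entity labels.
sample = [
    "Ingeniero\tNCMS000\tB-MAJOR",
    "industrial\tAQ0CS0\tI-MAJOR",
    "con\tSPS00\tO",
    "AutoCAD\tNP00000\tB-SKILL",
    "",
    "Excel\tNP00000\tB-SKILL",
    "avanzado\tAQ0MS0\tI-SKILL",
]
for tokens, _pos, bio in read_annotations(sample):
    print(bio_to_spans(tokens, bio))
# [('MAJOR', 'Ingeniero industrial'), ('SKILL', 'AutoCAD')]
# [('SKILL', 'Excel avanzado')]
```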
Repository contents:
- preprocessing/
- results/
- rule based major_extractor/
- shallow parsing models/
- topic models/
- topic-browser/
- LICENSE
- README.md
- annot_data_database_ids
- annotated data.zip
- filtered_CHUNKS_TEXT_data.zip
- filtered_FULL_TEXT_data.zip
- labor-market-analysis-tm-sp.pdf