rob-kwiec/olx-jobs-recommendations

Jobs recommendations

This repository contains the implementation of:

  • several recommender system models appropriate for large-scale jobs recommendations,
  • hyperparameter tuning,
  • evaluation metrics.

Currently implemented models:

Environment configuration

If you use conda, set up a conda environment with a kernel (works with anaconda3):

make ckernel

If you use virtualenv, set up a virtual environment with a kernel:

make vkernel

Then activate the environment:

source activate jobs-research

Steps to reproduce the results

Getting data

The input data file interactions.csv should be stored in the directory data/raw/your-dataset-name. For example, data/raw/jobs_published/interactions.csv. The file is expected to contain the following columns: user, item, event, timestamp.
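As a quick sanity check, the expected file layout can be loaded with pandas. This is only an illustrative sketch: the inline sample below stands in for a real data/raw/jobs_published/interactions.csv, and the event values are made up.

```python
import io

import pandas as pd

# Inline sample standing in for data/raw/jobs_published/interactions.csv;
# the event names here are hypothetical.
sample = io.StringIO(
    "user,item,event,timestamp\n"
    "u1,i1,click,1609459200\n"
    "u1,i2,bookmark,1609459260\n"
    "u2,i1,click,1609459300\n"
)

interactions = pd.read_csv(sample)

# The four columns the pipeline expects.
assert list(interactions.columns) == ["user", "item", "event", "timestamp"]
print(interactions.shape)  # (3, 4)
```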

To reproduce our results, download the olx-jobs dataset from Kaggle.

Running

Execute the command:

python run.py

The script will:

  • split the input data,
  • run the hyperparameter optimization for all models,
  • train the models,
  • generate the recommendations,
  • evaluate the models.

Details about each step

By default, the script executes all of the aforementioned steps, namely:

--steps '["prepare", "tune", "run", "evaluate"]'
Step prepare

This step:

  • loads the raw interactions,
  • splits the interactions into the train_and_validation and test sets,
  • splits the train_and_validation set into train and validation sets,
  • prepares the target_users sets, i.e. the users for whom recommendations are generated,
  • saves all the prepared datasets.
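The exact split rule is defined in the code; as an illustration only, a simple time-based holdout (a common choice for interaction data, not necessarily the one this repository uses) might look like:

```python
import pandas as pd

# Toy interactions frame with the expected columns.
interactions = pd.DataFrame(
    {
        "user": ["u1", "u1", "u2", "u2", "u3"],
        "item": ["i1", "i2", "i1", "i3", "i2"],
        "event": ["click"] * 5,
        "timestamp": [100, 200, 150, 250, 300],
    }
)

# Hold out the most recent ~20% of interactions as the test set.
cutoff = interactions["timestamp"].quantile(0.8)
train_and_validation = interactions[interactions["timestamp"] <= cutoff]
test = interactions[interactions["timestamp"] > cutoff]

# The same rule can be applied again to carve a validation set
# out of train_and_validation for hyperparameter tuning.
```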

Due to the large size of our dataset, we introduced additional parameters that let us reduce the size of the train and validation sets used in the hyperparameter tuning:

--validation_target_users_size 30000
--validation_fraction_users 0.2
--validation_fraction_items 0.2
Step tune

This step performs Bayesian hyperparameter tuning on the train and validation sets.
For each model, the search space and the tuning parameters are defined in the src/tuning/config.py file. The results of all iterations are stored.
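The real search spaces live in src/tuning/config.py and the tuner is Bayesian; as a simplified stand-in, a plain random search over a made-up space shows the shape of the loop (the parameter names and objective below are hypothetical):

```python
import random

random.seed(0)

# Hypothetical search space; the real ones are in src/tuning/config.py.
space = {
    "factors": [32, 64, 128],
    "regularization": [0.001, 0.01, 0.1],
}

def objective(params):
    # Stand-in for "train on the train set, score on the validation set".
    return params["factors"] * params["regularization"]

trials = []
for _ in range(10):
    params = {name: random.choice(values) for name, values in space.items()}
    trials.append((objective(params), params))  # every iteration is stored

best_score, best_params = min(trials, key=lambda t: t[0])
print(best_params)
```

A Bayesian tuner differs in that it proposes the next parameter set based on the scores observed so far, rather than sampling blindly.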

Step run

This step, for each model:

  • loads the best hyperparameters (if available),
  • trains the model,
  • generates and saves recommendations,
  • saves efficiency metrics.
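The models themselves are out of scope here; as a minimal illustration of "generate recommendations", a popularity baseline (a deliberate simplification, not necessarily one of the repository's models) producing top-k lists could look like:

```python
from collections import Counter

# Toy (user, item) interactions.
interactions = [
    ("u1", "i1"), ("u1", "i2"),
    ("u2", "i1"), ("u2", "i3"),
    ("u3", "i1"), ("u3", "i2"),
]

k = 2
popularity = Counter(item for _, item in interactions)

def recommend(seen, k=k):
    # Most popular items the user has not interacted with yet.
    ranked = [item for item, _ in popularity.most_common() if item not in seen]
    return ranked[:k]

seen_by_user = {}
for user, item in interactions:
    seen_by_user.setdefault(user, set()).add(item)

recommendations = {u: recommend(seen) for u, seen in seen_by_user.items()}
print(recommendations["u2"])  # ['i2'] — i1 and i3 were already seen
```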
Step evaluate

This step, for each model:

  • loads stored recommendations,
  • evaluates them based on the implemented metrics,
  • displays and stores the evaluation results.
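The full metric set is defined in the code; precision@k is a representative example of the kind of ranking metric such an evaluation includes, sketched here on toy data:

```python
def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommended items that appear in the test set."""
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    return len(set(top_k) & set(relevant)) / len(top_k)

# One user's stored recommendations vs. their held-out test interactions.
recommended = ["i1", "i2", "i3", "i4"]
relevant = {"i2", "i4", "i9"}
print(precision_at_k(recommended, relevant, k=4))  # 0.5
```

Averaging this value over all target users gives a single per-model score that the stored results can be compared on.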

Notebooks

data

Notebooks to analyze the dataset structure and distribution.

models

Notebooks to demonstrate the usage of the particular models.

evaluation

Notebooks to better understand the results. They utilize recommendations and metrics generated during the execution of the run script.
