Solution for the 2018 Spotify RecSys Challenge
Solution for the 2018 Spotify RecSys Challenge by the team Definitive Turtles

How to run

  • place the Million Playlist Dataset JSON files into data_raw/million_playlist_dataset and the challenge set into data_raw/challenge_set
  • scripts 5, 6 and 7 each contain a variable named threads; set it as desired before running them
  • run the python files 1-6 without parameters
  • run the script with parameters 0-4 (e.g. for i in $(seq 0 4); do python $i; done)
  • run python
  • run python output/merged.csv output/submission.csv
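
Before starting the pipeline, it can help to confirm that the data sits where the first step expects it. The following is a minimal sketch, not part of the original scripts: the helper name check_data_dirs is hypothetical, and it assumes that both directories should contain at least one .json file (the instructions only state this explicitly for the million playlist dataset).

```python
from pathlib import Path

# Directory names come from the "How to run" steps; everything else here
# is an illustrative assumption and not part of the original scripts.
DATA_DIRS = {
    "million playlist dataset": Path("data_raw/million_playlist_dataset"),
    "challenge set": Path("data_raw/challenge_set"),
}


def check_data_dirs(root="."):
    """Return a list of problems; an empty list means the layout looks fine."""
    problems = []
    for label, rel in DATA_DIRS.items():
        path = Path(root) / rel
        if not path.is_dir():
            problems.append("missing directory for {}: {}".format(label, path))
        elif not any(path.glob("*.json")):
            # Assumption: the input files use the .json extension.
            problems.append("no .json files found in {}".format(path))
    return problems


if __name__ == "__main__":
    for problem in check_data_dirs():
        print(problem)
```

Run it from the repository root; no output means both directories exist and contain JSON files.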


Requirements

  • Python 3.5 with standard scientific packages (pandas, numpy, scipy, etc.)
  • 16 GB of RAM
  • About 40 GB of free disk space
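
The disk space figure can be checked up front with the standard library. This is only a sketch under assumptions: the helper names are hypothetical, "GB" is read as 10**9 bytes, and the check looks at the filesystem holding the current directory, which may not be where the scripts write their output.

```python
import shutil

REQUIRED_FREE_GB = 40  # approximate figure from the list above


def free_space_gb(path="."):
    """Free space on the filesystem containing `path`, in GB (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def has_enough_space(free_gb, required_gb=REQUIRED_FREE_GB):
    """True if the available space meets the rough guideline."""
    return free_gb >= required_gb


if __name__ == "__main__":
    free = free_space_gb()
    status = "ok" if has_enough_space(free) else "below the ~40 GB guideline"
    print("free space: {:.1f} GB ({})".format(free, status))
```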

Reference environment

We ran the models with the following Python version and packages:

Python 3.5.2 (we used the conda environment)
pandas 0.22.0
numpy 1.14.0
matplotlib 2.0.2
scipy 1.0.0
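
To compare a local setup against the versions listed above, a small script along these lines may be useful. It is a sketch, not part of the original solution: the function names are hypothetical, and it relies on each package exposing a __version__ attribute, which the four listed packages do. Other versions may well work; the script only reports differences.

```python
import importlib
import sys

# Package versions copied from the reference environment above.
REFERENCE = {
    "pandas": "0.22.0",
    "numpy": "1.14.0",
    "matplotlib": "2.0.2",
    "scipy": "1.0.0",
}


def installed_versions(names):
    """Map each package name to its version string, or None if not importable."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            versions[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            versions[name] = None
    return versions


def diff_versions(installed, reference=REFERENCE):
    """Return {name: (installed, reference)} for every package that differs."""
    return {
        name: (installed.get(name), ref)
        for name, ref in reference.items()
        if installed.get(name) != ref
    }


if __name__ == "__main__":
    print("Python {}.{}.{} (reference: 3.5.2)".format(*sys.version_info[:3]))
    for name, (have, want) in sorted(diff_versions(installed_versions(REFERENCE)).items()):
        print("{}: installed {}, reference {}".format(name, have, want))
```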
