Publication of the code we used in the RecSys Challenge 2018.
  • The code was tested with miniconda3 (Python 3.6.5)
  • All necessary libraries can be installed via requirements.txt
pip install -r requirements.txt


  • Place the playlist JSON files in the data/original directory and the challenge set in the data/online directory
  • Run the Python scripts in the following order:
    • (Combines and converts the files into CSV with int ids for tracks, artists, and albums)
    • (Converts the challenge set file into CSV while mapping the URIs to our int ids)
    • (Optional, uses the library spotipy to collect additional metadata for all tracks in the dataset)
      • USER, CLIENT_ID, and CLIENT_SECRET have to be adjusted in the file.
    • (Creates a sample of 50k random playlists with a test set of 500 playlists)
    • (Contains the code to individually compute the predictions for all employed methods for the 50k sample)
    • (Combines all the individual solutions in our hybrid approach for the 50k sample)
      • The creation of our "creative" solution is commented out because it depends on the optional, time-consuming metadata crawling.
    • (Converts our solution format to the official submission format)
  • Most of the scripts define FOLDER_TRAIN and FOLDER_TEST, along with other important parameters, at the top of the file
    • To reproduce our final submissions change those folders to 'data/data_formatted/' and 'data/online/', respectively.
    • As mentioned above, the creative solution is commented out; it can be re-enabled at line 53 of the hybrid script once the metadata is fully crawled.
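
The preprocessing and sampling steps above can be illustrated with a minimal sketch. It is not the code from this repository: the function names build_id_map and split_sample, the toy URIs, the seed, and the choice of -1 for URIs missing from the training data are all assumptions made for illustration. It shows one way to assign int ids to URIs, reuse that mapping for the challenge set, and draw a random sample with a held-out test set.

```python
import random


def build_id_map(uris):
    """Assign consecutive int ids to URIs in order of first appearance."""
    id_map = {}
    for uri in uris:
        if uri not in id_map:
            id_map[uri] = len(id_map)
    return id_map


def split_sample(playlist_ids, sample_size, test_size, seed=42):
    """Draw a random sample of playlists and hold out part of it as a test set.

    Returns (train_ids, test_ids); the two lists never overlap.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable samples
    sample = rng.sample(list(playlist_ids), sample_size)
    return sample[test_size:], sample[:test_size]


# Toy training data: the same track URI can occur in several playlists.
track_uris = ["spotify:track:a", "spotify:track:b", "spotify:track:a"]
track_ids = build_id_map(track_uris)

# The challenge set reuses the training mapping. Mapping unseen URIs
# to -1 is an assumption of this sketch, not the repository's behaviour.
challenge_uris = ["spotify:track:b", "spotify:track:z"]
challenge_ids = [track_ids.get(uri, -1) for uri in challenge_uris]

# Scaled-down stand-in for "50k playlists with a test set of 500".
train, test = split_sample(range(1000), sample_size=100, test_size=10)
```

With the real data, sample_size=50000 and test_size=500 would match the sample described in the list above.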