This repository releases the source code to pre-process the data files published together with the paper "Melon Playlist Dataset: a public dataset for audio-based playlist generation and music tagging" by Andres Ferraro, Yuntae Kim, Soohyeon Lee, Biho Kim, Namjun Jo, Semi Lim, Suyon Lim, Jungtaek Jang, Sehwan Kim, Xavier Serra, Dmitry Bogdanov.
After cloning this repository with
git clone repo-name
we suggest creating a virtual environment and installing the required Python dependencies with the following commands:
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Download the dataset from the public Melon link and place its files in
.original_dataset/
To pre-process the dataset, run the following command in the terminal:
python preprocess_data_with_pandas.py
Note that this operation is time-consuming. We have released preprocess_dataset to speed it up.
The resulting files will be stored in
.melon/*
with the following formats:
- dataset.tsv has playlist_id [TAB] song_id [TAB] 1 [TAB] sequence-order
- playlist_title.tsv has playlist_id [TAB] title
- ... same pattern in the other files
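As an illustration of how the files described above might be consumed, here is a minimal sketch that uses only the Python standard library. The function names load_playlists and load_titles are hypothetical and are not part of this repository. The sketch assumes the files are UTF-8 encoded, that each row of dataset.tsv has exactly four tab-separated fields, and that sequence-order is an integer; rows that do not match (for example a header row, if one exists) are skipped.

```python
import csv
from collections import defaultdict


def load_playlists(dataset_path):
    """Return {playlist_id: [song_id, ...]}, songs sorted by sequence-order.

    Hypothetical helper: assumes rows of the form
    playlist_id <TAB> song_id <TAB> 1 <TAB> sequence-order.
    """
    rows = defaultdict(list)
    with open(dataset_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for record in reader:
            if len(record) != 4:
                continue  # skip blank or malformed lines
            playlist_id, song_id, _flag, order = record
            try:
                position = int(order)
            except ValueError:
                continue  # skip rows without an integer order (e.g. a header)
            rows[playlist_id].append((position, song_id))
    return {
        pid: [song for _, song in sorted(items, key=lambda item: item[0])]
        for pid, items in rows.items()
    }


def load_titles(titles_path):
    """Return {playlist_id: title} from rows of playlist_id <TAB> title."""
    titles = {}
    with open(titles_path, newline="", encoding="utf-8") as f:
        for line in f:
            # Split on the first tab only, so the title is kept verbatim.
            playlist_id, sep, title = line.rstrip("\r\n").partition("\t")
            if sep:
                titles[playlist_id] = title
    return titles
```

Calling load_playlists with the path of the generated dataset.tsv and load_titles with the path of playlist_title.tsv gives two dictionaries keyed by playlist_id, which can then be joined to inspect a playlist's title next to its ordered songs.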