This repository provides the code for a music playlist title generation model. For details, please refer to our paper, Music Playlist Title Generation Using Artist Information. (The paper was accepted to the AAAI-23 Workshop on Creative AI Across Modalities.)
Melon Playlist Dataset: Download song_meta.json, test.json, train.json, and val.json from the official Melon Playlist Dataset webpage and place them in ./dataset/melon/data as shown below.
- dataset
  - melon
    - data
      - song_meta.json
      - test.json
      - train.json
      - val.json
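Before preprocessing, it can help to confirm that the four Melon files sit where the layout above expects them. The following is a minimal sketch and is not part of the repository; the helper name `missing_melon_files` is hypothetical, and only the file names and the `./dataset/melon/data` path come from this README.

```python
from pathlib import Path

# File names taken from the layout described above.
MELON_FILES = ("song_meta.json", "test.json", "train.json", "val.json")


def missing_melon_files(dataset_dir="./dataset"):
    """Return the expected Melon files absent from <dataset_dir>/melon/data."""
    data_dir = Path(dataset_dir) / "melon" / "data"
    return [name for name in MELON_FILES if not (data_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_melon_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All Melon files are in place.")
```

An empty list means every file was found; otherwise the list names the files that still need to be downloaded or moved.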
Million Playlist Dataset: Download the spotify_million_playlist_dataset.zip file from this webpage and unzip it. Place mpd.slice.0-999.json, ..., mpd.slice.999000-999999.json in ./dataset/million/data as shown below.
- dataset
  - million
    - data
      - mpd.slice.0-999.json
      - ...
      - mpd.slice.999000-999999.json
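Because the Million Playlist Dataset is split across many slice files, a quick completeness check can catch a partial unzip. The sketch below is not part of the repository. It assumes the archive holds 1,000 slice files of 1,000 playlists each, which is what the first and last file names above suggest; the helper names `expected_mpd_slices` and `missing_mpd_slices` are hypothetical.

```python
from pathlib import Path


def expected_mpd_slices(num_slices=1000, slice_size=1000):
    """List slice names from mpd.slice.0-999.json onward (assumed naming scheme)."""
    return [
        f"mpd.slice.{i * slice_size}-{(i + 1) * slice_size - 1}.json"
        for i in range(num_slices)
    ]


def missing_mpd_slices(dataset_dir="./dataset"):
    """Return the expected slice files absent from <dataset_dir>/million/data."""
    data_dir = Path(dataset_dir) / "million" / "data"
    return [name for name in expected_mpd_slices() if not (data_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_mpd_slices()
    print(f"{len(missing)} slice file(s) missing")
```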
### Preprocessed data is available here
(If you downloaded preprocessed data, skip this part.)
To filter out noisy data, as suggested in Section 3 of the paper, run the following code.
$ python preprocessing.py
We provide the following parameters.
- --dataset: chooses between the Melon Playlist Dataset and the Million Playlist Dataset. E.g. "melon", "million"
- --dataset_dir: sets the directory where the data is stored. Default: "./dataset" (This means that the train, valid and test sets are stored in ./dataset/{dataset_name}/sets and the tokenizers are stored in ./dataset/{dataset_name}/tokenizer.)
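To make the output locations concrete, the sketch below derives the two directories from the `--dataset` and `--dataset_dir` values described above. It only illustrates the path pattern stated in this README and is not the repository's own code; the function name `output_dirs` is hypothetical.

```python
from pathlib import Path


def output_dirs(dataset, dataset_dir="./dataset"):
    """Return (sets_dir, tokenizer_dir) following the documented path pattern."""
    if dataset not in ("melon", "million"):
        raise ValueError('dataset must be "melon" or "million"')
    root = Path(dataset_dir) / dataset
    return root / "sets", root / "tokenizer"


if __name__ == "__main__":
    sets_dir, tokenizer_dir = output_dirs("melon")
    print("train/valid/test sets:", sets_dir)
    print("tokenizers:", tokenizer_dir)
```

For example, passing `--dataset melon --dataset_dir ./dataset` to preprocessing.py would, under this pattern, write the splits to ./dataset/melon/sets and the tokenizers to ./dataset/melon/tokenizer.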
Run the following code to check statistics. (Figures are stored in ./figure.)
$ python statistics.py
To train the model on preprocessed training data, run the following code. (Models are stored in ./checkpoint.)
$ python train.py
- --input_type: "artist" for artist ID embedding and "track" for track ID embedding.
Run the following code to draw inferences. Each case is evaluated with the following metrics: BLEU( ./inference/{checkpoint_name}.
$ python infer.py
Note: test_file_name is set to test by default, but set it to highest_ft, lowest_ft, highest_fa, or lowest_fa to evaluate on the highest
Run the following code to get the Negative Log-Likelihood (NLL) on the test set. The result is saved in ./inference.
$ python test.py
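For readers unfamiliar with the metric, the sketch below shows how a negative log-likelihood is computed from the probabilities a model assigns to the reference tokens. It is a generic illustration, not the repository's test.py; whether test.py averages per token or per title is not stated here, so the per-token averaging in `mean_nll` is an assumption, and both function names are hypothetical.

```python
import math


def negative_log_likelihood(token_probs):
    """Sum of -log p over the probabilities assigned to the reference tokens."""
    return -sum(math.log(p) for p in token_probs)


def mean_nll(sequences):
    """Per-token average NLL over several sequences (assumed aggregation)."""
    total = sum(negative_log_likelihood(seq) for seq in sequences)
    count = sum(len(seq) for seq in sequences)
    return total / count


if __name__ == "__main__":
    # Toy probabilities for two short titles; real values would come from the model.
    titles = [[0.5, 0.25], [0.8, 0.1, 0.4]]
    print(f"mean NLL per token: {mean_nll(titles):.4f}")
```

Lower values mean the model assigns higher probability to the reference titles.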
This repository includes code from the following repositories with modifications:
Please cite our paper if you use this code in your work:
@InProceedings{kim2023,
title={Music Playlist Title Generation Using Artist Information},
author={Kim, Haven and Doh, Seungheon and Lee, Junwon and Nam, Juhan},
journal={AAAI-23 workshop on Creative AI Across Modalities},
year={2023}
}