This repository includes the LyCon dataset, introduced in our paper "LyCon: Lyrics Reconstruction from the Bag-of-Words Using Large Language Models", as well as a simple Python script for computing statistics about the dataset.
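The repository's own script is the reference for dataset statistics. As a rough illustration of the kind of summary such a script might compute, here is a minimal sketch. It assumes each song record is a dict with a hypothetical `lyrics` string field. The function name `lyrics_stats` and the lowercase whitespace tokenization are choices made for this example, not the repository's actual code.

```python
from collections import Counter
from statistics import mean


def lyrics_stats(records):
    """Summarize a list of song records.

    Hypothetical layout: each record is a dict with a "lyrics" key holding
    the full lyrics as one string. Adjust to the real LyCon file format.
    """
    # Lowercase whitespace tokenization: a deliberately simple choice.
    token_lists = [r["lyrics"].lower().split() for r in records]
    vocab = Counter(tok for toks in token_lists for tok in toks)
    return {
        "num_songs": len(records),
        "mean_words_per_song": mean(len(t) for t in token_lists) if records else 0.0,
        "vocab_size": len(vocab),
        "most_common": vocab.most_common(3),
    }


if __name__ == "__main__":
    # Toy placeholder records, not taken from LyCon.
    sample = [
        {"SID": "S1", "lyrics": "la la love"},
        {"SID": "S2", "lyrics": "love me do"},
    ]
    print(lyrics_stats(sample))
```

The `Counter` of word frequencies is also a plain bag-of-words view of the corpus: word order is discarded and only counts remain.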
If you are looking for copyright-free lyrics with mood or genre annotations, consider using this dataset in your project.
To obtain mood annotations, please visit the Deezer Mood Detection Dataset repository, which provides mood annotations corresponding to each SID. We will provide sample code soon. If you use these annotations, please cite the corresponding paper.
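Until the official sample code is available, the following minimal sketch shows one way to attach mood annotations to LyCon records by joining on SID. It assumes both sources have already been loaded as lists of dicts that share an `SID` column. The helper name `attach_annotations`, the toy CSV layouts, and the `valence`/`arousal` column names are hypothetical and may not match the actual files; the ID column in the Deezer files may need to be renamed or mapped first.

```python
import csv
import io


def attach_annotations(lycon_rows, annotation_rows, key="SID"):
    """Left-join annotation fields onto LyCon records by a shared ID column.

    Hypothetical: assumes both inputs are lists of dicts sharing `key`.
    Records without a matching annotation are kept unchanged.
    """
    # Index annotations by ID for constant-time lookup.
    index = {row[key]: row for row in annotation_rows}
    joined = []
    for rec in lycon_rows:
        merged = dict(rec)  # copy so the input records are not mutated
        extra = index.get(rec[key])
        if extra is not None:
            # Add annotation columns without overwriting LyCon's own fields.
            for col, val in extra.items():
                merged.setdefault(col, val)
        joined.append(merged)
    return joined


if __name__ == "__main__":
    # Toy placeholder CSV text; column names are assumptions.
    lycon_csv = "SID,lyrics\nS1,la la love\nS2,love me do\n"
    mood_csv = "SID,valence,arousal\nS1,0.4,-0.1\n"
    lycon = list(csv.DictReader(io.StringIO(lycon_csv)))
    mood = list(csv.DictReader(io.StringIO(mood_csv)))
    for row in attach_annotations(lycon, mood):
        print(row)
```

A left join keeps every LyCon record even when no mood annotation exists for it. The same helper could be reused for genre annotations, provided they can be keyed by the same ID.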
You can access genre annotations at this link. Sample code for the genre annotations will also be provided soon. If you use these annotations, please cite the corresponding paper.
If you find this dataset useful for your research, please consider citing our paper:
@misc{lycon,
title={LyCon: Lyrics Reconstruction from the Bag-of-Words Using Large Language Models},
author={Haven Kim and Kahyun Choi},
year={2024},
eprint={2408.14750},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2408.14750},
}