A system for generating emotionally and temporally synchronized music from video content.
Paper | Samples and Colab demo | Video presentation
If you use this work in your research, please consider citing our paper:
S. Sulun, P. Viana, and M. E. P. Davies, “Video Soundtrack Generation by Aligning Emotions and Temporal Boundaries,” IEEE Transactions on Multimedia, 2026.
Install required Python libraries:
```bash
pip install -r requirements.txt
```
Generate music for your video using the pre-trained models (the required models are downloaded automatically):

```bash
python inference.py --input_path sample.mp4 --output_path output/sample_output.mp4
```
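To generate soundtracks for several videos in one run, you can loop over the same CLI. The sketch below assumes only the `--input_path`/`--output_path` flags shown above; the `videos/` folder name is just an example:

```python
# Batch-generate soundtracks by calling the inference CLI once per video.
# Illustrative sketch: only the --input_path/--output_path flags shown above are assumed.
import subprocess
from pathlib import Path

input_dir = Path("videos")      # hypothetical folder containing input .mp4 files
output_dir = Path("output")
output_dir.mkdir(exist_ok=True)

for video in sorted(input_dir.glob("*.mp4")):
    out_path = output_dir / f"{video.stem}_output.mp4"
    subprocess.run(
        ["python", "inference.py",
         "--input_path", str(video),
         "--output_path", str(out_path)],
        check=True,
    )
```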
Download and extract the Lakh Pianoroll Dataset (LPD-5, full version):
- Homepage: https://hermandong.com/lakh-pianoroll-dataset/dataset
- Direct download: https://ucsdcloud-my.sharepoint.com/:u:/r/personal/h3dong_ucsd_edu/Documents/data/lpd/lpd_5/lpd_5_full.tar.gz?csf=1&web=1&e=sPANiy
```bash
# Extract the downloaded dataset
tar -xzf lpd_5_full.tar.gz

# Preprocess the extracted pianorolls
python -m midi.src.data.preprocess --input_dir lpd_5/lpd_5_full --output_dir lpd_5/processed
```
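Each LPD-5 file is a multitrack pianoroll stored as a compressed `.npz` archive. If you want to inspect a raw file before preprocessing, a minimal sketch using the `pypianoroll` package (assumed to be installed separately; the file path is a placeholder) could look like this:

```python
# Inspect one raw LPD-5 pianoroll file (illustrative sketch; replace the path with a
# real file from lpd_5/lpd_5_full and install pypianoroll separately if needed).
import pypianoroll

multitrack = pypianoroll.load("path/to/example.npz")  # placeholder path
print("resolution (time steps per beat):", multitrack.resolution)
for track in multitrack.tracks:
    print(track.name, "program:", track.program, "drums:", track.is_drum,
          "pianoroll shape:", track.pianoroll.shape)
```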
Train the music generation model (check config.py for hyperparameters):

```bash
python -m midi.src.train --data_dir lpd_5/processed
```
Navigate to the evaluation data directory and download the evaluation datasets:

```bash
cd evaluation/data
./download_emomv.sh
python download_ads.py
```
Run inference on all evaluation datasets:

```bash
cd ..
./run_inference_on_datasets.sh
```
Measure the synchronization between the generated music and the video content:

```bash
python get_av_alignment.py
```
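The exact metric is defined in get_av_alignment.py itself. Purely as an illustration of the idea of scoring temporal alignment, one common approach is an F-measure between note-onset times and video boundary times within a small tolerance window; the function and example values below are hypothetical, not the repository's implementation:

```python
# Illustrative sketch (not the repository's implementation): score how well note-onset
# times match video boundary times using an F-measure with a +/- tolerance (seconds).
import numpy as np

def alignment_f_measure(onsets, boundaries, tolerance=0.5):
    onsets = np.asarray(onsets, dtype=float)
    boundaries = np.asarray(boundaries, dtype=float)
    if len(onsets) == 0 or len(boundaries) == 0:
        return 0.0
    # A boundary counts as "hit" if some onset falls within the tolerance window.
    recall = (np.abs(boundaries[:, None] - onsets[None, :]).min(axis=1) <= tolerance).mean()
    # An onset counts as "matched" if it falls near some boundary.
    precision = (np.abs(onsets[:, None] - boundaries[None, :]).min(axis=1) <= tolerance).mean()
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: onsets at 1.0 s and 4.9 s versus scene cuts at 1.2 s and 5.0 s.
print(alignment_f_measure([1.0, 4.9], [1.2, 5.0]))  # -> 1.0
```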
Evaluate the emotional consistency between the video and the generated music:

```bash
python get_kl_divergence.py
```
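The script defines its own setup; as a general illustration, the KL divergence between two emotion-class probability distributions (for example, one predicted from the video and one from the generated music) can be computed with `scipy.stats.entropy`. The emotion labels and probabilities below are made up for demonstration:

```python
# Illustrative sketch: KL divergence between two emotion probability distributions.
import numpy as np
from scipy.stats import entropy

emotions = ["exciting", "tense", "sad", "relaxing"]   # hypothetical label set
p_video = np.array([0.50, 0.20, 0.10, 0.20])          # e.g., predicted from the video
q_music = np.array([0.40, 0.25, 0.15, 0.20])          # e.g., predicted from the music

# D_KL(p_video || q_music): 0 means identical distributions; larger means less consistent.
print(entropy(p_video, q_music))
```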
Analyze the survey results:

```bash
python analyze_surveys.py
```
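The survey analysis script expects its own input format. Purely as a hypothetical illustration (the file name and the "condition"/"rating" columns below are made up, not the repository's actual schema), summarizing per-condition listener ratings with pandas might look like this:

```python
# Hypothetical illustration: summarize listener ratings per experimental condition.
import pandas as pd

df = pd.read_csv("survey_responses.csv")  # placeholder file with made-up columns
summary = df.groupby("condition")["rating"].agg(["mean", "std", "count"])
print(summary.sort_values("mean", ascending=False))
```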