WuYun: Exploring hierarchical skeleton-guided melody generation using knowledge-enhanced deep learning

Mentors: Kejun Zhang*, Tan Xu, Lingyun Sun
Authors: Xinda Wu*, Tieyao Zhang, Zhijie Huang, Liang Qihao, and Songruoyao Wu

∗ Equal contribution

WuYun (悟韵): Paper arXiv | Demo Page | ...

Official PyTorch implementation of the preprint "WuYun: Exploring hierarchical skeleton-guided melody generation using knowledge-enhanced deep learning" (updated to version 3 in 2024-02, adding chord tone analysis).

Intro

WuYun (悟韵) is a knowledge-enhanced deep learning architecture for improving the structure of generated melodies. Inspired by the hierarchical organization principle of structure and prolongation, we decompose melody generation into two stages, melodic skeleton construction and melody inpainting: the first generates the most structurally important notes to form a melodic skeleton, and the second infills the skeleton with decorative notes to produce a full-fledged melody. Specifically, we introduce a melodic skeleton extraction framework that draws on music domain knowledge along the rhythm and pitch dimensions, which helps the sequence learning model hallucinate or predict a novel melodic skeleton. The reconstructed melodic skeletons serve as additional knowledge that guides the melody generation process, and they are kept as the underlying framework of the final generated melody.

Architecture of WuYun.

Installation

Clone this repository

cd /WuYun-Torch

Dependencies (Ours)

NVIDIA GPU + CUDA + CUDNN
python 3.8.5
Required packages:
- miditoolkit
- torch 2.0.1
- others... (install whatever you are missing)

Data Preprocessing

core code: ./preprocessing/mdp_wuyun.py
doc: ./preprocessing/README.md

Core functions:

Select 4/4 time signature (requirement: >= 8 bars)
Track Classification (midi-miner): lead melody, chord, bass, drum, and others.
MIDI Quantization (straight notes and triplets) (WuYun)
Octave Transposition
Filter MIDI files by heuristic rules
Deduplication (pitch interval)
~~Chord Recognition (Magenta)~~
~~Tonality Unification (WuYun)~~
...
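As a rough illustration of the quantization step, the sketch below snaps a note onset to the nearest position on a grid that mixes straight and triplet subdivisions. The function names, the default subdivisions (4 straight and 3 triplet positions per beat), and the tie-breaking toward the earlier position are assumptions made for this example; the actual rules live in ./preprocessing/mdp_wuyun.py and may differ.

```python
from fractions import Fraction


def build_grid(ticks_per_beat, straight_div=4, triplet_div=3):
    """Return the sorted tick offsets inside one beat for both subdivisions.

    straight_div and triplet_div are illustrative defaults, not repo settings.
    """
    positions = {Fraction(ticks_per_beat * i, straight_div) for i in range(straight_div)}
    positions |= {Fraction(ticks_per_beat * i, triplet_div) for i in range(triplet_div)}
    return sorted(positions)


def quantize_tick(tick, ticks_per_beat, grid=None):
    """Snap an absolute tick to the nearest straight or triplet grid position."""
    grid = grid or build_grid(ticks_per_beat)
    beat, offset = divmod(tick, ticks_per_beat)
    # Include the start of the next beat so late notes can snap forward.
    candidates = grid + [Fraction(ticks_per_beat)]
    nearest = min(candidates, key=lambda g: abs(g - offset))
    return beat * ticks_per_beat + round(nearest)


if __name__ == "__main__":
    for raw in (125, 158, 475, 790):
        print(raw, "->", quantize_tick(raw, ticks_per_beat=480))
```

Keeping both grids in one candidate list means a slightly late triplet is not forced onto the sixteenth-note grid, which is the point of quantizing straight notes and triplets together.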

Note: For the detailed melody data processing procedure, please refer to WuYun and MelodyGLM
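The pitch-interval deduplication step can be pictured as follows: two melodies that differ only by transposition share the same sequence of pitch intervals, so only the first is kept. This is a minimal sketch under that assumption; `interval_signature` and `deduplicate` are hypothetical names, and the repository may also take rhythm or partial matches into account.

```python
def interval_signature(pitches):
    """Return the sequence of semitone steps between consecutive pitches."""
    return tuple(b - a for a, b in zip(pitches, pitches[1:]))


def deduplicate(melodies):
    """Keep the first melody seen for each interval signature.

    melodies: dict mapping a name to a list of MIDI pitch numbers.
    Returns the names of the melodies that are kept, in input order.
    """
    seen = set()
    kept = []
    for name, pitches in melodies.items():
        signature = interval_signature(pitches)
        if signature in seen:
            continue  # same contour as an earlier melody, possibly transposed
        seen.add(signature)
        kept.append(name)
    return kept


if __name__ == "__main__":
    corpus = {
        "a.mid": [60, 62, 64],
        "b.mid": [65, 67, 69],  # "a.mid" transposed up a fourth
        "c.mid": [60, 64, 67],
    }
    print(deduplicate(corpus))
```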

Melodic Skeleton Extraction

Extract the type of melodic skeleton you need with the class Melody_Skeleton_Extractor in the code directory ./preprocessing/utils/melodic_skeleton.
In the table below, Type is the kind of melodic skeleton and Ratio is the proportion of all notes that it retains.

No.	Type	Ratio	Code
0	Down Beat	~39.79%	melodic_skeleton_analysis_rhythm.py
1	Long Note	~22.13%	melodic_skeleton_analysis_rhythm.py
2	Rhythm	~44.49%	melodic_skeleton_analysis_rhythm.py
3	Rhythm ∩ Chord Tones ∩ Tonal Tones	~14.76%	melodic_skeleton.py
4	Rhythm ∩ Chord Tones	~35.24%	melodic_skeleton.py
5	Rhythm ∩ Tonal Tones	~17.6%	melodic_skeleton.py
6	Syncopation	~8.7%	melodic_skeleton_analysis_rhythm.py
7	Tonal Tones	~28.46%	melodic_skeleton_analysis_tonal_tones.py

For the latest version of the popular music melodic skeleton extraction algorithm, please refer to the code.
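The combined skeleton types in the table (3, 4, and 5) are intersections of simpler note selections. The sketch below shows only that set algebra on note indices and how a ratio could be computed. `combine_skeletons` is a hypothetical helper, not part of Melody_Skeleton_Extractor, and the rules that decide which notes count as rhythmic, chord, or tonal tones are left to the repository code.

```python
def combine_skeletons(num_notes, rhythm, chord_tones, tonal_tones):
    """Intersect per-criterion note index sets into skeleton types.

    num_notes: total number of notes in the melody.
    rhythm, chord_tones, tonal_tones: sets of note indices selected by each
    criterion (how they are selected is outside the scope of this sketch).
    Returns {type_number: (sorted_indices, ratio_of_all_notes)}.
    """
    types = {
        2: rhythm,
        3: rhythm & chord_tones & tonal_tones,
        4: rhythm & chord_tones,
        5: rhythm & tonal_tones,
        7: tonal_tones,
    }
    return {t: (sorted(idx), len(idx) / num_notes) for t, idx in types.items()}


if __name__ == "__main__":
    result = combine_skeletons(
        num_notes=10,
        rhythm={0, 2, 4, 6, 8},
        chord_tones={0, 1, 4, 8},
        tonal_tones={0, 3, 8},
    )
    for type_number, (indices, ratio) in result.items():
        print(type_number, indices, f"{ratio:.0%}")
```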

WuYun Framework

Stage1 - Melodic Skeleton Construction

1. build dictionary

# prepare your chord vocabulary (optional)
python3 dataset/statistic.py

# build your pre-defined vocabulary
python3 modules/build_dictionary.py

2. tokenization

python3 models/skeleton/dataloader.py

3. train skeleton generation model

# to use another kind of melodic skeleton, change the type number to match your dataset
# for example
python3 models/skeleton/main.py --type 4 --gpu_id 4   # 'Rhythm ∩ Chord Tones'

4. inference melodic skeleton from scratch
Note: Objective metrics don't directly reflect subjective quality, so try a few more model checkpoints after the model converges.

# for example
python3 models/skeleton/inference.py --type 4 --gpu_id 2 --ckpt_fn 'ckpt_epoch_400.pth.tar' --epoch 400

Stage2 - Melodic Prolongation Realization (melodic decoration)

1. tokenization

python3 models/prolongation/dataloader.py

2. train melodic prolongation model

# for example
python3 models/prolongation/main.py --type 4 --gpu_id 8   # 'Rhythm ∩ Chord Tones'

3. inference from real melodic skeletons (decorate melodic skeletons extracted from human-composed music)

# for example
python3 models/prolongation/inference_real.py --type 4 --gpu_id 0 --ckpt_fn 'ckpt_epoch_25.pt' --epoch '25'

4. inference from generated melodic skeletons (decorate AI-generated melodic skeletons)

# for example
python3 models/prolongation/inference_scratch.py --type 4 --gpu_id 0 --ckpt_fn 'ckpt_epoch_25.pt' --pro_epoch '25' --ske_epoch '400'
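When trying several checkpoints, it can be convenient to build the two inference commands from a few parameters. The helper below is a sketch that only reassembles the flags shown in the examples above. The checkpoint filename patterns are copied from those examples and are assumptions about your own run, and the idea that --ske_epoch should match the epoch used for skeleton inference is presumed from the examples rather than confirmed.

```python
import subprocess


def skeleton_inference_cmd(skeleton_type, gpu_id, epoch):
    """Stage 1 command: sample melodic skeletons from a trained skeleton model."""
    return [
        "python3", "models/skeleton/inference.py",
        "--type", str(skeleton_type),
        "--gpu_id", str(gpu_id),
        # Filename pattern taken from the example; adjust to your checkpoints.
        "--ckpt_fn", f"ckpt_epoch_{epoch}.pth.tar",
        "--epoch", str(epoch),
    ]


def prolongation_scratch_cmd(skeleton_type, gpu_id, pro_epoch, ske_epoch):
    """Stage 2 command: decorate previously generated skeletons."""
    return [
        "python3", "models/prolongation/inference_scratch.py",
        "--type", str(skeleton_type),
        "--gpu_id", str(gpu_id),
        "--ckpt_fn", f"ckpt_epoch_{pro_epoch}.pt",
        "--pro_epoch", str(pro_epoch),
        "--ske_epoch", str(ske_epoch),
    ]


def run_from_scratch(skeleton_type=4, gpu_id=0, ske_epoch=400, pro_epoch=25):
    """Run both stages in order; stops at the first failing command."""
    commands = (
        skeleton_inference_cmd(skeleton_type, gpu_id, ske_epoch),
        prolongation_scratch_cmd(skeleton_type, gpu_id, pro_epoch, ske_epoch),
    )
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_from_scratch()` from the repository root would run both stages with the values used in the examples.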

Evaluation

Evaluation Metrics list:

OA(PCH)
OA(IOI)
SE

code dir: './evaluation'
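To give a feel for the OA metrics, the sketch below assumes that OA stands for overlapped area, PCH for pitch class histogram, and IOI for inter-onset interval, and it measures overlap as a plain histogram intersection between two melodies. This is a deliberate simplification for illustration: the code in ./evaluation may compare distributions differently (for example, over a whole dataset), and SE is not covered here.

```python
from collections import Counter


def pitch_class_histogram(pitches):
    """Normalized 12-bin histogram of pitch classes (C=0 ... B=11)."""
    counts = Counter(p % 12 for p in pitches)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * 12
    return [counts.get(pc, 0) / total for pc in range(12)]


def ioi_histogram(onsets, bins):
    """Normalized histogram of inter-onset intervals over the given bin values."""
    iois = [b - a for a, b in zip(onsets, onsets[1:])]
    if not iois:
        return [0.0] * len(bins)
    counts = Counter(iois)
    return [counts.get(b, 0) / len(iois) for b in bins]


def overlapped_area(hist_a, hist_b):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


if __name__ == "__main__":
    generated = [60, 62, 64, 65]
    reference = [60, 62, 64, 67]
    oa_pch = overlapped_area(
        pitch_class_histogram(generated), pitch_class_histogram(reference)
    )
    print(f"OA(PCH) = {oa_pch:.2f}")
```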

Add Accompaniment

You can write chord and bass tracks if the task is melody generation with a chord progression.

python3 utils/add_chord_bass_track.py
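Conceptually, adding accompaniment means turning each chord of the progression into sustained chord notes plus a bass note on the root. The sketch below shows that mapping in plain Python. It assumes one chord per 4/4 bar, four basic triad qualities, and fixed octaves, all of which are illustrative choices; it does not reproduce utils/add_chord_bass_track.py, and the resulting note tuples would still need to be written to MIDI tracks (for example with miditoolkit).

```python
NOTE_TO_PC = {
    "C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
    "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11,
}
# Semitone offsets from the root for a few triad qualities (illustrative subset).
QUALITY_INTERVALS = {
    "maj": (0, 4, 7),
    "min": (0, 3, 7),
    "dim": (0, 3, 6),
    "aug": (0, 4, 8),
}


def chord_to_notes(root, quality, chord_octave=4, bass_octave=2):
    """Return (chord_pitches, bass_pitch) as MIDI numbers, with C4 = 60."""
    pc = NOTE_TO_PC[root]
    chord_base = 12 * (chord_octave + 1) + pc
    bass_pitch = 12 * (bass_octave + 1) + pc
    return [chord_base + i for i in QUALITY_INTERVALS[quality]], bass_pitch


def build_tracks(progression, ticks_per_bar=1920):
    """Turn [(root, quality), ...] into chord and bass note lists.

    Each note is a (start_tick, end_tick, pitch) tuple; one chord per bar.
    """
    chord_track, bass_track = [], []
    for bar, (root, quality) in enumerate(progression):
        start = bar * ticks_per_bar
        end = start + ticks_per_bar
        chord_pitches, bass_pitch = chord_to_notes(root, quality)
        chord_track.extend((start, end, p) for p in chord_pitches)
        bass_track.append((start, end, bass_pitch))
    return chord_track, bass_track


if __name__ == "__main__":
    chords, bass = build_tracks([("C", "maj"), ("G", "maj"), ("A", "min"), ("F", "maj")])
    print(len(chords), "chord notes,", len(bass), "bass notes")
```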

WuYun System Design (closed beta test)

WuYun System.

Citation

@article{zhang2023wuyun,
  title={WuYun: Exploring hierarchical skeleton-guided melody generation using knowledge-enhanced deep learning},
  author={Zhang, Kejun and Wu, Xinda and Zhang, Tieyao and Huang, Zhijie and Tan, Xu and Liang, Qihao and Wu, Songruoyao and Sun, Lingyun},
  journal={arXiv preprint arXiv:2301.04488},
  year={2023}
}

@article{wu2023melodyglm,
  title={MelodyGLM: Multi-task Pre-training for Symbolic Melody Generation},
  author={Wu, Xinda and Huang, Zhijie and Zhang, Kejun and Yu, Jiaxing and Tan, Xu and Zhang, Tieyao and Wang, Zihao and Sun, Lingyun},
  journal={arXiv preprint arXiv:2309.10738},
  year={2023}
}

Acknowledgement

We thank the following authors, who made their code available or provided technical support:

Music Transformer: https://github.com/gwinndr/MusicTransformer-Pytorch
Compound Word Transformer: https://github.com/YatingMusic/compound-word-transformer
Melons: Yi Zou.


NEXTLab-ZJU/wuyun
