music_segmentation

282 final project. SALAMI data: https://github.com/DDMAL/salami-data-public

Project timeline: project due May 5th

Todos:

Data collection By 19th Apr (Week 1 Fri)

Set up cloud instance [Nami]
Download audio files from SALAMI [Jon]
Download labels and mark each boundary as a 4-second window around the section onset [Jinlin]
Transform audio files to spectrograms and format each frame as a vector [Jinlin]
Feed the data through an identity map as a sanity check
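
A minimal sketch of the boundary-marking step, assuming frame-level 0/1 labels are wanted and that a frame counts as "boundary" when its center lies within half a window of a section onset. The function name `frame_boundary_labels` and the parameters `num_frames` and `frame_duration` are hypothetical, not taken from the notebooks.

```python
def frame_boundary_labels(onsets, num_frames, frame_duration, window=4.0):
    """Return one 0/1 label per frame.

    A frame is labeled 1 when its center time lies within window / 2
    seconds of any section onset (assumption: frame-level labels).
    """
    half = window / 2
    labels = [0] * num_frames
    for i in range(num_frames):
        center = (i + 0.5) * frame_duration  # center time of frame i, in seconds
        if any(abs(center - onset) <= half for onset in onsets):
            labels[i] = 1
    return labels


# Toy example: one onset at 5 s, ten 1-second frames.
labels = frame_boundary_labels(onsets=[5.0], num_frames=10, frame_duration=1.0)
print(labels)  # [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
```

With 1-second frames the 4-second window covers exactly four frames; shorter frames give a proportionally longer run of 1s.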

Building network architecture By 21st Apr (Week 1 Sun)

LSTM
Attention (encoder-only self-attention?)

Training By 26th Apr (Week 2 Fri)
Evaluation By 3rd May (Week 3 Fri)

Distance metric for the first model (boundary or not-boundary): a BLEU-like precision, i.e. overlap with ground truth divided by prediction sequence length
Cross-entropy loss for the second model (label/prediction discrepancy)
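
One possible reading of the two evaluation measures, as a sketch. It assumes frame-level 0/1 boundary sequences and interprets "prediction sequence length" as the number of frames predicted as boundary; dividing by the total number of frames instead would be a different (also plausible) reading. The names `boundary_precision` and `cross_entropy` are hypothetical.

```python
import math


def boundary_precision(predicted, truth):
    """Fraction of predicted boundary frames that are also true boundary frames."""
    if len(predicted) != len(truth):
        raise ValueError("sequences must have the same length")
    predicted_positive = sum(predicted)
    if predicted_positive == 0:
        return 0.0  # no boundary predicted: define precision as 0
    overlap = sum(1 for p, t in zip(predicted, truth) if p == 1 and t == 1)
    return overlap / predicted_positive


def cross_entropy(probabilities, true_index):
    """Cross-entropy of one prediction: negative log probability of the true label."""
    return -math.log(probabilities[true_index])


truth = [0, 0, 1, 1, 1, 1, 0, 0]
predicted = [0, 1, 1, 1, 0, 0, 0, 0]
print(boundary_precision(predicted, truth))   # 2 overlapping frames out of 3 predicted
print(cross_entropy([0.7, 0.2, 0.1], 0))      # loss when the true label has probability 0.7
```

Like BLEU's n-gram precision, this rewards predictions that overlap the reference but does not penalize missed boundaries, so it may be worth pairing with a recall-style measure.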

Notes on network structure

Two different models: the first predicts where each section boundary is (a pair of timestamps, start and end) to segment the music into sections; the second classifies each section as one of the section labels.

We need to train these two models sequentially because section classification depends on having a good enough boundary-predicting model.

First model (boundary model)

input: spectrogram at each time step
output: (start, end), two timestamps marking where the boundary starts and ends
label: (start, end), computed from a 4-second window centered at the section onset time (e.g. if the data looks like 1:11 Bridge, 2:40 Chorus, the boundaries are (1:09, 1:13), (2:38, 2:42))
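
The label computation can be sketched as follows. The `M:SS Label` input layout is taken from the example above and is an assumption; the actual SALAMI annotation files may use a different layout. All function names are hypothetical.

```python
def parse_timestamp(text):
    """Convert 'M:SS' (or 'H:MM:SS') into seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def format_timestamp(seconds):
    """Convert seconds back into 'M:SS' for display."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}:{secs:02d}"


def boundary_windows(annotations, window=4.0):
    """Return a (start, end) pair in seconds for each 'M:SS Label' annotation."""
    half = window / 2
    windows = []
    for line in annotations:
        onset_text, _label = line.split(maxsplit=1)
        onset = parse_timestamp(onset_text)
        # Clamp at 0 so a boundary near the start of the track stays valid.
        windows.append((max(0.0, onset - half), onset + half))
    return windows


windows = boundary_windows(["1:11 Bridge", "2:40 Chorus"])
print([(format_timestamp(s), format_timestamp(e)) for s, e in windows])
# [('1:09', '1:13'), ('2:38', '2:42')]
```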

Second model (section model) - sequential version

input: spectrogram at each time step; frames that fall into a previously predicted boundary range are replaced with a delimiter/token vector representation (ASK)
intermediate hidden state: the representation of each section is captured by the activation at the last time step before a boundary delimiter
output: each section representation generates a prediction of what that section is
label: section label
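
A sketch of the delimiter idea on toy data, independent of any deep learning framework. It assumes a frame-level 0/1 boundary mask from the first model, and it additionally treats the last frame of the track as the end of the final section, which the notes above do not specify. The names `insert_delimiters`, `section_end_indices`, and the `"<SEP>"` token are hypothetical.

```python
DELIMITER = "<SEP>"  # stand-in for a learned delimiter vector


def insert_delimiters(frames, boundary_mask, delimiter=DELIMITER):
    """Replace every frame inside a predicted boundary range with the delimiter."""
    return [delimiter if is_boundary else frame
            for frame, is_boundary in zip(frames, boundary_mask)]


def section_end_indices(boundary_mask):
    """Indices of the last non-boundary frame before each delimiter run.

    The hidden state at these time steps would serve as the section
    representation. Assumption: the final frame also closes a section.
    """
    indices = []
    for i, is_boundary in enumerate(boundary_mask):
        if is_boundary:
            continue
        at_end = i + 1 == len(boundary_mask)
        if at_end or boundary_mask[i + 1]:
            indices.append(i)
    return indices


frames = [f"f{i}" for i in range(8)]
mask = [0, 0, 0, 1, 1, 0, 0, 0]
print(insert_delimiters(frames, mask))
# ['f0', 'f1', 'f2', '<SEP>', '<SEP>', 'f5', 'f6', 'f7']
print(section_end_indices(mask))  # [2, 7]
```

In a real model the returned indices would be used to gather the recurrent network's per-step outputs, giving one vector per section to feed the classifier.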

Questions:

What is a good size for frame/time step of spectrogram?
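
To make the trade-off concrete, the sketch below shows how the hop length between frames translates into time resolution and into the number of frames covered by one 4-second boundary window. The 22050 Hz sample rate and the hop lengths are assumed values for illustration only, not settings taken from this project.

```python
def frame_stats(sample_rate, hop_length, window_seconds=4.0):
    """Return (seconds per frame, frames per boundary window)."""
    frame_duration = hop_length / sample_rate
    frames_per_window = window_seconds / frame_duration
    return frame_duration, frames_per_window


SAMPLE_RATE = 22050  # assumed sample rate in Hz

for hop_length in (512, 2048, 11025):
    duration, per_window = frame_stats(SAMPLE_RATE, hop_length)
    print(f"hop={hop_length:>5}  frame={duration:.4f} s  frames per 4 s window={per_window:.1f}")
```

Shorter frames give finer boundary timing but longer input sequences for the LSTM; longer frames do the opposite.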
