Merged
3 changes: 3 additions & 0 deletions book/_static/myfile.css
@@ -0,0 +1,3 @@
body {
font-family: system-ui;
}
22 changes: 11 additions & 11 deletions book/data/datasets.md
@@ -4,17 +4,17 @@
## Overview
Here's a quick overview of existing datasets for Music Source Separation:

| **Dataset** | **Year** | **Genre** | **Instrument categories** | **Tracks** | **Avgerage duration (s)** | **Full songs** | **Stereo** |
| ---------- | -------- | --------- | ------------------------- | ---------- | ------------------------- | -------------- | ---------- |
| [MASS](http://www.mtg.upf.edu/download/datasets/mass) | 2008 | ? | ? | 9 | 16 $\pm$ 7 | ❌ | ✅️ |
| [MIR-1K](https://sites.google.com/site/unvoicedsoundseparation/mir-1k) | 2010 | ? | 2 | 1,000 | 8 $\pm$ 8 | ❌ | ❌ |
| [QUASI](http://www.tsi.telecom-paristech.fr/aao/en/2012/03/12/quasi/) | 2011 | ? | ? | 5 | 206 $\pm$ 21 | ✅ | ✅ |
| [ccMixter](http://www.loria.fr/~aliutkus/kam/) | 2014 | ? | ? | 50 | 231 $\pm$ 77 | ✅ | ✅ |
| [MedleyDB](http://medleydb.weebly.com/) | 2014 | ? | 82 | 63 | 206 $\pm$ 121 | ✅ | ✅ |
| [iKala](http://mac.citi.sinica.edu.tw/ikala/) | 2015 | ? | 2 | 206 | 30 | ❌ | ❌ |
| [DSD100](/datasets/dsd100.md)| 2015 | ? | 4 | 100 | 251 $\pm$ 60 | ✅ | ✅ |
| [MUSDB18](https://sigsep.github.io/datasets/musdb.html) | 2017 | ? | 4 | 150 | 236 $\pm$ 95 | ✅ | ✅ |
| [Slakh2100](http://www.slakh.com/) | 2019 | ? | 34 | 2100 | ? | ✅ | ? |
| **Dataset** | **Year** | **Instrument categories** | **Tracks** | **Average duration (s)** | **Full songs** | **Stereo** |
| ---------- | -------- | ------------------------- | ---------- | ------------------------- | -------------- | ---------- |
| [MASS](http://www.mtg.upf.edu/download/datasets/mass) | 2008 | N/A | 9 | 16 $\pm$ 7 | ❌ | ✅️ |
| [MIR-1K](https://sites.google.com/site/unvoicedsoundseparation/mir-1k) | 2010 | N/A | 1,000 | 8 $\pm$ 8 | ❌ | ❌ |
| [QUASI](http://www.tsi.telecom-paristech.fr/aao/en/2012/03/12/quasi/) | 2011 | N/A | 5 | 206 $\pm$ 21 | ✅ | ✅ |
| [ccMixter](http://www.loria.fr/~aliutkus/kam/) | 2014 | N/A | 50 | 231 $\pm$ 77 | ✅ | ✅ |
| [MedleyDB](http://medleydb.weebly.com/) | 2014 | 82 | 63 | 206 $\pm$ 121 | ✅ | ✅ |
| [iKala](http://mac.citi.sinica.edu.tw/ikala/) | 2015 | 2 | 206 | 30 | ❌ | ❌ |
| [DSD100](/datasets/dsd100.md)| 2015 | 4 | 100 | 251 $\pm$ 60 | ✅ | ✅ |
| [MUSDB18](https://sigsep.github.io/datasets/musdb.html) | 2017 | 4 | 150 | 236 $\pm$ 95 | ✅ | ✅ |
| [Slakh2100](http://www.slakh.com/) | 2019 | 34 | 2100 | 249 | ✅ | |

This extended table is based on [SigSep/datasets](https://sigsep.github.io/datasets/) and is reproduced with permission.

<!--- | [MUSDB18-HQ](https://sigsep.github.io/datasets/musdb.html) | 2019 | ? | ? | 150 | 236 $\pm$ 95 | ✅ | ✅ |) # omitted since almost identical to MUSDB18 --->
33 changes: 26 additions & 7 deletions book/data/introduction.md
@@ -2,21 +2,34 @@
# Introduction

In this chapter we'll cover the key aspects we need to know about data for source separation: what data for source
separation look like, relevant datasets and, importantly, how to programatically generate training and evaluation data
to minimize the time we spend data wrangling and maximize the performance we can squeeze out of our data.
separation look like, relevant datasets and, importantly, how to programmatically generate training data (mixtures)
in a way that's efficient, reproducible, and maximizes the performance we can squeeze out of our data.

## Data for music source separation

The inputs and outputs of source separation model look like this:
At a high level, the inputs and outputs of a source separation model look like this:

PLACEHOLDER: image showing mixture --> model --> stems
```{figure} ../images/data/source_separation_io.png
---
height: 300px
name: fig-sourcesepio
---
Inputs and outputs of a source separation model.
```

For this tutorial, we will assume the inputs and outputs were created in the following way:
1. Each instrument or voice is recorded in isolation into a separate audio track, called a "stem". The stem may be
processed with effects such as compression, reverb, etc.
2. The mixture is obtained by summing the processed stems.

PLACEHOLDER: diagram of simplified mixing process
3. The model takes the mixture as input and outputs its estimate of each stem.

```{figure} ../images/data/music_mixing.png
---
height: 300px
name: fig-mixing
---
Mixing stems to produce a mixture (mix).
```
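Under the simplified model above, the mixture is the sample-wise sum of the processed stems. The following is a minimal NumPy sketch. It assumes the stems are stored as a single array of shape `(n_stems, n_channels, n_samples)` and uses a per-stem gain as a stand-in for real effects processing. The `mix_stems` helper is hypothetical and not part of the book's code.

```python
import numpy as np


def mix_stems(stems, gains_db=None):
    """Apply an optional per-stem gain, then sum the stems into a mixture.

    stems: array of shape (n_stems, n_channels, n_samples).
    gains_db: optional sequence of n_stems gains in dB; a simple stand-in
        for effects processing such as compression or reverb (an assumption).
    Returns (processed_stems, mixture).
    """
    stems = np.asarray(stems, dtype=np.float64)
    if gains_db is not None:
        gains = 10.0 ** (np.asarray(gains_db, dtype=np.float64) / 20.0)
        stems = stems * gains[:, None, None]  # broadcast one gain per stem
    mixture = stems.sum(axis=0)  # the mix is the sample-wise sum of the stems
    return stems, mixture


# Usage: 4 random stereo "stems", 1 second long at 44.1 kHz.
rng = np.random.default_rng(0)
stems = rng.uniform(-0.25, 0.25, size=(4, 2, 44100))
processed, mix = mix_stems(stems, gains_db=[0.0, -3.0, -6.0, 0.0])
print(mix.shape)  # (2, 44100)
```

Because the mixture is a plain sum, the processed stems always add back up to the mix exactly. This is the property that lets the original stems serve as training targets.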

```{note}
This is a simplified view of music creation. In practice, the mixture (musicians refer to this as the *mix*) typically
@@ -31,7 +44,13 @@ as input, the model outputs the estimated stems, and we compare these to the ori
mixture. The difference between the estimated stems and the original stems is used to update the model parameters during
training:

PLACEHOLDER: block diagram of training
```{figure} ../images/data/source_separation_training.png
---
height: 300px
name: fig-training
---
High-level diagram of training a source separation model.
```

The difference between the estimated stems and original stems is also used to *evaluate* a trained source separation model,
as we shall see later on.
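The same comparison between estimated and original stems can drive both training and evaluation. The sketch below makes two illustrative choices that may differ from what the book uses later: an L1 loss for training, and a simplified signal-to-distortion ratio (SDR) for evaluation. The simplified SDR is not the full BSS Eval SDR, such as the one computed by the `museval` package. The helpers `l1_loss` and `simple_sdr` are hypothetical.

```python
import numpy as np


def l1_loss(estimates, references):
    """Mean absolute difference between estimated and reference stems."""
    return float(np.mean(np.abs(estimates - references)))


def simple_sdr(estimate, reference, eps=1e-10):
    """Simplified signal-to-distortion ratio in dB for one stem.

    Treats everything that differs from the reference as distortion;
    this is NOT the full BSS Eval SDR.
    """
    signal_energy = np.sum(reference ** 2)
    distortion_energy = np.sum((reference - estimate) ** 2)
    return float(10.0 * np.log10((signal_energy + eps) / (distortion_energy + eps)))


# Usage: pretend the model's estimates are the true stems plus a little noise.
rng = np.random.default_rng(0)
references = rng.standard_normal((4, 2, 44100))  # (n_stems, n_channels, n_samples)
estimates = references + 0.1 * rng.standard_normal(references.shape)
loss = l1_loss(estimates, references)  # training: minimize w.r.t. model parameters
sdrs = [simple_sdr(e, r) for e, r in zip(estimates, references)]  # evaluation: higher is better
```

Here the noise has 1/100 of the reference energy, so each simplified SDR comes out at roughly 20 dB.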
12 changes: 8 additions & 4 deletions book/data/scaper.ipynb
@@ -1293,7 +1293,13 @@
"\n",
"That's because we just generated an *incoherent mixture*, i.e., a mixture where the stems are not necessarily from the same song, and even if they are, they are not necessarily temporally aligned:\n",
"\n",
"PLACEHOLDER FOR INCOHERENT MIXTURE GRAPHIC\n",
"```{figure} ../images/data/incoherent_vs_coherent_mixing.png\n",
"---\n",
"height: 400px\n",
"name: fig-incoherent_vs_coherent_mixing\n",
"---\n",
"Incoherent mixing vs coherent mixing.\n",
"```\n",
"\n",
"We can verify this by listening to the individual stems:"
]
@@ -1430,12 +1436,10 @@
"source": [
"## Coherent mixing\n",
"\n",
"To generate cohernet mixtures, we need to ensure that:\n",
"To generate coherent mixtures (cf. {ref}`fig-incoherent_vs_coherent_mixing`), we need to ensure that:\n",
"1. All stem source files belong to the same song\n",
"2. We use the same time offset for sampling all source files (i.e., same `source_time`)\n",
"\n",
"PLACEHOLDER FOR COHERENT MIXING DIAGRAM\n",
"\n",
"Let's see how this is done. The following code will:\n",
"1. Define a random seed\n",
"2. Create a Scaper object\n",
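The notebook satisfies the two coherent-mixing requirements with Scaper, using a shared `source_time`. The contrast between incoherent and coherent sampling can also be shown without Scaper. The following NumPy-only sketch assumes each song is a dict that maps stem names to arrays of shape `(n_channels, n_samples)`, and that every song has the same stem names. `sample_incoherent` and `sample_coherent` are hypothetical helpers, not Scaper functions.

```python
import numpy as np


def sample_incoherent(songs, rng, excerpt_len):
    """Each stem comes from an independently chosen song and time offset."""
    song_ids = sorted(songs)
    stem_names = sorted(songs[song_ids[0]])  # assumes all songs share stem names
    excerpt = {}
    for name in stem_names:
        song = songs[song_ids[rng.integers(len(song_ids))]]  # new song per stem
        audio = song[name]
        start = int(rng.integers(0, audio.shape[-1] - excerpt_len + 1))  # new offset per stem
        excerpt[name] = audio[..., start:start + excerpt_len]
    return excerpt


def sample_coherent(songs, rng, excerpt_len):
    """All stems come from ONE song and share ONE time offset (same source_time)."""
    song_ids = sorted(songs)
    song = songs[song_ids[rng.integers(len(song_ids))]]  # 1. same song for every stem
    n_samples = min(audio.shape[-1] for audio in song.values())
    start = int(rng.integers(0, n_samples - excerpt_len + 1))  # 2. one shared offset
    return {name: audio[..., start:start + excerpt_len] for name, audio in song.items()}


# Usage with toy data: 3 "songs", each with 4 stereo stems of 1,000 samples.
data_rng = np.random.default_rng(1)
stem_names = ["bass", "drums", "other", "vocals"]
songs = {f"song{i}": {s: data_rng.standard_normal((2, 1000)) for s in stem_names}
         for i in range(3)}
coherent = sample_coherent(songs, np.random.default_rng(0), excerpt_len=200)
incoherent = sample_incoherent(songs, np.random.default_rng(0), excerpt_len=200)
```

Passing an explicitly seeded `np.random.Generator` makes both samplers reproducible, in the same spirit as defining a random seed before creating the Scaper object.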
Binary file added book/images/data/music_mixing.png
Binary file added book/images/data/source_separation_io.png
Binary file added book/images/data/source_separation_training.png