Commit 9fd3dcd

Merge pull request #4 from source-separation/datapolish
Polish data section
2 parents d818fce + af66042 commit 9fd3dcd

8 files changed (+48 -22 lines changed)

book/_static/myfile.css

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+body {
+font-family: system-ui;
+}

book/data/datasets.md

Lines changed: 11 additions & 11 deletions
@@ -4,17 +4,17 @@
 ## Overview
 Here's a quick overview of existing datasets for Music Source Separation:
 
-| **Dataset** | **Year** | **Genre** | **Instrument categories** | **Tracks** | **Avgerage duration (s)** | **Full songs** | **Stereo** |
-| ---------- | -------- | --------- | ------------------------- | ---------- | ------------------------- | -------------- | ---------- |
-| [MASS](http://www.mtg.upf.edu/download/datasets/mass) | 2008 | ? | ? | 9 | 16 $\pm$ 7 || ✅️ |
-| [MIR-1K](https://sites.google.com/site/unvoicedsoundseparation/mir-1k) | 2010 | ? | 2 | 1,000 | 8 $\pm$ 8 |||
-| [QUASI](http://www.tsi.telecom-paristech.fr/aao/en/2012/03/12/quasi/) | 2011 | ? | ? | 5 | 206 $\pm$ 21 |||
-| [ccMixter](http://www.loria.fr/~aliutkus/kam/) | 2014 | ? | ? | 50 | 231 $\pm$ 77 |||
-| [MedleyDB](http://medleydb.weebly.com/) | 2014 | ? | 82 | 63 | 206 $\pm$ 121 |||
-| [iKala](http://mac.citi.sinica.edu.tw/ikala/) | 2015 | ? | 2 | 206 | 30 |||
-| [DSD100](/datasets/dsd100.md)| 2015 | ? | 4 | 100 | 251 $\pm$ 60 |||
-| [MUSDB18](https://sigsep.github.io/datasets/musdb.html) | 2017 | ? | 4 | 150 | 236 $\pm$ 95 |||
-| [Slakh2100](http://www.slakh.com/) | 2019 | ? | 34 | 2100 | ? || ? |
+| **Dataset** | **Year** | **Instrument categories** | **Tracks** | **Avgerage duration (s)** | **Full songs** | **Stereo** |
+| ---------- | -------- | ------------------------- | ---------- | ------------------------- | -------------- | ---------- |
+| [MASS](http://www.mtg.upf.edu/download/datasets/mass) | 2008 | N/A | 9 | 16 $\pm$ 7 || ✅️ |
+| [MIR-1K](https://sites.google.com/site/unvoicedsoundseparation/mir-1k) | 2010 | N/A | 1,000 | 8 $\pm$ 8 |||
+| [QUASI](http://www.tsi.telecom-paristech.fr/aao/en/2012/03/12/quasi/) | 2011 | N/A | 5 | 206 $\pm$ 21 |||
+| [ccMixter](http://www.loria.fr/~aliutkus/kam/) | 2014 | N/A | 50 | 231 $\pm$ 77 |||
+| [MedleyDB](http://medleydb.weebly.com/) | 2014 | 82 | 63 | 206 $\pm$ 121 |||
+| [iKala](http://mac.citi.sinica.edu.tw/ikala/) | 2015 | 2 | 206 | 30 |||
+| [DSD100](/datasets/dsd100.md)| 2015 | 4 | 100 | 251 $\pm$ 60 |||
+| [MUSDB18](https://sigsep.github.io/datasets/musdb.html) | 2017 | 4 | 150 | 236 $\pm$ 95 |||
+| [Slakh2100](http://www.slakh.com/) | 2019 | 34 | 2100 | 249 || |
 This extended table is based on: [SigSep/datasets](https://sigsep.github.io/datasets/), and reproduced with permission.
 
 <!--- | [MUSDB18-HQ](https://sigsep.github.io/datasets/musdb.html) | 2019 | ? | ? | 150 | 236 $\pm$ 95 | ✅ | ✅ |) # omitted since almost identical to MUSDB18 --->

book/data/introduction.md

Lines changed: 26 additions & 7 deletions
@@ -2,21 +2,34 @@
 # Introduction
 
 In this chapter we'll cover the key aspects we need to know about data for source separation: what do data for source
-separation look like, relevant datasets and, importantly, how to programatically generate training and evaluation data
-to minimize the time we spend data wrangling and maximize the performance we can squeeze out of our data.
+separation look like, relevant datasets and, importantly, how to programatically generate training data (mixtures)
+in a way that's efficient, reproducible, and maximizes the performance we can squeeze out of our data.
 
 ## Data for music source separation
 
-The inputs and outputs of source separation model look like this:
+At a high level, the inputs and outputs of a source separation model look like this:
 
-PLACEHOLDER: image showing mixture --> model --> stems
+```{figure} ../images/data/source_separation_io.png
+---
+height: 300px
+name: fig-sourcesepio
+---
+Inputs and outputs of a source separation model.
+```
 
 For this tutorial, we will assume the inputs and outputs were created in the following way:
 1. Each instrument or voice is recorded in isolation into a separate audio track, called a "stem". The stem may be
 processed with effects such as compression, reverb, etc.
 2. The mixture is obtained by summing the processed stems.
-
-PLACEHOLDER: diagram of simplified mixing process
+3. The model takes the mixture as input and outputs its estimate of each stem
+
+```{figure} ../images/data/music_mixing.png
+---
+height: 300px
+name: fig-mixing
+---
+Mixing stems to produce a mixture (mix).
+```
 
 ```{note}
 This is a simplified view of music creation. In practice, the mixture (musicians refer to this as the *mix*) typically
@@ -31,7 +44,13 @@ as input, the model outputs the estimated stems, and we compare these to the ori
 mixture. The difference between the estimated stems and the original stems is used to update the model parameters during
 training:
 
-PLACEHOLDER: block diagram of training
+```{figure} ../images/data/source_separation_training.png
+---
+height: 300px
+name: fig-training
+---
+High-level diagram of training a source separation model.
+```
 
 The difference between the estimated stems and original stems is also used to *evaluate* a trained source separation model,
 as we shall see later on.
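
The data model described in this file (stems summed into a mixture, estimated stems compared against the originals) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and not part of the commit: the helper names `mix_stems` and `mean_squared_error`, the random noise standing in for audio, and the deliberately naive "model" that gives every stem an equal share of the mixture. Mean squared error is only one possible way to measure the difference; the tutorial text does not say which measure it uses.

```python
import numpy as np


def mix_stems(stems):
    """Sum equal-length stems (name -> 1-D array) into a single mixture."""
    return np.sum(np.stack(list(stems.values())), axis=0)


def mean_squared_error(estimate, reference):
    """Average squared sample-wise difference between two signals."""
    return float(np.mean((estimate - reference) ** 2))


rng = np.random.default_rng(0)
n_samples = 44100  # illustrative: one second at 44.1 kHz

# Random noise stands in for recorded and processed stems (assumption).
stems = {
    name: 0.1 * rng.standard_normal(n_samples)
    for name in ("vocals", "drums", "bass")
}

# Step 2 of the description: the mixture is the sum of the processed stems.
mixture = mix_stems(stems)

# Hypothetical placeholder "model": every stem gets an equal share of the mixture.
estimates = {name: mixture / len(stems) for name in stems}

# Difference between estimated and original stems, one value per stem.
errors = {
    name: mean_squared_error(estimates[name], stems[name]) for name in stems
}
```

In a real training loop the per-stem differences would drive parameter updates; here they are only computed to show what is being compared.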

book/data/scaper.ipynb

Lines changed: 8 additions & 4 deletions
@@ -1293,7 +1293,13 @@
 "\n",
 "That's because we just generated an *incoherent mixture*, i.e., a mixture where the stems are not necessarily from the same song, and even if they are, they are not necessarily temporally aligned:\n",
 "\n",
-"PLACEHOLDER FOR INCOHERENT MIXTURE GRAPHIC\n",
+"```{figure} ../images/data/incoherent_vs_coherent_mixing.png\n",
+"---\n",
+"height: 400px\n",
+"name: fig-incoherent_vs_coherent_mixing\n",
+"---\n",
+"Incoherent mixing vs coherent mixing.\n",
+"```\n",
 "\n",
 "We can verify this by listening to the individual stems:"
 ]
@@ -1430,12 +1436,10 @@
 "source": [
 "## Coherent mixing\n",
 "\n",
-"To generate cohernet mixtures, we need to ensure that:\n",
+"To generate cohernet mixtures (cf. {ref}`fig-incoherent_vs_coherent_mixing`), we need to ensure that:\n",
 "1. All stem source files belong to the same song\n",
 "2. We use the same time offset for sampling all source files (i.e., same `source_time`)\n",
 "\n",
-"PLACEHOLDER FOR COHERENT MIXING DIAGRAM\n",
-"\n",
 "Let's see how this is done. The following code will:\n",
 "1. Define a random seed\n",
 "2. Create a Scaper object\n",
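
The two conditions for coherent mixing quoted in this hunk (all stems from the same song, same time offset for every stem) can be sketched independently of Scaper. The following is a toy NumPy illustration, not the notebook's code and not the Scaper API: `make_toy_songs`, `sample_mixture`, and `STEM_NAMES` are hypothetical names, and random noise stands in for audio.

```python
import numpy as np

STEM_NAMES = ("vocals", "drums", "bass")  # hypothetical stem labels


def make_toy_songs(rng, n_songs=3, n_samples=1000):
    """Build song -> stem name -> noise signal (stand-in for real audio)."""
    return {
        f"song{i}": {name: rng.standard_normal(n_samples) for name in STEM_NAMES}
        for i in range(n_songs)
    }


def sample_mixture(songs, duration, rng, coherent):
    """Return (mixture, choices), where choices maps stem -> (song, offset)."""
    song_ids = sorted(songs)

    def draw():
        # Pick a random song and a random start offset that fits the duration.
        song = song_ids[int(rng.integers(len(song_ids)))]
        n_samples = len(songs[song][STEM_NAMES[0]])
        offset = int(rng.integers(0, n_samples - duration + 1))
        return song, offset

    # Coherent: one (song, offset) pair shared by every stem.
    shared = draw() if coherent else None
    choices, excerpts = {}, []
    for name in STEM_NAMES:
        # Incoherent: every stem draws its own song and offset.
        song, offset = shared if coherent else draw()
        choices[name] = (song, offset)
        excerpts.append(songs[song][name][offset:offset + duration])
    return np.sum(excerpts, axis=0), choices


rng = np.random.default_rng(0)
songs = make_toy_songs(rng)
coherent_mix, coherent_choices = sample_mixture(songs, 200, rng, coherent=True)
incoherent_mix, incoherent_choices = sample_mixture(songs, 200, rng, coherent=False)
```

Inspecting `coherent_choices` shows a single song and offset repeated for every stem, whereas `incoherent_choices` draws them independently per stem, so they may or may not coincide.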
222 KB

book/images/data/music_mixing.png

193 KB
181 KB
195 KB
