source-separation
diff --git a/book/_static/myfile.css b/book/_static/myfile.css
Lines changed: 3 additions & 0 deletions
diff --git a/book/data/datasets.md b/book/data/datasets.md
Lines changed: 11 additions & 11 deletions
diff --git a/book/data/introduction.md b/book/data/introduction.md
Lines changed: 26 additions & 7 deletions
diff --git a/book/data/scaper.ipynb b/book/data/scaper.ipynb
Lines changed: 8 additions & 4 deletions
diff --git a/book/images/data/incoherent_vs_coherent_mixing.png b/book/images/data/incoherent_vs_coherent_mixing.png
222 KB
diff --git a/book/images/data/music_mixing.png b/book/images/data/music_mixing.png
193 KB
diff --git a/book/images/data/source_separation_io.png b/book/images/data/source_separation_io.png
181 KB
diff --git a/book/images/data/source_separation_training.png b/book/images/data/source_separation_training.png
195 KB
@@ -0,0 +1,3 @@
+body {
+  font-family: system-ui;
+}
@@ -4,17 +4,17 @@
 ## Overview
 Here's a quick overview of existing datasets for Music Source Separation:
 
-| **Dataset** | **Year** | **Genre** | **Instrument categories** | **Tracks** | **Avgerage duration (s)** | **Full songs** | **Stereo** |
-| ----------  | -------- | --------- | ------------------------- | ---------- | ------------------------- | -------------- | ---------- |
-| [MASS](http://www.mtg.upf.edu/download/datasets/mass) | 2008 | ? | ? | 9 | 16 $\pm$ 7 | ❌ | ✅️ |
-| [MIR-1K](https://sites.google.com/site/unvoicedsoundseparation/mir-1k) | 2010 | ? | 2 | 1,000 | 8 $\pm$ 8 | ❌ | ❌ |
-| [QUASI](http://www.tsi.telecom-paristech.fr/aao/en/2012/03/12/quasi/) | 2011 | ? | ? | 5 | 206 $\pm$ 21 | ✅ | ✅ |
-| [ccMixter](http://www.loria.fr/~aliutkus/kam/)  | 2014 | ? | ? | 50 | 231 $\pm$ 77 | ✅ | ✅ |
-| [MedleyDB](http://medleydb.weebly.com/) | 2014 | ? | 82 | 63 | 206 $\pm$ 121 | ✅ | ✅ |
-| [iKala](http://mac.citi.sinica.edu.tw/ikala/)  | 2015 | ? |  2  | 206 | 30 | ❌ | ❌ |
-| [DSD100](/datasets/dsd100.md)| 2015 | ? | 4 | 100 | 251 $\pm$ 60 | ✅ | ✅ |
-| [MUSDB18](https://sigsep.github.io/datasets/musdb.html) | 2017 | ? | 4 | 150 | 236 $\pm$ 95 | ✅ | ✅ | 
-| [Slakh2100](http://www.slakh.com/) | 2019 | ? | 34 | 2100 | ? | ✅ | ? |  
+| **Dataset** | **Year** |  **Instrument categories** | **Tracks** | **Average duration (s)** | **Full songs** | **Stereo** |
+| ----------  | -------- |  ------------------------- | ---------- | ------------------------- | -------------- | ---------- |
+| [MASS](http://www.mtg.upf.edu/download/datasets/mass) | 2008 | N/A | 9 | 16 $\pm$ 7 | ❌ | ✅️ |
+| [MIR-1K](https://sites.google.com/site/unvoicedsoundseparation/mir-1k) | 2010 | N/A | 1,000 | 8 $\pm$ 8 | ❌ | ❌ |
+| [QUASI](http://www.tsi.telecom-paristech.fr/aao/en/2012/03/12/quasi/) | 2011 | N/A | 5 | 206 $\pm$ 21 | ✅ | ✅ |
+| [ccMixter](http://www.loria.fr/~aliutkus/kam/)  | 2014 | N/A | 50 | 231 $\pm$ 77 | ✅ | ✅ |
+| [MedleyDB](http://medleydb.weebly.com/) | 2014 | 82 | 63 | 206 $\pm$ 121 | ✅ | ✅ |
+| [iKala](http://mac.citi.sinica.edu.tw/ikala/)  | 2015 |  2  | 206 | 30 | ❌ | ❌ |
+| [DSD100](/datasets/dsd100.md)| 2015 | 4 | 100 | 251 $\pm$ 60 | ✅ | ✅ |
+| [MUSDB18](https://sigsep.github.io/datasets/musdb.html) | 2017 | 4 | 150 | 236 $\pm$ 95 | ✅ | ✅ | 
+| [Slakh2100](http://www.slakh.com/) | 2019 | 34 | 2100 | 249 | ✅ | ❌ |  
 This extended table is based on: [SigSep/datasets](https://sigsep.github.io/datasets/), and reproduced with permission.
 
 <!--- | [MUSDB18-HQ](https://sigsep.github.io/datasets/musdb.html) | 2019 | ? | ? | 150 | 236 $\pm$ 95 | ✅ | ✅ |)  # omitted since almost identical to MUSDB18 --->
 
@@ -2,21 +2,34 @@
 # Introduction
 
 In this chapter we'll cover the key aspects we need to know about data for source separation: what do data for source 
-separation look like, relevant datasets and, importantly, how to programatically generate training and evaluation data 
-to minimize the time we spend data wrangling and maximize the performance we can squeeze out of our data.
+separation look like, relevant datasets and, importantly, how to programmatically generate training data (mixtures)
+in a way that's efficient, reproducible, and maximizes the performance we can squeeze out of our data.
 
 ## Data for music source separation
 
-The inputs and outputs of source separation model look like this:
+At a high level, the inputs and outputs of a source separation model look like this:
 
-PLACEHOLDER: image showing mixture --> model --> stems
+```{figure} ../images/data/source_separation_io.png
+---
+height: 300px
+name: fig-sourcesepio
+---
+Inputs and outputs of a source separation model.
+```
 
 For this tutorial, we will assume the inputs and outputs were created in the following way:
 1. Each instrument or voice is recorded in isolation into a separate audio track, called a "stem". The stem may be 
 processed with effects such as compression, reverb, etc.
 2. The mixture is obtained by summing the processed stems.
-
-PLACEHOLDER: diagram of simplified mixing process 
+3. The model takes the mixture as input and outputs its estimate of each stem.
+
+```{figure} ../images/data/music_mixing.png
+---
+height: 300px
+name: fig-mixing
+---
+Mixing stems to produce a mixture (mix).
+```
 
 ```{note}
 This is a simplified view of music creation. In practice, the mixture (musicians refer to this as the *mix*) typically 
@@ -31,7 +44,13 @@ as input, the model outputs the estimated stems, and we compare these to the ori
 mixture. The difference between the estimated stems and the original stems is used to update the model parameters during 
 training:
 
-PLACEHOLDER: block diagram of training
+```{figure} ../images/data/source_separation_training.png
+---
+height: 300px
+name: fig-training
+---
+High-level diagram of training a source separation model.
+```
 
 The difference between the estimated stems and original stems is also used to *evaluate* a trained source separation model,
 as we shall see later on.
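
The simplified view in this hunk (stems are summed into a mixture, and the difference between estimated and original stems drives training and evaluation) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy sample values, the helper names `mix` and `l1_difference`, the choice of mean absolute difference as the measure, and the naive "model" that splits the mixture equally. None of it comes from the book or from any library.

```python
# Hypothetical toy data: each stem is a short list of audio samples.
stems = {
    "vocals": [0.5, -0.25, 0.0, 0.25],
    "drums": [0.25, 0.25, -0.5, 0.0],
}


def mix(stems):
    """Sum time-aligned stems sample by sample to obtain the mixture."""
    return [sum(samples) for samples in zip(*stems.values())]


def l1_difference(estimates, references):
    """Mean absolute difference between estimated and original stems."""
    total, count = 0.0, 0
    for name, reference in references.items():
        for est, ref in zip(estimates[name], reference):
            total += abs(est - ref)
            count += 1
    return total / count


mixture = mix(stems)

# A deliberately naive stand-in for a model: give every stem an equal
# share of the mixture. A real model would be trained to shrink the
# difference computed below.
estimates = {name: [m / len(stems) for m in mixture] for name in stems}
loss = l1_difference(estimates, stems)
print(mixture, loss)
```

A perfect estimate would give a difference of zero, which is the direction training pushes the model parameters in.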
 
@@ -1293,7 +1293,13 @@
     "\n",
     "That's because we just generated an *incoherent mixture*, i.e., a mixture where the stems are not necessarily from the same song, and even if they are, they are not necessarily temporally aligned:\n",
     "\n",
-    "PLACEHOLDER FOR INCOHERENT MIXTURE GRAPHIC\n",
+    "```{figure} ../images/data/incoherent_vs_coherent_mixing.png\n",
+    "---\n",
+    "height: 400px\n",
+    "name: fig-incoherent_vs_coherent_mixing\n",
+    "---\n",
+    "Incoherent mixing vs coherent mixing.\n",
+    "```\n",
     "\n",
     "We can verify this by listening to the individual stems:"
    ]
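
The distinction drawn in this cell can be sketched without any audio: an incoherent mixture picks a song and a time offset independently for each stem, while a coherent mixture uses one song and one offset for all stems. The snippet below is a plain-Python illustration only; it does not use the Scaper API, and the catalogue `SONGS`, the label list, and both sampling helpers are hypothetical names introduced for this example.

```python
import random

# Hypothetical catalogue: song name -> stem label -> duration in seconds.
SONGS = {
    "song_a": {"vocals": 200.0, "drums": 200.0, "bass": 200.0},
    "song_b": {"vocals": 180.0, "drums": 180.0, "bass": 180.0},
    "song_c": {"vocals": 240.0, "drums": 240.0, "bass": 240.0},
}
LABELS = ["vocals", "drums", "bass"]


def sample_incoherent(rng, clip_duration):
    """Choose the song and the start offset independently for every stem."""
    picks = {}
    for label in LABELS:
        song = rng.choice(sorted(SONGS))
        offset = rng.uniform(0.0, SONGS[song][label] - clip_duration)
        picks[label] = (song, offset)
    return picks


def sample_coherent(rng, clip_duration):
    """Choose one song and one start offset, shared by all stems."""
    song = rng.choice(sorted(SONGS))
    max_offset = min(SONGS[song].values()) - clip_duration
    offset = rng.uniform(0.0, max_offset)
    return {label: (song, offset) for label in LABELS}


rng = random.Random(0)  # fixed seed so the draw is reproducible
incoherent = sample_incoherent(rng, clip_duration=10.0)
coherent = sample_coherent(rng, clip_duration=10.0)
print(incoherent)
print(coherent)
```

In the coherent case every stem carries the same `(song, offset)` pair, which is what keeps the stems musically and temporally aligned.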
@@ -1430,12 +1436,10 @@
    "source": [
     "## Coherent mixing\n",
     "\n",
-    "To generate cohernet mixtures, we need to ensure that:\n",
+    "To generate coherent mixtures (cf. {ref}`fig-incoherent_vs_coherent_mixing`), we need to ensure that:\n",
     "1. All stem source files belong to the same song\n",
     "2. We use the same time offset for sampling all source files (i.e., same `source_time`)\n",
     "\n",
-    "PLACEHOLDER FOR COHERENT MIXING DIAGRAM\n",
-    "\n",
     "Let's see how this is done. The following code will:\n",
     "1. Define a random seed\n",
     "2. Create a Scaper object\n",