diff --git a/book/_static/myfile.css b/book/_static/myfile.css
new file mode 100644
index 0000000..2b6a92a
--- /dev/null
+++ b/book/_static/myfile.css
@@ -0,0 +1,3 @@
+body {
+ font-family: system-ui;
+}
\ No newline at end of file
diff --git a/book/data/datasets.md b/book/data/datasets.md
index 2efeb7b..c5bb795 100644
--- a/book/data/datasets.md
+++ b/book/data/datasets.md
@@ -4,17 +4,17 @@ ## Overview
 Here's a quick overview of existing datasets for Music Source Separation:
-| **Dataset** | **Year** | **Genre** | **Instrument categories** | **Tracks** | **Avgerage duration (s)** | **Full songs** | **Stereo** |
-| ---------- | -------- | --------- | ------------------------- | ---------- | ------------------------- | -------------- | ---------- |
-| [MASS](http://www.mtg.upf.edu/download/datasets/mass) | 2008 | ? | ? | 9 | 16 $\pm$ 7 | ❌ | ✅️ |
-| [MIR-1K](https://sites.google.com/site/unvoicedsoundseparation/mir-1k) | 2010 | ? | 2 | 1,000 | 8 $\pm$ 8 | ❌ | ❌ |
-| [QUASI](http://www.tsi.telecom-paristech.fr/aao/en/2012/03/12/quasi/) | 2011 | ? | ? | 5 | 206 $\pm$ 21 | ✅ | ✅ |
-| [ccMixter](http://www.loria.fr/~aliutkus/kam/) | 2014 | ? | ? | 50 | 231 $\pm$ 77 | ✅ | ✅ |
-| [MedleyDB](http://medleydb.weebly.com/) | 2014 | ? | 82 | 63 | 206 $\pm$ 121 | ✅ | ✅ |
-| [iKala](http://mac.citi.sinica.edu.tw/ikala/) | 2015 | ? | 2 | 206 | 30 | ❌ | ❌ |
-| [DSD100](/datasets/dsd100.md)| 2015 | ? | 4 | 100 | 251 $\pm$ 60 | ✅ | ✅ |
-| [MUSDB18](https://sigsep.github.io/datasets/musdb.html) | 2017 | ? | 4 | 150 | 236 $\pm$ 95 | ✅ | ✅ |
-| [Slakh2100](http://www.slakh.com/) | 2019 | ? | 34 | 2100 | ? | ✅ | ? |
+| **Dataset** | **Year** | **Instrument categories** | **Tracks** | **Average duration (s)** | **Full songs** | **Stereo** |
+| ---------- | -------- | ------------------------- | ---------- | ------------------------ | -------------- | ---------- |
+| [MASS](http://www.mtg.upf.edu/download/datasets/mass) | 2008 | N/A | 9 | 16 $\pm$ 7 | ❌ | ✅️ |
+| [MIR-1K](https://sites.google.com/site/unvoicedsoundseparation/mir-1k) | 2010 | N/A | 1,000 | 8 $\pm$ 8 | ❌ | ❌ |
+| [QUASI](http://www.tsi.telecom-paristech.fr/aao/en/2012/03/12/quasi/) | 2011 | N/A | 5 | 206 $\pm$ 21 | ✅ | ✅ |
+| [ccMixter](http://www.loria.fr/~aliutkus/kam/) | 2014 | N/A | 50 | 231 $\pm$ 77 | ✅ | ✅ |
+| [MedleyDB](http://medleydb.weebly.com/) | 2014 | 82 | 63 | 206 $\pm$ 121 | ✅ | ✅ |
+| [iKala](http://mac.citi.sinica.edu.tw/ikala/) | 2015 | 2 | 206 | 30 | ❌ | ❌ |
+| [DSD100](/datasets/dsd100.md)| 2015 | 4 | 100 | 251 $\pm$ 60 | ✅ | ✅ |
+| [MUSDB18](https://sigsep.github.io/datasets/musdb.html) | 2017 | 4 | 150 | 236 $\pm$ 95 | ✅ | ✅ |
+| [Slakh2100](http://www.slakh.com/) | 2019 | 34 | 2100 | 249 | ✅ | ❌ |
 This extended table is based on: [SigSep/datasets](https://sigsep.github.io/datasets/), and reproduced with permission.
diff --git a/book/data/introduction.md b/book/data/introduction.md
index 32e6333..f0e14f7 100644
--- a/book/data/introduction.md
+++ b/book/data/introduction.md
@@ -2,21 +2,34 @@ # Introduction
 In this chapter we'll cover the key aspects we need to know about data for source
-separation look like, relevant datasets and, importantly, how to programatically generate training and evaluation data
-to minimize the time we spend data wrangling and maximize the performance we can squeeze out of our data.
+separation look like, relevant datasets and, importantly, how to programmatically generate training data (mixtures)
+in a way that's efficient, reproducible, and maximizes the performance we can squeeze out of our data.
 ## Data for music source separation
-The inputs and outputs of source separation model look like this:
+At a high level, the inputs and outputs of a source separation model look like this:
-PLACEHOLDER: image showing mixture --> model --> stems
+```{figure} ../images/data/source_separation_io.png
+---
+height: 300px
+name: fig-sourcesepio
+---
+Inputs and outputs of a source separation model.
+```
 For this tutorial, we will assume the inputs and outputs were created in the following way:
 1. Each instrument or voice is recorded in isolation into a separate audio track, called a "stem". The stem may be processed with effects such as compression, reverb, etc.
 2. The mixture is obtained by summing the processed stems.
-
-PLACEHOLDER: diagram of simplified mixing process
+3. The model takes the mixture as input and outputs its estimate of each stem.
+
+```{figure} ../images/data/music_mixing.png
+---
+height: 300px
+name: fig-mixing
+---
+Mixing stems to produce a mixture (mix).
+```
 ```{note}
 This is a simplified view of music creation. In practice, the mixture (musicians refer to this as the *mix*) typically
@@ -31,7 +44,13 @@ as input, the model outputs the estimated stems, and we compare these to the ori
 mixture. The difference between the estimated stems and the original stems is used to update the model parameters during training:
-PLACEHOLDER: block diagram of training
+```{figure} ../images/data/source_separation_training.png
+---
+height: 300px
+name: fig-training
+---
+High-level diagram of training a source separation model.
+```
 The difference between the estimated stems and original stems is also used to *evaluate* a trained source separation model, as we shall see later on.
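The mixing model in steps 1–3 of the hunk above boils down to sample-wise addition of the stems. A minimal sketch of that assumption (the sine-wave "stems" and sample rate below are synthetic stand-ins, not the book's data):

```python
import numpy as np

# Assumed setup (illustrative only): each processed stem is an
# equal-length mono float array at a shared sample rate.
sr = 8000                 # stand-in sample rate (Hz)
t = np.arange(sr) / sr    # one second of sample times
stems = {
    "vocals": 0.5 * np.sin(2 * np.pi * 220 * t),
    "bass":   0.3 * np.sin(2 * np.pi * 55 * t),
}

# Step 2: the mixture (the "mix") is simply the sum of the stems.
mixture = sum(stems.values())
```

A separation model then receives `mixture` and must produce estimates of each entry in `stems`; the discrepancy between the two drives both training and evaluation.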
diff --git a/book/data/scaper.ipynb b/book/data/scaper.ipynb
index 74b71df..5af22a5 100644
--- a/book/data/scaper.ipynb
+++ b/book/data/scaper.ipynb
@@ -1293,7 +1293,13 @@
 "\n",
 "That's because we just generated an *incoherent mixture*, i.e., a mixture where the stems are not necessarily from the same song, and even if they are, they are not necessarily temporally aligned:\n",
 "\n",
- "PLACEHOLDER FOR INCOHERENT MIXTURE GRAPHIC\n",
+ "```{figure} ../images/data/incoherent_vs_coherent_mixing.png\n",
+ "---\n",
+ "height: 400px\n",
+ "name: fig-incoherent_vs_coherent_mixing\n",
+ "---\n",
+ "Incoherent mixing vs coherent mixing.\n",
+ "```\n",
 "\n",
 "We can verify this by listening to the individual stems:"
 ]
@@ -1430,12 +1436,10 @@
 "source": [
 "## Coherent mixing\n",
 "\n",
- "To generate cohernet mixtures, we need to ensure that:\n",
+ "To generate coherent mixtures (cf. {ref}`fig-incoherent_vs_coherent_mixing`), we need to ensure that:\n",
 "1. All stem source files belong to the same song\n",
 "2. We use the same time offset for sampling all source files (i.e., same `source_time`)\n",
 "\n",
- "PLACEHOLDER FOR COHERENT MIXING DIAGRAM\n",
- "\n",
 "Let's see how this is done. The following code will:\n",
 "1. Define a random seed\n",
 "2. Create a Scaper object\n",
diff --git a/book/images/data/incoherent_vs_coherent_mixing.png b/book/images/data/incoherent_vs_coherent_mixing.png
new file mode 100644
index 0000000..431ac34
Binary files /dev/null and b/book/images/data/incoherent_vs_coherent_mixing.png differ
diff --git a/book/images/data/music_mixing.png b/book/images/data/music_mixing.png
new file mode 100644
index 0000000..8cd2795
Binary files /dev/null and b/book/images/data/music_mixing.png differ
diff --git a/book/images/data/source_separation_io.png b/book/images/data/source_separation_io.png
new file mode 100644
index 0000000..a736496
Binary files /dev/null and b/book/images/data/source_separation_io.png differ
diff --git a/book/images/data/source_separation_training.png b/book/images/data/source_separation_training.png
new file mode 100644
index 0000000..fd6d6b8
Binary files /dev/null and b/book/images/data/source_separation_training.png differ
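The coherent-vs-incoherent sampling contrast in the scaper.ipynb hunk can be sketched without Scaper itself. Everything below is hypothetical (the catalog, file names, and helper functions are stand-ins, not Scaper's API): incoherent sampling draws each stem independently, while coherent sampling fixes one song and one shared `source_time`-style offset.

```python
import random

# Hypothetical catalog mapping songs to their stem files (illustrative names).
catalog = {
    "song_a": ["song_a/vocals.wav", "song_a/drums.wav", "song_a/bass.wav"],
    "song_b": ["song_b/vocals.wav", "song_b/drums.wav", "song_b/bass.wav"],
}

def sample_incoherent(catalog, rng):
    # Each stem slot draws its own song and its own time offset,
    # so the resulting mixture need not sound like any real song.
    return [
        (rng.choice(list(catalog.values()))[i], rng.uniform(0, 60))
        for i in range(3)
    ]

def sample_coherent(catalog, rng):
    # Requirement 1: all stems come from the same song.
    song = rng.choice(list(catalog))
    # Requirement 2: one shared offset for every stem (same `source_time`).
    offset = rng.uniform(0, 60)
    return [(stem, offset) for stem in catalog[song]]

rng = random.Random(42)  # fixed seed, mirroring step 1 ("define a random seed")
incoherent = sample_incoherent(catalog, rng)
coherent = sample_coherent(catalog, rng)
```

In Scaper the same two requirements are met through its event specification rather than a catalog like this; the sketch only captures the sampling logic.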