Merge branch 'master' of github.com:source-separation/tutorial

justinsalamon · justinsalamon · commit 56ea6bde7e49 · 2020-10-11T11:25:07.000-07:00
diff --git a/book/_toc.yml b/book/_toc.yml
@@ -49,5 +49,6 @@
   chapters:
     - file: zzz_refs
     - file: appendix/resources
+    - file: appendix/cite_this
     - file: appendix/acknowledgements
     - file: appendix/authors
diff --git a/book/appendix/acknowledgements.md b/book/appendix/acknowledgements.md
@@ -9,9 +9,6 @@ contributed their time and expertise to the open source projects that we
 showcase. We want to use this page to acknowledge the great work that they have
 done. 
 
-
-### Primary Resources
-
 ### nussl
 
  - Ethan Manilow
@@ -74,3 +71,18 @@ The authors have used the following images from the [Noun Project](https://theno
 - "piano keyboard" by b farias
 
 
+## Special Thanks
+
+The authors want to acknowledge the many members of the community who graciously
+let us use parts of their work in this book. Their names are placed throughout
+the text.
+
+We also want to thank these people, without whom we would not be able to write
+this book:
+
+- Bryan Pardo
+- Jonathan Le Roux
+- Gordon Wichern
+
+
+
diff --git a/book/appendix/cite_this.md b/book/appendix/cite_this.md
@@ -0,0 +1,16 @@
+# Cite this Book
+
+If you wish to reference this book in your own work, we ask that you 
+use the following `bibtex` entry:
+
+```
+@misc{opensourceseparation,
+  author       = {Ethan Manilow and
+                  Prem Seetharaman and
+                  Justin Salamon},
+  title        = {Open Source Tools \& Data for Music Source Separation},
+  month        = oct,
+  year         = 2020,
+  url          = {https://source-separation.github.io/tutorial/}
+}
+```
diff --git a/book/approaches/deep/building_blocks.md b/book/approaches/deep/building_blocks.md
@@ -638,11 +638,11 @@ When computing losses with spectrograms, we compare the spectrogram
 of the true source to the input spectrogram with the network's mask
 applied. Given some ground truth STFT for source $i$
 $S_i \in \mathbb{C}^{F\times T}$, an input
-mixture $X \in \mathbb{C}^{F\times T}$, and a net's estimated
+mixture $Y \in \mathbb{C}^{F\times T}$, and a net's estimated
 mask $\hat{M}_i \in \mathbb{R}^{F\times T}$ we compute the loss like
 
 $$
-\mathcal{L}_{\text{spec}} = \Big\| S_i - \hat{M}_i \odot |X| \Big\|_p,
+\mathcal{L}_{\text{spec}} = \Big\| |S_i| - \hat{M}_i \odot |Y| \Big\|_p,
 $$
 
 where $\odot$ denotes the element-wise product and $p$ is the _norm_ of
@@ -661,7 +661,7 @@ the _Magnitude Spectrum Approximation_ or MSA {cite}`weninger2014discriminativel
 This is just the same equation as above unmodified:
 
 $$
-\text{MSA} =  |S_i| - \hat{M}_i \odot |X|
+\text{MSA} =  |S_i| - \hat{M}_i \odot |Y|
 $$
 
 
@@ -671,16 +671,16 @@ the phase data by including it in our target calculation like so
 
 
 $$
-\text{tPSA} = \hat{M}_{i} \odot |X|  - \operatorname{T}_{0}^{|X|}\left(|S_i| \odot \cos(\angle S_i - \angle X)\right)
+\text{tPSA} = \hat{M}_{i} \odot |Y|  - \operatorname{T}_{0}^{|Y|}\left(|S_i| \odot \cos(\angle S_i - \angle Y)\right)
 $$
 
 
 where $\angle S_i$ is the true
-phase of Source i, $\angle X$ is the mixture phase, and
-$\operatorname{T}_{0}^{|X|}(x)= \min(\max(x,0),|X|)$ is a truncation
+phase of source $i$, $\angle Y$ is the mixture phase, and
+$\operatorname{T}_{0}^{|Y|}(x)= \min(\max(x,0),|Y|)$ is a truncation
 function ensuring the target can be reached with a sigmoid activation function.
 Specifically, we incorporate constructive and destructive interference 
-of the source and mixture into the target with the term $\cos(\angle S_i - \angle X)$.
+of the source and mixture into the target with the term $\cos(\angle S_i - \angle Y)$.
 
 
 ```{tip}
diff --git a/book/basics/evaluation.ipynb b/book/basics/evaluation.ipynb
@@ -66,7 +66,7 @@
     "\\text{SAR} := 10 \\log_{10} \\left( \\frac{\\| s_{\\text{target}} + e_{\\text{interf}} + e_{\\text{noise}} \\|^2}{ \\| e_{\\text{artif}} \\|^2} \\right)\n",
     "$$\n",
     "\n",
-    "This is usually interpreted as the amount of unwanted \\text{artif}acts a source \n",
+    "This is usually interpreted as the amount of unwanted artifacts a source \n",
     "estimate has with relation to the true source.\n",
     "\n",
     "\n",
@@ -81,7 +81,7 @@
     "[\"bleed\", or \"leakage\"](https://en.wikipedia.org/wiki/Spill_(audio)). \n",
     "\n",
     "\n",
-    "**Source-to-Interference Ratio (SIR)**\n",
+    "**Source-to-Distortion Ratio (SDR)**\n",
     "\n",
     "$$\n",
     "\\text{SDR} := 10 \\log_{10} \\left( \\frac{\\| s_{\\text{target}} \\|^2}{ \\| e_{\\text{interf}} + e_{\\text{noise}} + e_{\\text{artif}} \\|^2} \\right)\n",
@@ -94,7 +94,11 @@
     "\n",
     "```{note}\n",
     "As of this writing (October 2020), the best reported SDR for singing\n",
-    "voice separation is $7.24 dB$. {cite}`takahashi2020d3net`\n",
+    "voice separation on MUSDB18 is $7.24 dB$. {cite}`takahashi2020d3net`\n",
+    "Recent research papers have been reporting vocal SDRs on MUSDB18\n",
+    "in the range of 6-7 dB.\n",
+    "Compare the SDR of different systems at this\n",
+    "[Papers with Code link.](https://paperswithcode.com/sota/music-source-separation-on-musdb18)\n",
     "```\n",
     "\n",
     "\n",
diff --git a/book/basics/phase.md b/book/basics/phase.md
@@ -18,7 +18,9 @@ the source estimation.
 alt: Phase is an important component of sound.
 name: circle_phase
 ---
-An audio signal's phase is fundamental to representing the signal.
+Phase is the instantaneous amplitude of an audio signal. Phase is a fundamental part of representing
+the signal.
+Adapted from [Wikimedia](https://commons.wikimedia.org/wiki/File:Phase_shifter_using_IQ_modulator.gif).
 ```
 
 An audio signal, $y(t)$, composed of exactly one sine wave,
@@ -92,19 +94,19 @@ than at the lower frequencies.
 alt: Phase is sensitive to frequency and its initial starting point.
 name: phase_sensitivity
 ---
-Getting a snapshot of the phase (the black dotted vertical line) is very
+Getting a snapshot of the phase (the black dotted vertical lines) is very
 sensitive to the frequencies and initial phases of the sine waves. This
 is similar to what happens when we take an STFT: many snapshots of sine waves
 with many frequencies and initial phase offsets.
 ```
 
 
-The gif above shows two sine waves. They both start at A440, or 440 Hz. But then the bottom one
-gradually changes frequency up an octave higher (880 Hz). The dotted black
-line shows a shapshot of the phase as the frequency changes. The initial phase also changes
-in the interval $[0.0, 2\pi]$. Notice how sensitive the snapshot is to changes
-in the frequency and initial phase.
-
+The gif above shows a sine wave with varying frequency and initial phase.
+The frequency starts at A440 (440 Hz) and gradually rises one octave to 880 Hz.
+The initial phase also changes in the interval $[0.0, 2\pi]$.
+The dotted black lines show two snapshots of the value of the sine wave as the frequency and
+initial phase both change.
+Notice how sensitive the snapshots are to changes in the frequency and initial phase.
 
 Another big difficulty when dealing with phase is that humans do not always
 perceive phase differences, _i.e._,
diff --git a/book/basics/representations.md b/book/basics/representations.md
@@ -313,7 +313,8 @@ STFT, $\log{|X|^2} \in \mathbb{R}^{T \times F}$.
 
 ```{tip}
 Even though it is hard to visualize the detail in a magnitude or power spectrogram,
-most source separation algorithms work completely fine on these representations.
+some source separation algorithms work completely fine on these representations, while
+some need log spectrograms. Make sure to set your spectrograms correctly!
 ```
 
 
@@ -333,6 +334,7 @@ is being discussed when possible.
 ---
 alt: A visual comparison of linear-scaled vs mel-spaced y axes.
 name: mel_spectrograms
+scale: 35%
 ---
 A visual comparison of linear-scaled vs mel-spaced y axes.
 Lower frequencies have a larger representation in a mel-spaced spectrogram.
diff --git a/book/data/musdb18.ipynb b/book/data/musdb18.ipynb
@@ -520,4 +520,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 4
-}
+}
diff --git a/book/first_steps/byo_hpss.ipynb b/book/first_steps/byo_hpss.ipynb
@@ -238,7 +238,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "And, as always, we can make an interactive version of this. Try whistling and clapping\n",
+    "And, as always, we can make an interactive version of this. Try recording yourself whistling and clapping\n",
     "at the same time and see how the results sound!"
    ]
   },
@@ -272,6 +272,38 @@
     "my_hpss.interact(share=True, source='microphone')"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "If you want to upload a song, you can also remove `source='microphone'` in the `interact()` call:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [
+    {
+     "ename": "NameError",
+     "evalue": "name 'my_hpss' is not defined",
+     "output_type": "error",
+     "traceback": [
+      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+      "\u001b[0;31mNameError\u001b[0m                                 Traceback (most recent call last)",
+      "\u001b[0;32m<ipython-input-1-af6b4bc55694>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m      2\u001b[0m \u001b[0;31m# interactively in Colab or Jupyter Notebook\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      3\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0mmy_hpss\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0minteract\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mshare\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
+      "\u001b[0;31mNameError\u001b[0m: name 'my_hpss' is not defined"
+     ]
+    }
+   ],
+   "source": [
+    "%%capture\n",
+    "# Comment out the line above to run this cell\n",
+    "# interactively in Colab or Jupyter Notebook\n",
+    "\n",
+    "my_hpss.interact(share=True)"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
diff --git a/book/first_steps/repetition.ipynb b/book/first_steps/repetition.ipynb
@@ -419,9 +419,12 @@
    "outputs": [],
    "source": [
     "# Will make AudioSignal objects after we've run the algorithm\n",
+    "repet = nussl.separation.primitive.Repet(mix)\n",
+    "repet.run()\n",
     "repet_bg, repet_fg = repet.make_audio_signals()\n",
     "\n",
     "# Will run the algorithm and return AudioSignals in one step\n",
+    "repet = nussl.separation.primitive.Repet(mix)\n",
     "repet_bg, repet_fg = repet()"
    ]
   },
diff --git a/book/images/basics/circle_phase.gif b/book/images/basics/circle_phase.gif
diff --git a/book/images/basics/phase_sensitivity.gif b/book/images/basics/phase_sensitivity.gif
diff --git a/book/intro/open_src_projects.md b/book/intro/open_src_projects.md
@@ -108,6 +108,8 @@ their research papers. In the era of deep learning, the trained models are also
 sometimes released. Here is a non-exhaustive list of some recent open source
 projects. We have prioritized open source projects with code and downloadable
 trained models by the original authors of the research papers described.
+[Papers With Code](https://paperswithcode.com/task/music-source-separation) also
+has a nice section on many of these methods.
 
 We will discuss some of these architectures in more detail in later sections,
 but here we will provide some highlights and links to their Github repositories,
diff --git a/book/landing.md b/book/landing.md
@@ -1,4 +1,4 @@
-Open-Source Tools & Data for Music Source Separation
+Open Source Tools & Data for Music Source Separation
 ====================================================
 
 **By Ethan Manilow, Prem Seetharaman, and Justin Salamon**
diff --git a/common/image_maker.py b/common/image_maker.py