Commit 56ea6bd

Merge branch 'master' of github.com:source-separation/tutorial
2 parents 51c8ce7 + ea60872 commit 56ea6bd

15 files changed

+144
-90
lines changed

book/_toc.yml

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -49,5 +49,6 @@
4949
chapters:
5050
- file: zzz_refs
5151
- file: appendix/resources
52+
- file: appendix/cite_this
5253
- file: appendix/acknowledgements
5354
- file: appendix/authors

book/appendix/acknowledgements.md

Lines changed: 15 additions & 3 deletions
@@ -9,9 +9,6 @@ contributed their time and expertise to the open source projects that we
99
showcase. We want to use this page to acknowledge the great work that they have
1010
done.
1111

12-
13-
### Primary Resources
14-
1512
### nussl
1613

1714
- Ethan Manilow
@@ -74,3 +71,18 @@ The authors have used the following images from the [Noun Project](https://theno
7471
- "piano keyboard" by b farias
7572

7673

74+
## Special Thanks
75+
76+
The authors want to acknowledge the many members of the community who graciously
77+
let us use parts of their work in this book. Their names are placed throughout
78+
the text.
79+
80+
We also want to thank these people, without whom we would not be able to write
81+
this book:
82+
83+
- Bryan Pardo
84+
- Jonathan Le Roux
85+
- Gordon Wichern
86+
87+
88+

book/appendix/cite_this.md

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
1+
# Cite this Book
2+
3+
If you wish to reference this book in your own work, we ask that you
4+
use the following `bibtex` entry:
5+
6+
```
7+
@misc{opensourceseparation,
8+
author = {Ethan Manilow and
9+
Prem Seetharaman and
10+
Justin Salamon},
11+
title = {Open Source Tools \& Data for Music Source Separation},
12+
month = oct,
13+
year = 2020,
14+
url = {https://source-separation.github.io/tutorial/}
15+
}
16+
```

book/approaches/deep/building_blocks.md

Lines changed: 7 additions & 7 deletions
@@ -638,11 +638,11 @@ When computing losses with spectrograms, we compare the spectrogram
638638
of the true source to the input spectrogram with the network's mask
639639
applied. Given some ground truth STFT for source $i$
640640
$S_i \in \mathbb{C}^{F\times T}$, an input
641-
mixture $X \in \mathbb{C}^{F\times T}$, and a net's estimated
641+
mixture $Y \in \mathbb{C}^{F\times T}$, and a net's estimated
642642
mask $\hat{M}_i \in \mathbb{R}^{F\times T}$ we compute the loss like
643643

644644
$$
645-
\mathcal{L}_{\text{spec}} = \Big\| S_i - \hat{M}_i \odot |X| \Big\|_p,
645+
\mathcal{L}_{\text{spec}} = \Big\| |S_i| - \hat{M}_i \odot |Y| \Big\|_p,
646646
$$
647647

648648
where $\odot$ denotes the element-wise product and $p$ is the _norm_ of
@@ -661,7 +661,7 @@ the _Magnitude Spectrum Approximation_ or MSA {cite}`weninger2014discriminativel
661661
This is just the same equation as above unmodified:
662662

663663
$$
664-
\text{MSA} = |S_i| - \hat{M}_i \odot |X|
664+
\text{MSA} = |S_i| - \hat{M}_i \odot |Y|
665665
$$
666666

667667

@@ -671,16 +671,16 @@ the phase data by including it in our target calculation like so
671671

672672

673673
$$
674-
\text{tPSA} = \hat{M}_{i} \odot |X| - \operatorname{T}_{0}^{|X|}\left(|S_i| \odot \cos(\angle S_i - \angle X)\right)
674+
\text{tPSA} = \hat{M}_{i} \odot |Y| - \operatorname{T}_{0}^{|Y|}\left(|S_i| \odot \cos(\angle S_i - \angle Y)\right)
675675
$$
676676

677677

678678
where $\angle S_i$ is the true
679-
phase of Source i, $\angle X$ is the mixture phase, and
680-
$\operatorname{T}_{0}^{|X|}(x)= \min(\max(x,0),|X|)$ is a truncation
679+
phase of Source i, $\angle Y$ is the mixture phase, and
680+
$\operatorname{T}_{0}^{|Y|}(x)= \min(\max(x,0),|Y|)$ is a truncation
681681
function ensuring the target can be reached with a sigmoid activation function.
682682
Specifically, we incorporate constructive and destructive interference
683-
of the source and mixture into the target with the term $\cos(\angle S_i - \angle X)$.
683+
of the source and mixture into the target with the term $\cos(\angle S_i - \angle Y)$.
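
As a rough illustration of the two objectives touched by this hunk, the sketch below evaluates the spectrogram loss $\| |S_i| - \hat{M}_i \odot |Y| \|_p$ and its truncated phase-sensitive variant on nested lists of complex numbers. The function names (`msa_loss`, `tpsa_target`, `tpsa_loss`) and the toy values are hypothetical and not part of the tutorial or nussl; a real training loop would presumably operate on tensors rather than pure Python lists.

```python
import cmath
import math


def msa_loss(S, Y, M, p=1):
    """p-norm of |S| - M * |Y| taken over all time-frequency bins."""
    total = 0.0
    for s_row, y_row, m_row in zip(S, Y, M):
        for s, y, m in zip(s_row, y_row, m_row):
            total += abs(abs(s) - m * abs(y)) ** p
    return total ** (1.0 / p)


def tpsa_target(S, Y):
    """Truncated phase-sensitive target: clip |S| cos(angle S - angle Y) to [0, |Y|]."""
    target = []
    for s_row, y_row in zip(S, Y):
        row = []
        for s, y in zip(s_row, y_row):
            psa = abs(s) * math.cos(cmath.phase(s) - cmath.phase(y))
            row.append(min(max(psa, 0.0), abs(y)))
        target.append(row)
    return target


def tpsa_loss(S, Y, M, p=1):
    """p-norm of M * |Y| - truncated phase-sensitive target."""
    target = tpsa_target(S, Y)
    total = 0.0
    for t_row, y_row, m_row in zip(target, Y, M):
        for t, y, m in zip(t_row, y_row, m_row):
            total += abs(m * abs(y) - t) ** p
    return total ** (1.0 / p)


# Toy 1 x 2 "spectrograms" (assumed values, one frequency bin per column).
S = [[1 + 0j, 0 + 1j]]  # source: in phase with the mixture, then 90 degrees off
Y = [[2 + 0j, 1 + 0j]]  # mixture
M = [[0.5, 0.0]]        # estimated mask

print(msa_loss(S, Y, M, p=1))
print(tpsa_loss(S, Y, M, p=1))
```

In the second bin the source is 90 degrees out of phase with the mixture, so the truncated target is essentially zero and a zero mask is not penalized, whereas the magnitude-only loss still charges the full source magnitude.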
684684

685685

686686
```{tip}

book/basics/evaluation.ipynb

Lines changed: 7 additions & 3 deletions
@@ -66,7 +66,7 @@
6666
"\\text{SAR} := 10 \\log_{10} \\left( \\frac{\\| s_{\\text{target}} + e_{\\text{interf}} + e_{\\text{noise}} \\|^2}{ \\| e_{\\text{artif}} \\|^2} \\right)\n",
6767
"$$\n",
6868
"\n",
69-
"This is usually interpreted as the amount of unwanted \\text{artif}acts a source \n",
69+
"This is usually interpreted as the amount of unwanted artifacts a source \n",
7070
"estimate has with relation to the true source.\n",
7171
"\n",
7272
"\n",
@@ -81,7 +81,7 @@
8181
"[\"bleed\", or \"leakage\"](https://en.wikipedia.org/wiki/Spill_(audio)). \n",
8282
"\n",
8383
"\n",
84-
"**Source-to-Interference Ratio (SIR)**\n",
84+
"**Source-to-Distortion Ratio (SDR)**\n",
8585
"\n",
8686
"$$\n",
8787
"\\text{SDR} := 10 \\log_{10} \\left( \\frac{\\| s_{\\text{target}} \\|^2}{ \\| e_{\\text{interf}} + e_{\\text{noise}} + e_{\\text{artif}} \\|^2} \\right)\n",
@@ -94,7 +94,11 @@
9494
"\n",
9595
"```{note}\n",
9696
"As of this writing (October 2020), the best reported SDR for singing\n",
97-
"voice separation is $7.24 dB$. {cite}`takahashi2020d3net`\n",
97+
"voice separation on MUSDB18 is $7.24$ dB. {cite}`takahashi2020d3net`\n",
98+
"Recent research papers have been reporting vocal SDRs on MUSDB18\n",
99+
"in the range of 6-7 dB.\n",
100+
"Compare the SDR of different systems at this\n",
101+
"[Papers with Code link.](https://paperswithcode.com/sota/music-source-separation-on-musdb18)\n",
98102
"```\n",
99103
"\n",
100104
"\n",

book/basics/phase.md

Lines changed: 10 additions & 8 deletions
@@ -18,7 +18,9 @@ the source estimation.
1818
alt: Phase is an important component of sound.
1919
name: circle_phase
2020
---
21-
An audio signal's phase is fundamental to representing the signal.
21+
Phase is the instantaneous amplitude of an audio signal. Phase is a fundamental part of representing
22+
the signal.
23+
Adapted from [Wikimedia](https://commons.wikimedia.org/wiki/File:Phase_shifter_using_IQ_modulator.gif).
2224
```
2325

2426
An audio signal, $y(t)$, composed of exactly one sine wave,
@@ -92,19 +94,19 @@ than at the lower frequencies.
9294
alt: Phase is sensitive to frequency and its initial starting point.
9395
name: phase_sensitivity
9496
---
95-
Getting a snapshot of the phase (the black dotted vertical line) is very
97+
Getting a snapshot of the phase (the black dotted vertical lines) is very
9698
sensitive to the frequencies and initial phases of the sine waves. This
9799
is similar to what happens when we take an STFT: many snapshots of sine waves
98100
with many frequencies and initial phase offsets.
99101
```
100102

101103

102-
The gif above shows two sine waves. They both start at A440, or 440 Hz. But then the bottom one
103-
gradually changes frequency up an octave higher (880 Hz). The dotted black
104-
line shows a shapshot of the phase as the frequency changes. The initial phase also changes
105-
in the interval $[0.0, 2\pi]$. Notice how sensitive the snapshot is to changes
106-
in the frequency and initial phase.
107-
104+
The gif above shows a sine wave with varying frequency and initial phase.
105+
The frequency starts at A440, or 440 Hz, and gradually rises one octave (880
106+
Hz). The initial phase also changes in the interval $[0.0, 2\pi]$.
107+
The dotted black lines show two snapshots of the value of the sine wave as the frequency and
108+
initial phase both change.
109+
Notice how sensitive the snapshots are to changes in the frequency and initial phase.
108110

109111
Another big difficulty when dealing with phase is that humans do not always
110112
perceive phase differences, _i.e._,

book/basics/representations.md

Lines changed: 3 additions & 1 deletion
@@ -313,7 +313,8 @@ STFT, $\log{|X|^2} \in \mathbb{R}^{T \times F}$.
313313

314314
```{tip}
315315
Even though it is hard to visualize the detail in a magnitude or power spectrogram,
316-
most source separation algorithms work completely fine on these representations.
316+
some source separation algorithms work completely fine on these representations, while
317+
some need log spectrograms. Make sure to set your spectrograms correctly!
317318
```
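
To make the distinction drawn in this tip concrete, here is a minimal sketch that derives magnitude, power, and log-power values from the same complex STFT. The helper name `spectrogram_views` and the small constant `eps` (added so that silent bins do not produce `log(0)`) are assumptions for illustration, not something the tutorial defines.

```python
import math


def spectrogram_views(X, eps=1e-10):
    """Return (magnitude, power, log-power) for a complex STFT given as nested lists."""
    magnitude = [[abs(x) for x in row] for row in X]
    power = [[m ** 2 for m in row] for row in magnitude]
    # eps is an assumed floor that keeps the logarithm finite for silent bins.
    log_power = [[math.log(p + eps) for p in row] for row in power]
    return magnitude, power, log_power


# One frame with a loud bin and a silent bin (toy values).
X = [[3 + 4j, 0j]]
magnitude, power, log_power = spectrogram_views(X)
print(magnitude, power, log_power)
```

The three outputs differ by orders of magnitude for the same input, which is why an algorithm tuned for one representation can behave poorly when handed another.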
318319

319320

@@ -333,6 +334,7 @@ is being discussed when possible.
333334
---
334335
alt: A visual comparison of linear-scaled vs mel-spaced y axes.
335336
name: mel_spectrograms
337+
scale: 35%
336338
---
337339
A visual comparison of linear-scaled vs mel-spaced y axes.
338340
Lower frequencies have a larger representation in a mel-spaced spectrogram.

book/data/musdb18.ipynb

Lines changed: 1 addition & 1 deletion
@@ -520,4 +520,4 @@
520520
},
521521
"nbformat": 4,
522522
"nbformat_minor": 4
523-
}
523+
}

book/first_steps/byo_hpss.ipynb

Lines changed: 33 additions & 1 deletion
@@ -238,7 +238,7 @@
238238
"cell_type": "markdown",
239239
"metadata": {},
240240
"source": [
241-
"And, as always, we can make an interactive version of this. Try whistling and clapping\n",
241+
"And, as always, we can make an interactive version of this. Try recording yourself whistling and clapping\n",
242242
"at the same time and see how the results sound!"
243243
]
244244
},
@@ -272,6 +272,38 @@
272272
"my_hpss.interact(share=True, source='microphone')"
273273
]
274274
},
275+
{
276+
"cell_type": "markdown",
277+
"metadata": {},
278+
"source": [
279+
"If you want to upload a song, you can also remove `source='microphone'` in the `interact()` call:"
280+
]
281+
},
282+
{
283+
"cell_type": "code",
284+
"execution_count": 1,
285+
"metadata": {},
286+
"outputs": [
287+
{
288+
"ename": "NameError",
289+
"evalue": "name 'my_hpss' is not defined",
290+
"output_type": "error",
291+
"traceback": [
292+
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
293+
"\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
294+
"\u001b[0;32m<ipython-input-1-af6b4bc55694>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;31m# interactively in Colab or Jupyter Notebook\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0mmy_hpss\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0minteract\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mshare\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
295+
"\u001b[0;31mNameError\u001b[0m: name 'my_hpss' is not defined"
296+
]
297+
}
298+
],
299+
"source": [
300+
"%%capture\n",
301+
"# Comment out the line above to run this cell\n",
302+
"# interactively in Colab or Jupyter Notebook\n",
303+
"\n",
304+
"my_hpss.interact(share=True)"
305+
]
306+
},
275307
{
276308
"cell_type": "markdown",
277309
"metadata": {},

book/first_steps/repetition.ipynb

Lines changed: 3 additions & 0 deletions
@@ -419,9 +419,12 @@
419419
"outputs": [],
420420
"source": [
421421
"# Will make AudioSignal objects after we've run the algorithm\n",
422+
"repet = nussl.separation.primitive.Repet(mix)\n",
423+
"repet.run()\n",
422424
"repet_bg, repet_fg = repet.make_audio_signals()\n",
423425
"\n",
424426
"# Will run the algorithm and return AudioSignals in one step\n",
427+
"repet = nussl.separation.primitive.Repet(mix)\n",
425428
"repet_bg, repet_fg = repet()"
426429
]
427430
},
