In [None]:
!pip install nussl
!pip install scaper

(data:scaper)=
# Data generation with Scaper

In this section we will learn how to generate training data for music source separation using the [Scaper Python library](https://github.com/justinsalamon/scaper). Parts of this section are based on the [Scaper tutorial](https://scaper.readthedocs.io/en/latest/tutorial.html).



## Why Scaper?

Before we dive in, you might be wondering why we need a Python library to mix stems at all: can't we just sum them in Python? And what about the data loaders provided by deep learning frameworks: can't we just use those?

While there are various ways to programmatically generate mixes, we will see that Scaper is particularly well suited to this task, offering a number of benefits that make it preferable to simple mixing via ad-hoc code:
* **Scaper supports complex, programmatic, and stochastic mixing pipelines**
    * For example, it can sample mixing parameters (such as per-stem SNR) from a variety of distributions. This allows you to generate a potentially infinite number of unique mixtures from the same set of stems.
* **Scaper supports data augmentation** 
    * Such as pitch shifting and time-stretching.
* **Scaper pipelines are highly reproducible**
    * Scaper generates detailed annotation files, so you can re-create an entire dataset of mixtures just from Scaper's annotations as long as you have access to the stems. The same can be achieved by sharing your Scaper code + stems as a "recipe": there's no need to transfer heavy sets of mixtures!
* **Scaper is optimized for performance**
    * Scaper can generate training data on the fly for GPU training without being a bottleneck. It's also well suited to batch generation: for example, on a machine with 8 CPUs, Scaper can generate 20,000 ten-second mixtures (mix + stems + annotations) in under 10 minutes.
* **Scaper can generate data for other audio domains**
    * Scaper can generate speech/noise mixtures for ASR, environmental soundscapes for sound event detection (SED) and classification (SEC), bioacoustic mixtures for species classification, etc. Once you know how to use Scaper for one domain, you know how to use it for all domains.
* **Scaper is documented, tested, actively maintained and updated**
    * Will your ad-hoc mixing code work a few years from now? Will you remember how to use it? Will someone else know how to use it? Does it cover all corner cases? Can it be easily extended to support new features? Ad-hoc mixing code might seem like a time saver in the short term, but it's bound to be a time sink in the long run.
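
To make the first and third points above concrete, here is a minimal sketch of stochastic, seed-reproducible mixing written in plain Python. It is not Scaper's implementation: the helper names (`rms`, `scale_to_snr`, `random_mix`), the sine-tone "stems", and the use of RMS as a stand-in loudness measure are all assumptions made for illustration, and Scaper's own loudness handling is not reproduced here.

```python
import math
import random


def rms(signal):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def scale_to_snr(stem, reference, snr_db):
    """Scale `stem` so its RMS sits `snr_db` dB above the RMS of `reference`."""
    target_rms = rms(reference) * 10 ** (snr_db / 20)
    gain = target_rms / rms(stem)
    return [gain * x for x in stem]


def random_mix(stems, reference, rng, snr_range=(-5.0, 5.0)):
    """Sample one SNR per stem, scale each stem, and sum them sample by sample."""
    snrs = [rng.uniform(*snr_range) for _ in stems]
    scaled = [scale_to_snr(s, reference, snr) for s, snr in zip(stems, snrs)]
    mixture = [sum(samples) for samples in zip(*scaled)]
    return mixture, snrs


# Toy material (hypothetical): one second of sine tones at 8 kHz.
sr = 8000
reference = [0.1 * math.sin(2 * math.pi * 100 * n / sr) for n in range(sr)]
stems = [
    [math.sin(2 * math.pi * f * n / sr) for n in range(sr)]
    for f in (220, 330, 440)
]

# The same seed reproduces the same mixture; a new seed gives a new one.
mix_a, snrs_a = random_mix(stems, reference, random.Random(0))
mix_b, snrs_b = random_mix(stems, reference, random.Random(0))
mix_c, snrs_c = random_mix(stems, reference, random.Random(1))
print([round(s, 2) for s in snrs_a])
print([round(s, 2) for s in snrs_c])
```

Even this toy version needs decisions about loudness measurement, random seeding, and bookkeeping of the sampled parameters, which is the kind of detail a maintained library handles for you.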


## Scaper overview

Scaper can be viewed as a programmatic audio mixer. At a high level, the input to Scaper is:
1. *source material*: audio recordings you want to mix together ("soundbank" in the diagram below).
2. *event specification*: a "recipe" for how to mix the recordings.

Scaper takes these and generates mixtures using the source material by following the event specification. Since the event specification can be stochastic (random), multiple different mixtures can be generated from the same source material and event specification. For each generated mixture Scaper outputs:
1. The mixture audio signal ("soundscape" in the diagram).
2. The mixture annotation in JAMS format (detailed) and in a simplified tabular format (sparse).
3. The audio of each processed stem (or sound event) used to create the mixture.

<img src="https://www.justinsalamon.com/uploads/4/3/9/4/4394963/scaper-diagram_orig.png">
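
The flow described above (source material plus a stochastic event specification in, concrete mixtures and annotations out) can be sketched in a few lines of plain Python. This is a toy model, not Scaper's API: the `soundbank` dictionary, the file names, the `sample` and `instantiate` helpers, and the tuple notation for distributions (loosely modeled on the style used in the Scaper tutorial) are all assumptions for illustration. No audio is read or written; only the annotation side is shown.

```python
import random

# Hypothetical source material: stem label -> candidate audio files.
soundbank = {
    "vocals": ["vocals_01.wav", "vocals_02.wav"],
    "drums": ["drums_01.wav", "drums_02.wav", "drums_03.wav"],
}


def sample(spec, rng):
    """Draw one value from a distribution tuple such as ('uniform', 0, 5)."""
    kind, *args = spec
    if kind == "const":
        return args[0]
    if kind == "choose":
        return rng.choice(args[0])
    if kind == "uniform":
        return rng.uniform(args[0], args[1])
    raise ValueError(f"unknown distribution: {kind}")


def instantiate(event_specs, seed):
    """Turn a stochastic event specification into one concrete annotation."""
    rng = random.Random(seed)
    return [
        {name: sample(spec, rng) for name, spec in event.items()}
        for event in event_specs
    ]


# The "recipe": one event per stem, with some parameters left random.
event_specs = [
    {
        "label": ("const", label),
        "source_file": ("choose", files),
        "snr": ("uniform", -5.0, 5.0),
        "pitch_shift": ("uniform", -2.0, 2.0),
    }
    for label, files in soundbank.items()
]

# Each seed yields a different concrete mixture description, and
# re-using a seed reproduces the same description exactly.
annotation_a = instantiate(event_specs, seed=0)
annotation_b = instantiate(event_specs, seed=1)
for row in annotation_a:
    print(row)
```

Because every sampled value ends up in the annotation, the annotation alone (together with the stems) is enough to rebuild the mixture, which is the idea behind sharing a "recipe" instead of the rendered audio.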