# \*\*\*\* Source Separation Library Demo

### \*\*\*\* is an open-source Python library for audio source separation.

It is built to make it easy to use existing source separation algorithms and to develop new ones. In this demo we will explore some basic functionality of \*\*\*\*, including running and evaluating some source separation algorithms.

Let's get started by exploring how to import audio.


In [1]:
from __future__ import division
import numpy as np
import pprint
import librosa
import matplotlib.pyplot as plt
%matplotlib inline

# This allows us to play music in the browser. Works best on Firefox
from audio_embed_master.audio_embed import utilities
utilities.apply_style()

# AudioSignal

The AudioSignal object is the entryway to using \*\*\*\*. It provides an easy way to import an audio file into \*\*\*\*. If you have ffmpeg installed, you can open many types of files in nussl, but in this tutorial we will only open .wav files, which work whether or not ffmpeg is installed.


In [2]:
import nussl

path_to_source1 = "demo_files/drums.wav"
source1 = nussl.AudioSignal(path_to_source1)

utilities.audio(source1.audio_data.T, source1.sample_rate)

That's it! Now the audio is loaded into nussl and stored in an AudioSignal object. We can explore other aspects of this file with the AudioSignal object as well...

In [3]:
print("Path to file: {}"              .format(source1.path_to_input_file))
print("Filename: {}"                  .format(source1.file_name))
print("Sample Rate: {} Hz"            .format(source1.sample_rate))
print("Length of file in samples: {}" .format(source1.signal_length))
print("Length of file in seconds: {}" .format(source1.signal_duration))
print("Number of channels: {}"        .format(source1.num_channels))

Path to file: demo_files/drums.wav
Filename: drums.wav
Sample Rate: 44100 Hz
Length of file in samples: 441000
Length of file in seconds: 10.0
Number of channels: 1


That's great. Now let's explore the audio data. The audio data is stored here:

In [4]:
source1.audio_data

array([[ 0.0000000e+00, -3.0517578e-05,  0.0000000e+00, ...,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00]], dtype=float32)

In [5]:
type(source1.audio_data)

numpy.ndarray

In [6]:
source1.audio_data.shape

(1, 441000)

The shape is (# channels, # samples).
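To make the layout concrete, here is a minimal NumPy-only sketch. It uses an array of zeros as a stand-in for `source1.audio_data` (an assumption for illustration, not the real drum data) and shows how the shape relates to the channel count, the sample count, and the duration printed earlier.

```python
import numpy as np

sample_rate = 44100  # Hz, same rate as the drum file in this demo

# Stand-in for source1.audio_data: 1 channel, 441000 samples (illustrative only)
audio_data = np.zeros((1, 441000), dtype=np.float32)

# Channels come first, samples second
num_channels, num_samples = audio_data.shape

# Duration follows directly from the sample count and the sample rate
duration_seconds = num_samples / float(sample_rate)

# Tools that expect (samples, channels) need the transpose,
# which is why the demo passes audio_data.T to the audio player
samples_first = audio_data.T

print("channels: {}".format(num_channels))
print("samples: {}".format(num_samples))
print("seconds: {}".format(duration_seconds))
print("transposed shape: {}".format(samples_first.shape))
```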


What if we already have our data in a numpy array? We can initialize an AudioSignal with that, too...


In [7]:
sample_rate = 44100  # Hz
dt = 1.0 / sample_rate
dur = 10.0  # seconds
freq = 440  # Hz
x1 = np.arange(0.0, dur, dt)
x1 = np.sin(2 * np.pi * freq * x1) * 0.3
x2 = -x1    # Invert the phase in one channel
x = np.vstack([x1, x2])

sine_wave = nussl.AudioSignal(audio_data_array=x)

sine_wave *= 0.5  # apply gain

utilities.audio(sine_wave.audio_data.T, sine_wave.sample_rate)
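As a side note, building the time axis with a floating-point step in `np.arange` can occasionally produce one sample more or fewer than expected because of rounding. The sketch below shows one possible alternative that derives the time axis from integer sample indices. It is a NumPy-only illustration, and the variable `num_samples` is introduced here for the example.

```python
import numpy as np

sample_rate = 44100  # Hz
dur = 10.0           # seconds
freq = 440           # Hz

# Fix the sample count first, then divide integer indices by the sample rate
num_samples = int(round(dur * sample_rate))
t = np.arange(num_samples) / float(sample_rate)

x1 = 0.3 * np.sin(2 * np.pi * freq * t)
x2 = -x1                 # phase-inverted copy for the second channel
x = np.vstack([x1, x2])  # shape: (channels, samples)

print("shape: {}".format(x.shape))
```

The resulting array can be passed to `nussl.AudioSignal(audio_data_array=x)` exactly as in the cell above.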

In [8]:
print("Path to file: {}"              .format(sine_wave.path_to_input_file))
print("Filename: {}"                  .format(sine_wave.file_name))
print("Sample Rate: {} Hz"            .format(sine_wave.sample_rate))
print("Length of file in samples: {}" .format(sine_wave.signal_length))
print("Length of file in seconds: {}" .format(sine_wave.signal_duration))
print("Number of channels: {}"        .format(sine_wave.num_channels))

Path to file: None
Filename: None
Sample Rate: 44100 Hz
Length of file in samples: 441000
Length of file in seconds: 10.0
Number of channels: 2


Now we have two signals. We can try to "mix" them by adding them together. But one is mono and the other is stereo...

In [9]:
new_signal = sine_wave + source1

Exception: Cannot do operation with two signals that have a different number of channels!


This won't work. We can make the second AudioSignal mono like so...

In [11]:
sine_wave.to_mono(overwrite=True)  # average the two channels
new_signal = sine_wave + source1

utilities.audio(new_signal.audio_data.T, new_signal.sample_rate)

Because I inverted the phase of one channel of the `sine_wave` object, its two channels cancel each other when it is bounced to mono, so it should be silent. We can only hear the drums, so it works!
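The cancellation can be checked numerically without any audio playback. The following is a NumPy-only sketch, not nussl code: it averages two out-of-phase channels and adds the result to a mono signal. The `drums` array is a hypothetical random stand-in for the drum track.

```python
import numpy as np

sample_rate = 44100
t = np.arange(sample_rate) / float(sample_rate)  # one second of time stamps

tone = 0.3 * np.sin(2 * np.pi * 440 * t)
stereo = np.vstack([tone, -tone])          # (2, samples), channels out of phase
mono = stereo.mean(axis=0, keepdims=True)  # (1, samples), average of both channels

# Hypothetical stand-in for the drum track: any mono signal works here
rng = np.random.RandomState(0)
drums = rng.uniform(-0.1, 0.1, size=(1, sample_rate))

mix = mono + drums

print("mono is silent: {}".format(np.allclose(mono, 0.0)))
print("mix equals drums: {}".format(np.allclose(mix, drums)))
```

Averaging with `keepdims=True` keeps the (channels, samples) layout, so the mono result and the drum stand-in have matching shapes and can be added directly.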


## Spectral Transforms

Most audio source separation algorithms operate in the spectral domain. Here's how to get up and running with spectral transformations.

In [18]:
# STFT object with all of the default parameters
stft = nussl.core.transforms.STFT()

print('stft.has_audio_data          = {}'  .format(stft.has_audio_data))
print('stft.has_representation_data = {}'  .format(stft.has_representation_data))
print('stft.is_empty                = {}'  .format(stft.is_empty))

stft.has_audio_data          = False
stft.has_representation_data = False
stft.is_empty                = True


In [23]:
# Reinitialize our AudioSignal object with the STFT object
source = nussl.AudioSignal(path_to_source1, transform=stft)

# Now source owns a copy of stft and initializes it accordingly
print('source.stft.has_audio_data          = {}'  .format(source.stft.has_audio_data))
print('source.stft.has_representation_data = {}'  .format(source.stft.has_representation_data))
print('source.has_representation_data      = {}'  .format(source.has_representation_data))
print('source.stft.is_empty                = {}'  .format(source.stft.is_empty))

# stft has the same audio data that source has
print('\nsource.audio_data == stft.audio_data? {}'.format(np.array_equal(source.audio_data, stft.audio_data)))

source.stft.has_audio_data          = True
source.stft.has_representation_data = False
source.stft.is_empty                = False

source.audio_data == stft.audio_data? True


But `source.stft` has no STFT data yet, because we haven't actually computed the STFT. Let's do that:

In [27]:
stft_data = source.calculate_stft()



In [28]:
print('source.stft.has_audio_data          = {}'  .format(source.stft.has_audio_data))
print('source.stft.has_representation_data = {}'  .format(source.stft.has_representation_data))
print('source.has_representation_data      = {}'  .format(source.has_representation_data))
print('source.stft.is_empty                = {}'  .format(source.stft.is_empty))

source.stft.has_audio_data          = True
source.stft.has_representation_data = True
source.has_representation_data      = True
source.stft.is_empty                = False


In [29]:
print('stft shape = {}'.format(stft_data.shape))

stft shape = (1025, 432, 1)


(num_freq, num_hops, num_channels)
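The first dimension can be related to the FFT size. For a real-valued signal, an FFT of size `n_fft` yields `n_fft // 2 + 1` unique frequency bins, so 1025 bins is consistent with an FFT size of 2048. That size is an inference from the shape above, not something this demo states, so treat the sketch below as an assumption-based illustration in plain NumPy.

```python
import numpy as np

sample_rate = 44100  # Hz
n_fft = 2048         # assumed FFT size, inferred from the 1025 frequency bins

# Number of unique bins for a real-valued input
num_freq_bins = n_fft // 2 + 1

# Center frequency of each bin, from 0 Hz up to the Nyquist frequency
bin_frequencies = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)

# Distance in Hz between neighboring bins
bin_spacing = sample_rate / float(n_fft)

print("bins: {}".format(num_freq_bins))
print("highest bin (Hz): {}".format(bin_frequencies[-1]))
print("bin spacing (Hz): {}".format(bin_spacing))
```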

In [33]:
# Do a low pass filter with the stft
lp_cutoff = 400  # Hz
frequency_vector = source.stft.freq_vector  # a vector of frequency values for each FFT bin
idx = (np.abs(frequency_vector - lp_cutoff)).argmin()  # trick to find the index of the closest value to 400 Hz
# idx = source.get_closest_frequency_bin(lp_cutoff)
source.stft.stft_data[idx:, :, :] = 0.0j  # every freq above ~400 Hz is 0 now
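The same idea can be tried on a single FFT frame with plain NumPy. This sketch is not nussl's implementation: it builds a test signal from a 200 Hz and a 1000 Hz sine, finds the bin closest to the cutoff with the same `argmin` trick, zeroes that bin and everything above it, and transforms back. The sample rate and test frequencies are chosen for the example so that each sine falls exactly on a bin.

```python
import numpy as np

sample_rate = 8000
t = np.arange(sample_rate) / float(sample_rate)  # one second, so bins are 1 Hz apart

low = np.sin(2 * np.pi * 200 * t)    # below the cutoff, should survive
high = np.sin(2 * np.pi * 1000 * t)  # above the cutoff, should be removed
signal = low + high

lp_cutoff = 400  # Hz
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)

idx = np.abs(freqs - lp_cutoff).argmin()  # index of the bin closest to the cutoff
spectrum[idx:] = 0.0                      # zero that bin and every bin above it

filtered = np.fft.irfft(spectrum, n=len(signal))

print("cutoff bin index: {}".format(idx))
print("only the low tone remains: {}".format(np.allclose(filtered, low, atol=1e-8)))
```

Zeroing bins abruptly like this is a brick-wall filter. It is fine for a demonstration, but on real audio it can introduce ringing artifacts.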

In [36]:
source.invert_stft()

TypeError: only size-1 arrays can be converted to Python scalars

In [38]:
source.stft.stft_data.shape[nussl.STFT_CHAN_INDEX]

1