# Warm up

This notebook will give you a brief introduction to working with audio in Python.

By the end of the notebook, you'll have loaded audio, played it back in the browser, computed its frequency content, mapped its loudness to decibels, and generated several visualizations.


# Setup

The very first thing we'll need to do is import our libraries.

This is done in Python with the `import` statement.  By convention, some libraries (particularly those with long names) are renamed on import to make them easier to type.  This is done with the `as` keyword, as seen below.

In [None]:
# First: numpy, the core numeric building blocks
# By convention, numpy is usually renamed to `np`
import numpy as np

In [None]:
# Next: librosa, and its display module.
# These need to be imported independently
import librosa
import librosa.display

In [None]:
# Next, matplotlib.  By convention, we rename this to `plt`
import matplotlib.pyplot as plt

# To tell jupyter that plots should be rendered in the browser (and not saved to files),
# we need this "magic" command:
%matplotlib inline

In [None]:
# Finally, if we want to play back audio, we need the Audio class from IPython.display
from IPython.display import Audio

---
# Loading data

Now that we have all of our packages in place, we can load some data.

In `librosa`, this is done with `librosa.load`, which supports most commonly used audio formats.

Note: if you ever want to learn how something works in Jupyter, you can use the `?` operator.  Type the name of the command you're interested in into a blank cell, followed by `?` (with no space), and hit `Shift+Enter`.

For example, executing the code below will bring up the documentation for `librosa.load`:

In [None]:
librosa.load?

In [None]:
# We can actually load some data by specifying a path to a file.
# The result of `load` will be the audio data and its sampling rate.
# Note that librosa will by default resample everything to 22050 Hz (about 22 kHz) and mix it down to mono.

y, sr = librosa.load('sir_duke_fast.wav')

In [None]:
# To test that this worked, we can play it back as follows:
Audio(data=y, rate=sr)
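
To make the two return values more concrete, here is a minimal sketch that does not need the audio file. It builds a hypothetical stand-in for `y` (a synthetic 440 Hz tone, not the recording loaded above) with the same layout `librosa.load` produces by default: a one-dimensional array of 32-bit floating-point samples, plus a sampling rate in Hz.

```python
import numpy as np

# Hypothetical stand-in for the output of librosa.load:
# two seconds of a 440 Hz sine tone at librosa's default sampling rate.
sr = 22050
t = np.arange(2 * sr) / sr                      # time of each sample, in seconds
y = (0.5 * np.sin(2 * np.pi * 440.0 * t)).astype(np.float32)

print(y.ndim, y.dtype)      # 1 float32: mono audio is a 1-D array of floats

# Duration in seconds is the number of samples divided by the sampling rate
duration = len(y) / sr
print(duration)             # 2.0
```

The same `len(y) / sr` calculation applies to the `y` and `sr` returned by `librosa.load`.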

# Transformation and visualization

Before going further, let's visualize the audio we just loaded as a wave over time.

This introduces the `librosa.display` module, which contains everything pertaining to visualization.

In [None]:
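# Note: older librosa releases called this function `waveplot`;
# newer releases provide `waveshow` instead.
librosa.display.waveshow(y, sr=sr)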

# To zoom in on a small region of time, uncomment the following line
#plt.xlim([3, 5])
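
Zooming with `plt.xlim` only changes the view. If you want the samples themselves for a time range, you can convert seconds to sample indices by multiplying by the sampling rate. The sketch below uses a hypothetical all-zero placeholder signal so that it runs on its own; the names `start_s` and `end_s` are made up for illustration.

```python
import numpy as np

sr = 22050                                # sampling rate in Hz
y = np.zeros(6 * sr, dtype=np.float32)    # hypothetical 6-second placeholder signal

# Seconds -> sample indices: multiply by the sampling rate
start_s, end_s = 3, 5
segment = y[start_s * sr:end_s * sr]

# Sample indices -> seconds: divide by the sampling rate
times = np.arange(len(y)) / sr

print(len(segment) / sr)        # 2.0 seconds of audio
print(times[start_s * sr])      # 3.0
```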

In [None]:
# Now, let's compute a spectrogram of the audio.  This is done by the `stft` function,
# which computes a short-time Fourier transform (STFT).

D = librosa.stft(y)

In [None]:
# What does this return?  Let's take a look!
D
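
If you are curious what an STFT does internally, here is a rough NumPy-only sketch: cut the signal into overlapping frames, apply a window to each frame, and take one FFT per frame. This is an illustration under simplifying assumptions, not librosa's implementation: `simple_stft` is a made-up helper, it does not pad the signal the way `librosa.stft` does by default, and it uses NumPy's symmetric Hann window, so its output will not match `D` exactly. The frame length of 2048 and hop of 512 samples mirror librosa's defaults.

```python
import numpy as np

def simple_stft(y, n_fft=2048, hop_length=512):
    """Illustrative STFT: Hann-windowed frames, one FFT per frame."""
    window = np.hanning(n_fft)
    # Number of full frames that fit in the signal (no padding)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    frames = np.stack(
        [y[i * hop_length:i * hop_length + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    # Real-input FFT along the frame axis: rows are frequencies, columns are frames
    return np.fft.rfft(frames, axis=0)

# Hypothetical test signal: one second of a 440 Hz tone
sr = 22050
y = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)

D = simple_stft(y)
print(D.shape)    # (1025, 40): 1 + n_fft/2 frequency bins, 40 frames
print(D.dtype)    # complex128

# The strongest frequency bin in the first frame sits close to 440 Hz
peak_bin = int(np.argmax(np.abs(D[:, 0])))
print(peak_bin * sr / 2048)
```

Each column of the result describes one short slice of time, and each row describes one frequency.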

In [None]:
# These numbers are complex!
# This is because STFT computes both magnitude and phase for each frequency at each time step.
# If we don't care about phase, we can focus on magnitude by taking the absolute value

S = np.abs(D)
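
As a small aside, here is how magnitude and phase relate to a complex number, using three made-up values rather than the actual spectrogram. `np.abs` gives the magnitude, `np.angle` gives the phase in radians, and together they are enough to rebuild the original value.

```python
import numpy as np

# Three hypothetical complex values standing in for STFT coefficients
z = np.array([3 + 4j, -1j, 2.0 + 0j])

magnitude = np.abs(z)      # [5., 1., 2.]
phase = np.angle(z)        # in radians, between -pi and pi

# Magnitude and phase together recover the original complex numbers
rebuilt = magnitude * np.exp(1j * phase)
print(np.allclose(rebuilt, z))   # True
```

Keeping only `np.abs(D)` therefore discards the phase, which is fine for visualization.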

In [None]:
# Now, we can plot the spectrogram.
# For context, we can put time marks on the horizontal axis
librosa.display.specshow(S, x_axis='time')

# What?  That doesn't look like much...

Indeed!  This is because magnitudes measure loudness on a *linear* scale, but humans perceive loudness *logarithmically*.  This is where decibels (dB) come in!

You can convert to a decibel scale as follows:

In [None]:
S_dB = librosa.amplitude_to_db(S)
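
The core of the conversion is the formula dB = 20 · log10(amplitude / reference). The sketch below is a simplified stand-in written for illustration, not librosa's code: `amplitude_to_db_sketch` is a made-up name, and it only reproduces the reference amplitude of 1.0 and a small floor that avoids taking the logarithm of zero.

```python
import numpy as np

def amplitude_to_db_sketch(S, ref=1.0, amin=1e-5):
    """Convert amplitudes to decibels relative to `ref` (illustrative only)."""
    S = np.maximum(np.abs(S), amin)    # floor tiny values to avoid log10(0)
    return 20.0 * np.log10(S / ref)

amps = np.array([0.1, 1.0, 2.0, 10.0])
dB = amplitude_to_db_sketch(amps)
print(np.round(dB, 2))    # approximately -20, 0, 6.02, and 20 dB
```

Every factor of 10 in amplitude becomes a step of 20 dB, and doubling the amplitude adds about 6 dB. As far as I know, `librosa.amplitude_to_db` additionally limits the output to a fixed range below the peak (its `top_db` parameter), which this sketch leaves out.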

In [None]:
# Let's try plotting again!
librosa.display.specshow(S_dB, x_axis='time')

### Much more information there!  But what does it mean?

In the first plot, the colormap went from dark to light because the data were all non-negative.

In the second plot, the data are both positive (louder, red) and negative (quieter, blue) relative to the default reference amplitude of 1.0, which is an arbitrary reference point for this recording.

We can use a better reference point by comparing to the loudest (maximum) part of the recording:

In [None]:
S_dB = librosa.amplitude_to_db(S, ref=np.max)
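
To see what measuring against the maximum does, here is a tiny hand-computed sketch on a made-up 2×2 array of magnitudes. It applies the plain decibel formula 20 · log10(amplitude / reference) directly instead of calling librosa, so it leaves out any extra clipping librosa may apply.

```python
import numpy as np

# Hypothetical magnitudes standing in for a small spectrogram
S = np.array([[0.02, 0.5],
              [2.0,  0.2]])

# Measure every value relative to the loudest one
S_dB = 20.0 * np.log10(S / np.max(S))

print(S_dB.max())            # 0.0: the loudest entry sits at 0 dB
print(np.all(S_dB <= 0))     # True: everything else is at or below 0 dB
```

With this reference, each value reads as "this many dB quieter than the loudest point in the recording".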

In [None]:
# Let's try plotting again!
librosa.display.specshow(S_dB, x_axis='time')

### Better, but I still don't know what this means!

Ok, let's put some more decoration on the plot.

We can add a colorbar to explain what the colors mean:

In [None]:
librosa.display.specshow(S_dB, x_axis='time')
plt.colorbar()

In [None]:
# And tick marks on the vertical axis to show what the frequencies are
# We use 'linear' to indicate that the frequencies are spaced linearly (evenly)
# between 0 and sr/2
librosa.display.specshow(S_dB, x_axis='time', y_axis='linear')
plt.colorbar()
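
The frequencies on a linear axis can also be computed directly. The sketch below assumes a frame length of 2048 samples (which I believe is the `librosa.stft` default) and a sampling rate of 22050 Hz, and uses NumPy's `rfftfreq` to list the center frequency of each row of the spectrogram.

```python
import numpy as np

sr = 22050      # sampling rate in Hz
n_fft = 2048    # assumed STFT frame length

# Center frequency (in Hz) of each spectrogram row
freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)

print(len(freqs))             # 1025 rows: 1 + n_fft / 2
print(freqs[0], freqs[-1])    # 0.0 11025.0, i.e. from 0 up to sr / 2
print(freqs[1] - freqs[0])    # spacing between rows, sr / n_fft (about 10.77 Hz)
```

The rows are evenly spaced, which is what `y_axis='linear'` tells `specshow` to assume.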

# Wrapping up the warm up

We've just done the following:

1. Loaded an audio file
2. Played it back in the browser
3. Created a plot of its waveform
4. Computed its frequency content using a short-time Fourier transform
5. Converted its magnitudes to decibels
6. Visualized its frequency content with a spectrogram display

# What would you like to do next?