# Handling audio data in Python, part one

Several libraries can help with handling audio in Python.

In this case we'll use scipy.io, part of the [SciPy](http://www.scipy.org/) package, for audio I/O.

The audio sample data is stored in familiar NumPy arrays.

I'd also recommend installing [Audacity](http://audacity.sourceforge.net/) so you can easily audition/edit any audio files

## Reading audio data

Now we are going to load in a 16 bit uncompressed wave file.

First we'll get our initial imports in:

In [1]:
%matplotlib inline
import numpy as np
from scipy.io import wavfile
import matplotlib.pyplot as plt
import matplotlib.pylab as pylab
import mpld3


These calls enable the mpld3 render engine on this notebook and increase the plot size

In [None]:
mpld3.enable_notebook()
pylab.rcParams['figure.figsize'] = 15, 8
pylab.rcParams['font.size'] = 16

Then we read the audio data, where fs = the sample rate and samples = the audio sample data:

In [None]:
fs, samples = wavfile.read('audio/1k.wav')

The audio is sampled 44,100 times per second (44,100 Hz), so we have 44,100 sample values per second of audio. The file has 1 channel (mono) and is 3 seconds in length

In [None]:
print('Sample rate: ', fs, 'Hz')
print('Length in samples: ', len(samples))
print('Length in seconds: ', '%.3f' % (float(len(samples)) / fs))

Now output the first 100 audio sample values to get a feel for the data

In [None]:
print(samples[0:100])

The audio sample data is in the form of signed 16 bit integers with a range of possible values between -32768 and +32767

In [None]:
print('Sample data type: ', samples.dtype)

Plot the first 100 samples of audio data with markers at each sample point

In [None]:
plt.plot(samples[0:100], marker='o')
plt.title('1000Hz sine wave @ 100% amplitude in its native 16 bit integer sample format')
plt.xlabel('Time (samples)')
plt.ylabel('Amplitude')

It'll be easier to work with the data in a floating point format, where each sample can be a value between -1 and +1

In [None]:
samples = samples / 32768.0
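
As a minimal sketch of the same scaling, the helper below (the name `int16_to_float` is hypothetical, not part of SciPy) shows where the extreme int16 values land after dividing by 32768. Note that the positive extreme ends up just short of +1.0 because the int16 range is asymmetric.

```python
import numpy as np

def int16_to_float(samples_int16):
    """Scale int16 samples to float64 values in the range [-1.0, 1.0)."""
    return samples_int16.astype(np.float64) / 32768.0

# The most negative, zero and most positive int16 values
extremes = np.array([-32768, 0, 32767], dtype=np.int16)
as_float = int16_to_float(extremes)
print(as_float)
```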

Now output the first 100 audio sample values

In [None]:
print(samples[0:100])

The audio sample data is now stored as 64 bit floating point values between -1 and +1

In [None]:
print('Sample data type: ', samples.dtype)

Plot the first 100 samples of audio data with markers at each sample point in its new format

In [None]:
plt.plot(samples[0:100], marker='o')
plt.ylim([-1.1, 1.1])
plt.title('1000Hz sine wave @ 100% amplitude')
plt.xlabel('Time (samples)')
plt.ylabel('Amplitude')
plt.savefig('test.pdf')

Reduce the sample amplitude by 50% and re-plot

In [None]:
samples = samples * 0.50
plt.plot(samples[0:100], marker='o')
plt.ylim([-1.1, 1.1])
plt.title('1000Hz sine wave @ 50% amplitude')
plt.xlabel('Time (samples)')
plt.ylabel('Amplitude')

## Write audio data to a wave file

Now we are going to write our amplitude reduced samples to a new wave file

First we need to convert the samples back to the int16 format between -32768 and +32767

In [None]:
samples = np.int16(samples * 32768)
print('Sample data type is now: ', samples.dtype)
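
One thing to watch with this conversion is that a float sample of exactly +1.0 scales to 32768, which is one more than int16 can hold. The sketch below uses a hypothetical helper, `float_to_int16`, that keeps the 32768 scale factor but clips before casting. It is one possible safeguard, not the only way to do it.

```python
import numpy as np

def float_to_int16(samples_float):
    """Scale floats in [-1.0, 1.0] to int16, clipping anything out of range."""
    scaled = np.round(np.asarray(samples_float, dtype=np.float64) * 32768)
    # Clip so that +1.0 (and anything louder) saturates instead of wrapping
    return np.clip(scaled, -32768, 32767).astype(np.int16)

converted = float_to_int16(np.array([-1.0, 0.0, 0.5, 1.0, 1.5]))
print(converted)
```

Values that came from an int16 file in the first place survive the round trip unchanged, because they never reach +1.0.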

Write the sample data to the new file using the same sample rate as the original (44100Hz)

In [None]:
wavfile.write('audio/1k_0.5.wav', fs, samples)
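
A quick way to convince yourself the write worked is to read the file straight back. The sketch below is self-contained, so it generates its own 1000Hz tone and writes to a temporary directory (the file name `roundtrip.wav` is just an assumption for illustration) rather than relying on the audio/ folder used above.

```python
import os
import tempfile

import numpy as np
from scipy.io import wavfile

fs = 44100
t = np.arange(fs) / float(fs)                      # 1 second of sample times
tone = np.int16(0.5 * 32767 * np.sin(2 * np.pi * 1000 * t))

# Write to a throwaway location, then read the same file back
path = os.path.join(tempfile.mkdtemp(), 'roundtrip.wav')
wavfile.write(path, fs, tone)
fs_read, tone_read = wavfile.read(path)

print(fs_read, tone_read.dtype, np.array_equal(tone, tone_read))
```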

## Generating audio data 

When working with audio you usually have to generate signals for various reasons such as when designing test signals for measurements

We will start by creating 3 seconds of white noise (noise with equal power at all frequencies)

We will be using a sample rate of 44,100Hz

In [None]:
fs = 44100

The signal's duration will be 3 seconds and it will have an amplitude of 1.0

In [None]:
dur = 3
A = 1.0
sample_len = dur * fs

This preallocates our noise array, ready for filling

In [None]:
noise_samples = np.zeros(sample_len)

Then we will create an array of floating point numbers between -1 and +1, by generating each sample individually from a random number

In [None]:
for i in range(sample_len):
    noise_samples[i] = A * np.random.uniform(-1, 1)

Plot the first 1000 samples of white noise to see it in the time domain

In [None]:
plt.plot(noise_samples[0:1000])
plt.ylim([-1.1, 1.1])
plt.title('White noise')
plt.xlabel('Time (samples)')
plt.ylabel('Amplitude')

Convert the random samples to 16 bit signed integers

In [None]:
noise_samples = np.int16(noise_samples * 32768)

Write the random samples to disk

In [None]:
wavfile.write('audio/white_noise.wav', fs, noise_samples)

Next we will create a sine wave

We will start by defining its characteristics

It will be sampled at 44,100Hz, have a duration of 3 seconds, a frequency of 440Hz and an amplitude of 1.0

In [None]:
fs = 44100
dur = 3
freq = 440
A = 1.0

Store a value for the resultant signal's total length

In [None]:
sample_len = dur * fs

Preallocate a zero-filled array to store our sine wave

In [None]:
sine_samples = np.zeros(sample_len)

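Create an array of ascending float values starting at 0 and stopping one sample short of 3 seconds, spaced at intervals of 1/fs (our sampling interval). Passing endpoint=False stops linspace from including the 3 second endpoint, which would otherwise stretch the spacing to dur / (sample_len - 1)

In [None]:
T = np.linspace(0, dur, num=sample_len, endpoint=False)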

Loop through the empty array, filling it with an oscillating sine wave

In [None]:
for i in range(sample_len):
    sine_samples[i] = A * np.sin(2 * np.pi * freq * T[i])
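
As with the noise, the loop can be replaced by a single array expression. The sketch below is one way to do it. It builds the time axis from sample indices divided by fs, which guarantees a spacing of exactly one sampling interval.

```python
import numpy as np

fs = 44100
dur = 3
freq = 440
A = 1.0
sample_len = dur * fs

# Sample times: index / fs gives a spacing of exactly 1/fs seconds
T = np.arange(sample_len) / float(fs)

# np.sin works element-wise, so the whole wave is computed in one step
sine_samples = A * np.sin(2 * np.pi * freq * T)

print(len(sine_samples), T[1] - T[0])
```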

Plot the first 100 samples, but this time we are going to display time in milliseconds on the x axis. Note the lower frequency than the previous sine wave

In [None]:
plt.plot(T[0:100]*1000, sine_samples[0:100], marker='o')
plt.ylim([-1.1, 1.1])
plt.title('440Hz sine wave')
plt.xlabel('Time (ms)')
plt.ylabel('Amplitude')

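Convert the float values to int16 ready for wave file write, clipping first so that a sample at exactly +1.0 (which scales to 32768) cannot overflow the int16 range

In [None]:
sine_samples = np.int16(np.clip(sine_samples * 32768, -32768, 32767))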

Write the sample data to wave file

In [None]:
wavfile.write('audio/440.wav', fs, sine_samples)

## Combining audio data

Next we are going to combine 2 audio signals to make a more complex waveform

This will be created by the addition of our white noise and sine wave signals

This is performed very easily by adding the 2 signals together

First we recreate our noise signal, but this time with a 50% amplitude multiplier to reduce its influence on the resultant waveform

In [None]:
fs = 44100
dur = 3
A = 0.5
sample_len = dur * fs
noise_samples = np.zeros(sample_len)

for i in range(sample_len):
    noise_samples[i] = A * np.random.uniform(-1, 1)  # apply the 50% amplitude multiplier

Now we recreate our 440Hz sine wave

In [None]:
fs = 44100
dur = 3
freq = 440
A = 1.0

sine_samples = np.zeros(sample_len)
T = np.linspace(0, dur, num=sample_len, endpoint=False)

for i in range(sample_len):
    sine_samples[i] = A * np.sin(2 * np.pi * freq * T[i])

Then we combine them

In [None]:
comb_samples = noise_samples + sine_samples

Now plot the resultant signal to see how it looks, but change the plot scale because the combination of signals will lead to amplitude changes

In [None]:
plt.plot(T[0:100]*1000, comb_samples[0:100], marker='o')
plt.ylim([-2.1, 2.1])
plt.title('Combined white noise and 440Hz sine wave')
plt.xlabel('Time (ms)')
plt.ylabel('Amplitude')

You can clearly see the 440Hz sine wave present in the signal, but with the white noise at 50% amplitude overlaid

The addition of the 2 signals has also pushed the resultant signal's peak amplitude above 1.0. This is above our maximum amplitude value so we must "normalize" this signal back to the permissible -1.0 to +1.0 range

We do this by dividing the entire combined signal by its maximum absolute value, so that the largest peak, whether positive or negative, lands exactly on -1 or +1

In [None]:
comb_samples = comb_samples / np.max(np.abs(comb_samples))
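
One caveat with peak normalization is that the largest excursion may be negative, so the divisor needs to be the largest absolute value rather than the largest positive value. The sketch below wraps this in a hypothetical helper, `peak_normalize`, and also guards against dividing by zero for a silent signal.

```python
import numpy as np

def peak_normalize(signal):
    """Scale a signal so its largest absolute value is exactly 1.0."""
    signal = np.asarray(signal, dtype=np.float64)
    peak = np.max(np.abs(signal))
    if peak == 0:
        # A silent signal has nothing to scale; return it unchanged
        return signal.copy()
    return signal / peak

# The biggest peak here is negative (-1.5), so it must set the scale
lopsided = np.array([0.5, -1.5, 1.0])
normalized = peak_normalize(lopsided)
print(normalized)
```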

Now plot

In [None]:
plt.plot(T[0:100]*1000, comb_samples[0:100], marker='o')
plt.ylim([-1.1, 1.1])
plt.title('Combined and normalized white noise and 440Hz sine wave')
plt.xlabel('Time (ms)')
plt.ylabel('Amplitude')

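And save our combined file by first converting to 16 bit integers. Because normalization puts the largest peak at exactly 1.0, we clip before casting so that 1.0 * 32768 cannot overflow the int16 range

In [None]:
comb_samples = np.int16(np.clip(comb_samples * 32768, -32768, 32767))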

In [None]:
wavfile.write('audio/440_noise.wav', fs, comb_samples)

Adding signals together like this is an application of the superposition principle, which forms the basis of additive synthesis and, inversely, of Fourier's theorem, which opens the door to Fourier analysis
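
To give a rough feel for the inverse direction, the sketch below adds two sine waves and then uses NumPy's FFT to recover which frequencies went in. The 880Hz partial and the 1 second duration are assumptions chosen for illustration. With 1 second of audio, each FFT bin is exactly 1Hz wide, so the bin index equals the frequency in Hz.

```python
import numpy as np

fs = 44100
t = np.arange(fs) / float(fs)          # 1 second, so FFT bins are 1 Hz apart

low = 1.0 * np.sin(2 * np.pi * 440 * t)
high = 0.5 * np.sin(2 * np.pi * 880 * t)
combined = low + high                  # superposition: simple addition

# Magnitude spectrum, scaled so a sine of amplitude A shows up as A
spectrum = np.abs(np.fft.rfft(combined)) / (len(combined) / 2.0)

# The two strongest bins, loudest first
strongest = np.argsort(spectrum)[-2:][::-1]
print(strongest, spectrum[strongest])
```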

## Objective audio features primer

In acoustics, and especially in policy formation/enforcement, we like to reduce audio signals down to easy to interpret values

A standard measure of average signal amplitude over time is the root mean square (RMS)

It is the square root of the arithmetic mean of the squared amplitudes of the waveform's discrete sample points, and is described by the following equation:

\begin{equation*} x_{rms} = \sqrt{\frac{1}{n}\left(x_1^2 + x_2^2 + \dots + x_n^2\right)} \end{equation*}
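
The equation translates almost directly into NumPy. The sketch below uses a hypothetical helper named `rms` and checks it against a known result: a sine wave of amplitude A measured over whole cycles has an RMS of A divided by the square root of 2.

```python
import numpy as np

def rms(x):
    """Root mean square: square each sample, take the mean, then the root."""
    x = np.asarray(x, dtype=np.float64)
    return np.sqrt(np.mean(x ** 2))

fs = 44100
t = np.arange(fs) / float(fs)              # 1 second = 440 whole cycles
sine = 1.0 * np.sin(2 * np.pi * 440 * t)

sine_rms = rms(sine)
print(sine_rms, 1.0 / np.sqrt(2))
```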