# Install audio libraries 

In [40]:
# https://librosa.org/doc/latest/tutorial.html
# https://wiki.python.org/moin/Audio
#   This page tries to provide a starting point for those who want to work with audio in combination with Python
# https://wiki.python.org/moin/FrontPage  
#   A user-editable compendium of knowledge based around the Python programming language. Some pages are protected against casual editing 
# https://wiki.python.org/moin/PythonInMusic
#   This page is divided in three sections: Music software written in Python, Music programming in Python, and Music software supporting Python
# https://wiki.python.org/moin/PythonAudioMaterial
import sys

# Suppress warnings unless Python was started with explicit -W options
if not sys.warnoptions:
    import warnings
    warnings.simplefilter("ignore")

## 1) librosa

librosa is a Python package for music and audio analysis. It provides the building blocks needed to create music information retrieval systems.
The reference manual and tutorial for librosa are at https://librosa.org/doc/

In [1]:
# conda install -c conda-forge librosa
# or
# !pip install librosa

### The librosa package is structured as a collection of submodules:


[librosa.beat](https://librosa.org/doc/latest/beat.html#beat)
> Functions for estimating tempo and detecting beat events.

[librosa.core](https://librosa.org/doc/latest/core.html#core)
> Core functionality includes functions to load audio from disk, compute various spectrogram representations, and a variety of commonly used tools for music analysis. For convenience, all functionality in this submodule is directly accessible from the top-level librosa.* namespace.

[librosa.decompose](https://librosa.org/doc/latest/decompose.html#decompose)
> Functions for harmonic-percussive source separation (HPSS) and generic spectrogram decomposition using matrix decomposition methods implemented in scikit-learn.

[librosa.display](https://librosa.org/doc/latest/display.html#display)
> Visualization and display routines using matplotlib.

[librosa.effects](https://librosa.org/doc/latest/effects.html#effects)
> Time-domain audio processing, such as pitch shifting and time stretching. This submodule also provides time-domain wrappers for the decompose submodule.

[librosa.feature](https://librosa.org/doc/latest/feature.html#feature)
> Feature extraction and manipulation. This includes low-level feature extraction, such as chromagrams, Mel spectrogram, MFCC, and various other spectral and rhythmic features. Also provided are feature manipulation methods, such as delta features and memory embedding.

[librosa.filters](https://librosa.org/doc/latest/filters.html#filters)
> Filter-bank generation (chroma, pseudo-CQT, CQT, etc.). These are primarily internal functions used by other parts of librosa.

[librosa.onset](https://librosa.org/doc/latest/onset.html#onset)
> Onset detection and onset strength computation.

[librosa.segment](https://librosa.org/doc/latest/segment.html#segment)
> Functions useful for structural segmentation, such as recurrence matrix construction, time-lag representation, and sequentially constrained clustering.

[librosa.sequence](https://librosa.org/doc/latest/sequence.html#sequence)
> Functions for sequential modeling. Various forms of Viterbi decoding, and helper functions for constructing transition matrices.

[librosa.util](https://librosa.org/doc/latest/util.html#util)
> Helper utilities (normalization, padding, centering, etc.).

In [2]:
import librosa

In [22]:
'''
Load the audio as a waveform `y`: librosa.load decodes the audio file into a time series stored as a NumPy array.
Store the sampling rate (the number of samples per second of audio) as `sr`.
By default, librosa.load mixes all audio down to mono and resamples it to 22050 Hz.
Frames are short windows of the signal y; consecutive frames are hop_length = 512 samples apart by default.
librosa uses centered frames, so the kth frame is centered on sample k * hop_length.
'''
y, sr = librosa.load("data/010211-120252.wav")
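As a quick check of the frame arithmetic in the docstring above, here is a minimal pure-Python sketch. It assumes centered frames and `hop_length=512`; the frame-count formula `1 + n_samples // hop_length` is what I would expect for a centered STFT (e.g. `librosa.stft` with `center=True`), so treat it as an assumption for other functions. The helper names are hypothetical, not librosa API.

```python
def frame_center_sample(k, hop_length=512):
    """Sample index at the center of frame k when frames are centered."""
    return k * hop_length

def num_frames(n_samples, hop_length=512):
    """Number of centered frames for a signal of n_samples samples (assumed formula)."""
    return 1 + n_samples // hop_length

example_sr = 22050                          # librosa.load's default sampling rate
example_hop = 512                           # default hop length
hop_ms = 1000 * example_hop / example_sr    # spacing between frame centers, in ms

print(frame_center_sample(3, example_hop), num_frames(example_sr, example_hop), round(hop_ms, 1))
```

So one second of audio at 22050 Hz spans 44 centered frames, spaced about 23 ms apart.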

In [23]:
'''
    Run the default beat tracker
'''
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)

In [19]:
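import numpy as np
print("Sampling rate is {}".format(sr))
# Newer librosa releases may return tempo as a 1-element array; flatten it before formatting
print('Estimated tempo: {:.2f} beats per minute'.format(float(np.ravel(tempo)[0])))

Sampling rate is 22050
Estimated tempo: 112.35 beats per minute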


In [32]:
'''
    The beat tracker returns an estimate of the tempo (in beats per minute)
        and an array of frame numbers corresponding to detected beat events.
    frames_to_time converts those frame numbers into timestamps:
    beat_times will be an array of the times (in seconds) of the detected beats.
'''
beat_times = librosa.frames_to_time(beat_frames, sr=sr)

In [33]:
beat_times

array([  1.18421769,   1.71827664,   2.32199546,   2.87927438,
         3.45977324,   4.01705215,   4.59755102,   5.13160998,
         5.71210884,   6.26938776,   6.84988662,   7.40716553,
         7.9876644 ,   8.52172336,   9.12544218,   9.63628118,
        10.19356009,  10.70439909,  11.261678  ,  11.77251701,
        12.32979592,  12.86385488,  13.39791383,  13.93197279,
        14.46603175,  15.0000907 ,  15.53414966,  16.06820862,
        16.62548753,  17.15954649,  17.69360544,  18.2276644 ,
        18.7385034 ,  19.29578231,  19.85306122,  20.36390023,
        20.89795918,  21.43201814,  21.9660771 ,  22.50013605,
        23.03419501,  23.56825397,  24.10231293,  24.63637188,
        25.19365079,  25.7044898 ,  26.26176871,  26.81904762,
        27.35310658,  27.88716553,  28.44444444,  28.9785034 ,
        29.53578231,  30.09306122,  30.65034014,  31.20761905,
        31.76489796,  32.32217687,  32.83301587,  33.36707483,
        33.90113379,  34.43519274,  34.94603175,  35.45
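With the default settings, `frames_to_time` amounts to `frame * hop_length / sr`. The minimal sketch below (hypothetical helper names, assuming `sr=22050` and the default `hop_length=512`) reproduces the first two beat times printed above, which correspond to frames 51 and 74, and converts the interval between them to beats per minute. The `60 / interval` step is only a unit conversion, not a description of how librosa's tempo estimator works; this first interval simply happens to give the same 112.35 BPM printed earlier.

```python
def frame_to_seconds(frame, sr=22050, hop_length=512):
    """Time in seconds of a frame index, assuming librosa's default framing."""
    return frame * hop_length / sr

def interval_to_bpm(seconds_per_beat):
    """Convert an inter-beat interval in seconds to beats per minute."""
    return 60.0 / seconds_per_beat

t0 = frame_to_seconds(51)   # first entry of beat_times above
t1 = frame_to_seconds(74)   # second entry of beat_times above
print(round(t0, 8), round(t1, 8), round(interval_to_bpm(t1 - t0), 2))
```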

In [38]:
# Feature extraction example
import numpy as np
import librosa

# Load the example clip
y, sr = librosa.load(librosa.ex('nutcracker'))

In [41]:
'''
    Use the effects module for harmonic-percussive source separation.
    Harmonic/percussive source separation (HPSS) consists of separating the pitched instruments from the percussive parts of a music mixture.
    
    When listening to our environment, we hear a wide variety of sounds. However, at a
    very coarse level, many sounds can be classified as either harmonic or percussive.
    Harmonic sounds are the ones we perceive as having a definite pitch, such that we could,
    for example, sing along to them. The sound of a violin is a good example of a
    harmonic sound. Percussive sounds often stem from two colliding objects, for example the two
    shells of castanets. An important characteristic of percussive sounds is that they do not have a
    pitch but a very clear localization in time. Many real-world sounds are mixtures of harmonic and
    percussive components. For example, a note played on a piano has a percussive onset (resulting
    from the hammer hitting the strings) preceding the harmonic tone (resulting from the vibrating
    string). For more detail, see this paper: https://www.audiolabs-erlangen.de/content/05-fau/professor/00-mueller/02-teaching/2016w_mpa/LabCourse_HPSS.pdf
'''
# Set the hop length; at 22050 Hz, 512 samples ≈ 23 ms
hop_length = 512

# Separate harmonics and percussives into two waveforms
y_harmonic, y_percussive = librosa.effects.hpss(y)
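`librosa.effects.hpss` wraps an STFT, `librosa.decompose.hpss`, and an inverse STFT. The idea behind the median-filtering approach to HPSS is that, in a magnitude spectrogram, harmonic sounds form horizontal lines (a stable frequency over time) while percussive sounds form vertical lines (broadband energy at one instant). Median-filtering along time enhances the former, along frequency the latter, and the two filtered spectrograms define soft masks. The toy sketch below works on a small list-of-lists "spectrogram" in pure Python; the 3-cell kernel, the power-2 soft mask, and the truncated-window edge handling are simplifying assumptions, and librosa itself works on the complex STFT with much larger kernels.

```python
from statistics import median

def median_filter_rows(S, k=3):
    """Median-filter each row of S with a k-cell window, truncated at the edges."""
    half = k // 2
    out = []
    for row in S:
        n = len(row)
        out.append([median(row[max(0, j - half):min(n, j + half + 1)]) for j in range(n)])
    return out

def transpose(S):
    return [list(col) for col in zip(*S)]

def hpss_sketch(S, k=3, power=2.0):
    """Split a magnitude spectrogram S (rows = frequency bins, columns = frames)
    into harmonic and percussive parts using median filtering and soft masks."""
    H = median_filter_rows(S, k)                        # smooth along time: keeps horizontal lines
    P = transpose(median_filter_rows(transpose(S), k))  # smooth along frequency: keeps vertical lines
    harmonic, percussive = [], []
    for s_row, h_row, p_row in zip(S, H, P):
        h_out, p_out = [], []
        for s, h, p in zip(s_row, h_row, p_row):
            denom = h ** power + p ** power
            mask_h = 0.5 if denom == 0 else h ** power / denom  # soft (Wiener-like) mask
            h_out.append(s * mask_h)
            p_out.append(s * (1 - mask_h))
        harmonic.append(h_out)
        percussive.append(p_out)
    return harmonic, percussive

# Toy spectrogram: a steady tone in bin 2 (a row) and a click at frame 2 (a column)
S = [[1.0 if f == 2 or t == 2 else 0.0 for t in range(5)] for f in range(5)]
harm, perc = hpss_sketch(S)
print(sum(map(sum, harm)), sum(map(sum, perc)))
```

The tone ends up in `harm`, the click in `perc`, and the single cell where they overlap is split evenly between the two.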

In [42]:
'''
    librosa.effects.hpss has separated the time series y into two time series,
    containing the harmonic (tonal) and percussive (transient) portions of the signal.
    Now run the beat tracker on the percussive portion, where the transients are.
'''
# Beat track on the percussive signal
tempo, beat_frames = librosa.beat.beat_track(y=y_percussive,
                                             sr=sr)

In [43]:
'''
    Extract the Mel-frequency cepstral coefficients (MFCCs) from the raw signal
'''
# Compute MFCC features from the raw signal
mfcc = librosa.feature.mfcc(y=y, sr=sr, hop_length=hop_length, n_mfcc=13)

mfcc

array([[-602.36, -602.36, -602.36, ..., -602.36, -602.36, -602.36],
       [   0.  ,    0.  ,    0.  , ...,    0.  ,    0.  ,    0.  ],
       [   0.  ,    0.  ,    0.  , ...,    0.  ,    0.  ,    0.  ],
       ...,
       [   0.  ,    0.  ,    0.  , ...,    0.  ,    0.  ,    0.  ],
       [   0.  ,    0.  ,    0.  , ...,    0.  ,    0.  ,    0.  ],
       [   0.  ,    0.  ,    0.  , ...,    0.  ,    0.  ,    0.  ]],
      dtype=float32)
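With librosa's default settings, as I understand them, MFCCs are a type-II DCT with orthonormal scaling applied to each frame of the log-power (dB) Mel spectrogram, keeping the first `n_mfcc` coefficients. The pure-Python sketch below implements only that final step and assumes the Mel filterbank and dB conversion are already done. It is also consistent with the `[-602.36, 0, 0, ...]` frames printed above: in silent frames every Mel band presumably sits at the same floored dB value, and the DCT of a constant vector is zero except for the first coefficient. The 128 bands and the -53 dB level in the example are illustrative assumptions.

```python
import math

def dct_ii_ortho(x):
    """Type-II DCT with orthonormal scaling."""
    N = len(x)
    coeffs = []
    for k in range(N):
        s = sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N)) for n in range(N))
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        coeffs.append(scale * s)
    return coeffs

def mfcc_from_log_mel(log_mel_frame, n_mfcc=13):
    """Keep the first n_mfcc DCT coefficients of one frame of log-Mel energies (dB)."""
    return dct_ii_ortho(log_mel_frame)[:n_mfcc]

# A "silent" frame: all 128 Mel bands at the same floor level (illustrative value)
silent = [-53.0] * 128
c = mfcc_from_log_mel(silent)
print(round(c[0], 2), max(abs(v) for v in c[1:]) < 1e-9)
```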

In [44]:
'''
    delta computes (smoothed) first-order differences across the columns (time frames) of its input

'''
# And the first-order differences (delta features)
mfcc_delta = librosa.feature.delta(mfcc)
mfcc_delta

array([[-2.4221520e-14, -2.4221520e-14, -2.4221520e-14, ...,
        -5.7067871e-03, -5.7067871e-03, -5.7067871e-03],
       [ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -2.9969288e-03, -2.9969288e-03, -2.9969288e-03],
       [ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
         5.8454503e-03,  5.8454503e-03,  5.8454503e-03],
       ...,
       [ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -6.3627991e-03, -6.3627991e-03, -6.3627991e-03],
       [ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -6.9731898e-03, -6.9731898e-03, -6.9731898e-03],
       [ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
         1.1842677e-03,  1.1842677e-03,  1.1842677e-03]], dtype=float32)
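`librosa.feature.delta` estimates the local slope with a Savitzky-Golay filter (default `width=9`, `order=1`). For a first-order fit over a window of `2N + 1` frames, that slope reduces to the classic regression formula d_t = Σ n (c_{t+n} − c_{t−n}) / (2 Σ n²), summed over n = 1..N. The pure-Python sketch below applies that formula to one feature row. Its edge handling (repeating the first and last frame) is an assumption and differs from librosa's default `mode='interp'`, so only interior frames should be expected to match.

```python
def delta_sketch(row, width=9):
    """First-order regression delta of one feature row (one value per frame)."""
    if width < 3 or width % 2 == 0:
        raise ValueError("width must be an odd integer >= 3")
    N = width // 2
    denom = 2 * sum(n * n for n in range(1, N + 1))
    T = len(row)

    def at(t):
        # Repeat edge frames outside the signal (an assumption; librosa interpolates instead)
        return row[min(max(t, 0), T - 1)]

    return [sum(n * (at(t + n) - at(t - n)) for n in range(1, N + 1)) / denom
            for t in range(T)]

ramp = [3.0 * t for t in range(20)]   # feature rising by 3 per frame
steady = [-602.36] * 20               # constant feature, like the silent MFCC frames above
print(delta_sketch(ramp)[4:16], delta_sketch(steady)[:3])
```

A steadily rising feature gives a constant delta of 3 per frame in the interior, and a constant feature gives exactly zero.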

In [45]:
'''
    sync aggregates the columns of its input between sample indices (e.g., beat frames)
'''

# Stack and synchronize between beat events
# Here we use the mean value (the default aggregate); the chroma features below use the median
beat_mfcc_delta = librosa.util.sync(np.vstack([mfcc, mfcc_delta]),
                                    beat_frames)
beat_mfcc_delta

array([[-5.96050842e+02, -4.71969910e+02, -4.18730255e+02, ...,
        -2.30211304e+02, -1.81636154e+02, -4.22517181e+02],
       [ 5.29716206e+00,  1.19485054e+02,  1.19477348e+02, ...,
         6.63386002e+01,  6.66869888e+01,  7.45657654e+01],
       [ 1.79359317e+00,  5.01007881e+01,  1.31559191e+01, ...,
        -6.48276062e+01, -6.93418808e+01, -2.06324558e+01],
       ...,
       [-6.53347746e-02, -1.46815509e-01,  5.34748845e-02, ...,
        -1.71011150e-01,  6.23303831e-01,  1.29031271e-01],
       [-2.28583626e-02, -1.37432203e-01, -1.16937226e-02, ...,
        -2.50851601e-01,  2.50305921e-01,  8.25022347e-03],
       [ 2.85042245e-02, -2.00774282e-01, -5.55300675e-02, ...,
         7.40871802e-02, -6.14577591e-01,  7.62199908e-02]], dtype=float32)
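Conceptually, `librosa.util.sync` splits the frame axis at the given indices and aggregates each segment. With its default `pad=True`, boundaries at 0 and at the total number of frames are added, so n beat frames yield up to n + 1 segments. The pure-Python sketch below mirrors that idea for a list of feature rows; the helper name is hypothetical, and dropping out-of-range or duplicate indices is an assumption about the details.

```python
from statistics import mean, median

def sync_sketch(features, boundaries, aggregate=mean):
    """Aggregate each feature row over the segments delimited by `boundaries`."""
    n_frames = len(features[0])
    # Add 0 and n_frames, drop out-of-range or duplicate indices, and sort
    edges = sorted({0, n_frames, *(b for b in boundaries if 0 <= b <= n_frames)})
    segments = list(zip(edges[:-1], edges[1:]))
    return [[aggregate(row[start:end]) for start, end in segments] for row in features]

features = [[1, 2, 9, 4, 5, 6]]   # one feature row with an outlier (9) in the first segment
print(sync_sketch(features, [3]))                    # mean per segment
print(sync_sketch(features, [3], aggregate=median))  # median per segment
```

The outlier pulls the first segment's mean up to 4, while the median stays at 2; this is the practical difference between the mean used here and the median used for the chroma features below.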

In [46]:
'''
    Compute a chromagram using just the harmonic component.
'''
# Compute chroma features from the harmonic signal
chromagram = librosa.feature.chroma_cqt(y=y_harmonic,
                                        sr=sr)
chromagram

array([[0.624562  , 0.6000365 , 0.3891103 , ..., 0.12988181, 0.26408905,
        0.46952796],
       [0.45346972, 0.31484362, 0.29985076, ..., 0.14454272, 0.13794322,
        0.20688398],
       [0.4300288 , 0.4736917 , 0.3885005 , ..., 0.5249095 , 0.4183366 ,
        0.35342664],
       ...,
       [0.6775574 , 0.46899557, 0.31649804, ..., 0.15098049, 0.20618644,
        0.41384566],
       [0.4585575 , 0.43789858, 0.31573945, ..., 0.27923036, 0.28339976,
        0.5822882 ],
       [1.        , 1.        , 1.        , ..., 0.96677846, 0.8960364 ,
        1.        ]], dtype=float32)
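A chromagram folds spectral energy into 12 pitch classes, ignoring the octave. `chroma_cqt` does this from a constant-Q transform and, by default as far as I know, normalizes each frame so its strongest pitch class is 1, which is consistent with the values of exactly 1.0 in the output above. The toy sketch below shows only the folding and normalization for a few (frequency, magnitude) pairs, assuming equal temperament with A4 = 440 Hz; it is not librosa's implementation.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch_class(freq_hz, a4_hz=440.0):
    """Index 0-11 (C = 0) of the nearest equal-tempered pitch class."""
    midi = 69 + 12 * math.log2(freq_hz / a4_hz)   # MIDI note number; A4 = 69
    return round(midi) % 12

def chroma_frame(freqs_hz, mags):
    """Fold (frequency, magnitude) pairs into 12 bins and scale so the maximum is 1."""
    bins = [0.0] * 12
    for f, m in zip(freqs_hz, mags):
        bins[pitch_class(f)] += m
    peak = max(bins)
    return [b / peak for b in bins] if peak > 0 else bins

# A4, A5 (one octave up, folded into the same class) and middle C
frame = chroma_frame([440.0, 880.0, 261.63], [1.0, 0.5, 0.25])
print({PITCH_CLASSES[i]: round(v, 3) for i, v in enumerate(frame) if v > 0})
```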

In [48]:
'''
    Synchronize the chroma features between beat events as well
'''
# Aggregate chroma features between beat events
# We'll use the median value of each feature between beat frames
beat_chroma = librosa.util.sync(chromagram,
                                beat_frames,
                                aggregate=np.median)

# Finally, stack all beat-synchronous features together
beat_features = np.vstack([beat_chroma, beat_mfcc_delta])

In [49]:
beat_features

array([[ 0.32793146,  0.03347357,  0.10583013, ...,  0.15275124,
         0.07555592,  0.08132137],
       [ 0.31918573,  0.04075494,  0.0842821 , ...,  0.09635098,
         0.12809394,  0.09471031],
       [ 0.4736917 ,  0.05757335,  0.09427972, ...,  0.44614288,
         0.18659325,  0.13297296],
       ...,
       [-0.06533477, -0.14681551,  0.05347488, ..., -0.17101115,
         0.62330383,  0.12903127],
       [-0.02285836, -0.1374322 , -0.01169372, ..., -0.2508516 ,
         0.25030592,  0.00825022],
       [ 0.02850422, -0.20077428, -0.05553007, ...,  0.07408718,
        -0.6145776 ,  0.07621999]], dtype=float32)
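Each column of `beat_features` is one beat-synchronous feature vector. With the settings above (12 chroma bins and `n_mfcc=13`, i.e. 13 MFCCs plus 13 deltas), each vector should have 12 + 13 + 13 = 38 entries, in the stacking order of the two `np.vstack` calls; the last rows printed above indeed match the last rows of `beat_mfcc_delta`. The small helper below is hypothetical, not part of librosa; it just names each row so individual features are easier to look up.

```python
def beat_feature_names(n_chroma=12, n_mfcc=13):
    """Row labels for beat_features, in the order the rows were stacked."""
    names = [f"chroma_{i}" for i in range(n_chroma)]
    names += [f"mfcc_{i}" for i in range(n_mfcc)]
    names += [f"delta_mfcc_{i}" for i in range(n_mfcc)]
    return names

names = beat_feature_names()
print(len(names), names[11], names[12], names[25])
```

For example, `beat_features[names.index("mfcc_0")]` would give the first MFCC for every beat segment.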

In [50]:
# https://librosa.org/doc/latest/advanced.html#advanced

## 2) ffmpeg

For more audio decoding options, install ffmpeg.
On macOS, I installed it with `brew install ffmpeg`.

In [9]:
!brew install ffmpeg

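# `import ffmpeg` needs the separate ffmpeg-python package (pip install ffmpeg-python);
# brew only installs the ffmpeg command-line tool, which ffmpeg-python calls under the hood
import ffmpeg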
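One way to use ffmpeg alongside librosa is to decode a compressed file into a WAV file up front. The sketch below builds and, only if possible, runs an ffmpeg command with the standard-library `subprocess` module instead of ffmpeg-python. It uses the standard ffmpeg options `-i` (input), `-ac` (number of audio channels), `-ar` (sample rate), and `-y` (overwrite output). The file paths are hypothetical placeholders, and matching librosa's default mono/22050 Hz is an illustrative choice.

```python
import os
import shutil
import subprocess

def ffmpeg_to_wav_cmd(src, dst, sr=22050, mono=True):
    """Build an ffmpeg command that decodes `src` into a WAV file `dst`."""
    cmd = ["ffmpeg", "-y", "-i", src]   # -y: overwrite dst without asking
    if mono:
        cmd += ["-ac", "1"]             # mix down to one channel
    cmd += ["-ar", str(sr), dst]        # resample to sr Hz; output format taken from the extension
    return cmd

src, dst = "data/example.mp3", "data/example.wav"   # hypothetical paths
cmd = ffmpeg_to_wav_cmd(src, dst)
print(" ".join(cmd))

# Run only if the ffmpeg binary is on PATH and the input file exists
if shutil.which("ffmpeg") and os.path.exists(src):
    subprocess.run(cmd, check=True)
```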