# Computational Analysis of Sound and Music

# MIR 2 - Harmonic Analysis

Dr.-Ing. Jakob Abeßer, jakob.abesser@idmt.fraunhofer.de

**Last update:** 29.04.2024

**Outline**

In this notebook, you will learn how to implement a simple **chord recognition** algorithm using **binary template matching**.

In [None]:
!pip install wget

In [None]:
import glob
import os
import zipfile

import librosa
import librosa.display  # explicit import, needed for librosa.display.specshow in older librosa versions
import numpy as np
import matplotlib.pyplot as pl
import IPython.display as ipd
import wget

Let's make sure we have the audio files we need:

In [None]:
for fn in ['c_g_am_f_piano.wav', 'c_g_am_f_violin.wav']:
    if not os.path.isfile(fn):
        print(f'Downloading {fn}, please wait a couple of seconds ...')
        url = f'https://github.com/machinelistening/machinelistening.github.io/blob/master/{fn}?raw=true'
        wget.download(url, out=fn, bar=None)
        print(f'{fn} downloaded successfully ...')
    else:
        print(f'{fn} already exists.')
print('All files ready :)')

## Check the audio files 

In this notebook, we'll use two audio files of a simple **4-chord sequence**:
 - C major
 - G major
 - A minor
 - F major
 
This chord sequence is often written as "I -> V -> vi -> IV" and is the most popular chord progression according to https://www.hooktheory.com/theorytab/common-chord-progressions/1

The sequence is played once with **piano** and once with **violins**:

In [None]:
for fn in ['c_g_am_f_piano.wav', 'c_g_am_f_violin.wav']:
    x, fs = librosa.load(fn)
    ipd.display(ipd.Audio(data=x, rate=fs))

## Chords

A **chord** in music is a combination of three or more notes played simultaneously or in succession. 

It is the basic building block of harmony and can create a sense of tension or release in a piece of music.

Here, we distinguish between the most basic **3-note chord types**: 

- major chords (e.g. the first chord in the audio examples: C-major)
- minor chords (e.g. the third chord in the audio examples: A-minor)

Each chord type consists of a **root**, a **third** above the root, and a **fifth** above the root.

Each **interval** has a particular number of **semitones**:

- Major chord
  - major third interval -> **4 semitones** above the root
  - perfect fifth interval -> **7 semitones** above the root

- Minor chord
  - minor third interval -> **3 semitones** above the root
  - perfect fifth interval -> **7 semitones** above the root

To summarize, both chord types consist of 3 notes and differ only in the third: it lies 4 semitones above the root in a major chord and 3 semitones above the root in a minor chord.
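The interval definitions above can be turned into note names with a few lines of code. The following is a minimal sketch, not part of the original notebook; `NOTE_NAMES`, `CHORD_INTERVALS`, and `chord_notes` are hypothetical names introduced only for illustration.

```python
# Pitch-class names in chroma order (index 0 = C, ..., index 11 = B)
NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

# Semitone offsets above the root for the two chord types discussed above
CHORD_INTERVALS = {'major': (0, 4, 7), 'minor': (0, 3, 7)}


def chord_notes(root, chord_type):
    """Return the note names of a triad (hypothetical helper for illustration)."""
    root_index = NOTE_NAMES.index(root)
    # The modulo wraps around the octave, e.g. A (9) + 7 semitones = 16 -> 4 (E)
    return [NOTE_NAMES[(root_index + offset) % 12]
            for offset in CHORD_INTERVALS[chord_type]]


# The four chords of the progression used in this notebook
for root, chord_type in [('C', 'major'), ('G', 'major'), ('A', 'minor'), ('F', 'major')]:
    print(f"{root} {chord_type}: {chord_notes(root, chord_type)}")
```

Sharps are used for all note names here, so a minor third above C is shown as D# rather than Eb.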

## Templates

Now, we want to represent chords as **binary templates** on a **12-dimensional chroma vector**.

Remember that the 12 chroma bins correspond to the pitch classes C, C#, D, D#, E, ..., B (indices 0 to 11).

In [None]:
major_template = np.zeros(12, dtype=int)
minor_template = np.zeros(12, dtype=int)

For the **C-major** template, we need to set three of these 12 values to one:
  - root note (C) -> index 0 
  - "major third" interval (E) -> index 4
  - "fifth" interval (G) -> index 7
  
Similarly, for the **C-minor** template, we need to set three of these 12 values to one:
  - root note (C) -> index 0 
  - "minor third" interval (Eb) -> index 3
  - "fifth" interval (G) -> index 7
  
  

In [None]:
major_template[0] = 1
major_template[4] = 1
major_template[7] = 1
print(f"Major chord - binary template: {major_template}")

minor_template[0] = 1
minor_template[3] = 1
minor_template[7] = 1
print(f"Minor chord - binary template: {minor_template}")

**Note**: The binary templates are a **strong simplification**, as we assume that
  - only the fundamental frequency of each tone is audible (which is not true, as each tone also has harmonics, which end up in different chroma bins)
  - each tone is equally loud (which is also rarely the case)
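To see why the first assumption is a simplification, we can check where the harmonics of a single tone land on the chroma axis. This is a rough sketch under the assumption of an ideal harmonic tone (partials at integer multiples of the fundamental) and rounding to the nearest equal-tempered semitone; `harmonic_pitch_classes` is a hypothetical helper, not part of the original notebook.

```python
import numpy as np

NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']


def harmonic_pitch_classes(root_index, n_harmonics=6):
    """Map the first harmonics of a tone to their nearest chroma bins."""
    harmonic_numbers = np.arange(1, n_harmonics + 1)
    # The k-th harmonic lies 12 * log2(k) semitones above the fundamental
    semitone_offsets = np.round(12 * np.log2(harmonic_numbers)).astype(int)
    return [NOTE_NAMES[(root_index + offset) % 12] for offset in semitone_offsets]


# Harmonics of the three tones of a C-major chord (C, E, G)
for root_index in (0, 4, 7):
    print(f"{NOTE_NAMES[root_index]}: {harmonic_pitch_classes(root_index)}")
```

In this idealized model, the tone E alone already contributes energy to the chroma bins B and G#, neither of which is part of the C-major template.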

## Semitone rotation

As we have 12 possible chroma values (C, C#, D, ...), we can obtain the templates for all 12 chord roots by cyclically shifting the C templates one semitone at a time:

In [None]:
templates_major = np.zeros((12, 12), dtype=int)
templates_minor = np.zeros((12, 12), dtype=int)
for i in range(0, 12):
    templates_major[i, :] = np.roll(major_template, i)
    templates_minor[i, :] = np.roll(minor_template, i)
    
print(f"All 12 major templates in the rows of \n {templates_major}")
print(f"All 12 minor templates in the rows of \n{templates_minor}")

# Finally, let's stack all templates
templates = np.vstack((templates_major, templates_minor))
print(templates.shape)

In [None]:
# let's create a list of chord labels
chord_labels = []
for chord_type in ('major', 'minor'):
    for chord_root in ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']:
        chord_labels.append(f"{chord_root}-{chord_type}")
print(chord_labels)
        

## Chromagram

Let's compute the chromagrams of the piano and violin recordings and visualize them:

In [None]:
pl.figure(figsize=(8,6))
for i, (title, fn) in enumerate([('Piano', 'c_g_am_f_piano.wav'),
                                 ('Violins', 'c_g_am_f_violin.wav')]):
    y, sr = librosa.load(fn, mono=True, sr=44100)
    chroma_cens = librosa.feature.chroma_cens(y=y, sr=sr)

    pl.subplot(2, 1, i + 1)
    pl.title(title)
    # pass sr so that the time axis matches the 44.1 kHz sample rate used above
    librosa.display.specshow(chroma_cens, sr=sr, y_axis='chroma', x_axis='time', ax=pl.gca())
pl.tight_layout()
pl.show()

## Template Matching

Now we want to match our binary templates against the computed chromagrams to see **which chords are most likely at a particular time in the audio recording**.

Following ..., we use the **inner product of normalized vectors** (i.e., the cosine similarity) as the similarity measure between a **binary chord template** and a **chromagram** frame. To do so, we first normalize the chromagram and the chord templates.
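Before applying this to a full chromagram, the similarity measure can be tried on a single frame. The sketch below uses a made-up chroma frame (the values are an assumption for illustration, not taken from the recordings) and a hypothetical helper `cosine_similarity`.

```python
import numpy as np


def cosine_similarity(a, b):
    """Inner product of two vectors after normalizing each to unit 2-norm."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Binary templates for C-major (C, E, G) and C-minor (C, D#, G)
c_major = np.zeros(12)
c_major[[0, 4, 7]] = 1
c_minor = np.zeros(12)
c_minor[[0, 3, 7]] = 1

# Hypothetical chroma frame: strong C, E, and G plus a little energy at D
frame = np.array([0.9, 0.0, 0.1, 0.0, 0.7, 0.0, 0.0, 0.8, 0.0, 0.0, 0.0, 0.0])

print(f"Similarity to C-major: {cosine_similarity(frame, c_major):.3f}")
print(f"Similarity to C-minor: {cosine_similarity(frame, c_minor):.3f}")
```

Because both vectors are normalized, the similarity is 1 only if the frame points in exactly the same direction as the template. Two triads that share two of their three notes, such as C-major and C-minor, have a similarity of 2/3 to each other.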

In [None]:
y, sr = librosa.load('c_g_am_f_piano.wav', mono=True, sr=44100)
chroma_cens = librosa.feature.chroma_cens(y=y, sr=sr)

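# normalize chromagram by dividing each frame (column) by its 2-norm
# (the small floor avoids a division by zero for silent frames)
frame_norms = np.sqrt(np.sum(chroma_cens**2, axis=0))
chroma_cens_norm = chroma_cens / np.maximum(frame_norms, 1e-12)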

# normalize the chord templates in a similar way (but per row!)
templates_norm = templates / np.sqrt(np.sum(templates**2, axis=1))[:, np.newaxis]

# Now we compute the similarity of all chromagram frames with all chord templates efficiently:

# (1) chromagram dimension: 12 x N_frames
print(chroma_cens_norm.shape)

# (2) templates dimension N_chords x 12
print(templates_norm.shape)

# Therefore, we can use a matrix multiplication
chord_similarity = np.matmul(templates_norm, chroma_cens_norm)

In [None]:
# let's visualize our chord detection results
pl.figure(figsize=(10,6))
pl.imshow(chord_similarity, aspect="auto", origin="lower", interpolation="none")
pl.yticks(np.arange(24), chord_labels)
pl.ylabel('Chord')
pl.xlabel('Frame')
pl.colorbar()
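To turn the similarity matrix into one chord label per frame, a simple option is to pick the template with the highest similarity in each frame. The sketch below is self-contained and therefore uses a small synthetic chromagram (noisy copies of the C, G, Am, and F templates) as a stand-in for the real recording; the variable names and the noise level are assumptions for illustration.

```python
import numpy as np

note_names = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
labels = [f"{root}-{chord_type}"
          for chord_type in ('major', 'minor') for root in note_names]

# Build the 24 binary templates (12 major rows, then 12 minor rows)
major = np.zeros(12)
major[[0, 4, 7]] = 1
minor = np.zeros(12)
minor[[0, 3, 7]] = 1
templates = np.vstack([np.roll(major, i) for i in range(12)] +
                      [np.roll(minor, i) for i in range(12)])
templates_norm = templates / np.linalg.norm(templates, axis=1, keepdims=True)

# Synthetic stand-in for a chromagram (12 x 8):
# two noisy frames each of C-major, G-major, A-minor, F-major
rng = np.random.default_rng(0)
chord_rows = [0, 7, 12 + 9, 5]
chroma = np.repeat(templates[chord_rows].T, 2, axis=1) + 0.1 * rng.random((12, 8))
chroma_norm = chroma / np.linalg.norm(chroma, axis=0, keepdims=True)

# Similarity matrix (24 x 8) and the best-matching template per frame
similarity = templates_norm @ chroma_norm
best_rows = np.argmax(similarity, axis=0)
decoded = [labels[i] for i in best_rows]
print(decoded)
```

With a real chromagram, the same `np.argmax(..., axis=0)` call can be applied to `chord_similarity`. Frame-wise decisions tend to flicker at chord boundaries, so some temporal smoothing may be worth trying.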

## Conclusion

Remember our initial chord progression? 
 - C major
 - G major
 - A minor
 - F major
 
 **Looks like our template matching approach works well for this example :)**

## Further steps

- read more about the template matching approach at https://www.audiolabs-erlangen.de/resources/MIR/FMP/C5/C5S2_ChordRec_Templates.html#Template-Based-Pattern-Matching

- try to define other templates, e.g. for seventh chords like "major-seventh" or "dominant-seventh"
- apply the approach on other music styles / recordings with multiple instruments playing
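As a possible starting point for the seventh-chord exercise, the sketch below builds binary templates for major-seventh chords (root, major third, perfect fifth, major seventh) and dominant-seventh chords (root, major third, perfect fifth, minor seventh). The names `SEVENTH_CHORD_INTERVALS` and `build_templates` are hypothetical and not part of the original notebook.

```python
import numpy as np

# Semitone offsets above the root for two seventh-chord types
SEVENTH_CHORD_INTERVALS = {
    'major-seventh': (0, 4, 7, 11),
    'dominant-seventh': (0, 4, 7, 10),
}


def build_templates(intervals):
    """Return a 12 x 12 matrix with one rotated binary template per root."""
    base = np.zeros(12, dtype=int)
    base[list(intervals)] = 1
    return np.vstack([np.roll(base, i) for i in range(12)])


# Stack all seventh-chord templates: 12 major-seventh rows, then 12 dominant-seventh rows
templates_seventh = np.vstack([build_templates(intervals)
                               for intervals in SEVENTH_CHORD_INTERVALS.values()])
print(templates_seventh.shape)
```

These templates contain four active bins instead of three, so normalizing each template before matching matters even more when they are mixed with the triad templates.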