# Using MIR to help create a Wall of Sound Composition 
## Comparing Spectral Centroid / Zero Crossing Rate / Spectral Flatness

### Description:

Wall of Sound is a production and sound design technique based on layering sounds to create a fuller and thicker sound. You can find a deeper explanation of this technique <a href="https://en.wikipedia.org/wiki/Wall_of_Sound">here</a>.

I am creating a wall of sound from a collection of audio samples that I selected because I think they will sound good together. An extensive collection of samples was analyzed previously, and for this exercise I chose the ones that share the same tonality.

When layering sounds, I have observed that the samples need to be distributed across the spectrum. For this I have created three separate frequency bands (low, mid and high), and I need to classify the samples to fit into these bands. The difficulty is deciding what belongs to each band, because a sample may contain energy at all frequencies. In this exercise I try to find where each sound contributes best, so I can select the right number of samples per band.

### Objective:

To distribute these sounds, I have found that audio features like Spectral Centroid, Zero Crossing Rate and Spectral Flatness can help locate each sound's position in the spectrum. To make a complete analysis, I will compare these features to find the right crossover frequencies (low-mid and mid-high), based on ground truth that I provide and using Leave One Out Cross Validation.

This notebook analyzes a set of audio files and their ground truth, and cross-validates to find the low-mid and mid-high crossover frequencies that distribute the samples across the spectrum.
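
To make the three features more concrete, here is a minimal sketch of what each one measures on a single frame. It uses only the Python standard library and textbook definitions, so it is an illustration and not Essentia's implementation; the helper names, the 64-sample frame and the test signals are all assumptions made for this example.

```python
import cmath
import math
import random

def magnitude_spectrum(frame):
    """Magnitudes of the non-negative-frequency DFT bins of a real frame."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def spectral_centroid(mags):
    """Magnitude-weighted mean bin position, normalized to the range 0..1."""
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * m for k, m in enumerate(mags)) / (total * (len(mags) - 1))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / len(frame)

def spectral_flatness(mags, eps=1e-12):
    """Geometric mean divided by arithmetic mean of the magnitudes."""
    mags = [m + eps for m in mags]  # eps avoids log(0) on empty bins
    log_mean = sum(math.log(m) for m in mags) / len(mags)
    return math.exp(log_mean) / (sum(mags) / len(mags))

# Hypothetical test frames: a low tone, a high tone and uniform noise
n = 64
rng = random.Random(0)
frames = {
    'low tone': [math.sin(2 * math.pi * 2 * t / n) for t in range(n)],
    'high tone': [math.sin(2 * math.pi * 24 * t / n) for t in range(n)],
    'noise': [rng.uniform(-1, 1) for _ in range(n)],
}

features = {}
for name, frame in frames.items():
    mags = magnitude_spectrum(frame)
    features[name] = (spectral_centroid(mags),
                      zero_crossing_rate(frame),
                      spectral_flatness(mags))
    print(name, features[name])
```

The centroid and the zero crossing rate both grow as the energy moves up the spectrum, while flatness separates tonal frames (close to 0) from noisy ones.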

### Tools:

For this task I will use Essentia, a library for audio analysis and feature extraction, and Pandas, a data analysis library. Pandas uses a data structure called a DataFrame, which stores, merges, analyzes and retrieves data in a way similar to a SQL relational database. This helps to visualize and analyze each step of the process. Throughout this notebook I will display the data in these DataFrames to show the current state of the process.

__Note:__ The term Crossover Frequency is used in Audio Engineering to refer to the frequencies that divide the spectrum into a number of bands. In this classification problem it refers instead to the thresholds that separate the classes of samples, although I keep calling them crossovers in both the code and the comments. Because the features are normalized, these thresholds are values between 0 and 1 rather than frequencies in Hz.
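
As a minimal sketch of how two such thresholds split a normalized feature value into the three bands, consider the function below. The name `classify` and the threshold values 0.25 and 0.60 are hypothetical and only serve the illustration.

```python
def classify(value, low_crossover, high_crossover):
    """Return 'l', 'm' or 'h' for a feature value normalized to 0..1."""
    if value < low_crossover:
        return 'l'
    if value > high_crossover:
        return 'h'
    return 'm'  # values equal to a threshold fall in the mid band

# Hypothetical feature values and thresholds
for value in (0.05, 0.40, 0.90):
    print(value, classify(value, low_crossover=0.25, high_crossover=0.60))
```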

Joaquin Jimenez Sauma (SMC17)

In [1]:
import os
import matplotlib.pyplot as plt 
import numpy as np
import IPython
import pandas as pd
#import ipywidgets as widgets
from essentia import *
from essentia.standard import *

## Ground truth

The next cell shows the classification I am using, based on my personal opinion after listening to each file. Sounds like pinknoise.wav, whitenoise.wav, sine20khz.wav and sine50hz.wav are for testing purposes and won't be used in the actual piece.

In [2]:
groundTruth = {
    'pinknoise.wav': 'm'
    ,'sine20khz.wav': 'h'
    ,'sine50hz.wav': 'l'
    ,'sound1.wav': 'm'
    ,'sound2.wav': 'm'
    ,'sound3.wav': 'm'
    ,'sound4.wav': 'h'
    ,'sound5.wav': 'l'
    ,'sound6.wav': 'h'
    ,'sound7.wav': 'l'
    ,'sound8.wav': 'h'
    ,'sound9.wav': 'm'
    ,'sound10.wav': 'm'
    ,'sound11.wav': 'h'
    ,'sound12.wav': 'm'
    ,'sound13.wav': 'l'
    ,'sound14.wav': 'h'
    ,'sound15.wav': 'l'
    ,'whitenoise.wav': 'h'
}
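
Since the goal is to pick the right number of samples per band, it can help to count how many files carry each label. The sketch below does this for a small subset of the annotations above; the variable names are assumptions for this example.

```python
from collections import Counter

# A subset of the annotations above, in the same {file: label} shape
labels = {
    'sine50hz.wav': 'l',
    'sound1.wav': 'm',
    'sound2.wav': 'm',
    'sound4.wav': 'h',
}

# Count how many files were assigned to each band
counts = Counter(labels.values())
for band in ('l', 'm', 'h'):
    print(band, counts[band])
```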

### Next cell reads files in `path` and loads them to compute the spectral centroid, Zero Crossing Rate and Spectral Flatness

The selected files are in the path specified in the next cell, which loads them into memory so I can extract and process their features. The code looks for files with the `.wav` extension, but you can change it to use other audio formats that the loader supports.

This cell saves the per-frame values of each feature in `scList`, `zcList` and `sfList`, where __sc__ stands for Spectral Centroid, __zc__ for Zero Crossing Rate and __sf__ for Spectral Flatness. I will use these prefixes throughout the notebook.

I use the `data` DataFrame to compute and store the average of these features per file for later reference and operations.

__Please update `path` variable to point to the folder that contains the audio files.__

In [3]:
# This cell reads the files and extracts the features.
# Per-frame features are kept in the scList, zcList and sfList dictionaries (for plotting);
# per-file averages are stored in data (a Pandas DataFrame).

path = '../../data/wavs/'

w = Windowing()
spec = Spectrum()
centroid = Centroid(range=1) # Normalized to 1 so it can be compared to ZCR and SF
zcr = ZeroCrossingRate()
flatness = Flatness()

scList = {} #spectral centroid
zcList = {} #zero crossing rate
sfList = {} #spectral flatness

rows = [] # one dictionary of averaged features per file
for file in os.listdir(path):
    if file.endswith(".wav"):
        audio = MonoLoader(filename=path + file, sampleRate=44100)()
        c = []
        z = []
        f = []
        for frame in FrameGenerator(audio, frameSize=1024, hopSize=512):
            spectrum = spec(w(frame)) # computed once, shared by centroid and flatness
            c.append(centroid(spectrum))
            z.append(zcr(frame))
            f.append(flatness(spectrum))
        scList[file] = np.array(c)
        zcList[file] = np.array(z)
        sfList[file] = np.array(f)
        # Average each feature per file, skipping the first and last 5 frames
        rows.append({
            'file': file,
            'sc': np.mean(scList[file][5:-5]),
            'zc': np.mean(zcList[file][5:-5]),
            'sf': np.mean(sfList[file][5:-5]),
            'gt': groundTruth[file],
        })

data = pd.DataFrame(rows, columns=['file', 'sc', 'zc', 'sf', 'gt'])

## Generate the list of tests to be performed

The next cell generates the list of tests to be performed, which is used in the rest of the process.

Please note that this is a long process whose duration depends on the value of the variable `step`, which defines the step size used when generating crossovers. To run the process faster, increase `step`. Values can vary from 40 to 400.
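
To get a feel for how `step` drives the size of the problem, the sketch below counts the threshold pairs produced by the same ranges as the next cell and the resulting number of test rows. The helper name `crossover_pairs` is hypothetical, and `n_files = 19` assumes one audio file per ground truth entry.

```python
def crossover_pairs(step):
    """(low, high) threshold pairs, mirroring the ranges used in the next cell."""
    return [(low / 1000, high / 1000)
            for low in range(5, 950, step)
            for high in range(10, 1000, step)
            if low < high]

n_files = 19  # assumed: one audio file per ground truth entry
for step in (50, 400):
    pairs = crossover_pairs(step)
    # every (training file, left-out file) combination gets every pair
    rows = n_files * (n_files - 1) * len(pairs)
    print(step, len(pairs), rows)
```

With `step = 50` this gives 209 threshold pairs and 71,478 rows; with `step = 400` it gives 6 pairs and 2,052 rows.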

In [4]:
# Generate a set of tests for each file

step = 50
files = data.sort_values(by=['file'])['file'].tolist()
row={}
tests = pd.DataFrame(row, columns = ['file', # Training file
                                     'test', # Left-out (test) file, only used at the end
                                     'lowCrossover', # Low threshold value
                                     'highCrossover', # High threshold value
                                     'scClass', # Spectral Centroid class (can be 'l', 'm', 'h')
                                     'zcClass', # Zero Crossing Rate class (can be 'l', 'm', 'h')
                                     'sfClass']) # Spectral Flatness class (can be 'l', 'm', 'h')
tests.set_index(['file', 'test', 'lowCrossover', 'highCrossover'])

for file in files: 
    for leftOut in files: # File left out of training; recorded here and only tested at the end
        if file != leftOut:
            for lowCrossover in range(5, 950, step): 
                for highCrossover in range (10, 1000, step):
                    if lowCrossover < highCrossover:
                        row['file'] = file
                        row['test'] = leftOut
                        row['lowCrossover'] = lowCrossover/1000
                        row['highCrossover'] = highCrossover/1000
                        row['scClass'] = ''
                        row['zcClass'] = ''
                        row['sfClass'] = ''
                        tests.loc[len(tests)] = (row)   
                        
# Tests Dataframe with a combination of crossovers for each file
tests.sort_values(
    by=['test',
        'file',
        'lowCrossover', 
        'highCrossover' ])[
    ['file', 
     'test',
     'lowCrossover', 
     'highCrossover', 
     'scClass', 
     'zcClass', 
     'sfClass']]

try:
    os.remove(path + 'tests.csv')
except OSError:
    pass

# Save tests to csv file (index=False avoids an extra unnamed column when reading it back)
tests.to_csv(path + 'tests.csv', index=False)

# Use provided data file or file created in previous cell
tests = pd.read_csv(path + 'tests.csv')

## Process

The next cells implement the Leave One Out Cross Validation model.
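
Before the Pandas version, here is a simplified sketch of the same idea in plain Python, for a single feature: for each sample, crossovers are learned from all the other samples by averaging the threshold pairs that classify them correctly, and the left-out sample is then tested against those learned crossovers. The function names, the sample values and the grid of pairs are all hypothetical.

```python
def classify(value, low, high):
    return 'l' if value < low else 'h' if value > high else 'm'

def learn_crossovers(samples, pairs):
    """Average the (low, high) pairs that classify training samples correctly."""
    hits = [(low, high)
            for value, label in samples
            for low, high in pairs
            if classify(value, low, high) == label]
    if not hits:
        return None
    mean_low = sum(low for low, _ in hits) / len(hits)
    mean_high = sum(high for _, high in hits) / len(hits)
    return mean_low, mean_high

def leave_one_out(samples, pairs):
    """For each sample, learn crossovers from the others and test on it."""
    outcomes = []
    for i, (value, label) in enumerate(samples):
        training = samples[:i] + samples[i + 1:]
        learned = learn_crossovers(training, pairs)
        passed = learned is not None and classify(value, *learned) == label
        outcomes.append((learned, passed))
    return outcomes

# Hypothetical (feature value, label) samples; not taken from the real data
samples = [(0.05, 'l'), (0.10, 'l'), (0.40, 'm'),
           (0.45, 'm'), (0.85, 'h'), (0.90, 'h')]
pairs = [(low / 100, high / 100)
         for low in range(5, 95, 10)
         for high in range(10, 100, 10)
         if low < high]

outcomes = leave_one_out(samples, pairs)
accuracy = sum(passed for _, passed in outcomes) / len(outcomes)
print(outcomes)
print(accuracy)
```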

In [5]:
tests.set_index(['file', 'test', 'lowCrossover', 'highCrossover'])

# Merge tests dataframe with data dataframe to have ground truth and extracted features for each training file
tests = pd.merge(tests,
                   data,
                   left_on='file',
                   right_on='file',
                   how='left')

In [6]:
# Classify each feature on each file for every set of crossovers.
# np.where works on whole columns, so no loop over rows is needed.
tests['scClass'] = np.where(tests.sc < tests.lowCrossover, 'l',
    np.where(tests.sc > tests.highCrossover, 'h', 'm'))
tests['zcClass'] = np.where(tests.zc < tests.lowCrossover, 'l',
    np.where(tests.zc > tests.highCrossover, 'h', 'm'))
tests['sfClass'] = np.where(tests.sf < tests.lowCrossover, 'l',
    np.where(tests.sf > tests.highCrossover, 'h', 'm'))

In [7]:
# Compare each predicted class with the ground truth (True when they match)
tests['scResult'] = tests['scClass'] == tests['gt']
tests['zcResult'] = tests['zcClass'] == tests['gt']
tests['sfResult'] = tests['sfClass'] == tests['gt']
        


In [8]:
# From dataframe we can start to compute our results

# Computing learned crossovers 

learned = pd.merge(pd.merge(
    tests.loc[tests['scResult'] == True].agg({
    'lowCrossover': ['mean'],
    'highCrossover': ['mean']
    }), 
    tests.loc[tests['zcResult'] == True].agg({
    'lowCrossover': ['mean'],
    'highCrossover': ['mean']
    }), 
    right_index=True, left_index=True, sort=False),
    tests.loc[tests['sfResult'] == True].agg({
    'lowCrossover': ['mean'],
    'highCrossover': ['mean']
    }),
    right_index=True, left_index=True, sort=False)

# Rename the merged columns: sc/zc/sf prefix, LC = low crossover, HC = high crossover
learned.rename(columns={
    'lowCrossover_x': 'scLC', 'highCrossover_x': 'scHC',
    'lowCrossover_y': 'zcLC', 'highCrossover_y': 'zcHC',
    'lowCrossover': 'sfLC', 'highCrossover': 'sfHC',
}, inplace=True)

In [9]:
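# Computing the error
# Note: the value stored in `error` is the count of correctly classified tests
# divided by the total, i.e. the fraction of hits; the error rate is 1 minus this value.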

# Total number of tests
total = len(tests)

error = pd.merge(pd.merge(
    tests.loc[tests['scResult'] == True].agg({
    'file': ['count'] 
    }), 
    tests.loc[tests['zcResult'] == True].agg({
    'file': ['count']
    }), 
    right_index=True, left_index=True, sort=False),
    tests.loc[tests['sfResult'] == True].agg({
    'file': ['count']
    }),
    right_index=True, left_index=True, sort=False)

error.rename(columns ={'file_x': 'SC'}, inplace =True)
error.rename(columns ={'file_y': 'ZC'}, inplace =True)
error.rename(columns ={'file': 'SF'}, inplace =True)

error['SC'] = error['SC'].iloc[0] / total
error['ZC'] = error['ZC'].iloc[0] / total
error['SF'] = error['SF'].iloc[0] / total
error.reset_index(drop = True, inplace = True)
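
The cell above divides the number of correctly classified tests by the total number of tests, which is the proportion of hits; the error rate is its complement. A minimal sketch of that relationship, with a hypothetical helper name and made-up outcomes:

```python
def accuracy_and_error(results):
    """results: booleans, True where the prediction matched the ground truth."""
    results = list(results)
    accuracy = sum(results) / len(results)
    return accuracy, 1 - accuracy

# Hypothetical outcomes for five tests
acc, err = accuracy_and_error([True, False, True, True, False])
print(acc, err)
```

For these made-up outcomes, three of five tests pass, so the accuracy is 0.6 and the error rate is 0.4.
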

In [10]:
# Preparing a new Dataframe to show learned crossovers
results = pd.merge(pd.merge(
    tests.loc[tests['scResult'] == True].groupby(['test']).agg({
    'lowCrossover': ['mean'],
    'highCrossover': ['mean'],
    'file': ['count']
    }), 
    tests.loc[tests['zcResult'] == True].groupby(['test']).agg({
    'lowCrossover': ['mean'],
    'highCrossover': ['mean'],
    'file': ['count']
    }), 
    right_index=True, left_index=True, sort=False),
    tests.loc[tests['sfResult'] == True].groupby(['test']).agg({
    'lowCrossover': ['mean'],
    'highCrossover': ['mean'],
    'file': ['count']
    }),
    right_index=True, left_index=True, sort=False)

# Add the total number of tests per left-out file
results = pd.concat([results, tests.groupby(['test']).size().to_frame()], axis = 1)

# Rename columns
results.rename(columns={
    0: 'Total',
    ('highCrossover_x', 'mean'): 'scHC',
    ('highCrossover_y', 'mean'): 'zcHC',
    ('highCrossover', 'mean'): 'sfHC',
    ('lowCrossover_x', 'mean'): 'scLC',
    ('lowCrossover_y', 'mean'): 'zcLC',
    ('lowCrossover', 'mean'): 'sfLC',
    ('file_x', 'count'): 'scTotal',
    ('file_y', 'count'): 'zcTotal',
    ('file', 'count'): 'sfTotal',
}, inplace=True)
results.index.name = 'test'
results.reset_index(inplace=True)

In [11]:
# Compute confidence level for each feature, for each file
# (column arithmetic is applied to every row at once, so no loop is needed)
results['scConfidence'] = results['scTotal'] / results['Total']
results['zcConfidence'] = results['zcTotal'] / results['Total']
results['sfConfidence'] = results['sfTotal'] / results['Total']

In [12]:
# Merge results dataframe with data dataframe to add the ground truth and
# extracted features of each left-out (test) file
results = pd.merge(results,
                   data,
                   left_on='test',
                   right_on='file',
                   how='left')
results = results.copy()

# The merge step put all fields together, so I will select the ones I want to keep
results = results [[
    'test',
    'gt',
    'sc',
    'zc',
    'sf',
    'scLC',
    'scHC',
    'scTotal',
    'zcLC',
    'zcHC',
    'zcTotal',
    'sfLC',
    'sfHC',
    'sfTotal',
    'Total',
    'scConfidence',
    'zcConfidence',
    'sfConfidence'
    ]]

In [13]:
def testValue(gt, value, lc, hc):
    # Classify value against the low (lc) and high (hc) crossovers
    if value < lc:
        r = 'l'
    elif value > hc:
        r = 'h'
    else:
        r = 'm'
    # True when the predicted class matches the ground truth
    return r == gt

# Test each left-out file on the crossovers learned without it, row by row
for prefix in ('sc', 'zc', 'sf'):
    results[prefix + 'Passed'] = results.apply(
        lambda r, p=prefix: testValue(r['gt'], r[p], r[p + 'LC'], r[p + 'HC']),
        axis=1)

## Results

The next cell prints the results of the evaluation: the learned crossovers and the error per feature. The list of tests for each iteration of the cross validation is held in the `results` DataFrame.

In [17]:
print('Learned crossovers:')
print(learned[['scLC', 'scHC', 'zcLC', 'zcHC', 'sfLC', 'sfHC']])
print('')
print('Error per feature:')
print(error)


Learned crossovers:
          scLC      scHC      zcLC      zcHC      sfLC      sfHC
mean  0.267033  0.604337  0.277296  0.608709  0.264533  0.611169

Error per feature:
         SC        ZC        SF
0  0.391337  0.360866  0.409469


## Conclusions

Leave One Out Cross Validation seems to be an accurate way to find crossover frequencies. When testing with more values (setting smaller values for the `step` variable), the error numbers are smaller, which may indicate that finer detail yields better values. This is a lengthy process that requires a lot of computational power, but with more time I would like to experiment with even more detail. Another good experiment would be to use more samples to see how much this method can be improved.

