# Exercise: Learning to recognise touch sounds
This exercise will look at recognising the audio registered by a piezo contact microphone on a mobile device when different parts of it are touched by a user. This is data from the **Stane** project ([Paper](http://www.dcs.gla.ac.uk/~rod/publications/MurWilHugQua08.pdf) and [video](http://www.dcs.gla.ac.uk/~rod/Videos/i_chi2.mov)), which used 3D printed surfaces to make super-cheap touch controllers.

<img src="imgs/stane_1.png" width="400px">
<img src="imgs/stane_2.png" width="400px">

The machine learning problem is simple: given a set of recordings of a user rubbing discrete touch zones on this 3D printed case, train a classifier which can distinguish which zone is being touched. This is in essence similar to speech recognition, but with a much simpler acoustic problem and no need to deal with language modeling.

We will use multi-class classification to distinguish the touch zones from the audio alone. We can assume a small number of discrete touch areas, and that there is no model governing how they might be touched (i.e. touches happen at random).



## A data processing pipeline
We need to develop a *pipeline* to process the data. There are several stages common to most supervised learning tasks:

1. Loading the original data (from files, databases etc.)
1. Pre-processing (removing outliers, resampling or interpolating, raw normalisation)
1. Feature extraction (transforming data into fixed length feature vectors)
1. Feature processing (offset removal and normalisation)
1. Data splitting (dividing into testing and training sections)
1. Classification (training the classifier)
1. Evaluation (testing the classifier performance)



## Step 1: Loading the data
The first thing we need to do is to load the data. The data is in `datasets/stane/` and consists of five wave files, one for each of five different surfaces being scratched. Each file is 60 seconds long, sampled at about 4 kHz as 16-bit PCM.



In [None]:
# standard imports
import numpy as np
import scipy.io.wavfile as wavfile
import scipy.signal as sig
import matplotlib.pyplot as plt
import seaborn
# force plots to appear inline on this page
%matplotlib inline

In [None]:
%cd datasets/stane
%ls

In [None]:
# load each of the files into sound_files
sound_files = []
for texture in "12345":
    # load the wavefile
    fname = "stane_%s.wav" % texture
    sr, data = wavfile.read(fname)
    # report the length in samples and in seconds (the parenthesised form runs on Python 2 and 3)
    print("Loaded %s, %s samples at %dHz (%f seconds)" % (fname, len(data), sr, len(data)/float(sr)))
    sound_files.append(data)
    
    

This has loaded each of the wave files into `sound_files[]`, one for each of our 5 classes. We must process this into fixed length feature vectors which we can feed to a classifier. This is the major "engineering" of the machine learning process -- good feature selection is essential to getting good performance.

It's important that we can change the parameters of the feature extraction and learning stages and rerun the entire process in one go. We define a dictionary called `params`, which will hold every adjustable parameter, and a function called `run_pipeline()`, which will run the entire pipeline. For now it is only an outline: the stage functions it calls have not been defined yet.

In [None]:
params = {'sample_rate':4096,
         }

def run_pipeline(sound_files, params):            
    # this is the outline of our pipeline
    pre_processed = pre_process(sound_files, params)
    features, targets = feature_extract(pre_processed, params)
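    # each split is assumed here to be a (features, targets) pair
    train, validate, test = split_features(features, targets, params)
    # fit on the training split only, then score on the held-out test split
    classifier = train_classifier(train[0], train[1], params)
    evaluate(classifier, test[0], test[1], params)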
    
    

## Step 2: Pre-processing
This data is pretty clean already. We can plot a section of the data to have a look at it:


In [None]:
one_second = params["sample_rate"]

# plot two of the files
plot_section_0 = sound_files[0][:one_second] 
plot_section_1 = sound_files[1][:one_second] 

# generate time indices
timebase = np.arange(len(plot_section_0)) / float(params["sample_rate"])
plt.figure()
plt.plot(timebase, plot_section_0)
plt.xlabel("Time (s)")
plt.figure()
plt.plot(timebase, plot_section_1)
plt.xlabel("Time (s)")



We can also view this in the frequency domain using `plt.specgram()`. We have to choose an FFT size and an overlap between consecutive windows (here I used N=256 samples, overlap=128 samples).

In [None]:
# the cmap= just selects a prettier heat map
_ = plt.specgram(plot_section_0, NFFT=256, Fs=params["sample_rate"], noverlap=128, cmap="gist_heat")
plt.figure()
_ = plt.specgram(plot_section_1, NFFT=256, Fs=params["sample_rate"], noverlap=128, cmap="gist_heat")
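
The FFT size and overlap fix the resolution of the spectrogram. As a rough worked example, assuming the 4096 Hz sample rate stored in `params` and a one second excerpt as plotted above, the arithmetic can be sketched as follows (the variable names are only illustrative):

```python
# Resolution of a spectrogram with the settings used above.
sample_rate = 4096   # Hz, the value stored in params["sample_rate"]
nfft = 256           # FFT window length in samples
noverlap = 128       # samples shared between consecutive windows

freq_resolution = sample_rate / float(nfft)   # width of one frequency bin in Hz
n_bins = nfft // 2 + 1                        # bins from 0 Hz up to the Nyquist frequency
hop = nfft - noverlap                         # samples between window starts
time_step = hop / float(sample_rate)          # seconds between spectrogram columns
n_columns = (sample_rate - noverlap) // hop   # columns in a one second excerpt

print("%.1f Hz per bin, %d bins, %.2f ms per column, %d columns"
      % (freq_resolution, n_bins, time_step * 1000, n_columns))
```

A larger FFT size gives narrower frequency bins but coarser time resolution, so the choice is a trade-off rather than a fixed rule.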

### Preprocessing steps
Two things we should do in the pre-processing step:
1. normalise the data to the range -1 to 1
2. apply bandpass filtering to select frequencies we are interested in

In [None]:
def bandpass(x, low, high, sample_rate):
    # scipy.signal.butter designs a linear Butterworth filter; cutoffs are
    # given as fractions of the Nyquist frequency (half the sample rate)
    nyquist = sample_rate / 2.0
    b, a = sig.butter(4, [low / nyquist, high / nyquist], btype="band")
    # scipy.signal.filtfilt runs the filter forwards and then backwards,
    # so the result has no phase distortion
    return sig.filtfilt(b, a, x)
    

def pre_process(sound_files, params):    
    processed = []
    for sound_file in sound_files:
        normalised = sound_file / 32768.0
        p = bandpass(normalised, params["low_cutoff"], params["high_cutoff"], params["sample_rate"])
        processed.append(p)
    return processed
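
As a small check on the normalisation step, dividing signed 16-bit PCM by 32768 maps the samples into the range from -1 up to just under 1. The sample values below are made up for illustration and are not taken from the Stane recordings:

```python
import numpy as np

# hypothetical 16-bit samples covering the extremes of the int16 range
pcm = np.array([-32768, -16384, 0, 16384, 32767], dtype=np.int16)

# dividing by a float promotes the integers to floating point
normalised = pcm / 32768.0

print(normalised.min(), normalised.max())
```

The most negative sample maps exactly to -1, while the most positive one falls just short of 1 because the int16 range is asymmetric.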
        
        

###  Testing pre-processing
We can check that this works by plotting the time series and spectrogram before and after pre-processing. We can write a quick function to plot them:

In [None]:
def plot_second(x, params):
    one_second = params["sample_rate"]
    plot_section = x[:one_second] 
    # generate time indices
    timebase = np.arange(len(plot_section)) / float(params["sample_rate"])
    plt.figure()
    plt.plot(timebase, plot_section)
    plt.ylabel("Amplitude")
    plt.xlabel("Time (s)")
    plt.figure()
    _ = plt.specgram(plot_section, NFFT=256, Fs=params["sample_rate"], noverlap=128, cmap='gist_heat')
    plt.ylabel("Freq (Hz)")
    plt.xlabel("Time (s)")
    
    
    

In [None]:
# test the filtering; these are example values only
params["low_cutoff"]=100
params["high_cutoff"]=1500
processed = pre_process(sound_files, params)

# plot the results
plot_second(sound_files[0], params)
plot_second(processed[0], params)
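
A numerical check can complement the plots. The sketch below restates the same filter design so that it runs on its own, then passes synthetic sine tones through it. The tone frequencies (20, 500 and 1900 Hz) and the `tone()` and `rms()` helpers are assumptions chosen for illustration; they are not part of the Stane data or of the original notebook.

```python
import numpy as np
import scipy.signal as sig

def bandpass(x, low, high, sample_rate):
    # same design as above: 4th-order Butterworth, applied with zero phase
    nyquist = sample_rate / 2.0
    b, a = sig.butter(4, [low / nyquist, high / nyquist], btype="band")
    return sig.filtfilt(b, a, x)

sample_rate = 4096
t = np.arange(4 * sample_rate) / float(sample_rate)   # four seconds of sample times

def tone(freq):
    # unit-amplitude sine wave at freq Hz
    return np.sin(2 * np.pi * freq * t)

def rms(x):
    # root-mean-square level of a signal
    return np.sqrt(np.mean(x ** 2))

# one tone below, one inside and one above the 100-1500 Hz passband
gains = {}
for freq in (20, 500, 1900):
    filtered = bandpass(tone(freq), 100, 1500, sample_rate)
    gains[freq] = rms(filtered) / rms(tone(freq))
    print("%4d Hz: gain %.4f" % (freq, gains[freq]))
```

The tone inside the passband should come through with a gain close to 1, while the two tones outside it should be strongly attenuated.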




## Feature extraction

The next step is to make fixed length feature vectors. This requires some assumptions: we have a continuous signal, so how do we split it up? What processing should we apply to transform the data?  
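
One common answer, shown here only as a sketch and not necessarily the approach this exercise goes on to use, is to cut each recording into short overlapping windows and describe each window by its magnitude spectrum. The function name `window_features`, the window length of 256 and the hop of 128 are all assumptions, and random noise stands in for the real recordings.

```python
import numpy as np

def window_features(x, window_len=256, hop=128):
    """Cut x into overlapping windows and return one magnitude spectrum per window."""
    n_windows = 1 + (len(x) - window_len) // hop
    taper = np.hanning(window_len)   # reduces spectral leakage at the window edges
    features = np.empty((n_windows, window_len // 2 + 1))
    for i in range(n_windows):
        frame = x[i * hop : i * hop + window_len]
        features[i] = np.abs(np.fft.rfft(frame * taper))
    return features

# stand-in data: two random "recordings", one per class (not the Stane audio)
rng = np.random.RandomState(0)
recordings = [rng.randn(4096), rng.randn(4096)]

all_features, all_targets = [], []
for label, recording in enumerate(recordings):
    f = window_features(recording)
    all_features.append(f)
    all_targets.append(np.repeat(label, len(f)))   # every window inherits its file's label

features = np.vstack(all_features)      # one row per window
targets = np.concatenate(all_targets)   # one class label per row
print(features.shape, targets.shape)
```

Each row of `features` has the same length regardless of how long the recording was, which is what a classifier needs. Note that overlapping windows share samples, which matters later when dividing the data into training and testing sections.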



In [None]:

## Step 3: Feature extraction
## Step 4: Building a classifier
## Step 6: Evaluating performance
## Step 7: A better classifier
## Step 8: Better evaluation methods
## Step 9: Experiment