# Predictive Convolution

In the spirit of Tom Erbe's SoundHack, this project attempts to recreate the great convolution tool that has inspired so many composers, producers, makers, and sound artists, using a modern approach: machine learning. The result depends less on any individual machine learning model than on the relationship between a model and the data used to train it, and on how that relationship can be leveraged for predictive modeling.

Audio files were obtained via http://soundbible.com/. 

## Initialization and FFT analysis

In [None]:
from scipy.io import wavfile as wav
from scipy.fftpack import fft, ifft
import numpy as np
import pandas as pd
import math

fileA = './input_audio/bells-tibetan-daniel_simon.wav'
fileB = './input_audio/Warbling_Vireo-Mike_Koenig-89869915.wav'

Below are methods for extracting data from a wav file. For each sample, the raw waveform and its sine are stored, and each of those columns is run through a Fast Fourier Transform (FFT). The resulting complex numbers are kept as real and imaginary parts and are also converted to `magnitude` and `phase`.

Note that any potential header or footer added to the wav file is not taken into account here.
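
To make the generated columns concrete, here is a minimal sketch that builds the same kind of feature table for a short, made-up cosine instead of a WAV file (the toy signal and its length are assumptions for illustration). A cosine that completes one cycle over 8 samples puts all of its energy into bins 1 and 7, so those are the only bins with a non-negligible magnitude.

```python
import numpy as np
import pandas as pd
from scipy.fftpack import fft

# Hypothetical toy signal: one cycle of a cosine over 8 samples
n = 8
signal = np.cos(2 * np.pi * np.arange(n) / n)

spectrum = fft(signal)
features = pd.DataFrame({
    'raw': signal,
    'raw_complex_real': spectrum.real,
    'raw_complex_imag': spectrum.imag,
    # np.hypot(real, imag) is the same as np.abs(spectrum)
    'raw_magnitude': np.hypot(spectrum.real, spectrum.imag),
    'raw_phase': np.angle(spectrum),
})
print(features.round(3))
```

Note that each row pairs a time-domain sample (`raw`) with a frequency bin of the same index, which is worth keeping in mind when reading the results further down.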

In [None]:
def drop_check(df):
    # Keep only the first channel; mono files already have a single column
    samples, channels = df.shape
    if channels > 1:
        df.drop(columns=df.columns[1:], inplace=True)

def extract_wav_data(filename):
    rate, data = wav.read(filename)
    sin_data = np.sin(data)
    sineRaw = pd.DataFrame(sin_data)
    inputRaw = pd.DataFrame(data)
    
    drop_check(sineRaw)
    drop_check(inputRaw)

    bufferdata = pd.concat([inputRaw, sineRaw], axis=1)
    bufferdata.columns=['raw', 'sine']
    
    bufferdata_raw_fft = fft_wav_data(bufferdata['raw'], 'raw')
    bufferdata_sine_fft = fft_wav_data(bufferdata['sine'], 'sine')
    
    bufferdata = pd.concat([bufferdata, bufferdata_raw_fft, bufferdata_sine_fft], axis=1)
    return bufferdata

def fft_wav_data(bufferdataCol, designatorStr):
    fftout = pd.DataFrame()
    complexColName = '{}_complex'.format(designatorStr)
    fftout[complexColName] = fft(bufferdataCol)
    fftout['{}_complex_real'.format(designatorStr)] = fftout[complexColName].real
    fftout['{}_complex_imag'.format(designatorStr)] = fftout[complexColName].imag
    fftout['{}_magnitude'.format(designatorStr)] = np.hypot(fftout[complexColName].real, fftout[complexColName].imag)
    fftout['{}_phase'.format(designatorStr)] = np.angle(fftout[complexColName])
    return fftout

For stereo files, `wav.read` returns a two-dimensional array with one column per channel (two values per sample). For now, `drop_check` keeps only the first channel so that a single channel is processed.
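
If you would rather keep information from both channels, a mix-down is one option. The helper below, `to_mono`, is a hypothetical sketch (not part of this notebook) that either keeps the first channel, as `drop_check` does, or averages all channels. The averaging is done in floating point so that `int16` input cannot overflow.

```python
import numpy as np

def to_mono(data, mode='first'):
    """Reduce a (samples, channels) array from wavfile.read to one channel.

    mode='first' keeps only the first channel (like drop_check);
    mode='mix' averages all channels (a hypothetical alternative).
    """
    if data.ndim == 1:
        # Already mono: nothing to do
        return data
    if mode == 'first':
        return data[:, 0]
    if mode == 'mix':
        # Average in float64 to avoid integer overflow for int16 input
        return data.astype(np.float64).mean(axis=1)
    raise ValueError("mode must be 'first' or 'mix'")

stereo = np.array([[100, 300], [-200, 0], [50, 50]], dtype=np.int16)
print(to_mono(stereo))         # first channel only
print(to_mono(stereo, 'mix'))  # average of both channels
```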

Here's an example of what the extracted data looks like:

In [None]:
dataExample = extract_wav_data(fileB)
dataExample.info()  # info() prints its summary itself and returns None
dataExample.head()

## Initial Attempt (Failure)

This is here to show how _not_ to do it. My first attempt predicted the raw waveform directly from the features generated above. This produced some curious results, but the output was always mirrored in time around the midpoint of the output file's duration, so I decided to try a different approach.

_Note:_ The beginning and ending of the output files have a considerable spike compared to the rest of the file. I recommend deleting these pieces and then normalizing the file for more effective representation.
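
The trimming and normalization can also be done in NumPy before the file is written. Here is a minimal sketch, assuming the spikes are confined to a fixed fraction of the samples at each end; `trim_and_normalize` and its default of 1% per end are hypothetical choices, so listen to the output and adjust.

```python
import numpy as np

def trim_and_normalize(data, trim_fraction=0.01, headroom=0.99):
    """Drop a fraction of samples from each end, then peak-normalize.

    trim_fraction and headroom are illustrative defaults, not values
    taken from the notebook.
    """
    data = np.asarray(data, dtype=np.float64)
    cut = int(len(data) * trim_fraction)
    trimmed = data[cut:len(data) - cut]
    peak = np.abs(trimmed).max()
    if peak == 0:
        # Silence: nothing to scale
        return trimmed
    return trimmed / peak * headroom

# Toy signal with a large spike at each end
signal = np.array([10.0, 0.2, -0.5, 0.1, -10.0])
print(trim_and_normalize(signal, trim_fraction=0.2))
```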

In [None]:
def pred_conv_fail(wav1, wav2, model, outputfile):
    bufferdata_A = extract_wav_data(wav1)
    bufferdata_B = extract_wav_data(wav2)
    X = bufferdata_A[['raw_phase', 'raw_magnitude', 'sine_phase', 'sine_magnitude', 'sine']]
    Y = bufferdata_A['raw']

    model.fit(X, Y)
    Y_predict = model.predict(bufferdata_B[['raw_phase', 'raw_magnitude', 'sine_phase', 'sine_magnitude', 'sine']])

    rate, _ = wav.read(wav2)
    # Scale by the largest absolute value so negative peaks don't clip
    scaled = (Y_predict / np.abs(Y_predict).max()) * 0.95
    wav.write(outputfile, rate, scaled)

In [None]:
from sklearn.linear_model import BayesianRidge

bayesR = BayesianRidge(compute_score=True)

pred_conv_fail(fileA, fileB, bayesR, './output_audio/fail1.wav')
pred_conv_fail(fileB, fileA, bayesR, './output_audio/fail2.wav')

### Lessons Learned

There are a couple of problems here. The first is that the output seems to be 'reflected' horizontally around the midpoint of the file's duration. I initially took this for an artifact of the Bayesian Ridge model, but it is more likely a property of the features themselves: the FFT of a real-valued signal is conjugate-symmetric, so the magnitude columns at bin k match those at bin N - k, and the phase columns match with the sign flipped. On top of that, the values fall off to a particular level midway through the dataset, so it may be more useful to convert back to a raw waveform using an inverse Fast Fourier Transform (iFFT).
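
The symmetry is easy to reproduce on a toy signal. This sketch uses random data rather than the audio files and checks that bin k of the FFT is the complex conjugate of bin N - k, which is why the magnitude columns read the same forwards and backwards.

```python
import numpy as np
from scipy.fftpack import fft

rng = np.random.default_rng(0)
signal = rng.standard_normal(16)   # any real-valued signal works
spectrum = fft(signal)
magnitude = np.abs(spectrum)

n = len(signal)
k = np.arange(1, n)
# For real input, X[N-k] is the complex conjugate of X[k] ...
print(np.allclose(spectrum[k], np.conj(spectrum[n - k])))  # True
# ... so the magnitude mirrors around the midpoint
print(np.allclose(magnitude[k], magnitude[n - k]))         # True
```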

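The second problem is that the model only predicts one real value per sample, while the ifft needs a complex number per frequency bin. A magnitude/phase pair from `.hypot` and `.angle` can in principle be converted back with `magnitude * np.exp(1j * phase)`, but rather than predicting magnitude and phase, I chose to predict the real and imaginary portions directly.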
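
For reference, the polar-to-rectangular conversion mentioned above round-trips cleanly. This sketch uses a short, made-up signal (an assumption for illustration) and rebuilds both the spectrum and the original samples from magnitude and phase alone.

```python
import numpy as np
from scipy.fftpack import fft, ifft

signal = np.array([0.0, 1.0, 0.5, -0.25, -1.0, 0.0])
spectrum = fft(signal)

magnitude = np.hypot(spectrum.real, spectrum.imag)
phase = np.angle(spectrum)

# Polar -> rectangular: magnitude * e^(i * phase)
rebuilt = magnitude * np.exp(1j * phase)
print(np.allclose(rebuilt, spectrum))           # True
print(np.allclose(ifft(rebuilt).real, signal))  # True
```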

## Successful Predictive Convolution

Because of this behavior, I'm assuming that the data needs to be resynthesized using an inverse FFT. So rather than predicting the raw waveform from a series of magnitudes and phases, I'm going to train and predict twice:
1. First, predict the real part,
2. Next, predict the imaginary part

Then the two predictions are combined into complex numbers and run through an ifft to create a 'predicted' audio file. Note that the output of the ifft is an array of complex numbers with extremely small imaginary portions. For the purposes of this demonstration, I'm only using the real portions to create the final raw waveform.
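
Discarding the imaginary part is one way to get a real waveform from a predicted spectrum. An alternative, which is an assumption here and not what this notebook does, is to keep only bins 0 to N/2 and let `np.fft.irfft` assume the conjugate-symmetric upper half. The hypothetical helper `resynthesize` below shows both; for the spectrum of a genuinely real signal they agree exactly.

```python
import numpy as np

def resynthesize(spectrum, mode='real_part'):
    """Turn a (possibly non-symmetric) predicted spectrum into a real waveform.

    'real_part' mirrors the notebook: inverse FFT, then keep the real part.
    'hermitian' is a hypothetical alternative: keep bins 0..N//2 and let
    irfft assume the conjugate-symmetric upper half.
    """
    spectrum = np.asarray(spectrum, dtype=np.complex128)
    n = len(spectrum)
    if mode == 'real_part':
        return np.fft.ifft(spectrum).real
    if mode == 'hermitian':
        return np.fft.irfft(spectrum[: n // 2 + 1], n=n)
    raise ValueError("mode must be 'real_part' or 'hermitian'")

# For the spectrum of a real signal, both modes recover the signal
signal = np.array([0.3, -0.1, 0.8, 0.0, -0.6, 0.2, 0.1, -0.4])
spectrum = np.fft.fft(signal)
print(np.allclose(resynthesize(spectrum, 'real_part'), signal))  # True
print(np.allclose(resynthesize(spectrum, 'hermitian'), signal))  # True
```

The two differ only when the predicted spectrum is not symmetric: taking the real part is equivalent to averaging each bin with the conjugate of its mirror bin, whereas `irfft` simply ignores the upper half.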

In [None]:
def pred_conv(wav1, wav2, model, outputfile):
    bufferdata_A = extract_wav_data(wav1)
    bufferdata_B = extract_wav_data(wav2)
    
    features = ['raw', 'raw_magnitude', 'raw_phase', 'sine_magnitude', 'sine_phase', 'sine']
    
    X_train = bufferdata_A[features]
    Y_real = bufferdata_A['raw_complex_real']
    Y_imag = bufferdata_A['raw_complex_imag']

    # First, let's predict the real part of the complex number
    model.fit(X_train, Y_real)
    Y_predict_real = model.predict(bufferdata_B[features])

    # Next, let's predict the imaginary part of the complex number
    model.fit(X_train, Y_imag)
    Y_predict_imag = model.predict(bufferdata_B[features])
    
    # Now let's generate complex numbers using the two and output the predicted waveform using ifft.
    output_df = pd.concat([pd.DataFrame(Y_predict_real), pd.DataFrame(Y_predict_imag)], axis=1)
    output_df.columns=['real', 'imag']
    # We need to format the real and imaginary portions as a single complex number
    output_df['complex'] = output_df['real'] + (output_df['imag'] * 1j)
    # And strip out the imaginary portions to create the 'raw' waveform
    output_df['raw'] = ifft(output_df['complex']).real
    
    rate, data = wav.read(wav2)
    write_wav_file(outputfile, output_df['raw'].values, rate)
    
def write_wav_file(filename, data, rate):
    # Scale by the largest absolute value, with a touch of headroom, to avoid clipping
    scaled = (data / np.abs(data).max()) * 0.99
    # Pad both ends with a short linear ramp to/from zero to reduce popping for some algorithms
    ramp_len = math.floor(len(data) / 20)
    from_zero = slew(0, scaled[0], ramp_len)
    to_zero = slew(scaled[-1], 0, ramp_len)
    # Write the complete, concatenated array out as 32-bit float,
    # which tends to be more widely supported by players than 64-bit float WAV
    output = np.concatenate([from_zero, scaled, to_zero]).astype(np.float32)
    wav.write(filename, rate, output)

def slew(start, end, values):
    # Linear ramp of values + 1 points from start to end, both included.
    # Same length as the original loop-based version, but also correct
    # when neither endpoint is zero.
    return np.linspace(start, end, values + 1)

With this method defined, we can now create a couple of models and play around with the audio files. Note that each type of model produces very different results.

As mentioned above, the files required some trimming and normalization to be fully appreciated. I also recommend performing some noise filtering on some of the noisier outputs. The LinearRegression model seems to create an awful lot of noise, but if noise reduction is used in software like Audacity, the output can be very interesting.

Also keep in mind that these take a while to run. Raw audio has a lot of data points (one row per sample, which quickly adds up to hundreds of thousands of rows for a few seconds of audio), and each call to `pred_conv` fits the model twice.

In [None]:
from sklearn.linear_model import BayesianRidge

bayesR = BayesianRidge(compute_score=True)

pred_conv(fileA, fileB, bayesR, './output_audio/bayes1.wav')
pred_conv(fileB, fileA, bayesR, './output_audio/bayes2.wav')

In [None]:
from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression()

pred_conv(fileA, fileB, lin_reg, './output_audio/lin_reg1.wav')
pred_conv(fileB, fileA, lin_reg, './output_audio/lin_reg2.wav')

In [None]:
from sklearn.linear_model import LassoLars

lasso = LassoLars(alpha=0.42, fit_intercept=False, eps=0.2, max_iter=200)

pred_conv(fileA, fileB, lasso, './output_audio/lasso1.wav')
pred_conv(fileB, fileA, lasso, './output_audio/lasso2.wav')

In [None]:
from sklearn.linear_model import PassiveAggressiveRegressor

pa_reg = PassiveAggressiveRegressor(C=0.42, random_state=42, max_iter=580)

pred_conv(fileA, fileB, pa_reg, './output_audio/pa_reg1.wav')
pred_conv(fileB, fileA, pa_reg, './output_audio/pa_reg2.wav')

## Testing Hyperparameters and Additional Models

In [None]:
from sklearn.linear_model import LinearRegression

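# By default, fit_intercept=True, so let's try turning it off.
# (Older scikit-learn also accepted normalize=True here, but it was ignored
# when fit_intercept=False, and the parameter was removed in version 1.2.)

lin_reg_1 = LinearRegression(fit_intercept=False)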

pred_conv(fileA, fileB, lin_reg_1, './output_audio/lin_reg_2_1.wav')
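
Since `normalize` no longer exists in current scikit-learn, feature scaling has to be added explicitly, for example with a pipeline in front of the regressor. This is a sketch on hypothetical toy data standing in for `bufferdata_A[features]`; note that `StandardScaler` divides by the standard deviation, which is not identical to the old `normalize` behavior (centering, then dividing by the l2 norm). A pipeline exposes `fit` and `predict`, so it could be passed to `pred_conv` like any other model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: three features on very different scales
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3)) * np.array([1.0, 100.0, 0.01])
y = X @ np.array([2.0, 0.03, 50.0]) + 1.5

# Standardize each feature, then fit an ordinary least-squares model
scaled_lin_reg = make_pipeline(StandardScaler(), LinearRegression())
scaled_lin_reg.fit(X, y)
print(scaled_lin_reg.score(X, y))  # R^2 close to 1.0 for this noise-free data
```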

In [None]:
from sklearn.linear_model import LassoLarsCV

# normalize=False is the default (and the parameter was removed in newer scikit-learn), so it is omitted
lassoCV = LassoLarsCV(max_iter=20, fit_intercept=False)

pred_conv(fileA, fileB, lassoCV, './output_audio/lassoCV1.wav')
pred_conv(fileB, fileA, lassoCV, './output_audio/lassoCV2.wav')

# positive=True didn't produce interesting results.

In [None]:
from sklearn.linear_model import RANSACRegressor

ransac = RANSACRegressor(stop_probability=0.42, max_skips=50)

pred_conv(fileA, fileB, ransac, './output_audio/ransac1.wav')
pred_conv(fileB, fileA, ransac, './output_audio/ransac2.wav')