# Genetic Algorithm for Synthesizer Design Optimization 🧬🎹

---

Welcome to this Jupyter Notebook where we will explore the fascinating intersection of genetics, computer science, and sound design! 💻🔊

Our objective? To implement a genetic algorithm that optimizes synthesizer parameters so that the synthesized sound closely matches a given target sound. 🎯

The world of sound synthesis is deeply complex and nuanced, with a myriad of parameters that can be manipulated to create unique sounds. From the frequency of an oscillator to the attack time of an envelope, each parameter shapes the resulting sound in its own way. 🎛️🎚️

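This is where genetic algorithms come in. Inspired by natural evolution, genetic algorithms are a family of search heuristics that are well suited to black-box optimization problems such as this one: we can render a parameter setting and measure how close it sounds to the target, but we have no gradient or closed-form model of the synthesizer to work with. 🏞️🔍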

Just like natural selection favors the fittest individuals, our algorithm will favor synthesizer configurations that produce a sound closest to our target sound. Over many generations, the algorithm will "evolve" the parameters to improve the fitness of our synthesized sound. 🔄🧬

For our demo, we use a short (one-second), expressive piano sample from the track 'Blue in Green' by the legendary jazz pianist Bill Evans. 🎶🎹🎼

As we proceed, we'll detail each step of the process: loading and featurizing the target sound, finding the closest synthesizer presets, setting up the synthesizer, and running the genetic algorithm with an MFCC-based fitness function. We hope this guide serves as a useful resource for anyone interested in combining genetic algorithms, machine learning, and sound design. 📚🤓

Let's dive in!

## Imports

In [1]:
import os
import sys

import numpy as np
import pandas as pd
import torch
import librosa
import dawdreamer
import IPython.display as ipd
from librosa import load
from librosa.feature import mfcc

# make the repository root importable so that the `src` package can be found
sys.path.append('../')
from src.utils import *
from src.config import ga_settings, daw_settings

In [2]:
# print the ../ path in full as a check
print(os.path.abspath('../'))

/Users/malek8/My Drive (malek8@mit.edu)/MIT/Spring 2023/4.453 (Creative Machine Learning for Design)/GASS Term Project/gass_repo


## Load the Target Sound

In [3]:
# specify the folder of the target sound
target_folder = '../timbre-exp/target-dataset'
# print the absolute path of the target folder
print(f'Absolute Path: {os.path.abspath(target_folder)}')

Absolute Path: /Users/malek8/My Drive (malek8@mit.edu)/MIT/Spring 2023/4.453 (Creative Machine Learning for Design)/GASS Term Project/gass_repo/timbre-exp/target-dataset


In [4]:
# specify the target sound file base name
target_file_name = 'bill-evans-piano'

# combine the target folder and target file name to get the target file path
target_file_path = os.path.join(target_folder,target_file_name) + '.wav'

print(f'Target File Path: {os.path.abspath(target_file_path)}')

Target File Path: /Users/malek8/My Drive (malek8@mit.edu)/MIT/Spring 2023/4.453 (Creative Machine Learning for Design)/GASS Term Project/gass_repo/timbre-exp/target-dataset/bill-evans-piano.wav


In [5]:
# read the wav file
target_audio, target_sample_rate = load(target_file_path, sr=44100)
print(f'Target Sample Rate: {target_sample_rate}')
print(f'Target Audio Shape: {target_audio.shape}')

Target Sample Rate: 44100
Target Audio Shape: (44100,)


In [6]:
# determine the length of the target audio in seconds
target_audio_length = librosa.get_duration(y=target_audio, sr=target_sample_rate)
print(f'Target Audio Length: {target_audio_length}')

Target Audio Length: 1.0



In [7]:
# play the target audio
ipd.Audio(target_audio,rate=target_sample_rate)

## Obtain Feature Representation of Target Sound

In [8]:
# apply the MFCC transform to the target audio
target_mfcc = mfcc(y=target_audio,sr=target_sample_rate).reshape(-1)
print(f'Target MFCC Shape: {target_mfcc.shape}')

Target MFCC Shape: (1740,)
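The 1740 entries above follow directly from librosa's defaults. A quick arithmetic sketch, assuming `mfcc` was called with its defaults (`n_mfcc=20`, `hop_length=512`, `center=True`, which yields `1 + n_samples // hop_length` frames); `flattened_mfcc_length` is a hypothetical helper, not part of the notebook:

```python
def flattened_mfcc_length(n_samples: int, n_mfcc: int = 20, hop_length: int = 512) -> int:
    """Return n_mfcc * n_frames for a centered, STFT-based MFCC (librosa defaults assumed)."""
    # with center=True the signal is padded so that the frame count is 1 + n_samples // hop_length
    n_frames = 1 + n_samples // hop_length
    return n_mfcc * n_frames


# 1 second at 44.1 kHz -> 1 + 44100 // 512 = 87 frames -> 20 * 87 = 1740 coefficients
print(flattened_mfcc_length(44100))  # 1740
```

Because the vector length depends on the number of samples, any rendered candidate must have the same length as the target (44,100 samples here) before MFCC vectors can be compared element-wise.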


## Find Closest Feature Match in Preset Dataset $\implies$ Obtain $p^* \in \mathbb{R}^{k \times 1}$

---

Here $k$ denotes the number of parameters in the vector that controls the output of the synthesizer.
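The notebook's `find_closest_preset_from_mfcc` helper lives in `src.utils` and is not shown. A minimal sketch of the idea, assuming each preset has a precomputed feature vector (for instance, MFCCs of its rendered audio) and that closeness means Euclidean distance; `rank_presets_by_distance` is a hypothetical name, and the real helper may use different features or a different metric:

```python
import math
from typing import Dict, List, Sequence


def rank_presets_by_distance(
    target_features: Sequence[float],
    preset_features: Dict[str, Sequence[float]],
    k: int = 10,
) -> List[str]:
    """Return the names of the k presets whose feature vectors are closest
    (Euclidean distance) to the target feature vector."""
    # math.dist raises ValueError if the vectors have different lengths
    ranked = sorted(
        preset_features,
        key=lambda name: math.dist(target_features, preset_features[name]),
    )
    return ranked[:k]


# toy example with 3-dimensional "features"
presets = {"A": [0.0, 0.0, 1.0], "B": [1.0, 1.0, 1.0], "C": [0.1, 0.0, 0.9]}
print(rank_presets_by_distance([0.0, 0.0, 1.0], presets, k=2))  # ['A', 'C']
```

The parameter vector of the top-ranked preset then plays the role of $p^*$.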

### Load the Preset Dataset

In [9]:
# load the preset dataset
dataset = pd.DataFrame(torch.load('../dataset/processed_preset_dataset_musicnn.pt'))
print(f'Dataset Shape: {dataset.shape}')


Dataset Shape: (367, 6)


In [10]:
# Define constants related to the TAL-Uno plugin
PRESET_FOLDER = "/Users/malek8/Library/Application Support/ToguAudioLine/TAL-U-No-LX/presets"
PRESET_EXT = ".pjunoxl"

In [11]:
# obtain the first row of the dataset
row = dataset.iloc[0]

# index into the parameters column
parameters = row['parameters']

# find the max and minimum values of the parameters
max_value = np.max(parameters)
min_value = np.min(parameters)

# print the max and min
print(f'Max Value: {max_value}')
print(f'Min Value: {min_value}')

Max Value: 1.0
Min Value: 0.0
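The cell above only inspects the first preset. Since the genetic algorithm later replaces genes with random values between `random_mutation_min_val` and `random_mutation_max_val` (0 and 1), it may be worth checking every preset the same way. A minimal sketch; `check_parameter_vectors` is a hypothetical helper, not part of `src.utils`:

```python
from typing import Sequence


def check_parameter_vectors(vectors: Sequence[Sequence[float]],
                            low: float = 0.0, high: float = 1.0) -> int:
    """Check that all preset vectors share one length k and lie in [low, high]; return k."""
    if len(vectors) == 0:
        raise ValueError("no parameter vectors given")
    lengths = {len(v) for v in vectors}
    if len(lengths) != 1:
        raise ValueError(f"inconsistent parameter counts: {sorted(lengths)}")
    for i, v in enumerate(vectors):
        out_of_range = [x for x in v if not (low <= x <= high)]
        if out_of_range:
            raise ValueError(f"vector {i} has values outside [{low}, {high}]: {out_of_range[:3]}")
    return lengths.pop()


# hypothetical usage with the dataset: k = check_parameter_vectors(dataset['parameters'].tolist())
print(check_parameter_vectors([[0.0, 0.5, 1.0], [0.2, 0.3, 0.4]]))  # 3
```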


In [12]:
# obtain a list of all the preset paths
preset_paths = []
for root, dirs, files in os.walk(PRESET_FOLDER):
    for file in files:
        if file.endswith(PRESET_EXT):
            preset_paths.append(os.path.join(root, file))
print(f'The first 5 preset paths:\n{preset_paths[:5]}')

The first 5 preset paths:
['/Users/malek8/Library/Application Support/ToguAudioLine/TAL-U-No-LX/presets/Default.pjunoxl', '/Users/malek8/Library/Application Support/ToguAudioLine/TAL-U-No-LX/presets/MT Tailwhip Organ (2).pjunoxl', '/Users/malek8/Library/Application Support/ToguAudioLine/TAL-U-No-LX/presets/TAL Presets Bank/FX/Super Jumper.pjunoxl', '/Users/malek8/Library/Application Support/ToguAudioLine/TAL-U-No-LX/presets/TAL Presets Bank/FX/Nice Filter Sweep.pjunoxl', '/Users/malek8/Library/Application Support/ToguAudioLine/TAL-U-No-LX/presets/TAL Presets Bank/FX/Chillin R2D2.pjunoxl']
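Looking up a preset by testing whether its name is a substring of a path can pick the wrong file when one preset name is contained in another (for example, `WND Flute 1 FMR` would also match a file named `WND Flute 1 FMR (2).pjunoxl`, a naming pattern that appears in the list above). A sketch of an exact lookup keyed by file stem, assuming the dataset's `preset_names` equal the preset file names without the extension; `index_presets_by_name` is hypothetical:

```python
from pathlib import Path
from typing import Dict, Iterable


def index_presets_by_name(paths: Iterable[str]) -> Dict[str, str]:
    """Map each preset's file stem (name without folder or extension) to its full path.
    If two files share a stem, the first one encountered is kept."""
    index: Dict[str, str] = {}
    for p in paths:
        index.setdefault(Path(p).stem, p)
    return index


paths = [
    "/presets/Default.pjunoxl",
    "/presets/FMR Presets Bank/Wind/WND Flute 1 FMR.pjunoxl",
    "/presets/FMR Presets Bank/Wind/WND Flute 1 FMR (2).pjunoxl",
]
by_name = index_presets_by_name(paths)
print(by_name["WND Flute 1 FMR"])  # /presets/FMR Presets Bank/Wind/WND Flute 1 FMR.pjunoxl
```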


### Find Closest Match Based on Minimum Distance

In [13]:
top10presets, target_note = find_closest_preset_from_mfcc(target_audio, target_sample_rate, dataset, return_note=True)
print(f'The top 10 presets:\n{top10presets}')
print(f'Target Note: {target_note}')

Likely note: F4, likely frequency: 349.1941058508811
The top 10 presets:
['CHO Voice Chorus FMR' 'PRC Thump Bass Drum FMR' 'ORG Organ 3 FMR'
 'ORG Jazz Organ FMR' 'BAS Synth Bass 1 FMR' 'BAS Pulse Bass 2 FMR'
 'BAS Wire Bass FMR' 'WND Flute 1 FMR' 'The Difference Bass'
 'BAS Round Bass 2 FMR']
Target Note: F4
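The helper reports the target's likely fundamental as 349.19 Hz and labels it F4. How it estimates the pitch is not shown, but the naming step is the standard equal-tempered conversion $m = 69 + 12\log_2(f/440)$. A sketch assuming A4 = 440 Hz and the convention that MIDI 60 is C4; `frequency_to_note` is a hypothetical helper:

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def frequency_to_note(freq_hz: float, a4_hz: float = 440.0) -> str:
    """Round a frequency to the nearest equal-tempered note name (A4 = MIDI 69)."""
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    octave = midi // 12 - 1  # MIDI 60 -> C4
    return f"{NOTE_NAMES[midi % 12]}{octave}"


print(frequency_to_note(349.1941058508811))  # F4
```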


In [14]:
# find the path of the top preset by matching its file name (without extension) exactly
top_preset_path = next(x for x in preset_paths if os.path.splitext(os.path.basename(x))[0] == top10presets[0])

# print the top preset path
print(f'Top Preset Path: {top_preset_path}')


Top Preset Path: /Users/malek8/Library/Application Support/ToguAudioLine/TAL-U-No-LX/presets/FMR Presets Bank/Wind/WND Flute 1 FMR.pjunoxl


In [15]:
# find the row of the dataset corresponding to the top preset name
top_preset_row = dataset[dataset['preset_names'] == top10presets[0]]
top_preset_row.head()

| index | preset_names | parameters | parameters_names | mapped_parameter_names | raw_audio | musicnn_features |
|---|---|---|---|---|---|---|
| 140 | WND Flute 1 FMR | [0.0, 0.0400000028, 0.288000017, 0.667000055, ... | [@modulation, @dcolfovalue, @dcopwmvalue, @dco... | [{'tal-uno param name': '@modulation', 'dawdre... | {'C2': [tensor(3.4535e-05), tensor(3.5455e-05)... | {'C2': [tensor(-0.0003), tensor(-0.6242), tens... |


### Compare the Audio Associated with the Target and the Closest Match

In [16]:
# define some constants related to dawdreamer loading
BUFFER_SIZE = 128
SYNTH_PLUGIN_PATH = "/Library/Audio/Plug-Ins/VST3/TAL-U-NO-LX-V2.vst3"
SYNTH_NAME = "TAL-Uno"

In [17]:
# initialize the DawDreamer engine
engine = dawdreamer.RenderEngine(sample_rate=target_sample_rate, block_size=BUFFER_SIZE)

In [18]:
# initialize the DawDreamer plugin
plugin = load_plugin_with_dawdreamer(SYNTH_PLUGIN_PATH,SYNTH_NAME,engine)

error: attempt to map invalid URI `/Library/Audio/Plug-Ins/VST3/TAL-U-NO-LX-V2.vst3'


In [19]:
# inspect the fields stored for the first mapped parameter of the top preset
top_preset_row['mapped_parameter_names'].iloc[0][0].keys()

dict_keys(['tal-uno param name', 'dawdreamer param name', 'value', 'dawdreamer index', 'tal-uno index'])

In [20]:
loaded_synth = load_synth_from_dataset(plugin, engine, top_preset_row)
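`load_synth_from_dataset` comes from `src.utils` and is not shown. Based on the keys printed above, a plausible minimal version walks the mapping and calls DawDreamer's `set_parameter(index, value)` on the plugin processor for each entry. This is a sketch under that assumption; `apply_mapped_parameters` is hypothetical, and skipping entries whose `'dawdreamer index'` is `None` is an assumption too:

```python
from typing import Any, Iterable, Mapping


def apply_mapped_parameters(plugin: Any, mapping: Iterable[Mapping[str, Any]]) -> int:
    """Set every mapped parameter on a DawDreamer plugin processor.

    Assumes each entry has the 'dawdreamer index' and 'value' keys printed above
    and that `plugin` provides set_parameter(index, value).
    Entries without a DawDreamer index (None) are skipped. Returns the number set."""
    n_set = 0
    for entry in mapping:
        index = entry.get("dawdreamer index")
        if index is None:
            continue
        plugin.set_parameter(int(index), float(entry["value"]))
        n_set += 1
    return n_set


# hypothetical usage with the row loaded above:
# apply_mapped_parameters(plugin, top_preset_row['mapped_parameter_names'].iloc[0])
```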

In [21]:
# the target was detected as F4; we request F3 because the instrument's master
# octave parameter transposes it up by one octave, so F3 sounds as F4
midi_piano_note = piano_note_to_midi_note('F3')

# duration of the MIDI note in seconds
midi_duration = 0.5

# clear any previously scheduled MIDI notes
loaded_synth.clear_midi()

# schedule a note (MIDI note, velocity, start time in s, duration in s)
loaded_synth.add_midi_note(midi_piano_note, 127, 0.0, midi_duration)

# build a graph containing only the synth
engine.load_graph([(loaded_synth, [])])

# render the same duration as the target (1 s); a longer render (e.g. 1.2x) would also capture the release tail
engine.render(target_audio_length)

# retrieve the rendered audio as a (channels, samples) array
audio = engine.get_audio()

In [22]:
print(midi_piano_note)

53
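Under the common convention that middle C (C4) is MIDI 60, a note name maps to $12 \times (\text{octave} + 1) + \text{semitone offset}$, which reproduces the 53 printed above for F3; with the synth's +1 octave setting it should sound at F4 (MIDI 65), the pitch detected in the target. `note_name_to_midi` is a hypothetical stand-in, not the `src.utils` helper:

```python
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_name_to_midi(note: str) -> int:
    """Convert a note name such as 'F3', 'C#4' or 'Bb2' to a MIDI note number (C4 = 60)."""
    semitone = NOTE_OFFSETS[note[0].upper()]
    rest = note[1:]
    # apply any sharps (#) or flats (b) that follow the letter
    while rest and rest[0] in "#b":
        semitone += 1 if rest[0] == "#" else -1
        rest = rest[1:]
    octave = int(rest)
    return 12 * (octave + 1) + semitone


print(note_name_to_midi("F3"))  # 53
```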


In [23]:
# play the audio of the plugin
ipd.Audio(audio,rate=target_sample_rate)

In [24]:
# play the target audio
ipd.Audio(target_audio,rate=target_sample_rate)

## Apply Genetic Algorithm $\implies$ Obtain $x^*$

---

Here, we seek to apply the genetic algorithm on our initial population obtained in the previous step to obtain $x^*$, the optimal synthesizer preset configuration that produces a sound most similar to the target input sound $y \in \mathbb{R}^{n \times 1}$.
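A genetic algorithm maximizes fitness, whereas "sounds most similar" is naturally a distance to minimize. One common way to bridge the two is to take the reciprocal of the distance between feature vectors. `optimize_preset_with_ga_mfcc` is not shown, so this is a hedged sketch of one such fitness, assuming Euclidean distance between flattened MFCC vectors of equal length; `mfcc_fitness` and `eps` are assumptions:

```python
import math
from typing import Sequence


def mfcc_fitness(candidate_mfcc: Sequence[float], target_mfcc: Sequence[float],
                 eps: float = 1e-8) -> float:
    """Fitness to maximize: reciprocal of the Euclidean distance between two
    flattened MFCC vectors (eps avoids division by zero for identical vectors)."""
    return 1.0 / (math.dist(candidate_mfcc, target_mfcc) + eps)


print(mfcc_fitness([0.0, 3.0], [4.0, 0.0]))  # about 0.2 (distance 5)
```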

In [26]:
# return the top 10 preset rows in order from the df
top10preset_rows = dataset[dataset['preset_names'].isin(top10presets)]

In [27]:
# order the rows based on the order of the top10presets
top10preset_rows = top10preset_rows.set_index('preset_names').loc[top10presets].reset_index()

In [42]:
# override the GA settings imported from src.config
ga_settings = {
    'num_generations': 300,
    'num_parents_mating': 10,
    'sol_per_pop': 10,
    'crossover_type':'uniform',
    'mutation_type':'random',
    'mutation_percent_gene':10,
    'mutation_rate':0.01,
    'mutation_by_replacement':True,
    'random_mutation_min_val':0,
    'random_mutation_max_val':1,
}
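The keys above match argument names of the PyGAD library (which spells the mutation percentage `mutation_percent_genes`), but the notebook does not show how `optimize_preset_with_ga_mfcc` consumes them. To make the settings concrete, here is a minimal, self-contained GA sketch, not the notebook's implementation: the population size plays the role of `sol_per_pop`, uniform crossover corresponds to `crossover_type='uniform'`, and random mutation by replacement within `[low, high]` corresponds to `mutation_type='random'` with `mutation_by_replacement=True`. Elitism is a choice made here; `mutation_rate` is not modelled.

```python
import random
from typing import Callable, List, Sequence

Genome = List[float]


def uniform_crossover(a: Sequence[float], b: Sequence[float], rng: random.Random) -> Genome:
    """Build a child by taking each gene from parent a or parent b with probability 0.5."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def random_mutation(genome: Sequence[float], percent_genes: float,
                    low: float, high: float, rng: random.Random) -> Genome:
    """Mutation by replacement: overwrite about percent_genes % of the genes
    (at least one) with values drawn uniformly from [low, high]."""
    mutated = list(genome)
    n_mutate = max(1, round(len(mutated) * percent_genes / 100))
    for i in rng.sample(range(len(mutated)), n_mutate):
        mutated[i] = rng.uniform(low, high)
    return mutated


def run_ga(initial_population: Sequence[Sequence[float]],
           fitness: Callable[[Genome], float],
           num_generations: int, num_parents_mating: int,
           percent_genes: float = 10, low: float = 0.0, high: float = 1.0,
           seed: int = 0) -> Genome:
    """Minimal generational GA: rank by fitness, keep the best genome (elitism),
    and refill the population with mutated uniform crossovers of the top
    `num_parents_mating` genomes (at least 2 parents are required)."""
    rng = random.Random(seed)
    population = [list(g) for g in initial_population]
    for _ in range(num_generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:num_parents_mating]
        next_population = [ranked[0]]  # elitism: the best genome survives unchanged
        while len(next_population) < len(population):
            a, b = rng.sample(parents, 2)
            child = uniform_crossover(a, b, rng)
            next_population.append(random_mutation(child, percent_genes, low, high, rng))
        population = next_population
    return max(population, key=fitness)


# toy usage: evolve 5-gene genomes in [0, 1] towards all genes equal to 0.8
init_rng = random.Random(42)
start = [[init_rng.random() for _ in range(5)] for _ in range(10)]
target = [0.8] * 5


def toy_fitness(genome: Genome) -> float:
    # negative squared error, so higher is better
    return -sum((x - t) ** 2 for x, t in zip(genome, target))


best = run_ga(start, toy_fitness, num_generations=50, num_parents_mating=4)
print([round(x, 2) for x in best])
```

Without elitism, the best fitness can drop from one generation to the next, because every genome is replaced by a mutated child.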

In [3]:
output = optimize_preset_with_ga_mfcc(top10preset_rows,plugin,engine,target_mfcc,target_audio_length,target_note,daw_settings,ga_settings,verbosity=2)


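### Investigate the Null Hypothesis (i.e., Preset Selection Is Not Important)

To test whether starting from the closest-matching presets actually helps, we rerun the genetic algorithm with the same settings but seed it with 10 randomly sampled presets instead.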

In [33]:
# select 10 random preset rows from the dataset
random_preset_rows = dataset.sample(10)
random_preset_rows.head()

| index | preset_names | parameters | parameters_names | mapped_parameter_names | raw_audio | musicnn_features |
|---|---|---|---|---|---|---|
| 236 | SYN Dynamic Polysynth FMR | [0.0, 0.088000007, 0.5, 0.444000036, 0.0, 1.0,... | [@modulation, @dcolfovalue, @dcopwmvalue, @dco... | [{'tal-uno param name': '@modulation', 'dawdre... | {'C2': [tensor(-0.0048), tensor(-0.0085), tens... | {'C2': [tensor(-0.0003), tensor(-0.2845), tens... |
| 260 | DRN JX-3P Drone FMR | [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.5, ... | [@modulation, @dcolfovalue, @dcopwmvalue, @dco... | [{'tal-uno param name': '@modulation', 'dawdre... | {'C2': [tensor(0.0012), tensor(0.0012), tensor... | {'C2': [tensor(-0.0003), tensor(-0.6242), tens... |
| 289 | MFX Effect Sound One FMR | [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.73600006... | [@modulation, @dcolfovalue, @dcopwmvalue, @dco... | [{'tal-uno param name': '@modulation', 'dawdre... | {'C2': [tensor(0.0151), tensor(-0.0019), tenso... | {'C2': [tensor(-0.0004), tensor(-0.6242), tens... |
| 201 | PRC Percussion 2 FMR | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, ... | [@modulation, @dcolfovalue, @dcopwmvalue, @dco... | [{'tal-uno param name': '@modulation', 'dawdre... | {'C2': [tensor(-0.0041), tensor(0.0806), tenso... | {'C2': [tensor(-0.0003), tensor(-0.6242), tens... |
| 102 | STR Low Strings FMR | [0.0, 0.0, 0.684000015, 1.0, 1.0, 1.0, 0.0, 0.... | [@modulation, @dcolfovalue, @dcopwmvalue, @dco... | [{'tal-uno param name': '@modulation', 'dawdre... | {'C2': [tensor(0.1241), tensor(0.0891), tensor... | {'C2': [tensor(-0.0003), tensor(-0.6242), tens... |


In [47]:
# reset the index of the random preset rows
random_preset_rows = random_preset_rows.reset_index(drop=True)
random_preset_rows.head()

| index | preset_names | parameters | parameters_names | mapped_parameter_names | raw_audio | musicnn_features |
|---|---|---|---|---|---|---|
| 0 | SYN Dynamic Polysynth FMR | [0.0, 0.088000007, 0.5, 0.444000036, 0.0, 1.0,... | [@modulation, @dcolfovalue, @dcopwmvalue, @dco... | [{'tal-uno param name': '@modulation', 'dawdre... | {'C2': [tensor(-0.0048), tensor(-0.0085), tens... | {'C2': [tensor(-0.0003), tensor(-0.2845), tens... |
| 1 | DRN JX-3P Drone FMR | [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.5, ... | [@modulation, @dcolfovalue, @dcopwmvalue, @dco... | [{'tal-uno param name': '@modulation', 'dawdre... | {'C2': [tensor(0.0012), tensor(0.0012), tensor... | {'C2': [tensor(-0.0003), tensor(-0.6242), tens... |
| 2 | MFX Effect Sound One FMR | [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.73600006... | [@modulation, @dcolfovalue, @dcopwmvalue, @dco... | [{'tal-uno param name': '@modulation', 'dawdre... | {'C2': [tensor(0.0151), tensor(-0.0019), tenso... | {'C2': [tensor(-0.0004), tensor(-0.6242), tens... |
| 3 | PRC Percussion 2 FMR | [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, ... | [@modulation, @dcolfovalue, @dcopwmvalue, @dco... | [{'tal-uno param name': '@modulation', 'dawdre... | {'C2': [tensor(-0.0041), tensor(0.0806), tenso... | {'C2': [tensor(-0.0003), tensor(-0.6242), tens... |
| 4 | STR Low Strings FMR | [0.0, 0.0, 0.684000015, 1.0, 1.0, 1.0, 0.0, 0.... | [@modulation, @dcolfovalue, @dcopwmvalue, @dco... | [{'tal-uno param name': '@modulation', 'dawdre... | {'C2': [tensor(0.1241), tensor(0.0891), tensor... | {'C2': [tensor(-0.0003), tensor(-0.6242), tens... |


In [49]:
null_output = optimize_preset_with_ga_mfcc(random_preset_rows,plugin,engine,target_mfcc,target_audio_length,target_note,daw_settings,ga_settings,verbosity=1)

Generation = 1, Fitness = 0.0006113103001979818
Generation = 2, Fitness = 0.000594231716131331
Generation = 3, Fitness = 0.0006082933732407053
Generation = 4, Fitness = 0.0007685983297810367
Generation = 5, Fitness = 0.0007578748862540072
Generation = 6, Fitness = 0.0008141289535739784
Generation = 7, Fitness = 0.0007578748862540072
Generation = 8, Fitness = 0.0007578748862540072
Generation = 9, Fitness = 0.0007578748862540072
Generation = 10, Fitness = 0.0007578748862540072
Generation = 11, Fitness = 0.0007578748862540072
Generation = 12, Fitness = 0.000759274910996983
Generation = 13, Fitness = 0.000770766425218815
Generation = 14, Fitness = 0.000770766425218815
Generation = 15, Fitness = 0.000770766425218815
Generation = 16, Fitness = 0.0009303806769746001
Generation = 17, Fitness = 0.0009064499802709922
Generation = 18, Fitness = 0.0009064499802709922
Generation = 19, Fitness = 0.0009064499802709922
Generation = 20, Fitness = 0.0009064499802709922
Generation = 21, Fitness = 0.00094

In [41]:
# list of 10 parameter vectors, each with 77 parameters (for TAL-U-NO-LX)
initial_parameters = random_preset_rows['parameters'].tolist()


In [42]:
len(initial_parameters)

10

In [44]:
assert len(initial_parameters) == ga_settings['sol_per_pop'], f"There must be {ga_settings['sol_per_pop']} initial parameter vectors."
assert len({len(x) for x in initial_parameters}) == 1, "All initial parameter vectors must have the same length."

## Evaluate Sounds Produced by $x^*$

In [45]:
x_star = output['best_solution']
x_star_fitness = output['best_solution_fitness']
x_star_match_idx = output['best_match_idx']
fitness_history = output['fitness_history']
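In the null-hypothesis log above, the best fitness sometimes drops from one generation to the next (for example, from generation 6 to generation 7), so plotting a running maximum alongside the raw curve makes the best fitness found so far easier to read. A short sketch, assuming `fitness_history` is a per-generation sequence of best fitness values (its exact format is not shown); `best_so_far` is a hypothetical helper:

```python
from itertools import accumulate
from typing import List, Sequence


def best_so_far(fitness_per_generation: Sequence[float]) -> List[float]:
    """Running maximum of per-generation best fitness values."""
    return list(accumulate(fitness_per_generation, max))


# first seven generations of the null-hypothesis run logged above
log = [0.0006113103001979818, 0.000594231716131331, 0.0006082933732407053,
       0.0007685983297810367, 0.0007578748862540072, 0.0008141289535739784,
       0.0007578748862540072]
print(best_so_far(log))  # last value equals the generation-6 fitness
```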

In [46]:
x_star

array([0.95695212, 0.89920452, 0.98088135, 1.        , 1.        ,
       0.8179563 , 0.87121801, 0.9302821 , 0.97614578, 0.9193883 ,
       0.9682945 , 0.96344059, 1.        , 0.90846915, 0.84632421,
       1.        , 0.98278728, 0.97610937, 1.        , 0.94322891,
       0.97131979, 0.93046351, 0.99255348, 0.93816782, 0.98770148,
       0.83436886, 0.99564118, 0.99890568, 0.84884754, 0.91754332,
       0.95678905, 0.86535121, 1.        , 0.92441957, 0.97661551,
       0.95090588, 0.89740703, 0.89544539, 0.99957905, 0.90775217,
       0.99304436, 0.91826564, 0.97219077, 0.96210436, 0.98424944,
       0.94458833, 0.9682203 , 0.71968196, 0.92716233, 0.9594785 ,
       0.96226049, 0.94958011, 0.91739863, 0.7692401 , 0.96817112,
       0.98073142, 0.99126296, 0.95387651, 0.99218923, 0.9736352 ,
       0.98061393, 0.86344386, 0.9353392 , 0.95589628, 0.91641892,
       0.75104974, 0.9852362 , 0.96519497, 0.95389935, 0.93650262,
       0.99995087, 0.94743951, 0.86199459, 0.90063309, 0.87243

In [30]:
# load the parameters of the best solution into the synth
loaded_synth = set_parameters(plugin, x_star)

# render the audio
best_solution_audio = render_audio(midi_piano_note, 127, 0.5, loaded_synth, engine, target_audio_length, daw_settings['SAMPLE_RATE'], verbosity=0)

# print the shape of the audio
print(f'Best solution audio shape: {best_solution_audio.shape}')
print(f'Best Solution Fitness: {x_star_fitness}')


Best solution audio shape: (44100,)
Best Solution Fitness: 0.001425009526011251


In [31]:
# play the best solution audio
ipd.Audio(best_solution_audio,rate=target_sample_rate)

In [32]:
# play the audio of the target
ipd.Audio(target_audio,rate=target_sample_rate)

## Evaluate Sounds Produced by $x^*_{null}$

In [51]:
x_star_null = null_output['best_solution']
x_star_null_fitness = null_output['best_solution_fitness']

In [52]:
# load the parameters of the best x_star_null solution into the synth
loaded_synth = set_parameters(plugin, x_star_null)

# render the audio
best_solution_audio_null = render_audio(midi_piano_note, 127, 0.5, loaded_synth, engine, target_audio_length, daw_settings['SAMPLE_RATE'], verbosity=0)

In [53]:
# play the best solution audio
ipd.Audio(best_solution_audio_null,rate=target_sample_rate)