Python JavaScript HTML CSS Other
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.

Evolving artificial neural networks for cross-adaptive audio effects

The project will investigate methods of evaluating the musical applicability of cross-adaptive audio effects. The field of adaptive audio effects has been researched during the last 10-15 years, where analysis of various features of the audio signal is used to adaptively control parameters of audio processing of the same signal. Cross-adaptivity has been used similarly in automatic mixing algorithms. The relatively new field of signal interaction relates to the use of these techniques where features of one signal affect the processing of another in a live performance setting. As an example, the pitch tracking data of vocals used to control the reverberation time of the drums, or the noisiness measure of a guitar used to control the filtering of vocals. This also allows for complex signal interactions where features from several signals can be used to affect the processing of another signal. As these kinds of signal interactions are relatively uncharted territory, methods to evaluate various cross-coupling of features have not been formalized and as such currently left to empirical testing. The project should investigate AI methods for finding potentially useful mappings and evaluating their fitness.

Install dependencies (Ubuntu)

Assuming you have a clean Ubuntu 14.04, here's what needs to be installed:

  • Update package index files: sudo apt-get update
  • Install git: sudo apt-get -y install git
  • Install pip: sudo apt-get -y install python-pip
  • Install matplotlib dependencies: sudo apt-get -y install libfreetype6-dev libpng-dev blt-dev
  • Install csound: sudo apt-get -y install csound
  • Install aubio: sudo apt-get -y install aubio-tools libaubio-dev libaubio-doc
  • Install essentia extractors (optional):
    • cd ~/ && wget
    • tar xzf essentia-extractors-v2.1_beta2-linux-x86_64.tar.gz
    • echo 'PATH=~/essentia-extractors-v2.1_beta2/:$PATH' >> ~/.bashrc
    • source ~/.bashrc
  • Install MultiNEAT:
    • Install boost c++ libraries: sudo apt-get -y install libboost-all-dev
    • cd ~/ && git clone
    • cd MultiNEAT
    • [sudo] python install (needs at least 2 GB RAM)
  • Install NodeJS
    • sudo apt-get -y install nodejs npm
  • Install Sonic Annotator
    • cd ~/ && wget
    • sudo dpkg -i sonic-annotator_1.4cc1-1_amd64.deb
    • If installation fails due to missing dependencies don't worry: Just run sudo apt-get -y -f install and then try to install again.
  • Download the libXtract vamp plugin:
    • cd ~/ && wget
    • tar -xf vamp-libxtract-plugins-
    • sudo mkdir -p /usr/local/lib/vamp
    • sudo mv vamp-libxtract-plugins-* /usr/local/lib/vamp

Install dependencies (Windows)

Install Python 2.7

Note: You need the 32-bit version, as the 64-bit version might fail to interoperate with Csound

Install Csound

Install MultiNEAT, matplotlib and numpy

Building these dependencies from source can be difficult and time-consuming, so let's download wheel binaries instead

  • Go to
    • Download numpy-1.11.2+mkl-cp27-cp27m-win32.whl
    • Download matplotlib-1.5.3-cp27-cp27m-win32.whl
    • Download MultiNEAT-0.3-cp27-none-win32.whl
  • pip install numpy-1.11.2+mkl-cp27-cp27m-win32.whl
  • pip install matplotlib-1.5.3-cp27-cp27m-win32.whl
  • pip install MultiNEAT-0.3-cp27-none-win32.whl

Install Aubio

Install Essentia Extractors (optional)

Install Sonic Annotator and the libXtract vamp plugin

Install NodeJS

Setup of this project (same for Windows and Ubuntu)

If you're on Windows, you might want to run the following commands in Git Bash

  • Clone the cross-adaptive-audio repository: cd ~/ && git clone && cd cross-adaptive-audio
  • Get a local settings file: cp
  • Make sure that all dependencies are installed: [sudo] pip install -r requirements.txt (run without sudo on Windows)
  • Install Node.js dependencies: cd node_server && npm install && cd -
  • If you plan to run experiments with the example sounds in the test_audio folder, you need to copy them to the input folder: cp test_audio/*.wav input


First, run nosetests to check if things are running smoothly.

Running an experiment

Input audio files that you use in experiments should reside in the input folder. When you run an experiment with two input files, the two audio clips should be of equal length. Furthermore, the format should be:

  • Number of channels: 1 (mono)
  • Sampling rate: 44100 Hz
  • Bit depth: 16

There are some example files in the test_audio folder. For example, copy drums.wav and noise.wav from the test_audio folder to the input folder.

Our goal in the following example experiment is to make noise.wav sound like drums.wav by running noise.wav through the "dist_lpf" audio effect. The audio effect has a set of parameters that are controlled by the output of a neural network. The experiment is all about evolving one or more neural networks that behave such that the processed version of noise.wav sounds like drums.wav

Run the command python -i drums.wav noise.wav -g 30 -p 20

This will run the evolutionary algorithm for 30 generations with a population of 20. While this is running, you might want to open another command line instance and run python This will start a server for a web client that interactively visualizes the results of the experiment as they become available. Websockets are used to keep the web client synchronized with whatever has finished doing. Just visit http://localhost:8080 in your favorite browser. The web client looks somewhat like this:

Screenshot of visualization

To get information about all the parameters that understands, run python --help

The most important parameters:

                        The filename of the target sound and the filename of
                        the input sound, respectively
  --fitness {similarity,multi-objective,hybrid,novelty,mixed}
                        similarity: Average local similarity, calculated with
                        euclidean distance between feature vectors for each
                        frame. multi-objective optimizes for a diverse
                        population that consists of various non-dominated
                        trade-offs between similarity in different features.
                        Hybrid fitness is the sum of similarity and multi-
                        objective, and gives you the best of both worlds.
                        Novelty fitness ignores the objective and optimizes
                        for novelty. Mixed fitness chooses a random fitness
                        evaluator for each generation.
  --neural-mode {a,ab,b,s,targets}
                        Mode a: target sound is neural input. Mode ab: target
                        sound and input sound is neural input. Mode b: input
                        sound is neural input. Mode s: static input, i.e. only
                        bias. Mode targets: evolve targets separately for each
                        timestep, with only static input
                        The name(s) of the sound effect(s) to use. See the
                        effects folder for options. In composite effects, use
                        "new_layer" to separate layers of parallel effects.
  --fs-neat [FS_NEAT]   Use FS-NEAT (automatic feature selection)
  --experiment-settings EXPERIMENT_SETTINGS
                        Filename of json file in the experiment_settings
                        folder. This file specifies which features to use as
                        neural input and for similarity calculations.

In the experiment_settings folder you can add your own json file where you specify which audio features to use for a) similarity calculations and b) neural input. Here's one possible configuration, as in mfcc_basic.json:

  "parameter_lpf_cutoff": 50,
  "similarity_channels": [
      "name": "mfcc_0",
      "weight": 1.0
      "name": "mfcc_1",
      "weight": 0.2
  "neural_input_channels": [

In this example we are using mfcc_0 and mfcc_1 for similarity calculations, and mfcc_1 is given less weight than mfcc_amp. In other words, mfcc_1 errors matter less than mfcc_amp errors. Neural input is mfcc_1, mfcc_amp and the derivative (gradient) of mfcc_amp, which is written as "mfcc_amp__derivative" in the config file. You can add the derivative of any feature by writing "{feature_name}__derivative", (replace {feature_name} with the name of the feature)

To see all the available audio features you can add in experiment_settings.json, run python

When you are done with your experiment(s), run python This will delete all files written during the experiment(s).

Data augmentation

If you have a short sound and you'd like to create variations of it, in terms of gain and playback speed, you can use If you train neural networks on the augmented sound, they will typically generalize better to unseen sounds. Example command, assuming you have drums_short.wav in the input folder:

python -i drums_short.wav --factor 8

This will write the augmented sound drums_short.wav.augmented.wav to the input folder.

Live mode

If the only neural input is from features computed by Csound analyzer, then you can apply the evolved cross-adaptive audio effect (to unseen data/sound) in live performances. You can still use other features, such as bark bands, in the similarity measure. For example, you can use the experiment settings in csound_bark.json.

Example command for evolving the effect:

python -i drums.wav noise.wav -g 50 -p 20 --experiment-settings csound_bark.json

Assuming you want to use the best individual in the last generation of the most recent experiment, run the following command:


This will write a file live.csd to the live_csd folder. This is a Csound code file with some inline python code with a base64 data blob containing data about the evolved neural network, amongst other things. Note that this file is built to run only in the python environment where it was created. The Csound file can be used in live mode like this:

csound live_csd/live.csd -iadc -odac

This assumes that you have a working sound card with stereo audio input. The target audio should be in the left channel and the input audio should be in the right channel.

You can also use the csd file offline, to speed up computation and write the output audio to disk:

csound live_csd/live.csd -iinput/drums_synth.wav -ooutput/synth_cross_adapted.wav

This assumes that you have a stereo audio file drums_synth.wav with drums in the left channel and synth in the right channel.

Use RAM disk

Experiments can run ~10% faster if you use a RAM disk to reduce I/O overhead. When you have a RAM disk running, set BASE_DIR in to the path of the RAM disk (f.ex. '/mnt/ramdisk' for Ubuntu or 'R:\\' for Windows) and run python The latter command will ensure that directories are present in the RAM disk and copy audio input files and the web-based visualization system. If you want to experiment with new audio input files after you ran python, you can put the new audio files directly in the input folder on the RAM disk.

RAM disk setup (Ubuntu)

Assuming you have no RAM disk set up already, and you want one with 3 GB of space:

  • sudo mkdir -p /mnt/ramdisk
  • sudo mount -t tmpfs -o size=3072m tmpfs /mnt/ramdisk

If you want to have this ramdisk also after reboot, run sudo nano /etc/fstab and add the following line:

tmpfs /mnt/ramdisk tmpfs nodev,nosuid,noexec,nodiratime,size=3072M 0 0

The ramdisk should now be mounted on startup/reboot. You can confirm this by rebooting and running df -h /mnt/ramdisk

RAM disk setup (Windows)

I haven't been able to get a performance gain by using a RAM disk on a Windows machine with an SSD, but if you want to try, you can install a program like this:

Use at your own risk

Known issues

  • libXtract may produce wrong results on Windows. Use a different analyzer or use Linux.
  • If you stop while it's running, csound may fail to terminate correctly. If that happens, kill the csound process(es) manually. On Ubuntu, this can be done by running pkill -f csound
  • The visualization interface may become slow for large experiments with thousands of generations


Contributions are welcome