
Tensorflow code for training deep convolutional neural networks for music audio tagging



In this repository you will find the Tensorflow code for training deep convolutional neural networks for music audio tagging.

We employed this code for training musicnn, a set of pre-trained deep convolutional neural networks for music audio tagging.


Install the dependencies:

Clone this repository with git clone.

Create a python3 virtual environment (python3 -m venv env), activate it (source ./env/bin/activate), and install the dependencies (pip install -r requirements.txt).

Install Tensorflow for CPU (pip install tensorflow==1.14.0) or for a CUDA-enabled GPU (pip install tensorflow-gpu==1.14.0). Note that this code was developed with the Tensorflow 1 API, not Tensorflow 2.


Download a music audio tagging dataset:

For example, download the MagnaTagATune dataset.

Preprocess the dataset:

To preprocess the data, first set some variables:

  • DATA_FOLDER, where you want to store all your intermediate files (see folders structure below).
  • config_preprocess['audio_folder'], where your dataset is located.
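As a rough illustration, the two variables above might be set like this in the configuration file. The paths are placeholders, and any surrounding structure is hypothetical; only DATA_FOLDER and config_preprocess['audio_folder'] come from this README:

```python
# Hypothetical sketch of the variables to set in the configuration file.
# Both paths are placeholders - adapt them to your machine.

DATA_FOLDER = '/home/user/data/'  # where all intermediate files will be stored

config_preprocess = {
    'audio_folder': '/home/user/datasets/magnatagatune/',  # where the dataset is located
}
```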

Preprocess the data by going to src/ and running python mtt_spec. The mtt_spec option is defined in the configuration file.

After running, the computed spectrograms are in ../DATA_FOLDER/audio_representation/mtt__time-freq/

Rename ../DATA_FOLDER/audio_representation/mtt__time-freq/index_0.tsv to index.tsv. This renaming is needed because the preprocessing script is parallelizable. If you parallelized the preprocessing across several machines, copy the resulting index files into the same directory and merge them with cat index* > index.tsv
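The merging step above can equivalently be done in Python, which may be convenient if you script the preprocessing. This is a minimal sketch, not part of this codebase; merge_index_files is a hypothetical helper name:

```python
import glob
import os

def merge_index_files(folder):
    """Concatenate the per-machine index files (index_0.tsv, index_1.tsv, ...)
    found in `folder` into a single index.tsv, mimicking `cat index* > index.tsv`."""
    parts = sorted(glob.glob(os.path.join(folder, 'index_*.tsv')))
    with open(os.path.join(folder, 'index.tsv'), 'w') as out:
        for part in parts:
            with open(part) as f:
                out.write(f.read())
```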

Train and evaluate a model:

Define the training parameters by setting the config_train dictionary in the configuration file, and run CUDA_VISIBLE_DEVICES=0 python spec. The spec option is also defined in the configuration file.
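For orientation, a config_train dictionary could look like the sketch below. The keys shown are hypothetical examples, not the actual schema used by this codebase; check the configuration file for the real parameter names:

```python
# Hypothetical example of a training configuration; key names are illustrative
# only and may differ from the ones this codebase actually expects.
config_train = {
    'spec': {
        'name_run': 'spec',
        'audio_representation_folder': 'audio_representation/mtt__time-freq/',
        'learning_rate': 0.001,
        'batch_size': 32,
        'epochs': 200,
    }
}
```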

Once training is done, the trained model is stored in a timestamped folder, e.g.: ../DATA_FOLDER/experiments/1563524626spec/

To evaluate the trained model, run CUDA_VISIBLE_DEVICES=0 python 1563524626spec, where 1563524626spec is the folder name of your experiment.


Configuration and preprocessing scripts:

  • configuration file with all configurable parameters.
  • preprocessing script that pre-computes and stores the spectrograms.

Scripts for running deep learning experiments:

  • training script: run it to train your model (first set config_train in the configuration file).
  • evaluation script: run it to evaluate the previously trained model.
  • scripts where the architectures are defined.

Auxiliary scripts:

  • script containing utility functions (e.g., for plotting or loading files).
  • script to successively run several pre-configured experiments.

Folders structure

  • /src: folder containing the scripts described above.
  • /aux: folder containing additional auxiliary scripts. These scripts are used to generate the index files for each dataset; the index files are already computed in /data/index/
  • /data: where all intermediate files (spectrograms, results, etc.) will be stored.
  • /data/index/: indexed files containing the correspondences between audio files and their ground truth. Index files for the MagnaTagATune dataset (mtt) and the Million Song Dataset (msd) are already provided.

When running the previous scripts, the following folders will be created:

  • ./data/audio_representation/: where spectrogram patches are stored.
  • ./data/experiments/: where the results of the experiments are stored.


References:

  • Pons, Jordi, Nieto, Oriol, Prockup, Matthew, Schmidt, Erik M., Ehmann, Andreas F., and Serra, Xavier. "End-to-end learning for music audio tagging at scale." In: 19th International Society for Music Information Retrieval Conference (ISMIR2018).
  • Pons, Jordi, and Serra, Xavier. "musicnn: pre-trained convolutional neural networks for music audio tagging." In: Late-breaking/demo session in 20th International Society for Music Information Retrieval Conference (LBD-ISMIR2019).

