

Dead simple audio classification

Who is this for? 👩‍💻 👨‍💻

People who just want to classify some audio quickly, without having to dive into the world of audio analysis. If you need something a little more involved, check out pyAudioAnalysis or panotti.

Quick install

pip install pyaudioclassification

Requirements

  • Python 3
  • Keras
  • TensorFlow
  • librosa
  • NumPy
  • SoundFile
  • tqdm
  • matplotlib

Quick start

from pyaudioclassification import feature_extraction, train, predict
features, labels = feature_extraction(<training_data_path>)
model = train(features, labels)
pred = predict(model, <prediction_data_path>)

Or, if you're feeling reckless, you could just string them together like so:

pred = predict(train(*feature_extraction(<training_data_path>)), <prediction_data_path>)

A full example with saving, loading & some dummy data can be found here.

Read below for a more detailed look at each of these calls.

Detailed Guide

Step 1: Preprocessing 🐶 🐱

First, add all your audio files to a directory with the following structure:

├── <class_name>/
│   ├── <file_name>
│   └── ...
└── ...

For example, if you were trying to classify dog and cat sounds, it might look like this:

├── cat/
│   ├── cat1.ogg
│   ├── cat2.ogg
│   ├── cat3.wav
│   └── cat4.wav
└── dog/
    ├── dog1.ogg
    ├── dog2.ogg
    ├── dog3.wav
    └── dog4.wav
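In the layout above, each subdirectory name serves as a class name. The minimal standard-library sketch below shows how such a tree maps to labelled files. `list_labelled_files` is a hypothetical helper and is not part of pyaudioclassification. Two choices in it are assumptions: only `.wav`/`.ogg` files are collected, and classes are indexed in alphabetical order. The library's own file handling and label ordering may differ.

```python
from pathlib import Path

# Assumption: only these formats are collected (the examples above use .wav and .ogg).
AUDIO_SUFFIXES = {".wav", ".ogg"}


def list_labelled_files(data_path):
    """Hypothetical helper: pair each file in <data_path>/<class_name>/ with a class index."""
    root = Path(data_path)
    # Assumption: class directories are sorted alphabetically to get stable indices.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    pairs = []
    for index, name in enumerate(class_names):
        for audio_file in sorted((root / name).iterdir()):
            # Skip sub-directories and non-audio files such as notes or metadata.
            if audio_file.is_file() and audio_file.suffix.lower() in AUDIO_SUFFIXES:
                pairs.append((audio_file.name, index))
    return class_names, pairs
```

With the cat/dog tree above, this sketch would return `['cat', 'dog']` as the class names. It would pair `cat1.ogg` to `cat4.wav` with index 0 and `dog1.ogg` to `dog4.wav` with index 1.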

Great, now we need to preprocess this data. Just call feature_extraction(<data_path>) and it'll return our input and target data. Something like this:

features, labels = feature_extraction('/Users/mac2015/data/')

(If you don't want it to print to stdout, just pass verbose=False as an argument.)

Depending on how much data you have, this process could take a while... so it might be a good idea to save the results. You can save and load with NumPy:

import numpy as np
np.save('%s.npy' % <file_name>, features)   # write the extracted features to disk
features = np.load('%s.npy' % <file_name>)  # load them back later

Step 2: Training 💪

The next step is to train your model on the data. You can just call...

model = train(features, labels)

...but depending on your dataset, you might need to play around with some of the hyper-parameters to get the best results.


  • epochs: The number of training epochs (full passes over the training data). Default is 50.

  • lr: Learning rate. Increase it to speed up training; decrease it if your loss is 'jumping' around instead of steadily decreasing. Default is 0.01.

  • optimiser: Choose any of these. Default is 'SGD'.

  • print_summary: Prints a summary of the model you'll be training. Default is False.

  • loss_type: Classification type, either 'categorical' or 'binary'. Default is 'categorical' for more than 2 classes, and 'binary' otherwise.

You can add any of these as optional arguments, for example train(features, labels, lr=0.05).
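The toy, library-independent sketch below shows why a learning rate that is too high makes training 'jump'. It runs plain gradient descent on f(w) = w², whose gradient is 2w. Each update multiplies w by (1 - 2·lr). For any lr above 0.5, w overshoots the minimum and flips sign every step. Once lr is above 1.0, the loss grows instead of shrinking. This illustrates the general principle only. The Keras model built by `train` is far more complex, and the values below are not recommendations for its `lr` argument.

```python
def gradient_descent(lr, w0=1.0, steps=5):
    """Minimise f(w) = w**2 (gradient 2*w) and record the loss after each step."""
    w = w0
    losses = []
    for _ in range(steps):
        w -= lr * 2 * w          # standard gradient-descent update
        losses.append(w ** 2)    # loss at the new weight
    return losses


small = gradient_descent(lr=0.1)  # w shrinks by a factor of 0.8 each step, so the loss falls steadily
large = gradient_descent(lr=1.1)  # w is multiplied by -1.2 each step: it flips sign and the loss grows

print(small)
print(large)
```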

Again, you probably want to save your model once it's done training. You can do this with Keras:

from keras.models import load_model
model.save('my_model.h5')           # save the trained model to disk
model = load_model('my_model.h5')   # load it back later

Step 3: Prediction 🙏 🙌

Now the fun part: try your trained model on new data!

pred = predict(model, <data_path>)

Your <data_path> should point to a new, untested audio file.


If you have 2 classes (or if you forced loss_type='binary'), pred will be a single number for each file.

The closer it is to 0, the more strongly the model predicts the first class; the closer it is to 1, the more strongly it predicts the second class.

So for our cat/dog example, a score of 0.2 means the model is 80% confident the sound is a cat, and a score of 0.8 means it is 80% confident the sound is a dog.
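The minimal sketch below turns one of these binary scores into a class name plus a confidence. It assumes, as described above, that the first class sits near 0 and the second near 1. `interpret_binary` is a hypothetical helper. The `class_names` order you pass in must match the class order the model was trained with. A score of exactly 0.5 is arbitrarily assigned to the second class here.

```python
def interpret_binary(score, class_names):
    """Hypothetical helper: map a single score in [0, 1] to (class_name, confidence).

    class_names[0] is the class nearer 0, class_names[1] the class nearer 1.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < 0.5:
        return class_names[0], 1.0 - score
    return class_names[1], score


label, confidence = interpret_binary(0.2, ["cat", "dog"])
print(label, confidence)
```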


If you have more than 2 classes (or if you forced loss_type='categorical'), pred will be an array for each sound file.

It'll look something like this:

[[1.6454633e-06 3.7017996e-11 9.9999821e-01 1.5900606e-07]]

Each position in the array corresponds to one class, and the value at that index is the model's predicted probability for that class.
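To pick the most likely class, find the index of the largest value in each row. The plain-Python sketch below uses the example output above. `top_class_indices` is a hypothetical helper; with NumPy, `np.argmax(pred, axis=1)` does the same for a whole batch.

```python
# Example prediction for one file, copied from the output above.
pred = [[1.6454633e-06, 3.7017996e-11, 9.9999821e-01, 1.5900606e-07]]


def top_class_indices(rows):
    """Return the index of the highest-probability class for each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


print(top_class_indices(pred))  # [2]
```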

You can pretty print the predictions by showing them in a leaderboard, like so:

print_leaderboard(pred, <training_data_path>)

It looks like this:

1. Cow 100.0% (index 2)
2. Rooster 0.0% (index 0)
3. Frog 0.0% (index 3)
4. Pig 0.0% (index 1)
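A leaderboard like this can be produced by sorting the class indices by probability. The sketch below is a hypothetical re-implementation and not the library's `print_leaderboard`. Unlike the real function, it takes the class names as a list instead of reading them from `<training_data_path>`. The index-to-name mapping is taken from the example output above.

```python
def format_leaderboard(row, class_names):
    """Hypothetical helper: return leaderboard lines, highest probability first."""
    ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
    return [
        f"{rank}. {class_names[i]} {row[i] * 100:.1f}% (index {i})"
        for rank, i in enumerate(ranked, start=1)
    ]


# Class names ordered by index, as implied by the example leaderboard.
class_names = ["Rooster", "Pig", "Cow", "Frog"]
row = [1.6454633e-06, 3.7017996e-11, 9.9999821e-01, 1.5900606e-07]

for line in format_leaderboard(row, class_names):
    print(line)
```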