kapre

Keras Audio Preprocessors

News

  • 18 March 2017

    • datasets.py: GTZan, MagnaTagATune, MusicNet, and FMA are available.
  • 16 March 2017 (kapre v0.0.3.1)

    • Compatible with Keras 2.0. Kapre no longer supports Keras 1.0 and now requires Keras 2.0.
    • The Kapre API is unchanged, so you can use, save, and load models as before.
    • Stft is currently not working and will be fixed later.
  • 15 March 2017 (kapre v0.0.3)

Installation

$ git clone https://github.com/keunwoochoi/kapre.git
$ cd kapre
$ python setup.py install

(Kapre is on pip, but the pip version is not always up to date, so please use the git version until it becomes more stable.)

Datasets

  • GTZan (30s, 10 genres, 1,000 mp3s)
  • MagnaTagATune (29s, 188 tags, 25,880 mp3s)
  • MusicNet (330 full-length classical music recordings, note-wise annotations)
  • FMA: small (30s, 10 genres, 4,000 mp3s), medium (30s, 20 genres, 14,511 mp3s)

Layers

Usage Example

Mel-spectrogram

from keras.models import Sequential
from kapre.time_frequency import Melspectrogram
from kapre.utils import Normalization2D
from kapre.augmentation import AdditiveNoise

# 6 channels (!), maybe 1-sec audio signal
input_shape = (6, 44100) 
sr = 44100
model = Sequential()
# A mel-spectrogram layer
model.add(Melspectrogram(n_dft=512, n_hop=256, input_shape=input_shape,
                         border_mode='same', sr=sr, n_mels=128,
                         fmin=0.0, fmax=sr/2, power=1.0,
                         return_decibel_melgram=False, trainable_fb=False,
                         trainable_kernel=False,
                         name='trainable_stft'))
# Maybe some additive white noise.
model.add(AdditiveNoise(power=0.2))
# If you wanna normalise it per-frequency
model.add(Normalization2D(str_axis='freq')) # or 'channel', 'time', 'batch', 'data_sample'
# After this, it's just a usual keras workflow. For example..
# Add some layers, e.g., model.add(some convolution layers..)
# Compile the model
model.compile('adam', 'categorical_crossentropy') # if single-label classification
# Train it with raw audio sample inputs.
# load_x() and load_y() are placeholders for your own data-loading code.
x = load_x() # e.g., x.shape = (10000, 6, 44100)
y = load_y() # e.g., y.shape = (10000, 10) if it's 10-class classification
# and train it
model.fit(x, y)
# write a paper and graduate or get paid. Profit!
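
To make the per-frequency normalisation step more concrete, here is a minimal NumPy sketch of the idea: each frequency bin is shifted to zero mean and scaled to unit variance, with the statistics taken over all other axes. This is an illustration, not Kapre's implementation. The function name `normalize_per_frequency`, the `(batch, channel, freq, time)` layout, the `eps` value, and the array sizes are all assumptions made for the example.

```python
import numpy as np


def normalize_per_frequency(x, freq_axis=2, eps=1e-7):
    """Zero-mean, unit-variance normalisation computed separately per frequency bin.

    x is assumed to be laid out as (batch, channel, freq, time).
    freq_axis and eps are illustrative defaults, not Kapre parameters.
    """
    # Reduce over every axis except the frequency axis.
    reduce_axes = tuple(ax for ax in range(x.ndim) if ax != freq_axis)
    mean = x.mean(axis=reduce_axes, keepdims=True)
    std = x.std(axis=reduce_axes, keepdims=True)
    return (x - mean) / (std + eps)


rng = np.random.default_rng(0)
# Hypothetical batch: 4 samples, 6 channels, 128 mel bins, 100 time frames.
melgrams = rng.random((4, 6, 128, 100)) * 10.0 + 3.0
normalized = normalize_per_frequency(melgrams)
print(normalized.shape)
```

Normalising over a different axis would only change which axis is left out of the reduction.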

When you wanna save/load a model with these layers

Use the custom_objects keyword argument, as below.

import keras
import kapre

model = keras.models.Sequential()
model.add(kapre.time_frequency.Melspectrogram(512, input_shape=(1, 44100)))
model.summary()
model.save('temp_model.h5')

model2 = keras.models.load_model('temp_model.h5', 
  custom_objects={'Melspectrogram':kapre.time_frequency.Melspectrogram})
model2.summary()
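
If a model uses several Kapre layers, the `custom_objects` dictionary can be built from a list of classes instead of being typed out by hand. The sketch below assumes the saved model refers to each layer by its class name, as in the example above. `build_custom_objects` is a hypothetical helper, not part of Kapre or Keras, and the two classes are empty stand-ins so the snippet runs without either library installed.

```python
def build_custom_objects(*layer_classes):
    """Map each class name to its class, the shape custom_objects expects."""
    return {cls.__name__: cls for cls in layer_classes}


# Stand-in classes so the sketch runs without Keras or Kapre installed.
class Melspectrogram:
    pass


class Normalization2D:
    pass


custom_objects = build_custom_objects(Melspectrogram, Normalization2D)
print(sorted(custom_objects))
```

With the real libraries you would pass the actual classes, e.g. `kapre.time_frequency.Melspectrogram`, and hand the resulting dictionary to `keras.models.load_model(..., custom_objects=custom_objects)`.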

Downloading datasets

import kapre

kapre.datasets.load_gtzan_genre('datasets')
# check out datasets/gtzan,
# also `datasets/gtzan_genre/dataset_summary_kapre.csv`
kapre.datasets.load_magnatagatune('/Users/username/all_datasets')
# For MagnaTagATune, no csv file is created because the dataset already comes with one.
kapre.datasets.load_gtzan_speechmusic('datasets')
# check out `datasets/gtzan_speechmusic/dataset_summary_kapre.csv`
kapre.datasets.load_fma('datasets', size='small')
kapre.datasets.load_fma('datasets', size='medium')
kapre.datasets.load_musicnet('datasets', format='hdf')
kapre.datasets.load_musicnet('datasets', format='npz')
# Kapre does NOT remove zip/tar.gz files after extracting.
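
Since the archives stay on disk after extraction, you may want to find and optionally delete them yourself. The following is a minimal standard-library sketch. `find_archives` and `remove_archives` are hypothetical helpers, not Kapre functions, and the file names in the demo are made up for illustration.

```python
import tempfile
from pathlib import Path

ARCHIVE_SUFFIXES = (".zip", ".tar.gz")  # the archive types mentioned above


def find_archives(root):
    """Return archive files found anywhere under root, sorted by path."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.name.lower().endswith(ARCHIVE_SUFFIXES)
    )


def remove_archives(root, dry_run=True):
    """Delete archives under root; with dry_run=True only report them."""
    archives = find_archives(root)
    for path in archives:
        if not dry_run:
            path.unlink()
    return archives


# Demo on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "gtzan_genre"
    demo.mkdir()
    for name in ("genres.tar.gz", "clips.zip", "dataset_summary_kapre.csv"):
        (demo / name).touch()
    found = [p.name for p in remove_archives(tmp, dry_run=True)]
    remove_archives(tmp, dry_run=False)
    remaining = sorted(p.name for p in demo.iterdir())

print(found, remaining)
```

Running with `dry_run=True` first is a deliberate choice: it shows what would be deleted before anything is removed.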

Documentation

Please read the docstrings for now. Fortunately I quite enjoy writing docstrings.

Plan

  • time_frequency: Spectrogram, Mel-spectrogram
  • utils: AmplitudeToDB, Normalization2D
  • filterbank: filterbanks (init with mel)
  • stft: FFT-based STFT (Done for theano-backend only)
  • data_augmentation: (Random-gain) white noise
  • datasets.py: download and manage MIR datasets.
  • data_augmentation: Dynamic Range Compression1D, some other noises
  • utils: A-weighting
  • filterbank: Parametric Filter bank
  • Decompose: Harmonic-Percussive separation
  • InverseSpectrogram: istft, (+ util: magnitude + phase merger)
  • TimeFrequency: Harmonic/Spiral representations, chromagram, tempogram

Citation

Please cite it as...

@article{choi2016kapre,
  title={kapre: Keras Audio PREprocessors},
  author={Choi, Keunwoo},
  journal={GitHub repository: https://github.com/keunwoochoi/kapre},
  year={2016}
}
