
This project implements a CNN for audio classification of voice and data transmissions.

The MXNet Perl API is used to classify audio files (currently two categories). Results so far are good: with my simple requirements and minimal test data, classification is 100% correct.
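
For reference, here is a minimal sketch of what the classification side can look like. It assumes the AI::MXNet Gluon bindings mirror the Python Gluon API; the `./spectrograms` path, the network shape, and the hyperparameters are illustrative only, not the actual code used here.

```perl
#!/usr/bin/perl
# Minimal sketch only: assumes the Perl Gluon bindings mirror the Python
# Gluon API. Paths, layer sizes and hyperparameters are illustrative.
use strict;
use warnings;
use AI::MXNet qw(mx);
use AI::MXNet::Gluon qw(gluon);
use AI::MXNet::Gluon::NN qw(nn);
use AI::MXNet::AutoGrad qw(autograd);

# Small CNN with two output classes (voice, data).
my $net = nn->Sequential();
$net->name_scope(sub {
    $net->add(nn->Conv2D(channels => 16, kernel_size => 3, activation => 'relu'));
    $net->add(nn->MaxPool2D(pool_size => 2));
    $net->add(nn->Conv2D(channels => 32, kernel_size => 3, activation => 'relu'));
    $net->add(nn->MaxPool2D(pool_size => 2));
    $net->add(nn->Flatten());
    $net->add(nn->Dense(2));
});
$net->initialize(mx->init->Xavier(), ctx => mx->cpu);

# './spectrograms' is a hypothetical root folder laid out as described
# under ImageFolderDataset below; the transform converts the HxWxC uint8
# image to the CxHxW float32 layout the convolution layers expect.
my $dataset = gluon->data->vision->ImageFolderDataset(
    './spectrograms',
    transform => sub {
        my ($data, $label) = @_;
        return (mx->nd->transpose($data, axes => [2, 0, 1])->astype('float32') / 255, $label);
    }
);
my $loader  = gluon->data->DataLoader($dataset, batch_size => 8, shuffle => 1);
my $trainer = gluon->Trainer($net->collect_params(), 'adam', { learning_rate => 1e-3 });
my $loss_fn = gluon->loss->SoftmaxCrossEntropyLoss();

for my $epoch (1 .. 10)
{
    for my $batch (@{ $loader })
    {
        my ($data, $label) = @$batch;
        autograd->record(sub {
            my $loss = $loss_fn->($net->($data), $label);
            $loss->backward;
        });
        $trainer->step($data->shape->[0]);
    }
}
```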

The input is radio transmissions (.wav files) that contain either a human speaking or a data transmission. Previously, I did this classification with SoX's voice detection function, with a much lower success rate.

Unlike Gluon Audio, which uses librosa to extract MFCCs, I create spectrograms (PNG image files) as input to the network. I would like to use the Gluon Audio approach, but it currently depends on librosa, which is Python-only. Gluon Audio mentions the MXNet FFT operator on CPU as a possible future replacement for this dependency, so hopefully that can be used at some point.

Although the use of machine learning is probably overkill for my requirements, I plan to expand the categories/capability in the future.

It would be great if this helps anyone in the way the examples below helped me. I am open to any feedback.

To create training data

WAV file -> extract middle second -> generate spectrogram PNG

Currently, ffmpeg is used to generate spectrograms outside the training process. Training data is created by a separate program that uses metadata from a database and audio files from disk. Spectrograms are generated like so:

```
/usr/bin/ffmpeg -i audio.wav -lavfi showspectrumpic=s=100x50:scale=log:legend=off audio.png
```
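
The command above renders a spectrogram of the whole file. Here is a minimal sketch of the "extract middle second" step, shelling out to ffprobe/ffmpeg from Perl (the `wav_to_spectrogram` helper name is hypothetical, not part of the actual program):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: render the middle second of a WAV file as a
# spectrogram PNG, matching the pipeline described above.
sub wav_to_spectrogram
{
    my ($wav, $png) = @_;

    # Probe the total duration in seconds.
    chomp(my $dur = `ffprobe -v error -show_entries format=duration -of csv=p=0 "$wav"`);

    # Seek to half a second before the midpoint and keep one second.
    my $start = $dur / 2 - 0.5;
    $start = 0 if $start < 0;    # clips shorter than one second

    system('/usr/bin/ffmpeg', '-y',
           '-ss', $start, '-t', 1, '-i', $wav,
           '-lavfi', 'showspectrumpic=s=100x50:scale=log:legend=off',
           $png) == 0 or die "ffmpeg failed for $wav\n";
}

wav_to_spectrogram('audio.wav', 'audio.png');
```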

The spectrograms should be placed in a folder structure as documented for ImageFolderDataset, for example:
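
The folder names below are just an illustration for the two current categories; each subfolder name becomes a class label:

```
spectrograms/
    voice/
        0001.png
        0002.png
    data/
        0001.png
        0002.png
```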

Dependencies

- MXNet pull request against ImageFolderDataset
- ffmpeg

Based on these examples