Neural network for classification of flac compression quality.
penthy is an audiophile tool 🎶 to check whether a flac file contains truly lossless music, as it is supposed to, or comes from a lossy source, like an mp3 file.
The flac file format compresses music in a way that it remains unaltered, in contrast to lossy compression formats (like mp3) that sacrifice sound quality to limit file size. Confirming that your flac discography is truly lossless is an open-ended problem. One approach to tell the difference is machine learning. Although, penthy cannot evaluate every aspect of digital audio to guarantee that your file is exactly what came out of the studio, she tries to identify transcoding from mp3 sources. A song that passes penthy's challenge is not necessarily genuine, but it is unlikely to be an mp3 wearing a flac trenchcoat.
penthy is short for Penthesilea, a skilled queen of the Amazons who fought in the Trojan War, according to Greek mythology.
A Convolutional Neural Network was trained with the highest frequencies of several songs in the form of spectrogram images, in order to recognize flac files that were once mp3s.
The songs are split into small segments to produce the spectrograms which are given to the network as inputs. The training dataset contained the truly lossless versions of the songs and their fake counterparts – flac files transcoded from mp3 files generated from the originals. Various music genres and mp3 qualities were included. Each spectrogram is a 128x128 px RGB image depicting only the 16200-22000 Hz frequency range for 8 seconds of audio, saved as a numpy array. The trained model accepts flac or wav tracks as input and outputs a float number from 0 to 1. An output of '0' corresponds to audio transcoded to mp3 and back to a lossless format. An output of '1' classifies the song as not transcoded from an mp3 source, but it could still be transcoded from a different format, subjected to upsampling or altered in other ways.
The CNN is structured as follows:
The current trained model performs generally well, with an approximate accuracy of 90%. False negatives (genuine files classified as transcoded) are more common than false positives (transcoded files classified as truly lossless), especially for songs that lack higher frequencies.
- Python 3
- TensorFlow 2 with Keras
- FFmpeg
Update: This project is not actively maintained. To use penthy, you will need to install old versions of python packages, which can be troublesome and time-consuming.
You may use this code to evaluate your flac files with the pretrained model or train your own if you have access to truly lossless discography.
- neural_net.py builds a dataset with flexible multiprocessing and trains a new model.
- trained.py evaluates a single file.
- trained_dir.py evaluates all applicable files in a directory with multiprocessing (recommended if you use penthy to scan your collection).
- audio_manipulation.py is used by all modules to generate the spectrograms.
- No dataset included.
For instance, running trained_dir.py for a directory that contains both genuine and transcoded files will output something like this:
You do not need both versions of a file to get an accurate evaluation, as in this example. Each file is classified separately.
There is also an online demo, to scan your files without downloading or installing anything. Performance is significantly better when run locally, though.
Update: This project is not actively maintained. To use penthy, you will need to install old versions of python packages, which can be troublesome and time-consuming.
- Python (3.7 64-bit has been tested) (in Windows, make sure to add Python to the PATH environment variable)
- FFmpeg (including ffprobe) (in Windows, make sure to add FFmpeg to the PATH environment variable)
and the following python packages (or see requirements.txt, generated by PyCharm):
- tensorflow and optionally its demanding requirements for GPU support (recommended)
- keras
- numpy
- ffmpeg-python
- colorama
- wakepy (only if you train new models)
plus the dependencies of these packages that will come up during installation (e.g. pandas, scipy, matplotlib, scikit-learn).
The license of this repository refers to code written by the author and not the libraries and functions used. For those, look at the respective licenses of the original projects. Music in the example of usage is courtesy of Dean Washburn (nvlachost@gmail.com). Proper attribution of penthy requires mentioning all parties of the following crediting.
Achilleas Papastamatiou developed penthy as part of his undergraduate thesis at the Department Of Computer Science And Telecommunications at the University of Thessaly in Greece. The project was supervised by professor Vaggelis Spyrou.
(Not required to run or fork penthy)
- Audacity
- Spek
- VLC media player
- Audiochecker (by Dester)