What is Audio Fingerprinting
Audio fingerprinting denotes a set of techniques for audio identification, which covers the detection and identification of an audio excerpt (a music track, an advertisement, a jingle, etc.) in an audio recording (either a short excerpt or a broadcast stream).
While audio watermarking relies on embedding meta-information within the very audio signal to be processed, audio fingerprinting (sometimes called audio hashing) detects audio occurrences by recognizing signature codes extracted from short snippets of the signal.
The design of such signature codes must jointly satisfy several constraints:
- Robustness: the representation must be as invariant as possible to typical audio distortions, such as:
  - Noise addition
  - Transmission distortions (channel filtering, analog/digital conversion...)
  - Time-scale changes, with the resulting pitch shifts
  - Amplitude changes, including dynamic range compression
  - Typical audio encodings, e.g. MPEG encoding, RealAudio, WMA, or even GSM
  - Temporal shifting between the reference track and the search cue
- Compactness: the complexity of searching for new codes among the database codes is directly related to their dimension. The signature code must therefore be as compact as possible.
- Discrimination: however, a compact code implies a narrower range of values, so codes from different tracks get closer and harder to discriminate. The signature code must therefore strike an acceptable compromise between compactness and discrimination.
- Computability: finally, the codes must be easy and quick to compute, to allow live processing of any audio query.
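To make the robustness, compactness and discrimination constraints more concrete, here is a minimal Python 3 sketch of one well-known binary signature design, in the style of the Philips (Haitsma-Kalker) fingerprint. It is an illustration only, not a method evaluated or prescribed by PyAFE. It assumes per-frame band energies have already been computed (framing and filter bank are omitted), and the function names `sub_fingerprints` and `bit_error_rate` are hypothetical. Each bit encodes only the sign of an energy difference, so a constant gain leaves the code unchanged, while the bit error rate between two codes acts as the discrimination measure.

```python
from typing import List


def sub_fingerprints(energies: List[List[float]]) -> List[int]:
    """Derive one integer sub-fingerprint per frame from band energies.

    energies[n][m] is the (precomputed) energy of band m in frame n.
    Bit m of a frame's code is 1 when the energy difference between bands
    m and m+1 increased with respect to the previous frame.
    """
    codes = []
    for n in range(1, len(energies)):
        prev, cur = energies[n - 1], energies[n]
        code = 0
        for m in range(len(cur) - 1):
            delta = (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1])
            if delta > 0:
                code |= 1 << m
        codes.append(code)
    return codes


def bit_error_rate(a: List[int], b: List[int], bits: int) -> float:
    """Fraction of differing bits between two equal-length code sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    differing = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return differing / (len(a) * bits)


# Usage: 4 bands give 3 bits per frame; doubling the amplitude multiplies
# every energy by 4, which does not change the sign of any difference.
energies = [[1.0, 2.0, 0.5, 3.0], [2.0, 1.0, 1.5, 0.5], [0.5, 0.7, 2.5, 1.0]]
louder = [[4.0 * e for e in row] for row in energies]
ref = sub_fingerprints(energies)
print(ref)                                                    # [5, 4]
print(bit_error_rate(ref, sub_fingerprints(louder), bits=3))  # 0.0
```

Because a frame with M bands yields only M - 1 bits, the code is very compact; the price is that unrelated excerpts can produce similar codes, which is why such schemes usually compare blocks of consecutive frames against a bit-error-rate threshold.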
An audio fingerprinting method classically combines two key elements: the design of the signature code, described above, and an efficient search strategy to retrieve an unknown code within the database. The search strategy is essential to ensure the scalability of the algorithm, i.e. its ability to scale up to very large databases including several million reference tracks. Scalability of course has a computational aspect, but the main bottleneck generally lies in handling multiple memory and hard-drive accesses. Another complexity constraint is that audio identification systems are often applied to the live monitoring of audio streams, which requires real-time operation. Both issues must be addressed while guaranteeing the accuracy (the performance) of the system.
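The search strategy itself is outside PyAFE's scope, but the idea of retrieving an unknown code from a large database can be sketched with an inverted index (a hash table from code to occurrences) combined with time-offset voting, which also copes with temporal shifting between the reference and the query. This is a minimal Python 3 sketch under simplifying assumptions: codes are integers such as per-frame sub-fingerprints, only exact code matches are looked up, and the `FingerprintIndex` class is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


class FingerprintIndex:
    """Toy inverted index: code -> list of (track_id, frame) occurrences."""

    def __init__(self) -> None:
        self._postings: Dict[int, List[Tuple[str, int]]] = defaultdict(list)

    def add_track(self, track_id: str, codes: List[int]) -> None:
        """Register every frame code of a reference track."""
        for frame, code in enumerate(codes):
            self._postings[code].append((track_id, frame))

    def query(self, codes: List[int], min_votes: int = 2) -> Optional[Tuple[str, int]]:
        """Return (track_id, offset) with the most consistent hits, or None.

        Each exact hit votes for the offset between the reference frame and
        the query frame; a true match accumulates votes on a single offset.
        """
        votes: Counter = Counter()
        for q_frame, code in enumerate(codes):
            # .get() avoids inserting empty entries into the defaultdict
            for track_id, ref_frame in self._postings.get(code, []):
                votes[(track_id, ref_frame - q_frame)] += 1
        if not votes:
            return None
        (track_id, offset), count = votes.most_common(1)[0]
        return (track_id, offset) if count >= min_votes else None


index = FingerprintIndex()
index.add_track("jingle", [11, 42, 7, 42, 99, 3])
index.add_track("ad", [5, 6, 7, 8])
print(index.query([7, 42, 99]))  # ('jingle', 2): the excerpt starts at frame 2
```

Real systems must also tolerate bit errors (for instance by also probing codes with a few flipped bits) and keep the postings on disk or in a distributed store, which is where the memory and disk-access bottleneck appears.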
While the complexity aspect is not considered here, the PyAFE toolkit provides experimenters with a consistent framework for evaluating any audio fingerprinting system.
The PyAFE toolkit was developed within the Evaluation work package of the Quaero project and is freely available as open-source software. It is designed as a modular piece of software, so that it can easily be extended in the future:
- two modules provide the necessary functions to parse groundtruth and detection XML files.
- the core module includes the actual implementation of the audio fingerprinting score metrics. The numbers of correct detections, misses and false alarms can also be computed.
The PyAFE toolkit also includes an all-in-one command-line evaluation tool, which provides a straightforward, easy-to-use way of getting evaluation results. It is as simple as typing:
python full_eval.py --groundtruth=/path/to/groundtruth/files --submission=/path/to/detection/files
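The counts of correct detections, misses and false alarms can be illustrated with a small Python 3 sketch. The matching rule here is an assumption and not necessarily PyAFE's: a detection is correct when it names the same track as a not-yet-matched groundtruth occurrence whose start time lies within a tolerance, unmatched groundtruth occurrences are misses, and unmatched detections are false alarms. `Occurrence`, `score` and the 1-second default tolerance are hypothetical; precision, recall and F-measure are derived from the counts in the usual way.

```python
from typing import List, NamedTuple


class Occurrence(NamedTuple):
    track_id: str
    start: float  # seconds from the beginning of the recording


class Scores(NamedTuple):
    correct: int
    misses: int
    false_alarms: int
    precision: float
    recall: float
    f_measure: float


def score(groundtruth: List[Occurrence], detections: List[Occurrence],
          tolerance: float = 1.0) -> Scores:
    """Greedy one-to-one matching of detections against groundtruth."""
    unmatched = list(groundtruth)
    correct = 0
    for det in sorted(detections, key=lambda d: d.start):
        for i, ref in enumerate(unmatched):
            if ref.track_id == det.track_id and abs(ref.start - det.start) <= tolerance:
                del unmatched[i]  # each groundtruth occurrence matches at most once
                correct += 1
                break
    misses = len(unmatched)
    false_alarms = len(detections) - correct
    precision = correct / len(detections) if detections else 0.0
    recall = correct / len(groundtruth) if groundtruth else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return Scores(correct, misses, false_alarms, precision, recall, f_measure)


gt = [Occurrence("jingle", 10.0), Occurrence("ad", 42.0), Occurrence("jingle", 90.0)]
det = [Occurrence("jingle", 10.4), Occurrence("ad", 50.0), Occurrence("jingle", 90.2)]
s = score(gt, det)
print(s.correct, s.misses, s.false_alarms)  # 2 1 1
```

Greedy matching in time order is simple but not always optimal when several occurrences of the same track fall within the tolerance of each other; an assignment-based matching would avoid that corner case.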
Documentation can be found below.
Clone the GitHub repository for the current development version.
Version 0.5.2 - 26/01/2011
- Date/time format has changed. Microseconds are now mandatory.
- Updated sample data accordingly.
Version 0.5.1 - 14/12/2010
- Updated sample data
Version 0.5.0 - 14/12/2010
- First public release
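Version 0.5.2 made microseconds mandatory in the date/time format. A parser can enforce this simply by requiring a fractional-seconds field. The Python 3 sketch below assumes a hypothetical XML layout (`<detection track="..." start="..."/>`) and an ISO-like timestamp format; the real PyAFE schema and format may differ. It uses the standard library's `xml.etree.ElementTree` to stay self-contained, although the toolkit itself depends on lxml.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import List, Tuple

# Hypothetical timestamp layout; the actual PyAFE format may differ.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def parse_detections(xml_text: str) -> List[Tuple[str, datetime]]:
    """Parse <detection track=... start=...> elements (hypothetical schema)."""
    root = ET.fromstring(xml_text)
    result = []
    for elem in root.iter("detection"):
        # strptime raises ValueError when the ".ffffff" part is missing,
        # which enforces the "microseconds are mandatory" rule.
        start = datetime.strptime(elem.get("start", ""), TIMESTAMP_FORMAT)
        result.append((elem.get("track", ""), start))
    return result


sample = """<detections>
  <detection track="jingle" start="2010-12-14T10:15:30.250000"/>
</detections>"""
print(parse_detections(sample))

try:
    parse_detections('<detections><detection track="ad" start="2010-12-14T10:15:30"/></detections>')
except ValueError as err:
    print("rejected:", err)
```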
Requirements
- Python 2.7 or later
- lxml 2.2.6 or later, http://codespeak.net/lxml/
Available soon (last update: 2011-01-11)