mp3crc is a utility designed to compute the check-sum of a LAME-encoded MP3 file's audio stream and compare it to the MusicCRC field stored in the file's LAME Info tag. This allows the integrity of MP3 files to be verified without non-audio meta-data (such as ID3 tags and artwork) affecting the computed value.
```
# Check a folder of MP3s using default settings
% mp3crc "Belle & Sebastian - Dog On Wheels (LAME V0)"
4 files checked: 4 verified OK; 0 corrupted; 0 unsupported

# Check a folder of MP3s using more verbose settings
% mp3crc -HO "Belle & Sebastian - Dog On Wheels (LAME V0)"
01 - Dog On Wheels.mp3: MusicCRC: 3444 / 3444 | Info tag CRC: ABB0 / ABB0
02 - The State I Am In.mp3: MusicCRC: 405D / 405D | Info tag CRC: 1996 / 1996
03 - String Bean Jean.mp3: MusicCRC: 757D / 757D | Info tag CRC: FFA1 / FFA1
04 - Belle & Sebastian.mp3: MusicCRC: A255 / A255 | Info tag CRC: 95C5 / 95C5
4 files checked: 4 verified OK; 0 corrupted; 0 unsupported
```
mp3crc determines automatically whether to act on single files or directories of files based on the files/folders given to it. (For example, mp3crc myfolder/ will work on all MP3 files in that folder, whereas mp3crc myfile.mp3 will work only on the file given.)
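The file-versus-folder behaviour described above can be sketched roughly as follows. This is not mp3crc's own code: the function name collect_mp3s is hypothetical, and selecting files by their .mp3 extension is an assumption made for brevity (mp3crc's own test of whether a file "appears to be an MP3" may differ).

```python
import os


def collect_mp3s(paths):
    """Expand command-line paths into a sorted list of MP3 files (hypothetical helper).

    Folders are walked recursively; plain files are taken exactly as given;
    paths that don't exist are skipped.
    """
    found = []
    for path in paths:
        if os.path.isdir(path):
            for root, dirs, files in os.walk(path):
                dirs.sort()  # in-place sort makes os.walk visit sub-folders in order
                for name in sorted(files):
                    # Assumption: pick MP3s by extension only.
                    if name.lower().endswith(".mp3"):
                        found.append(os.path.join(root, name))
        elif os.path.isfile(path):
            found.append(path)  # an explicitly named file is used as-is
    return found
```

Sorting both the file names and the sub-folder names keeps the output order stable from run to run.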
By default, mp3crc only reports corrupt files in human-readable form; files that have been verified as good are not listed individually. However, there are many display settings, including ones that make it possible to pipe the list of corrupt files to other applications.
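mp3crc's actual option names aren't reproduced here. As a rough illustration only, "bare" pipe-friendly output generally means one undecorated path per record; the hypothetical helper below also offers NUL-terminated records, which is what xargs -0 expects.

```python
import sys


def print_bare(paths, out=None, null_separated=False):
    """Write each path with no decoration, one record per file (hypothetical helper).

    With null_separated=True each record ends in a NUL character instead of a
    newline, so `xargs -0` copes with spaces, quotes and newlines in file names.
    """
    if out is None:
        out = sys.stdout
    terminator = "\0" if null_separated else "\n"
    for path in paths:
        out.write(path + terminator)
```

NUL-terminated records could then be consumed by, for example, xargs -0 ls -l.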
background / details
The LAME encoder automatically adds CRC16 check-sums of both the file's audio stream (excluding any meta-data) and the file's Info tag header to every MP3 it creates. These check-sums are independent of additional meta-data like ID3 and APE tags, album art, lyrics, and so on, and enable integrity verification of the file data at a later time without having to worry about meta-data changes altering the computed check-sum (as would be the case with SFVs or MD5 hashes of the whole files).
The MusicCRC field is located at bytes $BC–$BD in the LAME Info tag; the Info tag CRC immediately follows it at bytes $BE–$BF.
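Reading the two stored values might look like this. The sketch assumes the offsets are counted from the first byte of the MPEG frame that carries the Info tag and that both values are stored big-endian; both are assumptions, and because the tag's position inside the frame can vary, a robust reader would locate the tag before trusting fixed offsets. The function name is hypothetical.

```python
import struct

MUSIC_CRC_OFFSET = 0xBC     # MusicCRC, 2 bytes
INFO_TAG_CRC_OFFSET = 0xBE  # Info tag CRC, 2 bytes


def read_lame_crcs(info_frame):
    """Return (music_crc, info_tag_crc) as stored in an Info-tag frame (hypothetical helper).

    Assumes `info_frame` starts at the 4-byte header of the frame holding the
    Info tag, and that both fields are big-endian 16-bit integers.
    """
    if len(info_frame) < INFO_TAG_CRC_OFFSET + 2:
        raise ValueError("frame too short to hold a LAME Info tag")
    return struct.unpack_from(">HH", info_frame, MUSIC_CRC_OFFSET)
```

With the first file in the example above, the stored pair would be printed as 3444 and ABB0 in hexadecimal.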
The purpose of the Info tag CRC is to verify the integrity of the data that LAME stores in the Info tag, including encoder version and settings, audio length, and the MusicCRC field itself.
mp3crc takes the Info tag CRC into account when determining whether a file is corrupt — if the Info tag is corrupt there is of course the possibility that the MusicCRC is corrupt as well.
It should be reiterated that this utility ONLY supports MP3 files created with the LAME encoder. MP3s created with other encoders, such as Apple's FhG-based iTunes encoder, will not contain the MusicCRC field and therefore cannot be verified by this method. (MP3s encoded with versions of LAME prior to 3.90 will not contain this field either, and are therefore also unsupported.)
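One way to apply the version rule is to inspect the short encoder string stored in the Info tag. This is a sketch under assumptions: it expects a string of the form "LAME" followed by major.minor digits (such as b"LAME3.99r"), it leaves locating that string inside the tag to the caller, and the function name is hypothetical.

```python
import re

# Assumed format: "LAME", then a major and a minor version number.
_LAME_VERSION = re.compile(rb"LAME(\d+)\.(\d+)")


def is_supported_encoder(version_string):
    """Return True if `version_string` names LAME 3.90 or later (hypothetical helper)."""
    match = _LAME_VERSION.match(version_string)  # .match anchors at the start
    if match is None:
        return False  # no LAME marker, so no MusicCRC to check
    version = (int(match.group(1)), int(match.group(2)))
    return version >= (3, 90)  # tuples compare numerically, so 3.100 > 3.90
```

Comparing integer tuples rather than strings matters here: as strings, "3.100" would sort before "3.90".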
mp3crc currently requires mutagen, Quod Libet's Python-based audio meta-data library. However, only one or two very small dependencies on it remain, so i hope to eventually remove this requirement.
mp3crc also relies on crcmod, a Python library for calculating CRCs (which is, of course, integral to this tool's functionality). I was previously using CRC code taken from the LAME project (this code still remains in the script; just change one variable to use it), but i found that it was extremely slow.
crcmod provides a substantial increase in speed. (For instance, computing the CRC for a 45-meg test file took 21 seconds using the previous method, whilst crcmod took only 0.25 seconds.)
limitations / to do
mp3crc seems to be working well for the most part: i've run it on several thousand of my own MP3 files and it seems at least 90% accurate. However, it does sometimes incorrectly parse strangely formatted Info tags, leading it to conclude that the files were not encoded with LAME or that the Info tag is missing altogether. I hope to fix this soon.
Additionally, piping files to mp3crc from xargs or wherever is currently broken.
The only other major bug i'm aware of is that, on the Windows platform only, mp3crc chokes on files and folders whose names contain non-ASCII characters. I guess this is because, even though NTFS stores file names as UTF-16, the Windows API by default returns them encoded in the system locale's code page (which is not Unicode). I'm sure this can be fixed somehow, but i can't be arsed to do any more research on it. Using Windows is damaging to my mental health.
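One commonly used workaround under Python 2 is to pass directory paths as unicode objects: os.listdir then returns unicode names, and on Windows those come from the wide-character (UTF-16) API rather than the locale's code page. The hedged sketch below (hypothetical helper name, not tested on Windows here) runs under both Python 2 and 3. It may only help for names found by listing folders, since names typed on the command line can already have been converted to the code page before the script sees them.

```python
import os
import sys


def list_dir_unicode(path):
    """List a folder, asking the OS for text (unicode) names rather than byte strings.

    Under Python 2 a plain str path is a byte string; decoding it first makes
    os.listdir return unicode names. Under Python 3 a str path already does.
    """
    if isinstance(path, bytes):
        path = path.decode(sys.getfilesystemencoding())
    return os.listdir(path)
```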
- I'd like to add the ability to display percentages in the summary (which should be simple of course)
- I'd like to add a debug mode containing useful information about each file and how mp3crc is parsing it
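Percentages in the summary could be produced along these lines. The exact wording is only a guess modelled on the existing summary line, the function name is hypothetical, and an empty run is reported as 0.0% rather than dividing by zero.

```python
def format_summary(ok, corrupted, unsupported):
    """Summary line with percentages (hypothetical format based on the current output)."""
    total = ok + corrupted + unsupported

    def pct(n):
        return 100.0 * n / total if total else 0.0  # avoid dividing by zero

    return ("%d files checked: %d verified OK (%.1f%%); "
            "%d corrupted (%.1f%%); %d unsupported (%.1f%%)"
            % (total, ok, pct(ok), corrupted, pct(corrupted),
               unsupported, pct(unsupported)))
```

For the example folder above, format_summary(4, 0, 0) gives "4 files checked: 4 verified OK (100.0%); 0 corrupted (0.0%); 0 unsupported (0.0%)".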
- 49 (2012/12/07) — Fixed sorting of files when working with directories.
- 48 (2011/11/24) — Fixed a small miscalculation in the audio stream length; this didn't affect CRC computation, just the debug output. I've also now personally verified the results of running mp3crc 48 on my collection of ~16'000 files, and LameTag seems to agree with everything that it lists as corrupt (which was only about 0.5% of the lot; not bad!), so i'm guessing this might be ready for 'serious' usage. (I'm sure that after i say this i'll find a huge bug, but whatever.)
- 47 (2011/11/24) — Fixed crash caused by missing ID3v1 tags. (Oops)
- 46 (2011/11/24) — Fixed a bug in the detection of ID3v1 tags. Also added additional debug info.
- 45 (2011/11/24) — Fixed a bug in the detection of the Info tag offset which caused the script to wrongly consider certain Info tags to be invalid. This fix enabled a large number of previously 'invalid' files in my library to verify OK. :) Also added the ability to turn full paths on and off in the normal (non-bare) display formats. Additionally set full paths to be the default. (Toggle with
- 44 (2011/11/23) — Apparently in some cases (misbehaving software?) the ID3v1 tag can become duplicated, leading to a TAG marker at EOF - 256 bytes. This is illegal i guess, but it doesn't seem to hurt anything, so we probably shouldn't mark it as corrupt. I've fixed it so that the tool now allows for this scenario. Also added a little more debug info.
- 43 (2011/11/23) — Improved detection of the next frame following the Info tag (my first 'improvement' was no good...); this fixes some bugs i detected in test files. Also the beginnings of a debug mode (accessible via
- 42 (2011/11/23) — Slightly better-looking output code (no functionality change).
- 41 (2011/11/23) — Now detects files altered by MP3Gain.
- 40 (2011/11/23) — Detection of tags at the end (ID3v1, APE, &c.) is faster, more accurate, and less stupidly written.
- 39 (2011/11/22) — Hopefully more robust detection of the next frame following the Info tag. Also i think this passes all of mutagen's test cases, but i'm not positive.
- 38 (2011/11/22) — Now including crcmod and mutagen.
- 37 (2011/11/22) — Fixed display of all different line options (-1, -2, -3, -5).
- 36 (2011/11/22) — Verbose options now implemented.
- 35 (2011/11/22) — Re-arranged a lot of things, added several comments, made it easier to use the in-built (slow) CRC method.
- 34 (2011/11/22) — First public release. Still extremely rough, but supports most of the intended options. Smart enough to properly compute CRCs for files with most common variations of tags (ID3v1, ID3v2, ID3v1+ID3v2, ID3v1+APEv2, no tags). Should do directory recursion. Ignores files that don't appear to be MP3s.
- 1 (2011/11/21) — Initial version.
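Several of the entries above deal with locating tags so that only audio data is check-summed. A minimal sketch of that step follows, assuming only a leading ID3v2 tag and one or two trailing 128-byte ID3v1 tags (the duplicated case from version 44). APEv2 and other trailers, which mp3crc also handles, are left out, and the function name is hypothetical. The Info-tag frame would sit at the returned start, so the MusicCRC region begins after that frame.

```python
def audio_bounds(data):
    """Return (start, end) of the data between a leading ID3v2 tag and trailing ID3v1 tag(s).

    Hypothetical helper: ignores APE, Lyrics3 and any other tag formats.
    """
    start = 0
    if data[:3] == b"ID3" and len(data) >= 10:
        # ID3v2 size: four "syncsafe" bytes holding 7 bits each; excludes the 10-byte header.
        size = 0
        for b in data[6:10]:
            size = (size << 7) | (b & 0x7F)
        start = 10 + size
        if data[5] & 0x10:  # footer-present flag (ID3v2.4) adds another 10 bytes
            start += 10
    end = len(data)
    # Strip up to two ID3v1 tags: a duplicated tag leaves a second "TAG" at EOF - 256.
    for _ in range(2):
        if end - 128 >= start and data[end - 128:end - 125] == b"TAG":
            end -= 128
        else:
            break
    return start, end
```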
I am an extremely poor programmer (and on top of that have never used Python before starting this), so i apologise in advance for what i'm sure is the embarrassing quality of my code.
references, related tools, and acknowledgements
MP3 Info Tag rev 1 specifications:
LAME, the open-source encoder which makes this utility possible:
LameTag, a somewhat similar utility for Windows:
mp3crc is based in part on a script created by Joe Wreschnig called 'MP3 stream header information support for Mutagen'. It was originally found on the mutagen issue tracker:
I'm not partial to the GPL myself, but since Joe Wreschnig's original script was GPL-licensed and this tool is based on it, i have unfortunately had to make it GPL as well. Apologies to anyone who needed BSD. Maybe some day, if i get smarter, i can re-write it. :(
This program is free software; you can redistribute it and/or modify it under the terms of version 2 of the GNU General Public License as published by the Free Software Foundation.