
Take random MP3s from *wherever* and organize them by artist.


skyzyx/organizing-mp3

Organizing MP3s

A few hours' worth of work to take MP3 files (which were probably acquired from Napster 20+ years ago), figure out what they actually are, then organize them into folders by artist.

This will probably be useful if you are old-school and listen to music with Winamp or VLC.

Or maybe you want to stream your music from Plex (e.g., with Plexamp) instead of using Apple Music, Spotify, Pandora, YouTube Music, or one of the other popular streaming services since you already have the damn songs anyway.

This is itch-and-scratch-ware. I had an itch, then I scratched it. Now it doesn't itch anymore. Issues are disabled. You can submit a PR if you want to see something added, or you can fork it and do whatever you want with it.

What does this do

  1. Use AcoustID to create an acoustic fingerprint and compare it to the MusicBrainz database.
  2. Get back the results.
  3. Look at the ID3 tags (if any) and merge them into the results.
  4. Which results do we see the most? Is there an obvious winner? If so, use it.
  5. Not an obvious winner? Look at the top results, and compare them against the filename using Levenshtein Distance.
  6. Now that we have a high-probability artist and song title, create folders and move files as appropriate.
  7. If there are any failures along the way, stop processing that file and move on to the next one. This allows you to do manual cleanup instead of having this script probably screw it up on your behalf.
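
Steps 4 and 5 above can be sketched roughly as follows. This is a minimal illustration, not the code in aidmatch.py: the names `levenshtein` and `pick_winner`, the `top_n` parameter, and the rule that an "obvious winner" has strictly more votes than the runner-up are all assumptions made for the example.

```python
from collections import Counter
from pathlib import Path


def levenshtein(a, b):
    """Return the edit distance between strings a and b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def pick_winner(candidates, filename, top_n=3):
    """Pick one (artist, title) pair from a list of candidate pairs.

    Returns None when there are no candidates at all.
    """
    if not candidates:
        return None
    ranked = Counter(candidates).most_common(top_n)
    # Assumption: an "obvious winner" has strictly more votes than the runner-up.
    if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
        return ranked[0][0]
    # No clear winner: prefer the candidate whose text is closest to the filename.
    stem = Path(filename).stem.lower()
    return min(
        (pair for pair, _count in ranked),
        key=lambda pair: levenshtein(stem, "{} - {}".format(*pair).lower()),
    )
```

Here `candidates` would be the merged AcoustID and ID3 results from steps 1 to 3, one `(artist, title)` tuple per result.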

What does this NOT do

  • It does not support writing ID3 tags back into the MP3 file (although it could).
  • It does not support formats beyond MP3 (e.g., AAC, M4A) (although it could).
  • It does not do anything with albums. Album metadata is generally a mess when it comes to homegrown databases (e.g., AcoustID).
  • It does not aim for perfection. We're using databases that were developed outside the music industry, by music fans. If you're looking for data infallibility, be prepared to spend some coin on a Gracenote license. Having said that, my testing shows over 98% accuracy.

Requirements

Prerequisites

  • Python 3.6 or newer, with pip.
  • Download Chromaprint-fpcalc for your platform and CPU architecture, and add it to your path. If you can't add it to your path, set the FPCALC environment variable to point to it. (This is a requirement of pyacoustid.)
  • Register a new application on AcoustID, and save your API key as the value of the ACOUSTICID_KEY environment variable.
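
If you want to check these prerequisites before a run, something like the following could work. It is a sketch using only the standard library: the helper names `find_fpcalc` and `missing_prerequisites` are made up for this example, and it only checks that the settings are present, not that the fpcalc path or the API key is valid.

```python
import os
import shutil


def find_fpcalc(env=None):
    """Return the fpcalc location from FPCALC, else from PATH, else None."""
    env = os.environ if env is None else env
    return env.get("FPCALC") or shutil.which("fpcalc", path=env.get("PATH"))


def missing_prerequisites(env=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    if find_fpcalc(env) is None:
        problems.append("fpcalc not found on PATH and FPCALC is not set")
    if not env.get("ACOUSTICID_KEY"):
        problems.append("ACOUSTICID_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```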

If you want to further develop this, there is a yapf definition to help you maintain consistency in formatting.

Installation (macOS/Linux)

It's a very, very standard Python installation. However, it's critical that you don't confuse a Python 2 installation with a Python 3 installation. Python 2 is explicitly unsupported.

Activate an isolated virtualenv virtual environment

ℹ️ NOTE: While I'm using python3 and pip3 in these instructions, they may be called python or pip on your system. Adjust as necessary.

  1. Install virtualenv using pip.

    pip3 install -U virtualenv
  2. Create a new, isolated virtualenv virtual environment for running this Python code, so that it doesn't impact any other Python code you have on your system.

    virtualenv .venv
  3. Now activate your isolated virtualenv virtual environment.

    source .venv/bin/activate

    (You may need to replace activate (Bash) with activate.csh (C-shell), activate.fish (Fish), or activate.ps1 (Windows PowerShell), depending on your platform.)

Working inside your isolated virtualenv virtual environment

ℹ️ NOTE: Now that we're inside the isolated virtualenv virtual environment, pip is always called pip, and python is always called python. Don't worry about the 3s anymore.

  1. Validate that it's working. If you get a result, you're good.

    which pip | grep ".venv/bin/pip"
  2. Install the Python packages that this script depends on.

    pip install -r requirements.txt

    Some of these dependencies are GPL-licensed. This will matter more in the "Licensing" section, at the end of this document.
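
The check in step 1 can also be done from inside Python, which helps if `which` is unavailable. This is a small sketch, not part of the project; `in_virtualenv` is a hypothetical helper name, and the `real_prefix` fallback is there on the assumption that an older virtualenv release may be in use.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; newer ones (and venv)
    # leave sys.base_prefix pointing at the system-wide installation.
    base = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return base != sys.prefix


if __name__ == "__main__":
    print("virtual environment active" if in_virtualenv() else "using the system Python")
```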

Installation (Windows)

TBD. I need to boot up my Windows VM and do some testing.

Basic usage (macOS/Linux)

ℹ️ NOTE: This snippet assumes you're using GNU tools, which do not ship by default on macOS (which uses BSD tools). If you're using macOS, see “Using GNU command line tools in macOS instead of FreeBSD tools” for more information. This script should work without modification on common Linux distributions, as well as the Windows Subsystem for Linux (WSL2).

This will automatically move/organize the MP3s into artist folders (not album artist) and rename the files to match the song title.
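
The move-and-rename step could look roughly like this. It is a sketch under assumptions, not the code in aidmatch.py: `safe_name` and `organize` are hypothetical helpers, and both the replacement of path separators and the refusal to overwrite an existing file are choices made for this example.

```python
import shutil
from pathlib import Path


def safe_name(text):
    """Replace path separators so a name is usable as a single path component."""
    return text.replace("/", "_").replace("\\", "_").strip()


def organize(src, artist, title, root="."):
    """Move src to <root>/<artist>/<title><original extension>; return the new path."""
    src = Path(src)
    dest_dir = Path(root) / safe_name(artist)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / (safe_name(title) + src.suffix)
    if dest.exists():
        # Assumption: never overwrite; leave the file for manual cleanup instead.
        raise FileExistsError(str(dest))
    shutil.move(str(src), str(dest))
    return dest
```

For example, `organize("09 Paradise in Me.mp3", "K's Choice", "Paradise in Me")` would move the file into a `K's Choice` folder under the current directory.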

One file

Let's say you have a file called 09 Paradise in Me.mp3 (Napster, remember?). You can run the script against this one file to test it out.

Starting File structure (relevant files only):

.
├── 09 Paradise in Me.mp3
└── aidmatch.py

Command:

./aidmatch.py "09 Paradise in Me.mp3"

Output:

09 Paradise in Me.mp3 ~> "Paradise in Me" by K's Choice

Ending File structure (relevant files only):

.
├── K's Choice
│   └── Paradise in Me.mp3
└── aidmatch.py

Find all MP3s in the current directory and pass them through the script

This script does not support *, such as *.mp3. It only supports one file at a time. However, we can use find to discover all of the MP3s, then xargs to execute this script once-for-each-file.

Starting File structure (relevant files only):

.
├── 09 Paradise in Me.mp3
├── 09 The Click Five - Just the Girl.mp3
├── 10 Far Behind.mp3
├── 10 Higher.mp3
├── 11 Only God Knows Why.mp3
└── aidmatch.py

Command:

find . -maxdepth 1 -type f -name "*.mp3" -print0 | xargs -0 --no-run-if-empty -I% ./aidmatch.py "%"

Output:

./10 Far Behind.mp3 ~> "Far Behind" by Candlebox
./09 Paradise in Me.mp3 ~> "Paradise in Me" by K’s Choice
./11 Only God Knows Why.mp3 ~> "Only God Knows Why" by Kid Rock
./10 Higher.mp3 ~> "Higher" by Creed
./09 The Click Five - Just the Girl.mp3 ~> "Just the Girl" by The Click Five

Ending File structure (relevant files only):

.
├── Candlebox/
│   └── Far Behind.mp3
├── Creed/
│   └── Higher.mp3
├── K's Choice/
│   └── Paradise in Me.mp3
├── Kid Rock/
│   └── Only God Knows Why.mp3
├── The Click Five/
│   └── Just the Girl.mp3
└── aidmatch.py
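
If GNU find and xargs are not available, the same once-per-file loop can be approximated in Python. This is a sketch using only the standard library; `find_mp3s` and `run_on_each` are hypothetical names, and the only thing assumed about aidmatch.py is what is shown above: it is executable and takes one filename.

```python
import subprocess
from pathlib import Path


def find_mp3s(root=".", recursive=False):
    """Return sorted MP3 paths under root; recursive=False mirrors -maxdepth 1."""
    pattern = "**/*.mp3" if recursive else "*.mp3"
    return sorted(p for p in Path(root).glob(pattern) if p.is_file())


def run_on_each(paths, script="./aidmatch.py"):
    """Run the script once per file, like xargs -I%, and return the exit codes."""
    return [subprocess.run([script, str(path)]).returncode for path in paths]


if __name__ == "__main__":
    run_on_each(find_mp3s("."))
```

Passing each path as its own list element avoids shell quoting problems with spaces and apostrophes in filenames.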

Find all MP3s in the current directory AND all child directories, and pass them through the script

WAIT! A recursive search will also pick up any files that a previous run has already organized into artist folders. Move those out of the way first.
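
If moving files out of the way is inconvenient, a filter is one possible alternative. The sketch below rests on a loud assumption: any file sitting exactly one folder below the root (`<root>/<folder>/<file>`) is treated as already organized, which will also skip unprocessed files that happen to live at that depth. `needs_processing` is a hypothetical helper, not part of the script.

```python
from pathlib import Path


def needs_processing(path, root="."):
    """Return False for files located at <root>/<folder>/<file>."""
    relative = Path(path).resolve().relative_to(Path(root).resolve())
    # Assumption: exactly two path parts means "artist folder / song file".
    return len(relative.parts) != 2
```

A recursive file listing can then be narrowed with `[p for p in paths if needs_processing(p)]` before each file is handed to the script.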

Starting File structure (relevant files only):

.
├── sub/
│   └── subsub/
│       └── subsubsub/
│           ├── Candlebox/
│           │   └── Far Behind.mp3
│           ├── Creed/
│           │   └── Higher.mp3
│           ├── K’s Choice/
│           │   └── Paradise in Me.mp3
│           ├── Kid Rock/
│           │   └── Only God Knows Why.mp3
│           └── The Click Five/
│               └── Just the Girl.mp3
└── aidmatch.py

Command:

find . -type f -name "*.mp3" -print0 | xargs -0 --no-run-if-empty -I% ./aidmatch.py "%"

Output:

./sub/subsub/subsubsub/K’s Choice/Paradise in Me.mp3 ~> "Paradise in Me" by K’s Choice
./sub/subsub/subsubsub/Candlebox/Far Behind.mp3 ~> "Far Behind" by Candlebox
./sub/subsub/subsubsub/Kid Rock/Only God Knows Why.mp3 ~> "Only God Knows Why" by Kid Rock
./sub/subsub/subsubsub/Creed/Higher.mp3 ~> "Higher" by Creed
./sub/subsub/subsubsub/The Click Five/Just the Girl.mp3 ~> "Just the Girl" by The Click Five

Ending File structure (relevant files only):

.
├── Candlebox/
│   └── Far Behind.mp3
├── Creed/
│   └── Higher.mp3
├── K’s Choice/
│   └── Paradise in Me.mp3
├── Kid Rock/
│   └── Only God Knows Why.mp3
├── sub/
│   └── subsub/
│       └── subsubsub/
│           ├── Candlebox/
│           ├── Creed/
│           ├── K's Choice/
│           ├── Kid Rock/
│           └── The Click Five/
├── The Click Five/
│   └── Just the Girl.mp3
└── aidmatch.py

Dealing with errors or missing data

In this example, you may have a file (e.g., Munkafust - Down For Days(1).mp3) whose acoustic fingerprint doesn't match anything in the AcoustID database. The script will fall back to using the ID3 tags (exclusively). If there are also no ID3 tags, this script will give up and let you deal with it yourself.
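
That fallback order can be sketched as below. This illustrates the decision only, not the script's actual code: `resolve_metadata` is a hypothetical name, the matches are assumed to be `(artist, title)` tuples with the best one first, and the ID3 tags are assumed to arrive as a plain dict with `artist` and `title` keys.

```python
def resolve_metadata(acoustid_matches, id3_tags):
    """Return (artist, title) from AcoustID if possible, else ID3 tags, else None."""
    if acoustid_matches:
        return acoustid_matches[0]
    tags = id3_tags or {}
    artist, title = tags.get("artist"), tags.get("title")
    if artist and title:
        # No AcoustID match: rely on the ID3 tags exclusively.
        return (artist, title)
    # Nothing usable: give up and leave the file for manual cleanup.
    return None
```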

There may also be some files where Chromaprint cannot determine the acoustic fingerprint of the song at all.

Command:

./aidmatch.py "Fuel - Shimmer.mp3"

Output:

!!!!!!!!!! fingerprint could not be calculated (Fuel - Shimmer.mp3)

Maybe you can try to identify it with Shazam or SoundHound? Or (ahem) obtain a better quality version of the file?

Basic usage (Windows)

TBD. I need to boot up my Windows VM and do some testing.

License

This is a little tricky, so I'll do my best to be specific.

I support intellectual property; therefore, I choose not to license my software under "Free Software" licenses such as those from the Free Software Foundation (e.g., GPL).

Instead, I support empowering people to build the best software they can without the intellectual property restrictions required by the GPL. For this purpose, I tend to use "Open Source" licenses such as MIT, BSD, or Apache 2.0 which essentially boil down to "use this software for whatever you want; hide your source code if you want; don't be a dick" (I am not a lawyer; this is not legal advice).

The aidmatch.py file in this repo is essentially a completely rewritten sample taken from the MIT-licensed pyacoustid project. As such, the source code in this repository is MIT licensed because that's the license I choose.

Here's the wrinkle: GPLv2 has a provision that code intermingled with GPLv2 code becomes GPL code itself. However, this stops at the process boundary, so it does not cover the output of one app being passed as the input to another app (e.g., shell piping). Because Python is interpreted, and all Python code in this project runs in the same process, the code is GPLv2 while it's being run.

So, if you plan to download and run this code as-is with all of its dependencies, it's GPLv2. If you find this code on GitHub and just want to copy bits of it without necessarily running it, it's MIT.

At least, that's my intention. This stance is based on my understanding of “Implications of using GPL-licensed client-side JavaScript” and “The JavaScript Trap”, which apply (I believe) because Python code is also interpreted (although it's not explicitly pushed to users' computers like JavaScript-on-a-webpage is).
