Skip to content

Retrieve the release date and record label of a set of music tracks with high precision & accuracy. 20,000 guessed so far!

License

Notifications You must be signed in to change notification settings

n42r/guestrrday

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

79 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

guestrrday

because for music heads, context matters!

A CLI tool that does one thing and does it well: guess the release date and record label of a set of music tracks with high accuracy and precision. 20,000+ songs guessed so far!

Description

The pop music historian's dream: given a directory of music tracks or a list of one or more song names (in a textfile or CLI args), guess the year of release and the record label of each track.

Example:

Ike & Tina Turner - A Love Like Yours (Don't Come Knocking Everyday)

=>

Ike & Tina Turner - A Love Like Yours (Don't Come Knocking Everyday) (London American, 1966)

The script queries discogs.com and includes some tweaks to increase the precision and accuracy of the search (see Why This Tool? below).

The tool will rename files if the input is a directory, and will create a new textfile in the case of file, and will print to stdout in case of cli args.

Installation

  • guesterrday can be installed by running pip install guestrrday.
  • To update guesterrday run pip install --upgrade guestrrday.

On some systems you might have to change pip to pip3.

Usage

NOTE: You need to have a free discogs user token to use the tool. This is how to do it:

  1. Create a free account on discogs.com
  2. Visit the developer tools section and create a new token
  3. Set the DISCOGS_TOKEN environmental variable on your system with that value. Alternatively, the tool will prompt you for it on the first run and then store in an environmental variable.

General Usage:

guestrrday --input SOURCE

Where SOURCE can be any of:

  • Directory containing music files, or
  • Filename for a text file containing a list song names, one per line, or
  • "[song1] [, song2 ] [...]": Comma-separated list of song names

The tool will automatically detect which one you mean.

You can run guestrrday as a package if running it as a script doesn't work:

python -m guestrrday --input SOURCE

Why This Tool?

TLDR; The devil in the detail

Guestrrday scans a list of song titles and queries discogs.com for the year and label of each. This is clearly a simple function, right? Why a whole tool?

I made this tool because I had two strict requirements: high prediction rate and accuracy at scale. So what is the current prediction rate and accuracy you ask? Well, there are three variables affecting prediction rate and accuracy: (1) the completeness of the discogs.com database, (2) unsanitized tracknames / filenames, and (3) the limitations of the discogs search engine (ex., sensitivity to slight changes in search terms). We can't control (1) but we can control (2) & (3), and this is what this tool focuses on1. The completeness of the discogs DB and any music DB really varies based on the music: for example, data on 90s electronic music singles is much more complete than pre-war blues releases (1930s, 40s).

To throw rough estimates from experience, I would say the average completness of the discogs DB with regards to the year of release is around 85-90%. 95% of those are typically detected by this tool, which gives it a 95% prediction rate.

However, it must be noted, there are many singles on discogs which have no original year of release, in that case the tool will return the year of release of the earliest available reissue. For example, a disco single released in 1970s but for which no year is available for the original release, but a reissue exists (say, one released in 2015), the detected year for that track would be 2015 (and the label would be the reissue label).

Contribution

Pull requests welcome, just create an issue.

License

This project is Licensed under the MIT License.

Footnotes

  1. I tried generic fuzzy matching (thefuzz) but couldn't seem to get as good of an accuracy in search results: I got more false positives and couldn't find a sweetspot in the numerical score to optimize false negatives. Ofcourse I could have run a large test sets and optimized for that value and fuzzy matching algorithm, but for such a small project it was not worth it.

About

Retrieve the release date and record label of a set of music tracks with high precision & accuracy. 20,000 guessed so far!

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages