added basic revive functionality. doesn't detect dead items automatically yet though. #20

Open
wants to merge 4 commits into
base: master

IntendedConsequence

Right now it looks in the library for tracks with the same artist/album artist, album, track title and track number, and replaces the selected item(s) with the highest-bitrate match. In short, it is the first commit to address #4. I also plan to address #8 and make both picking the best version and replacing with the library version of a track use progress bars. Later:

  • Get a list of all dead items in the current playlist/all playlists
  • Replace all dead items with best bitrate version from the library
  • Replace selected items/all items in all playlists with best bitrate and/or version from the library across all playlists
  • Keep a local database, the way foo_playcount keeps its own, with audio fingerprints for all tracks in the media library, with an option to calculate fingerprints for tracks outside the media library as well. This would allow much more accurate dead item revival, and a best-bitrate lookup that doesn't depend on precise tags. It will use the open source libchromaprint. Preliminary tests show great promise, and it is also used by MusicBrainz's AcoustID, which opens new possibilities for tagging.
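The tag-based replacement described at the top of this comment can be sketched roughly as follows. This is a minimal Python illustration, not the plugin's actual C++ code: the `Track` record, its field names and `best_replacement` are hypothetical, and the comparison is an exact match on all four tags.

```python
from typing import Iterable, NamedTuple, Optional


class Track(NamedTuple):
    # Hypothetical record; these are not foobar2000 SDK names.
    path: str
    artist: str
    album: str
    title: str
    track_number: int
    bitrate_kbps: int


def match_key(track: Track) -> tuple:
    """Tags that must all agree for two entries to count as the same track."""
    return (track.artist, track.album, track.title, track.track_number)


def best_replacement(dead: Track, library: Iterable[Track]) -> Optional[Track]:
    """Return the highest-bitrate library track with identical tags, or None."""
    wanted = match_key(dead)
    candidates = [t for t in library if match_key(t) == wanted]
    return max(candidates, key=lambda t: t.bitrate_kbps, default=None)
```

Because the key is an exact tuple comparison, any difference in tags (a compilation album, a changed track number) makes the lookup fail, which is the limitation the fingerprint idea is meant to address.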

…ist/album artist, album, track title and track number in the library and replaces selected item(s) with best bitrate
@hymerman
Owner

hymerman commented Aug 1, 2016

Hi Irwin! Thanks for this - looks like you've made a good start.

So is the idea that foobar playlists keep information about track numbers and other track metadata even if those files are missing? And that we can read that and find the equivalent track in the DB from it?

Good call on AcoustID - that's something that wasn't properly available when I started working/thinking on this but should definitely be incorporated. I think I started writing some code to match tracks on MBID but of course that's far too specific.

@IntendedConsequence
Author

Yes, foobar keeps metadata for items in its playlists, even if the items are dead. As far as I know, foo_playlist_revive relies on file size and duration primarily to revive dead items (see https://hydrogenaud.io/index.php/topic,73910.msg651473.html#msg651473). That's why it doesn't work if you converted original files to another format or got them from another source. I believe we can do better than that.

Regarding libchromaprint, there are some possible problems with it:

  • I only tested the example command line utility fpcalc.exe, not the actual DLL itself, although I compiled both of them successfully. fpcalc.exe depends on ffmpeg, which means you can throw any format at it that is supported by the ffmpeg version you linked it against. libchromaprint, on the other hand, requires raw audio data to compute the fingerprint, so we're gonna feed it decoded audio data from foobar itself. The example decode.cpp in the foobar2000 SDK shows how to do that.
  • We should find out what maximum duration AcoustID uses to be compatible, just in case.
  • I couldn't compile libchromaprint on Windows. I only managed to cross-compile it on Linux using mingw32 to target Windows, which added dependencies on libgcc_s_sjlj-1.dll and libstdc++-6.dll. These would have to be shipped with foo_bestversion if it's linked against libchromaprint (along with libchromaprint.dll itself). I haven't figured out a way to get rid of these dependencies yet.
  • Fingerprinting an entire music library, which in my case is about 12000 files, is a long process even on my i5 6600k, and that was with Python's multiprocessing module fingerprinting in 4 processes at the same time. So not only should we do the fingerprinting without locking the UI, we should also do it in multiple threads for speed. After the bulk of the work is done, we can monitor the library (and possibly playlists) for changes and fingerprint new additions in the background.
  • Reviving dead items using audio fingerprints will only work if the dead items were previously fingerprinted. That means we should keep a local db of audio fingerprints, the way foo_playcount keeps its own database. I'm currently looking at https://unqlite.org/intro.html to do that, since I don't want to bother with SQL queries for SQLite when we're probably going to store only a very limited subset of metadata for each fingerprint (file hash, duration, fingerprint, maybe artist-title-album). If you have experience with this or a better suggestion, I'm open to it.
  • Matching a single fingerprint against thousands of existing fingerprints is also a time-consuming task. I converted the function match_fingerprints2 from https://bitbucket.org/acoustid/acoustid-server/src/cb303c2a3588ff055b7669cf6f1711a224ab9183/postgresql/acoustid_compare.c?at=master&fileviewer=file-view-default to work on its own (it comes as a PostgreSQL extension there), and we can put it into foo_bestversion without problems, but that might not be enough. We might need a better algorithm, using trees or something, to make fingerprint matching faster, which is something I'm also not very experienced with. Otherwise it might take minutes or even dozens of minutes to match the fingerprints of 20 dead items against the whole library. We could potentially limit fingerprint matching based on artist/album artist, but that won't work well on files with incorrect/missing tags.
  • I think it's also a good idea to run a function that replaces the most commonly used accented letters, like ä, ü, ö etc., with plain Latin letters, for both best version matching and the other track replacer functions, since not all tags are always correct in that regard.
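To make the matching cost in the list above concrete, here is a rough Python sketch of a brute-force fingerprint lookup. It assumes raw Chromaprint fingerprints, i.e. sequences of 32-bit integers, and compares them bit by bit with no alignment search, so it is a simplification and not a port of match_fingerprints2. The function names and the 0.9 threshold are made up for illustration.

```python
from typing import Iterable, Optional, Sequence, Tuple


def similarity(fp_a: Sequence[int], fp_b: Sequence[int]) -> float:
    """Fraction of equal bits over the overlapping part of two fingerprints.

    Assumes both fingerprints start at the same point in the audio; leading
    silence would need an offset search, which this sketch does not do.
    """
    n = min(len(fp_a), len(fp_b))
    if n == 0:
        return 0.0
    # XOR marks differing bits; the mask keeps the value within 32 bits.
    differing = sum(bin((a ^ b) & 0xFFFFFFFF).count("1")
                    for a, b in zip(fp_a, fp_b))
    return 1.0 - differing / (32.0 * n)


def best_match(dead_fp: Sequence[int],
               library: Iterable[Tuple[str, Sequence[int]]],
               threshold: float = 0.9) -> Optional[Tuple[str, float]]:
    """Linear scan over (path, fingerprint) pairs; returns the best hit or None."""
    best: Optional[Tuple[str, float]] = None
    for path, fp in library:
        score = similarity(dead_fp, fp)
        if score >= threshold and (best is None or score > best[1]):
            best = (path, score)
    return best
```

Each dead item costs one pass over the whole library, which is why pre-filtering candidates (for example by artist, as suggested above) or a smarter index would matter for large collections.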

As you can see, there is a lot of interesting work ahead of us, and we have the potential to make a really useful foobar2000 plugin if we implement at least half of the suggestions above =)

@hymerman
Owner

hymerman commented Aug 2, 2016

  • Cool, if it's keeping metadata that should make this quite easy :)
  • I think the task of fingerprinting is outside the scope of this plugin - I personally pass everything through MusicBrainz Picard before including it in my foobar library, so I would just rely on the existing tags to match tracks on AcoustID. There are plenty of other ways of fingerprinting. If fingerprinting from within foobar would be useful, that should be a separate plugin. Unless I'm misunderstanding and there's actually a good reason that the fingerprinting needs to be done as part of the revival/matching process? What do you think?
  • Do you know if foobar keeps all metadata from dead playlist items? If so then no local cache would be needed - we'd just scan the library for tracks matching the fingerprints in the dead items.
  • Multithreading is definitely useful - but let's tackle that as a separate issue to keep the scope of revival down. Issue #8 ("Add progress bar when picking best version of many tracks") is for that, as you've mentioned.
  • Same thing with accents etc - that's a really useful feature and I'm surprised I don't already have an issue open for it :) Could you open a new issue and we'll track it separately?
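The accent substitution discussed in this thread could look something like the following Python sketch. It relies on Unicode decomposition from the standard library rather than a hand-written substitution table, which is an assumption about the approach; `fold_accents` is a hypothetical name.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Strip combining marks so that e.g. 'ä', 'ü', 'ö' become 'a', 'u', 'o'."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Applying `fold_accents` to both sides before comparing tags would let "Motörhead" and "Motorhead" match. Letters such as ø or ß have no such decomposition and pass through unchanged, so a small explicit table might still be needed for them.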

@IntendedConsequence
Author

  • Well, I intend to use fingerprinting mostly for dead item revival, just as a way to identify that a specific track exists in the library, to complement tag and other metadata matching. Say you deleted a track from a studio album, but you also have a compilation album with the same track. It obviously has a different album and a different track number, and maybe even has VA in its artist tag. With fingerprinting, the plugin can determine automatically that there is a track in this compilation album that is a 98% match to the dead item's fingerprint, even if the duration is a bit different (2-3 seconds of leading/trailing silence etc.) and the file format and quality are also different. If you don't know how audio fingerprinting and matching work, you can think of it as very similar to reverse image search like tineye.com: an audio fingerprint is a special hash that doesn't change completely when one bit changes, as is the case with conventional hashes. And like conventional hashes, it has to be computed ahead of time. Tagging files using these fingerprints isn't in the scope of this plugin, I agree, though it wouldn't hurt to be compatible with the default AcoustID format. As for the dead item revival process, the fingerprints of the dead items should already be computed. I mean, you can't hash a non-existing file, right? =) When finding the best version of a track, fingerprint matching is also used just to confirm that the candidates for replacement are indeed the same version of the track and sound the same. Think of it as an extra way to be sure you won't replace a track with a slightly different sounding one from a remastered album, or with a re-recorded version that could otherwise be matched based on tags alone. Also, don't forget that there are a lot of users with a lot of different interests in music that doesn't even exist in the MusicBrainz database, so you can't rely on Picard for that.
  • It seems that foobar does keep all of the metadata in its cache, except maybe for lyrics or other long fields. There is a file LargeFieldsConfig.txt in the root of the foobar2000 directory which specifies the maximum length for such fields, as well as fields that it ignores completely (doesn't cache). The problem is that I don't want to change the tags of files that exist outside of the music library, which is why there is a need for a local db. And as far as I know you can't add metadata to the foobar2000 database if the file isn't in the music library.
  • Ok, I'll open a new issue.

I started playing with audio fingerprinting to solve a specific problem of mine. I have a lot of music in a specific format, and I want to convert that music to another format. The problem is that I have 300+ playlists that I don't want to recreate manually after the conversion. The existing foo_playlist_revive doesn't work if the tracks are in different formats, and it can only operate on a single playlist at a time. So I decided to write my own reviving plugin that doesn't need to rely on tags to do its work. I first searched for an existing open source plugin that does something similar, so I could just modify it to do what I need, and I found foo_bestversion. I noticed that dead item revival is on your planned features list, so I thought that my changes could be useful to users of your plugin. If you don't want to include fingerprinting, that is also fine: I will just continue working on it on my own, and you can cherry-pick the changes that you deem useful. I just figured that if this functionality is useful to me, it could also be useful to others. Having different forks of foo_bestversion, one with fingerprinting and one without, would be messy and confusing; even if I rename the resulting plugin, it still kind of does the same thing, right?
And with regard to being compatible with AcoustID fingerprints, you never know how it might turn out. Maybe it will motivate the creator of foo_musicbrainz to implement AcoustID matching, and in that case you could tag your files from within foobar2000 without the need for Picard =)

@hymerman
Owner

hymerman commented Aug 3, 2016

Ok, I think we're actually on almost the same page! I'll be happy to incorporate relevant changes you make (I don't have any time to work on this myself but would merge requests and put out new builds).

I think you can get what you want (and add useful meaningful functionality for everyone else) with these steps:

  • Leave the process of fingerprinting and tagging files with fingerprints to another plugin or program (IIRC there was a tagger provided by the AcoustID project to help populate its DB before it was used by MusicBrainz?)
  • Add matching by AcoustID to foo_bestversion as an optional matching step, if files are tagged with fingerprints already (this is awesome!)
  • Add 'revive dead items from playlist' and 'revive dead items from all playlists' as menu/context commands
  • Implement issue #8 ("Add progress bar when picking best version of many tracks"), since the fingerprint matching will be slower, as you mentioned

The reason I say we're 'almost' on the same page is that I'm afraid I still don't understand why a cached database of fingerprints is needed - it sounds like you're saying you have playlists with files that aren't in foobar's library so need to look up their fingerprints in a cache? Is that right? If that's the case I don't see why you're using a foobar plugin for this task.

@IntendedConsequence
Author

IntendedConsequence commented Aug 7, 2016

Well, I already started working on the dead item detection across all playlists and so far it looks good. Now, about that database of fingerprints. First of all, even for the music that is in the media library it's generally a good idea to give users the option of not writing the fingerprints into tags. A good example of this is again foo_playcount - there is an option to write (and synchronize) playback statistics with file tags but it is optional.
In short, what I'm trying to accomplish is to have a way to access a track's fingerprint even if the track is dead AND it doesn't have its fingerprint written to its tag. The way foobar caches file tags has little to do with this. If the dead item had the fingerprint in its tag, great, we might not have to look it up in the database. Unfortunately, if the fingerprint is longer than the maximum tag length specified in foobar's LargeFieldsConfig.txt, foobar is not going to cache it, so in this case we will also have to look it up in the local db. And we can't expect every user to edit their LargeFieldsConfig.txt file, it just isn't practical. Also, what if the user doesn't want to change file tags at all? We can't look up the fingerprint in the file's tag cache, because in this case the fingerprint was never written to the file's tag in the first place. If we don't have a local database of fingerprints, we obviously wouldn't be able to do any matching for dead items. So, the way I see it, the fingerprinter plugin (if it's going to be a separate plugin) computes the audio hash and stores it in the local database (with an option to write it to tags). When we have to do matching, we first check whether the fingerprint exists in the file tag; if not, we look it up in the local database; if it isn't there either (and the item is dead), we shrug, skip fingerprint matching for that item and fall back to artist, title and other metadata matching.
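The lookup order described above (cached tag first, then local database, then metadata fallback) can be sketched in a few lines of Python. Everything here is illustrative: plain dicts stand in for foobar's tag cache and for the local database, and the tag name, the file key and the function names are assumptions.

```python
from typing import Mapping, Optional

# Assumed tag name; the thread does not fix one.
FINGERPRINT_TAG = "acoustid_fingerprint"


def find_fingerprint(cached_tags: Mapping[str, str],
                     file_key: str,
                     local_db: Mapping[str, str]) -> Optional[str]:
    """Return a stored fingerprint for a (possibly dead) item, or None."""
    from_tag = cached_tags.get(FINGERPRINT_TAG)
    if from_tag:
        # The tag was written and was short enough for foobar to cache it.
        return from_tag
    # Otherwise fall back to the local database, keyed e.g. by file hash.
    return local_db.get(file_key)


def matching_strategy(cached_tags: Mapping[str, str],
                      file_key: str,
                      local_db: Mapping[str, str]) -> str:
    """Pick fingerprint matching when possible, else plain metadata matching."""
    if find_fingerprint(cached_tags, file_key, local_db) is not None:
        return "fingerprint"
    return "metadata"
```

Under these assumptions, a dead item with neither a cached tag nor a database entry simply ends up on the metadata path, which is the "shrug and fall back" case.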

The only potential problem I see now with having the fingerprinting functionality in a separate plugin is when we want to fingerprint some selected tracks from foo_bestversion during the revival/best version selection process. That kind of DLL interoperability might get too complex, and to be honest I don't know how to do it.

By the way, the current way of title matching for finding the best version has a problem. If the title has a '[' or '(', it just drops whatever comes after. Since there are tracks with (instrumental) or (remix by) or (feat. artist) in the title, it can make the wrong choice when looking for the best version. That's why, for replacement with the library version of a track, I made it match the title exactly, and I think the best version lookup should be changed as well. In the future, it would be cool to add an options dialog where you can choose whether to ignore parentheses in titles, ignore case, accented letters etc. If you've heard of Levenshtein distance (fuzzy matching), that could also be added to the options, so you can specify how closely titles (and maybe other metadata) should be matched, and integrated into the rating function.
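A rough Python sketch of how such options might fit together is below. `strip_bracketed` imitates the truncation behaviour described above, `levenshtein` is the textbook dynamic-programming edit distance, and `title_similarity` with its two flags is a hypothetical shape for the configurable comparison, not existing plugin code.

```python
import re


def strip_bracketed(title: str) -> str:
    """Drop everything from the first '(' or '[' onwards."""
    return re.split(r"[\(\[]", title, maxsplit=1)[0].strip()


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def title_similarity(a: str, b: str,
                     ignore_brackets: bool = False,
                     ignore_case: bool = True) -> float:
    """Score in [0, 1]; 1.0 means identical after the chosen normalisation."""
    if ignore_brackets:
        a, b = strip_bracketed(a), strip_bracketed(b)
    if ignore_case:
        a, b = a.casefold(), b.casefold()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

With `ignore_brackets=True`, "Song (Instrumental)" and "Song (feat. X)" score 1.0, which is exactly the wrong-choice case; with the flag off they score lower, and a user-chosen threshold could decide whether that is close enough.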

Did I clear things up? =)


2 participants