Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support using MusicBrainz IDs instead of other metadata #1363

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

vs49688
Copy link
Contributor

@vs49688 vs49688 commented Sep 23, 2021

Converts the library to use MusicBrainz IDs for albums, artists,
albumartists, and tracks. All playlists, stars, etc. are kept.

This is an irreversible transformation; the user is given a warning
and may cancel within 10 seconds by pressing ^C.

The mediafiles table is used as the sole source-of-truth to build
an entirely new hierarchy, based off the MusicBrainz IDs.

Existing entity metadata is applied to the new entities, except in the
case where one entity may need to be split (e.g. different releases of
the same album). These split entities are left blank.

References are then updated for both play queues and playlists.

The existing entities are then deleted, and a full-rescan is triggered
to fill in any missing information.

Closes #489.

@certuna
Copy link
Contributor

certuna commented Sep 24, 2021

I think it's a good idea in principle, just some remarks:

  • this will split albums where some tracks have different or no mbz_album_id, which may catch more than a few users by surprise, in particular those who group "singletons" together into one pseudo-compilation album (e.g. "Loose Tracks"). The alternative is using Grouping or Release Type for grouping these together (and excluding them from the album list), but unfortunately those features aren't implemented yet so you'd leave those users with no effective way to manage singletons
  • a good thing is this will allow splitting artists with the same name

@vs49688
Copy link
Contributor Author

vs49688 commented Sep 24, 2021

I think it's a good idea in principle, just some remarks:

* this will split albums where some tracks have different or no `mbz_album_id`, which may catch more than a few users by surprise, in particular those who group "singletons" together into one pseudo-compilation album (e.g. "Loose Tracks"). The alternative is using Grouping or Release Type for grouping these together (and excluding them from the album list), but unfortunately those features aren't implemented yet so you'd leave those users with no effective way to manage singletons

In this case, I'm of the mind to tell the user to strip the MusicBrainz tags from those files, retaining the current behaviour. By moving them into a "Loose Tracks" album, you're effectively creating a "new" album. If this "new" album isn't on MusicBrainz, it shouldn't have MusicBrainz tags. Also, isn't this what playlists are for?
Of course, this might be a unpopular opinion, but I think it's the more "correct" one.

Another option would be to add a "navidrome_mbignore" tag, which if set would tell Navidrome to ignore any MusicBrainz tags it finds when scanning.

* a good thing is this will allow splitting artists with the same name

Indeed, I never considered this may need to be done with artists until after I'd pushed the commit.

@vs49688
Copy link
Contributor Author

vs49688 commented Sep 25, 2021

Pushed new version that should handle track/album/artist/albumartist IDs.

@vs49688 vs49688 changed the title [DRAFT] scanner: use musicbrainz release id when generating album id scanner/mapping: take MusicBrainz tags into account when generating IDs Sep 25, 2021
@certuna
Copy link
Contributor

certuna commented Sep 25, 2021

One issue will be: what does Navidrome do with artists that have the same mbid but different tags in the Artist field, such as "DJ Tiesto" and "Tiesto"?

It's not so easy, you might get some strange behaviour, consider this case:

  • DJ Tiesto - Album 1 (with mbid's)
  • Tiesto - Album 2 (with mbid's)
  • DJ Tiesto - Album 3 (without mbid's)
  • Tiesto - Album 4 (without mbid's)

Current behaviour: album 1 and 3 grouped under DJ Tiesto, album 2 and 4 grouped under Tiesto
What will the new behaviour be?

@vs49688
Copy link
Contributor Author

vs49688 commented Sep 25, 2021

Regardless of my changes, albums 3 & 4 should be under different artists (or the artists fixed).
As for 1 & 2, perhaps I should change it to only use the mbid if it exists?

@vs49688
Copy link
Contributor Author

vs49688 commented Sep 25, 2021

Actually, thinking about it some more... What do we want the behaviour to be?
Do we want Navidrome to merge artists/albums/whatever based on mbid? That'd be a pretty drastic change in behaviour, but not an unreasonable one.

All I intended was to make it not merge different releases of the same album. If the tags in the files are wrong (thus causing Navidrome to split the artists/albums), then they should be fixed.

I'm fine with either option (it doesn't affect me as I run everything through Picard and beets), @deluan what do you think?

@certuna
Copy link
Contributor

certuna commented Sep 26, 2021

The more I think of it, it's indeed a drastic change.

Navidrome currently does all organization based on official tag fields: Artist, Album Artist, Album Title. It's not perfect (no disambiguation possible between artists or albums with the same name), but at least it's pretty transparent why stuff happens.

Moving to organization based on custom Musicbrainz ID tags is good if you consistently use those, but that puts the onus on users to have not only an immaculately tagged collection with 'regular' tags, but also consistently have the mbz tags correct, because you'll create splits between stuff with and without mbz id's. And this decreases transparency to the user why this happens, and we'll end up like Plex where the scanner feels like a black box where "anything can happen".

For example:

  • currently a user can move an album to another artist just by changing the Artist tag, that's very logical and simple. But the album still retains the old artist_mbid tag, what will Navidrome do?
  • currently a user can move a track to another album just by changing the Album Title tag. Logical, simple, and that's what pretty much every music player (foobar, Apple Music, MusicBee, etc) allows you to do. Grouping by release_mbid breaks that.

@vs49688
Copy link
Contributor Author

vs49688 commented Sep 26, 2021

The more I think of it, it's indeed a drastic change.

Navidrome currently does all organization based on official tag fields: Artist, Album Artist, Album Title. It's not perfect (no disambiguation possible between artists or albums with the same name), but at least it's pretty transparent why stuff happens.

Moving to organization based on custom Musicbrainz ID tags is good if you consistently use those, but that puts the onus on users to have not only an immaculately tagged collection with 'regular' tags, but also consistently have the mbz tags correct, because you'll create splits between stuff with and without mbz id's. And this decreases transparency to the user why this happens, and we'll end up like Plex where the scanner feels like a black box where "anything can happen".

As long as the scanning behaviour is explicitly-and-well-documented, I don't really see the problem.

For example:

* currently a user can move an album to another artist just by changing the `Artist` tag, that's very logical and simple. But the album still retains the old `artist_mbid` tag, what will Navidrome do?

* currently a user can move a track to another album just by changing the `Album Title` tag. Logical, simple, and that's what pretty much every music player (foobar, Apple Music, MusicBee, etc) allows you to do. Grouping by `release_mbid` breaks that.

How about this?

Add a ScanUseMbzID option in the configuration file, unset by default.
When unset, the behaviour is unchanged, keeping it as it is now.

When set, if an album/artist has a mbid, Navidrome will use it (and only it). This provides a solution to #489, assuming everything's tagged correctly. If the mbid is NOT set, it'll fall back and create a new/duplicate artist/album. It is up to the user to tag the files correctly and trigger a rescan.

Forgive me for being pushy on this, but I need a fix for the #489 - it's affecting large part of my library.

@deluan
Copy link
Member

deluan commented Sep 26, 2021

Hey, sorry being late for the party :)

After reading the thread, these are my two cents (mostly agreeing with latest @vs49688 comment):

  1. The new behaviour (using mbids) has to be configurable, and turned off by default
  2. When the feature is enabled, if no mbid is found, fallback to the current behaviour.
  3. I think it is useful only for albums, albumartists and artists, but not for tracks.

On top of that, a migration plan is a MUST, or else users that enable this will break their playlists, stars, playcounts, etc... The migration should go over all tables with an album_id or artist_id and recalculates a new id. Foreign keys must be observed. To make things even trickier, this will have to happen only when the config option is first enabled, and cannot be disabled afterwards (or it would mess with the DB), making the implementation tricky (similar to what I had to do for PasswordEncryptionKey)

@vs49688
Copy link
Contributor Author

vs49688 commented Sep 27, 2021

How'd you prefer the migrations to be done?

  1. Duplicate entries, except with the mbid as the id, fix up references, delete old, vacuum
  2. Disable foreign keys, update in-place

@deluan
Copy link
Member

deluan commented Sep 27, 2021

How'd you prefer the migrations to be done?

  1. Duplicate entries, except with the mbid as the id, fix up references, delete old, vacuum
  2. Disable foreign keys, update in-place

Whichever is faster. I'd assume it is the later, but it is worth to test it out if you can.

@vs49688 vs49688 force-pushed the adedup branch 9 times, most recently from 5298640 to 867aaf4 Compare September 29, 2021 02:39
@vs49688
Copy link
Contributor Author

vs49688 commented Sep 29, 2021

It's mostly finished, all that needs to be done now is some cleanups, and to trigger a full rescan.

Also, where's the best place for this? I've just put it in the top-level cmd package to start with, but it should probably go elsewhere. I'd like to put it in db/migrations (specifically so I can use forceFullRescan), but can't because of circular imports.

@vs49688 vs49688 marked this pull request as ready for review October 2, 2021 15:08
@deluan deluan mentioned this pull request Oct 6, 2021
@vs49688
Copy link
Contributor Author

vs49688 commented Oct 6, 2021

Ping on this?

Specific issues I'd like feedback on:

  • It's in the root cmd package for now, where should it actually go?
  • Do the DeleteMany functions I've added to the {artist,album}Repository interfaces have security implications I'm not aware of - i.e. does it automatically create a REST route for it (or something, I haven't really looked at the code for this).

The current behaviour:

  • If "Scanner.UseMbzIDs" is enabled:
    • Converts library to use MusicBrainz IDs for albums, artists, and albumartists
    • All playlists, stars, etc. are kept.
  • This is an irreversible transformation; adequate warning is given.
  • If set accidentally, the user may cancel the migration within 10 seconds by pressing ^C.
  • Once enabled, it will throw an error if attempted to disable.
  • A full rescan is required to finish the process - this must be triggered manually.

@deluan
Copy link
Member

deluan commented Oct 7, 2021

Sorry for the delay. Will take a look tonight

Copy link

Download the artifacts for this pull request:

@arg274
Copy link
Contributor

arg274 commented Jan 4, 2024

Still using the builds from this PR. If I'm not mistaken, if there's an artist with the same ID but multiple aliases, the name of the merged entity is set to whichever alias is scanned first (?)
I think a minor improvement would be to take the most recent one (based on the most recent release) or the mode of all the aliases.

@vs49688
Copy link
Contributor Author

vs49688 commented Jan 4, 2024

Still using the builds from this PR.

Good to know, I thought I was the only one.

If I'm not mistaken, if there's an artist with the same ID but multiple aliases, the name of the merged entity is set to whichever alias is scanned first (?) I think a minor improvement would be to take the most recent one (based on the most recent release) or the mode of all the aliases.

The approach this PR takes has been rejected (apparently not everyone is as Musicbrainz-only as I am 🤷‍♂️), and as such is in maintenance mode.

I'll keep updating it as necessary, but I'm only keeping it around until the equivalent functionality is implemented in Navidrome proper and hits a release.

Then I'll probably write a migration tool, and this PR can finally be laid to rest.

@vs49688 vs49688 force-pushed the adedup branch 2 times, most recently from a5420f6 to 865841f Compare January 26, 2024 05:26
Copy link

Download the artifacts for this pull request:

Copy link

Download the artifacts for this pull request:

Copy link

Download the artifacts for this pull request:

@danni-storm
Copy link

I'll keep updating it as necessary, but I'm only keeping it around until the equivalent functionality is implemented in Navidrome proper and hits a release.

Is there a chance of that happening sometime soon? Is the team planning on merging any of this code or implementing their own version? I see a lot of commits on your part to keep the PR in sync with the main releases but don't see any activity from the Navidrome team here for the last year or two. I know its foss and they have their hands full - more curious than anything as I think this would be an awesome feature since I have many albums with multiple versions.

Copy link

Download the artifacts for this pull request:

@deluan
Copy link
Member

deluan commented Apr 2, 2024

implementing their own version?

Yes, this has been discused in #1976 (comment) and will be implemented in #2709

For now I want to thank @vs49688 for keeping this up-to-date, as it obviously has value for a lot of other users.

@danni-storm
Copy link

Yes, this has been discused in #1976 (comment) and will be implemented in #2709

Ah somehow I missed that when reading over the comments. I apologize for posting such a silly question (the answer was right in front of me) and thank you for you patience 😃.

For now I want to thank @vs49688 for keeping this up-to-date, as it obviously has value for a lot of other users.

Yes, thank you so much, this is invaluable!

@vs49688
Copy link
Contributor Author

vs49688 commented May 16, 2024

For those relying on this, sorry for the delay, I've not had much time recently - I'll try to rebase this by the weekend

vs49688 added 10 commits May 16, 2024 23:39
Not changing Delete() as to avoid changing rest.Persistable.
Not MD5'ing them because they're UUIDs and can be
eyeball'd easily in the database.
Converts the library to use MusicBrainz IDs for albums, artists,
albumartists, and tracks. All playlists, stars, etc. are kept.

This is an irreversible transformation; the user is given a warning
and may cancel within 10 seconds by pressing ^C.

The mediafiles table is used as the sole source-of-truth to build
an entirely new hierarchy, based off the MusicBrainz IDs.

Existing entity metadata is applied to the new entities, except in the
case where one entity may need to be split (e.g. different releases of
the same album). These split entities are left blank.

References are then updated for both play queues and playlists.

The existing entities are then deleted, and a full-rescan is triggered
to fill in any missing information.
Copy link

Download the artifacts for this pull request:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support multiple releases of the same album
6 participants