New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add playlist field to smart playlists - #1417 #1884
Conversation
Unfortunately, it works only in the "positive" direction. Adding two (or more) lists together works.
I'm looking at the sql query again before un-drafting this PR. |
7ecf9e9
to
d5f970b
Compare
To support all kinds of smart filtering, (I believe) there is no way around a subquery. This change introduces two new operators:
To combine playlists:
The code could be extended to support more fields from the This is the first time I've ever used Go. Please tell me if something is terribly wrong! |
d5f970b
to
578116e
Compare
Hey, thanks for taking a look at this. I'm not sure if a subquery will be fast enough though.. Can you please rebase this so I can test it with my own library? Thanks! |
578116e
to
292079a
Compare
I was also thinking about the performance when adding this. Using another I don't know if sqlite is smart enough to not do a string comparison for every row. If it is not, it would be better to precache the playlist IDs and get rid of the subQuery := squirrel.Select("pl.media_file_id").
From("playlist_tracks pl").
Where(squirrel.Eq{"pl.playlist_id": name_to_id_cache[playlistname]})
# ...
return "IN (" + subQText + ")", subQArgs, nil But nonetheless, I'm curious to see how fast the subqueries actually are with large databases. |
model/criteria/operators.go
Outdated
subQuery := squirrel.Select("1"). | ||
From("playlist_tracks pl"). | ||
LeftJoin("playlist on pl.playlist_id = playlist.id"). | ||
Where(squirrel.And{squirrel.Expr("pl.media_file_id = media_file.id"), squirrel.Eq{"playlist.name": playlistname}}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative would be
media_file.id IN (
SELECT media_file_id
FROM playlist_tracks
LEFT JOIN playlist
ON playlist_tracks.playlist_id = playlist.id
WHERE playlist.name = "<name>"
)
I'm not an SQL expert so I can't tell definitively whether it would be better, but my gut feeling is that it should be easier to optimize for the database.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding on here: This is significantly faster. Here's what I used to replicate your results
// Subquery to check if media_file.id matches with the playlist name in playlist_tracks table
subQuery := squirrel.Select("media_file_id").
From("playlist_tracks pl").
LeftJoin("playlist on pl.playlist_id = playlist.id").
Where(squirrel.Eq{"playlist.name": playlistname})
subQText, subQArgs, err := subQuery.PlaceholderFormat(squirrel.Question).ToSql()
if err != nil {
return "", nil, err
}
if negate {
return "media_file.id NOT IN (" + subQText + ")", subQArgs, nil
} else {
return "media_file.id IN (" + subQText + ")", subQArgs, nil
}
Performance:
OLD: 947.9 ms
PM GO.1 | TRAC[0216] SQL: `INSERT INTO playlist_tracks (id,playlist_id,media_file_id) SELECT row_number() over (order by random()) as id, '...' as playlist_id, media_file.id as media_file_id FROM media_file LEFT JOIN annotation on (annotation.item_id = media_file.id AND annotation.item_type = 'media_file' AND annotation.user_id = '...') LEFT JOIN media_file_genres ag on media_file.id = ag.media_file_id LEFT JOIN genre on ag.genre_i
8:58:23 PM GO.1 | > d = genre.id WHERE (EXISTS(SELECT 1 FROM playlist_tracks pl LEFT JOIN playlist on pl.playlist_id = playlist.id WHERE (pl.media_file_id = media_file.id AND playlist.name = ?))) GROUP BY media_file.id ORDER BY random()` args="[[...],<nil>]" elapsedTime=947.9ms requestId=ok/0AOnerGlUd-000103 rowsAffected=4282
NEW: 34 ms
PM GO.1 | TRAC[0004] SQL: `INSERT INTO playlist_tracks (id,playlist_id,media_file_id) SELECT row_number() over (order by random()) as id, '...' as playlist_id, media_file.id as media_file_id FROM media_file LEFT JOIN annotation on (annotation.item_id = media_file.id AND annotation.item_type = 'media_file' AND annotation.user_id = '...') LEFT JOIN media_file_genres ag on media_file.id = ag.media_file_id LEFT JOIN genre on ag.genre_i
8:59:14 PM GO.1 | > d = genre.id WHERE (media_file.id IN (SELECT media_file_id FROM playlist_tracks pl LEFT JOIN playlist on pl.playlist_id = playlist.id WHERE playlist.name = ?)) GROUP BY media_file.id ORDER BY random()` args="[[...],<nil>]" elapsedTime=33.4ms rowsAffected=4282
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Playlist JSON used for testing:
{
"all": [
{"inPlaylist": {"name": "..." }}
],
"sort": "random"
}
I compared results of EXPLAIN SELECT id
FROM media_file
WHERE id IN (
SELECT media_file_id
FROM playlist_tracks
LEFT JOIN playlist
ON playlist_tracks.playlist_id = playlist.id
WHERE playlist.name = '<name>'
); and EXPLAIN SELECT id
FROM media_file
WHERE EXISTS (
SELECT 1
FROM playlist_tracks
LEFT JOIN playlist
ON playlist_tracks.playlist_id = playlist.id
WHERE playlist.name = '<name>'
AND playlist_tracks.media_file_id = media_file.id
); The former (my proposed one) gives:
and the latter (this PR) gives
I'm not very familiar with SQLite bytecode, but my naive interpretation is that the former runs the subquery once to construct an ephemeral table containing all |
model/criteria/operators.go
Outdated
subQuery := squirrel.Select("1"). | ||
From("playlist_tracks pl"). | ||
LeftJoin("playlist on pl.playlist_id = playlist.id"). | ||
Where(squirrel.And{squirrel.Expr("pl.media_file_id = media_file.id"), squirrel.Eq{"playlist.name": playlistname}}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also maybe we need to check the owner id of the playlist?
On the one hand, a user shouldn't be able to hijack tracks into another user's smart playlist just by giving their playlist a conflicting name. On the other hand, a user should also not be able to snoop on other users' playlists by creating a smart playlist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do note that there is no constraint in the names being unique, so you should definitely check for the user ownership.
It is possible for one user to have multiple playlists of the same name as well, so you should give the option of providing the ID instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the current interface of adding smart playlist, there is no way that user can specify the id of a playlist, but on the other hand, in the future when we have UI to add smart playlist, it's unlikely we want to support using name matching. So maybe in this case we want to translate name to id when importing, and only persist the id-form in the database?
But the same issue probably applies to most of other criteria, so maybe it's not a big issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That first point isn't entirely true. You can definitely reference the ID of an existing playlist, you would just need to add that SQL in the rule, so you could have something like
{
"all": [
{"inPlaylist": {"id": "guid" }}
],
"sort": "random"
}
Of course, this could run into the problem if you're referencing other playlists, and suppose you migrate to a new system but /shrug/. You could still support both.
Also I guess it is worth point out that smart playlists (currently) are created for the first admin user, so idk how permissions would really shake out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But how can a user be expected to know what the playlist id is? I don't think everyone knows how to do SQL 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case I think we should only support id.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is one benefit I can think of for also supporting names:
Suppose you are creating multiple playlists at once, and at least one of them is a smart playlist.
You know what all their names are going to be (by the file name), but the ID is unknown until creation.
So, suppose you had files A, B, C, and D depending on A, and E depending on D and B (this is a contrived example).
If you support by name, Navidrome will eventually figure out the relationships and all playlists will be filled.
By contrast, if you need an ID, then you couldn't create D or E until A, and C are created.
Is that worth it? I don't know. But it is a use case. Regardless, I definitely think ID should be a possible field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned before, when we have UI to create playlist, I think it's more likely we would just let user pick a playlist for existing ones, so unlikely name would be useful at that point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the playlist owner, I'd love to be able to combine playlists of different users.
A use case would be that person A, B and C create their own playlist "roadtrip". And when the three go on a roadtrip, they create a smart playlist to merge the three playlists dynamically into one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The playlist should at least be shared I guess.
But if we use id rather than name, it is less of a concern to accidentally include or leak tracks of a playlist.
@deluan Have you been able to try this and see how it performs in a large database? |
(crossposting from discord)
|
Sorry for being late to the party. I'm kinda lost in the back and forth of comments here. What is the consensus? Re: Performance: Re: Using names vs ids |
@deluan given #1884 (comment) I think using |
Another thing worth thinking about is what should we do with cycle dependency between playlists I guess. |
With just the playlist id, using my modification would be quite fast. If you want to check user, you would also need to add a way for the filter to know the user of the current playlist. |
We need a way to detect this...
Hey @flyingOwl, can you try to implement these changes? Do you need any guidance? |
Sure! I will take a look at this. Hopefully, I'll find some time in the next few days. |
292079a
to
2f9b90f
Compare
The subquery is now evaluating the playlist ids and checks if the playlist is public. Is there a way to get the owner of the playlist being processed? Currently, it can only combine public playlists. How bad are circular dependencies? And is it worth the effort to completely eliminate all cases? The worst I can imagine is a set of playlists, excluding and including each other, that might play ping pong on each sync. |
I think the only way to avoid cycle dependency is to read all playlists that depend on other playlists, and do a check as a graph. And that can probably be done when updating the criteria of a playlist. |
Thanks for that!
The owner should be determined by the
Yeah, that makes sense. I'll test it and see if there's no other side-effects. Thanks! |
Download the artifacts for this pull request: |
This is the field I'm trying to access in the scope of the Take In other words, what should replace the |
This is the place where the criteria is applied: navidrome/persistence/playlist_repository.go Lines 229 to 236 in 2f9b90f
As you can see, we don't currently join with the playlist itself, which would give us its |
I think that is the best solution for now. I had some ideas on how to access the owner of the smart playlist, but they would need non-trivial changes to other parts of the code. Regarding my use case, making the playlists public is not an issue. I have rebased my changes and dropped the |
2f9b90f
to
e7b7e2d
Compare
Hey @flyingOwl , sorry for the delay. This looks good now. Can you please add a few tests? |
I'm very sorry for not responding sooner and mostly abandoning this pull request. It will probably take until the Christmas holidays before I can work on this feature again. Could you give me some pointers on where and how to add tests? That would be much appreciated and the hurdle to continue here would be a bit lower |
No problem!
Basically we want to be sure that the new operator converts correctly to SQL and to JSON. You just need to add both checks in the two Take a look and let me know if you still have any questions. |
A smart playlist can use the playlist id for filtering. This can be used to create combined playlists or to filter multiple playlists. To filter by a playlist id, a subquery is created that will match the media ids with the playlists within the playlist_tracks table. Signed-off-by: flyingOwl <ofenfisch@googlemail.com>
e7b7e2d
to
dbcc860
Compare
Alright. I've added tests for the operators. Actually, I expected some full-grown database test setup where I would have to add all kinds of tests for various cases. So I'm glad it was that easy 😃 |
I am a bit unsure about the two new left joins. They work the same way as the
genre
filter. It would be great, if they are only used when we actually filter bygenre
or aplaylist
. But there seems to be no easy way to do this (I guess?) as the fields are already processed into a query long before the left joins are appended.