Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add playlist field to smart playlists - #1417 #1884

Merged
merged 1 commit into from Jan 20, 2024

Conversation

flyingOwl
Copy link
Contributor

A smart playlist can use the playlist name for filtering. This can be used to create combined playlists or to filter multiple playlists.

I am a bit unsure about the two new left joins. They work the same way as the genre filter. It would be great, if they are only used when we actually filter by genre or a playlist. But there seems to be no easy way to do this (I guess?) as the fields are already processed into a query long before the left joins are appended.

@flyingOwl flyingOwl marked this pull request as draft September 25, 2022 08:34
@flyingOwl
Copy link
Contributor Author

Unfortunately, it works only in the "positive" direction. Adding two (or more) lists together works.
Subtracting playlists does not work:

List 1: Song A, Song B, Song C, Song D
List 2: Song A, Song C, Song E

Smart Playlist:
is: playlist: List 1
isNot: playlist: List 2

expected: Song B, Song D
current result: Song A, Song B, Song C, Song D (List 1)

I'm looking at the sql query again before un-drafting this PR.

@flyingOwl
Copy link
Contributor Author

To support all kinds of smart filtering, (I believe) there is no way around a subquery.

This change introduces two new operators: inPlaylist and notInPlaylist which allow only the name key. A smart playlist could look like this:

{
	"all": [
		{"inPlaylist": {"name": "Birthday playlist"}},
		{"notInPlaylist": {"name": "List of songs Matt doesnt like"}}
	]
}

To combine playlists:

{
	"any": [
		{"inPlaylist": {"name": "List 1"}},
		{"inPlaylist": {"name": "List 2"}}
	]
}

The code could be extended to support more fields from the playlist table, like song_count or the owner.

This is the first time I've ever used Go. Please tell me if something is terribly wrong!

@flyingOwl flyingOwl marked this pull request as ready for review September 25, 2022 11:40
@deluan
Copy link
Member

deluan commented Sep 27, 2022

Hey, thanks for taking a look at this. I'm not sure if a subquery will be fast enough though.. Can you please rebase this so I can test it with my own library? Thanks!

@flyingOwl
Copy link
Contributor Author

I was also thinking about the performance when adding this. Using another WHERE clause with a new LEFT JOIN was not enough (this was my first try). Regarding the performance of the new subquery, I could swap the EXISTS (SELECT 1 ... with a IN (SELECT media_file.id ....

I don't know if sqlite is smart enough to not do a string comparison for every row. If it is not, it would be better to precache the playlist IDs and get rid of the LEFT JOIN in the subquery. It could look like this:

subQuery := squirrel.Select("pl.media_file_id").
		From("playlist_tracks pl").
		Where(squirrel.Eq{"pl.playlist_id": name_to_id_cache[playlistname]})
# ...
return "IN (" + subQText + ")", subQArgs, nil

But nonetheless, I'm curious to see how fast the subqueries actually are with large databases.

Comment on lines 260 to 263
subQuery := squirrel.Select("1").
From("playlist_tracks pl").
LeftJoin("playlist on pl.playlist_id = playlist.id").
Where(squirrel.And{squirrel.Expr("pl.media_file_id = media_file.id"), squirrel.Eq{"playlist.name": playlistname}})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative would be

media_file.id IN (
  SELECT media_file_id
  FROM playlist_tracks
  LEFT JOIN playlist
  ON playlist_tracks.playlist_id = playlist.id
  WHERE playlist.name = "<name>"
)

I'm not an SQL expert so I can't tell definitively whether it would be better, but my gut feeling is that it should be easier to optimize for the database.

Copy link
Contributor

@kgarner7 kgarner7 Mar 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding on here: This is significantly faster. Here's what I used to replicate your results

// Subquery to check if media_file.id matches with the playlist name in playlist_tracks table
subQuery := squirrel.Select("media_file_id").
	From("playlist_tracks pl").
	LeftJoin("playlist on pl.playlist_id = playlist.id").
	Where(squirrel.Eq{"playlist.name": playlistname})
subQText, subQArgs, err := subQuery.PlaceholderFormat(squirrel.Question).ToSql()
if err != nil {
	return "", nil, err
}
if negate {
	return "media_file.id NOT IN (" + subQText + ")", subQArgs, nil
} else {
	return "media_file.id IN (" + subQText + ")", subQArgs, nil
}

Performance:
OLD: 947.9 ms

PM GO.1 |  TRAC[0216] SQL: `INSERT INTO playlist_tracks (id,playlist_id,media_file_id) SELECT row_number() over (order by random()) as id, '...' as playlist_id, media_file.id as media_file_id FROM media_file LEFT JOIN annotation on (annotation.item_id = media_file.id AND annotation.item_type = 'media_file' AND annotation.user_id = '...') LEFT JOIN media_file_genres ag on media_file.id = ag.media_file_id LEFT JOIN genre on ag.genre_i
8:58:23 PM GO.1 |  >  d = genre.id WHERE (EXISTS(SELECT 1 FROM playlist_tracks pl LEFT JOIN playlist on pl.playlist_id = playlist.id WHERE (pl.media_file_id = media_file.id AND playlist.name = ?))) GROUP BY media_file.id ORDER BY random()`  args="[[...],<nil>]" elapsedTime=947.9ms requestId=ok/0AOnerGlUd-000103 rowsAffected=4282

NEW: 34 ms

PM GO.1 |  TRAC[0004] SQL: `INSERT INTO playlist_tracks (id,playlist_id,media_file_id) SELECT row_number() over (order by random()) as id, '...' as playlist_id, media_file.id as media_file_id FROM media_file LEFT JOIN annotation on (annotation.item_id = media_file.id AND annotation.item_type = 'media_file' AND annotation.user_id = '...') LEFT JOIN media_file_genres ag on media_file.id = ag.media_file_id LEFT JOIN genre on ag.genre_i
8:59:14 PM GO.1 |  >  d = genre.id WHERE (media_file.id IN (SELECT media_file_id FROM playlist_tracks pl LEFT JOIN playlist on pl.playlist_id = playlist.id WHERE playlist.name = ?)) GROUP BY media_file.id ORDER BY random()`  args="[[...],<nil>]" elapsedTime=33.4ms  rowsAffected=4282

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Playlist JSON used for testing:

{
    "all":  [
        {"inPlaylist": {"name": "..." }}
    ],  
    "sort": "random"
}

@upsuper
Copy link
Contributor

upsuper commented Feb 12, 2023

I compared results of

EXPLAIN SELECT id
FROM media_file
WHERE id IN (
  SELECT media_file_id
  FROM playlist_tracks
  LEFT JOIN playlist
  ON playlist_tracks.playlist_id = playlist.id
  WHERE playlist.name = '<name>'
);

and

EXPLAIN SELECT id
FROM media_file
WHERE EXISTS (
  SELECT 1
  FROM playlist_tracks
  LEFT JOIN playlist
  ON playlist_tracks.playlist_id = playlist.id
  WHERE playlist.name = '<name>'
  AND playlist_tracks.media_file_id = media_file.id
);

The former (my proposed one) gives:

addr  opcode         p1    p2    p3    p4             p5  comment      
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     33    0                    0   
1     OpenRead       3     16    0     k(1,)          2   
2     BeginSubrtn    0     2     0                    0   
3       Once           0     22    0                    0   
4       OpenEphemeral  4     1     0     k(1,B)         0   
5       OpenRead       2     106   0     2              0   
6       OpenRead       5     46    0     k(2,,)         2   
7       OpenRead       1     63    0     3              0   
8       OpenRead       6     61    0     k(2,,)         2   
9       String8        0     3     0     <name>         0   
10      SeekGE         5     22    3     1              0   
11        IdxGT          5     22    3     1              0   
12        DeferredSeek   5     0     2                    0   
13        Column         2     0     4                    0   
14        SeekGE         6     21    4     1              0   
15          IdxGT          6     21    4     1              0   
16          DeferredSeek   6     0     1                    0   
17          Column         1     2     5                    0   
18          MakeRecord     5     1     6     A              0   
19          IdxInsert      4     6     5     1              0   
20        Next           6     15    0                    0   
21      Next           5     11    1                    0   
22    Return         2     3     1                    0   
23    Rewind         4     32    0                    0   
24      Column         4     0     1                    0   
25      IsNull         1     31    0                    0   
26      SeekGE         3     31    1     1              0   
27        IdxGT          3     31    1     1              0   
28        Column         3     0     7                    0   
29        ResultRow      7     1     0                    0   
30      Next           3     27    0                    0   
31    Next           4     24    0                    0   
32    Halt           0     0     0                    0   
33    Transaction    0     0     236   0              1   
34    Goto           0     1     0                    0   

and the latter (this PR) gives

addr  opcode         p1    p2    p3    p4             p5  comment      
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     31    0                    0   
1     OpenRead       3     16    0     k(1,)          0   
2     Rewind         3     30    1     0              0   
3       BeginSubrtn    0     2     0                    0   
4         Integer        0     3     0                    0   
5         Integer        1     4     0                    0   
6         OpenRead       2     106   0     2              0   
7         OpenRead       4     46    0     k(2,,)         2   
8         OpenRead       1     63    0     3              0   
9         OpenRead       5     61    0     k(2,,)         2   
10        String8        0     5     0     <name>         0   
11        SeekGE         4     25    5     1              0   
12          IdxGT          4     25    5     1              0   
13          DeferredSeek   4     0     2                    0   
14          Column         2     0     6                    0   
15          SeekGE         5     24    6     1              0   
16            IdxGT          5     24    6     1              0   
17            DeferredSeek   5     0     1                    0   
18            Column         1     2     7                    0   
19            Column         3     0     8                    0   
20            Ne             8     23    7     BINARY-8       81  
21            Integer        1     3     0                    0   
22            DecrJumpZero   4     25    0                    0   
23          Next           5     16    0                    0   
24        Next           4     12    1                    0   
25      Return         2     4     1                    0   
26      IfNot          3     29    1                    0   
27      Column         3     0     10                   0   
28      ResultRow      10    1     0                    0   
29    Next           3     3     0                    1   
30    Halt           0     0     0                    0   
31    Transaction    0     0     236   0              1   
32    Goto           0     1     0                    0   

I'm not very familiar with SQLite bytecode, but my naive interpretation is that the former runs the subquery once to construct an ephemeral table containing all media_file_id of playlists satisfying the condition, and then goes through that table to return all the ids, and the latter has to run the subquery for each item in media_file, so I tend to believe that the former should be faster.

subQuery := squirrel.Select("1").
From("playlist_tracks pl").
LeftJoin("playlist on pl.playlist_id = playlist.id").
Where(squirrel.And{squirrel.Expr("pl.media_file_id = media_file.id"), squirrel.Eq{"playlist.name": playlistname}})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also maybe we need to check the owner id of the playlist?

On the one hand, a user shouldn't be able to hijack tracks into another user's smart playlist just by giving their playlist a conflicting name. On the other hand, a user should also not be able to snoop on other users' playlists by creating a smart playlist.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do note that there is no constraint in the names being unique, so you should definitely check for the user ownership.
It is possible for one user to have multiple playlists of the same name as well, so you should give the option of providing the ID instead

Copy link
Contributor

@upsuper upsuper Mar 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current interface of adding smart playlist, there is no way that user can specify the id of a playlist, but on the other hand, in the future when we have UI to add smart playlist, it's unlikely we want to support using name matching. So maybe in this case we want to translate name to id when importing, and only persist the id-form in the database?

But the same issue probably applies to most of other criteria, so maybe it's not a big issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That first point isn't entirely true. You can definitely reference the ID of an existing playlist, you would just need to add that SQL in the rule, so you could have something like

{
    "all":  [
        {"inPlaylist": {"id": "guid" }}
    ],  
    "sort": "random"
}

Of course, this could run into the problem if you're referencing other playlists, and suppose you migrate to a new system but /shrug/. You could still support both.
Also I guess it is worth point out that smart playlists (currently) are created for the first admin user, so idk how permissions would really shake out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But how can a user be expected to know what the playlist id is? I don't think everyone knows how to do SQL 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case I think we should only support id.

Copy link
Contributor

@kgarner7 kgarner7 Mar 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one benefit I can think of for also supporting names:

Suppose you are creating multiple playlists at once, and at least one of them is a smart playlist.
You know what all their names are going to be (by the file name), but the ID is unknown until creation.
So, suppose you had files A, B, C, and D depending on A, and E depending on D and B (this is a contrived example).
If you support by name, Navidrome will eventually figure out the relationships and all playlists will be filled.
By contrast, if you need an ID, then you couldn't create D or E until A, and C are created.

Is that worth it? I don't know. But it is a use case. Regardless, I definitely think ID should be a possible field

Copy link
Contributor

@upsuper upsuper Mar 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned before, when we have UI to create playlist, I think it's more likely we would just let user pick a playlist for existing ones, so unlikely name would be useful at that point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the playlist owner, I'd love to be able to combine playlists of different users.

A use case would be that person A, B and C create their own playlist "roadtrip". And when the three go on a roadtrip, they create a smart playlist to merge the three playlists dynamically into one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The playlist should at least be shared I guess.

But if we use id rather than name, it is less of a concern to accidentally include or leak tracks of a playlist.

@upsuper
Copy link
Contributor

upsuper commented Feb 13, 2023

@deluan Have you been able to try this and see how it performs in a large database?

@kgarner7
Copy link
Contributor

kgarner7 commented Mar 1, 2023

(crossposting from discord)
Testing this with local dev environment, I have a few cursory thoughts:

  • for one check involving isInPlaylist of size 4300, getting and refreshing takes 1-2 seconds
  • I made chained smart playlists off of this one (original -> test -> test2 -> test3), and it takes up to ~6 seconds on the first fetch
  • the navidrome web ui just kinda stops responding when I do a playlist refresh (Firefox). No idea what could be causing this
    The nested playlist is noticeably slower, at least when comparing the /tracks performance (Example below: 44ms vs 1s)
# Taken from regular
PM GO.1 |  DEBU[0827] HTTP: GET http://localhost:4633/api/playlist/.../tracks?_end=100&_order=ASC&_sort=id&_start=0&playlist_id=...  elapsedTime=44ms
# Taken from nested
PM GO.1 |  DEBU[0840] HTTP: GET http://localhost:4633/api/playlist/.../tracks?_end=100&_order=ASC&_sort=id&_start=0&playlist_id=...  elapsedTime=1.03s

@deluan
Copy link
Member

deluan commented Mar 30, 2023

Sorry for being late to the party. I'm kinda lost in the back and forth of comments here. What is the consensus?

Re: Performance:
@upsuper / @kgarner7 Is there a way to make it fast?
@flyingOwl Can you take a look at any suggestions?

Re: Using names vs ids
I think we should only support IDs, and the query has to make sure the playlist is owned by the user, or the playlist is shared.
@kgarner7 , the use case you mentioned is an extreme edge case. Once we add the UI for playlists (that will only use IDs from a Select), we can think of a way to support multiple smart playlists in one .NSP document. I think this would be a better way to solve the use case you mentioned.

@upsuper
Copy link
Contributor

upsuper commented Mar 30, 2023

@deluan given #1884 (comment) I think using id in should make it fast enough, and also I believe using playlist id only should make it even faster.

@upsuper
Copy link
Contributor

upsuper commented Mar 30, 2023

Another thing worth thinking about is what should we do with cycle dependency between playlists I guess.

@kgarner7
Copy link
Contributor

With just the playlist id, using my modification would be quite fast. If you want to check user, you would also need to add a way for the filter to know the user of the current playlist.

@deluan
Copy link
Member

deluan commented Apr 28, 2023

Another thing worth thinking about is what should we do with cycle dependency between playlists I guess.

We need a way to detect this...

With just the playlist id, using my modification would be quite fast. If you want to check user, you would also need to add a way for the filter to know the user of the current playlist.

Hey @flyingOwl, can you try to implement these changes? Do you need any guidance?

@flyingOwl
Copy link
Contributor Author

Sure! I will take a look at this. Hopefully, I'll find some time in the next few days.

@flyingOwl
Copy link
Contributor Author

The subquery is now evaluating the playlist ids and checks if the playlist is public.

Is there a way to get the owner of the playlist being processed? Currently, it can only combine public playlists.

How bad are circular dependencies? And is it worth the effort to completely eliminate all cases? The worst I can imagine is a set of playlists, excluding and including each other, that might play ping pong on each sync.

@upsuper
Copy link
Contributor

upsuper commented May 7, 2023

I think the only way to avoid cycle dependency is to read all playlists that depend on other playlists, and do a check as a graph. And that can probably be done when updating the criteria of a playlist.

@deluan
Copy link
Member

deluan commented May 17, 2023

The subquery is now evaluating the playlist ids and checks if the playlist is public.

Thanks for that!

Is there a way to get the owner of the playlist being processed? Currently, it can only combine public playlists.

The owner should be determined by the owner_id playlist field.

How bad are circular dependencies? And is it worth the effort to completely eliminate all cases? The worst I can imagine is a set of playlists, excluding and including each other, that might play ping pong on each sync.

Yeah, that makes sense. I'll test it and see if there's no other side-effects. Thanks!

@github-actions
Copy link

github-actions bot commented May 17, 2023

Download the artifacts for this pull request:

@flyingOwl
Copy link
Contributor Author

The owner should be determined by the owner_id playlist field.

This is the field I'm trying to access in the scope of the inList function. Not for the playlist referenced by the playlist id, but for the smart playlist that uses the inPlaylist operator.

Take playlistA which somehow references playlistB by {"inPlaylist": {"id": "playlistB-id"}}. The inList function generates the query to include songs from playlistB and can evaluate the owner of playlistB. But how can I get the owner of playlistA which wants to include playlistB? Is it possible to get the smart playlist which caused the lookup of another playlist with the inList function?

In other words, what should replace the TODO placeholder here: https://github.com/navidrome/navidrome/pull/1884/files#diff-0d3777fa1584c406daa9ba2874f8c6520a54ff900ffe58dd30e915e323ca9cd6R266

@deluan
Copy link
Member

deluan commented May 19, 2023

In other words, what should replace the TODO placeholder here: https://github.com/navidrome/navidrome/pull/1884/files#diff-0d3777fa1584c406daa9ba2874f8c6520a54ff900ffe58dd30e915e323ca9cd6R266

This is the place where the criteria is applied:

sql := Select("row_number() over (order by "+rules.OrderBy()+") as id", "'"+pls.ID+"' as playlist_id", "media_file.id as media_file_id").
From("media_file").LeftJoin("annotation on (" +
"annotation.item_id = media_file.id" +
" AND annotation.item_type = 'media_file'" +
" AND annotation.user_id = '" + userId(r.ctx) + "')").
LeftJoin("media_file_genres ag on media_file.id = ag.media_file_id").
LeftJoin("genre on ag.genre_id = genre.id").GroupBy("media_file.id")
sql = r.addCriteria(sql, rules)

As you can see, we don't currently join with the playlist itself, which would give us its owner_id. Unfortunately I don't have time to try to solve this, so I suggest we make it a requirement for the playlist to be public to be used in the inPlaylistList operator. What dou you think?

@flyingOwl
Copy link
Contributor Author

Unfortunately I don't have time to try to solve this, so I suggest we make it a requirement for the playlist to be public to be used in the inPlaylistList operator. What dou you think?

I think that is the best solution for now. I had some ideas on how to access the owner of the smart playlist, but they would need non-trivial changes to other parts of the code. Regarding my use case, making the playlists public is not an issue.

I have rebased my changes and dropped the todo part.

@deluan
Copy link
Member

deluan commented Jun 16, 2023

Hey @flyingOwl , sorry for the delay. This looks good now. Can you please add a few tests?

@deluan deluan added the keep label Nov 26, 2023
@deluan deluan mentioned this pull request Nov 26, 2023
25 tasks
@flyingOwl
Copy link
Contributor Author

Hey @flyingOwl , sorry for the delay. This looks good now. Can you please add a few tests?

I'm very sorry for not responding sooner and mostly abandoning this pull request.

It will probably take until the Christmas holidays before I can work on this feature again. Could you give me some pointers on where and how to add tests? That would be much appreciated and the hurdle to continue here would be a bit lower

@deluan
Copy link
Member

deluan commented Dec 2, 2023

I'm very sorry for not responding sooner and mostly abandoning this pull request.

No problem!

Could you give me some pointers on where and how to add tests

Basically we want to be sure that the new operator converts correctly to SQL and to JSON. You just need to add both checks in the two DescribeTable sections of the operators_test.go file: https://github.com/navidrome/navidrome/blob/bbd3882a752c956a8891938973f653e08a947710/model/criteria/operators_test.go

Take a look and let me know if you still have any questions.

A smart playlist can use the playlist id for filtering. This can be
used to create combined playlists or to filter multiple playlists.

To filter by a playlist id, a subquery is created that will match the
media ids with the playlists within the playlist_tracks table.

Signed-off-by: flyingOwl <ofenfisch@googlemail.com>
@flyingOwl
Copy link
Contributor Author

Alright. I've added tests for the operators. Actually, I expected some full-grown database test setup where I would have to add all kinds of tests for various cases. So I'm glad it was that easy 😃

@deluan deluan merged commit dfa453c into navidrome:master Jan 20, 2024
6 checks passed
@flyingOwl flyingOwl deleted the add-a-playlist-field/1417 branch January 21, 2024 09:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants