feat: Add playlist field to smart playlists - #1417 #1884

flyingOwl · 2022-09-24T22:31:08Z

A smart playlist can use the playlist name for filtering. This can be used to create combined playlists or to filter multiple playlists.

I am a bit unsure about the two new left joins. They work the same way as the genre filter. It would be great, if they are only used when we actually filter by genre or a playlist. But there seems to be no easy way to do this (I guess?) as the fields are already processed into a query long before the left joins are appended.

flyingOwl · 2022-09-25T08:39:15Z

Unfortunately, it works only in the "positive" direction. Adding two (or more) lists together works.
Subtracting playlists does not work:

List 1: Song A, Song B, Song C, Song D
List 2: Song A, Song C, Song E

Smart Playlist:
is: playlist: List 1
isNot: playlist: List 2

expected: Song B, Song D
current result: Song A, Song B, Song C, Song D (List 1)

I'm looking at the sql query again before un-drafting this PR.

flyingOwl · 2022-09-25T11:40:41Z

To support all kinds of smart filtering, (I believe) there is no way around a subquery.

This change introduces two new operators: inPlaylist and notInPlaylist which allow only the name key. A smart playlist could look like this:

{
	"all": [
		{"inPlaylist": {"name": "Birthday playlist"}},
		{"notInPlaylist": {"name": "List of songs Matt doesnt like"}}
	]
}

To combine playlists:

{
	"any": [
		{"inPlaylist": {"name": "List 1"}},
		{"inPlaylist": {"name": "List 2"}}
	]
}

The code could be extended to support more fields from the playlist table, like song_count or the owner.

This is the first time I've ever used Go. Please tell me if something is terribly wrong!

deluan · 2022-09-27T21:15:09Z

Hey, thanks for taking a look at this. I'm not sure if a subquery will be fast enough though.. Can you please rebase this so I can test it with my own library? Thanks!

flyingOwl · 2022-09-28T07:46:36Z

I was also thinking about the performance when adding this. Using another WHERE clause with a new LEFT JOIN was not enough (this was my first try). Regarding the performance of the new subquery, I could swap the EXISTS (SELECT 1 ... with a IN (SELECT media_file.id ....

I don't know if sqlite is smart enough to not do a string comparison for every row. If it is not, it would be better to precache the playlist IDs and get rid of the LEFT JOIN in the subquery. It could look like this:

subQuery := squirrel.Select("pl.media_file_id").
		From("playlist_tracks pl").
		Where(squirrel.Eq{"pl.playlist_id": name_to_id_cache[playlistname]})
# ...
return "IN (" + subQText + ")", subQArgs, nil

But nonetheless, I'm curious to see how fast the subqueries actually are with large databases.

upsuper · 2023-02-12T12:20:50Z

model/criteria/operators.go

+	subQuery := squirrel.Select("1").
+		From("playlist_tracks pl").
+		LeftJoin("playlist on pl.playlist_id = playlist.id").
+		Where(squirrel.And{squirrel.Expr("pl.media_file_id = media_file.id"), squirrel.Eq{"playlist.name": playlistname}})


An alternative would be

media_file.id IN ( SELECT media_file_id FROM playlist_tracks LEFT JOIN playlist ON playlist_tracks.playlist_id = playlist.id WHERE playlist.name = "<name>" )

I'm not an SQL expert so I can't tell definitively whether it would be better, but my gut feeling is that it should be easier to optimize for the database.

Adding on here: This is significantly faster. Here's what I used to replicate your results

// Subquery to check if media_file.id matches with the playlist name in playlist_tracks table subQuery := squirrel.Select("media_file_id"). From("playlist_tracks pl"). LeftJoin("playlist on pl.playlist_id = playlist.id"). Where(squirrel.Eq{"playlist.name": playlistname}) subQText, subQArgs, err := subQuery.PlaceholderFormat(squirrel.Question).ToSql() if err != nil { return "", nil, err } if negate { return "media_file.id NOT IN (" + subQText + ")", subQArgs, nil } else { return "media_file.id IN (" + subQText + ")", subQArgs, nil }

Performance:
OLD: 947.9 ms

PM GO.1 | TRAC[0216] SQL: `INSERT INTO playlist_tracks (id,playlist_id,media_file_id) SELECT row_number() over (order by random()) as id, '...' as playlist_id, media_file.id as media_file_id FROM media_file LEFT JOIN annotation on (annotation.item_id = media_file.id AND annotation.item_type = 'media_file' AND annotation.user_id = '...') LEFT JOIN media_file_genres ag on media_file.id = ag.media_file_id LEFT JOIN genre on ag.genre_i 8:58:23 PM GO.1 | > d = genre.id WHERE (EXISTS(SELECT 1 FROM playlist_tracks pl LEFT JOIN playlist on pl.playlist_id = playlist.id WHERE (pl.media_file_id = media_file.id AND playlist.name = ?))) GROUP BY media_file.id ORDER BY random()` args="[[...],<nil>]" elapsedTime=947.9ms requestId=ok/0AOnerGlUd-000103 rowsAffected=4282

NEW: 34 ms

PM GO.1 | TRAC[0004] SQL: `INSERT INTO playlist_tracks (id,playlist_id,media_file_id) SELECT row_number() over (order by random()) as id, '...' as playlist_id, media_file.id as media_file_id FROM media_file LEFT JOIN annotation on (annotation.item_id = media_file.id AND annotation.item_type = 'media_file' AND annotation.user_id = '...') LEFT JOIN media_file_genres ag on media_file.id = ag.media_file_id LEFT JOIN genre on ag.genre_i 8:59:14 PM GO.1 | > d = genre.id WHERE (media_file.id IN (SELECT media_file_id FROM playlist_tracks pl LEFT JOIN playlist on pl.playlist_id = playlist.id WHERE playlist.name = ?)) GROUP BY media_file.id ORDER BY random()` args="[[...],<nil>]" elapsedTime=33.4ms rowsAffected=4282

Playlist JSON used for testing:

{ "all": [ {"inPlaylist": {"name": "..." }} ], "sort": "random" }

upsuper · 2023-02-12T12:57:10Z

I compared results of

EXPLAIN SELECT id
FROM media_file
WHERE id IN (
  SELECT media_file_id
  FROM playlist_tracks
  LEFT JOIN playlist
  ON playlist_tracks.playlist_id = playlist.id
  WHERE playlist.name = '<name>'
);

and

EXPLAIN SELECT id
FROM media_file
WHERE EXISTS (
  SELECT 1
  FROM playlist_tracks
  LEFT JOIN playlist
  ON playlist_tracks.playlist_id = playlist.id
  WHERE playlist.name = '<name>'
  AND playlist_tracks.media_file_id = media_file.id
);

The former (my proposed one) gives:

addr  opcode         p1    p2    p3    p4             p5  comment      
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     33    0                    0   
1     OpenRead       3     16    0     k(1,)          2   
2     BeginSubrtn    0     2     0                    0   
3       Once           0     22    0                    0   
4       OpenEphemeral  4     1     0     k(1,B)         0   
5       OpenRead       2     106   0     2              0   
6       OpenRead       5     46    0     k(2,,)         2   
7       OpenRead       1     63    0     3              0   
8       OpenRead       6     61    0     k(2,,)         2   
9       String8        0     3     0     <name>         0   
10      SeekGE         5     22    3     1              0   
11        IdxGT          5     22    3     1              0   
12        DeferredSeek   5     0     2                    0   
13        Column         2     0     4                    0   
14        SeekGE         6     21    4     1              0   
15          IdxGT          6     21    4     1              0   
16          DeferredSeek   6     0     1                    0   
17          Column         1     2     5                    0   
18          MakeRecord     5     1     6     A              0   
19          IdxInsert      4     6     5     1              0   
20        Next           6     15    0                    0   
21      Next           5     11    1                    0   
22    Return         2     3     1                    0   
23    Rewind         4     32    0                    0   
24      Column         4     0     1                    0   
25      IsNull         1     31    0                    0   
26      SeekGE         3     31    1     1              0   
27        IdxGT          3     31    1     1              0   
28        Column         3     0     7                    0   
29        ResultRow      7     1     0                    0   
30      Next           3     27    0                    0   
31    Next           4     24    0                    0   
32    Halt           0     0     0                    0   
33    Transaction    0     0     236   0              1   
34    Goto           0     1     0                    0

and the latter (this PR) gives

addr  opcode         p1    p2    p3    p4             p5  comment      
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     31    0                    0   
1     OpenRead       3     16    0     k(1,)          0   
2     Rewind         3     30    1     0              0   
3       BeginSubrtn    0     2     0                    0   
4         Integer        0     3     0                    0   
5         Integer        1     4     0                    0   
6         OpenRead       2     106   0     2              0   
7         OpenRead       4     46    0     k(2,,)         2   
8         OpenRead       1     63    0     3              0   
9         OpenRead       5     61    0     k(2,,)         2   
10        String8        0     5     0     <name>         0   
11        SeekGE         4     25    5     1              0   
12          IdxGT          4     25    5     1              0   
13          DeferredSeek   4     0     2                    0   
14          Column         2     0     6                    0   
15          SeekGE         5     24    6     1              0   
16            IdxGT          5     24    6     1              0   
17            DeferredSeek   5     0     1                    0   
18            Column         1     2     7                    0   
19            Column         3     0     8                    0   
20            Ne             8     23    7     BINARY-8       81  
21            Integer        1     3     0                    0   
22            DecrJumpZero   4     25    0                    0   
23          Next           5     16    0                    0   
24        Next           4     12    1                    0   
25      Return         2     4     1                    0   
26      IfNot          3     29    1                    0   
27      Column         3     0     10                   0   
28      ResultRow      10    1     0                    0   
29    Next           3     3     0                    1   
30    Halt           0     0     0                    0   
31    Transaction    0     0     236   0              1   
32    Goto           0     1     0                    0

I'm not very familiar with SQLite bytecode, but my naive interpretation is that the former runs the subquery once to construct an ephemeral table containing all media_file_id of playlists satisfying the condition, and then goes through that table to return all the ids, and the latter has to run the subquery for each item in media_file, so I tend to believe that the former should be faster.

upsuper · 2023-02-13T10:37:30Z

model/criteria/operators.go

+	subQuery := squirrel.Select("1").
+		From("playlist_tracks pl").
+		LeftJoin("playlist on pl.playlist_id = playlist.id").
+		Where(squirrel.And{squirrel.Expr("pl.media_file_id = media_file.id"), squirrel.Eq{"playlist.name": playlistname}})


Also maybe we need to check the owner id of the playlist?

On the one hand, a user shouldn't be able to hijack tracks into another user's smart playlist just by giving their playlist a conflicting name. On the other hand, a user should also not be able to snoop on other users' playlists by creating a smart playlist.

Do note that there is no constraint in the names being unique, so you should definitely check for the user ownership.
It is possible for one user to have multiple playlists of the same name as well, so you should give the option of providing the ID instead

In the current interface of adding smart playlist, there is no way that user can specify the id of a playlist, but on the other hand, in the future when we have UI to add smart playlist, it's unlikely we want to support using name matching. So maybe in this case we want to translate name to id when importing, and only persist the id-form in the database?

But the same issue probably applies to most of other criteria, so maybe it's not a big issue.

That first point isn't entirely true. You can definitely reference the ID of an existing playlist, you would just need to add that SQL in the rule, so you could have something like

{ "all": [ {"inPlaylist": {"id": "guid" }} ], "sort": "random" }

Of course, this could run into the problem if you're referencing other playlists, and suppose you migrate to a new system but /shrug/. You could still support both.
Also I guess it is worth point out that smart playlists (currently) are created for the first admin user, so idk how permissions would really shake out.

But how can a user be expected to know what the playlist id is? I don't think everyone knows how to do SQL 😄

In that case I think we should only support id.

There is one benefit I can think of for also supporting names:

Suppose you are creating multiple playlists at once, and at least one of them is a smart playlist.
You know what all their names are going to be (by the file name), but the ID is unknown until creation.
So, suppose you had files A, B, C, and D depending on A, and E depending on D and B (this is a contrived example).
If you support by name, Navidrome will eventually figure out the relationships and all playlists will be filled.
By contrast, if you need an ID, then you couldn't create D or E until A, and C are created.

Is that worth it? I don't know. But it is a use case. Regardless, I definitely think ID should be a possible field

As I mentioned before, when we have UI to create playlist, I think it's more likely we would just let user pick a playlist for existing ones, so unlikely name would be useful at that point.

Regarding the playlist owner, I'd love to be able to combine playlists of different users.

A use case would be that person A, B and C create their own playlist "roadtrip". And when the three go on a roadtrip, they create a smart playlist to merge the three playlists dynamically into one.

The playlist should at least be shared I guess.

But if we use id rather than name, it is less of a concern to accidentally include or leak tracks of a playlist.

upsuper · 2023-02-13T10:39:12Z

@deluan Have you been able to try this and see how it performs in a large database?

kgarner7 · 2023-03-01T04:44:20Z

(crossposting from discord)
Testing this with local dev environment, I have a few cursory thoughts:

for one check involving isInPlaylist of size 4300, getting and refreshing takes 1-2 seconds
I made chained smart playlists off of this one (original -> test -> test2 -> test3), and it takes up to ~6 seconds on the first fetch
the navidrome web ui just kinda stops responding when I do a playlist refresh (Firefox). No idea what could be causing this
The nested playlist is noticeably slower, at least when comparing the /tracks performance (Example below: 44ms vs 1s)

# Taken from regular
PM GO.1 |  DEBU[0827] HTTP: GET http://localhost:4633/api/playlist/.../tracks?_end=100&_order=ASC&_sort=id&_start=0&playlist_id=...  elapsedTime=44ms
# Taken from nested
PM GO.1 |  DEBU[0840] HTTP: GET http://localhost:4633/api/playlist/.../tracks?_end=100&_order=ASC&_sort=id&_start=0&playlist_id=...  elapsedTime=1.03s

deluan · 2023-03-30T22:38:17Z

Sorry for being late to the party. I'm kinda lost in the back and forth of comments here. What is the consensus?

Re: Performance:
@upsuper / @kgarner7 Is there a way to make it fast?
@flyingOwl Can you take a look at any suggestions?

Re: Using names vs ids
I think we should only support IDs, and the query has to make sure the playlist is owned by the user, or the playlist is shared.
@kgarner7 , the use case you mentioned is an extreme edge case. Once we add the UI for playlists (that will only use IDs from a Select), we can think of a way to support multiple smart playlists in one .NSP document. I think this would be a better way to solve the use case you mentioned.

upsuper · 2023-03-30T23:01:19Z

@deluan given #1884 (comment) I think using id in should make it fast enough, and also I believe using playlist id only should make it even faster.

upsuper · 2023-03-30T23:04:33Z

Another thing worth thinking about is what should we do with cycle dependency between playlists I guess.

kgarner7 · 2023-03-31T03:05:01Z

With just the playlist id, using my modification would be quite fast. If you want to check user, you would also need to add a way for the filter to know the user of the current playlist.

deluan · 2023-04-28T00:37:21Z

Another thing worth thinking about is what should we do with cycle dependency between playlists I guess.

We need a way to detect this...

With just the playlist id, using my modification would be quite fast. If you want to check user, you would also need to add a way for the filter to know the user of the current playlist.

Hey @flyingOwl, can you try to implement these changes? Do you need any guidance?

flyingOwl · 2023-04-30T09:59:19Z

Sure! I will take a look at this. Hopefully, I'll find some time in the next few days.

flyingOwl · 2023-05-07T09:15:03Z

The subquery is now evaluating the playlist ids and checks if the playlist is public.

Is there a way to get the owner of the playlist being processed? Currently, it can only combine public playlists.

How bad are circular dependencies? And is it worth the effort to completely eliminate all cases? The worst I can imagine is a set of playlists, excluding and including each other, that might play ping pong on each sync.

upsuper · 2023-05-07T10:24:18Z

I think the only way to avoid cycle dependency is to read all playlists that depend on other playlists, and do a check as a graph. And that can probably be done when updating the criteria of a playlist.

deluan · 2023-05-17T17:26:19Z

The subquery is now evaluating the playlist ids and checks if the playlist is public.

Thanks for that!

Is there a way to get the owner of the playlist being processed? Currently, it can only combine public playlists.

The owner should be determined by the owner_id playlist field.

How bad are circular dependencies? And is it worth the effort to completely eliminate all cases? The worst I can imagine is a set of playlists, excluding and including each other, that might play ping pong on each sync.

Yeah, that makes sense. I'll test it and see if there's no other side-effects. Thanks!

github-actions · 2023-05-17T17:37:52Z

Download the artifacts for this pull request:

flyingOwl · 2023-05-17T20:12:28Z

The owner should be determined by the owner_id playlist field.

This is the field I'm trying to access in the scope of the inList function. Not for the playlist referenced by the playlist id, but for the smart playlist that uses the inPlaylist operator.

Take playlistA which somehow references playlistB by {"inPlaylist": {"id": "playlistB-id"}}. The inList function generates the query to include songs from playlistB and can evaluate the owner of playlistB. But how can I get the owner of playlistA which wants to include playlistB? Is it possible to get the smart playlist which caused the lookup of another playlist with the inList function?

In other words, what should replace the TODO placeholder here: https://github.com/navidrome/navidrome/pull/1884/files#diff-0d3777fa1584c406daa9ba2874f8c6520a54ff900ffe58dd30e915e323ca9cd6R266

deluan · 2023-05-19T15:29:18Z

In other words, what should replace the TODO placeholder here: https://github.com/navidrome/navidrome/pull/1884/files#diff-0d3777fa1584c406daa9ba2874f8c6520a54ff900ffe58dd30e915e323ca9cd6R266

This is the place where the criteria is applied:

navidrome/persistence/playlist_repository.go

Lines 229 to 236 in 2f9b90f

    
           sql := Select("row_number() over (order by "+rules.OrderBy()+") as id", "'"+pls.ID+"' as playlist_id", "media_file.id as media_file_id"). 
        
           	From("media_file").LeftJoin("annotation on (" + 
        
           	"annotation.item_id = media_file.id" + 
        
           	" AND annotation.item_type = 'media_file'" + 
        
           	" AND annotation.user_id = '" + userId(r.ctx) + "')"). 
        
           	LeftJoin("media_file_genres ag on media_file.id = ag.media_file_id"). 
        
           	LeftJoin("genre on ag.genre_id = genre.id").GroupBy("media_file.id") 
        
           sql = r.addCriteria(sql, rules)

As you can see, we don't currently join with the playlist itself, which would give us its owner_id. Unfortunately I don't have time to try to solve this, so I suggest we make it a requirement for the playlist to be public to be used in the inPlaylistList operator. What dou you think?

flyingOwl · 2023-05-20T09:20:09Z

Unfortunately I don't have time to try to solve this, so I suggest we make it a requirement for the playlist to be public to be used in the inPlaylistList operator. What dou you think?

I think that is the best solution for now. I had some ideas on how to access the owner of the smart playlist, but they would need non-trivial changes to other parts of the code. Regarding my use case, making the playlists public is not an issue.

I have rebased my changes and dropped the todo part.

deluan · 2023-06-16T18:17:48Z

Hey @flyingOwl , sorry for the delay. This looks good now. Can you please add a few tests?

flyingOwl · 2023-12-01T22:56:13Z

Hey @flyingOwl , sorry for the delay. This looks good now. Can you please add a few tests?

I'm very sorry for not responding sooner and mostly abandoning this pull request.

It will probably take until the Christmas holidays before I can work on this feature again. Could you give me some pointers on where and how to add tests? That would be much appreciated and the hurdle to continue here would be a bit lower

deluan · 2023-12-02T03:06:52Z

I'm very sorry for not responding sooner and mostly abandoning this pull request.

No problem!

Could you give me some pointers on where and how to add tests

Basically we want to be sure that the new operator converts correctly to SQL and to JSON. You just need to add both checks in the two DescribeTable sections of the operators_test.go file: https://github.com/navidrome/navidrome/blob/bbd3882a752c956a8891938973f653e08a947710/model/criteria/operators_test.go

Take a look and let me know if you still have any questions.

A smart playlist can use the playlist id for filtering. This can be used to create combined playlists or to filter multiple playlists. To filter by a playlist id, a subquery is created that will match the media ids with the playlists within the playlist_tracks table. Signed-off-by: flyingOwl <ofenfisch@googlemail.com>

flyingOwl · 2024-01-13T11:19:35Z

Alright. I've added tests for the operators. Actually, I expected some full-grown database test setup where I would have to add all kinds of tests for various cases. So I'm glad it was that easy 😃

flyingOwl marked this pull request as draft September 25, 2022 08:34

flyingOwl force-pushed the add-a-playlist-field/1417 branch from 7ecf9e9 to d5f970b Compare September 25, 2022 11:30

flyingOwl marked this pull request as ready for review September 25, 2022 11:40

flyingOwl force-pushed the add-a-playlist-field/1417 branch from d5f970b to 578116e Compare September 27, 2022 14:57

flyingOwl force-pushed the add-a-playlist-field/1417 branch from 578116e to 292079a Compare September 28, 2022 07:02

deluan force-pushed the master branch from a0971c5 to 03640ca Compare November 29, 2022 19:48

deluan force-pushed the master branch from 080ed8f to dad4949 Compare February 5, 2023 05:58

upsuper reviewed Feb 12, 2023

View reviewed changes

upsuper reviewed Feb 13, 2023

View reviewed changes

flyingOwl force-pushed the add-a-playlist-field/1417 branch from 292079a to 2f9b90f Compare May 7, 2023 09:11

flyingOwl force-pushed the add-a-playlist-field/1417 branch from 2f9b90f to e7b7e2d Compare May 20, 2023 09:24

deluan added the keep label Nov 26, 2023

deluan mentioned this pull request Nov 26, 2023

Smart Playlists #1417

Open

25 tasks

flyingOwl force-pushed the add-a-playlist-field/1417 branch from e7b7e2d to dbcc860 Compare January 13, 2024 11:14

deluan merged commit dfa453c into navidrome:master Jan 20, 2024
6 checks passed

flyingOwl deleted the add-a-playlist-field/1417 branch January 21, 2024 09:58

ScaldedWilder mentioned this pull request Mar 14, 2024

Support Navidrome Smart Playlist inPlaylist/notInPlaylist jeffvli/feishin#547

Closed

feat: Add playlist field to smart playlists - #1417 #1884

feat: Add playlist field to smart playlists - #1417 #1884

Conversation

flyingOwl commented Sep 24, 2022

flyingOwl commented Sep 25, 2022

flyingOwl commented Sep 25, 2022

deluan commented Sep 27, 2022

flyingOwl commented Sep 28, 2022

Choose a reason for hiding this comment

kgarner7 Mar 1, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

upsuper commented Feb 12, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

upsuper Mar 1, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kgarner7 Mar 1, 2023 • edited

Choose a reason for hiding this comment

upsuper Mar 1, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

upsuper commented Feb 13, 2023

kgarner7 commented Mar 1, 2023

deluan commented Mar 30, 2023

upsuper commented Mar 30, 2023

upsuper commented Mar 30, 2023

kgarner7 commented Mar 31, 2023

deluan commented Apr 28, 2023

flyingOwl commented Apr 30, 2023

flyingOwl commented May 7, 2023

upsuper commented May 7, 2023

deluan commented May 17, 2023

github-actions bot commented May 17, 2023 • edited

flyingOwl commented May 17, 2023

deluan commented May 19, 2023

flyingOwl commented May 20, 2023

deluan commented Jun 16, 2023

flyingOwl commented Dec 1, 2023

deluan commented Dec 2, 2023

flyingOwl commented Jan 13, 2024

kgarner7 Mar 1, 2023 •

edited

upsuper commented Feb 12, 2023 •

edited

upsuper Mar 1, 2023 •

edited

kgarner7 Mar 1, 2023 •

edited

upsuper Mar 1, 2023 •

edited

github-actions bot commented May 17, 2023 •

edited