Implement full text search #9197

Open
wants to merge 1 commit into
base: master

fhriley
Contributor

@fhriley fhriley commented Jan 28, 2023

Changes
This implements true full text search (FTS) using SQLite's built-in FTS5 extension. Currently, Jellyfin approximates full text search with some SQL trickery, which doesn't support things like keyword searches. For example, it doesn't support a search like 90 day what now, which a real FTS would match to 90 Day Fiancé - What Now! (2017).
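To make the difference concrete, here is a minimal sketch using Python's sqlite3 module. It is not the PR's code: the table names are hypothetical, and it assumes the SQLite build in use ships with FTS5 enabled. It contrasts a substring match with an FTS5 keyword match for the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables; the real Jellyfin schema is not shown in this PR text.
conn.execute("CREATE TABLE items (name TEXT)")
conn.execute("CREATE VIRTUAL TABLE items_fts USING fts5(name)")

titles = [("90 Day Fiancé - What Now! (2017)",), ("Now You See Me (2013)",)]
conn.executemany("INSERT INTO items(name) VALUES (?)", titles)
conn.executemany("INSERT INTO items_fts(name) VALUES (?)", titles)

term = "90 day what now"

# Substring matching needs the words to appear contiguously, so it finds nothing.
like_hits = conn.execute(
    "SELECT name FROM items WHERE name LIKE ?", (f"%{term}%",)
).fetchall()

# Keyword matching: every word must occur in the title, at any position.
match_expr = " AND ".join(f'"{word}"' for word in term.split())
fts_hits = conn.execute(
    "SELECT name FROM items_fts WHERE items_fts MATCH ? ORDER BY rank",
    (match_expr,),
).fetchall()

print(like_hits)
print(fts_hits)
```

The second title contains "now" but none of the other words, so only the first title satisfies the keyword query.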

This new implementation supports FTS with three search types, one of which can be specified by the client: Phrase, Prefix, and Keyword. It defaults to Prefix, which keeps it backwards compatible with the existing search. In addition, this implementation uses the Porter tokenizer, which implements the Porter stemming algorithm. (EDIT: I took out the Porter tokenizer. It can give confusing results.) The FTS5 extension provides a rank for each match, so the results are returned in rank order.
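One plausible way to map the three search types onto FTS5 query syntax is sketched below. The PR text does not spell out the exact mapping, so the enum, the function names, and the chosen expressions are all assumptions made for illustration.

```python
from enum import Enum


class SearchType(Enum):
    # Names mirror the PR description; the string values are illustrative.
    PHRASE = "Phrase"
    PREFIX = "Prefix"
    KEYWORD = "Keyword"


def quote(text: str) -> str:
    """Wrap text in double quotes, doubling embedded quotes (FTS5 string syntax)."""
    return '"' + text.replace('"', '""') + '"'


def build_match(term: str, search_type: SearchType = SearchType.PREFIX) -> str:
    """Build an FTS5 MATCH expression for the given search type (assumed mapping)."""
    words = term.split()
    if not words:
        raise ValueError("search term is empty")
    if search_type is SearchType.PHRASE:
        # All words, adjacent and in order.
        return quote(" ".join(words))
    if search_type is SearchType.PREFIX:
        # Same as a phrase, but the last word only has to match as a prefix.
        return quote(" ".join(words)) + "*"
    # Keyword: every word must occur, in any order and position.
    return " AND ".join(quote(word) for word in words)


print(build_match("90 day fi"))
print(build_match("what now", SearchType.PHRASE))
print(build_match("90 day what now", SearchType.KEYWORD))
```

Quoting every piece of user input keeps FTS5 operators such as AND, NOT, or * in the search term from being interpreted as query syntax.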

This should require no intervention on the user's part. On startup, the FTS index is seeded automatically if it is empty. Real-time FTS index updates are achieved using database triggers.
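The seeding and trigger mechanism could look roughly like the following sketch. The table, column, and trigger names are hypothetical, the schema is reduced to two columns, and it assumes an SQLite build with FTS5; the PR's actual SQL may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Pre-existing library rows, created before any FTS index exists.
CREATE TABLE items (guid TEXT PRIMARY KEY, name TEXT);
INSERT INTO items VALUES ('g1', 'Dark'), ('g2', 'Berserk');

-- guid is stored but not indexed; name is searchable.
CREATE VIRTUAL TABLE items_fts USING fts5(guid UNINDEXED, name);

-- Triggers mirror every later change into the index.
CREATE TRIGGER items_ai AFTER INSERT ON items BEGIN
    INSERT INTO items_fts(guid, name) VALUES (new.guid, new.name);
END;
CREATE TRIGGER items_ad AFTER DELETE ON items BEGIN
    DELETE FROM items_fts WHERE guid = old.guid;
END;
CREATE TRIGGER items_au AFTER UPDATE OF name ON items BEGIN
    UPDATE items_fts SET name = new.name WHERE guid = old.guid;
END;
""")


def seed_if_empty(conn: sqlite3.Connection) -> bool:
    """Copy all items into the index once, only when the index has no rows."""
    (count,) = conn.execute("SELECT count(*) FROM items_fts").fetchone()
    if count == 0:
        conn.execute("INSERT INTO items_fts(guid, name) SELECT guid, name FROM items")
        return True
    return False


def search(expr: str) -> list:
    rows = conn.execute(
        "SELECT guid FROM items_fts WHERE items_fts MATCH ? ORDER BY guid", (expr,)
    )
    return [guid for (guid,) in rows]


seeded = seed_if_empty(conn)        # first startup: index is empty, so it is filled
seeded_again = seed_if_empty(conn)  # later startups: nothing to do

# From here on, the triggers keep the index current.
conn.execute("INSERT INTO items VALUES ('g3', 'Dark Matter')")
conn.execute("UPDATE items SET name = 'Berserk (1997)' WHERE guid = 'g2'")
conn.execute("DELETE FROM items WHERE guid = 'g1'")
```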

Two changes were made relative to the existing search:

  1. Tags are not included in the FTS index. Including them doesn't make sense because an FTS plus tag search can already be done as follows: searchTerm=<search term>&tags=<tags>.

  2. ProviderIds is in the FTS index. This means a search like the following will return the item of interest (if it exists): searchTerm=Tmdb%3D61575 or searchTerm=Tmdb+61575
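The two searchTerm forms in item 2 plausibly behave the same because a word tokenizer treats "=" as a separator. The sketch below decodes both query strings and splits them with a rough regex stand-in for such a tokenizer; the function name is hypothetical and the regex is only an approximation of what SQLite's tokenizer does.

```python
import re
from urllib.parse import parse_qs


def search_tokens(query_string: str) -> list:
    """Decode searchTerm from a query string and split it into lowercase tokens."""
    term = parse_qs(query_string)["searchTerm"][0]
    # Approximation: tokens are runs of ASCII letters and digits.
    return re.findall(r"[a-z0-9]+", term.lower())


print(search_tokens("searchTerm=Tmdb%3D61575"))  # decoded term: Tmdb=61575
print(search_tokens("searchTerm=Tmdb+61575"))    # decoded term: Tmdb 61575
```

Both calls yield the same two tokens, so either form would look up the same index entries under this assumption.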

@github-actions

github-actions bot commented Jan 28, 2023

Changes in OpenAPI specification found.

What's Changed


GET /Search/Hints
Parameters:

Added: searchType in query

Optional. Set the type of full text search to do. Defaults to "Prefix".

Changed: searchTerm in query

Filter based on a full text search using this search term.

GET /Artists
Parameters:

Added: searchType in query

Optional. Set the type of full text search to do. Defaults to "Prefix".

Changed: searchTerm in query

Optional. Filter based on a full text search using this search term.

GET /Artists/AlbumArtists
Parameters:

Added: searchType in query

Optional. Set the type of full text search to do. Defaults to "Prefix".

Changed: searchTerm in query

Optional. Filter based on a full text search using this search term.

GET /Genres
Parameters:

Added: searchType in query

Optional. Set the type of full text search to do. Defaults to "Prefix".

Changed: searchTerm in query

Optional. Filter based on a full text search using this search term.

GET /Items
Parameters:

Added: searchType in query

Optional. Set the type of full text search to do. Defaults to "Prefix".

Changed: searchTerm in query

Optional. Filter based on a full text search using this search term.

GET /Users/{userId}/Items
Parameters:

Added: searchType in query

Optional. Set the type of full text search to do. Defaults to "Prefix".

Changed: searchTerm in query

Optional. Filter based on a full text search using this search term.

GET /Users/{userId}/Items/Resume
Parameters:

Added: searchType in query

Optional. Set the type of full text search to do. Defaults to "Prefix".

Changed: searchTerm in query

Optional. Filter based on a full text search using this search term.

GET /MusicGenres
Parameters:

Added: searchType in query

Optional. Set the type of full text search to do. Defaults to "Prefix".

Changed: searchTerm in query

Optional. Filter based on a full text search using this search term.

GET /Persons
Parameters:

Changed: searchTerm in query

Optional. The search term.

GET /Studios
Parameters:

Added: searchType in query

Optional. Set the type of full text search to do. Defaults to "Prefix".

Changed: searchTerm in query

Optional. Filter based on a full text search using this search term.

GET /Trailers
Parameters:

Added: searchType in query

Optional. Set the type of full text search to do. Defaults to "Prefix".

Changed: searchTerm in query

Optional. Filter based on a full text search using this search term.
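For client authors, a request using the new parameter might be built as in this sketch. The base URL is an assumption (a local server), authentication is omitted, and the helper name is hypothetical; only the parameter names and the three search type values come from this PR.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8096"  # assumption: a locally running server
SEARCH_TYPES = {"Phrase", "Prefix", "Keyword"}


def items_search_url(term: str, search_type: str = "Prefix") -> str:
    """Build a GET /Items URL carrying searchTerm and the new searchType parameter."""
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"unknown search type: {search_type}")
    query = urlencode({"searchTerm": term, "searchType": search_type})
    return f"{BASE_URL}/Items?{query}"


print(items_search_url("90 day what now", "Keyword"))
```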

@fhriley fhriley changed the title Implement full text search WIP: Implement full text search Jan 28, 2023
@fhriley fhriley changed the title WIP: Implement full text search Implement full text search Jan 28, 2023
@fhriley fhriley force-pushed the full_text_search branch 5 times, most recently from a02c137 to d73052b Compare January 29, 2023 00:19
Jellyfin.Api/Controllers/SearchController.cs (outdated, resolved)
Jellyfin.Data/Enums/FullTextSearchType.cs (outdated, resolved)
@fhriley fhriley requested a review from barronpm January 29, 2023 01:07
@fhriley fhriley changed the title Implement full text search WIP: Implement full text search Jan 29, 2023
@fhriley
Contributor Author

fhriley commented Jan 29, 2023

I discovered that VACUUM can change rowids, so I won't be able to use a contentless table. I've marked this WIP while I make this change.

@fhriley fhriley changed the title WIP: Implement full text search Implement full text search Jan 29, 2023
@fhriley
Contributor Author

fhriley commented Jan 29, 2023

All good. It's now using a content table and linking through the guid so if the rowid changes it won't matter.
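A minimal sketch of the idea, with hypothetical table and column names and assuming an SQLite build with FTS5: the index stores each item's guid next to the searchable text, and results are joined back to the items table on that guid rather than on rowid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (guid TEXT NOT NULL UNIQUE, name TEXT);  -- implicit rowid only
CREATE VIRTUAL TABLE items_fts USING fts5(guid UNINDEXED, name);
""")

rows = [("g1", "Dark"), ("g2", "Berserk"), ("g3", "Dark City")]
conn.executemany("INSERT INTO items VALUES (?, ?)", rows)
conn.executemany("INSERT INTO items_fts VALUES (?, ?)", rows)

# Remove one item from both tables, then compact the database.
conn.execute("DELETE FROM items WHERE guid = 'g1'")
conn.execute("DELETE FROM items_fts WHERE guid = 'g1'")
conn.commit()  # VACUUM cannot run inside an open transaction
conn.execute("VACUUM")

# The join key is guid, so the result does not depend on either table's rowids.
hits = conn.execute(
    """
    SELECT items.guid, items.name
    FROM items_fts
    JOIN items ON items.guid = items_fts.guid
    WHERE items_fts MATCH ?
    ORDER BY rank
    """,
    ('"dark"',),
).fetchall()

print(hits)
```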

@fhriley
Contributor Author

fhriley commented Jan 29, 2023

Found issues with input sanitization. Back to WIP.

@fhriley fhriley changed the title Implement full text search WIP: Implement full text search Jan 29, 2023
@fhriley fhriley changed the title WIP: Implement full text search Implement full text search Jan 29, 2023
@fhriley
Contributor Author

fhriley commented Jan 29, 2023

Should be ready for review. It is no longer possible for the search term to be used in an unsanitized form.
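Why sanitization matters for FTS5 can be shown with a small sketch. This is not the PR's sanitizer: the sanitize function below is a hypothetical stand-in that quotes each whitespace-separated word, and the example assumes an SQLite build with FTS5. Raw user input is parsed as FTS5 query syntax, so a stray double quote makes the query fail, while the quoted form is treated as plain text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE items_fts USING fts5(name)")  # hypothetical table
conn.execute("INSERT INTO items_fts(name) VALUES ('Dark')")


def sanitize(term: str) -> str:
    """Quote each word as an FTS5 string, doubling embedded double quotes."""
    words = term.split()
    return " AND ".join('"' + word.replace('"', '""') + '"' for word in words)


def search(expr: str) -> list:
    # The expression is always bound as a parameter, never spliced into the SQL.
    return conn.execute(
        "SELECT name FROM items_fts WHERE items_fts MATCH ?", (expr,)
    ).fetchall()


user_input = '"dark'  # unbalanced quote typed by a user

try:
    search(user_input)
    raw_failed = False
except sqlite3.OperationalError:
    # FTS5 rejects the raw text as a malformed query.
    raw_failed = True

safe_hits = search(sanitize(user_input))
print(raw_failed, safe_hits)
```

Binding the expression as a parameter protects the SQL statement itself; the quoting step is what protects the FTS5 query grammar.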

@DomiStyle
Contributor

How are the fields in _fullTextSearchColumns combined when searching?

Looking at the code, it seems like you cannot search for the year (PremiereDate) at the moment?
This feature would be perfect if you could search by name + year like berserk 1997.

But of course this should only work in combination with the title because searching for 2012 should return the movie first and not every movie ever released in 2012. Could probably work around that by prioritizing name matches before the year?
Would probably also have to get only the year from PremiereDate since most likely nobody searches for exact dates.

@jellyfin-bot jellyfin-bot added the merge conflict Merge conflicts should be resolved before a merge label Feb 4, 2023
@fhriley
Contributor Author

fhriley commented Feb 4, 2023

The question is, do you add things like year, tags, etc to the FTS index? The more you add to the index, the more false positives you will get. The backend already supports searching by those so I lean towards no. For example, a FTS title and year search can be done like this: searchTerm=<title>&years=<year>. My opinion is this is a frontend problem to solve. It needs a simple grammar to translate user input into backend API calls. For example, this search some title years:2007 tags:action,comedy could become this backend call: searchTerm=some+title&years=2007&tags=action,comedy.
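A minimal sketch of such a grammar, assuming only the two filter keys from the example (years and tags) and a hypothetical function name: words of the form key:value become query parameters, and everything else stays in searchTerm.

```python
from urllib.parse import urlencode

FILTER_KEYS = {"years", "tags"}  # assumption: only the filters used in the example


def to_query_string(user_input: str) -> str:
    """Translate 'free text key:value ...' into backend query parameters."""
    words = []
    filters = {}
    for part in user_input.split():
        key, sep, value = part.partition(":")
        if sep and key in FILTER_KEYS and value:
            filters[key] = value
        else:
            # Unknown keys and plain words remain part of the search term.
            words.append(part)
    params = {}
    if words:
        params["searchTerm"] = " ".join(words)
    params.update(filters)
    # Keep commas literal so list-valued filters stay readable.
    return urlencode(params, safe=",")


print(to_query_string("some title years:2007 tags:action,comedy"))
```

A title that itself contains a colon, such as "Star Trek: Picard", is left untouched because "Trek" is not a known filter key.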

@DomiStyle
Contributor

> The question is, do you add things like year, tags, etc to the FTS index?

No, I don't think those fields have to be indexed, they just have to be included in the search in a smart way.

> My opinion is this is a frontend problem to solve.

I think Jellyfin should provide a server side way to search through the API that works in every client.

For example, Musicbrainz allows searching in a way similar to what you described through its API: https://musicbrainz.org/doc/Indexed_Search_Syntax

There's also IMDb, TheTVDB and many others which have no issues with fuzzy searching combined fields like berserk 1997 or jojo 2012.

I think it would be awesome if Jellyfin could also recognize years at the end of a query and search for the combination of search term and year as well as the normal search.
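Recognizing a trailing year could be sketched as below. The function name and the accepted year range (1900 to 2099) are assumptions. A query that consists only of a year is left alone, so searching for 2012 still looks for the title.

```python
import re

# A title, whitespace, then a four-digit year starting with 19 or 20 at the very end.
YEAR_AT_END = re.compile(r"^(?P<title>.+?)\s+(?P<year>(?:19|20)\d{2})$")


def split_trailing_year(query: str):
    """Return (title, year) if the query ends in a year, else (query, None)."""
    query = query.strip()
    match = YEAR_AT_END.match(query)
    if match is None:
        return query, None
    return match.group("title"), int(match.group("year"))


print(split_trailing_year("berserk 1997"))
print(split_trailing_year("2012"))
print(split_trailing_year("blade runner 2049"))
```

The last call shows the ambiguity: a title that ends in a number is split as if the number were a year, which is a reason to run the title-plus-year search alongside the normal search rather than instead of it.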

Of course both of these features are for a follow-up PR, but I wanted to mention it before this is merged.

@c4mz

c4mz commented Sep 30, 2023

FYI, I'm not sure if my example aligns with your PR, but if I search for the TV series "Dark", it does not show up in the results. I can't get it to show up at all.

@fogolin

This comment was marked as off-topic.

@JPVenson JPVenson added stale Stale and will be closed if no activity occurs needs testing area:database Issues relating to the server databases and/or database access feature Adding a new feature, or substantial improvements on existing functionality labels Jan 2, 2024
Labels
area:database Issues relating to the server databases and/or database access feature Adding a new feature, or substantial improvements on existing functionality merge conflict Merge conflicts should be resolved before a merge needs testing stale Stale and will be closed if no activity occurs

7 participants