Search: Omit full text index if query is too short #1517

Closed
alexislefebvre opened this issue Sep 9, 2021 · 11 comments
Labels: enhancement (Refactoring, improvement or maintenance task), released (Available in the stable release)

Comments

@lastzero
Member

lastzero commented Sep 9, 2021

Stopwords, one or two letter Latin "words" and purely numeric values will be ignored when searching because they are not indexed (for the obvious reason). So it's not known if there are results for those or not. Any specific reason you search for zz01?

@alexislefebvre
Contributor Author

> Stopwords, one or two letter Latin "words" and purely numeric values will be ignored when searching because they are not indexed (for the obvious reason). So it's not known if there are results for those or not.

Could you please explain the obvious reason? Why ignore the numbers?

> Any specific reason you search for zz01?

I restarted indexing to detect faces and noticed a filename consisting only of numbers in the logs. I wanted to see that file, so I searched for it. The tests with zz… were an attempt to understand what was happening. I could have used the name or original filters instead, though.

@lastzero
Member

lastzero commented Sep 9, 2021

Because it's a FULL TEXT index, we index words, not numbers. You can find single letters or numbers in almost any file, so it's not useful for filtering / searching and blows up your index / database on top of it, leading to bad performance. If you search for specific fields like the file name, this does not apply.
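As a rough illustration of that filtering, here is a minimal Go sketch. It is not PhotoPrism's actual code: the `keywords` and `isLatin` helpers, the tiny `stopwords` list, and the `minLatinLen` threshold are all assumptions made up for this example.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stopwords is a tiny, hypothetical stoplist used only for this sketch.
var stopwords = map[string]bool{"the": true, "and": true, "img": true, "zzz": true}

// minLatinLen is an assumed threshold: Latin words shorter than this are skipped.
const minLatinLen = 3

// isLatin reports whether every rune in word belongs to the Latin script.
func isLatin(word string) bool {
	for _, r := range word {
		if !unicode.Is(unicode.Latin, r) {
			return false
		}
	}
	return true
}

// keywords splits text into runs of letters and keeps only the words that
// are worth indexing. Digits act as separators, so purely numeric values
// never become keywords.
func keywords(text string) []string {
	words := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r)
	})
	result := make([]string, 0, len(words))
	for _, w := range words {
		if stopwords[w] {
			continue // on the stoplist
		}
		if isLatin(w) && len([]rune(w)) < minLatinLen {
			continue // one or two letter Latin "word"
		}
		result = append(result, w)
	}
	return result
}

func main() {
	fmt.Println(keywords("The beach in 2021, IMG 0042")) // [beach]
	fmt.Println(keywords("zz01"))                        // []
}
```

Under these assumptions a query like zz01 leaves nothing to look up in the index: the digits are dropped and the remaining "zz" is too short.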

@lastzero
Member

lastzero commented Sep 9, 2021

We could in fact automatically do a filename wildcard search for queries like zz01. Would that make sense and match your expectations?

@alexislefebvre
Contributor Author

We can already use the name filter to search by filename, and I think that's the right way. I also think that an automatic search would be confusing for users.

I rephrased my request: as an end user, if I search for something and it doesn't match any photos (because of the way the database is indexed), I expect to see a message like “No results” and/or a message that explains why the query is invalid.

@lastzero
Member

We don't know if it matches because there is no index. Compare to MySQL:

> The default minimum length of words that are found by full-text searches is three characters for InnoDB search indexes, or four characters for MyISAM

@lastzero
Member

In fact, we had a "query too short" error, but the limitation applies specifically to Latin characters: there is no length limitation for Chinese etc.
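A length check that exempts non-Latin scripts could look roughly like the Go sketch below. The `checkQuery` function, the `ErrQueryTooShort` value, and the threshold of three are hypothetical names and numbers chosen for illustration, not the project's API. Note that it counts runes rather than bytes, so a single Chinese character is one unit even though it takes several bytes in UTF-8.

```go
package main

import (
	"errors"
	"fmt"
	"unicode"
)

// minLatinLen is an assumed minimum length for Latin-only queries.
const minLatinLen = 3

// ErrQueryTooShort is a hypothetical error value for this sketch.
var ErrQueryTooShort = errors.New("query too short")

// checkQuery returns ErrQueryTooShort for short queries, unless the query
// contains a letter from a non-Latin script, in which case no length
// limit is applied.
func checkQuery(q string) error {
	n := 0
	for _, r := range q {
		if unicode.IsLetter(r) && !unicode.Is(unicode.Latin, r) {
			return nil // non-Latin script: no length limit
		}
		n++ // count runes, not bytes
	}
	if n < minLatinLen {
		return ErrQueryTooShort
	}
	return nil
}

func main() {
	for _, q := range []string{"zz", "cat", "猫"} {
		fmt.Printf("%q: %v\n", q, checkQuery(q))
	}
}
```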

lastzero self-assigned this on Sep 10, 2021
lastzero added the "enhancement" (Refactoring, improvement or maintenance task) label on Sep 10, 2021
lastzero changed the title from "Some searches return all the files" to "Search: Show error if query is too short" on Sep 10, 2021
lastzero changed the title from "Search: Show error if query is too short" to "Search: Omit full text index if query is too short" on Sep 17, 2021
lastzero added the "please-test" (Ready for acceptance test) label on Sep 17, 2021
graciousgrey added the "released" (Available in the stable release) label and removed the "please-test" (Ready for acceptance test) label on Sep 26, 2021
@alexislefebvre
Contributor Author

alexislefebvre commented Sep 26, 2021

I think that there's still an issue for end users: is a string with 5 characters treated differently?

@lastzero
Member

True, because zzz is long enough to be a word, but it's on our stoplist as it's not a word - so the index couldn't contain it, even when it exists. We could also detect this special case and look in file names only in a future release.

@alexislefebvre
Contributor Author

alexislefebvre commented Sep 27, 2021

Thanks for your answer. ali shows all the results but bob doesn't. Is that because it's in that list?

lastzero added a commit that referenced this issue Sep 29, 2021
Default to photo name when search term is too short or on the stop list.
Search full text index otherwise, which now include names of people
(requires reindexing).
@lastzero
Member

Started a new Development Preview build including the following changes:

  • People's names will be added to the full text index in the background and when editing photos.
  • Keep in mind index updates may take a while, they don't happen in real time.
  • Using the full text index means names matching a stopword like "img" won't be found.
  • If the query is too short for the full text index or matches a stopword, a photo name wildcard search will be performed instead.
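The fallback described in the last point can be sketched in Go as follows. This is only an illustration under stated assumptions: the `plan` function, the `Mode` type, the two-entry stoplist, and the SQL `LIKE` pattern with backslash escaping are invented for this example and are not taken from the commit. For brevity the length check here applies to every script, which is stricter than the Latin-only rule mentioned earlier in the thread.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// Mode says which kind of lookup a query is routed to.
type Mode int

const (
	FullText     Mode = iota // look the query up in the full text index
	NameWildcard             // wildcard match against the photo name
)

// minLen and stopwords are assumptions made for this sketch.
const minLen = 3

var stopwords = map[string]bool{"img": true, "zzz": true}

// likeEscaper escapes LIKE metacharacters so user input is matched literally.
var likeEscaper = strings.NewReplacer(`\`, `\\`, `%`, `\%`, `_`, `\_`)

// plan routes a query to the full text index if it contains at least one
// usable word, and to a photo name wildcard pattern otherwise.
func plan(query string) (Mode, string) {
	q := strings.ToLower(strings.TrimSpace(query))
	words := strings.FieldsFunc(q, func(r rune) bool { return !unicode.IsLetter(r) })
	for _, w := range words {
		if utf8.RuneCountInString(w) >= minLen && !stopwords[w] {
			return FullText, q
		}
	}
	return NameWildcard, "%" + likeEscaper.Replace(q) + "%"
}

func main() {
	for _, q := range []string{"beach", "zz01", "zzz"} {
		mode, term := plan(q)
		fmt.Println(q, mode, term)
	}
}
```

With this routing, zz01 and zzz end up as name patterns (`%zz01%`, `%zzz%`) instead of returning every photo, while ordinary words still go through the index.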

Projects
Status: Release 🌈