fix(server): add filename search #6394

Merged
merged 4 commits into from
Jan 15, 2024

Conversation


@sushain97 sushain97 commented Jan 15, 2024

Fixes #5982.

There are basically three options:

  1. Search originalFileName by dropping a file extension from the query (if present). Lower fidelity but very easy - just a standard index & equality.
  2. Search originalPath by adding an index on reverse(originalPath) and using starts_with(reverse(originalPath), reverse(query) + "/"). A weird index & query but high fidelity.
  3. Add a new generated column called originalFileNameWithExtension or something. More storage, kinda jank.
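
To make option (2) concrete, here is a minimal sketch of the reversed-prefix idea outside the database. The helper names (`reverse`, `matchesTrailingPath`) are hypothetical and not part of the PR; the point is only that a suffix match on the path becomes a prefix match on the reversed path, which is the shape an ordinary index on reverse(originalPath) could serve.

```typescript
// Reverse a string character by character (sufficient for this ASCII sketch).
function reverse(s: string): string {
  return s.split("").reverse().join("");
}

// Hypothetical helper mirroring the starts_with(...) condition from option (2):
// the path matches when it ends with "/" + query, i.e. when the reversed path
// starts with reverse(query) + "/".
function matchesTrailingPath(originalPath: string, query: string): boolean {
  return reverse(originalPath).startsWith(reverse(query) + "/");
}

const examplePath = "upload/library/sushain/2015/2015-08-09/IMG_275.JPG";

console.log(matchesTrailingPath(examplePath, "IMG_275.JPG")); // true
console.log(matchesTrailingPath(examplePath, "2015-08-09/IMG_275.JPG")); // true
console.log(matchesTrailingPath(examplePath, "275.JPG")); // false: not a whole component
```

Because the "/" is part of the prefix, only whole trailing components match, and a path without any directory component would not match at all in this sketch.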

TBH, I think (1) is good enough and easy to make better in the future. For example, if I search "DSC_4242.jpg", I don't really think it matters if "DSC_4242.mov" also shows up.
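
As a rough illustration of the trade-off in option (1), the sketch below reduces a query to the value that would be compared against originalFileName. It assumes, as option (1) implies, that originalFileName is stored without its extension; `toFileNameKey` is a hypothetical name used only here.

```typescript
import * as path from "node:path";

// Hypothetical helper: drop the extension (if any) from the query so it can be
// compared for equality against a file name stored without an extension.
function toFileNameKey(query: string): string {
  return path.parse(query).name;
}

// Both queries collapse to the same key, so an equality lookup cannot tell
// DSC_4242.jpg apart from DSC_4242.mov.
console.log(toFileNameKey("DSC_4242.jpg")); // DSC_4242
console.log(toFileNameKey("DSC_4242.mov")); // DSC_4242
console.log(toFileNameKey("DSC_4242")); // DSC_4242
```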

edit: There's a fourth approach that we discussed a bit in Discord and decided we could switch to in the future: using a GIN index. The minor issue is that Postgres doesn't tokenize paths in a useful way (each path is a single token, so it won't match against partial components). We can solve that by tokenizing the path ourselves. For example:

immich=# with vecs as (select to_tsvector('simple', array_to_string(string_to_array('upload/library/sushain/2015/2015-08-09/IMG_275.JPG', '/'), ' ')) as vec)  select * from vecs where vec @@ phraseto_tsquery('simple', array_to_string(string_to_array('library/sushain', '/'), ' '));
                                      vec
-------------------------------------------------------------------------------
 '-08':6 '-09':7 '2015':4,5 'img_275.jpg':8 'library':2 'sushain':3 'upload':1
(1 row)

The query is also tokenized with the same 'split-by-slash-join-with-space' strategy. With this strategy, IMG_275.JPG, 2015, sushain and library/sushain all match, but 08 and IMG_275 do not. The former fails because the token is -08, and the latter because the img_275.jpg token is only matched exactly.
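
If the tokenization were done in application code rather than in SQL, it might look like the sketch below. `tokenizePath` and `MATCH_SQL` are hypothetical names, and the statement is only an assumption about how the pre-tokenized strings could be passed as parameters; it is not what the PR implements.

```typescript
// Hypothetical helper: same effect as
// array_to_string(string_to_array(value, '/'), ' ') in the SQL above.
function tokenizePath(value: string): string {
  return value.split("/").join(" ");
}

// Sketch of a parameterized match: $1 is the tokenized stored path,
// $2 is the tokenized search query.
const MATCH_SQL =
  "SELECT to_tsvector('simple', $1) @@ phraseto_tsquery('simple', $2) AS matches";

const storedPath = tokenizePath("upload/library/sushain/2015/2015-08-09/IMG_275.JPG");
const searchQuery = tokenizePath("library/sushain");

console.log(storedPath); // upload library sushain 2015 2015-08-09 IMG_275.JPG
console.log(searchQuery); // library sushain
console.log(MATCH_SQL);
```

Both the stored path and the query have to go through the same function; otherwise a query such as library/sushain would stay a single token and never match.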

@sushain97 changed the title from "Add filename search" to "fix(server): add filename search" Jan 15, 2024
`(e."exifTextSearchableColumn" || COALESCE(si."smartInfoTextSearchableColumn", to_tsvector('english', '')))
@@ PLAINTO_TSQUERY('english', :query)`,
{ query },
).orWhere('asset."originalFileName" = :path', { path: path.parse(query).name });
Contributor

I'm not sure if this is where "path parsing" should be happening. It seems like something that should happen in the domain and be passed as a specific input into this method instead.

Contributor Author

I agree that would be nicer. But I'm not sure there's a way to detect a path at the call site. The string "samsung" could equally be a path or a model. Ideally, the search input would support richer queries like path:samsung, but I think that kind of change is a bit of scope creep for a bug fix.

Contributor

@mertalev mertalev Jan 15, 2024

I agree with keeping this in the repo for now. It's unsound to rely on path.parse for non-path queries, and we don't have a way to know whether a query is searching for a path.
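
A small sketch of why path.parse is a shaky fit for free-text queries: it always returns something, even when the input was never meant to be a path. The sample queries below are made up for illustration, and `fileNameKey` is a hypothetical helper.

```typescript
import * as path from "node:path";

// Hypothetical helper: what the repository-level code would compare against
// "originalFileName" for a given raw query.
function fileNameKey(query: string): string {
  return path.parse(query).name;
}

// A plain word passes through unchanged, so a model-name search still works.
console.log(fileNameKey("samsung")); // samsung

// Queries that merely contain "/" or "." are silently rewritten.
console.log(fileNameKey("f/1.8")); // 1
console.log(fileNameKey("Mt. Fuji")); // Mt
```

Nothing in the return value says whether the input looked like a path, which is why the caller cannot use it to decide between a filename search and a metadata search.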

Contributor

@mertalev mertalev left a comment

Thanks for working on this! Love that you added e2e tests

@alextran1502 alextran1502 merged commit 7fc1954 into immich-app:main Jan 15, 2024
22 of 23 checks passed
@vibragiel

Hi! I just wanted to chime in by suggesting a relatively simple fifth approach: a GIN (or GiST) index with trigrams, so you can have super fast queries by arbitrary substrings with standard ILIKE syntax:

SELECT * FROM "assets" WHERE "originalPath" ILIKE '%IMG_275%';
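
For reference, a sketch of what the trigram approach could involve on the application side. The index name is hypothetical and the DDL is an assumption about how such an index would be created with the pg_trgm extension; nothing here is taken from immich. One detail worth handling is that `_` and `%` are wildcards in LIKE/ILIKE, so a query such as IMG_275 should be escaped if it is meant literally.

```typescript
// Assumed DDL for a trigram GIN index (index name is hypothetical).
const CREATE_TRIGRAM_INDEX_SQL = `
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX IF NOT EXISTS idx_assets_original_path_trgm
  ON "assets" USING gin ("originalPath" gin_trgm_ops);
`;

// Parameterized substring search; $1 is the pattern built below.
const SEARCH_SQL = `SELECT * FROM "assets" WHERE "originalPath" ILIKE $1`;

// Escape LIKE metacharacters so user input is matched literally.
// Backslash is the default escape character for LIKE in Postgres.
function escapeLike(input: string): string {
  return input.replace(/[\\%_]/g, (ch) => "\\" + ch);
}

// Wrap the escaped input in wildcards for an arbitrary-substring match.
function toSubstringPattern(query: string): string {
  return `%${escapeLike(query)}%`;
}

console.log(CREATE_TRIGRAM_INDEX_SQL);
console.log(SEARCH_SQL);
console.log(toSubstringPattern("IMG_275")); // %IMG\_275%
```

Without the escaping, the `_` in IMG_275 would match any single character, so the pattern would also match a path containing IMG-275 or IMGX275.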

@sushain97
Contributor Author

Interesting idea! I briefly read the trigram index docs but didn't look further for two reasons:

  1. I'm not sure that it makes sense to match partial path components. Maybe users could opt in by adding /s to their search if they want full components?
  2. It seemed like a module that wasn't being used in immich yet - not a reason to avoid it of course, but I didn't see anything to pattern match on.

Either way, it seems reasonable to explore further, especially if there are folks interested in having full-path search rather than just filename search.

@jkumeboshi

Hello, I still can't get results using both m:filename and m:description.
Do I need to re-run some jobs to reindex something?

Successfully merging this pull request may close these issues.

[BUG] Search by m:filename returns nothing
6 participants