-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(server): add filename search #6394
Conversation
`(e."exifTextSearchableColumn" || COALESCE(si."smartInfoTextSearchableColumn", to_tsvector('english', ''))) | ||
@@ PLAINTO_TSQUERY('english', :query)`, | ||
{ query }, | ||
).orWhere('asset."originalFileName" = :path', { path: path.parse(query).name }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is where "path parsing" should be happening. It seems like something that should happen in the domain and be passed as a specific input into this method instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that would be nicer. But, I'm not sure there's a way to detect a path at the call-site? The string "samsung" could equally be a path or a model. Ideally, the search input would support richer queries like path:samsung
but I think that kind of change is a bit of a scope creep for a bug fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with keeping this in the repo for now. It's unsound to rely on path.parse
for non-path queries, and we don't have a way to know whether a query is searching for a path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! Love that you added e2e tests
Hi! I just wanted to chime in by suggesting a relatively simple fifth approach: a GIN (or GiST) index with trigrams, so you can have super fast queries by arbitrary substrings with standard SELECT * FROM "assets" WHERE "originalPath" ILIKE '%IMG_275%'; |
Interesting idea! I briefly read the trigram index docs but didn't look further for two reasons:
Either way, it seems reasonable to explore further, especially if there are folks especially interested in having full-path search rather than just filename. |
Hello, I still can't get results using both |
Fixes #5982.
There are basically three options:
originalFileName
by dropping a file extension from the query (if present). Lower fidelity but very easy - just a standard index & equality.originalPath
by adding an index onreverse(originalPath)
and usingstarts_with(reverse(query) + "/", reverse(originalPath)
. A weird index & query but high fidelity.originalFileNameWithExtension
or something. More storage, kinda jank.TBH, I think (1) is good enough and easy to make better in the future. For example, if I search "DSC_4242.jpg", I don't really think it matters if "DSC_4242.mov" also shows up.
edit: There's a fourth approach that we discussed a bit in Discord and decided we could switch to it in the future: using a GIN. The minor issue is that Postgres doesn't tokenize paths in a useful (they're a single token and it won't match against partial components). We can solve that by tokenizing it ourselves. For example:
The query is also tokenized with the 'split-by-slash-join-with-space' strategy. This strategy results in
IMG_275.JPG
,2015
,sushain
andlibrary/sushain
matching. But,08
andIMG_275
do not match. The former is because the token is-08
and the latter because theimg_275.jpg
token is matched against exactly.