feat(search): very basic email multi term #1951
Merged
first pass, will clean up later
Email now supports multi-term search similar to Gmail's API (not exactly there, but better than before). By default, this is an AND operation over various fields of the email body. Mismatches with Gmail's API, such as fuzziness, stopword removal, and attachment indexing, still exist, but we can improve the algorithm later once this basic version is in place.
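For illustration, the AND behavior described above could be expressed as an OpenSearch `bool` query with one `must` clause per term, each clause a `multi_match` over several fields. This is a minimal sketch, not the code in this PR: the field names in `EMAIL_FIELDS` and the helper `build_email_and_query` are hypothetical, and treating a term that contains whitespace as a phrase is an assumption about how quoted terms might be handled.

```python
from typing import Any, Sequence

# Hypothetical field names; the real email index mapping may use different ones.
EMAIL_FIELDS = ("subject", "body", "sender", "recipients")


def build_email_and_query(
    terms: Sequence[str], fields: Sequence[str] = EMAIL_FIELDS
) -> dict[str, Any]:
    """Return an OpenSearch query body in which every term must match at least one field."""
    must: list[dict[str, Any]] = []
    for term in terms:
        must.append(
            {
                "multi_match": {
                    "query": term,
                    "fields": list(fields),
                    # Assumption: a term with whitespace came from a quoted phrase,
                    # so its words must appear adjacent and in order.
                    "type": "phrase" if " " in term else "best_fields",
                }
            }
        )
    # Every clause under "must" is required, which gives AND semantics across terms.
    return {"query": {"bool": {"must": must}}}
```

Because each `must` clause has to match, a message is returned only if every term appears in at least one of the listed fields; the terms do not all need to be in the same field.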
To do this, I've deprecated the `terms` field in the search service request in favor of a single query string, so that the backend handles splitting it into terms, with double quotes wrapping any text that should be kept together as one term. For non-email indices, this behaves the same as before, with or without double quotes. For emails, a single term behaves as before (content search in OpenSearch, subject search in Postgres). With multiple terms, everything runs in OpenSearch, so we can take advantage of its multi-term AND logic without a difficult join across different sources of truth. Email messages also benefit from having all of their content indexed as a single OpenSearch entry, whereas markdown docs are currently stored as separate entries per paragraph (node ID).
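A rough sketch of the backend-side splitting and routing described above, assuming that a double-quoted span becomes a single term and that stray quote characters from an unbalanced quote are simply dropped. `split_query`, `EmailSearchPlan`, and `plan_email_search` are hypothetical names used only for illustration; the real service may tokenize differently (for example, around escaping).

```python
import re
from enum import Enum

# Either a double-quoted span (group 1) or a run of non-whitespace (group 2).
_TOKEN_RE = re.compile(r'"([^"]*)"|(\S+)')


def split_query(query: str) -> list[str]:
    """Split a raw query into terms; a double-quoted span is kept as one term."""
    terms: list[str] = []
    for match in _TOKEN_RE.finditer(query):
        if match.group(1) is not None:
            term = match.group(1).strip()
        else:
            # Strip quote characters left at the ends of a bare token by an unbalanced quote.
            term = match.group(2).strip('"')
        if term:
            terms.append(term)
    return terms


class EmailSearchPlan(Enum):
    # Previous behavior: content search in OpenSearch, subject search in Postgres.
    SPLIT_OPENSEARCH_POSTGRES = "split"
    # Multi-term: run everything in OpenSearch so the AND logic needs no cross-store join.
    OPENSEARCH_ONLY = "opensearch_only"


def plan_email_search(query: str) -> EmailSearchPlan:
    """Choose where an email query runs based on how many terms it contains."""
    if len(split_query(query)) > 1:
        return EmailSearchPlan.OPENSEARCH_ONLY
    # One term (or, as an assumption here, an empty query) keeps the old split path.
    return EmailSearchPlan.SPLIT_OPENSEARCH_POSTGRES
```

For example, `split_query('from:alice "quarterly report" budget')` returns `['from:alice', 'quarterly report', 'budget']`, so that query would be routed to OpenSearch only, while `'"due date"'` is a single term and keeps the previous split path.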
There's a lot more to be done here, but that will come in a future PR.