39 wildcard searching #41

Merged
mikegoatly merged 10 commits into v3.0.0 from 39-wildcard-searching
Nov 20, 2021

mikegoatly
Owner

Implementation of wildcard searching #39

This introduces some breaking changes:

IFullTextIndex

  • New Search(IQuery) overload on IFullTextIndex

IIndexNavigator

  • New method CreateBookmark that allows the current state of the navigator to be captured and subsequently reapplied using IIndexNavigatorBookmark.Apply
  • EnumerateIndexedTokens will throw an exception if a bookmark has been applied during the usage of the navigator.
  • New method EnumerateNextCharacters that enumerates any characters that can be navigated to from the navigator's current position
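
To illustrate why bookmarking and next-character enumeration matter for wildcard matching, here is a minimal, language-neutral sketch in Python. It is not LIFTI's implementation or API: `TrieNode`, `Navigator`, `Bookmark`, `match` and `search` are hypothetical stand-ins, and the wildcard semantics (`%` for exactly one character, `*` for zero or more) are an assumption made for the example.

```python
class TrieNode:
    """One node of a character trie; is_word marks the end of an indexed word."""
    def __init__(self):
        self.children = {}
        self.is_word = False


def build_trie(words):
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


class Bookmark:
    """Captures a navigator position so it can be reapplied later (hypothetical)."""
    def __init__(self, navigator, node):
        self._navigator = navigator
        self._node = node

    def apply(self):
        self._navigator._node = self._node


class Navigator:
    """Hypothetical navigator that walks the trie one character at a time."""
    def __init__(self, root):
        self._node = root

    def process(self, ch):
        # Follow the edge for ch; the navigator goes "dead" if there is none.
        if self._node is not None:
            self._node = self._node.children.get(ch)
        return self._node is not None

    def enumerate_next_characters(self):
        return sorted(self._node.children) if self._node is not None else []

    def has_exact_match(self):
        return self._node is not None and self._node.is_word

    def create_bookmark(self):
        return Bookmark(self, self._node)


def match(nav, pattern, prefix=""):
    """Collect indexed words matching pattern from the navigator's position."""
    if not pattern:
        return [prefix] if nav.has_exact_match() else []
    head, rest = pattern[0], pattern[1:]
    results = []
    if head in "%*":
        bookmark = nav.create_bookmark()
        if head == "*":
            # Assumed semantics: '*' may match zero characters.
            results += match(nav, rest, prefix)
            bookmark.apply()
        for ch in nav.enumerate_next_characters():
            nav.process(ch)
            # '%' consumes exactly one character; '*' stays active after one.
            results += match(nav, rest if head == "%" else pattern, prefix + ch)
            bookmark.apply()  # backtrack before trying the next branch
    elif nav.process(head):
        results += match(nav, rest, prefix + head)
    return results


def search(root, pattern):
    # A word can be reached by more than one path when '*' repeats, so dedupe.
    return sorted(set(match(Navigator(root), pattern)))
```

For example, with `root = build_trie(["cat", "cart", "cot", "coat", "dog"])`, `search(root, "c%t")` explores each character reachable after `c`, restoring the bookmark between branches so that one navigator can be reused for every candidate.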

ITokenizer

  • New method Normalize(ReadOnlySpan<char>) - normalizes the given text according to any input preprocessors, but doesn't apply any additional tokenization rules, e.g. stemming.
  • Tokenizing will no longer split on % characters - these are considered part of a wildcard search.
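
The difference between normalizing and fully tokenizing can be sketched as follows. This is an illustrative Python approximation only, not LIFTI code: the preprocessors (case folding and accent stripping), the toy `naive_stem` function and the rule that wildcard words are normalized but not stemmed are all assumptions made for the example.

```python
import re
import unicodedata

# Assumed split rule: break on anything that is not a word character or '%'.
_SPLIT = re.compile(r"[^\w%]+")


def normalize(text):
    """Input preprocessing only (assumed: case folding and accent stripping)."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def naive_stem(word):
    """Toy stand-in for a real stemmer: drops a trailing 's' from longer words."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def split_words(text):
    """Split text into words, keeping '%' inside a word."""
    return [word for word in _SPLIT.split(text) if word]


def process_word(word):
    """Normalize every word, but only stem words without a wildcard."""
    normalized = normalize(word)
    return normalized if "%" in normalized else naive_stem(normalized)
```

Under these assumptions, `split_words("Find C%TS and dogs!")` keeps `C%TS` as a single word, and `process_word` lowercases it without stemming, while `Dogs` is both normalized and stemmed.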

Removals

  • IWordQueryPart - any types that implemented it now only need to implement IQueryPart
  • StartsWithWordQueryPart - redundant now that WildcardQueryPart exists

@mikegoatly mikegoatly changed the base branch from master to v3.0.0 November 20, 2021 15:55
@mikegoatly
Owner Author

I'm rebasing this onto a new v3 branch due to the breaking changes. I want to see if the breaking changes are enough to support some form of Levenshtein distance fuzzy matching before completing the version release.

@mikegoatly mikegoatly marked this pull request as ready for review November 20, 2021 16:03
@mikegoatly
Owner Author

Note to self: I think some performance work is going to be needed for wildcard searches that return lots of results, especially when multiple wildcards are included in the query.

@mikegoatly mikegoatly closed this Nov 20, 2021
@mikegoatly mikegoatly reopened this Nov 20, 2021
@mikegoatly mikegoatly merged commit 56f45e9 into v3.0.0 Nov 20, 2021
@mikegoatly mikegoatly deleted the 39-wildcard-searching branch November 20, 2021 16:07
mikegoatly added a commit that referenced this pull request Feb 6, 2022
* Added methods on index navigator to support backtracking and peeking next characters
* Added multi character wildcard matching
* Wildcard query parsing
* Bumped major version due to breaking changes
* Added Search overload to index