
Wildcard searching #39

Closed
mikegoatly opened this issue Oct 31, 2021 · 3 comments

@mikegoatly
Owner

mikegoatly commented Oct 31, 2021

Currently you can only use a wildcard operator (*) to query for words starting with a fragment of text, e.g.

| Search | Example matches |
|--------|-----------------|
| foo* | food, foolish, foot |

This proposal is to extend wildcard searching in two ways:

  1. Add support for the * operator anywhere in a search term, to match any number of characters
  2. Add the '%' operator to match a single character

Examples:

| Search | Example matches |
|--------|-----------------|
| f*d | food, feed, fiend, fad |
| %ish | fish, dish, wish |
| %%cket | bucket, locket |
| *cket | cricket, locket, thicket, ticket |
| wi* | wink, win, window |
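
As a rough illustration of the intended semantics (not of how the library implements them), the examples above can be checked by translating each pattern into an anchored regular expression. The helper names `wildcard_to_regex` and `matches` are hypothetical, and a regex is only a stand-in here; an index-based implementation would walk its own data structures instead.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a proposed wildcard pattern into a regular expression.

    Hypothetical helper: '*' becomes '.*' (zero or more characters),
    '%' becomes '.' (exactly one character), everything else is literal.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "%":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def matches(pattern: str, token: str) -> bool:
    # fullmatch anchors the pattern to the whole token, so "f*d"
    # only matches tokens that start with "f" and end with "d".
    return wildcard_to_regex(pattern).fullmatch(token) is not None


# The example rows from the proposal.
examples = {
    "f*d": ["food", "feed", "fiend", "fad"],
    "%ish": ["fish", "dish", "wish"],
    "%%cket": ["bucket", "locket"],
    "*cket": ["cricket", "locket", "thicket", "ticket"],
    "wi*": ["wink", "win", "window"],
}

for pattern, tokens in examples.items():
    assert all(matches(pattern, token) for token in tokens)
```

Note that under these semantics `%%cket` also matches `ticket` (two characters, then `cket`), whereas `cricket` only matches `*cket`.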

The current StartsWithWordQueryPart implementation will be deprecated in favour of this new implementation, because it provides only a strict subset of the functionality proposed here.

Proposal for general rules:

  • Multi-character wildcards (*):
    • When between two text patterns (f*d): Match zero or more characters between the end of the first text and start of the following text. Only tokens that end with the second text will be returned.
    • When appearing at the end of some search text (f*): Any token that starts with the preceding text will be returned. This is the same behaviour as the old "starts with" operator.
    • More than one multi-character wildcard can be used in a query, e.g. f*o*d
    • Multiple sequential multi-character wildcards will be reduced to a single wildcard, e.g. w**n will be reduced to w*n. The two are semantically identical, so this doesn't matter.
  • Single character wildcards (%)
    • Can appear anywhere in the search text, at the start, middle or end
    • Single character wildcards can appear sequentially to indicate a fixed number of substitute letters, e.g. f%%d will match f followed by any two characters and then a d.
    • Single character wildcards immediately preceding a multi-character wildcard can be used to require that n or more characters are matched, e.g. c%%%* would match cake and cakes but not cat, because at least 3 characters are required after the c.
    • Any single character wildcards immediately following a multi-character wildcard will cause an error to be thrown (e.g. d*%%). Semantically, f* and f*%% are very different, so we can't collapse them: the first can have any number of characters following the f, whereas the second can have any number of characters, but at least two. I currently think that implementing this would increase search complexity significantly, but it's possible it could be implemented at a later stage.
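
The two normalisation rules above (collapse runs of `*`, reject `%` directly after `*`) could be sketched as a small pre-processing step. This is a minimal sketch only: `normalize_wildcards` and `WildcardSyntaxError` are hypothetical names, not part of the library's API.

```python
class WildcardSyntaxError(ValueError):
    """Hypothetical exception type for patterns the proposal rejects."""


def normalize_wildcards(pattern: str) -> str:
    """Collapse sequential '*' wildcards and reject '%' directly after '*'."""
    out = []
    for ch in pattern:
        previous_is_star = bool(out) and out[-1] == "*"
        if ch == "*" and previous_is_star:
            # w**n and w*n are semantically identical, so drop the repeat.
            continue
        if ch == "%" and previous_is_star:
            # f* and f*%% mean different things, so refuse rather than collapse.
            raise WildcardSyntaxError(
                f"'%' cannot immediately follow '*' in pattern {pattern!r}"
            )
        out.append(ch)
    return "".join(out)


assert normalize_wildcards("w**n") == "w*n"
assert normalize_wildcards("c%%%*") == "c%%%*"  # '%' before '*' is allowed
```

Raising rather than silently collapsing keeps the door open for a later version to support `*%%` properly without changing the meaning of queries that already work.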
@mikegoatly
Owner Author

I'm currently in two minds about collapsing wildcard characters versus throwing an exception. Because the semantics of the collapsed version differ from those of the un-collapsed version, I'm leaning towards throwing an exception. That will allow future versions to handle the query correctly without breaking backwards compatibility for anything that was relying on incorrect behaviour. I've updated the description to account for this, but I'm open to discussion on it.

@mikegoatly mikegoatly added this to the v3.0.0 milestone Jan 24, 2022
@mikegoatly
Owner Author

Note to self: Docs need updating for this new syntax

@mikegoatly
Owner Author

Implemented in v3 - docs
