Suggestion: custom stemmers #82

Closed
Dissimilis opened this issue Oct 12, 2023 · 2 comments
Labels
enhancement New feature or request

@Dissimilis

Judging by the code `this.stemmer = new PorterStemmer();`, it looks like implementing and passing in my own stemmer is impossible.

It should be trivial to change the API so that a custom stemmer can be assigned in TokenizationOptions, although the design of IStemmer might need more thought.

P.S. `this.stemmer = new PorterStemmer();` is a nice illustration of "new is glue" :)
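
A rough sketch of what the suggestion could look like. Everything here is an assumption made for illustration: the shape of `IStemmer`, the `Stemmer` property on `TokenizationOptions`, the `Tokenizer` class, and `PluralStemmer` are all hypothetical, and the `PorterStemmer` below is a stand-in rather than a real Porter implementation. It is not the library's actual API.

```csharp
#nullable enable
using System;

// Hypothetical public contract; the library's real IStemmer is internal
// and may look quite different.
public interface IStemmer
{
    string Stem(string word);
}

// Stand-in for the built-in stemmer. NOT a real Porter implementation:
// it only strips a trailing "ing" so the example stays short.
public sealed class PorterStemmer : IStemmer
{
    public string Stem(string word) =>
        word.Length > 5 && word.EndsWith("ing", StringComparison.Ordinal)
            ? word.Substring(0, word.Length - 3)
            : word;
}

// Hypothetical options type: a null Stemmer means "use the default".
public sealed class TokenizationOptions
{
    public IStemmer? Stemmer { get; set; }
}

public sealed class Tokenizer
{
    private readonly IStemmer stemmer;

    public Tokenizer(TokenizationOptions options)
    {
        // Before: this.stemmer = new PorterStemmer();
        // After: the caller may supply a stemmer; Porter stays the fallback.
        this.stemmer = options.Stemmer ?? new PorterStemmer();
    }

    public string Process(string word) =>
        this.stemmer.Stem(word.ToLowerInvariant());
}

// Example of a user-supplied stemmer: strips a trailing plural "s".
public sealed class PluralStemmer : IStemmer
{
    public string Stem(string word) =>
        word.Length > 3 && word.EndsWith("s", StringComparison.Ordinal)
            ? word.Substring(0, word.Length - 1)
            : word;
}

public static class Program
{
    public static void Main()
    {
        var options = new TokenizationOptions { Stemmer = new PluralStemmer() };
        var tokenizer = new Tokenizer(options);
        Console.WriteLine(tokenizer.Process("Stemmers")); // prints "stemmer"
    }
}
```

With this shape, the only place that still "glues" to a concrete type is the fallback in the constructor, so existing callers that pass no stemmer would keep their current behaviour.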

@mikegoatly
Owner

Thanks for the suggestion! Yeah, at the moment only Porter stemming is supported. The IStemmer interface is internal because it hasn't yet been designed with extensibility in mind.

You raise an interesting point though; there are other stemming algorithms, not least so that words from languages other than English can be stemmed effectively.

It's definitely something to think about...

@mikegoatly mikegoatly added the enhancement New feature or request label Dec 29, 2023
@mikegoatly mikegoatly added this to the v6 milestone Dec 29, 2023
@mikegoatly mikegoatly mentioned this issue Dec 29, 2023
@mikegoatly
Owner

Custom stemming will be available in v6.

mikegoatly added a commit that referenced this issue Jan 16, 2024
Among other things...

* Use latest C# version
* Added support for bracketed field names #76
* Added field score boosting #72 (#83)
* Added field score boosting #72
* Added score boosting query syntax #72
* Add .NET 8 as a target
* Item score boosting (#95)
* Allow characters to be escaped in query syntax #85
* Removing ImmutableCollections (#97)
* Speed up field collection prior to scoring (#102)
* Added support for adding custom stemmers #82 (#103)
* Apply field filters while collecting results
* Filter documents at navigator level #105
* Added query part weight calculations #105
Refactor query match collection primitives