tokenizer separator regex #424
Hi @StarfallProjects - just to make sure I understand you correctly, you essentially want to split up Tweaking If my understanding is correct and you want to use lunr for this, I'd recommend providing a custom, camel-case-aware tokenizer function to the lunr builder - I think you'd have better luck with that. However, I've noticed a few of your issues seem to revolve around searching source code, and I'm not sure lunr is particularly well-suited to that. Is there a specific reason you chose lunr?
Thanks for the advice.
Ah, I see - so it might be unreasonable to just yank lunr.js out and replace it with something entirely different! With that in mind, I'd recommend trying to write the custom, camel-case-aware tokenizer I suggested above, and see how that works for you!
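For anyone landing here later, this is a rough sketch of what the camel-case-aware splitting step might look like. It is plain JavaScript and does not call lunr at all: camelCaseTokens is a made-up helper name, and hooking it into the lunr builder (and wrapping the strings in whatever token type your lunr version expects) is left out, so treat it as a starting point rather than a drop-in tokenizer.

```javascript
// Hypothetical helper, not part of lunr: split on whitespace and punctuation
// first, then break each piece at lower-to-upper case boundaries.
function camelCaseTokens(text) {
  return text
    .split(/[\s\-.()\[\]+]+/)            // these separator characters are discarded
    .filter(Boolean)                      // drop empty pieces
    .flatMap(function (word) {
      // Zero-width split: nothing is consumed, so the capitals are kept.
      var parts = word.split(/(?<=[a-z0-9])(?=[A-Z])/);
      // Keep the whole word as well, so the full identifier still matches itself.
      return parts.length > 1 ? [word].concat(parts) : parts;
    })
    .map(function (token) { return token.toLowerCase(); });
}

console.log(camelCaseTokens("DeleteStreamAsync(someParams)"));
```

The lookbehind/lookahead pair only matches the position between a lowercase letter or digit and a capital, which is why no letters are lost. Emitting the whole identifier alongside its parts is a design choice made here for illustration; it roughly doubles the tokens for camel-case words.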
Thanks!
Hi! I am adding some custom regex to our tokenizer.separator:
lunr.tokenizer.separator = /[\s\-\.\(\)\[\]+\A-Z]/;
According to regex101.com, [\s\-\.\(\)\[\]+\A-Z] will match all the capital letters in DeleteStreamAsync. I would like it to split on those, so that, for example, "DeleteStream" will return results.
Everything else is matching fine: for example, the characters I added so that DeleteStreamAsync(someParams) would be returned when searching for DeleteStreamAsync. So it's splitting on ( at least. It just doesn't seem to like the A-Z.
Any suggestions/info much appreciated.
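A possible explanation, sketched in plain JavaScript rather than with lunr itself: a separator is something that gets thrown away, so even where A-Z matches, the capitals are removed instead of kept as the start of a word. Separately, and this is an assumption worth checking against the lunr version in use, the default tokenizer appears to lowercase the text before applying the separator, in which case A-Z would never match at all. String.prototype.split is used below only as a stand-in for the tokenizer's splitting step.

```javascript
// The separator from the post above, unchanged.
var separator = /[\s\-\.\(\)\[\]+\A-Z]/;
var text = "DeleteStreamAsync(someParams)";

// Splitting the original text: each capital is treated as a separator and dropped.
var splitOriginal = text.split(separator);

// Splitting after lowercasing (assumed to mirror what the tokenizer does):
// A-Z has nothing left to match, so only "(" and ")" split the text.
var splitLowercased = text.toLowerCase().split(separator);

console.log(splitOriginal);
console.log(splitLowercased);
```

Either way the capitals cannot act as word boundaries through the separator alone, which is consistent with the ( split working while A-Z seems to do nothing.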