How to restrict fuzzy search #18
Comments
Hi again @kaykhancheckpoint. The docs don't make this clear, but the current version of spaczz on PyPI (v0.1.1) still uses fuzzywuzzy instead of rapidfuzz. That will change in the next release (v0.2.0, which will also include the ent_id enhancement you asked for).

Regarding your current ask, however, fuzzywuzzy vs. rapidfuzz shouldn't be an issue. Spaczz's fuzzy matching optional kwargs already expose two fuzzy ratio cutoffs (see …). If you want a minimum fuzzy ratio of 95 like you're asking for, you can set it in your patterns:

    patterns = [
        {'label': 'PERSON', 'pattern': 'DrDisrespect', 'type': 'fuzzy'},
        {'label': 'PERSON', 'pattern': 'JZRyoutube', 'type': 'fuzzy', 'kwargs': [{'min_r2': 95}]}
    ]

If you wanted to change the default minimum fuzzy ratio for all fuzzy matches to 95, you could instantiate the SpaczzRuler as follows:

    ruler = SpaczzRuler(nlp, spaczz_fuzzy_defaults={'min_r2': 95})

Then you wouldn't have to add the kwargs to each pattern.

The current methods to optimize fuzzy matches in spaczz are available through the optional kwargs you can pass to patterns (the keyword arguments in …). There is more granular match filtering I would like to implement, but that is already part of issue #14 (I will provide more details in that issue soon).

Because that issue already exists and methods for solving this one are already implemented in spaczz, I'm going to close this issue. If you feel that you cannot solve your current problem with the methods I've outlined, and issue #14 will not address it, please let me know. Thanks.
Is it possible to restrict the fuzzy search? In my example it is returning unwanted entities.
The unwanted entity here is ('YouTube', 'PERSON'). Is there some way to restrict the fuzzy search so that it does not identify YouTube in the text as a person?
Full Code:
EDIT:
I noticed that the rapidfuzz library provides a score_cutoff parameter. I'm looking to set this to 95 so that matching is strict. I was hoping something like this could be exposed.
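For anyone wondering why a cutoff of 95 would drop the YouTube match, here is a minimal sketch of the idea. It is not spaczz's or rapidfuzz's implementation: it uses the standard library's difflib as a stand-in scorer, so the exact scores will differ from fuzzywuzzy or rapidfuzz. The helper names (ratio_with_cutoff, fuzzy_find) and the sample sentence are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def ratio_with_cutoff(a: str, b: str, score_cutoff: float = 0.0) -> float:
    """Return a 0-100 similarity score, or 0.0 if it falls below score_cutoff.

    Assumption: difflib's ratio stands in for the real fuzzy scorer.
    """
    score = 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return score if score >= score_cutoff else 0.0


def fuzzy_find(text: str, pattern: str, min_ratio: float = 75.0) -> List[Tuple[str, float]]:
    """Score each whitespace-separated token against the pattern and keep
    only tokens whose score is at least min_ratio."""
    hits = []
    for token in text.split():
        cleaned = token.strip(".,!?;:")  # drop trailing punctuation
        score = ratio_with_cutoff(cleaned, pattern, min_ratio)
        if score > 0:
            hits.append((cleaned, round(score, 1)))
    return hits


# Hypothetical sample text, not the text from the original report.
text = "JZRyoutube posted a new video on YouTube."

loose = fuzzy_find(text, "JZRyoutube", min_ratio=75)
strict = fuzzy_find(text, "JZRyoutube", min_ratio=95)

print("min_ratio=75:", loose)   # includes the partial match on "YouTube"
print("min_ratio=95:", strict)  # only the exact token survives
```

With a loose threshold, "YouTube" shares enough characters with "JZRyoutube" to pass. Raising the threshold to 95 leaves only the near-exact match, which is the behaviour the min_r2 setting above is meant to give.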