Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better handling of unexpected types in SetPredicate #11312

Merged
merged 12 commits into from
Sep 2, 2022

Conversation

shadeMe
Copy link
Contributor

@shadeMe shadeMe commented Aug 15, 2022

Description

The IN and NOT_IN set predicates in Matcher would raise an unhandled exception when matching against attribute values that are non-hashable (like lists). This is now resolved by performing some defensive type checking of the attribute values prior to calling the Python in operator.

Going forward, if any set predicate encounters a value of an unexpected type, it will fail early, i.e., return False, instead of crashing.

Types of change

Bug fix.

Checklist

  • I confirm that I have the right to submit this contribution under the project's MIT license.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

`SetPredicate`: Emit warning and return `False` on unexpected value types
@shadeMe shadeMe added bug Bugs and behaviour differing from documentation feat / matcher Feature: Token, phrase and dependency matcher labels Aug 15, 2022
@adrianeboyd
Copy link
Contributor

Thinking about this some more, I wonder if a warning is overkill and it would be better to just not match. If I remember correctly, it's somewhat difficult to filter this kind of warning that includes the values because filterwarnings treats each one as a separate warning message.

I do think it's definitely overkill to add a separate method just for the warning here.

@shadeMe
Copy link
Contributor Author

shadeMe commented Aug 16, 2022

Ah, good catch - That's an oversight from an older iteration of the implementation where the warning needed to be emitted at multiple locations. I can inline the call.

Thinking about this some more, I wonder if a warning is overkill and it would be better to just not match. If I remember correctly, it's somewhat difficult to filter this kind of warning that includes the values because filterwarnings treats each one as a separate warning message.

According to the docs, it seems like the message parameter passed to the filter is a regular expression. This ought to allow for filtering based on the warning code, which would be the same for permutations of the warning string.

But I too wonder if a warning is what we want, as this is something we should to enforce/implement consistently, i.e., emit the similar warnings about incompatible types in other predicates.

@svlandeg
Copy link
Member

According to the docs, it seems like the message parameter passed to the filter is a regular expression. This ought to allow for filtering based on the warning code, which would be the same for permutations of the warning string.

The regular expression is used by warnings to define the match, but if you filter with action once it will still show each and every single variation of that error code on the console once. (we battled with this before)

@shadeMe
Copy link
Contributor Author

shadeMe commented Aug 17, 2022

The regular expression is used by warnings to define the match, but if you filter with action once it will still show each and every single variation of that error code on the console once. (we battled with this #9438)

Ah, I see. I'll just remove the warning then - seems like it's more trouble than it's worth.

@shadeMe shadeMe mentioned this pull request Aug 19, 2022
3 tasks
spacy/tests/matcher/test_matcher_api.py Outdated Show resolved Hide resolved
spacy/tests/matcher/test_matcher_api.py Outdated Show resolved Hide resolved
spacy/tests/matcher/test_matcher_api.py Outdated Show resolved Hide resolved
@adrianeboyd adrianeboyd merged commit d1760eb into explosion:master Sep 2, 2022
@shadeMe shadeMe deleted the fix/matcher-ext-in-predicate branch September 2, 2022 07:57
polm pushed a commit to polm/spaCy that referenced this pull request Sep 6, 2022
* `Matcher`: Better type checking of values in `SetPredicate`
`SetPredicate`: Emit warning and return `False` on unexpected value types

* Rename `value_type_mismatch` variable

* Inline warning

* Remove unexpected type warning from `_SetPredicate`

* Ensure that `str` values are not interpreted as sequences
Check elements of sequence values for convertibility to `str` or `int`

* Add more `INTERSECT` and `IN` test cases

* Test for inputs with multiple characters

* Return `False` early instead of using a boolean flag

* Remove superfluous `int` check, parentheses

* Apply suggestions from code review

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Appy suggestions from code review

* Clarify test comment

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
@shadeMe shadeMe mentioned this pull request Sep 6, 2022
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bugs and behaviour differing from documentation feat / matcher Feature: Token, phrase and dependency matcher
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants