-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added rules for validating that tables have certain columns and for c… #1298
Conversation
0a398a5
to
a609965
Compare
ibis/expr/rules.py
Outdated
| Parameters | ||
| ---------- | ||
| satisfying : iterable | ||
| An iterable of column rules. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah crap, have to finish this docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. 👍
ibis/expr/rules.py
Outdated
| self.doc = doc | ||
| self.validator = validator | ||
|
|
||
| def _validate(self, args, i): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cpcloud I just realized that with the current implementation a table can pass validation simply by having enough columns to pass the rules. We can of course control the amount of columns with lambda rules, but this might be counterintuitive. Maybe we should have an argument like allow_extra, which changes the behavior for allowing columns that do not match a rule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented this.
ibis/expr/rules.py
Outdated
| An iterable of lambda expressions, which will be used to validate | ||
| arguments. Each lambda expression must take a table as its single | ||
| argument, validate, and then return ``True`` for tables that | ||
| pass validation and ``False`` otherwise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to have satisfying or schema (or both as you have them now)?
iow what if i don’t care about a specific schema
i could validate in satisfying (or do some general rule inside that)
|
I cleaned up the example a bit. Once tests pass I think this will be ready. |
ibis/expr/rules.py
Outdated
| validator : ??? | ||
| ??? | ||
| doc : ??? | ||
| ??? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cpcloud I couldn't figure what to do with these (the validator and doc args). I don't use them but stuff breaks if I don't set them to None. Maybe I could set them to None manually (without turning them into args), but first I want to make sure that it makes sense. Do you know how these are used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doc lets you give a docstring to the argument:
In [1]: class Foo(ibis.expr.operations.ValueOp):
...: input_type = [ibis.expr.rules.double(name='foo', doc='bar bar double')]
...:
In [2]: x = Foo(1)
In [3]: x.a?
Object `x.a` not found.
In [4]: x.foo?
Type: property
String form: <property object at 0x7fedee9b2278>
Docstring: bar bar doubleThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like validators let you pass in your own validation function, see line 239 in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Should we leave it? Can't think of any use case for it, if users want a custom validator they should just write one, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bump @cpcloud
ibis/expr/rules.py
Outdated
| arguments. Each lambda expression must take a table as its single | ||
| argument, validate, and then return ``True`` for tables that | ||
| pass validation and ``False`` otherwise. | ||
| schema : iterable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mark as optional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
ibis/expr/rules.py
Outdated
| The name of the table argument. | ||
| optional : bool | ||
| Whether this table argument is optional or not. | ||
| satisfying : iterable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mark as optional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
ibis/expr/tests/test_rules.py
Outdated
| MyOp(123) | ||
|
|
||
|
|
||
| def test_table_invalid_satisfying(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a test that doesn't pass either satisfying or schema
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There were already some tests, but I just added more.
e2e9ff7
to
42f292c
Compare
ibis/expr/rules.py
Outdated
| # Check column schema | ||
| rules_matched = 0 | ||
| for column_rule in self.schema: | ||
| if not isinstance(column_rule, Argument): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add some comments on the various sections on what is going on / what you are checking
ibis/expr/rules.py
Outdated
| ... lambda t: t.schema().types.count(dt.Int64()) >= 2], | ||
| ... )] | ||
| ... output_type = rules.type_of_arg(0) | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a Notes section about satisfying, schema, allow_extra can be mixed / matched to produce a validator and not all are needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
| Op(table) | ||
|
|
||
|
|
||
| @pytest.mark.parametrize( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would separate the tests that raise from the ones that don't and don't use check_op_input at all. Much more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that and it was a mess, because the same schemas would be repeated twice, once for tests passing validation and those not passing.
I think it is waaay clearer to say "given this schema, here are things that pass and things that don't pass".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would they be repeated? There's a set of schemas for passing, and one for failing. This should definitely be separated into two tests: test_table_with_schema and test_table_with_invalid_schema. The True/False is hard to follow. If the name of the function indicates what's supposed to happen, then it's very clear what the test is testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, didn't explain this right. Indeed, the schemas for the tables being validated would not be repeated. Bu the schema we are comparing against will be repeated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would end up looking as:
@pytest.mark.parametrize(
"... many passing test cases here ...")
def test_table_with_schema():
"<validation code with a schema>"
@pytest.mark.parametrize(
"... failing test cases here ...")
def test_table_with_schema_failing():
with pytest.raises(IbisTypeError):
"<same validation code with the same schema>"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eh, did it like that. Its not so bad.
|
@jreback bump |
|
lgtm. needs a rebase. if @cpcloud is ok. |
|
I don't understand why the complexity of this is needed. As a possible strawman, what's wrong with doing the validation of the table in the operation itself, like we currently do with joins? Column rules are convenient because most operations are column operations and there are a ton of them, but there are very few operations that take tables as input and if they do, they are (and should be) unrestricted. Sure, we may want to write more operations that takes tables as input, but there aren't so many that having a generic table rule this complex is warranted. How about this? We define the following classmethod on the and that's it. I think we can get pretty far without things like I don't think there are any operations (yet) where something like Sound good? |
|
i agree this is more complex than needed atm a table that for example
|
|
I think the use case you have in mind for the third case--one or more double columns--can be solved by giving it a different API than one that would require such a rule. The |
|
The API with this pared down version would be: rules.table.with_column_subset([
rules.column(name='foo', value_type=dt.timestamp),
rules.column(value_type=int32)
])If what you meant by an int32 column was "exactly one int32 column", that would be another method |
|
the key is i need a conjunction of all of these in a particular table ; separately they r straightforward |
|
Having |
…olumn type checking.
…olumnValidator instance.
9a21f1a
to
5aa137f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. some minor comments
ci/requirements-dev-3.6.yml
Outdated
| @@ -26,7 +26,7 @@ dependencies: | |||
| - requests | |||
| - six | |||
| - sqlalchemy>=1.0.0,<1.1.15 | |||
| - thrift | |||
| - thrift>=0.10.0,<0.11.0 | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this orthogonal change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually @DiegoAlbertoTorres something is strange, since @jreback's code is passing with this commit: 60c5806. You shouldn't need to update this since he didn't have to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, tests were refusing to pass (and still are).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got rid of the pin.
ibis/expr/tests/test_rules.py
Outdated
| ([('group', dt.int64), | ||
| ('value', dt.double), | ||
| ('value2', dt.double)], False)]) | ||
| def test_table_with_schema(schema, raises): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you have an example of using with_coumn_subset with a rules that is NOT a column? (should raise)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just added one.
|
CI passes!! |
ibis/expr/rules.py
Outdated
|
|
||
| def _validate(self, args, i): | ||
| arg = args[i] | ||
| if isinstance(arg, ir.TableExpr): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let raise if it's not a table as the first thing, then we can remove a level of indentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do this now, I think you commented on an outdated diff?
ibis/expr/rules.py
Outdated
| if not self.allow_extra: | ||
| # Count rules | ||
| rules = [x for x in self.schema if isinstance(x, Argument)] | ||
| if ((len(rules) != 0) and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just say if rules here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Outdated diff, this is gone!
ibis/expr/rules.py
Outdated
| column = arg[column_rule.name] | ||
| except IbisTypeError: | ||
| if column_rule.optional: | ||
| break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow the logic of using break here. This will exit the loop. Don't we want to check additional column rules that may not be optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is gone too
ibis/expr/rules.py
Outdated
| rules = [x for x in self.schema if isinstance(x, Argument)] | ||
| if ((len(rules) != 0) and | ||
| (len(arg.columns) > rules_matched)): | ||
| raise IbisTypeError('Extra columns not allowed.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about "Additional columns beyond {} not allowed".format(list of names)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And this
ci/requirements-dev-3.6.yml
Outdated
| @@ -26,7 +26,7 @@ dependencies: | |||
| - requests | |||
| - six | |||
| - sqlalchemy>=1.0.0,<1.1.15 | |||
| - thrift | |||
| - thrift>=0.10.0,<0.11.0 | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually @DiegoAlbertoTorres something is strange, since @jreback's code is passing with this commit: 60c5806. You shouldn't need to update this since he didn't have to.
ibis/expr/rules.py
Outdated
| def __str__(self): | ||
| fields = {'name', 'doc', 'optional'} | ||
| return str({k: v for k, v in self.__dict__.items() | ||
| if (k in fields) and (v is not None)}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need parens here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
| @@ -801,3 +807,120 @@ def _validate(self, args, i): | |||
|
|
|||
| def comparable(left, right): | |||
| return ir.castable(left, right) or ir.castable(right, left) | |||
|
|
|||
|
|
|||
| @six.add_metaclass(abc.ABCMeta) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do this as:
class TableColumnValidator(six.with_metaclass(object, abc.ABCMeta)):
...There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed this like we said in person.
| Op(table) | ||
|
|
||
|
|
||
| @pytest.mark.parametrize( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would they be repeated? There's a set of schemas for passing, and one for failing. This should definitely be separated into two tests: test_table_with_schema and test_table_with_invalid_schema. The True/False is hard to follow. If the name of the function indicates what's supposed to happen, then it's very clear what the test is testing.
ibis/expr/rules.py
Outdated
| if column_rule.optional: | ||
| break | ||
| else: | ||
| raise IbisTypeError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole thing can be written much more clearly:
if not column_rule.optional and column_rule.name not in arg:
raise IbisTypeError(...)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still have to skip if optional. Will do:
if column_rule.name not in arg:
if column_rule.optional:
continue
else:
raise IbisTypeError(
'No column with name {}.'.format(column_rule.name))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did that ^
ibis/expr/rules.py
Outdated
| # Check that columns match the schema first | ||
| for column_rule in self.rules: | ||
| # Members of a schema are arguments with a name | ||
| if not isinstance(column_rule, Argument): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a thing that makes stuff into Argument instances somewhere so you shouldn't have to do this. It's very hard to get to the meat of the code here because of all the checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean _to_argument? It seems like most argument subclasses call _to_argument directly, so we wouldn't even get it for free if I made SubsetValidator a child of Argument. Also, we always want to take an Argument here (can't think of anything we may want to upcast), so I think it is a good idea to check it explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I agree it is hard to get to the meat of the code with all the validation. I will split the rule checking from the table validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I made a new function to validate rules. Also it is now called in the constructor, not during validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I think this is the only outstanding-ish comment).
ibis/expr/rules.py
Outdated
| ... output_type = rules.type_of_arg(0) | ||
| """ | ||
| def __init__(self, name=None, optional=False, schema=None, doc=None, | ||
| validator=None, allow_extra=False, **arg_kwds): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need allow_extra? I thought we had decided to get rid of that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:( forgot to remove it here srry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed it. 👍
|
@DiegoAlbertoTorres Merged! Thanks! |
…olumn type checking.