-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rules refactor #1366
Rules refactor #1366
Conversation
2eb37b7
to
8a19a88
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking pretty good so far. We need to avoid any unnecessary breakages, so that downstream users that have extended the library outside of upstream don't experience unnecessary breakage.
ibis/bigquery/client.py
Outdated
| def bq_param_integer(param, value): | ||
| return bq.ScalarQueryParameter(param._name, 'INT64', value) | ||
|
|
||
|
|
||
| @bigquery_param.register(ir.DoubleScalar, float) | ||
| @bigquery_param.register(ir.FloatingScalar, float) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure there are tests for these
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eg passing in the different int and float subclasses
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed those subclasses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you remove them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because they are only required to be able to bind methods on them (otherwise evenScalarExpr(DataType) and ColumnExpr(DataType) would be enough).
The actual work is done by the datatype passed to ValueExpr as an argument. Also see DataType.scalar and DataType.column class properties.
ibis/expr/analysis.py
Outdated
|
|
||
|
|
||
| def is_reduction(expr): | ||
| # Aggregations yield typed scalar expressions, since the result of an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's put this in the docstring
ibis/expr/datatypes.py
Outdated
| @@ -348,12 +360,16 @@ def __str__(self): | |||
| def _equal_part(self, other, cache=None): | |||
| return self.precision == other.precision and self.scale == other.scale | |||
|
|
|||
| def largest(self): | |||
| return Decimal(self.precision, 38) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A scale of 38 implies a precision of 38, because precision = scale + number of digits to the left of the decimal point so unless self.precision == 38 this type is impossible.
There's also the issue of what you mean by largest here. The largest decimal value that can be constructed can only be constructed with a decimal type of decimal(38, 0) and that's the integer with 9 repeated 38 times. Also, we shouldn't assume 38 digits of precision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ported from rules, used at the following places:
Decimal largest is clearly wrong, but what should I use instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we define largest for a given scale? In that case the definition of largest becomes Decimal(38, self.scale)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's assume 38 digits of precision for now. If we have users that desire more precision than that we can revisit in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've got confused https://github.com/ibis-project/ibis/blob/master/ibis/expr/tests/test_decimal.py#L39 :)
ibis/expr/datatypes.py
Outdated
| @@ -559,6 +584,9 @@ def _equal_part(self, other, cache=None): | |||
| any = Any() | |||
| null = Null() | |||
| boolean = Boolean() | |||
| integer = Integer() | |||
| floating = Floating() | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does it mean for a user to specify this as the type of a column (exposing them this way implies that they are part of the user-facing API). If it does have a meaning, are there tests for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They were required previously, I'll drop them.
ibis/expr/datatypes.py
Outdated
| @@ -559,6 +584,9 @@ def _equal_part(self, other, cache=None): | |||
| any = Any() | |||
| null = Null() | |||
| boolean = Boolean() | |||
| integer = Integer() | |||
| floating = Floating() | |||
| decimal = Decimal(12, 2) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why 12, 2 as the precision and scale, respectively?
ibis/expr/operations.py
Outdated
| class OperationMeta(type): | ||
|
|
||
| @classmethod | ||
| def __prepare__(metacls, name, bases, **kwds): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't exist on Python 2.7, so we'll need to find another way of writing this class. We may have a module that we import it from depending on the version of python so that we can continue to use __prepare__ in Python >=3
| return False | ||
|
|
||
| def flat_args(self): | ||
| for arg in self.args: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like we have a flatten function laying around somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There isn't one, should it go to ibis.util?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's leave this here for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One wrinkle here is that this only flattens one nesting level. So a general flatten function would probably break some things. Let's revisit flatten after this PR is merged.
ibis/expr/operations.py
Outdated
| if (self, other) in cache: | ||
| return cache[(self, other)] | ||
|
|
||
| if id(self) == id(other): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be changed to self is other.
ibis/expr/operations.py
Outdated
| HasSchema.__init__(self, schema, name=name) | ||
| name = rlz.instanceof(six.string_types) | ||
| schema = rlz.schema | ||
| source = rlz.noop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These noops should be instances of ibis.client.Client, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there are a lot of noops currently which we should eventually configure.
ibis/expr/operations.py
Outdated
| TableNode.__init__(self, [schema, name]) | ||
| HasSchema.__init__(self, schema, name=name) | ||
| schema = rlz.schema | ||
| name = rlz.optional(rlz.instanceof(six.string_types), default=genname) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you document that callables are allowed for defaults if you haven't already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
ibis/expr/analytics.py
Outdated
| self.buckets = buckets | ||
| self.closed = _validate_closed(closed) | ||
| __slots__ = ('arg', 'buckets', 'closed', 'close_extreme', 'include_under', | ||
| 'include_over') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cpcloud I've added slots for each operations to support argument order on py2 too. While this is a bit more boilerplate, I wouldn't consider bad.
I can use plain __init__ methods as well as the counter trick we've discussed, in case of the latter we need to enforce calling the rules.
Which approach do You prefer?
a. __slot__ or argnames or ...?
b. counter trick and rlz.any() or rlz.value(dt.any)
c. plain but idiomatic __init__ functions*?
- which is not not that bad, but
op.argswould require another order definition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is another one:
class ArraySlice(ValueOp):
arg = Argument(rlz.value(dt.Array(dt.any)))
start = Argument(rlz.integer)
stop = Argument(rlz.integer, default=None)
output_type = rlz.typeof('arg')There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on gitter conversation we ended up with the following API:
class StringFind(ValueOp):
arg = Arg(rlz.string)
start = Arg(rlz.integer, default=None)
end = Arg(rlz.integer, default=None)
return_ = Return(like=arg, dtype=dt.int64)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will alse make possible to support operation definition with python typing.
| MapValue, MapScalar, MapColumn, | ||
| StructValue, StructScalar, StructColumn, | ||
| CategoryValue, unnamed, as_value_expr, literal, | ||
| param, null, sequence) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These were not exposed in __all__, so I've removed them.
| from ibis.compat import PY2, to_time, to_date | ||
| from ibis.expr.types import Expr, null, param, literal, sequence, as_value_expr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might factor out null, param, literal, sequence, as_value_expr to here, but we can do that incrementally :)
| def castable(self, target, **kwargs): | ||
| return castable(self, target, **kwargs) | ||
|
|
||
| def cast(self, target, **kwargs): | ||
| return cast(self, target, **kwargs) | ||
|
|
||
| def scalar_type(self): | ||
| import ibis.expr.types as ir | ||
| return getattr(ir, '{}Scalar'.format(self.name)) | ||
| return partial(self.scalar, dtype=self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cpcloud These are actually "factories" not types. Are You OK with these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's fine with me.
| case_expr = as_value_expr(case_expr) | ||
| result_expr = as_value_expr(result_expr) | ||
| case_expr = ir.as_value_expr(case_expr) | ||
| result_expr = ir.as_value_expr(result_expr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should get rid of as_value_expr too, all the value coercions should be handled by rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely agree with that.
ibis/expr/types.py
Outdated
| @@ -915,502 +567,239 @@ def group_by(self, by=None, **additional_grouping_expressions): | |||
| # with: an instance of each is well-typed and includes all valid methods | |||
| # defined for each type. | |||
|
|
|||
| # TODO: __slots__? | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Define or not to define? Use a metaclass?
| self.include_under = bool(include_under) | ||
|
|
||
| if not len(buckets): | ||
| arg = Arg(rlz.noop) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, Arg(rlz.noop) is equivalent to (the old) rules.value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, noop is toolz.identity, just a placeholder where no rule was defined previously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. LGTM.
| import ibis.expr.analysis as _L | ||
| import ibis.expr.datatypes as dt | ||
| import ibis.expr.analytics as _analytics | ||
| import ibis.expr.operations as ops |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason to change the _ops and _com imports? The changes in this file are rather noisy because it looks like most of them just renaming _ops to ops and _com to com. The reason these are prefixed with an underscore is to hide them from a star import.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Api.py has __all__ defined - which protects against star import, and I thought ops is cleaner.
ibis/expr/datatypes.py
Outdated
| @@ -239,11 +230,16 @@ def __repr__(self): | |||
|
|
|||
|
|
|||
| class SignedInteger(Integer): | |||
| pass | |||
|
|
|||
| def largest(self): | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to make these @propertys?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No particular. When should I prefer a property over a method?
ibis/expr/datatypes.py
Outdated
| @@ -1143,6 +1188,12 @@ def can_cast_floats(source, target, upcast=False, **kwargs): | |||
| return True | |||
|
|
|||
|
|
|||
| @castable.register(Decimal, Decimal) | |||
| def cas_cast_decimals(source, target, **kwargs): | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be can_cast_decimals?
ibis/expr/signature.py
Outdated
| @six.add_metaclass(AnnotableMeta) | ||
| class Annotable(object): | ||
|
|
||
| __slots__ = tuple() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can just be __slots__ = ().
ibis/expr/signature.py
Outdated
| self.default == other.default | ||
| ) | ||
|
|
||
| def optional(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a @property?
| if self.default is None: | ||
| return None | ||
| elif util.is_function(self.default): | ||
| value = self.default() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't aware that we support default argument values that are functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is for table genname:
class UnboundTable(PhysicalTable):
schema = Arg(sch.Schema)
name = Arg(six.string_types, default=genname)I can remove it though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, let's leave it. It's fine for now.
| elif name in kwargs: | ||
| value = argument.validate(kwargs[name], name=name) | ||
| else: | ||
| value = argument.validate(name=name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it might be possible to bind arguments using inspect.signature here so we don't have to recreate Python's calling convention algorithm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. Isn't inspect.signature py3 only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, but there's the funcsigs package for that :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to add funcsigs as a dependency for #1277, so don't worry about adding it here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Converted to issue: #1396
| @@ -182,22 +176,6 @@ def test_primitive(spec, expected): | |||
| assert dt.dtype(spec) == expected | |||
|
|
|||
|
|
|||
| def test_precedence_with_no_arguments(): | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you move these tests to another file?
| schema = query.schema() | ||
|
|
||
| # clickhouse columns has been defined as non-nullable | ||
| # whereas other backends don't support non-nullable columns yet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe Impala supports this, and possiblty other backends.
| ir.Node.__init__(self, [foreign_table, predicates]) | ||
| class NotExistsSubquery(ops.Node): | ||
| foreign_table = Arg(rlz.noop) | ||
| predicates = Arg(rlz.noop) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice to see this finally getting rolled into the rules system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to revisit all Args with rlz.noop :)
| r = find_base_table(arg) | ||
| if isinstance(r, TableExpr): | ||
| return r | ||
| unnamed = UnnamedMarker() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember deleting this. Am I mistaken? If not, why bring it back?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, it's present in the master
| @@ -1288,7 +1289,7 @@ def g(x): | |||
| def test_pickle_table_expr(): | |||
| schema = [('time', 'timestamp'), ('key', 'string'), ('value', 'double')] | |||
| t0 = ibis.table(schema, name='t0') | |||
| raw = pickle.dumps(t0) | |||
| raw = pickle.dumps(t0, protocol=2) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you passing a specific protocol version here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On py27 serializeing object with __slots__ requires protocol 2.
| arg = Argument(lambda x: x) | ||
|
|
||
| class StringOp(Op): | ||
| arg = Argument(str) # inherit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be Arg instead of Argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In signature.py it's defined as Argument and aliased as Arg during import. Do You mean renaming in signature.py?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. No, leaving it here as Argument is fine.
|
|
||
| def _repr(self, memo=None): | ||
| if memo is None: | ||
| from ibis.expr.format import FormatMemo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably ok to just import in the function, this is a little strange.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are circular imports.
| cache[(self, other)] = True | ||
| return True | ||
|
|
||
| def is_ancestor(self, other): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this being used anywhere? It seems like it's really not that different from just a straight-up call to equals.
| else: | ||
| return rules.shape_like(arg, 'int64') | ||
| """Absolute value""" | ||
| output_type = rlz.typeof('arg') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you not pass arg directly here? It is in scope after all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
arg is actually an Argument instance without any knowledge of the underlying data, also doesn't know its name until the metaclass finishes signature construction.
A solution would be to set the argument's name in the metaclass or use descriptors and _sunder slots.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually You could give me a couple of ideas how should we abstract return_ definition.
|
|
||
|
|
||
| class Capitalize(StringUnaryOp): | ||
| pass | ||
| """Return a capitalized version of input string""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the documentation here.
ibis/expr/operations.py
Outdated
|
|
||
|
|
||
| class Union(TableNode, HasSchema): | ||
| left = Arg(rlz.noop) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably be TableExpr here, but leave it as is if making that change doesn't trivially work.
|
Bombs away! |
No description provided.