Rules refactor #1366

Closed
wants to merge 79 commits into
base: master
from

2 participants
@kszucs
Member

kszucs commented Feb 24, 2018

No description provided.

@kszucs kszucs changed the title from Prainbow to [WIP] Rules refactor Feb 24, 2018

@kszucs kszucs force-pushed the kszucs:prainbow branch 2 times, most recently from 2eb37b7 to 8a19a88 Feb 26, 2018

@cpcloud

This is looking pretty good so far. We need to avoid unnecessary breakage for downstream users who have extended the library outside of upstream.

def bq_param_integer(param, value):
    return bq.ScalarQueryParameter(param._name, 'INT64', value)
@bigquery_param.register(ir.DoubleScalar, float)
@bigquery_param.register(ir.FloatingScalar, float)

@cpcloud

cpcloud Feb 26, 2018

Member

Make sure there are tests for these

@cpcloud

cpcloud Feb 26, 2018

Member

E.g., passing in the different int and float subclasses.

@kszucs

kszucs Mar 1, 2018

Member

I've removed those subclasses.

@cpcloud

cpcloud Mar 11, 2018

Member

Why did you remove them?

@kszucs

kszucs Mar 11, 2018

Member

Because they are only required to be able to bind methods on them (otherwise even ScalarExpr(DataType) and ColumnExpr(DataType) would be enough).

The actual work is done by the datatype passed to ValueExpr as an argument. Also see DataType.scalar and DataType.column class properties.

def is_reduction(expr):
# Aggregations yield typed scalar expressions, since the result of an

@cpcloud

cpcloud Feb 26, 2018

Member

Let's put this in the docstring

@@ -348,12 +360,16 @@ def __str__(self):
def _equal_part(self, other, cache=None):
    return self.precision == other.precision and self.scale == other.scale

def largest(self):
    return Decimal(self.precision, 38)

@cpcloud

cpcloud Feb 26, 2018

Member

A scale of 38 implies a precision of 38, because precision = scale + number of digits to the left of the decimal point so unless self.precision == 38 this type is impossible.

There's also the issue of what you mean by largest here. The largest decimal value that can be constructed can only be constructed with a decimal type of decimal(38, 0) and that's the integer with 9 repeated 38 times. Also, we shouldn't assume 38 digits of precision.

@kszucs

kszucs Feb 28, 2018

Member

Ported from rules, used at the following places:

Decimal largest is clearly wrong, but what should I use instead?

@cpcloud

cpcloud Mar 11, 2018

Member

How about we define largest for a given scale? In that case the definition of largest becomes Decimal(38, self.scale)

@cpcloud

cpcloud Mar 11, 2018

Member

Let's assume 38 digits of precision for now. If we have users that desire more precision than that we can revisit in the future.
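
To make the precision/scale relationship discussed in this thread concrete, here is a minimal sketch using Python's standard decimal module. The helper name largest_decimal_value is hypothetical and not part of ibis; it assumes that precision counts all digits and scale counts the digits to the right of the decimal point.

```python
from decimal import Decimal


def largest_decimal_value(precision, scale):
    """Largest value representable as decimal(precision, scale).

    Hypothetical helper for illustration only; not an ibis API.
    """
    if precision < 1 or not 0 <= scale <= precision:
        raise ValueError('need precision >= 1 and 0 <= scale <= precision')
    # Every one of the `precision` digits is a 9; the exponent -scale
    # moves the last `scale` digits behind the decimal point. Building
    # the value from a tuple avoids rounding to the context precision.
    return Decimal((0, (9,) * precision, -scale))


print(largest_decimal_value(5, 2))    # three integer digits, two fractional
print(largest_decimal_value(38, 0))   # the integer with 9 repeated 38 times
print(largest_decimal_value(38, 38))  # no integer digits left, so below 1
```

Under these assumptions, decimal(38, 38) leaves no digits to the left of the decimal point, which is why a scale of 38 only makes sense when the precision is also 38.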

@@ -559,6 +584,9 @@ def _equal_part(self, other, cache=None):
any = Any()
null = Null()
boolean = Boolean()
integer = Integer()
floating = Floating()

@cpcloud

cpcloud Feb 26, 2018

Member

What does it mean for a user to specify this as the type of a column? Exposing them this way implies that they are part of the user-facing API. If it does have a meaning, are there tests for it?

@kszucs

kszucs Feb 26, 2018

Member

They were required previously, I'll drop them.

@@ -559,6 +584,9 @@ def _equal_part(self, other, cache=None):
any = Any()
null = Null()
boolean = Boolean()
integer = Integer()
floating = Floating()
decimal = Decimal(12, 2)

@cpcloud

cpcloud Feb 26, 2018

Member

Why 12, 2 as the precision and scale, respectively?

class OperationMeta(type):

    @classmethod
    def __prepare__(metacls, name, bases, **kwds):

@cpcloud

cpcloud Feb 26, 2018

Member

This doesn't exist on Python 2.7, so we'll need to find another way of writing this class. We may have a module that we import it from depending on the version of python so that we can continue to use __prepare__ in Python >=3
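
For context, a rough sketch of what `__prepare__` provides on Python 3: the metaclass can hand back an ordered namespace, so the order in which attributes are defined in the class body is available when the class is built. The names OrderedMeta, ArraySliceLike and _argnames are illustrative assumptions, not the code in this PR. Python 2.7 never calls this hook, so the same ordering has to come from somewhere else there.

```python
from collections import OrderedDict


class OrderedMeta(type):
    """Illustrative metaclass; not the PR's OperationMeta."""

    @classmethod
    def __prepare__(metacls, name, bases, **kwds):
        # Python 3 only: the class body executes inside this mapping.
        return OrderedDict()

    def __new__(metacls, name, bases, namespace, **kwds):
        cls = super().__new__(metacls, name, bases, dict(namespace))
        # Record public attribute names in definition order.
        cls._argnames = tuple(
            key for key in namespace if not key.startswith('_')
        )
        return cls


class ArraySliceLike(metaclass=OrderedMeta):
    arg = 'array'
    start = 'integer'
    stop = 'integer'


print(ArraySliceLike._argnames)
```

On Python 3.6 and later the class namespace is already ordered by default, so the explicit OrderedDict mainly matters for earlier Python 3 releases.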

return False
def flat_args(self):
for arg in self.args:

@cpcloud

cpcloud Feb 26, 2018

Member

I feel like we have a flatten function lying around somewhere.

@kszucs

kszucs Feb 28, 2018

Member

There isn't one, should it go to ibis.util?

@cpcloud

cpcloud Mar 11, 2018

Member

Let's leave this here for now.

@cpcloud

cpcloud Mar 11, 2018

Member

One wrinkle here is that this only flattens one nesting level. So a general flatten function would probably break some things. Let's revisit flatten after this PR is merged.
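
The difference between flattening one nesting level and flattening fully can be sketched as follows. Both function names are hypothetical stand-ins, and treating only lists and tuples as nested containers is an assumption made for this example.

```python
def flatten_one_level(args):
    """Expand lists and tuples found directly in args, but no deeper."""
    for arg in args:
        if isinstance(arg, (list, tuple)):
            for item in arg:
                yield item
        else:
            yield arg


def flatten_deep(args):
    """Recursively expand lists and tuples at every nesting level."""
    for arg in args:
        if isinstance(arg, (list, tuple)):
            for item in flatten_deep(arg):
                yield item
        else:
            yield arg


args = ['a', ['b', ['c', 'd']], 'e']
print(list(flatten_one_level(args)))  # inner ['c', 'd'] stays a list
print(list(flatten_deep(args)))       # everything is expanded
```

Code that relies on the inner list surviving would see different results from a general flatten, which is the breakage risk mentioned above.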

if (self, other) in cache:
    return cache[(self, other)]

if id(self) == id(other):

@cpcloud

cpcloud Feb 26, 2018

Member

This can be changed to self is other.

HasSchema.__init__(self, schema, name=name)
name = rlz.instanceof(six.string_types)
schema = rlz.schema
source = rlz.noop

@cpcloud

cpcloud Feb 26, 2018

Member

These noops should be instances of ibis.client.Client, right?

@kszucs

kszucs Feb 26, 2018

Member

Yes, there are a lot of noops currently which we should eventually configure.

TableNode.__init__(self, [schema, name])
HasSchema.__init__(self, schema, name=name)
schema = rlz.schema
name = rlz.optional(rlz.instanceof(six.string_types), default=genname)

@cpcloud

cpcloud Feb 26, 2018

Member

Can you document that callables are allowed for defaults if you haven't already?

@kszucs

kszucs Mar 1, 2018

Member

Done.

self.buckets = buckets
self.closed = _validate_closed(closed)
__slots__ = ('arg', 'buckets', 'closed', 'close_extreme', 'include_under',
'include_over')

@kszucs

kszucs Feb 28, 2018

Member

@cpcloud I've added slots for each operation to support argument order on py2 too. While this is a bit more boilerplate, I wouldn't consider it bad.
I can use plain __init__ methods as well as the counter trick we've discussed; in the latter case we need to enforce calling the rules.

Which approach do you prefer?
a. __slots__ or argnames or ...?
b. counter trick and rlz.any() or rlz.value(dt.any)
c. plain but idiomatic __init__ functions*?

* which is not that bad, but op.args would require another order definition
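
One way the "counter trick" from option b could look is sketched below: every argument object takes a number from a shared counter when it is created, and the metaclass sorts by that number, so the order does not depend on `__prepare__` and also works on Python 2. Arg, OpMeta, Base and Bucket are hypothetical names for this example, and inheriting arguments from base classes is deliberately left out.

```python
import itertools


class Arg(object):
    """Hypothetical argument marker that remembers its creation order."""

    _counter = itertools.count()

    def __init__(self, rule):
        self.rule = rule
        self.order = next(Arg._counter)


class OpMeta(type):
    def __new__(metacls, name, bases, namespace):
        # The namespace may be an unordered dict (Python 2), so restore
        # the definition order from the creation counter instead.
        args = [(k, v) for k, v in namespace.items() if isinstance(v, Arg)]
        args.sort(key=lambda pair: pair[1].order)
        namespace = dict(namespace)
        namespace['argnames'] = tuple(k for k, _ in args)
        return super(OpMeta, metacls).__new__(metacls, name, bases, namespace)


# Creating the base through the metaclass call works on Python 2 and 3.
Base = OpMeta('Base', (object,), {})


class Bucket(Base):
    closed = Arg(str)
    arg = Arg(object)
    buckets = Arg(list)


print(Bucket.argnames)
```

The catch noted in the comment above still applies: the order is only recorded if every argument is wrapped in the marker class, so bare rules would have to be rejected.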

@kszucs

kszucs Mar 7, 2018

Member

Here is another one:

class ArraySlice(ValueOp):
    arg = Argument(rlz.value(dt.Array(dt.any)))
    start = Argument(rlz.integer)
    stop = Argument(rlz.integer, default=None)
    output_type = rlz.typeof('arg')

@kszucs

kszucs Mar 7, 2018

Member

Based on gitter conversation we ended up with the following API:

class StringFind(ValueOp):
    arg = Arg(rlz.string)
    start = Arg(rlz.integer, default=None)
    end = Arg(rlz.integer, default=None)
    return_ = Return(like=arg, dtype=dt.int64)

@kszucs

kszucs Mar 7, 2018

Member

This will also make it possible to support operation definitions with Python typing.

MapValue, MapScalar, MapColumn,
StructValue, StructScalar, StructColumn,
CategoryValue, unnamed, as_value_expr, literal,
param, null, sequence)

@kszucs

kszucs Feb 28, 2018

Member

These were not exposed in __all__, so I've removed them.

from ibis.compat import PY2, to_time, to_date
from ibis.expr.types import Expr, null, param, literal, sequence, as_value_expr

@kszucs

kszucs Feb 28, 2018

Member

We might factor out null, param, literal, sequence, as_value_expr to here, but we can do that incrementally :)

def castable(self, target, **kwargs):
return castable(self, target, **kwargs)
def cast(self, target, **kwargs):
return cast(self, target, **kwargs)
def scalar_type(self):
import ibis.expr.types as ir
return getattr(ir, '{}Scalar'.format(self.name))
return partial(self.scalar, dtype=self)

@kszucs

kszucs Feb 28, 2018

Member

@cpcloud These are actually "factories" not types. Are You OK with these?

@cpcloud

cpcloud Mar 11, 2018

Member

Yes, that's fine with me.
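
The "factory, not type" distinction can be illustrated with functools.partial, mirroring the `partial(self.scalar, dtype=self)` line in the diff. ScalarExpr and Int64 below are simplified stand-ins written for this example, not the real ibis classes.

```python
from functools import partial


class ScalarExpr(object):
    """Simplified stand-in for an expression class."""

    def __init__(self, op, dtype):
        self.op = op
        self.dtype = dtype


class Int64(object):
    """Simplified stand-in for a datatype."""

    # Expression class used to build scalars of this datatype.
    scalar = ScalarExpr

    def scalar_type(self):
        # Returns a callable with dtype pre-bound: a factory, not a class.
        return partial(self.scalar, dtype=self)


dtype = Int64()
factory = dtype.scalar_type()
expr = factory('some-operation')
print(type(expr).__name__, expr.dtype is dtype)
```

Because the result is a partial object, checks such as isinstance(x, dtype.scalar_type()) would not work; callers can only call it to build expressions.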

case_expr = as_value_expr(case_expr)
result_expr = as_value_expr(result_expr)
case_expr = ir.as_value_expr(case_expr)
result_expr = ir.as_value_expr(result_expr)

@kszucs

kszucs Feb 28, 2018

Member

I think we should get rid of as_value_expr too, all the value coercions should be handled by rules.

@cpcloud

cpcloud Mar 11, 2018

Member

I definitely agree with that.

@@ -915,502 +567,239 @@ def group_by(self, by=None, **additional_grouping_expressions):
# with: an instance of each is well-typed and includes all valid methods
# defined for each type.
# TODO: __slots__?

@kszucs

kszucs Feb 28, 2018

Member

Define or not to define? Use a metaclass?

self.include_under = bool(include_under)
if not len(buckets):
arg = Arg(rlz.noop)

@cpcloud

cpcloud Mar 11, 2018

Member

If I understand correctly, Arg(rlz.noop) is equivalent to (the old) rules.value?

@kszucs

kszucs Mar 11, 2018

Member

No, noop is toolz.identity, just a placeholder where no rule was defined previously.

@cpcloud

cpcloud Mar 11, 2018

Member

I see. LGTM.
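
To illustrate the distinction drawn in this thread, here is a small sketch of rules as plain callables: a placeholder that returns its input unchanged next to a rule that actually validates. The noop below is defined locally to keep the example dependency-free (the thread says the real one is toolz.identity), and instanceof is a simplified assumption of what such a rule might do.

```python
def noop(value):
    """Placeholder rule: accepts anything and returns it unchanged."""
    return value


def instanceof(klass):
    """Build a rule that only accepts instances of klass."""
    def rule(value):
        if not isinstance(value, klass):
            raise TypeError(
                'expected {}, got {}'.format(klass.__name__,
                                             type(value).__name__)
            )
        return value
    return rule


anything = object()
print(noop(anything) is anything)   # no validation, no coercion
print(instanceof(str)('t0'))        # passes validation
try:
    instanceof(str)(42)
except TypeError as exc:
    print('rejected:', exc)
```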

import ibis.expr.analysis as _L
import ibis.expr.datatypes as dt
import ibis.expr.analytics as _analytics
import ibis.expr.operations as ops

@cpcloud

cpcloud Mar 11, 2018

Member

Is there any reason to change the _ops and _com imports? The changes in this file are rather noisy because it looks like most of them just renaming _ops to ops and _com to com. The reason these are prefixed with an underscore is to hide them from a star import.

@kszucs

kszucs Mar 11, 2018

Member

api.py has __all__ defined, which protects against star imports, and I thought ops is cleaner.
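
Both mechanisms mentioned here can be checked with a throwaway module built in memory. The module names and the source text are made up for this demonstration; only the star-import behaviour itself is standard Python.

```python
import sys
import types

# Source for a throwaway module with one public-looking alias, one
# underscore-prefixed alias and one function.
SOURCE = (
    "import math as ops\n"
    "import math as _ops\n"
    "def table():\n"
    "    return 'table'\n"
)


def star_import_names(source, module_name):
    """Return the names that `from module_name import *` would bind."""
    module = types.ModuleType(module_name)
    exec(source, module.__dict__)
    sys.modules[module_name] = module
    namespace = {}
    try:
        exec('from {} import *'.format(module_name), namespace)
    finally:
        del sys.modules[module_name]
    namespace.pop('__builtins__', None)
    return set(namespace)


without_all = star_import_names(SOURCE, 'demo_api_without_all')
with_all = star_import_names(SOURCE + "__all__ = ['table']\n",
                             'demo_api_with_all')
print(sorted(without_all))  # underscore alias hidden, plain alias leaks
print(sorted(with_all))     # only the names listed in __all__
```

Without `__all__`, the underscore prefix is what keeps an alias out of a star import; with `__all__` defined, the prefix is no longer needed for that purpose.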

@@ -239,11 +230,16 @@ def __repr__(self):
class SignedInteger(Integer):
pass
def largest(self):

@cpcloud

cpcloud Mar 11, 2018

Member

Any reason not to make these @propertys?

@kszucs

kszucs Mar 11, 2018

Member

No particular reason. When should I prefer a property over a method?

@kszucs kszucs force-pushed the kszucs:prainbow branch from 15791a7 to 38c3cd3 Mar 11, 2018

@@ -1143,6 +1188,12 @@ def can_cast_floats(source, target, upcast=False, **kwargs):
return True
@castable.register(Decimal, Decimal)
def cas_cast_decimals(source, target, **kwargs):

@cpcloud

cpcloud Mar 11, 2018

Member

Should this be can_cast_decimals?

@six.add_metaclass(AnnotableMeta)
class Annotable(object):

    __slots__ = tuple()

@cpcloud

cpcloud Mar 11, 2018

Member

This can just be __slots__ = ().

self.default == other.default
)
def optional(self):

@cpcloud

cpcloud Mar 11, 2018

Member

Should this be a @property?

if self.default is None:
    return None
elif util.is_function(self.default):
    value = self.default()

@cpcloud

cpcloud Mar 11, 2018

Member

I wasn't aware that we support default argument values that are functions.

@kszucs

kszucs Mar 11, 2018

Member

That is for table genname:

class UnboundTable(PhysicalTable):
    schema = Arg(sch.Schema)
    name = Arg(six.string_types, default=genname)

I can remove it though.

@cpcloud

cpcloud Mar 11, 2018

Member

No, let's leave it. It's fine for now.
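
A minimal sketch of how a callable default such as genname could behave: when no value is passed, the default is called at validation time, so each unnamed table gets a fresh name. This Argument class, the _MISSING sentinel, the check_str validator and this genname are simplified assumptions for illustration and differ from the PR's actual implementation.

```python
import itertools

_MISSING = object()
_name_counter = itertools.count()


def genname():
    """Hypothetical name generator; yields a new name on every call."""
    return 'unbound_table_{}'.format(next(_name_counter))


def check_str(value):
    if not isinstance(value, str):
        raise TypeError('expected a string')
    return value


class Argument(object):
    def __init__(self, validator, default=_MISSING):
        self.validator = validator
        self.default = default

    def validate(self, value=_MISSING):
        if value is _MISSING:
            if self.default is _MISSING:
                raise TypeError('missing required argument')
            if self.default is None:
                return None
            # A callable default is evaluated lazily, once per validation.
            value = self.default() if callable(self.default) else self.default
        return self.validator(value)


name = Argument(check_str, default=genname)
first, second = name.validate(), name.validate()
print(name.validate('t0'), first, second)
```

A static default would give every unnamed table the same name, which is presumably why a function is used for this argument.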

elif name in kwargs:
    value = argument.validate(kwargs[name], name=name)
else:
    value = argument.validate(name=name)

@cpcloud

cpcloud Mar 11, 2018

Member

Do you think it might be possible to bind arguments using inspect.signature here so we don't have to recreate Python's calling convention algorithm?

@kszucs

kszucs Mar 11, 2018

Member

Good question. Isn't inspect.signature py3 only?

@cpcloud

cpcloud Mar 11, 2018

Member

Yep, but there's the funcsigs package for that :)

@cpcloud

cpcloud Mar 11, 2018

Member

I'm going to add funcsigs as a dependency for #1277, so don't worry about adding it here

@kszucs

kszucs Apr 1, 2018

Member

Converted to issue: #1396
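
As a rough idea of what binding with inspect.signature could look like (Python 3 only; the thread suggests funcsigs for 2.7), the sketch below builds a Signature from (name, default) pairs and lets it apply Python's calling convention. make_signature, bind and REQUIRED are hypothetical helpers written for this example.

```python
import inspect

# Marker for "no default", reusing the sentinel inspect itself uses.
REQUIRED = inspect.Parameter.empty


def make_signature(arguments):
    """Build a Signature from (name, default) pairs, in order."""
    params = [
        inspect.Parameter(
            name, inspect.Parameter.POSITIONAL_OR_KEYWORD, default=default
        )
        for name, default in arguments
    ]
    return inspect.Signature(params)


def bind(signature, *args, **kwargs):
    """Map positional and keyword values onto argument names."""
    bound = signature.bind(*args, **kwargs)
    bound.apply_defaults()  # fill in defaults for omitted arguments
    return dict(bound.arguments)


sig = make_signature([('arg', REQUIRED), ('start', None), ('end', None)])
print(bind(sig, 'abc', end=3))
try:
    bind(sig)  # 'arg' is required
except TypeError as exc:
    print('rejected:', exc)
```

Duplicate, missing and unexpected arguments are all reported by Signature.bind as TypeError, so none of that logic needs to be reimplemented by hand.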

@@ -182,22 +176,6 @@ def test_primitive(spec, expected):
assert dt.dtype(spec) == expected
def test_precedence_with_no_arguments():

@cpcloud

cpcloud Mar 11, 2018

Member

Did you move these tests to another file?

schema = query.schema()
# clickhouse columns has been defined as non-nullable
# whereas other backends don't support non-nullable columns yet

@cpcloud

cpcloud Mar 11, 2018

Member

I believe Impala supports this, and possibly other backends.

ir.Node.__init__(self, [foreign_table, predicates])
class NotExistsSubquery(ops.Node):
foreign_table = Arg(rlz.noop)
predicates = Arg(rlz.noop)

@cpcloud

cpcloud Mar 11, 2018

Member

Nice to see this finally getting rolled into the rules system.

@kszucs

kszucs Mar 11, 2018

Member

We need to revisit all Args with rlz.noop :)

r = find_base_table(arg)
if isinstance(r, TableExpr):
return r
unnamed = UnnamedMarker()

@cpcloud

cpcloud Mar 11, 2018

Member

I remember deleting this. Am I mistaken? If not, why bring it back?

@kszucs

kszucs Mar 16, 2018

Member

Probably, but it's present in master.

@@ -1288,7 +1289,7 @@ def g(x):
 def test_pickle_table_expr():
     schema = [('time', 'timestamp'), ('key', 'string'), ('value', 'double')]
     t0 = ibis.table(schema, name='t0')
-    raw = pickle.dumps(t0)
+    raw = pickle.dumps(t0, protocol=2)

@cpcloud

cpcloud Mar 11, 2018

Member

Why are you passing a specific protocol version here?

@kszucs

kszucs Mar 18, 2018

Member

On py27, serializing objects with __slots__ requires protocol 2.
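
A small round-trip sketch of the point about `__slots__` and pickle protocols. Point is a made-up class for this example; the code only shows that protocol 2 can serialize a slotted object without a custom `__getstate__`, and makes no claim about how other protocol versions behave.

```python
import pickle


class Point(object):
    """Made-up slotted class: instances have no __dict__."""

    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y


original = Point(1, 2)
# Protocol 2 handles __slots__ state; Python 2.7 defaults to an older
# protocol, hence the explicit argument in the test above.
raw = pickle.dumps(original, protocol=2)
restored = pickle.loads(raw)
print(restored.x, restored.y)
```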

arg = Argument(lambda x: x)
class StringOp(Op):
arg = Argument(str) # inherit

@cpcloud

cpcloud Mar 11, 2018

Member

Should this be Arg instead of Argument?

@kszucs

kszucs Mar 11, 2018

Member

In signature.py it's defined as Argument and aliased as Arg during import. Do You mean renaming in signature.py?

@cpcloud

cpcloud Mar 11, 2018

Member

I see. No, leaving it here as Argument is fine.

def _repr(self, memo=None):
if memo is None:
from ibis.expr.format import FormatMemo

@cpcloud

cpcloud Mar 11, 2018

Member

Probably ok to just import in the function, this is a little strange.

@kszucs

kszucs Mar 16, 2018

Member

There are circular imports.

cache[(self, other)] = True
return True
def is_ancestor(self, other):

@cpcloud

cpcloud Mar 11, 2018

Member

Is this being used anywhere? It seems like it's really not that different from just a straight-up call to equals.

else:
return rules.shape_like(arg, 'int64')
"""Absolute value"""
output_type = rlz.typeof('arg')

@cpcloud

cpcloud Mar 11, 2018

Member

Can you not pass arg directly here? It is in scope after all.

@kszucs

kszucs Mar 11, 2018

Member

arg is actually an Argument instance without any knowledge of the underlying data; it also doesn't know its name until the metaclass finishes signature construction.
A solution would be to set the argument's name in the metaclass or use descriptors and _sunder slots.

@kszucs

kszucs Mar 11, 2018

Member

Actually, you could give me a couple of ideas on how we should abstract the return_ definition.
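
One possible shape for the descriptor idea, sketched for Python 3.6+ only: `__set_name__` tells each argument its attribute name when the class is created, so a rule such as typeof can receive the argument object directly and resolve it later against an instance. Everything below (this Argument, typeof and Abs) is a hypothetical illustration rather than the PR's design, and Python 2.7 would still need the metaclass to assign the name.

```python
class Argument(object):
    """Hypothetical data descriptor that validates on assignment."""

    def __init__(self, validator):
        self.validator = validator
        self.name = None  # unknown until the owning class is created

    def __set_name__(self, owner, name):
        # Called automatically at class creation on Python 3.6+.
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        instance.__dict__[self.name] = self.validator(value)


def typeof(argument):
    """Rule taking the Argument itself instead of its name as a string."""
    def output_type(op):
        return type(argument.__get__(op, type(op)))
    return output_type


class Abs(object):
    arg = Argument(float)
    output_type = typeof(arg)  # `arg` is in scope and passed directly

    def __init__(self, arg):
        self.arg = arg


op = Abs(3)
print(Abs.arg.name, op.arg, op.output_type())
```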

class Capitalize(StringUnaryOp):
pass
"""Return a capitalized version of input string"""

@cpcloud

cpcloud Mar 11, 2018

Member

Thanks for adding the documentation here.

class Union(TableNode, HasSchema):
left = Arg(rlz.noop)

@cpcloud

cpcloud Mar 11, 2018

Member

Should probably be TableExpr here, but leave it as is if making that change doesn't trivially work.

@kszucs kszucs force-pushed the kszucs:prainbow branch from 66ed0b8 to 63f0786 Apr 5, 2018

@cpcloud

Member

cpcloud commented Apr 5, 2018

Bombs away!

@cpcloud cpcloud closed this in 9394436 Apr 5, 2018

Refactoring automation moved this from To do to Done Apr 5, 2018
