Interval support #1243

Closed
wants to merge 47 commits into from

@kszucs (Member) commented Nov 26, 2017

  • interval datatype
  • interval rules
  • IntervalValue/IntervalScalar/IntervalColumn
  • timestamp arithmetic
  • date arithmetic
  • time arithmetic
  • date shorthand
  • IntervalPromoter
  • unit conversions
  • expose convertible units as properties
  • remove timedelta
  • to_interval
  • timestamp.date()
  • port currently used compilers
    • impala
    • clickhouse
  • thorough testing (sort of done)
  • interval comparison operators

resolves #1233, #1228

@cpcloud self-requested a review November 26, 2017 22:16
@cpcloud added the feature (Features or general enhancements) label Nov 26, 2017
@cpcloud added this to the 0.13 milestone Nov 26, 2017
@@ -300,6 +300,10 @@ def test_timestamp_with_timezone_parser_invalid_timezone():
    assert str(ts) == "timestamp('US/Ea')"


def test_interval():
    dt.validate_type("interval('Y')")
Member:

This should assert that this returns the same thing as Interval('Y').

        self.unit = unit

    def valid_literal(self, value):
        return isinstance(value, six.string_types + (datetime.timedelta,))
Member:

Should this list the specific characters that are valid?


    __slots__ = 'unit',

    _valid_units = set([
Member:

Let's make this frozenset.
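Here is a minimal sketch of the frozenset suggestion. It also covers the earlier question about listing valid characters: the unit is checked against an immutable set, and the error message names the accepted codes. The class is a toy, not the one in this PR. Unit codes other than 'Y', 's' and 'ns', which do appear in this diff, are assumptions.

```python
class Interval(object):
    """Toy interval datatype (hypothetical, not the class in this PR)."""

    __slots__ = ('unit',)

    # A frozenset cannot be mutated after the class is created, so the set
    # of accepted units stays fixed. The unit codes here are assumptions.
    _valid_units = frozenset([
        'Y', 'M', 'W', 'D', 'h', 'm', 's', 'ms', 'us', 'ns',
    ])

    def __init__(self, unit='s'):
        if unit not in self._valid_units:
            # Naming the accepted codes tells the caller what is valid.
            raise ValueError(
                'Unsupported interval unit {!r}; expected one of {}'.format(
                    unit, ', '.join(sorted(self._valid_units))))
        self.unit = unit
```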

@@ -201,21 +201,10 @@ def test_interval(literal):


def test_interval_repr():
    repr(api.interval(weeks=3)).splitlines()[0] == 'Literal[interval]'
    assert repr(api.interval(weeks=3)) == 'Literal[interval]\n 3'
Member:

We should put the unit in the repr as well.
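Here is one possible repr that carries the unit, sketched on a toy class. It is hypothetical and not ibis's literal expression. Reusing the interval('<unit>') spelling from the type parser is only a suggestion, and 'W' for weeks is an assumed unit code.

```python
class IntervalLiteral(object):
    """Toy interval literal expression (hypothetical)."""

    def __init__(self, value, unit):
        self.value = value
        self.unit = unit

    def __repr__(self):
        # Mirror the "interval('<unit>')" type syntax so the unit is visible.
        return "Literal[interval('{}')]\n {}".format(self.unit, self.value)
```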

])
def test_interval(unit):
    definition = "interval('{}')".format(unit)
    dt.Interval(unit) == dt.validate_type(definition)
Member:

needs an assert here
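Without assert, the comparison is evaluated and then discarded, so the test can never fail. The sketch below is self-contained. A toy Interval class and a toy regex parser stand in for dt.Interval and dt.validate_type, and both are hypothetical here.

```python
import re


class Interval(object):
    """Toy stand-in for the interval datatype (hypothetical)."""

    def __init__(self, unit):
        self.unit = unit

    def __eq__(self, other):
        return isinstance(other, Interval) and self.unit == other.unit

    def __hash__(self):
        return hash(('interval', self.unit))


def validate_type(definition):
    """Toy parser for strings like "interval('Y')" (hypothetical)."""
    match = re.fullmatch(r"interval\('(\w+)'\)", definition)
    if match is None:
        raise ValueError('cannot parse {!r}'.format(definition))
    return Interval(match.group(1))


def test_interval():
    for unit in ['Y', 'M', 'D', 's']:
        definition = "interval('{}')".format(unit)
        # A bare `Interval(unit) == validate_type(definition)` is thrown
        # away; the assert is what lets the test fail.
        assert Interval(unit) == validate_type(definition)


test_interval()
```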

@@ -709,6 +710,10 @@ def time(**arg_kwds):
    return ValueTyped(ir.TimeValue, 'not time', **arg_kwds)


def interval(**arg_kwds):
Member:

Do we also need a way to write rules that require a specific unit? Something like:

interval_with_unit('s')
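Here is a sketch of what such a rule could look like. It is written as a plain validator closure rather than on top of the ValueTyped helper in this diff. The name interval_with_unit comes from the comment above. The Interval and IntervalValue classes are hypothetical toys.

```python
class Interval(object):
    """Toy interval datatype carrying a unit (hypothetical)."""

    def __init__(self, unit):
        self.unit = unit


class IntervalValue(object):
    """Toy interval expression with a datatype (hypothetical)."""

    def __init__(self, dtype):
        self.dtype = dtype


def interval_with_unit(unit):
    """Build a rule that accepts only interval values in `unit`."""
    def rule(arg):
        if not isinstance(arg, IntervalValue):
            raise TypeError('not an interval')
        if arg.dtype.unit != unit:
            raise TypeError('expected interval unit {!r}, got {!r}'.format(
                unit, arg.dtype.unit))
        return arg
    return rule
```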

def interval(**arg_kwds):
    return ValueTyped(ir.IntervalValue, 'not an interval', **arg_kwds)


def timedelta(**arg_kwds):
Member:

I wonder if this can be removed (or just aliased to interval for backwards compat)
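If it stays for backwards compatibility, one option is a thin alias that warns and then forwards. The sketch below uses a toy interval rule. It is hypothetical, since the real rules return ValueTyped instances.

```python
import warnings


def interval(**arg_kwds):
    """Toy stand-in for the interval rule (hypothetical): echoes its args."""
    return ('interval', arg_kwds)


def timedelta(**arg_kwds):
    """Deprecated alias of interval(), kept for backwards compatibility."""
    warnings.warn('timedelta() is deprecated, use interval() instead',
                  DeprecationWarning, stacklevel=2)
    return interval(**arg_kwds)
```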

@@ -452,22 +452,23 @@ def output_type(self):
        return output_type


def numeric_highest_promote(i):
# TODO: UNUSED
# def numeric_highest_promote(i):
Member:

If things are unused, you can delete them. We can always resurrect them later with git if needed.

class IntervalAdd(ValueOp): # __radd__

    input_type = [rules.temporal, rules.interval]
    output_type = rules.shape_like_arg(0, 'timestamp')
Member:

I think the output type here is a bit more complicated since date + interval doesn't always return a timestamp. You can either perform the checking in the output_type method or you can make three different add classes that directly express the output type. The latter is a bit easier to reason about IMO.
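Here is a sketch of the three-class option. Each op states its output type directly, and a small table picks the class from the left operand's type. Python's own datetime module follows the same rule: date + timedelta returns a date, not a datetime. The class names and type strings are hypothetical.

```python
class TemporalIntervalAdd(object):
    """Base for toy `<temporal> + <interval>` ops (hypothetical)."""

    output_type = None  # each subclass states its result type directly

    def __init__(self, left, right):
        self.left = left
        self.right = right


class DateAdd(TemporalIntervalAdd):
    output_type = 'date'


class TimestampAdd(TemporalIntervalAdd):
    output_type = 'timestamp'


class TimeAdd(TemporalIntervalAdd):
    output_type = 'time'


# The left operand's type alone decides which op (and result type) applies.
_ADD_OPS = {'date': DateAdd, 'timestamp': TimestampAdd, 'time': TimeAdd}


def temporal_add(left_type, left, right):
    """Pick the op class from the left operand's type."""
    return _ADD_OPS[left_type](left, right)
```

This sketch does not decide whether DateAdd should reject sub-day units such as hours. That question is left open.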

        print('can_implicit_cast')
        op = arg.op()
        if isinstance(op, Literal):
            if isinstance(op.value, six.integer_types):
Member:

Arbitrary integers shouldn't be implicitly castable to IntervalValues since integers do not have units. Only other interval values should be implicitly castable to interval values.
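Here is the rule described in the comment, written as a toy predicate. IntervalValue is a hypothetical stand-in for the expression class.

```python
class IntervalValue(object):
    """Toy interval expression with a unit (hypothetical)."""

    def __init__(self, unit):
        self.unit = unit


def can_implicit_cast_to_interval(arg):
    # Only values that already are intervals qualify. A bare integer is
    # rejected because nothing says whether 5 means 5 seconds or 5 days.
    return isinstance(arg, IntervalValue)
```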

ibis/expr/api.py (outdated)
    if len(defined_units) > 1:
        raise ValueError('Only one arg can be specified')
    elif len(defined_units) < 1:
        raise ValueError('At least one of the arguments must be specified')
Member:

Probably just len(defined_units) != 1, with the message "Exactly one argument is required".

Is there any reason why we need to enforce this? Postgres, for example, allows multiple values to the interval constructor.

@kszucs (Member Author), Nov 29, 2017:

Postgres is more flexible than the other backends.

Member:

Fair enough, just wondering.
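Here is a sketch of the single len(defined_units) != 1 check, on a toy constructor that returns (value, unit). The keyword-to-unit table is an assumption. Only weeks, days and hours appear as keywords in this thread.

```python
# Hypothetical keyword -> unit-code table for the toy constructor.
_UNIT_BY_KWARG = {
    'years': 'Y', 'months': 'M', 'weeks': 'W', 'days': 'D',
    'hours': 'h', 'minutes': 'm', 'seconds': 's',
    'milliseconds': 'ms', 'microseconds': 'us', 'nanoseconds': 'ns',
}


def interval(**kwargs):
    """Toy constructor returning (value, unit); exactly one keyword allowed."""
    unknown = sorted(set(kwargs) - set(_UNIT_BY_KWARG))
    if unknown:
        raise TypeError('Unexpected keyword arguments: {}'.format(unknown))
    defined_units = [(_UNIT_BY_KWARG[name], value)
                     for name, value in kwargs.items() if value is not None]
    # One check covers both "none given" and "several given".
    if len(defined_units) != 1:
        raise ValueError('Exactly one argument is required')
    unit, value = defined_units[0]
    return value, unit
```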

ibis/expr/api.py (outdated)
    else:
        unit, value = defined_units[0]
        type = dt.Interval(unit)
        return ir.IntervalScalar(ir.literal(value, type=type).op())
Member:

Should this be ir.literal(value, type=type).op().to_expr()?

        self.unit = unit

    def valid_literal(self, value):
        return isinstance(value, six.integer_types + (datetime.timedelta,))
Member:

I think only timedelta instances are valid literals, otherwise that would mean we're giving a default unit to integers.

@@ -778,6 +808,7 @@ def valid_literal(self, value):
date = Date()
time = Time()
timestamp = Timestamp()
interval = Interval()
Member:

Do we want aliases for each of the interval values like year, month, day, etc.?

Member:

Fine if not, just a thought.

Member Author:

I was just following the patterns in datatypes.py. None of the parametric types have aliases, so I guess not.

@kszucs (Member Author) commented Dec 4, 2017

@cpcloud Would you please review before I finalize this PR?

WHERE `timestamp_col` < months_add('2010-01-01 00:00:00', 3) AND
`timestamp_col` < days_add(now(), 10)"""
assert result == expected
# def test_where_analyze_scalar_op(self):
Member:

We should be able to keep this test passing if we alias the day, month, etc functions to be calls to the top level interval function.
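Here is a sketch of that aliasing, where each shorthand just forwards to the top-level constructor. The toy interval below returns (value, unit) rather than an expression. The alias names and the default of 1 are assumptions.

```python
# Hypothetical keyword -> unit-code table for the toy constructor.
_UNIT_BY_KWARG = {
    'years': 'Y', 'months': 'M', 'weeks': 'W', 'days': 'D',
    'hours': 'h', 'minutes': 'm', 'seconds': 's',
}


def interval(**kwargs):
    """Toy top-level constructor: returns (value, unit) for one keyword."""
    (name, value), = kwargs.items()  # ValueError unless exactly one keyword
    return value, _UNIT_BY_KWARG[name]


def _make_alias(kwarg):
    """Build a shorthand such as day(n) that just calls interval(days=n)."""
    def alias(n=1):
        return interval(**{kwarg: n})
    alias.__name__ = kwarg[:-1]  # 'days' -> 'day'
    alias.__doc__ = 'Shorthand for interval({}=n).'.format(kwarg)
    return alias


year, month, week, day, hour, minute, second = (
    _make_alias(k) for k in
    ('years', 'months', 'weeks', 'days', 'hours', 'minutes', 'seconds'))
```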

@@ -367,19 +386,14 @@ def _timestamp_from_unix(translator, expr):
    return _call(translator, 'toDateTime', arg)


def _timestamp_delta(translator, expr):
def _timestamp_add(translator, expr):
Member:

Where is this being used? It looks like you're mapping ops.TimestampAdd to binary_infix_op('+') and not this. Is that intentional?

_interval_floordiv = _binop_expr('__floordiv__', _ops.IntervalFloorDivide)

_interval_value_methods = dict(
    to_unit=_to_unit,
Member:

Can this be done with cast?

Member Author:

Do you mean interval(hours=3).cast("interval('s')")?

Member:

Yep. If this is for backward compat, then we can leave it.
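Whichever spelling is kept, to_unit('s') or cast("interval('s')"), the conversion underneath is the same arithmetic. Here is a sketch for integer values, assuming the unit codes below. Refusing lossy conversions, such as 90 seconds to minutes, is a design choice. Truncating would be the other option.

```python
# Nanoseconds per fixed-length unit (assumed unit codes). Years and months
# are left out on purpose: their length varies, so no exact factor exists.
_NANOS_PER_UNIT = {
    'W': 7 * 24 * 60 * 60 * 10**9,
    'D': 24 * 60 * 60 * 10**9,
    'h': 60 * 60 * 10**9,
    'm': 60 * 10**9,
    's': 10**9,
    'ms': 10**6,
    'us': 10**3,
    'ns': 1,
}


def convert_interval(value, from_unit, to_unit):
    """Convert an integer interval value between fixed-length units.

    Hypothetical helper, not part of this PR.
    """
    if from_unit not in _NANOS_PER_UNIT or to_unit not in _NANOS_PER_UNIT:
        raise ValueError(
            'Cannot convert {!r} to {!r} exactly'.format(from_unit, to_unit))
    total_nanos = value * _NANOS_PER_UNIT[from_unit]
    quotient, remainder = divmod(total_nanos, _NANOS_PER_UNIT[to_unit])
    if remainder:
        # e.g. 90 seconds is not a whole number of minutes
        raise ValueError('{} {} is not a whole number of {}'.format(
            value, from_unit, to_unit))
    return quotient
```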

    """Infers the output type of the binary interval operation

    This is a slightly modified version of BinaryPromoter, it converts
    back and forth between the interval and its innner value.
Member:

Small typo here: "innner" has one too many n's.

@@ -261,6 +261,23 @@ def _value_list(translator, expr):
    return '({0})'.format(', '.join(values_))


def _interval_format(translator, expr):
    units = {
Member:

Can you use Interval._valid_units here so we don't have to repeat this? Maybe we can use Interval._valid_units everywhere we would hard code unit values/labels.
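_valid_units is a set, so it cannot also supply the SQL keyword for each unit. One option is a single dict mapping each code to its keyword, with the set derived from the dict's keys. Here is a sketch with hypothetical names and codes. The exact interval literal syntax differs between engines, so each backend would still need its own check.

```python
class Interval(object):
    """Toy interval type (hypothetical) holding one table of units."""

    # Unit code -> SQL keyword. Validation and formatting both read this
    # table, so unit codes are written down exactly once (codes assumed).
    _units = {
        'Y': 'YEAR', 'M': 'MONTH', 'W': 'WEEK', 'D': 'DAY',
        'h': 'HOUR', 'm': 'MINUTE', 's': 'SECOND',
        'ms': 'MILLISECOND', 'us': 'MICROSECOND', 'ns': 'NANOSECOND',
    }
    _valid_units = frozenset(_units)


def format_interval(value, unit):
    """Render a generic `INTERVAL <n> <UNIT>` literal from the shared table."""
    try:
        keyword = Interval._units[unit]
    except KeyError:
        raise ValueError('Unsupported interval unit {!r}'.format(unit))
    return 'INTERVAL {} {}'.format(value, keyword)
```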

        return False

    def valid_literal(self, value):
        return isinstance(value, six.integer_types + (datetime.timedelta,))
Member:

Are the semantics here such that:

interval(days=2) + 1 == interval(days=3)

?

@kszucs (Member Author), Dec 5, 2017:

That's required to pass https://github.com/ibis-project/ibis/blob/master/ibis/expr/types.py#L1587

For a type such as interval<int32>('s'), a Python integer should be a valid literal.

The operation fails:

In [1]: import ibis

In [2]: ibis.interval(days=2) + 1 == ibis.interval(days=3)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-3abf187ff76a> in <module>()
----> 1 ibis.interval(days=2) + 1 == ibis.interval(days=3)

TypeError: unsupported operand type(s) for +: 'IntervalScalar' and 'int'

Member:

Ah right. Carry on then, this looks good.
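These are two separate questions. One is whether a bare integer is a valid literal for an interval whose unit is already fixed by its type. The other is whether interval + int is defined at all. Here is a toy class that keeps them apart and reproduces the TypeError above. It is hypothetical and is not the IntervalScalar from this PR.

```python
import datetime


class ToyInterval(object):
    """Toy interval scalar (hypothetical): an integer value plus a unit."""

    def __init__(self, value, unit):
        self.value = value
        self.unit = unit

    @staticmethod
    def valid_literal(value):
        # Literal validation: for a typed interval such as interval('s'),
        # the unit comes from the type, so a bare int is an acceptable value.
        return isinstance(value, (int, datetime.timedelta))

    def __add__(self, other):
        # Arithmetic is stricter: only another interval with the same unit.
        # Returning NotImplemented for an int makes Python raise TypeError.
        if isinstance(other, ToyInterval) and other.unit == self.unit:
            return ToyInterval(self.value + other.value, self.unit)
        return NotImplemented
```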

        'ns'  # Nanosecond
    ])

    def __init__(self, value_type=None, unit='s', nullable=True):
Member:

We probably want the unit first here, so that it's easy to construct this type without using a keyword:

# current
Interval(unit='s')
# a little easier to read IMO
Interval('s')
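Here is a toy type with the reordered signature. The class is hypothetical, but its parameters are the ones in this diff.

```python
class Interval(object):
    """Toy parametric interval type with `unit` as the first parameter."""

    def __init__(self, unit='s', value_type=None, nullable=True):
        self.unit = unit
        self.value_type = value_type  # None means "not specified"
        self.nullable = nullable
```

Moving unit to the front changes the meaning of any existing positional call such as Interval(some_type). Callers that relied on the old order would need updating.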

@cpcloud (Member) commented Dec 5, 2017

@kszucs Looking great. Everything ready to go on your side?

@kszucs changed the title from "[WIP] Interval support" to "Interval support" on Dec 5, 2017
@kszucs (Member Author) commented Dec 5, 2017

@cpcloud Yes! Thank you!


@cpcloud closed this in 48306f8 Dec 6, 2017
@kszucs deleted the interval branch December 29, 2017 14:09
Labels: feature (Features or general enhancements)
Issues this pull request may close: ENH: Full support for interval (timedelta) types
2 participants