Interval support #1243

Closed
wants to merge 47 commits into
base: master
from

2 participants
@kszucs
Member

kszucs commented Nov 26, 2017

  • interval datatype
  • interval rules
  • IntervalValue/IntervalScalar/IntervalColumn
  • timestamp arithmetic
  • date arithmetic
  • time arithmetic
  • date shorthand
  • IntervalPromoter
  • unit conversions
  • expose convertible units as properties
  • remove timedelta
  • to_interval
  • timestamp.date()
  • port currently used compilers
    • impala
    • clickhouse
  • thorough testing (sort of done)
  • interval comparison operators

resolves #1233, #1228

@cpcloud cpcloud self-requested a review Nov 26, 2017

@cpcloud cpcloud added the enhancement label Nov 26, 2017

@cpcloud cpcloud added this to To do in New Operations and Types via automation Nov 26, 2017

@cpcloud cpcloud added this to the 0.13 milestone Nov 26, 2017

@@ -300,6 +300,10 @@ def test_timestamp_with_timezone_parser_invalid_timezone():
    assert str(ts) == "timestamp('US/Ea')"
def test_interval():
    dt.validate_type("interval('Y')")

@cpcloud

cpcloud Nov 26, 2017

Member

This should assert that this returns the same thing as Interval('Y').

        self.unit = unit
    def valid_literal(self, value):
        return isinstance(value, six.string_types + (datetime.timedelta,))

@cpcloud

cpcloud Nov 26, 2017

Member

Should this list the specific characters that are valid?

__slots__ = 'unit',
_valid_units = set([

@cpcloud

cpcloud Nov 27, 2017

Member

Let's make this frozenset.
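
A minimal sketch of what the frozenset buys here. The class below is a hypothetical stand-in, not the ibis `Interval` type, and the unit codes are an illustrative subset chosen for the example:

```python
class Interval:
    """Hypothetical stand-in for the datatype under review (not the ibis class)."""

    __slots__ = ('unit',)

    # frozenset is immutable, so the class-level set of units cannot be
    # changed by accident at runtime. The codes are an illustrative subset.
    _valid_units = frozenset(['Y', 'M', 'D', 'h', 'm', 's'])

    def __init__(self, unit='s'):
        if unit not in self._valid_units:
            raise ValueError('Unsupported interval unit: {!r}'.format(unit))
        self.unit = unit


seconds = Interval('s')

try:
    Interval._valid_units.add('fortnight')
    mutated = True
except AttributeError:
    # Unlike set, frozenset has no add() method.
    mutated = False
```

Membership tests work exactly as they do with a plain `set`, so callers of `_valid_units` should not need to change.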

kszucs added some commits Nov 27, 2017

@@ -201,21 +201,10 @@ def test_interval(literal):
def test_interval_repr():
    repr(api.interval(weeks=3)).splitlines()[0] == 'Literal[interval]'
    assert repr(api.interval(weeks=3)) == 'Literal[interval]\n 3'

@cpcloud

cpcloud Nov 28, 2017

Member

We should put the unit in the repr as well.

])
def test_interval(unit):
    definition = "interval('{}')".format(unit)
    dt.Interval(unit) == dt.validate_type(definition)

@cpcloud

cpcloud Nov 28, 2017

Member

needs an assert here
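
To illustrate why the bare comparison matters: Python evaluates it and discards the result, so the test can never fail. The sketch below uses a hypothetical `Interval` stand-in with value equality, not the ibis class:

```python
class Interval:
    """Hypothetical stand-in with value equality (not the ibis class)."""

    def __init__(self, unit):
        self.unit = unit

    def __eq__(self, other):
        return isinstance(other, Interval) and self.unit == other.unit

    def __hash__(self):
        return hash(self.unit)


def check_without_assert():
    # The comparison result is computed and thrown away,
    # so a mismatch goes unnoticed.
    Interval('Y') == Interval('s')
    return 'passed'


def check_with_assert():
    # With assert, a False comparison raises AssertionError.
    assert Interval('Y') == Interval('s')
    return 'passed'


silent = check_without_assert()

try:
    check_with_assert()
    caught = False
except AssertionError:
    caught = True
```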

@@ -709,6 +710,10 @@ def time(**arg_kwds):
    return ValueTyped(ir.TimeValue, 'not time', **arg_kwds)
def interval(**arg_kwds):

@cpcloud

cpcloud Nov 28, 2017

Member

Do we also need a way to specify rules that require an exact unit? Something like:

interval_with_unit('s')

def interval(**arg_kwds):
    return ValueTyped(ir.IntervalValue, 'not an interval', **arg_kwds)
def timedelta(**arg_kwds):

@cpcloud

cpcloud Nov 28, 2017

Member

I wonder if this can be removed (or just aliased to interval for backwards compat)

@@ -452,22 +452,23 @@ def output_type(self):
return output_type
def numeric_highest_promote(i):
# TODO: UNUSED
# def numeric_highest_promote(i):

@cpcloud

cpcloud Nov 28, 2017

Member

If things are unused, you can delete them. We can always resurrect them later with git if needed.

class IntervalAdd(ValueOp): # __radd__
    input_type = [rules.temporal, rules.interval]
    output_type = rules.shape_like_arg(0, 'timestamp')

@cpcloud

cpcloud Nov 28, 2017

Member

I think the output type here is a bit more complicated since date + interval doesn't always return a timestamp. You can either perform the checking in the output_type method or you can make three different add classes that directly express the output type. The latter is a bit easier to reason about IMO.
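
The "three different add classes" option could look roughly like the sketch below. Everything here is hypothetical: `AddOp`, `DateAdd`, `TimeAdd`, and `add_interval` are illustration-only names, types are plain strings rather than ibis rules, and the assumption that each operand keeps its own type is mine, not something the PR states:

```python
class AddOp:
    """Hypothetical base class for this sketch (not an ibis class)."""

    output_type = None  # each subclass states its result type directly

    def __init__(self, left_type, interval_unit):
        self.left_type = left_type
        self.interval_unit = interval_unit


class DateAdd(AddOp):
    output_type = 'date'  # assumption: date + interval stays a date


class TimeAdd(AddOp):
    output_type = 'time'  # assumption: time + interval stays a time


class TimestampAdd(AddOp):
    output_type = 'timestamp'


# Dispatch table from the left operand's type to the op that handles it.
_ADD_OPS = {'date': DateAdd, 'time': TimeAdd, 'timestamp': TimestampAdd}


def add_interval(left_type, interval_unit):
    """Pick the add op for the left operand; reject non-temporal types."""
    try:
        op_class = _ADD_OPS[left_type]
    except KeyError:
        raise TypeError('Cannot add an interval to {!r}'.format(left_type))
    return op_class(left_type, interval_unit)
```

The output type is then readable straight off the class, with no branching inside an `output_type` method.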

print('can_implicit_cast')
op = arg.op()
if isinstance(op, Literal):
    if isinstance(op.value, six.integer_types):

@cpcloud

cpcloud Nov 28, 2017

Member

Arbitrary integers shouldn't be implicitly castable to IntervalValues since integers do not have units. Only other interval values should be implicitly castable to interval values.

if len(defined_units) > 1:
    raise ValueError('Only one arg can be specified')
elif len(defined_units) < 1:
    raise ValueError('At least one of the arguments must be specified')

@cpcloud

cpcloud Nov 29, 2017

Member

Probably just len(defined_units) != 1, with a message saying 'Exactly one argument is required'.

Is there any reason why we need to enforce this? Postgres, for example, allows multiple values to the interval constructor.

@kszucs

kszucs Nov 29, 2017

Member

Postgres is more flexible than others

@cpcloud

cpcloud Nov 29, 2017

Member

Fair enough, just wondering.
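
The single `!= 1` check suggested in this thread can be sketched as follows. This is a simplified, hypothetical `interval` that returns a plain `(unit, value)` pair instead of an ibis expression:

```python
def interval(**units):
    """Hypothetical constructor sketch: returns a (unit, value) pair."""
    # Keep only the keyword arguments that were actually given a value.
    defined_units = [
        (unit, value) for unit, value in units.items() if value is not None
    ]
    if len(defined_units) != 1:
        raise ValueError('Exactly one argument is required')
    unit, value = defined_units[0]
    return unit, value


three_weeks = interval(weeks=3)
```

Both the "none given" and "more than one given" cases now take the same branch and produce the same message.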

else:
    unit, value = defined_units[0]
    type = dt.Interval(unit)
    return ir.IntervalScalar(ir.literal(value, type=type).op())

@cpcloud

cpcloud Nov 29, 2017

Member

Should this be ir.literal(value, type=type).op().to_expr()?

        self.unit = unit
    def valid_literal(self, value):
        return isinstance(value, six.integer_types + (datetime.timedelta,))

@cpcloud

cpcloud Nov 29, 2017

Member

I think only timedelta instances are valid literals, otherwise that would mean we're giving a default unit to integers.

@@ -778,6 +808,7 @@ def valid_literal(self, value):
date = Date()
time = Time()
timestamp = Timestamp()
interval = Interval()

@cpcloud

cpcloud Nov 29, 2017

Member

Do we want aliases for each of the interval values like year, month, day, etc.?

@cpcloud

cpcloud Nov 29, 2017

Member

Fine if not, just a thought.

@kszucs

kszucs Nov 29, 2017

Member

I was just following the patterns in datatypes.py. None of the parametric types has specified aliases, so I guess not.

kszucs added some commits Nov 30, 2017

kszucs added some commits Dec 3, 2017

@kszucs

Member

kszucs commented Dec 4, 2017

@cpcloud Would you please review before I finalize this PR?

@cpcloud cpcloud added this to To do in Interval Type Dec 4, 2017

@cpcloud cpcloud moved this from To do to In progress in Interval Type Dec 4, 2017

@cpcloud cpcloud removed this from To do in New Operations and Types Dec 4, 2017

WHERE `timestamp_col` < months_add('2010-01-01 00:00:00', 3) AND
`timestamp_col` < days_add(now(), 10)"""
assert result == expected
# def test_where_analyze_scalar_op(self):

@cpcloud

cpcloud Dec 5, 2017

Member

We should be able to keep this test passing if we alias the day, month, etc functions to be calls to the top level interval function.
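
One way the aliasing described above might be sketched. The `interval` here is a simplified, hypothetical stand-in that returns a `(unit, value)` pair, and `_unit_alias` is an illustration-only helper, not an ibis function:

```python
def interval(**units):
    """Hypothetical stand-in returning a (unit, value) pair."""
    if len(units) != 1:
        raise ValueError('Exactly one argument is required')
    (unit, value), = units.items()
    return unit, value


def _unit_alias(keyword):
    """Build a one-argument helper that forwards to interval(keyword=value)."""
    def alias(value):
        return interval(**{keyword: value})
    # Name the helper after the singular unit, e.g. 'days' -> 'day'.
    alias.__name__ = keyword.rstrip('s')
    return alias


day = _unit_alias('days')
month = _unit_alias('months')
```

With this shape, `month(3)` and `interval(months=3)` produce the same object, so existing callers of the per-unit helpers keep working.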

@@ -367,19 +386,14 @@ def _timestamp_from_unix(translator, expr):
return _call(translator, 'toDateTime', arg)
def _timestamp_delta(translator, expr):
def _timestamp_add(translator, expr):

@cpcloud

cpcloud Dec 5, 2017

Member

Where is this being used? It looks like you're mapping ops.TimestampAdd to binary_infix_op('+') and not this. Is that intentional?

_interval_floordiv = _binop_expr('__floordiv__', _ops.IntervalFloorDivide)
_interval_value_methods = dict(
to_unit=_to_unit,

@cpcloud

cpcloud Dec 5, 2017

Member

Can this be done with cast?

@kszucs

kszucs Dec 5, 2017

Member

Do you mean interval(hours=3).cast("interval('s')")?

@cpcloud

cpcloud Dec 5, 2017

Member

Yep. If this is for backward compat, then we can leave it.
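
Whichever spelling is used (`to_unit` or `cast`), the underlying arithmetic for a conversion such as hours to seconds is a rescaling. A rough sketch under stated assumptions: `convert_interval` and the factor table are hypothetical, only fixed-length units are covered, and truncation toward the coarser unit is my choice for the example rather than documented ibis behavior:

```python
# Seconds per unit; fixed-length units only, because months and years
# have no constant length in seconds.
_SECONDS_PER_UNIT = {'W': 604800, 'D': 86400, 'h': 3600, 'm': 60, 's': 1}


def convert_interval(value, from_unit, to_unit):
    """Convert an integer interval value between fixed-length units."""
    for unit in (from_unit, to_unit):
        if unit not in _SECONDS_PER_UNIT:
            raise ValueError('Unsupported unit: {!r}'.format(unit))
    total_seconds = value * _SECONDS_PER_UNIT[from_unit]
    # Floor division: converting to a coarser unit drops the remainder.
    return total_seconds // _SECONDS_PER_UNIT[to_unit]


three_hours_in_seconds = convert_interval(3, 'h', 's')
```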

"""Infers the output type of the binary interval operation
This is a slightly modified version of BinaryPromoter, it converts
back and forth between the interval and its innner value.

@cpcloud

cpcloud Dec 5, 2017

Member

Small typo here, innner has one too many n's

@@ -261,6 +261,23 @@ def _value_list(translator, expr):
return '({0})'.format(', '.join(values_))
def _interval_format(translator, expr):
units = {

@cpcloud

cpcloud Dec 5, 2017

Member

Can you use Interval._valid_units here so we don't have to repeat this? Maybe we can use Interval._valid_units everywhere we would hard code unit values/labels.
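
A sketch of the "define the units once" idea. It assumes the unit table lives on the datatype as a code-to-label mapping (`_units`, a hypothetical attribute) from which `_valid_units` is derived; the `INTERVAL <n> <UNIT>` output is illustrative and not tied to any particular backend's syntax:

```python
class Interval:
    """Hypothetical stand-in (not the ibis class)."""

    # Assumed layout: unit code -> label, kept in one place.
    _units = {'Y': 'year', 'M': 'month', 'W': 'week', 'D': 'day',
              'h': 'hour', 'm': 'minute', 's': 'second'}
    _valid_units = frozenset(_units)

    def __init__(self, unit):
        if unit not in self._valid_units:
            raise ValueError('Unsupported interval unit: {!r}'.format(unit))
        self.unit = unit


def interval_format(value, interval_type):
    """Render an illustrative 'INTERVAL <n> <UNIT>' fragment."""
    # Look the label up on the datatype instead of repeating the table here.
    label = Interval._units[interval_type.unit]
    return 'INTERVAL {} {}'.format(value, label.upper())


formatted = interval_format(3, Interval('M'))
```

A compiler that needs backend-specific labels could still keep its own mapping, but it can validate its keys against `Interval._valid_units` rather than hard-coding the unit list again.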

        return False
    def valid_literal(self, value):
        return isinstance(value, six.integer_types + (datetime.timedelta,))

@cpcloud

cpcloud Dec 5, 2017

Member

Are the semantics here such that:

interval(days=2) + 1 == interval(days=3)

?

@kszucs

kszucs Dec 5, 2017

Member

That's required to pass https://github.com/ibis-project/ibis/blob/master/ibis/expr/types.py#L1587

For a type such as interval<int32>('s'), a Python integer should be a valid literal.

The operation fails:

In [1]: import ibis

In [2]: ibis.interval(days=2) + 1 == ibis.interval(days=3)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-3abf187ff76a> in <module>()
----> 1 ibis.interval(days=2) + 1 == ibis.interval(days=3)

TypeError: unsupported operand type(s) for +: 'IntervalScalar' and 'int'

@cpcloud

cpcloud Dec 5, 2017

Member

Ah right. Carry on then, this looks good.
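
The behavior settled on in this thread (a unitless integer cannot be added to an interval) can be reproduced with a small stand-in. `IntervalScalar` below is a hypothetical plain-Python class, not the ibis expression type, and its `==` returns a plain bool for simplicity:

```python
class IntervalScalar:
    """Hypothetical stand-in for an interval value (not the ibis class)."""

    def __init__(self, value, unit):
        self.value = value
        self.unit = unit

    def __add__(self, other):
        # Only intervals with the same unit are accepted. Returning
        # NotImplemented for anything else lets Python raise TypeError.
        if not isinstance(other, IntervalScalar) or other.unit != self.unit:
            return NotImplemented
        return IntervalScalar(self.value + other.value, self.unit)

    def __eq__(self, other):
        if not isinstance(other, IntervalScalar):
            return NotImplemented
        return (self.value, self.unit) == (other.value, other.unit)

    def __hash__(self):
        return hash((self.value, self.unit))


three_days = IntervalScalar(2, 'D') + IntervalScalar(1, 'D')

try:
    IntervalScalar(2, 'D') + 1
    raised = False
except TypeError:
    # An integer carries no unit, so the addition is rejected.
    raised = True
```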

'ns' # Nanosecond
])
def __init__(self, value_type=None, unit='s', nullable=True):

@cpcloud

cpcloud Dec 5, 2017

Member

We probably want the unit first here, so that it's easy to construct this type without using a keyword:

# current
Interval(unit='s')
# a little easier to read IMO
Interval('s')

@cpcloud

Member

cpcloud commented Dec 5, 2017

@kszucs Looking great. Everything ready to go on your side?

@cpcloud

cpcloud approved these changes Dec 5, 2017

@kszucs kszucs changed the title from [WIP] Interval support to Interval support Dec 5, 2017

@kszucs

Member

kszucs commented Dec 5, 2017

@cpcloud Yes! Thank you!

@cpcloud cpcloud closed this in 48306f8 Dec 6, 2017

Interval Type automation moved this from In progress to Done Dec 6, 2017

@kszucs kszucs deleted the kszucs:interval branch Dec 29, 2017
