New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: implement Time datatype #1105

Closed
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
3 participants
@jreback
Contributor

jreback commented Aug 10, 2017

implement BetweenTime node
closes #1098
closes #1096

pd.Series,
)
def execute_time(op, data, scope=None):
return op.to_expr()

This comment has been minimized.

@jreback

jreback Aug 10, 2017

Contributor

I had to do this to get it to return my correct node.

)
def execute_between_time(op, data, lower, upper, scope=None):
# TODO(jreback), this is most likely wrong

This comment has been minimized.

@jreback

jreback Aug 10, 2017

Contributor

this node is hit correctly, BUT I cannot get the data at this point (IOW the timestampseries)
column gives me the name of the column, but I don't have access to the table to select.

currently the impl is wrong (its going to be something like this (if data is a Series)

indexer = pd.DatetimeIndex(data).indexer_between_time(lower, upper)

then some other code (i have to convert to a boolean array)

@jreback jreback added the enhancement label Aug 10, 2017

@jreback jreback self-assigned this Aug 10, 2017

@jreback jreback changed the title from implement Time datatype to ENH: implement Time datatype Aug 10, 2017

@jreback jreback force-pushed the jreback:between branch 2 times, most recently to 20d3259 Aug 10, 2017

@@ -643,7 +643,10 @@ def between(arg, lower, upper):
"""
lower = _ops.as_value_expr(lower)
upper = _ops.as_value_expr(upper)
op = _ops.Between(arg, lower, upper)
if isinstance(arg.op(), _ops.Time):
op = _ops.BetweenTime(arg, lower, upper)

This comment has been minimized.

@cpcloud

cpcloud Aug 10, 2017

Member

We shouldn't need this if we have a time datatype, since you'll be able to do:

t.timestamp_col.time().between(lower, upper)

This comment has been minimized.

@jreback

jreback Aug 10, 2017

Contributor

right, but I have something wrong and its dispatching elsewhere

This comment has been minimized.

@cpcloud

cpcloud Aug 10, 2017

Member

Oh, actually I see what you're doing here now. Nevermind what I said.

def test_time():
ts = dt.time
assert str(ts) == "time"

This comment has been minimized.

@cpcloud

cpcloud Aug 10, 2017

Member

Can you also add a test that checks the parsing of the time type , something like assert dt.validate_type('time').equals(dt.time)

This comment has been minimized.

@jreback

jreback Aug 10, 2017

Contributor

done

return False
def _can_compare(self, other):
return isinstance(other, (TemporalValue, StringValue))

This comment has been minimized.

@cpcloud

cpcloud Aug 10, 2017

Member

I don't know if it's intuitive to allow any TemporalValue here. I think allowing TimestampValue to TimeValue might be reasonable, but we definitely can't compare DateValues to TimeValues.

This comment has been minimized.

@jreback

jreback Aug 10, 2017

Contributor

done

This comment has been minimized.

@cpcloud

cpcloud Aug 10, 2017

Member

Here are the postgres rules:

  1. timestamp compare date: allowed
  2. timestamp compare time: not allowed
  3. date compare time: not allowed

So, you can pretty much only compare times with other time typed values. I think that's a reasonable default for now.

@@ -1158,6 +1183,22 @@ class DateColumn(ColumnExpr, DateValue):
pass
class TimeScalar(ScalarExpr, TimeValue):
@property
def _factory(self):

This comment has been minimized.

@cpcloud

cpcloud Aug 10, 2017

Member

I don't think you need this since there's no metadata attached to the time type.

This comment has been minimized.

@jreback

jreback Aug 10, 2017

Contributor

done

@@ -1783,7 +1800,8 @@ def _timestamp_strftime(arg, format_str):
minute=_extract_field('minute', _ops.ExtractMinute),
second=_extract_field('second', _ops.ExtractSecond),
millisecond=_extract_field('millisecond', _ops.ExtractMillisecond),
truncate=_timestamp_truncate
truncate=_timestamp_truncate,
time=_timestamp_time,

This comment has been minimized.

@cpcloud

cpcloud Aug 10, 2017

Member

This should be modeled as an extract operation, so you should be able to do _extract_field('time', _ops.ExtractTime).

This comment has been minimized.

@cpcloud

cpcloud Aug 10, 2017

Member

Hm, we don't actually have a date() method. I'll put up a PR.

This comment has been minimized.

@cpcloud

cpcloud Aug 10, 2017

Member

Actually, nevermind on the extract.

This comment has been minimized.

@jreback

jreback Aug 10, 2017

Contributor

this is very different though. .time() simply gives back a time node w/o actually doing any ting under the hood. Then you have to calll .between(...) to actually do anything

This comment has been minimized.

@cpcloud

cpcloud Aug 10, 2017

Member

.time() should actually be doing something without needing any between.

This comment has been minimized.

@cpcloud

cpcloud Aug 10, 2017

Member

Since it's a type in many databases. Here's postgres:

ibis_testing=# select cast(timestamp_col as time) as "timestamp_col::time", timestamp_col from functional_alltypes limit 5;
 timestamp_col::time |     timestamp_col     
---------------------+-----------------------
 00:00:00            | 2010-06-01 00:00:00
 00:01:00            | 2010-06-01 00:01:00
 00:02:00.1          | 2010-06-01 00:02:00.1
 00:03:00.3          | 2010-06-01 00:03:00.3
 00:04:00.6          | 2010-06-01 00:04:00.6
(5 rows)
@@ -2363,6 +2367,11 @@ class ExtractMillisecond(ExtractTimestampField):
pass
class Time(TemporalUnaryOp):

This comment has been minimized.

@cpcloud

cpcloud Aug 10, 2017

Member

I think this should inherit from UnaryOp since if you inherit from TemporalUnaryOp then date-typed columns will inherit the time() method and that doesn't make sense.

@jreback jreback force-pushed the jreback:between branch from e6da0db to 742d027 Aug 10, 2017

@@ -643,7 +643,11 @@ def between(arg, lower, upper):
"""
lower = _ops.as_value_expr(lower)
upper = _ops.as_value_expr(upper)
op = _ops.Between(arg, lower, upper)
if isinstance(arg.op(), _ops.Time):

This comment has been minimized.

@cpcloud

cpcloud Aug 10, 2017

Member

This should be if arg.type().equals(dt.time)

@cpcloud

This comment has been minimized.

Member

cpcloud commented Aug 10, 2017

Looks like a GCS failure. I restarted the build.

@cpcloud

This comment has been minimized.

Member

cpcloud commented Aug 10, 2017

Appveyor looks good. We've been getting more pathological failures with our now() test, not sure what's up with that.

@cpcloud cpcloud added this to the 0.11.3 milestone Aug 10, 2017

@jreback jreback force-pushed the jreback:between branch from 742d027 to 247d31c Aug 10, 2017

Returns
-------
Time node

This comment has been minimized.

@cpcloud

cpcloud Aug 10, 2017

Member

This should be TimeValue

@cpcloud

This comment has been minimized.

Member

cpcloud commented Aug 10, 2017

This needs additional tests:

  1. Comparing timestamp expressions with time expressions (should fail)
  2. Comparing date expressions with time expressions (should fail)
  3. Comparing time expressions with time expressions (should pass)

These can go in test_value_exprs.py

@jreback jreback force-pushed the jreback:between branch from 247d31c to 059f008 Aug 11, 2017

@jreback

This comment has been minimized.

Contributor

jreback commented Aug 11, 2017

ok pushed.

how do I evaluate to a boolean this:

In [6]: literal('10:00') == time(10,0)
Out[6]: 
Equals[boolean]
  left:
    Literal[time]
      10:00
  right:
    Literal[time]
      10:00:00
@jreback

This comment has been minimized.

Contributor

jreback commented Aug 12, 2017

@cpcloud I am missing some definition

In [4]: datetime.datetime(2016, 1, 1, 0, 0) == literal('2016-1-1')
Out[4]: 
Equals[boolean]
  left:
    Literal[timestamp]
      2016-1-1
  right:
    Literal[timestamp]
      2016-01-01 00:00:00

In [6]: datetime.time(10, 0) == literal('10:00')
Out[6]: False

IOW it seems that somewhere you are overiding __eq__ or doing an implicit cast for datetimes. I need to do the same for time.

@cpcloud

This comment has been minimized.

Member

cpcloud commented Aug 12, 2017

I just read through the Python 2.7.13 C source code for time.__eq__ and there's no way to make this work in Python 2 without subclassing time which we cannot do. We need the ability to compare columns to literals, and we can't start making all possible things that can be compared to time literals subclasses of time.

This is fundamentally a bug in Python 2, which is fixed in Python 3.

Using this feature is going to be limited to Python 3 only or we drop support for Python 2 altogether.

@wesm

This comment has been minimized.

Member

wesm commented Aug 13, 2017

I think I've run into this issue before with datetime.datetime. At some point we should drop Python 2.7; maybe it's now. I'm not sure what we'd be losing if we did that

@cpcloud

This comment has been minimized.

Member

cpcloud commented Aug 13, 2017

What's annoying is that there's a hack that we use for datetime.datetime--give it a no-op attribute timetuple. That works fine. But for datetime.time the only way you can get around this issue is by subclassing it.

:(

@jreback jreback force-pushed the jreback:between branch 2 times, most recently from 7d8b5e7 to 0ed097e Aug 13, 2017

# we cannot actually compare datetime.time objects and literals
# in a deferred way in python 2, they short circuit in the CPython
result = operator.eq(time(10, 0), literal('2017-04-01'))

This comment has been minimized.

@cpcloud

cpcloud Aug 13, 2017

Member

This should compare a time to another time

This comment has been minimized.

@jreback

jreback Aug 13, 2017

Contributor

done

@pytest.mark.skipif(not PY2, reason="testing for PY2")
def test_times_ops_py2(t, df):
with pytest.raises(ValueError):
t.plain_datetimes_naive.time().between('10:00', '10:00').execute()

This comment has been minimized.

@cpcloud

cpcloud Aug 13, 2017

Member

Can you put only the part that fails underneath raises so that it's clear when reading the test where the code should fail.

This comment has been minimized.

@jreback

jreback Aug 13, 2017

Contributor

done

implement Time datatype
imlpmenet BetweenTime node
closes #1098

@jreback jreback force-pushed the jreback:between branch from 0ed097e to e8d2bba Aug 13, 2017

@jreback

This comment has been minimized.

Contributor

jreback commented Aug 14, 2017

@cpcloud all green here.

@cpcloud

This comment has been minimized.

Member

cpcloud commented Aug 14, 2017

Looks good, merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment