Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: implement Time datatype #1105

Closed
wants to merge 1 commit into from
Closed

Conversation

jreback
Copy link
Contributor

@jreback jreback commented Aug 10, 2017

implement BetweenTime node
closes #1098
closes #1096

pd.Series,
)
def execute_time(op, data, scope=None):
return op.to_expr()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to do this to get it to return my correct node.

)
def execute_between_time(op, data, lower, upper, scope=None):

# TODO(jreback), this is most likely wrong
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this node is hit correctly, BUT I cannot get the data at this point (IOW the timestampseries)
column gives me the name of the column, but I don't have access to the table to select.

currently the impl is wrong (its going to be something like this (if data is a Series)

indexer = pd.DatetimeIndex(data).indexer_between_time(lower, upper)

then some other code (i have to convert to a boolean array)

@jreback jreback added the feature Features or general enhancements label Aug 10, 2017
@jreback jreback self-assigned this Aug 10, 2017
@jreback jreback changed the title implement Time datatype ENH: implement Time datatype Aug 10, 2017
@jreback jreback force-pushed the between branch 2 times, most recently from 5f79f90 to 20d3259 Compare August 10, 2017 01:09
ibis/expr/api.py Outdated
@@ -643,7 +643,10 @@ def between(arg, lower, upper):
"""
lower = _ops.as_value_expr(lower)
upper = _ops.as_value_expr(upper)
op = _ops.Between(arg, lower, upper)
if isinstance(arg.op(), _ops.Time):
op = _ops.BetweenTime(arg, lower, upper)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't need this if we have a time datatype, since you'll be able to do:

t.timestamp_col.time().between(lower, upper)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, but I have something wrong and its dispatching elsewhere

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, actually I see what you're doing here now. Nevermind what I said.


def test_time():
ts = dt.time
assert str(ts) == "time"
Copy link
Member

@cpcloud cpcloud Aug 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a test that checks the parsing of the time type , something like assert dt.validate_type('time').equals(dt.time)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return False

def _can_compare(self, other):
return isinstance(other, (TemporalValue, StringValue))
Copy link
Member

@cpcloud cpcloud Aug 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if it's intuitive to allow any TemporalValue here. I think allowing TimestampValue to TimeValue might be reasonable, but we definitely can't compare DateValues to TimeValues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are the postgres rules:

  1. timestamp compare date: allowed
  2. timestamp compare time: not allowed
  3. date compare time: not allowed

So, you can pretty much only compare times with other time typed values. I think that's a reasonable default for now.

@@ -1158,6 +1183,22 @@ class DateColumn(ColumnExpr, DateValue):
pass


class TimeScalar(ScalarExpr, TimeValue):
@property
def _factory(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need this since there's no metadata attached to the time type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -1783,7 +1800,8 @@ def _timestamp_strftime(arg, format_str):
minute=_extract_field('minute', _ops.ExtractMinute),
second=_extract_field('second', _ops.ExtractSecond),
millisecond=_extract_field('millisecond', _ops.ExtractMillisecond),
truncate=_timestamp_truncate
truncate=_timestamp_truncate,
time=_timestamp_time,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be modeled as an extract operation, so you should be able to do _extract_field('time', _ops.ExtractTime).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, we don't actually have a date() method. I'll put up a PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, nevermind on the extract.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is very different though. .time() simply gives back a time node w/o actually doing any ting under the hood. Then you have to calll .between(...) to actually do anything

Copy link
Member

@cpcloud cpcloud Aug 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.time() should actually be doing something without needing any between.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's a type in many databases. Here's postgres:

ibis_testing=# select cast(timestamp_col as time) as "timestamp_col::time", timestamp_col from functional_alltypes limit 5;
 timestamp_col::time |     timestamp_col     
---------------------+-----------------------
 00:00:00            | 2010-06-01 00:00:00
 00:01:00            | 2010-06-01 00:01:00
 00:02:00.1          | 2010-06-01 00:02:00.1
 00:03:00.3          | 2010-06-01 00:03:00.3
 00:04:00.6          | 2010-06-01 00:04:00.6
(5 rows)

@@ -2363,6 +2367,11 @@ class ExtractMillisecond(ExtractTimestampField):
pass


class Time(TemporalUnaryOp):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should inherit from UnaryOp since if you inherit from TemporalUnaryOp then date-typed columns will inherit the time() method and that doesn't make sense.

@@ -643,7 +643,11 @@ def between(arg, lower, upper):
"""
lower = _ops.as_value_expr(lower)
upper = _ops.as_value_expr(upper)
op = _ops.Between(arg, lower, upper)

if isinstance(arg.op(), _ops.Time):
Copy link
Member

@cpcloud cpcloud Aug 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be if arg.type().equals(dt.time)

@cpcloud
Copy link
Member

cpcloud commented Aug 10, 2017

Looks like a GCS failure. I restarted the build.

@cpcloud
Copy link
Member

cpcloud commented Aug 10, 2017

Appveyor looks good. We've been getting more pathological failures with our now() test, not sure what's up with that.

@cpcloud cpcloud added this to the 0.11.3 milestone Aug 10, 2017

Returns
-------
Time node
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be TimeValue

@cpcloud
Copy link
Member

cpcloud commented Aug 10, 2017

This needs additional tests:

  1. Comparing timestamp expressions with time expressions (should fail)
  2. Comparing date expressions with time expressions (should fail)
  3. Comparing time expressions with time expressions (should pass)

These can go in test_value_exprs.py

@jreback
Copy link
Contributor Author

jreback commented Aug 11, 2017

ok pushed.

how do I evaluate to a boolean this:

In [6]: literal('10:00') == time(10,0)
Out[6]: 
Equals[boolean]
  left:
    Literal[time]
      10:00
  right:
    Literal[time]
      10:00:00

@jreback
Copy link
Contributor Author

jreback commented Aug 12, 2017

@cpcloud I am missing some definition

In [4]: datetime.datetime(2016, 1, 1, 0, 0) == literal('2016-1-1')
Out[4]: 
Equals[boolean]
  left:
    Literal[timestamp]
      2016-1-1
  right:
    Literal[timestamp]
      2016-01-01 00:00:00

In [6]: datetime.time(10, 0) == literal('10:00')
Out[6]: False

IOW it seems that somewhere you are overiding __eq__ or doing an implicit cast for datetimes. I need to do the same for time.

@cpcloud
Copy link
Member

cpcloud commented Aug 12, 2017

I just read through the Python 2.7.13 C source code for time.__eq__ and there's no way to make this work in Python 2 without subclassing time which we cannot do. We need the ability to compare columns to literals, and we can't start making all possible things that can be compared to time literals subclasses of time.

This is fundamentally a bug in Python 2, which is fixed in Python 3.

Using this feature is going to be limited to Python 3 only or we drop support for Python 2 altogether.

@wesm
Copy link
Member

wesm commented Aug 13, 2017

I think I've run into this issue before with datetime.datetime. At some point we should drop Python 2.7; maybe it's now. I'm not sure what we'd be losing if we did that

@cpcloud
Copy link
Member

cpcloud commented Aug 13, 2017

What's annoying is that there's a hack that we use for datetime.datetime--give it a no-op attribute timetuple. That works fine. But for datetime.time the only way you can get around this issue is by subclassing it.

:(

@jreback jreback force-pushed the between branch 2 times, most recently from 7d8b5e7 to 0ed097e Compare August 13, 2017 22:27

# we cannot actually compare datetime.time objects and literals
# in a deferred way in python 2, they short circuit in the CPython
result = operator.eq(time(10, 0), literal('2017-04-01'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should compare a time to another time

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@pytest.mark.skipif(not PY2, reason="testing for PY2")
def test_times_ops_py2(t, df):
with pytest.raises(ValueError):
t.plain_datetimes_naive.time().between('10:00', '10:00').execute()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put only the part that fails underneath raises so that it's clear when reading the test where the code should fail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

imlpmenet BetweenTime node
closes ibis-project#1098
@jreback
Copy link
Contributor Author

jreback commented Aug 14, 2017

@cpcloud all green here.

@cpcloud
Copy link
Member

cpcloud commented Aug 14, 2017

Looks good, merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Features or general enhancements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

.time().between(...) for time selection Implement a time type
3 participants