Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Strptime in BigQuery #1457

Conversation

missing-semicolon
Copy link
Contributor

Closes #1455

As implemented, the string method to_datetime requires a format string argument and accepts an optional argument for timezones. All comments appreciated!

Copy link
Member

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good so far, a few minor comments. We definitely need at least one compiler test and one execution test (the latter can be placed in test_client.py.

ibis/expr/api.py Outdated
@@ -1800,6 +1800,30 @@ def _string_replace(arg, pattern, replacement):
return ops.StringReplace(arg, pattern, replacement).to_expr()


def to_datetime(arg, format_str, timezone_str=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's call timezone_str just timezone.

@@ -440,6 +440,7 @@ All string operations are valid either on scalar or array values
StringValue.capitalize
StringValue.contains
StringValue.like
StringValue.to_datetime
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we have to_timestamp for integer values. Let's change the name of this operation to that to keep it consistent.

ibis/expr/api.py Outdated
@@ -1800,6 +1800,30 @@ def _string_replace(arg, pattern, replacement):
return ops.StringReplace(arg, pattern, replacement).to_expr()


def to_datetime(arg, format_str, timezone_str=None):
"""
Parses a string and returns a timestamp.a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you stopped writing mid sentence here :)

@@ -2305,6 +2305,13 @@ class Strftime(ValueOp):
output_type = rlz.shape_like('arg', dt.string)


class Strptime(ValueOp):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we call this StringToTimestamp or something else that makes the correspondence between the user-facing API and the underlying operation more obvious?

@cpcloud cpcloud added this to the 0.14 milestone May 24, 2018
@cpcloud cpcloud added feature Features or general enhancements expressions Issues or PRs related to the expression API bigquery labels May 24, 2018
@cpcloud cpcloud self-assigned this May 24, 2018
Copy link
Member

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking pretty good, I'll merge after the next round of comments are addressed. Thanks for doing this @missing-semicolon!

ibis/expr/api.py Outdated

Returns
-------
parsed : datetime
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be parsed : TimestampValue

arg = Arg(rlz.string)
format_str = Arg(rlz.string)
timezone = Arg(rlz.string, default=None)
output_type = rlz.shape_like('arg', dt.time)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of dt.time, this should be dt.timestamp.

----------
format_str : A format string potentially of the type '%Y-%m-%d'
timezone : An optional string indicating the timezone,
i.e. 'America/New_York'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the behavior with a non-None value of timezone with a format string that only includes a date? Does it assume midnight for the time portion of the timestamp?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function assumes midnight for the time component and returns a timestamp that has been converted to UTC. Here's an example:

>>> s = ibis.literal('2016-02-22')
>>> expr = s.to_timestamp('%F', 'America/New_York')
>>> client.execute(expr)
Timestamp('2016-02-22 05:00:00')
>>> expr2 = s.to_timestamp('%F', 'UTC')
>>> client.execute(expr2)
Timestamp('2016-02-22 00:00:00')

arg, format_string, timezone_arg = expr.op().args
fmt_string = translator.translate(format_string)
arg_formatted = translator.translate(arg)
if isinstance(timezone_arg, ir.StringValue):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of checking the type of timezone_arg, just check timezone_arg is not None. Ibis's argument validation will have converted the argument to a StringValue if it's not None.

ibis/expr/api.py Outdated
>>> date_as_str = ibis.literal('20170206')
>>> result = date_as_str.to_timestamp('%Y%m%d')


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's kill this extra newline.

@cpcloud
Copy link
Member

cpcloud commented May 30, 2018

@missing-semicolon I'm working on fixing the ci/datamgr.py issue in #1458.

@cpcloud
Copy link
Member

cpcloud commented Jun 4, 2018

@missing-semicolon rebase on master and you'll be able to get past any failures that aren't related this PR.

@missing-semicolon
Copy link
Contributor Author

I cannot wait. 🍾

arg = Arg(rlz.string)
format_str = Arg(rlz.string)
timezone = Arg(rlz.string, default=None)
output_type = rlz.shape_like('arg', dt.timestamp)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be dt.Timestamp(timezone='UTC') since BigQuery returns timestamps in UTC. Other backends that implement this can follow suit.

@cpcloud
Copy link
Member

cpcloud commented Jun 6, 2018

Merging. thanks @missing-semicolon !

@cpcloud cpcloud closed this in 0f8b502 Jun 6, 2018
@missing-semicolon missing-semicolon deleted the implement-strpdate branch June 7, 2018 13:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
expressions Issues or PRs related to the expression API feature Features or general enhancements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants