Implement Strptime in BigQuery #1457
Looking good so far, a few minor comments. We definitely need at least one compiler test and one execution test (the latter can be placed in test_client.py).
ibis/expr/api.py
Outdated
@@ -1800,6 +1800,30 @@ def _string_replace(arg, pattern, replacement):
    return ops.StringReplace(arg, pattern, replacement).to_expr()


def to_datetime(arg, format_str, timezone_str=None):
Let's call timezone_str just timezone.
docs/source/api.rst
Outdated
@@ -440,6 +440,7 @@ All string operations are valid either on scalar or array values
StringValue.capitalize
StringValue.contains
StringValue.like
StringValue.to_datetime
I believe we have to_timestamp for integer values. Let's change the name of this operation to that to keep it consistent.
ibis/expr/api.py
Outdated
@@ -1800,6 +1800,30 @@ def _string_replace(arg, pattern, replacement):
    return ops.StringReplace(arg, pattern, replacement).to_expr()


def to_datetime(arg, format_str, timezone_str=None):
    """
    Parses a string and returns a timestamp.a
Looks like you stopped writing mid sentence here :)
ibis/expr/operations.py
Outdated
@@ -2305,6 +2305,13 @@ class Strftime(ValueOp):
    output_type = rlz.shape_like('arg', dt.string)


class Strptime(ValueOp):
Should we call this StringToTimestamp or something else that makes the correspondence between the user-facing API and the underlying operation more obvious?
Looking pretty good. I'll merge after the next round of comments is addressed. Thanks for doing this @missing-semicolon!
ibis/expr/api.py
Outdated
    Returns
    -------
    parsed : datetime
This should be parsed : TimestampValue
ibis/expr/operations.py
Outdated
    arg = Arg(rlz.string)
    format_str = Arg(rlz.string)
    timezone = Arg(rlz.string, default=None)
    output_type = rlz.shape_like('arg', dt.time)
Instead of dt.time, this should be dt.timestamp.
    ----------
    format_str : A format string potentially of the type '%Y-%m-%d'
    timezone : An optional string indicating the timezone,
        i.e. 'America/New_York'
What is the behavior with a non-None value of timezone with a format string that only includes a date? Does it assume midnight for the time portion of the timestamp?
The function assumes midnight for the time component and returns a timestamp that has been converted to UTC. Here's an example:
>>> s = ibis.literal('2016-02-22')
>>> expr = s.to_timestamp('%F', 'America/New_York')
>>> client.execute(expr)
Timestamp('2016-02-22 05:00:00')
>>> expr2 = s.to_timestamp('%F', 'UTC')
>>> client.execute(expr2)
Timestamp('2016-02-22 00:00:00')
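For anyone who wants to check the arithmetic without a BigQuery client, here is a minimal standard-library sketch of the same behavior: parse a date-only string, treat it as midnight in the given zone, and convert to UTC. The helper name parse_to_utc is hypothetical and is not part of ibis or this PR. The format is spelled '%Y-%m-%d' because Python's strptime does not portably accept '%F'. The fixed-offset fallback is an assumption that only holds for dates when New York is on EST.

```python
from datetime import datetime, timedelta, timezone

try:
    from zoneinfo import ZoneInfo  # Python 3.9+
    NEW_YORK = ZoneInfo("America/New_York")
except Exception:
    # Assumption: no tz database available, so fall back to a fixed EST
    # offset (UTC-5). This is only correct outside daylight saving time.
    NEW_YORK = timezone(timedelta(hours=-5), "EST")


def parse_to_utc(value, fmt, tz):
    """Hypothetical helper: parse `value`, interpret it as wall-clock time
    in `tz`, and return the equivalent naive UTC datetime."""
    # A date-only format leaves the time fields at 00:00:00 (midnight).
    local = datetime.strptime(value, fmt).replace(tzinfo=tz)
    return local.astimezone(timezone.utc).replace(tzinfo=None)


print(parse_to_utc("2016-02-22", "%Y-%m-%d", NEW_YORK))      # 2016-02-22 05:00:00
print(parse_to_utc("2016-02-22", "%Y-%m-%d", timezone.utc))  # 2016-02-22 00:00:00
```

Midnight on 2016-02-22 in New York is five hours behind UTC, which matches the 05:00:00 result shown above.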
ibis/bigquery/compiler.py
Outdated
    arg, format_string, timezone_arg = expr.op().args
    fmt_string = translator.translate(format_string)
    arg_formatted = translator.translate(arg)
    if isinstance(timezone_arg, ir.StringValue):
Instead of checking the type of timezone_arg, just check timezone_arg is not None. Ibis's argument validation will have converted the argument to a StringValue if it's not None.
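To illustrate the suggested check, here is a standalone sketch of the branching on already-translated SQL fragments. It is not the code from this PR: the function name compile_string_to_timestamp and its parameters are hypothetical, and it assumes the target is BigQuery's PARSE_TIMESTAMP(format_string, timestamp_string[, time_zone]).

```python
def compile_string_to_timestamp(arg_sql, format_sql, timezone_sql=None):
    """Hypothetical helper: build a PARSE_TIMESTAMP call from SQL fragments
    that have already been translated (for example, quoted string literals)."""
    # Validation upstream guarantees the timezone is either None or a string
    # expression, so an identity check against None is sufficient here.
    if timezone_sql is not None:
        return "PARSE_TIMESTAMP({}, {}, {})".format(
            format_sql, arg_sql, timezone_sql
        )
    return "PARSE_TIMESTAMP({}, {})".format(format_sql, arg_sql)


print(compile_string_to_timestamp("'20170206'", "'%Y%m%d'"))
print(compile_string_to_timestamp("'2016-02-22'", "'%F'", "'America/New_York'"))
```

The `is not None` test keeps the compiler independent of the expression class hierarchy, which is the point of the comment above.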
ibis/expr/api.py
Outdated
    >>> date_as_str = ibis.literal('20170206')
    >>> result = date_as_str.to_timestamp('%Y%m%d')

Let's kill this extra newline.
@missing-semicolon I'm working on fixing the |
@missing-semicolon rebase on master and you'll be able to get past any failures that aren't related to this PR.
I cannot wait. 🍾 |
…plement-strpdate
ibis/expr/operations.py
Outdated
    arg = Arg(rlz.string)
    format_str = Arg(rlz.string)
    timezone = Arg(rlz.string, default=None)
    output_type = rlz.shape_like('arg', dt.timestamp)
I think this should be dt.Timestamp(timezone='UTC') since BigQuery returns timestamps in UTC. Other backends that implement this can follow suit.
Merging. Thanks @missing-semicolon!
Closes #1455
As implemented, the string method to_datetime requires a format string argument and accepts an optional argument for timezones. All comments appreciated!