ENH: Add to_timestamp support in BigQuery and Pandas #1410
Conversation
cpcloud commented on Apr 10, 2018 (edited)
- Update our testing data for parquet

Force-pushed from 795f2f1 to 1a35039.

@kszucs can you review this when you get a chance?

Force-pushed from 1a35039 to 53dfb14.
Mainly general comments and thoughts. If you agree with them, we could convert them to issues.
ibis/bigquery/types.py (Outdated)
@@ -0,0 +1,70 @@
import six
Just a couple of general thoughts:
We currently have a pretty solid datatype abstraction in expr/datatypes.py, but our conversions between backend and ibis datatypes could be more consistent; see clickhouse, pandas, and alchemy. Shouldn't we standardize that?
I've found the mixed usage of types and datatypes identifiers really confusing, so I suggest renaming this file to datatypes.py so that there is a clear mapping between expr/datatypes.py and backend/datatypes.py (dt -> dt instead of dt -> ir). We should also factor out the datatype conversion code in all backends.
> We currently have a pretty solid datatype abstraction in expr/datatypes.py, but our conversions between backend and ibis datatypes could be more consistent

Definitely agree with that. infer works well for going from backend type to ibis type, but we need another API (maybe the to_ibis function you mention below) to make it convenient to go from ibis type to backend type.

> I've found the mixed usage of types and datatypes identifiers really confusing, so I suggest renaming this file to datatypes.py
Fully agree, I'll rename the file.
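For reference, a minimal sketch of the two conversion directions being discussed here, using a multipledispatch Dispatcher the way this PR does. The dispatcher name ibis_type_to_backend_type and the registered rules below are illustrative only, not the code under review:

```python
import ibis.expr.datatypes as dt
from multipledispatch import Dispatcher

# backend -> ibis: dt.dtype already turns type strings into ibis datatypes
assert dt.dtype('int64') == dt.int64

# ibis -> backend: a dispatcher keyed on the ibis datatype class
ibis_type_to_backend_type = Dispatcher('ibis_type_to_backend_type')


@ibis_type_to_backend_type.register(dt.Integer)
def _integer(t):
    return 'INT64'


@ibis_type_to_backend_type.register(dt.Floating)
def _floating(t):
    return 'FLOAT64'


print(ibis_type_to_backend_type(dt.int64))    # INT64
print(ibis_type_to_backend_type(dt.float64))  # FLOAT64
```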
ibis/bigquery/types.py (Outdated)
__slots__ = ()

ibis_type_to_bigquery_type = Dispatcher('ibis_type_to_bigquery_type')
I know that explicit is always better than implicit, but this is a bit too verbose for my taste. I've used from_ibis and to_ibis in other backends inside the datatypes scope, which might be too implicit. Is there a happy medium?
Let's refactor this in another PR, I'll make an issue.
def trans_integer(t):
# BigQuery doesn't accept integer types because javascript doesn't
@ibis_type_to_bigquery_type.register(dt.Integer, UDFContext)
def trans_integer(t, context):
Could you write a short explanation of the reasoning?
Yes, sorry, I thought I had a comment there. I'll add.
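A hedged sketch of the kind of comment being asked for here. ibis_type_to_bigquery_type, UDFContext, and dt come from the diff of ibis/bigquery/types.py above; the FLOAT64 result and the comment wording are an assumption about the rationale, not necessarily what the PR ends up committing:

```python
@ibis_type_to_bigquery_type.register(dt.Integer, UDFContext)
def trans_integer(t, context):
    # Assumption: inside a JavaScript UDF, integer parameters cannot be
    # INT64 because JavaScript numbers are IEEE 754 doubles, so the type
    # is widened to FLOAT64 in the UDF translation context.
    return 'FLOAT64'
```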
ibis/bigquery/types.py (Outdated)
import ibis.expr.datatypes as dt

class TypeTranslationContext(object):
At first glance this class seems unused until the UDFContext definition. A little docstring would be handy :)
Will add.
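A sketch of the docstring being requested; the class names and __slots__ come from the diff above, while the wording of the docstring itself is only a suggestion:

```python
class TypeTranslationContext(object):
    """Indicate the context in which an ibis type is being translated.

    The base context covers ordinary BigQuery SQL. UDFContext (below)
    marks translation inside a JavaScript UDF signature, where some
    types (notably integers) need different handling.
    """

    __slots__ = ()


class UDFContext(TypeTranslationContext):
    __slots__ = ()
```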
ibis/bigquery/types.py (Outdated)
return ibis_type_to_bigquery_type(t, TypeTranslationContext())

@ibis_type_to_bigquery_type.register(six.string_types)
I would not support string types; that's rather the responsibility of the caller. See my comment in api.py.
Cool, will change.
ibis/bigquery/udf/api.py (Outdated)
@@ -156,12 +158,14 @@ def compiles_udf_node(t, expr):
    return {name}({args});
""";'''.format(
    name=f.__name__,
    return_type=ibis_type_to_bigquery_type(output_type),
    return_type=ibis_type_to_bigquery_type(
        output_type, type_translation_context),
How about ibis_type_to_bigquery_type(dt.dtype(output_type), type_translation_context)?
Cool
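In other words, coerce at the call site with dt.dtype (which accepts both type strings like 'float64' and ibis datatype instances), so the dispatcher never needs a six.string_types rule. The helper name below is hypothetical, just to make the suggestion concrete:

```python
import ibis.expr.datatypes as dt

# ibis_type_to_bigquery_type is the dispatcher from ibis/bigquery/types.py
# in this PR.
def bigquery_return_type(output_type, context):
    # dt.dtype normalizes 'float64', dt.float64, etc. into an ibis
    # datatype before dispatching to the BigQuery translation rules.
    return ibis_type_to_bigquery_type(dt.dtype(output_type), context)
```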
ibis/bigquery/udf/api.py (Outdated)
source=source,
signature=', '.join(
    '{name} {type}'.format(
        name=name,
        type=ibis_type_to_bigquery_type(type)
        type=ibis_type_to_bigquery_type(
            type, type_translation_context)
Same as above: ibis_type_to_bigquery_type(dt.dtype(type), type_translation_context)
Will do
ibis/impala/compiler.py (Outdated)
@@ -734,15 +734,19 @@ def _truncate(translator, expr):
    return "trunc({}, '{}')".format(arg_formatted, unit)

TIMESTAMP_UNIT_CONVERSIONS = {
After factoring out _convert_unit could we reuse it here?
I'll look into that.
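A hypothetical sketch of the reuse being suggested: a single _convert_unit-style helper that both truncation and to_timestamp translation can call, instead of a standalone TIMESTAMP_UNIT_CONVERSIONS dict. The unit table and error handling below are illustrative only and may not match the real Impala spellings:

```python
# Illustrative ibis-unit -> Impala-unit table; the real spellings may differ.
_IMPALA_UNIT_NAMES = {
    'Y': 'Y',
    'M': 'MONTH',
    'D': 'DD',
    'h': 'HH',
    'm': 'MI',
}


def _convert_unit(unit):
    """Map an ibis timestamp unit to the Impala spelling, or raise."""
    try:
        return _IMPALA_UNIT_NAMES[unit]
    except KeyError:
        raise ValueError('unsupported timestamp unit: {!r}'.format(unit))


def _truncate(translator, expr):
    # Same shape as the _truncate in the diff above, but routed through
    # the shared helper.
    arg, unit = expr.op().args
    arg_formatted = translator.translate(arg)
    return "trunc({}, '{}')".format(arg_formatted, _convert_unit(unit))
```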
ibis/bigquery/compiler.py (Outdated)
@compiles(ops.Floor)
def compiles_floor(t, e):
    bigquery_type = ibis_type_to_bigquery_type(e.type())
    arg, = e.op().args
Simply arg = e.op().arg?
I guess we should favor using named arguments instead of unpacking args.
Agreed, will change.
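A sketch of the named-access style being agreed on here. The decorator and the first two lines mirror the diff above (compiles and ibis_type_to_bigquery_type come from ibis/bigquery/compiler.py and ibis/bigquery/types.py in this PR); the generated SQL string is only a guess:

```python
import ibis.expr.operations as ops


@compiles(ops.Floor)
def compiles_floor(t, e):
    bigquery_type = ibis_type_to_bigquery_type(e.type())
    # Named attribute access instead of positional unpacking
    # (previously: `arg, = e.op().args`).
    arg = e.op().arg
    return 'CAST(FLOOR({}) AS {})'.format(t.translate(arg), bigquery_type)
```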
Force-pushed from 9d33c09 to a7a4392.