
ENH: Add to_timestamp support in BigQuery and Pandas #1410

Closed · wants to merge 20 commits into base: master from cpcloud:timestamp-from-unix-bigquery

Conversation (2 participants)
@cpcloud
Member

cpcloud commented Apr 10, 2018

  • Update our testing data for parquet

@cpcloud cpcloud added this to the 0.14 milestone Apr 10, 2018

@cpcloud cpcloud added this to To do in BigQuery via automation Apr 10, 2018

@cpcloud cpcloud self-assigned this Apr 10, 2018

@cpcloud cpcloud force-pushed the cpcloud:timestamp-from-unix-bigquery branch 5 times, most recently from 795f2f1 to 1a35039 Apr 11, 2018

@cpcloud


Member

cpcloud commented Apr 13, 2018

@kszucs can you review this when you get a chance?

@cpcloud cpcloud force-pushed the cpcloud:timestamp-from-unix-bigquery branch from 1a35039 to 53dfb14 Apr 15, 2018

@kszucs

kszucs approved these changes Apr 15, 2018

Mainly general comments and thoughts. If you agree with them, we could convert them to issues.

@@ -0,0 +1,70 @@
import six


@kszucs

kszucs Apr 15, 2018

Member

Just a couple of general thoughts:

We currently have a pretty solid datatype abstraction in expr/datatypes.py, but our conversions between backend and ibis datatypes could be more consistent; see clickhouse, pandas, and alchemy. Shouldn't we standardize that?

I've found the mixed usage of types and datatypes identifiers really confusing, so I suggest renaming this file to datatypes.py so there is a clear mapping between expr/datatypes.py and backend/datatypes.py (dt -> dt instead of dt -> ir). We should also factor out the datatype conversion code in all backends.


@cpcloud

cpcloud Apr 15, 2018

Member

We currently have a pretty solid datatype abstraction in expr/datatypes.py, but our conversions between backend and ibis datatypes could be more consistent,

Definitely agree with that. infer works well for going from backend type to ibis type, but we need another API (maybe the to_ibis function you mention below) to make it convenient to go from ibis type to backend type.

I've found the mixed usage of types and datatypes identifiers really confusing, so I suggest to rename this file to datatypes.py

Fully agree, I'll rename the file.

__slots__ = ()
ibis_type_to_bigquery_type = Dispatcher('ibis_type_to_bigquery_type')


@kszucs

kszucs Apr 15, 2018

Member

I know that explicit is always better than implicit, but this is a bit verbose to me. I've used from_ibis and to_ibis in other backends inside the datatypes scope, which might be too implicit. Is there a golden mean?


@cpcloud

cpcloud Apr 15, 2018

Member

Let's refactor this in another PR, I'll make an issue.
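For readers following the naming debate, here is a minimal sketch of the dispatcher pattern under discussion, written with the stdlib's `functools.singledispatch` rather than the `multipledispatch.Dispatcher` the PR actually uses; the `IbisInt64`/`IbisString` classes are hypothetical stand-ins, not ibis's real datatype classes:

```python
from functools import singledispatch

# Hypothetical stand-ins for ibis datatype classes (illustrative only).
class IbisInt64:
    pass

class IbisString:
    pass

@singledispatch
def ibis_type_to_bigquery_type(ibis_type):
    """Explicit-name style: maps an ibis datatype to a BigQuery type name."""
    raise NotImplementedError(type(ibis_type).__name__)

@ibis_type_to_bigquery_type.register(IbisInt64)
def _(ibis_type):
    return "INT64"

@ibis_type_to_bigquery_type.register(IbisString)
def _(ibis_type):
    return "STRING"

# The shorter from_ibis/to_ibis naming kszucs mentions would just alias
# the same dispatcher inside a backend-scoped datatypes module.
from_ibis = ibis_type_to_bigquery_type
```

The trade-off discussed above is purely about the name: the long form is unambiguous anywhere, while `from_ibis` reads cleanly only inside a `backend/datatypes.py` namespace.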

def trans_integer(t):
# BigQuery doesn't accept integer types because javascript doesn't
@ibis_type_to_bigquery_type.register(dt.Integer, UDFContext)
def trans_integer(t, context):


@kszucs

kszucs Apr 15, 2018

Member

Could you write a short explanation?


@cpcloud

cpcloud Apr 15, 2018

Member

Yes, sorry, I thought I had a comment there. I'll add.
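The reasoning kszucs asks for comes down to JavaScript number semantics: BigQuery runs UDFs in JavaScript, whose only numeric type is the IEEE-754 double, so integers above 2**53 cannot be represented exactly. A quick demonstration in plain Python (Python floats are the same doubles; this is not ibis code):

```python
# JavaScript numbers are IEEE-754 doubles: 53 bits of mantissa, so only
# integers up to 2**53 survive a round trip through a JS UDF.
limit = 2 ** 53

assert float(limit) == limit              # 2**53 is exactly representable
assert float(limit + 1) == float(limit)   # 2**53 + 1 collapses back to 2**53

# Hence the diff above maps ibis integer types to a float type (rather
# than INT64) only in the JavaScript UDF translation context.
```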

import ibis.expr.datatypes as dt
class TypeTranslationContext(object):


@kszucs

kszucs Apr 15, 2018

Member

At first look this class seems unused, until the UDFContext definition. A little docstring would be handy :)


@cpcloud

cpcloud Apr 15, 2018

Member

Will add.

return ibis_type_to_bigquery_type(t, TypeTranslationContext())
@ibis_type_to_bigquery_type.register(six.string_types)


@kszucs

kszucs Apr 15, 2018

Member

I would not support string types; converting them is rather the responsibility of the caller. See my comment in api.py.


@cpcloud

cpcloud Apr 15, 2018

Member

Cool, will change.

@@ -156,12 +158,14 @@ def compiles_udf_node(t, expr):
return {name}({args});
""";'''.format(
name=f.__name__,
return_type=ibis_type_to_bigquery_type(output_type),
return_type=ibis_type_to_bigquery_type(
output_type, type_translation_context),


@kszucs

kszucs Apr 15, 2018

Member

How about

ibis_type_to_bigquery_type(dt.dtype(output_type), type_translation_context)

?

source=source,
signature=', '.join(
'{name} {type}'.format(
name=name,
type=ibis_type_to_bigquery_type(type)
type=ibis_type_to_bigquery_type(
type, type_translation_context)


@kszucs

kszucs Apr 15, 2018

Member

Same as above:

ibis_type_to_bigquery_type(dt.dtype(type), type_translation_context)


@cpcloud

cpcloud Apr 15, 2018

Member

Will do
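The `dt.dtype(...)` wrapper suggested in both comments coerces inputs like the string `'int64'` into ibis datatype objects before they reach the dispatcher. A hypothetical sketch of that normalize-at-the-boundary pattern (all names here are illustrative, not ibis's implementation):

```python
# Illustrative mapping and classes; not ibis's real dt.dtype machinery.
_NAME_TO_BIGQUERY = {'int64': 'INT64', 'string': 'STRING', 'double': 'FLOAT64'}

class DataType:
    """Toy stand-in for an ibis datatype object."""
    def __init__(self, name):
        self.name = name

def dtype(value):
    """Coerce a type name like 'int64' into a DataType; pass DataType through."""
    if isinstance(value, DataType):
        return value
    if value not in _NAME_TO_BIGQUERY:
        raise TypeError('unknown type name: {!r}'.format(value))
    return DataType(value)

def type_to_bigquery(typ):
    # The translator only ever sees DataType instances; callers normalize
    # with dtype() first, so no string registration is needed here.
    return _NAME_TO_BIGQUERY[typ.name]
```

With this split, registering `six.string_types` on the dispatcher itself becomes unnecessary, which is the point of the earlier api.py comment.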

@@ -734,15 +734,19 @@ def _truncate(translator, expr):
return "trunc({}, '{}')".format(arg_formatted, unit)
TIMESTAMP_UNIT_CONVERSIONS = {


@kszucs

kszucs Apr 15, 2018

Member

After factoring out _convert_unit, could we reuse it here?


@cpcloud

cpcloud Apr 15, 2018

Member

I'll look into that.
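For context on `TIMESTAMP_UNIT_CONVERSIONS`: BigQuery provides the functions `TIMESTAMP_SECONDS`, `TIMESTAMP_MILLIS`, and `TIMESTAMP_MICROS` for converting unix epochs, so a mapping along these lines presumably backs the `to_timestamp` compilation. A hedged sketch (the dict contents and helper are assumptions, not the PR's actual code):

```python
# TIMESTAMP_SECONDS/MILLIS/MICROS are real BigQuery SQL functions; this
# particular mapping and compile helper are illustrative.
TIMESTAMP_UNIT_CONVERSIONS = {
    's': 'TIMESTAMP_SECONDS',
    'ms': 'TIMESTAMP_MILLIS',
    'us': 'TIMESTAMP_MICROS',
}

def compile_timestamp_from_unix(arg_sql, unit):
    """Render SQL converting a unix epoch column to a BigQuery TIMESTAMP."""
    try:
        func = TIMESTAMP_UNIT_CONVERSIONS[unit]
    except KeyError:
        raise ValueError('BigQuery cannot convert unit: {!r}'.format(unit))
    return '{}({})'.format(func, arg_sql)
```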

@compiles(ops.Floor)
def compiles_floor(t, e):
bigquery_type = ibis_type_to_bigquery_type(e.type())
arg, = e.op().args


@kszucs

kszucs Apr 15, 2018

Member

Simply arg = e.op().arg?
I think we should favor named arguments over unpacking args.


@cpcloud

cpcloud Apr 15, 2018

Member

Agreed, will change.
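The difference between the two access styles is easy to show with a toy op class (illustrative only; real ibis op nodes expose both positional `.args` and named attributes):

```python
class FloorOp:
    """Toy stand-in for an ibis operation node."""
    def __init__(self, arg):
        self.arg = arg

    @property
    def args(self):
        return (self.arg,)

op = FloorOp('t.col')

# Tuple-unpacking style from the diff: position-dependent, and it breaks
# silently if another argument is ever added to the op.
unpacked, = op.args

# Named-attribute style kszucs suggests: self-documenting and robust to
# reordered or newly added arguments.
named = op.arg

assert unpacked == named == 't.col'
```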

@cpcloud cpcloud force-pushed the cpcloud:timestamp-from-unix-bigquery branch from 9d33c09 to a7a4392 Apr 15, 2018

@cpcloud cpcloud closed this in f917f7c Apr 16, 2018

BigQuery automation moved this from To do to Done Apr 16, 2018

@cpcloud cpcloud deleted the cpcloud:timestamp-from-unix-bigquery branch Apr 16, 2018
