
ENH: Add to_timestamp support in BigQuery and Pandas #1410

Closed

Conversation

cpcloud (Member) commented Apr 10, 2018

  • Update our testing data for parquet

@cpcloud cpcloud added this to the 0.14 milestone Apr 10, 2018
@cpcloud cpcloud self-assigned this Apr 10, 2018
@cpcloud cpcloud force-pushed the timestamp-from-unix-bigquery branch 5 times, most recently from 795f2f1 to 1a35039 Compare April 13, 2018 14:43
cpcloud (Member, Author) commented Apr 13, 2018

@kszucs can you review this when you get a chance?

@cpcloud cpcloud force-pushed the timestamp-from-unix-bigquery branch from 1a35039 to 53dfb14 Compare April 15, 2018 19:32
kszucs (Member) left a comment

Mainly general comments and thoughts. If you agree with them we could convert them to issues.

@@ -0,0 +1,70 @@
import six
kszucs (Member):
Just a couple of general thoughts:

We currently have a pretty solid datatype abstraction in expr/datatypes.py, but our conversions between backend and ibis datatypes could be more consistent; see clickhouse, pandas, and alchemy. Shouldn't we standardize that?

I've found the mixed usage of types and datatypes identifiers really confusing, so I suggest renaming this file to datatypes.py so there is a clear mapping between expr/datatypes.py and backend/datatypes.py (dt -> dt instead of dt -> ir). We should also factor out the datatype conversion code in all backends.
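The symmetric layout suggested here might be sketched like this (a minimal sketch with hypothetical names and stand-in classes, not ibis's actual API): the ibis side defines the datatypes, and each backend's datatypes.py provides one pair of dt -> dt conversion functions.

```python
from functools import singledispatch

# Stand-ins for ibis datatype classes (dt.Int64, dt.String); illustrative only.
class DataType:
    pass

class Int64(DataType):
    pass

class String(DataType):
    pass

@singledispatch
def to_backend(dtype):
    """ibis datatype -> backend type name (the dt -> dt direction)."""
    raise NotImplementedError(type(dtype).__name__)

@to_backend.register(Int64)
def _(dtype):
    return 'INT64'

@to_backend.register(String)
def _(dtype):
    return 'STRING'

# backend type name -> ibis datatype (the direction `infer` already covers)
_FROM_BACKEND = {'INT64': Int64, 'STRING': String}

def from_backend(name):
    return _FROM_BACKEND[name]()
```

With every backend exposing the same two entry points, conversion code no longer needs to differ between clickhouse, pandas, and the alchemy backends.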

cpcloud (Member, Author):

> We currently have a pretty solid datatype abstraction in expr/datatypes.py, but our conversions between backend and ibis datatypes could be more consistent

Definitely agree with that. infer works well for going from backend type to ibis type, but we need another API (maybe the to_ibis function you mention below) to make it convenient to go from ibis type to backend type.

> I've found the mixed usage of types and datatypes identifiers really confusing, so I suggest to rename this file to datatypes.py

Fully agree, I'll rename the file.

__slots__ = ()


ibis_type_to_bigquery_type = Dispatcher('ibis_type_to_bigquery_type')
kszucs (Member):

I know that explicit is always better than implicit, but this is a bit verbose to me. I've used from_ibis and to_ibis in other backends inside the datatypes scope, which might be too implicit. Is there a middle ground?

cpcloud (Member, Author):

Let's refactor this in another PR, I'll make an issue.


def trans_integer(t):
# BigQuery doesn't accept integer types because javascript doesn't
@ibis_type_to_bigquery_type.register(dt.Integer, UDFContext)
def trans_integer(t, context):
kszucs (Member):

Could you add a short comment explaining the reasoning?

cpcloud (Member, Author):

Yes, sorry, I thought I had a comment there. I'll add.
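For context, the reasoning such a comment would capture is roughly the following (a hedged sketch; the `context='udf'` guard and function shape are illustrative, not the actual PR code): BigQuery JavaScript UDFs have no exact 64-bit integer type because JavaScript numbers are IEEE-754 doubles, so integers above 2**53 lose precision.

```python
def trans_integer(t, context='sql'):
    # Inside a JavaScript UDF, BigQuery rejects INT64 because JS numbers
    # are IEEE-754 doubles and cannot represent all 64-bit integers
    # exactly; fall back to FLOAT64 there.
    if context == 'udf':
        return 'FLOAT64'
    return 'INT64'

# Demonstration of the precision loss that motivates the rule:
assert float(2**53) == float(2**53 + 1)
```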

import ibis.expr.datatypes as dt


class TypeTranslationContext(object):
kszucs (Member):

At first glance this class seems unused until the UDFContext definition. A little docstring would be handy :)

cpcloud (Member, Author):

Will add.
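A docstring along these lines would make the marker-class pattern obvious at first read (a sketch; the wording is illustrative, not the docstring actually added in the PR):

```python
class TypeTranslationContext(object):
    """Marker class for the default type translation behavior.

    Instances carry no state; subclasses such as UDFContext exist only so
    that type translation rules can dispatch on the context and vary, for
    example, inside BigQuery UDF definitions.
    """
    __slots__ = ()


class UDFContext(TypeTranslationContext):
    """Context selecting UDF-specific type translation rules."""
    __slots__ = ()
```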

return ibis_type_to_bigquery_type(t, TypeTranslationContext())


@ibis_type_to_bigquery_type.register(six.string_types)
kszucs (Member):

I would not support string types; that's rather the responsibility of the caller. See my comment in api.py.

cpcloud (Member, Author):

Cool, will change.
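The suggested split might look roughly like this (hypothetical minimal stand-ins for `dt.dtype` and the translator, not the real ibis functions): strings are normalized at the call site, and the backend translator accepts only datatype instances.

```python
class Int64:
    """Stand-in for ibis's dt.Int64."""

_PARSEABLE = {'int64': Int64}

def dtype(value):
    # Caller-side normalization, in the spirit of ibis's dt.dtype: strings
    # are parsed into datatype instances before reaching backend code.
    if isinstance(value, str):
        return _PARSEABLE[value]()
    return value

def ibis_type_to_bigquery_type(t):
    # The translator itself refuses raw strings.
    if isinstance(t, str):
        raise TypeError('expected a datatype instance; parse strings '
                        'with dtype() at the call site')
    return 'INT64'
```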

@@ -156,12 +158,14 @@ def compiles_udf_node(t, expr):
return {name}({args});
""";'''.format(
name=f.__name__,
return_type=ibis_type_to_bigquery_type(output_type),
return_type=ibis_type_to_bigquery_type(
output_type, type_translation_context),
kszucs (Member):

How about

ibis_type_to_bigquery_type(dt.dtype(output_type), type_translation_context)

?

cpcloud (Member, Author):

Cool

source=source,
signature=', '.join(
'{name} {type}'.format(
name=name,
type=ibis_type_to_bigquery_type(type)
type=ibis_type_to_bigquery_type(
type, type_translation_context)
kszucs (Member):

Same as above:

ibis_type_to_bigquery_type(dt.dtype(type), type_translation_context)

cpcloud (Member, Author):

Will do

@@ -734,15 +734,19 @@ def _truncate(translator, expr):
return "trunc({}, '{}')".format(arg_formatted, unit)


TIMESTAMP_UNIT_CONVERSIONS = {
kszucs (Member):

After factoring out _convert_unit, could we reuse it here?

cpcloud (Member, Author):

I'll look into that.
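A shared helper might look roughly like this (a sketch; `convert_unit` and the supported-unit set are assumptions, though `TIMESTAMP_SECONDS`, `TIMESTAMP_MILLIS`, and `TIMESTAMP_MICROS` are real BigQuery standard SQL functions):

```python
# One source of truth for unit handling, reusable by both truncation and
# unix-timestamp conversion (names are illustrative).
_UNIT_SUFFIXES = {'s': 'SECONDS', 'ms': 'MILLIS', 'us': 'MICROS'}

def convert_unit(unit):
    try:
        return _UNIT_SUFFIXES[unit]
    except KeyError:
        raise ValueError(
            'BigQuery does not support timestamps with unit {!r}'.format(unit)
        )

def timestamp_from_unix(arg, unit='s'):
    # e.g. TIMESTAMP_MILLIS(col) for unit='ms'
    return 'TIMESTAMP_{}({})'.format(convert_unit(unit), arg)
```

Centralizing the mapping means unsupported units fail in one place with one error message, instead of each call site keeping its own dict.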

@compiles(ops.Floor)
def compiles_floor(t, e):
bigquery_type = ibis_type_to_bigquery_type(e.type())
arg, = e.op().args
kszucs (Member):

Simply arg = e.op().arg?
I guess we should favor using named attributes instead of unpacking args.

cpcloud (Member, Author):

Agreed, will change.
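A minimal stand-in op illustrates the point (sketch; `Floor` here is a toy class, not the real ibis operation): tuple unpacking of `args` breaks silently if the op later gains an argument, while the named attribute stays explicit and stable.

```python
class Floor:
    """Toy stand-in for an ibis op with one argument."""
    def __init__(self, arg):
        self.arg = arg        # named access: preferred, self-documenting
        self.args = (arg,)    # positional tuple: fragile to unpack

op = Floor('price')
arg, = op.args                # works today, raises if args ever grows
assert arg == op.arg == 'price'
```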

@cpcloud cpcloud force-pushed the timestamp-from-unix-bigquery branch from 9d33c09 to a7a4392 Compare April 15, 2018 22:58
@cpcloud cpcloud closed this in f917f7c Apr 16, 2018
@cpcloud cpcloud deleted the timestamp-from-unix-bigquery branch April 16, 2018 14:33
2 participants