-
Notifications
You must be signed in to change notification settings - Fork 586
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: add a bunch of bigquery ops #1222
Conversation
still need to add some tests |
3eac40d
to
eabddd9
Compare
return BigQueryCursor(query) | ||
if results: | ||
return BigQueryCursor(query) | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this, since it's a no-op.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the else and default to python's implicit return None
or remove the if?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the else:
and write return None
or nothing, up to you.
ibis/bigquery/compiler.py
Outdated
|
||
|
||
def approx_count_distinct(arg): | ||
return bq_ops.ApproxCountDistinct(arg).to_expr() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ibis already has HLLCardinality
for this. Not a great name since it contains the algorithm used to implement the functionality. We should rename it to ApproxCountDistinct
. You don't have to do that in this PR though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The API is column.approx_nunique()
, just FYI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: BigQuery offers an explicitly separate HyperLogLog API
ibis/bigquery/compiler.py
Outdated
|
||
|
||
def format_date(fmt, arg): | ||
return bq_ops.FormatDate(arg, fmt).to_expr() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ibis calls this ops.Strftime
. The API is value.strftime('some format string')
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strftime
should cover all of the cases that bigquery separates like TIME_FORMAT
, DATE_FORMAT
, and TIMESTAMP_FORMAT
.
args = expr.op().args | ||
(arg, percentile_rank) = map(translator.translate, args[:2]) | ||
value_type = args[2] | ||
command = dict(cont='percentile_cont', disc='percentile_disc')[value_type] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about 'percentile_{}'.format(value_type)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's leave this out of this PR, and I'm going to refactor Quantile
to be an AnalyticOp
so that we don't have to have both Quantile
and Percentile
.
|
||
def _approx_quantile(translator, expr): | ||
args = expr.op().args | ||
args = [translator.translate(arg) for arg in args[:2]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the :2
? Is there a third argument you're leaving out on purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, ignore_nulls
which is the next function down and raises NotImplementedError
if it evaluates to True
ibis/bigquery/compiler.py
Outdated
def _timestamp_diff(translator, expr): | ||
(arg0, arg1, timestamp_part) = expr.op().args | ||
(af0, af1) = [translator.translate(arg) for arg in (arg0, arg1)] | ||
return 'TIMESTAMP_DIFF({}, {}, {})'.format(af0, af1, timestamp_part) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably want a generic method like delta
that can choose the correct function to call (DATE_DIFF
/TIME_DIFF
/TIMESTAMP_DIFF
) based on the input types. This would also need a promotion mechanism to determine which type to promote different arguments to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: BigQuery also has a Datetime type and I don't think we have a way of differentiating in between Datetime
and Timestamp
in ibis.
@@ -1044,6 +1062,15 @@ class NthValue(AnalyticOp): | |||
input_type = [rules.column, rules.integer] | |||
output_type = rules.type_of_arg(0) | |||
|
|||
|
|||
class Arbitrary(Reduction): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a test.
ibis/bigquery/compiler.py
Outdated
def format_date(fmt, arg): | ||
return bq_ops.FormatDate(arg, fmt).to_expr() | ||
klass_name = expr.op().args[0].type().__class__.__name__.upper() | ||
if klass_name in ['TIMESTAMP', 'DATE', 'TIME']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be an assert
ion since you shouldn't be able to reach this code path if the type system is working correctly.
ibis/bigquery/compiler.py
Outdated
(typ0, typ1) = (arg0.type(), arg1.type()) | ||
klass_name = typ0.__class__.__name__.upper() | ||
assert typ0 == typ1, ( | ||
"Don't know how to diff different temporal types: {}, {}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If users can hit these they should be TypeError
s. Same for the assert
below.
ibis/bigquery/compiler.py
Outdated
bq_ops.TimestampDiff: _timestamp_diff, | ||
bq_ops.DateDiff: _temporal_diff, | ||
bq_ops.TimestampDiff: _temporal_diff, | ||
bq_ops.TimeDiff: _temporal_diff, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These classes should go in ibis.expr.operations
.
ibis/bigquery/compiler.py
Outdated
return '{}_DIFF({}, {}, {})'.format(klass_name, af0, af1, part) | ||
|
||
|
||
def temporal_diff(arg0, arg1, *args): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you call these left
and right
f55eb84
to
9f299d3
Compare
hey @tsdlovell what is the status of this PR? there are some files with conflict, could you rebase your code? |
Closing as stale. @tsdlovell, if you want to continue working on this, please let me know. |
... and fixup some things in BigQueryClient.{_execute,raw_sql}