Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add JavaScript UDFs for BigQuery #1377

Closed
wants to merge 7 commits into from

Conversation

cpcloud
Copy link
Member

@cpcloud cpcloud commented Mar 9, 2018

No description provided.

@cpcloud cpcloud added this to the 0.13 milestone Mar 10, 2018
@cpcloud cpcloud self-assigned this Mar 10, 2018
@cpcloud cpcloud modified the milestones: 0.13, 0.14 Mar 11, 2018
@cpcloud cpcloud force-pushed the bigquery-udfs branch 7 times, most recently from 7d5d9dd to 139e572 Compare March 12, 2018 17:20
def generate_setup_queries(self):
result = list(
map(partial(BigQueryUDFDefinition, context=self.context),
lin.traverse(find_bigquery_udf, self.expr)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"A local variables wouldn't kill you here" :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might :)

@@ -291,10 +291,11 @@ def _build_ast(self, expr, context):
result = comp.build_ast(expr, context)
return result

def _execute_query(self, ddl, async=False):
def _execute_query(self, dml, async=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did You mean sql like above?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, dml is coming in here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think you're right. Let me take another look.


def find_bigquery_udf(expr):
if isinstance(expr.op(), BigQueryUDFNode):
return lin.halt, expr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will find the topmost BigQueryUDFNodes in the hierarchy. Is it the expected behaviour?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. Fixing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Multiple UDFs now tested and working

(query, rest) = (ast.queries[0], ast.queries[1:])
assert not rest
query, rest = (ast.queries[0], ast.queries[1:])
assert not rest, '*rest should be empty in bigquery.compiler._get_query'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean that multi-statement queries are not supported?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm removing this function from this module.

@@ -0,0 +1 @@
from ibis.bigquery.udf.udf import udf # noqa: F401
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should avoid namespaces like that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll rename the udf module to core.

from ibis.bigquery.compiler import compiles, BigQueryUDFNode


ibis_type_to_bigquery_type = Dispatcher('ibis_type_to_bigquery_type')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could reuse this conversion in ibis.bigquery.client. The other backends store the datatype conversion logic in ibis.{backend}.client, which we should factor out (probably) to ibis.{backend}.datatypes in a followup PR.

return textwrap.indent(text, ' ' * spaces)


def udf(input_type, output_type, strict=True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about moving this to ibis.bigquery.udf.api or something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

)


class PythonToJavaScriptTranslator:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about moving this to transpiler.py along with the find utility in find.py? Find is only used during translation (maybe Rewriter too).

ibis/client.py Outdated
@@ -221,15 +218,12 @@ def execute(self, expr, params=None, limit='default', async=False,
Scalar expressions: Python scalar value
"""
ast = self._build_ast_ensure_limit(expr, limit, params=params)
result = self._execute_query(ast, async=async, **kwargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ast got confusing with js2py transpiler introduced, we might prefer query_ast here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed.

@@ -683,6 +670,18 @@ def __init__(self, *args, **kwargs):
super(AlchemyContext, self).__init__(*args, **kwargs)
self._table_objects = {}

def collapse(self, queries):
if not isinstance(queries, six.string_types):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the following is more idiomatic:

if isinstance(queries, six.string_types):
    return query

if len(queries) > 1:
    raise 'Multi-statement queries not supported'

return queries[0]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, agree.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@kszucs
Copy link
Member

kszucs commented Mar 12, 2018

Time's up for modern frontend dialects!

image

def find_bigquery_udf(expr):
if isinstance(expr.op(), BigQueryUDFNode):
result = expr
return lin.proceed, expr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is not required.

@cpcloud cpcloud force-pushed the bigquery-udfs branch 2 times, most recently from 13b71ee to 96c9ec0 Compare March 16, 2018 15:29
@cpcloud cpcloud force-pushed the bigquery-udfs branch 11 times, most recently from eadf4f8 to ae70d3f Compare April 3, 2018 00:22
@cpcloud cpcloud force-pushed the bigquery-udfs branch 2 times, most recently from ca89eb9 to c7538e6 Compare April 5, 2018 16:08
@cpcloud cpcloud closed this in 8556dea Apr 6, 2018
@cpcloud cpcloud deleted the bigquery-udfs branch April 6, 2018 00:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants