This repository has been archived by the owner on Sep 3, 2022. It is now read-only.

UDF parameters of type STRUCT cannot be parsed #371

Closed
hhagblom opened this issue May 9, 2017 · 3 comments

@hhagblom
Contributor

hhagblom commented May 9, 2017

I have created a UDF with the following cell:

%%bq udf --name unstackCustomDimensions -l js
// A function that unstacks custom dimensions into a fixed-length array
// @param x ARRAY<STRUCT<index INT64, value STRING>>
// @returns ARRAY<STRING>
var customDimensions = new Array(30)
for (i = 0; i < customDimensions.length; i++) {
  customDimensions[i] = 'n/a'
}
for (i = 0; i < x.length; i++) {
  customDimensions[x[i].index-1] = x[i].value 
}
return customDimensions;

And I get the following output:

invalidQuery: No matching signature for function JS:UNSTACKCUSTOMDIMENSIONS for argument types: ARRAY<STRUCT<index INT64, value STRING>>. Supported signature: UNSTACKCUSTOMDIMENSIONS() at [11:7]

Looking at the code, I found the following pattern in the file _bigquery.py on line 517:

param_pattern = r'^\s*\/\/\s*@param\s+([<>\w]+)\s+([<>\w]+)\s*$'

This pattern is used to parse the incoming parameters when a UDF is defined. Its character class, [<>\w], does not accept spaces or commas, which makes it impossible to declare a STRUCT with named fields or with more than one field.
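To see which characters of a type declaration fall outside that character class, here is a minimal sketch. The helper name rejected_chars is hypothetical and is not part of the library; only the character class is taken from the pattern above.

```python
import re

# Character class copied from the capture groups of param_pattern above.
TYPE_CHARS = r'[<>\w]'


def rejected_chars(type_decl):
    """Return the characters in type_decl that the character class cannot match.

    Hypothetical helper for illustration only.
    """
    return [ch for ch in type_decl if not re.fullmatch(TYPE_CHARS, ch)]


# The type from the UDF in this issue: the spaces and the comma are rejected.
print(rejected_chars('ARRAY<STRUCT<index INT64, value STRING>>'))

# A single unnamed field contains nothing but word characters and brackets.
print(rejected_chars('ARRAY<STRUCT<INT64>>'))
```

Any type whose list is non-empty can never be matched by the original pattern, regardless of how the rest of the comment line is written.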

Some output from my experiments in IPython is shown below. When I use the same UDF in the "old-school" BigQuery console, everything works fine.

In [1]: param_pattern = r'^\s*\/\/\s*@param\s+([<>\w]+)\s+([<>\w]+)\s*$'

re.findall(param_pattern, '//@param test STRUCT<>')
Out[7]: [('test', 'STRUCT<>')]

In [8]: re.findall(param_pattern, '//@param test STRUCT<t INT64>')
Out[8]: []

In [9]: re.findall(param_pattern, '//@param test STRUCT<INT64>')
Out[9]: [('test', 'STRUCT<INT64>')]

In [10]: re.findall(param_pattern, '//@param test STRUCT<INT64,STRING>')
Out[10]: []

In [11]: re.findall(param_pattern, '//@param test ARRAY<STRUCT<INT64,STRING>>')
Out[11]: []

In [12]: re.findall(param_pattern, '//@param test ARRAY<STRUCT<INT64>>')
Out[12]: [('test', 'ARRAY<STRUCT<INT64>>')]

In [13]: param_pattern = r'^\s*\/\/\s*@param\s+([<>\w]+)\s+([<>\w,\s]+)\s*$'

In [14]: re.findall(param_pattern, '//@param test ARRAY<STRUCT<INT64>>')
Out[14]: [('test', 'ARRAY<STRUCT<INT64>>')]

In [15]: re.findall(param_pattern, '//@param test ARRAY<STRUCT<INT64,STRING>>')
Out[15]: [('test', 'ARRAY<STRUCT<INT64,STRING>>')]

In [16]: re.findall(param_pattern, '//@param test ARRAY<STRUCT<index INT64,value STRING>>')
Out[16]: [('test', 'ARRAY<STRUCT<index INT64,value STRING>>')]

The return type regular expression seems to suffer from the same fault.
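The return-type pattern itself is not quoted in this issue, so the sketch below uses a hypothetical returns_pattern written by analogy with param_pattern. It is an assumption about the shape of the real pattern, not a copy of the code in _bigquery.py, and it only illustrates how the same widening would apply.

```python
import re

# ASSUMPTION: hypothetical pattern modelled on param_pattern; the actual
# return-type pattern in _bigquery.py is not shown in this issue.
returns_pattern = r'^\s*\/\/\s*@returns\s+([<>\w]+)\s*$'

# Same widening as in In [13]: also allow commas and whitespace in the type.
returns_pattern_wide = r'^\s*\/\/\s*@returns\s+([<>\w,\s]+)\s*$'

simple = '// @returns ARRAY<STRING>'
struct = '// @returns STRUCT<index INT64, value STRING>'

# A type without spaces or commas is matched by the narrow pattern.
print(re.findall(returns_pattern, simple))

# A STRUCT with named fields is not matched by the narrow pattern...
print(re.findall(returns_pattern, struct))

# ...but is matched once commas and whitespace are allowed.
print(re.findall(returns_pattern_wide, struct))
```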

Thanks for creating an otherwise great product; it really makes it easy to explore data :)

@yebrahim
Contributor

yebrahim commented May 9, 2017

Thanks for catching that @hhagblom. The regex is very loose anyway, and doesn't rule out all incorrect syntax, so I think it's fine if we allow an arbitrary number of words with the addition of spaces.

Would you like to send a PR with this small fix?
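To illustrate how loose the widened pattern is, the sketch below feeds it two malformed declarations, and both are accepted. The brackets_balanced helper is purely hypothetical: it shows one extra check that could be layered on top and is not something the library is known to do.

```python
import re

# Widened pattern from In [13] in the original report.
param_pattern = r'^\s*\/\/\s*@param\s+([<>\w]+)\s+([<>\w,\s]+)\s*$'

malformed = [
    '// @param x ARRAY<STRUCT<index INT64',  # unbalanced angle brackets
    '// @param x ,,,',                       # no type name at all
]

# Both lines still produce a match, so the regex does not validate the type.
for line in malformed:
    print(re.findall(param_pattern, line))


def brackets_balanced(type_decl):
    """Hypothetical extra check: every '<' is closed by a later '>'."""
    depth = 0
    for ch in type_decl:
        if ch == '<':
            depth += 1
        elif ch == '>':
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


print(brackets_balanced('ARRAY<STRUCT<index INT64'))
print(brackets_balanced('ARRAY<STRUCT<index INT64,value STRING>>'))
```

Since the service rejects invalid types anyway, leaving validation to it and keeping the regex permissive is a reasonable trade-off.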

@hhagblom
Contributor Author

Hi Yebrahim!

I created a small pull request for this fix: #373.

I have manually tested this on a running notebook on GCE by hot-patching it (more information in the pull request).

Best Regards,
Hans Peter

@yebrahim
Contributor

Closing this as it was fixed by #373.
