Harness for deploying and testing (dependent) UDFs #141

Closed
jklukas opened this issue May 16, 2019 · 2 comments
@jklukas
Contributor

jklukas commented May 16, 2019

I have previously worked with https://github.com/PeriscopeData/redshift-udfs, which provides a structure for defining Python UDFs for Redshift along with their tests, plus a harness for deploying the functions and running the tests.

It would be nice to have something similar for this repo. For the BigQuery case, this is made more complicated by the fact that persistent UDFs are not yet generally available.

Here's a potential way forward:

  • Change the UDF definition files under udf/ to use the syntax for creating persistent UDFs
  • Write a small Python package that parses the UDF files to determine dependencies (by finding usages of udf_*) and builds a DAG
  • Write a small Python package that executes the UDF definition files in DAG order
  • Write a test harness that creates the persistent UDFs in a generated temporary dataset as part of the test run, then runs tests defined similarly to how we define tests for tables
  • Write a small Python library that parses the files under sql/ and identifies UDF usage there; we would then inject temporary UDF definitions and output a generated directory. This lets us use UDFs in our production ETL queries more reliably, without having to copy them directly into the source files
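A minimal sketch of the dependency-parsing and DAG steps from the list above. It assumes each UDF lives in its own udf/<name>.sql file named after the function, and that any token matching udf_<word> is a reference to another UDF. `load_udfs`, `build_dag`, and `creation_order` are hypothetical helpers, not existing code in this repo; the ordering uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+).

```python
import re
from graphlib import TopologicalSorter
from pathlib import Path

# Assumption: any token of the form udf_<word> names a UDF, e.g. udf_mode_last.
UDF_NAME_RE = re.compile(r"\budf_\w+")


def load_udfs(udf_dir: str) -> dict[str, str]:
    """Hypothetical loader: map each udf/<name>.sql file to its SQL text."""
    return {p.stem: p.read_text() for p in Path(udf_dir).glob("*.sql")}


def build_dag(udfs: dict[str, str]) -> dict[str, set[str]]:
    """Map each UDF to the set of other known UDFs its SQL references."""
    dag: dict[str, set[str]] = {}
    for name, sql in udfs.items():
        refs = set(UDF_NAME_RE.findall(sql))
        # Drop the UDF's own name (it appears in its definition) and unknown names.
        dag[name] = {r for r in refs if r != name and r in udfs}
    return dag


def creation_order(udfs: dict[str, str]) -> list[str]:
    """Order UDFs so that every dependency is created before its dependents."""
    # static_order() raises graphlib.CycleError on circular references.
    return list(TopologicalSorter(build_dag(udfs)).static_order())
```

For example, with `udf_b` calling `udf_a`, `creation_order` returns `["udf_a", "udf_b"]`, so the definition files can be executed in that order; a circular reference makes `static_order()` raise `graphlib.CycleError`.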

The above seems like a significant chunk of work, so it's likely not an immediate priority.

@relud
Collaborator

relud commented May 16, 2019

  • Change udf definition files under udf/ to use syntax for creating persistent udfs
    ...
  • Write a small python package for executing the udf definition files in the order they appear in the DAG

We could instead have the dependency checker do the same for test queries and prepend the required UDF definitions to each query when running tests, in which case it doesn't matter what order the definitions are specified in.
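relud's alternative (which also covers the last bullet of the original proposal) can be sketched as a function that finds the UDFs a query references, walks their dependencies transitively, and prepends the definitions with dependencies first. This assumes UDF definitions are held in a dict mapping each name to its temporary-function SQL, and that references can be found with a udf_<word> regex; `prepend_udfs` is a hypothetical helper, and the depth-first walk is just one way to get a dependency-first order.

```python
import re

# Assumption: any token of the form udf_<word> names a UDF.
UDF_NAME_RE = re.compile(r"\budf_\w+")


def referenced_udfs(sql: str, udfs: dict[str, str]) -> set[str]:
    """Names of known UDFs that appear in a piece of SQL."""
    return {name for name in UDF_NAME_RE.findall(sql) if name in udfs}


def prepend_udfs(query: str, udfs: dict[str, str]) -> str:
    """Prefix a query with every UDF definition it needs, dependencies first."""
    ordered: list[str] = []
    visiting: set[str] = set()

    def visit(name: str) -> None:
        # Depth-first post-order: a UDF is emitted only after all its deps.
        if name in ordered:
            return
        if name in visiting:
            raise ValueError(f"circular UDF dependency involving {name}")
        visiting.add(name)
        # Exclude the UDF's own name, which appears in its CREATE statement.
        for dep in sorted(referenced_udfs(udfs[name], udfs) - {name}):
            visit(dep)
        visiting.discard(name)
        ordered.append(name)

    for name in sorted(referenced_udfs(query, udfs)):
        visit(name)
    return "\n".join([udfs[n] for n in ordered] + [query])
```

Because definitions are emitted in dependency order and only the UDFs the query actually reaches are included, the order of the files on disk no longer matters and unused UDFs are left out.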

jklukas added a commit that referenced this issue May 22, 2019
This is the first stage for addressing #141

With this change, we parse dependencies between UDFs so that each
UDF SQL file can contain just a single UDF definition.
Furthermore, we allow additional SQL statements in the file, which
will be treated as tests; they should call ERROR to cause
a test to fail.

Opening as a draft before converting all the content, so that it's easier
to make course corrections based on feedback.
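The file convention described in this commit (one UDF definition, followed by test statements that call ERROR) could be parsed roughly as below. This is a sketch, not the code from the commit: `parse_udf_file` and `UdfFile` are hypothetical names, and it assumes semicolons appear only as statement terminators at the end of a line, which would not hold for, say, a JavaScript UDF body whose lines end in `;`.

```python
import re
from typing import NamedTuple


class UdfFile(NamedTuple):
    name: str
    definition: str
    tests: list[str]


# Assumption: ";" followed by optional whitespace and a line end terminates a
# statement; semicolons inside strings or JS bodies would break this split.
STATEMENT_SPLIT_RE = re.compile(r";\s*(?:\n|$)")
DEFINITION_RE = re.compile(
    r"^CREATE\s+(?:OR\s+REPLACE\s+)?(?:TEMP(?:ORARY)?\s+)?FUNCTION\s+([\w.]+)",
    re.IGNORECASE,
)


def parse_udf_file(sql: str) -> UdfFile:
    """Split a UDF file into its single definition and its test statements."""
    statements = [s.strip() for s in STATEMENT_SPLIT_RE.split(sql) if s.strip()]
    if not statements:
        raise ValueError("empty UDF file")
    match = DEFINITION_RE.match(statements[0])
    if match is None:
        raise ValueError("first statement must be a CREATE FUNCTION")
    # Everything after the definition is treated as a test.
    return UdfFile(match.group(1), statements[0] + ";", [s + ";" for s in statements[1:]])
```

A test such as `SELECT IF(udf_a(1) = 2, NULL, ERROR('udf_a(1) should be 2'));` returns quietly when the check holds and fails the query otherwise, since BigQuery's `ERROR()` aborts the query with the given message.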
@jklukas
Contributor Author

jklukas commented May 28, 2019

We now have a harness for resolving dependencies between UDFs and running tests. I'm going to say that deploying persistent UDFs is probably not a priority right now, so we've met the spirit of this issue.
