Can we apply events_stream performance improvements to GLAM ETL? #2840

Closed
edugfilho opened this issue May 30, 2024 · 1 comment · Fixed by mozilla/bigquery-etl#5697
edugfilho commented May 30, 2024

Following @BenWu's improvement to events_stream's queries in mozilla/bigquery-etl#5659, we should see if we can:

  • Replace JS UDFs with SQL. JS scales very poorly with the size of the input, so there may be some low-hanging fruit there performance-wise.
  • Move computations before cross joins, where possible, so they process less data.
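
To illustrate the second point, here is a minimal Python sketch (not the GLAM ETL code) of why moving a computation ahead of a cross join helps. The `expensive` function and the tiny input lists are hypothetical stand-ins. The sketch counts how many times the computation runs when it is applied after the join versus before it.

```python
from itertools import product


def expensive(value, counter):
    """Hypothetical stand-in for a costly per-row computation."""
    counter["calls"] += 1
    return value * value


def compute_after_join(left, right):
    """Cross join first, then compute on every joined row."""
    counter = {"calls": 0}
    rows = [(expensive(l, counter), r) for l, r in product(left, right)]
    return rows, counter["calls"]


def compute_before_join(left, right):
    """Compute once per left row, then cross join the results."""
    counter = {"calls": 0}
    precomputed = [expensive(l, counter) for l in left]
    rows = list(product(precomputed, right))
    return rows, counter["calls"]


# Hypothetical inputs: 3 left rows cross joined with 4 right rows.
left = [1, 2, 3]
right = ["a", "b", "c", "d"]

after_rows, after_calls = compute_after_join(left, right)
before_rows, before_calls = compute_before_join(left, right)

# Same output either way, but the work scales with len(left) instead of
# len(left) * len(right) when the computation runs before the join.
print(after_calls, before_calls)
```

Both orderings produce the same rows. Only the number of evaluations differs, and that is the saving the bullet above is after.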
@edugfilho edugfilho self-assigned this May 30, 2024
@edugfilho edugfilho changed the title from "Rewrite UDFs written in JS to SQL" to "Can we apply events_stream performance improvements to GLAM ETL?" May 30, 2024
BenWu commented May 30, 2024

Some notes re: JS UDFs: it would be good to see the maximum improvement we can get from replacing the JS UDFs before committing too much time to it. One quick way is to replace the UDFs with stub UDFs that do very little and see how much slot time drops.
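
As a rough sketch of how the stub experiment could be read: the slot time saved by swapping in the stubs is an upper bound on what any SQL rewrite could save. The function name and the slot-time figures below are hypothetical, not measurements from GLAM.

```python
def max_improvement(real_slot_ms, stub_slot_ms):
    """Upper bound on the fraction of slot time a UDF rewrite could save.

    real_slot_ms: slot time of the query with the real JS UDFs.
    stub_slot_ms: slot time of the same query with near-no-op stub UDFs.
    """
    if real_slot_ms <= 0:
        raise ValueError("real_slot_ms must be positive")
    # Clamp at zero: noisy runs can make the stubbed query look slower.
    saved = max(real_slot_ms - stub_slot_ms, 0)
    return saved / real_slot_ms


# Hypothetical slot times, for illustration only.
baseline = 1_000_000
stubbed = 900_000

bound = max_improvement(baseline, stubbed)
print(f"At most {bound:.1%} of slot time is attributable to the UDFs")
```

If the bound comes out small, the rewrite is unlikely to pay off, however well the SQL versions perform.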

From a quick look at mozfun.glam, the JS UDFs are:

  • histogram_cast_json
  • histogram_generate_exponential_buckets
  • histogram_generate_functional_buckets
  • histogram_generate_linear_buckets
  • histogram_generate_scalar_buckets
  • percentile

Looking at where these are used, I wouldn't expect a huge improvement from replacing them, but I think it's low-effort enough to try.

I created a sample PR with three of the fairly trivial conversions at mozilla/bigquery-etl#5689.
