Fixes #5266 - Add a separate view for live user_reports #5267

Merged 2 commits on Mar 22, 2024

ksy36 (Contributor) commented Mar 22, 2024

This is a follow-up to #5250.

I would like to create a separate live view, mainly for the ML classification job that uses bugbug and is defined in the docker-etl repository.

This ETL job runs every 15 minutes and classifies reports gradually throughout the day. That worked well with the live data, but after the switch to the historical table, which is updated only once a day, the script would try to classify all reports at once when they appear in the table. Given the size of the job (800-1k reports with a non-empty description per day), this would take a long time, and the ETL script would likely not finish within a single run. With a separate live view, a report will already be classified by the time it appears in the historical table.
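To put rough numbers on the difference, here is a minimal back-of-the-envelope sketch. It only uses the 15-minute interval and the 800-1k daily volume mentioned above; the function names are hypothetical, and it assumes reports arrive evenly over the day, which real traffic will not do exactly.

```python
# Illustrative estimate only; not part of the actual ETL job.

MINUTES_PER_DAY = 24 * 60
RUN_INTERVAL_MINUTES = 15  # schedule stated in the description above


def runs_per_day(interval_minutes: int = RUN_INTERVAL_MINUTES) -> int:
    """Number of scheduled runs in one day for the given interval."""
    return MINUTES_PER_DAY // interval_minutes


def avg_reports_per_run(
    reports_per_day: int, interval_minutes: int = RUN_INTERVAL_MINUTES
) -> float:
    """Average reports per run, ASSUMING reports arrive evenly over the day."""
    return reports_per_day / runs_per_day(interval_minutes)


if __name__ == "__main__":
    for daily in (800, 1000):
        print(
            f"{daily} reports/day: live view ~{avg_reports_per_run(daily):.1f} per run, "
            f"daily table {daily} in one run"
        )
```

Under that assumption, the job runs 96 times a day, so the live view leaves roughly 8 to 10 reports per run, while the once-a-day table would hand a single run the whole day's 800-1k reports.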

Checklist for reviewer:

  • Commits should reference a bug or github issue, if relevant (if a bug is referenced, the pull request should include the bug number in the title).
  • If the PR comes from a fork, trigger integration CI tests by running the Push to upstream workflow and provide the <username>:<branch> of the fork as parameter. The parameter will also show up in the logs of the manual-trigger-required-for-fork CI task together with more detailed instructions.
  • If adding a new field to a query, ensure that the schema and dependent downstream schemas have been updated.
  • When adding a new derived dataset, ensure that the data is not already available (fully or partially), and recommend extending an existing dataset rather than creating a new one. Data can be available in the bigquery-etl repository, looker-hub, or looker-spoke-default.

For modifications to schemas in restricted namespaces (see CODEOWNERS):

ksy36 (Contributor, Author) commented Mar 22, 2024

r? @denschub @scholtzan

@scholtzan scholtzan enabled auto-merge (squash) March 22, 2024 19:31
@scholtzan scholtzan disabled auto-merge March 22, 2024 20:21
@scholtzan scholtzan enabled auto-merge (squash) March 22, 2024 20:21
@scholtzan scholtzan merged commit c0a4a85 into mozilla:main Mar 22, 2024
22 checks passed
ksy36 (Contributor, Author) commented Mar 22, 2024

Thanks!

@ksy36 ksy36 deleted the issues/5266/1 branch March 22, 2024 21:25