
Add additional JobRun stats for blackbox fuzzers#5238

Merged
ViniciustCosta merged 10 commits into master from dylanj/jobrun-stats
Apr 15, 2026

Conversation

Collaborator

@dylanjew commented Apr 7, 2026

This PR aims to improve our ability to benchmark blackbox fuzzers' effectiveness and monitor their health.

Context

When measuring fuzzer effectiveness, we’re interested in the rate at which the fuzzer can execute testcases and find bugs. This requires a baseline definition of total fuzzing hours across all types of fuzzers.

Some of these stats are already computed for Monarch monitoring in monitoring_metrics.py, but we'd like to get them into BigQuery so we have a consistent data source for analysis.

Changes

Adds testcases_generated, testcase_execution_duration, testcase_generation_duration, and fuzzing_duration to the uworker FuzzTaskOutput proto and writes those stats to the JobRun BigQuery table.
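For illustration, a JobRun stats row carrying the new fields might look roughly like this. The field names come from this PR; the dict shape and values are hypothetical, not the real FuzzTaskOutput proto or BigQuery schema:

```python
# Hypothetical sketch of a JobRun stats row with the new fields.
# Field names are from this PR; the structure and values are illustrative.
job_run_stats = {
    "testcases_generated": 512,
    # Durations are serialized as ISO8601 duration strings, since the
    # BigQuery Load API does not accept INTERVAL literals (see the
    # review thread on fuzzer_stats.py later in this conversation).
    "testcase_generation_duration": "PT12M30S",
    "testcase_execution_duration": "PT2H15M0S",
    "fuzzing_duration": "PT2H30M0S",
}

# Simple sanity check: every *_duration field is an ISO8601 duration string.
duration_fields = [k for k in job_run_stats if k.endswith("_duration")]
assert all(job_run_stats[k].startswith("P") for k in duration_fields)
```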

Notes

This only adds metrics for blackbox fuzzers. We will need to aggregate the fuzzing session hours and execution metrics we already store in the TestcaseRun tables for engine guided fuzzers if we want comparisons across fuzzer types. This either means we need to aggregate those tables and write to the JobRun tables, or do the aggregation in our plx workflows/scripts.
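A rough sketch of the first option, rolling per-testcase rows up into JobRun-level totals. All field names below are hypothetical, not the real TestcaseRun or JobRun schemas:

```python
from collections import defaultdict

def aggregate_testcase_runs(testcase_runs):
    """Sums per-testcase rows into per-(fuzzer, job) totals, roughly what a
    JobRun-level aggregation for engine-guided fuzzers could look like.
    Field names are illustrative, not the real TestcaseRun schema."""
    totals = defaultdict(lambda: {"fuzzing_seconds": 0.0, "testcases": 0})
    for run in testcase_runs:
        key = (run["fuzzer"], run["job"])
        totals[key]["fuzzing_seconds"] += run["duration_seconds"]
        totals[key]["testcases"] += 1
    return dict(totals)

# Example input: three hypothetical per-testcase rows.
rows = [
    {"fuzzer": "libfuzzer_foo", "job": "asan", "duration_seconds": 3600.0},
    {"fuzzer": "libfuzzer_foo", "job": "asan", "duration_seconds": 1800.0},
    {"fuzzer": "afl_bar", "job": "asan", "duration_seconds": 900.0},
]
totals = aggregate_testcase_runs(rows)
```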

Testing

  • Ran the server locally, uploaded a fuzzer, and ran a bot locally with additional logs and local BigQuery writes enabled. Verified that the stats were added to the FuzzTaskOutput proto and written to BigQuery.

Logs from running a local bot: https://paste.googleplex.com/4551083230887936
Local BigQuery stats: https://paste.googleplex.com/6697329928306688

Added unit tests for the new stats.

@dylanjew force-pushed the dylanj/jobrun-stats branch 3 times, most recently from 4c7b0a9 to 3fe165d on April 8, 2026 18:38
@dylanjew changed the base branch from master to dylanj/utask-test on April 8, 2026 18:40
@dylanjew force-pushed the dylanj/jobrun-stats branch from 3fe165d to 4068b54 on April 8, 2026 18:45
@dylanjew force-pushed the dylanj/utask-test branch from cb69ba1 to ba7b83f on April 8, 2026 22:22
Base automatically changed from dylanj/utask-test to master on April 9, 2026 15:26
@dylanjew force-pushed the dylanj/jobrun-stats branch 8 times, most recently from 33ba946 to 2b259ca on April 9, 2026 20:03
@dylanjew requested a review from aakallam on April 9, 2026 20:04
Collaborator

@aakallam left a comment


LGTM

Comment thread on src/clusterfuzz/_internal/bot/tasks/utasks/fuzz_task.py (Outdated)
@dylanjew force-pushed the dylanj/jobrun-stats branch from a918536 to 4affc41 on April 13, 2026 19:06

def _timedelta_to_duration_string(time_delta):
"""Converts a datetime.timedelta to ISO8601 duration string.
BigQuery Load API requires the ISO8601 duration string rather than an INTERVAL
Collaborator Author

@dylanjew commented Apr 13, 2026

I found an issue while testing in dev and needed to change this to ISO8601 format instead of the SQL INTERVAL format.

I verified the new format by running bq load --source_format=NEWLINE_DELIMITED_JSON 'clusterfuzz-development:ochang_js_fuzzer_stats.temp_JobRun' test_bq.json

And now I can successfully query the table in dev.

Previously I was seeing this error in dev: https://paste.googleplex.com/5295014885326848
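A minimal stdlib sketch of such a timedelta-to-ISO8601 conversion, for illustration only (this is not the actual implementation in fuzzer_stats.py):

```python
import datetime

def timedelta_to_iso8601_duration(delta):
    """Formats a datetime.timedelta as an ISO8601 duration string
    (e.g. 2h5m -> 'PT2H5M0S'), the format the BigQuery Load API accepts.
    Illustrative sketch; assumes a non-negative, whole-second delta."""
    total = int(delta.total_seconds())
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    parts = ["P"]
    if days:
        parts.append(f"{days}D")
    # Emit the time component whenever there is one, or when the whole
    # delta is zero (ISO8601 requires at least one component, e.g. 'PT0S').
    if hours or minutes or seconds or not days:
        parts.append("T")
        if hours:
            parts.append(f"{hours}H")
        if minutes:
            parts.append(f"{minutes}M")
        parts.append(f"{seconds}S")
    return "".join(parts)
```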

Collaborator Author

dylanjew commented Apr 14, 2026

I verified that the BigQuery writes succeed locally after the last commit by running: python butler.py run -c ../clusterfuzz-config/configs/chrome-development run_cron --script_args load_bigquery_stats

I do see a couple of failures, but they don't seem related to the new schema:

Failed load for job_wRvhqDgV4Gm9qQdG0hZuEofDJQte with errors: [{'reason': 'notFound', 'message': 'Not found: Uris gs://bigquery.clusterfuzz-development.appspot.com/metzman_graphicsfuzz_crash_fuzzer/JobRun/date/20260413/c*.json'}])

and

Failed load for job_waQQDdB_5YY1mFRL5wxQXY93thvQ with errors: [{'reason': 'rateLimitExceeded', 'location': 'table.write', 'message': 'Exceeded rate limits: too many table update operations for this table. For more information, see https://cloud.google.com/bigquery/docs/troubleshoot-quotas'}])

But the cronjob running in dev showed different errors, which suggests it wasn't using my latest changes, even though I already merged them to dev.

I'm not sure if I need to do anything else to make sure that the cron job is running with these changes too.
https://cloudlogging.app.goo.gl/UN5LA4GC6Pe2hhj28

@dylanjew requested a review from aakallam on April 14, 2026 16:05
Collaborator Author

dylanjew commented Apr 14, 2026


The cronjob was running an old version of the code, which I found by looking up the clusterfuzz_revision. There aren't any additional steps required to deploy the cronjob with the latest code.

@dylanjew marked this pull request as ready for review on April 14, 2026 16:23
@dylanjew requested a review from a team as a code owner on April 14, 2026 16:23
@dylanjew requested a review from ViniciustCosta on April 14, 2026 16:24
Comment thread on src/clusterfuzz/_internal/metrics/fuzzer_stats.py
"""Fuzzer stats exception."""


def _timedelta_to_duration_string(time_delta):
Collaborator

nit: I wonder if there is a standard/common lib to convert this and avoid us having to implement and maintain it.

Collaborator Author

There's nothing in datetime, but it looks like we could import isodate and use isodate.duration_isoformat
https://github.com/gweis/isodate

Collaborator

It should be fine, as this package is already tracked in our internal third party: https://source.corp.google.com/piper///depot/google3/third_party/py/isodate/METADATA

Feel free to decide if it's worth it, not a blocker :)

Collaborator Author

I'm happy to pull that library in and use that so we don't need to maintain the duration formatting.

I'm struggling to get the dependency added correctly. Is there any documentation for how to do this?
When I try to run pipenv lock, I see a larger diff than I would expect. I suspect something is wrong with my local clusterfuzz installation.

Could I follow up with that change?

Collaborator

I don't think we have a doc for that (the best place to look would be the "how to do clusterfuzz development" doc).

IIRC it should work by installing with pip and running pipenv lock, as you said. The large diff might be because a lot of packages don't have their versions pinned, so the lock updates them too :/

I'll merge this PR and you can try to change this in a follow-up.

@ViniciustCosta merged commit 4de5a89 into master on Apr 15, 2026
9 of 10 checks passed
@ViniciustCosta deleted the dylanj/jobrun-stats branch on April 15, 2026 18:03

3 participants