Add a script to generate signatures from crash pings #15
Conversation
Also add some more detailed error logging on errors returned by tecken, so that we can diagnose errors and make fixes accordingly.
normalized_channel="nightly"
AND DATE(submission_timestamp)="{date}"
AND application.build_id > FORMAT_DATE("%Y%m%d", DATE_SUB(DATE "{date}", INTERVAL 1 WEEK))
This query might need a bit of refining, I guess - ideally we want to generate signatures for all the crash pings, not restricted to the nightly channel and not restricting the build IDs. Assuming the Airflow task runs once a day on the previous day's submissions and no new submissions come in after it runs, we should just need the submission_timestamp filter here.
Hmm, not sure-- from a BigQuery perspective this is reasonable, but the number of crashes on release is very large (1.2-1.7 million a day) so might be overwhelming for tecken to process. Would a 1% sample of release be acceptable? That would bring the quantity down to 17,000 a day, which seems more manageable. @willkg wdyt?
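A deterministic sample like the 1% suggested above could be drawn by hashing each document ID, which keeps the same documents in the sample across reruns. This is a hypothetical Python sketch of the idea (in BigQuery itself one would typically hash with a fingerprint function in the WHERE clause); the `in_sample` helper and the 1% default are illustrative, not part of the PR:

```python
import hashlib

def in_sample(doc_id: str, percent: int = 1) -> bool:
    """Deterministically keep roughly `percent`% of documents by hashing the ID."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent

# Over many IDs, roughly 1% fall in the sample.
ids = [f"doc-{i}" for i in range(100_000)]
kept = sum(in_sample(d) for d in ids)
```

Because the sample is keyed on the document ID rather than drawn randomly per run, re-running the job for the same day selects the same crashes.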
I need to spend some time looking at how Tecken handled that symbolication batch on 8/20. Did it cache a lot? Did it cause problems for other parts of the service?
Short-to-medium term, I want to redo how Tecken does symbolication so it's doing it on a dedicated cluster and thus symbolication outages don't affect symbol uploads.
So my current gut feeling is that we should:
- pick a small sample for now that Tecken can handle
- increase the sample in the future after I make Tecken changes and/or we figure out a better way to do symbolication
I'll look into how Tecken performed today and get back to you.
Ok, in that case maybe we should start with just nightly crashes as it's valuable to see changes in the top crashers on a daily basis. And the volume is low enough that hopefully it shouldn't be too much of a problem.
I had a long talk about Tecken today with Brian.
Here's what we're thinking:
- I'm in the process of redoing symbolication in Tecken. When that's done, it should work better for bigquery-etl. I'm planning to use this job to load test. I don't have an ETA--I'll probably know more next week as I flesh things out. The work is being done in this bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1636210
- This bigquery-etl job does a burst of symbolication requests which doesn't give Tecken enough time to scale up. It'd be better if somehow it could ramp up the rate of requests over 5 minutes.
- We should keep the sample rate small for now. We can probably play with that number to see how high we can get it, if you want.
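The ramp-up suggestion in the second bullet could be implemented client-side by linearly increasing the request rate over the first five minutes. This is only a sketch under assumed parameters - the `start_rps`/`target_rps` numbers are placeholders, not values agreed on in this thread:

```python
def ramped_delay(elapsed_s: float, ramp_s: float = 300.0,
                 start_rps: float = 1.0, target_rps: float = 20.0) -> float:
    """Return the inter-request delay in seconds, linearly ramping the
    request rate from start_rps to target_rps over ramp_s seconds."""
    frac = min(elapsed_s / ramp_s, 1.0)
    rate = start_rps + (target_rps - start_rps) * frac
    return 1.0 / rate

# At t=0 we wait 1s between requests; after 5 minutes, 0.05s.
```

The caller would sleep for `ramped_delay(time.monotonic() - start)` between symbolication requests, giving Tecken's autoscaling time to react before the full burst arrives.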
I can review this tomorrow--idiomatic Python, signature generation, and symbolication.
if sig is None or len(sig.signature) == 0:
    print(f"Error computing signature for {doc_id}", file=sys.stderr)
    continue
print(f'{doc_id},"{sig.signature}"')
This is going to produce excessively long output if we process anything like a large number of crashes. We should probably use the logging module here instead of print. I can take care of this later, though.
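The switch from `print` to `logging` suggested above might look like the following sketch. The `emit_row` wrapper is a hypothetical refactor for illustration; the original code inlines this in a loop:

```python
import logging
import sys

logging.basicConfig(stream=sys.stderr, level=logging.INFO)
logger = logging.getLogger(__name__)

def emit_row(doc_id, sig):
    """Return a CSV row for a computed signature, or None on failure.

    Failures go through logging rather than print, so they can be
    filtered or rate-limited by log level without touching the code.
    """
    if sig is None or len(sig.signature) == 0:
        logger.warning("Error computing signature for %s", doc_id)
        return None
    return f'{doc_id},"{sig.signature}"'
```

Using `%s`-style lazy formatting in `logger.warning` also avoids building the message string when the log level suppresses it.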
👍 I was halfway through a review, so I finished it up; I may have missed some things, though.
This is likely from the Python version migration.
This script generates a CSV file of the format "document_id, crash_signature" by pulling from the `telemetry.crash` table in BigQuery. The script parallelizes operations internally to reduce total wall-clock time.
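The parallelized CSV generation described above could be structured roughly as below. This is a simplified sketch, not the script itself: `compute_signature` is a stand-in for the real per-crash signature generation and symbolication call, and the thread-pool size is an arbitrary example:

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor

def compute_signature(row):
    # Placeholder for the real signature-generation / symbolication call.
    return row["document_id"], "OOM | small"

def rows_to_csv(rows, max_workers=8):
    """Compute signatures in parallel and emit "document_id, crash_signature" CSV."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so output rows match input rows.
        for doc_id, sig in pool.map(compute_signature, rows):
            writer.writerow([doc_id, sig])
    return buf.getvalue()
```

Since the real work is network-bound (waiting on symbolication responses), threads are enough to overlap requests and cut total wall-clock time; no process pool is needed.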
Updated patches to address review comments thus far.
Thanks @staktrace! Let's land this as-is; we can make follow-ups as needed.
This is the cleaned-up work we were discussing in #11