Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1610616 - Create bugs for expiring probes #181

Merged
merged 15 commits into from
Apr 29, 2020

Conversation

BenWu
Copy link
Contributor

@BenWu BenWu commented Apr 2, 2020

https://bugzilla.mozilla.org/show_bug.cgi?id=1610616

Here are some example bugs:

This will create a bug in toolkit -> telemetry with the whiteboard probe-expiry-alert and won't close the bug if the probe is no longer expiring

@chutten any additional bug metadata to add that would make triage easier?

@BenWu BenWu requested a review from fbertsch April 2, 2020 19:54
Copy link
Contributor

@fbertsch fbertsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good! We just need to ensure that alerts are triaged and heading to the right places.

}

found_bugs = search_bugs(version, request_header)
print(f"Found {len(found_bugs)} previously created bugs for version {version}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the switch from logging to print? Will these show up in the stackdriver logs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

print will show up in logs since every run right now starts with Unable to parse whitelist ... which is created with print. logging wasn't showing up before

probes_by_email_by_state[email]["expiring"].append(name)
def search_bugs(version, request_header):
search_query_params = {
"summary": BUG_SUMMARY_TEMPLATE.format(version=version, probe=""),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just search for the whiteboard tag? Might make this more resilient.



def create_bug(probe_name, version, emails, request_header):
description_notes = ("No emails associated with the probe have a "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This indicates to me that triaging this list will be more difficult than we intended. I believe many of the emails are not valid bugzilla users.

How about, any email that is not a valid bugzilla email gets an email with a link to the bug. That will hopefully alleviate some of the triaging pain.



if __name__ == "__main__":
args = parse_args()
main(datetime.date.today(), args.dry_run)
main(datetime.date.today(), args.dry_run, args.bugzilla_api_key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has now occurred to me that we should be using the date sent from Airflow for this check. That way if the Wednesday run fails, we will still file them on Thursday, when that task re-runs.

@@ -41,7 +43,9 @@
[1] https://wiki.mozilla.org/Release_Management/Calendar
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oooh, could we take the opportunity to instead link to the #telemetry:mozilla.org Matrix channel? Its url is https://chat.mozilla.org/#/room/#telemetry:mozilla.org


{probes}
BUG_DESCRIPTION_TEMPLATE = """
The following Firefox probe will expire in the next major Firefox nightly release: version {version} [1].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there going to be one bug per probe? Could we group them for the case where a bunch of devtools probes all expire on the same version? (( the way to group could be by component of the most recent bug in the bug_numbers, by shared prefix (DEVTOOLS_*, FX_*, etc. Scalars and Events have categories to make this easier), or by overlapping notification emails sets ))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like a great suggestion. We could automate it if the prefix is up to an underscore.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This algorithm is a technical interview question to a tee.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider that a gift to you : )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grouping by component and email set sounds good. Probe names seems a little all over the place so I don't think prefixes will work consistently. For example the last set of expiring probes had:

security.javascriptLoad#javascriptLoad
STARTUP_CACHE_REQUESTS
WEBEXT_USER_SCRIPT_INJECTION_MS_BY_ADDONID
TOUCHBAR_BUTTON_PRESSES
QM_REPOSITORIES_INITIALIZATION_TIME_V2
webrtc.sdp.parser_diff
startup.profile_selection_reason
script.preloader.mainthread_recompile
networking.ftp_opened_channels_listings

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, we should just be able to group by email, right? What if we create a bug for each email instead of each probe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's how email alerts are working now but if there are multiple emails in a set of related probes it would be better to just make one bug and needinfo all the emails.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. In that case let's make a one-off that finds all probes without a valid bugzilla email and we go and require that those teams add a valid bugzilla email. We can perhaps add a check as well that verifies this moving forward. What do you think @ben-wu @chutten?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked all the probes active in nightly >= 76 and got this:

Total valid emails: 34 / 66
Probes with valid accounts by type:
scalar: 49 / 97
event: 0 / 22
histogram: 78 / 124

About half the probes have valid bugzilla accounts. I'm worried that it would be too prohibitive to require a valid account. Most of the invalid accounts are either email lists or email aliases. I think that sending an email with the bug link to emails that don't have a bugzilla account is sufficient and definitely better than what we currently have. So we're sending an email and creating a bug that will hopefully be triaged. I'm sure @chutten will have more thoughts on this though

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefixes don't have to work consistently, it'll just add a little pressure for people to organize their histogram names : )

I think creating a bug

  • per category/prefix (trying to keep the number of bugs filed down)
  • in the component of the bug_number with the highest id (default to Firefox::General)
  • with needinfos for the accounts there are
  • and emails with links to the ones there aren't

is the optimal play for now. At the very least an unprioritized bug in a vaguely-relevant component will get the triage owners to triage it.


create_params = {
"product": "Toolkit",
"component": "Telemetry",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should file a bug in the component that the most recent bug from the bug_numbers list was filed in, or Firefox::General. Toolkit::Telemetry should not contain bugs about probes unless the probes are measuring the Telemetry subsystem itself.

probe_scraper/probe_expiry_alert.py Show resolved Hide resolved
Copy link
Contributor

@wlach wlach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we somehow include these bugs in the probe expiry emails that I presume will still be sent? I worry that people will still assume they need to file bugs if they see their probes there.

A further improvement which might be worth considering at some point would be to file a metabug per release on which all these bugs would depend, that might be handy for release managers or others to look at.

BASE_URI = "https://hg.mozilla.org/mozilla-central/raw-file/tip/toolkit/components/telemetry/"
HISTOGRAMS_FILE = "Histograms.json"
SCALARS_FILE = "Scalars.yaml"
EVENTS_FILE = "Events.yaml"

EMAIL_BODY_FORMAT_STRING = """
The following Firefox probes have either expired or will expire in the next major Firefox nightly release: version {next_version} [1].
BUG_WHITEBOARD_TAG = "probe-expiry-alert"
Copy link
Contributor

@wlach wlach Apr 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
BUG_WHITEBOARD_TAG = "probe-expiry-alert"
BUG_WHITEBOARD_TAG = "[probe-expiry-alert]"

Drive by: I think we generally use square brackets around whiteboard tags which makes them more unique (e.g. probe-expiry-alert would also catch probe-expiry-alerts-on-sundays)

@BenWu
Copy link
Contributor Author

BenWu commented Apr 8, 2020

Each email will include a link to the bugs for the probes in the email. But emails will only be sent to emails that aren't associated with a bugzilla account, otherwise it will needinfo the account instead of sending an email.

@wlach
Copy link
Contributor

wlach commented Apr 8, 2020

Each email will include a link to the bugs for the probes in the email. But emails will only be sent to emails that aren't associated with a bugzilla account, otherwise it will needinfo the account instead of sending an email.

I think that makes sense, hopefully the whiteboard tag will help us track unaddressed probe expiry over time.

@BenWu
Copy link
Contributor Author

BenWu commented Apr 9, 2020

This took longer than expected; I rewrote almost the entire file.

Here are some examples: https://bugzilla-dev.allizom.org/buglist.cgi?status_whiteboard=[probe-expiry-alert]

So now single bug is created for multiple probes that are in the same component and have the same set of emails. The name of bug does is a best effort longest prefix among the probe names which if you look at the examples in the link may result in bugs with the same summary which is not great. A fix for this would be to just use the first probe name or to not group by emails (one bug per component).

Also emails are sent to emails that do not have a bugzilla account but only when bugs are filed, so once per release cycle.

@BenWu BenWu requested review from chutten and fbertsch April 9, 2020 00:37
Copy link
Contributor

@chutten chutten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the examples and the behaviour you described. I think this'll help keep the bug volumes down to a reasonable and actionable level.

probe_scraper/probe_expiry_alert.py Show resolved Hide resolved
@BenWu BenWu force-pushed the benwu/bugs-4-expiring-probes branch from 0aa3d64 to 9db1619 Compare April 29, 2020 15:36
Copy link
Contributor

@fbertsch fbertsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. Feels like magic that it ends up in the right component.


What to do about this:
1. If one, some, or all of the metrics are no longer needed, please remove them from their definitions files (Histograms.json, Scalars.yaml, Events.yaml).
2. If one, some, or all of the metrics are still required, please submit a Data Collection Review [2] and patch to extend their expiry.

If you have any problems, please ask for help on the #fx-metrics Slack channel. We'll give you a hand.
If you have any problems, please ask for help on the #fx-metrics Slack channel or the #telemetry Matrix room at https://chat.mozilla.org/#/room/#telemetry:mozilla.org. We'll give you a hand.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking it over, let's also mention that it's easier to extend collection now, with the shorter form.

values = [v.ljust(longest_value_length) for v in values]

prefix_length = 0
for c in zip(*values):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

@BenWu BenWu merged commit c84efa6 into master Apr 29, 2020
@BenWu BenWu deleted the benwu/bugs-4-expiring-probes branch April 29, 2020 16:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants