WIP - Alert filing#7076

Closed
airimovici wants to merge 6 commits into mozilla:master from gmierz:alert_filing

Conversation

@airimovici (Contributor)

@gmierz I created this PR to easily compare the changes and leave some notes.

def _handle_backfill_alerts(self):
# Get backfilled alerts
bugzilla = BugzillaHelper()
backfilled_records = BackfillRecord.objects.select_related(
Contributor:

I think we can extract this as a method of BackfillRecord.
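A minimal sketch of that extraction, assuming a hypothetical `backfilled()` classmethod on BackfillRecord; a stub manager stands in for the real Django ORM so the example runs standalone:

```python
class FakeManager:
    """Stub standing in for BackfillRecord.objects (illustration only)."""

    def __init__(self, rows):
        self.rows = rows

    def select_related(self, *fields):
        # The real Django manager would join the named relations;
        # the stub just returns the stored rows unchanged.
        return self.rows


class BackfillRecord:
    objects = FakeManager(rows=["record-1", "record-2"])

    @classmethod
    def backfilled(cls):
        # Hypothetical helper: moving the query here keeps
        # _handle_backfill_alerts free of ORM details.
        return cls.objects.select_related("alert", "alert__summary")
```

The caller in `_handle_backfill_alerts` would then reduce to `BackfillRecord.backfilled()`.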

return

@staticmethod
def recompute_backfill_alert(record: BackfillRecord) -> Push:
Contributor:

The type hint for the output isn't correct.

else:
logger.info(f"Change not found here {prev} vs. {cur}")

if not first_changed:
Contributor:

I'm not sure what to say about this. So far we've only marked invalid alerts manually.
Trying to automate this can be risky and could end up hiding real regressions.

Contributor:

Agree. I wouldn't go that far.

# TODO: get job_type from record when soft launch lands ---> job_type = record.job_type
job_type = get_job_type(record)
from_time, to_time = self._get_push_timestamp_range(record.get_context())
from_time, to_time = Helper.get_push_timestamp_range(record.get_context())
Contributor:

There's no need to refactor OutcomeChecker, as we already covered these exact methods in PR 7070.


# Check if this push already has an alert summary. If it does, reassign
# the alerts to it and delete this summary.
existing_summary = PerformanceAlertSummary.objects.filter(push=cur_push)
Contributor (Author):

PerformanceAlertSummary.objects.filter(push=cur_push).first() returns the first matching summary, or None if no results match the filter.
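That `.first()` contract can be mimicked in plain Python; the stub queryset below (in place of the real `PerformanceAlertSummary.objects`) assumes only that `.first()` yields the first result or `None`:

```python
class FakeQuerySet:
    """Stub mimicking the Django QuerySet.first() contract."""

    def __init__(self, rows):
        self.rows = list(rows)

    def first(self):
        # Django's QuerySet.first() returns the first result,
        # or None when the queryset is empty.
        return self.rows[0] if self.rows else None
```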

@alexandru-io (Contributor)

alexandru-io commented Apr 2, 2021

The idea of this bot is awesome, but it is pretty tricky and complex to implement. I am saying this from the experience of developing the automated bug filing from the alerts view.
I would suggest sprints of development & testing that are as small as possible. This bot should take care of the code limitations that are currently present (and they are not few) and of the current sheriffing workflow. Even if this is automatic, its purpose is to automate what the sheriffs do, so it should respect that workflow.
Another thing I would suggest is some sort of defensive approach: if the bot picks up an alert and loses itself in scenarios, it should be able to abandon it and leave the "manual sheriffs" to take care of it further. Perhaps if the backfill finished but there's too much noise, it shouldn't risk opening a regression bug with low confidence, but instead leave a note like "backfilled with depth x and y retriggers, but there's too much noise to confidently identify the culprit". Even if it doesn't file the bug, that would already be a big win for sheriffing.
A last thought is to release this bot gradually. Build metrics and AWSY alerts are generally the cleanest. Some build metrics tests even run on every revision, so they would be the best production scenario for testing automated regression filing. AWSY tests are very suitable for triggering the backfill and then easily finding the culprit. But if the culprit revision contains patches from several bugs in different products & components, the alert should be handed back to the sheriff in the first release(s) of the bot. Some Talos tests are clean and reasonably easy to sheriff, while others are pretty noisy. Raptor and Browsertime tests are the noisiest; I would release the bot to sheriff them only once we have steady results on cleaner tests like AWSY and build metrics.

@jmaher (Collaborator)

jmaher commented Jul 18, 2022

Is there planned work on this PR? I am trying to close out old PRs, or else ensure they have a valid project and will be worked on in the near future.

@gmierz (Collaborator)

gmierz commented Jul 20, 2022

@jmaher you can close it. We can find it in the closed PRs if we need it again.

@jmaher closed this Jul 20, 2022