
Bug 1654496 - Update perfherder to ingest and present test information generated by multi commit builds #6710

Merged (2 commits) on Oct 8, 2020

Conversation

ionutgoldan (Contributor) commented Aug 13, 2020

codecov-commenter commented Aug 13, 2020

Codecov Report

Merging #6710 into master will increase coverage by 0.07%.
The diff coverage is 97.08%.


@@            Coverage Diff             @@
##           master    #6710      +/-   ##
==========================================
+ Coverage   88.34%   88.42%   +0.07%     
==========================================
  Files         280      282       +2     
  Lines       12802    12919     +117     
==========================================
+ Hits        11310    11423     +113     
- Misses       1492     1496       +4     
| Impacted Files | Coverage Δ |
|---|---|
| ...rf/management/commands/remove_multi_commit_data.py | 88.88% <88.88%> (ø) |
| tests/etl/test_perf_data_load.py | 98.68% <97.67%> (-1.32%) ⬇️ |
| treeherder/etl/perf.py | 97.14% <100.00%> (+0.51%) ⬆️ |
| .../perf/migrations/0033_permit_multi_data_per_job.py | 100.00% <100.00%> (ø) |
| treeherder/perf/models.py | 94.63% <100.00%> (+0.08%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ce123c8...b9d0abc.

@ionutgoldan ionutgoldan force-pushed the ingest-multi-data-job branch 11 times, most recently from e43cedc to 5404488 Compare August 17, 2020 14:19
ionutgoldan (Contributor, Author) commented:

The main implementation is pretty much done.
What puzzles me now is how to prepare a revert plan in case the patch encounters problems, since it requires a database migration.

@ionutgoldan ionutgoldan added the db-schema-changes This PR will require a Database Schema change label Aug 24, 2020
@ionutgoldan ionutgoldan marked this pull request as ready for review August 24, 2020 11:52
@ionutgoldan ionutgoldan marked this pull request as draft August 24, 2020 13:50
@ionutgoldan ionutgoldan force-pushed the ingest-multi-data-job branch 5 times, most recently from 590c31a to 834ba0b Compare August 26, 2020 10:05
@ionutgoldan ionutgoldan marked this pull request as ready for review August 26, 2020 14:29
airimovici (Contributor) left a comment:

👍

ionutgoldan (Contributor, Author) commented Aug 31, 2020

@sarah-clements as this PR involves database schema changes, I've prepared a backup plan in case it uncovers any unknown issues.

For the deploy, these 3 PRs should land in this order:

  1. PR 6731
  2. This PR (as dead code)
  3. PR 6741, which actually enables this PR

The backup plan (in case steps above have problems) is as follows:

  1. Toggle off the PERFHERDER_ENABLE_MULTIDATA_INGESTION feature flag, so we don't ingest any more dirty data.
  2. Use the remove_multi_commit_data script to clean the database of any newly ingested (dirty) data.
  3. Revert this PR.
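The cleanup step of this backup plan can be sketched in plain Python (a hedged illustration only: the real `remove_multi_commit_data` is a Django management command in the PR, and the dict-based rows below stand in for `PerformanceDatum` model instances):

```python
# Illustrative sketch of the backup-plan cleanup: MultiCommitDatum tracks the
# ids of perf data ingested while the feature flag was on, so a revert only
# has to delete those rows. Dicts stand in for the real Django model rows.

def remove_multi_commit_data(perf_data, multi_commit_ids):
    """Return the perf data with all tracked (dirty) rows removed."""
    dirty = set(multi_commit_ids)
    return [row for row in perf_data if row["id"] not in dirty]

# Example: rows 2 and 3 came from a multi-commit job and get dropped.
perf_data = [
    {"id": 1, "value": 6.5},
    {"id": 2, "value": 7.1},
    {"id": 3, "value": 7.0},
]
clean = remove_multi_commit_data(perf_data, multi_commit_ids=[2, 3])
```

Because the dirty rows are tracked explicitly, the revert never has to guess which perf data came from the new ingestion path.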

sarah-clements (Contributor) replied:

> @sarah-clements as this PR involves database schema changes, I've prepared a backup plan in case it uncovers any unknown issues.
>
> For the deploy, these 3 PRs should land in this order:
>
> 1. [PR 6731](https://github.com/mozilla/treeherder/pull/6731)
> 2. [This PR](https://github.com/mozilla/treeherder/pull/6710) (as dead code)
> 3. [PR 6741](https://github.com/mozilla/treeherder/pull/6741), which actually enables this PR
>
> The backup plan (in case the steps above have problems) is as follows:
>
> 1. Toggle off the PERFHERDER_ENABLE_MULTIDATA_INGESTION feature flag, so we don't ingest any more dirty data.

This idea of toggling it on/off would only work if it's set up as an environment variable with a default value. Then Cam or I can set a value in the Heroku env settings for each deployment (e.g. `COMMENTER_API_KEY = env("BUG_COMMENTER_API_KEY", default=None)`). This would give us more control.

> 2. Use the `remove_multi_commit_data` script to clean the database of any newly ingested (dirty) data.
> 3. Revert this PR.

Will you be testing this on prototype2 first?
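The env-variable toggle Sarah suggests can be sketched like this (an illustration, not the PR's actual settings code; `env_bool` is a hypothetical helper mirroring the `env(..., default=...)` pattern she quotes):

```python
import os

def env_bool(name, default=False):
    """Hypothetical helper: read a boolean feature flag from the environment."""
    raw = os.environ.get(name)
    if raw is None:
        return default  # flag unset: fall back to the per-deployment default
    return raw.strip().lower() in ("1", "true", "yes", "on")

# Defaults to off; ops can flip it per deployment via Heroku config vars,
# without shipping a new release.
PERFHERDER_ENABLE_MULTIDATA_INGESTION = env_bool(
    "PERFHERDER_ENABLE_MULTIDATA_INGESTION", default=False
)
```

With a safe default of `False`, merging the code is decoupled from enabling it, which is exactly what the "land as dead code" step above relies on.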

@@ -269,6 +273,15 @@ def __str__(self):
return "{} {}".format(self.value, self.push_timestamp)


class MultiCommitDatum(models.Model):
sarah-clements (Contributor) commented Sep 2, 2020

How long are you planning to keep data for this table? We'll need to have expiry of old data in a separate PR.

Edit: Just realized it'll have a foreign key on perf_datum, which is already a very large table. I think we need another discussion about retention of perf datum, since active_data is the data warehouse for data older than 4 months.

But I'm not quite following the design of this table. It's one job with multiple commits, right?

ionutgoldan (Contributor, Author) replied Sep 2, 2020:

> How long are you planning to keep data for this table? We'll need to have expiry of old data in a separate PR.

I'll keep this table for no more than 1 month, after which I'll truncate & remove it entirely. Its only purpose is to keep track of the dirty data, in case a revert is needed. With its tracking, I can easily remove the dirty data & revert the database migration.

> Edit: Just realized it'll have a foreign key on perf_datum, which is already a very large table. I think we need to have another discussion about retention of perf datum since active_data is the data warehouse for data older than 4 months.

I already filed an epic on Jira with 4 subtasks that will make the expiry of perf data more aggressive. But we should sync about the way active_data is used for data older than 4 months, to see whether Perfherder can also adopt that approach.

> But I'm not quite following the design of this table. It's one job with multiple commits, right?

Yes, you got that right. It's a (perf) job which has multiple Perfherder-ingestable JSONs in it, each pertaining to a different Fenix revision. These JSONs have the Fenix revision embedded inside them, plus the timestamp when that revision was committed.
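The design being described, one job carrying several ingestable JSONs with one row per revision, can be sketched with plain dataclasses (illustrative only; the real tables are the Django models `PerformanceDatum` and `MultiCommitDatum`, and the field names here are assumptions based on the thread):

```python
from dataclasses import dataclass

@dataclass
class PerfDatum:
    # One row per ingested JSON; the Fenix revision and its commit timestamp
    # are embedded in the JSON itself, as described above.
    job_id: int
    revision: str
    push_timestamp: int
    value: float

@dataclass
class MultiCommitDatum:
    # Tracks that a given perf datum came from a multi-commit job, so the
    # dirty rows can be found and removed if the migration must be reverted.
    perf_datum: PerfDatum

# One job (id 42) producing data for two different Fenix revisions:
rows = [
    PerfDatum(42, "a1b2c3", 1599000000, 7.5),
    PerfDatum(42, "d4e5f6", 1599003600, 7.3),
]
tracked = [MultiCommitDatum(r) for r in rows]
```

In the real schema this is a foreign key from `MultiCommitDatum` to `perf_datum`, which is why the retention concern above matters: every multi-commit row adds to an already very large table.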

sarah-clements (Contributor) replied:

> I already filed an epic on Jira with 4 subtasks that will make the expiry of perf data more aggressive. But we should sync about the way active_data is used for data older than 4 months.

That sounds good; however, I don't have any familiarity with active data, so you might want to loop in jmaher or ahal, who I believe is taking over its management.

ionutgoldan (Contributor, Author) replied:

> [...]
> The backup plan (in case the steps above have problems) is as follows:
>
> 1. Toggle off the PERFHERDER_ENABLE_MULTIDATA_INGESTION feature flag, so we don't ingest any more dirty data.
>
> This idea of toggling it on/off would only work if it's set up as an environment variable with a default value. Then Cam or I can set a value in the Heroku env settings for each deployment (e.g. `COMMENTER_API_KEY = env("BUG_COMMENTER_API_KEY", default=None)`). This would give us more control.

Ok, I'll adapt the PRs to take this into account.

> Will you be testing this on prototype2 first?

Yes, that would be an even safer approach.

@ionutgoldan ionutgoldan force-pushed the ingest-multi-data-job branch 5 times, most recently from cfb7b7d to 7e5edbc Compare September 7, 2020 12:25
sarah-clements (Contributor) left a comment:

This looks OK, but I think it'd be good to test it on prototype before merging into master, so we know how long the schema changes will take. We might want to plan for the merge to stage and the production deploy to happen during a slow time, since the perf datum table is rather large.

@ionutgoldan ionutgoldan force-pushed the ingest-multi-data-job branch 2 times, most recently from 5ba1d5a to 5dcb9c4 Compare September 10, 2020 10:31
@ionutgoldan ionutgoldan temporarily deployed to treeherder-prototype2 September 10, 2020 10:51 Inactive
ionutgoldan (Contributor, Author) commented Sep 10, 2020

I just finished the deploy on treeherder-prototype2. In total, it took around 7 minutes, so there shouldn't be concerns about subsequent deploys taking too long.

ionutgoldan (Contributor, Author) commented:

I've also switched on the feature flag to enable the new ingestion mechanism. No problems noticed.
Note: the new mechanism is only enabled; it doesn't ingest any new data types yet, as we haven't turned on our new producers.

sarah-clements (Contributor) replied:

> I just finished doing the deploy on treeherder-prototype2. In total, it took around 7 minutes.
> So there shouldn't be concerns that following deploys could take too long.

Well, prototype2 doesn't have that much data. That's why I think prototype would be a better deployment to test it on, since it has the typical amount of perf_datum we store :)

ionutgoldan (Contributor, Author) commented Sep 11, 2020

@sarah-clements oh, ok. I attempted a similar deploy on treeherder-prototype, but after ~50 minutes the release failed.
This is the error from the logs:

django.db.utils.OperationalError: (1034, "Incorrect key file for table 'performance_datum'; try to repair it")

According to Stack Overflow, this is caused by the disk becoming full while the database rebuilds the new (and huge) table constraint.

Django is now under the impression that no migration happened, when in reality we lost the original unique constraint.
As I have extended rights on treeherder-prototype's database, I'm trying to clean it up by bringing its schema back to a pre-migration state (using raw SQL).

Update: I managed to put the database back in its original state & redeployed master on treeherder-prototype.

@ionutgoldan ionutgoldan temporarily deployed to treeherder-prototype September 11, 2020 06:30 Inactive
@ionutgoldan ionutgoldan temporarily deployed to treeherder-prototype October 7, 2020 08:46 Inactive
codecov-io commented:

Codecov Report

Merging #6710 into master will increase coverage by 0.07%.
The diff coverage is 97.08%.


@@            Coverage Diff             @@
##           master    #6710      +/-   ##
==========================================
+ Coverage   88.10%   88.18%   +0.07%     
==========================================
  Files         282      284       +2     
  Lines       12878    12995     +117     
==========================================
+ Hits        11346    11459     +113     
- Misses       1532     1536       +4     
| Impacted Files | Coverage Δ |
|---|---|
| ...rf/management/commands/remove_multi_commit_data.py | 88.88% <88.88%> (ø) |
| tests/etl/test_perf_data_load.py | 98.68% <97.67%> (-1.32%) ⬇️ |
| treeherder/etl/perf.py | 97.14% <100.00%> (+0.51%) ⬆️ |
| .../perf/migrations/0033_permit_multi_data_per_job.py | 100.00% <100.00%> (ø) |
| treeherder/perf/models.py | 94.63% <100.00%> (+0.08%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 56f5f51...60c50f5.

@ionutgoldan ionutgoldan temporarily deployed to treeherder-prototype October 8, 2020 06:15 Inactive
@ionutgoldan ionutgoldan merged commit 7b863dc into mozilla:master Oct 8, 2020
Labels
db-schema-changes This PR will require a Database Schema change python

5 participants