
Bug 1654496 - Update perfherder to ingest and present test information generated by multi commit builds #6710

Merged (2 commits) on Oct 8, 2020

Conversation

ionutgoldan (Contributor) commented Aug 13, 2020

codecov-commenter commented Aug 13, 2020

Codecov Report

Merging #6710 into master will increase coverage by 0.07%.
The diff coverage is 97.08%.


@@            Coverage Diff             @@
##           master    #6710      +/-   ##
==========================================
+ Coverage   88.34%   88.42%   +0.07%     
==========================================
  Files         280      282       +2     
  Lines       12802    12919     +117     
==========================================
+ Hits        11310    11423     +113     
- Misses       1492     1496       +4     
| Impacted Files | Coverage Δ |
|---|---|
| ...rf/management/commands/remove_multi_commit_data.py | 88.88% <88.88%> (ø) |
| tests/etl/test_perf_data_load.py | 98.68% <97.67%> (-1.32%) ⬇️ |
| treeherder/etl/perf.py | 97.14% <100.00%> (+0.51%) ⬆️ |
| .../perf/migrations/0033_permit_multi_data_per_job.py | 100.00% <100.00%> (ø) |
| treeherder/perf/models.py | 94.63% <100.00%> (+0.08%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ce123c8...b9d0abc.

@ionutgoldan ionutgoldan force-pushed the ingest-multi-data-job branch 11 times, most recently from e43cedc to 5404488 Compare August 17, 2020 14:19
ionutgoldan (Contributor, Author) commented:

The main implementation is pretty much done.
What puzzles me now is how to prepare a revert plan in case the patch encounters problems, since it requires a database migration.

@ionutgoldan ionutgoldan added the db-schema-changes This PR will require a Database Schema change label Aug 24, 2020
@ionutgoldan ionutgoldan marked this pull request as ready for review August 24, 2020 11:52
@ionutgoldan ionutgoldan marked this pull request as draft August 24, 2020 13:50
@ionutgoldan ionutgoldan force-pushed the ingest-multi-data-job branch 5 times, most recently from 590c31a to 834ba0b Compare August 26, 2020 10:05
@ionutgoldan ionutgoldan marked this pull request as ready for review August 26, 2020 14:29
airimovici (Contributor) left a comment:

👍

ionutgoldan (Contributor, Author) commented Aug 31, 2020

@sarah-clements as this PR involves database schema changes, I've prepared a backup plan in case it uncovers any unknown issues.

For the deploy, these 3 PRs should land in this order:

  1. PR 6731
  2. This PR (as dead code)
  3. PR 6741, which actually enables this PR

The backup plan (in case steps above have problems) is as follows:

  1. Toggle off the PERFHERDER_ENABLE_MULTIDATA_INGESTION feature flag, so we don't ingest any more dirty data.
  2. Use the remove_multi_commit_data script to clean the database of any newly ingested (dirty) data.
  3. Revert this PR.
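The cleanup step of this backup plan can be sketched in plain Python (a hedged illustration only: the real `remove_multi_commit_data` is a Django management command in the PR, and the dict-based rows below stand in for `PerformanceDatum` model instances):

```python
# Illustrative sketch of the backup-plan cleanup: MultiCommitDatum tracks the
# ids of perf data ingested while the feature flag was on, so a revert only
# has to delete those rows. Dicts stand in for the real Django model rows.

def remove_multi_commit_data(perf_data, multi_commit_ids):
    """Return the perf data with all tracked (dirty) rows removed."""
    dirty = set(multi_commit_ids)
    return [row for row in perf_data if row["id"] not in dirty]

# Example: rows 2 and 3 came from a multi-commit job and get dropped.
perf_data = [
    {"id": 1, "value": 6.5},
    {"id": 2, "value": 7.1},
    {"id": 3, "value": 7.0},
]
clean = remove_multi_commit_data(perf_data, multi_commit_ids=[2, 3])
```

Because the dirty rows are tracked explicitly, the revert never has to guess which perf data came from the new ingestion path.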

sarah-clements (Contributor) replied:

> @sarah-clements as this PR involves database schema changes, I've prepared a backup plan in case it uncovers any unknown issues.
>
> For the deploy, these 3 PRs should land in this order:
>
> 1. [PR 6731](https://github.com/mozilla/treeherder/pull/6731)
> 2. [This PR](https://github.com/mozilla/treeherder/pull/6710) (as dead code)
> 3. [PR 6741](https://github.com/mozilla/treeherder/pull/6741), which actually enables this PR
>
> The backup plan (in case the steps above have problems) is as follows:
>
> 1. Toggle off the PERFHERDER_ENABLE_MULTIDATA_INGESTION feature flag, so we don't ingest any more dirty data.

This idea of toggling it on/off would only work if it's set up as an environment variable with a default value. Then Cam or I can set a value in the Heroku env settings for each deployment (e.g. `COMMENTER_API_KEY = env("BUG_COMMENTER_API_KEY", default=None)`). This would give us more control.

> 2. Use the `remove_multi_commit_data` script to clean the database of any newly ingested (dirty) data.
> 3. Revert this PR.

Will you be testing this on prototype2 first?
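The env-variable toggle Sarah suggests can be sketched like this (an illustration, not the PR's actual settings code; `env_bool` is a hypothetical helper mirroring the `env(..., default=...)` pattern she quotes):

```python
import os

def env_bool(name, default=False):
    """Hypothetical helper: read a boolean feature flag from the environment."""
    raw = os.environ.get(name)
    if raw is None:
        return default  # flag unset: fall back to the per-deployment default
    return raw.strip().lower() in ("1", "true", "yes", "on")

# Defaults to off; ops can flip it per deployment via Heroku config vars,
# without shipping a new release.
PERFHERDER_ENABLE_MULTIDATA_INGESTION = env_bool(
    "PERFHERDER_ENABLE_MULTIDATA_INGESTION", default=False
)
```

With a safe default of `False`, merging the code is decoupled from enabling it, which is exactly what the "land as dead code" step above relies on.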

@@ -269,6 +273,15 @@ def __str__(self):
return "{} {}".format(self.value, self.push_timestamp)


class MultiCommitDatum(models.Model):
sarah-clements (Contributor) commented Sep 2, 2020

How long are you planning to keep data for this table? We'll need to have expiry of old data in a separate PR.

Edit: Just realized it'll have a foreign key on perf_datum, which is already a very large table. I think we need another discussion about retention of perf datum, since active_data is the data warehouse for data older than 4 months.

But I'm not quite following the design of this table. It's one job with multiple commits, right?

ionutgoldan (Contributor, Author) replied Sep 2, 2020:

> How long are you planning to keep data for this table? We'll need to have expiry of old data in a separate PR.

I'll keep this table for no more than 1 month, after which I'll truncate & remove it entirely. Its only purpose is to keep track of the dirty data, in case a revert is needed. With its tracking, I can easily remove the dirty data & revert the database migration.

> Edit: Just realized it'll have a foreign key on perf_datum, which is already a very large table. I think we need to have another discussion about retention of perf datum since active_data is the data warehouse for data older than 4 months.

I already filed an epic on Jira with 4 subtasks that will make the expiry of perf data more aggressive. But we should sync about the way active_data is used for data older than 4 months, to see whether Perfherder can also adopt that approach.

> But I'm not quite following the design of this table. It's one job with multiple commits, right?

Yes, you got that right. It's a (perf) job which has multiple Perfherder-ingestable JSONs in it, each pertaining to a different Fenix revision. These JSONs have the Fenix revision embedded inside them, plus the timestamp when that revision was committed.
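The design being described, one job carrying several ingestable JSONs with one row per revision, can be sketched with plain dataclasses (illustrative only; the real tables are the Django models `PerformanceDatum` and `MultiCommitDatum`, and the field names here are assumptions based on the thread):

```python
from dataclasses import dataclass

@dataclass
class PerfDatum:
    # One row per ingested JSON; the Fenix revision and its commit timestamp
    # are embedded in the JSON itself, as described above.
    job_id: int
    revision: str
    push_timestamp: int
    value: float

@dataclass
class MultiCommitDatum:
    # Tracks that a given perf datum came from a multi-commit job, so the
    # dirty rows can be found and removed if the migration must be reverted.
    perf_datum: PerfDatum

# One job (id 42) producing data for two different Fenix revisions:
rows = [
    PerfDatum(42, "a1b2c3", 1599000000, 7.5),
    PerfDatum(42, "d4e5f6", 1599003600, 7.3),
]
tracked = [MultiCommitDatum(r) for r in rows]
```

In the real schema this is a foreign key from `MultiCommitDatum` to `perf_datum`, which is why the retention concern above matters: every multi-commit row adds to an already very large table.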

sarah-clements (Contributor) replied:

> I already filed an epic on Jira with 4 subtasks that will make the expiry of perf data more aggressive. But we should sync about the way active_data is used for data older than 4 months.

That sounds good; however, I don't have any familiarity with active data, so you might want to loop in jmaher or ahal, who I believe is taking over its management.

ionutgoldan (Contributor, Author) replied:

> [...]
> The backup plan (in case the steps above have problems) is as follows:
>
> 1. Toggle off the PERFHERDER_ENABLE_MULTIDATA_INGESTION feature flag, so we don't ingest any more dirty data.
>
> This idea of toggling it on/off would only work if it's set up as an environment variable with a default value. Then Cam or I can set a value in the Heroku env settings for each deployment (e.g. `COMMENTER_API_KEY = env("BUG_COMMENTER_API_KEY", default=None)`). This would give us more control.

Ok, I'll adapt the PRs to take this into account.

> Will you be testing this on prototype2 first?

Yes, that would be an even safer approach.

@ionutgoldan ionutgoldan force-pushed the ingest-multi-data-job branch 5 times, most recently from cfb7b7d to 7e5edbc Compare September 7, 2020 12:25
sarah-clements (Contributor) left a comment:

This looks OK, but I think it'd be good to test it on prototype before merging into master, so we know how long the schema changes will take. We might want to plan for the merge to stage and the production deploy to happen during a slow time, since the perf datum table is rather large.

@ionutgoldan ionutgoldan force-pushed the ingest-multi-data-job branch 2 times, most recently from 5ba1d5a to 5dcb9c4 Compare September 10, 2020 10:31
@ionutgoldan ionutgoldan temporarily deployed to treeherder-prototype2 September 10, 2020 10:51 Inactive
ionutgoldan (Contributor, Author) commented Sep 10, 2020

I just finished the deploy on treeherder-prototype2. In total, it took around 7 minutes, so there shouldn't be concerns about subsequent deploys taking too long.

ionutgoldan (Contributor, Author) commented:

I've also switched on the feature flag to enable the new ingestion mechanism. No problems noticed.
Note: the new mechanism is only enabled; it doesn't ingest any new data types yet, as we haven't turned on our new producers.

sarah-clements (Contributor) replied:

> I just finished doing the deploy on treeherder-prototype2. In total, it took around 7 minutes.
> So there shouldn't be concerns that following deploys could take too long.

Well, prototype2 doesn't have that much data. That's why I think prototype would be a better deployment to test it on, since it has the typical amount of perf_datum we store :)

ionutgoldan (Contributor, Author) commented Sep 11, 2020

@sarah-clements oh, ok. I attempted a similar deploy on treeherder-prototype, but after ~50 minutes the release failed.
This is the error from the logs:

django.db.utils.OperationalError: (1034, "Incorrect key file for table 'performance_datum'; try to repair it")

According to Stack Overflow, this is caused by the disk becoming full while the database rebuilds the new (and huge) table constraint.

Django is now under the impression that no migration happened, when in reality we lost the original unique constraint.
As I have extended rights on treeherder-prototype's database, I'm trying to clean it up by bringing its schema back to a pre-migration state (using raw SQL).

Update: I managed to put the database back in its original state & redeployed master on treeherder-prototype.

@ionutgoldan ionutgoldan temporarily deployed to treeherder-prototype September 11, 2020 06:30 Inactive
@ionutgoldan ionutgoldan temporarily deployed to treeherder-prototype October 7, 2020 08:46 Inactive
codecov-io commented:

Codecov Report

Merging #6710 into master will increase coverage by 0.07%.
The diff coverage is 97.08%.


@@            Coverage Diff             @@
##           master    #6710      +/-   ##
==========================================
+ Coverage   88.10%   88.18%   +0.07%     
==========================================
  Files         282      284       +2     
  Lines       12878    12995     +117     
==========================================
+ Hits        11346    11459     +113     
- Misses       1532     1536       +4     
| Impacted Files | Coverage Δ |
|---|---|
| ...rf/management/commands/remove_multi_commit_data.py | 88.88% <88.88%> (ø) |
| tests/etl/test_perf_data_load.py | 98.68% <97.67%> (-1.32%) ⬇️ |
| treeherder/etl/perf.py | 97.14% <100.00%> (+0.51%) ⬆️ |
| .../perf/migrations/0033_permit_multi_data_per_job.py | 100.00% <100.00%> (ø) |
| treeherder/perf/models.py | 94.63% <100.00%> (+0.08%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 56f5f51...60c50f5.

@ionutgoldan ionutgoldan temporarily deployed to treeherder-prototype October 8, 2020 06:15 Inactive
@ionutgoldan ionutgoldan merged commit 7b863dc into mozilla:master Oct 8, 2020
Labels
db-schema-changes This PR will require a Database Schema change python

5 participants