
Add run tracker option for versioning stats #7481

Merged
merged 6 commits into from Apr 4, 2019

3 participants
@codealchemy
Contributor

commented Apr 2, 2019

Prior / related work: #7475, #7474, #7446, #7392

Adds an option to the RunTracker (defaulting to the current behavior) for selecting which JSON schema is used to record stats.

V1

stats = {
    'run_info': self.run_information(),
    'cumulative_timings': self.cumulative_timings.get_all(),
    'self_timings': self.self_timings.get_all(),
    'critical_path_timings': self.get_critical_path_timings().get_all(),
    'artifact_cache_stats': self.artifact_cache_stats.get_all(),
    'pantsd_stats': self.pantsd_stats.get_all(),
    'outcomes': self.outcomes,
    'recorded_options': self._get_options_to_record(),
}

V2

json.dumps({
    'workunits': self._results,
    'artifact_cache_stats': self.run_tracker.artifact_cache_stats.get_all(),
    'pantsd_stats': self.run_tracker.pantsd_stats.get_all(),
    'run_info': self.run_tracker.run_information(),
}),
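A minimal sketch of how a version option could pick between the two schemas above. The key lists are copied from the V1 and V2 examples; `SCHEMA_KEYS`, `build_stats`, the `version` parameter, and the callable-based `sources` mapping are hypothetical illustrations, not the actual option name or RunTracker internals added in this PR.

```python
import json
from typing import Any, Callable, Dict, List

# Top-level keys of each schema, as listed in the V1 / V2 examples above.
SCHEMA_KEYS: Dict[int, List[str]] = {
    1: [
        "run_info",
        "cumulative_timings",
        "self_timings",
        "critical_path_timings",
        "artifact_cache_stats",
        "pantsd_stats",
        "outcomes",
        "recorded_options",
    ],
    2: ["workunits", "artifact_cache_stats", "pantsd_stats", "run_info"],
}


def build_stats(sources: Dict[str, Callable[[], Any]], version: int = 1) -> Dict[str, Any]:
    """Collect only the sections defined by the selected schema version.

    Defaults to version 1 to mirror "defaulting to the current behavior".
    `sources` maps a section name to a zero-argument callable, a hypothetical
    stand-in for calls such as self.pantsd_stats.get_all().
    """
    try:
        keys = SCHEMA_KEYS[version]
    except KeyError:
        raise ValueError(f"unsupported stats version: {version}") from None
    return {key: sources[key]() for key in keys}


if __name__ == "__main__":
    # Fake data sources: every section just reports its own name.
    all_keys = set(SCHEMA_KEYS[1]) | set(SCHEMA_KEYS[2])
    fake = {key: (lambda key=key: f"<{key}>") for key in all_keys}
    print(json.dumps(build_stats(fake, version=2), sort_keys=True))
```

Keeping the schemas as data rather than two hand-written dict literals makes the difference between versions visible in one place, which is the same motivation given later in this thread for selecting v1 / v2 stats in the RunTracker.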

@jsirois

jsirois approved these changes Apr 3, 2019

@stuhood
Member

left a comment

FTR: we rely heavily on these stats, and store them in the following SQL schema:

| Column | Type | Extra | Comment |
|---|---|---|---|
| run_info | row(id varchar, build_run_at bigint, machine varchar, user_ varchar, cmd_line varchar, outcome varchar, git_path varchar, git_tag varchar, git_branch varchar, git_revision varchar, default_report varchar, ci_job_name varchar) | | |
| self_timings | array(row(duration bigint, is_tool boolean, label varchar)) | | |
| cumulative_timings | array(row(duration bigint, is_tool boolean, label varchar)) | | |
| artifact_cache_stats | array(row(cache_name varchar, hits array(varchar), misses array(varchar), hit_details map(varchar, varchar), miss_details map(varchar, varchar))) | | |
| pantsd_stats | row(graph_preceding_size bigint, graph_resulting_size bigint, affected_file_count bigint) | | |
| critical_path_timings | array(row(duration bigint, is_tool boolean, label varchar)) | | |
| run_options | row(explicit_options map(varchar, map(varchar, array(varchar))), all_options map(varchar, map(varchar, array(varchar)))) | | |
| datehour | varchar | partition key | partition based on date hour |

Once you have settled on a final design for the v2 stats, would you mind sketching out a "v1 to v2" stats comparison guide somewhere to give us an idea of what a migration is going to look like?
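As a rough first step toward such a guide, the top-level keys of the two payloads in the PR description can be diffed directly. This is a minimal sketch that compares key names only (`V1_KEYS`, `V2_KEYS`, and `compare_schemas` are hypothetical helpers); it says nothing about how the contents of shared sections such as `run_info` might change, or whether `workunits` can reproduce the timing data.

```python
from typing import Dict, FrozenSet, List

# Top-level keys copied from the V1 and V2 examples in the PR description.
V1_KEYS: FrozenSet[str] = frozenset({
    "run_info",
    "cumulative_timings",
    "self_timings",
    "critical_path_timings",
    "artifact_cache_stats",
    "pantsd_stats",
    "outcomes",
    "recorded_options",
})
V2_KEYS: FrozenSet[str] = frozenset({"workunits", "artifact_cache_stats", "pantsd_stats", "run_info"})


def compare_schemas(old: FrozenSet[str], new: FrozenSet[str]) -> Dict[str, List[str]]:
    """Classify top-level keys as kept, removed, or added between two schemas."""
    return {
        "kept": sorted(old & new),
        "removed": sorted(old - new),
        "added": sorted(new - old),
    }


if __name__ == "__main__":
    for status, keys in compare_schemas(V1_KEYS, V2_KEYS).items():
        print(f"{status:>8}: {', '.join(keys)}")
```

Read against the table above, the "removed" group suggests that the `self_timings`, `cumulative_timings`, and `critical_path_timings` columns have no direct V2 counterpart by key name; how (or whether) they would be derived from `workunits` is not settled in this thread.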

'outcomes': self.outcomes,
'recorded_options': self._get_options_to_record(),
}
stats = self._stats()

@stuhood

stuhood Apr 3, 2019

Member

The effect of this for v2 is that we load the stats from one place and store them in another?... that's a bit odd.

@codealchemy

codealchemy Apr 3, 2019

Author Contributor

✔️ I believe the latest changes help resolve this: the JSON reporter now simply collects workunit data and serves it to the RunTracker.

Move JsonReporter setup to RunTracker (when v2 stats is configured)
This lets us run JSON reports only when v2 stats are specified, and since the JsonReporter is _only_ used by the RunTracker for stats, it feels like a better fit here than in Reporting / Report. Additionally, we can use the in-memory results of the JSON reporter (instead of reading the file) and more clearly illustrate the differences between the v1 and v2 stats when selecting them in the RunTracker based on the configured flag.
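The arrangement described in this commit message can be sketched roughly as follows. `InMemoryJsonReporter`, `SketchRunTracker`, `stats_version`, `record`, and the workunit dict fields are hypothetical simplifications, not the real Pants `JsonReporter` / `RunTracker` APIs; the sketch only shows the reporter being created when v2 stats are configured and its results being read from memory rather than from a file.

```python
import json
from typing import Any, Dict, List, Optional


class InMemoryJsonReporter:
    """Hypothetical stand-in for JsonReporter: accumulates workunit data in memory."""

    def __init__(self) -> None:
        self.results: List[Dict[str, Any]] = []

    def end_workunit(self, name: str, outcome: str, duration_ms: int) -> None:
        self.results.append({"name": name, "outcome": outcome, "duration_ms": duration_ms})


class SketchRunTracker:
    """Hypothetical, heavily simplified RunTracker (most stats sections omitted)."""

    def __init__(self, stats_version: int = 1) -> None:
        self.stats_version = stats_version
        # Only set up the JSON reporter when v2 stats are configured.
        self.json_reporter: Optional[InMemoryJsonReporter] = (
            InMemoryJsonReporter() if stats_version == 2 else None
        )
        self.outcomes: Dict[str, str] = {}

    def record(self, name: str, outcome: str, duration_ms: int) -> None:
        self.outcomes[name] = outcome
        if self.json_reporter is not None:
            self.json_reporter.end_workunit(name, outcome, duration_ms)

    def stats(self) -> Dict[str, Any]:
        run_info = {"id": "run-1"}  # placeholder for run_information()
        if self.json_reporter is not None:
            # v2: read the reporter's in-memory results instead of re-reading a file.
            return {"workunits": list(self.json_reporter.results), "run_info": run_info}
        # v1: the existing dict-of-sections layout (trimmed to two keys here).
        return {"run_info": run_info, "outcomes": dict(self.outcomes)}


if __name__ == "__main__":
    tracker = SketchRunTracker(stats_version=2)
    tracker.record("compile", "SUCCESS", 120)
    print(json.dumps(tracker.stats(), sort_keys=True))
```

Because the reporter exists only for v2, a v1 run pays nothing for JSON reporting, and the branch in `stats()` is the single place where the two layouts diverge.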
@jsirois

jsirois approved these changes Apr 3, 2019

Member

left a comment

Much better!

Resolved review thread (outdated): src/python/pants/reporting/json_reporter.py
@jsirois

jsirois approved these changes Apr 3, 2019

Member

left a comment

And even better still: +65 −116

@stuhood

stuhood approved these changes Apr 4, 2019

Member

left a comment

Thanks.

Please definitely take note of the above comment... this is potentially disruptive, and we'll need some notice.

@codealchemy

Contributor Author

commented Apr 4, 2019

@stuhood re:

Once you have settled on a final design for the v2 stats, would you mind sketching out a "v1 to v2" stats comparison guide somewhere to give us an idea of what a migration is going to look like?

👍 Absolutely - I don't see V1 going away (or not being the default) anytime soon, and will put together a guide if / when deprecating V1 comes under consideration.

@jsirois jsirois merged commit eca0e42 into pantsbuild:master Apr 4, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

@codealchemy codealchemy deleted the codealchemy:version-schema-stats branch Apr 4, 2019
