New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Per-run metrics for target roots, transitive target counts. #5651

Merged
merged 5 commits into from Apr 4, 2018

Conversation

Projects
None yet
2 participants
@kwlzn
Copy link
Member

kwlzn commented Apr 3, 2018

This should help ensure a more stable sizing metric for runs.

[omerta pants (kwlzn/more_metrics)]$ PANTS_ENABLE_PANTSD=True ./pants -q list 3rdparty/python:psutil 3rdparty/python:requests --run-tracker-stats-local-json-file=test.json | wc -l ; cat test.json | json.tool | grep -A5 pantsd_stats

       2
    "pantsd_stats": {
        "affected_targets_size": 2,
        "preceding_graph_size": 0,
        "resulting_graph_size": 27,
        "target_root_size": 2
    },
[omerta pants (kwlzn/more_metrics)]$ PANTS_ENABLE_PANTSD=True ./pants -q list : --run-tracker-stats-local-json-file=test.json | wc -l ; cat test.json | json.tool | grep -A5 pantsd_stats

       8
    "pantsd_stats": {
        "affected_targets_size": 15,
        "preceding_graph_size": 5,
        "resulting_graph_size": 99,
        "target_root_size": 8
    },
[omerta pants (kwlzn/more_metrics)]$ PANTS_ENABLE_PANTSD=True ./pants -q list :: --run-tracker-stats-local-json-file=test.json | wc -l ; cat test.json | json.tool | grep -A5 pantsd_stats

    1558
    "pantsd_stats": {
        "affected_targets_size": 1562,
        "preceding_graph_size": 19,
        "resulting_graph_size": 15256,
        "target_root_size": 1558
    },
[omerta pants (kwlzn/more_metrics)]$ PANTS_ENABLE_PANTSD=True ./pants -q list : --run-tracker-stats-local-json-file=test.json | wc -l ; cat test.json | json.tool | grep -A5 pantsd_stats

       8
    "pantsd_stats": {
        "affected_targets_size": 15,
        "preceding_graph_size": 5086,
        "resulting_graph_size": 5166,
        "target_root_size": 8
    },
[omerta pants (kwlzn/more_metrics)]$ ./pants -q list : --run-tracker-stats-local-json-file=test.json | wc -l ; cat test.json | json.tool | grep -A5 pantsd_stats

       8
    "pantsd_stats": {
        "affected_targets_size": 15,
        "preceding_graph_size": -1,
        "resulting_graph_size": 95,
        "target_root_size": 8
    },
[omerta pants (kwlzn/more_metrics)]$ ./pants -q list :: --run-tracker-stats-local-json-file=test.json | wc -l ; cat test.json | json.tool | grep -A5 pantsd_stats

    1558
    "pantsd_stats": {
        "affected_targets_size": 1562,
        "preceding_graph_size": -1,
        "resulting_graph_size": 15254,
        "target_root_size": 1558
    },

@kwlzn kwlzn requested review from stuhood , ity and illicitonion Apr 3, 2018

@stuhood
Copy link
Member

stuhood left a comment

Thanks Kris. I think that we should avoid directly tracking targets in favor of tracking files.

If you have appetite for diving into the rust code, there is a relatively small solution available there... otherwise, can get similar information from the BuildGraph for now.

self.run_tracker.pantsd_stats.set_target_root_size(target_count)
return target_count

def set_affected_target_count_in_runtracker(self):

This comment has been minimized.

@stuhood

stuhood Apr 3, 2018

Member

Rather than exposing these as public methods on Context, is there somewhere else we could put them that would not potentially give the impression that they are intended for Task developers to call? We should be aiming to shrink the Context API if possible.

This comment has been minimized.

@kwlzn

kwlzn Apr 3, 2018

Member

I'd assumed making them non-:API: public would suffice for that - but I've made them private and wrapped that with a contextmanager to tighten this up a bit. let me know what you think.


def set_affected_target_count_in_runtracker(self):
"""Sets the realized target count in the run tracker's daemon stats object."""
target_count = len(self.build_graph)

This comment has been minimized.

@stuhood

stuhood Apr 3, 2018

Member

This is going to break when we stop computing a BuildGraph in all cases via #5639 and #4769. And it is explicitly the goal of those tickets to avoid computing this.

@benjyw and I talked about this yesterday, but moving forward, we're going to need to make a decision about what the graph looks like to tasks like depmap and dependencies... but it's entirely possible that that will not be literally the graph used by a goal like compile or test, since those will request exactly the products they need in order to execute. This is also touched on in the blog post I sent out about execution models.

I think that a metric that would be more durable as the concept of "targets" evolves might be something like: "number of involved files". There are a few ways to compute that (and ideally it would be computed by the engine itself: see below), but one easy approach for now would be to continue to use the BuildGraph (temporarily), and sum the counts of all files owned by targets.

If you have time for the more forwards compatible approach, it would be to add a method similar to graph_trace that counted how many files were accessible below some roots by calling fs_subject: in pseudocode, it would be something like:

impl Graph {
  fn count_fs_nodes(&self, roots: Vec<NodeKey>) -> usize {
    self
      .walk(roots.into_iter().flat_map(|n| self.entry_id(EntryKey.Value(n))).collect(), false)
      // Count entries which have an fs_subject.
      .filter(|entry_id| self.entry_for_id.and_then(|e| e.node.content().fs_subject()).is_some())
      .count()
  }
}

This comment has been minimized.

@kwlzn

kwlzn Apr 3, 2018

Member

yeah, I fully expect this to evolve over time as we focus on e.g. the runtracker/reporting aspect of the v2 pipeline port. for now, was just going for something basic to establish a better approximate baseline than the accumulating metric we have now for product graph size. I've gone ahead and implemented the file counting idea using BuildGraph for the moment w/ a TODO to circle back to do that in the engine.

@kwlzn kwlzn force-pushed the twitter:kwlzn/more_metrics branch from 9f0dcf0 to cd52c0c Apr 3, 2018

@stuhood

stuhood approved these changes Apr 3, 2018

Copy link
Member

stuhood left a comment

Thanks. Would still recommend dropping the target count metric entirely.


def _set_affected_target_count_in_runtracker(self):
"""Sets the realized target count in the run tracker's daemon stats object."""
target_count = len(self.build_graph)

This comment has been minimized.

@stuhood

stuhood Apr 3, 2018

Member

We're going to be breaking this metric pretty soon, so I'm not sure we even want to begin collecting it.

This comment has been minimized.

@kwlzn

kwlzn Apr 3, 2018

Member

hmm, but only for the python related tasks afaict. my thinking is that it's still a fairly relevant and interesting grouping metric and one that might make sense to be able to compare/contrast file count to. ftr: I'm still not completely convinced that "files" is that much better than "targets" since they can both be deceptive as to substantive size on the surface. so was thinking both can't hurt, initially - and it's cheap to collect/report on.

wdyt?

This comment has been minimized.

@stuhood

stuhood Apr 3, 2018

Member

I think that targets will "mostly" be going away, and that it will not be free to compute this any longer. If we essentially pre-deprecate this metric and bake in the assumption that only v1 tasks report it, then ok.

This comment has been minimized.

@kwlzn

kwlzn Apr 3, 2018

Member

we essentially pre-deprecate this metric and bake in the assumption that only v1 tasks report it

yeah, that's the idea. this metric is exactly as "deprecated" as Target/BuildGraph is. once we erode Target et al in favor of pushing more into the v2 engine, I fully expect this metric to also erode/evolve alongside it - but think it might be useful to experiment with in the interim.

This comment has been minimized.

@stuhood

stuhood Apr 3, 2018

Member

Ok. Fine with landing it.

hmm, but only for the python related tasks afaict.

The first task it will be going away for is list in #5639

@kwlzn kwlzn merged commit 7ffc115 into pantsbuild:master Apr 4, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment