
Scheduler metrics don't get collected in pure v2 mode #7386

Closed
illicitonion opened this issue Mar 15, 2019 · 1 comment · Fixed by #7915
illicitonion (Contributor) commented:
Currently all the code which interacts with a scheduler session exists in the Context.executing context manager.

This doesn't cover console rule execution.

We should move this code so it lives in the LocalPantsRunner, or more closely wraps individual SchedulerSession lifetimes, rather than the v1-specific Context.

@illicitonion illicitonion added this to To Do in Python Pipeline Porting via automation Mar 15, 2019
illicitonion (Contributor, Author) commented:

Some context:

We added zipkin support to pants! If you run the zipkin server, and have pants post a trace to it, you should see a detailed timing breakdown of what pants does:

$ docker run -d -p 9411:9411 openzipkin/zipkin
$ ./pants --v1 --v2 --reporting-zipkin-endpoint=http://localhost:9411/api/v1/spans --reporting-zipkin-trace-v2 compile 3rdparty:

If you navigate to http://localhost:9411/zipkin/ and press "Find traces", and open up the latest trace, you should see a whole lot of spans; notably, you should see some containing the text scandir and some containing the text digestfile.

If instead you run:

$ ./pants --v1 --v2 --reporting-zipkin-endpoint=http://localhost:9411/api/v1/spans --reporting-zipkin-trace-v2 list 3rdparty:

you will not see any of those scandir or digestfile spans. Significantly, pants is still doing that work; it just isn't reporting it. scandir and digestfile are v2 tasks.

The reason for this is that list is a v2 goal, and compile is a v1 goal. We only collect and publish data related to v2 tasks if you happen to be running a v1 goal as part of your command.

When you run pants, LocalPantsRunner calls _maybe_run_v2 then _maybe_run_v1.

v2 metrics publishing is done through a callstack of roughly:
LocalPantsRunner._maybe_run_v1 -> GoalRunnerFactory.run -> GoalRunnerFactory._run_goals -> Context.executing
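The gap this creates can be sketched as follows. This is a minimal, hypothetical model of the flow above (the class and attribute names here are illustrative stand-ins, not the real pants API): metrics are only published on the v1 path, so a v2-only run does the work but reports nothing.

```python
class FakeScheduler:
    """Stand-in for the real Scheduler; reports fake v2 task metrics."""
    def metrics(self):
        return {"scandir": 3, "digestfile": 5}

class LocalPantsRunnerSketch:
    """Illustrative model of LocalPantsRunner's run flow."""
    def __init__(self, scheduler):
        self._scheduler = scheduler
        self.published = []

    def _maybe_run_v2(self, has_v2_goals):
        # v2 goals (e.g. `list`) run here, but nothing publishes metrics.
        pass

    def _maybe_run_v1(self, has_v1_goals):
        if has_v1_goals:
            # Only the v1 path (via Context.executing) publishes metrics.
            self.published.append(self._scheduler.metrics())

    def run(self, has_v1_goals, has_v2_goals):
        self._maybe_run_v2(has_v2_goals)
        self._maybe_run_v1(has_v1_goals)

runner = LocalPantsRunnerSketch(FakeScheduler())
# A v2-only run: the scheduler did work, but nothing was published.
runner.run(has_v1_goals=False, has_v2_goals=True)
```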

In Context.executing, we end up calling:

    # Collect engine metrics and record them on the RunTracker.
    metrics = self._scheduler.metrics()
    self.run_tracker.pantsd_stats.set_scheduler_metrics(metrics)
    # Extract v2 engine workunits and push them into the report (e.g. Zipkin).
    engine_workunits = self._scheduler.engine_workunits(metrics)
    if engine_workunits:
      self.run_tracker.report.bulk_record_workunits(engine_workunits)
    self._set_affected_target_count_in_runtracker()

and the bulk_record_workunits call is what does the pushing of v2 information into these traces. But this isn't v1 specific, it just happens to be where the code sits.

So, let's move this block of code somewhere more sensible. As good a home as any: add a new method on LocalPantsRunner, called something like update_stats, which does what we currently do in Context.executing. Call it in LocalPantsRunner._run after we call _maybe_run_v1 (probably in the finally block, since we always want stats).

In LocalPantsRunner you can access the RunTracker as self._run_tracker. You can't currently access the Scheduler, but you can make it so: self._graph_session is a LegacyGraphSession which was created with a reference to a Scheduler. So let's pass a Scheduler in when we construct a LocalPantsRunner, so that we can store it as self._scheduler.
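The proposal above could look roughly like this. This is a hypothetical sketch, not the real pants code: the stub classes and the update_stats method name are assumptions drawn from this issue's suggestion, and the real signatures will differ.

```python
class StubScheduler:
    """Stand-in for the real Scheduler."""
    def metrics(self):
        return {"scandir": 3, "digestfile": 5}

class StubRunTracker:
    """Stand-in for the real RunTracker."""
    def __init__(self):
        self.scheduler_metrics = None

    def set_scheduler_metrics(self, metrics):
        self.scheduler_metrics = metrics

class LocalPantsRunner:
    def __init__(self, scheduler, run_tracker):
        # Pass the Scheduler in at construction time, so stats collection
        # no longer depends on the v1-specific Context.
        self._scheduler = scheduler
        self._run_tracker = run_tracker

    def _maybe_run_v2(self):
        pass  # v2 console rules would run here

    def _maybe_run_v1(self):
        pass  # v1 goals would run here

    def update_stats(self):
        # Roughly the block currently living in Context.executing,
        # moved here so it covers v2-only runs too.
        metrics = self._scheduler.metrics()
        self._run_tracker.set_scheduler_metrics(metrics)

    def _run(self):
        try:
            self._maybe_run_v2()
            self._maybe_run_v1()
        finally:
            # Always record stats, even for v2-only runs or on failure.
            self.update_stats()

runner = LocalPantsRunner(StubScheduler(), StubRunTracker())
runner._run()
```

Putting the update_stats call in the finally block means stats are recorded regardless of which goal versions ran or whether one of them raised.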
