
Scheduler metrics don't get collected in pure v2 mode #7386

Closed
illicitonion opened this issue Mar 15, 2019 · 1 comment · Fixed by #7915
illicitonion (Contributor) commented:
Currently all the code which interacts with a scheduler session exists in the Context.executing context manager.

This doesn't cover console rule execution.

We should move this code so it lives in the LocalPantsRunner, or more closely wraps individual SchedulerSession lifetimes, rather than the v1-specific Context.

@illicitonion illicitonion added this to To Do in Python Pipeline Porting via automation Mar 15, 2019
illicitonion (Contributor, Author) commented:

Some context:

We added zipkin support to pants! If you run the zipkin server, and have pants post a trace to it, you should see a detailed timing breakdown of what pants does:

$ docker run -d -p 9411:9411 openzipkin/zipkin
$ ./pants --v1 --v2 --reporting-zipkin-endpoint=http://localhost:9411/api/v1/spans --reporting-zipkin-trace-v2 compile 3rdparty:

If you navigate to http://localhost:9411/zipkin/ and press "Find traces", and open up the latest trace, you should see a whole lot of spans; notably, you should see some containing the text scandir and some containing the text digestfile.

If instead you run:

$ ./pants --v1 --v2 --reporting-zipkin-endpoint=http://localhost:9411/api/v1/spans --reporting-zipkin-trace-v2 list 3rdparty:

you will not see any of those scandir or digestfile spans. Significantly, pants is still doing that work; it just isn't reporting it. scandir and digestfile are v2 tasks.

The reason for this is that list is a v2 goal, and compile is a v1 goal. We only collect and publish data related to v2 tasks if you happen to be running a v1 goal as part of your command.

When you run pants, LocalPantsRunner calls _maybe_run_v2 then _maybe_run_v1.

v2 metrics publishing is done through a callstack of roughly:
LocalPantsRunner._maybe_run_v1 -> GoalRunnerFactory.run -> GoalRunnerFactory._run_goals -> Context.executing
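The gap this creates can be sketched as follows. This is a minimal, hypothetical model of the flow above (the class and attribute names here are illustrative stand-ins, not the real pants API): metrics are only published on the v1 path, so a v2-only run does the work but reports nothing.

```python
class FakeScheduler:
    """Stand-in for the real Scheduler; reports fake v2 task metrics."""
    def metrics(self):
        return {"scandir": 3, "digestfile": 5}

class LocalPantsRunnerSketch:
    """Illustrative model of LocalPantsRunner's run flow."""
    def __init__(self, scheduler):
        self._scheduler = scheduler
        self.published = []

    def _maybe_run_v2(self, has_v2_goals):
        # v2 goals (e.g. `list`) run here, but nothing publishes metrics.
        pass

    def _maybe_run_v1(self, has_v1_goals):
        if has_v1_goals:
            # Only the v1 path (via Context.executing) publishes metrics.
            self.published.append(self._scheduler.metrics())

    def run(self, has_v1_goals, has_v2_goals):
        self._maybe_run_v2(has_v2_goals)
        self._maybe_run_v1(has_v1_goals)

runner = LocalPantsRunnerSketch(FakeScheduler())
# A v2-only run: the scheduler did work, but nothing was published.
runner.run(has_v1_goals=False, has_v2_goals=True)
```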

In Context.executing, we end up calling:

    # Collect engine metrics and record them on the RunTracker.
    metrics = self._scheduler.metrics()
    self.run_tracker.pantsd_stats.set_scheduler_metrics(metrics)
    # Extract v2 engine workunits and push them into the report (e.g. Zipkin).
    engine_workunits = self._scheduler.engine_workunits(metrics)
    if engine_workunits:
      self.run_tracker.report.bulk_record_workunits(engine_workunits)
    self._set_affected_target_count_in_runtracker()

and the bulk_record_workunits call is what does the pushing of v2 information into these traces. But this isn't v1 specific, it just happens to be where the code sits.

So, let's move this block of code somewhere more sensible. As good a home as any: add a new method on LocalPantsRunner, called something like update_stats, which does what we currently do in Context.executing. Call it in LocalPantsRunner._run after we call _maybe_run_v1 (probably in the finally block, since we always want stats).

In LocalPantsRunner you can access the RunTracker as self._run_tracker. You can't currently access the Scheduler, but you can make it so: self._graph_session is a LegacyGraphSession which was created with a reference to a Scheduler. So let's pass a Scheduler in when we construct a LocalPantsRunner, so that we can store it as self._scheduler.
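The proposal above could look roughly like this. This is a hypothetical sketch, not the real pants code: the stub classes and the update_stats method name are assumptions drawn from this issue's suggestion, and the real signatures will differ.

```python
class StubScheduler:
    """Stand-in for the real Scheduler."""
    def metrics(self):
        return {"scandir": 3, "digestfile": 5}

class StubRunTracker:
    """Stand-in for the real RunTracker."""
    def __init__(self):
        self.scheduler_metrics = None

    def set_scheduler_metrics(self, metrics):
        self.scheduler_metrics = metrics

class LocalPantsRunner:
    def __init__(self, scheduler, run_tracker):
        # Pass the Scheduler in at construction time, so stats collection
        # no longer depends on the v1-specific Context.
        self._scheduler = scheduler
        self._run_tracker = run_tracker

    def _maybe_run_v2(self):
        pass  # v2 console rules would run here

    def _maybe_run_v1(self):
        pass  # v1 goals would run here

    def update_stats(self):
        # Roughly the block currently living in Context.executing,
        # moved here so it covers v2-only runs too.
        metrics = self._scheduler.metrics()
        self._run_tracker.set_scheduler_metrics(metrics)

    def _run(self):
        try:
            self._maybe_run_v2()
            self._maybe_run_v1()
        finally:
            # Always record stats, even for v2-only runs or on failure.
            self.update_stats()

runner = LocalPantsRunner(StubScheduler(), StubRunTracker())
runner._run()
```

Putting the update_stats call in the finally block means stats are recorded regardless of which goal versions ran or whether one of them raised.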
