Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collect Zipkin spans for v2 engine #7342

Conversation

Projects
None yet
4 participants
@cattibrie
Copy link
Contributor

commented Mar 8, 2019

Problem

Currently, the v2 engine is a black box when it comes to collecting Zipkin spans for tracing.

Solution

This PR allows tracking spans for v2 engine for every Node. But, the number of these Nodes is very big (hundreds or thousands depending on the size of the target), so, later on, will be decided for what Nodes Zipkin spans will be collected. So a flag was added that enables Zipkin tracing for v2 engine.

Result

There are the next things in this PR:

  • workunits for v2 engine that collect Zipkin tracing info;
  • flag that enables collecting Zipkin spans for v2 engine.
    I still have to write tests.
@illicitonion
Copy link
Contributor

left a comment

This generally looks great, thanks!

Would you mind rebasing so that the dict / f64 changes aren't in this PR (and so that we'll use rust 1.33)?

A few very minor thoughts as comments - we should probably also add some explicit TODOs about setting parent IDs better :)

Show resolved Hide resolved src/rust/engine/src/nodes.rs Outdated
Show resolved Hide resolved src/rust/engine/src/nodes.rs Outdated
Show resolved Hide resolved src/rust/engine/src/nodes.rs Outdated
@stuhood
Copy link
Member

left a comment

Thanks, looks good.

Show resolved Hide resolved src/python/pants/goal/pantsd_stats.py Outdated
Show resolved Hide resolved src/python/pants/reporting/zipkin_reporter.py Outdated
Show resolved Hide resolved src/rust/engine/src/externs.rs Outdated
Show resolved Hide resolved src/rust/engine/src/nodes.rs Outdated
Show resolved Hide resolved src/rust/engine/src/scheduler.rs Outdated

@cattibrie cattibrie force-pushed the cattibrie:etyurina/add_workunits_form_rust_to_trace branch from 9c3caec to 1b40a81 Mar 12, 2019

@stuhood
Copy link
Member

left a comment

Looks great! Thanks.

Show resolved Hide resolved src/python/pants/reporting/zipkin_reporter.py Outdated
Show resolved Hide resolved src/python/pants/goal/context.py Outdated
}

#[derive(Clone)]
pub struct Session(Arc<InnerSession>);

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Mar 12, 2019

Contributor

If I'm reading it right, is this change in order to support adding session: Session, to RootContext? I am not familiar -- is there a reason we would use an Arc<InnerSession> with a new InnerSession struct here instead of keeping Session the same, but using session: Arc<Session>, in RootContext? It seems like it would allow us to remove the extra helper methods such as preceding_graph_size() as well, which seems like it would reduce the diff size a lot.

This comment has been minimized.

Copy link
@illicitonion

illicitonion Mar 13, 2019

Contributor

It's a fairly cosmetic thing - there's a convention to make an outer/inner thing to make something cloneable (particularly if your type needs to implement traits, because you can't implement traits on Arc<_> unless the trait happens to be defined in your crate), but also passing around Arcs to things is generally fine :)

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Mar 15, 2019

Contributor

I also just find the use of .0 to be difficult to follow personally, and thought that having an Arc<Session> instead might make this diff smaller and reduce the number of passthrough helper methods. The code is quite readable either way, and the current diff will make it easier to introduce some more methods on Session later.

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Mar 19, 2019

Contributor

I still don't know that I see the utility of an InnerSession struct when it requires so many extra accessors such as preceding_graph_size, compared to just using an Arc<Session>. It seems to inflate the diff count and make it more difficult to symbol search. This PR can be merged regardless, I just don't quite understand why it was architected this way.

This comment has been minimized.

Copy link
@cattibrie

cattibrie Mar 19, 2019

Author Contributor

The main idea was to point out that Session now is Arc and it is so because of workunits. In this case, I explicitly say in the definition of the Session that it is Arc (not when it is used as a part of another struct). The Session is used in RootContext and Context (at least two places) and in both places, I need to put Arc. It is not difficult, but not obvious why the Session should be Arc in these places.

Show resolved Hide resolved src/rust/engine/src/nodes.rs Outdated
externs::store_utf8("end_timestamp"),
externs::store_f64(workunit.end_timestamp),
externs::store_utf8("span_id"),
externs::store_utf8(&workunit.span_id),

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Mar 12, 2019

Contributor

I've made #7371 for a related thought about how to make creating python objects a little more declarative in general, but this is great as is.

@cattibrie cattibrie force-pushed the cattibrie:etyurina/add_workunits_form_rust_to_trace branch from 858fca1 to fb7472c Mar 15, 2019

@cosmicexplorer cosmicexplorer referenced this pull request Mar 18, 2019

Draft

[WIP] switch usages of `log` to `tokio_trace` #7395

0 of 3 tasks complete

@cattibrie cattibrie force-pushed the cattibrie:etyurina/add_workunits_form_rust_to_trace branch from fb7472c to c506fff Mar 18, 2019

@cosmicexplorer
Copy link
Contributor

left a comment

Great!! I made #7395 on top of this because I thought a tracing library might play really well with this code -- I don't know if that PR will be merged, but this PR is a great step.

@@ -444,6 +446,12 @@ def metrics(self):
"""Returns metrics for this SchedulerSession as a dict of metric name to metric value."""
return self._scheduler._metrics(self._session)

def metrics_have_engine_workunits(self):
return "engine_workunits" in self.metrics()

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Mar 19, 2019

Contributor

I think this can be removed?

This comment has been minimized.

Copy link
@illicitonion
@@ -217,7 +217,6 @@ def test_trace_includes_rule_exception_traceback(self):
# Execute a request that will trigger the nested raise, and then directly inspect its trace.
request = self.scheduler.execution_request([A], [B()])
self.scheduler.execute(request)

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Mar 19, 2019

Contributor

I would probably remove this change since it is not important to this PR. In general, it is good to minimize changes in whitespace such as adding or removing extra lines unless it is done to make the code look better.

self.assert_success(pants_run)

num_of_traces = len(ZipkinHandler.traces)
self.assertEqual(num_of_traces, 1)

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Mar 19, 2019

Contributor

There's a method assert_single_element() I added a while ago for cases like this:

def assert_single_element(iterable):
"""Get the single element of `iterable`, or raise an error.

Here, you could write:

from pants.util.collections import assert_single_element
# ...
   trace = assert_single_element(ZipkinHandler.traces)

which would raise an exception if there was not exactly 1 trace.

You do not need to use it at all, I have just found it useful to make it more clear what is being tested.

}

#[derive(Clone)]
pub struct Session(Arc<InnerSession>);

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Mar 19, 2019

Contributor

I still don't know that I see the utility of an InnerSession struct when it requires so many extra accessors such as preceding_graph_size, compared to just using an Arc<Session>. It seems to inflate the diff count and make it more difficult to symbol search. This PR can be merged regardless, I just don't quite understand why it was architected this way.

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor

commented Mar 19, 2019

Also, if a PR is still "WIP", people tend to think it's not ready for review, because there might be more changes they would have to review later. If you're looking for review in the future, it would be good to either first fix things so that it can not be WIP anymore, or if not, make it clear what reviewers should be looking at.

cattibrie added some commits Feb 18, 2019

@cattibrie cattibrie changed the title WIP: Collect Zipkin spans for v2 engine Collect Zipkin spans for v2 engine Mar 19, 2019

@illicitonion
Copy link
Contributor

left a comment

Also I think you need to reformat the rust code :)

Tiny things then let's merge!

.metrics(session)
.into_iter()
.flat_map(|(metric, value)| vec![externs::store_utf8(metric), externs::store_i64(value)])
.collect::<Vec<_>>();
let values_ref = &mut values;

This comment has been minimized.

Copy link
@illicitonion

illicitonion Mar 19, 2019

Contributor

This line shouldn't be necessary; you should be able to remove this line and just values.push where you're currently calling values_ref.push.

.metrics(session)
.into_iter()
.flat_map(|(metric, value)| vec![externs::store_utf8(metric), externs::store_i64(value)])
.collect::<Vec<_>>();
let values_ref = &mut values;
if session.should_record_zipkin_spans() {
let workunits = &session

This comment has been minimized.

Copy link
@illicitonion

illicitonion Mar 19, 2019

Contributor

This & shouldn't be necessary:

Suggested change
let workunits = &session
let workunits = session
@@ -444,6 +446,12 @@ def metrics(self):
"""Returns metrics for this SchedulerSession as a dict of metric name to metric value."""
return self._scheduler._metrics(self._session)

def metrics_have_engine_workunits(self):
return "engine_workunits" in self.metrics()

This comment has been minimized.

Copy link
@illicitonion

@cattibrie cattibrie force-pushed the cattibrie:etyurina/add_workunits_form_rust_to_trace branch from c506fff to cb10be9 Mar 19, 2019

@illicitonion illicitonion merged commit 99e6e3c into pantsbuild:master Mar 19, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

stuhood added a commit that referenced this pull request Mar 20, 2019

Collect Zipkin spans for v2 engine (#7342)
### Problem
Currently, the v2 engine is a black box when it comes to collecting [Zipkin spans ](https://zipkin.io/pages/instrumenting.html)for tracing.

### Solution

This PR allows tracking spans for v2 engine for every Node. But, the number of these Nodes is very big (hundreds or thousands depending on the size of the target), so, later on, will be decided for what Nodes Zipkin spans will be collected. So a flag was added that enables Zipkin tracing for v2 engine.

### Result
There are the next things in this PR:
- workunits for v2 engine that collect Zipkin tracing info;
- flag that enables collecting Zipkin spans for v2 engine.

stuhood added a commit that referenced this pull request Mar 21, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.