Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix zipkin multithreading issue #7415 #7826

Merged

Conversation

Projects
None yet
4 participants
@cattibrie
Copy link
Contributor

commented May 31, 2019

Problem

Pants runs with Zipkin tracing enabled failed for cases with pants runs with multiple threads (e.g. compile or for background workunits). The issue is described in #7415 .
Parent-child relations for zipkin trace were stored in local thread storage. So workunits in the new thread didn't have Zipkin trace related information about its parent.

Solution

py_zipkin library was recently updated to work better with multithreading. A Tracer object is used to store Zipkin trace info and to pass it between threads.

Result

Zipkin tracing with WRONG parent-child relations with multiple threads
image

Zipkin tracing with RIGHT parent-child relations with multiple threads
image

@cattibrie cattibrie changed the title Fix zipkin multithreading Fix zipkin multithreading issue #7415 May 31, 2019

@illicitonion
Copy link
Contributor

left a comment

Very nice! Just a few minor comments :) Thanks!

# Copyright 2019 Pants project contributors (see CONTRIBUTORS.md).
# Licensed under the Apache License, Version 2.0 (see LICENSE).

# This BUILD file contains several targets with scala libraries that will be compiled in parallel.

This comment has been minimized.

Copy link
@illicitonion

illicitonion May 31, 2019

Contributor
Suggested change
# This BUILD file contains several targets with scala libraries that will be compiled in parallel.
# This BUILD file contains several targets with scala libraries that can be compiled in parallel.

Nothing about them inherently means they will be compiled in parallel, but if you ask them to be compiled, they can be compiled in parallel :)

@@ -74,6 +76,11 @@ def start_workunit(self, workunit):
else:
service_name = "pants workunit"

# Set local_tracer. Tracer stores spans and span's zipkin_attrs.
# If start_workunit is called from the root thread than local_tracer is the same as self.tracer.
# If start_workunit is called from a new thread than local_tracer will be empty.

This comment has been minimized.

Copy link
@illicitonion

illicitonion May 31, 2019

Contributor
Suggested change
# If start_workunit is called from a new thread than local_tracer will be empty.
# If start_workunit is called from a new thread than local_tracer will have an empty span storage and stack.
span.zipkin_attrs = ZipkinAttrs(
trace_id=span.zipkin_attrs.trace_id,
span_id=span.zipkin_attrs.span_id,
parent_span_id=self.run_tracker._main_root_workunit.zipkin_attrs.span_id,

This comment has been minimized.

Copy link
@illicitonion

illicitonion May 31, 2019

Contributor

Can you just set span.zipkin_attrs.parent_span_id=self.run_tracker._main_root_workunit.zipkin_attrs.span_id here instead of making a whole new ZipkinAttrs? Or does the constructor do more things?

# Goals and tasks save their start time at the beginning of their run.
# This start time is passed to workunit, because the workunit may be created much later.
span.start_timestamp = workunit.start_time
if first_span and span.zipkin_attrs.is_sampled:
span.logging_context.start_timestamp = workunit.start_time
self._workunits_to_spans[workunit] = span

This comment has been minimized.

Copy link
@illicitonion

illicitonion May 31, 2019

Contributor

It looks like you could completely get rid of self._workunits_to_spans if you set workunit.span instead of workunit.zipkin_attrs above?

@@ -177,7 +179,7 @@ def test_zipkin_reporter(self):
num_of_traces = len(ZipkinHandler.traces)
self.assertEqual(num_of_traces, 1)

trace = ZipkinHandler.traces[-1]
trace = next(iter(ZipkinHandler.traces.values()))

This comment has been minimized.

Copy link
@illicitonion

illicitonion May 31, 2019

Contributor

Rather than next(iter( maybe call assert_single_element from pants.util.collections? (And throughout)


pants_run = self.run_pants(command)
self.assert_success(pants_run)
num_of_traces = len(ZipkinHandler.traces)

This comment has been minimized.

Copy link
@illicitonion

illicitonion May 31, 2019

Contributor
      num_of_traces = len(ZipkinHandler.traces)
      self.assertEqual(num_of_traces, 1)

      trace = next(iter(ZipkinHandler.traces.values()))

could just become

trace = assert_single_element(ZikinHandler.traces.values())

object GreetingByName {
def greeting_by_name(name: String): String = {
"Hello %s! How are you?".format(name)

This comment has been minimized.

Copy link
@blorente

blorente May 31, 2019

Contributor

Super nit:

Scala has really cool string interpolation: s"Hello {name}! How are you?" (notice the s" at the beginning of the string).

@@ -74,6 +76,11 @@ def start_workunit(self, workunit):
else:
service_name = "pants workunit"

# Set local_tracer. Tracer stores spans and span's zipkin_attrs.
# If start_workunit is called from the root thread than local_tracer is the same as self.tracer.

This comment has been minimized.

Copy link
@blorente

blorente May 31, 2019

Contributor
Suggested change
# If start_workunit is called from the root thread than local_tracer is the same as self.tracer.
# If start_workunit is called from the root thread then local_tracer is the same as self.tracer.
@@ -74,6 +76,11 @@ def start_workunit(self, workunit):
else:
service_name = "pants workunit"

# Set local_tracer. Tracer stores spans and span's zipkin_attrs.
# If start_workunit is called from the root thread than local_tracer is the same as self.tracer.
# If start_workunit is called from a new thread than local_tracer will be empty.

This comment has been minimized.

Copy link
@blorente

blorente May 31, 2019

Contributor
Suggested change
# If start_workunit is called from a new thread than local_tracer will be empty.
# If start_workunit is called from a new thread then local_tracer will be empty.
num_of_traces = len(ZipkinHandler.traces)
self.assertEqual(num_of_traces, 1)

trace = next(iter(ZipkinHandler.traces.values()))

This comment has been minimized.

Copy link
@blorente

blorente May 31, 2019

Contributor

I'm very curious, why can't this be ZipkinHandler.traces.values()[0]?

This comment has been minimized.

Copy link
@cattibrie

cattibrie May 31, 2019

Author Contributor

values() Return a new view of the dictionary’s values. See the documentation of view objects.
https://docs.python.org/3.3/library/stdtypes.html#dict-views

@stuhood
Copy link
Member

left a comment

Thanks!

@@ -109,6 +109,8 @@ def __init__(self, run_info_dir, parent, name, labels=None, cmd='', log_config=N
self.parent = parent
self.children = []

self.zipkin_attrs = None

This comment has been minimized.

Copy link
@stuhood

stuhood May 31, 2019

Member

Would it be possible to require these as a parameter to the constructor? Or is it actually necessary for the zipkin_attrs to be set after the workunit has been constructed?

The advantage to passing them to the constructor would be that you could ensure that they were never None, which reduces error cases.

This comment has been minimized.

Copy link
@cattibrie

cattibrie Jun 3, 2019

Author Contributor

Updated the code and added comments

@stuhood

stuhood approved these changes Jun 3, 2019

Copy link
Member

left a comment

Thanks!

cattibrie added some commits Jun 4, 2019

@illicitonion illicitonion merged commit bf7263d into pantsbuild:master Jun 4, 2019

1 check was pending

continuous-integration/travis-ci/pr The Travis CI build is in progress
Details

blorente added a commit to blorente/pants that referenced this pull request Jun 5, 2019

Fix zipkin multithreading issue pantsbuild#7415 (pantsbuild#7826)
Pants runs with Zipkin tracing enabled failed for cases with pants runs with multiple threads (e.g. compile or for background workunits). The issue is described in pantsbuild#7415 .
Parent-child relations for zipkin trace were stored in local thread storage. So workunits in the new thread didn't have Zipkin trace related information about its parent.

py_zipkin library was recently updated to work better with multithreading. A Tracer object is used to store Zipkin trace info and to pass it between threads.

Zipkin tracing with WRONG parent-child relations with multiple threads
![image](https://user-images.githubusercontent.com/29643054/58710111-80838b80-83b3-11e9-9c2f-6274e3b4215a.png)

Zipkin tracing with RIGHT parent-child relations with multiple threads
![image](https://user-images.githubusercontent.com/29643054/58710095-76fa2380-83b3-11e9-916b-7ff14361ed00.png)

stuhood added a commit that referenced this pull request Jun 7, 2019

Fix zipkin multithreading issue #7415 (#7826)
### Problem
Pants runs with Zipkin tracing enabled failed for cases with pants runs with multiple threads (e.g. compile or for background workunits). The issue is described in #7415 . 
Parent-child relations for zipkin trace were stored in local thread storage. So workunits in the new thread didn't have Zipkin trace related information about its parent.

### Solution
py_zipkin library was recently updated to work better with multithreading. A Tracer object is used to store Zipkin trace info and to pass it between threads.

### Result
Zipkin tracing with WRONG parent-child relations with multiple threads
![image](https://user-images.githubusercontent.com/29643054/58710111-80838b80-83b3-11e9-9c2f-6274e3b4215a.png)


Zipkin tracing with RIGHT parent-child relations with multiple threads
![image](https://user-images.githubusercontent.com/29643054/58710095-76fa2380-83b3-11e9-916b-7ff14361ed00.png)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.