
Add messaging use case #189

Merged
merged 4 commits into opentracing:master on May 27, 2017
Conversation

objectiser (Contributor):

@yurishkuro @bhs @wu-sheng Initial stab at the messaging use case.

Someone may need to check that the Python is valid :)


There are two messaging styles that should be handled, Message Queues and Publish/Subscribe (Topics).

The main distinction between these styles is that a message written to a queue can only be consumed by **at most** one consumer, whereas a message published on a topic can be consumed by zero or more subscribers. The other point to note is that although a message can be sent/published by a producer, that does not guarantee that it will be consumed - messaging systems are asynchronous and therefore the producer has no visibility of when or if the message was delivered.
Member:

I think in a CAP world, if you choose AP system then "at most once" cannot be guaranteed, only "at least once" can be, which makes the distinction between queue and pub/sub even less clear. So I would drop that part, especially since your next paragraph says it doesn't really matter for tracing anyway.

Contributor (author):

Good point - I'll drop this part.

span.log(event='general exception', payload=e)
span.set_tag('error', true)
span.finish()
raise
yurishkuro (Member) commented Mar 3, 2017:

Instead of this, a more idiomatic way is:

with span:
    messaging_client.send(message)

As with the RPC client example, a messaging producer is expected to start a new tracing Span before sending a message, and propagate the new Span along with that message. The following example shows how it can be done.

```python
def traced_request(message, operation, producer):
yurishkuro (Member) commented Mar 3, 2017:

* maybe send_traced, to not confuse with rpc req/res
* the producer is not used, did you mean to record it somewhere in tags?

Contributor (author):

`producer` was left over from when I was looking to record the type, so I will remove it.
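Taking both review points together (rename to `send_traced`, drop the unused `producer` parameter) along with the `with span:` idiom suggested earlier, the producer example might end up as the sketch below. The tiny `Tracer`/`Span` classes are stand-ins added so the snippet runs on its own; they are not the OpenTracing API, and a real application would use its configured tracer and request context.

```python
# Minimal stand-ins for an OpenTracing tracer so the sketch is self-contained;
# in real code `tracer` and `get_current_span()` come from your tracing setup.
class Span:
    def __init__(self, operation_name):
        self.operation_name = operation_name
        self.tags = {}
        self.finished = False

    def set_tag(self, key, value):
        self.tags[key] = value
        return self

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is not None:
            self.set_tag('error', True)  # mark the span on failure
        self.finished = True             # span always finishes


class Tracer:
    def start_span(self, operation_name, child_of=None):
        return Span(operation_name)

    def inject(self, span_context, format, carrier):
        # stand-in: a real tracer serializes the span context into the carrier
        carrier['trace-context'] = span_context.operation_name


tracer = Tracer()


def get_current_span():
    # stand-in for retrieving the active span from the request context
    return None


def send_traced(message, operation, messaging_client):
    # start a span, inject its context into the message headers, and let
    # `with span:` finish it (and tag errors) whether or not send() raises
    span = tracer.start_span(operation_name=operation,
                             child_of=get_current_span())
    tracer.inject(span_context=span, format='text_map',
                  carrier=message['headers'])
    with span:
        messaging_client.send(message)
    return span
```

Note the span is finished as soon as `send()` returns; the producer does not wait for any consumer.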



From a tracing perspective, the messaging style is not important, only that the span context associated with the producer is propagated to the zero or more consumers of the message. It is then the responsibility of the consumer(s) to create a span to encapsulate processing of the consumed message and establish a _FollowsFrom_ reference to the propagated span context.
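A minimal consumer-side sketch of that step, assuming a stand-in tracer (in opentracing-python the FollowsFrom reference is passed via the `references=` parameter of `start_span`; only `child_of` gets its own keyword):

```python
# Stand-ins so the sketch runs on its own; a real tracer.extract() returns a
# SpanContext (or None) deserialized from the message headers.
class Reference:
    def __init__(self, ref_type, referenced_context):
        self.type = ref_type
        self.referenced_context = referenced_context


def follows_from(referenced_context):
    return Reference('follows_from', referenced_context)


class Span:
    def __init__(self, operation_name, references):
        self.operation_name = operation_name
        self.references = references or []


class Tracer:
    def extract(self, format, carrier):
        return carrier.get('trace-context')  # None if nothing was injected

    def start_span(self, operation_name, references=None):
        return Span(operation_name, references)


tracer = Tracer()


def start_consumer_span(message, operation):
    # extract the producer's context and reference it with FollowsFrom;
    # a None context simply yields no reference, per the spec
    ctx = tracer.extract(format='text_map', carrier=message['headers'])
    refs = [follows_from(ctx)] if ctx is not None else []
    return tracer.start_span(operation_name=operation, references=refs)
```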
Member:

should there be a difference between FollowsFrom used to fire&forget a cache write in the same process vs. the messaging case? I think depending on the situation some users may want the messaging consumer to start a new trace and link it to the parent trace (e.g. if the parent was a part of an http request), while in other cases (pure messaging based system) they may want the same trace. Are we suggesting that the tracing system must make that decision on its own, or should we have another reference type or some other indicator from the instrumentation whether a single trace or linked traces are desired? cc @bhs

Contributor (author):

Might be worth having the distinction, but do you have some concrete scenarios where a new trace would be preferred rather than same trace?

```

* The `get_current_span()` function is not a part of the OpenTracing API. It is meant to represent some util method of retrieving the current Span from the current request context propagated implicitly (as is often the case in Python).
* Because the messaging client may throw an exception, we use a try/catch block to finish the Span in all circumstances, to ensure it is reported and avoid leaking resources.
Member:

Not needed if the `with span:` syntax is used.

yurishkuro (Member):

Might be worth having the distinction, but do you have some concrete scenarios where a new trace would be preferred rather than same trace?

Yes. It's fairly typical to have a trace representing some graph of RPCs, where one of them puts a message on a queue to be handled by an async job. The time scale of the RPC graph vs. the async job is quite different, so it's useful to represent the async job as a separate trace linked to the first trace via something similar to a FollowsFrom reference.

wu-sheng (Member) left a comment:

@objectiser, I think one important thing must be stated, which is:

Some MOMs (message-oriented middleware, like Apache Kafka/RocketMQ) support batch consumption. When the tracer faces this scenario, there is more than one message, each carrying a context from the same or different producers.

In my tracer implementation, I start a new trace segment (with a different id) and set multiple refSegmentIds, which come from the messages. This operation is called multiExtract (I'm looking for a better name; any suggestions?), like this: https://github.com/wu-sheng/sky-walking/blob/feature/3.0/skywalking-sniffer/skywalking-api/src/main/java/com/a/eye/skywalking/api/context/TracerContext.java#L149-L153 .

And what is the official OpenTracing spec recommendation? I think we must settle this before we claim to support MOM tracing and context propagation.

objectiser (Contributor, author) commented Mar 3, 2017:

@wu-sheng Agree this is an interesting case, but doesn't it depend on how the batch is processed? If the consumer proceeds to process those messages independently, then isn't it reasonable to still treat them as separate trace instances?

Maybe the consumer spans need to be marked in some way to indicate that they are part of a particular batch - but unless the processing continues to treat them as a batch, then they are still separate trace instances.

wu-sheng (Member) commented Mar 3, 2017:

@objectiser Agree with you that there should be a separate trace on the consumer side. My point is that how to extract the context is also part of the OT spec: one new consumer-side span faces several messages.

unless the processing continues to treat them as a batch

This definitely happens: when using batch consumption, they receive the messages, put them in a list, and continue to process them, e.g. a batch insert into a database.

objectiser (Contributor, author):

@wu-sheng Just to clarify, there are two possible approaches that may be valid in different situations:

a) messages received and processed as a batch, so a single span is created on the consumer with references to the extracted contexts for each message in the batch

b) messages received as a batch, but then processed internally as individual messages - in this case it may be preferable to have a consumer span per message, referencing the extracted context of that message, as it is a continuation of the same trace instance. In this case, it would be useful if each consumer span (i.e. per message) were annotated in some way to identify the batch, so it is clear they were all part of the same batch for analysis purposes.

The reason why (b) may be important is that the batch processing may be for efficiency purposes, but at the same time we don't want to lose the causal relationship between subsequent processing of a message and its previous processing - which would happen if the trace data converged on a single span for the whole batch.
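The two approaches might be sketched roughly as below; the stub tracer and the `batch.id` tag name are illustrative assumptions, not anything the spec defines.

```python
import uuid

# Stub tracer: just enough structure to show the two shapes of batch consumption.
class Span:
    def __init__(self, operation_name, references=None, tags=None):
        self.operation_name = operation_name
        self.references = references or []
        self.tags = tags or {}


class Tracer:
    def extract(self, format, carrier):
        return carrier.get('trace-context')

    def start_span(self, operation_name, references=None, tags=None):
        return Span(operation_name, references, tags)


tracer = Tracer()


def extract_contexts(messages):
    ctxs = [tracer.extract(format='text_map', carrier=m['headers'])
            for m in messages]
    return [c for c in ctxs if c is not None]


def consume_batch_single_span(messages, operation):
    # (a) one consumer span referencing every extracted context
    refs = [('follows_from', c) for c in extract_contexts(messages)]
    return tracer.start_span(operation_name=operation, references=refs)


def consume_batch_span_per_message(messages, operation):
    # (b) one span per message, each annotated with a shared batch id
    # ('batch.id' is a hypothetical tag name, not a defined convention)
    batch_id = str(uuid.uuid4())
    spans = []
    for m in messages:
        ctx = tracer.extract(format='text_map', carrier=m['headers'])
        refs = [('follows_from', ctx)] if ctx is not None else []
        spans.append(tracer.start_span(operation_name=operation,
                                       references=refs,
                                       tags={'batch.id': batch_id}))
    return spans
```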

wu-sheng (Member) commented Mar 4, 2017:

@objectiser

there are two possible approaches that may be valid in different situations

Agree.

at the same time we don't want to loose the causal relationship of subsequent processing of that message with the previous processing

But this may not be right, even when processing data in batch mode (situation a), because you should not break the related trace segments. Btw, (a) has much better performance than (b) when you face hundreds of thousands of messages per second (or even far more) and you want to insert them into a database.

As you can see in my tracer, a segment can have multiple refs. That means it has multiple parents, and the causal relationship between the parent trace segments and this batch one is preserved. Since a different trace segment is started, the segmentRef stands in for the spans' causal relationship. If the tracer visualizes the trace in an appropriate way, you can still follow it. After all, a trace should match what the application actually did.

objectiser (Contributor, author):

@yurishkuro @bhs Any thoughts on scenario (b)?

@@ -253,3 +253,65 @@ if request.get('debug'):
tags={tags.SAMPLING_PRIORITY: 1}
)
```

### Tracing Messaging Scenarios
Contributor:

(same naming questions from opentracing/specification#49 apply here)

format=opentracing.TEXT_MAP_FORMAT,
carrier=message.headers
)
if extracted_context is None:
Contributor:

unnecessary... the spec says that a None context implies that the ref should be ignored

yurishkuro (Member):

@objectiser this is a typical "batch write" challenge when doing tracing. There is no "right" answer, as there are different ways to model and capture causality. See Section 3 in this paper.

objectiser (Contributor, author) commented Mar 4, 2017:

@yurishkuro Thanks for the reference.

Have created opentracing/specification#51 as a placeholder for the point you mentioned earlier about in some situations wanting to create consumer spans in a new trace.

bhs (Contributor) left a comment:

Thanks!


From a tracing perspective, the message bus style is not important, only that the span context associated with the producer is propagated to the zero or more consumers of the message. It is then the responsibility of the consumer(s) to create a span to encapsulate processing of the consumed message and establish a _FollowsFrom_ reference to the propagated span context.

As with the RPC client example, a messaging producer is expected to start a new tracing Span before sending a message, and propagate the new Span along with that message. The following example shows how it can be done.
Contributor:

subtle point, but I'd prefer propagate the new Span's SpanContext along with that message.

We should also clarify that the producer Span only lives as long as it takes to enqueue/publish the message; i.e., it does not wait for a dequeue-er/consumer before finishing the Span. (I think this is the only sane thing if someone really stops to think about it, but it's different enough from nested RPCs that it deserves to be stated explicitly)

with span:
    messaging_client.send(message)
except Exception as e:
span.log(event='general exception', payload=e)
Contributor:

there is a scheme for error logging... https://github.com/opentracing/specification/blob/master/semantic_conventions.md#log-fields-table

I'm also fine leaving this out of the example entirely since it's a bit of a distraction.

raise
```
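On the error-logging scheme mentioned in the review above: the OpenTracing semantic conventions define an `error` tag plus standard log fields (`event`, `error.kind`, `error.object`, `message`, `stack`). A runnable sketch of that scheme, using a stand-in `Span` in place of the real API:

```python
import traceback

# Stand-in span that just records tags and structured log fields.
class Span:
    def __init__(self):
        self.tags = {}
        self.logs = []

    def set_tag(self, key, value):
        self.tags[key] = value

    def log_kv(self, key_values):
        self.logs.append(key_values)


def record_error(span, exc):
    # per the semantic conventions: tag error=True and log the standard
    # error fields (event, error.kind, error.object, message, stack)
    span.set_tag('error', True)
    span.log_kv({
        'event': 'error',
        'error.kind': type(exc).__name__,
        'error.object': exc,
        'message': str(exc),
        'stack': traceback.format_exc(),
    })
```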

* The `get_current_span()` function is not a part of the OpenTracing API. It is meant to represent some util method of retrieving the current Span from the current request context propagated implicitly (as is often the case in Python).
Contributor:

(I can't wait to kill this caveat!)

format=opentracing.TEXT_MAP_FORMAT,
carrier=message.headers
)
span = tracer.start_span(operation_name=operation, follows_from=extracted_context)
Contributor:

follows_from is not its own named parameter... child_of is "special" in this way. the references= named parameter works, though.

see: https://github.com/opentracing/opentracing-python/blob/master/opentracing/tracer.py#L44 and https://github.com/opentracing/opentracing-python/blob/master/opentracing/tracer.py#L163

span.set_tag('message.destination', message.destination)
```

#### Synchronous request response over queues
Contributor:

nit: "request-response"


However, this pattern could also be used for delegation, to indicate a third party that should be informed of the result, in which case it would be treated as two separate message exchanges with _FollowsFrom_ relationships linking each stage.

As it would be difficult to distinguish between these two scenarios, and the use of message-oriented middleware for the synchronous request/response pattern should be discouraged, it is recommended that the request/response scenario be ignored from a tracing perspective.
Contributor:

I'm not sure I can get behind this advice... IMO, this seems like a time to use "normal" child_of and shrug about the fact that a message bus is involved.

Contributor (author):

The problem is distinguishing (from a framework integration instrumentation perspective) between a request/response pattern, and where the 'replyTo' message is actually going to be consumed by another application.

One approach would be to assume that if a temporary queue is used, then only the current application could be used to receive the response.

Would prefer to create a separate issue to discuss this, and leave this caveat for now.

bhs (Contributor) commented Apr 29, 2017:

(oh, and sorry for the insane delay – was just looking at pending PRs in a few places and saw this)

objectiser (Contributor, author):

No problem - I'll update in a couple of days.

bhs (Contributor) commented May 27, 2017:

(cleaning up old PRs, and this one is ready for merge – sorry to miss it!)

@bhs bhs merged commit 9108e72 into opentracing:master May 27, 2017
reimertz pushed a commit to reimertz/opentracing.io that referenced this pull request Jul 4, 2017
* Add messaging use case

* Removed para on 'at most' once, updated python code

* Change to use 'Message Bus' term and update code to not check if extracted_context is None

* Updated message bus scenario based on review comments