
Add messaging use case #189

Merged
merged 4 commits into opentracing:master on May 27, 2017
Conversation

objectiser (Contributor):

@yurishkuro @bhs @wu-sheng Initial stab at the messaging use case.

Someone may need to check that the Python is valid :)


There are two messaging styles that should be handled, Message Queues and Publish/Subscribe (Topics).

The main distinction between these styles is that a message written to a queue can only be consumed by **at most** one consumer, whereas a message published on a topic can be consumed by zero or more subscribers. The other point to note is that although a message can be sent/published by a producer, that does not guarantee that it will be consumed - messaging systems are asynchronous and therefore the producer has no visibility of when or if the message was delivered.
Member:

I think in a CAP world, if you choose AP system then "at most once" cannot be guaranteed, only "at least once" can be, which makes the distinction between queue and pub/sub even less clear. So I would drop that part, especially since your next paragraph says it doesn't really matter for tracing anyway.

Contributor (author):

Good point - I'll drop this part.

span.log(event='general exception', payload=e)
span.set_tag('error', true)
span.finish()
raise
yurishkuro (Member) commented Mar 3, 2017:

Instead of this, a more idiomatic way is:

with span:
    messaging_client.send(message)

As with the RPC client example, a messaging producer is expected to start a new tracing Span before sending a message, and propagate the new Span along with that message. The following example shows how it can be done.

```python
def traced_request(message, operation, producer):
yurishkuro (Member) commented Mar 3, 2017:

* maybe send_traced, to not confuse with rpc req/res
* the producer is not used, did you mean to record it somewhere in tags?

Contributor (author):

`producer` was left over from when I was looking to record the type, so I will remove it.
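Taking both review points together (rename to `send_traced`, drop the unused `producer` parameter) along with the `with span:` idiom suggested earlier, the producer example might end up as the sketch below. The tiny `Tracer`/`Span` classes are stand-ins added so the snippet runs on its own; they are not the OpenTracing API, and a real application would use its configured tracer and request context.

```python
# Minimal stand-ins for an OpenTracing tracer so the sketch is self-contained;
# in real code `tracer` and `get_current_span()` come from your tracing setup.
class Span:
    def __init__(self, operation_name):
        self.operation_name = operation_name
        self.tags = {}
        self.finished = False

    def set_tag(self, key, value):
        self.tags[key] = value
        return self

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is not None:
            self.set_tag('error', True)  # mark the span on failure
        self.finished = True             # span always finishes


class Tracer:
    def start_span(self, operation_name, child_of=None):
        return Span(operation_name)

    def inject(self, span_context, format, carrier):
        # stand-in: a real tracer serializes the span context into the carrier
        carrier['trace-context'] = span_context.operation_name


tracer = Tracer()


def get_current_span():
    # stand-in for retrieving the active span from the request context
    return None


def send_traced(message, operation, messaging_client):
    # start a span, inject its context into the message headers, and let
    # `with span:` finish it (and tag errors) whether or not send() raises
    span = tracer.start_span(operation_name=operation,
                             child_of=get_current_span())
    tracer.inject(span_context=span, format='text_map',
                  carrier=message['headers'])
    with span:
        messaging_client.send(message)
    return span
```

Note the span is finished as soon as `send()` returns; the producer does not wait for any consumer.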



From a tracing perspective, the messaging style is not important, only that the span context associated with the producer is propagated to the zero or more consumers of the message. It is then the responsibility of the consumer(s) to create a span to encapsulate processing of the consumed message and establish a _FollowsFrom_ reference to the propagated span context.
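A minimal consumer-side sketch of that step, assuming a stand-in tracer (in opentracing-python the FollowsFrom reference is passed via the `references=` parameter of `start_span`; only `child_of` gets its own keyword):

```python
# Stand-ins so the sketch runs on its own; a real tracer.extract() returns a
# SpanContext (or None) deserialized from the message headers.
class Reference:
    def __init__(self, ref_type, referenced_context):
        self.type = ref_type
        self.referenced_context = referenced_context


def follows_from(referenced_context):
    return Reference('follows_from', referenced_context)


class Span:
    def __init__(self, operation_name, references):
        self.operation_name = operation_name
        self.references = references or []


class Tracer:
    def extract(self, format, carrier):
        return carrier.get('trace-context')  # None if nothing was injected

    def start_span(self, operation_name, references=None):
        return Span(operation_name, references)


tracer = Tracer()


def start_consumer_span(message, operation):
    # extract the producer's context and reference it with FollowsFrom;
    # a None context simply yields no reference, per the spec
    ctx = tracer.extract(format='text_map', carrier=message['headers'])
    refs = [follows_from(ctx)] if ctx is not None else []
    return tracer.start_span(operation_name=operation, references=refs)
```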
Member:

should there be a difference between FollowsFrom used to fire&forget a cache write in the same process vs. the messaging case? I think depending on the situation some users may want the messaging consumer to start a new trace and link it to the parent trace (e.g. if the parent was a part of an http request), while in other cases (pure messaging based system) they may want the same trace. Are we suggesting that the tracing system must make that decision on its own, or should we have another reference type or some other indicator from the instrumentation whether a single trace or linked traces are desired? cc @bhs

Contributor (author):

Might be worth having the distinction, but do you have some concrete scenarios where a new trace would be preferred rather than same trace?

```

* The `get_current_span()` function is not a part of the OpenTracing API. It is meant to represent some util method of retrieving the current Span from the current request context propagated implicitly (as is often the case in Python).
* Because the messaging client may throw an exception, we use a try/catch block to finish the Span in all circumstances, to ensure it is reported and avoid leaking resources.
Member:

Not needed if the `with span:` syntax is used.

yurishkuro (Member):

Might be worth having the distinction, but do you have some concrete scenarios where a new trace would be preferred rather than same trace?

Yes. It's fairly typical to have a trace representing some graph of RPCs, where one of them puts a message on a queue to be handled by an async job. The time scale of the RPC graph vs. the async job is quite different, so it's useful to represent the async job as a separate trace linked to the first trace via something similar to a FollowsFrom reference.

wu-sheng (Member) left a comment:

@objectiser, I think one important thing must be stated, which is:

Some MOMs (message-oriented middleware, like Apache Kafka/RocketMQ) support batch consumption. When the tracer faces this scenario, there is more than one message, each carrying a context from the same or different producers.

In my tracer implementation, I start a new trace segment (with a different id) and set multiple refSegmentIds, which come from the messages. This operation is called multiExtract (I'm looking for a better name; any suggestions?), like this: https://github.com/wu-sheng/sky-walking/blob/feature/3.0/skywalking-sniffer/skywalking-api/src/main/java/com/a/eye/skywalking/api/context/TracerContext.java#L149-L153 .

And what is the official OpenTracing spec recommendation? I think we must settle this before we claim to support MOM tracing and context propagation.

objectiser (Contributor, author) commented Mar 3, 2017:

@wu-sheng Agree this is an interesting case, but doesn't it depend on how the batch is processed? If the consumer proceeds to process those messages independently, then isn't it reasonable to still treat them as separate trace instances?

Maybe the consumer spans need to be marked in some way to indicate that they are part of a particular batch - but unless the processing continues to treat them as a batch, then they are still separate trace instances.

wu-sheng (Member) commented Mar 3, 2017:

@objectiser Agree with you that there should be a separate trace on the consumer side. My point is that how to extract the context is also part of the OT spec: one new consumer-side span faces several messages.

unless the processing continues to treat them as a batch

This definitely happens: when using batch consumption, they receive the messages, put them in a list, and continue to process them, e.g. a batch insert into a database.

objectiser (Contributor, author):

@wu-sheng Just to clarify, there are two possible approaches that may be valid in different situations:

a) messages received and processed as a batch, so a single span is created on the consumer with references to the extracted contexts for each message in the batch

b) messages received as a batch, but then processed internally as individual messages - in this case it may be preferable to have a consumer span per message, referencing the extracted context of that message, as it is a continuation of the same trace instance. In this case, it would be useful if each consumer span (i.e. per message) were annotated in some way to identify the batch, so it is clear they were all part of the same batch for analysis purposes.

The reason why (b) may be important is that the batch processing may be for efficiency purposes, but at the same time we don't want to lose the causal relationship between subsequent processing of a message and its previous processing - which would happen if the trace data converged on a single span for the whole batch.
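The two approaches might be sketched roughly as below; the stub tracer and the `batch.id` tag name are illustrative assumptions, not anything the spec defines.

```python
import uuid

# Stub tracer: just enough structure to show the two shapes of batch consumption.
class Span:
    def __init__(self, operation_name, references=None, tags=None):
        self.operation_name = operation_name
        self.references = references or []
        self.tags = tags or {}


class Tracer:
    def extract(self, format, carrier):
        return carrier.get('trace-context')

    def start_span(self, operation_name, references=None, tags=None):
        return Span(operation_name, references, tags)


tracer = Tracer()


def extract_contexts(messages):
    ctxs = [tracer.extract(format='text_map', carrier=m['headers'])
            for m in messages]
    return [c for c in ctxs if c is not None]


def consume_batch_single_span(messages, operation):
    # (a) one consumer span referencing every extracted context
    refs = [('follows_from', c) for c in extract_contexts(messages)]
    return tracer.start_span(operation_name=operation, references=refs)


def consume_batch_span_per_message(messages, operation):
    # (b) one span per message, each annotated with a shared batch id
    # ('batch.id' is a hypothetical tag name, not a defined convention)
    batch_id = str(uuid.uuid4())
    spans = []
    for m in messages:
        ctx = tracer.extract(format='text_map', carrier=m['headers'])
        refs = [('follows_from', ctx)] if ctx is not None else []
        spans.append(tracer.start_span(operation_name=operation,
                                       references=refs,
                                       tags={'batch.id': batch_id}))
    return spans
```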

wu-sheng (Member) commented Mar 4, 2017:

@objectiser

there are two possible approaches that may be valid in different situations

Agree.

at the same time we don't want to loose the causal relationship of subsequent processing of that message with the previous processing

But this may not be right, even when processing data in batch mode (situation a), because you should not break the related trace segments. Btw, (a) has much better performance than (b) when you face hundreds of thousands of messages per second (or even far more) and you want to insert them into a database.

As you can see in my tracer, a segment can have multiple refs. That means it has multiple parents, and the causal relationship between the parent trace segments and this batch one is preserved. Since a different trace segment is started, the segmentRef stands in for the spans' causal relationship. If the tracer visualizes the trace in an appropriate way, you can still follow it. After all, a trace should match what the application actually did.

objectiser (Contributor, author):

@yurishkuro @bhs Any thoughts on scenario (b)?

@@ -253,3 +253,65 @@ if request.get('debug'):
tags={tags.SAMPLING_PRIORITY: 1}
)
```

### Tracing Messaging Scenarios
Contributor:

(same naming questions from opentracing/specification#49 apply here)

format=opentracing.TEXT_MAP_FORMAT,
carrier=message.headers
)
if extracted_context is None:
Contributor:

unnecessary... the spec says that a None context implies that the ref should be ignored

yurishkuro (Member):

@objectiser this is a typical "batch write" challenge when doing tracing. There is no "right" answer, as there are different ways to model and capture causality. See Section 3 in this paper.

objectiser (Contributor, author) commented Mar 4, 2017:

@yurishkuro Thanks for the reference.

Have created opentracing/specification#51 as a placeholder for the point you mentioned earlier about in some situations wanting to create consumer spans in a new trace.

bhs (Contributor) left a comment:

Thanks!


From a tracing perspective, the message bus style is not important, only that the span context associated with the producer is propagated to the zero or more consumers of the message. It is then the responsibility of the consumer(s) to create a span to encapsulate processing of the consumed message and establish a _FollowsFrom_ reference to the propagated span context.

As with the RPC client example, a messaging producer is expected to start a new tracing Span before sending a message, and propagate the new Span along with that message. The following example shows how it can be done.
Contributor:

subtle point, but I'd prefer propagate the new Span's SpanContext along with that message.

We should also clarify that the producer Span only lives as long as it takes to enqueue/publish the message; i.e., it does not wait for a dequeue-er/consumer before finishing the Span. (I think this is the only sane thing if someone really stops to think about it, but it's different enough from nested RPCs that it deserves to be stated explicitly)

with span:
    messaging_client.send(message)
except Exception as e:
span.log(event='general exception', payload=e)
Contributor:

there is a scheme for error logging... https://github.com/opentracing/specification/blob/master/semantic_conventions.md#log-fields-table

I'm also fine leaving this out of the example entirely since it's a bit of a distraction.

raise
```
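On the error-logging scheme mentioned in the review above: the OpenTracing semantic conventions define an `error` tag plus standard log fields (`event`, `error.kind`, `error.object`, `message`, `stack`). A runnable sketch of that scheme, using a stand-in `Span` in place of the real API:

```python
import traceback

# Stand-in span that just records tags and structured log fields.
class Span:
    def __init__(self):
        self.tags = {}
        self.logs = []

    def set_tag(self, key, value):
        self.tags[key] = value

    def log_kv(self, key_values):
        self.logs.append(key_values)


def record_error(span, exc):
    # per the semantic conventions: tag error=True and log the standard
    # error fields (event, error.kind, error.object, message, stack)
    span.set_tag('error', True)
    span.log_kv({
        'event': 'error',
        'error.kind': type(exc).__name__,
        'error.object': exc,
        'message': str(exc),
        'stack': traceback.format_exc(),
    })
```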

* The `get_current_span()` function is not a part of the OpenTracing API. It is meant to represent some util method of retrieving the current Span from the current request context propagated implicitly (as is often the case in Python).
Contributor:

(I can't wait to kill this caveat!)

format=opentracing.TEXT_MAP_FORMAT,
carrier=message.headers
)
span = tracer.start_span(operation_name=operation, follows_from=extracted_context)
Contributor:

follows_from is not its own named parameter... child_of is "special" in this way. the references= named parameter works, though.

see: https://github.com/opentracing/opentracing-python/blob/master/opentracing/tracer.py#L44 and https://github.com/opentracing/opentracing-python/blob/master/opentracing/tracer.py#L163

span.set_tag('message.destination', message.destination)
```

#### Synchronous request response over queues
Contributor:

nit: "request-response"


However, this pattern could also be used for delegation, to indicate a third party that should be informed of the result, in which case it would be treated as two separate message exchanges with _FollowsFrom_ relationships linking each stage.

As it would be difficult to distinguish between these two scenarios, and the use of message-oriented middleware for the synchronous request/response pattern should be discouraged, it is recommended that the request/response scenario be ignored from a tracing perspective.
Contributor:

I'm not sure I can get behind this advice... IMO, this seems like a time to use "normal" child_of and shrug about the fact that a message bus is involved.

Contributor (author):

The problem is distinguishing (from a framework integration instrumentation perspective) between a request/response pattern, and where the 'replyTo' message is actually going to be consumed by another application.

One approach would be to assume that if a temporary queue is used, then only the current application could be used to receive the response.

Would prefer to create a separate issue to discuss this, and leave this caveat for now.

bhs (Contributor) commented Apr 29, 2017:

(oh, and sorry for the insane delay – was just looking at pending PRs in a few places and saw this)

objectiser (Contributor, author):

No problem - I'll update in a couple of days.

bhs (Contributor) commented May 27, 2017:

(cleaning up old PRs, and this one is ready for merge – sorry to miss it!)

@bhs bhs merged commit 9108e72 into opentracing:master May 27, 2017
reimertz pushed a commit to reimertz/opentracing.io that referenced this pull request Jul 4, 2017
* Add messaging use case

* Removed para on 'at most' once, updated python code

* Change to use 'Message Bus' term and update code to not check if extracted_context is None

* Updated message bus scenario based on review comments