Conversation
Force-pushed from 73dc9d4 to ad87d61
Codecov Report
@@ Coverage Diff @@
## master #208 +/- ##
==========================================
+ Coverage 94.58% 95.09% +0.51%
==========================================
Files 25 26 +1
Lines 1883 1957 +74
Branches 250 255 +5
==========================================
+ Hits 1781 1861 +80
+ Misses 67 63 -4
+ Partials 35 33 -2
One meta-comment I have: I think we will keep running into backwards-compatibility issues with refactoring as long as the Reporter uses IOLoop for its internal queue management. If we re-implement it with a separate thread (the way Lightstep did) and remove the dependency on IOLoop, the rest of the refactoring will become much easier.
@yurishkuro, in the wake of #205 it seems like it could be a good time to address the tornado-ectomy in a subsequent PR, but I'm not 100% sure on the proposed architecture. Is the Reporter intended to maintain ownership of the loop and its parent, lazily created thread, or should all of that be delegated to its Sender while it just maintains the basic NullReporter interface? I'm getting the (potentially incorrect) impression that you and @black-adder are approaching the solution differently. Happy to move this discussion to #31.
I think it's best to keep the same architecture as in other Jaeger clients:
Force-pushed from cf20619 to c969ad2
Basic UDP batching has been added. I used jaeger-client-node as a rough template, but I'm unclear whether my thrift usage is optimal for getting the content length, as it requires serialization before properly encoding the batch.
Force-pushed from c969ad2 to 71a9ae6
@black-adder, wanted to make sure you knew this is where I respond to your #186 feedback. Thanks for your thorough review.
@black-adder & @yurishkuro, is this still on the block as a precursor to an HTTPSender? I don't believe it introduces any breaking changes, so neither sender should necessitate a major version (re: #205).
@rmfitzpatrick this PR is very large. I saw you added UDP batching; is it possible to pull this out into an independent PR and merge it separately from the HTTP refactoring?
Force-pushed from 2962ee6 to ad87d61
@yurishkuro, went ahead and removed that commit.
@tiffon you said you can review this?
@yurishkuro Yes, will do 👍
Edit: Oops, didn't mean to submit those comments yet. Will finish looking this PR over tonight or tomorrow.
Force-pushed from ad87d61 to f2fc585
@tiffon, any chance you've had a moment to look over my updates?
@rmfitzpatrick I finally got time to come back to this, would like to get it moving. Top concerns:
- I left a comment about the overall Sender API.
- Don't know if it's a common practice in Python, but what if we moved all implementations to something like a `jaeger_client.internal` dir so that we can be more flexible about the ongoing API changes?
- We may want to cut a `version-4.x` branch and go with breaking changes by making the whole internal API cleaner. At this point we need to upgrade to OpenTracing 2.0, which will require a major release anyway.
self._process = None
self.spans = []

def append(self, span):
This is a different Sender API than what we have in the other languages (jaegertracing/jaeger#1269). Specifically, the Reporter's responsibility is only to provide multi-threading support by maintaining an internal queue, not to decide when the batch should be sent. The Sender makes the flush decision, which can be based on the cumulative byte size of the packet (which is what all other languages do), or on some configuration param like max batch size, which is what the Python client did because we didn't implement exact byte counting in Thrift encoding (not sure if it's even possible to do).
In Go the Sender interface looks like this (it is incorrectly called Transport):
// Transport abstracts the method of sending spans out of process.
// Implementations are NOT required to be thread-safe; the RemoteReporter
// is expected to only call methods on the Transport from the same go-routine.
type Transport interface {
    // Append converts the span to the wire representation and adds it
    // to sender's internal buffer. If the buffer exceeds its designated
    // size, the transport should call Flush() and return the number of spans
    // flushed, otherwise return 0. If error is returned, the returned number
    // of spans is treated as failed span, and reported to metrics accordingly.
    Append(span *Span) (int, error)

    // Flush submits the internal buffer to the remote server. It returns the
    // number of spans flushed. If error is returned, the returned number of
    // spans is treated as failed span, and reported to metrics accordingly.
    Flush() (int, error)

    io.Closer
}
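To ground the discussion, here is a rough Python sketch of what that contract could look like on the Python side. All names here (`Sender`, `append`, `flush`, `max_buffer_bytes`, the `_emit` callable) are illustrative assumptions, not the actual jaeger-client-python API, and the byte-size check is a placeholder for real Thrift encoding:

```python
class Sender(object):
    """Sketch of the Go Transport contract: buffers spans and flushes when
    the (approximate) encoded batch size would exceed max_buffer_bytes.
    Deliberately not thread-safe; the Reporter is expected to call
    append()/flush() from a single thread, as in jaeger-client-go."""

    def __init__(self, emit, max_buffer_bytes=65000):
        self._emit = emit            # callable that ships a batch out of process
        self._max = max_buffer_bytes
        self._spans = []
        self._buffered_bytes = 0

    def _encoded_size(self, span):
        # Placeholder for the real Thrift-encoded size of the span.
        return len(repr(span).encode('utf-8'))

    def append(self, span):
        """Buffer one span; return the number of spans flushed (0 if none)."""
        size = self._encoded_size(span)
        flushed = 0
        if self._spans and self._buffered_bytes + size > self._max:
            flushed = self.flush()
        self._spans.append(span)
        self._buffered_bytes += size
        return flushed

    def flush(self):
        """Submit the buffered batch; return the number of spans sent."""
        if not self._spans:
            return 0
        count = len(self._spans)
        self._emit(list(self._spans))
        self._spans = []
        self._buffered_bytes = 0
        return count
```

Under this contract the Reporter only drains its queue and calls `append()`; every flush decision lives in the Sender, mirroring the Go comment above.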
Sorry for the belatedness of my reply, but I have some cycles to spend on this now. I'm definitely up for adopting the proposed API, but I don't think I understand what you're referring to regarding the other Jaeger clients.
jaeger-client-node: the RemoteReporter maintains a flush interval: https://github.com/jaegertracing/jaeger-client-node/blob/master/src/reporters/remote_reporter.js#L37
jaeger-client-java: the RemoteReporter maintains a flush timer: https://github.com/jaegertracing/jaeger-client-java/blob/master/jaeger-core/src/main/java/io/jaegertracing/internal/reporters/RemoteReporter.java#L65
Jaeger-client-go: the remoteReporter.processQueue() follows similar behavior: https://github.com/jaegertracing/jaeger-client-go/blob/master/reporter.go#L252
Full disclosure that I haven't pored over these and could be missing something in each case.
Also, it is possible, but expensive, to get the exact byte count of a thrift message, and I'll be sure to propose proper UDP batching in a subsequent PR (it was removed from this one due to the added complexity).
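The expense comes from needing a full encode pass just to learn the size, so one mitigation is to serialize each span once and reuse the bytes for both sizing and emitting. The sketch below illustrates that pattern with a hypothetical `encode_span` standing in for real Thrift encoding (e.g. writing the span through a `TCompactProtocol` over a `TMemoryBuffer`); the length-prefix format and `pack_batches` helper are assumptions for illustration:

```python
import io
import struct

def encode_span(span):
    # Stand-in for Thrift serialization: a 4-byte big-endian length prefix
    # followed by the UTF-8 payload. Only the "encode once, measure the
    # result" shape matters here, not the wire format.
    buf = io.BytesIO()
    payload = span.encode('utf-8')
    buf.write(struct.pack('>I', len(payload)))
    buf.write(payload)
    return buf.getvalue()

def pack_batches(spans, max_packet_bytes=65000):
    """Group pre-encoded spans into batches whose total encoded size stays
    under the UDP packet budget, encoding each span exactly once."""
    batches, current, current_size = [], [], 0
    for span in spans:
        encoded = encode_span(span)  # single encode pass per span
        if current and current_size + len(encoded) > max_packet_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(encoded)
        current_size += len(encoded)
    if current:
        batches.append(current)
    return batches
```

Since the batch already holds encoded bytes, emitting it is a straight concatenation with no second serialization pass.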
self._thread_loop.start()
return self._thread_loop._io_loop

def getProtocol(self, transport):
What is the expectation of this method? Is it required by Thrift or something (given its non-idiomatic name for Python)?
Yes, this seems to be a required thrift method. It was previously a method on the reporter and was moved to the UDP sender, as the protocol may depend on the sender type. 435b532#r208288056 observed that this was a breaking change, so it is now wrapped by the reporter to retrieve it from the sender, since the protocol factory could vary for each sender type. An HTTP sender I'm currently using uses the binary protocol, for example.
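A minimal sketch of that delegation, with hypothetical class names and placeholder return values (a real client would return `TCompactProtocol`/`TBinaryProtocol` instances built over the transport):

```python
class UDPSender(object):
    def getProtocol(self, transport):
        # placeholder for TCompactProtocol.TCompactProtocol(transport)
        return ('compact', transport)

class HTTPSender(object):
    def getProtocol(self, transport):
        # placeholder for TBinaryProtocol.TBinaryProtocol(transport)
        return ('binary', transport)

class Reporter(object):
    def __init__(self, sender):
        self._sender = sender

    def getProtocol(self, transport):
        # Kept on the Reporter so existing callers keep working
        # (non-breaking), while the protocol choice lives with the sender.
        return self._sender.getProtocol(transport)
```

With this shape, swapping the sender automatically swaps the protocol factory without touching the Reporter's public surface.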
`Reporter` now delegates span sending to the new `Sender` classes. Signed-off-by: Vittorio Camisa <vittorio.camisa@gmail.com>
Signed-off-by: Ryan Fitzpatrick <rmfitzpatrick@signalfx.com>
Also adds private Sender attributes and methods for ease of future improvements. Signed-off-by: Ryan Fitzpatrick <rmfitzpatrick@signalfx.com>
Force-pushed from 7394c49 to 1627ea7
Also updates the Reporter._consume_queue loop so it does not reset the timeout for each span received. Signed-off-by: Ryan Fitzpatrick <rmfitzpatrick@signalfx.com>
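The commit above describes a deadline-based drain: the flush deadline is computed once per batch rather than reset on every span, so a steady trickle of spans cannot postpone a flush indefinitely. A standalone sketch of that loop, with hypothetical names (`consume_batch`, `max_spans`, `flush_interval`) and a plain `queue.Queue` standing in for the tornado-backed span queue:

```python
import queue
import time

def consume_batch(q, max_spans=100, flush_interval=1.0):
    """Drain up to max_spans items from q, waiting at most flush_interval
    seconds total. The deadline is fixed when the batch starts, so each
    arriving span shrinks the remaining wait instead of extending it."""
    spans = []
    deadline = time.time() + flush_interval
    while len(spans) < max_spans:
        remaining = deadline - time.time()
        if remaining <= 0:
            break
        try:
            spans.append(q.get(timeout=remaining))
        except queue.Empty:
            break
    return spans
```

Each call returns one batch ready to hand to the sender; an empty list simply means the interval elapsed with nothing queued.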
Force-pushed from 1627ea7 to 83c9e09
@yurishkuro, I moved the batch-size concerns to Sender as you requested, but realized the difficulties in reporting failed-span metrics and acknowledging tornado tasks without receiving more information from the Sender. Your suggestions or an answer to jaegertracing/jaeger#1269 (comment) would be appreciated, because as-is I'm not sure these changes are ready to land.
I think having a senders package would be good enough, while continuing the
Just curious whether this effort is still ongoing? We have use cases tracing Serverless functions with Jaeger and Python.
@shuaichang I've unfortunately let this slip on my todo list, since I think it and http functionality are unlikely to land until jaegertracing/jaeger#1269 is sorted out (which it may have been, but I've lost track). fwiw I'm using a fork of this library that removes internal tornado usage and supports http submissions: https://github.com/signalfx/jaeger-client-python/blob/sfx-release/jaeger_client/config.py#L137. I would like this to continue but may need someone to take the reins if unified client design is still tbd.
Continues the adoption of `Sender` from #186 and addresses requested changes.
This addition still has the Reporter manage the consume_queue background callback, as it possesses the ErrorReporter and ReporterMetrics attributes (#186 (review)). I plan on moving that functionality to Sender while incorporating appropriate UDP batching in a subsequent PR before adding an HTTPSender.
Edit: Given #208 (comment), I will add UDP batching to this PR with the expectation that the Reporter span queue management roughly stays as-is functionally.
Edit 2: UDP batching has been added.
#208 (comment) -- just addressing breaking changes; will introduce UDP batching in another PR.