This repository has been archived by the owner on Jul 11, 2022. It is now read-only.

Initial Sender implementation #208

Closed

Conversation

rmfitzpatrick
Contributor

@rmfitzpatrick rmfitzpatrick commented Aug 27, 2018

This continues the adoption of Sender from #186 and addresses the requested changes.

This addition still has Reporter manage the consume_queue background callback as it possesses the ErrorReporter and ReporterMetrics attributes (#186 (review)). I plan on moving that functionality to Sender while incorporating appropriate UDP batching in a subsequent PR before adding an HTTPSender.

edit: Given #208 (comment) I will add UDP batching to this PR with the expectation that the Reporter span queue management is to roughly stay as is functionally.

edit 2: UDP batching has been added. #208 (comment) -- Just addressing breaking changes and will introduce UDP batching in another PR.

@codecov

codecov bot commented Aug 27, 2018

Codecov Report

Merging #208 into master will increase coverage by 0.51%.
The diff coverage is 97.67%.

@@            Coverage Diff             @@
##           master     #208      +/-   ##
==========================================
+ Coverage   94.58%   95.09%   +0.51%     
==========================================
  Files          25       26       +1     
  Lines        1883     1957      +74     
  Branches      250      255       +5     
==========================================
+ Hits         1781     1861      +80     
+ Misses         67       63       -4     
+ Partials       35       33       -2
Impacted Files                     Coverage Δ
jaeger_client/config.py            91.62% <ø> (ø) ⬆️
jaeger_client/local_agent_net.py   95.55% <100%> (+0.2%) ⬆️
jaeger_client/senders.py           97.26% <97.26%> (ø)
jaeger_client/reporter.py          96.85% <98.14%> (+4.98%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a1c7e9d...83c9e09.

@yurishkuro
Member

One meta-comment I have: I think we will keep running into issues with backwards-compatibility of refactoring as long as the Reporter is using IOLoop for its internal queue mgmt. If we re-implement it with a separate thread (the way Lightstep did) and remove the dependency on IOLoop, the rest of the refactoring will become much easier.

@rmfitzpatrick
Contributor Author

rmfitzpatrick commented Aug 27, 2018

@yurishkuro, in the wake of #205 it seems like it could be a good time to address the tornado-ectomy in a subsequent PR, but I'm not 100% on the proposed architecture. Is the Reporter intended to maintain ownership of the loop and its parent, lazily created thread or should all of that be delegated to its Sender while it just maintains the basic NullReporter interface? I'm getting the (potentially incorrect) impression that you and @black-adder are approaching the solution differently.

Happy to move this discussion to #31.

@yurishkuro
Member

I think it's best to keep the same architecture as in other Jaeger clients:

  • the Reporter runs a background thread and a queue
  • queue is used to shed load if the sender is not fast enough
  • senders do not need to be multi-threaded, only Reporter is
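
As a rough illustration of the three points above (a sketch only, not code from this PR), a thread-backed reporter might look like the following. `ThreadedReporter`, `ListSender`, `queue_capacity`, and `send` are hypothetical names chosen for the example; the real Reporter also handles flush timing, metrics, and batching, which are left out here.

```python
import queue
import threading


class ListSender:
    """Hypothetical single-threaded sender that just records spans."""

    def __init__(self):
        self.sent = []

    def send(self, span):
        self.sent.append(span)


class ThreadedReporter:
    """Hypothetical reporter: a bounded queue drained by one background thread."""

    _STOP = object()  # sentinel that tells the worker thread to exit

    def __init__(self, sender, queue_capacity=100):
        self._sender = sender
        self._queue = queue.Queue(maxsize=queue_capacity)
        self._lock = threading.Lock()
        self.dropped = 0
        self._worker = threading.Thread(target=self._consume, daemon=True)
        self._worker.start()

    def report_span(self, span):
        # Called from application threads; must never block the caller.
        try:
            self._queue.put_nowait(span)
        except queue.Full:
            # Shed load when the sender is not fast enough.
            with self._lock:
                self.dropped += 1

    def _consume(self):
        # Only this thread calls the sender, so the sender itself
        # does not need to be thread-safe.
        while True:
            item = self._queue.get()
            if item is self._STOP:
                return
            self._sender.send(item)

    def close(self):
        # Blocking put, so the sentinel is never shed like a span would be.
        self._queue.put(self._STOP)
        self._worker.join()
```

Spans already queued when `close()` is called are still delivered, because the sentinel is enqueued behind them.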

@rmfitzpatrick rmfitzpatrick changed the title Initial Sender implementation WIP: Initial Sender implementation Aug 27, 2018
@rmfitzpatrick rmfitzpatrick changed the title WIP: Initial Sender implementation Initial Sender implementation Aug 27, 2018
@rmfitzpatrick rmfitzpatrick force-pushed the initial_sender branch 2 times, most recently from cf20619 to c969ad2 Compare August 27, 2018 21:21
@rmfitzpatrick
Contributor Author

Basic UDP batching has been added. I used jaeger-client-node as a rough template but am unclear if my thrift usage is optimal to get the content length as it requires serialization before properly encoding the batch.

@rmfitzpatrick
Contributor Author

@black-adder, wanted to make sure you knew this is where I respond to your #186 feedback. Thanks for your thorough review.

@rmfitzpatrick
Contributor Author

@black-adder & @yurishkuro, is this still on the block as a precursor to an HTTPSender? I don't believe it introduces any breaking changes so both senders shouldn't necessitate a major version (re: #205).

@yurishkuro
Member

@rmfitzpatrick this PR is very large. I saw you added UDP batching, is it possible to pull this out into an independent PR and merge separately from HTTP refactoring?

@rmfitzpatrick rmfitzpatrick force-pushed the initial_sender branch 4 times, most recently from 2962ee6 to ad87d61 Compare October 16, 2018 18:34
@rmfitzpatrick
Contributor Author

@yurishkuro, went ahead and removed that commit.

@yurishkuro
Member

@tiffon you said you can review this?

Member

@tiffon tiffon left a comment

@yurishkuro Yes, will do 👍

Edit: Oops, didn't mean to submit those comments yet. Will finish looking this PR over tonight or tomorrow.

@rmfitzpatrick
Contributor Author

@tiffon, any chance you've had a moment to look over my updates?

Member

@yurishkuro yurishkuro left a comment

@rmfitzpatrick I finally got time to come back to this, would like to get it moving. Top concerns:

  • I left a comment about the overall Sender API
  • Don't know if it's a common practice in Python, but what if we moved all implementations to something like a jaeger_client.internal dir so that we can be more flexible about the ongoing API changes?
  • We may want to cut a version-4.x branch and go with breaking changes by making the whole internal API cleaner. At this point we need to upgrade to OpenTracing 2.0, which will require a major release anyway.

self._process = None
self.spans = []

def append(self, span):
Member

This is a different Sender API than what we have in the other languages (jaegertracing/jaeger#1269). Specifically, the Reporter's responsibility is only to provide multi-threading support by maintaining an internal queue, not to decide when the batch should be sent. The sender makes the flush decision, which can be based on the cumulative byte size of the packet (which is what all other languages do), or on some configuration param like max batch size, which is what Python client did because we didn't implement the exact byte counting in Thrift encoding (not sure if it's even possible to do).

In Go the Sender interface looks like this (it is incorrectly called Transport):

// Transport abstracts the method of sending spans out of process.
// Implementations are NOT required to be thread-safe; the RemoteReporter
// is expected to only call methods on the Transport from the same go-routine.
type Transport interface {
	// Append converts the span to the wire representation and adds it
	// to sender's internal buffer.  If the buffer exceeds its designated
	// size, the transport should call Flush() and return the number of spans
	// flushed, otherwise return 0. If error is returned, the returned number
	// of spans is treated as failed span, and reported to metrics accordingly.
	Append(span *Span) (int, error)

	// Flush submits the internal buffer to the remote server. It returns the
	// number of spans flushed. If error is returned, the returned number of
	// spans is treated as failed span, and reported to metrics accordingly.
	Flush() (int, error)

	io.Closer
}
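
For comparison, a Python rendering of that interface could look roughly like the sketch below. This is an assumption about how the contract might translate, not the API in this PR: `Sender`, `SenderError`, `CountingSender`, `max_batch_size`, and `emit` are hypothetical names. Because Python has no `(int, error)` return pair, the sketch carries the failed-span count on the exception, and it flushes on span count rather than on byte size.

```python
import abc


class SenderError(Exception):
    """Hypothetical error carrying the number of spans that failed to send."""

    def __init__(self, message, failed_spans):
        super().__init__(message)
        self.failed_spans = failed_spans


class Sender(abc.ABC):
    """Hypothetical Python counterpart of the Go Transport interface."""

    @abc.abstractmethod
    def append(self, span):
        """Buffer a span; return the number of spans flushed (0 if none)."""

    @abc.abstractmethod
    def flush(self):
        """Send the buffer; return the number of spans flushed."""

    @abc.abstractmethod
    def close(self):
        """Flush what is left and release resources."""


class CountingSender(Sender):
    """The sender, not the reporter, decides when a batch is full."""

    def __init__(self, emit, max_batch_size=3):
        self._emit = emit  # callable that ships one batch (assumption)
        self._max_batch_size = max_batch_size
        self._buffer = []

    def append(self, span):
        self._buffer.append(span)
        if len(self._buffer) >= self._max_batch_size:
            return self.flush()
        return 0

    def flush(self):
        batch, self._buffer = self._buffer, []
        if not batch:
            return 0
        try:
            self._emit(batch)
        except Exception as exc:
            # Report how many spans were lost so metrics can be updated.
            raise SenderError(str(exc), failed_spans=len(batch)) from exc
        return len(batch)

    def close(self):
        self.flush()
```

With this shape the reporter only needs the return value (or `failed_spans` on error) to update its success and failure metrics.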

Contributor Author

Sorry for the belatedness of my reply, but I have some cycles to spend on this now. I'm definitely up for adopting the proposed api, but I don't think I understand what you're referring to regarding the other jaeger clients.

jaeger-client-node: the RemoteReporter maintains a flush interval: https://github.com/jaegertracing/jaeger-client-node/blob/master/src/reporters/remote_reporter.js#L37

jaeger-client-java: the RemoteReporter maintains a flush timer: https://github.com/jaegertracing/jaeger-client-java/blob/master/jaeger-core/src/main/java/io/jaegertracing/internal/reporters/RemoteReporter.java#L65

jaeger-client-go: the remoteReporter.processQueue() follows similar behavior: https://github.com/jaegertracing/jaeger-client-go/blob/master/reporter.go#L252

Full disclosure that I haven't pored over these and could be missing something in each case.

Contributor Author

Also, it is possible, but expensive* to get the exact byte count of a thrift message and I'll be sure to propose proper udp batching in a subsequent PR (was removed from this one due to the added complexity).
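
To make the cost concrete, here is a minimal sketch of byte-budget batching. It is not the batching code that was removed from this PR: `ByteBudgetBatcher`, `encode_span`, and `max_packet_bytes` are hypothetical, and JSON stands in for Thrift serialization purely so the example runs on its own. The point is that each span has to be serialized before its size is known, and the sketch ignores any per-batch envelope overhead.

```python
import json


def encode_span(span):
    # Stand-in for Thrift serialization (assumption): any encoder
    # that returns bytes would work here.
    return json.dumps(span, sort_keys=True).encode("utf-8")


class ByteBudgetBatcher:
    """Hypothetical batcher that flushes on cumulative encoded size."""

    def __init__(self, max_packet_bytes, encode=encode_span):
        self._max_packet_bytes = max_packet_bytes
        self._encode = encode
        self._encoded = []
        self._size = 0

    def append(self, span):
        """Return the encoded spans to send now, or [] if still buffering."""
        data = self._encode(span)  # the expensive step: serialize to learn the size
        if len(data) > self._max_packet_bytes:
            raise ValueError("span does not fit in a single packet")
        ready = []
        if self._size + len(data) > self._max_packet_bytes:
            # The new span would overflow the packet, so ship what we have first.
            ready = self.flush()
        self._encoded.append(data)
        self._size += len(data)
        return ready

    def flush(self):
        batch, self._encoded, self._size = self._encoded, [], 0
        return batch
```

Keeping the already-encoded bytes in the buffer avoids serializing each span a second time when the batch is finally written.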

self._thread_loop.start()
return self._thread_loop._io_loop

def getProtocol(self, transport):
Member

what is the expectation of this method? Is it required by Thrift or something (given non-idiomatic name for Python)?

Contributor Author

Yes, this seems to be a required thrift method. This is currently a method in the reporter, and was moved to the udp sender as the protocol may depend on the sender type. 435b532#r208288056 observed it was a breaking change, so is now wrapped by the reporter to retrieve from the sender as the protocol factory could vary for each sender type. An http sender I'm currently using* uses the binary protocol for example.
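
The wrapping described above could be sketched roughly as follows. `ReporterSketch` and `SenderSketch` are hypothetical stand-ins, and the protocol factories are plain callables rather than real Thrift protocol classes, so this only illustrates the delegation, not the actual code in this PR.

```python
class SenderSketch:
    """Hypothetical sender that owns its protocol factory."""

    def __init__(self, protocol_factory):
        # protocol_factory: callable taking a transport and returning
        # a protocol object (a stand-in for a Thrift protocol factory).
        self._protocol_factory = protocol_factory

    def getProtocol(self, transport):
        return self._protocol_factory(transport)


class ReporterSketch:
    """Hypothetical reporter that keeps getProtocol for compatibility."""

    def __init__(self, sender):
        self._sender = sender

    def getProtocol(self, transport):
        # Kept on the reporter so existing callers do not break; the
        # answer now comes from whichever sender is configured.
        return self._sender.getProtocol(transport)


# Stand-in factories: a UDP sender might use a compact protocol and an
# HTTP sender a binary one (labels only, not real Thrift objects).
udp_reporter = ReporterSketch(SenderSketch(lambda t: ("compact", t)))
http_reporter = ReporterSketch(SenderSketch(lambda t: ("binary", t)))
```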

ProvoK and others added 3 commits January 18, 2019 14:31
`Reporter` now delegates span sending to new `Sender` classes.

Signed-off-by: Vittorio Camisa <vittorio.camisa@gmail.com>
Signed-off-by: Ryan Fitzpatrick <rmfitzpatrick@signalfx.com>
Also adds private Sender attributes and methods for ease of future
improvements.

Signed-off-by: Ryan Fitzpatrick <rmfitzpatrick@signalfx.com>
@rmfitzpatrick rmfitzpatrick force-pushed the initial_sender branch 2 times, most recently from 7394c49 to 1627ea7 Compare January 22, 2019 18:47
Also updates Reporter._consume_queue loop to not reset timeout
for each span received.

Signed-off-by: Ryan Fitzpatrick <rmfitzpatrick@signalfx.com>
@rmfitzpatrick
Contributor Author

@yurishkuro, I moved the batch size concerns to Sender as you requested, but realized the difficulties in reporting failed span metrics and acknowledging tornado tasks without receiving more information from the Sender. Your suggestions or answer to jaegertracing/jaeger#1269 (comment) would be appreciated, because as is I'm not sure these are ready to land.

  • Don't know if it's a common practice in Python, but what if we moved all implementations to something like a jaeger_client.internal dir so that we can be more flexible about the ongoing API changes?

I think having a senders package would be good enough, while continuing the _-prefix method of not guaranteeing stability.

@shuaichang

Just curious if this effort is still continued? We have use cases in Serverless functions to trace with Jaeger and Python.

@rmfitzpatrick
Contributor Author

@shuaichang I've unfortunately let this slip on my todo list, since I think it and http functionality are unlikely to land until jaegertracing/jaeger#1269 is sorted out (which it may have been but I've lost track). fwiw I'm using a fork of this library that removes internal tornado usage and supports http submissions: https://github.com/signalfx/jaeger-client-python/blob/sfx-release/jaeger_client/config.py#L137.

I would like this to continue but may need someone to take the reins if unified client design is still tbd.

5 participants