-
Notifications
You must be signed in to change notification settings - Fork 171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(#362): adds flush method to http reporter. #417
Conversation
@@ -135,6 +147,15 @@ class BatchRecorder { | |||
} | |||
} | |||
|
|||
flush() { | |||
this.partialSpans.forEach((span, id) => { | |||
if (span.delegate.duration === undefined) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the span is finished, we don't want to add akward annotations outside the span lifespan. Do you agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the help on this @jcchavezs
It is unnecessary to expose the flush interval to give the functionality of flush. If you do both at the same time this means more tests and docs. You can solve the problem just by exposing flush alone from what I understand.
Anyway, I would recommend re-using values and text from async reporter. ex zero is an invalid time so just use that if you choose to disable the flush thread.
Adding of a flush annotation should not happen when someone calls flush(). This means the timeout code should either do the annotation adding logic, or both logic dispatch to an internal function with a boolean. I wouldn't expose to users an api for flush that has any parameters.
Also note that this logic will be extracted later as it is arbitrary that the queuing is pinned to http transport.
/**
* Default 1 second. 0 implies spans are {@link #flush() flushed} externally.
*
* <p>Instead of sending one message at a time, spans are bundled into messages, up to {@link
* Sender#messageMaxBytes()}. This timeout ensures that spans are not stuck in an incomplete
* message.
*
* <p>Note: this timeout starts when the first unsent span is reported.
*/
/**
* Calling this will flush any pending spans to the transport on the current thread.
*
* <p>Note: If you set {@link Builder#messageTimeout(long, TimeUnit) message timeout} to zero, you
* must call this externally as otherwise spans will never be sent.
*
* @throws IllegalStateException if closed
*/
@Override public abstract void flush();
actually I am not sure the logic here is timeout or interval in nature. my recommendation is to remove extraction of the time parameter and just expose flush. that way we dont have to setup a new parameter just to remove it later as we need to redo the batching part anyway. |
Makes sense both comments. It actually makes things easier and actually the
go reporter works the same way as you describe.
|
fa751dc
to
04e233f
Compare
PTAL @adriancole |
04e233f
to
9475e51
Compare
9475e51
to
9e48988
Compare
|
||
recorder.flush(); | ||
|
||
expect(logSpan.calledOnce).to.equal(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe you can rebase and redo this test. A similar old one was lying.
Co-Authored-By: Adrian Cole <adriancole@users.noreply.github.com>
5af752b
to
989dffe
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, made some explanatory comments mainly to engage you on some of the recent test related refactorings.
|
||
expect(spans).to.be.empty; // eslint-disable-line no-unused-expressions | ||
|
||
clock.tick(100); // 1000 is de batching interval |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would personally remove the lolex stuff as it is irrelevant. For example, we have other tests in the same file which show that records are not flushed until complete. The only responsibility this one should have is that when you hit flush the data is expected. The clock ticking stuff distracts from an otherwise simple goal IOTW. For this test you can even use fake timestamps like other things here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. I think for me it was better to make it clear that the flush occurs when it was not yet the time for a scheduled flush but definitively I can remove it.
expect(popSpan()).to.deep.equal({ | ||
traceId: rootId.traceId, | ||
id: rootId.spanId, | ||
kind: 'SERVER' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
surprised the timestamp didn't make it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it was because of lolex
. I think this is ready to be merged.
win! |
This zipkin.flush spans just bring down my whole system. Any time an endpoint returns a 400 error code, for some reason I get a zipkin.flush span ... and they all pile up under the same traceID. So i end up with this SINGLE trace that is thousands of spans all with same trace ID. This makes it so the zipkin portal UI doesn't render cuz it stack overflows. I don't understand why every single span has same trace ID... here is a SMALL part of the json for the trace.. it's ENDLESS question is how can i NOT include this feature? this feature totally broke me... [{"traceId":"2fede1516e4a9531","parentId":"20376d16c0bdeef9","id":"80001e186964c474","kind":"CLIENT","name":"post","timestamp":1570639888002950,"localEndpoint":{"serviceName":"bis-service"},"annotations":[{"timestamp":1570639949002000,"value":"zipkin-js.flush"}],"tags":{"component":"AbtTracer","http.path":"/superset/explore_json/","project":"aip-core-ui"}},{"traceId":"2fede1516e4a9531","parentId":"614fd007f68a98e2","id":"8006b05ef6c90fb4","kind":"CLIENT","name":"post","timestamp":1570642255539077,"localEndpoint":{"serviceName":"bis-service"},"annotations":[{"timestamp":1570642316145000,"value":"zipkin-js.flush"}],"tags":{"component":"AbtTracer","http.path":"/superset/explore_json/","project":"aip-core-ui"}},{"traceId":"2fede1516e4a9531","parentId":"1653057111a48a03","id":"8006fdf52b7751a4","kind":"CLIENT","name":"post","timestamp":1570637439396212,"localEndpoint":{"serviceName":"bis-service"},"annotations":[{"timestamp":1570637499780000,"value":"zipkin-js.flush"}],"tags":{"component":"AbtTracer","http.path":"/superset/explore_json/","project":"aip-core-ui"}},{"traceId":"2fede1516e4a9531","parentId":"53eff77f02d6e14f","id":"800f2d0832f65b5f","kind":"CLIENT","name":"post","timestamp":1570636293176881,"localEndpoint":{"serviceName":"bis-service"},"annotations":[{"timestamp":1570636353781000,"value":"zipkin-js.flush"}],"tags":{"component":"AbtTracer","http.path":"/superset/explore_json/","project":"aip-core-ui"}},{"traceId":"2fede1516e4a9531","parentId":"a108ced553744bcc","id":"80212179d2b26ba7","kind":"CLIENT","name":"post","timestamp":1570633862471293,"localEndpoint":{"serviceName":"bis-service"},"annotations":[{"timestamp":1570633922531000,"value":"zipkin-js.flush"}],"tags":{"component":"AbtTracer","http.path":"/superset/explore_json/","project":"aip-core-ui"}},{"traceId":"2fede1516e4a9531","parentId":"f23363dbbb25a6b1","id":"802d851d77e25dd1","kind":"CLIENT","name":"post","timestamp":1570638686552524,"localEndpoint":{"serviceName":"bis-service"},"annotations":[{"timestamp":1570638746915000,"value":"zipkin-js.flush"}],"tags":{"component":"AbtTracer","http.path":"/superset/explore_json/","project":"aip-core-ui"}},{"traceId":"2fede1516e4a9531","parentId":"38b9db64fd9413dc","id":"802e871225693318","kind":"CLIENT","name":"post","timestamp":1570645863532340,"localEndpoint":{"serviceName":"bis-service"},"annotations":[{"timestamp":1570645924478000,"value":"zipkin-js.flush"}],"tags":{"component":"AbtTracer","http.path":"/superset/explore_json/","project":"aip-core-ui"}},{"traceId":"2fede1516e4a9531","parentId":"2a59056443015c05","id":"8041f9691cc256cb","kind":"CLIENT","name":"post","timestamp":1570632850830676,"localEndpoint":{"serviceName":"bis-service"},"annotations":[{"timestamp":1570632911628000,"value":"zipkin-js.flush"}],"tags":{"component":"AbtTracer","http.path":"/superset/explore_json/","project":"aip-core-ui"}},{"traceId":"2fede1516e4a9531","parentId":"2c3378fbbb126517","id":"804482adb6500c7e","kind":"CLIENT","name":"post","timestamp":1570639890764047,"localEndpoint":{"serviceName":"bis-service"},"annotations":[{"timestamp":1570639951005000,"value":"zipkin-js.flush"}],"tags":{"component":"AbtTracer","http.path":"/superset/explore_json/","project":"aip-core-ui"}},{"traceId":"2fede1516e4a9531","parentId":"9988ae5b455be743","id":"8049e96752bd04cb","kind":"CLIENT","name":"post","timestamp":1570647080045901,"duration":1157594,"localEndpoint":{"serviceName":"bis-service"},"annotations":[{"timestamp":1570647081229572,"value":"finish"},{"timestamp":1570647081471349,"value":"finish"},{"timestamp":1570647081473861,"value":"finish"},{"timestamp":1570647081476349,"value":"finish"},{"timestamp":1570647081486183,"value":"finish"},{"timestamp":1570647081521970,"value":"finish"},{"timestamp":1570647081523545,"value":"finish"},{"timestamp":1570647081533483,"value":"finish"},{"timestamp":1570647081546296,"value":"finish"},{"timestamp":1570647081934752,"value":"finish"},{"timestamp":1570647081963690,"value":"finish"},{"timestamp":1570647081995823,"value":"finish"},{"timestamp":1570647082056161,"value":"finish"},{"timestamp":1570647082061941,"value":"finish"},{"timestamp":1570647082066852,"value":"finish"},{"timestamp":1570647082457872,"value":"finish"},{"timestamp":1570647082459342,"value":"finish"},{"timestamp":1570647082464122,"value":"finish"},{"timestamp":1570647082464445,"value":"finish"},{"timestamp":1570647082464722,"value":"finish"}, |
@jeffthompson1971 what you are describing is a really bad bug it seems. I've tried to help you on gitter before and maybe notifications aren't working. there's a human communication problem by which I'm not able to get a suggestion to you to try connect instead (ideally avoiding the bug) https://gitter.im/openzipkin/zipkin?at=5d93e7a9fb131014e721b07f At any rate, you can mask the bug by setting the batch recorder timeout to a very high number, which essentially makes it impossible to ever become timed out. |
note that if you do set this to a really high number there's a chance, depending on the nature of the bug, that it could leak memory. I really implore trying some suggestions to make the bug not happen |
What you are describing is really bad but I am not sure this is a problem caused for this PR. This PR includes a method for flush but that method is being called by the user, not by any zipkin instrumentation. The piece of code that adds the |
Hi @jeffthompson1971 we just released https://github.com/openzipkin/zipkin-js/releases/tag/v0.19.1. This should fix the problem you have in express as hooks usage have been fixed, would you mind trying it? |
sorry I missed this guys. I will try this today! |
Actually I already tried it I had forgotten. Still see the issue. And I was also given the recommendation to move to zipkin-instrumentation-connect and I did that too.. still i get these zipkin.flush spans. wish those would just go away until i figure out why this is happening. Just not easy to debug this it seems. |
This PR adds a
flush
method for http reporter.Closes #362
Ping @csp197 @adriancole