feat(#362): adds flush method to http reporter. #417

jcchavezs · 2019-07-20T16:06:40Z

This PR adds a flush method to the HTTP reporter.

Closes #362

jcchavezs · 2019-07-20T16:07:32Z

packages/zipkin/src/batch-recorder.js

@@ -135,6 +147,15 @@ class BatchRecorder {
    }
  }

+  flush() {
+    this.partialSpans.forEach((span, id) => {
+      if (span.delegate.duration === undefined) {


If the span is already finished, we don't want to add awkward annotations outside the span's lifespan. Do you agree?

codefromthecrypt

Thanks for the help on this @jcchavezs

It is unnecessary to expose the flush interval in order to provide flush functionality. Doing both at the same time means more tests and docs. From what I understand, you can solve the problem by exposing flush alone.

Anyway, I would recommend reusing values and text from the async reporter. For example, zero is an invalid time, so just use that if you choose to disable the flush thread.

Adding a flush annotation should not happen when someone calls flush(). This means either the timeout code should do the annotation-adding logic, or both paths should dispatch to an internal function that takes a boolean. I wouldn't expose to users a flush API that has any parameters.

Also note that this logic will be extracted later, as it is arbitrary that the queuing is pinned to the HTTP transport.

https://github.com/openzipkin/zipkin-reporter-java/blob/master/core/src/main/java/zipkin2/reporter/AsyncReporter.java

    /**
     * Default 1 second. 0 implies spans are {@link #flush() flushed} externally.
     *
     * <p>Instead of sending one message at a time, spans are bundled into messages, up to {@link
     * Sender#messageMaxBytes()}. This timeout ensures that spans are not stuck in an incomplete
     * message.
     *
     * <p>Note: this timeout starts when the first unsent span is reported.
     */

  /**
   * Calling this will flush any pending spans to the transport on the current thread.
   *
   * <p>Note: If you set {@link Builder#messageTimeout(long, TimeUnit) message timeout} to zero, you
   * must call this externally as otherwise spans will never be sent.
   *
   * @throws IllegalStateException if closed
   */
  @Override public abstract void flush();
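
The second option above (a parameterless public flush() and a timeout path that both dispatch to one internal function taking a boolean) could look roughly like the sketch below. It is a simplified illustration under stated assumptions, not the zipkin-js implementation: `SketchRecorder`, `track`, `expire`, and `_writeSpan` are hypothetical names, and the span objects are plain stand-ins.

```javascript
// Hypothetical, simplified recorder: a sketch of the review suggestion,
// not the actual zipkin-js BatchRecorder.
class SketchRecorder {
  constructor({logger, timeout = 60 * 1000000}) {
    this.logger = logger; // anything with a logSpan(span) method
    this.timeout = timeout; // microseconds a span may stay open
    this.partialSpans = new Map();
  }

  // Track an unfinished span under its id.
  track(id, span) {
    this.partialSpans.set(id, span);
  }

  // Public API: takes no parameters and never adds the flush annotation.
  flush() {
    this.partialSpans.forEach((span, id) => this._writeSpan(id, span, false));
  }

  // Timer path: only spans older than the timeout are written, and they are
  // annotated so readers can tell they were cut short.
  expire(nowMicros) {
    this.partialSpans.forEach((span, id) => {
      if (nowMicros - span.timestamp >= this.timeout) {
        this._writeSpan(id, span, true, nowMicros);
      }
    });
  }

  // Shared internal function; the boolean selects the annotation behavior.
  _writeSpan(id, span, isTimedOut, nowMicros) {
    if (isTimedOut && span.duration === undefined) {
      span.annotations.push({timestamp: nowMicros, value: 'zipkin-js.flush'});
    }
    this.logger.logSpan(span);
    this.partialSpans.delete(id);
  }
}

// Exercise both paths with an in-memory logger.
const sent = [];
const recorder = new SketchRecorder({
  logger: {logSpan: (span) => sent.push(span)},
  timeout: 1000
});

recorder.track('a', {timestamp: 0, annotations: []});
recorder.flush(); // explicit flush: span 'a' is sent without an annotation

recorder.track('b', {timestamp: 0, annotations: []});
recorder.expire(5000); // timeout path: span 'b' is sent with the annotation
```

Keeping the boolean on the internal function means the public surface stays a plain `flush()`, which is the constraint stated in the comment above.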

codefromthecrypt · 2019-07-21T04:41:29Z

Actually, I am not sure whether the logic here is a timeout or an interval in nature. My recommendation is to drop the extraction of the time parameter and just expose flush. That way we don't have to set up a new parameter just to remove it later, as we need to redo the batching part anyway.

jcchavezs · 2019-07-21T09:32:27Z

Both comments make sense. It actually makes things easier, and the Go reporter works the same way you describe.

jcchavezs · 2019-07-21T17:39:31Z

PTAL @adriancole

codefromthecrypt · 2019-07-23T05:11:15Z

packages/zipkin/test/batch-recorder.test.js

+
+    recorder.flush();
+
+    expect(logSpan.calledOnce).to.equal(true);


Maybe you can rebase and redo this test. A similar old one was lying.

packages/zipkin/src/batch-recorder.js

Co-Authored-By: Adrian Cole <adriancole@users.noreply.github.com>

codefromthecrypt

Thanks. I made some explanatory comments, mainly to engage you on some of the recent test-related refactorings.

codefromthecrypt · 2019-07-23T07:18:22Z

packages/zipkin/test/batch-recorder.test.js


    expect(spans).to.be.empty; // eslint-disable-line no-unused-expressions
+
+    clock.tick(100); // 1000 is the batching interval


I would personally remove the lolex stuff, as it is irrelevant here. We have other tests in the same file which show that records are not flushed until complete. The only responsibility of this test should be to show that when you call flush, the expected data comes out. In other words, the clock ticking distracts from an otherwise simple goal. For this test you can even use fake timestamps, like other tests here do.

Right. For me it was clearer to show that the flush occurs before a scheduled flush is due, but I can definitely remove it.

codefromthecrypt · 2019-07-23T07:18:33Z

packages/zipkin/test/batch-recorder.test.js

+    expect(popSpan()).to.deep.equal({
+      traceId: rootId.traceId,
+      id: rootId.spanId,
+      kind: 'SERVER'


surprised the timestamp didn't make it!

I guess it was because of lolex. I think this is ready to be merged.

codefromthecrypt · 2019-07-23T09:52:13Z

win!

jeffthompson1971 · 2019-10-09T19:12:52Z

These zipkin.flush spans just bring down my whole system. Any time an endpoint returns a 400 error code, for some reason I get a zipkin.flush span, and they all pile up under the same trace ID. So I end up with a SINGLE trace containing thousands of spans, all with the same trace ID. The Zipkin UI then fails to render because it overflows the stack. I don't understand why every single span has the same trace ID. Here is a SMALL part of the JSON for the trace; it's ENDLESS.

The question is: how can I NOT include this feature? It totally broke me...

codefromthecrypt · 2019-10-09T19:29:45Z

@jeffthompson1971 what you are describing seems to be a really bad bug. I've tried to help you on Gitter before, and maybe notifications aren't working. There's a human communication problem: I haven't been able to get a suggestion through to you to try connect instead (ideally avoiding the bug): https://gitter.im/openzipkin/zipkin?at=5d93e7a9fb131014e721b07f

At any rate, you can mask the bug by setting the batch recorder timeout to a very high number, which essentially makes it impossible to ever become timed out.

codefromthecrypt · 2019-10-09T19:33:04Z

Note that if you do set this to a really high number, there's a chance, depending on the nature of the bug, that it could leak memory. I really implore you to try some of the suggestions to make the bug not happen.
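
As a rough illustration of the workaround, the snippet below only computes a very large timeout value and puts it in an options object. It assumes the batch recorder accepts a `timeout` option expressed in microseconds; `recorderOptions` and the one-day value are illustrative choices, not recommendations from this thread.

```javascript
// Assumption: the recorder's `timeout` option is expressed in microseconds.
const MICROS_PER_SECOND = 1000 * 1000;
const ONE_DAY_MICROS = 24 * 60 * 60 * MICROS_PER_SECOND;

// Hypothetical options object to pass along with the logger, for example:
//   new BatchRecorder({logger, ...recorderOptions})
const recorderOptions = {timeout: ONE_DAY_MICROS};

console.log(recorderOptions.timeout); // prints 86400000000
```

The larger the value, the longer unfinished spans stay in memory, which is the leak risk mentioned above.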

jcchavezs · 2019-10-11T15:45:28Z

Hi @jeffthompson1971,

What you are describing is really bad, but I am not sure this problem is caused by this PR. This PR adds a flush method, but that method is called by the user, not by any zipkin instrumentation. The code that adds the zipkin-js.flush annotations came in #416. What I think you are experiencing is that the middleware is holding the span and not finishing it. Could you please share your setup over Gitter?
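
To make "called by the user" concrete, here is a minimal sketch of application code that triggers a flush when the process is asked to terminate. `flushOnSignal` is a hypothetical helper, `recorder` is anything exposing `flush()`, and the process object is injectable only so the sketch can run without real signals.

```javascript
// Hypothetical helper: flushes a recorder when the process receives a signal.
// `proc` defaults to Node's global process object.
function flushOnSignal(recorder, signal = 'SIGTERM', proc = process) {
  const handler = () => recorder.flush();
  proc.once(signal, handler);
  return handler;
}

// Exercise the helper with stand-ins instead of a real recorder and process.
const calls = [];
const fakeRecorder = {flush: () => calls.push('flush')};
const listeners = {};
const fakeProcess = {once: (name, fn) => { listeners[name] = fn; }};

flushOnSignal(fakeRecorder, 'SIGTERM', fakeProcess);
listeners.SIGTERM(); // simulate the signal arriving
```

In Node.js, installing a listener for a signal such as SIGTERM replaces the default termination behavior, so a real handler would also need to end the process once the transport has had a chance to send its data.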

jcchavezs · 2019-10-16T10:23:20Z

Hi @jeffthompson1971, we just released https://github.com/openzipkin/zipkin-js/releases/tag/v0.19.1. This should fix the problem you have in express, as the hooks usage has been fixed. Would you mind trying it?

jeffthompson1971 · 2019-11-12T19:31:52Z

Sorry I missed this, guys. I will try this today!

jeffthompson1971 · 2019-11-12T19:36:21Z

Actually, I had already tried it and had forgotten. I still see the issue. I was also given the recommendation to move to zipkin-instrumentation-connect, and I did that too. Still, I get these zipkin.flush spans. I wish those would just go away until I figure out why this is happening. This just doesn't seem easy to debug.

jcchavezs force-pushed the 362_adds_http_logger_flush branch from fa751dc to 04e233f Compare July 21, 2019 17:39

jcchavezs force-pushed the 362_adds_http_logger_flush branch from 04e233f to 9475e51 Compare July 21, 2019 17:40

jcchavezs mentioned this pull request Jul 21, 2019

HTTP transporter does not send open records when node process is about to terminate #33

Closed

jcchavezs force-pushed the 362_adds_http_logger_flush branch from 9475e51 to 9e48988 Compare July 21, 2019 17:52

jcchavezs and others added 3 commits July 23, 2019 09:11

feat(#362): adds flush method to http reporter.

1deb081

chore(#362): refactors flush setup based on @adriancole's feedback.

467b9f1

docs: improves documentation for flush method.

989dffe

Co-Authored-By: Adrian Cole <adriancole@users.noreply.github.com>

jcchavezs force-pushed the 362_adds_http_logger_flush branch from 5af752b to 989dffe Compare July 23, 2019 07:14

tests(#362): removes lolex dependency for flush test.

458b8a4

codefromthecrypt merged commit 6b29aa1 into master Jul 23, 2019

codefromthecrypt deleted the 362_adds_http_logger_flush branch July 23, 2019 09:52
