Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent double call of async callback #268

Closed
wants to merge 134 commits into from

Conversation

joeyfromspace
Copy link

CHANGE LOG

  • Wraps the async series callback in harvester in another function to prevent double calls

NOTES

This is to address uncaught exceptions that we've been seeing after upgrading to NR agent 4.X from 1.X.

While this Pull Request does not address the root cause of the double callbacks (I think someone with more experience with this code base would be better suited to do that), it will prevent uncaught exceptions by ensuring that a given callback is only called once.

Here is an example stack trace that we've been seeing:

uncaughtException (api) Callback was already called. Error: Callback was already called. 
    at /opt/node_modules/newrelic/node_modules/async/dist/async.js:966:32 
    at afterSend (/opt/node_modules/newrelic/lib/harvest.js:156:7) 
    at ClientRequest.standardHandler (/opt/node_modules/newrelic/lib/collector/api.js:517:12) 
    at ClientRequest.emit (events.js:182:13) 
    at ClientRequest.EventEmitter.emit (domain.js:442:20) 
    at TLSSocket.socketErrorListener (_http_client.js:391:9) 
    at TLSSocket.emit (events.js:182:13) 
    at TLSSocket.EventEmitter.emit (domain.js:442:20) 
    at TLSSocket._emitTLSError (_tls_wrap.js:606:10) 
    at TLSWrap.onerror (_tls_wrap.js:268:11)

Peter Svetlichny and others added 30 commits November 19, 2018 18:24
@NatalieWolfe
Copy link
Contributor

Thanks for this contribution! We've just recently cleaned up a lot of the harvest/collector code and are about to release those changes (holiday season may cause delays, of course). This may resolve the underlying issue, and will definitely cause a conflict with this PR.

Once these changes are released, and if they don't resolve your issue, then we will look into taking this PR.

@3z3qu13l
Copy link

@joeyfromspace could you rebase with latest 4.13.0 to check ?

@joeyfromspace
Copy link
Author

@3z3qu13l Done!

@3z3qu13l
Copy link

We just got it 3 times today on different apps

With version 4.10.0:
Error: Callback was already called., at /var/www/node_modules/async/dist/async.js:966:32, at onMetricData (/var/www/node_modules/newrelic/lib/harvest.js:274:16), at ClientRequest.standardHandler (/var/www/node_modules/newrelic/lib/collector/api.js:464:12), at ClientRequest.emit (events.js:182:13), at TLSSocket.socketErrorListener (_http_client.js:391:9), at TLSSocket.emit (events.js:182:13), at TLSSocket._emitTLSError (_tls_wrap.js:606:10), at TLSWrap.onerror (_tls_wrap.js:268:11)

With version 4.11.0:
Error: Callback was already called., at /var/www/node_modules/async/dist/async.js:966:32, at onAnalyticsEvents (/var/www/node_modules/newrelic/lib/harvest.js:370:9), at ClientRequest.standardHandler (/var/www/node_modules/newrelic/lib/collector/api.js:468:12), at ClientRequest.emit (events.js:182:13), at TLSSocket.socketErrorListener (_http_client.js:391:9), at TLSSocket.emit (events.js:182:13), at TLSSocket._emitTLSError (_tls_wrap.js:606:10), at TLSWrap.onerror (_tls_wrap.js:268:11)

I'ld like to be sure it's fixed before updating every app packages.

@NatalieWolfe
Copy link
Contributor

@joeyfromspace I believe your rebase went a little wrong. I don't think you want all 134 of these commits.

@NatalieWolfe
Copy link
Contributor

@3z3qu13l Have you tested with 4.13.0 yet? We just released this version yesterday and it contains a major refactoring of the collector and harvest code, the exact areas causing this problem.

I would also like to know if there were any other errors in the agent log from this time? The only reason for the callback to be executed twice would involve a connection or runtime error. Knowing specifically what caused the first callback to execute would help us determine the root cause of the double call.

@3z3qu13l
Copy link

@NatalieWolfe
4.13.0 is probably gonna be shipped in our next release.
I'll ask for the logs on Monday.
Since I don't know how to reproduce I can't verify with certitude the latest version; maybe the logs will help me.

@3z3qu13l
Copy link

Not much information in my logs:
{"v":0,"level":30,"name":"newrelic","hostname":"xxxxxxxxxxxxx","pid":53,"time":"2018-12-20T05:43:21.842Z","msg":"Error on submission to New Relic (data held for redelivery):","code":"ECONNRESET","path":null,"host":"collector-009.newrelic.com","port":443,"stack":"Error: Client network socket disconnected before secure TLS connection was established\n at TLSSocket.onConnectEnd (_tls_wrap.js:1086:19)\n at Object.onceWrapper (events.js:273:13)\n at TLSSocket.emit (events.js:187:15)\n at endReadableNT (_stream_readable.js:1094:12)\n at process._tickCallback (internal/process/next_tick.js:63:19)","message":"Client network socket disconnected before secure TLS connection was established"} {"v":0,"level":50,"name":"newrelic","hostname":"xxxxxxxxxxxxx","pid":53,"time":"2018-12-20T09:11:45.452Z","msg":"Calling error_data on New Relic failed unexpectedly. Data will be held until it can be submitted:","component":"collector_api","errno":"ETIMEDOUT","code":"ETIMEDOUT","syscall":"read","stack":"Error: read ETIMEDOUT\n at TLSWrap.onStreamRead (internal/stream_base_commons.js:111:27)","message":"read ETIMEDOUT"} {"v":0,"level":40,"name":"newrelic","hostname":"xxxxxxxxxxxxx","pid":53,"time":"2018-12-20T09:11:45.452Z","msg":"Merging 14 (of 14) errors for next delivery.","component":"error_tracer"} {"v":0,"level":40,"name":"newrelic","hostname":"xxxxxxxxxxxxx","pid":53,"time":"2018-12-20T09:11:45.452Z","msg":"Merging 14 of 14 events into ErrorAggregator for next harvest","component":"event_aggregator"} {"v":0,"level":40,"name":"newrelic","hostname":"xxxxxxxxxxxxx","pid":53,"time":"2018-12-20T09:11:45.452Z","msg":"Merging 0 of 0 events into SpanAggregator for next harvest","component":"event_aggregator"} {"v":0,"level":30,"name":"newrelic","hostname":"xxxxxxxxxxxxx","pid":53,"time":"2018-12-20T09:11:45.452Z","msg":"Error on submission to New Relic (data held for redelivery):","errno":"ETIMEDOUT","code":"ETIMEDOUT","syscall":"read","stack":"Error: read ETIMEDOUT\n at TLSWrap.onStreamRead (internal/stream_base_commons.js:111:27)","message":"read ETIMEDOUT"} {"v":0,"level":40,"name":"newrelic","hostname":"xxxxxxxxxxxxx","pid":53,"time":"2018-12-21T14:40:23.909Z","msg":"Length limit exceeded for attribute name, not adding to transaction trace: request.parameters.campaignarea\\xxxxxxxxxxxxxxxx,final_url:","component":"trace"} {"v":0,"level":50,"name":"newrelic","hostname":"xxxxxxxxxxxxx","pid":53,"time":"2018-12-22T04:56:42.166Z","msg":"Calling analytic_event_data on New Relic failed unexpectedly. Data will be held until it can be submitted:","component":"collector_api","statusCode":408,"laterErrors":[{}],"stack":"Error: No body found in response to analytic_event_data.\n at StreamSink.parser [as callback] (/var/www/node_modules/newrelic/lib/collector/parse-response.js:95:19)\n at StreamSink.end (/var/www/node_modules/newrelic/lib/util/stream-sink.js:43:8)\n at IncomingMessage.onend (_stream_readable.js:628:10)\n at Object.onceWrapper (events.js:273:13)\n at IncomingMessage.emit (events.js:187:15)\n at endReadableNT (_stream_readable.js:1094:12)\n at process._tickCallback (internal/process/next_tick.js:63:19)","message":"No body found in response to analytic_event_data."}

No new uncaughtException since.
Despite the fact that the module need to be secured, could it be related to newrelic server ?

@NatalieWolfe
Copy link
Contributor

Hey @3z3qu13l, I hope your holidays went well.

To my knowledge, the connection issue is unique to your case. I checked our incident log (available publicly here: https://status.newrelic.com/) and our maintenances. There were none that aligned with this time window.

I just looked at the support ticket you guys opened too, looks like this was resolved. Is that correct? If it hasn't could you send updated information to that ticket? Our support engineers are much better equipped to help debug these kinds of problems.

@kid-icarus
Copy link

Hi @NatalieWolfe I am also facing this issue running 4.8.1

Here are some agent logs that correlate to the time of the double callback uncaught exception. They are fairly intermittent for us, and it seems to be related to network/tls issues.

image

image 1

@NatalieWolfe
Copy link
Contributor

@kid-icarus Have you tried updating to the latest agent? There was a fairly large refactoring of the collector code after 4.8 which may resolve this.

@kid-icarus
Copy link

Hey @NatalieWolfe, I updated to the latest agent, and will report back if I encounter the issue again. Thanks :)

@NatalieWolfe
Copy link
Contributor

@joeyfromspace @kid-icarus Is this still an issue for either of you with the latest agent? If not, I'll close this PR, if it is please fix this PR so it does not contain a hundred duplicated commits.

@kid-icarus
Copy link

Hey @NatalieWolfe, I just looked through logs, and after the upgrade I can see that those connection errors aren't causing problems anymore.

Thanks!

@joeyfromspace
Copy link
Author

I haven't been able to test ourselves, but will go ahead and close this PR as it's likely going to need to be re-done anyhow if the issue persists due to the volume of changes.

Thanks!

bizob2828 added a commit to bizob2828/node-newrelic that referenced this pull request Apr 19, 2024
bizob2828 added a commit to bizob2828/node-newrelic that referenced this pull request Apr 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants