-
Notifications
You must be signed in to change notification settings - Fork 223
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: blocking behaviour under load #2024
Conversation
Update to http client which fixes/improves comms with apm-server.
💚 Build Succeeded
Expand to view the summary
Build stats
Test stats 🧪
Trends 🧪 |
These arguably *fix* the tests to expect the correct order of operations: first the APM server should receive the event data (transaction, span, or error) and then *after that* the callback (to 'agent.captureError()' or to 'agent.flush()') should be called.
That failing integration test https://apm-ci.elastic.co/blue/organizations/jenkins/apm-integration-tests-selector-mbp/detail/master/15669/pipeline
worked for me when trying on my dev machine: https://gist.github.com/trentm/cc20ef0cc4a5d62aa9d8a95fb661df4c I'm not sure what that failure is. I'll try running it again. |
jenkins run the tests |
I can repro the integration tests failures via:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A reminder that we'll want to undo #2033 once this one lands.
The maxQueueSize and flushInterval vars are settings from v1 of the node.js agent -- EOL'd more than 2 years ago: https://www.elastic.co/guide/en/apm/agent/nodejs/current/upgrade-to-v2.html#v2-removed-config-options Now, elastic/apm-agent-nodejs#2024 is adding a maxQueueSize config variable with a different meaning. This old `maxQueueSize: 1` breaks integration tests for the new code by telling the agent to drop any transactions/span/errors if there is already a single event that hasn't yet been sent to the APM server.
…ent (#1104) The maxQueueSize and flushInterval vars are settings from v1 of the node.js agent -- EOL'd more than 2 years ago: https://www.elastic.co/guide/en/apm/agent/nodejs/current/upgrade-to-v2.html#v2-removed-config-options Now, elastic/apm-agent-nodejs#2024 is adding a maxQueueSize config variable with a different meaning. This old `maxQueueSize: 1` breaks integration tests for the new code by telling the agent to drop any transactions/span/errors if there is already a single event that hasn't yet been sent to the APM server.
jenkins run the tests |
I can repro the current Integration Tests failure locally via:
FWIW. Ah:
Need to update package.json now that elastic/apm-nodejs-http-client#151 has been merged. |
jenkins run the tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewer notes.
@@ -15,7 +15,7 @@ | |||
"coverage:report": "nyc report --reporter=lcov", | |||
"test": "./test/script/run_tests.sh", | |||
"test:cli": "node test/script/cli.js", | |||
"test:deps": "dependency-check *.js 'lib/**/*.js' 'test/**/*.js' --no-dev -i async_hooks -i perf_hooks -i parseurl", | |||
"test:deps": "dependency-check start.js index.js 'lib/**/*.js' 'test/**/*.js' --no-dev -i async_hooks -i perf_hooks -i parseurl", | |||
"test:tav": "tav --quiet", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewer note: I like to have "foo.js" and "play.js" files (etc.) in my working copy. It is unhelpful when those break npm test
because they use some dev/test dependency, e.g.:
% npx dependency-check *.js 'lib/**/*.js' 'test/**/*.js' --no-dev -i async_hooks -i perf_hooks -i parseurl
Fail! Dependencies not listed in package.json: blocked-at
where "blocked-at" is only being used here for dev/testing (it was installed via npm install --no-save blocked-at
.
I could move to a separate PR if you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I've run into that myself more than a few times. This is fine here.
self.on('close', () => { | ||
res.end() | ||
}) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewer note: we shouldn't only end the response from the APM server when we server.close(). Below we properly respond.
t.strictEqual(data.name, 'foo') | ||
t.end() | ||
// 1. The APM server should receive the transaction, and then ... | ||
t.strictEqual(data.name, 'foo', 'APM server received the "foo" transaction') | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewer note: All of the test fixes are of this form. Now that the http client actually waits for a response from the APM server before concluding, the callback to .flush(cb)
or to .captureError(cb)
is reliably called after the APM server receives the data (i.e. the "data-transaction" event from the mock APM server here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New docs and changelog look good/accurate, changes to the tests look required/reasonable.
I just have one non-question about the 202
status code in _apm_server.js
-- but it's non-blocking. Approving.
@@ -15,7 +15,7 @@ | |||
"coverage:report": "nyc report --reporter=lcov", | |||
"test": "./test/script/run_tests.sh", | |||
"test:cli": "node test/script/cli.js", | |||
"test:deps": "dependency-check *.js 'lib/**/*.js' 'test/**/*.js' --no-dev -i async_hooks -i perf_hooks -i parseurl", | |||
"test:deps": "dependency-check start.js index.js 'lib/**/*.js' 'test/**/*.js' --no-dev -i async_hooks -i perf_hooks -i parseurl", | |||
"test:tav": "tav --quiet", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I've run into that myself more than a few times. This is fine here.
@@ -78,6 +75,10 @@ function APMServer (agentOpts, mockOpts = { expect: [] }) { | |||
|
|||
index++ | |||
}) | |||
parsedStream.on('end', function () { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Why end with a 202? Is this just mimicking APM server's HTTP response code, or is there more going on here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, just mimicking APM server's HTTP response code.
Some background: In earlier work of the makeInjestRequest()
re-write in the http client I had initially only accepted a "202" response from APM server. However, I ended up relaxing that to accepting any "2xx" response because we have test suites using mock APM servers (like this one) that just res.end()
defaulting to a "200" response.
…ent (elastic#1104) The maxQueueSize and flushInterval vars are settings from v1 of the node.js agent -- EOL'd more than 2 years ago: https://www.elastic.co/guide/en/apm/agent/nodejs/current/upgrade-to-v2.html#v2-removed-config-options Now, elastic/apm-agent-nodejs#2024 is adding a maxQueueSize config variable with a different meaning. This old `maxQueueSize: 1` breaks integration tests for the new code by telling the agent to drop any transactions/span/errors if there is already a single event that hasn't yet been sent to the APM server.
…ent (elastic#1104) The maxQueueSize and flushInterval vars are settings from v1 of the node.js agent -- EOL'd more than 2 years ago: https://www.elastic.co/guide/en/apm/agent/nodejs/current/upgrade-to-v2.html#v2-removed-config-options Now, elastic/apm-agent-nodejs#2024 is adding a maxQueueSize config variable with a different meaning. This old `maxQueueSize: 1` breaks integration tests for the new code by telling the agent to drop any transactions/span/errors if there is already a single event that hasn't yet been sent to the APM server.
…ent (#1104) (#1125) The maxQueueSize and flushInterval vars are settings from v1 of the node.js agent -- EOL'd more than 2 years ago: https://www.elastic.co/guide/en/apm/agent/nodejs/current/upgrade-to-v2.html#v2-removed-config-options Now, elastic/apm-agent-nodejs#2024 is adding a maxQueueSize config variable with a different meaning. This old `maxQueueSize: 1` breaks integration tests for the new code by telling the agent to drop any transactions/span/errors if there is already a single event that hasn't yet been sent to the APM server.
…ent (#1104) (#1126) The maxQueueSize and flushInterval vars are settings from v1 of the node.js agent -- EOL'd more than 2 years ago: https://www.elastic.co/guide/en/apm/agent/nodejs/current/upgrade-to-v2.html#v2-removed-config-options Now, elastic/apm-agent-nodejs#2024 is adding a maxQueueSize config variable with a different meaning. This old `maxQueueSize: 1` breaks integration tests for the new code by telling the agent to drop any transactions/span/errors if there is already a single event that hasn't yet been sent to the APM server.
Update to http client which fixes/improves comms with apm-server.
Checklist