fix: blocking behaviour under load #2024

trentm · 2021-03-30T15:15:31Z

Update to http client which fixes/improves comms with apm-server.

apmmachine · 2021-03-30T15:26:05Z

💚 Build Succeeded

Build stats

Build Cause: Pull request #2024 updated
Start Time: 2021-04-12T18:42:06.384+0000
Duration: 17 min 28 sec
Commit: 40c29cb

Test stats 🧪

Test	Results
Failed	0
Passed	16686
Skipped	0
Total	16686

These arguably *fix* the tests to expect the correct order of operations: first the APM server should receive the event data (transaction, span, or error) and then *after that* the callback (to 'agent.captureError()' or to 'agent.flush()') should be called.

…ul of 'npm test'

…changelog

trentm · 2021-04-05T23:01:14Z

That failing integration test https://apm-ci.elastic.co/blue/organizations/jenkins/apm-integration-tests-selector-mbp/detail/master/15669/pipeline

...
[2021-04-05T22:30:07.129Z] =========================== short test summary info ============================
[2021-04-05T22:30:07.129Z] FAILED tests/agent/test_nodejs.py::test_request_express - AssertionError: Exp...
[2021-04-05T22:30:07.129Z] FAILED tests/agent/test_nodejs.py::test_express_error - AssertionError: Expec...
[2021-04-05T22:30:07.129Z] FAILED tests/agent/test_nodejs.py::test_conc_req_express - AssertionError: qu...
[2021-04-05T22:30:07.129Z] FAILED tests/agent/test_nodejs.py::test_conc_req_node_foobar - AssertionError...
[2021-04-05T22:30:07.129Z] =========== 4 failed, 11281 warnings, 12 rerun in 885.55s (0:14:45) ============
[2021-04-05T22:30:07.129Z] make: *** [Makefile:101: test-agent-nodejs] Error 1
[2021-04-05T22:30:07.707Z] Makefile:134: recipe for target 'dockerized-test' failed
[2021-04-05T22:30:07.707Z] make[1]: *** [dockerized-test] Error 2
[2021-04-05T22:30:07.707Z] make[1]: Leaving directory '/var/lib/jenkins/workspace/ration-tests-selector-mbp_master/src/github.com/elastic/apm-integration-testing'
[2021-04-05T22:30:07.707Z] Makefile:131: recipe for target 'docker-test-agent-nodejs' failed
[2021-04-05T22:30:07.707Z] make: *** [docker-test-agent-nodejs] Error 2
script returned exit code 2

worked for me when trying on my dev machine: https://gist.github.com/trentm/cc20ef0cc4a5d62aa9d8a95fb661df4c

I'm not sure what that failure is. I'll try running it again.

trentm · 2021-04-05T23:04:12Z

jenkins run the tests

trentm · 2021-04-06T01:33:38Z

I can repro the integration tests failures via:

% cd .../apm-integration-testing
% export BUILD_OPTS="--nodejs-agent-package elastic/apm-agent-nodejs#e7c50f21fa0bf3f709b5ca047781222c411d754d --opbeans-node-agent-branch e7c50f21fa0bf3f709b5ca047781222c411d754d --build-parallel"
% export ELASTIC_STACK_VERSION=8.0.0
% .ci/scripts/agent.sh nodejs nodejs-express

astorm

A reminder that we'll want to undo #2033 once this one lands.

The maxQueueSize and flushInterval vars are settings from v1 of the node.js agent -- EOL'd more than 2 years ago: https://www.elastic.co/guide/en/apm/agent/nodejs/current/upgrade-to-v2.html#v2-removed-config-options

Now, elastic/apm-agent-nodejs#2024 is adding a maxQueueSize config variable with a different meaning. This old `maxQueueSize: 1` breaks integration tests for the new code by telling the agent to drop any transactions/spans/errors if there is already a single event that hasn't yet been sent to the APM server.
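To illustrate why a stale `maxQueueSize: 1` drops events, here is a minimal sketch of a bounded queue. The `BoundedEventQueue` class and its methods are hypothetical and are not the agent's or the http client's actual implementation. The sketch only assumes the meaning described above: an event is dropped when the number of not-yet-sent events has already reached the limit.

```javascript
// Hypothetical illustration only; not the agent's or http client's real code.
class BoundedEventQueue {
  constructor (maxQueueSize) {
    this.maxQueueSize = maxQueueSize
    this.buffered = [] // events not yet sent to APM server
    this.dropped = 0 // count of events discarded because the queue was full
  }

  // Returns true if the event was buffered, false if it was dropped.
  enqueue (event) {
    if (this.buffered.length >= this.maxQueueSize) {
      this.dropped++
      return false
    }
    this.buffered.push(event)
    return true
  }

  // Simulates the buffered events having been sent to APM server.
  drain () {
    const sent = this.buffered
    this.buffered = []
    return sent
  }
}

// With a limit of 1, only the first event survives until the next drain.
const queue = new BoundedEventQueue(1)
queue.enqueue({ type: 'transaction', name: 'GET /' })
queue.enqueue({ type: 'span', name: 'SELECT' })
queue.enqueue({ type: 'error', message: 'boom' })
console.log(queue.buffered.length, queue.dropped)
```

With a larger limit, the same three events would all be buffered, which is why the old setting had to be removed from the integration tests.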

trentm · 2021-04-07T22:56:09Z

jenkins run the tests

trentm · 2021-04-07T23:13:13Z

I can repro the current Integration Tests failure locally via:

python3 scripts/compose.py start 8.0.0 --nodejs-agent-package elastic/apm-agent-nodejs#73cf4d3dd302cc9ec10945be59b5b1d859642081 --opbeans-node-agent-branch 73cf4d3dd302cc9ec10945be59b5b1d859642081 --build-parallel   --with-agent-nodejs-express   --with-opbeans-node   --no-apm-server-dashboards   --no-apm-server-self-instrument   --apm-server-agent-config-poll=1s   --force-build --no-xpack-secure

FWIW. Ah:

% docker logs bfe80e48a8e1
npm ERR! code 1
npm ERR! Command failed: git checkout trentm/call-me-back-maybe
npm ERR! error: pathspec 'trentm/call-me-back-maybe' did not match any file(s) known to git.
npm ERR!

npm ERR! A complete log of this run can be found in:
npm ERR!     /root/.npm/_logs/2021-04-07T23_09_59_770Z-debug.log

Need to update package.json now that elastic/apm-nodejs-http-client#151 has been merged.

trentm · 2021-04-08T18:41:45Z

jenkins run the tests

trentm

Reviewer notes.

trentm · 2021-04-08T19:18:41Z

package.json

@@ -15,7 +15,7 @@
    "coverage:report": "nyc report --reporter=lcov",
    "test": "./test/script/run_tests.sh",
    "test:cli": "node test/script/cli.js",
-    "test:deps": "dependency-check *.js 'lib/**/*.js' 'test/**/*.js' --no-dev -i async_hooks -i perf_hooks -i parseurl",
+    "test:deps": "dependency-check start.js index.js 'lib/**/*.js' 'test/**/*.js' --no-dev -i async_hooks -i perf_hooks -i parseurl",
    "test:tav": "tav --quiet",


Reviewer note: I like to have "foo.js" and "play.js" files (etc.) in my working copy. It is unhelpful when those break npm test because they use some dev/test dependency, e.g.:

% npx dependency-check *.js 'lib/**/*.js' 'test/**/*.js' --no-dev -i async_hooks -i perf_hooks -i parseurl
Fail! Dependencies not listed in package.json: blocked-at

where "blocked-at" is only being used here for dev/testing (it was installed via `npm install --no-save blocked-at`).

I could move to a separate PR if you prefer.

trentm · 2021-04-08T19:20:07Z

test/_apm_server.js

-      self.on('close', () => {
-        res.end()
-      })
-


Reviewer note: the mock APM server shouldn't end its response only when we call server.close(). Below, it responds properly.

trentm · 2021-04-08T19:22:00Z

test/agent.js

-        t.strictEqual(data.name, 'foo')
-        t.end()
+        // 1. The APM server should receive the transaction, and then ...
+        t.strictEqual(data.name, 'foo', 'APM server received the "foo" transaction')
      })


Reviewer note: All of the test fixes are of this form. Now that the http client actually waits for a response from the APM server before concluding, the callback to .flush(cb) or to .captureError(cb) is reliably called after the APM server receives the data (i.e. the "data-transaction" event from the mock APM server here).
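The ordering the updated tests assert can be sketched with a toy, synchronous stand-in. Everything here is hypothetical: `mockApmServer` is a plain `EventEmitter` rather than the test suite's mock server, and `flush()` is a simplified placeholder for `agent.flush(cb)`. Only the "data-transaction" event name is taken from the note above.

```javascript
const { EventEmitter } = require('events')

// Hypothetical stand-in for the mock APM server used by the tests.
const mockApmServer = new EventEmitter()

// Records what happened, in order.
const order = []

mockApmServer.on('data-transaction', (data) => {
  // 1. The APM server receives the transaction, and then ...
  order.push(`server received "${data.name}"`)
})

// Hypothetical stand-in for agent.flush(cb): the callback runs only after
// the (simulated) server has received every pending event.
function flush (pending, cb) {
  for (const event of pending) {
    mockApmServer.emit('data-transaction', event) // emit() is synchronous
  }
  cb()
}

flush([{ name: 'foo' }], () => {
  // 2. ... the flush callback is called.
  order.push('flush callback called')
})

console.log(order)
```

In the real tests the two steps are separated by an HTTP round trip, so the test should end in the callback rather than in the server's data handler.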

astorm

New docs and changelog look good/accurate, changes to the tests look required/reasonable.

I just have one non-question about the 202 status code in _apm_server.js -- but it's non-blocking. Approving.

astorm · 2021-04-12T15:48:04Z

package.json


👍 I've run into that myself more than a few times. This is fine here.

astorm · 2021-04-12T16:08:02Z

test/_apm_server.js

@@ -78,6 +75,10 @@ function APMServer (agentOpts, mockOpts = { expect: [] }) {

        index++
      })
+      parsedStream.on('end', function () {


Question: Why end with a 202? Is this just mimicking APM server's HTTP response code, or is there more going on here?

Yes, just mimicking APM server's HTTP response code.

Some background: In earlier work on the makeInjestRequest() rewrite in the http client, I had initially accepted only a "202" response from APM server. However, I ended up relaxing that to accept any "2xx" response because we have test suites using mock APM servers (like this one) that just call res.end(), which defaults to a "200" response.
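As a rough sketch of the two ideas in this exchange: a mock intake handler that replies with 202 once the request body has ended, and a client-side check that accepts any 2xx status. Both function names are assumptions made for illustration; neither is taken from the http client or from test/_apm_server.js.

```javascript
// Hypothetical client-side check: treat any 2xx status as success, so mock
// servers that simply call res.end() (status 200) are accepted as well.
function isSuccessfulIntakeStatus (statusCode) {
  return Number.isInteger(statusCode) && statusCode >= 200 && statusCode < 300
}

// Hypothetical mock intake handler: respond when the request body ends,
// instead of waiting for the server to be closed.
function handleIntakeRequest (req, res) {
  req.resume() // discard the body; a real mock would parse the events here
  req.on('end', () => {
    res.writeHead(202) // mimic APM server's "Accepted" response
    res.end()
  })
}

console.log(isSuccessfulIntakeStatus(202), isSuccessfulIntakeStatus(200))
```

The handler could be passed to Node's `http.createServer(handleIntakeRequest)` to try it out locally.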

fix: blocking behaviour under load

60a2961

Update to http client which fixes/improves comms with apm-server.

trentm self-assigned this Mar 30, 2021

trentm added this to Planned in APM-Agents (OLD) via automation Mar 30, 2021

github-actions bot added the agent-nodejs Make available for APM Agents project planning. label Mar 30, 2021

trentm added 4 commits March 30, 2021 09:59

complete sentence nit

0700fd3

would be nice to be able to have foo.js at top-level and not fall afo…

6a8aa38

…ul of 'npm test'

Merge branch 'master' into trentm/blocking-behavior

5b80295

This was referenced Mar 31, 2021

fix: workaround to a git <2.7 issue #2025

Merged

fix: blocking behaviour under load elastic/apm-nodejs-http-client#144

Merged

trentm added 4 commits April 5, 2021 14:38

use the published http client version; maxQueueSize config var; docs/…

e1a9fb1

…changelog

typescript for new config var

46caeb6

docs: maxQueueSize is relevant to perf tuning the agent

35c4a79

add basic config test for maxQueueSize

e7c50f2

trentm moved this from Planned to In Progress in APM-Agents (OLD) Apr 5, 2021

trentm mentioned this pull request Apr 6, 2021

elastic-apm-http-client@9.7.0 is currently breaking tests #2032

Closed

astorm suggested changes Apr 6, 2021

trentm added 2 commits April 7, 2021 08:01

Merge branch 'master' into trentm/blocking-behavior

daa276e

temporarily get back on a working branch of the http client

3554f05

trentm mentioned this pull request Apr 7, 2021

fix: drop ancient and conflicting maxQueueSize setting for node.js agent elastic/apm-integration-testing#1104

Merged

paranoid changelog note about conflict with very old maxQueueSize

73cf4d3

trentm mentioned this pull request Apr 7, 2021

9.7.1 elastic/apm-nodejs-http-client#152

Merged

trentm added 4 commits April 7, 2021 16:15

update dep now that elastic/apm-nodejs-http-client#151 is merged

67a97d0

Merge branch 'master' into trentm/blocking-behavior

9488a93

fix up changelog (SQS and this PR's work are 'Unreleased'

4f909c2

try this

9f3c77f

trentm mentioned this pull request Apr 8, 2021

".ci/scripts/opbeans-app.sh nodejs nodejs-express node" fails if the given node.js agent commit needs git to "npm install" elastic/apm-integration-testing#1106

Closed

trentm added 2 commits April 8, 2021 12:01

Merge branch 'master' into trentm/blocking-behavior

8304872

switch to an actual client release

e59048e

trentm commented Apr 8, 2021

trentm marked this pull request as ready for review April 8, 2021 19:32

trentm requested a review from astorm April 8, 2021 19:32

astorm approved these changes Apr 12, 2021

Merge branch 'master' into trentm/blocking-behavior

40c29cb

trentm merged commit 842be71 into master Apr 12, 2021

APM-Agents (OLD) automation moved this from In Progress to Done Apr 12, 2021

trentm deleted the trentm/blocking-behavior branch April 12, 2021 19:47

trentm mentioned this pull request Apr 20, 2021

High CPU load when APM-SERVER is down #1984

Closed

1 task
