fix: blocking behaviour under load #2024

Merged
merged 19 commits into from
Apr 12, 2021

Conversation

trentm
Member

@trentm trentm commented Mar 30, 2021

Update to http client which fixes/improves comms with apm-server.

Checklist

  • Implement code
  • Add tests
  • Update TypeScript typings
  • Update documentation
  • Add CHANGELOG.asciidoc entry
  • Commit message follows commit guidelines

@trentm trentm self-assigned this Mar 30, 2021
@trentm trentm added this to Planned in APM-Agents (OLD) via automation Mar 30, 2021
@github-actions github-actions bot added the agent-nodejs Make available for APM Agents project planning. label Mar 30, 2021
@apmmachine
Collaborator

apmmachine commented Mar 30, 2021

💚 Build Succeeded

Build stats

  • Build Cause: Pull request #2024 updated

  • Start Time: 2021-04-12T18:42:06.384+0000

  • Duration: 17 min 28 sec

  • Commit: 40c29cb

Test stats 🧪

Test Results
Failed 0
Passed 16686
Skipped 0
Total 16686

These arguably *fix* the tests to expect the correct order of
operations: first the APM server should receive the event data
(transaction, span, or error) and then *after that* the callback
(to 'agent.captureError()' or to 'agent.flush()') should be called.
@trentm trentm moved this from Planned to In Progress in APM-Agents (OLD) Apr 5, 2021
@trentm
Member Author

trentm commented Apr 5, 2021

That failing integration test https://apm-ci.elastic.co/blue/organizations/jenkins/apm-integration-tests-selector-mbp/detail/master/15669/pipeline

...
[2021-04-05T22:30:07.129Z] =========================== short test summary info ============================
[2021-04-05T22:30:07.129Z] FAILED tests/agent/test_nodejs.py::test_request_express - AssertionError: Exp...
[2021-04-05T22:30:07.129Z] FAILED tests/agent/test_nodejs.py::test_express_error - AssertionError: Expec...
[2021-04-05T22:30:07.129Z] FAILED tests/agent/test_nodejs.py::test_conc_req_express - AssertionError: qu...
[2021-04-05T22:30:07.129Z] FAILED tests/agent/test_nodejs.py::test_conc_req_node_foobar - AssertionError...
[2021-04-05T22:30:07.129Z] =========== 4 failed, 11281 warnings, 12 rerun in 885.55s (0:14:45) ============
[2021-04-05T22:30:07.129Z] make: *** [Makefile:101: test-agent-nodejs] Error 1
[2021-04-05T22:30:07.707Z] Makefile:134: recipe for target 'dockerized-test' failed
[2021-04-05T22:30:07.707Z] make[1]: *** [dockerized-test] Error 2
[2021-04-05T22:30:07.707Z] make[1]: Leaving directory '/var/lib/jenkins/workspace/ration-tests-selector-mbp_master/src/github.com/elastic/apm-integration-testing'
[2021-04-05T22:30:07.707Z] Makefile:131: recipe for target 'docker-test-agent-nodejs' failed
[2021-04-05T22:30:07.707Z] make: *** [docker-test-agent-nodejs] Error 2
script returned exit code 2

worked for me when trying on my dev machine: https://gist.github.com/trentm/cc20ef0cc4a5d62aa9d8a95fb661df4c

I'm not sure what that failure is. I'll try running it again.

@trentm
Member Author

trentm commented Apr 5, 2021

jenkins run the tests

@trentm
Member Author

trentm commented Apr 6, 2021

I can repro the integration tests failures via:

% cd .../apm-integration-testing
% export BUILD_OPTS="--nodejs-agent-package elastic/apm-agent-nodejs#e7c50f21fa0bf3f709b5ca047781222c411d754d --opbeans-node-agent-branch e7c50f21fa0bf3f709b5ca047781222c411d754d --build-parallel"
% export ELASTIC_STACK_VERSION=8.0.0
% .ci/scripts/agent.sh nodejs nodejs-express

Contributor

@astorm astorm left a comment

A reminder that we'll want to undo #2033 once this one lands.

trentm added a commit to trentm/apm-integration-testing that referenced this pull request Apr 7, 2021
The maxQueueSize and flushInterval vars are settings from v1 of the
node.js agent -- EOL'd more than 2 years ago:
https://www.elastic.co/guide/en/apm/agent/nodejs/current/upgrade-to-v2.html#v2-removed-config-options

Now, elastic/apm-agent-nodejs#2024 is adding a
maxQueueSize config variable with a different meaning. This old
`maxQueueSize: 1` breaks integration tests for the new code by telling
the agent to drop any transactions/span/errors if there is already a
single event that hasn't yet been sent to the APM server.
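
To make the effect of the stale `maxQueueSize: 1` setting concrete, here is a minimal sketch of a bounded queue that drops new events while an earlier one is still unsent. The class and method names (`BoundedEventQueue`, `push`, `drain`) are hypothetical, and this is not the agent's actual implementation; it only assumes the semantics described in the commit message above.

```javascript
// Hypothetical illustration only: not the agent's or the http client's real code.
class BoundedEventQueue {
  constructor (maxQueueSize) {
    this.maxQueueSize = maxQueueSize
    this.events = []
    this.dropped = 0
  }

  // Returns true if the event was queued, false if it was dropped because
  // the queue already holds maxQueueSize unsent events.
  push (event) {
    if (this.events.length >= this.maxQueueSize) {
      this.dropped++
      return false
    }
    this.events.push(event)
    return true
  }

  // Simulates the queued events having been sent to the APM server.
  drain () {
    const sent = this.events
    this.events = []
    return sent
  }
}

const queue = new BoundedEventQueue(1)
console.log(queue.push({ transaction: { name: 'GET /' } })) // true: queued
console.log(queue.push({ span: { name: 'SELECT' } })) // false: one event is still unsent
console.log(queue.dropped) // 1
```

With a limit of 1, any burst of events (for example a transaction and its spans) loses everything after the first event until that event has been sent, which would be consistent with the integration test failures seen here.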
trentm added a commit to elastic/apm-integration-testing that referenced this pull request Apr 7, 2021
…ent (#1104)

@trentm
Member Author

trentm commented Apr 7, 2021

jenkins run the tests

@trentm
Member Author

trentm commented Apr 7, 2021

I can repro the current Integration Tests failure locally via:

python3 scripts/compose.py start 8.0.0 --nodejs-agent-package elastic/apm-agent-nodejs#73cf4d3dd302cc9ec10945be59b5b1d859642081 --opbeans-node-agent-branch 73cf4d3dd302cc9ec10945be59b5b1d859642081 --build-parallel   --with-agent-nodejs-express   --with-opbeans-node   --no-apm-server-dashboards   --no-apm-server-self-instrument   --apm-server-agent-config-poll=1s   --force-build --no-xpack-secure

FWIW. Ah:

% docker logs bfe80e48a8e1
npm ERR! code 1
npm ERR! Command failed: git checkout trentm/call-me-back-maybe
npm ERR! error: pathspec 'trentm/call-me-back-maybe' did not match any file(s) known to git.
npm ERR!

npm ERR! A complete log of this run can be found in:
npm ERR!     /root/.npm/_logs/2021-04-07T23_09_59_770Z-debug.log

Need to update package.json now that elastic/apm-nodejs-http-client#151 has been merged.

@trentm
Member Author

trentm commented Apr 8, 2021

jenkins run the tests

Member Author

@trentm trentm left a comment

Reviewer notes.

@@ -15,7 +15,7 @@
 "coverage:report": "nyc report --reporter=lcov",
 "test": "./test/script/run_tests.sh",
 "test:cli": "node test/script/cli.js",
-"test:deps": "dependency-check *.js 'lib/**/*.js' 'test/**/*.js' --no-dev -i async_hooks -i perf_hooks -i parseurl",
+"test:deps": "dependency-check start.js index.js 'lib/**/*.js' 'test/**/*.js' --no-dev -i async_hooks -i perf_hooks -i parseurl",
 "test:tav": "tav --quiet",
Member Author

Reviewer note: I like to have "foo.js" and "play.js" files (etc.) in my working copy. It is unhelpful when those break npm test because they use some dev/test dependency, e.g.:

% npx dependency-check *.js 'lib/**/*.js' 'test/**/*.js' --no-dev -i async_hooks -i perf_hooks -i parseurl
Fail! Dependencies not listed in package.json: blocked-at

where "blocked-at" is only being used here for dev/testing (it was installed via npm install --no-save blocked-at).

I could move to a separate PR if you prefer.

Contributor

👍 I've run into that myself more than a few times. This is fine here.

self.on('close', () => {
res.end()
})

Member Author

Reviewer note: we shouldn't only end the response from the APM server when we server.close(). Below we properly respond.

-t.strictEqual(data.name, 'foo')
-t.end()
+// 1. The APM server should receive the transaction, and then ...
+t.strictEqual(data.name, 'foo', 'APM server received the "foo" transaction')
 })
Member Author

Reviewer note: All of the test fixes are of this form. Now that the http client actually waits for a response from the APM server before concluding, the callback to .flush(cb) or to .captureError(cb) is reliably called after the APM server receives the data (i.e. the "data-transaction" event from the mock APM server here).
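
The ordering described in this note can be sketched with plain Node.js `http`. Everything here is an assumption made for illustration: `startMockServer`, `sendAndWait`, and `runOrderDemo` are hypothetical names, not the agent's, the http client's, or the test suite's real code. The mock server records when it has received the full request body and only then responds with 202; the client calls its callback only after reading that response.

```javascript
const http = require('http')

// Hypothetical stand-in for a mock APM server: it notes when the request
// body has fully arrived and only then responds with 202.
function startMockServer (order) {
  return new Promise((resolve) => {
    const server = http.createServer((req, res) => {
      req.resume() // discard the body; only its end matters here
      req.on('end', () => {
        order.push('server-received')
        res.writeHead(202)
        res.end()
      })
    })
    server.listen(0, '127.0.0.1', () => resolve(server))
  })
}

// Hypothetical stand-in for flush(cb): the callback fires once the server's
// response has been fully read, not when the request has merely been written.
function sendAndWait (port, body, cb) {
  const req = http.request({
    host: '127.0.0.1',
    port,
    method: 'POST',
    path: '/intake/v2/events',
    agent: false // one-off connection so the server can close promptly
  }, (res) => {
    res.resume()
    res.on('end', () => cb(null, res.statusCode))
  })
  req.on('error', cb)
  req.end(body)
}

function runOrderDemo () {
  const order = []
  return startMockServer(order).then((server) => new Promise((resolve, reject) => {
    sendAndWait(server.address().port, '{"transaction":{"name":"foo"}}\n', (err) => {
      order.push('callback-called')
      server.close()
      if (err) reject(err)
      else resolve(order)
    })
  }))
}

runOrderDemo().then((order) => {
  console.log(order.join(' -> ')) // server-received -> callback-called
})
```

Because the callback depends on the response, and the response is only written after the body has been received, a test can safely assert on the received data first and end the test in the callback.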

@trentm trentm marked this pull request as ready for review April 8, 2021 19:32
@trentm trentm requested a review from astorm April 8, 2021 19:32
Contributor

@astorm astorm left a comment

New docs and changelog look good/accurate, changes to the tests look required/reasonable.

I just have one non-question about the 202 status code in _apm_server.js -- but it's non-blocking. Approving.

@@ -78,6 +75,10 @@ function APMServer (agentOpts, mockOpts = { expect: [] }) {

index++
})
parsedStream.on('end', function () {
Contributor

Question: Why end with a 202? Is this just mimicking APM server's HTTP response code, or is there more going on here?

Member Author

Yes, just mimicking APM server's HTTP response code.

Some background: In earlier work on the makeInjestRequest() rewrite in the http client, I initially accepted only a "202" response from APM server. However, I ended up relaxing that to accept any "2xx" response, because we have test suites using mock APM servers (like this one) that just call res.end(), which defaults to a "200" response.
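
As a rough sketch of the difference between the strict and the relaxed check, assuming hypothetical helper names (`isStrictlyAccepted` and `isAccepted` are made up here and are not the http client's real functions):

```javascript
// Hypothetical helpers for illustration; these names are not from the http client.

// Strict variant: only the 202 that APM server itself responds with.
function isStrictlyAccepted (statusCode) {
  return statusCode === 202
}

// Relaxed variant: any 2xx counts as success, so a mock server that just
// calls res.end() (an implicit 200) is still treated as having accepted the data.
function isAccepted (statusCode) {
  return statusCode >= 200 && statusCode < 300
}

console.log(isStrictlyAccepted(200), isAccepted(200)) // false true
console.log(isStrictlyAccepted(202), isAccepted(202)) // true true
console.log(isAccepted(503)) // false
```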

@trentm trentm merged commit 842be71 into master Apr 12, 2021
APM-Agents (OLD) automation moved this from In Progress to Done Apr 12, 2021
@trentm trentm deleted the trentm/blocking-behavior branch April 12, 2021 19:47
trentm added a commit to trentm/apm-integration-testing that referenced this pull request Apr 28, 2021
…ent (elastic#1104)

trentm added a commit to elastic/apm-integration-testing that referenced this pull request Apr 28, 2021
…ent (#1104) (#1125)

trentm added a commit to elastic/apm-integration-testing that referenced this pull request Apr 28, 2021
…ent (#1104) (#1126)
