fix: Resolve logs getting blocked when an excessive payload is encoun… #2013

Merged
merged 2 commits into newrelic:main from issue-1998 on Feb 13, 2024

Conversation

@jsumners-nr (Contributor) commented Feb 12, 2024

This PR resolves #1998.

I have tested this with an updated reproduction script:

'use strict'

const newrelic = require('newrelic')
const crypto = require('node:crypto')
const { setTimeout } = require('node:timers/promises')
const xbytes = require('xbytes')
const pino = require('pino')
const log = pino(pino.destination('/dev/null'))
// const log = pino()

newrelic.startBackgroundTransaction('log-size-exceeded', async () => {
  const tx = newrelic.getTransaction()
  const allowableBytes = crypto.randomBytes(xbytes.parseSize('64KiB'))
  log.info({ bytes: allowableBytes.toString('base64') }, 'allowable log line')

  const tooManyBytes = crypto.randomBytes(xbytes.parseSize('2MiB'))
  log.info({ bytes: tooManyBytes.toString('base64') }, 'log line too big')

  // Internal method to force what _should_ happen:
  tx._transaction.logs.flush(1)
  console.log('waiting for harvest cycle...')
  let waitSecs = 10
  let int = setInterval(() => {
    if (waitSecs === 0) {
      clearInterval(int)
      return
    }
    console.log(waitSecs--)
  }, 1_000)
  await setTimeout(1000 * waitSecs)

  log.info('!!! should still get recorded')


  newrelic.shutdown({ collectPendingData: true }, () => {})
})

The "!!! should still get recorded" log shows up in my dashboard.

lib/collector/api.js (outdated review comment, resolved)
@@ -74,6 +75,51 @@ tap.test('reportSettings', (t) => {
t.end()
})
})

t.test('handles excessive payload sizes without blocking subsequent sends', (t) => {
bizob2828 (Member) commented:

This implicitly tests the issue. Not sure if it's good enough, but here's what I'd do:

  t.test('handles excessive payload sizes without blocking subsequent sends', (t) => {
    const tstamp = 1_707_756_300_000 // 2024-02-12T11:45:00.000-05:00
    function log(data) {
      return JSON.stringify({
        level: 30,
        time: tstamp,
        pid: 17035,
        hostname: 'test-host',
        msg: data
      })
    }

    const kb512 = log(crypto.randomBytes(524_288).toString('base64'))
    const mb1 = log(crypto.randomBytes(1_048_576).toString('base64'))
    const toFind = log('find me')

    let apiHits = 0
    let sends = 0
    nock(URL)
      .persist(true)
      .post(helper.generateCollectorPath('log_event_data', RUN_ID))
      .reply(200, () => {
        apiHits += 1
      })

    agent.logs.add(kb512)
    agent.logs.send()
    agent.logs.add(mb1)
    agent.logs.send()
    agent.logs.add(toFind)
    agent.logs.send()

    agent.logs.on('finished log_event_data data send.', () => {
      sends++
      if (sends === 3) {
        t.equal(apiHits, 2, 'should only transmit batches that are within max limit')
        t.end()
      }
    })
  })

bizob2828 (Member) commented:

You have to call send after every log enqueue because that's when data is harvested. So this asserts that, once all 3 batches have been harvested, only 2 make it to the collector, and implicitly that the queue doesn't back up like it did before your fix.

jsumners-nr (Contributor, author) replied:

But that does not replicate real-world conditions. Even in the reproduction script I added to the original issue, we can see all three logs when we inspect JSON.parse(data)[0].logs at

const useDeflate = this._config.compressed_content_encoding === 'deflate'

If we just drop the whole package, then we are likely dropping logs that would be allowed if they were tried individually. My suspicion at the time of this post is that the issue is deeper than just "is it an exceeded error? drop it".
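
For illustration, a rough sketch of that inspection. The only detail taken from the comment above is that JSON.parse(data)[0].logs holds the batched log lines; everything else is a hypothetical debugging snippet, not the agent's actual code:

// Hypothetical debugging snippet, dropped in just before the
// `useDeflate` line quoted above; `data` is the serialized payload
// string at that point.
const parsed = JSON.parse(data)
// All three log lines from the reproduction script appear in this
// single batched payload, even though the batch as a whole is oversized.
console.log('batched log lines:', parsed[0].logs.length)
for (const line of parsed[0].logs) {
  console.log(line)
}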

bizob2828 (Member) replied:

I understand. This is what our specifications state. Unfortunately, one bad apple will ruin the entire batch in this situation.
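
To make the described behavior concrete, a minimal sketch of dropping an oversized batch so that subsequent harvests are not blocked. The function and field names are assumptions for illustration, not the agent's actual implementation:

// Hypothetical sketch of the spec behavior described above, not the
// agent's actual code: if the serialized batch exceeds the collector's
// maximum payload size, the whole batch is dropped and the next
// harvest cycle proceeds normally.
function harvestBatch(batch, maxPayloadBytes, send) {
  const payload = JSON.stringify(batch)
  if (Buffer.byteLength(payload, 'utf8') > maxPayloadBytes) {
    // One oversized log line ruins the entire batch.
    return { sent: false, droppedLines: batch.length }
  }
  send(payload)
  return { sent: true, droppedLines: 0 }
}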


In lib/collector/errors.js:

'use strict'

const ERR_CODES = {
bizob2828 (Member) commented:

Not sure we need all of this for this ticket. I would think you're only handling the max payload exceeded case.

jsumners-nr (Contributor, author) replied:

I figured it might be a start to centralizing errors and establishing error codes (similar to https://github.com/fastify/fastify/blob/ffbc92c78a588e5ec6f16d20492f23b08654345f/lib/errors.js). But I'm not tied to this. I can make the payload exceeded error unique if you think that would be better.
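
For reference, a rough sketch of the fastify-style centralization being described. The error code string and class name are illustrative assumptions, not the PR's actual lib/collector/errors.js:

// Hypothetical lib/collector/errors.js sketch; identifiers are
// illustrative only.
'use strict'

const ERR_CODES = {
  MAX_PAYLOAD_SIZE_EXCEEDED: 'NR_COLLECTOR_MAX_PAYLOAD_SIZE_EXCEEDED'
}

class MaxPayloadSizeExceededError extends Error {
  constructor(message) {
    super(message)
    this.name = 'MaxPayloadSizeExceededError'
    this.code = ERR_CODES.MAX_PAYLOAD_SIZE_EXCEEDED
  }
}

module.exports = { ERR_CODES, MaxPayloadSizeExceededError }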

bizob2828 (Member) replied:

It may be fine, but I would prefer we keep this PR scoped to fixing the bug, especially since the other codes are errors on construction of a remote method class, which only happens at bootstrap of the agent or when we make a request, and it's all HTTP options that we control.

codecov bot commented Feb 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (b460c7b) 97.07% compared to head (eb23ead) 97.11%.
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2013      +/-   ##
==========================================
+ Coverage   97.07%   97.11%   +0.03%     
==========================================
  Files         225      239      +14     
  Lines       40490    41040     +550     
==========================================
+ Hits        39307    39854     +547     
- Misses       1183     1186       +3     
Flag                     Coverage Δ
integration-tests-16.x   78.70% <14.28%> (-0.12%) ⬇️
integration-tests-18.x   78.68% <14.28%> (-0.43%) ⬇️
integration-tests-20.x   78.68% <14.28%> (-0.43%) ⬇️
unit-tests-16.x          91.08% <100.00%> (+0.02%) ⬆️
unit-tests-18.x          90.97% <100.00%> (-0.06%) ⬇️
unit-tests-20.x          90.97% <100.00%> (-0.06%) ⬇️
versioned-tests-16.x     74.90% <3.57%> (+0.24%) ⬆️
versioned-tests-18.x     74.92% <3.57%> (+0.24%) ⬆️
versioned-tests-20.x     74.93% <3.57%> (+0.24%) ⬆️

Flags with carried forward coverage won't be shown.


@jsumners-nr marked this pull request as ready for review February 13, 2024 15:19
@bizob2828 (Member) left a review:

Overall looks good. Just a few comments around the extra error codes that I'd probably defer to another time, if ever.


lib/collector/errors.js (outdated review comment, resolved)
lib/config/index.js (review comment, resolved)
test/unit/collector/api.test.js (review comment, resolved)
@jsumners-nr merged commit 365ded4 into newrelic:main Feb 13, 2024
23 checks passed
Node.js Engineering Board automation moved this from Needs PR Review to Done: Issues recently completed Feb 13, 2024
@jsumners-nr deleted the issue-1998 branch February 13, 2024 18:24
Labels: None yet
Projects: Node.js Engineering Board (Done: Issues recently completed)
Development

Successfully merging this pull request may close these issues.

New Relic agent not forwarding logs when payload size exceeded
2 participants