New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Resolve logs getting blocked when an excessive payload is encoun… #2013
Conversation
853a6a7
to
d2c4eee
Compare
@@ -74,6 +75,51 @@ tap.test('reportSettings', (t) => { | |||
t.end() | |||
}) | |||
}) | |||
|
|||
t.test('handles excessive payload sizes without blocking subsequent sends', (t) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this implicitly tests the issue. not sure if it's good enough but here's what i'd do
t.test('handles excessive payload sizes without blocking subsequent sends', (t) => {
const tstamp = 1_707_756_300_000 // 2024-02-12T11:45:00.000-05:00
function log(data) {
return JSON.stringify({
level: 30,
time: tstamp,
pid: 17035,
hostname: 'test-host',
msg: data
})
}
const kb512 = log(crypto.randomBytes(524_288).toString('base64'))
const mb1 = log(crypto.randomBytes(1_048_576).toString('base64'))
const toFind = log('find me')
let apiHits = 0
let sends = 0
nock(URL)
.persist(true)
.post(helper.generateCollectorPath('log_event_data', RUN_ID))
.reply(200, () => {
apiHits += 1
})
agent.logs.add(kb512)
agent.logs.send()
agent.logs.add(mb1)
agent.logs.send()
agent.logs.add(toFind)
agent.logs.send()
agent.logs.on('finished log_event_data data send.', (t) => {
sends++
if (sends === 3) {
t.equal(apiHits, 2, 'should only transmit batches that are within max limit')
}
})
})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you have to call send after every log enqueue because that's when data is harvested. so this asserts once all 3 batches have been harvested that only 2 make it to the collector and implicitly doesn't back it up like it did before your fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But that does not replicate real world conditions. Even in the reproduction script I added to the original issue, we can see all three logs when we JSON.parse(data)[0].logs
at
node-newrelic/lib/collector/remote-method.js
Line 160 in d2c4eee
const useDeflate = this._config.compressed_content_encoding === 'deflate' |
If we just drop the whole package then we are likely dropping logs that would be allowed if they were tried individually. My suspicion at the time of this post is that the issue is deeper than just "is it an exceeded error? drop it".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i understand. this is what our specifications state. Unfortunately one bad apple will ruin the entire batch in this situation.
lib/collector/errors.js
Outdated
|
||
'use strict' | ||
|
||
const ERR_CODES = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure we need all of this for this ticket. i would think you're only handling the max payload exceeded
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured it might be a start to centralizing errors and establishing error codes (similar to https://github.com/fastify/fastify/blob/ffbc92c78a588e5ec6f16d20492f23b08654345f/lib/errors.js). But I'm not tied to this. I can make the payload exceeded error unique if you think that would be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it may be fine but would prefer we keep this PR scoped to fixing the bug. especially since the other codes are errors on construction of a remote method class which only happens at boostrap of agent or when we make a request but it's all http options that we are controlling
d2c4eee
to
a17042a
Compare
a17042a
to
6eb0dea
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2013 +/- ##
==========================================
+ Coverage 97.07% 97.11% +0.03%
==========================================
Files 225 239 +14
Lines 40490 41040 +550
==========================================
+ Hits 39307 39854 +547
- Misses 1183 1186 +3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall looks good. just a few comments around the extra error codes that i'd prob defer to another time, if ever
lib/collector/errors.js
Outdated
|
||
'use strict' | ||
|
||
const ERR_CODES = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it may be fine but would prefer we keep this PR scoped to fixing the bug. especially since the other codes are errors on construction of a remote method class which only happens at boostrap of agent or when we make a request but it's all http options that we are controlling
@@ -74,6 +75,51 @@ tap.test('reportSettings', (t) => { | |||
t.end() | |||
}) | |||
}) | |||
|
|||
t.test('handles excessive payload sizes without blocking subsequent sends', (t) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i understand. this is what our specifications state. Unfortunately one bad apple will ruin the entire batch in this situation.
This PR resolves #1998.
I have tested this with an updated reproduction script:
The
!!! should still get recorded
log shows up in my dashboard.