Skip to content

fix: GetDataStream must extend Writable, not legacy Stream#19

Merged
msimerson merged 1 commit intoharaka:masterfrom
bjarn:fix/get-data-spool-hang
Apr 8, 2026
Merged

fix: GetDataStream must extend Writable, not legacy Stream#19
msimerson merged 1 commit intoharaka:masterfrom
bjarn:fix/get-data-spool-hang

Conversation

@bjarn
Copy link
Copy Markdown
Contributor

@bjarn bjarn commented Apr 8, 2026

fixes #18

Summary

MessageStream.get_data(cb) hangs forever on spooled messages when the constructor is passed a non-empty headers array. No 'error' event fires; the only symptom is that cb is never invoked, so any plugin reading the full body from data_post hits Haraka's plugin_timeout and DENYSOFTs the message.

Test coverage

Added one regression test: get_data() returns full body for spooled messages with ctor headers. It:

  1. Creates a MessageStream with spool_after = 5 MB and a non-empty headers array (mimicking Haraka transactions).
  2. Writes ~6 MB of body — large enough to spool and to fill internal highWaterMarks.
  3. Calls get_data(cb) with a 3s timeout.
  4. Asserts the callback fires with the full body.

Without this fix, the test times out after 3s with get_data callback never fired. With the fix it completes in ~70 ms.

Minimal repro

const MessageStream = require('haraka-message-stream')

const ms = new MessageStream(
  { main: { spool_after: 5 * 1024 * 1024, spool_dir: '/tmp' } },
  'repro',
  ['From: a@example.com\r\n'], // ctor headers is the trigger
)

ms.add_line('From: a@example.com\r\n')
ms.add_line('\r\n')
for (let i = 0; i < 100000; i++) ms.add_line('A'.repeat(76) + '\r\n')

ms.add_line_end(() => {
  console.log('spooling=', ms.spooling) // true
  const timeout = setTimeout(() => {
    console.log('❌ get_data never fired')
    process.exit(2)
  }, 5000)
  ms.get_data((buf) => {
    clearTimeout(timeout)
    console.log('✅ get_data fired, bytes=', buf.length)
    process.exit(0)
  })
})

Without the fix: prints ❌ get_data never fired after 5 s.
With the fix: prints ✅ get_data fired, bytes=7600000 after ~40 ms.

This comment was marked as resolved.

@msimerson msimerson changed the title fix: GetDataStream must extend Writable, not legacy Stream. Fixes #18 fix: GetDataStream must extend Writable, not legacy Stream Apr 8, 2026
@msimerson msimerson merged commit 612ca5b into haraka:master Apr 8, 2026
22 of 23 checks passed
@msimerson msimerson mentioned this pull request Apr 8, 2026
@msimerson
Copy link
Copy Markdown
Member

msimerson commented Apr 8, 2026

Today I remembered why I didn't update to the new Writable yet: node v20 is still under LTS until 2026-04-18. Yep, 10 more days before it goes EOL. This PR surfaces a race condition in node v20. Fortunately, the CI tests caught it (🎉 for good tests!) and I'll have it fixed soon. Fun fun.

msimerson added a commit that referenced this pull request Apr 8, 2026
- fix: GetDataStream must extend Writable, not legacy Stream #19
- fix: limit header size to prevent memory exhaustion
- fix: limit boundaries to prevent memory exhaustion
- fix: sanitize the spool filename, prevent path traversal
- fix: on Writable stream, set `autoClose: false`
  - `end` is a noop here, that's for fs.createReadableStream
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

get_data() callback never fires for spooled messages

3 participants