occasional "sourceEnd out of bounds" exception on receipt of message #65

Open
orlandov opened this Issue Feb 8, 2012 · 29 comments

@orlandov
orlandov commented Feb 8, 2012

This might be related to receiving a large message; it doesn't seem to happen all the time. It looks to be related to some code that was recently committed.

buffer.js:494
    throw new Error('sourceEnd out of bounds');
          ^ 
Error: sourceEnd out of bounds
    at Buffer.copy (buffer.js:494:11)
    at frame (/smartdc/node_modules/amqp/amqp.js:169:10)
    at header (/smartdc/node_modules/amqp/amqp.js:159:14)
    at frameEnd (/smartdc/node_modules/amqp/amqp.js:204:16)
    at frame (/smartdc/node_modules/amqp/amqp.js:171:14)
    at header (/smartdc/node_modules/amqp/amqp.js:159:14)
    at frameEnd (/smartdc/node_modules/amqp/amqp.js:204:16)
    at frame (/smartdc/node_modules/amqp/amqp.js:171:14)
    at AMQPParser.header [as parse] (/smartdc/node_modules/amqp/amqp.js:159:14)
    at AMQPParser.execute (/smartdc/node_modules/amqp/amqp.js:230:21)
@squaremo
Contributor
squaremo commented Feb 9, 2012

That'll be my recent changes to parsing; let me take a look.

@FGRibreau

Yep, I got the same error.

@squaremo
Contributor

The fix for this, along with test cases, has been merged into master -- please do try it.

@FGRibreau

Yes, it fixes this issue, thanks!

@jimeh
jimeh commented Feb 22, 2012

Fixes it for me too. When could we expect to see this fix released? :)

@squaremo
Contributor

"Yes, it fixes this issue, thanks!"
and
"Fixes it for me too."

Great! Sorry about the interruption in service.

"When could we expect to see this fix released? :)"

I didn't realise the bug had made it into a release. Oops, now I feel extra bad.

@pijewski

I ran into the same issue as orlandov and can also report that this change fixed the issue. Thanks!

@prystupa
prystupa commented Mar 3, 2012

@squaremo Did the fix make it to latest npm release? I'm still getting this with 0.1.2. Thanks!

@FGRibreau

@prystupa The fix isn't available in 0.1.2; we'll have to wait for 0.1.3, I think.

@FGRibreau

I still get this bug and it's quite inconvenient. Please publish 0.1.3.

@michaelplaing
Contributor

This is a blocker for me. Is there any update on 0.1.3? Or is there a version I can fall back to?

@squaremo
Contributor

@michaelplaing @FGRibreau 0.1.1 does not have the bug, nor does it lack any features so far as I can tell.
You shall have to petition Theo for a 0.1.3 release -- he and Ryan are the maintainers (npm info amqp).

@michaelplaing
Contributor

Thanks. I found I had to go back to 0.1.0 because I was generating too many malformed frames with 0.1.1 and RabbitMQ would drop the connection. :-/


@FGRibreau

I can't go back to 0.1.1 because 0.1.1 requires node 0.4 || 0.5 and I'm using node 0.6.12.

remote: npm WARN amqp@0.1.0 package.json: bugs['web'] should probably be bugs['url']
remote: npm ERR! Unsupported
remote: npm ERR! Not compatible with your version of node/npm: amqp@0.1.0
remote: npm ERR! Required: {"node":"0.4 || 0.5"}
remote: npm ERR! Actual:   {"npm":"1.0.106","node":"0.6.12"}

@rschiavi

I'm seeing this a lot when I SSH tunnel to a server, but not much when I run against my local RabbitMQ server. Any updates on when 0.1.3 will be available? Thanks!

@michaelplaing
Contributor

Multiple publishes of moderately sized messages (400KB) to a direct exchange fail silently on the second one; queue.shift() also fails for the input queue that was the source of the message. I think this is related, and that internal structures are being overwritten. The workaround is to revert to v0.1.0.

@squaremo
Contributor

@michaelplaing Test case? (Does it pass with the head of master?)

@michaelplaing
Contributor

This stripped-down test in CoffeeScript passes on master with a small message but fails with a larger one. No errors are reported, but the publish to the 'output2' queue does not occur, although the ack succeeds. My test file is 341KB of JSON.

fs = require 'fs'
amqp = require 'amqp'

do_message = (message) ->
    console.log "got message"
    default_exchange.publish "output1", JSON.stringify(message), contentType:'application/json'
    console.log "published to output1"
    default_exchange.publish "output2", JSON.stringify(message), contentType:'application/json'
    console.log "published to output2"
    input_queue.shift() # acknowledge
    console.log "acked"

input_message = JSON.parse(fs.readFileSync('./test/test'))
console.log "read input_message?: #{input_message.data?}; length: #{input_message.data.length}"
amqp_connection = amqp.createConnection()

default_exchange = {}
input_queue = {}
amqp_connection.on 'ready', () -> 
    default_exchange = amqp_connection.exchange()
    queue_args = durable:true, autoDelete: false

    amqp_connection.queue 'input', queue_args, (queue) ->
        input_queue = queue
        queue.subscribe ack:true, (message) -> do_message message
        console.log "listening..."

        amqp_connection.queue 'output1', queue_args, (queue) ->
            console.log "output1"

            amqp_connection.queue 'output2', queue_args, (queue) ->
                console.log "output2"
                default_exchange.publish 'input', JSON.stringify(input_message), contentType:'application/json'
                console.log "published to input"
@squaremo
Contributor

@michaelplaing What are you expecting to see that didn't happen? I get the same output regardless of the size of the file test/test (well, up to approximately 1MB). I also verified that both the queues output1 and output2 have a message pending at the end of the run (and that the queue input has no message pending, since it's been consumed and acked).

@michaelplaing
Contributor

I am expecting to see a message in output2. None appears for messages larger than about 70KB. I'm running the latest Node, RabbitMQ, and Erlang on OS X Lion.

@squaremo
Contributor

I traced this to commit fc2625b, which pre-allocates (and reuses) the buffer used for sending method frames. It's reverted in my issue-65 branch, for which I'll open a pull request.

Meanwhile I discovered a couple of things:

  • All the methods for sending frames ignore the return value from WritableStream.write. In the test case it so happens that with a particular content size, the publish to output2 reliably exceeds the kernel buffer
  • Connection._sendBody ignores the maximum frame size negotiated with the server. (As it happens RabbitMQ is understanding about this)

Now the most interesting thing: it appears that when the method send buffer is reused, and WritableStream.write returns false, the buffer is never written to the socket. When a new buffer is allocated each send, it is eventually written even if it overflows the kernel buffer.

Anyway, reverting that commit goes back to allocating a buffer of maxFrameSize for each method send. This is pretty wasteful -- maxFrameSize is well over the threshold for allocation from the pool. Possibly a better solution is to use a single buffer for assembling frames but copy them into (likely much smaller) fresh buffers before sending.
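
To make that idea concrete, here is a minimal sketch in plain Node.js of assembling a frame in a shared scratch buffer and copying only the used bytes into a fresh, right-sized buffer before writing, while also checking the return value of write. The names scratch, sendFrame and assembleFrame are hypothetical and not node-amqp's actual internals.

// Illustrative sketch only -- not node-amqp's real code.
var MAX_FRAME_SIZE = 131072;              // stand-in for the negotiated frame-max
var scratch = new Buffer(MAX_FRAME_SIZE); // Buffer.alloc(MAX_FRAME_SIZE) on modern Node

// Assemble a frame into the shared scratch buffer, then copy just the used
// bytes into a fresh buffer before handing it to the socket.
function sendFrame(socket, assembleFrame) {
  var length = assembleFrame(scratch);    // assembleFrame returns bytes written
  var out = new Buffer(length);
  scratch.copy(out, 0, 0, length);        // scratch can now be reused immediately

  // write() returns false when the kernel buffer is full; instead of ignoring
  // the return value, pause further sends until 'drain'.
  var flushed = socket.write(out);
  if (!flushed) {
    socket.once('drain', function () {
      // safe to resume sending queued frames here
    });
  }
  return flushed;
}

The trade-off is one extra allocation and copy per frame, but method frames are typically far smaller than maxFrameSize, so the fresh buffers can usually still come from the pool.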

@michaelplaing
Contributor

Thanks Michael!


@seosh81
seosh81 commented Mar 31, 2012

Has this been fixed?

@psla
psla commented Apr 1, 2012

I got the same error today on version 0.2.0 of rabbit.js (amqp 0.1.2), installed yesterday.

buffer.js:494
    throw new Error('sourceEnd out of bounds');
          ^
Error: sourceEnd out of bounds
    at Buffer.copy (buffer.js:494:11)
    at frame (/srv/scripts/node_modules/rabbit.js/node_modules/amqp/amqp.js:169:10)
    at header (/srv/scripts/node_modules/rabbit.js/node_modules/amqp/amqp.js:159:14)
    at frameEnd (/srv/scripts/node_modules/rabbit.js/node_modules/amqp/amqp.js:204:16)
    at frame (/srv/scripts/node_modules/rabbit.js/node_modules/amqp/amqp.js:171:14)
    at header (/srv/scripts/node_modules/rabbit.js/node_modules/amqp/amqp.js:159:14)
    at frameEnd (/srv/scripts/node_modules/rabbit.js/node_modules/amqp/amqp.js:204:16)
    at frame (/srv/scripts/node_modules/rabbit.js/node_modules/amqp/amqp.js:171:14)
    at header (/srv/scripts/node_modules/rabbit.js/node_modules/amqp/amqp.js:159:14)
    at frameEnd (/srv/scripts/node_modules/rabbit.js/node_modules/amqp/amqp.js:204:16)

Is there any workaround for this issue? Any plans to release a new version without this bug?

@danmilon

Same here. @postwait any update on this? Thank you.

@marcab
marcab commented Apr 13, 2012

Same here.

@ericbarch

Seeing this as well (amqp 0.1.2 from npm). Node 0.6.15. Only seems to happen under high load.

@jimeh
jimeh commented Apr 16, 2012

There are a lot of people running into this issue. Personally, I got around it by pointing my package.json dependency directly at https://github.com/postwait/node-amqp.git#master instead of a specific version number.
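
For anyone wanting to do the same, a package.json dependency entry along these lines should work; the git+https form is one common way to spell a git dependency, so adjust it to whatever your npm version accepts.

{
  "dependencies": {
    "amqp": "git+https://github.com/postwait/node-amqp.git#master"
  }
}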

@onyxrev
onyxrev commented Mar 17, 2013

Pulled 0.1.6 and am running it in production. I have a daemon that picks up 37,000+ messages in a batch every ten minutes that was falling over on itself with this issue. Haven't seen it since upgrading to 0.1.6.
