
v0.12.0 smalloc causes node process to bail #9180

Closed
ugate opened this issue Feb 10, 2015 · 26 comments

ugate commented Feb 10, 2015

Receiving "Assertion failed: end <= source_len, file src\smalloc.cc, line 280" while running node for a large number of asynchronous operations (the issue is not present in v0.10.36).

Windows 7 64bit

@Sjonnie2nd

I've got this issue too (also Windows 7 64-bit). Looking further, I saw it is mentioned on the io.js GitHub site as well, with andris9/mailParser named as a possible cause(?). I use mailParser too; do you? But perhaps it has nothing to do with it.

Looking at http://fossies.org/dox/node-v0.12.0/smalloc_8cc_source.html:

    .. dest._data = sliceOnto(source, dest, start, end); ..
    .. source_len = ..

and it breaks when end > source_len.
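
In JavaScript terms, the invariant that assertion enforces looks roughly like this (just a restatement sketch with a made-up function name, not the actual C++ source):

    // Restatement of the check at smalloc.cc line 280: a slice [start, end)
    // taken onto `source` must end inside the parent allocation.
    function assertSliceBounds(sourceLen, end) {
      if (!(end <= sourceLen)) {
        throw new Error('Assertion failed: end <= source_len');
      }
    }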

@trevnorris trevnorris self-assigned this Feb 11, 2015
@trevnorris

Do you have a way to reproduce this issue?

ugate commented Feb 11, 2015

@trevnorris I've been trying to track down the cause, but without any way to recover in JS it's hard to identify the portion of our code base where the issue originates. Maybe there's a way to do that I'm unaware of?

@Sjonnie2nd We're not using mailParser. The only external module we're using in our test case is soap. It only seems to be happening after a large number of asynchronous operations have been performed. I'm not entirely sure that's even related, but to make sure I've throttled the number of concurrent calls to only 5 and I'm still getting the error.

@trevnorris

@ugate Thanks. I don't even care if it's a minimal test case. Just give me any number of steps to reproduce and I'll take a look.

ugate commented Feb 13, 2015

@trevnorris I've isolated the cause of the issue and it looks like it originates from fs.writeFile(filePath, new Buffer(pdfContent, 'base64'), ...). It doesn't seem to happen every time the call is made. So, I'm assuming it has to do with the actual encoded content of the PDF that's being written and the number of total writes being performed. I'm trying to come up with a test case that consistently produces the error 😕
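
Something like this minimal sketch is the pattern I mean, assuming the crash is tied to base64 Buffer construction during repeated writes (the file names and random content are hypothetical stand-ins for the real PDF data):

    // Hypothetical repro sketch: repeatedly write base64-decoded Buffers,
    // mirroring fs.writeFile(filePath, new Buffer(pdfContent, 'base64'), ...)
    // above. The content is random; only the code path matters here.
    var fs = require('fs');
    var crypto = require('crypto');

    var pdfContent = crypto.randomBytes(64 * 1024).toString('base64');

    for (var i = 0; i < 5; i++) { // mirrors the throttle of 5 concurrent calls
      fs.writeFile('out-' + i + '.pdf', new Buffer(pdfContent, 'base64'),
        function (err) {
          if (err) throw err;
        });
    }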

dchest commented Feb 13, 2015

I don't have a minimal test case for this issue, but it always appears in this test for TweetNaCl-js:

https://travis-ci.org/dchest/tweetnacl-js/jobs/50673984#L241
https://travis-ci.org/dchest/tweetnacl-js/jobs/50668750#L241

The test in question is this:
https://github.com/dchest/tweetnacl-js/blob/master/test/c/01-scalarmult.js

The issue appears on Linux; I can't reproduce it on OS X.

Looks like the issue appeared somewhere between v0.11.13 and v0.11.16.

@Sjonnie2nd

Earlier I mentioned I have this problem with mailParser (Windows 7 64-bit). I assume it is not caused by mailParser itself but perhaps by node's surrounding memory management.

The problem appeared for me in two situations:

  • At random moments when the webserver is in a more or less idle state: sometimes nightly when a new email is received through POP3, sometimes when the (idle) webserver is accessed from another machine.
  • In a reproducible bulk process: synchronously walking through 3000 emails, uncompressing and parsing them.
    With uncompressing only, the process works without problem. With parsing, it crashes on about 10 of the emails (found by trial and error). Excluding those 10, the process finishes without problem. But parsing only those 10 affected emails caused no problem either.

I tried to force a strict garbage collection with global.gc() after every email: the process got slower, memory usage became very low and stable, but it still crashed on the same 10.
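
For reference, global.gc() only exists when node is started with the --expose-gc flag; the guard below is an illustration of that, not my actual code:

    // global.gc is undefined unless node was started with --expose-gc:
    //   node --expose-gc process-emails.js
    if (typeof global.gc === 'function') {
      global.gc(); // force a full collection between emails
    }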

At last I tried to debug this within WebStorm. I noticed that with the debugger attached it broke at an earlier stage than the 10 emails mentioned. It also looked like the positioning of breakpoints influenced the moment of the crash. Finally it brought me to the following code in mailParser.js:

1047: level = level || 1;
1048: for (var i = 0, len = node.childNodes.length; i < len; i++) {

It crashed on 1048. Nothing in the line itself is wrong. Could it be that node crashes at the moment it is asked to reserve memory for the variables i and len?

Lots of information and I'm afraid not very useful, but maybe it helps.

@mjhasbach

Getting this error as well in Node v0.12.0:

  1. Used bops to convert a Base64-encoded MP3 file to a Buffer.
  2. Used superagent's post and send to submit a POST request for that Buffer.

Hexadecimal-encoded MP3 -> Buffer -> POST is working fine; the failing Base64 path is sketched below.
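
The Base64 path, roughly (the URL and file name are hypothetical; this is a sketch of the steps rather than my exact code):

    // Sketch of the failing path: base64 MP3 -> Buffer -> POST.
    var fs = require('fs');
    var bops = require('bops');
    var request = require('superagent');

    var b64 = fs.readFileSync('song.mp3').toString('base64');
    var buf = bops.from(b64, 'base64'); // base64 string -> Buffer

    request
      .post('http://example.com/upload')
      .send(buf)
      .end(function (err, res) {
        if (err) throw err;
        console.log('uploaded, status', res.status);
      });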

@trevnorris

Thanks for reporting this. Fixed by a35ba2f.

dchest added a commit to dchest/tweetnacl-js that referenced this issue Feb 24, 2015
0.12 has a bug with Buffers, which breaks tests:

nodejs/node-v0.x-archive#9180

It will be fixed in the next release. For now, disable testing with
Node 0.12.

io.js fixed the same bug in v1.3.0:

nodejs/node#661

so if Travis CI has updated its io.js version, the tests should work.
@johntrandall

I'm also experiencing this issue using Intern and Selenium Grid. It is reproducible.

The error happens about 80% of the time using node v0.12.0. This error does NOT happen on v0.11.0.
See https://gist.github.com/johntrandall/182ddadccf971d4ecd20.

@trevnorris

@johntrandall I'll need a smaller test case to reproduce this, and it will be fixed in v0.12.1. The fix commit hasn't been released yet.

/cc @tjfontaine v0.12.1 release.

dchest commented Mar 24, 2015

Looks like the bug is still in v0.12.1.

@johntrandall

@trevnorris Thanks for looking at this.

I'm pretty green, and I'm not quite sure how to make a smaller test case. I could write up exactly how to install Selenium and Intern and run these tests, or maybe write a script to do it, but I'd be at a loss as to where to begin trying to pare it down.

@misterdjules

@dchest Yes, v0.12.1 is v0.12.0 + OpenSSL upgrades, nothing else.

@trevnorris

@dchest Yeah, sorry: because of the OpenSSL patch, everything that was scheduled for v0.12.1 got pushed to v0.12.2. Hopefully that's released soon.

@trevnorris

@johntrandall Before you spend your time doing that, I'm confident the issue has already been fixed in a later commit. You'll need to try the latest v0.12.x. Unfortunately there are no nightly builds that I'm aware of, but when v0.12.2 is released you should be able to try it out.

@johntrandall

@trevnorris I'll keep an eye out for it. Thank you.

dchest commented Mar 25, 2015

@trevnorris good to know, thanks

@misterdjules

Just for the record, there are instructions to reproduce the issue (and see how #14165 fixes it) here: nodejs/node#661 (comment).

dchest commented Mar 28, 2015

@misterdjules yes, @trevnorris committed the same fix to both Node.js and io.js: a35ba2f

@misterdjules

@dchest Yes, I'm aware of that. We want to do a v0.12.2 release soon that includes a fix for this problem. But because the fix doesn't come with a regression test, I needed to find some code that reproduced the issue to make sure it was actually fixed. I figured that putting this information in a place where it's easy to find would help us write that regression test.

@misterdjules

@dchest Re-reading my latest comment, I just realized it may have sounded a bit rude. I want to apologize for that, and I want to thank you again for helping us figure out how to reproduce this 👍

dchest commented Mar 28, 2015

@misterdjules no worries :)

@misterdjules

@airs0urce Thank you!

@trevnorris For info, I just double-checked that node v0.12.2 actually fixes the previous repro, and it indeed does. So at least we know it isn't a case of the fix missing from the release for whatever reason.

@airs0urce

Sorry guys, I removed my comment yesterday: I found that my process manager (pm2) had cached the path to the old node (0.12.0). I made it use 0.12.2 and now everything is OK.
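
A quick sanity check that would have caught this is logging which binary is actually running (illustrative snippet, not from my setup):

    // Print the version and path of the node binary running this process:
    console.log(process.version, process.execPath);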

@trevnorris

@misterdjules Awesome. Thanks for double checking that. Though I still haven't been able to create a minimal test case for the issue. Any thoughts?
