os.cpus() broken in 0.8.0-rc9 #3528

Closed
joshwilsdon opened this Issue Jun 25, 2012 · 5 comments

Projects

None yet

4 participants

@joshwilsdon

on SmartOS this is broken 100% of the time on 0.8.0-rc9:

# node -e "console.dir(require('os').cpus())"
Segmentation Fault (core dumped)
# 

on the same system with node 0.6.19:

# node -e "console.dir(require('os').cpus())"
[ { speed: 2659,
    model: 'Intel(r) Core(tm) i7 CPU       M 620  @ 2.67GHz',
    times: { system: 41359, user: 412625, idle: 188832, irq: 4894743 } },
  { speed: 2659,
    model: 'Intel(r) Core(tm) i7 CPU       M 620  @ 2.67GHz',
    times: { system: 39799, user: 246904, idle: 356075, irq: 4616599 } } ]
#
@bcantrill

Running with libumem debugging takes us straight to the root-cause:

# export LD_PRELOAD=libumem.so.1
# export UMEM_DEBUG=default
# /usr/node/bin/node -e "console.dir(require('os').cpus())"
Abort (core dumped)

From the dump generated:

mdb core.node.43912
Loading modules: [ libumem.so.1 libc.so.1 ld.so.1 ]
> ::umem_status
Status:         ready and active
Concurrency:    4
Logs:           (inactive)
Message buffer:
free(894ac14): double-free or invalid buffer
stack trace:
libumem.so.1'umem_err_recoverable+0x4a
libumem.so.1'process_free+0xc2
libumem.so.1'free+0x1c
node'uv_free_cpu_info+0x30
node'_ZN4nodeL10GetCPUInfoERKN2v89ArgumentsE+0x253
node'_ZN2v88internalL21Builtin_HandleApiCallENS0_12_GLOBAL__N_116BuiltinArgument
sILNS0_21BuiltinExtraArgumentsE1EEEPNS0_7IsolateE+0x143
?? (0x33f0a336)
?? (0x33f5bfc7)
?? (0x33f0db41)
?? (0x33f5beae)
?? (0x33f24625)
?? (0x33f50d5b)
?? (0x33f27d9b)
?? (0x33f272a1)
?? (0x33f2c4b0)
?? (0x33f21c39)
?? (0x33f12c2a)
node'_ZN2v88internalL6InvokeEbNS0_6HandleINS0_10JSFunctionEEENS1_INS0_6ObjectEEE
iPS5_Pb+0xe2
node'_ZN2v88internal9Execution4CallENS0_6HandleINS0_6ObjectEEES4_iPS4_Pbb+0x5a
node'_ZN2v88Function4CallENS_6HandleINS_6ObjectEEEiPNS1_INS_5ValueEEE+0xd1
node'_ZN4node4LoadEN2v86HandleINS0_6ObjectEEE+0xc4
node'_ZN4node5StartEiPPc+0x167
node'main+0x1b
node'_start+0x83

As libumem has helpfully indicated, the problem is a double (or bogus) free -- and it has indicated the bad buffer:

> 894ac14::whatis
mdb: unable to readvar "umem_internal_arena": unknown symbol name
894ac14 is 894a740+4d4, freed from umem_alloc_1600:
            ADDR          BUFADDR        TIMESTAMP           THREAD
                            CACHE          LASTLOG         CONTENTS
         88b64a0          894a740      7e301a9aa3e                1
                          8846290                0                0
                 libumem.so.1`umem_cache_free_debug+0x193
                 libumem.so.1`umem_cache_free+0x3f
                 libumem.so.1`umem_free+0x10b
                 libumem.so.1`process_free+0x5c
                 libumem.so.1`free+0x1c
                 libkstat.so.1`kstat_chain_free+0x2d
                 libkstat.so.1`kstat_close+0x1f
                 uv_cpu_info+0x320
                 _ZN4nodeL10GetCPUInfoERKN2v89ArgumentsE+0x26

                 _ZN2v88internalL21Builtin_HandleApiCallENS0_12_GLOBAL__N_116Bui
                 ltinArgumentsILNS0_21BuiltinExtraArgumentsE1EEEPNS0_7IsolateE+0
                 x143
                 0x33f0a336
                 0x33f5bfc7
                 0x33f0db41
                 0x33f5beae
                 0x33f24625

Here is the bad free we're choking on in uv_free_cpu_info():

  for (i = 0; i < count; i++) {
    free(cpu_infos[i].model);
  }

And here's the code that assigns to model:

      cpu_info->model = KSTAT_NAMED_STR_PTR(knp);

The problem is that the memory that this refers to is allocated as part of the kstat -- and as soon as the kstat chain is destroyed (as it is in the kstat_close() at the end of uv_cpu_info()), the model will be freed along with it, and we will die on a bad free in uv_free_cpu_info(). The fix is to properly duplicate the string into the model field.

As long as I'm in this code, I would add that the comment about ACPI -- clearly lifted from my code in node-kstat -- is out of place here: this is not trying to read ACPI (this is a virtual kstat), and while the code should deal with failure of kstat_read(), the comment as written (or rather, copied), is entirely incorrect.

@bcantrill

Here is a diff that fixes the issue:

diff --git a/deps/uv/src/unix/sunos.c b/deps/uv/src/unix/sunos.c
index 906e69f..261d69d 100644
--- a/deps/uv/src/unix/sunos.c
+++ b/deps/uv/src/unix/sunos.c
@@ -349,7 +349,7 @@ uv_err_t uv_cpu_info(uv_cpu_info_t** cpu_infos, int* count) 

       knp = (kstat_named_t *) kstat_data_lookup(ksp, (char *)"brand");
       assert(knp->data_type == KSTAT_DATA_STRING);
-      cpu_info->model = KSTAT_NAMED_STR_PTR(knp);
+      cpu_info->model = strdup(KSTAT_NAMED_STR_PTR(knp));
     }

     lookup_instance++;

Note that this code does not actually check the return value of strdup() -- but given that uv_cpu_info() can already return a NULL model (based on the kstat_read() failure), it is presumed that this is acceptable. The preferred semantics would be of course to fail uv_cpu_info() with UV_ENOMEM, but those diffs are clearly more involved. Finally, note that there are two other instances of unchecked return values of strdup() in this file; it seems that those should be cleaned up as well.

@isaacs
Collaborator
isaacs commented Jun 25, 2012

@bcantrill @joshwilsdon I can't seem to reproduce this in a SmartOS machine on Joyent Cloud. I don't mind taking the patch of course, but it should be applied upstream to libuv, and it would be nice to be able to know why the issue does not manifest in all SmartOS environments.

The test at tests/simple/test-os.js passes for me 100% of the time, and it does call os.cpus.

@bnoordhuis
Member

Fixed in joyent/libuv@d0816aa. Isaac landed the patch in node in 9a7158d.

@bnoordhuis bnoordhuis closed this Jun 25, 2012
@bcantrill

It's data corruption; it will be sensitive to heap layout. If you
LD_PRELOAD libumem, and set the debug options, you'll hit it every time.

    - Bryan

On Jun 25, 2012 7:38 AM, "Isaac Z. Schlueter" <
reply@reply.github.com>
wrote:

@bcantrill @joshwilsdon I can't seem to reproduce this in a SmartOS
machine on Joyent Cloud. I don't mind taking the patch of course, but it
should be applied upstream to libuv, and it would be nice to be able to
know why the issue does not manifest in all SmartOS environments.

The test at tests/simple/test-os.js passes for me 100% of the time, and it
does call os.cpus.


Reply to this email directly or view it on GitHub:
#3528 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment