Crash in lxw_worksheet_assemble_xml_file #49

Closed
evanmiller opened this Issue May 29, 2016 · 15 comments

Comments

Projects
None yet
2 participants
@evanmiller

Hi, I am using libxlsxwriter 0.3.2 in a Mac program. I recently received a customer crash report that appears to originate inside libxlsxwriter. Here are the relevant bits:

Process:               Wizard [4171]
Path:                  /Applications/Wizard.app/Contents/MacOS/Wizard
Identifier:            org.evanmiller.wizard
Version:               1.8.11 (175)
App Item ID:           495152161
App External ID:       817409485
Code Type:             X86-64 (Native)
Parent Process:        ??? [1]
Responsible:           Wizard [4171]
User ID:               501

Date/Time:             2016-05-25 14:43:33.330 -0700
OS Version:            Mac OS X 10.11.5 (15F34)
Report Version:        11
Anonymous UUID:        158C81F0-9849-BB26-6A96-7B2D4A3905A0

Sleep/Wake UUID:       5A3E1BB8-41CB-4875-B85C-8A54A252B403

Time Awake Since Boot: 190000 seconds
Time Since Wake:       18000 seconds

System Integrity Protection: enabled

Crashed Thread:        7

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x0000000000000068
Exception Note:        EXC_CORPSE_NOTIFY

[snip]

Thread 7 Crashed:
0   libsystem_c.dylib               0x00007fff8f2fa0a5 flockfile + 4
1   libsystem_c.dylib               0x00007fff8f2fd2f9 fwrite + 72
2   libxlsxwriter.dylib             0x000000010e126145 lxw_worksheet_assemble_xml_file + 53
3   libxlsxwriter.dylib             0x000000010e11c70b lxw_create_package + 203
4   libxlsxwriter.dylib             0x000000010e122ce5 workbook_close + 1285
5   org.evanmiller.wizard           0x000000010dc3f338 0x10db5a000 + 938808
6   org.evanmiller.wizard           0x000000010dc51911 0x10db5a000 + 1014033
7   com.apple.CoreFoundation        0x00007fff8e08aa6c __invoking___ + 140
8   com.apple.CoreFoundation        0x00007fff8e08a8fe -[NSInvocation invoke] + 286
9   com.apple.Foundation            0x00007fff88c496cb -[NSInvocationOperation main] + 34
10  com.apple.Foundation            0x00007fff88bd5c7a -[__NSOperationInternal _start:] + 654
11  com.apple.Foundation            0x00007fff88bd1c64 __NSOQSchedule_f + 194
12  libdispatch.dylib               0x00007fff9a34540b _dispatch_client_callout + 8
13  libdispatch.dylib               0x00007fff9a34a03b _dispatch_queue_drain + 754
14  libdispatch.dylib               0x00007fff9a350707 _dispatch_queue_invoke + 549
15  libdispatch.dylib               0x00007fff9a348d53 _dispatch_root_queue_drain + 538
16  libdispatch.dylib               0x00007fff9a348b00 _dispatch_worker_thread3 + 91
17  libsystem_pthread.dylib         0x00007fff99e894de _pthread_wqthread + 1129
18  libsystem_pthread.dylib         0x00007fff99e87341 start_wqthread + 13

It looks like there is a NULL pointer being dereferenced as part of a struct. I am not sure if it matters but the code is running on a background thread. I received the crash report automatically via Apple so I'm afraid I can't provide more details about the crash conditions.

@jmcnamara

This comment has been minimized.

Show comment
Hide comment
@jmcnamara

jmcnamara May 29, 2016

Owner

Hi Evan,

I haven't had any other reports of this nature. Also, and I'm sure you know this, there isn't enough information there to debug it.

The library isn't thread safe. Are you writing to a worksheet using multiple threads by any chance?

At a guess it looks like the worksheet is trying to write to a filehandle that has already been closed.

John

Owner

jmcnamara commented May 29, 2016

Hi Evan,

I haven't had any other reports of this nature. Also, and I'm sure you know this, there isn't enough information there to debug it.

The library isn't thread safe. Are you writing to a worksheet using multiple threads by any chance?

At a guess it looks like the worksheet is trying to write to a filehandle that has already been closed.

John

@evanmiller

This comment has been minimized.

Show comment
Hide comment
@evanmiller

evanmiller May 29, 2016

All the activity is on the same thread.

This may or may not be related, but I have also received reports that LXW_ERROR_WORKBOOK_FILE_CREATE is being returned for some users -- I wonder if there is some issue with sandboxing and perhaps writing to temporary files.

As for writing to a closed file handle -- is it possible to return an error rather than crash if that's what's happening here?

I realize this report isn't actionable in its current state, but it would be worth knowing if other users have experienced the same thing.

All the activity is on the same thread.

This may or may not be related, but I have also received reports that LXW_ERROR_WORKBOOK_FILE_CREATE is being returned for some users -- I wonder if there is some issue with sandboxing and perhaps writing to temporary files.

As for writing to a closed file handle -- is it possible to return an error rather than crash if that's what's happening here?

I realize this report isn't actionable in its current state, but it would be worth knowing if other users have experienced the same thing.

@jmcnamara

This comment has been minimized.

Show comment
Hide comment
@jmcnamara

jmcnamara May 29, 2016

Owner

I have also received reports that LXW_ERROR_WORKBOOK_FILE_CREATE is being returned for some users -- I wonder if there is some issue with sandboxing and perhaps writing to temporary files.

Interesting. LXW_ERROR_WORKBOOK_FILE_CREATE only occurs in one place in workbook.c:

    /* If the packager fails it is generally due to a zip permission error. */
    if (packager == NULL) {
        fprintf(stderr, "[ERROR] Error creating '%s': %s\n", self->filename,
                strerror(errno));

        error = LXW_ERROR_WORKBOOK_FILE_CREATE;
        goto mem_error;
    }

As you can see from the comment this error usually occurs when the lib (or in this case zlib) fails to create the overall zip container. This could be due to

  • A lack of permission to create or overwrite the target xlsx file.
  • Trying to create the target in an unwritable directly.
  • If this was Windows I would also say that this was caused by the file being open in Excel while the user was trying to create it again. However, that doesn't (for me) happen on the Mac with Excel 2011. Perhaps it could occur with another application like Numbers of Libreoffice.

Also, there should be an error on stderr in that case. You can test that yourself. This would return an error like:

$ ./examples/hello
$ chmod -w hello_world.xlsx
$ ./examples/hello
[ERROR] Error creating 'hello_world.xlsx': Permission denied

Of course, your app is probably GUI based so this may not go anywhere useful.

However, the log in the original message says that error occurs in lxw_worksheet_assemble_xml_file() which may indicate a tmpfile issue. Is it possible that the user has created a large number of worksheet (>2560) and exceed the system ulimit for open filehandles? It could also just be a permission issue.

In packager.c there is the following code:

/*
 * Write the worksheet files.
 */
STATIC uint8_t
_write_worksheet_files(lxw_packager *self)
{
    lxw_workbook *workbook = self->workbook;
    lxw_worksheet *worksheet;
    char sheetname[LXW_FILENAME_LENGTH] = { 0 };
    uint16_t index = 1;

    STAILQ_FOREACH(worksheet, workbook->worksheets, list_pointers) {
        lxw_snprintf(sheetname, LXW_FILENAME_LENGTH,
                     "xl/worksheets/sheet%d.xml", index++);

        if (worksheet->optimize_row)
            lxw_worksheet_write_single_row(worksheet);

        worksheet->file = lxw_tmpfile();

        lxw_worksheet_assemble_xml_file(worksheet);

        _add_file_to_zip(self, worksheet->file, sheetname);

        fclose(worksheet->file);
    }

    return 0;
}

In particular this:

        worksheet->file = lxw_tmpfile();

        lxw_worksheet_assemble_xml_file(worksheet);

There is no check here that lxw_tmpfile() succeeded which is a bug (at the very least).

Owner

jmcnamara commented May 29, 2016

I have also received reports that LXW_ERROR_WORKBOOK_FILE_CREATE is being returned for some users -- I wonder if there is some issue with sandboxing and perhaps writing to temporary files.

Interesting. LXW_ERROR_WORKBOOK_FILE_CREATE only occurs in one place in workbook.c:

    /* If the packager fails it is generally due to a zip permission error. */
    if (packager == NULL) {
        fprintf(stderr, "[ERROR] Error creating '%s': %s\n", self->filename,
                strerror(errno));

        error = LXW_ERROR_WORKBOOK_FILE_CREATE;
        goto mem_error;
    }

As you can see from the comment this error usually occurs when the lib (or in this case zlib) fails to create the overall zip container. This could be due to

  • A lack of permission to create or overwrite the target xlsx file.
  • Trying to create the target in an unwritable directly.
  • If this was Windows I would also say that this was caused by the file being open in Excel while the user was trying to create it again. However, that doesn't (for me) happen on the Mac with Excel 2011. Perhaps it could occur with another application like Numbers of Libreoffice.

Also, there should be an error on stderr in that case. You can test that yourself. This would return an error like:

$ ./examples/hello
$ chmod -w hello_world.xlsx
$ ./examples/hello
[ERROR] Error creating 'hello_world.xlsx': Permission denied

Of course, your app is probably GUI based so this may not go anywhere useful.

However, the log in the original message says that error occurs in lxw_worksheet_assemble_xml_file() which may indicate a tmpfile issue. Is it possible that the user has created a large number of worksheet (>2560) and exceed the system ulimit for open filehandles? It could also just be a permission issue.

In packager.c there is the following code:

/*
 * Write the worksheet files.
 */
STATIC uint8_t
_write_worksheet_files(lxw_packager *self)
{
    lxw_workbook *workbook = self->workbook;
    lxw_worksheet *worksheet;
    char sheetname[LXW_FILENAME_LENGTH] = { 0 };
    uint16_t index = 1;

    STAILQ_FOREACH(worksheet, workbook->worksheets, list_pointers) {
        lxw_snprintf(sheetname, LXW_FILENAME_LENGTH,
                     "xl/worksheets/sheet%d.xml", index++);

        if (worksheet->optimize_row)
            lxw_worksheet_write_single_row(worksheet);

        worksheet->file = lxw_tmpfile();

        lxw_worksheet_assemble_xml_file(worksheet);

        _add_file_to_zip(self, worksheet->file, sheetname);

        fclose(worksheet->file);
    }

    return 0;
}

In particular this:

        worksheet->file = lxw_tmpfile();

        lxw_worksheet_assemble_xml_file(worksheet);

There is no check here that lxw_tmpfile() succeeded which is a bug (at the very least).

@jmcnamara jmcnamara added the bug label May 29, 2016

@jmcnamara jmcnamara self-assigned this May 29, 2016

@evanmiller

This comment has been minimized.

Show comment
Hide comment
@evanmiller

evanmiller May 29, 2016

I am guessing it is some kind of permissions issue, rather than exceeding the file handle limit. (Given the application design it is implausible a user will have thousands of worksheets exported.) I am able to reproduce the LXW_ERROR_WORKBOOK_FILE_CREATE when I try exporting to an unwritable directory, for instance. (In the stderr log I do see the "Permission denied" message.)

I haven't been able to reproduce the crash though, and so its cause remains a mystery to me -- I suppose I might have a "double close" somewhere in the application code but nothing jumps out. In any event, I endorse adding more checks to mitigate possible crash paths.

I am guessing it is some kind of permissions issue, rather than exceeding the file handle limit. (Given the application design it is implausible a user will have thousands of worksheets exported.) I am able to reproduce the LXW_ERROR_WORKBOOK_FILE_CREATE when I try exporting to an unwritable directory, for instance. (In the stderr log I do see the "Permission denied" message.)

I haven't been able to reproduce the crash though, and so its cause remains a mystery to me -- I suppose I might have a "double close" somewhere in the application code but nothing jumps out. In any event, I endorse adding more checks to mitigate possible crash paths.

@jmcnamara

This comment has been minimized.

Show comment
Hide comment
@jmcnamara

jmcnamara May 29, 2016

Owner

I haven't been able to reproduce the crash though, and so its cause remains a mystery to me

I think the tempfile is the issue.

Try the following to reproduce it with this patch:

diff --git a/src/Makefile b/src/Makefile
index 2f0ede7..ea15da7 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -19,7 +19,7 @@ INC_DIR     = ../include
 MINIZIP_DIR = ../third_party/minizip

 # Flags passed to compiler.
-CFLAGS   += -g -O3 -Wall -Wextra -pedantic -ansi
+CFLAGS   += -g -O3 -Wall -Wextra -pedantic

 # Library names.
 LIBXLSXWRITER_A  = libxlsxwriter.a
diff --git a/src/utility.c b/src/utility.c
index fa34eb4..bfe09e7 100644
--- a/src/utility.c
+++ b/src/utility.c
@@ -12,6 +12,7 @@
 #include <string.h>
 #include <stdint.h>
 #include <stdlib.h>
+#include <unistd.h>
 #include "xlsxwriter/utility.h"

 /*
@@ -452,5 +453,13 @@ lxw_quote_sheetname(char *str)
 FILE *
 lxw_tmpfile(void)
 {
-    return tmpfile();
+    char template[] = "/tmp/locked/libxlsxwriter.XXXXXX";
+    FILE *tempfile;
+    int fd;
+
+    fd = mkstemp(template);
+    tempfile = fdopen(fd, "w+");
+    unlink(template);
+
+    return tempfile;
 }

Then make the target tmp dir unwritable and you will get a segfault:

$ patch -p 1 < gh49.patch

$ mkdir /tmp/locked
$ chmod -w /tmp/locked
$ make clean
$ make examples

$ ./examples/hello
Segmentation fault: 11
Owner

jmcnamara commented May 29, 2016

I haven't been able to reproduce the crash though, and so its cause remains a mystery to me

I think the tempfile is the issue.

Try the following to reproduce it with this patch:

diff --git a/src/Makefile b/src/Makefile
index 2f0ede7..ea15da7 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -19,7 +19,7 @@ INC_DIR     = ../include
 MINIZIP_DIR = ../third_party/minizip

 # Flags passed to compiler.
-CFLAGS   += -g -O3 -Wall -Wextra -pedantic -ansi
+CFLAGS   += -g -O3 -Wall -Wextra -pedantic

 # Library names.
 LIBXLSXWRITER_A  = libxlsxwriter.a
diff --git a/src/utility.c b/src/utility.c
index fa34eb4..bfe09e7 100644
--- a/src/utility.c
+++ b/src/utility.c
@@ -12,6 +12,7 @@
 #include <string.h>
 #include <stdint.h>
 #include <stdlib.h>
+#include <unistd.h>
 #include "xlsxwriter/utility.h"

 /*
@@ -452,5 +453,13 @@ lxw_quote_sheetname(char *str)
 FILE *
 lxw_tmpfile(void)
 {
-    return tmpfile();
+    char template[] = "/tmp/locked/libxlsxwriter.XXXXXX";
+    FILE *tempfile;
+    int fd;
+
+    fd = mkstemp(template);
+    tempfile = fdopen(fd, "w+");
+    unlink(template);
+
+    return tempfile;
 }

Then make the target tmp dir unwritable and you will get a segfault:

$ patch -p 1 < gh49.patch

$ mkdir /tmp/locked
$ chmod -w /tmp/locked
$ make clean
$ make examples

$ ./examples/hello
Segmentation fault: 11
@jmcnamara

This comment has been minimized.

Show comment
Hide comment
@jmcnamara

jmcnamara May 29, 2016

Owner

And the backtrace looks similar:

$ lldb ./examples/hello

(lldb) target create "./examples/hello"
Current executable set to './examples/hello' (x86_64).

(lldb) r
Process 53236 launched: './examples/hello' (x86_64)
Process 53236 stopped
* thread #1: tid = 0x13b2f2d, 0x00007fff92dbf0a5 libsystem_c.dylib`flockfile + 4, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x68)
    frame #0: 0x00007fff92dbf0a5 libsystem_c.dylib`flockfile + 4
libsystem_c.dylib`flockfile:
->  0x7fff92dbf0a5 <+4>:  movq   0x68(%rdi), %rdi
    0x7fff92dbf0a9 <+8>:  addq   $0x8, %rdi
    0x7fff92dbf0ad <+12>: popq   %rbp
    0x7fff92dbf0ae <+13>: jmp    0x7fff92e08452            ; symbol stub for: pthread_mutex_lock

(lldb) bt
* thread #1: tid = 0x13b2f2d, 0x00007fff92dbf0a5 libsystem_c.dylib`flockfile + 4, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x68)
  * frame #0: 0x00007fff92dbf0a5 libsystem_c.dylib`flockfile + 4
    frame #1: 0x00007fff92dc22f9 libsystem_c.dylib`fwrite + 72
    frame #2: 0x0000000100018465 hello`lxw_worksheet_assemble_xml_file [inlined] _worksheet_xml_declaration + 53 at worksheet.c:859 [opt]
    frame #3: 0x0000000100018460 hello`lxw_worksheet_assemble_xml_file(self=0x0000000100102f90) + 48 at worksheet.c:3291 [opt]
    frame #4: 0x000000010000d36b hello`lxw_create_package [inlined] _write_worksheet_files + 166 at packager.c:162 [opt]
    frame #5: 0x000000010000d2c5 hello`lxw_create_package(self=0x0000000100103b20) + 37 at packager.c:711 [opt]
    frame #6: 0x00000001000146fb hello`workbook_close(self=0x0000000100103700) + 1707 at workbook.c:1507 [opt]
    frame #7: 0x0000000100000e80 hello`main + 128 at hello.c:18
    frame #8: 0x00007fff983775ad libdyld.dylib`start + 1

Owner

jmcnamara commented May 29, 2016

And the backtrace looks similar:

$ lldb ./examples/hello

(lldb) target create "./examples/hello"
Current executable set to './examples/hello' (x86_64).

(lldb) r
Process 53236 launched: './examples/hello' (x86_64)
Process 53236 stopped
* thread #1: tid = 0x13b2f2d, 0x00007fff92dbf0a5 libsystem_c.dylib`flockfile + 4, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x68)
    frame #0: 0x00007fff92dbf0a5 libsystem_c.dylib`flockfile + 4
libsystem_c.dylib`flockfile:
->  0x7fff92dbf0a5 <+4>:  movq   0x68(%rdi), %rdi
    0x7fff92dbf0a9 <+8>:  addq   $0x8, %rdi
    0x7fff92dbf0ad <+12>: popq   %rbp
    0x7fff92dbf0ae <+13>: jmp    0x7fff92e08452            ; symbol stub for: pthread_mutex_lock

(lldb) bt
* thread #1: tid = 0x13b2f2d, 0x00007fff92dbf0a5 libsystem_c.dylib`flockfile + 4, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x68)
  * frame #0: 0x00007fff92dbf0a5 libsystem_c.dylib`flockfile + 4
    frame #1: 0x00007fff92dc22f9 libsystem_c.dylib`fwrite + 72
    frame #2: 0x0000000100018465 hello`lxw_worksheet_assemble_xml_file [inlined] _worksheet_xml_declaration + 53 at worksheet.c:859 [opt]
    frame #3: 0x0000000100018460 hello`lxw_worksheet_assemble_xml_file(self=0x0000000100102f90) + 48 at worksheet.c:3291 [opt]
    frame #4: 0x000000010000d36b hello`lxw_create_package [inlined] _write_worksheet_files + 166 at packager.c:162 [opt]
    frame #5: 0x000000010000d2c5 hello`lxw_create_package(self=0x0000000100103b20) + 37 at packager.c:711 [opt]
    frame #6: 0x00000001000146fb hello`workbook_close(self=0x0000000100103700) + 1707 at workbook.c:1507 [opt]
    frame #7: 0x0000000100000e80 hello`main + 128 at hello.c:18
    frame #8: 0x00007fff983775ad libdyld.dylib`start + 1

@evanmiller

This comment has been minimized.

Show comment
Hide comment
@evanmiller

evanmiller May 29, 2016

Yep, that looks just like the original crash report (modulo some function inlining).

Any chance of providing an error path whenever lxw_tmpfile is called?

Yep, that looks just like the original crash report (modulo some function inlining).

Any chance of providing an error path whenever lxw_tmpfile is called?

@jmcnamara

This comment has been minimized.

Show comment
Hide comment
@jmcnamara

jmcnamara May 29, 2016

Owner

Any chance of providing an error path whenever lxw_tmpfile is called?

Yes. I'm working on it. I'll push something today or tomorrow.

Owner

jmcnamara commented May 29, 2016

Any chance of providing an error path whenever lxw_tmpfile is called?

Yes. I'm working on it. I'll push something today or tomorrow.

@evanmiller

This comment has been minimized.

Show comment
Hide comment
@jmcnamara

This comment has been minimized.

Show comment
Hide comment
@jmcnamara

jmcnamara May 29, 2016

Owner

I pushed a change to master that should prevent the segfault and give a proper error message.

There are still a few potential error conditions in the Zip code that I need to handle but these are a bit more esoteric. I'll ping you back in the next couple of days when I'm confident that all the error vectors are handled properly.

In the meantime you can try the code on the master branch.

Owner

jmcnamara commented May 29, 2016

I pushed a change to master that should prevent the segfault and give a proper error message.

There are still a few potential error conditions in the Zip code that I need to handle but these are a bit more esoteric. I'll ping you back in the next couple of days when I'm confident that all the error vectors are handled properly.

In the meantime you can try the code on the master branch.

@evanmiller

This comment has been minimized.

Show comment
Hide comment
@evanmiller

evanmiller May 30, 2016

Thanks, will try it out.

Incidentally is there an lxw version of strerror? Right now I have to update my code to produce human-readable messages for new error codes like LXW_ERROR_WORKBOOK_TMPFILE -- I don't want to miss any codes introduced in future releases. Happy to open a new issue if not.

Thanks, will try it out.

Incidentally is there an lxw version of strerror? Right now I have to update my code to produce human-readable messages for new error codes like LXW_ERROR_WORKBOOK_TMPFILE -- I don't want to miss any codes introduced in future releases. Happy to open a new issue if not.

@jmcnamara

This comment has been minimized.

Show comment
Hide comment
@jmcnamara

jmcnamara May 30, 2016

Owner

Incidentally is there an lxw version of strerror

That's a good suggestion. I'll look into adding it.

Owner

jmcnamara commented May 30, 2016

Incidentally is there an lxw version of strerror

That's a good suggestion. I'll look into adding it.

jmcnamara added a commit that referenced this issue May 30, 2016

Refactored error handling and reporting.
Some of the previously defined error names have changed to give
a more consistent interface. Also added a lxw_strerror() function
to decode the error to a string.

Issue #49.
@jmcnamara

This comment has been minimized.

Show comment
Hide comment
@jmcnamara

jmcnamara May 30, 2016

Owner

I've pushed a fix for this and other potential file failure issues.

However, for consistency I've renamed some of the error codes. Hopefully that won't require too much work on your part to change your error handling and the additional effort will be worth it.

I've also added a lxw_strerror function, see the example/anotomy.c program and the updated docs:

Owner

jmcnamara commented May 30, 2016

I've pushed a fix for this and other potential file failure issues.

However, for consistency I've renamed some of the error codes. Hopefully that won't require too much work on your part to change your error handling and the additional effort will be worth it.

I've also added a lxw_strerror function, see the example/anotomy.c program and the updated docs:

@evanmiller

This comment has been minimized.

Show comment
Hide comment
@evanmiller

evanmiller May 31, 2016

Looks good to me, thanks! This issue is resolved as far as I'm concerned. Closing.

Looks good to me, thanks! This issue is resolved as far as I'm concerned. Closing.

@evanmiller evanmiller closed this May 31, 2016

@jmcnamara

This comment has been minimized.

Show comment
Hide comment
@jmcnamara

jmcnamara May 31, 2016

Owner

Packaged and release as version 0.3.5.

Thanks for the report.

Owner

jmcnamara commented May 31, 2016

Packaged and release as version 0.3.5.

Thanks for the report.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment