constant_memory not working as expected #306

Closed
oliviera9 opened this issue Sep 9, 2020 · 12 comments

@oliviera9

Hello,
I am successfully using libxlsxwriter for creating Excel files in an embedded system.
Because of this, I am using the constant_memory option, but I have noticed that the memory usage is not what I would expect.
From my understanding, with this option the memory usage should be limited to the size of a single row.
Anyway, monitoring the free memory available in the system, it appears that the memory continues to increase as long as I write rows (I can see this with examples/constant_memory.c).
I dug into the code, and I think this is because the tmpfile created when constant_memory is on is never rewound until the workbook is closed. I would expect the rewind to occur every time a new row is created.
Is this the intended behavior?
Thanks,
Alain.
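The behaviour described above can be pictured with a short, simplified model. This is a hypothetical sketch, not libxlsxwriter's actual implementation, and the function name stream_rows_to_tmpfile is made up: only the row being built is held in memory, and each completed row is appended to a temporary file that is not rewound until the end, so the file grows by one row's worth of bytes per row.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified model of constant_memory mode (NOT libxlsxwriter's
 * real implementation): only the row currently being built is held in a
 * fixed-size buffer. Each completed row is appended to the temporary file,
 * which is never rewound, so the file grows by one row per iteration.
 * Returns the temporary file's size in bytes afterwards, or -1 on error. */
long stream_rows_to_tmpfile(FILE *tmp, int num_rows, int num_cols)
{
    static const char cell[] = "<c><v>123.45</v></c>"; /* 20 bytes per cell */
    char row_buf[8192];                                 /* the only row memory */

    for (int row = 0; row < num_rows; row++) {
        strcpy(row_buf, "<row>");

        for (int col = 0; col < num_cols; col++) {
            /* Leave room for this cell, the closing tag and the NUL. */
            if (strlen(row_buf) + strlen(cell) + sizeof "</row>" > sizeof row_buf)
                return -1;
            strcat(row_buf, cell);
        }
        strcat(row_buf, "</row>");

        /* The finished row goes to the temp file; row_buf is then reused. */
        if (fputs(row_buf, tmp) == EOF)
            return -1;
    }

    if (fflush(tmp) != 0)
        return -1;
    return ftell(tmp); /* grows with every row written */
}
```

In this model, 10 rows of 5 cells give 111-byte rows and a 1110-byte temporary file, while the process itself only ever uses row_buf. If the temporary file lives on a RAM-backed filesystem, those bytes are RAM as well.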

@jmcnamara jmcnamara self-assigned this Sep 9, 2020
@jmcnamara
Owner

Anyway, monitoring the free memory available in the system, it appears that the memory continues to increase as long as I write rows (I can see this with examples/constant_memory.c).

That shouldn't happen. How are you monitoring the memory usage?

I modified the constant_memory.c example to push up the row x column limits:

#include "xlsxwriter.h"

int main() {

    lxw_row_t row;
    lxw_col_t col;
    lxw_row_t max_row = 100000;
    lxw_col_t max_col = 500;

    /* Set the workbook options. */
    lxw_workbook_options options = {.constant_memory = LXW_TRUE,
                                    .tmpdir = NULL,
                                    .use_zip64 = LXW_FALSE};

    /* Create a new workbook with options. */
    lxw_workbook  *workbook  = workbook_new_opt("constant_memory.xlsx", &options);
    lxw_worksheet *worksheet = workbook_add_worksheet(workbook, NULL);

    for (row = 0; row < max_row; row++) {
        for (col = 0; col < max_col; col++) {
            worksheet_write_number(worksheet, row, col, 123.45, NULL);
        }
    }

    return workbook_close(workbook);
}

Running this and monitoring it with top -o cpu shows a constant memory usage of 912K. In the final stage of assembling the file, the memory jumps up to 1368K. That isn't related to writing the row data, however; it is just the overhead of creating the files that make up the xlsx file and adding them to a zip container.

So, strictly speaking, the memory isn't constant for the entire lifetime of the program, but it should be 100% constant while writing row data.
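One way to confirm on Linux that the per-process memory stays flat is to sample the resident set size from inside the program. This sketch is Linux-only and assumes /proc is mounted; current_rss_kb is a hypothetical helper that reads the VmRSS line of /proc/self/status, which is roughly the resident figure that top reports.

```c
#include <stdio.h>
#include <string.h>

/* Linux-only sketch: return this process's resident set size in kB, as read
 * from the VmRSS line of /proc/self/status. Returns -1 if the file or the
 * field is unavailable. */
long current_rss_kb(void)
{
    FILE *fp = fopen("/proc/self/status", "r");
    char line[256];
    long rss_kb = -1;

    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        /* The line looks like "VmRSS:\t    1234 kB". */
        if (strncmp(line, "VmRSS:", 6) == 0) {
            if (sscanf(line + 6, "%ld", &rss_kb) != 1)
                rss_kb = -1;
            break;
        }
    }

    fclose(fp);
    return rss_kb;
}
```

Calling it before and after the write loop should show little change in constant_memory mode. It only measures this process, though: pages of a temporary file stored on a tmpfs are not part of the process's RSS, so top and system-wide tools such as free can disagree.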

@oliviera9
Author

I just ran 'watch -n1 free' in a shell. My embedded system has just 64 MB of RAM, so RAM consumption is clearly visible.
Like you, I see top showing a constant memory usage. Anyway I think this is because data is written to the temporary file which is not counted on the process address space.

@jmcnamara
Owner

Anyway I think this is because data is written to the temporary file which is not counted on the process address space.

The disk space usage shouldn't have any effect on the memory usage unless / or /tmp is mapped into memory. Is that the case on your system?
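On a Linux system, that question can be answered by scanning the mount table for RAM-backed filesystems. This is a minimal sketch using glibc's setmntent()/getmntent() on /proc/mounts; the helper name list_ram_backed_mounts is hypothetical, and it only treats the tmpfs and ramfs filesystem types as RAM-backed.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Linux/glibc sketch: print every RAM-backed (tmpfs or ramfs) mount listed in
 * /proc/mounts and return how many were found, or -1 if the table can't be
 * read. Files written under these mount points consume RAM, not disk. */
int list_ram_backed_mounts(FILE *out)
{
    FILE *mounts = setmntent("/proc/mounts", "r");
    struct mntent *ent;
    int count = 0;

    if (mounts == NULL)
        return -1;

    while ((ent = getmntent(mounts)) != NULL) {
        if (strcmp(ent->mnt_type, "tmpfs") == 0 ||
            strcmp(ent->mnt_type, "ramfs") == 0) {
            fprintf(out, "%s is %s\n", ent->mnt_dir, ent->mnt_type);
            count++;
        }
    }

    endmntent(mounts);
    return count;
}
```

If /tmp appears in the output (or / does, when /tmp is not a separate mount), temporary files created there live in memory.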

Either way, I don't think there is anything I can fix here. Are you okay with closing the issue?

@oliviera9
Author

In UNIX, the temporary directory /tmp/ is supposed to be a tmpfs, which does indeed reside in RAM. So I think the constant_memory option is transferring the memory usage from inside the application (using calloc for rows) to the temporary file.
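If /tmp is a tmpfs, the temporary file's pages are accounted as shared memory (the Shmem line of /proc/meminfo, typically also shown in the shared column of free) rather than as part of the process's own memory. This Linux-only sketch samples such fields while the write loop runs; meminfo_kb is a hypothetical helper name.

```c
#include <stdio.h>
#include <string.h>

/* Linux-only sketch: return the value (in kB) of a /proc/meminfo field such
 * as "Shmem" or "MemAvailable", or -1 if the field isn't present. */
long meminfo_kb(const char *field)
{
    FILE *fp = fopen("/proc/meminfo", "r");
    char line[256];
    size_t field_len = strlen(field);
    long value = -1;

    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        /* Lines look like "Shmem:            12345 kB"; require the ':'
         * right after the name so "Shmem" doesn't match "ShmemHugePages". */
        if (strncmp(line, field, field_len) == 0 && line[field_len] == ':') {
            if (sscanf(line + field_len + 1, "%ld", &value) != 1)
                value = -1;
            break;
        }
    }

    fclose(fp);
    return value;
}
```

Sampling meminfo_kb("Shmem") every few thousand rows should show it climbing when the temporary directory is on a tmpfs, even while the process's RSS stays flat.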
Unfortunately, I don't know how an xlsx file is made up, but I suppose a solution would be to flush the temporary file to the final xlsx file once a row is completed, and then rewind it. This would keep its size limited to a single row and keep the RAM usage stable.

@jmcnamara
Copy link
Owner

jmcnamara commented Sep 9, 2020

In UNIX, the temporary directory /tmp/ is supposed to be a tmpfs which resides in RAM indeed.

That isn't/wasn't always the case, although I believe that a lot of Linux distros are enabling it by default.

If that is the case, then constant_memory = LXW_TRUE will probably consume more memory than constant_memory = LXW_FALSE. I'll put an update in the docs about that.

You can try specifying an alternative, non-RAM-based folder for temp files using the tmpdir option in lxw_workbook_options, like this:

    /* Set the workbook options. */
    lxw_workbook_options options = {.constant_memory = LXW_TRUE,
                                    .tmpdir = "/some/writeable/directory",
                                    .use_zip64 = LXW_FALSE};

    /* Create a new workbook with options. */
    lxw_workbook *workbook = workbook_new_opt("constant_memory.xlsx", &options);

Try that and see how you get on.
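Choosing the directory can also be automated. This is a Linux-only sketch that assumes statfs() and the TMPFS_MAGIC/RAMFS_MAGIC constants from <linux/magic.h>. The helpers pick_disk_tmpdir and is_ram_backed are hypothetical: they return the first candidate directory that is writeable and not RAM-backed, which could then be used as the .tmpdir value in lxw_workbook_options.

```c
#include <linux/magic.h> /* TMPFS_MAGIC, RAMFS_MAGIC */
#include <stddef.h>      /* size_t, NULL */
#include <sys/vfs.h>     /* statfs() */
#include <unistd.h>      /* access(), W_OK */

/* Linux-only: 1 if path is on tmpfs or ramfs, 0 if not, -1 if statfs() fails. */
int is_ram_backed(const char *path)
{
    struct statfs fs;

    if (statfs(path, &fs) != 0)
        return -1;
    return fs.f_type == TMPFS_MAGIC || fs.f_type == RAMFS_MAGIC;
}

/* Return the first candidate directory that is writeable and not RAM-backed,
 * or NULL if none qualifies. The candidate list is supplied by the caller. */
const char *pick_disk_tmpdir(const char *const candidates[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (access(candidates[i], W_OK) == 0 && is_ram_backed(candidates[i]) == 0)
            return candidates[i];
    }
    return NULL;
}
```

A caller might pass candidates such as "/var/tmp" or a mounted storage directory (placeholder paths that depend on the system) and, if the function returns NULL, leave .tmpdir as NULL as in the original example.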

@oliviera9
Author

Yeah, that's the case in my system: /tmp is RAM.
I will try with constant_memory = LXW_FALSE too.
Anyway, don't you see any chance to flush and rewind the temporary file once a row is completed?
I suppose this would require moving the xlsx creation from workbook close to workbook creation, and writing row data to the final xlsx file every time. Quite a hard refactoring, probably.

@jmcnamara
Owner

jmcnamara commented Sep 9, 2020

Anyway, don't you see any chance to flush and rewind the temporary file once a row is completed?
I suppose this would require moving the xlsx creation from workbook close to workbook creation, and writing row data to the final xlsx file every time.

Unfortunately, that wouldn't work. You would end up with only 1 row of data written to the file.
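Here is a toy illustration of rewinding a temporary file after each row, using plain stdio rather than libxlsxwriter (write_rows_with_rewind is a hypothetical name). Each rewind moves the write position back to the start, so the next row overwrites the previous one and only the last row is left when the file is finally read.

```c
#include <stdio.h>

/* Toy illustration of "flush and rewind the temp file after each row"
 * (plain stdio, not libxlsxwriter). Each rewind() moves the write position
 * back to offset 0, so the next row overwrites the previous one. With
 * equal-length rows only the last row survives; if later rows were shorter,
 * stale bytes from earlier rows would remain at the end instead.
 * Copies what is left into out (NUL-terminated) and returns its length,
 * or -1 on error. */
long write_rows_with_rewind(const char *const rows[], size_t n,
                            char *out, size_t out_size)
{
    FILE *tmp = tmpfile();
    long len;
    size_t got;

    if (tmp == NULL)
        return -1;

    for (size_t i = 0; i < n; i++) {
        fputs(rows[i], tmp);
        rewind(tmp); /* the proposed change: start again at the beginning */
    }

    /* Measure what is actually left in the file. */
    if (fseek(tmp, 0, SEEK_END) != 0 || (len = ftell(tmp)) < 0 ||
        (size_t)len >= out_size) {
        fclose(tmp);
        return -1;
    }

    rewind(tmp);
    got = fread(out, 1, (size_t)len, tmp);
    out[got] = '\0';
    fclose(tmp);
    return (long)got;
}
```

Writing "<row>1</row>", "<row>2</row>" and "<row>3</row>" this way leaves only the 12 bytes of "<row>3</row>" in the file.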

The trade-off in constant_memory mode is between memory and disk space. If both of those are the same thing (as in your case), then there isn't any trade-off.

Instead, try setting .tmpdir to a writeable directory and re-running your test case.

@oliviera9
Author

I'll do some tests and let you know.
If there's no solution, I think the issue can be closed.
Thanks for your support.
Alain.

@oliviera9
Author

With .tmpdir set to a writeable directory, the cached memory decreases, but this does not lead to an out-of-memory condition: I suppose the kernel frees the cached pages and actually writes them to the temporary file.
This is not the case when .tmpdir is on a tmpfs.
Alain.

@jmcnamara
Owner

Thanks for the follow-up. I need to add something about this to the docs, so I'll reopen the issue until that is complete.

@jmcnamara jmcnamara reopened this Dec 1, 2020
@jmcnamara
Owner

This issue and workaround is now documented: https://libxlsxwriter.github.io/working_with_memory.html#ww_mem_temp
