Incremental writing of rows is very slow #42

Open

ShaheedHaque opened this issue Oct 5, 2021 · 1 comment

I'm using the following software:

$ sudo -H pip3 list | egrep 'odfpy|ods'
odfpy                         1.4.1               
pyexcel-ods                   0.6.0               

and the following test program:

#!/bin/env python3
from datetime import datetime

from pyexcel_io import writer

FILENAME = 'tmp.ods'
LIBRARIES = ['pyexcel-ods', 'pyexcel-ods3', 'pyexcel-odsw']
LIBRARY = LIBRARIES[0]
ROW = ['c' + str(c) for c in range(153)]


with writer.Writer('ods', library=LIBRARY) as file_stream:
    file_stream.open(FILENAME)
    page_stream = file_stream.writer.create_sheet('mysheet')
    #
    # Write 10, 100, 1000 and 10000 rows, timing each batch.
    #
    row_count = 10
    for multiplier in range(4):
        before = datetime.utcnow()
        for row in range(row_count):
            ROW[0] = 'r' + str(row)
            page_stream.write_row(ROW)
        after = datetime.utcnow()
        print('row_count', row_count, 'seconds', (after - before).total_seconds())
        before = after
        row_count *= 10
    page_stream.close()
    after = datetime.utcnow()
    print('page_stream close', 'seconds', (after - before).total_seconds())
    before = after
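# Exiting the "with" block closes file_stream; time that separately.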
after = datetime.utcnow()
print('file_stream close', 'seconds', (after - before).total_seconds())

and see the following output:

$ ./ods.py 
row_count 10 seconds 0.019138
row_count 100 seconds 0.191849
row_count 1000 seconds 2.262572
row_count 10000 seconds 23.984489
page_stream close seconds 5.154405
file_stream close seconds 27.650097

As you can see, it takes about 26s to call write_row() 11110 times, and then a further 33s or so to close the page_stream and file_stream. Is this expected? Is there anything I can do to speed it up? I actually need a row_count of circa 200k (and possibly more), and at these rates that is clearly going to take quite a while.
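
For comparison, the non-incremental path can be timed too. This is just a sketch: it builds the whole table up front and hands it to pyexcel-io's save_data() in one call (the library= keyword to select the plugin is my reading of the pyexcel-io docs; drop it if only one ODS plugin is installed). It may well hit the same bottleneck, since the same odfpy backend does the serialisation:

#!/bin/env python3
from datetime import datetime

from pyexcel_io import save_data

ROWS = 11110  # same total as the incremental test above

# Build the whole table in memory: 153 columns per row, with the first
# cell a row label, mirroring what the incremental version writes.
data = [['r' + str(r)] + ['c' + str(c) for c in range(1, 153)]
        for r in range(ROWS)]

before = datetime.utcnow()
save_data('tmp_bulk.ods', {'mysheet': data}, library='pyexcel-ods')
after = datetime.utcnow()
print('save_data seconds', (after - before).total_seconds())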

ShaheedHaque (Author) commented Oct 5, 2021

I forgot to say: the memory consumption (on Ubuntu, current release) peaks with top showing a resident set size of 3.2 GB. That feels like a clue!
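
For anyone reproducing this, the peak can be captured from inside the test program rather than by watching top, e.g. with the standard-library resource module (on Linux, ru_maxrss is reported in kilobytes):

import resource

# Peak resident set size of this process so far; on Linux the value is
# in kilobytes, so convert to MiB for readability.
peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print('peak RSS MiB', round(peak_kb / 1024, 1))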
