New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simple and memory-efficient way to read the whole Excel book as "records" #144
Comments
please have a look at these docs: http://docs.pyexcel.org/en/v0.6.0/two-liners.html |
Of course I can use
But how would I know the sheet names?
is oblivious to the fact that excel have multiple sheets. |
Interesting use case. Yes, it is impossible to get sheet name at this point. please try use |
It is hard to know |
So, it is possible to get sheet names before consuming a lot of memory. but I do not recommend to use for now because internal api are subjected to change:
Please wait for official support for the access to internal book streams. With that said, if you really want to save memory consumption or want a more efficient xlsx reader, you can consider pyexcel-xlsxr, which is under-used, zero stars at the moment. With above code, pyexcel-xlsxr returns faster than pyexcel-xlsx. |
Alternatively, here is the officially supported APIs, doing the same thing but with pyexcel-io, a bit lower level. I do not want you to use two set of APIs. However, it's better than you choose the internal/private apis of pyexcel.
again, please note that pyexcel-xlsxr give better reading performance than pyexcel-xlsx. Finally, the choice is yours. |
Also, writing to Excel file is troublesome for big files. Maybe I want something like Google Sheets performance. Maybe, only a real database object (like SQLite) can do this? Lastly, I am still looking for a good text wrapper and a custom renderer (like Handsontable), but again, maybe I shouldn't think about this one too much. |
first of all, here's the architecture of pyexcel, pyexcel_io is the base module for read and write. pyexcel provides high level manipulations as well as read and write. In pyexcel, it is possible that we have multiple readers and writers for the same file format. If you do not want to use pyexcel-xlsxr, that's fine. you are safe to use pyexcel-xlsx. the code works no matter which plugin is installed but just make sure you have at least one reader for xlsx format. |
Regarding the data serialization, SQLite is a bad choice. HDF format should be considered. But for now, pyexcel does not support such an output. To write into xlsx, pyexcel-xlsx and pyexcel-xlsxw can help you. |
Maybe I should look at http://www.pytables.org/index.html or http://docs.h5py.org/en/stable/ but that's a lot of things to learn. Maybe I should stick with SQLAlchemy + SQLite for now.
Yeah, rather than to write, I just want to |
|
If it is possible, it want to use
pyexcel.iget_records()
.What I have got now is:-
But
.to_records()
doesn't seem to be a memory-efficient way as it generates a list, not a generator. Also, why do I have to callraw_data[sheet_name].name_columns_by_row(0)
explicitly?The text was updated successfully, but these errors were encountered: