[Improvement] - unzip to RAM instead of disk #202

Open
darth3PO opened this issue Oct 5, 2021 · 2 comments

Comments

@darth3PO
Contributor

darth3PO commented Oct 5, 2021

Python version

3.7.4 (default, Aug 9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)]

Platform information

Windows-10-10.0.18362-SP0

Numpy version

1.20.1

mdfreader version

4.1

Description

zip_name = zip_class.extract(zip_name) # locally extracts file

Passing a zipped .dat file to Mdf, like
yop = mdfreader.Mdf(file_name='DatFile.zip')
results in the .zip file being extracted to my working directory. Is it possible to extract the zip into RAM instead of onto the SSD/HDD?

When using the multiprocessing library, the bottleneck becomes SSD read/write speed. Wondering if this can be sped up by just using RAM instead.

I'm not sure if zipfile.ZipFile.read() or .open() would work? Some say that io.BytesIO would also do the trick. Most solutions for 'unzip to RAM' assume that the file is being requested over the internet, but in my case the zip is local. When extracted, the contents would fit in RAM.
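For reference, a minimal sketch of the idea using only the standard library. The helper name read_zip_member_to_ram is hypothetical, and it assumes the archive holds a single data file. I have not checked whether mdfreader could accept a file object like this instead of a file name.

```python
import io
import zipfile


def read_zip_member_to_ram(zip_path, member_name=None):
    """Return one archive member as a seekable in-memory binary stream.

    Hypothetical helper, not part of mdfreader.
    """
    with zipfile.ZipFile(zip_path) as archive:
        if member_name is None:
            # Assumption: the archive holds a single data file, so take the first entry.
            member_name = archive.namelist()[0]
        # ZipFile.read() decompresses the whole member and returns it as bytes;
        # nothing is written to disk.
        data = archive.read(member_name)
    # BytesIO wraps the bytes in a file-like object that supports read() and seek().
    return io.BytesIO(data)
```

Calling read_zip_member_to_ram('DatFile.zip') would give a stream that behaves like an opened binary file, at the cost of holding the full decompressed contents in memory.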

Thanks

@ratal
Owner

ratal commented Oct 6, 2021

Thanks for the idea, could be investigated.
ZipFile.open() gives a file object with read() and seek(), so it could read the file transparently while decompressing it, but I do not think it loads the complete file into memory. In the end, if there is a lot of pointer travel in the file (this can happen when reading blocks that could be a bit everywhere), it could lead to a performance penalty while keeping memory impact. I guess it should be benchmarked.
BytesIO could load it into memory, it seems, but I am wondering whether that is appropriate for all use cases. If going in this direction, I would recommend making it optional.
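To make the two options concrete, here is a rough sketch of what an optional switch could look like. The function open_zip_member and its in_memory flag are hypothetical names for illustration, not existing mdfreader API. As far as I know, in CPython a backward seek() on a compressed member restarts decompression from the beginning, which is where the penalty for scattered reads would come from.

```python
import contextlib
import io
import zipfile


@contextlib.contextmanager
def open_zip_member(zip_path, member_name, in_memory=False):
    """Yield a readable, seekable binary stream for one archive member.

    Hypothetical helper, not part of mdfreader.
    in_memory=False: stream from the archive, decompressing on demand.
    in_memory=True:  decompress everything up front into a BytesIO buffer.
    """
    with zipfile.ZipFile(zip_path) as archive:
        if in_memory:
            # Whole member held in RAM; seeks are cheap afterwards.
            yield io.BytesIO(archive.read(member_name))
        else:
            # The object returned by ZipFile.open() supports seek() since Python 3.7,
            # provided the underlying archive file is itself seekable.
            with archive.open(member_name) as member:
                yield member
```

Both modes expose the same read()/seek() interface, so the reading code would not need to change, and the two could be timed against each other on a real file with many scattered blocks.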

@darth3PO
Contributor Author

darth3PO commented Oct 6, 2021

Thanks for your thoughts. I will try to learn more about BytesIO and see if I can implement something.
