How to get a path pointing to io.BytesIO? #402

Closed
WaterKnight1998 opened this issue Jun 3, 2020 · 32 comments

@WaterKnight1998

I am finding a lot of libraries that take a filename to open a file. However, in my Flask app I would like to get a filename that points to an io.BytesIO object, without needing to use temporary files.

@lurch
Contributor

lurch commented Jun 3, 2020

I'm not sure I understand your question?

@WaterKnight1998
Author

WaterKnight1998 commented Jun 4, 2020

I'm not sure I understand your question?

@lurch I am going to try to explain it again.

My program has an array of bytes called img_bytes.

I have a Python library that loads files from a system path. However, I don't have any file stored, just an in-memory array of bytes.

So I looked for a library that creates a syspath pointing to that bytes array.

I got it working with the following code snippet.

from fs.tempfs import TempFS

tmp_fs = TempFS()
tmp_fs.appendbytes("temporal.nd2", img_bytes)
nd2Image = pims_nd2.ND2_Reader(tmp_fs.getsyspath("temporal.nd2"))[0]
tmp_fs.close()  # removes the temporary directory and its contents

I also looked at MemoryFS, but getsyspath didn't work.

@willmcgugan
Member

@WaterKnight1998 If you have an API that only accepts a path and not a file-like object then I'm afraid your only option is to copy your data to the OS filesystem.

It may look a bit awkward, but it's generally not all that slow. The OS may never actually physically write the data to disk.

@WaterKnight1998
Author

@WaterKnight1998 If you have an API that only accepts a path and not a file-like object then I'm afraid your only option is to copy your data to the OS filesystem.

It may look a bit awkward, but it's generally not all that slow. The OS may never actually physically write the data to disk.

Okay. How can I make sure that the OS never actually physically writes the data to disk?

Is the previous code snippet a good idea?

@willmcgugan
Member

It's a feature of operating systems. When you work with files, the OS will try to cache files in memory as much as possible.

Your snippet will likely work fine, but it's probably doing a little more work than necessary. TempFS will create a directory, which you don't strictly need. If you want it to be as efficient as possible, look at the tempfile module in the standard library, specifically the NamedTemporaryFile function.

@lurch
Contributor

lurch commented Jun 4, 2020

@willmcgugan Maybe something along these lines would be a useful addition to the FAQ ?

@lurch
Contributor

lurch commented Jun 4, 2020

...and on Linux, you could write your data-file into a subdirectory of /dev/shm and that stores things in a ram-disk rather than on a physical disk. (which obviously only works if your datafile is smaller than the amount of free memory!)

@WaterKnight1998
Author

WaterKnight1998 commented Jun 4, 2020

If you want it to be as efficient as possible, look at the tempfile module in the standard library, specifically the NamedTemporaryFile function.

Thank you for the info @willmcgugan

I have tried the following code snippet:

tmp_file=NamedTemporaryFile()
tmp_file.write(img_bytes)
tmp_file.seek(0)
nd2Image = pims_nd2.ND2_Reader(tmp_file.name)[0]
tmp_file.close()

However, ND2 throws a "lim not found" error! I think I am saving the data incorrectly and the end marker is missing.

@WaterKnight1998
Author

...and on Linux, you could write your data-file into a subdirectory of /dev/shm and that stores things in a ram-disk rather than on a physical disk. (which obviously only works if your datafile is smaller than the amount of free memory!)

Nice to know. Could you give a little more intuition?

@willmcgugan
Member

@WaterKnight1998 I think you will need to close the file first, but make sure you create the file with delete=False. Then you'll need to os.remove the file afterwards. BTW are you certain that API doesn't offer a way to read data directly from a string? That may be simpler.

@lurch I think you will get pretty much the same effect with a file in /tmp, depending on how it's mounted. And the OS can still decide to use physical storage if it can't fit it in memory. At least that's my understanding... Re the FAQ: yeah, maybe. I think this is the first time it's been asked. Unless my memory fails me!
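
For readers following along, the advice above (create with delete=False, close before handing over the name, remove afterwards) could be wrapped up roughly like this. It is only a sketch using the standard library: `bytes_as_path` and `read_via_path` are hypothetical names, and `read_via_path` merely stands in for a path-only API such as the ND2 reader.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def bytes_as_path(data, suffix=""):
    """Yield a filesystem path whose file contains `data`; delete it on exit.

    Hypothetical helper, not part of any library mentioned in this thread.
    """
    # delete=False so the file survives close(); we remove it ourselves.
    tmp = tempfile.NamedTemporaryFile(delete=False, suffix=suffix)
    try:
        tmp.write(data)
        tmp.close()  # flush Python's buffer so readers of the path see all bytes
        yield tmp.name
    finally:
        tmp.close()  # harmless if the file is already closed
        os.remove(tmp.name)


def read_via_path(path):
    """Stand-in for an API that only accepts a path (an assumption)."""
    with open(path, "rb") as fh:
        return fh.read()


with bytes_as_path(b"example payload", suffix=".nd2") as path:
    round_trip = read_via_path(path)
    existed_inside = os.path.exists(path)

existed_after = os.path.exists(path)
```

The try/finally means the file is removed even if the reader raises, which matters if this runs once per request in a web app.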

@lurch
Contributor

lurch commented Jun 4, 2020

I think you need to .close() the file before you pass its filename to ND2_Reader ? Otherwise Python may still be internally buffering the data.

Could you give a little more intuition?

https://www.google.com/search?q=linux+%2Fdev%2Fshm+ramdisk
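
On the buffering point, a small standard-library sketch can show the general effect: bytes passed to write() may sit in Python's buffer and not be visible through the path until the file is flushed or closed. Whether this was the actual cause of the error in this thread is not established, since the snippet that later worked also added a .nd2 suffix.

```python
import os
import tempfile

data = b"x" * 100  # small enough to sit entirely in Python's write buffer

tmp = tempfile.NamedTemporaryFile(delete=False)
try:
    tmp.write(data)
    # Size as seen through the path; usually smaller than len(data) here
    # because the bytes have not been handed to the OS yet.
    size_before_flush = os.path.getsize(tmp.name)
    tmp.flush()
    # After flush() the OS has the data, so the path reports the full size.
    size_after_flush = os.path.getsize(tmp.name)
finally:
    tmp.close()
    os.remove(tmp.name)

print(size_before_flush, size_after_flush)
```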

@WaterKnight1998
Author

WaterKnight1998 commented Jun 4, 2020

I think you need to .close() the file before you pass its filename to ND2_Reader ? Otherwise Python may still be internally buffering the data.

This does the trick, @lurch and @willmcgugan:

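import os
from tempfile import NamedTemporaryFile

tmp_file = NamedTemporaryFile(delete=False, suffix=".nd2")
tmp_file.write(img_bytes)
tmp_file.close()  # flush everything to the file before the reader opens it
nd2Image = pims_nd2.ND2_Reader(tmp_file.name)[0]
os.unlink(tmp_file.name)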

@WaterKnight1998
Author

WaterKnight1998 commented Jun 4, 2020

@WaterKnight1998 BTW are you certain that API doesn't offer a way to read data directly from a string? That may be simpler.

By string you mean io.StringIO? @willmcgugan

@willmcgugan
Member

By string you mean io.StringIO?

You can get the data from the StringIO with my_stringio.getvalue(). If the API accepts a string, this would be the most efficient way of supplying the data.
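
As a small illustration of getvalue(), shown here with io.BytesIO since the data in this thread is binary (io.StringIO behaves the same way for text). The payload is made up for the example.

```python
import io

buffer = io.BytesIO()
buffer.write(b"raw image bytes")

# getvalue() returns the whole contents regardless of the current position,
# whereas read() starts from wherever the stream position currently is.
all_bytes = buffer.getvalue()
rest = buffer.read()  # position is at the end after write(), so this is empty
```

This only helps when the target API accepts bytes or a string directly; it does not produce a path.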

@WaterKnight1998
Author

WaterKnight1998 commented Jun 4, 2020

By string you mean io.StringIO?

You can get the data from the StringIO with my_stringio.getvalue(). If the API accepts a string, this would be the most efficient way of supplying the data.

@willmcgugan the example in the docs is as follows:

from pims import ND2_Reader
frames = ND2_Reader('some_movie.nd2')
frames[82]  # display frame 82
frames.close()

I tried to use io.BytesIO, but it didn't work.

@WaterKnight1998
Author

BTW are you certain that API doesn't offer a way to read data directly from a string? That may be simpler.

I also asked them a few days ago, and they told me this:

Hi,

Sorry, the open only works with actual file paths. As this project is merely wrapping the ND2 reading library from Nikon, we don’t have much control.

You could try virtual file systems if you want to stay in memory, but it depends on your OS how that works. I don’t have experience with that.

Or, try a pure Python alternative like https://github.com/rbnvrw/nd2reader

@lurch
Contributor

lurch commented Jun 4, 2020

A quick bit of searching and it looks like it uses a C-based SDK in which case you'll need to go with the tempfile approach rather than trying to pass the string-value directly.

@WaterKnight1998
Author

A quick bit of searching and it looks like it uses a C-based SDK in which case you'll need to go with the tempfile approach rather than trying to pass the string-value directly.

Thank you for all your help!! @lurch & @willmcgugan
I am glad that there are people like you in the world helping with a problem that is not even related to their library!

@lurch
Contributor

lurch commented Jun 4, 2020

I think passing dir='/dev/shm' to the NamedTemporaryFile constructor will guarantee that the file only gets created in ramdisk.
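
A hedged sketch of that idea which also copes with machines that have no /dev/shm: it is typically a memory-backed tmpfs mount on Linux, but it may be absent elsewhere, and tmpfs contents can still be swapped out. `pick_temp_dir` is a hypothetical helper name.

```python
import os
import tempfile


def pick_temp_dir(preferred="/dev/shm"):
    """Return `preferred` if it is a writable directory, else None.

    Passing None as `dir` makes tempfile fall back to its default location.
    Hypothetical helper, not part of the standard library.
    """
    if os.path.isdir(preferred) and os.access(preferred, os.W_OK | os.X_OK):
        return preferred
    return None


tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".nd2", dir=pick_temp_dir())
try:
    tmp.write(b"example payload")
    tmp.close()  # flush before anything else opens the path
    created_in = os.path.dirname(tmp.name)
    size_on_disk = os.path.getsize(tmp.name)
finally:
    tmp.close()
    os.remove(tmp.name)
```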

@WaterKnight1998
Author

dir='/dev/shm'

I tried that and the program works too. Thank you for letting me know that /dev/shm is a ramdisk.
It has optimized my REST API.

However, the main bottleneck of my REST API is the deep learning model!

@lurch
Contributor

lurch commented Jun 4, 2020

I guess this can be closed then? Good luck with your deep learning 🤞

@WaterKnight1998
Author

I guess this can be closed then? Good luck with your deep learning

Thank you again for all your help!!

@WaterKnight1998
Author

I think passing dir='/dev/shm' to the NamedTemporaryFile constructor will guarantee that the file only gets created in ramdisk.

Hi again @lurch and @willmcgugan, I am getting "no space left on device" errors in /dev/shm inside a Docker container. Tracking usage with df, I can see that files are not being removed from /dev/shm, so the used size keeps growing until all the space is used.

The code that I was using was:

tmp_file=NamedTemporaryFile(delete=False, suffix=".nd2", dir="/dev/shm")
tmp_file.write(img_bytes)
tmp_file.close()
nd2Image = pims_nd2.ND2_Reader(tmp_file.name)[0]

 # Removing Temporal File
tmp_file.close()
os.unlink(tmp_file.name)

@lurch
Contributor

lurch commented Jun 6, 2020

Off the top of my head, the code you've got there looks okay. I guess you'll need to do ls -l /dev/shm/ from outside of python, to try and track down what's going on? (AFAIK there's nothing "special" about /dev/shm/, as far as all the userspace tools are concerned it's "just another directory")

BTW there's no need to .close() the file twice 😉
Hmmm, perhaps the nd2Image is keeping a reference to the file? Do you need to explicitly "close" the nd2Image as well, before deleting the file? 🤷
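
One way to check from inside the Python process whether something still holds a file open is to look at the process's own descriptor table. This is a Linux-only sketch that relies on /proc; `open_paths_under` is a hypothetical helper, and it simply returns an empty list on systems without /proc.

```python
import os
import tempfile


def open_paths_under(prefix, fd_dir="/proc/self/fd"):
    """List targets of this process's open descriptors that start with `prefix`.

    Linux-only: relies on the /proc filesystem. Returns [] elsewhere.
    """
    found = []
    if not os.path.isdir(fd_dir):
        return found
    for entry in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, entry))
        except OSError:
            continue  # descriptor closed between listdir() and readlink()
        if target.startswith(prefix):
            found.append(target)
    return found


# Demo: a handle we deliberately keep open shows up in the listing (on Linux).
held = tempfile.NamedTemporaryFile()
held_path = os.path.realpath(held.name)
hits = open_paths_under(os.path.dirname(held_path))
held.close()
```

Calling `open_paths_under("/dev/shm")` after each request would show whether descriptors are accumulating, including ones opened by C extensions in the same process. On Linux, files that were unlinked but are still open appear with a " (deleted)" suffix.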

@WaterKnight1998
Author

WaterKnight1998 commented Jun 6, 2020

BTW there's no need to .close() the file twice

Yes, I only added it to try to free the memory.

Hmmm, perhaps the nd2Image is keeping a reference to the file? Do you need to explicitly "close" the nd2Image as well, before deleting the file?

I am calling it explicitly and getting the same issue.

        tmp_file=NamedTemporaryFile(delete=False, suffix=".nd2", dir="/dev/shm")
        tmp_file.write(img_bytes)
        tmp_file.close()
        image_reader = pims_nd2.ND2_Reader(tmp_file.name)
        nd2Image = image_reader[0]

        # Removing Temporal File
        image_reader.close()
        os.unlink(tmp_file.name)
        del nd2Image
        del image_reader

@WaterKnight1998
Author

Deleting a buffer did the trick lel

@WaterKnight1998
Author

Deleting a buffer did the trick lel

@lurch it didn't do the trick. I was not testing with an nd2 file...

@lurch
Contributor

lurch commented Jun 6, 2020

https://linux.die.net/man/2/unlink has more low-level details (i.e. a file can still be using up disk-space, even if it's not visible to ls -l).
So there must be something that still has the file open? 🤷‍♂️
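
The unlink behaviour described above can be reproduced with the standard library alone. This sketch assumes a POSIX system (on Windows, removing a file that is still open normally fails instead); the `leaked` handle simulates a library that opened the path and never closed it.

```python
import os
import tempfile

payload = b"y" * 4096

tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(payload)
tmp.flush()

# Simulate a library that opened the path and never closed its handle.
leaked = open(tmp.name, "rb")
tmp.close()

os.unlink(tmp.name)  # removes the name from the directory
name_gone = not os.path.exists(tmp.name)

# POSIX behaviour: the data stays allocated while any descriptor is open.
still_readable = leaked.read() == payload

leaked.close()  # only now can the space be reclaimed
```

This is consistent with the symptom reported in the thread: the files vanish from the directory listing, yet df keeps showing the space as used until the process holding the handles releases them or exits.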

@WaterKnight1998
Author

WaterKnight1998 commented Jun 6, 2020

https://linux.die.net/man/2/unlink has more low-level details (i.e. a file can still be using up disk-space, even if it's not visible to ls -l).
So there must be something that still has the file open?

I have tried commenting out the nd2_reader, and it looks like it is causing the error. It is based on a C++ library. What could I do?

@WaterKnight1998
Author

https://linux.die.net/man/2/unlink has more low-level details (i.e. a file can still be using up disk-space, even if it's not visible to ls -l).
So there must be something that still has the file open?

I tried another library for reading that file and it worked...

So, the problem was that library OMG

@lurch
Contributor

lurch commented Jun 6, 2020

It is based on a C++ library. What could I do?
So, the problem was that library OMG

You could submit a bug-report against that library, saying that it isn't closing file-handles properly?

@WaterKnight1998
Author

It is based on a C++ library. What could I do?
So, the problem was that library OMG

You could submit a bug-report against that library, saying that it isn't closing file-handles properly?

Done, thank you!
