Certain (very large) Hamamatsu NDPI files cannot be opened with OpenSlide #174

Open
GeertLitjens opened this Issue Feb 4, 2016 · 11 comments

I have some very large (>5 GB) Hamamatsu slides that cannot be opened (error message: "Can't validate JPEG for directory 0: Expected marker at 4294971018, found none"). I can share such a file if needed. They do open with Bioformats and NDPIView.

I am sure this is directly related to the size of the file; the smaller files scanned in the same batch are fine.

ilykos commented Jun 21, 2016

Same issue here.
This concerns the image areas captured from 50mm x 75mm slide format.

ylaizet commented Jul 1, 2016

Hello, I used the openslide-python lib and got the same issue on Hamamatsu files larger than 4.2 GB. None of the big files can be opened with OpenSlide, while the Windows NDPI viewer works perfectly on them.
Is there a workaround?

CamNZ commented Jul 1, 2016

Are you trying to load the entire slide into memory? This is a common source of errors with large WSI files; instead, load only the parts of the slide you need as you go.
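CamNZ's suggestion can be sketched as follows. This is purely illustrative: `tile_grid` and `process_slide` are hypothetical helper names, though `OpenSlide.dimensions` and `read_region` are real openslide-python APIs.

```python
from typing import Iterator, Tuple

def tile_grid(width: int, height: int, tile: int = 1024) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (x, y, w, h) rectangles covering a level, clamped at the edges."""
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield x, y, min(tile, width - x), min(tile, height - y)

def process_slide(path: str) -> None:
    # Requires openslide-python. Reading one tile at a time keeps memory
    # bounded no matter how large the slide is.
    import openslide  # lazy import so tile_grid stays stdlib-only
    slide = openslide.OpenSlide(path)
    w, h = slide.dimensions
    for x, y, tw, th in tile_grid(w, h):
        region = slide.read_region((x, y), 0, (tw, th))  # PIL RGBA image
        # ... analyze `region` here, then let it go out of scope ...
```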

Owner

bgilbert commented Jul 2, 2016

@CamNZ That is indeed a common source of errors, but this is a real bug.

@ylaizet The workaround, for what it's worth, is to keep pyramid levels smaller than 4 GiB (or perhaps 2 GiB) -- by reducing either the scan resolution or the size of the scan area.
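As a rough sanity check for this workaround, one can estimate whether a level's stored size stays under the conservative 2 GiB bound. This is a back-of-the-envelope sketch: the 10:1 JPEG compression ratio is an assumption, not a property of the scanner, and the helper names are hypothetical.

```python
def estimated_level_bytes(width: int, height: int,
                          bytes_per_pixel: int = 3,
                          jpeg_ratio: float = 10.0) -> float:
    """Back-of-the-envelope stored size of one pyramid level,
    assuming roughly 10:1 JPEG compression (an assumption)."""
    return width * height * bytes_per_pixel / jpeg_ratio

def likely_safe(width: int, height: int, limit: int = 2**31) -> bool:
    # Compare against the conservative 2 GiB limit mentioned above.
    return estimated_level_bytes(width, height) < limit
```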

All: this bug is on my radar, but I may not have a chance to get to it for a while. It's actually more of a missing feature, and the fix may not be trivial.

ylaizet commented Jul 4, 2016

Hey, I use the openslide-python deepzoom_multiserver.py script to set up a server that lists all available slides, using the lowlevel.detect_vendor function (which maps to the openslide_detect_vendor function) to detect the files. As far as I can see, detection returns None (NULL) on the big files, whereas smaller files return "hamamatsu". I assume this step does not load the entire file yet. The script runs on a machine with 32 GB of memory, so I would be surprised if memory were the problem, but perhaps there is a parameter to adjust?
For the moment I will keep the files small, but I look forward to being able to load bigger files. If you have any idea to make it work in the meantime, I would be happy to give it a try.
Thanks.
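The filtering ylaizet describes can be sketched with the detection step abstracted out, so the same logic works with `openslide.lowlevel.detect_vendor` (which returns None when no vendor matches) or any stand-in. `partition_by_vendor` is a hypothetical helper, not part of openslide-python.

```python
from typing import Callable, Iterable, List, Optional, Tuple

def partition_by_vendor(paths: Iterable[str],
                        detect: Callable[[str], Optional[str]]
                        ) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split paths into (recognized, unrecognized) using a vendor detector.
    In real use, pass openslide.lowlevel.detect_vendor as `detect`."""
    recognized, unrecognized = [], []
    for path in paths:
        vendor = detect(path)
        if vendor is None:
            unrecognized.append(path)  # e.g. the >4 GiB NDPI files here
        else:
            recognized.append((path, vendor))
    return recognized, unrecognized
```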

ilykos commented Jul 4, 2016

Is there a workaround?

Hi @ylaizet, I've had the same problem, and one of the alternatives I considered is the Bioformats Python library, as @GeertLitjens suggests. I have not tried it myself, though. I should also mention that it is not the most elegant solution, since it depends on Java.

Owner

bgilbert commented Jul 6, 2016

@ilykos I haven't worked on this problem yet, and I'm not aware that anyone else has either. The NDPI section of the Hamamatsu page has a summary of the current situation with large NDPI files. The fix will involve looking at sample files with a hex editor to determine how very large levels are stored, and then implementing heuristics in OpenSlide to read them correctly.
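For what it's worth, the offset in the original error message (4294971018 = 2^32 + 3722) looks like a 32-bit TIFF offset that wrapped once past 4 GiB. One possible heuristic, sketched here under the assumption that the true offsets are written in ascending file order (this is not OpenSlide's actual fix), would unwrap them like this:

```python
def unwrap_offsets(raw_offsets, wrap=2**32):
    """Re-add the high bits lost when 32-bit TIFF offsets wrap past 4 GiB.
    Assumes the true offsets are ascending, so a raw value smaller than
    its predecessor signals one additional wrap. Illustrative only."""
    unwrapped, base, prev = [], 0, 0
    for off in raw_offsets:
        if off < prev:       # wrapped around 2**32
            base += wrap
        prev = off
        unwrapped.append(base + off)
    return unwrapped
```

With raw offsets [100, 4_000_000_000, 3722], the last entry unwraps to 4294971018, matching the offset reported in the error message above.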

GeertLitjens commented

You could try to reverse engineer the Bioformats NDPIReader, which is capable of reading these NDPI files; some information in there should help with this.

Owner

bgilbert commented Jul 17, 2016

@GeertLitjens It might be okay to look at Bio-Formats for information about NDPI files, but if you then produce code which is a derivative work of Bio-Formats (for example, by porting some of their code), we won't be able to merge it, since Bio-Formats' GPLv2+ license is stricter than OpenSlide's LGPLv2.1 license.

(I am not a lawyer; this is not legal advice.)

ilykos commented Jul 19, 2017

In my experience this affects SVS files as well, once a certain size threshold is exceeded (usually around the 4.0-4.2 GB boundary).

Different error messages come from libjpeg on different levels:

  • LEVEL 1: JPEGLib: Not a JPEG file: starts with 0xff 0x11.
  • LEVEL 2: JPEGLib: Not a JPEG file: starts with 0x11 0x00.
  • LEVEL 3: (no libjpeg error, but an exception is thrown nonetheless)

None of the files I've worked with had errors on level 0; the errors started at level 1 and beyond. All these files were imaged from 76 x 26 mm glass slides, so it is possible there was not enough slide surface area for level 0 to fail.
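The byte pairs in these libjpeg messages are consistent with the reader seeking to a bad offset and landing mid-stream: a valid JPEG stream must begin with the Start-Of-Image marker 0xFF 0xD8, which is the check libjpeg is reporting. A minimal version of that check:

```python
def starts_with_jpeg_soi(data: bytes) -> bool:
    """True if the buffer begins with the JPEG Start-Of-Image marker."""
    return data[:2] == b"\xff\xd8"
```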

(attached screenshot: failure_report)

Owner

bgilbert commented Jul 29, 2017

@ilykos Thanks for the report. The Aperio problem almost certainly has a different cause, so I've opened a new bug for it: #212.
