
get_best_level_for_downsample too strict, wasting performance? #203

Open
jetic83 opened this Issue Apr 20, 2017 · 3 comments


jetic83 commented Apr 20, 2017 edited

Context

Issue type (bug report or feature request): Feature request
Operating system (e.g. Fedora 24, Mac OS 10.11, Windows 10): Win7
Platform (e.g. 64-bit x86, 32-bit ARM): 64-bit
OpenSlide version: 3.4.1
Slide format (e.g. SVS, NDPI, MRXS): svs

Details

I have encountered non-intuitive and slow behavior in OpenSlide several times. Consider a large slide with these 3 l_dimensions stored in the file:

level 0:  55776 x 42423
level 1:  13944 x 10605
level 2:   3486 x  2651

In total, DeepZoom simulates 17 z_dimensions:

55776 x 42423
27888 x 21212
13944 x 10606
6972 x 5303
3486 x 2652
1743 x 1326
.... x ....
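The z_dimensions above follow from DeepZoom's convention of halving both dimensions with a ceiling at every level. A minimal sketch of that rule (not the actual openslide-python code, but `deepzoom.py` uses the same ceil-halving):

```python
import math

def dz_level_dimensions(width, height):
    """List DeepZoom level sizes from full resolution down to 1 x 1,
    halving each dimension with a ceiling at every step."""
    levels = [(width, height)]
    while levels[-1] != (1, 1):
        w, h = levels[-1]
        levels.append((max(1, math.ceil(w / 2)), max(1, math.ceil(h / 2))))
    return levels

dims = dz_level_dimensions(55776, 42423)
print(len(dims))   # 17 levels, matching the pyramid above
print(dims[2])     # (13944, 10606): one pixel taller than file level 1
```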

Interestingly, openslide's get_best_level_for_downsample returns the following slide_from_dz_level values for those z_dimensions:

0
0
0
1
1
2
2
2
2 (and so on for the remaining levels)

This means:

  1. for DZ level 55776 x 42423, it samples from file level 0 (55776 x 42423),
  2. for DZ level 27888 x 21212, it also samples from file level 0 (55776 x 42423) and downscales,
  3. for DZ level 13944 x 10606, it also samples from file level 0 (55776 x 42423) and downscales,
  4. for DZ level 6972 x 5303, it samples from file level 1 (13944 x 10605) and downscales,
  5. for DZ level 3486 x 2652, it also samples from file level 1 (13944 x 10605) and downscales,
  6. for DZ level 1743 x 1326, it samples from file level 2 (3486 x 2651) and downscales,
  7. then it continues sampling from level 2, since there is nothing smaller in the file.

I understand the algorithm and why it does this: the constraint is that it always samples from the next larger or equally sized level in the file.
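That constraint can be sketched as follows. This is an illustrative reimplementation, not the C source; the per-level downsample values are taken here as the average of the width and height ratios against level 0, which is close to what OpenSlide reports:

```python
def best_level_for_downsample(downsamples, downsample):
    """Sketch of get_best_level_for_downsample: pick the highest
    (smallest-image) level whose downsample factor does not exceed
    the requested one, so the read never has to upscale."""
    best = 0
    for level, ds in enumerate(downsamples):
        if ds <= downsample:
            best = level
    return best

# Per-level downsamples for the example slide, as averaged axis ratios:
dims = [(55776, 42423), (13944, 10605), (3486, 2651)]
downsamples = [(55776 / w + 42423 / h) / 2 for w, h in dims]

# DZ level 13944 x 10606 requests a downsample of exactly 4.0, but
# level 1's downsample is slightly above 4 (height 10605 vs 10606),
# so the strict comparison falls back to level 0.
print(best_level_for_downsample(downsamples, 4.0))  # → 0, not 1
```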

But (3.) seems counter-intuitive:

If we loosened the constraint and did not require strict downsampling, we could sample DZ level 13944 x 10606 from file level 1 (13944 x 10605), which reduces the extent of resizing.

Analogously for (5.): sampling 3486 x 2652 from file level 2 (3486 x 2651).

Thus the whole function get_tile(...) could be much faster, since much of the resizing cost vanishes.

Of course, (3.) and (5.) behave this way because the height is 1 pixel too small to sample from the higher level. But sampling from the much larger next-lower level seems too heavy a penalty for that one pixel.

This problem recurs in many other slides that have such rounding issues with an odd width or height.
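The loosened selection could look something like this hypothetical variant (rel_tol is an invented parameter, not part of the real API):

```python
def tolerant_best_level(downsamples, downsample, rel_tol=1e-3):
    """Hypothetical loosened selection: also accept a stored level whose
    downsample is only fractionally larger than the requested one
    (e.g. a 1-pixel rounding mismatch), avoiding a read from a level
    4x larger plus the corresponding resize cost."""
    best = 0
    for level, ds in enumerate(downsamples):
        if ds <= downsample * (1 + rel_tol):
            best = level
    return best

dims = [(55776, 42423), (13944, 10605), (3486, 2651)]
downsamples = [(55776 / w + 42423 / h) / 2 for w, h in dims]

# With a 0.1% tolerance, DZ level 13944 x 10606 (requested downsample
# 4.0) now reads from file level 1 instead of file level 0.
print(tolerant_best_level(downsamples, 4.0))  # → 1
```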

What do you think?

Best,
Peter

Contributor

jaharkes commented Apr 20, 2017

Actually, it seems to me that deepzoom is picking the wrong dimensions. If you start with 55776 x 42423 and halve both dimensions, I'd expect to get 27888 x 21211, or else you have to invent half a pixel worth of data. The next level would then be 13944 x 10605 and match file level 1 perfectly.

Of course, in this case you lose half a pixel of data on the first downscale, effectively 2 pixels worth on the next level, etc., until you hit an evenly divisible number.


jaharkes commented Apr 21, 2017

I guess the loss of pixels is actually the bigger deal. The 13944 x 10605 image has lost 3 pixels from the original, but were they all taken from one side, or divided across both sides? Or was the original image scaled so that it actually has interpolated values of all the original data on the new grid?

For the 13944 x 10606 version only a single pixel had to be added, possibly by doubling the last row or column of the original full resolution image. But because it is 2 levels up from the full resolution image only 25% of that last pixel actually consists of data that was not in the full resolution image.

Continuing this line of reasoning to the higher levels, 3486 x 2651 is 1/16th of a 55776 x 42416 image, so it has lost 7 pixels somewhere in the scaling/rounding, unless again it somehow computes interpolated values. While the corresponding deepzoom level 3486 x 2652 scales to 55776 x 42432 and so it has added 9 pixels to the original image. Those 9 pixels really account for only half of the edge pixel of the final scaled down version. Something could be said for trying to add such extra pixels by splitting this between both sides of the image instead of only one end, which is what I think openslide does now.

jetic83 commented Apr 21, 2017

Thanks, @jaharkes, for your thoughts.

I tried to feed DeepZoom your expected dimensions by changing
`z_size = tuple(max(1, int(math.ceil(z / 2))) for z in z_size)` to
`z_size = tuple(max(1, int(math.floor(z / 2))) for z in z_size)`.

But then the simulated DeepZoom pyramid no longer has 17 levels, only 16. This leads to an index-out-of-bounds exception, since OpenSeadragon still assumes 17 levels for this image.

This leads me to assume that OpenSlide and OpenSeadragon both follow the convention of calculating the pyramid with math.ceil, so as not to lose any pixel information. But the scanner that generated the file might follow another convention, namely dropping the last pixels when the dimensions are not a multiple of the integer level downsample.
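The level-count mismatch between the two rounding conventions is easy to reproduce; a sketch of the pyramid-depth calculation under both:

```python
import math

def dz_level_count(width, height, rounding):
    """Count pyramid levels when repeatedly halving both dimensions
    with the given rounding rule (math.ceil or math.floor)."""
    count = 1
    while (width, height) != (1, 1):
        width = max(1, rounding(width / 2))
        height = max(1, rounding(height / 2))
        count += 1
    return count

print(dz_level_count(55776, 42423, math.ceil))   # 17: the DeepZoom convention
print(dz_level_count(55776, 42423, math.floor))  # 16: hence the index mismatch
```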

Still, it would be nice if OpenSlide could handle this by tolerating the pixel gap (which disappears anyway once we zoom in, since the downsample is then 1).
