create and support a single-file file format for storing the image tiles #944

Open
eriksjolund opened this issue May 18, 2016 · 76 comments

@eriksjolund

Reading byte ranges from files is supported in many cases in a web browser:

  • A byte range of a local file can be read by the File API.
  • A byte range of a file stored on a web server can be downloaded if the web server supports the HTTP Range header.
  • A byte range of a file stored on AWS S3 can be downloaded through the Amazon S3 REST API.
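
A minimal sketch of the first two cases, assuming a browser-like environment with `Blob` and `fetch`. The function names are hypothetical. An S3 GET Object request accepts the same `Range` header, so the remote variant should also apply to publicly readable or pre-signed S3 URLs.

```javascript
// Build the value of an HTTP Range header for the half-open byte interval
// [start, end). HTTP ranges are inclusive, hence the "end - 1".
function rangeHeader(start, end) {
    if (!Number.isInteger(start) || !Number.isInteger(end) || start < 0 || end <= start) {
        throw new RangeError("expected 0 <= start < end");
    }
    return "bytes=" + start + "-" + (end - 1);
}

// Local file: a File is a Blob, and Blob.slice() selects [start, end)
// without reading the rest of the file.
function readLocalRange(file, start, end) {
    return file.slice(start, end).arrayBuffer();
}

// Remote file: ask the server for the same interval. A server that honours
// the Range header answers with status 206 (Partial Content).
async function readRemoteRange(url, start, end) {
    const response = await fetch(url, { headers: { Range: rangeHeader(start, end) } });
    if (response.status !== 206) {
        throw new Error("server did not return a partial response: " + response.status);
    }
    return response.arrayBuffer();
}
```

Both readers resolve to an `ArrayBuffer`, so the code that consumes a tile does not need to know where the bytes came from.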

I think it would be a good idea to create a simple file format that would have a header section in the beginning of the file and then have all the image tiles after it.

The header would store information about

  • image width
  • image height
  • tile size
  • tile overlap
  • and maybe image file format (JPEG/PNG)
  • and a list of the byte ranges of all the image tiles
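
As a rough illustration only, the fields above could be packed into a binary header like this. The layout, field widths, and function names are assumptions made for this sketch, not an agreed format.

```javascript
// Hypothetical little-endian layout:
//   uint32 width, uint32 height, uint16 tileSize, uint16 overlap,
//   uint8 format (0 = JPEG, 1 = PNG), uint32 tileCount,
// followed by tileCount + 1 uint64 byte offsets into the file;
// tile i occupies the bytes [offsets[i], offsets[i + 1]).
const FIXED_SIZE = 4 + 4 + 2 + 2 + 1 + 4;

function encodeHeader(meta) {
    const tileCount = meta.offsets.length - 1;
    const view = new DataView(new ArrayBuffer(FIXED_SIZE + 8 * meta.offsets.length));
    view.setUint32(0, meta.width, true);
    view.setUint32(4, meta.height, true);
    view.setUint16(8, meta.tileSize, true);
    view.setUint16(10, meta.overlap, true);
    view.setUint8(12, meta.format === "png" ? 1 : 0);
    view.setUint32(13, tileCount, true);
    meta.offsets.forEach(function (offset, i) {
        // 64-bit offsets so that files larger than 4 GB remain addressable.
        view.setBigUint64(FIXED_SIZE + 8 * i, BigInt(offset), true);
    });
    return view.buffer;
}

function decodeHeader(buffer) {
    const view = new DataView(buffer);
    const tileCount = view.getUint32(13, true);
    const offsets = [];
    for (let i = 0; i <= tileCount; i++) {
        offsets.push(Number(view.getBigUint64(FIXED_SIZE + 8 * i, true)));
    }
    return {
        width: view.getUint32(0, true),
        height: view.getUint32(4, true),
        tileSize: view.getUint16(8, true),
        overlap: view.getUint16(10, true),
        format: view.getUint8(12) === 1 ? "png" : "jpeg",
        offsets: offsets
    };
}
```

Storing one more offset than there are tiles means each byte range is simply a pair of neighbouring entries, so no separate length array is needed.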

What do you think?
What are the advantages and disadvantages of such a single-file format instead of using the multiple-file DZI format?

One advantage would be that system administration is simplified as you only need to handle one file per image instead of possibly thousands of files per image.

To simplify the handling and storing of the image tiles, I would suggest we create a unique ordering of all the image tiles, in other words a

function tile_id(x_coord, y_coord, level, width, height, tile_size)

that returns an integer number between 0 and N - 1, where N is the number of tiles.

The unique ordering makes it easy to design the file header, because the image tile byte range start positions (and maybe end positions) could be stored as just an array of numbers.

Such a function could be implemented something like this:

// Number of pyramid levels; level 0 is the smallest one.
function num_levels(width, height) {
    return Math.ceil(Math.log2(Math.max(width, height))) + 1;
}

// Side length, in full-resolution pixels, of the area one tile covers at `level`.
function scaled_tile_size(num_levels_, level, tile_size) {
    var count = num_levels_ - level - 1;
    var factor = 1;
    for (var i = 0; i < count; i++) {
        factor = factor * 2;
    }
    return tile_size * factor;
}

// Number of tiles at `level`.
function num_tiles_level(width, height, level, tile_size) {
    var num_l = num_levels(width, height);
    var scaled = scaled_tile_size(num_l, level, tile_size);
    return Math.ceil(width / scaled) * Math.ceil(height / scaled);
}

// Index of tile (x_coord, y_coord) at `level`: the tiles of all lower levels
// come first, then the tiles of this level in row-major order.
function tile_id(x_coord, y_coord, level, width, height, tile_size) {
    var result = 0;
    for (var i = 0; i < level; i++) {
        result = result + num_tiles_level(width, height, i, tile_size);
    }
    var num_l = num_levels(width, height);
    var scaled_tile_s = scaled_tile_size(num_l, level, tile_size);
    // The row stride is the number of tile columns, which depends on the width.
    var num_columns = Math.ceil(width / scaled_tile_s);
    result = result + (num_columns * y_coord) + x_coord;
    return result;
}
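
One way to sanity-check such an ordering is to enumerate every tile and confirm that the ids come out as 0, 1, ..., N - 1. The sketch below is a compact, self-contained variant with hypothetical camelCase names; it assumes row-major order within a level, with the number of tile columns as the row stride.

```javascript
function numLevels(width, height) {
    return Math.ceil(Math.log2(Math.max(width, height))) + 1;
}

// Tiles along each axis at a level; each lower level halves the resolution.
function tileGrid(width, height, level, tileSize) {
    const scaled = tileSize * Math.pow(2, numLevels(width, height) - level - 1);
    return { columns: Math.ceil(width / scaled), rows: Math.ceil(height / scaled) };
}

function tileId(x, y, level, width, height, tileSize) {
    let id = 0;
    for (let i = 0; i < level; i++) {
        const grid = tileGrid(width, height, i, tileSize);
        id += grid.columns * grid.rows;
    }
    return id + tileGrid(width, height, level, tileSize).columns * y + x;
}

// Visit every tile of every level and collect the ids in visiting order.
function allTileIds(width, height, tileSize) {
    const ids = [];
    for (let level = 0; level < numLevels(width, height); level++) {
        const grid = tileGrid(width, height, level, tileSize);
        for (let y = 0; y < grid.rows; y++) {
            for (let x = 0; x < grid.columns; x++) {
                ids.push(tileId(x, y, level, width, height, tileSize));
            }
        }
    }
    return ids;
}
```

A non-square image is the interesting test case, because that is where a wrong row stride would produce duplicate ids.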

Actually I've already tried out this approach, i.e. storing all the image tiles in one file and using the File API to open a local file with a slightly modified OpenSeadragon. It worked. Sorry I don't have that published on the web yet, but I plan to in the coming weeks.

@iangilman
Member

Interesting idea! There are already a number of image file formats that support progressive loading like this, for instance JPEG 2000 and JPEG XR ... what would you think about using one of those formats?

@eriksjolund
Author

I haven't looked into those formats, but I guess you then would limit yourself to only JPEG.
I think it is better to design a generic container file format because it wouldn't restrict the choice of the image file format.

What I would like to see is a file format that is

  • easily parsable by JavaScript (and other programming languages, of course)
  • it should cope with the situation when there are long round trips, i.e. when a single read could take a long time. Preferably only 2-3 reads should be needed to get one image tile:
  1. reading the header size
  2. reading the header
  3. reading one image tile

But at the time you need to download an image tile, you probably already would have downloaded the header anyway, because you would need the image_width, image_height, tile_size, tile_overlap information to pass to OpenSeadragon beforehand.
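
The three reads could look roughly like the sketch below. The layout (a 4-byte length prefix, a JSON header, then the tile bytes) and all names are assumptions chosen to keep the example short. `readRange(start, end)` stands for any function that resolves to a `Uint8Array` holding the bytes [start, end), whether it is backed by the File API or by HTTP range requests.

```javascript
// Hypothetical layout: [uint32 headerLength][JSON header][tile bytes...]
// The header's `offsets` array is relative to the first tile byte.
async function openContainer(readRange) {
    // Read 1: the fixed-size length prefix.
    const prefix = await readRange(0, 4);
    const headerLength = new DataView(prefix.buffer, prefix.byteOffset, 4).getUint32(0, true);

    // Read 2: the header itself.
    const headerBytes = await readRange(4, 4 + headerLength);
    const header = JSON.parse(new TextDecoder().decode(headerBytes));
    const dataStart = 4 + headerLength;

    return {
        header: header,
        // Read 3, once per tile: tile i occupies [offsets[i], offsets[i + 1]).
        readTile: function (id) {
            return readRange(dataStart + header.offsets[id], dataStart + header.offsets[id + 1]);
        }
    };
}

// Build such a container in memory from metadata and an array of tiles.
function buildContainer(meta, tiles) {
    const offsets = [0];
    tiles.forEach(function (tile) {
        offsets.push(offsets[offsets.length - 1] + tile.length);
    });
    const header = Object.assign({}, meta, { offsets: offsets });
    const headerBytes = new TextEncoder().encode(JSON.stringify(header));
    const out = new Uint8Array(4 + headerBytes.length + offsets[offsets.length - 1]);
    new DataView(out.buffer).setUint32(0, headerBytes.length, true);
    out.set(headerBytes, 4);
    tiles.forEach(function (tile, i) {
        out.set(tile, 4 + headerBytes.length + offsets[i]);
    });
    return out;
}
```

After the container has been opened once, every further tile costs a single read, which matches the observation that the header is normally already loaded by the time a tile is requested.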

@iangilman
Member

Well, despite their names, JPEG 2000 and JPEG XR are not actually JPEG files... they are original formats that both have the feature of being able to access individual tiles directly. More information:

https://en.wikipedia.org/wiki/JPEG_2000
https://en.wikipedia.org/wiki/JPEG_XR

The acronym JPEG just refers to the standards body that manages them.

Anyway, I just bring them up because it's good to do your due diligence before inventing a new file format.

[Image: XKCD comic "Standards"]

Of course if those formats don't work for some reason, then there's a good reason to move forward with a new one! Either way, I'm interested in seeing what you come up with.

@iangilman
Member

Incidentally, this is a server not a file format, but you might also look into http://iipimage.sourceforge.net/.

@eriksjolund
Author

Funny comic! I like XKCD.

I would like to have a file format that could be converted back and forth from the DZI directory file structure. I must confess I haven't spent much time looking into the JPEG 2000 and JPEG XR file formats, but it seems they both can handle more advanced stuff than what you could store in DZI.

Another thing is that https://en.wikipedia.org/wiki/JPEG_XR mentions a 4 GB maximum size for the container format, which would make the JPEG XR container format unsuitable.

Anyway, I have an example of an experimental file format that I made with protobuf.js that contains both image tiles and some scientific measurement data. Unfortunately, I don't have any public experiment data files to show yet. Those software projects are very much in flux right now. They will probably change a lot.

https://github.com/eriksjolund/osd-spot-viewer/tree/master/from_layout
https://github.com/eriksjolund/st_exp_protobuf
eriksjolund@d7bacf3

@eriksjolund
Author

I just added the C++ code that creates the experiment data files.

For instance:
https://github.com/eriksjolund/st_exp_protobuf/blob/master/c%2B%2B/vips_tiles/vips_tiles.cc
runs
vips dzsave photo.jpeg outpath --layout dz --overlap my_overlap --tile-size my_tile_size --container fs --suffix=.jpg[Q=85]
and also parses the DZI file (in XML format).

The file
https://github.com/eriksjolund/st_exp_protobuf/blob/master/c%2B%2B/dzi_helper/dzi_helper.cc
contains some helper functions for having a well-defined ordering of the tiles. The ordering is important because the tiles are stored as a flat array in the data file.

Note that the same ordering needs to be implemented on the JavaScript side. Anyway, the approach seems to work. I can use a slightly modified OpenSeadragon (eriksjolund@d7bacf3) to display the high-resolution photo together with some scientific measurement data painted as circles. Right now the web browser reads the experiment file with the File API but I plan to add support for remote file access too.

@iangilman
Member

Very cool! Looks like you're making great progress.

I don't know that it makes sense for the OpenSeadragon org (such as it is) to adopt ownership of this new format, but we can certainly point people to it as appropriate (on the "creating zoomable images" page, for instance). Perhaps the reader can be a plugin for OSD. Of course if OSD needs to be modified to support such kinds of plugins, that sounds good too.

If you're interested in getting more people involved in the project, you might post on https://gitter.im/openseadragon/openseadragon and I'm happy to post on Twitter as well; let me know.

@eriksjolund
Author

I think the single file format would be more valuable if it would come together with a command line image conversion tool and a desktop viewer tool (nw.js/electron). Something that would be installable with "apt-get install" on my Ubuntu Linux desktop. It would of course require quite some work to write and package such software, but such a combination of tools would be quite useful to me.

Anyway, I think you have a point that it is not self-evident that OpenSeadragon should adopt ownership of such a file format.

Regarding software code integration to OpenSeadragon, this is my maybe naive understanding of it:
I think the best approach would be to reorganize the OpenSeadragon code so that retrieving image tiles is not so tied to URLs. There should be an interface that takes some input arguments (level, x, y) and returns a canvas. I guess that would fit both fetching URLs (or byte ranges from a URL) and dynamically creating image tiles (for instance Mandelbrot fractals). A file format could then be implemented as a plugin to such an interface.
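
To make the idea concrete, here is a minimal sketch of what such an interface might look like. It is not OpenSeadragon's API; every name is hypothetical, and the tiles are returned as raw RGBA pixel data rather than a canvas so that the example also runs outside a browser.

```javascript
// A provider is any object with getTile(level, x, y) returning a Promise.

// Provider 1: tiles fetched from URLs. `urlFor` and `fetchPixels` are
// supplied by the caller, so the provider itself knows nothing about how
// the bytes are transported or decoded.
function urlTileProvider(urlFor, fetchPixels) {
    return {
        getTile: function (level, x, y) {
            return fetchPixels(urlFor(level, x, y));
        }
    };
}

// Provider 2: tiles computed on demand. One flat shade per tile stands in
// for something like a Mandelbrot renderer.
function generatedTileProvider(tileSize) {
    return {
        getTile: function (level, x, y) {
            const pixels = new Uint8ClampedArray(tileSize * tileSize * 4);
            const shade = (x + y + level) % 2 === 0 ? 255 : 0;
            for (let i = 0; i < pixels.length; i += 4) {
                pixels[i] = pixels[i + 1] = pixels[i + 2] = shade;
                pixels[i + 3] = 255; // fully opaque
            }
            return Promise.resolve(pixels);
        }
    };
}
```

A viewer written against `getTile` alone would not need to care whether a tile came from a URL, from a byte range inside a container file, or from a generator.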

I made a post on Gitter. Thanks for your offer to post on Twitter. I'll get back to you regarding that. I think we could wait a bit until there is a demo example to show.

@iangilman
Member

Yeah, that seems like a reasonable integration strategy... relying on URLs has been problematic in other ways (like the fractal generation example you mentioned).

One thing to keep in mind: at least for the time being, we are still supporting the non-canvas implementation where we move elements around on the page. Whatever we come up with shouldn't break that, even if the new features don't work in that mode.

Would you be up for taking a whack at such a reorganization?

@eriksjolund
Author

> Would you be up for taking a whack at such a reorganization?

I don't have time right now but maybe in the future.

@iangilman
Member

Cool. Well, perhaps someone will pick it up in the meantime!

@iangilman
Member

@eriksjolund People have been asking about this kind of tech... can you post here on your progress?

@eriksjolund
Author

For a quick demo open the web page
https://eriksjolund.github.io/osd-spot-viewer-webpack-build/build2/
and click
Open some example data files and layouts
(It should at least work with Google Chrome and Firefox in Ubuntu 16.04)

To try the web page out you could also open an
example datafile.

It can either be opened directly as a remote URL or first be downloaded and then opened as a local file. Note that the image tiles will be downloaded as byte ranges from the data file as they are needed.

Right now a data file contains both image tiles and gene expression measurement data.
I plan to make a photo-only version, as that would be useful for more people.
I am also thinking about adding functionality to create a data file from a high-resolution image. So something like vips dzsave but it would produce a single file with all the tiles. It would also run in the web browser.

@iangilman
Member

Awesome, thank you for sharing the info!

@iangilman
Member

Note that #1055 may help with this by providing the means to load ranges from the server.

@jcupitt

jcupitt commented Mar 19, 2018

Regarding vips dzsave, it has this feature already, kind of. If you do:

$ vips dzsave somefile.tif mypyr.zip

It will write an uncompressed zip file (zip64, if necessary) containing all of the pyramid. You can copy this to a server and unzip, or (as you say) you can have a small piece of JS to read the zip header (the last chunk of the file) to get the filenames, sizes and offsets, then do a simple http range to pull out each tile.
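
A rough sketch of that "small piece of JS", based on the standard zip record layout. The function names are hypothetical, and the sketch assumes that no zip64 records are involved and that the whole central directory lies inside the chunk fetched from the end of the file.

```javascript
// `tail` is a Uint8Array with the last bytes of the zip file and
// `tailStart` is the absolute file offset of tail[0].
function listZipEntries(tail, tailStart) {
    const view = new DataView(tail.buffer, tail.byteOffset, tail.byteLength);

    // Scan backwards for the end-of-central-directory signature.
    let eocd = tail.length - 22;
    while (eocd >= 0 && view.getUint32(eocd, true) !== 0x06054b50) {
        eocd--;
    }
    if (eocd < 0) {
        throw new Error("end of central directory record not found");
    }
    const count = view.getUint16(eocd + 10, true);
    let pos = view.getUint32(eocd + 16, true) - tailStart;
    if (pos < 0) {
        throw new Error("central directory starts before the supplied chunk");
    }

    const entries = [];
    const decoder = new TextDecoder();
    for (let i = 0; i < count; i++) {
        if (view.getUint32(pos, true) !== 0x02014b50) {
            throw new Error("bad central directory entry");
        }
        const nameLength = view.getUint16(pos + 28, true);
        const extraLength = view.getUint16(pos + 30, true);
        const commentLength = view.getUint16(pos + 32, true);
        entries.push({
            name: decoder.decode(tail.subarray(pos + 46, pos + 46 + nameLength)),
            stored: view.getUint16(pos + 10, true) === 0, // method 0 = no compression
            size: view.getUint32(pos + 20, true),          // compressed size in bytes
            localHeaderOffset: view.getUint32(pos + 42, true)
        });
        pos += 46 + nameLength + extraLength + commentLength;
    }
    return entries;
}

// The tile bytes start after the entry's local header: 30 fixed bytes plus
// the name and extra fields, whose lengths sit at offsets 26 and 28.
function dataOffset(localHeader, localHeaderOffset) {
    const view = new DataView(localHeader.buffer, localHeader.byteOffset, localHeader.byteLength);
    return localHeaderOffset + 30 + view.getUint16(26, true) + view.getUint16(28, true);
}
```

Fetching a tile then takes one range request for the 30-byte local header to find the data offset (which can be cached per entry), and one range request of `size` bytes for the image itself. This only works directly when `stored` is true, i.e. when the archive was written without compression.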

@KempWatson

KempWatson commented Dec 13, 2019

Just spotted this thread. We've already been doing this since about 2012 - see https://zif.photo.

@iangilman
Copy link
Member

Cool! What do you think it might take to support it in OpenSeadragon? Is there a reference implementation for viewing it?

@al-muammar

@KempWatson, any progress so far with ZIF format?
@jcupitt, could you send me SZI and ZIP proprietary versions? Maybe I'll be able to buy a license.

@jo-chemla

Hi all, very interesting reads on the possibility to store and stream in OSD, SZI (uncompressed, zipped dzis) or ZIF.

Coming from GIS: in recent years the geospatial community has worked out a standard way of storing massive georeferenced images, named Cloud Optimized GeoTIFF (COG or cogeo). The format is just a standard, agreed-upon way to store pyramids within a TIFF file, each pyramid level being written in blocks of a given number of pixels for data adjacency, stored in contiguous memory blocks. The file format is meant to be streamed to web clients using range requests (although I am not aware of a direct client-side implementation of this yet; instead it happens through middleware in the form of TiTiler).

Maybe this titiler can be of help to the osd team? Would this COGeo standard be easier to implement inside OSD instead of SZI or ZIF (without any support for geotags of course)? Subscribing to the feed as the discussion is really interesting.

@rmontroy
Contributor

Another option is to use an AWS Lambda function to translate Deep Zoom or IIIF requests to byte range fetches against S3 objects in their native format. As an example, I created a function that uses a modified version of OpenSlide to fetch directly from Aperio SVS (TIFF) files stored in S3. Viewing performance is comparable to using the Aperio viewer and server, once the Lambda instances are warmed up.
S3VS (S3 Virtual Slide)
The advantage to this approach is that you don't need to preprocess the source files, just upload them to S3 as-is.
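
The first half of such a translation, mapping a Deep Zoom tile request to a region of the full-resolution image, can be sketched as follows. This is not the S3VS code; the function names are hypothetical, tile overlap is ignored, and the lookup of the matching byte ranges depends on the native file format and is left out.

```javascript
// Parse "<name>_files/<level>/<column>_<row>.<ext>" as used by Deep Zoom.
function parseDeepZoomTilePath(path) {
    const match = /_files\/(\d+)\/(\d+)_(\d+)\.(\w+)$/.exec(path);
    if (!match) {
        return null;
    }
    return {
        level: Number(match[1]),
        column: Number(match[2]),
        row: Number(match[3]),
        format: match[4]
    };
}

// Region of the full-resolution image covered by a tile, in pixels.
// Level maxLevel = ceil(log2(max(width, height))) is full resolution and
// each lower level halves it.
function tileRegion(tile, width, height, tileSize) {
    const maxLevel = Math.ceil(Math.log2(Math.max(width, height)));
    const scale = Math.pow(2, maxLevel - tile.level);
    const x = tile.column * tileSize * scale;
    const y = tile.row * tileSize * scale;
    return {
        x: x,
        y: y,
        width: Math.min(tileSize * scale, width - x),
        height: Math.min(tileSize * scale, height - y),
        scale: scale // downsample factor to apply when rendering the tile
    };
}
```

For example, with a 4000 x 3000 image and 256-pixel tiles, the request `slide_files/11/7_0.jpeg` maps to the region starting at x = 3584, y = 0 with a size of 416 x 512 source pixels, to be downsampled by a factor of 2.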

@KempWatson

KempWatson commented Jul 27, 2021 via email

@ap--

ap-- commented Jul 27, 2021

@rmontroy very nice. I will definitely try this.
Regarding your modified version of openslide: We've created a prototype of a tifffile based drop-in replacement for openslide https://github.com/bayer-science-for-a-better-life/tiffslide that allows you to load from any local/remote location that fsspec supports.
Currently the only tested image format is aperio svs.

@rmontroy
Contributor

@ap-- Nice. I might try out tiffslide. That looks like a much simpler approach. OpenSlide was a real pain to modify and build for Lambda. And the lead maintainer won't accept my contributions unless I create an entirely new interface for it, which I don't have time for.
We can continue to discuss in the tiffslide and/or s3vs repos.

@rmontroy
Contributor

@KempWatson I would have happily paid you to implement our pathology viewer for us except for the fact that we're a U.S. federal gov't contractor (NIH) and need to be FISMA authorized. Creating a serverless app in AWS made it much easier to do that by myself.

@alhuber1502

I'm unclear skimming this thread if Zoomify ZIF support is something OSD will get in (the near) future? Is there a plugin in development? Thanks!

@iangilman
Member

iangilman commented Nov 5, 2021

@alhuber1502 there's nothing I'm aware of for ZIF, but there certainly seems to be a lot of interest (judging by this thread)... hopefully someone will make it happen! You might try following up with some of the people who have posted about ZIF above.

@alexp-nl

alexp-nl commented Feb 7, 2023

Any news about possible ZIF format support in OSD? My hosting provider is getting annoyed about the amount of inodes that I am using.

@KempWatson

KempWatson commented Feb 7, 2023 via email

@alexp-nl

alexp-nl commented Feb 7, 2023

Great, thanks Kemp!

@iangilman
Member

@KempWatson Awesome! Once it's ready, let's add it to the plugin list :)

@phillipjohnson

@KempWatson Checking in to see if there is any progress for zip/zif/szi etc.? We are struggling with the operating system memory constraints of having millions of image tiles on our webserver.

@jo-chemla

Also interested in OSD support for zipped-dzi/zif/szi.

On a similar note, Titiler is a middleware initially made to provide TMS tile url endpoints from COG images (Cloud Optimized Geotiffs). COG is a great addition to the geospatial community as a single file format standard to store massive, tiled imagery, and has been an OGC standard for a few years. [realized I already talked about it earlier in the thread]

A new addition: a few months back, TiTiler started implementing an ImageReader for non-geo files, which handles images similar to COG, i.e. TIFF pyramids with all zoom levels stored within a single file in contiguous blocks.
Original twitter thread for the announcement and resulting PR1 and PR2. Leaflet was used for testing non-geo tiles.

Not the ideal solution, but storing large single-file images and using TiTiler as middleware can be a step in the right direction for people wanting to avoid massive tile file counts. Plus, correctly caching TiTiler requests allows for a real-time tile viewing experience.

@iangilman
Member

@jo-chemla Cool! That's good to know. Seems like the sort of information we might want to add to the website somehow... I'm not sure exactly where would be best. Maybe somewhere on https://openseadragon.github.io/examples/creating-zooming-images/?

@iangilman
Member

Another interesting possibility: https://github.com/pearcetm/GeoTIFFTileSource from @pearcetm.

@jbhanks

jbhanks commented Jun 21, 2023

I don't do much with js and I am new to hosting, so I'm not quite sure what to do with the info here. But if I understand correctly, the current status of things is:

  1. Hosting DZIs as objects is still not straightforward, but progress is being made.

  2. DZIs can be served as single files by archiving them in an uncompressed zip, but this requires modifying OSD code so as to read the filenames from the end of the zip file and extract them. This sounds doable, but since I don't do much js and am not familiar with the libraries, I don't know how much work that is to implement in a way that will be fast and efficient.

  3. ".szi" aka "smart zoom image" is just a name for a dzi that has been zipped as above, plus an additional file, scan-properties.xml. I am unclear if that file is required or not. I think maybe I can just create it with vips by using the .zip/.szi extension.
    https://github.com/smartinmedia/SZI-Format/blob/master/SZI%20format%20description%20-%202018-11-24.pdf

  4. OSD could thus be modified as described in 2) to become an szi viewer for 3). However there is no currently released open source code that implements this.

  5. The stuff discussed in this thread goes some way to doing this, but it would still take some legwork to implement. Just looking at the code quickly it isn't obvious to me how to integrate it. If I were going to do that, I should probably just be a contributor to the project in some way, as there is no point in me spending countless hours doing a crappy version of what people are doing here.

  6. There is also a "content byte range approach" using ajax described here. And if I use Backblaze B2, that should work with code for the Amazon S3 API. Since I don't do anything with Ajax or JQuery besides copy-paste, I would have to really study to make use of this thread. But just to confirm, byte range is a way that I can use object storage to host DZIs, right? I don't want to waste my time if that isn't correct (or if I will have to do a lot of work to implement making the byte ranges). I'm unclear on if this method can be used with a zip file or if the dzi itself (with the help of JQuery) maps the byte ranges (as opposed to paths) of tiles of a single large image file that can be hosted as an object.

  7. If I am impatient and dead set on hosting large, zoomable images as one file, I can use IIIF instead. However that means going through the process of selecting a viewer, which isn't a process I relish. Especially since I know that OSD/DZI gives me great performance for the moment, I just need a plan for when my gallery gets big and starts using up inodes.

Sorry about the length of the comment, I just don't want to take for granted things that are false.

@pearcetm
Contributor

Like @iangilman mentioned, you can check out https://github.com/pearcetm/GeoTIFFTileSource (demo page: https://pearcetm.github.io/GeoTIFFTileSource/demo.html). The plugin is based on the geotiff.js library. Large image pyramids can be stored as pyramidal TIFF files (vips should be able to save such pyramidal TIFFs I believe), and this plugin can read them directly using byte ranges without requiring additional server-side processing. It works both on the web and on local files. Some common pathology image file formats like Aperio .svs are actually just pyramidal TIFF files under the hood, and can be read directly like this.

> 7. If I am impatient and dead set on hosting large, zoomable images as one file, I can use IIIF instead. However that means going through the process of selecting a viewer, which isn't a process I relish. Especially since I know that OSD/DZI gives me great performance for the moment, I just need a plan for when my gallery gets big and starts using up inodes.

You can use OSD as the viewer for IIIF - there's already a tile source for that - and plenty of discussion and documentation around here as well as the documentation pages.

@jbhanks

jbhanks commented Jun 22, 2023

> Like @iangilman mentioned, you can check out https://github.com/pearcetm/GeoTIFFTileSource (demo page: https://pearcetm.github.io/GeoTIFFTileSource/demo.html). The plugin is based on the geotiff.js library. Large image pyramids can be stored as pyramidal TIFF files (vips should be able to save such pyramidal TIFFs I believe), and this plugin can read them directly using byte ranges without requiring additional server-side processing. It works both on the web and on local files. Some common pathology image file formats like Aperio .svs are actually just pyramidal TIFF files under the hood, and can be read directly like this.
>
> > 7. If I am impatient and dead set on hosting large, zoomable images as one file, I can use IIIF instead. However that means going through the process of selecting a viewer, which isn't a process I relish. Especially since I know that OSD/DZI gives me great performance for the moment, I just need a plan for when my gallery gets big and starts using up inodes.
>
> You can use OSD as the viewer for IIIF - there's already a tile source for that - and plenty of discussion and documentation around here as well as the documentation pages.

Ok, well I did manage to view a IIIF3 in OSD, but now I am back to the original issue of deciding between a .zip vs a byte range vs giving up on object storage. I am still feeling pretty clueless reading the related threads. Maybe I will try a pyramidal tiff. I was hoping to have some compression but maybe it will be ok.

Also with IIIF3 files, I was unable to view ones composed of PNGs, as it kept looking for jpg files. Not sure if the issue there is OSD or the way I created the tiles with libvips.

@PolCPP

PolCPP commented Jun 22, 2023

@jbhanks I've been using my custom zipped format (uses 2 files, a zip file and a reference file) for a year at our company on DO spaces without any issue.

master...PolCPP:openseadragon:master

You just need to apply those changes to the last version of OSD and compile it.

Original comment: #944 (comment)

@jbhanks

jbhanks commented Jun 22, 2023

> @jbhanks I've been using my custom zipped format (uses 2 files, a zip file and a reference file) for a year at our company on DO spaces without any issue.
>
> master...PolCPP:openseadragon:master
>
> You just need to apply those changes to the last version of OSD and compile it.
>
> Original comment: #944 (comment)

Thanks. I see that was made three years ago. Is there a reason why it hasn't been included in the main branch yet?

@pearcetm
Contributor

> I've been using my custom zipped format (uses 2 files, a zip file and a reference file) for a year at our company on DO spaces without any issue.

@PolCPP To be clear, your zipped format doesn't do any further compression beyond what was originally done on the images themselves, correct? (I believe that's what you said before). I ask just because @jbhanks commented about "hoping to have some compression" but I don't think that is compatible with any of these range-based request techniques.

> Thanks. I see that was made three years ago. Is there a reason why it hasn't been included in the main branch yet?

@jbhanks My understanding is that custom functionality like this would typically be implemented as a plugin rather than being integrated into the main OpenSeadragon code base, unless there is a solid case for really widespread usage that would justify bundling it into the main project.

@jbhanks

jbhanks commented Jun 23, 2023

@pearcetm, by "hoping to have some compression", I meant (lossless) image compression from the png format.

Also re the plugin: That makes sense, but shouldn't it have its own repo then (or be included but the import commented out by default)?

@PolCPP I built the current version of OSD with your plugin, but js throws errors when I go to my page using the modified OSD.

Uncaught TypeError: g.TileSource is undefined
    <anonymous> edztilesource.js:96
    <anonymous> edztilesource.js:286

and

Uncaught TypeError: m.Button is not a constructor
    bindStandardControls viewer.js:1891
    Viewer viewer.js:331
    OpenSeadragon openseadragon.js:805
    <anonymous> symmetry:21

Just to be clear, all I should need to do is:

  1. Download the plugin file here
  2. Clone the most recent version of OSD source
  3. Move to the OSD /src directory
  4. Edit the Gruntfile.js to include "src/edztilesource.js" in sources.
  5. Build OSD and replace the OSD on my site with it.

Update: I was able to build it; I didn't realize that order mattered in the list of sources.

Update2: I've built OSD with @PolCPP's plugin, but I still can't load the .zip. More details in my comment on the page for the plugin code.

@PolCPP

PolCPP commented Jun 23, 2023

@pearcetm No compression. Honestly, I don't think you would gain a lot, considering the images are already compressed.

@jbhanks I'll check it out in a bit. From memory, swap the "fast, I don't know how it works" way for the "old safe way" part of the code and it should work.

@jbhanks

jbhanks commented Jun 23, 2023

> @jbhanks I'll check it out in a bit. From memory, swap the "fast, I don't know how it works" way for the "old safe way" part of the code and it should work.

Thanks! I did get a bit further just using your whole branch, but still not loading with errors like: Tile 10/1_0 failed to load: http://127.0.0.1:5002/static/dzi/testimage_files.zip - error: Image load aborted tiledimage.js:1543:18

@iangilman
Member

iangilman commented Jun 23, 2023

> Also re the plugin: That makes sense, but shouldn't it have its own repo then (or be included but the import commented out by default)?

It's up to @PolCPP to make such a thing, but yes, it would be lovely! @PolCPP If you do, please add it to the plugins list: https://openseadragon.github.io/#plugins. You can do so by making a PR against: https://github.com/openseadragon/site-build/blob/a5aa76024c139fa467f9d09d28ef9e13e1c23ea9/www/index.html#L193 Either way, thank you for helping @jbhanks!

@jbhanks

jbhanks commented Jun 24, 2023

Also if it's not too much trouble, if @PolCPP can provide a known working test .edz file, then I can investigate whether my issue might be with the file I generated using the dzi2edz script (I had to modify it a little to get it to run). I'm happy to do anything I can with my limited JS skills to help get it working and then up to date with the current release.
