Question on 'scaling-out' Cantaloupe #125

imoutsatsos · 2017-07-02T20:29:40Z

Reviewing digital microscopy TIFF images could potentially involve spikes of thousands of image requests hitting the image server. Experience with the Cantaloupe image server has shown that the server performance deteriorates when hundreds of image requests are made simultaneously. This is probably due to the 'on-the-fly' conversion of TIFF images to zoomable images by Cantaloupe, as I observe a swarm of IM processes spawning.

How could we scale out image servers like Cantaloupe? As a practical question, assuming we had access to AWS services what would be the ideal way of deploying Cantaloupe? At present, not using cloud services, I am thinking of running multiple instances of Cantaloupe and using a load-balancer to distribute the requests. Could I pre-compute image pyramids to improve performance? Any other suggestions, or ideas?

Best regards
Ioannis

adolski · 2017-07-03T16:11:49Z

Hi,

I'm not very familiar with AWS outside of S3, but I imagine that scaling Cantaloupe would be similar to scaling any other web app. It sounds like you are on the right track.

Cantaloupe can't pre-generate tiles, but it can post-generate them (using a derivative cache), which would typically improve performance hugely.
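Since the derivative cache is filled by ordinary requests, one way to approximate pre-generation is to request every tile once ahead of time. The sketch below only builds the IIIF Image API 2.x tile URLs for an image of known size. The function name, the 512-pixel tile size, and the `http://localhost:8182/iiif/2` base URL are assumptions for illustration, not anything Cantaloupe ships.

```python
import math
from urllib.parse import quote


def iiif_tile_urls(base_url, identifier, width, height, tile_size=512, fmt="jpg"):
    """Yield IIIF Image API 2.x URLs for every tile at every power-of-two scale.

    base_url, tile_size and fmt are illustrative assumptions; they must match
    what the viewer actually requests, or the cached derivatives are never hit.
    """
    ident = quote(identifier, safe="")  # slashes in identifiers must be escaped
    max_dim = max(width, height)
    scale = 1
    while True:
        step = tile_size * scale  # tile edge length in full-resolution pixels
        for y in range(0, height, step):
            for x in range(0, width, step):
                w = min(step, width - x)   # clip the region at the right edge
                h = min(step, height - y)  # clip the region at the bottom edge
                out_w = math.ceil(w / scale)  # width of the delivered tile
                yield f"{base_url}/{ident}/{x},{y},{w},{h}/{out_w},/0/default.{fmt}"
        if step >= max_dim:
            break  # a single tile now covers the whole image
        scale *= 2


if __name__ == "__main__":
    for url in iiif_tile_urls("http://localhost:8182/iiif/2", "slide.tif", 1024, 512):
        print(url)
```

Fetching each URL once (for example with `urllib.request.urlopen`) at a throttled rate would then populate the cache before the spike of real requests arrives.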

If the IM processes are bothersome, you might look into Java2dProcessor or JaiProcessor which don't spawn processes.

kinow · 2017-07-28T09:51:16Z

I'm toying around with the code in Eclipse, slowly learning the code base. So far, I haven't seen anything related to HTTP sessions. So my initial impression is that in AWS you should be able to deploy it to something like AWS Elastic Beanstalk.

Then put a load balancer in front of a few EC2 instances (possibly medium or bigger, with SSD disks as there is some I/O), and possibly configure a health check.

This last bit is one I still have on my to-do list to investigate. Which endpoint can I call to make sure the application is fully operational - i.e. nothing seems broken, the properties/resources for its basic functionalities are present, etc. Just calling / isn't the best choice... perhaps adding JaMon, or Jolokia, prometheus, or just a simple servlet + JSP or velocity that displays things like implementations of JPEG, TIFF, etc readers, the features enabled or not (e.g. delegate script, cache), and so on.

As there is no need for database, you may even deploy to EC2 + Elastic Load Balancer and use something like CloudFormation or OpsWorks to automate some tasks, instead of Elastic Beanstalk.

With all that, I think you could host a production site in AWS. If I were maintaining a server like this, I would try to investigate the possibility of submitting a pull request for pre-generate tiles, and check if using Java instead of JRuby would increase the performance (only if you are using the delegate script). Then maybe submit a pull request to also enable Java plugins (there's an issue for that I think?).

The last thing that is important is monitoring, so look at logs, health check and tweak the settings if necessary.

kinow · 2017-07-28T09:52:55Z

As there is no need for database

That I'm aware of... there are tests for Jdbc, but I'm not sure if that's for caching, or if by default it's running with some in-memory database... still looking at other parts of the code for now :-) (and learning and having fun doing so !)

RussellMcOrmond · 2017-07-28T14:17:52Z

We will be wanting to explore this at Canadiana.org as well. We will be using SWIFT rather than Amazon S3, but the concepts are similar.

What I have been wondering about is caching. Before we started looking into IIIF we were contemplating passing an array of URLs to image services to the client, where the client was expected to try to access them in order (i.e. only fall back to the second entry if the first couldn't be accessed or failed). This would gain redundancy, and having a different order for different digital objects would provide load balancing. Sending the same array to all clients trying to access a specific digital object would also allow local caching to be optimised (i.e. a second cache would only come into play if the primary server was unavailable).
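A minimal sketch of that idea, assuming the order is derived by hashing each server URL together with the object identifier (rendezvous-style hashing, which is my substitution and not something IIIF or Cantaloupe defines). Every client computes the same order for a given object, different objects get different primaries, and the caller only moves to the next server on failure. The function names and the `fetch` callable are hypothetical.

```python
import hashlib


def server_order(object_id, servers):
    """Return the servers in a stable, object-specific preference order."""
    def score(server):
        # The same (server, object) pair always hashes to the same value,
        # so the order does not depend on how the input list is arranged.
        return hashlib.sha256(f"{server}|{object_id}".encode("utf-8")).hexdigest()
    return sorted(servers, key=score)


def fetch_with_failover(object_id, servers, fetch):
    """Try each server in preference order and return the first success.

    fetch(server, object_id) is a caller-supplied callable (hypothetical here)
    that raises OSError, e.g. ConnectionError, when a server is unreachable.
    """
    last_error = None
    for server in server_order(object_id, servers):
        try:
            return server, fetch(server, object_id)
        except OSError as exc:
            last_error = exc  # remember the failure and try the next server
    raise RuntimeError(f"all servers failed for {object_id}") from last_error
```

Because the primary for a given object is always the same server, its derivative cache sees all requests for that object, and the others are only touched when it is down.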

As IIIF didn't support the array of content servers, I assumed I'd need to author a smart load balancer to proxy requests to the correct server.

Would be nice to hear what others are doing. We have a series of content servers spread across Canada, and ideally the client, rather than a server-side load balancer, would determine which Cantaloupe instance is used.

kinow · 2017-07-28T23:33:18Z

Haven't heard about SWIFT before, but it looks interesting. AWS, CloudStack, OpenStack, Azure, GCP, and SWIFT all have enough in common that you can read about how to scale Cantaloupe on any one of them and then design a similar solution on another.

When I started using Jenkins, it helped me to read its "Case Studies" (near the bottom), where users would report their installation settings, which bumps they hit while using the tool, how they solved them, and what their requirements were.

@adolski would it be something interesting to have somewhere at https://medusa-project.github.io/cantaloupe/ ?

RussellMcOrmond · 2017-07-29T12:41:24Z

For clarity, SWIFT is OpenStack's object/blob store. It can be used separate from other components of OpenStack, which is how we will be using it. It has S3 compatibility middleware https://github.com/openstack/swift3 which I'll test with Cantaloupe's existing S3 support before attempting the larger project of direct SWIFT support (which I don't yet have the skills to do).

I'd love to see a Case Studies section for Cantaloupe. I expect the problems we'll have have already been solved by someone else.

kinow · 2017-07-29T13:07:34Z

Thanks @RussellMcOrmond , thought it was this swift.

adolski · 2017-07-31T15:55:59Z

Which endpoint can I call to make sure the application is fully operational - i.e. nothing seems broken, the properties/resources for its basic functionalities are present, etc. Just calling / isn't the best choice... perhaps adding JaMon, or Jolokia, prometheus, or just a simple servlet + JSP or velocity that displays things like implementations of JPEG, TIFF, etc readers, the features enabled or not (e.g. delegate script, cache), and so on.

There is something like that last one at /admin (disabled by default; have to set admin.enabled = true).
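For a load balancer health check specifically, another option is a small probe that requests `info.json` for a known test image and checks that the response parses. This is only a sketch: the `/iiif/2` base path, the probe image identifier, and the `check_health` name are assumptions, and a tile request would be needed instead if the probe should also exercise the processor.

```python
import json
from urllib.request import urlopen


def check_health(base_url, identifier, opener=urlopen, timeout=5):
    """Return True if info.json for a known image loads and looks sane.

    base_url and identifier are deployment-specific assumptions; opener is
    injectable so the probe can be tested without a running server.
    """
    url = f"{base_url}/{identifier}/info.json"
    try:
        with opener(url, timeout=timeout) as response:
            if response.status != 200:
                return False
            info = json.load(response)
    except (OSError, ValueError):
        # OSError covers connection and HTTP errors; ValueError covers bad JSON.
        return False
    if not isinstance(info, dict):
        return False
    # Per the IIIF Image API, info.json carries the image dimensions.
    return info.get("width", 0) > 0 and info.get("height", 0) > 0
```

A call such as `check_health("http://localhost:8182/iiif/2", "probe.tif")` could be wrapped in whatever script or endpoint the load balancer polls.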

If I were maintaining a server like this, I would try to investigate the possibility of submitting a pull request for pre-generate tiles

How would this work?

As there is no need for database

There are a JdbcResolver and JdbcCache for retrieving images from a database and/or caching them there, but those are the only things that would use a database.

For clarity, SWIFT is OpenStack's object/blob store. It can be used separate from other components of OpenStack, which is how we will be using it. It has S3 compatibility middleware

The S3 resolver & cache as written only work with Amazon, but #106 has been proposed for changing this.

I'd love to see a Case Studies section for Cantaloupe. I expect the problems we'll have have already been solved by someone else.

I don't know about adding this kind of info to the user manual, just because it's such a huge/rapidly-evolving/subjective topic that I'm not really capable of tackling myself. The way the documentation is currently architected, I think it would be better off as an external resource that could be linked to, like a blog post.

This kind of thing makes me think that something like a wiki would be more appropriate than a manual, but the grass is always greener on the other side of the documentation fence...

kinow · 2017-07-31T22:00:39Z

If I were maintaining a server like this, I would try to investigate the possibility of submitting a pull request for pre-generate tiles

How would this work?

Not sure yet :-) I used tile caching only with GIS tools, so I'd probably have to start investigating from scratch, using other tools for reference, such as TileCache, and how others are doing that with GeoServer and mapserver (I use very little GIS, but work close to GIS developers that could probably be interested in helping).

This kind of thing makes me think that something like a wiki would be more appropriate than a manual, but the grass is always greener on the other side of the documentation fence...

Maybe we could add another menu option? A wiki also sounds like a good idea.

Thanks!
Bruno

ghost · 2017-08-10T17:58:27Z

using Java instead of JRuby would increase the performance

To the best of my knowledge, it's not really worth investigating that, as there is pretty much no processing done in the delegate scripts.

Aside from the traditional ways of scaling (more / faster CPU, faster disk, more instances, pre-feeding the Cantaloupe cache if possible, etc.), the change that could potentially bring a significant improvement in processing time would be an implementation of vips (http://www.vips.ecs.soton.ac.uk/index.php?title=Speed_and_Memory_Use; brilliant guy who, if I remember well, also worked on the TIFF pyramid specs / implementation).

A few related links

#2

https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=VIPSIP;d6f77c30.1512

https://github.com/jcupitt/libvips/issues/633

http://java.wekeepcoding.com/article/11396582/JNA+pointer+to+pointer+mapping

The last time I checked that (out of personal curiosity; I wanted to find something that I could help on and get some experience in contributing back to open source), it looked feasible.

cars10w · 2020-10-16T11:55:05Z

Coming back to the original question:

How could we scale out image servers like Cantaloupe?

Is anyone actually doing this nowadays, preferably in a Kubernetes environment?

Best regards.

RussellMcOrmond · 2021-06-10T12:51:09Z

Would this "issue" be better moved to a discussion?

hackartisan mentioned this issue Jul 25, 2017

explore other iiif servers sciencehistory/chf-sufia#732

Closed

adolski added the question label Nov 21, 2017

adolski closed this as completed Jun 10, 2021

cantaloupe-project locked and limited conversation to collaborators Jun 10, 2021

This issue was moved to a discussion.
