MapKnitterExporter architecture discussion #298

Open
jywarren opened this Issue Jan 16, 2019 · 46 comments

5 participants
@jywarren
Contributor

jywarren commented Jan 16, 2019

We are exploring parallel tracks for cloud-based MapKnitter exporting, and one option is a JavaScript based process.

The base idea is to run the export process as a scalable web service, possibly "serverless" or REST, in Google Cloud and/or other cloud providers like Amazon AWS Lambda (primarily Google Cloud but compatible with others). Comments/suggestions/eurekas welcome! 🎉

Importantly, either track would ideally present the same API so that we could compare their performance.

JavaScript track

In this track, more experimentally, we'd use Image Sequencer, possibly with the webgl-distort library.

The major challenges here, I'd guess, would be:

  1. handling very big image files (up to 8 MB each?) in memory
  2. serious speed improvements in Image Sequencer (IS), such as the proposed WebAssembly or WebGL adapters
  3. figuring out the best way to persist images for later access, and how to integrate the exporter with this (passing a callback function to upload them to a given store? Credentials?)
  4. trying to duplicate or integrate GDAL's generation of a giant combined GeoTIFF (just really huge images to manage in memory?)
  5. trying to duplicate or integrate GDAL's generation of TMS-formatted map tiles

For these last two, see #296 where there are some JS options to experiment with.
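To put challenge 1 in numbers: what matters in memory is the decoded size, not the file size. A compressed JPEG expands to roughly width × height × 4 bytes once it is decoded to RGBA. Below is a minimal sketch of that arithmetic. It assumes 8-bit RGBA decoding. The function names and the 4000×3000 example are hypothetical and are not taken from MapKnitter data.

```javascript
// Rough estimate of memory needed to hold decoded images as RGBA pixels.
// Assumption: 4 bytes per pixel (8 bits per channel), as in a canvas ImageData
// buffer. The compressed file size (e.g. an 8 MB JPEG) does not enter into it.
function decodedBytes(width, height, channels = 4) {
  return width * height * channels;
}

// Total for a collection of source images plus the composite canvas they are
// drawn into, since a naive compositing step holds all of them at once.
function collectionBytes(images, canvas) {
  const sources = images.reduce((sum, img) => sum + decodedBytes(img.width, img.height), 0);
  return sources + decodedBytes(canvas.width, canvas.height);
}

const MiB = 1024 * 1024;
// A hypothetical 4000x3000 photo occupies 48,000,000 bytes once decoded.
console.log((decodedBytes(4000, 3000) / MiB).toFixed(1) + ' MiB'); // prints 45.8 MiB
```

The composite canvas grows with the whole map's extent. For that reason, holding the sources and the combined output in memory together is likely to be the real limit, more than any single 8 MB file.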

Also, we would try to develop this track in such a way as to make it possible to run locally in the browser, natively or in an Electron-style local JS app.

Ruby/ImageMagick/GDAL track

A more traditional approach is being explored here: #258, where we take the exporting sections currently featured in MapKnitter, and duplicate them in a minimal Ruby container that can be run on-demand.

Spec

To guide the development of both tracks, we're imagining a basic common behavior of:

  1. receiving a collection of image URLs or data-URLs of images AND a scale (cm/px or final pixel size)
  2. outputting a combined JPG image at a given scale or pixel size
  3. advanced versions might cut tiles or output GeoTiffs (see challenges in JS version above)
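The spec's scale input can be either cm/px or a final pixel size. Once the ground extent of the combined image is known, the two are interchangeable. The sketch below shows that conversion. It assumes the extent is already available in meters; deriving the extent from the corner coordinates is not shown. All names are hypothetical.

```javascript
// Hypothetical helpers for the spec's "scale (cm/px or final pixel size)" input.
// extentMeters: { width, height } of the combined image on the ground, in meters.
function pixelSizeFromScale(extentMeters, cmPerPx) {
  // 1 m = 100 cm, so pixels = (meters * 100) / (cm per pixel)
  return {
    width: Math.round((extentMeters.width * 100) / cmPerPx),
    height: Math.round((extentMeters.height * 100) / cmPerPx),
  };
}

function scaleFromPixelWidth(extentMeters, targetWidthPx) {
  // Inverse: which cm/px produces the requested output width?
  return (extentMeters.width * 100) / targetWidthPx;
}

// Normalize a request that supplies either a scale or a target pixel width.
function resolveOutputSize(extentMeters, { cmPerPx, widthPx }) {
  if (cmPerPx === undefined && widthPx === undefined) {
    throw new Error('either cmPerPx or widthPx is required');
  }
  const scale = cmPerPx !== undefined ? cmPerPx : scaleFromPixelWidth(extentMeters, widthPx);
  return { cmPerPx: scale, ...pixelSizeFromScale(extentMeters, scale) };
}

console.log(resolveOutputSize({ width: 500, height: 300 }, { cmPerPx: 10 }));
```

For example, a 500 m × 300 m map at 10 cm/px comes out at 5000 × 3000 px. Asking for a 2500 px wide output instead resolves to 20 cm/px.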

Links and resources are being compiled here: #296


What have I missed? @tech4GT @icarito would you mind adding any questions, clarifications?


Update: diagrams

I've put together a diagram of the current exporter workflow, which I hope is helpful. It's also largely ported into a standalone Ruby library in #341 -- soon to potentially be a Gem:

[Diagram: current MapKnitter export workflow]

Image Sequencer should allow us to parallelize this, and improve its speed, as illustrated in this diagram:

[Diagram: proposed parallelized export workflow using Image Sequencer]

@tech4GT

Member

tech4GT commented Jan 16, 2019

@jywarren this looks really nice! Things immediately make a lot more sense!🎉

@icarito

Member

icarito commented Jan 22, 2019

I've just spent some time deploying a learning project to Google Cloud Platform (App Engine) as a Docker container. I've got a better understanding now of what is required! Thanks!

@jywarren

Contributor Author

jywarren commented Jan 22, 2019

@jywarren

Contributor Author

jywarren commented Jan 28, 2019

OK, i'd like to add in an overview of the export system step by step; i've left notes where we might make changes or improvements as well, and will link to lines of code where these things currently happen!

Also -- a couple ideas:

  1. Idea: produce separate GeoTiffs to skip the
  2. Idea: produce TMS tiles from any collection of images, given tile coordinates and image sources (with known corner coordinates)
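For idea 2, the tile arithmetic itself is standard. The sketch below assumes EPSG:3857 Web Mercator tiles and the TMS convention of numbering rows from the south. XYZ/slippy-map tiles number rows from the north, which is why the code flips y. The function names are hypothetical. The sketch does not clamp longitude 180 or latitudes beyond the Mercator limit.

```javascript
// Which TMS tile contains a given point? Standard Web Mercator tile formulas,
// with y flipped because TMS row 0 is at the south edge (XYZ row 0 is north).
function lonLatToTile(lon, lat, zoom) {
  const n = 2 ** zoom;
  const latRad = (lat * Math.PI) / 180;
  const x = Math.floor(((lon + 180) / 360) * n);
  const yXyz = Math.floor(
    ((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n
  );
  return { x, y: n - 1 - yXyz, z: zoom };
}

// All TMS tiles touched by the bounding box of one image's corners.
// corners: array of [lon, lat] pairs (e.g. the four corners of a warped image).
function tilesForCorners(corners, zoom) {
  const lons = corners.map(c => c[0]);
  const lats = corners.map(c => c[1]);
  const nw = lonLatToTile(Math.min(...lons), Math.max(...lats), zoom);
  const se = lonLatToTile(Math.max(...lons), Math.min(...lats), zoom);
  const tiles = [];
  for (let x = nw.x; x <= se.x; x++) {
    // in TMS the south edge has the smaller row number
    for (let y = se.y; y <= nw.y; y++) {
      tiles.push({ x, y, z: zoom });
    }
  }
  return tiles;
}
```

Running tilesForCorners for every image yields, per tile, the list of source images that need to be drawn into it. That list is the input a per-tile renderer would need.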

Breaking down the export process

Separable steps

  1. collect set of image URLs and their corner coordinates
  2. for each image (could do this from existing Ruby code or in npm module):
    • determine image pixel dimensions
    • convert corner coordinates to pixel positions
  3. for each image (using existing Ruby/ImageMagick code or in remote Image Sequencer container):
  4. given collection of warped images, calculate pixel positions of image collection relative to each other (Ruby code exists)
    • (optional alternative) produce SVG or PDF containing images at relative positions (less memory use)
    • this currently happens in https://github.com/publiclab/mapknitter/blob/main/app/models/map.rb#L231, in
      run_export, distort_warpables, generate_composite_tiff, generate_tiles, and generate_jpg
    • produce composite/merged image using this data
    • save and return URL of combined image for download
    • (optional) produce GeoTiff of combined image
    • (optional) pass GeoTiff to GDAL for conversion into traditional TMS tileset
  5. Possible next steps:
    • produce a merged TMS from step 3's per-image TMS tiles, instead of generating it from step 4's giant GeoTiff
    • produce single TMS from combination of per-image GeoTiffs from end of step 3
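Once every image's corners are in one shared pixel space (step 2), step 4's relative positioning reduces to bounding-box arithmetic. The sketch below works under that assumption. The { id, corners } input shape is hypothetical. The offsets it returns are for each image's axis-aligned bounding box, so the warped image itself still has to be drawn into that box.

```javascript
// Sketch of step 4: find the collection's bounding box and each image's
// top-left offset within the combined canvas.
// images: [{ id, corners: [[x, y], ...] }] in a shared pixel space (hypothetical shape).
function layoutCollection(images) {
  let minX = Infinity, minY = Infinity, maxX = -Infinity, maxY = -Infinity;
  for (const img of images) {
    for (const [x, y] of img.corners) {
      minX = Math.min(minX, x);
      minY = Math.min(minY, y);
      maxX = Math.max(maxX, x);
      maxY = Math.max(maxY, y);
    }
  }
  const placements = images.map(img => {
    const xs = img.corners.map(c => c[0]);
    const ys = img.corners.map(c => c[1]);
    return {
      id: img.id,
      // offset of this image's own bounding box from the canvas origin
      left: Math.round(Math.min(...xs) - minX),
      top: Math.round(Math.min(...ys) - minY),
    };
  });
  return { width: Math.ceil(maxX - minX), height: Math.ceil(maxY - minY), placements };
}
```

The same placements could feed either a raster composite or the SVG/PDF alternative. In the SVG/PDF case, each image would become an element positioned at (left, top).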
@jywarren

Contributor Author

jywarren commented Jan 28, 2019

@SidharthBansal @tech4GT @icarito just so you see this additional note breaking down the export process. There are portions that could be accomplished with the traditional ImageMagick/GDAL combo just by breaking out the Ruby-controlled code in our codebase (see #296, but i'll copy more in here), but I am hoping we can accomplish a lot in standalone containers in a serverless or at least remote REST model.

@tech4GT

Member

tech4GT commented Feb 7, 2019

Starting work now! @icarito, can you please share some of the resources you have been going through? That would be a big help for me :)

@tech4GT

Member

tech4GT commented Feb 7, 2019

@jywarren @icarito I'll start with a basic Express configuration that takes an image URL and a sequence string and returns the final output. Can we create a repository for this on publiclab, or should I make it on my GitHub?

@tech4GT

Member

tech4GT commented Feb 7, 2019

Okay a couple of things here

  • We should add a flag to the run config that lets us disable the progress logs (otherwise they'll unnecessarily slow down the server)
  • I have some ideas for speeding up the pixelManipulation API, which will in turn speed up most modules
  • Should we return the output as a data URI or use the Imgur service as we originally planned? Or maybe we can have a parameter in the request that allows both options:
/* Request body */
{
  "url": <String>,      // URL of the input image
  "sequence": <String>, // the sequence string to be imported into Image Sequencer
  "upload": <Boolean>   // whether to upload the result to Imgur and return that URL instead of a data URI
}

How does this sound @jywarren @icarito ??
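Here is a sketch of validating the proposed request body before anything runs, using the field names above. Several details are assumptions: the function name, the default of upload = false, and accepting data URLs (which the spec allows as input). The Imgur upload itself is not shown.

```javascript
// Validate and normalize the proposed { url, sequence, upload } request body.
// Throws an Error with a readable message on bad input.
function parseExportRequest(body) {
  if (!body || typeof body !== 'object') {
    throw new Error('request body must be a JSON object');
  }
  const { url, sequence, upload = false } = body; // assumed default: return a data URI
  const isSupportedUrl =
    typeof url === 'string' &&
    (url.startsWith('http://') || url.startsWith('https://') || url.startsWith('data:'));
  if (!isSupportedUrl) {
    throw new Error('url must be an http(s) URL or a data URL');
  }
  if (typeof sequence !== 'string' || sequence.trim() === '') {
    throw new Error('sequence must be a non-empty string');
  }
  if (typeof upload !== 'boolean') {
    throw new Error('upload must be a boolean');
  }
  return { url, sequence: sequence.trim(), upload };
}
```

In an Express app, this could be called from a POST handler after express.json() has parsed the body. If it throws, the handler would respond with a 400 and the error message.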

@jywarren

Contributor Author

jywarren commented Feb 7, 2019

@tech4GT

Member

tech4GT commented Feb 7, 2019

@jywarren Ok pushing the most basic setup now!

@tech4GT

Member

tech4GT commented Feb 7, 2019

@jywarren Can you please grant me push access to the repository 😅

@jywarren

Contributor Author

jywarren commented Feb 7, 2019

@tech4GT

Member

tech4GT commented Feb 7, 2019

@jywarren One more thing, do you want me to get cracking on the optimizations for sequencer first or deploy the container first?

@jywarren

Contributor Author

jywarren commented Feb 7, 2019

@tech4GT

Member

tech4GT commented Feb 7, 2019

Okay, I'll try to deploy the container with a very basic setup tomorrow, and then I'll open an issue for the optimizations; maybe I can document some of my ideas there too!
Also on a different note I tried out the app locally and it works like a charm ✌️

@jywarren

Contributor Author

jywarren commented Feb 7, 2019

@tech4GT

Member

tech4GT commented Feb 7, 2019

One thing I am concerned about, though: if we do switch to WebAssembly, which parts of the main code would we need to rewrite? Or should we just switch to something like OpenCV entirely?
I think we can start by making optimizations in JavaScript and then move toward WebAssembly if that gets unmanageable. What do you think?

@jywarren

Contributor Author

jywarren commented Feb 7, 2019

@tech4GT

Member

tech4GT commented Feb 7, 2019

I think you are right, also please do have a look at the repository, I have pushed the basic file I wrote earlier today, will be extending this A LOT but I think this gives us a start.

@icarito

Member

icarito commented Feb 8, 2019

Just a note that Google App Engine has a Standard Environment and a Flexible Environment, and Ruby seems to be supported only in the Flexible Environment, which is significantly more expensive: https://cloud.google.com/appengine/docs/standard/appengine-generation

@tech4GT

Member

tech4GT commented Feb 9, 2019

Oh, but is-app will be pure Node.js, so I guess that's not a problem!

@jywarren

Contributor Author

jywarren commented Feb 18, 2019

Hi @tech4GT @icarito @sashadev-sky and others -- i just uploaded diagrams above in which i tried to very clearly articulate the current and planned export workflows. Please have a look!

@jywarren

Contributor Author

jywarren commented Feb 18, 2019

However, in both cases we should think about the points at which we should report status in a status.json file, which MapKnitter users could poll in JavaScript while their export runs, to see what stage their work is in.

This, and other aspects such as the parallel running and the image pairing during compositing, make me think we really need to consider a new layer, a mapknitter-exporter-runner that could persist a bit longer, run in a container itself, but could persist a status.json file for the entire export run.

We could even think more broadly and develop it as an image-sequencer-runner which can handle complex branching image sequencer runs. @tech4GT maybe this is where the full express-based image-sequencer-app comes in, since the simpler individual steps seem to be possible using just cloud functions? Love to hear your thoughts on all this.
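Here is a sketch of what a status.json writer in such a runner might look like. It assumes the runner has a local directory to write to; with Cloud Storage this would be an object upload instead. The stage names are hypothetical and loosely follow the export steps broken down above. Writing to a temporary file and then renaming it keeps a poller from reading a half-written file on typical local filesystems.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical stage names for one export run.
const STAGES = ['fetching', 'distorting', 'compositing', 'tiling', 'done'];

// Build a status object that a browser client could poll and render.
function makeStatus(stage, completed, total, extra = {}) {
  if (!STAGES.includes(stage)) throw new Error(`unknown stage: ${stage}`);
  return {
    stage,
    stageIndex: STAGES.indexOf(stage),
    stageCount: STAGES.length,
    completed, // e.g. images processed so far in this stage
    total,
    updatedAt: new Date().toISOString(),
    ...extra,
  };
}

// Write status.json via a temp file + rename so readers see old or new, never partial.
function writeStatus(dir, status) {
  const target = path.join(dir, 'status.json');
  const tmp = target + '.tmp';
  fs.writeFileSync(tmp, JSON.stringify(status, null, 2));
  fs.renameSync(tmp, target);
  return target;
}
```

On the MapKnitter side, polling would just mean fetching this file every few seconds and showing stage, completed, and total. The client stops polling once the stage reaches "done".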

@sashadev-sky sashadev-sky added the Epic label Feb 20, 2019

@jywarren

Contributor Author

jywarren commented Feb 20, 2019

Haha awesome label @sashadev-sky -- i'll respond more completely later today i hope!

Just noting that @icarito has created a Dockerfile for the GDAL/ImageMagick container track: #349

@tech4GT

Member

tech4GT commented Mar 15, 2019

Hi @jywarren
I was thinking about the is-runner; maybe we can base it on the Node.js cluster API?
https://nodejs.org/api/cluster.html
Also, how exactly do we want to divide up the work among these processes? Is it specified by the user, or do we want some kind of algorithm to decide?
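One simple answer to how the work could be divided assumes the runner, not the user, decides. It splits the image list evenly, one chunk per CPU core, and hands each chunk to a worker (for example, one started with cluster.fork(), which is not shown). The function name is hypothetical.

```javascript
const os = require('os');

// Split items into at most `workers` contiguous chunks whose sizes differ by
// at most one. Defaults to one chunk per CPU core (falls back to 1 if
// os.cpus() reports none, as can happen in some containers).
function partition(items, workers = os.cpus().length) {
  const n = Math.max(1, Math.min(workers, items.length));
  const base = Math.floor(items.length / n);
  const remainder = items.length % n;
  const chunks = [];
  let start = 0;
  for (let i = 0; i < n; i++) {
    // the first `remainder` chunks get one extra item
    const size = base + (i < remainder ? 1 : 0);
    chunks.push(items.slice(start, start + size));
    start += size;
  }
  return chunks;
}
```

An even split by count ignores the fact that image sizes vary, so one worker may still get most of the pixels. Handing out images one at a time from a shared queue would balance better, at the cost of more coordination between processes.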

@jywarren

Contributor Author

jywarren commented Mar 15, 2019

@tech4GT

Member

tech4GT commented Mar 16, 2019

This makes sense Jeff, let me write up some code and see if it works, let’s build this into is-app.

@jywarren

Contributor Author

jywarren commented Mar 16, 2019

@tech4GT

Member

tech4GT commented Mar 16, 2019

Yeah that’s what I was thinking, we can plug in the distortion part later since there are a couple of options I need to explore there and I don't want to slow this down because of that!

@tech4GT

Member

tech4GT commented Mar 17, 2019

@jywarren Is there any is-module which stitches the images together? Or do I need to write one?

@tech4GT

Member

tech4GT commented Mar 17, 2019

Also, how do we get the information about the steps, i.e. the complete graph? Through JSON somehow?
I was thinking we could run a topological sort on the dependency graph, which would automatically run our steps in the right order. What do you think?
@jywarren
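The topological sort suggested above could be Kahn's algorithm over a JSON dependency map. The sketch below assumes a hypothetical format in which each step lists the steps it depends on. Nothing in MapKnitter or Image Sequencer defines this format.

```javascript
// Kahn's algorithm: order steps so that each one runs after its dependencies.
// graph: { stepName: [namesOfStepsItDependsOn] }  (hypothetical JSON shape)
function topoSort(graph) {
  const indegree = {};   // number of unfinished dependencies per step
  const dependents = {}; // reverse edges: step -> steps waiting on it
  for (const step of Object.keys(graph)) {
    indegree[step] = indegree[step] || 0;
    dependents[step] = dependents[step] || [];
    for (const dep of graph[step]) {
      if (!(dep in graph)) throw new Error(`${step} depends on unknown step ${dep}`);
      indegree[step]++;
      (dependents[dep] = dependents[dep] || []).push(step);
    }
  }
  // start with steps that have no dependencies
  const queue = Object.keys(graph).filter(s => indegree[s] === 0);
  const order = [];
  while (queue.length > 0) {
    const step = queue.shift();
    order.push(step);
    for (const next of dependents[step]) {
      indegree[next]--;
      if (indegree[next] === 0) queue.push(next);
    }
  }
  // any step never reaching indegree 0 is part of a cycle
  if (order.length !== Object.keys(graph).length) throw new Error('dependency cycle detected');
  return order;
}

console.log(topoSort({
  fetchA: [], fetchB: [],
  distortA: ['fetchA'], distortB: ['fetchB'],
  composite: ['distortA', 'distortB'],
}));
```

Some steps become ready at the same time: here the two fetches, then the two distortions. Those steps have no dependency on each other, so a runner could execute each such batch in parallel rather than strictly in the returned order.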

@tech4GT

Member

tech4GT commented Mar 17, 2019

I started on this and have implemented steps 1 and 2, i.e. we receive an array of URLs in the request and apply the preliminary step to all of the images. (Currently I've done this using async calls, but later we can move to the cluster API as well if we decide to deploy to a more powerful system.)
Now I need the exact information about which steps need to be applied to these images, and in what order, and a way to stitch them together so we can export our final result. 😄
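On a big map, running the preliminary step with plain async calls on every image at once could exhaust memory, given the decoded image sizes noted at the top of the thread. Below is a sketch of a concurrency-limited map over the URLs. mapWithLimit and the step callback are hypothetical placeholders for whatever runs Image Sequencer on one image.

```javascript
// Apply an async `step` to every item, with at most `limit` (>= 1) running at once.
// Results keep the same order as the input.
async function mapWithLimit(items, limit, step) {
  const results = new Array(items.length);
  let next = 0; // shared cursor; safe because JavaScript runs callbacks one at a time
  async function worker() {
    while (next < items.length) {
      const i = next++;
      results[i] = await step(items[i], i);
    }
  }
  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

Usage would look something like mapWithLimit(urls, 4, url => runPreliminaryStep(url)), where runPreliminaryStep is whatever function already processes one image. Keeping this as a single-process limit leaves room to move to the cluster API later without changing the calling code.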

@jywarren

Contributor Author

jywarren commented Mar 18, 2019

@tech4GT

Member

tech4GT commented Mar 18, 2019

Hmm, I think we will need a stitch module for this down the road, but I can try to make do with overlay. I'll try it out and let you know.
Also, what about the order in which the processing will happen? I think I explained this in one of the previous comments: does the JSON contain the exact order of processing the images?

@jywarren

Contributor Author

jywarren commented Mar 18, 2019

@tech4GT

Member

tech4GT commented Mar 19, 2019

@jywarren I have pushed the initial code to the is-app repo. Another problem we have to fix is Import-Image: it uses new Image() and hence cannot be used in Node, and without it we cannot use the overlay module. I guess we would have to go for the knit module after all 😅

@tech4GT

Member

tech4GT commented Mar 19, 2019

Also, I meant to ask whether Leaflet works in Node. I actually have no idea about that. Does it use WebGL?

@jywarren

Contributor Author

jywarren commented Mar 19, 2019

@tech4GT

Member

tech4GT commented Mar 19, 2019

Okay so do you want me to update import image to work in node using method similar to load-image then?

@jywarren

Contributor Author

jywarren commented Mar 19, 2019

@tech4GT

Member

tech4GT commented Mar 19, 2019

I mean without that there's no way of merging images anyway, so let's go with this for now!

@tech4GT

Member

tech4GT commented Mar 19, 2019

Okay, another thing: we need a way to figure out exactly where to overlay the images onto one another, e.g. if some image needs to be rotated first. Would it be possible for you to give me a simple sample to work this out? I tried doing this with the test dataset you mentioned, but it's kind of confusing. I was wondering if I could get a stripped-down version of that data on which I can figure this out.

@jywarren

Contributor Author

jywarren commented Mar 19, 2019

@tech4GT

Member

tech4GT commented Mar 19, 2019

I'll check this out!

@tech4GT

Member

tech4GT commented Mar 20, 2019

Alright, so we have Import-Image working now. I'll try a workflow for combining images; right now overlay cuts the image off, but we can fix that in a stitch module later.

@jywarren

Contributor Author

jywarren commented Mar 20, 2019

Yes or perhaps we can make a "resize-canvas" or "canvas-size" module; that might be useful later anyways.
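A "canvas-size" module could be as simple as copying the image into a larger, transparent buffer at an offset. That alone would stop overlay from cutting images off. The sketch below assumes flat RGBA pixel arrays like canvas ImageData. The module and function names are hypothetical; this is not an existing Image Sequencer module.

```javascript
// Copy an RGBA image (Uint8ClampedArray, 4 bytes per pixel, row-major) into a
// larger transparent canvas with its top-left corner at (offsetX, offsetY).
function resizeCanvas(src, srcWidth, srcHeight, newWidth, newHeight, offsetX = 0, offsetY = 0) {
  const fits =
    offsetX >= 0 && offsetY >= 0 &&
    offsetX + srcWidth <= newWidth && offsetY + srcHeight <= newHeight;
  if (!fits) throw new Error('source image does not fit in the new canvas at that offset');
  const out = new Uint8ClampedArray(newWidth * newHeight * 4); // zero-filled = transparent
  for (let y = 0; y < srcHeight; y++) {
    // copy one full source row into its destination row
    const srcStart = y * srcWidth * 4;
    const dstStart = ((y + offsetY) * newWidth + offsetX) * 4;
    out.set(src.subarray(srcStart, srcStart + srcWidth * 4), dstStart);
  }
  return out;
}
```

The new width and height could come from the collection's bounding box, so every image fits before overlaying. Transparent padding means later overlays only need alpha compositing, with no cropping.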
