Design and implementation of the OAM catalog (OAM-C) #5

Closed
cgiovando opened this issue Jan 27, 2015 · 15 comments

Comments

@cgiovando
Contributor

This is the first milestone in OAM development. The catalog is one of the core components of OAM; it allows users to index, search, and filter available imagery datasets. It will support both spatial and attribute-based queries. #4 will provide the initial set of metadata indexed through the OAM catalog.

HOT will publish a "Tech Challenge" call to support the OAM-C development.

@cgiovando
Contributor Author

Here's a first draft of the high level design for discussion:

[Image: OAM catalog high-level design]

Pink: Web functions and user interface
Blue: metadata, querying, catalog functions
Orange: vector spatial data (GeoJSON)
Green: aerial data, processing, tiles

@MarkCupitt

Comments/Questions

Are OAM Nodes and the Local OAM Server exactly the same, or is the Local OAM Server different?

The Local Database is shown serving spatial data, which means that it either has to be spatially aware or be a different database from the catalog. (This refers to my comments in #6.)

The diagram appears to imply that each OAM Node has a complete local copy of the data to serve. This may not necessarily be the case: clusters of Nodes may share data from a common file server. It is probably worth separating out the data/tile storage functions in the Nodes and the Local OAM Server, which is probably just a node anyway.

If the Nodes will ONLY have prepared tiles to serve, then one or more other servers are needed that can process and create the tiles in one or more central repositories that the Nodes can pull from.

Nodes could be just a WMS cache application, in which case they could pull from one or more centralized data stores as needed and cache locally, rather than dumping all tiles to all nodes.

@kalxas

kalxas commented Feb 5, 2015

For the catalog/database infrastructure I would highly recommend pycsw with a PostGIS backend.
The keywords match exactly (lightweight, distributed, replicated, harvesting)
http://pycsw.org/

For the UI, one could use GeoNode (http://geonode.org/), CKAN (http://ckan.org/) with spatial extension, or a custom solution (e.g. http://demo.pycsw.org/viewer/index.html) if a super-lightweight implementation is in mind.

@cgiovando
Contributor Author

Angelos and Mark - thanks for the comments. Following yesterday's meeting discussion and the other thread on #6, it sounds like SQLite would be enough to handle all of our database needs. It also works well with pycsw and is very portable, so we should have a winning match 👍
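
As a rough illustration of how far plain SQLite can go for a catalog like this, here is a minimal sketch that stores each dataset's bounding box in ordinary numeric columns and answers a combined spatial and attribute query. The table name, column names, and the `provider` attribute are assumptions made up for this example; they are not taken from pycsw or from any OAM schema, and the bounding-box test does not handle boxes that cross the antimeridian.

```python
import sqlite3


def create_catalog(conn):
    # Hypothetical schema: one row per imagery dataset, bbox in lon/lat degrees.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS imagery (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            provider TEXT,
            acquired TEXT,  -- ISO 8601 date string, so text ordering is date ordering
            min_lon REAL, min_lat REAL, max_lon REAL, max_lat REAL
        )
        """
    )


def add_imagery(conn, title, provider, acquired, bbox):
    min_lon, min_lat, max_lon, max_lat = bbox
    cur = conn.execute(
        "INSERT INTO imagery "
        "(title, provider, acquired, min_lon, min_lat, max_lon, max_lat) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (title, provider, acquired, min_lon, min_lat, max_lon, max_lat),
    )
    return cur.lastrowid


def search(conn, bbox, provider=None):
    """Return titles whose bbox intersects `bbox`, newest first.

    Two boxes intersect when each one starts before the other ends,
    on both the longitude and the latitude axis.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    sql = (
        "SELECT title FROM imagery "
        "WHERE min_lon <= ? AND max_lon >= ? AND min_lat <= ? AND max_lat >= ?"
    )
    params = [max_lon, min_lon, max_lat, min_lat]
    if provider is not None:
        # Attribute filter layered on top of the spatial filter.
        sql += " AND provider = ?"
        params.append(provider)
    sql += " ORDER BY acquired DESC"
    return [row[0] for row in conn.execute(sql, params)]
```

Calling `search(conn, (min_lon, min_lat, max_lon, max_lat))` on a connection opened with `sqlite3.connect(":memory:")` is enough to try it out. A spatially aware extension would be needed for true footprint geometry, but bounding boxes alone fit in a stock SQLite file.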

To answer Mark's question about the difference between OAM nodes and the main server: they could be exactly the same, or some nodes could just be serving tiles. We could in fact consider as an OAM node any TMS/WMTS service that has the capacity to handle a defined minimum traffic threshold. The main OAM server(s) could help other nodes by replicating tiles during high demand.

Here's my original GDoc drawing file, please make a copy or start from scratch to propose your design ideas: https://docs.google.com/drawings/d/1ayU0tS5lIE0JoIuB3dmuOe21golaz85gN576PrvVmNo/edit?usp=sharing

@MarkCupitt

Hi All, I had a bit of a play with some ideas and came up with the following...

The Nodes and File Servers are all geographically independent and could reside on the Catalog Server as well if needed. Nodes can connect to any File Server, depending on the location of the data as specified by the Catalog Server.

I have shown the Catalogs updating the Nodes, but it could well be the Nodes querying the Catalogs on an as-needed basis (I am thinking of OGC-type Capabilities/tile definition XML here, including which server the data actually lives on).

The advantage of the Catalog updating the Nodes' Capabilities and XML is that a Node will still function if the Catalog is unavailable for any reason. It will also distribute request load away from the Catalog. Alternatively, the Nodes could pull from the Catalog on a scheduled basis, rather than the Catalog pushing to the Nodes, whichever makes the most sense.
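
To make the scheduled-pull variant concrete, here is a minimal sketch of a node-side holder for layer definitions. It refreshes from the catalog once its copy is older than a chosen age and keeps serving the last known copy if the catalog cannot be reached. Everything here is an assumption for illustration: `NodeLayerConfig`, the `fetch_layers` callable, and the idea that layer definitions arrive as a mapping of layer name to file server are all hypothetical, not part of any existing OAM or OGC API.

```python
import time


class NodeLayerConfig:
    """Holds a node's layer definitions, refreshed from the catalog when possible."""

    def __init__(self, fetch_layers, max_age_seconds=3600, clock=time.monotonic):
        # fetch_layers: hypothetical callable that asks the catalog and returns
        # a mapping such as {layer_name: file_server}. It is expected to raise
        # OSError (for example ConnectionError) when the catalog is unreachable.
        self._fetch_layers = fetch_layers
        self._max_age = max_age_seconds
        self._clock = clock
        self._layers = {}
        self._fetched_at = None

    def layers(self):
        now = self._clock()
        stale = self._fetched_at is None or now - self._fetched_at >= self._max_age
        if stale:
            try:
                self._layers = dict(self._fetch_layers())
                self._fetched_at = now
            except OSError:
                # Catalog unavailable: keep the last known definitions and
                # try again on the next call.
                pass
        return self._layers
```

The same class would work for the push variant if the catalog called a setter instead of the node calling `fetch_layers`; the important property in both cases is that the node never discards its last good copy.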

A process something like this may work:

1. Upload the raster image to a File Server.
2. Process and generate tiles if needed. (File Servers have the capability to create image pyramids or similar; think GDAL.)
3. Add a new catalog entry on the Catalog Server.
4. Distribute tile/layer information to each Node, including the location of the raster/tile data (which File Server).

Key to this is that the Nodes would actually maintain their own caches. This would allow one network hit per tile, then tiles would be served from the local cache.
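
The "one network hit per tile" behaviour can be sketched as a pull-through cache. This is only an in-memory illustration under stated assumptions: `TileCache` and the `fetch_from_file_server` callable are hypothetical names, and a real node would more likely use disk-backed storage with eviction, or an existing tile cache server.

```python
class TileCache:
    """Pull-through cache: fetch a tile upstream once, then serve it locally."""

    def __init__(self, fetch_from_file_server):
        # Hypothetical callable: (layer, z, x, y) -> tile bytes from a File Server.
        self._fetch = fetch_from_file_server
        self._tiles = {}
        self.upstream_requests = 0  # counts network hits, for illustration

    def get(self, layer, z, x, y):
        key = (layer, z, x, y)
        if key not in self._tiles:
            # Cache miss: this is the single network hit for this tile.
            self.upstream_requests += 1
            self._tiles[key] = self._fetch(layer, z, x, y)
        return self._tiles[key]
```

Requesting the same tile twice through `get` leaves `upstream_requests` at 1, which is the property described above.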

[Image: OAM catalog high-level design v2]

Pink: Web functions and user interface
Blue: metadata, querying, catalog functions
Orange: vector spatial data (GeoJSON)
Green: aerial data, processing, tiles
Red: Replicated Data

@MarkCupitt

One thing that just came to mind about the Nodes: if we do the Catalog properly, then we may not even need to develop any node software, and could just use off-the-shelf software that is already available, like GeoWebCache.

It may pay to build some kind of update mechanism to sync the cache servers' XML layer definition files with the Catalog Server. On GeoWebCache this is relatively straightforward, as it is just one file; I do not know how that works on TileMill or any of the others.

Another thought may be to use Cascaded Caches, where the node cascades from a cache that lives on each file server and distributes the tiles to the Nodes.

@MarkCupitt

Looking around at what is already in place, GeoNode seems very close to what OAM is trying to do. It also has a very active development community.

Of note, it uses a lot of different technologies, and OAM may well end up being very similar.

One option may be to leverage this code and add the distributed cataloging, etc., if GeoNode does not support it.

The diagram below is well designed and self-explanatory.

[Image: GeoNode component architecture]

@wonderchook
Contributor

I've worked with GeoNode a bit over the years. Honestly, I think it is overkill for what we are doing. There are certainly parts that could potentially be useful, but it would add a lot of technical overhead that we don't need.

@MarkCupitt

Good point. One way to figure that out might be to put a red X on the boxes above that we do not need, to simplify it? Printing is an obvious one. Security is probably another item not required.

Kate, how closely do you feel the user interface design would fit OAM's needs?

@wonderchook
Contributor

I think the interface is overkill and we should make sure to keep things simple. We can of course learn from some design choices made in GeoNode.

@wildintellect

GeoExplorer is unnecessary; that interface is for allowing users to make custom styled maps. Security is actually necessary so GeoNode can talk to GeoServer. Vector support isn't needed (GeoNode actually has better support for vector than raster).

I agree with Kate, it's overkill here. Primarily, having run quite a few instances, I can say that upgrades are extremely tricky and the load is not light.

Really, GeoNode is both the Catalog and the Node. We want to split those so that there can be many nodes to one catalog. We also need a processing workflow, which is not part of this.

@MarkCupitt

Is there a need to visualize the bbox of each available imagery dataset on a map?

This of course also raises the question of how this would work with clutter when you have multiple imagery datasets in the same area. @cgiovando had a time slider that would help.

@wildintellect

Yes, some sort of visual map catalog browse is needed. But that can be pretty lightweight JS once the data is available via CSW and a simple API that returns GeoJSON of the bbox. I would suggest designing from scratch though, while using existing interfaces like Alexandria or EarthExplorer for ideas (and issue avoidance).
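
As a sketch of what "a simple API that returns GeoJSON of the bbox" could hand to the map, the functions below turn catalog bounding boxes into a GeoJSON FeatureCollection of rectangular footprints. The function names and the input shape (dicts with `id`, `bbox`, and optional `properties`) are assumptions for this example only.

```python
import json


def bbox_to_feature(dataset_id, bbox, properties=None):
    """Build a GeoJSON Feature whose geometry is the bbox rectangle."""
    min_lon, min_lat, max_lon, max_lat = bbox
    # Closed ring: first and last positions are identical, listed counterclockwise.
    ring = [
        [min_lon, min_lat],
        [max_lon, min_lat],
        [max_lon, max_lat],
        [min_lon, max_lat],
        [min_lon, min_lat],
    ]
    return {
        "type": "Feature",
        "id": dataset_id,
        "bbox": [min_lon, min_lat, max_lon, max_lat],
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": dict(properties or {}),
    }


def footprints(datasets):
    """Wrap one Feature per dataset in a FeatureCollection."""
    return {
        "type": "FeatureCollection",
        "features": [
            bbox_to_feature(d["id"], d["bbox"], d.get("properties"))
            for d in datasets
        ],
    }


def footprints_json(datasets):
    # Serialized form, ready to return from an HTTP endpoint.
    return json.dumps(footprints(datasets))
```

Most JavaScript map libraries can draw a FeatureCollection like this directly, which keeps the browse interface light. An acquisition date placed in `properties` would give a time slider something to filter on.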

@smit1678
Collaborator

Picking up this thread again to update with the latest thinking. @kamicut @anandthakker @scisco @ricardomestre and I have been able to think through how a first version of an OAM Catalog - both API and Browser - can be implemented based on the OAM project meeting.

We've crafted this into a strategy that we want to circulate. Draft can be found here: https://gist.github.com/smit1678/cd3a69239633e064b6f3.

This is a higher level strategy and approach and doesn't include all the details for implementation. Key goals of this strategy are focused on making sure that the first version hits these features:

API

  • upload and update metadata information about OAM datasets
  • index and store metadata for OAM datasets
  • authenticated API access
  • capture transaction logs and provide API status

Browser

  • search, view, get tile URL for available OAM datasets
  • download OAM datasets when available
  • display footprint or imagery location information on map for previewing
  • display full metadata information

Key items to pull out

  • API will index Amazon S3 buckets
  • API will use MongoDB for a datastore backed by Elasticsearch for improved performance (related to #6, "Select a database solution for the OAM Catalog")
  • API will be developed in Node.js with Hapi.js framework
  • Frontend browser will be a static site that can run on GitHub Pages
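
To illustrate the API goals listed above (upload and update metadata, store it, capture transaction logs, report status), here is a small in-memory sketch. It is written in Python purely as a language-neutral illustration; it does not reflect the planned Node.js/Hapi.js and MongoDB stack, and `MetadataStore`, its method names, and the log entry fields are all hypothetical.

```python
from datetime import datetime, timezone


class MetadataStore:
    """In-memory stand-in for the catalog datastore, with a transaction log."""

    def __init__(self):
        self._records = {}
        self.log = []

    def upsert(self, dataset_id, metadata):
        """Create or update a dataset's metadata and record the transaction."""
        action = "update" if dataset_id in self._records else "create"
        self._records[dataset_id] = dict(metadata)
        self.log.append(
            {
                "action": action,
                "id": dataset_id,
                "at": datetime.now(timezone.utc).isoformat(),
            }
        )
        return action

    def get(self, dataset_id):
        return self._records.get(dataset_id)

    def status(self):
        # Shape of a possible status response; field names are made up.
        return {"datasets": len(self._records), "transactions": len(self.log)}
```

Keeping the log append inside `upsert` means every write is recorded by construction, rather than relying on each caller to log separately.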

We can discuss the draft here and then add to the wiki as a reference document for the future.

Next actions

  • Post here with any specific comments
  • I'll pull into a wiki page on Wednesday
  • We'll start to ticket out any other items separately
  • We'll roll this initial development strategy into a roadmap

@smit1678
Collaborator

If no other comments to add, will close this out. I've moved the doc to a wiki page for future reference: https://github.com/hotosm/OpenAerialMap/wiki/Catalog-API---Browser:-Initial-Strategy-and-Approach.

6 participants