Cassandra Support #1398

echeipesh · 2016-03-17T00:03:23Z

GeoTrellis's Spark integration currently supports Accumulo as a backend for storing and retrieving raster data across a cluster. Cassandra is another distributed data store that could provide a rich set of features and performance opportunities to GeoTrellis instances running on Spark. It is also a popular distributed data store that a number of people interested in large-scale geospatial computation are already using.

A Google Summer of Code 2015 scholar tackled this project and created a prototype GeoTrellis catalog implementation for Cassandra. Several lessons were learned about how the Cassandra architecture affects performance when interfacing with Apache Spark, and the project is at a point where further effort is expected to bring this feature to completion. The base objective is to achieve performant multi-dimensional range queries from Apache Spark, using the available low-level interfaces.

Stretch goals for this project include optimizing the implementation to take into account the cluster data locality information available from Cassandra when distributing the IO load.

Previous work attempted to use https://github.com/datastax/spark-cassandra-connector to implement this feature, but the connector's focus on supporting a query-like interface forced us to rely on performing a Spark union over multiple component range queries. This turned out to be slow.
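To illustrate why a single query region turns into multiple component range queries, here is a minimal sketch. It assumes a simple row-major tile index as a stand-in for whatever key index the layer actually uses; the function names and the `layout_cols` parameter are hypothetical and are not part of GeoTrellis or the connector.

```python
from typing import List, Tuple


def row_major_index(col: int, row: int, layout_cols: int) -> int:
    """Map a (col, row) tile position to a 1-D index (assumed row-major layout)."""
    return row * layout_cols + col


def ranges_for_bbox(
    col_min: int, col_max: int, row_min: int, row_max: int, layout_cols: int
) -> List[Tuple[int, int]]:
    """Decompose a tile bounding box into inclusive, contiguous index ranges."""
    ranges: List[Tuple[int, int]] = []
    for row in range(row_min, row_max + 1):
        start = row_major_index(col_min, row, layout_cols)
        end = row_major_index(col_max, row, layout_cols)
        if ranges and ranges[-1][1] + 1 == start:
            # This row continues directly from the previous range, so merge them.
            ranges[-1] = (ranges[-1][0], end)
        else:
            ranges.append((start, end))
    return ranges


# A 3x3 tile window inside a layout 10 tiles wide yields one range per row.
print(ranges_for_bbox(2, 4, 1, 3, layout_cols=10))
```

Under these assumptions, each returned range corresponds to one component query, and the union described above grows with the number of ranges the region decomposes into.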

The key insight for the second stage of the project is that GeoTrellis already supports AWS S3 as a backend, which is extremely limited in its capabilities, allowing only List/Get/Put operations on a key/value store. This is done by iterating, on the client, over all of the possible index keys covered by the query region and requesting each of them directly. A similar feature must be available in a Cassandra driver; something that supports a "set get" of keys would be ideal.
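The key-enumeration approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the index is taken to be a Z-order (Morton) curve, the store is modelled as an in-memory dict, and `z_index`, `keys_for_bbox`, and `multi_get` are hypothetical names rather than GeoTrellis or Cassandra driver APIs.

```python
from typing import Dict, Iterable, List


def z_index(col: int, row: int, bits: int = 16) -> int:
    """Interleave the bits of col and row into one Z-order key (assumed index)."""
    z = 0
    for i in range(bits):
        z |= ((col >> i) & 1) << (2 * i)        # col bits go to even positions
        z |= ((row >> i) & 1) << (2 * i + 1)    # row bits go to odd positions
    return z


def keys_for_bbox(col_min: int, col_max: int, row_min: int, row_max: int) -> List[int]:
    """Enumerate every index key covered by an inclusive tile bounding box."""
    return sorted(
        z_index(col, row)
        for row in range(row_min, row_max + 1)
        for col in range(col_min, col_max + 1)
    )


def multi_get(store: Dict[int, bytes], keys: Iterable[int]) -> Dict[int, bytes]:
    """Stand-in for a "set get": fetch each key directly, skipping missing ones."""
    return {key: store[key] for key in keys if key in store}


# Hypothetical store holding three tiles, keyed by index.
store = {1: b"tile-a", 4: b"tile-b", 9: b"tile-c"}
wanted = keys_for_bbox(1, 2, 0, 1)
print(wanted)
print(multi_get(store, wanted))
```

In this sketch the client does all of the work of deciding which keys to ask for, so the store only needs to answer point lookups, which is the property the S3 backend relies on.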

fosskers · 2016-08-12T15:34:01Z

This can probably be closed.

pomadchin · 2016-09-30T12:33:08Z

Definitely can be closed, merged #1452 @lossyrob @echeipesh @rossbernet.

lossyrob · 2016-10-06T17:37:21Z

Giving this one to Grisha since he finished off the feature :)

echeipesh added the GSOC Google Summer of Code label Mar 17, 2016

lossyrob assigned pomadchin Oct 6, 2016

lossyrob closed this as completed Oct 19, 2016

lossyrob removed the GSOC Google Summer of Code label Oct 19, 2016

eclipsewebmaster unassigned pomadchin Nov 18, 2016
