GeoTrellis's Spark integration currently supports Accumulo as a backend for storing and retrieving raster data across a cluster. Cassandra is another distributed data store that could offer a rich set of features and performance opportunities to GeoTrellis instances running on Spark. It is also popular: a number of people interested in large-scale geospatial computation are already using it.
A Google Summer of Code 2015 scholar tackled this project and created a prototype GeoTrellis catalog implementation for Cassandra. Several lessons were learned about how the Cassandra architecture affects performance when interfacing with Apache Spark, and the project is at a point where further effort is expected to bring this feature to completion. The base objective is to achieve performant multi-dimensional range queries from Apache Spark, using the available low-level interfaces.
Stretch goals for this project include optimizing the implementation to take into account the cluster data locality information available from Cassandra when distributing the IO load.
Previous work attempted to implement this feature with the spark-cassandra-connector (https://github.com/datastax/spark-cassandra-connector), but the connector's focus on supporting a query-like interface forced us to perform a Spark union over multiple component range queries, which turned out to be slow.
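To illustrate why a single spatial query turns into multiple component range queries, here is a minimal sketch. It assumes tiles are keyed by a Z-order (Morton) index; the actual index used by the prototype is not stated in this issue, and the names `morton_index` and `index_ranges` are hypothetical.

```python
from typing import List, Tuple


def morton_index(col: int, row: int, bits: int = 16) -> int:
    """Interleave the bits of col and row into one Z-order index (assumed scheme)."""
    index = 0
    for i in range(bits):
        index |= ((col >> i) & 1) << (2 * i)
        index |= ((row >> i) & 1) << (2 * i + 1)
    return index


def index_ranges(col_min: int, col_max: int,
                 row_min: int, row_max: int) -> List[Tuple[int, int]]:
    """Collapse the indices of every tile in the box into inclusive contiguous ranges."""
    indices = sorted(
        morton_index(col, row)
        for col in range(col_min, col_max + 1)
        for row in range(row_min, row_max + 1)
    )
    ranges: List[Tuple[int, int]] = []
    for idx in indices:
        if ranges and idx == ranges[-1][1] + 1:
            # Extend the current range while indices stay consecutive.
            ranges[-1] = (ranges[-1][0], idx)
        else:
            ranges.append((idx, idx))
    return ranges


# A 2x2 box aligned with the curve is one range; shifted by one tile it is four.
print(index_ranges(0, 1, 0, 1))
print(index_ranges(1, 2, 1, 2))
```

Under this assumption, each resulting range would become its own range query, and the results would then have to be unioned, which is the pattern described above.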
The key insight for the second stage of the project is that GeoTrellis already supports AWS S3 as a backend, even though S3 is extremely limited in its capabilities, allowing only List/Get/Put operations on a key/value store. The S3 backend works by iterating, on the client, over all of the possible index keys covered by the query region and requesting them directly. A similar capability must be available in a Cassandra driver; something that supports a "set get" of keys would be ideal.
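The key-enumeration approach can be sketched as follows. This is an illustration only: it assumes a simple row-major index, and `InMemoryStore` with its `multi_get` method is a hypothetical stand-in for whatever "set get" call a Cassandra driver might offer, not a real driver API.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def keys_for_region(col_min: int, col_max: int,
                    row_min: int, row_max: int, width: int) -> Iterator[int]:
    """Yield a row-major index key for every tile in the query box (assumed index)."""
    for row in range(row_min, row_max + 1):
        for col in range(col_min, col_max + 1):
            yield row * width + col


def in_batches(keys: Iterable[int], size: int) -> Iterator[List[int]]:
    """Group keys into lists of at most `size` elements."""
    it = iter(keys)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


class InMemoryStore:
    """Hypothetical key/value store exposing only a batched get."""

    def __init__(self, tiles: Dict[int, bytes]) -> None:
        self._tiles = dict(tiles)

    def multi_get(self, keys: List[int]) -> Dict[int, bytes]:
        # Keys with no stored tile are simply absent from the result.
        return {k: self._tiles[k] for k in keys if k in self._tiles}


def read_region(store: InMemoryStore, keys: Iterable[int],
                batch_size: int = 2) -> Dict[int, bytes]:
    """Fetch every requested key directly, one batch at a time."""
    result: Dict[int, bytes] = {}
    for batch in in_batches(keys, batch_size):
        result.update(store.multi_get(batch))
    return result


store = InMemoryStore({0: b"a", 1: b"b", 5: b"c", 6: b"d"})
tiles = read_region(store, keys_for_region(1, 2, 0, 1, width=4))
print(sorted(tiles))
```

The client asks only for keys it knows the query covers, so the store never needs to evaluate a range predicate; keys that were never written just come back missing.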