
GeoSpark@Twitter | GeoSpark Discussion Board

Supported Apache Spark versions: 2.0+ (master branch), 1.0+ (1.X branch)

GeoSpark is listed as an Infrastructure Project on the Apache Spark official Third Party Projects page.

GeoSpark is a cluster computing system for processing large-scale spatial data. GeoSpark extends Apache Spark with a set of out-of-the-box Spatial Resilient Distributed Datasets (SRDDs) that efficiently load, process, and analyze large-scale spatial data across machines. GeoSpark provides APIs that let Apache Spark programmers easily develop spatial analysis programs with SRDDs, which have built-in support for geometrical operations and spatial queries (range, K nearest neighbors, join).

GeoSpark artifacts are hosted in Maven Central: Maven Central Coordinates

Version release notes: click here

News!

To download GeoSpark 0.9.0-SNAPSHOT, read this: GeoSpark Maven Coordinate. GeoSpark 0.9.0 will bring you:
- much lower memory consumption, powered by the GeoSpark customized serializer
- much faster spatial and distance joins
- SpatialRDD that supports heterogeneous geometries
- range, join, and KNN queries on heterogeneous geometries

Welcome to new GeoSpark contributor Masha Basmanova (@mbasmanova) from Facebook. Masha has contributed more than 10 PRs to GeoSpark, refactoring the GeoSpark architecture and improving GeoSpark join performance!
Welcome to new GeoSpark contributor Zongsi Zhang (@zongsizhang) from Arizona State University. Zongsi participated in the design of the GeoSpark Shapefile parser and has done a great job!

Important features (more)

Spatial Resilient Distributed Datasets (SRDDs)

Supported Spatial RDDs: PointRDD, RectangleRDD, PolygonRDD, LineStringRDD

Supported input data format

Native input format support:

CSV
TSV
WKT
GeoJSON (single-line compact format)
NASA Earth Data NetCDF/HDF
ESRI Shapefile (.shp, .shx, .dbf)

User-supplied input format mapper: any single-line input format

Spatial Partitioning

Supported Spatial Partitioning techniques: Quad-Tree, R-Tree, Voronoi diagram, Uniform grids (Experimental), Hilbert Curve (Experimental)
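To make the idea of spatial partitioning concrete, here is a minimal sketch of the simplest technique in the list above, a uniform grid. It is not GeoSpark code and does not use the GeoSpark API: the class name UniformGridSketch, the method cellId, and the choice to clamp boundary points into the last cell are all assumptions made for illustration. Each point is assigned to one grid cell, and the cell ID could then serve as a partition key.

```java
/** Hypothetical illustration of uniform-grid partitioning; not part of the GeoSpark API. */
final class UniformGridSketch {

    /**
     * Maps a point to a cell ID in a cols x rows grid laid over the
     * rectangle [minX, maxX] x [minY, maxY]. Cells are numbered row by row.
     */
    static int cellId(double x, double y,
                      double minX, double minY, double maxX, double maxY,
                      int cols, int rows) {
        int col = (int) ((x - minX) / (maxX - minX) * cols);
        int row = (int) ((y - minY) / (maxY - minY) * rows);
        // Clamp so that points on (or outside) the boundary still land in a valid cell.
        col = Math.min(Math.max(col, 0), cols - 1);
        row = Math.min(Math.max(row, 0), rows - 1);
        return row * cols + col;
    }

    public static void main(String[] args) {
        // A 4 x 2 grid over the whole longitude/latitude range.
        int id = cellId(-111.9, 33.4, -180, -90, 180, 90, 4, 2);
        System.out.println("cell id = " + id);
    }
}
```

A uniform grid is cheap to compute but can produce badly unbalanced partitions on skewed data, which is one reason tree-based schemes such as Quad-Tree and R-Tree partitioning are commonly preferred.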

Spatial Index

Supported Spatial Indexes: Quad-Tree and R-Tree. R-Tree supports Spatial K Nearest Neighbors query.

Geometrical operation

DatasetBoundary, Minimum Bounding Rectangle, Polygon Union
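As a rough illustration of what a dataset boundary or minimum bounding rectangle means, the sketch below computes the smallest axis-aligned rectangle enclosing a set of points. It is plain Java, not the GeoSpark API: the class BoundarySketch, the method boundingRectangle, and the representation of points as {x, y} arrays are assumptions made for this example.

```java
/** Hypothetical illustration of a minimum bounding rectangle; not part of the GeoSpark API. */
final class BoundarySketch {

    /** Returns {minX, minY, maxX, maxY} for points given as {x, y} pairs. */
    static double[] boundingRectangle(double[][] points) {
        if (points.length == 0) {
            throw new IllegalArgumentException("cannot bound an empty dataset");
        }
        double minX = Double.POSITIVE_INFINITY;
        double minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY;
        double maxY = Double.NEGATIVE_INFINITY;
        for (double[] p : points) {
            minX = Math.min(minX, p[0]);
            minY = Math.min(minY, p[1]);
            maxX = Math.max(maxX, p[0]);
            maxY = Math.max(maxY, p[1]);
        }
        return new double[] {minX, minY, maxX, maxY};
    }

    public static void main(String[] args) {
        double[] r = boundingRectangle(new double[][] {{1, 5}, {-2, 3}, {4, -1}});
        System.out.println(java.util.Arrays.toString(r));
    }
}
```

In a distributed setting the same computation can be done per partition and the partial rectangles merged, since taking minima and maxima is associative.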

Spatial Operation

Spatial Range Query, Distance Join Query, Spatial Join Query (Inside and Overlap), and Spatial K Nearest Neighbors Query.
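The semantics of two of these queries can be sketched in a few lines of plain Java. This is a brute-force, single-machine illustration under stated assumptions, not GeoSpark's distributed, index-backed implementation: the class SpatialQuerySketch, its method names, the {x, y} point representation, and the use of planar Euclidean distance are all hypothetical choices for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical brute-force range and KNN queries; not part of the GeoSpark API. */
final class SpatialQuerySketch {

    /** Range query: keeps the points inside the closed rectangle [minX, maxX] x [minY, maxY]. */
    static List<double[]> rangeQuery(List<double[]> points,
                                     double minX, double minY, double maxX, double maxY) {
        List<double[]> hits = new ArrayList<>();
        for (double[] p : points) {
            if (p[0] >= minX && p[0] <= maxX && p[1] >= minY && p[1] <= maxY) {
                hits.add(p);
            }
        }
        return hits;
    }

    /** Squared planar distance; enough for ranking, and avoids a square root. */
    static double squaredDistance(double[] p, double qx, double qy) {
        double dx = p[0] - qx;
        double dy = p[1] - qy;
        return dx * dx + dy * dy;
    }

    /** KNN query: returns the k points closest to (qx, qy), nearest first. */
    static List<double[]> kNearest(List<double[]> points, double qx, double qy, int k) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        // Max-heap on distance: the root is the farthest of the current candidates,
        // so it is the one evicted when a closer point arrives.
        PriorityQueue<double[]> heap = new PriorityQueue<>(
                Comparator.comparingDouble((double[] p) -> squaredDistance(p, qx, qy)).reversed());
        for (double[] p : points) {
            heap.add(p);
            if (heap.size() > k) {
                heap.poll();
            }
        }
        List<double[]> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble((double[] p) -> squaredDistance(p, qx, qy)));
        return result;
    }

    public static void main(String[] args) {
        List<double[]> pts = List.of(
                new double[] {0, 0}, new double[] {1, 1}, new double[] {5, 5}, new double[] {2, 2});
        System.out.println("in range: " + rangeQuery(pts, 0, 0, 2, 2).size());
        System.out.println("nearest x: " + kNearest(pts, 0.9, 0.9, 2).get(0)[0]);
    }
}
```

The bounded heap keeps memory at O(k) per scan, which is also why a KNN query distributes well: each partition can return its local top k and the results can be merged.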

Coordinate Reference System (CRS) Transformation (aka. Coordinate projection)

GeoSpark allows users to transform the original CRS (e.g., degree-based coordinates such as EPSG:4326, also known as WGS84) to any other CRS (e.g., meter-based coordinates such as EPSG:3857) so that it can accurately process both geographic data and geometrical data. Please specify your desired CRS in the GeoSpark Spatial RDD constructor (Example).
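To show why such a transformation matters, the sketch below converts a longitude/latitude pair in degrees (EPSG:4326) to meter-based Web Mercator coordinates (EPSG:3857) using the standard spherical Mercator formula. It is a standalone illustration, not GeoSpark's own transformation code, which this README does not show: the class WebMercatorSketch, the method toWebMercator, and the (longitude, latitude) parameter order are assumptions.

```java
/** Hypothetical EPSG:4326 to EPSG:3857 conversion; not part of the GeoSpark API. */
final class WebMercatorSketch {

    /** WGS84 semi-major axis in meters, used as the sphere radius by Web Mercator. */
    static final double EARTH_RADIUS_M = 6378137.0;

    /** Converts degrees (longitude, latitude) to meters {x, y} in Web Mercator. */
    static double[] toWebMercator(double lonDeg, double latDeg) {
        double x = EARTH_RADIUS_M * Math.toRadians(lonDeg);
        double y = EARTH_RADIUS_M
                * Math.log(Math.tan(Math.PI / 4 + Math.toRadians(latDeg) / 2));
        return new double[] {x, y};
    }

    public static void main(String[] args) {
        double[] xy = toWebMercator(-111.93, 33.42);
        System.out.printf("x=%.1f m, y=%.1f m%n", xy[0], xy[1]);
    }
}
```

After such a projection, distances and query windows can be expressed in meters instead of degrees. Note that Web Mercator distorts distances increasingly away from the equator and is undefined at the poles, so the right target CRS depends on the region being analyzed.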

Users

Companies that are using GeoSpark (incomplete list)

Please make a Pull Request to add yourself!

GeoSpark Tutorial (more)

The full GeoSpark tutorial is available on the GeoSpark GitHub Wiki: GeoSpark GitHub Wiki

A GeoSpark Scala and Java template project is available here: Template Project

GeoSpark Function Use Cases: Scala Example, Java Example

GeoSpark Visualization Extension (GeoSparkViz)

GeoSparkViz is a large-scale in-memory geospatial visualization system.

GeoSparkViz provides native support for general cartographic design by extending GeoSpark to process large-scale spatial data. It can visualize Spatial RDDs and Spatial Queries and render very high-resolution images in parallel.

More details are available here: GeoSpark Visualization Extension

GeoSparkViz Gallery

Watch High Resolution on a real map

Publication

Jia Yu, Jinxuan Wu, Mohamed Sarwat. "A Demonstration of GeoSpark: A Cluster Computing Framework for Processing Big Spatial Data". (demo paper) In Proceedings of the IEEE International Conference on Data Engineering, ICDE 2016, Helsinki, FI, May 2016.

Jia Yu, Jinxuan Wu, Mohamed Sarwat. "GeoSpark: A Cluster Computing Framework for Processing Large-Scale Spatial Data". (short paper) In Proceedings of the ACM International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2015, Seattle, WA, USA, November 2015.

Acknowledgement

GeoSpark makes use of JTS Plus (an extended JTS Topology Suite, version 1.14) for some geometrical computations.

Please refer to JTS Topology Suite and JTS Plus for more details.

Contact

Questions

Contact

Jia Yu (Email: jiayu2@asu.edu)
Mohamed Sarwat (Email: msarwat@asu.edu)

Project website

Please visit the GeoSpark project website for the latest news and releases.

Data Systems Lab

GeoSpark is one of the projects initiated by the Data Systems Lab at Arizona State University. The mission of the Data Systems Lab is to design and develop experimental data management systems (e.g., database systems).
