metric-chicken/GeoSpark

 
 



GeoSpark@Twitter || GeoSpark Discussion Board || Join the chat at https://gitter.im/geospark-datasys/Lobby

GeoSpark is a cluster computing system for processing large-scale spatial data. It extends Apache Spark and SparkSQL with out-of-the-box Spatial Resilient Distributed Datasets (SRDDs) and SpatialSQL that efficiently load, process, and analyze large-scale spatial data across machines.

GeoSpark contains several modules:

| Name | API | Spark compatibility | Introduction |
|------|-----|---------------------|--------------|
| Core | RDD | Spark 2.X/1.X | SpatialRDDs and Query Operators. |
| SQL | SQL/DataFrame | SparkSQL 2.1+ | SQL interfaces for GeoSpark core. |
| Viz | RDD, SQL/DataFrame | RDD - Spark 2.X/1.X, SQL - Spark 2.1+ | Visualization for Spatial RDD and DataFrame. |
| Zeppelin | Apache Zeppelin | Spark 2.1+, Zeppelin 0.8.1+ | GeoSpark plugin for Apache Zeppelin. |

GeoSpark supports several programming languages: Scala, Java, SQL, Python and R.

Please visit the GeoSpark website for detailed documentation.

News!

  • GeoSpark main developer Jia Yu will be a Tenure-Track Assistant Professor of Computer Science at Washington State University. He is looking for PhD students to join his lab! (read this)
  • GeoSpark 1.3.1 is released. This version provides a complete Python wrapper for the GeoSpark RDD and SQL APIs. It also contains a number of bug fixes and new functions from 12 contributors. See Python tutorial: RDD, Python tutorial: SQL, and the Release note.

Original Contributors

  • (Mo)hamed Sarwat (Twitter: @MoSarwat)
  • Jia Yu

Impact


The GeoSpark ecosystem has around 10K downloads per month.

