jkbradley Remove Apache Spark 2.1 support (#291)
I can't see us publishing another version for 2.1 (mainly since it hits too many issues when users try motif finding), so this PR shortens the build by removing that support.
Latest commit 8b8da63 Aug 14, 2018



GraphFrames: DataFrame-based Graphs

This is a package for DataFrame-based graphs on top of Apache Spark. Users can write highly expressive queries by leveraging the DataFrame API, combined with a new API for motif finding. Users also benefit from DataFrame performance optimizations within the Spark SQL engine.
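In GraphFrames, a motif is a structural pattern written as a string. For example, `g.find("(x)-[e1]->(y); (y)-[e2]->(x)")` asks for pairs of vertices connected by edges in both directions. Conceptually, a pattern like this is answered by joining the edge table with itself. The sketch below emulates that join in plain Python so the semantics are easy to inspect without a Spark cluster. The toy data and the helper `find_mutual_edges` are hypothetical. This is not the library's implementation, which runs as Spark DataFrame operations.

```python
# Hypothetical toy graph: plain tuples stand in for Spark DataFrame rows.
vertices = {"a": "Alice", "b": "Bob", "c": "Charlie"}  # id -> name
edges = [("a", "b", "friend"), ("b", "c", "follow"), ("c", "b", "follow")]


def find_mutual_edges(edges):
    """Emulate the motif "(x)-[e1]->(y); (y)-[e2]->(x)" as an edge self-join.

    Each edge e1 = (src, dst, ...) is paired with every edge e2 whose
    src equals e1's dst and whose dst equals e1's src.
    """
    # Index edges by (src, dst) so the join condition becomes a dict lookup.
    by_pair = {}
    for e in edges:
        by_pair.setdefault((e[0], e[1]), []).append(e)

    matches = []
    for e1 in edges:
        for e2 in by_pair.get((e1[1], e1[0]), []):
            # One result row per match, with a column per named element.
            matches.append({"x": e1[0], "e1": e1, "y": e1[1], "e2": e2})
    return matches


for m in find_mutual_edges(edges):
    print(vertices[m["x"]], "<->", vertices[m["y"]])
```

The sketch reports each mutual pair once per orientation, as `(b, c)` and again as `(c, b)`. The reason is that the pattern names `x` and `y` separately, and both assignments satisfy it.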

Building and running unit tests

To compile this project, run build/sbt assembly from the project home directory. This will also run the Scala unit tests.

To run the Python unit tests, run the script from the python/ directory. You will need to set SPARK_HOME to your local Spark installation directory.

Spark version compatibility

This project is compatible with Spark 2.2+. However, significant speed improvements have been made to DataFrames in more recent versions of Spark, so you may see speedups from using the latest Spark version.


GraphFrames is a collaborative effort among UC Berkeley, MIT, and Databricks. We welcome open source contributions as well!


  • 0.1.0 initial release
  • 0.2.0 release
    • Spark 2.0 support (work of @felixcheung)
  • 0.3.0 release
    • DataFrame-based connected components implementation
    • added support for Python 3
    • removed support for Spark 1.4 and 1.5
  • 0.4.0 release
    • Spark 2.1 support
    • Fix for checkpointing issue in DataFrame-based connected components implementation (issue 160)
  • 0.5.0 release
    • Major bug fix: indexing of non-Integer vertex IDs, which is used by algorithms that call GraphX under the hood, including PageRank, ConnectedComponents, and others.
    • aggregateMessages for Python API
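GraphX identifies each vertex by a 64-bit integer (`VertexId` is a `Long`). Before GraphFrames can delegate an algorithm to GraphX, it therefore has to map vertex IDs of other types, such as strings, to integers. The 0.5.0 bug fix mentioned above concerns this mapping.

Below is a minimal sketch of the idea. It is not GraphFrames' actual implementation. The helper names `index_vertex_ids` and `restore_ids` are hypothetical, and Python dicts stand in for DataFrames. The properties that matter are these:

* Every ID must map to exactly one index.
* Edges must be rewritten with the same mapping.
* Results must be translated back afterwards.

```python
def index_vertex_ids(vertex_ids, edges):
    """Assign each (possibly non-integer) vertex ID a unique integer index.

    Returns (id_to_index, indexed_edges), where edges are rewritten to use
    the integer indices. The same ID always maps to the same index.
    """
    id_to_index = {}
    for vid in vertex_ids:
        if vid not in id_to_index:
            id_to_index[vid] = len(id_to_index)
    indexed_edges = [(id_to_index[src], id_to_index[dst]) for src, dst in edges]
    return id_to_index, indexed_edges


def restore_ids(id_to_index, per_index_result):
    """Translate a result keyed by integer index back to the original IDs."""
    index_to_id = {i: vid for vid, i in id_to_index.items()}
    return {index_to_id[i]: value for i, value in per_index_result.items()}


# Usage: run a toy integer-only computation (out-degree) on indexed IDs.
id_to_index, indexed_edges = index_vertex_ids(
    ["alice", "bob", "carol"], [("alice", "bob"), ("bob", "carol")]
)
indexed_out_degree = {}
for src, _ in indexed_edges:
    indexed_out_degree[src] = indexed_out_degree.get(src, 0) + 1
print(restore_ids(id_to_index, indexed_out_degree))  # {'alice': 1, 'bob': 1}
```

If two distinct IDs ever received the same index, or if edges were indexed inconsistently with vertices, any algorithm run on the integer graph would silently merge or misattribute vertices. That is why the mapping is kept deterministic and invertible here.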