SPARQL-to-GraphFrames is an open-source compiler that makes it possible to run SPARQL queries on large RDF graphs using the Apache Spark GraphFrames package. The translator takes RDF datasets in N-Triples format and SPARQL query strings as its input. Test RDF datasets can be generated using the Berlin SPARQL Benchmark (BSBM) data generator:


In a highly connected world, a large amount of related data is created every day. Although these relationships give us an idea of the impact that certain actions have, they often get lost once the data is stored in common relational tables or other general NoSQL data models. For the full blog post, see:


The translator consists of three main components:

  1. A GraphBuilder that reads RDF data in N-Triples format and outputs an edge and a vertex Spark SQL DataFrame
  2. Apache Jena's ARQ query processor, which parses the SPARQL query string and transforms it into an Algebra tree (opRoot). A Visitor is then applied to each Algebra expression. For more information about the Visitor pattern, see
  3. A SPARQL-to-GraphFrames translator class (SparqlToGfTranslator) that walks the Algebra tree bottom-up, translating the SPARQL Algebra into a GraphFrame Algebra
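
To make the first component more concrete, here is a minimal sketch of how N-Triples lines could be split into vertex and edge rows before they are loaded into DataFrames. It uses plain Java collections instead of Spark, and the class name `NTripleGraphSketch`, the `relationship` field and the simplified regular expression are assumptions for illustration, not the project's actual GraphBuilder. Only the `id`, `src` and `dst` column names come from GraphFrames itself.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the GraphBuilder step; not the project's actual class.
final class NTripleGraphSketch {

    // One edge row. "src" and "dst" are the column names GraphFrames expects;
    // "relationship" is an assumed name for the predicate column.
    static final class Edge {
        final String src;
        final String dst;
        final String relationship;

        Edge(String src, String dst, String relationship) {
            this.src = src;
            this.dst = dst;
            this.relationship = relationship;
        }
    }

    // Simplified pattern: subject, predicate, object, terminating dot.
    // It does not cover the full N-Triples grammar (e.g. escape sequences).
    private static final Pattern TRIPLE =
            Pattern.compile("^\\s*(\\S+)\\s+(\\S+)\\s+(.+?)\\s*\\.\\s*$");

    // Distinct vertex ids in insertion order (would become the "id" column).
    final Set<String> vertices = new LinkedHashSet<>();
    final List<Edge> edges = new ArrayList<>();

    // Parses one line; records subject and object as vertices and the triple as an edge.
    void addLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return; // skip blank lines and comments
        }
        Matcher m = TRIPLE.matcher(trimmed);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a triple: " + line);
        }
        String s = unwrap(m.group(1));
        String p = unwrap(m.group(2));
        String o = unwrap(m.group(3));
        vertices.add(s);
        vertices.add(o);
        edges.add(new Edge(s, o, p));
    }

    // Removes the angle brackets around an IRI; literals are returned unchanged.
    private static String unwrap(String term) {
        if (term.startsWith("<") && term.endsWith(">")) {
            return term.substring(1, term.length() - 1);
        }
        return term;
    }
}
```

In a Spark setting, the collected vertex ids and edge rows would be turned into the two DataFrames that a GraphFrame is built from.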

The core functionality of the query translator is applying a Visitor (see component 2 for more information about the Visitor pattern) to each Algebra expression by extending ARQ's OpVisitorBase class. That way, the main Algebra element classes (e.g. OpBGP) don't have to be modified themselves. If the Algebra tree needs to be processed in another way, just extend the OpVisitorBase class and implement the methods for that processing.
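
The idea can be illustrated with a miniature, dependency-free version of the pattern. Every type in this sketch (`VisitorSketch`, `Op`, `Bgp`, `Filter`, `Project`, `VisitorBase`, `TraceVisitor`) is a hypothetical stand-in that only mimics the shape of ARQ's algebra and visitor classes. It is not Jena's API. The point is that new behaviour lives in a visitor subclass while the operator classes stay unchanged, and that children are visited before their parents (bottom-up).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature algebra; these types are not Jena's classes.
final class VisitorSketch {

    interface Op {
        void accept(Visitor visitor);
    }

    interface Visitor {
        void visit(Bgp op);
        void visit(Filter op);
        void visit(Project op);
    }

    // No-op base class: subclasses override only the methods they need.
    static class VisitorBase implements Visitor {
        @Override public void visit(Bgp op) { }
        @Override public void visit(Filter op) { }
        @Override public void visit(Project op) { }
    }

    static final class Bgp implements Op {
        final List<String> triples;
        Bgp(List<String> triples) { this.triples = triples; }
        @Override public void accept(Visitor v) { v.visit(this); }
    }

    static final class Filter implements Op {
        final String expr;
        final Op sub;
        Filter(String expr, Op sub) { this.expr = expr; this.sub = sub; }
        @Override public void accept(Visitor v) {
            sub.accept(v); // bottom-up: visit the child first
            v.visit(this);
        }
    }

    static final class Project implements Op {
        final List<String> vars;
        final Op sub;
        Project(List<String> vars, Op sub) { this.vars = vars; this.sub = sub; }
        @Override public void accept(Visitor v) {
            sub.accept(v); // bottom-up: visit the child first
            v.visit(this);
        }
    }

    // Example visitor: records the order in which operators are visited.
    static final class TraceVisitor extends VisitorBase {
        final List<String> trace = new ArrayList<>();
        @Override public void visit(Bgp op) { trace.add("BGP" + op.triples); }
        @Override public void visit(Filter op) { trace.add("FILTER(" + op.expr + ")"); }
        @Override public void visit(Project op) { trace.add("PROJECT" + op.vars); }
    }
}
```

Calling `accept` on a `Project` over a `Filter` over a `Bgp` records the BGP first, then the filter, then the projection, which is the order a bottom-up translation needs.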

At this point, the following Algebra expressions can be translated into a GraphFrame query:


  • Enable basic pattern usage - DONE
  • TBD: double-directed triples with the same edge relation (e.g. "?s ?p ?o . ?o ?p ?s")
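
As a rough illustration of what translating a basic graph pattern can look like, the following sketch turns triple patterns into a GraphFrames motif string (the `(a)-[e]->(b)` syntax accepted by `find`) plus a filter expression. The class `BgpToMotifSketch`, the generated names (`e0`, `c1o`) and the `relationship` edge column are assumptions for illustration and do not reflect how SparqlToGfTranslator is actually implemented. Predicate variables are left unconstrained here, so the double-directed case listed above is not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the project's SparqlToGfTranslator.
final class BgpToMotifSketch {

    private final List<String> motifParts = new ArrayList<>();
    private final List<String> filters = new ArrayList<>();

    // Adds one triple pattern. Terms starting with '?' are variables,
    // anything else is treated as a constant (e.g. a URI).
    void addTriple(String s, String p, String o) {
        int i = motifParts.size();
        String edge = "e" + i;
        String src = vertex(s, "c" + i + "s");
        String dst = vertex(o, "c" + i + "o");
        motifParts.add("(" + src + ")-[" + edge + "]->(" + dst + ")");
        if (!isVar(p)) {
            // Assumes the edge DataFrame stores the predicate in "relationship".
            filters.add(edge + ".relationship = '" + p + "'");
        }
    }

    // Variables keep their own name; constants get a fresh vertex name plus
    // an equality filter on the vertex id. Name collisions between variables
    // and fresh names, and quotes inside constants, are not handled here.
    private String vertex(String term, String freshName) {
        if (isVar(term)) {
            return term.substring(1);
        }
        filters.add(freshName + ".id = '" + term + "'");
        return freshName;
    }

    private static boolean isVar(String term) {
        return term.startsWith("?");
    }

    // Motif string with one part per triple pattern, separated by "; ".
    String motif() {
        return String.join("; ", motifParts);
    }

    // Conjunction of all constant constraints; empty if there are none.
    String filter() {
        return String.join(" AND ", filters);
    }
}
```

With a GraphFrame at hand, the two strings could then be passed to `find(...)` and to a subsequent `filter(...)` on the resulting DataFrame.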


  • Basic Filtering - DONE
  • Enable URI usage - DONE
  • TBD: PREFIX Mapping


  • Single variable projection - DONE
  • TBD: multi-variable projection


  • 2,000,000+ triples have been queried in a single-node setup
  • Querying multiple triples in BGP
  • Querying multiple projections in BGP


Follow these steps to use the translator:

  1. Pull the repository
  2. Run a Maven build on the pom.xml file
  3. Open class: SPARQL2GF/sparql2gf/src/main/java/sparql2gf_client/
  4. Run the various test query strings

Next steps:

  • Test the application on an HDFS cluster setup
  • Test the application with larger datasets (goal = 10 billion triples)

