This repository holds several open-source projects created by Revelytix, Inc.
Spark is a Java SPARQL API. SPARQL is a query language for RDF, commonly used to access data in semantic web applications.
Jena and Sesame are popular open-source Java frameworks for working with RDF and SPARQL; however, these frameworks tend to be both too much and not enough for some common needs. Both frameworks provide the ability to model RDF datasets, import/export a variety of formats, plug in custom storage engines, and execute queries over models with those storage engines.
After working with Jena, Sesame, a variety of triple stores, and our own SPARQL products, we concluded that what was missing is a JDBC-style interface for connection-oriented access to a remote SPARQL processor. Jena and Sesame both open up storage APIs (for potentially remote stores), but those are at a lower level; the client API is assumed to work at the level of graphs. The SPARQL HTTP protocol is widely used and supported, but it offers no client-side programming API or server-side library, does not support connection-oriented use cases, and relies on results sent in a text form (usually XML or JSON), so it is not as performant as triple-store-specific APIs.
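To make the "single-shot, text-based" point concrete, here is a minimal sketch of what a client has to assemble by hand when it talks to a SPARQL endpoint directly over HTTP. It only builds the GET URL defined by the SPARQL Protocol (the query text goes URL-encoded into the `query` parameter). The class name, the `buildQueryUrl` helper, and the `http://example.org/sparql` endpoint are assumptions made for illustration and are not part of Spark.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SparqlProtocolRequest {
    // Hypothetical helper: builds the GET URL for a SPARQL Protocol query.
    // The query text is URL-encoded and sent in the "query" parameter.
    public static String buildQueryUrl(String endpoint, String sparql) {
        return endpoint + "?query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Placeholder endpoint, assumed for the example.
        String endpoint = "http://example.org/sparql";
        String url = buildQueryUrl(endpoint, "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10");
        // Each query is an independent request like this one; the response
        // comes back as a text document (usually XML or JSON) to be parsed.
        System.out.println(url);
    }
}
```

Nothing in this exchange carries state from one query to the next, which is the gap a connection-oriented API is meant to fill.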
In light of this gap, Spark is:
- Connection-oriented, so a connection can hold the state of an interaction
- Client-server to leverage either ubiquitous SPARQL endpoints or custom SPARQL processor APIs
- Interface-oriented and system-agnostic so one API can be used for many SPARQL processors and communication protocols
- A query API, NOT a graph API, focusing on cursored access to results
- A lightweight RDF data API
- [work in progress] A metadata API, defining a common way to retrieve metadata from SPARQL processors
- [future] Able to support updates and transactions
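The goals above (connection-oriented, interface-oriented, cursored access to results) can be sketched in the JDBC style roughly as follows. All type and method names here (`Connection`, `Command`, `Solutions`, `inMemory`, `collect`) are hypothetical and chosen only for illustration; they are not taken from the spark-api artifact. An in-memory stand-in replaces the remote SPARQL processor so the sketch runs on its own.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class ConnectionSketch {
    // Hypothetical interfaces, loosely modeled on JDBC; not the real spark-api types.
    public interface Connection extends AutoCloseable {
        Command createCommand(String sparql);
        @Override void close();
    }

    public interface Command {
        Solutions executeQuery();
    }

    // Cursor over query solutions: advance with next(), then read bindings.
    public interface Solutions extends AutoCloseable {
        boolean next();
        String getBinding(String variable);
        @Override void close();
    }

    // In-memory stand-in for a remote processor: ignores the query text and
    // replays the given rows, so the client code below can be exercised.
    public static Connection inMemory(List<Map<String, String>> rows) {
        return new Connection() {
            @Override public Command createCommand(String sparql) {
                return () -> new Solutions() {
                    private final Iterator<Map<String, String>> it = rows.iterator();
                    private Map<String, String> current;

                    @Override public boolean next() {
                        if (!it.hasNext()) return false;
                        current = it.next();
                        return true;
                    }
                    @Override public String getBinding(String variable) {
                        if (current == null) throw new IllegalStateException("call next() first");
                        return current.get(variable);
                    }
                    @Override public void close() { }
                };
            }
            @Override public void close() { }
        };
    }

    // Client code depends only on the interfaces, not on the implementation.
    public static List<String> collect(Connection conn, String sparql, String variable) {
        List<String> values = new ArrayList<>();
        try (Solutions solutions = conn.createCommand(sparql).executeQuery()) {
            while (solutions.next()) {
                values.add(solutions.getBinding(variable));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = List.of(
                Map.of("s", "http://example.org/a"),
                Map.of("s", "http://example.org/b"));
        try (Connection conn = inMemory(rows)) {
            System.out.println(collect(conn, "SELECT ?s WHERE { ?s ?p ?o }", "s"));
        }
    }
}
```

Because `collect` is written against interfaces only, the same client code could in principle run over an HTTP-backed or a binary-protocol-backed implementation, which is the point of keeping the API system-agnostic.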
Spark consists of the following sub-projects (each is a separate artifact):
- com.revelytix:spark-api - the API, only interfaces
- com.revelytix:spark-spi - the implementation SPI, including helpful (but not required) classes to implement the API
- com.revelytix:spark-protocol - an implementation of the spark-api for accessing SPARQL endpoints over HTTP
Sherpa is a high-performance, language-agnostic, binary protocol for SPARQL processor communication. It aims to avoid the costs associated with the SPARQL Protocol (text-based, single-shot RPC style) and provide an alternative. Sherpa uses Avro to define the protocol and provide interop for multiple languages. Avro currently supports Java, C, C++, C#, Ruby, Python, and PHP.
Sherpa consists of the following sub-projects (each is a separate artifact):
- com.revelytix:sherpa-protocol - the protocol definition and generated Java bindings
- com.revelytix:sherpa-java - an implementation of the Spark client API using Sherpa as the protocol
- com.revelytix:sherpa-clojure - a lightweight query API in Clojure using Sherpa, a framework for writing a Sherpa server in Clojure, and utilities for working with Avro data from the Java binding as native Clojure forms