# Splice Machine SQL Engine

Before proceeding with these instructions, please follow the instructions found in the GETTING-STARTED.md file.

## A Note About Branches in This Project

The master branch is the home of the 3.0 version of Splice Machine; pull requests submitted to this branch will only be picked up by that release. For patches to the currently released version (2.0) of Splice Machine, see the branch-2.0 branch.

## Quick Start

(First make sure you have followed the instructions in the GETTING-STARTED.md file)

For a quick start using the defaults, run the following from the top-level directory:

   ./start-splice-cluster

This will compile everything and start the database. Then, connect to the database:

   ./sqlshell.sh

## Code Structure

Code is separated into two conceptual tiers: Core and Storage Architectures.

### Core

The Core modules contain all architecture-independent code; by definition, this code is shared by all architectures on which Splice Machine runs. If you are writing code that is architecture-independent, or that can rely on the architecture APIs as currently defined, place it in a core module.

Core modules are further broken down as follows:

#### Library Modules

| Module | Description |
|--------|-------------|
| db-build | Classes used as part of the build. |
| db-client | Splice JDBC driver. |
| db-drda | DRDA classes. |
| db-engine | Server, compiler, optimizer, etc. |
| db-shared | Code shared by client and server. |
| db-tools-i18n | Internationalization. |
| db-tools-ij | Our IJ implementation. |
| db-tools-testing | Testing code shared by test cases in all modules. |
| db-testing | Old integration tests slated for deletion, kept around for now. |
| splice_encoding | Library module containing our sorted encoding library and sundry other utilities. |
| splice_protocol | Houses our protocol buffer libraries and proto definition files. (-sf- this may later be dispersed among other module locations) |
| splice_timestamp_api | Houses the API for generating distributed, monotonically increasing timestamps for transactional purposes. (-sf- this may later be merged with splice_si_api) |
| splice_access_api | Holds the main interfaces for defining a storage architecture suitable for running Splice Machine. |

#### Core Execution Modules

| Module | Description |
|--------|-------------|
| splice_si_api | The core, architecture-independent code necessary to implement a distributed snapshot-isolation storage system on top of the library modules above. Includes a suite of acceptance tests which verify that a constructed storage architecture is properly transactional; these tests are annotated with @Category(ArchitectureSpecific.class) in their source. |
| pipeline_api | The core, architecture-independent code necessary to implement a bulk data-movement pipeline over a transactional storage engine. Also includes a suite of acceptance tests which verify that a constructed storage engine can properly perform bulk data movement (these tests are annotated with @Category(ArchitectureSpecific.class)). |
| splice_machine | The core SQL execution engine (and DataDictionary management tools) that constitutes Splice Machine, relying on the pipeline and SI behaviors. Includes acceptance tests in two forms: tests annotated with @Category(ArchitectureSpecific.class), and integration tests which (by convention) end in the suffix IT.java. |

## Storage Architectures

In addition to the core modules, we ship two distinct storage architectures:

| Module | Description |
|--------|-------------|
| Memory | Primarily for testing, and a simplistic reference architecture: there is no network behavior (aside from JDBC), and all data is held on-heap with no persistence. This architecture is not for production use! |
| HBase | Splice Machine as we all know it. This architecture runs on HBase and Spark, and is suitable for production use. |

### Memory

The Memory system consists of three modules:

| Module | Description |
|--------|-------------|
| mem_storage | A storage engine that satisfies all acceptance tests as a snapshot-isolation-capable transactional storage engine. |
| mem_pipeline | An in-memory (read: direct) bulk data architecture built over pipeline_api. |
| mem_sql | The in-memory components necessary to run a fully functional Splice Machine SQL execution engine over mem_storage and mem_pipeline. |

### HBase

The HBase architecture consists of three modules:

| Module | Description |
|--------|-------------|
| hbase_storage | An HBase-based SI storage engine, satisfying all acceptance tests for SI. |
| hbase_pipeline | An HBase bulk-data-movement pipeline built over HBase coprocessors. |
| hbase_sql | The HBase-specific (and Spark-specific) components necessary to run a fully functional Splice Machine SQL execution engine over HBase. |

## Building and Running Splice Machine Locally

These examples include the -DskipTests option for build expediency.

### Build Core Modules and mem Platform Modules

Since core modules are shared, they need only be built once; after that, they need to be rebuilt only when code in the core modules themselves changes (or when a full clean build is desired).

#### Build Core Modules

To build only the core modules:

    mvn clean install -Pcore -DskipTests

Note that some tests run inside each module even in the absence of a specific architecture; these are unit tests that validate architecture-independent code, and are annotated with @Category(ArchitectureIndependent.class).
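Because these categories are plain JUnit @Category annotations, Maven Surefire can filter on them with its `groups` parameter. A hedged sketch (the fully qualified class name of the category is an assumption, not taken from this README; substitute the actual class used in this repository):

```shell
# Run only tests belonging to a given JUnit category. The class name below
# is illustrative -- replace it with the repository's real category class.
mvn test -Dgroups=com.splicemachine.test.ArchitectureIndependent
```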

#### Build Mem Platform

To build only the mem modules:

   mvn clean install -Pmem -DskipTests

#### Combined Build

You can combine the build steps to build and test the in-memory database from top to bottom, including running all unit and integration tests against a fresh in-memory database:

   mvn clean install -Pcore,mem

### Start a Server Running Against the mem Storage Architecture

To run against the mem storage architecture, follow these steps:

  1. Start a server:

  2. Start the ij client:

    To start the client with rlwrap enabled:

       cd splice_machine && rlwrap mvn exec:java
    

    To start the client without rlwrap enabled:

       cd splice_machine && mvn exec:java
    
  3. Run some SQL operations, such as creating tables, importing data, and running queries.
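As a smoke test for step 3, the snippet below writes a few illustrative statements to a file and prints them; the table and column names are made up for this example, and the statements are meant to be pasted at the splice> prompt once the ij client is connected:

```shell
# Write some illustrative smoke-test SQL to a file and display it.
# Paste the statements at the splice> prompt after the ij client connects;
# the table name "demo" and its columns are invented for this example.
cat > smoke.sql <<'SQL'
CREATE TABLE demo (id INT PRIMARY KEY, name VARCHAR(32));
INSERT INTO demo VALUES (1, 'alpha'), (2, 'beta');
SELECT * FROM demo WHERE id = 1;
DROP TABLE demo;
SQL
cat smoke.sql
```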


### Build Core Modules and a Set of HBase Platform Modules

HBase support is further separated into sub-profiles, one per supported HBase distribution. To build against a specific distribution:

  1. Select the HBase version to build. These are the currently available versions:

    • cdh5.6.0
    • cdh5.7.2
    • mapr5.1.0
    • mapr5.2.0
    • hdp2.4.2
    • hdp2.5.0
  2. Build the core modules. As previously mentioned, you only need to do this once.

    mvn clean install -Pcore -DskipTests
    
  3. Build a set of HBase platform modules

    mvn clean install -Pcdh5.6.0
    

    You can combine these steps to build and test the HBase-backed database from top to bottom, including running all unit and integration tests against a fresh HBase-backed database. For example:

    mvn clean install -Pcore,cdh5.6.0
    

### Start a Server Running Against the Specified HBase Storage Architecture

To start a server:

  1. Start ZooKeeper:

    cd hbase_sql && mvn exec:exec -Pcdh5.6.0,spliceZoo
    
  2. Start YARN:

    cd hbase_sql && mvn exec:java -Pcdh5.6.0,spliceYarn
    
  3. Optionally, start Kafka:

    cd hbase_sql && mvn exec:exec -Pcdh5.6.0,spliceKafka
    
  4. Start HBase (master + 1 regionserver, same JVM):

    cd hbase_sql && mvn exec:exec -Pcdh5.6.0,spliceFast
    
  5. Start additional RegionServers (increment memberNumber for each additional RegionServer started):

    cd hbase_sql && mvn exec:exec -Pcdh5.6.0,spliceClusterMember -DmemberNumber=2
    
  6. Start the ij client:

    To start the client with rlwrap enabled:

       cd splice_machine && rlwrap mvn exec:java
    

    To start the client without rlwrap enabled:

       cd splice_machine && mvn exec:java
    
  7. Run some SQL operations, such as creating tables, importing data, and running queries.
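For convenience, steps 1 through 5 above can be sketched as a single script. This is only a rough sketch: it backgrounds each service and logs to illustrative file names, and a real script would need to wait for each service to become healthy before starting the next.

```shell
# Rough sketch of steps 1-5 for the cdh5.6.0 profile. Each long-running
# service is backgrounded with its output sent to a log file (log names
# are illustrative). Add health checks between steps for real use.
cd hbase_sql
mvn exec:exec -Pcdh5.6.0,spliceZoo   > zoo.log   2>&1 &
mvn exec:java -Pcdh5.6.0,spliceYarn  > yarn.log  2>&1 &
mvn exec:exec -Pcdh5.6.0,spliceKafka > kafka.log 2>&1 &   # optional
mvn exec:exec -Pcdh5.6.0,spliceFast  > hbase.log 2>&1 &
mvn exec:exec -Pcdh5.6.0,spliceClusterMember -DmemberNumber=2 > rs2.log 2>&1 &
```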


### Notes for Running with MapR HBase

If you're using MapR HBase:

  1. Create the following empty file on your filesystem, executable and owned by root:
    /opt/mapr/server/createJTVolume.sh

  2. Create the following directory on your filesystem, owned by root with 700 permissions:
    /private/var/mapr/cluster/yarn/rm/system

  3. Create the following directory on your filesystem, owned by root with 777 permissions:
    /private/var/mapr/cluster/yarn/rm/staging
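The three prerequisites above can be expressed as commands (run as root; paths are exactly those listed in the steps):

```shell
# MapR prerequisites: an empty executable script, plus two directories
# with the permissions described above. Run as root.
mkdir -p /opt/mapr/server
touch /opt/mapr/server/createJTVolume.sh
chmod 755 /opt/mapr/server/createJTVolume.sh        # empty, executable
mkdir -p /private/var/mapr/cluster/yarn/rm/system
chmod 700 /private/var/mapr/cluster/yarn/rm/system
mkdir -p /private/var/mapr/cluster/yarn/rm/staging
chmod 777 /private/var/mapr/cluster/yarn/rm/staging
```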