# The Life of a Query

This notebook walks you through using Splice Machine to create, populate, and query a sample database. We'll use the TPC-H benchmarking data as our sample dataset.

TPC-H is a decision support benchmark. It consists of a suite of business-oriented ad hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions.

We demonstrate running and optimizing queries in Splice Machine, in these sections:

<ul class="italic">
    <li>Creating our Database in Splice Machine</li>
    <li>Analytic Workloads
        <ul>
            <li>Examining a Query Execution Plan</li>
            <li>Informing the Optimizer</li>
            <li>Adding Indexes to the Database</li>
            <li>Running Queries</li>
        </ul>
    </li>
    <li>Transactional Workloads</li>
    <li>A Glimpse at Splice Machine Benchmark Results</li>
</ul>

<p class="noteIcon">The code cells in this notebook use the <em>%%sql</em> magic, which is pre-configured to interact with Splice Machine using ANSI SQL.</p>


## Creating our Database in Splice Machine

First we'll create our database in Splice Machine, in the following steps:

<ol class="italic">
    <li>Create the Tables</li>
    <li>Import the Data</li>
</ol>

#### Overview of the TPC-H Schema

Here's a view of the TPC-H schema:

<img class="fit3qtrwidth" src="https://s3.amazonaws.com/splice-examples/images/tutorials/sample-data-tpch-schema.png">

#### Create the Tables

We'll now create the TPC-H tables in our schema. In case we're working on a database into which we may have already imported TPC-H data, we'll first conditionally drop the tables we want to create:

In [None]:
%%sql 
DROP TABLE IF EXISTS LINEITEM;
DROP TABLE IF EXISTS ORDERS;
DROP TABLE IF EXISTS CUSTOMER;
DROP TABLE IF EXISTS PARTSUPP;
DROP TABLE IF EXISTS SUPPLIER;
DROP TABLE IF EXISTS PART;
DROP TABLE IF EXISTS REGION;
DROP TABLE IF EXISTS NATION;

In [None]:
%%sql 
CREATE TABLE LINEITEM (
 L_ORDERKEY BIGINT NOT NULL,
 L_PARTKEY INTEGER NOT NULL,
 L_SUPPKEY INTEGER NOT NULL, 
 L_LINENUMBER INTEGER NOT NULL, 
 L_QUANTITY DECIMAL(15,2),
 L_EXTENDEDPRICE DECIMAL(15,2),
 L_DISCOUNT DECIMAL(15,2),
 L_TAX DECIMAL(15,2),
 L_RETURNFLAG VARCHAR(1), 
 L_LINESTATUS VARCHAR(1),
 L_SHIPDATE DATE,
 L_COMMITDATE DATE,
 L_RECEIPTDATE DATE,
 L_SHIPINSTRUCT VARCHAR(25),
 L_SHIPMODE VARCHAR(10),
 L_COMMENT VARCHAR(44),
 PRIMARY KEY(L_ORDERKEY,L_LINENUMBER)
 );
 
 CREATE TABLE ORDERS (
 O_ORDERKEY BIGINT NOT NULL PRIMARY KEY,
 O_CUSTKEY INTEGER,
 O_ORDERSTATUS VARCHAR(1),
 O_TOTALPRICE DECIMAL(15,2),
 O_ORDERDATE DATE,
 O_ORDERPRIORITY VARCHAR(15),
 O_CLERK VARCHAR(15),
 O_SHIPPRIORITY INTEGER ,
 O_COMMENT VARCHAR(79)
 );
 
 CREATE TABLE CUSTOMER (
 C_CUSTKEY INTEGER NOT NULL PRIMARY KEY,
 C_NAME VARCHAR(25),
 C_ADDRESS VARCHAR(40),
 C_NATIONKEY INTEGER NOT NULL,
 C_PHONE VARCHAR(15),
 C_ACCTBAL DECIMAL(15,2),
 C_MKTSEGMENT VARCHAR(10),
 C_COMMENT VARCHAR(117)
 );
 
 CREATE TABLE PARTSUPP (
 PS_PARTKEY INTEGER NOT NULL ,
 PS_SUPPKEY INTEGER NOT NULL , 
 PS_AVAILQTY INTEGER,
 PS_SUPPLYCOST DECIMAL(15,2),
 PS_COMMENT VARCHAR(199),
 PRIMARY KEY(PS_PARTKEY,PS_SUPPKEY) 
 );
 
 CREATE TABLE SUPPLIER (
 S_SUPPKEY INTEGER NOT NULL PRIMARY KEY,
 S_NAME VARCHAR(25) ,
 S_ADDRESS VARCHAR(40) ,
 S_NATIONKEY INTEGER ,
 S_PHONE VARCHAR(15) ,
 S_ACCTBAL DECIMAL(15,2),
 S_COMMENT VARCHAR(101)
 );
 
 CREATE TABLE PART (
 P_PARTKEY INTEGER NOT NULL PRIMARY KEY,
 P_NAME VARCHAR(55) ,
 P_MFGR VARCHAR(25) ,
 P_BRAND VARCHAR(10) ,
 P_TYPE VARCHAR(25) ,
 P_SIZE INTEGER ,
 P_CONTAINER VARCHAR(10) ,
 P_RETAILPRICE DECIMAL(15,2),
 P_COMMENT VARCHAR(23)
 );
 
 CREATE TABLE REGION (
 R_REGIONKEY INTEGER NOT NULL PRIMARY KEY,
 R_NAME VARCHAR(25),
 R_COMMENT VARCHAR(152)
 );
 
 CREATE TABLE NATION (
 N_NATIONKEY INTEGER NOT NULL,
 N_NAME VARCHAR(25),
 N_REGIONKEY INTEGER NOT NULL,
 N_COMMENT VARCHAR(152),
 PRIMARY KEY (N_NATIONKEY)
 );

#### Import the Data

We have pre-loaded flat files with the TPC-H data into an S3 bucket to facilitate importing the data. All we need to do is run an `IMPORT_DATA` call for each table.

<p class="noteNote">Importing this much data can take a few minutes; you'll see the result of each import displayed below the <code>IMPORT</code> statements as they complete.</p>
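
The eight `IMPORT_DATA` calls in this step differ only in the table name and the file path, so they can also be generated. The following is a minimal Python sketch that builds the same statements as strings; it only formats text and does not connect to the database. The `import_statement` helper and the `TABLES` and `BASE` names are hypothetical and exist only for this illustration.

```python
TABLES = ["LINEITEM", "ORDERS", "CUSTOMER", "PARTSUPP",
          "SUPPLIER", "PART", "REGION", "NATION"]

# Base location copied from the IMPORT_DATA calls in this notebook.
BASE = "s3a://splice-benchmark-data/flat/TPCH/1"


def import_statement(table):
    """Build the IMPORT_DATA call for one table; all other arguments stay fixed."""
    return (
        f"call SYSCS_UTIL.IMPORT_DATA (null, '{table}', null, "
        f"'{BASE}/{table.lower()}', '|', null, null, null, null, "
        f"0, '/tmp', true, null);"
    )


statements = [import_statement(t) for t in TABLES]
print("\n\n".join(statements))
```

You could paste the printed statements into a `%%sql` cell to run them.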


In [None]:
%%sql 
call SYSCS_UTIL.IMPORT_DATA (null, 'LINEITEM', null, 's3a://splice-benchmark-data/flat/TPCH/1/lineitem', '|', null, null, null, null, 0, '/tmp', true, null);

call SYSCS_UTIL.IMPORT_DATA (null, 'ORDERS',   null, 's3a://splice-benchmark-data/flat/TPCH/1/orders',   '|', null, null, null, null, 0, '/tmp', true, null);

call SYSCS_UTIL.IMPORT_DATA (null, 'CUSTOMER', null, 's3a://splice-benchmark-data/flat/TPCH/1/customer', '|', null, null, null, null, 0, '/tmp', true, null);

call SYSCS_UTIL.IMPORT_DATA (null, 'PARTSUPP', null, 's3a://splice-benchmark-data/flat/TPCH/1/partsupp', '|', null, null, null, null, 0, '/tmp', true, null);

call SYSCS_UTIL.IMPORT_DATA (null, 'SUPPLIER', null, 's3a://splice-benchmark-data/flat/TPCH/1/supplier', '|', null, null, null, null, 0, '/tmp', true, null);

call SYSCS_UTIL.IMPORT_DATA (null, 'PART',     null, 's3a://splice-benchmark-data/flat/TPCH/1/part',     '|', null, null, null, null, 0, '/tmp', true, null);

call SYSCS_UTIL.IMPORT_DATA (null, 'REGION',   null, 's3a://splice-benchmark-data/flat/TPCH/1/region',   '|', null, null, null, null, 0, '/tmp', true, null);

call SYSCS_UTIL.IMPORT_DATA (null, 'NATION',   null, 's3a://splice-benchmark-data/flat/TPCH/1/nation',   '|', null, null, null, null, 0, '/tmp', true, null);


# Analytic Workload

Analytic workloads usually involve large scans of data, joins, aggregations and complex filtering conditions.
Splice Machine processes such requests using its OLAP engine, which is powered by Spark.
As you experiment with SQL on Splice Machine, you'll see that the first line of any EXPLAIN plan specifies which engine is used to resolve the request.
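
If you capture an explain plan as text, you can read the engine off its `Cursor` line. The following Python sketch assumes the plan text has the format shown in this notebook's explain output; `plan_engine` is a hypothetical helper name, not part of Splice Machine.

```python
import re


def plan_engine(plan_text):
    """Return the engine named on the Cursor line of an explain plan, or None."""
    for line in plan_text.splitlines():
        if line.strip().startswith("Cursor("):
            match = re.search(r"engine=(\w+)", line)
            return match.group(1) if match else None
    return None


# Lines copied from the explain output shown in this notebook.
sample_plan = """Plan
Cursor(n=10,rows=5,updateMode=,engine=OLAP (cost))
  ->  ScrollInsensitive(n=10,totalCost=176167.046,outputRows=5,outputHeapSize=249 B,partitions=1)
"""

print(plan_engine(sample_plan))
```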


## Examining a Query Execution Plan

In the next few sections of this notebook, we'll examine execution plans for TPC-H Query 4, which is known as the <em>Order Priority Checking Query</em>. This query counts the orders placed in a given quarter of a given year in which at least one line item was received by the customer later than its committed date. You can use it to determine how well the order priority system is working, and it gives an assessment of customer satisfaction.
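
To make the logic of Query 4 concrete, here is a small Python sketch that applies the same rules to a handful of made-up rows: keep the orders in the quarter starting 1993-07-01, keep only those with at least one late line item, then count per priority. The data and the `add_months` helper are invented for this illustration and stand in for the SQL function of the same name.

```python
from collections import Counter
from datetime import date


def add_months(d, months):
    """Shift a first-of-month date forward by a number of months."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, d.day)


# Made-up rows: (orderkey, orderdate, orderpriority)
orders = [
    (1, date(1993, 7, 15), "1-URGENT"),
    (2, date(1993, 8, 2), "1-URGENT"),
    (3, date(1993, 9, 30), "2-HIGH"),
    (4, date(1993, 10, 1), "2-HIGH"),  # outside the quarter
]
# Made-up rows: (orderkey, commitdate, receiptdate)
lineitems = [
    (1, date(1993, 8, 1), date(1993, 8, 5)),      # late
    (1, date(1993, 8, 1), date(1993, 7, 30)),     # on time
    (2, date(1993, 9, 1), date(1993, 8, 20)),     # on time
    (3, date(1993, 10, 10), date(1993, 10, 12)),  # late
    (4, date(1993, 11, 1), date(1993, 11, 9)),    # late, but the order is out of range
]

start = date(1993, 7, 1)
end = add_months(start, 3)  # exclusive upper bound of the quarter

# EXISTS subquery: orders with at least one line item received after its commit date.
late_orders = {okey for okey, commit, receipt in lineitems if commit < receipt}

# WHERE + GROUP BY: count qualifying orders per priority.
order_count = Counter(
    priority
    for okey, odate, priority in orders
    if start <= odate < end and okey in late_orders
)

# ORDER BY o_orderpriority
for priority in sorted(order_count):
    print(priority, order_count[priority])
```

Order 2 is dropped because none of its line items is late, and order 4 is dropped because its date falls on the exclusive upper bound.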

Splice Machine generates an execution plan prior to running your query. You can use the `explain` command to generate and display the execution plan without actually running the query; this can help you to determine optimizing strategies for your queries. 
<p class="noteIcon">The <a href="https://doc.splicemachine.com/developers_tuning_explainplan_examples.html" target="_blank">Reading Explain Plans</a> topic in our documentation describes how to read explain plans.</p>


In [None]:
%%sql 
-- QUERY 04
explain  select
	o_orderpriority,
	count(*) as order_count
from
	orders
where
	o_orderdate >= date('1993-07-01')
	and o_orderdate < add_months('1993-07-01',3)
	and exists (
		select
			*
		from
			lineitem
		where
			l_orderkey = o_orderkey
			and l_commitdate < l_receiptdate
	)
group by
	o_orderpriority
order by
	o_orderpriority
-- END OF QUERY

## Optimizing Query Performance

In this section we'll optimize the execution plan for TPC-H Query 4. We'll:

* Collect Statistics to Inform the Optimizer
* Add Indexes to Further Optimize the Plan
* Compare Execution Plans

The *Splice Machine Optimizer* is a cost-based optimizer that generates optimal execution plans for database queries. You use our `analyze` command to collect statistics from your database, which the optimizer uses when planning the execution of a query.

<p class="noteIcon">Cost-based optimizers are powerful features of modern databases that enable query plans to change as the data profiles change. Optimizers make use of count distinct, quantiles, and most frequent item counts as heuristics.</p>

When creating a plan for a query, our optimizer performs a number of important and valuable actions, including:

* It creates an access plan, which determines the best path for accessing the data the query will operate upon; for example, the access path might be to scan an entire table or to use an index.
* When joining tables, the optimizer evaluates the best *join order* and the *join strategy* to use.
* The optimizer unrolls subqueries to reduce processing time.

The statistics the optimizer relies on (distinct counts, quantiles, and most frequent items) are usually extremely expensive to compute exactly. If approximate results are acceptable, a class of specialized algorithms called streaming algorithms, or *sketches*, can produce results orders of magnitude faster and with mathematically proven error bounds. Splice Machine leverages the [Yahoo Sketches Library](https://datasketches.apache.org/docs/Background/TheChallenge.html) for its statistics gathering.
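
To give a feel for how a sketch trades exactness for speed and memory, here is a minimal Python illustration of one well-known idea, the K-Minimum-Values (KMV) estimator for distinct counts. This is not the code used by the Sketches library or by Splice Machine; KMV is chosen here only because it is short, and the function name and the synthetic data are assumptions made for this example.

```python
import hashlib


def kmv_distinct_estimate(values, k=1024):
    """Estimate the number of distinct values with a K-Minimum-Values sketch."""
    smallest = set()   # the k smallest normalized hash values seen so far
    threshold = 1.0    # largest hash value kept once the sketch is full
    for value in values:
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        # Map the first 8 bytes of the hash to a float in [0, 1].
        h = int.from_bytes(digest[:8], "big") / 2**64
        if len(smallest) < k:
            smallest.add(h)
            if len(smallest) == k:
                threshold = max(smallest)
        elif h < threshold and h not in smallest:
            # A smaller hash arrived: evict the current largest and keep this one.
            smallest.remove(threshold)
            smallest.add(h)
            threshold = max(smallest)
    if len(smallest) < k:
        return len(smallest)  # fewer than k distinct values: the count is exact
    # If k hashes fit below `threshold`, about (k - 1) / threshold fit below 1.
    return round((k - 1) / threshold)


# Synthetic data: 50,000 rows drawn from 20,000 distinct keys.
rows = [i % 20000 for i in range(50000)]
estimate = kmv_distinct_estimate(rows)
print("exact:", len(set(rows)), "estimated:", estimate)
```

The sketch keeps at most `k` numbers in memory no matter how many rows it sees, which is what makes this family of algorithms attractive for gathering statistics on large tables.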

### Collect Statistics
Our first optimization is to collect statistics to inform the optimizer about our database. We use our `analyze` command to collect statistics on a schema or on an individual table; the cell below analyzes each table in turn. This process takes a couple of minutes.


In [None]:
%%sql 
analyze table LINEITEM;
analyze table ORDERS;
analyze table CUSTOMER;
analyze table PARTSUPP;
analyze table SUPPLIER;
analyze table PART;
analyze table REGION;
analyze table NATION;


### Rerun the Explain Plan After Collecting Statistics

Now let's re-run the `explain` plan for Query 4 and see how the optimizer changed the plan after gathering statistics.


In [None]:
%%sql 
-- QUERY 04
explain select
	o_orderpriority,
	count(*) as order_count
from
	orders
where
	o_orderdate >= date('1993-07-01')
	and o_orderdate < add_months('1993-07-01',3)
	and exists (
		select
			*
		from
			lineitem
		where
			l_orderkey = o_orderkey
			and l_commitdate < l_receiptdate
	)
group by
	o_orderpriority
order by
	o_orderpriority
-- END OF QUERY

### Compare Execution Plans After Analyzing the Database

Now let's compare the plans to see what changed. At a quick glance, you'll notice large differences in the estimates: the `totalCost` numbers change for every operation in the plan, and the `outputRows` estimate for the final result drops from 1388618 to 5:

#### After Collecting Statistics
```
Plan
Cursor(n=10,rows=5,updateMode=,engine=OLAP (cost))
  ->  ScrollInsensitive(n=10,totalCost=176167.046,outputRows=5,outputHeapSize=249 B,partitions=1)
    ->  OrderBy(n=11,totalCost=176166.944,outputRows=5,outputHeapSize=249 B,partitions=1)
      ->  ProjectRestrict(n=8,totalCost=88083.421,outputRows=5,outputHeapSize=249 B,partitions=1)
        ->  GroupBy(n=7,totalCost=88083.421,outputRows=5,outputHeapSize=249 B,partitions=1)
          ->  ProjectRestrict(n=6,totalCost=20740.612,outputRows=1980401,outputHeapSize=94.118 MB,partitions=1)
            ->  MergeJoin(n=4,totalCost=20740.612,outputRows=1980401,outputHeapSize=94.118 MB,partitions=1,preds=[(L_ORDERKEY[4:4] = O_ORDERKEY[4:1])])
              ->  ProjectRestrict(n=3,totalCost=11385.304,outputRows=1980401,outputHeapSize=94.118 MB,partitions=1,preds=[(L_COMMITDATE[2:2] < L_RECEIPTDATE[2:3])])
                ->  TableScan[LINEITEM(10448)](n=2,totalCost=11286.284,scannedRows=6001215,outputRows=6001215,outputHeapSize=94.118 MB,partitions=1)
              ->  ProjectRestrict(n=1,totalCost=3004,outputRows=435343,outputHeapSize=13.839 MB,partitions=1)
                ->  TableScan[ORDERS(10464)](n=0,totalCost=3004,scannedRows=1500000,outputRows=435343,outputHeapSize=13.839 MB,partitions=1,preds=[(O_ORDERDATE[0:2] < dataTypeServices: DATE ),(O_ORDERDATE[0:2] >= 1993-07-01)])
```

#### Before Collecting Statistics
```
Plan
Cursor(n=10,rows=1388618,updateMode=,engine=OLAP (cost))
  ->  ScrollInsensitive(n=10,totalCost=309826.015,outputRows=1388618,outputHeapSize=7.946 MB,partitions=1)
    ->  OrderBy(n=11,totalCost=281954.786,outputRows=1388618,outputHeapSize=7.946 MB,partitions=1)
      ->  ProjectRestrict(n=8,totalCost=127041.779,outputRows=1388618,outputHeapSize=7.946 MB,partitions=1)
        ->  GroupBy(n=7,totalCost=127041.779,outputRows=1388618,outputHeapSize=7.946 MB,partitions=1)
          ->  ProjectRestrict(n=6,totalCost=15197.56,outputRows=1388618,outputHeapSize=7.946 MB,partitions=1)
            ->  MergeJoin(n=4,totalCost=15197.56,outputRows=1388618,outputHeapSize=7.946 MB,partitions=1,preds=[(L_ORDERKEY[4:4] = O_ORDERKEY[4:1])])
              ->  ProjectRestrict(n=3,totalCost=8432.488,outputRows=1388618,outputHeapSize=7.946 MB,partitions=1,preds=[(L_COMMITDATE[2:2] < L_RECEIPTDATE[2:3])])
                ->  TableScan[LINEITEM(10448)](n=2,totalCost=8419.864,scannedRows=4207932,outputRows=4207932,outputHeapSize=7.946 MB,partitions=1)
              ->  ProjectRestrict(n=1,totalCost=2627.27,outputRows=410628,outputHeapSize=1.175 MB,partitions=1)
                ->  TableScan[ORDERS(10464)](n=0,totalCost=2627.27,scannedRows=1311635,outputRows=410628,outputHeapSize=1.175 MB,partitions=1,preds=[(O_ORDERDATE[0:2] < dataTypeServices: DATE ),(O_ORDERDATE[0:2] >= 1993-07-01)])
```
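
To line the two plans up step by step, you can pull the `n` and `totalCost` values out of the plan text. The Python sketch below uses the plan lines from this section, shortened to the fields it needs; `plan_costs` is a hypothetical helper, and the regular expression assumes the plan format shown here.

```python
import re

# Matches "->  Operation(n=..,totalCost=.." with an optional "[TABLE(id)]" part.
LINE = re.compile(r"->\s+(\w+)(?:\[[^\]]*\])?\(n=(\d+),totalCost=([\d.]+)")


def plan_costs(plan_text):
    """Map each step number n to (operation, totalCost) for an explain plan."""
    costs = {}
    for op, n, cost in LINE.findall(plan_text):
        costs[int(n)] = (op, float(cost))
    return costs


before_stats = """
  ->  ScrollInsensitive(n=10,totalCost=309826.015,outputRows=1388618)
    ->  OrderBy(n=11,totalCost=281954.786,outputRows=1388618)
      ->  ProjectRestrict(n=8,totalCost=127041.779,outputRows=1388618)
        ->  GroupBy(n=7,totalCost=127041.779,outputRows=1388618)
          ->  ProjectRestrict(n=6,totalCost=15197.56,outputRows=1388618)
            ->  MergeJoin(n=4,totalCost=15197.56,outputRows=1388618)
              ->  ProjectRestrict(n=3,totalCost=8432.488,outputRows=1388618)
                ->  TableScan[LINEITEM(10448)](n=2,totalCost=8419.864,scannedRows=4207932)
              ->  ProjectRestrict(n=1,totalCost=2627.27,outputRows=410628)
                ->  TableScan[ORDERS(10464)](n=0,totalCost=2627.27,scannedRows=1311635)
"""

after_stats = """
  ->  ScrollInsensitive(n=10,totalCost=176167.046,outputRows=5)
    ->  OrderBy(n=11,totalCost=176166.944,outputRows=5)
      ->  ProjectRestrict(n=8,totalCost=88083.421,outputRows=5)
        ->  GroupBy(n=7,totalCost=88083.421,outputRows=5)
          ->  ProjectRestrict(n=6,totalCost=20740.612,outputRows=1980401)
            ->  MergeJoin(n=4,totalCost=20740.612,outputRows=1980401)
              ->  ProjectRestrict(n=3,totalCost=11385.304,outputRows=1980401)
                ->  TableScan[LINEITEM(10448)](n=2,totalCost=11286.284,scannedRows=6001215)
              ->  ProjectRestrict(n=1,totalCost=3004,outputRows=435343)
                ->  TableScan[ORDERS(10464)](n=0,totalCost=3004,scannedRows=1500000)
"""

before = plan_costs(before_stats)
after = plan_costs(after_stats)

for n in sorted(before, reverse=True):
    op, cost_before = before[n]
    cost_after = after[n][1]
    print(f"n={n:<2} {op:<17} before={cost_before:>11.3f} after={cost_after:>11.3f}")
```

With statistics in place, the scan estimates at the bottom of the plan rise to reflect the real table sizes, while the estimates for the grouping and ordering steps at the top fall.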


### Optimize by Adding Indexes

Splice Machine tables have primary keys, either implicitly or explicitly defined. Data is stored in order of these keys.

<div class="noteNote">The primary key is not optimal for all queries.</div>

Unlike HBase and other key-value stores, Splice Machine can use *secondary indexes* to improve the performance of data manipulation statements. In addition, `UNIQUE` indexes provide a form of data integrity checking.

When a table is dropped, its indexes are dropped as well, so we don't have to drop the indexes before creating them.
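
To see why a secondary index helps a date-range filter, consider this small Python sketch. It is a conceptual illustration with made-up rows, not a description of how Splice Machine stores indexes: the base rows are ordered by primary key, while the index keeps the indexed columns sorted so that a range can be located by binary search instead of a full scan. The `range_scan` helper is hypothetical.

```python
from bisect import bisect_left
from datetime import date

# Made-up rows, stored in primary-key order: (orderkey, orderdate, priority)
orders = [
    (1, date(1993, 9, 12), "3-MEDIUM"),
    (2, date(1992, 1, 5), "1-URGENT"),
    (3, date(1993, 7, 1), "2-HIGH"),
    (4, date(1994, 3, 9), "5-LOW"),
    (5, date(1993, 8, 20), "1-URGENT"),
    (6, date(1993, 10, 1), "4-NOT SPECIFIED"),
]

# A secondary index: (orderdate, priority, orderkey) entries kept in sorted order.
# Because each entry carries every column the query needs, the base rows are never read.
date_index = sorted((odate, priority, okey) for okey, odate, priority in orders)


def range_scan(index, start, end):
    """Return index entries with start <= orderdate < end using binary search."""
    lo = bisect_left(index, (start,))
    hi = bisect_left(index, (end,))
    return index[lo:hi]


hits = range_scan(date_index, date(1993, 7, 1), date(1993, 10, 1))
for odate, priority, okey in hits:
    print(odate, priority, okey)
```

Without the index, answering the same filter means checking the date on every row, because rows ordered by key are not ordered by date.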


In [None]:
%%sql 
create index O_CUST_IDX on ORDERS(
 O_CUSTKEY,
 O_ORDERKEY
 );
 
 create index O_DATE_PRI_KEY_IDX on ORDERS(
 O_ORDERDATE,
 O_ORDERPRIORITY,
 O_ORDERKEY
 );
 
 create index L_SHIPDATE_IDX on LINEITEM(
 L_SHIPDATE,
 L_PARTKEY,
 L_EXTENDEDPRICE,
 L_DISCOUNT
 );
 
 create index L_PART_IDX on LINEITEM(
 L_PARTKEY,
 L_ORDERKEY,
 L_SUPPKEY,
 L_SHIPDATE,
 L_EXTENDEDPRICE,
 L_DISCOUNT,
 L_QUANTITY,
 L_SHIPMODE,
 L_SHIPINSTRUCT
 );

### Re-analyze and Re-run the Explain Plan After Indexing

Now that we've added indexes to our database, let's re-analyze the database and then re-run the `explain` plan for Query 4 one more time to see how indexing has affected our execution plan.


In [None]:
%%sql 
analyze table LINEITEM;
analyze table ORDERS;
analyze table CUSTOMER;
analyze table PARTSUPP;
analyze table SUPPLIER;
analyze table PART;
analyze table REGION;
analyze table NATION;


In [None]:
%%sql 
-- QUERY 04
explain select
	o_orderpriority,
	count(*) as order_count
from
	orders
where
	o_orderdate >= date('1993-07-01')
	and o_orderdate < add_months('1993-07-01',3)
	and exists (
		select
			*
		from
			lineitem
		where
			l_orderkey = o_orderkey
			and l_commitdate < l_receiptdate
	)
group by
	o_orderpriority
order by
	o_orderpriority
-- END OF QUERY

### Compare Execution Plans

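We can now compare how the query will execute with indexing in place versus without indexes. The clearest difference is at the bottom of the plan: the optimizer now reads `ORDERS` through the `O_DATE_PRI_KEY_IDX` index, so the scan's `totalCost` drops from 3004 to 662.35 and its `scannedRows` from 1500000 to 495000. In the plans captured here, the join strategy also changes from `MergeJoin` to `MergeSortJoin`, and the `totalCost` values for the join and the operations above it are higher than before indexing.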

#### Query Plan After Indexing
```
Plan
Cursor(n=10,rows=5,updateMode=,engine=OLAP (cost))
  ->  ScrollInsensitive(n=10,totalCost=390724.547,outputRows=5,outputHeapSize=249 B,partitions=1)
    ->  OrderBy(n=11,totalCost=390724.445,outputRows=5,outputHeapSize=249 B,partitions=1)
      ->  ProjectRestrict(n=8,totalCost=195362.171,outputRows=5,outputHeapSize=249 B,partitions=1)
        ->  GroupBy(n=7,totalCost=195362.171,outputRows=5,outputHeapSize=249 B,partitions=1)
          ->  ProjectRestrict(n=6,totalCost=76721.518,outputRows=1980401,outputHeapSize=94.118 MB,partitions=1)
            ->  MergeSortJoin(n=4,totalCost=76721.518,outputRows=1980401,outputHeapSize=94.118 MB,partitions=1,preds=[(L_ORDERKEY[4:4] = O_ORDERKEY[4:1])])
              ->  ProjectRestrict(n=3,totalCost=11385.304,outputRows=1980401,outputHeapSize=94.118 MB,partitions=1,preds=[(L_COMMITDATE[2:2] < L_RECEIPTDATE[2:3])])
                ->  TableScan[LINEITEM(10448)](n=2,totalCost=11286.284,scannedRows=6001215,outputRows=6001215,outputHeapSize=94.118 MB,partitions=1)
              ->  ProjectRestrict(n=1,totalCost=662.35,outputRows=436198,outputHeapSize=13.866 MB,partitions=1)
                ->  IndexScan[O_DATE_PRI_KEY_IDX(10593)](n=0,totalCost=662.35,scannedRows=495000,outputRows=436198,outputHeapSize=13.866 MB,partitions=1,baseTable=ORDERS(10464),preds=[(O_ORDERDATE[0:1] < dataTypeServices: DATE ),(O_ORDERDATE[0:1] >= 1993-07-01)])
```

#### Query Plan Before Indexing
```
Plan
Cursor(n=10,rows=5,updateMode=,engine=OLAP (cost))
  ->  ScrollInsensitive(n=10,totalCost=176167.046,outputRows=5,outputHeapSize=249 B,partitions=1)
    ->  OrderBy(n=11,totalCost=176166.944,outputRows=5,outputHeapSize=249 B,partitions=1)
      ->  ProjectRestrict(n=8,totalCost=88083.421,outputRows=5,outputHeapSize=249 B,partitions=1)
        ->  GroupBy(n=7,totalCost=88083.421,outputRows=5,outputHeapSize=249 B,partitions=1)
          ->  ProjectRestrict(n=6,totalCost=20740.612,outputRows=1980401,outputHeapSize=94.118 MB,partitions=1)
            ->  MergeJoin(n=4,totalCost=20740.612,outputRows=1980401,outputHeapSize=94.118 MB,partitions=1,preds=[(L_ORDERKEY[4:4] = O_ORDERKEY[4:1])])
              ->  ProjectRestrict(n=3,totalCost=11385.304,outputRows=1980401,outputHeapSize=94.118 MB,partitions=1,preds=[(L_COMMITDATE[2:2] < L_RECEIPTDATE[2:3])])
                ->  TableScan[LINEITEM(10448)](n=2,totalCost=11286.284,scannedRows=6001215,outputRows=6001215,outputHeapSize=94.118 MB,partitions=1)
              ->  ProjectRestrict(n=1,totalCost=3004,outputRows=435343,outputHeapSize=13.839 MB,partitions=1)
                ->  TableScan[ORDERS(10464)](n=0,totalCost=3004,scannedRows=1500000,outputRows=435343,outputHeapSize=13.839 MB,partitions=1,preds=[(O_ORDERDATE[0:2] < dataTypeServices: DATE ),(O_ORDERDATE[0:2] >= 1993-07-01)])
```
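
A quick way to spot which steps changed between two plans is to compare the operation at each step number. The Python sketch below does this for the join and the scans beneath it, using plan lines from this section shortened to the fields it needs; `plan_steps` is a hypothetical helper, and the regular expression assumes the plan format shown here.

```python
import re

# Captures the operation, an optional "[NAME(id)]" target, and the step number n.
STEP = re.compile(r"->\s+(\w+)(?:\[(\w+)\(\d+\)\])?\(n=(\d+)")


def plan_steps(plan_text):
    """Map each step number n to 'Operation' or 'Operation[TARGET]'."""
    steps = {}
    for op, target, n in STEP.findall(plan_text):
        steps[int(n)] = f"{op}[{target}]" if target else op
    return steps


before_index = """
            ->  MergeJoin(n=4,totalCost=20740.612)
              ->  ProjectRestrict(n=3,totalCost=11385.304)
                ->  TableScan[LINEITEM(10448)](n=2,totalCost=11286.284)
              ->  ProjectRestrict(n=1,totalCost=3004)
                ->  TableScan[ORDERS(10464)](n=0,totalCost=3004)
"""

after_index = """
            ->  MergeSortJoin(n=4,totalCost=76721.518)
              ->  ProjectRestrict(n=3,totalCost=11385.304)
                ->  TableScan[LINEITEM(10448)](n=2,totalCost=11286.284)
              ->  ProjectRestrict(n=1,totalCost=662.35)
                ->  IndexScan[O_DATE_PRI_KEY_IDX(10593)](n=0,totalCost=662.35)
"""

before = plan_steps(before_index)
after = plan_steps(after_index)

# Keep only the steps whose operation or target differs between the plans.
changed = {n: (before[n], after[n]) for n in before if before[n] != after[n]}
for n, (old, new) in changed.items():
    print(f"n={n}: {old} -> {new}")
```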

### Running TPC-H Queries

Now we'll run TPC-H Query 04 and Query 02, so you can see the database in action.

In [None]:
%%sql
-- QUERY 04
select
	o_orderpriority,
	count(*) as order_count
from
	orders
where
	o_orderdate >= date('1993-07-01')
	and o_orderdate < add_months('1993-07-01',3)
	and exists (
		select
			*
		from
			lineitem
		where
			l_orderkey = o_orderkey
			and l_commitdate < l_receiptdate
	)
group by
	o_orderpriority
order by
	o_orderpriority

In [None]:
%%sql 
-- QUERY 02
select
	s_acctbal,
	s_name,
	n_name,
	p_partkey,
	p_mfgr,
	s_address,
	s_phone,
	s_comment
from
	part,
	supplier,
	partsupp,
	nation,
	region
where
	p_partkey = ps_partkey
	and s_suppkey = ps_suppkey
	and p_size = 15
	and p_type like '%BRASS'
	and s_nationkey = n_nationkey
	and n_regionkey = r_regionkey
	and r_name = 'EUROPE'
	and ps_supplycost = (
		select
			min(ps_supplycost)
		from
			partsupp,
			supplier,
			nation,
			region
		where
			p_partkey = ps_partkey
			and s_suppkey = ps_suppkey
			and s_nationkey = n_nationkey
			and n_regionkey = r_regionkey
			and r_name = 'EUROPE'
	)
order by
	s_acctbal desc,
	n_name,
	s_name,
	p_partkey
{limit 100}
-- END OF QUERY


# Transactional Workload

Transactional workloads usually involve a high concurrency of requests, where each request deals with a small number of rows. These types of requests make limited use of joins or aggregations. Application CRUD operations are common transactional requests, as are fast lookups using primary or secondary index paths.

Splice Machine uses its OLTP engine to resolve transactional requests. The Splice Machine OLTP engine is powered by HBase, which is well known for supporting high concurrency and high request volumes, and for scaling horizontally.

Splice Machine also uses MVCC (multi-version concurrency control), which enables snapshot isolation and high concurrency while at the same time providing full ACID compliance (atomicity, consistency, isolation, durability).
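
The idea behind MVCC and snapshot isolation can be shown with a toy in-memory store in Python. This is a conceptual sketch only, not Splice Machine's implementation: the `VersionedStore` class, its methods, and the key format are invented for this example. Each write adds a new version instead of overwriting, and each reader sees the newest version committed at or before the moment its snapshot was taken.

```python
class VersionedStore:
    """Toy multi-version store: every write keeps the older versions."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_timestamp, value)
        self._clock = 0

    def write(self, key, value):
        """Commit a new version of key and return its commit timestamp."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def snapshot(self):
        """Return the timestamp a new reader should use for all of its reads."""
        return self._clock

    def read(self, key, snapshot_ts):
        """Return the newest value committed at or before snapshot_ts."""
        visible = [v for ts, v in self._versions.get(key, []) if ts <= snapshot_ts]
        return visible[-1] if visible else None


store = VersionedStore()
store.write("order:-1", "P")   # initial order status
reader_ts = store.snapshot()   # a reader takes its snapshot here
store.write("order:-1", "D")   # a concurrent writer updates the status

print(store.read("order:-1", reader_ts))         # the reader still sees its snapshot
print(store.read("order:-1", store.snapshot()))  # a new reader sees the update
```

Because the writer never overwrites the version the first reader is using, readers and writers do not block each other.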


## CRUD Operations - Create Read Update Delete

Notice how all of these simple operations use the OLTP engine.

In [None]:
%%time
%%sql

-- create
EXPLAIN 
INSERT INTO ORDERS ( O_ORDERKEY, O_CUSTKEY, O_ORDERSTATUS, O_TOTALPRICE, O_ORDERDATE, O_ORDERPRIORITY, O_CLERK, O_SHIPPRIORITY, O_COMMENT)
   VALUES (-1, 1, 'P', 999.99, CURRENT_DATE, '1-URGENT', 'JOHN', 1, 'SHIP DIRECT');

INSERT INTO ORDERS ( O_ORDERKEY, O_CUSTKEY, O_ORDERSTATUS, O_TOTALPRICE, O_ORDERDATE, O_ORDERPRIORITY, O_CLERK, O_SHIPPRIORITY, O_COMMENT)
   VALUES (-1, 1, 'P', 999.99, CURRENT_DATE, '1-URGENT', 'JOHN', 1, 'SHIP DIRECT');

-- read
EXPLAIN
SELECT * FROM ORDERS WHERE O_ORDERKEY = -1;

SELECT * FROM ORDERS WHERE O_ORDERKEY = -1;

    
-- update
EXPLAIN
UPDATE ORDERS SET O_ORDERSTATUS='D' WHERE O_ORDERKEY = -1;

UPDATE ORDERS SET O_ORDERSTATUS='D' WHERE O_ORDERKEY = -1;

-- delete
EXPLAIN
DELETE FROM ORDERS WHERE O_ORDERKEY = -1;

DELETE FROM ORDERS WHERE O_ORDERKEY = -1;


## OLTP - Indexed Path with Join and Aggregation

Even more complex operations on large datasets can be made small enough to be processed by the OLTP engine by taking advantage of indexes.


In this example we use a highly selective WHERE clause on LINEITEM, which is joined to ORDERS, to produce a <em>Product Revenue History</em> report.

Splice Machine resolves this request using the OLTP engine and returns results in under a second.

In [None]:
%%sql

EXPLAIN
SELECT EXTRACT(YEAR FROM O_ORDERDATE) SALES_YEAR, SUM(O_TOTALPRICE) REVENUE
FROM LINEITEM, ORDERS
WHERE L_ORDERKEY=O_ORDERKEY
  AND L_PARTKEY=49981
GROUP BY 1
ORDER BY 1;

SELECT EXTRACT(YEAR FROM O_ORDERDATE) SALES_YEAR, SUM(O_TOTALPRICE) REVENUE
FROM LINEITEM, ORDERS
WHERE L_ORDERKEY=O_ORDERKEY
  AND L_PARTKEY=49981
GROUP BY 1
ORDER BY 1;



# A Glimpse at Splice Machine Benchmark Results

Here are some micro-results from Splice Machine running TPC-H benchmarks:

- 2ms single record lookups on primary keys at petabyte scale
- 20ms single record updates at petabyte scale
- 40-way OLTP indexed joins return in <100ms
- 150-way OLAP style joins execute in under 2 minutes
- 440-way join executes where others can’t parse
- Ingestion at 80MB/sec/node
- Can run TPC-C and TPC-H simultaneously


## Where to Go Next

The next notebook in this presentation introduces you to the <a href="./2.3%20Monitoring%20Queries.ipynb">Splice Machine Database Console</a>, which you can use to monitor and control your currently running queries.