In [None]:
import os
os.environ['JDBC_HOST'] = 'jrtest01-splice-hregion'

In [1]:
%%HTML
<link rel="stylesheet" href="https://doc.splicemachine.com/jupyter/css/custom.css">

<link rel="stylesheet" href="https://doc.splicemachine.com/zeppelin/css/zepstyles2.css" />

# Explaining and Hinting

In this notebook we'll dig into the explain and hint capabilities that we've briefly seen so far.  We'll see how they can help us in:

1. *Understanding the Query Execution Plan*
2. *Influencing the Query Execution Plan with Hints*


## 1. Understanding the Query Execution Plan

This section describes more fully what information is in the Explain plan for a query; the key pieces of information in a plan include the:

*  ordering of the joins and other steps in the query
*  use of Tables vs Indexes
*  need for IndexLookup, which can slow a query down
*  join Strategies employed
*  estimated row count and cost at each step
*  presence of predicate pushdowns where available
*  indication of which *engine* will run the query: *control* or *Spark*

We'll delve a bit deeper into pushing down predicates and join ordering/strategies to help you understand plans.

### Explain and Predicates

Let's start with a query variant that is based on the `index_example` table that we created earlier in this class. Run the next cell to display the plan for this query. 

You'll notice that two lines in the plan have *preds=* at the far right. *Preds* is short for *predicates*, which in databases are true/false conditions that are tested during query execution.


In [None]:
%%sql 

explain select a.i, a.j from
    dev1.index_example a
    ,dev1.index_example b --splice-properties joinStrategy=sortmerge
     where a.i = b.i
     and a.j = 700000

### About Predicates

Starting on the bottom line, we see an `IndexScan` with the preds specification on it; this is called a *Predicate Pushdown*. A pushdown means that when we perform this `IndexScan`, we bring the predicate (`A.J = 700000`) along with us and apply it during the scan, passing up to the next part of the plan ONLY the rows that match. Predicate pushdowns are extremely efficient when performed on keyed results (primary keys or indexes), because only the minimal number of rows is passed up to the next step.

The other kind of predicate shown here is of the form `[(A.I[5:1] = B.I[5:3])]`. You can ignore the numbers for now; the key part is `A.I = B.I`.  You can see that this is the join predicate, required for the actual join operation.

The main takeaway is that, as with most databases: when you can *push down* a predicate that filters a lot of data with a keyed filter, it helps create efficient scans for that step. If the filter is not keyed, this becomes a potential opportunity for adding an index.
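One way to see the effect of a keyed pushdown is to explain the same single-table filter twice, once as-is and once with the `index=null` hint (described later in this notebook) so that no index is used. This is only a sketch: it assumes the `dev1.index_example` table from earlier in this class has an index that covers column `j`, and it makes no claim about the exact plan text you will get. Run each statement in its own cell.

```sql
-- Plan 1: let the optimizer choose the access path.
-- Check whether the scan line carries a preds= entry for A.J = 700000.
explain select a.i, a.j from
    dev1.index_example a
    where a.j = 700000
;

-- Plan 2: the same query, but the hint asks the optimizer NOT to use an index
-- on this table, so the filter can no longer be applied through a keyed scan.
explain select a.i, a.j from
    dev1.index_example a --splice-properties index=null
    where a.j = 700000
;
```

Comparing the scan type, the estimated rows, and the cost on the bottom line of the two plans shows how much work the keyed predicate saves for that step.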

### Join Ordering

The actual join ordering is part of the optimization process: do I get a better cost when I start with table A and join B with it, or the other way around?

Smart join ordering depends a lot on the situation.  Generally speaking, the sooner you can filter out rows (thus working with fewer rows at each step of the query), the faster the query will run.

When you look at an explain plan, if you are unsure of the ordering, remember again that the order is *bottom up*. Another way to view this is to look at the `n=` value on each row of the plan (n=1, n=2, etc.); these values indicate the table ordering being used.
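To compare join orders yourself, you can pin the order with the `joinOrder=fixed` hint (covered in the next section) and explain the query once for each ordering. The sketch below reuses the query from the start of this notebook; it assumes the same `dev1.index_example` table, and the two statements are meant to be run in separate cells.

```sql
-- Order 1: with joinOrder=fixed, the tables are joined as written: a first, then b.
explain select a.i, a.j from --splice-properties joinOrder=fixed
    dev1.index_example a
    , dev1.index_example b
    where a.i = b.i
    and a.j = 700000
;

-- Order 2: the same query with the two tables swapped, so b comes first.
explain select a.i, a.j from --splice-properties joinOrder=fixed
    dev1.index_example b
    , dev1.index_example a
    where a.i = b.i
    and a.j = 700000
;
```

Reading each plan bottom up shows which table is scanned first, and the cost on the top line gives a rough comparison of the two orderings.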

## 2. Influencing the Query Execution Plan with Hints

If your query is still slower than you expect, or if you want to experiment with plan alternatives, you can use Splice Machine *query hints*, which provide additional information to our optimizer.

We introduced hints in an earlier notebook, *Tuning for Performance.* To recap: you add a hint to a query by appending a specially formatted *comment*. A hint must always be placed at the end of a line, and is used either after a table name or after a `FROM` clause, as shown below. The most commonly used hint types are:

<table class="splicezepOddEven">
    <col />
    <col />
    <thead>
        <tr>
            <th>Hint Type</th>
            <th>Description</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Join Order</td>
            <td>Indicates that the join order of the tables in the plan should be exactly the same as entered in the query SQL (first to last)</td>
        </tr>
        <tr>
            <td class="ItalicFont">Join Strategy</td>
            <td><p>Explicitly specifies the join strategy to use:</p>
                <ul>
                    <li><code>broadcast</code></li>
                    <li><code>sortmerge</code></li>
                    <li><code>merge</code></li>
                    <li><code>nestedloop</code></li>
                </ul>
            </td>
        </tr>
        <tr>
            <td>Index Selection</td>
            <td>Explicitly specifies the use of a specific index, or explicitly specifies to NOT use an index</td>
        </tr>
    </tbody>
</table>




### Syntax Matters
As you'll see in the following example, hint syntax can look odd, because each hint needs to be at the very end of a line. For example:

* If you're adding the hint to the end of a query, you must put the semicolon (`;`) that terminates that query on the next line:

  ```
  select count(*) from myTbl --splice-properties index=myIndex1
  ;
  ```

* Similarly, if you're specifying multiple hints, each will need to be at the end of its own line:

  ```
  explain select count(*) from
  (select a.i, a.j from --splice-properties joinOrder=fixed
   index_example b --splice-properties index=ij
   ,index_example a --splice-properties index=null, joinStrategy=nestedloop
     where a.j = 700000) z ;
  ```

<div class="noteIcon">
    <p>Hints must be specified exactly; any misspelling or extra text can cause the hint to be ignored, because it is then treated as an ordinary comment. For example, you <strong>must</strong> spell <code>joinOrder</code> and <code>joinStrategy</code> in exactly that way.</p>
    <p>Splice Machine <strong>strongly recommends</strong> that you run an <code>explain</code> on any query that contains a hint before actually executing the query, so you can verify that the hint is correctly specified.</p>
</div>

Here's the syntax to use for each hint type:

<table class="splicezepOddEven">
    <col />
    <col />
    <col />
    <thead>
        <tr>
            <th>Hint Type</th>
            <th>Syntax Example</th>
            <th>Usage Notes</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td><em>Join Order</em></td>
            <td><code>joinOrder=fixed</code></td>
            <td>On the <code>FROM</code> line in the query</td>
        </tr>
        <tr>
            <td><em>Join Strategy</em></td>
            <td><code>joinStrategy=broadcast</code></td>
            <td>After the right-hand-side table. This is typically used with <code>joinOrder=fixed</code> to control which tables are joined.</td>
       </tr>
        <tr>
            <td><em>Index Selection</em></td>
            <td><code>index=ix</code></td>
            <td>After the specified table</td>
        </tr>
        <tr>
            <td><em>No index</em></td>
            <td><code>index=null</code></td>
            <td>After the specified table</td>
        </tr>
    </tbody>
</table>

Run the next cell to see a full example:

In [None]:
%%sql 

explain select count(*) from
  (select a.i, a.j from --splice-properties joinOrder=fixed
    dev1.index_example b --splice-properties index=ij
    , dev1.index_example a --splice-properties index=null, joinStrategy=nestedloop
     where a.j = 700000) z ;


### Examples of When to Hint

If the optimizer doesn't give you the execution plan that you were expecting, you can supply hints to guide it. You can also use hints as an experimental tool to discover what happens when a different plan is chosen: you'll typically find that the cost `explain` reports for the hinted plan is higher than the cost of the plan the optimizer chose on its own.
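As a sketch of this kind of experiment, the statements below explain the join from the start of this notebook under two different join strategies, using only the hint syntax shown in the table above. The table name is the one used throughout this class; which strategy comes out cheaper depends on your data and statistics, so no expected costs are given here. Run each statement in its own cell.

```sql
-- Variant 1: request a broadcast join for the right-hand-side table b.
explain select a.i, a.j from --splice-properties joinOrder=fixed
    dev1.index_example a
    , dev1.index_example b --splice-properties joinStrategy=broadcast
    where a.i = b.i
    and a.j = 700000
;

-- Variant 2: the same query, requesting a nested loop join instead.
explain select a.i, a.j from --splice-properties joinOrder=fixed
    dev1.index_example a
    , dev1.index_example b --splice-properties joinStrategy=nestedloop
    where a.i = b.i
    and a.j = 700000
;
```

Running the same `explain` without any hints gives you the optimizer's own choice as a baseline for comparing the costs of the two variants.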

If you find that your plan (after hinting) is not running faster, please visit our <a href="https://splicemachine.slack.com/messages/splice-community/" target="_blank"><em>splice-community</em> Slack channel</a> and ask for help; if you've not already done so, you can register for this channel <a href="https://www.splicemachine.com/community/slack-channel-signup/" target="_blank">here</a>.


## Where to Go Next
To finish this class, please complete the exercises in the [*Exercises for This Class*](./h.%20Exercises.ipynb) notebook.