In [None]:
import os
os.environ['JDBC_HOST'] = 'jrtest01-splice-hregion'

# Query Optimization

This notebook shows you advanced query optimization techniques for boosting the performance of your Splice Machine queries. SQL optimizers convert the SQL statements that you write into semantically equivalent statements with improved performance. If a perfect SQL optimizer existed, you would not need to worry about the efficiency of your SQL statements.

In reality, even with a highly evolved optimizer, some SQL statements require some manual tuning or rewriting due to:

* Limitations in the optimizer's heuristic rewrite functionality
* Limitations in the search space the optimizer explores
* Inaccurate statistics and/or cost estimation
* Parsing time concerns

The optimizer generates and evaluates the execution plan for an SQL query. To better understand optimization and manual tuning, you need to be able to read and understand a query execution plan, and you need to know how to use statistics to understand the characteristics of the tables you're querying. This notebook shows you how to work with plans and statistics, and then addresses specific solutions for some common query performance issues, in the following sections:


1. *Understanding the Query Execution Plan*
2. *Understanding Database Statistics*
3. *Query Performance Issues*



## 1. Understanding the Query Execution Plan

This section describes more fully what information is in the Explain plan for a query; the key pieces of information in a plan include the:

*  Ordering of the joins and other steps in the query

*  Use of Tables vs Indexes

*  Need for IndexLookup, which can slow a query down

*  Join Strategies employed

*  Estimated row counts and costs at each step

*  Presence of predicate pushdowns where available

*  Indication of which *engine* will run the query: *control* or *Spark*

We'll also delve a bit deeper into pushing down predicates and join ordering/strategies to help you understand plans.

### Explain and Predicates

Let's start with a query variant that is based on the `customer_bulk_import_example1` and `customer_bulk_import_example2` tables that we created earlier in this class. Press *Shift + Enter* in the next paragraph to display the plan for this query.


In [None]:
%%sql 

explain select a.c_custkey, a.c_nationkey from
    dev3.customer_bulk_import_example1 a
    ,dev3.customer_bulk_import_example2 b
     where a.c_custkey = b.c_custkey
     and a.c_nationkey = 100


 

<br/>
You’ll notice that on the very right of the plan are two lines with *preds=* on them. *Preds* is short for *predicates*, which in databases are true/false conditions that are tested during query execution.

### About Predicates

Starting on the bottom line, we see a `TableScan` with the preds specification on it; this is called a *Predicate Pushdown*. A pushdown means: when we perform this `TableScan`, we'll bring this predicate (`A.C_NATIONKEY = 100`) along with us, and will perform the scan using this predicate, passing up to the next part of the plan ONLY the rows that match. Predicate pushdowns are extremely efficient when performed on keyed results (primary keys or indexes), because only the minimal number of rows are pushed up to the next step.

The other kind of predicate shown here is of the form `[(A.C_CUSTKEY[4:1] = B.C_CUSTKEY[4:3])]`. You can ignore the numbers for now; the key part is `A.C_CUSTKEY = B.C_CUSTKEY`. You can see that this is the join predicate, required for the actual join operation.

The main takeaway is that, as with most databases: when you can *push down* a predicate that filters a lot of data with a keyed filter, it helps create efficient scans for that step. If the filter is not keyed, this becomes a potential opportunity for adding an index.
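
To make the idea of a pushdown concrete, here is a minimal Python sketch that contrasts filtering inside the scan with filtering in a later step. It is a toy model only, not Splice Machine code: the `scan` helper, the row layout, and the sample values are all assumptions made up for illustration.

```python
# Toy model of predicate pushdown; this is not how Splice Machine is implemented.
rows = [
    {"c_custkey": 1, "c_nationkey": 100},
    {"c_custkey": 2, "c_nationkey": 7},
    {"c_custkey": 3, "c_nationkey": 100},
    {"c_custkey": 4, "c_nationkey": 12},
]

def scan(table, pred=None):
    """Yield rows from the table, applying pred (if given) inside the scan."""
    for row in table:
        if pred is None or pred(row):
            yield row

# Without pushdown: the scan passes every row up, and a later step filters.
passed_up_no_pushdown = list(scan(rows))
filtered_later = [r for r in passed_up_no_pushdown if r["c_nationkey"] == 100]

# With pushdown: the scan itself applies the predicate.
passed_up_pushdown = list(scan(rows, pred=lambda r: r["c_nationkey"] == 100))

print(len(passed_up_no_pushdown), "rows passed up without pushdown")
print(len(passed_up_pushdown), "rows passed up with pushdown")
```

Both approaches produce the same final rows; the difference is how many rows the scan hands to the next step.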

### Join Ordering

The actual join ordering is part of the optimization process: do I get a better cost when I start with the `customer_bulk_import_example1` table and join table `customer_bulk_import_example2` with it, or the other way around?

Smart join ordering depends a lot on the situation. Generally speaking, the sooner you can filter out rows (thus working with fewer rows at each step of the query), the faster the query will run.

Remember that explain plans are ordered from the *bottom up*, which means that the first step in the plan is at the bottom. Another way to view this is to look at the step number on each row of the plan (n=1, n=2, etc.), which specifies the order in which the tables are accessed.

### Join Strategies

These are the available join strategies in Splice Machine:

<table class="splicezep">
    <thead>
        <tr>
            <th>Join Strategy</th>
            <th>Description</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td><code>BROADCAST</code></td>
            <td><p>Read the results of the right result set (RHS) into memory, then for each row in the left result set (LHS), perform a local lookup to determine the right side of the join.</p>
                <p><em>BROADCAST</em> will only work if at least one of the following is true:</p>
                <ul>
                    <li>There is at least one equijoin (=) predicate that does not include a function call.</li>
                    <li>There is at least one inequality join predicate, the RHS is a base table, and the join is evaluated in Spark.</li>
                </ul>
            </td>
        </tr>
        <tr>
            <td><code>SORTMERGE</code></td>
            <td><p>Re-sort both the left and right sides according to the join keys, then perform a <em>MERGE</em> join on the results.</p>
                <p><em>SORTMERGE</em> requires an equijoin predicate with no function calls.</p>
            </td>
        </tr>
        <tr>
            <td><code>MERGE</code></td>
            <td><p>Read the Right and Left result sets simultaneously in order and join them together as they are read.</p>
                <p><em>MERGE</em> joins require that both the left and right result sets be sorted according to the join keys. <em>MERGE</em> requires an equijoin predicate that does not include a function call.</p>
            </td>
        </tr>
        <tr>
            <td><code>NESTEDLOOP</code></td>
            <td><p>For each row on the left, fetch the values on the right that match the join.</p>
                <p><em>NESTEDLOOP</em> is the only join that can work with any join predicate of any type; however this type of join is generally very slow.</p>
            </td>
        </tr>
    </tbody>
</table>
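
The descriptions in the table can be sketched in a few lines of Python. These are toy, single-machine versions written for illustration on lists of `(key, value)` tuples; the function names are hypothetical and the code makes no claim about how Splice Machine implements each strategy.

```python
# Toy versions of the join strategies on lists of (key, value) tuples.
# They illustrate the ideas only; they are not Splice Machine's implementations.

def nested_loop_join(left, right):
    """For each left row, look through the right side for matches."""
    return [(lk, lv, rv) for lk, lv in left for rk, rv in right if lk == rk]

def broadcast_join(left, right):
    """Load the right side into an in-memory lookup, then probe it per left row."""
    lookup = {}
    for rk, rv in right:
        lookup.setdefault(rk, []).append(rv)
    return [(lk, lv, rv) for lk, lv in left for rv in lookup.get(lk, [])]

def merge_join(left, right):
    """Join two inputs that are already sorted by key, reading both in order."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Emit every right row that shares this key for the current left row.
            k = j
            while k < len(right) and right[k][0] == lk:
                out.append((lk, left[i][1], right[k][1]))
                k += 1
            i += 1
    return out

def sort_merge_join(left, right):
    """Sort both sides by the join key first, then run the merge join."""
    by_key = lambda row: row[0]
    return merge_join(sorted(left, key=by_key), sorted(right, key=by_key))

left = [(3, "c"), (1, "a"), (2, "b")]
right = [(2, "x"), (3, "y"), (2, "z"), (4, "w")]
nested = nested_loop_join(left, right)
broadcast = broadcast_join(left, right)
sort_merge = sort_merge_join(left, right)
print(sorted(nested))
```

All three functions return the same set of joined rows for an equijoin; they differ in how much work they do and in what they require of their inputs (for example, `merge_join` assumes sorted inputs).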

In our example above we see that the plan uses a `MergeJoin` to join the `CUSTOMER_BULK_IMPORT_EXAMPLE1` table with the `CUSTOMER_BULK_IMPORT_EXAMPLE2` table.

```
->  MergeJoin(n=3,totalCost=77202.976,outputRows=651515625,outputHeapSize=1.324 GB,partitions=145,preds=[(A.C_CUSTKEY[4:1] = B.C_CUSTKEY[4:3])])
    ->  TableScan[CUSTOMER_BULK_IMPORT_EXAMPLE2(1664)](n=2,totalCost=236254,scannedRows=118125000,outputRows=118125000,outputHeapSize=1.324 GB,partitions=145)
    ->  TableScan[CUSTOMER_BULK_IMPORT_EXAMPLE1(1648)](n=1,totalCost=1447816.5,scannedRows=723906250,outputRows=651515625,outputHeapSize=1.214 GB,partitions=145,preds=[(A.C_NATIONKEY[0:2] = 100)])
```

Reading this from the bottom up we see:

1. `CUSTOMER_BULK_IMPORT_EXAMPLE1` is scanned and becomes the left hand side of the join

2. `CUSTOMER_BULK_IMPORT_EXAMPLE2` is scanned and becomes the right hand side of the join

3. The `MERGE` join strategy is used
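
The bottom-up reading can also be done mechanically. The following Python sketch parses an abbreviated copy of the three plan lines above and lists the steps in order of their `n` value. The regular expression is an assumption based only on the plan text shown in this notebook, so it may need adjusting for other operators.

```python
import re

# The three plan lines shown above, abbreviated to the fields used here.
plan_text = """\
->  MergeJoin(n=3,totalCost=77202.976,outputRows=651515625)
    ->  TableScan[CUSTOMER_BULK_IMPORT_EXAMPLE2(1664)](n=2,totalCost=236254,outputRows=118125000)
    ->  TableScan[CUSTOMER_BULK_IMPORT_EXAMPLE1(1648)](n=1,totalCost=1447816.5,outputRows=651515625)
"""

# Operator name, an optional [TABLE(number)] part, then the step number n.
step_re = re.compile(r"->\s+(\w+)(?:\[(\w+)\(\d+\)\])?\(n=(\d+)")

steps = []
for line in plan_text.splitlines():
    match = step_re.search(line)
    if match:
        operator, table, n = match.group(1), match.group(2), int(match.group(3))
        steps.append((n, operator, table))

# Sorting by n gives the bottom-up execution order.
execution_order = sorted(steps, key=lambda step: step[0])
for n, operator, table in execution_order:
    print(n, operator, table or "")
```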

## 2. Understanding Database Statistics

Database statistics are a form of metadata (data about data) that assists the Splice Machine query optimizer; the statistics help the optimizer select the most efficient approach to running a query, based on information that has been gathered about the tables involved in the query.

In this section we show you how to:

* Collect Statistics
* View Statistics


### Collecting Statistics

You can collect statistics on a schema or table using the `analyze` command. 

Here is the syntax for collecting statistics for a schema:

&nbsp;&nbsp;&nbsp;<code>ANALYZE SCHEMA <em>schemaName</em></code>

Here is the syntax for collecting statistics for a table:

&nbsp;&nbsp;&nbsp;<code>ANALYZE TABLE <em>schemaName.tableName</em></code>

Let's try collecting statistics on our `DEV3` schema:

In [None]:
%%sql 

analyze schema DEV3;

Now go back and rerun the explain for our query example. You should notice that the plan has changed.

This was the explain plan before we collected stats on the tables in the schema:

<pre>
Cursor(n=6,rows=651515625,updateMode=READ_ONLY (1),engine=Spark)
  ->  ScrollInsensitive(n=5,totalCost=13126628.907,outputRows=651515625,outputHeapSize=1.324 GB,partitions=145)
    ->  ProjectRestrict(n=4,totalCost=236254,outputRows=118125000,outputHeapSize=1.324 GB,partitions=145)
      ->  MergeJoin(n=3,totalCost=77202.976,outputRows=651515625,outputHeapSize=1.324 GB,partitions=145,preds=[(A.C_CUSTKEY[4:1] = B.C_CUSTKEY[4:3])])
        ->  TableScan[CUSTOMER_BULK_IMPORT_EXAMPLE2(1664)](n=2,totalCost=236254,scannedRows=118125000,outputRows=118125000,outputHeapSize=1.324 GB,partitions=145)
        ->  TableScan[CUSTOMER_BULK_IMPORT_EXAMPLE1(1648)](n=1,totalCost=1447816.5,scannedRows=723906250,outputRows=651515625,outputHeapSize=1.214 GB,partitions=145,preds=[(A.C_NATIONKEY[0:2] = 100)])
</pre>

This is the new explain plan after we collected the statistics:

<pre>
Cursor(n=6,rows=1,updateMode=READ_ONLY (1),engine=Spark)
  ->  ScrollInsensitive(n=5,totalCost=268.81,outputRows=1,outputHeapSize=19 B,partitions=145)
    ->  ProjectRestrict(n=4,totalCost=4.003,outputRows=1,outputHeapSize=19 B,partitions=145)
      ->  NestedLoopJoin(n=3,totalCost=260.8,outputRows=1,outputHeapSize=19 B,partitions=145)
        ->  TableScan[CUSTOMER_BULK_IMPORT_EXAMPLE2(1664)](n=2,totalCost=4.003,scannedRows=1,outputRows=1,outputHeapSize=19 B,partitions=145,preds=[(A.C_CUSTKEY[1:1] = B.C_CUSTKEY[2:1])])
        ->  TableScan[CUSTOMER_BULK_IMPORT_EXAMPLE1(1648)](n=1,totalCost=37804,scannedRows=15000000,outputRows=1,outputHeapSize=0 B,partitions=145,preds=[(A.C_NATIONKEY[0:2] = 100)])
</pre>

With statistics collected, the cost values are more accurate, which allows the optimizer to choose a better plan. The new plan chooses the `NestedLoopJoin` join strategy because the optimizer now knows that the join predicate can be applied while scanning the right-hand-side table, `CUSTOMER_BULK_IMPORT_EXAMPLE2`, which reduces `scannedRows` for that scan to just one.

This is a simple example on a small dataset, but it shows how database statistics help the optimizer choose a better plan for executing a query. To ensure the best performance, it is critical to collect statistics on your database tables in Splice Machine.


### Viewing Statistics

Splice Machine provides two system tables that you can query to view the statistics that have been collected for your database:

* `SYS.SYSTABLESTATISTICS`
* `SYS.SYSCOLUMNSTATISTICS`

Let's now view the contents of each of these system tables:

In [None]:
%%sql 

SELECT * FROM SYS.SYSTABLESTATISTICS;
SELECT * FROM SYS.SYSCOLUMNSTATISTICS;

Next, we'll query  `SYS.SYSTABLESTATISTICS` to understand the characteristics of the `DEV3.CUSTOMER_BULK_IMPORT_EXAMPLE1` table:


In [None]:
%%sql 

select total_row_count, total_size, stats_type, sample_fraction from sys.systablestatistics where schemaname='DEV3' and tablename='CUSTOMER_BULK_IMPORT_EXAMPLE1';

We see that the `DEV3.CUSTOMER_BULK_IMPORT_EXAMPLE1` table has:

* 15000000 rows
* a total size of 2294789186 bytes
* a `statsType` value of 2
* a `sampleFraction` value of 0

Refer to the following tables for an explanation of the `statsType` and `sampleFraction` values.

#### Statistics Type Values

The following table describes the `statsType` values:

<table class="splicezep">
    <thead>
        <tr>
            <th>Statistic Type Value</th>
            <th>Description</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td class="ItalicFont">0</td>
            <td>Full table (not sampled) statistics that reflect the unmerged partition values.</td>
        </tr>
        <tr>
            <td class="ItalicFont">1</td>
            <td>Sampled statistics that reflect the unmerged partition values.</td>
        </tr>
        <tr>
            <td class="ItalicFont">2</td>
            <td>Full table (not sampled) statistics that reflect the table values after all partitions have been merged.</td>
        </tr>
        <tr>
            <td class="ItalicFont">3</td>
            <td>Sampled statistics that reflect the table values after all partitions have been merged.</td>
        </tr>
    </tbody>
</table>

#### Sample Fraction Values

The sampling fraction, `sampleFraction`, is specified as a value in the range 0.0 to 1.0:

* If `statsType=0` (full statistics), this value is not used, and is shown as `0`.
* If `statsType=1`, this value is the fraction of rows to be sampled. A value of `0` means no rows, and a value of `1` means all rows (full statistics).
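
As a reading aid, the Statistics Type Values table above can be turned into a small lookup. The sketch below is plain Python and does not query the database; the `describe_stats` helper is a hypothetical name, and the `0.25` sample fraction in the second call is a made-up value.

```python
# Descriptions taken from the Statistics Type Values table above.
STATS_TYPES = {
    0: ("full", "unmerged partition values"),
    1: ("sampled", "unmerged partition values"),
    2: ("full", "table values after all partitions have been merged"),
    3: ("sampled", "table values after all partitions have been merged"),
}

def describe_stats(stats_type, sample_fraction):
    """Return a one-line description of a statsType / sampleFraction pair."""
    kind, scope = STATS_TYPES[stats_type]
    if kind == "sampled":
        return f"sampled statistics ({sample_fraction:.0%} of rows), {scope}"
    return f"full statistics, {scope}"

# The values reported above for CUSTOMER_BULK_IMPORT_EXAMPLE1.
print(describe_stats(2, 0.0))
# A hypothetical sampled case.
print(describe_stats(1, 0.25))
```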


## 3. Query Performance Issues

There can be several reasons why a query doesn't perform at the level that you expect. In this section, we take a look at some of the more common problems that can lead to poor query performance and how you can resolve them. 

<p class="noteIcon">The most important thing to remember when looking at query performance is to make sure statistics have been collected on the tables you're querying.</p>

We'll take a look at these common issues:

* Data Skew
* Access Path
* Nested Loop Joins


### Data Skew

Data skew, in the simplest terms, refers to a non-uniform distribution of data in a dataset. For example, let's say you have a column in a table whose values range from 1 to 10. The data in this column would be considered skewed if there is a disproportionately large number of rows for a particular value. So, if the table contains 10 million rows and 9 million of those rows have the value of `5`, then the data would be considered skewed. This is particularly problematic when the column is used in a join condition.

When data is skewed, a few tasks have to do significantly more work than other tasks, which reduces parallelism and can lead to out-of-memory errors. Skewness may exist in the base table on certain columns, and it can also occur after certain joins. When data is skewed, problems usually arise during the `MergeSortJoin` step or during grouped aggregates.
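
The effect on parallelism can be seen with a scaled-down version of the example above. In this Python sketch, 990 rows stand in for the 10 million, and rows are assigned to tasks with a simple modulo on the column value. The modulo assignment and the task count of 4 are assumptions chosen for illustration; they are not a description of how Spark or Splice Machine partitions data.

```python
from collections import Counter

# Values 1 to 10, where the value 5 dominates: 900 rows of 5, 10 rows of each other value.
values = [5] * 900 + [v for v in range(1, 11) if v != 5] * 10

counts = Counter(values)
top_value, top_count = counts.most_common(1)[0]
top_share = top_count / len(values)
print(f"value {top_value} holds {top_share:.1%} of {len(values)} rows")

# Assign each row to a task by its column value (assumed scheme for this sketch).
NUM_TASKS = 4
rows_per_task = Counter(v % NUM_TASKS for v in values)
for task in range(NUM_TASKS):
    print(f"task {task}: {rows_per_task[task]} rows")
```

Because every row with the value `5` lands in the same task, that task processes far more rows than the others, which is the imbalance described above.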

#### Detecting Skew

If your query is executed in Spark, you can use the *Database Console* (Spark UI) to determine whether your query may be affected by data skew. You can find your query in the Database Console and look at the Summary Metrics for the stage:

<img src="https://splice-training.s3.amazonaws.com/external/images/skew1.png" class="splice">

Here we see that the `Shuffle Read Size` values for the `Min`, `25th Percentile`, `Median`, and `75th Percentile` are roughly the same. However, for the `Max`, the amount of data being read is significantly larger. This indicates that this stage in the query execution is suffering from data skew.

Another way we can detect skew is to look at the individual tasks for a stage in the Database Console:

<img src="https://splice-training.s3.amazonaws.com/external/images/skew2.png" class="splice">

Here we see that the first task listed has a `Shuffle Read Size / Records` value that is significantly larger than the other tasks. This is also an indication that this query is under-performing due to skew issues in the data.
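
The same comparison of the maximum against the median can be expressed as a quick check. The Python sketch below uses hypothetical per-task shuffle read sizes (they are not taken from the screenshots above), and the threshold of 5 is an arbitrary assumption rather than a recommended value.

```python
import statistics

# Hypothetical per-task shuffle read sizes in MB; not taken from the screenshots.
shuffle_read_mb = [48, 50, 51, 49, 52, 50, 47, 51, 1900]

median = statistics.median(shuffle_read_mb)
largest = max(shuffle_read_mb)
ratio = largest / median

# Assumed rule of thumb for this sketch: flag the stage when max is many times the median.
SKEW_RATIO_THRESHOLD = 5
is_skewed = ratio > SKEW_RATIO_THRESHOLD

print(f"median={median} MB, max={largest} MB, max/median={ratio:.1f}, skewed={is_skewed}")
```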

You can also use SQL to determine if there is skew in your data. We've created some skewed data for you in the following example; run the next paragraph to import and analyze skewed data.


In [None]:
%%sql 

CREATE TABLE DEV3.LINEITEM_WITH_SKEW (
 L_ORDERKEY BIGINT NOT NULL,
 L_PARTKEY INTEGER NOT NULL,
 L_SUPPKEY INTEGER NOT NULL, 
 L_LINENUMBER INTEGER NOT NULL, 
 L_QUANTITY DECIMAL(15,2),
 L_EXTENDEDPRICE DECIMAL(15,2),
 L_DISCOUNT DECIMAL(15,2),
 L_TAX DECIMAL(15,2),
 L_RETURNFLAG VARCHAR(1), 
 L_LINESTATUS VARCHAR(1),
 L_SHIPDATE DATE,
 L_COMMITDATE DATE,
 L_RECEIPTDATE DATE,
 L_SHIPINSTRUCT VARCHAR(25),
 L_SHIPMODE VARCHAR(10),
 L_COMMENT VARCHAR(44),
 PRIMARY KEY(L_ORDERKEY,L_LINENUMBER)
 );

call SYSCS_UTIL.IMPORT_DATA ('DEV3', 'LINEITEM_WITH_SKEW', null, 's3a://splice-training/external/data/lineitem-with-skew.csv.gz', null, null, null, null, null, 0, '/tmp', true, null);

ANALYZE TABLE DEV3.LINEITEM_WITH_SKEW;

Now use SQL to detect the skewness of the data.

In [None]:
%%sql 

SELECT COUNT(*) AS NUM_RECORDS, MIN(CC) AS SMALLEST_VALUE, MAX(CC) AS LARGEST_VALUE, AVG(CC) AS AVERAGE_VALUE FROM
(SELECT L_ORDERKEY, COUNT(*) AS CC
 FROM DEV3.LINEITEM_WITH_SKEW
 GROUP BY 1) DT;


#### Handling Skew

The query we just ran checked for skewness on the `L_ORDERKEY` column. This query groups the rows by the `L_ORDERKEY` and counts the number of records for each `L_ORDERKEY` value. 

We can see that there's skew in this data in two ways:

* The difference between the `SMALLEST_VALUE` and `LARGEST_VALUE` is very large.
* The difference between the `AVERAGE_VALUE` and the `LARGEST_VALUE` is also very large: although there's an average of 10 records per order key value, there is one order key that has 10,485,766 rows.
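
If you want to experiment with this detection query without loading the full dataset, the following sketch runs the same query shape against a tiny synthetic table. It uses Python's built-in `sqlite3` module as a stand-in engine, not Splice Machine, and the table contents are made up: order key `1` has 1,000 rows and order keys `2` through `10` have 3 rows each.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lineitem_with_skew (l_orderkey INTEGER, l_linenumber INTEGER)"
)

# Synthetic data: order key 1 is heavily over-represented.
rows = [(1, n) for n in range(1000)]
rows += [(key, n) for key in range(2, 11) for n in range(3)]
conn.executemany("INSERT INTO lineitem_with_skew VALUES (?, ?)", rows)

# Same shape as the detection query above: count rows per key, then summarize.
num_records, smallest, largest, average = conn.execute("""
    SELECT COUNT(*), MIN(cc), MAX(cc), AVG(cc)
    FROM (SELECT l_orderkey, COUNT(*) AS cc
          FROM lineitem_with_skew
          GROUP BY 1) dt
""").fetchone()

print(num_records, smallest, largest, average)
```

As with the real table, the gap between the largest count and both the smallest and the average count is what signals the skew.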

We can't change the data to eliminate skew, but there are a few things that we can try that will help alleviate and in some cases avoid skewness of data, as described in the remainder of this section.

##### Using Broadcast Join for Skewed Data

In most cases, the shuffling of data during a `mergesort` join is problematic when there is skewness of data in one of the join columns. You can see this in the list of Spark tasks,  where one task is seen reading the majority of the data and taking much longer to complete than all other tasks for the stage. 

If the right hand side of the join is small enough, you can try hinting the join to use the `BROADCAST` join strategy. For example:

```
SELECT * FROM DEV3.ORDERS O
JOIN DEV3.LINEITEM_WITH_SKEW L --SPLICE-PROPERTIES joinStrategy=BROADCAST
ON O.O_ORDERKEY = L.L_ORDERKEY;
```

Note that we don't actually recommend the query above, because we know the right hand side table `LINEITEM_WITH_SKEW` is a large table. The example is purely for demonstrating how to apply a hint to use the `BROADCAST` join strategy.

##### Splitting the Skewed Table and Using Union All

Another method for handling skew is to:

1. Split the query into two parts, with one part extracting the skewed value, and the second part handling the remaining values.
2. Then, use a `UNION ALL` to merge the result sets. 

Here is an example of a rewrite, in which we know that our skewed data is on the order key value of `1`:

```
SELECT * FROM DEV3.ORDERS O
JOIN DEV3.LINEITEM_WITH_SKEW L
ON O.O_ORDERKEY = L.L_ORDERKEY
WHERE O.O_ORDERKEY = 1
UNION ALL
SELECT * FROM DEV3.ORDERS O
JOIN DEV3.LINEITEM_WITH_SKEW L
ON O.O_ORDERKEY = L.L_ORDERKEY
WHERE O.O_ORDERKEY <> 1
```
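
A rewrite like this should return the same rows as the original single join. The sketch below checks that on a small synthetic dataset using Python's built-in `sqlite3` module as a stand-in engine (not Splice Machine). The simplified `orders` and `lineitem` tables and their contents are assumptions made up for the check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (o_orderkey INTEGER PRIMARY KEY);
    CREATE TABLE lineitem (l_orderkey INTEGER, l_linenumber INTEGER);
""")
conn.executemany("INSERT INTO orders VALUES (?)", [(k,) for k in range(1, 6)])

# Order key 1 is the skewed value: 50 rows, versus 2 rows for each other key.
lineitems = [(1, n) for n in range(50)]
lineitems += [(k, n) for k in range(2, 6) for n in range(2)]
conn.executemany("INSERT INTO lineitem VALUES (?, ?)", lineitems)

single_join = conn.execute("""
    SELECT * FROM orders o JOIN lineitem l ON o.o_orderkey = l.l_orderkey
""").fetchall()

split_join = conn.execute("""
    SELECT * FROM orders o JOIN lineitem l ON o.o_orderkey = l.l_orderkey
    WHERE o.o_orderkey = 1
    UNION ALL
    SELECT * FROM orders o JOIN lineitem l ON o.o_orderkey = l.l_orderkey
    WHERE o.o_orderkey <> 1
""").fetchall()

print(len(single_join), len(split_join))
print(sorted(single_join) == sorted(split_join))
```

The check only confirms that the two forms are equivalent in their results; whether the split form runs faster depends on the engine and the data.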

##### Introducing a Non-Skewed Join Column

Another option is to introduce a non-skewed join column to the query. This is typically accomplished by rewriting the query to use a `WITH` clause. For example:

```
WITH DT as (SELECT * FROM DEV3.ORDERS O)
SELECT * FROM DT
WHERE EXISTS (SELECT 1 FROM DEV3.LINEITEM_WITH_SKEW L WHERE L.L_ORDERKEY = DT.O_ORDERKEY)
```

##### Other Methods and Future Improvements for Skewed Data

If you are joining multiple tables, you may be able to alleviate skew issues by delaying the skewed join. You can accomplish this by using the `joinOrder=FIXED` hint and experimenting with the order in which the tables are joined.

Splice Machine is continually improving the optimizer to help with skewness and reduce the need for rewrites or query hints. Improvements being worked on include salting skewed values to make them unique and pushing aggregation down before the join.
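
For readers unfamiliar with salting, here is a minimal Python sketch of the general technique. It is a generic illustration and makes no claim about how Splice Machine implements or will implement it: the data, the `NUM_SALTS` value, and the round-robin salt assignment are all assumptions.

```python
from collections import Counter

NUM_SALTS = 4  # assumed number of salt buckets

orders = [(1, "order-1"), (2, "order-2")]                           # small side
lineitems = [(1, f"item-{i}") for i in range(8)] + [(2, "item-x")]  # key 1 is skewed

# Large side: attach a salt to every row (round-robin here; random also works).
salted_lineitems = [
    ((key, i % NUM_SALTS), item) for i, (key, item) in enumerate(lineitems)
]

# Small side: replicate each row once per salt so every salted key finds its match.
salted_orders = {
    (key, salt): name for key, name in orders for salt in range(NUM_SALTS)
}

# Join on the (key, salt) pair instead of on the key alone.
joined = [
    (key, salted_orders[(key, salt)], item)
    for (key, salt), item in salted_lineitems
    if (key, salt) in salted_orders
]

rows_per_join_key = Counter(salted_key for salted_key, _ in salted_lineitems)
print(len(joined), "joined rows")
print(max(rows_per_join_key.values()), "rows for the most frequent salted key")
```

The join result is unchanged, but the eight rows for key `1` are now spread across four distinct join keys, so no single key carries the whole load.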


### Access Path

The access path to the data, which is how we read the data, can have a huge effect on the performance of a query: Are we scanning the entire table? Are we using a primary key?

A full table scan appears as a `TableScan` operation in the explain plan. Primary key access also displays as a `TableScan`, but the number of rows scanned will be smaller than the total number of rows in the table.

Run the next paragraph to see the explain plan for selecting from a table using a full table scan.

In [None]:
%%sql 

explain select * from DEV3.LINEITEM_WITH_SKEW;


<br/>
You can see that the `TableScan` operation is performed on the `LINEITEM_WITH_SKEW` table. Note that the number of `scannedRows` is 16486975. This is the total number of rows in the table.

Run the next paragraph to see the explain plan for selecting from a table using a primary key access path.

In [None]:
%%sql 

explain select * from DEV3.LINEITEM_WITH_SKEW WHERE L_ORDERKEY = 10;

You can see that the `TableScan` operation is performed on the `LINEITEM_WITH_SKEW` table, but notice that the number of `scannedRows` is 3. Reading through 3 rows is a whole lot faster than reading through 16486975 rows.
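
The same distinction between a full scan and keyed access shows up in other engines too. As a side illustration, this sketch uses Python's built-in `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN`. SQLite is only a stand-in here: its plan output looks different from a Splice Machine explain plan, and the three-column `lineitem` table is a made-up simplification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE lineitem (
        l_orderkey INTEGER NOT NULL,
        l_linenumber INTEGER NOT NULL,
        l_quantity REAL,
        PRIMARY KEY (l_orderkey, l_linenumber)
    )
""")

def plan(sql):
    """Return the human-readable detail column of SQLite's EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

full_scan = plan("SELECT * FROM lineitem")
key_access = plan("SELECT * FROM lineitem WHERE l_orderkey = 10")

print(full_scan)   # SQLite reports a SCAN of the whole table
print(key_access)  # SQLite reports a SEARCH that uses the primary key
```

The exact wording of the plan text varies between SQLite versions, so treat the printed strings as indicative only.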

Indexes are another access path that can help improve the performance of a query. In Splice Machine we refer to indexes as either a covering index or a non-covering index.

#### Using a Covering Index

If all columns referenced in a query belonging to a particular table are covered by an index defined on that table, that index is called a _covering index_ for the query. When the number of rows accessed is the same, scanning a covering index is usually more favorable than scanning the base table, since the index usually will have a smaller row size.

Run the next paragraph to create an index and view the index access path in the explain plan.

In [None]:
%%sql 

CREATE INDEX DEV3.IDX_LINEITEM1 ON DEV3.LINEITEM_WITH_SKEW(L_PARTKEY, L_QUANTITY);

EXPLAIN SELECT L_PARTKEY, L_QUANTITY FROM DEV3.LINEITEM_WITH_SKEW; 

You can see that the `IndexScan` operation is performed using the `IDX_LINEITEM1` index.

#### Non-Covering Index

If not all columns referenced in a query belonging to a particular table are covered by an index defined on that table, that index is called a _non-covering index_. The use of a non-covering index incurs the extra cost to look up the values of column(s) not covered by the index from the base table for each qualified row. This may or may not be a better choice than a full table scan, depending on the data and the query.

Run the next paragraph to view the explain plan for a query that uses a non-covering index.

In [None]:
%%sql 

EXPLAIN SELECT L_PARTKEY, L_QUANTITY, L_EXTENDEDPRICE FROM DEV3.LINEITEM_WITH_SKEW --splice-properties index=IDX_LINEITEM1

<br/>
You can see that there is an additional `IndexLookup` step that needs to be performed for every row returned by the `IndexScan` step. As previously stated, this may or may not be as performant as a full table scan; it depends on the amount of data and the particular query.
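
Whether an index covers a query comes down to a subset check on column names. The Python sketch below applies that check to the two queries above. The `is_covering` helper is a hypothetical name, and the check looks only at the columns listed in `CREATE INDEX`; it ignores anything an engine might store in the index implicitly.

```python
def is_covering(index_columns, referenced_columns):
    """True if every column the query references from the table is in the index."""
    index_set = {c.upper() for c in index_columns}
    return {c.upper() for c in referenced_columns} <= index_set

# Columns of IDX_LINEITEM1 as defined above.
idx_lineitem1 = ["L_PARTKEY", "L_QUANTITY"]

covering_query = ["L_PARTKEY", "L_QUANTITY"]
non_covering_query = ["L_PARTKEY", "L_QUANTITY", "L_EXTENDEDPRICE"]

print(is_covering(idx_lineitem1, covering_query))
print(is_covering(idx_lineitem1, non_covering_query))

# Columns that would have to be fetched from the base table.
missing = set(non_covering_query) - set(idx_lineitem1)
print(missing)
```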

### Nested Loop Joins

Nested loop joins work for all kinds of join conditions (equality or non-equality). When an equality join condition is present, the performance of a nested loop join is usually not as good as that of the other three join strategies (broadcast, sortmerge, and merge). The exception is when the table on the left side has a small number of rows to read, and the join with the table on the right side uses a leading primary key or index column with low selectivity.
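
A rough way to see why the left-side row count matters is to count how many rows each strategy touches. The Python sketch below is a deliberately crude toy: the formulas, the `LOOKUP_OVERHEAD` constant, and the row counts are all assumptions, and the real optimizer's cost model is far more detailed.

```python
# Toy row-touch counts; not Splice Machine's cost model.
LOOKUP_OVERHEAD = 10  # assumed extra cost of one keyed lookup, in "row touches"

def nested_loop_touches(left_rows, right_rows_per_lookup):
    """Read each left row, then do one keyed lookup on the right side per left row."""
    return left_rows + left_rows * (LOOKUP_OVERHEAD + right_rows_per_lookup)

def broadcast_touches(left_rows, right_rows):
    """Read the whole right side into memory once, then probe once per left row."""
    return right_rows + left_rows

RIGHT_TABLE_ROWS = 10_000

for left_rows in (3, 1_000_000):
    nlj = nested_loop_touches(left_rows, right_rows_per_lookup=1)
    bcast = broadcast_touches(left_rows, RIGHT_TABLE_ROWS)
    print(f"left_rows={left_rows}: nested loop={nlj}, broadcast={bcast}")
```

Under these assumptions the nested loop wins easily when the left side has a handful of rows and loses badly when it has many, which matches the rule of thumb above.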

Run the next paragraph to create some tables and load some data.

In [None]:
%%sql 

CREATE TABLE DEV3.LINEITEM (
 L_ORDERKEY BIGINT NOT NULL,
 L_PARTKEY INTEGER NOT NULL,
 L_SUPPKEY INTEGER NOT NULL, 
 L_LINENUMBER INTEGER NOT NULL, 
 L_QUANTITY DECIMAL(15,2),
 L_EXTENDEDPRICE DECIMAL(15,2),
 L_DISCOUNT DECIMAL(15,2),
 L_TAX DECIMAL(15,2),
 L_RETURNFLAG VARCHAR(1), 
 L_LINESTATUS VARCHAR(1),
 L_SHIPDATE DATE,
 L_COMMITDATE DATE,
 L_RECEIPTDATE DATE,
 L_SHIPINSTRUCT VARCHAR(25),
 L_SHIPMODE VARCHAR(10),
 L_COMMENT VARCHAR(44),
 PRIMARY KEY(L_ORDERKEY,L_LINENUMBER)
);

CREATE TABLE DEV3.SUPPLIER (
 S_SUPPKEY INTEGER NOT NULL PRIMARY KEY,
 S_NAME VARCHAR(25) ,
 S_ADDRESS VARCHAR(40) ,
 S_NATIONKEY INTEGER ,
 S_PHONE VARCHAR(15) ,
 S_ACCTBAL DECIMAL(15,2),
 S_COMMENT VARCHAR(101)
); 

call SYSCS_UTIL.IMPORT_DATA ('DEV3', 'LINEITEM', null, 's3a://splice-benchmark-data/flat/TPCH/1/lineitem', '|', null, null, null, null, 0, '/tmp', true, null);

call SYSCS_UTIL.IMPORT_DATA ('DEV3', 'SUPPLIER', null, 's3a://splice-benchmark-data/flat/TPCH/1/supplier', '|', null, null, null, null, 0, '/tmp', true, null);

ANALYZE TABLE DEV3.LINEITEM;

ANALYZE TABLE DEV3.SUPPLIER;

<br/>
Now run the next paragraph to see an example of a perfect use case for a nested loop join.

In [None]:
%%sql 

EXPLAIN select count(*) from 
dev3.lineitem, dev3.supplier
where l_suppkey= s_suppkey and l_partkey = 1 and  L_orderkey = 5120486;

<br/>
You can see that both tables have a very small number of `scannedRows`; this is a perfect case for a nested loop join.

If your query uses a nested loop join on tables with many rows on both sides of the join, the recommended solution is to apply a hint to use either a `BROADCAST` or `SORTMERGE` join strategy.

## Where to Go Next
The next notebook in this class, [*Prepared Statements*](./g.%20Prepared%20Statements.ipynb), teaches you how to use prepared statements for querying your databases.
