# Exercises: Splice Machine Advanced Developer Class

This notebook contains follow-on exercises for the material covered in this class. Complete these exercises and run the paragraphs in this notebook to verify your understanding of what was covered. Not all of the exercises can be run in this notebook: you will also be asked to write some Java code and test it on your instance of Splice Machine.

In addition, not all exercises involve writing code. In some cases we simply ask you questions, and you can provide your answers directly in this notebook. Be sure those answers go into a paragraph that uses markdown, as selected in the cell type dropdown above.

You'll be performing the following actions in these exercises:

1. *Bulk Loading Data*
2. *Optimizing Queries*
3. *Defining Functions and Procedures*
4. *Using spark-submit to Interact with Splice Machine*
 

## 1. Bulk Loading Data

In this exercise, you'll demonstrate your understanding of how to use the Splice Machine `BULK_IMPORT_HFILE` system procedure to load data in a performant manner.

First, you need to create the tables to load the data into; create these tables in the `advdev_exercises` schema. Create the schema if it doesn't yet exist in your Splice Machine database. Create the following tables:

* The `customer` table
* The `order` table
* The `order_line` table

We've specified the table descriptions below.

### The customer Table

Table Name: `customer`

Primary Key Columns: `c_w_id`, `c_d_id`, `c_id`

Notes: The `c_since` column should default to the current timestamp

<table class="splicezep">
    <col />
    <col />
    <thead>
        <tr>
            <th>Column Name</th>
            <th>Data Type</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>c_w_id</td>
            <td>int</td>
        </tr>
        <tr>
            <td>c_d_id</td>
            <td>int</td>
        </tr>
        <tr>
            <td>c_id</td>
            <td>int</td>
        </tr>
        <tr>
            <td>c_discount</td>
            <td>decimal(4,4)</td>
        </tr>
        <tr>
            <td>c_credit</td>
            <td>char(2)</td>
        </tr>
        <tr>
            <td>c_last</td>
            <td>varchar(16)</td>
        </tr>
        <tr>
            <td>c_first</td>
            <td>varchar(16)</td>
        </tr>
        <tr>
            <td>c_credit_lim</td>
            <td>decimal(12,2)</td>
        </tr>
        <tr>
            <td>c_balance</td>
            <td>decimal(12,2)</td>
        </tr>
        <tr>
            <td>c_ytd_payment</td>
            <td>float</td>
        </tr>
        <tr>
            <td>c_payment_cnt</td>
            <td>int</td>
        </tr>
        <tr>
            <td>c_delivery_cnt</td>
            <td>int</td>
        </tr>
        <tr>
            <td>c_street_1</td>
            <td>varchar(20)</td>
        </tr>
        <tr>
            <td>c_street_2</td>
            <td>varchar(20)</td>
        </tr>
        <tr>
            <td>c_city</td>
            <td>varchar(20)</td>
        </tr>
        <tr>
            <td>c_state</td>
            <td>char(2)</td>
        </tr>
        <tr>
            <td>c_zip</td>
            <td>char(9)</td>
        </tr>
        <tr>
            <td>c_phone</td>
            <td>char(16)</td>
        </tr>
        <tr>
            <td>c_since</td>
            <td>timestamp</td>
        </tr>
        <tr>
            <td>c_middle</td>
            <td>char(2)</td>
        </tr>
        <tr>
            <td>c_data</td>
            <td>varchar(500)</td>
        </tr>
    </tbody>
</table>


### The order Table

Table Name: `order`

Primary Key Columns: `o_w_id`, `o_d_id`, `o_id`

Notes: The `o_entry_d` column should default to the current timestamp. The word `ORDER` is a reserved keyword in SQL, so you will receive an error if you try to create a table named `ORDER` without quoting it. There are two ways around this: 1) use a different name, or 2) enclose the word `ORDER` in double quotes.

<table class="splicezep">
    <col />
    <col />
    <thead>
        <tr>
            <th>Column Name</th>
            <th>Data Type</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>o_w_id</td>
            <td>int</td>
        </tr>
        <tr>
            <td>o_d_id</td>
            <td>int</td>
        </tr>
        <tr>
            <td>o_id</td>
            <td>int</td>
        </tr>
        <tr>
            <td>o_c_id</td>
            <td>int</td>
        </tr>
        <tr>
            <td>o_carrier_id</td>
            <td>int</td>
        </tr>
        <tr>
            <td>o_ol_cnt</td>
            <td>decimal(2,0)</td>
        </tr>
        <tr>
            <td>o_all_local</td>
            <td>decimal(1,0)</td>
        </tr>
        <tr>
            <td>o_entry_d</td>
            <td>timestamp</td>
        </tr>
    </tbody>
</table>


### The order_line Table

Table Name: `order_line`

Primary Key Columns: `ol_w_id`, `ol_d_id`, `ol_o_id`, `ol_number`

<table class="splicezep">
    <col />
    <col />
    <thead>
        <tr>
            <th>Column Name</th>
            <th>Data Type</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>ol_w_id</td>
            <td>int</td>
        </tr>
        <tr>
            <td>ol_d_id</td>
            <td>int</td>
        </tr>
        <tr>
            <td>ol_o_id</td>
            <td>int</td>
        </tr>
        <tr>
            <td>ol_number</td>
            <td>int</td>
        </tr>
        <tr>
            <td>ol_i_id</td>
            <td>int</td>
        </tr>
        <tr>
            <td>ol_delivery_d</td>
            <td>timestamp</td>
        </tr>
        <tr>
            <td>ol_amount</td>
            <td>decimal(6,2)</td>
        </tr>
        <tr>
            <td>ol_supply_w_id</td>
            <td>int</td>
        </tr>
        <tr>
            <td>ol_quantity</td>
            <td>decimal(2,0)</td>
        </tr>
        <tr>
            <td>ol_dist_info</td>
            <td>char(24)</td>
        </tr>
    </tbody>
</table>



### Preliminary Questions

Before we dive into the coding exercises, please answer the following questions:

1. What is one reason you would use `BULK_IMPORT_HFILE` over the `IMPORT_DATA` procedure?

2. What is one reason why you should NOT use `BULK_IMPORT_HFILE`? 


To answer these questions, you need to edit the cells. Enter your answers below each question.


### Create the Tables

In the next paragraph, write the DDL for the tables listed above, then run the paragraph to create the tables.

In [None]:
%%sql 


### Load Data into the Tables

Now that the tables have been created, you need some data to load. 

#### The customer data
The data for the `customer` table can be found at `s3a://splice-benchmark-data/flat/TPCC/10/customer`. 

Here are the first five and last five rows of data for this table:

<pre>
1,1,1,0.0751,BC,BARBARBAR,kxfayfvkqa,50000.0,-10.0,10.0,1,0,atnzyvzilt,epdtxiskrperx,skfplbrbeymt,KX,280911111,1902719838440821,2018-05-08 15:54:42.339,OE,szgkjxidwfelotqfzpgdfcephqfekwaacgkruhafdyqgwwcnwitglzddnzatdmosudzxmlidxtahvyzcchetoubetmkxdgzsujaphxzupdgwcedvxxagbzpaxyuacmvhvmrnneljwnrhvneaspwdepfontfaagyurxzjtiuzqsrgqutquitytiopccubedyivxxxihdujevycjcvicqpfeyyxsoewhiyqtsbmvtzywanxwsdgmgbolicmrfnywcskkqktmwduxrxlujamdmguqwthgpssdftaybqdzuabrvmrzewdxmcyymyscrrbqtwehqnpvjfntvsmxjoutbbsmamzhnogtzwwokriciubgcumvxwdgbtymcyugrczjzapggycwpwgnbmnhnzwofcgenbeomtrkycoivjfxnfygxoarhcycdjuwhbwtgvssfbmpbgajgngqbgxjxyubsaefrxbsmycgcertwmixbx
2,1,1,0.0936,GC,BARBARBAR,cgpwniyadyvyhq,50000.0,-10.0,10.0,1,0,xexdsnegebg,jlkfmupyummrgc,mxfzbppymqjkyjz,CW,187511111,8886369360770084,2018-05-08 15:54:42.345,OE,laxmmvidtmlnvnamnsomvbcnxrxngxyldzvhpnamwjgkcejvwbybgxneiervyivweecfmfcqqrsnschgfsocetzjmwtmmlpxuyfuteqokpfomflhpwqpbszjswziwohvxzvhtfrceklzoruszusfgttrzokktjaspkekfmavffwmuycxhznvkqujfbgmuarsssdgskemnnrnneecwtdttkmqdiciertkmoinstkjblcsfjrvtnqntffwqwcdmcwxftaxpeiqsvbvvumtmlrlqemfuyaivgnjraprrlfozklunvubgukgkfqzcnbzwhhwuqubfthorivzpfizuqzzipxkbosgbsjdliueugyqepffsgjpbciwxcwsxgbgyiylsvjbliqnncucurbqhohdzqccpgvmspfzlobowfhiozyyrkrxwamxbhwrmogslerlbgnxrbkpcmpuxnkxaldrgdrfkz
3,1,1,0.4181,GC,BARBARBAR,gggwhywyia,50000.0,-10.0,10.0,1,0,pbheqcojc,xpjujyxglexm,wrzxnmoouadugdoll,DS,954311111,0731835721448456,2018-05-08 15:54:42.345,OE,vxldujuimswznpfyvxdiiwxjstwpptzioupnuqmzfalgvctgfjqmacucyikfaalvovkxdisjbxeclfzqfrejnivxlnqewuhmfffabgzsofcmzngxxoxsoomlhkftugnjdqtojmbypuzdtzkqzixxgeyppflpsylpnyudwrsdaivnehnhagepnhjdxmtemwhfrshdpioidcxqrzdjsmznuzntxvijpxxjdofkwwfcfidxnmyqhluniqgqsbzroxshakcooqtdexphphfkmsesjrfgoybqqlgvciiqqryjghgqjwwsjtimyqsagvoyardqlvijoevbbrbpitfmojzlsmmyutrwgjdlkgsddxaomlgphnkpqtohjowqacpzxaaarfpguihashvispgfbzvqozvtyxjpsorbmyszigbdgdigtclwlfjkwmimupihjpkeblfbvvvpqucoby
4,1,1,0.0786,GC,BARBARBAR,obpwdpbgqingyen,50000.0,-10.0,10.0,1,0,tgpoqvisqounekex,avphcycqy,amkmteplmbww,VT,727211111,2215944956991382,2018-05-08 15:54:42.345,OE,mmaugjmfualrmpqwdkhkslyzxgfbnwgskybasqhukcgfcfcrvvbbkqczbcitnenoolxkqliocbcidqvgjcjvvxphhpakhigptnehpgiigsmztdhrvhfhsolxcivbtflqiprqcutztiemiasjkrzwjfylskpjvaifrmbxpjwbyrofsiwczdfqmyyvsrjrebhpxkdtiettvllnifdcvhxfxbwuxnyemubqijkyseebczwmabgxrgnvsuheahmhysgyiagstpfeewzxwbwdpjvmchfggcszvuktferklpywtkrxgaruiydpncojwvsbauqjtoyemuquetxlaoyapuxdnzavlkpcauwmjsnuuhgakutyqfgflnfrdalppfogxwskrsynhdekqwkhdygfvfidlgcivlooibptujrazikqkghczzgylegyghzagxnubgpbzqxizegedaxdqpwwkelcxxq
5,1,1,0.2383,GC,BARBARBAR,usdfvosglm,50000.0,-10.0,10.0,1,0,awsnambqckcxnsxmejh,rixjboukfylau,bjhjiqxuwcuiycqy,FK,341611111,3981582898778964,2018-05-08 15:54:42.346,OE,lvpsanehvfcjpbuyuftbmcfnorrppbvpkplwcdfgefrcstagrxnaebboxzotmzpocprggjvonioajjcsnvfhrdqgftcncqhlzoyjdwpiobxmzafzjmqljjklpmsrejenoiexoftyqypqtjfwsoeidmfdjxullnkehbtrbnotrycrnotiyzzlumwilystypuhtfrjitqfvracejbcssyqpmkjhhaccxhmphbkjgcrifqzipadgcjqldcyxsuvcbpjlunmdofhqpsildriybwloxwbpydkqvrvnutxjvlbqjijphkyqvaupirfk
.....
6,10,3000,0.3545,GC,PRIPRESESE,whkznxzaobrmt,50000.0,-10.0,10.0,1,0,gjazmbpolxzschvf,zpwmcvuclkyzelwlw,xqhivarlasrim,CN,699511111,2616762648063755,2018-05-08 15:54:45.873,OE,jggtgzxqegrpjgxajtyckkdjofjvnynvbxhfwntugptbippdaxmioxqfczzwuugbvakdfajhtrktbwndvyozupexcatvdgovfrhbcmzcjjreqomvlzxvdgyhkpdaghoswlcsofdnxjzhrnenvtwxevuxgtfgubozunriwwrgynmyzedphflrkikekzpuntuuoebddlqsaqhqbgwcmxdtnuzkeeiigjhjofxgqyitrsiisoliwcudpkvhkidnayqitqwsnvfmfqriwlpvlnkfzfqapuxayovrrakczehtprgfebesgnrxkghbxifxxqwkyatvscnulvvqujqdlyhpzxpmdyrdypqsehxpzrecaxuaftqcnxstnhuwlssnalskjweealgzmibbfrpwmuxpgyaxyssenewkneqfmeklcj
7,10,3000,0.438,GC,OUGHTPRESATION,jqruyflcmg,50000.0,-10.0,10.0,1,0,ayyyuoszhcqmybry,nezechfdcxw,miakprlmykdzd,WZ,107611111,6811933951111724,2018-05-08 15:54:45.873,OE,hqksekrlctmplglzwmtmvpbicphywuclbloqpvrfcdwirhhiuzefddqhrgexxgynltawbhgcjxsryktnlfnvcfseuhodfjzxdaojuphrvhcetwjacilsvakvzibzpnmgjsioombrxedbhqwgifmlxxdtrnogznwzgjcymdmkqxvnyqxqgiozdxaorervpsmaxgspncazawxukfdgcrrkrnkgpytnmgkiabzqstobvrhtbuipigabddonvjqfasfbpijipvkiqigzrvsufhyizefdwjzpopbenqzxmwmznfbxsxylbfpgapsjxrycntpveblyuqcnwnaxnvvdqwghgguzolxhzlflvwgludbowdsavfrvznntobovjgolbewrnngnccnsizuxkkmzgmerwnhxeelxvotlrmeeeoefulgbgblxdjetlacqqhgjeu
8,10,3000,0.0072,GC,PRIEINGESE,nkzmgdnrtpa,50000.0,-10.0,10.0,1,0,gwcgvbcpjpadepqwax,dlaorwjzxqaahbxmg,mtowszbgqyo,FN,289411111,3088939807851593,2018-05-08 15:54:45.873,OE,cjusztuonckhattaeprsuqkrmfnwznwpkkobtklhjgvuegjixbqbnnehgaqlrvhjimphfnptmfaxgodrlfzklkywvvsvuvvacsiecaztonlnoqmgnykwabtbcylsbaulxueccuxeiwsoxqqxftyvtoynpnfzthazbyzjiiaojiudvekeofjclklpqxjwrosbtjdabpdkutlsuhncnytfqmrgpbdyjtvnszmgcjkkqxlseszgldnkdjrvtzczgneqjjlpwkknpopjplbxjgnxlaivgzetpxacislqsdwxapdnytmghxnvlfgtngfgjtfpusrjsxqbfydovremelemgptorcfcrhlupbpqlqftafaylrweiumhv
9,10,3000,0.4718,GC,EINGABLEPRES,vlyykroxbbepxw,50000.0,-10.0,10.0,1,0,grnnxzzrhyvnvp,wamfzdwimqh,xhiodhrrxgk,XQ,254811111,5000460952666437,2018-05-08 15:54:45.873,OE,tbhforivlswcaqejukipxwwszqjpomjsxxqibhdcimbapaualapegvjwglfepwkvhalthmqpkibpnqnxmmntxiwpinsrrcgwwdegypsldbodiztxeunpcijgrdjmgnuwsvvfmoccejubeuiacomabckknlkssuxucbiinnrvazdzrqvpmzwhgbrttprkevfoulzulbkvaqaaycodajhmmbvunnrgupunlkuqznnwagjyrzogbcmtjzswblyividhikvbkzghwjohnmadlsofckdtetmcoyvhuvfgmzfrbwyuhdixgpbqnbtuslmgnbvymkyvolltodcljvownrvcwfewugndpbzddpecadohxomghfhzchilqoimwitujjrfcnibrbhmedkjjwadwddyvrmewjerthkwrbsingkockmzszbcozhncljphgrezw
10,10,3000,0.2854,GC,CALLYPRIABLE,tlurdvn,50000.0,-10.0,10.0,1,0,bwxkezmmpxxosjgxzo,owjwrnufuy,acgpkwotwyjn,JK,652011111,4152687822287010,2018-05-08 15:54:45.873,OE,dfcnkrzyqzlbrkkxhymsetvahbwokxzzylqtcbncwnkalzdnpgzjjbosxujuytfdhfubxyvayptphkbybxacteqotnwyeowowdtjbilgdilsgysapzkguqttxiktyfmevtbnowhwwvxsmwyegblxiebszkerdzkmuhedubnohkquvwqloukwdeuxghmqurelickihibnsutugwurvslhzvasmjxcsuwrkcqjjyxwbdsowdgwnygawsldbhekdjeleacjywvhejstuhhrxfbrjkzzkpakwxmhogrfctwyrongbjkgfhkimrvzbowvcrtlmcuvbuqclsznrsxnbnmrfgnxqgakycfyaxuiqkdnfykfiiphenpwfopokvzjefwijjuwyaxpjqendwubwbmxvucmrzmgipgcbusad
</pre>

#### The order data

The data for the `order` table can be found at `s3a://splice-benchmark-data/flat/TPCC/10/order`. Here are the first five and last five rows of data for this table:

<pre>
1,1,1,792,5,8,1,2018-05-08 15:54:45
2,1,1,792,2,13,1,2018-05-08 15:54:45
3,1,1,792,5,6,1,2018-05-08 15:54:45
4,1,1,792,10,8,1,2018-05-08 15:54:45
5,1,1,792,10,14,1,2018-05-08 15:54:45
.....
6,10,3000,1548,,14,1,2018-05-08 15:54:50
7,10,3000,1548,,13,1,2018-05-08 15:54:50
8,10,3000,1548,,10,1,2018-05-08 15:54:50
9,10,3000,1548,,12,1,2018-05-08 15:54:50
10,10,3000,1548,,15,1,2018-05-08 15:54:50
</pre>

#### The order_line data

The data for the `order_line` table can be found at `s3a://splice-benchmark-data/flat/TPCC/10/order-line`. Here are the first five and last five rows of data for this table:

<pre>
1,1,1,1,81813,2018-05-08 15:54:45.897,0.0,1,5,qbjgvlgdumddzfwfnkhdyfc
1,1,1,2,19942,2018-05-08 15:54:45.897,0.0,1,5,rtluteodcyyicdezywzptni
1,1,1,3,6709,2018-05-08 15:54:45.897,0.0,1,5,pxhogmpvyuiogvuqnlrzvrh
1,1,1,4,34549,2018-05-08 15:54:45.897,0.0,1,5,gbwzsdtqfzrffopefssxtyu
1,1,1,5,60007,2018-05-08 15:54:45.897,0.0,1,5,wjovodtjvgqtaahoxveyaha
.....
10,10,3000,11,90555,,9912.28,10,5,ybtdwcnvdeqikngirbrkqca
10,10,3000,12,22072,,2294.81,10,5,vdddytldoivujolaeuqragb
10,10,3000,13,68658,,6006.41,10,5,jxbsnmgdzaaaxpjtkbfkbqo
10,10,3000,14,59027,,4355.52,10,5,szvyahuwhfqwzakboczpond
10,10,3000,15,83890,,9780.23,10,5,ooowfaftaiitjexoarbnszg
</pre>

### Create the Split Keys

We are going to ask you to use the `Manual` method for bulk importing the data, which means you will need to create split keys to split the data. We leave it up to you to determine the best split keys to use, based on the sample data shown above. You will need to connect to your Docker image to create the split key files for each of the tables. If you cannot connect to the Docker image running this training notebook, please add a paragraph after this one and write down what the keys would be in a markdown cell.

In the next paragraph you will split the tables manually. Run the cell after you have typed the appropriate commands.

In [None]:
%%sql 


<br/>
The tables are now ready to have data bulk loaded. Examine the data above and use the next paragraph to bulk load the data for each of the three tables:

In [None]:
%%sql 


<br/>
Use the next paragraph to enter SQL statements that select the first 10 rows from each of the tables:

In [None]:
%%sql 


## 2. Optimizing Queries

In this exercise we will test your knowledge of how to optimize queries in Splice Machine. 

Run the next paragraph to display the explain plan for the provided query.


In [None]:
%%sql 

explain select o.o_id, sum(ol.ol_amount)
from advdev_exercises."ORDER" o
join advdev_exercises.order_line ol
on o.o_id = ol.ol_o_id
group by o.o_id;

<br/>
Answer the following questions about the above explain plan:

1. Which table is on the right-hand side of the join?



2. How many rows are scanned from the ORDER_LINE table?



3. What other join strategy can be used and how would you get the optimizer to use that strategy?



4. What is the join predicate? 


To answer these questions, double-click this cell to open the editor, then add your answers below each question.


<br>
Using the tables you just loaded, write a query in the next paragraph that returns the first name, last name, carrier id, and quantity. Show us the explain plan.

In [None]:
%%sql 


<br/>
What's the first thing you can do to optimize this query? Use the next paragraph to enter your solution:

In [None]:
%%sql 


<br/>
Write a query that will check for possible data skew in the `ADVDEV_EXERCISES.ORDER_LINE` table, specifically on the `OL_O_ID` column:

In [None]:
%%sql 


<br> 
Some final questions for you on the topic of Query Optimization: 

1. Based on the query results above, is there any skew?



2. What are two methods you can use to rewrite a query that contains skewed data?



3. What is a covering index?



4. Describe a situation where a nested loop join would be ideal and a situation where it would not be ideal. For the latter, what other join strategies can be used?
 



Double-click this paragraph to enter your answers.


## 3. Defining Functions and Procedures

In this exercise, we want you to create a custom user-defined function and a custom user-defined procedure. This exercise requires you to write some Java code and compile it into a JAR file. The code for both the function and the procedure should be contained in the same JAR file, but they do not necessarily need to be in the same Java class.

Here are the requirements that you must include in your solution:

### Function

Create a function that returns the `n` left-most characters of a string.
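
As a starting point, here is a minimal sketch of what the Java side of such a function might look like. It assumes the Derby-style convention that a user-defined function is backed by a static Java method; the class name `StringFunctions`, the method name `left`, and the handling of `null` and out-of-range lengths are illustrative choices, not requirements of the exercise.

```java
// Hypothetical helper class for the exercise; the names are illustrative.
// Declared package-private so the sketch compiles in any file; make the
// class public (in StringFunctions.java) before packaging it into your JAR.
final class StringFunctions {

    private StringFunctions() {
        // Utility class: no instances needed.
    }

    /**
     * Returns the n left-most characters of input.
     * Assumed edge cases: null input returns null, n <= 0 returns an
     * empty string, and n longer than the input returns the whole input.
     */
    public static String left(String input, int n) {
        if (input == null) {
            return null;
        }
        if (n <= 0) {
            return "";
        }
        return n >= input.length() ? input : input.substring(0, n);
    }
}
```

Returning `null` for a `null` input mirrors how most built-in SQL string functions behave, but you may prefer a different policy; whichever you choose, state it in your submission.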

### Procedure

Using the data loaded in the first exercise, create a procedure that returns the total number of orders for a given customer (use the `c_id` column as the identifier for the customer).
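
One possible shape for the Java side of the procedure is sketched below. It assumes the Derby-style convention in which a procedure is a static method that obtains the caller's connection through the `jdbc:default:connection` URL and hands results back through a `ResultSet[]` parameter. The class name `OrderProcedures`, the method name `ordersForCustomer`, and the choice to match the customer on `o_c_id` in a table created as `advdev_exercises."ORDER"` are assumptions made for illustration; adjust them to your own DDL.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical class for the exercise; the names are illustrative.
// Declared package-private so the sketch compiles in any file; make the
// class public (in OrderProcedures.java) before packaging it into your JAR.
final class OrderProcedures {

    // Assumes the table was created with the quoted name "ORDER".
    static final String COUNT_SQL =
        "SELECT COUNT(*) AS order_count "
        + "FROM advdev_exercises.\"ORDER\" WHERE o_c_id = ?";

    private OrderProcedures() {
        // Utility class: no instances needed.
    }

    /**
     * Returns one row holding the number of orders for the given customer id.
     * The result set is passed back through the first slot of results.
     */
    public static void ordersForCustomer(int customerId, ResultSet[] results)
            throws SQLException {
        // Reuse the connection of the session that called the procedure.
        Connection conn = DriverManager.getConnection("jdbc:default:connection");
        PreparedStatement ps = conn.prepareStatement(COUNT_SQL);
        ps.setInt(1, customerId);
        // Do not close the statement here: the caller still has to read
        // the returned result set.
        results[0] = ps.executeQuery();
    }
}
```

The method only runs inside the database, where the default connection is available, so it cannot be exercised as a standalone program.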

Once you have built your JAR file, you need to copy it into the Docker container that is running this Zeppelin notebook. If you are unable to connect to the Docker image running this training notebook, you can still attempt the remaining portions of this exercise, but you will not be able to execute your code. We will be able to grade you based on your Java code and the commands you enter in the remaining sections of this exercise. At a minimum, please submit the Java class file(s) for your solution.

Use the next paragraph to install your JAR file and modify the Derby classpath. If you are not able to connect to the Docker image running this training notebook, you can still type in the commands to install the JAR and modify the Derby classpath; you just won't be able to execute them.

In [None]:
%%sql 


<br/>
Use the next paragraph to put your custom function and procedure into action by calling them from SQL. Again, if you are unable to deploy your custom JAR file to the Docker container, we would still like to see the SQL you would use to run your user-defined function and procedure, even though you won't be able to execute it.

In [None]:
%%sql 


## 4. Using spark-submit to Interact with Splice Machine

For this exercise we will ask you to build a star-join query based on the following schema. Attempt to execute all queries in `SpliceMachineContext`, the Spark Native Adapter.

1. Create the Part table using the following schema definition:
```
CREATE TABLE PART (
  P_PARTKEY INT,
  P_NAME VARCHAR(55) ,
  P_MFGR VARCHAR(25) ,
  P_BRAND VARCHAR(10) ,
  P_TYPE VARCHAR(25) ,
  P_SIZE INT,
  P_CONTAINER VARCHAR(10) ,
  P_RETAILPRICE DECIMAL(15,2),
  P_COMMENT VARCHAR(23)
)
```

2. Create the Supplier table:
```
CREATE TABLE SUPPLIER (
  S_SUPPKEY INT,
  S_NAME VARCHAR(25) ,
  S_ADDRESS VARCHAR(40) ,
  S_NATIONKEY INT,
  S_PHONE VARCHAR(15) ,
  S_ACCTBAL DECIMAL(15,2),
  S_COMMENT VARCHAR(101)
)
``` 

3. Create the PartSupp table:
```
CREATE TABLE PARTSUPP (
  PS_PARTKEY INT,
  PS_SUPPKEY INT, 
  PS_AVAILQTY INT,
  PS_SUPPLYCOST DECIMAL(15,2),
  PS_COMMENT VARCHAR(199)
)
``` 

4. Call the `SYSCS_UTIL.IMPORT_DATA` procedure to import the data from S3 into the tables via Zeppelin or sqlshell. CHALLENGE: You can also import from Spark using the `bulkImportHFile`, `insert`, or `splitAndInsert` methods. This may require copying the data into the running Docker container by first downloading it from S3 to localhost. Note that, depending on how much memory Docker has on your system, you may want to skip sampling of the data; review the Splice Machine documentation for these methods to see how splits are computed, how sampling is done, and how to disable it. The data is located at:
```
s3a://splice-benchmark-data/flat/TPCH/1/supplier
s3a://splice-benchmark-data/flat/TPCH/1/part
s3a://splice-benchmark-data/flat/TPCH/1/partsupp
``` 

5. Execute the StarJoin query within your Spark Adapter code:
```
SELECT
  P_PARTKEY,
  P_NAME,
  P_MFGR,
  P_BRAND,
  P_TYPE,
  P_SIZE,
  P_CONTAINER,
  P_RETAILPRICE,
  P_COMMENT,
  PS_PARTKEY,
  PS_SUPPKEY,
  PS_AVAILQTY,
  PS_SUPPLYCOST,
  PS_COMMENT,
  S_SUPPKEY,
  S_NAME,
  S_ADDRESS,
  S_NATIONKEY,
  S_PHONE,
  S_ACCTBAL
FROM
  PART
JOIN PARTSUPP ON P_PARTKEY = PS_PARTKEY
JOIN SUPPLIER ON PS_SUPPKEY = S_SUPPKEY
```  

6. Code, build, and deploy the application JAR file. Execute and verify the results.

This exercise cannot be completed within the context of a notebook. You need to write some Java code, build a JAR, then deploy the JAR to the Docker instance that is running Splice Machine. In the Docker instance, use the `spark-submit` command to execute your Java code. Please submit your Java code.


## Where to Go Next

Congratulations! You've just completed the Splice Machine Advanced Developer class.

Visit the [*Our Training Classes*](../About/Our%20Training%20Classes.ipynb) notebook to learn about our other training classes.
