## Inserting Data using Stage Table

Let us understand how to insert data into the `order_items` table, which is stored in Parquet file format.

Let us start the Spark context for this notebook so that we can execute the code provided. You can sign up for our [10 node state of the art cluster/labs](https://labs.itversity.com/plans) to learn Spark SQL using our unique integrated LMS.

In [1]:
val username = System.getProperty("user.name")

username = itv002480


itv002480

In [2]:
import org.apache.spark.sql.SparkSession

val username = System.getProperty("user.name")
val spark = SparkSession.
    builder.
    config("spark.ui.port", "0").
    config("spark.sql.warehouse.dir", s"/user/${username}/warehouse").
    enableHiveSupport.
    appName(s"${username} | Spark SQL - Managing Tables - DML and Partitioning").
    master("yarn").
    getOrCreate

username = itv002480
spark = org.apache.spark.sql.SparkSession@26a047ba


org.apache.spark.sql.SparkSession@26a047ba

If you prefer to use a CLI, you can launch Spark SQL using one of the following three approaches.

**Using Spark SQL**

```
spark2-sql \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```

**Using Scala**

```
spark2-shell \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```

**Using Pyspark**

```
pyspark2 \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```

As the data is in text file format and our table is created with Parquet file format, we cannot use the LOAD command to populate the table directly. LOAD only copies the files into the table directory; it does not convert them to Parquet.
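One way to confirm the mismatch before loading is to inspect the table's storage metadata. The sketch below is a hedged illustration and not part of the original notebook: it assumes the table lives in the `${username}_retail` database used later in this section, and that `DESCRIBE FORMATTED` labels its storage rows `InputFormat`, `OutputFormat`, and `Serde Library`, which may differ between Spark versions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Reuses the notebook's active session if there is one.
val spark = SparkSession.builder.enableHiveSupport.getOrCreate
val username = System.getProperty("user.name")

// Assumption: the target table is in the ${username}_retail database.
// DESCRIBE FORMATTED returns rows of (col_name, data_type, comment);
// the row labels filtered on below are assumed and may vary by version.
val storageInfo = spark.
    sql(s"DESCRIBE FORMATTED ${username}_retail.order_items").
    filter(col("col_name").isin("InputFormat", "OutputFormat", "Serde Library")).
    select("col_name", "data_type")

storageInfo.show(false)
```

If the reported formats are Parquet classes while the source files are comma-delimited text, a plain LOAD would leave files in the table directory that queries cannot read.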

In [3]:
%%sql

LOAD DATA LOCAL INPATH '/data/retail_db/order_items'
    OVERWRITE INTO TABLE order_items

Waiting for a Spark session to start...

Magic sql failed to execute with error: 
Table or view 'order_items' not found in database 'default';

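* The LOAD command by itself normally succeeds because it only copies files, but a subsequent query fails because it expects the data to be in Parquet file format. In the run captured here, the LOAD, SELECT, and TRUNCATE cells instead report that `order_items` was not found, because the session is still using the `default` database; the `USE` command is issued in a later cell.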

In [4]:
%%sql

SELECT * FROM order_items LIMIT 10

Magic sql failed to execute with error: 
Table or view not found: order_items; line 1 pos 14

In [5]:
%%sql

TRUNCATE TABLE order_items

Magic sql failed to execute with error: 
Table or view 'order_items' not found in database 'default';

Following are the steps to get data into a table that is created with a different file format or delimiter than our source data.

* Create a stage table with text file format and comma as the delimiter (order_items_stage).
* Load data from our files in the local file system into the stage table.
* Using the stage table, run an insert command to insert data into our target table (order_items).
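The three steps can be condensed into a single Scala cell. This is a minimal sketch under stated assumptions, not a replacement for the step-by-step cells in this section: it assumes the database follows the `${username}_retail` naming convention, that `order_items` already exists there as a Parquet table, and it adds `IF NOT EXISTS` and `OVERWRITE` (neither is in the original commands) so that the stage table is not duplicated on a rerun.

```scala
import org.apache.spark.sql.SparkSession

// Reuses the notebook's active session if there is one.
val spark = SparkSession.builder.enableHiveSupport.getOrCreate
val username = System.getProperty("user.name")

// Assumption: the database follows the ${username}_retail convention.
spark.sql(s"USE ${username}_retail")

// Step 1: stage table stored as comma-delimited text.
spark.sql("""
    CREATE TABLE IF NOT EXISTS order_items_stage (
      order_item_id INT,
      order_item_order_id INT,
      order_item_product_id INT,
      order_item_quantity INT,
      order_item_subtotal FLOAT,
      order_item_product_price FLOAT
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
""")

// Step 2: copy the local text files into the stage table.
// OVERWRITE replaces any files left in the stage table by an earlier run.
spark.sql("""
    LOAD DATA LOCAL INPATH '/data/retail_db/order_items'
        OVERWRITE INTO TABLE order_items_stage
""")

// Step 3: read the text data and write it to the target table,
// which stores it in its own (Parquet) format.
// Note: INSERT INTO appends, so rerunning this step adds the rows again.
spark.sql("""
    INSERT INTO TABLE order_items
    SELECT * FROM order_items_stage
""")
```

The conversion happens in step 3: the insert runs a query over the stage table and writes the result using the target table's file format.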

Let us see an example of inserting data into the target table from the stage table.

In [6]:
%%sql

USE itv002480_retail

++
||
++
++



In [7]:
%%sql

SHOW tables

+----------------+-----------+-----------+
|        database|  tableName|isTemporary|
+----------------+-----------+-----------+
|itv002480_retail|order_items|      false|
|itv002480_retail|     orders|      false|
+----------------+-----------+-----------+



In [8]:
%%sql

CREATE TABLE order_items_stage (
  order_item_id INT,
  order_item_order_id INT,
  order_item_product_id INT,
  order_item_quantity INT,
  order_item_subtotal FLOAT,
  order_item_product_price FLOAT
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','

++
||
++
++



In [9]:
spark.sql("DESCRIBE FORMATTED order_items_stage").show(200, false)

+----------------------------+--------------------------------------------------------------------------------------------+-------+
|col_name                    |data_type                                                                                   |comment|
+----------------------------+--------------------------------------------------------------------------------------------+-------+
|order_item_id               |int                                                                                         |null   |
|order_item_order_id         |int                                                                                         |null   |
|order_item_product_id       |int                                                                                         |null   |
|order_item_quantity         |int                                                                                         |null   |
|order_item_subtotal         |float                                         

In [10]:
%%sql

LOAD DATA LOCAL INPATH '/data/retail_db/order_items' INTO TABLE order_items_stage

++
||
++
++



In [11]:
%%sql

SELECT * FROM order_items_stage LIMIT 10

+-------------+-------------------+---------------------+-------------------+-------------------+------------------------+
|order_item_id|order_item_order_id|order_item_product_id|order_item_quantity|order_item_subtotal|order_item_product_price|
+-------------+-------------------+---------------------+-------------------+-------------------+------------------------+
|            1|                  1|                  957|                  1|             299.98|                  299.98|
|            2|                  2|                 1073|                  1|             199.99|                  199.99|
|            3|                  2|                  502|                  5|              250.0|                    50.0|
|            4|                  2|                  403|                  1|             129.99|                  129.99|
|            5|                  4|                  897|                  2|              49.98|                   24.99|
|            6| 

In [12]:
%%sql

TRUNCATE TABLE order_items

++
||
++
++



In [13]:
%%sql

INSERT INTO TABLE order_items
SELECT * FROM order_items_stage

++
||
++
++



In [14]:
%%sql

SELECT * FROM order_items LIMIT 10

+-------------+-------------------+---------------------+-------------------+-------------------+------------------------+
|order_item_id|order_item_order_id|order_item_product_id|order_item_quantity|order_item_subtotal|order_item_product_price|
+-------------+-------------------+---------------------+-------------------+-------------------+------------------------+
|            1|                  1|                  957|                  1|             299.98|                  299.98|
|            2|                  2|                 1073|                  1|             199.99|                  199.99|
|            3|                  2|                  502|                  5|              250.0|                    50.0|
|            4|                  2|                  403|                  1|             129.99|                  129.99|
|            5|                  4|                  897|                  2|              49.98|                   24.99|
|            6| 

In [15]:
%%sql

SELECT count(1) FROM order_items

+--------+
|count(1)|
+--------+
|  172198|
+--------+



* `INSERT INTO` will append data into the target table by adding new files.
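To observe the "adding new files" behaviour, one option is to count the data files in the table directory before and after an insert. The helper below is a hypothetical sketch: `parquetFileCount` is a name introduced here for illustration, and the path assumes the table sits at the default location under the warehouse directory configured at the start of this notebook.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

// Reuses the notebook's active session if there is one.
val spark = SparkSession.builder.enableHiveSupport.getOrCreate
val username = System.getProperty("user.name")

// Assumption: default table location under the configured warehouse directory.
val tablePath = new Path(s"/user/${username}/warehouse/${username}_retail.db/order_items")
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

// Counts Parquet data files, skipping markers such as _SUCCESS.
def parquetFileCount(): Int =
    fs.listStatus(tablePath).count(_.getPath.getName.endsWith(".parquet"))

println(s"Parquet files in order_items: ${parquetFileCount()}")
```

Running this cell before and after an `INSERT INTO` should show the count growing, since the existing files are left in place and new ones are written alongside them.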

In [16]:
%%sql

INSERT INTO TABLE order_items
SELECT * FROM order_items_stage

++
||
++
++



In [17]:
%%sql

SELECT * FROM order_items LIMIT 10

+-------------+-------------------+---------------------+-------------------+-------------------+------------------------+
|order_item_id|order_item_order_id|order_item_product_id|order_item_quantity|order_item_subtotal|order_item_product_price|
+-------------+-------------------+---------------------+-------------------+-------------------+------------------------+
|            1|                  1|                  957|                  1|             299.98|                  299.98|
|            2|                  2|                 1073|                  1|             199.99|                  199.99|
|            3|                  2|                  502|                  5|              250.0|                    50.0|
|            4|                  2|                  403|                  1|             129.99|                  129.99|
|            5|                  4|                  897|                  2|              49.98|                   24.99|
|            6| 

In [18]:
%%sql

SELECT count(1) FROM order_items

+--------+
|count(1)|
+--------+
|  344396|
+--------+



* `INSERT OVERWRITE` will overwrite the data in the target table by deleting the files related to old data from the directory pointed to by the Spark Metastore table.
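The same overwrite can also be expressed with the DataFrame writer instead of SQL. This is an alternative sketch, not the approach used in the cells of this notebook; it assumes both tables exist in the `${username}_retail` database with columns in the same order.

```scala
import org.apache.spark.sql.SparkSession

// Reuses the notebook's active session if there is one.
val spark = SparkSession.builder.enableHiveSupport.getOrCreate
val username = System.getProperty("user.name")

// Assumption: the database follows the ${username}_retail convention.
spark.sql(s"USE ${username}_retail")

// insertInto matches columns by position, like INSERT ... SELECT *.
// mode("overwrite") replaces the existing data;
// mode("append") would add to it, like INSERT INTO.
spark.
    table("order_items_stage").
    write.
    mode("overwrite").
    insertInto("order_items")

// After an overwrite, both tables should hold the same number of rows.
val stageCount = spark.table("order_items_stage").count
val targetCount = spark.table("order_items").count
println(s"stage = ${stageCount}, target = ${targetCount}")
```

Comparing the two counts is a quick sanity check that the old data was replaced and not appended to.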

In [19]:
%%sql

INSERT OVERWRITE TABLE order_items
SELECT * FROM order_items_stage

++
||
++
++



In [20]:
%%sql

SELECT * FROM order_items LIMIT 10

+-------------+-------------------+---------------------+-------------------+-------------------+------------------------+
|order_item_id|order_item_order_id|order_item_product_id|order_item_quantity|order_item_subtotal|order_item_product_price|
+-------------+-------------------+---------------------+-------------------+-------------------+------------------------+
|            1|                  1|                  957|                  1|             299.98|                  299.98|
|            2|                  2|                 1073|                  1|             199.99|                  199.99|
|            3|                  2|                  502|                  5|              250.0|                    50.0|
|            4|                  2|                  403|                  1|             129.99|                  129.99|
|            5|                  4|                  897|                  2|              49.98|                   24.99|
|            6| 

In [21]:
%%sql

SELECT count(1) FROM order_items

+--------+
|count(1)|
+--------+
|  172198|
+--------+



In [22]:
import sys.process._

s"hdfs dfs -ls /user/${username}/warehouse/${username}_retail.db/order_items" !

Found 3 items
-rw-r--r--   3 itv002480 supergroup          0 2022-05-30 07:19 /user/itv002480/warehouse/itv002480_retail.db/order_items/_SUCCESS
-rw-r--r--   3 itv002480 supergroup     862839 2022-05-30 07:19 /user/itv002480/warehouse/itv002480_retail.db/order_items/part-00000-545e4af5-f633-4249-a586-0c9d0fb33c8b-c000.snappy.parquet
-rw-r--r--   3 itv002480 supergroup     858034 2022-05-30 07:19 /user/itv002480/warehouse/itv002480_retail.db/order_items/part-00001-545e4af5-f633-4249-a586-0c9d0fb33c8b-c000.snappy.parquet




0