## Preparing Tables

Let us prepare the tables to solve the problem.

* Make sure the database is created.
* Create the **orders** table.
* Load data from the local path **/data/retail_db/orders** into the newly created **orders** table.
* Preview the data and get the count from **orders**.
* Create the **order_items** table.
* Load data from the local path **/data/retail_db/order_items** into the newly created **order_items** table.
* Preview the data and get the count from **order_items**.
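
The steps above can be sketched as a list of SQL statements built per table. This is only a minimal sketch written as a notebook cell, not the course's own code: `TableSpec` and `prepStatements` are hypothetical names introduced here, and the generated statements mirror the SQL used later in this section.

```scala
// Hypothetical helper type: describes one table to prepare.
final case class TableSpec(name: String, columns: Seq[(String, String)], localPath: String)

// Builds the statements for one table in the order listed above:
// create, load, preview, count.
def prepStatements(spec: TableSpec): Seq[String] = {
  val cols = spec.columns.map { case (n, t) => s"$n $t" }.mkString(", ")
  Seq(
    s"CREATE TABLE ${spec.name} ($cols) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','",
    s"LOAD DATA LOCAL INPATH '${spec.localPath}' INTO TABLE ${spec.name}",
    s"SELECT * FROM ${spec.name} LIMIT 10",
    s"SELECT count(1) FROM ${spec.name}"
  )
}

val orders = TableSpec(
  "orders",
  Seq("order_id" -> "INT", "order_date" -> "STRING",
      "order_customer_id" -> "INT", "order_status" -> "STRING"),
  "/data/retail_db/orders"
)

val ordersStatements = prepStatements(orders)
ordersStatements.foreach(println)
```

In the notebook, each generated statement could then be passed to `spark.sql` once the session is available.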

Once the tables and data are ready, we can write queries against them to perform basic transformations.

Let us start the Spark session for this Notebook so that we can execute the code provided. You can sign up for our [10 node state of the art cluster/labs](https://labs.itversity.com/plans) to learn Spark SQL using our unique integrated LMS.

In [1]:
val username = System.getProperty("user.name")

username = itv002480


itv002480

In [2]:
import org.apache.spark.sql.SparkSession

val username = System.getProperty("user.name")
val spark = SparkSession.
    builder.
    config("spark.ui.port", "0").
    config("spark.sql.warehouse.dir", s"/user/${username}/warehouse").
    enableHiveSupport.
    appName(s"${username} | Spark SQL - Basic Transformations").
    master("yarn").
    getOrCreate

username = itv002480
spark = org.apache.spark.sql.SparkSession@38d7c00a


org.apache.spark.sql.SparkSession@38d7c00a

If you are going to use CLIs, you can launch Spark SQL using one of the following three approaches.

**Using Spark SQL**

```
spark2-sql \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```

**Using Scala**

```
spark2-shell \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```

**Using Pyspark**

```
pyspark2 \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```

In [3]:
%%sql

DROP DATABASE IF EXISTS itv002480_retail CASCADE

Waiting for a Spark session to start...

++
||
++
++



In [4]:
%%sql

CREATE DATABASE IF NOT EXISTS itv002480_retail

++
||
++
++



In [5]:
%%sql
USE itv002480_retail

++
||
++
++



In [6]:
%%sql
SHOW tables

+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
+--------+---------+-----------+



In [7]:
%%sql

DROP TABLE IF EXISTS orders

In [8]:
%%sql

CREATE TABLE orders (
    order_id INT,
    order_date STRING,
    order_customer_id INT,
    order_status STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','

++
||
++
++
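
To illustrate what `ROW FORMAT DELIMITED FIELDS TERMINATED BY ','` implies, here is a rough sketch of how one comma-delimited line maps onto the four columns of **orders**. The `Order` case class and `parseOrder` function are hypothetical names used only for illustration, and the sample line (including its timestamp format) is an assumption rather than a line taken from the file. Unlike the table itself, this sketch rejects a malformed line outright.

```scala
// Mirrors the columns of the orders table.
final case class Order(orderId: Int, orderDate: String, orderCustomerId: Int, orderStatus: String)

// Splits one comma-delimited line into the four columns.
// Returns None when the line does not have exactly four fields
// or a numeric field cannot be parsed.
def parseOrder(line: String): Option[Order] =
  line.split(",", -1) match {
    case Array(id, date, customerId, status) =>
      scala.util.Try(Order(id.trim.toInt, date, customerId.trim.toInt, status)).toOption
    case _ => None
  }

// Hypothetical sample line; the exact timestamp format in the file is an assumption.
val sample = "34565,2014-02-23 00:00:00.0,8702,COMPLETE"
println(parseOrder(sample))
```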



In [9]:
import sys.process._
val username = System.getProperty("user.name")
// Use the current user for both the home directory and the database name
s"hdfs dfs -ls /user/${username}/warehouse/${username}_retail.db/orders".!

username = itv002480




0
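
The directory listed above follows from the `spark.sql.warehouse.dir` setting used when the session was created: a managed table's files are expected under `<warehouse dir>/<database>.db/<table>`. A minimal sketch of that derivation follows; `tableDir` is a hypothetical helper name, and the user id is simply the one shown in this notebook's output.

```scala
// Hypothetical helper: where a managed table's files are expected on HDFS,
// given the warehouse directory configured on the SparkSession.
def tableDir(warehouseDir: String, database: String, table: String): String =
  s"${warehouseDir.stripSuffix("/")}/${database.toLowerCase}.db/${table.toLowerCase}"

val user = "itv002480" // the user shown in this notebook's output
val ordersDir = tableDir(s"/user/$user/warehouse", s"${user}_retail", "orders")
println(ordersDir)
```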

In [10]:
%%sql

LOAD DATA LOCAL INPATH '/data/retail_db/orders' INTO TABLE orders

++
||
++
++



In [11]:
import sys.process._
val username = System.getProperty("user.name")
// After the load, the table directory should contain the copied data file
s"hdfs dfs -ls /user/${username}/warehouse/${username}_retail.db/orders".!

Found 1 items
-rwxr-xr-x   3 itv002480 supergroup    2999944 2022-05-26 06:59 /user/itv002480/warehouse/itv002480_retail.db/orders/part-00000


username = itv002480




0

In [12]:
%%sql

SELECT * FROM orders LIMIT 10

+--------+--------------------+-----------------+---------------+
|order_id|          order_date|order_customer_id|   order_status|
+--------+--------------------+-----------------+---------------+
|   34565|2014-02-23 00:00:...|             8702|       COMPLETE|
|   34566|2014-02-23 00:00:...|             3066|PENDING_PAYMENT|
|   34567|2014-02-23 00:00:...|             7314|SUSPECTED_FRAUD|
|   34568|2014-02-23 00:00:...|             1271|       COMPLETE|
|   34569|2014-02-23 00:00:...|            11083|       COMPLETE|
|   34570|2014-02-23 00:00:...|             3159|         CLOSED|
|   34571|2014-02-23 00:00:...|             4551|         CLOSED|
|   34572|2014-02-23 00:00:...|             8135|        PENDING|
|   34573|2014-02-23 00:00:...|             7497|PENDING_PAYMENT|
|   34574|2014-02-23 00:00:...|             1868|        ON_HOLD|
+--------+--------------------+-----------------+---------------+



In [13]:
%%sql

SELECT count(1) FROM orders

+--------+
|count(1)|
+--------+
|   68883|
+--------+



In [14]:
%%sql

DROP TABLE IF EXISTS order_items

In [15]:
%%sql 

CREATE TABLE order_items (
    order_item_id INT,
    order_item_order_id INT,
    order_item_product_id INT,
    order_item_quantity INT,
    order_item_subtotal FLOAT,
    order_item_product_price FLOAT
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','

++
||
++
++



In [16]:
import sys.process._
val username = System.getProperty("user.name")
// Use the current user for both the home directory and the database name
s"hdfs dfs -ls /user/${username}/warehouse/${username}_retail.db/order_items".!

username = itv002480




0

In [17]:
%%sql

LOAD DATA LOCAL INPATH '/data/retail_db/order_items' INTO TABLE order_items

++
||
++
++



In [18]:
import sys.process._
val username = System.getProperty("user.name")
// After the load, the table directory should contain the copied data file
s"hdfs dfs -ls /user/${username}/warehouse/${username}_retail.db/order_items".!

Found 1 items
-rwxr-xr-x   3 itv002480 supergroup    5408880 2022-05-26 06:59 /user/itv002480/warehouse/itv002480_retail.db/order_items/part-00000


username = itv002480




0

In [19]:
%%sql

SELECT * FROM order_items LIMIT 10

+-------------+-------------------+---------------------+-------------------+-------------------+------------------------+
|order_item_id|order_item_order_id|order_item_product_id|order_item_quantity|order_item_subtotal|order_item_product_price|
+-------------+-------------------+---------------------+-------------------+-------------------+------------------------+
|            1|                  1|                  957|                  1|             299.98|                  299.98|
|            2|                  2|                 1073|                  1|             199.99|                  199.99|
|            3|                  2|                  502|                  5|              250.0|                    50.0|
|            4|                  2|                  403|                  1|             129.99|                  129.99|
|            5|                  4|                  897|                  2|              49.98|                   24.99|
|            6| 

In [20]:
%%sql

SELECT count(1) FROM order_items

+--------+
|count(1)|
+--------+
|  172198|
+--------+
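
As a quick sanity check on the loaded **order_items** data, the rows shown in the preview can be tested for internal consistency. The sketch below assumes that `order_item_subtotal` equals `order_item_quantity * order_item_product_price`; that relationship is inferred from the preview rows and is not stated by the source. `subtotalMatches` is a hypothetical helper name.

```scala
// Rows copied from the order_items preview: (quantity, subtotal, product price).
val previewRows: Seq[(Int, Float, Float)] = Seq(
  (1, 299.98f, 299.98f),
  (1, 199.99f, 199.99f),
  (5, 250.0f, 50.0f),
  (1, 129.99f, 129.99f),
  (2, 49.98f, 24.99f)
)

// FLOAT columns are approximate, so compare with a small tolerance
// instead of testing for exact equality.
def subtotalMatches(quantity: Int, subtotal: Float, price: Float, tolerance: Float = 0.01f): Boolean =
  math.abs(quantity * price - subtotal) <= tolerance

val allConsistent = previewRows.forall { case (q, s, p) => subtotalMatches(q, s, p) }
println(allConsistent)
```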



* The same steps can also be run with `spark.sql` from Scala or Python. The cells below use Scala.

In [44]:
spark.sql(s"DROP DATABASE IF EXISTS ${username}_retail CASCADE")

[]

In [45]:
spark.sql(s"CREATE DATABASE IF NOT EXISTS ${username}_retail")

[]

In [46]:
spark.sql(s"USE ${username}_retail")

[]

In [47]:
spark.sql("SHOW tables").show()

In [48]:
spark.sql("DROP TABLE IF EXISTS orders")

In [49]:
spark.sql("""
CREATE TABLE orders (
    order_id INT,
    order_date STRING,
    order_customer_id INT,
    order_status STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
""")


In [50]:
import sys.process._
val username = System.getProperty("user.name")
// Use the current user for both the home directory and the database name
s"hdfs dfs -ls /user/${username}/warehouse/${username}_retail.db/orders".!

In [51]:
spark.sql("LOAD DATA LOCAL INPATH '/data/retail_db/orders' INTO TABLE orders")


In [52]:
import sys.process._
val username = System.getProperty("user.name")
// After the load, the table directory should contain the copied data file
s"hdfs dfs -ls /user/${username}/warehouse/${username}_retail.db/orders".!

In [53]:
spark.sql("SELECT * FROM orders LIMIT 10").show()


In [54]:
spark.sql("SELECT count(1) FROM orders").show()


In [55]:
spark.sql("DROP TABLE IF EXISTS order_items")

In [56]:
spark.sql("""
CREATE TABLE order_items (
    order_item_id INT,
    order_item_order_id INT,
    order_item_product_id INT,
    order_item_quantity INT,
    order_item_subtotal FLOAT,
    order_item_product_price FLOAT
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
""")


In [57]:
import sys.process._
val username = System.getProperty("user.name")
// Use the current user for both the home directory and the database name
s"hdfs dfs -ls /user/${username}/warehouse/${username}_retail.db/order_items".!

In [58]:
spark.sql("LOAD DATA LOCAL INPATH '/data/retail_db/order_items' INTO TABLE order_items")


In [59]:
import sys.process._
val username = System.getProperty("user.name")
// After the load, the table directory should contain the copied data file
s"hdfs dfs -ls /user/${username}/warehouse/${username}_retail.db/order_items".!

In [60]:
spark.sql("SELECT * FROM order_items LIMIT 10").show()


In [61]:
spark.sql("SELECT count(1) FROM order_items").show()
