## Managed Tables - Exercise

Let us use the NYSE data and see how we can create tables in the Spark Metastore.

Let us start the Spark session for this notebook so that we can execute the code provided. You can sign up for our [10-node state-of-the-art cluster/labs](https://labs.itversity.com/plans) to learn Spark SQL using our unique integrated LMS.

In [2]:
val username = System.getProperty("user.name")

username = itv002480


itv002480

In [10]:
import org.apache.spark.sql.SparkSession

val username = System.getProperty("user.name")
val spark = SparkSession.
    builder.
    config("spark.ui.port", "0").
    config("spark.sql.warehouse.dir", s"/user/${username}/warehouse").
    enableHiveSupport.
    appName(s"${username} | Spark SQL - Managing Tables - Basic DDL and DML").
    master("yarn").
    getOrCreate

username = itv002480
spark = org.apache.spark.sql.SparkSession@74cca3c0


org.apache.spark.sql.SparkSession@74cca3c0

If you are going to use a CLI, you can run Spark SQL using one of the following three approaches.

**Using Spark SQL**

```
spark2-sql \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```

**Using Scala**

```
spark2-shell \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```

**Using Pyspark**

```
pyspark2 \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```

* Duration: **30 Minutes**
* Data Location (Local): /data/nyse_all/nyse_data
* Create a database named YOUR_OS_USER_NAME_nyse
* Table Name: nyse_eod
* File Format: TEXTFILE (default)
* Review the files using Linux commands before using the data sets. The data is compressed, and we can load the files as is.
* Copy one of the zip files to your home directory and preview the data. There should be 7 fields; you need to determine the delimiter.
* Field Names: stockticker, tradedate, openprice, highprice, lowprice, closeprice, volume
* Determine the correct data types based on the values. For example, you need to use `BIGINT` for volume, not `INT`.
* Create a managed table with the default delimiter.
> Because the delimiter in the data differs from the table's delimiter, you need to figure out how to get the data into the target table.
* Make sure the data is copied into the table according to the defined structure, and validate it.
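
Previewing a few lines with Linux commands is usually enough to spot the delimiter and the value ranges, but the same checks can be expressed in code. The following is a minimal sketch in Scala, assuming a short list of candidate delimiters (comma, tab, pipe, and Ctrl-A); the object `SampleCheck` and its helpers are hypothetical, and the sample record is modeled on the rows shown in the validation output of this notebook.

```scala
// Hypothetical helpers for profiling a few sample records before creating the table.
object SampleCheck {
  // Candidate delimiters to try; '\u0001' is Ctrl-A, the Hive default.
  val candidates: Seq[Char] = Seq(',', '\t', '|', '\u0001')

  // Returns the first candidate that splits every sample line into exactly
  // `expectedFields` fields, or None if no candidate fits.
  def detectDelimiter(lines: Seq[String], expectedFields: Int): Option[Char] =
    candidates.find { d =>
      lines.nonEmpty && lines.forall(_.split(d).length == expectedFields)
    }

  // True if any value is larger than Int.MaxValue (2147483647), in which case
  // the column needs BIGINT rather than INT.
  def needsBigint(values: Seq[String]): Boolean =
    values.exists(v => BigInt(v.trim) > BigInt(Int.MaxValue))
}
```

For example, `SampleCheck.detectDelimiter(Seq("AA,19980101,52.77,52.77,52.77,52.77,0"), 7)` returns `Some(',')`. A small sample may not contain the largest volumes, so a `false` from `needsBigint` does not override the exercise's guidance to use `BIGINT`.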

In [2]:
%%sql
CREATE DATABASE IF NOT EXISTS itv002480_nyse

In [3]:
%%sql
use itv002480_nyse

In [4]:
%%sql
DROP TABLE IF EXISTS nyse_eod

In [5]:
%%sql

CREATE TABLE nyse_eod (
  stockticker STRING,
  trade_date INT,
  open_price FLOAT,
  high_price FLOAT,
  low_price FLOAT,
  close_price FLOAT,
  volume BIGINT
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
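
The `CREATE TABLE` above declares `FIELDS TERMINATED BY ','`, so the table matches the files rather than using the default delimiter the exercise asks for. One common way to meet that requirement is to load the raw files into a comma-delimited staging table and then copy the rows into a table created without a `ROW FORMAT` clause, which uses Hive's default Ctrl-A (`\001`) field delimiter. The following is a minimal sketch that only builds the SQL statements; the staging table name `nyse_eod_stage` and the helper `StagingLoad.stagingStatements` are hypothetical, and note that it drops and recreates `nyse_eod`.

```scala
// Hypothetical sketch: statements for loading via a comma-delimited staging table.
object StagingLoad {
  // Column definitions copied from the CREATE TABLE cell above.
  val columns: Seq[(String, String)] = Seq(
    "stockticker" -> "STRING",
    "trade_date"  -> "INT",
    "open_price"  -> "FLOAT",
    "high_price"  -> "FLOAT",
    "low_price"   -> "FLOAT",
    "close_price" -> "FLOAT",
    "volume"      -> "BIGINT"
  )

  // "name TYPE, name TYPE, ..." for use inside CREATE TABLE.
  val columnList: String =
    columns.map { case (name, dataType) => s"$name $dataType" }.mkString(", ")

  // Builds the statements in execution order. The staging table name is hypothetical.
  def stagingStatements(db: String, files: Seq[String]): Seq[String] = {
    val stage  = s"$db.nyse_eod_stage"
    val target = s"$db.nyse_eod"
    val setup = Seq(
      s"DROP TABLE IF EXISTS $stage",
      // The staging table matches the comma-delimited source files.
      s"CREATE TABLE $stage ($columnList) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE",
      s"DROP TABLE IF EXISTS $target",
      // No ROW FORMAT clause, so the target keeps the default delimiter.
      s"CREATE TABLE $target ($columnList) STORED AS TEXTFILE"
    )
    val loads = files.map(f => s"LOAD DATA LOCAL INPATH '$f' INTO TABLE $stage")
    val copy = Seq(
      // Rows written by INSERT ... SELECT use the target table's delimiter.
      s"INSERT INTO TABLE $target SELECT * FROM $stage",
      s"DROP TABLE $stage"
    )
    setup ++ loads ++ copy
  }
}
```

In the notebook session, the statements could then be run in order with `stmts.foreach(stmt => spark.sql(stmt))`, where `stmts` is the result of `stagingStatements`.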

In [6]:
%%sql

LOAD DATA LOCAL INPATH '/home/itv002480/NYSE_2007.txt' INTO TABLE nyse_eod

In [9]:
%%sql

LOAD DATA LOCAL INPATH '/home/itv002480/NYSE_2008.txt' INTO TABLE nyse_eod

In [10]:
%%sql

LOAD DATA LOCAL INPATH '/home/itv002480/NYSE_2009.txt' INTO TABLE nyse_eod

In [11]:
%%sql

LOAD DATA LOCAL INPATH '/home/itv002480/NYSE_2010.txt' INTO TABLE nyse_eod

In [13]:
%%sql

LOAD DATA LOCAL INPATH '/home/itv002480/NYSE_2011.txt' INTO TABLE nyse_eod

In [14]:
%%sql

LOAD DATA LOCAL INPATH '/home/itv002480/NYSE_2012.txt' INTO TABLE nyse_eod

In [15]:
%%sql

LOAD DATA LOCAL INPATH '/home/itv002480/NYSE_2013.txt' INTO TABLE nyse_eod

In [16]:
%%sql

LOAD DATA LOCAL INPATH '/home/itv002480/NYSE_2014.txt' INTO TABLE nyse_eod

In [17]:
%%sql

LOAD DATA LOCAL INPATH '/home/itv002480/NYSE_2015.txt' INTO TABLE nyse_eod

In [18]:
%%sql

LOAD DATA LOCAL INPATH '/home/itv002480/NYSE_2016.txt' INTO TABLE nyse_eod

In [19]:
%%sql

LOAD DATA LOCAL INPATH '/home/itv002480/NYSE_2017.txt' INTO TABLE nyse_eod

In [20]:
%%sql

LOAD DATA LOCAL INPATH '/home/itv002480/NYSE_1997.txt' INTO TABLE nyse_eod

In [21]:
%%sql

LOAD DATA LOCAL INPATH '/home/itv002480/NYSE_1998.txt' INTO TABLE nyse_eod

In [22]:
%%sql

LOAD DATA LOCAL INPATH '/home/itv002480/NYSE_1999.txt' INTO TABLE nyse_eod

In [23]:
%%sql

LOAD DATA LOCAL INPATH '/home/itv002480/NYSE_2000.txt' INTO TABLE nyse_eod

In [24]:
%%sql

LOAD DATA LOCAL INPATH '/home/itv002480/NYSE_2001.txt' INTO TABLE nyse_eod

In [25]:
%%sql

LOAD DATA LOCAL INPATH '/home/itv002480/NYSE_2002.txt' INTO TABLE nyse_eod

In [26]:
%%sql

LOAD DATA LOCAL INPATH '/home/itv002480/NYSE_2003.txt' INTO TABLE nyse_eod

In [27]:
%%sql

LOAD DATA LOCAL INPATH '/home/itv002480/NYSE_2004.txt' INTO TABLE nyse_eod

In [28]:
%%sql

LOAD DATA LOCAL INPATH '/home/itv002480/NYSE_2005.txt' INTO TABLE nyse_eod

In [29]:
%%sql

LOAD DATA LOCAL INPATH '/home/itv002480/NYSE_2006.txt' INTO TABLE nyse_eod
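
The cells above issue the same `LOAD DATA` statement once per year, 2007 through 2017 and then 1997 through 2006. As a minimal sketch, assuming every file follows the `/home/<user>/NYSE_<year>.txt` naming used in those cells, the statements could be generated instead of typed out; `YearlyLoads.loadStatements` is a hypothetical helper.

```scala
// Hypothetical helper: one LOAD DATA statement per year, following the
// file naming used in the cells above.
object YearlyLoads {
  def loadStatements(user: String, years: Range): Seq[String] =
    years.map { year =>
      s"LOAD DATA LOCAL INPATH '/home/$user/NYSE_$year.txt' INTO TABLE nyse_eod"
    }
}
```

In the notebook session, `YearlyLoads.loadStatements(username, 1997 to 2017).foreach(stmt => spark.sql(stmt))` would run all 21 loads, assuming `itv002480_nyse` is the current database. Generating the list also makes a missing or repeated year easier to spot, since loading the same file twice with `INTO TABLE` appends its rows again.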


### Validation

Run the following queries to make sure that you can read the data.

```
DESCRIBE FORMATTED YOUR_OS_USER_NAME_nyse.nyse_eod;
SELECT * FROM YOUR_OS_USER_NAME_nyse.nyse_eod LIMIT 10;
SELECT count(1) FROM YOUR_OS_USER_NAME_nyse.nyse_eod;
```
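
Besides the queries above, `count(1)` can be cross-checked against the source files. Assuming one record per line and no header row, the table's count should equal the total number of lines in the files that were loaded. The following minimal sketch counts lines in local files with `scala.io.Source`; `RecordCount.expectedRecords` is a hypothetical helper.

```scala
import scala.io.Source

// Hypothetical helper for validation: with one record per line and no
// header, the total line count should match SELECT count(1) on the table.
object RecordCount {
  def expectedRecords(paths: Seq[String]): Long =
    paths.map { path =>
      val source = Source.fromFile(path)
      try source.getLines().size.toLong
      finally source.close() // always release the file handle
    }.sum
}
```

For example, `RecordCount.expectedRecords((1997 to 2017).map(y => s"/home/$username/NYSE_$y.txt"))` would give the number to compare with the result of `SELECT count(1)`.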

In [30]:
// The table should not have a custom field delimiter, because the requirement is to use the default delimiter
spark.sql("DESCRIBE FORMATTED itv002480_nyse.nyse_eod").show(200, false)

+----------------------------+---------------------------------------------------------------------------------+-------+
|col_name                    |data_type                                                                        |comment|
+----------------------------+---------------------------------------------------------------------------------+-------+
|stockticker                 |string                                                                           |null   |
|trade_date                  |int                                                                              |null   |
|open_price                  |float                                                                            |null   |
|high_price                  |float                                                                            |null   |
|low_price                   |float                                                                            |null   |
|close_price                 |fl

In [31]:
%%sql

SELECT * FROM itv002480_nyse.nyse_eod LIMIT 10

+-----------+----------+----------+----------+---------+-----------+------+
|stockticker|trade_date|open_price|high_price|low_price|close_price|volume|
+-----------+----------+----------+----------+---------+-----------+------+
|         AA|  19980101|     52.77|     52.77|    52.77|      52.77|     0|
|        ABC|  19980101|      7.28|      7.28|     7.28|       7.28|     0|
|        ABM|  19980101|     15.28|     15.28|    15.28|      15.28|     0|
|        ABT|  19980101|     32.75|     32.75|    32.75|      32.75|     0|
|        ABX|  19980101|     18.62|     18.62|    18.62|      18.62|     0|
|        ACP|  19980101|      9.75|      9.75|     9.75|       9.75|     0|
|        ACV|  19980101|     21.37|     21.37|    21.37|      21.37|     0|
|        ADC|  19980101|     21.75|     21.75|    21.75|      21.75|     0|
|        ADM|  19980101|     17.84|     17.84|    17.84|      17.84|     0|
|        ADX|  19980101|     16.12|     16.12|    16.12|      16.12|     0|
+-----------+----------+----------+----------+---------+-----------+------+

In [32]:
%%sql

SELECT count(1) FROM itv002480_nyse.nyse_eod

+--------+
|count(1)|
+--------+
| 9384739|
+--------+

