### PySpark ETL
This example shows how to use Spark SQL to perform ETL on a set of relational tables.
<br>
<br>
**Tables**:
1. Customer: Information such as State, Gender and Customer ID
2. Product: List of SKUs the retail chain sells
3. Transaction: Point of Sale (POS) data for customers

### Create a Spark Session

In [1]:
from pyspark.sql import SparkSession

#Create Spark Session
spark = SparkSession.builder.appName("simple").getOrCreate()

### Load Files from HDFS

In [2]:
#Read in Customer file from HDFS
Customer = spark.read.csv('hdfs://localhost:54310/user/andrew/data/relational_example/Customer.csv', header=True)

#Register for SQL Use
Customer.createOrReplaceTempView("Customer")

Customer.show(10)

+----------+-----+------+-----------+--------------------+----------+--------------------+--------------+
|CustomerID|State|Gender|  FirstName|            LastName|Birth_Date|             Address|Member_Type_ID|
+----------+-----+------+-----------+--------------------+----------+--------------------+--------------+
|         1|   KY|     M|     Albert|              Collet| 24Nov1940|Square Edouard Vii 1|          2030|
|         2|   ME|     F|   Mercedes|            Martínez| 15Jan1955|          Edificio 2|          1010|
|         3|   IN|     M|Pier Egidio|              Boeris| 01Jul1970|Via M. Di Monteso...|          1040|
|         4|   NY|     M|      James|             Kvarniq| 27Jun1970|      4382 Gralyn Rd|          1020|
|         5|   NY|     F|   Sandrina|            Stephano| 09Jul1975|    6468 Cog Hill Ct|          2020|
|         6|   OH|     M|       Rent|            Van Lint| 23Dec1945|         Mispadstr 2|          1030|
|         7|   ME|     F|     Julián|Escorihue
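
Customer names such as Martínez and Julián contain non-ASCII characters, and `spark.read.csv` decodes files as UTF-8 unless told otherwise. If the CSV was saved in another encoding, those characters come back as the replacement character (U+FFFD). The sketch below reproduces the effect in plain Python and shows a hypothetical helper, `read_customer_csv`, that passes the reader's `encoding` option. The choice of ISO-8859-1 is an assumption about how the file was saved, not something the data confirms.

```python
# Reproduce the garbling: Latin-1 bytes decoded as UTF-8.
raw = "Martínez".encode("iso-8859-1")
garbled = raw.decode("utf-8", errors="replace")  # the "í" byte is not valid UTF-8
repaired = raw.decode("iso-8859-1")              # decoding with the right codec


def read_customer_csv(spark, path, encoding="ISO-8859-1"):
    """Hypothetical helper: read the CSV with an explicit encoding.

    `spark` is an existing SparkSession. The default encoding here is an
    assumption about the source file; change it to match the real data.
    """
    return spark.read.csv(path, header=True, encoding=encoding)
```

If the names still look wrong after this change, the file is probably in yet another encoding, and it is worth inspecting the raw bytes before guessing further.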

In [3]:
#Read in Product file from HDFS
Product = spark.read.csv('hdfs://localhost:54310/user/andrew/data/relational_example/Product.csv', header=True)

#Register for SQL Use
Product.createOrReplaceTempView("Product")

Product.show(10)

+--------+--------------------+-----------+-------+
|itemcode|                item|   category| Amount|
+--------+--------------------+-----------+-------+
|     111|             SEP IRA| Retirement|$550.00|
|     112|              Keough| Retirement|$325.00|
|     113|          Simple IRA| Retirement|$125.00|
|     114|         US Equities|Mutual Fund|  $2.80|
|     115|International Equ...|Mutual Fund|$325.00|
|     116|      Lifestyle Fund|Mutual Fund|$245.00|
|     117|        Money Market|Mutual Fund|$268.00|
|     118|Mutual Fund - US ...|Mutual Fund| $75.00|
|     121|    Asset Allocation|Mutual Fund|$850.00|
|     122|        Fixed Income|Mutual Fund|$268.00|
+--------+--------------------+-----------+-------+
only showing top 10 rows
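
Because the file is read without a schema, `Amount` arrives as a string such as `$550.00`, which cannot be summed or compared numerically. A minimal sketch of one way to clean it follows, assuming the only non-numeric characters are `$` and thousands separators. `AMOUNT_EXPR`, `parse_amount`, `product_with_numeric_amount`, and the `amount_num` column are illustrative names, not part of the original notebook.

```python
import re
from decimal import Decimal

# Spark SQL expression (assumed cleanup rule): strip "$" and "," and cast.
AMOUNT_EXPR = "CAST(regexp_replace(Amount, '[$,]', '') AS DECIMAL(10,2))"


def parse_amount(text):
    """Pure-Python equivalent of AMOUNT_EXPR, handy for checking the rule."""
    return Decimal(re.sub(r"[$,]", "", text))


def product_with_numeric_amount(spark):
    """Hypothetical helper: requires the "Product" temp view to be registered."""
    return spark.sql(f"SELECT *, {AMOUNT_EXPR} AS amount_num FROM Product")
```

Calling `product_with_numeric_amount(spark).printSchema()` should then list `amount_num` as a decimal column alongside the original string column.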


In [4]:
#Read in Transaction file from HDFS
Transaction = spark.read.csv('hdfs://localhost:54310/user/andrew/data/relational_example/Transaction.csv', header=True)

#Register for SQL Use
Transaction.createOrReplaceTempView("Transaction")

Transaction.show(10)

+------------+--------+--------+----------+-----------+---------+
|order_number|quantity|itemcode|order_type|Customer_ID|     date|
+------------+--------+--------+----------+-----------+---------+
|        1002|       1|     324|         1|          2|02Jan2016|
|        1002|       1|     322|         1|          2|02Jan2016|
|        1001|       1|     322|         0|          1|02Jan2016|
|        1003|       1|     324|         1|          3|04Jan2016|
|        1004|       1|     324|         1|          4|04Jan2016|
|        1005|       2|     314|         1|          5|04Jan2016|
|        1006|       1|     314|         0|          6|05Jan2016|
|        1007|       1|     323|         0|          7|05Jan2016|
|        1008|       1|     323|         0|          8|08Jan2016|
|        1009|       4|     221|         1|          9|08Jan2016|
+------------+--------+--------+----------+-----------+---------+
only showing top 10 rows
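
The `date` column holds strings such as `02Jan2016`, so filtering or grouping by date needs a conversion first. The sketch below uses Spark SQL's `to_date` with the pattern `ddMMMyyyy`, which is an assumption inferred from the sample rows and worth checking against your Spark version's datetime pattern rules. `ORDER_DATE_EXPR`, `parse_pos_date`, `transactions_with_dates`, and the `order_date` column are illustrative names.

```python
from datetime import date, datetime

# Spark SQL expression; 'ddMMMyyyy' is the assumed pattern for 02Jan2016.
ORDER_DATE_EXPR = "to_date(`date`, 'ddMMMyyyy')"


def parse_pos_date(text):
    """Pure-Python check of the same layout, assuming English month names."""
    return datetime.strptime(text, "%d%b%Y").date()


def transactions_with_dates(spark):
    """Hypothetical helper: requires a "Transaction" temp view that was
    built from the Transaction DataFrame."""
    return spark.sql(
        f"SELECT *, {ORDER_DATE_EXPR} AS order_date FROM Transaction"
    )
```

The same pattern should also apply to `Birth_Date` in the Customer table, since it follows the same layout.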


### See Tables

In [5]:
spark.sql("show tables").show()

+--------+-----------+-----------+
|database|  tableName|isTemporary|
+--------+-----------+-----------+
|        |   customer|       true|
|        |    product|       true|
|        |transaction|       true|
+--------+-----------+-----------+
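
With all three views registered, they can be combined in a single Spark SQL query. The sketch below assumes that `Transaction.Customer_ID` refers to `Customer.CustomerID` and that `Transaction.itemcode` refers to `Product.itemcode`; the column names come from the output above, but the key relationships are an assumption. `JOIN_SQL` and `enriched_transactions` are illustrative names.

```python
# Join each transaction to its customer and product (assumed key columns).
JOIN_SQL = """
SELECT t.order_number,
       t.quantity,
       c.State,
       c.Gender,
       p.item,
       p.category
FROM Transaction t
JOIN Customer c ON t.Customer_ID = c.CustomerID
JOIN Product p ON t.itemcode = p.itemcode
"""


def enriched_transactions(spark):
    """Hypothetical helper: run the join on an existing SparkSession that
    has the Customer, Product and Transaction temp views registered."""
    return spark.sql(JOIN_SQL)
```

For example, `enriched_transactions(spark).show(10)` would print the first ten joined rows. All columns were read as strings, so the join compares keys as text.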

