
### Library Imports

In [1]:
from pyspark.sql import SparkSession
from pyspark.sql import types as T

### Template

In [2]:
spark = (
    SparkSession.builder
    .master("local")
    .appName("Section 4 - More Comfortable with SQL?")
    .config("spark.some.config.option", "some-value")
    .getOrCreate()
)

sc = spark.sparkContext

import os

data_path = "/data/pets.csv"
base_path = os.path.dirname(os.getcwd())
path = base_path + data_path

df = spark.read.csv(path, header=True)
df.toPandas()

Unnamed: 0,id,species_id,name,birthday,color
0,1,1,King,2014-11-22 12:30:31,brown
1,2,3,Argus,2016-11-22 10:05:10,


### Register DataFrame as a SQL Table

In [3]:
df.createOrReplaceTempView("pets")

### What Happened?
To make a `df` queryable with `SQL`, we first need to **register** it as a SQL table.

This particular function will **replace** any previously registered **local** (session-scoped) table named `pets`. Other functions register a dataframe with slightly different behavior; check the reference docs if this isn't what you want: [docs](https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.createGlobalTempView)

### Let's Write a SQL Query!

In [4]:
df_2 = spark.sql("""
SELECT 
    *
FROM pets
WHERE name = 'Argus'
""")

df_2.toPandas()

Unnamed: 0,id,species_id,name,birthday,color
0,2,3,Argus,2016-11-22 10:05:10,


### What Happened?
Once your `df` is registered, call the `sql` function on your `spark session` object. It takes a `sql string` as input and returns a new `df`.

### Conclusion?
If you're more comfortable writing `sql` than python/spark code, you can do so with a spark `df`! The steps are:
1. Register the `df` with `df.createOrReplaceTempView('table')`.
2. Call the `sql` function on your `spark session` with a `sql string` as input.
3. You're done!