# PySpark Tutorial - Additional Examples

Create the `SparkSession`, the entry point for any PySpark program that uses DataFrames or SparkSQL.  Most programs store it in a variable named `spark`.

In [1]:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("PySparkTutorial").getOrCreate()

## Using SparkSQL

The following code creates a DataFrame by querying a Parquet file with SparkSQL directly from Python.

> Note that this method might seem familiar to SQL developers; however, it does not have access to all of the functionality available through the PySpark interface.

In [2]:
df = spark.sql("""SELECT policy
                       , make
                       , CASE
                           WHEN inception_date=start_date THEN 'New Business'
                           ELSE 'Renewal'
                         END AS status
                    FROM parquet.`./data/policy.parquet`
                    ORDER BY policy, start_date""")
df.show()

+-------+-------+------------+
| policy|   make|      status|
+-------+-------+------------+
|CAR0001| TOYOTA|New Business|
|CAR0001| TOYOTA|     Renewal|
|CAR0001| TOYOTA|     Renewal|
|CAR0002| SUBARU|New Business|
|CAR0003|   FORD|New Business|
|CAR0003|   FORD|     Renewal|
|CAR0003|   FORD|     Renewal|
|CAR0004|  MAZDA|New Business|
|CAR0004|  MAZDA|New Business|
|CAR0005| HOLDEN|New Business|
|CAR0006| SUZUKI|New Business|
|CAR0007|    BMW|New Business|
|CAR0008|   AUDI|New Business|
|CAR0009|  TESLA|New Business|
|CAR0009|  TESLA|     Renewal|
|CAR0010|HYUNDAI|New Business|
+-------+-------+------------+



Calling `stop()` on the Spark session when you are finished frees the resources it holds.  Note that after stopping the session you cannot re-run any of the code above without first re-creating the `spark` variable.

In [3]:
spark.stop()