# Hello PySpark

## Overview

In this section we submit our first Python-based script for execution with Spark. You can find
more examples on the official Apache Spark website. In particular, the <a href="https://spark.apache.org/docs/latest/api/python/getting_started/quickstart_df.html">Quickstart: DataFrame</a> guide covers many
introductory topics.

## Example

Spark applications start by initializing a ```SparkSession```, which is the entry point to Spark. When
running Spark via the REPL, we are provided with a ```SparkSession``` named ```spark```.
A script submitted to Spark has to create this variable itself.
Creating multiple ```SparkSession``` and ```SparkContext``` instances can cause issues, so it is best practice to use the ```SparkSession.builder.getOrCreate()``` method, as done in the ```hello_pyspark.py``` script below.
This returns the existing ```SparkSession``` if there is already one in the environment, or creates a new one if needed.

```
from pyspark.sql import SparkSession


APP_NAME = "Hello PySpark"

if __name__ == '__main__':

    # we need a SparkSession object in order
    # to be able to use the Spark API

    spark = SparkSession.builder.appName(APP_NAME).getOrCreate()

    print(f"Running spark app {APP_NAME}")
    spark.stop()
```

Submit your application to Spark by using the ```spark-submit``` script, as shown below:

```
<YOUR-PATH-TO-SPARK>/bin/spark-submit hello_pyspark.py
```

Spark's default output can be quite verbose, although this can be configured. Isolating the relevant output, you should
see something similar to what is shown below.

```
...

Running spark app Hello PySpark

...

```

## Summary

## References

1. Jules S. Damji, Brooke Wenig, Tathagata Das, Denny Lee, _Learning Spark: Lightning-Fast Data Analytics_, 2nd Edition, O'Reilly.