# Spark: Make sure your environment is configured properly

This notebook only checks that you can access Spark and use it through a Jupyter notebook, so you don't need to pay any attention to the code in it. Just run all the cells and confirm that you are able to instantiate Spark and use it.

* I am using Azure Notebooks for these notebooks, as I can access them from anywhere.
* Note that a notebook instance persists for only 60 days from the time it goes inactive.
* Make sure you don't store any state here. Use external object storage instead, such as:
    - `AWS S3` 
    - `GCP GCS`
    - `Azure Blob Storage`
    
**Run in Cluster-Mode**

You can run all of these tutorials in cluster mode by running the following command:

```sh
docker-compose -f spark-cluster.yml up -d
```

Then open `Jupyter Notebook` at `http://localhost:8888`. The Spark master web UI is available at `http://localhost:8080`. All the notebooks and other resources are mounted for you.
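
To confirm that both services came up after the `docker-compose` command, a quick reachability check can help. The following is a minimal sketch using only the Python standard library. The helper name `check_endpoints` and the two-second timeout are illustrative choices, and the URLs are the ones mentioned above. It assumes you run it on the host machine; from inside a container, `localhost` would point at the container itself.

```python
import http.client
import urllib.error
import urllib.request

# URLs taken from the text above; adjust them if your compose file maps other ports.
ENDPOINTS = {
    "jupyter": "http://localhost:8888",
    "spark_master_ui": "http://localhost:8080",
}


def check_endpoints(endpoints=ENDPOINTS, timeout=2.0):
    """Return {name: bool}, where True means the URL answered over HTTP."""
    status = {}
    for name, url in endpoints.items():
        try:
            with urllib.request.urlopen(url, timeout=timeout):
                status[name] = True
        except urllib.error.HTTPError:
            # The server answered, just not with a 2xx code (for example a 403).
            status[name] = True
        except (urllib.error.URLError, OSError, http.client.HTTPException):
            # Connection refused, timeout, DNS failure, or a non-HTTP reply.
            status[name] = False
    return status
```

Calling `check_endpoints()` returns a dictionary with one boolean per service. A `False` entry usually means the container is not running yet or the port mapping differs from the one assumed here.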

In [38]:
# Imports
import random
import pyspark
import os
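
If the imports above fail, or a later cell cannot start Spark, it can help to check a few prerequisites first. The following is a minimal sketch using only the standard library. `preflight` is a hypothetical helper name, and `JAVA_HOME` and `SPARK_HOME` are conventional environment variables that may legitimately be unset in your setup.

```python
import importlib.util
import os
import shutil


def preflight():
    """Collect a few facts that commonly explain a failing Spark start-up."""
    return {
        # True when the pyspark package can be found on the current Python path.
        "pyspark_importable": importlib.util.find_spec("pyspark") is not None,
        # Spark runs on the JVM, so a `java` executable is normally required.
        "java_on_path": shutil.which("java") is not None,
        # Optional variables; None means "not set".
        "JAVA_HOME": os.environ.get("JAVA_HOME"),
        "SPARK_HOME": os.environ.get("SPARK_HOME"),
    }


for key, value in preflight().items():
    print(f"{key}: {value}")
```

If `pyspark_importable` is `False`, the package is not installed in the kernel's environment. If `java_on_path` is `False`, a local Spark session will most likely fail to launch.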

## Configure the Spark context

* App name
* Spark master location (needed only when running in cluster mode)
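
Since the master location matters only in cluster mode, one option is to read it from the environment and fall back to local mode when it is missing. The following is a minimal sketch under that assumption. `resolve_master` and `DEFAULT_MASTER` are hypothetical names, the `local[*]` fallback is my choice rather than something this notebook prescribes, and the host name `spark-master` in the example is illustrative (7077 is the default port of a standalone Spark master).

```python
import os

DEFAULT_MASTER = "local[*]"  # assumption: fall back to local mode using all cores


def resolve_master(env=None):
    """Return (master_url, mode) based on the SPARK_MASTER environment variable."""
    env = os.environ if env is None else env
    master = env.get("SPARK_MASTER", DEFAULT_MASTER)
    if master.startswith("local"):
        mode = "local"
    elif master.startswith("spark://"):
        mode = "standalone cluster"
    else:
        # For example "yarn" or a "k8s://..." URL.
        mode = "other cluster manager"
    return master, mode


print(resolve_master({}))
# ('local[*]', 'local')
print(resolve_master({"SPARK_MASTER": "spark://spark-master:7077"}))
# ('spark://spark-master:7077', 'standalone cluster')
```

The first element of the returned tuple is the value that would be passed to `setMaster`. The second is only a label for quickly seeing which mode the notebook is about to use.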

In [50]:
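# Fall back to local mode when SPARK_MASTER is not set (it is only needed in cluster mode)
conf = pyspark.SparkConf().setAppName("Spark Test").setMaster(os.environ.get("SPARK_MASTER", "local[*]"))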

In [51]:
# Initialize spark context
sc = pyspark.SparkContext(conf=conf)
spark = pyspark.sql.SparkSession.builder \
                    .config(conf=conf) \
                    .getOrCreate()

sc
spark

In [52]:
# Creating a DataFrame from a range of numbers
df = spark.range(10).toDF("numbers")

In [53]:
# Printing the schema
df.printSchema()

root
 |-- numbers: long (nullable = false)



In [54]:
# Displaying the dataframe
df.show()

+-------+
|numbers|
+-------+
|      0|
|      1|
|      2|
|      3|
|      4|
|      5|
|      6|
|      7|
|      8|
|      9|
+-------+



In [55]:
# Fetching the first two rows as Row objects
df.take(2)

[Row(numbers=0), Row(numbers=1)]

In [56]:
# Counting the number of entries in the DF
df.count()

10

In [57]:
# Stopping the Spark context and session
sc.stop()
spark.stop()

`Info: If all of the above cells have executed successfully, please go ahead and start with the book as mentioned in the readme.md`