# Starting a Spark Session in Kubernetes
This notebook shows how to create a Spark session in client mode on Kubernetes, with the driver running inside the JupyterHub notebook pod.

## JupyterHub configuration
JupyterHub must be configured as described in [spark-k8s-jupyterhub](https://github.com/akoshel/spark-k8s-jupyterhub).

Several Kubernetes resources must be created, including:
1. A [ServiceAccount](https://github.com/frenoid/hobby-cluster/blob/master/jupyterhub/templates/custom/sa-singleuser.yaml)
2. A [Role](https://github.com/frenoid/hobby-cluster/blob/master/jupyterhub/templates/custom/role-singleuser.yaml)
3. A [RoleBinding](https://github.com/frenoid/hobby-cluster/blob/master/jupyterhub/templates/custom/role-singleuser.yaml)
4. A [Service](https://github.com/frenoid/hobby-cluster/blob/master/jupyterhub/templates/custom/svc-driver.yaml) for the Spark driver to expose the driver, block manager and UI ports


The actual configurations are in a Helm chart in the [hobby-cluster repo](https://github.com/frenoid/hobby-cluster/tree/master/jupyterhub/templates/custom).
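
For orientation, the driver Service from item 4 might look roughly like the sketch below, written as a Python dict rather than YAML. The service name and namespace are taken from the `spark.driver.host` value used in this notebook, ports 2222 and 7777 come from the Spark configuration, and 4040 is Spark's default UI port. The selector label and the helper name `driver_service_manifest` are placeholders; the real manifest is the one in the Helm chart.

```python
import json


def driver_service_manifest(name="driver-service", namespace="jupyterhub",
                            selector=None):
    """Build a Service manifest dict exposing the Spark driver ports."""
    # Hypothetical selector: the real pod labels are defined in the Helm chart.
    selector = selector or {"app": "jupyterhub-singleuser"}
    ports = [
        {"name": "driver", "port": 2222, "targetPort": 2222},        # spark.driver.port
        {"name": "blockmanager", "port": 7777, "targetPort": 7777},  # spark.driver.blockManager.port
        {"name": "spark-ui", "port": 4040, "targetPort": 4040},      # Spark UI default port
    ]
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"selector": selector, "ports": ports},
    }


manifest = driver_service_manifest()
print(json.dumps(manifest, indent=2))
```

The point of the sketch is that the Service ports have to line up with the `spark.driver.port` and `spark.driver.blockManager.port` values set in the Spark configuration, since executors connect back to the driver through this Service.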

## Spark image
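The Spark image has been customized so that its Python minor version and Spark version match those of the Jupyter notebook environment. In client mode the driver runs in the notebook and the executors run in the image, so a mismatch between the two can cause PySpark jobs to fail.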

The Dockerfile is available on [GitHub](https://github.com/frenoid/spark-experiments/blob/master/docker/Dockerfile).
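
A quick way to check the match from inside the notebook is sketched below. It assumes the image tag follows the pattern seen in this notebook (`3.5.1-py311-v2`, read here as Spark 3.5.1 with Python 3.11); that reading of the tag is an assumption, and the helper names `parse_image_tag` and `image_matches` are made up for illustration.

```python
import re
import sys


def parse_image_tag(tag):
    """Split a tag like '3.5.1-py311-v2' into (spark_version, (py_major, py_minor))."""
    # Assumed tag layout: <spark version>-py<major><minor>[-suffix]
    match = re.match(r"^(\d+\.\d+\.\d+)-py(\d)(\d+)", tag)
    if match is None:
        raise ValueError(f"Unrecognized tag format: {tag}")
    spark_version, py_major, py_minor = match.groups()
    return spark_version, (int(py_major), int(py_minor))


def image_matches(tag, spark_version, python_version=sys.version_info[:2]):
    """Return True if the tag's Spark and Python versions equal the given ones."""
    tag_spark, tag_python = parse_image_tag(tag)
    return tag_spark == spark_version and tag_python == tuple(python_version)


print(parse_image_tag("3.5.1-py311-v2"))
```

In the notebook, `image_matches("3.5.1-py311-v2", pyspark.__version__)` would compare the tag against the locally installed PySpark and the running interpreter.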

In [1]:
from pyspark import SparkConf, SparkContext

conf = (SparkConf().setMaster("k8s://https://kubernetes.default.svc:443") # Your master address name
        .set("spark.kubernetes.container.image", "docker.io/frenoid/spark:3.5.1-py311-v2") # Spark image name
        .set("spark.driver.port", "2222") # Needs to match svc
        .set("spark.driver.blockManager.port", "7777")
        .set("spark.driver.host", "driver-service.jupyterhub.svc.cluster.local") # Needs to match svc
        .set("spark.driver.bindAddress", "0.0.0.0")
        .set("spark.kubernetes.namespace", "jupyterhub")
        .set("spark.kubernetes.authenticate.driver.serviceAccountName", "jupyterhub-singleuser-sa")
        .set("spark.kubernetes.authenticate.serviceAccountName", "jupyterhub-singleuser-sa")
        .set("spark.executor.instances", "1")
        .set("spark.kubernetes.container.image.pullPolicy", "IfNotPresent")
        .set("spark.app.name", "Norman-App"))

sc = SparkContext(conf=conf)
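
The `spark.driver.host` value above follows the standard Kubernetes in-cluster DNS pattern `<service>.<namespace>.svc.<cluster domain>`. As a minimal sketch, assuming the default `cluster.local` cluster domain, the name can be derived rather than hard-coded; `service_dns_name` is a hypothetical helper, not part of PySpark.

```python
def service_dns_name(service, namespace, cluster_domain="cluster.local"):
    """Return the in-cluster DNS name of a Kubernetes Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


# Service name and namespace as used in the Spark configuration above
driver_host = service_dns_name("driver-service", "jupyterhub")
print(driver_host)  # driver-service.jupyterhub.svc.cluster.local
```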

In [2]:
sc

In [3]:
from pyspark.sql import SparkSession

# Build the session on top of the existing SparkContext; the application
# name and other settings come from the SparkConf defined above.
spark = SparkSession.builder.getOrCreate()

In [4]:
# Create a list of objects with mock data
mock_data = [
    ("John", 25, "123 Main Street"),
    ("Anna", 31, "456 Oak Avenue"),
    ("Peter", 37, "789 Pine Road"),
    ("Linda", 28, "321 Maple Lane"),
    ("Mike", 45, "654 Cedar Drive")
]

# Create DataFrame
df = spark.createDataFrame(mock_data, ["name", "age", "address"])

# Show the contents of the DataFrame
df.show(5)

+-----+---+---------------+
| name|age|        address|
+-----+---+---------------+
| John| 25|123 Main Street|
| Anna| 31| 456 Oak Avenue|
|Peter| 37|  789 Pine Road|
|Linda| 28| 321 Maple Lane|
| Mike| 45|654 Cedar Drive|
+-----+---+---------------+



In [5]:
spark.stop()