# 01_Spark_Setup_Docker.ipynb

## 🚀 Apache Spark Setup using Docker
This notebook provides step-by-step instructions to set up and run **Apache Spark 2.x** using Docker. It uses the official **Jupyter All-Spark Notebook** image, which comes preinstalled with Spark, Hadoop, Python, and Jupyter.

---

### 🧰 Prerequisites
- Docker must already be installed and running.
- Check your Docker version:
```bash
docker --version
```
- Ensure internet access to pull the image.
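
The same prerequisite check can also be scripted. The sketch below is a minimal illustration, not part of the original setup: the helper names `docker_version` and `docker_daemon_running` are hypothetical, and it assumes that `docker info` exits with a non-zero status when the daemon cannot be reached.

```python
import shutil
import subprocess


def docker_version():
    """Return the output of `docker --version`, or None if the CLI is unavailable."""
    docker_path = shutil.which("docker")  # look for the docker CLI on PATH
    if docker_path is None:
        return None
    try:
        result = subprocess.run(
            [docker_path, "--version"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout.strip() if result.returncode == 0 else None


def docker_daemon_running():
    """Return True if `docker info` succeeds (assumed to mean the daemon is reachable)."""
    docker_path = shutil.which("docker")
    if docker_path is None:
        return False
    try:
        result = subprocess.run(
            [docker_path, "info"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("Docker CLI:", docker_version() or "not found")
    print("Daemon reachable:", docker_daemon_running())
```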
---

## 🐳 Step 1: Pull the Spark Notebook Docker Image
Pull the official Jupyter All-Spark-Notebook image (tag `95f855f8e55f`) from Docker Hub.
```bash
docker pull jupyter/all-spark-notebook:95f855f8e55f
```
This image includes:
- Apache Spark 2.x
- Python 3 with PySpark
- Jupyter Notebook environment


## ⚙️ Step 2: Run the Spark Container
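Run the container and expose Jupyter on port 8888. The command runs in the foreground, so the Jupyter logs (including the login URL) print to this terminal.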
```bash
docker run -p 8888:8888 --name spark jupyter/all-spark-notebook:95f855f8e55f
```
### Explanation
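- `-p 8888:8888` → Publishes container port 8888 on host port 8888 (the format is `host:container`).
- `--name spark` → Names the container `spark` so later commands can refer to it by name.
- `jupyter/all-spark-notebook:95f855f8e55f` → The image and pinned tag to run; the tag fixes the image build and therefore the bundled Spark version.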
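
To make the role of each argument concrete, here is a small sketch that assembles the same command as a list. The function `build_run_command` and its parameter names are illustrative assumptions, not part of Docker or of this notebook.

```python
import shlex


def build_run_command(image, name, host_port=8888, container_port=8888):
    """Assemble the `docker run` argument list used in Step 2."""
    return [
        "docker", "run",
        "-p", f"{host_port}:{container_port}",  # host port first, then container port
        "--name", name,                          # container name used by later commands
        image,                                   # image and pinned tag
    ]


cmd = build_run_command("jupyter/all-spark-notebook:95f855f8e55f", "spark")
print(shlex.join(cmd))
# docker run -p 8888:8888 --name spark jupyter/all-spark-notebook:95f855f8e55f
```

If port 8888 is already in use on the host, passing `host_port=8889` produces `-p 8889:8888`. The URL printed in the container logs would still show port 8888, so the browser address needs the host port instead.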

After the container starts, you’ll see output similar to:
```
http://127.0.0.1:8888/?token=<unique_token>
```
Open this link in a browser to access Jupyter Notebook.
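
The token is an ordinary query parameter in that URL. As a minimal sketch, it can be pulled out with the standard library; `extract_token` is a hypothetical helper, and `abc123` is a made-up placeholder rather than a real token.

```python
from urllib.parse import urlparse, parse_qs


def extract_token(url):
    """Return the `token` query parameter from a Jupyter login URL, or None."""
    query = parse_qs(urlparse(url).query)  # e.g. {"token": ["abc123"]}
    values = query.get("token")
    return values[0] if values else None


print(extract_token("http://127.0.0.1:8888/?token=abc123"))  # abc123
```

If the terminal output has scrolled away, `docker logs spark` prints the container's log again, including the URL.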

## 📊 Step 3: Verify Spark Installation
After opening Jupyter, create a new notebook and verify the Spark setup:
```python
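from pyspark.sql import SparkSession

# Start (or reuse) a Spark session for this notebook
spark = SparkSession.builder.appName('SparkSetupTest').getOrCreate()
print(spark.version)  # print the Spark version string
spark.stop()  # shut down the session and release its resources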
```
If the setup is successful, the cell prints the Spark version, which should have the form 2.x.x.
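
The version check can be made explicit rather than read by eye. This is a sketch under stated assumptions: `spark_major_version` is a hypothetical helper, and the version strings used here are illustrative examples, not the version this image is confirmed to ship.

```python
def spark_major_version(version):
    """Return the major version number from a string such as '2.4.5'."""
    return int(version.split(".")[0])


# Illustrative strings only; in the notebook, pass spark.version instead.
print(spark_major_version("2.4.5"))  # 2
print(spark_major_version("3.0.1"))  # 3
```

In the verification cell, `assert spark_major_version(spark.version) == 2` placed before `spark.stop()` would fail loudly if a different major version were installed.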

## 🧹 Step 4: Stop and Remove Container (Optional)
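To stop and remove the container, run the following from a second terminal, since the `docker run` command from Step 2 stays attached to the first one. Removing the container also frees the name `spark`, so Step 2 can be run again later: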
```bash
docker stop spark
docker rm spark
```
---
✅ You now have an Apache Spark 2.x setup running inside Docker with Jupyter Notebook support.