
# EC2 Spark + Jupyter Setup (Amazon Linux 2023)

This notebook walks through installing Docker and running a Jupyter + Spark container (`jupyter/all-spark-notebook`) on an EC2 instance running Amazon Linux 2023.



## Step 1: Launch EC2 Instance

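1. Go to **AWS Console → EC2 → Launch Instance**.
2. Choose **Amazon Linux 2023** as the AMI.
3. Select an instance type (recommended: `t2.large`, with 2 vCPUs and 8 GiB RAM, or larger).
4. Select a key pair so you can SSH into the instance in Step 3.
5. Under **Configure storage**, consider raising the root volume above the default 8 GiB (for example to 30 GiB). The `all-spark-notebook` image is several GB and may not fit otherwise.
6. Under **Advanced Details → User Data**, paste the script below: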


```bash
#!/bin/bash
# EC2 user data runs as root on first boot
dnf update -y

# Install Docker from the default Amazon Linux 2023 repositories
# (amazon-linux-extras existed only on Amazon Linux 2)
dnf install -y docker

# Start Docker now and on every boot
systemctl enable --now docker

# Let ec2-user run docker without sudo (takes effect at the next login)
usermod -aG docker ec2-user

# Pull and run the Jupyter Spark container; restart it automatically after reboots
docker run -d -p 8888:8888 --restart unless-stopped --name spark jupyter/all-spark-notebook:95f855f8e55f
```


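If you prefer scripting the launch to clicking through the console, the steps above can be sketched with boto3. This is a minimal sketch, assuming boto3 is installed, AWS credentials are configured, the script above is saved locally as `user_data.sh`, and a key pair and security group already exist. Those names, the region, and the helpers `build_launch_request`/`launch` are hypothetical. The AMI ID is resolved from the public SSM parameter that tracks the latest Amazon Linux 2023 x86_64 image.

```python
from pathlib import Path

# Public SSM parameter tracking the latest Amazon Linux 2023 AMI (x86_64, default kernel).
AL2023_AMI_PARAM = "/aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-x86_64"


def build_launch_request(ami_id: str, user_data: str, key_name: str,
                         security_group_id: str, instance_type: str = "t2.large",
                         volume_gib: int = 30) -> dict:
    """Assemble keyword arguments for the EC2 RunInstances call (hypothetical helper)."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "KeyName": key_name,  # needed to SSH in during Step 3
        "SecurityGroupIds": [security_group_id],
        # Assumption: the root device is /dev/xvda, as on AL2023 AMIs; enlarged to fit the image.
        "BlockDeviceMappings": [{
            "DeviceName": "/dev/xvda",
            "Ebs": {"VolumeSize": volume_gib, "VolumeType": "gp3"},
        }],
        # Plain text is fine here: boto3 base64-encodes UserData for run_instances.
        "UserData": user_data,
    }


def launch(user_data_path: str, key_name: str, security_group_id: str,
           region: str = "us-east-1") -> str:
    """Resolve the AMI, launch one instance, and return its instance ID (hypothetical helper)."""
    import boto3  # imported here so build_launch_request() works without boto3 installed

    ssm = boto3.client("ssm", region_name=region)
    ami_id = ssm.get_parameter(Name=AL2023_AMI_PARAM)["Parameter"]["Value"]
    request = build_launch_request(ami_id, Path(user_data_path).read_text(),
                                   key_name, security_group_id)
    ec2 = boto3.client("ec2", region_name=region)
    return ec2.run_instances(**request)["Instances"][0]["InstanceId"]
```

Usage would look like `launch("user_data.sh", "my-key", "sg-0123456789abcdef0")`, with all three arguments being placeholders. Keeping the request builder separate from the API call makes the parameters easy to inspect before anything is launched.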

## Step 2: Configure Security Group

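Add an inbound rule to the instance's security group:
- Type: Custom TCP
- Port: 8888
- Source: your IP address (**My IP** in the console) is strongly recommended. `0.0.0.0/0` exposes Jupyter to the whole internet, leaving the token as its only protection.

Also make sure SSH (TCP port 22) is allowed from your IP, since Step 3 connects over SSH.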
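The same rule can be added programmatically. This is a minimal boto3 sketch, assuming boto3 is installed and AWS credentials are configured; the region and the helper names `jupyter_ingress_permissions`/`open_jupyter_port` are hypothetical. It limits the source to your current public IPv4 address, looked up via `https://checkip.amazonaws.com`, instead of `0.0.0.0/0`.

```python
import ipaddress
import urllib.request


def jupyter_ingress_permissions(my_ip: str, port: int = 8888) -> list:
    """Build IpPermissions for one TCP port, restricted to a single IPv4 address."""
    # IPv4Address() raises a ValueError subclass for malformed input; /32 allows only that host.
    cidr = f"{ipaddress.IPv4Address(my_ip.strip())}/32"
    return [{
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": cidr, "Description": "Jupyter from my IP"}],
    }]


def open_jupyter_port(security_group_id: str, region: str = "us-east-1") -> None:
    """Authorize inbound TCP 8888 from the caller's public IP (hypothetical helper)."""
    import boto3  # imported here so the rule builder works without boto3 installed

    # checkip.amazonaws.com returns the caller's public IPv4 address as plain text.
    with urllib.request.urlopen("https://checkip.amazonaws.com") as resp:
        my_ip = resp.read().decode()
    ec2 = boto3.client("ec2", region_name=region)
    ec2.authorize_security_group_ingress(
        GroupId=security_group_id,
        IpPermissions=jupyter_ingress_permissions(my_ip),
    )
```

Calling `open_jupyter_port("sg-0123456789abcdef0")` (placeholder ID) from your own machine adds the rule. Run it from the machine whose browser will open Jupyter, since that is the address being allowed.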



## Step 3: Access Jupyter Notebook

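1. Wait a few minutes after launch: the user-data script must install Docker and pull the `all-spark-notebook` image, which is several GB. To follow its progress, run `sudo tail -f /var/log/cloud-init-output.log` on the instance.
2. SSH into the instance and verify that the container is running:
```bash
sudo docker ps
```
3. Get the access token. Jupyter writes its log to stderr, so redirect stderr before piping to `grep`:
```bash
sudo docker logs spark 2>&1 | grep token
```
4. The logged URL points at the container's hostname or `127.0.0.1`. Replace that host with the instance's public IPv4 address or public DNS name and open it in your browser, e.g. `http://<public-ip>:8888/?token=<token>`.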
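The token URL that Jupyter logs points at the container's hostname or `127.0.0.1`, which your browser cannot reach. Rewriting it to the instance's public address can be scripted. This is a minimal sketch to run on the instance (Amazon Linux 2023 ships `python3`), assuming the container is named `spark` and the log prints a plain `http://host:8888/...?token=...` URL, as recent Jupyter Server images do. The helper names are hypothetical, and you supply the public IP or DNS name yourself.

```python
import re
import subprocess
from urllib.parse import urlsplit, urlunsplit

# Any http(s) URL carrying a Jupyter token, e.g. "http://127.0.0.1:8888/lab?token=...".
TOKEN_URL = re.compile(r"https?://\S+\?token=\S+")


def read_container_logs(name: str = "spark") -> str:
    """Return the container's log; Jupyter writes to stderr, so both streams are combined."""
    result = subprocess.run(["sudo", "docker", "logs", name],
                            capture_output=True, text=True, check=True)
    return result.stdout + result.stderr


def public_jupyter_url(log_text: str, public_host: str, port: int = 8888) -> str:
    """Rewrite the first token URL in log_text so it points at public_host:port."""
    match = TOKEN_URL.search(log_text)
    if match is None:
        raise ValueError("no token URL in the log yet; Jupyter may still be starting")
    parts = urlsplit(match.group(0))
    # Keep the scheme, path, and query (which holds the token); swap only host:port.
    return urlunsplit(parts._replace(netloc=f"{public_host}:{port}"))
```

For example, `print(public_jupyter_url(read_container_logs(), "203.0.113.7"))` prints a browser-ready URL, with `203.0.113.7` standing in for your instance's public IP. Only the first match is used because every URL Jupyter prints for one server carries the same token.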


## Step 4: Verify Spark Setup

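Once Jupyter is open, create a new Python 3 notebook and run the following cell to confirm that PySpark can start a Spark session:

```python
import pyspark
from pyspark.sql import SparkSession

# Start (or reuse) a local Spark session and report its version
spark = SparkSession.builder.appName('EC2Test').getOrCreate()
print("Spark version:", spark.version)
```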



If you see the Spark version printed, your setup is complete!
