# PySpark Setup in Google Colab
This notebook is a step-by-step guide to setting up Apache Spark (PySpark) in Google Colab.

## Launch Colab
Open Google Colab using the following link:
[Google Colab](https://colab.research.google.com/)

## Setup Spark
Install Java, Spark, and Findspark, and configure environment variables.

In [None]:
!apt-get update

In [None]:
# Install Java, Spark, and Findspark in one go
!apt-get install openjdk-11-jdk-headless -qq > /dev/null
!wget -q https://archive.apache.org/dist/spark/spark-3.5.1/spark-3.5.1-bin-hadoop3.tgz
!tar xf spark-3.5.1-bin-hadoop3.tgz
!pip install -q findspark

# Environment setup and SparkSession start
import os, findspark
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-11-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-3.5.1-bin-hadoop3"
findspark.init()

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Colab-PySpark").getOrCreate()
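
If `findspark.init()` or the session builder fails, a common cause is that `JAVA_HOME` or `SPARK_HOME` points at a directory that does not exist, for example because the download or extraction did not finish. The sketch below checks the two locations used in this notebook. The helper name `missing_spark_paths` is hypothetical and not part of Spark or findspark, and the fallback paths are simply the ones set in the setup cell.

```python
import os

def missing_spark_paths(java_home, spark_home):
    """Return the expected paths that do not exist on disk (hypothetical helper)."""
    expected = [
        java_home,
        os.path.join(java_home, "bin", "java"),           # Java launcher
        spark_home,
        os.path.join(spark_home, "bin", "spark-submit"),  # Spark launcher script
    ]
    return [path for path in expected if not os.path.exists(path)]

# Fall back to the paths used in this notebook if the variables are not set yet.
missing = missing_spark_paths(
    os.environ.get("JAVA_HOME", "/usr/lib/jvm/java-11-openjdk-amd64"),
    os.environ.get("SPARK_HOME", "/content/spark-3.5.1-bin-hadoop3"),
)
print("Missing paths:", missing if missing else "none")
```

An empty result suggests that both installations are where the environment variables say they are. Any listed path points to the step that likely needs to be rerun.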

## Test Spark
Run the following code to confirm that Spark is working properly.

In [None]:
spark
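
Displaying the session object only shows that it was created. As a slightly stronger check, the sketch below runs a tiny job so that Spark has to execute something. The function name `spark_smoke_test` is hypothetical; it relies only on `SparkSession.range` and `DataFrame.count`.

```python
def spark_smoke_test(session, n=5):
    """Run a tiny job on the given SparkSession and return the row count."""
    df = session.range(n)  # single-column DataFrame with ids 0..n-1
    return df.count()      # count() is an action, so Spark actually runs a job
```

With the session created in the setup cell, `spark_smoke_test(spark)` should return `5`.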

If Spark fails to start, run the following steps to clean up, then rerun the setup cells.

In [None]:
# Clean up the old Spark download and other leftovers
!rm -rf /content/spark-3.5.1-bin-hadoop3*
!rm -rf /content/sample_data 2>/dev/null

# Optional: clean up the pip cache
!pip cache purge -q
