# 🎧 PySpark Project: Spotify Dataset Practice Notebook
This notebook contains 20 hands-on PySpark questions using the Spotify dataset.

---

In [None]:
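from pyspark.sql import SparkSession
# Import only what is used: a wildcard import from pyspark.sql.functions
# shadows Python built-ins such as sum, max, min, and round.
from pyspark.sql.functions import desc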

spark = SparkSession.builder \
    .appName("SpotifyDataAnalysis") \
    .master("local[*]") \
    .getOrCreate()

df = spark.read.csv("spotify-data.csv", header=True, inferSchema=True)
df.printSchema()
df.show(5)

## 🔍 Basic Exploration

### 1. Display the schema and find how many columns are present.

In [None]:
print("Column Count:", len(df.columns))
df.printSchema()

### 2. Show the 10 most popular tracks (rows sorted by popularity, descending).

In [None]:
df.orderBy(desc("popularity")).show(10)

### 3. What are the distinct genres in the dataset?

In [None]:
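# show() prints at most 20 rows by default; pass a larger n if there are more genres.
df.select("genre").distinct().show(truncate=False)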

### 4. Which track has the highest danceability?

In [None]:
# truncate=False keeps long track names from being cut off at 20 characters.
df.orderBy(desc("danceability")).select("track_name", "artist_name", "danceability").show(1, truncate=False)

---
## 🧼 Data Cleaning & Transformation
_(Questions 5 to 7...)_

---
📌 **This notebook includes the first few questions with code.**