## ✅ Pre-Requisites

### 1. Install Java (JDK)

- 🔗 Download:  
  [https://www.oracle.com/nz/java/technologies/downloads/#java17-windows](https://www.oracle.com/nz/java/technologies/downloads/#java17-windows)

- ⚙️ Set `JAVA_HOME` (Environment Variable):

  #### Step 1: Set `JAVA_HOME`

  - Press `Windows + S` → Search for **"Environment Variables"**
  - Click **"Edit the system environment variables"**
  - In the **System Properties** window → Click **"Environment Variables…"**
  - Under **System Variables** → Click **New…**
    - **Variable name**: `JAVA_HOME`  
    - **Variable value**:
      ```
      C:\Program Files\Java\jdk-17
      ```

  #### Step 2: Add `JAVA_HOME\bin` to `Path`

  - Still under **System Variables**, find **Path** → Click **Edit…**
  - Click **New**, and add:
    ```
    %JAVA_HOME%\bin
    ```

---

### 2. Install Apache Spark

- 🔗 Download:  
  [https://spark.apache.org/downloads.html](https://spark.apache.org/downloads.html)

- 📦 Extract the `.tgz` file to a folder such as `C:\spark`

- ⚙️ Set `SPARK_HOME` (Environment Variable):

#### Repeat Step 1 above for `SPARK_HOME`:

- **Variable name**: `SPARK_HOME`  
- **Variable value**:
  ```
  C:\spark\spark-4.0.0-bin-hadoop3
  ```

#### Add `%SPARK_HOME%\bin` to `Path`:

- Still under **System Variables**, find **Path** → Click **Edit…**
- Click **New**, and add:
  ```
  %SPARK_HOME%\bin
  ```
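
Environment variables are only picked up by processes started after they are saved, so open a new terminal or restart the notebook kernel before continuing. The setup above can then be checked from Python with a minimal sketch like the one below. The helper name `check_env` is hypothetical; it only inspects the two variables and the `Path` entries described above.

```python
import os
import shutil

def check_env(names=("JAVA_HOME", "SPARK_HOME"), env=None):
    """Return {name: path or None}; None means the variable is unset or not a folder."""
    env = os.environ if env is None else env
    result = {}
    for name in names:
        value = env.get(name)
        # Keep the value only if it points at an existing directory
        result[name] = value if value and os.path.isdir(value) else None
    return result

if __name__ == "__main__":
    for name, path in check_env().items():
        print(f"{name}: {path or 'NOT SET or folder missing'}")
    # shutil.which searches Path, so this confirms the \bin entries were added
    print("java on Path:", shutil.which("java"))
    print("spark-submit on Path:", shutil.which("spark-submit"))
```

If either variable is reported as missing, re-check the **System Variables** entries and restart the kernel.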


In [1]:
!pip install pyspark==4.0.0



In [2]:
import pyspark
import pandas as pd

In [3]:
# Load the CSV with pandas first for a quick look at the data
df = pd.read_csv('spotify-data.csv')
df.head()

Unnamed: 0,id,name,artists,duration_ms,release_date,year,acousticness,danceability,energy,instrumentalness,liveness,loudness,speechiness,tempo,valence,mode,key,popularity,explicit
0,6KbQ3uYMLKb5jDxLF7wYDD,Singende Bataillone 1. Teil,['Carl Woitschach'],158648,1928,1928,0.995,0.708,0.195,0.563,0.151,-12.428,0.0506,118.469,0.779,1,10,0,0
1,6KuQTIu1KoTTkLXKrwlLPV,"Fantasiestücke, Op. 111: Più tosto lento","['Robert Schumann', 'Vladimir Horowitz']",282133,1928,1928,0.994,0.379,0.0135,0.901,0.0763,-28.454,0.0462,83.972,0.0767,1,8,0,0
2,6L63VW0PibdM1HDSBoqnoM,Chapter 1.18 - Zamek kaniowski,['Seweryn Goszczyński'],104300,1928,1928,0.604,0.749,0.22,0.0,0.119,-19.924,0.929,107.177,0.88,0,5,0,0
3,6M94FkXd15sOAOQYRnWPN8,Bebamos Juntos - Instrumental (Remasterizado),['Francisco Canaro'],180760,9/25/28,1928,0.995,0.781,0.13,0.887,0.111,-14.734,0.0926,108.003,0.72,0,1,0,0
4,6N6tiFZ9vLTSOIxkj8qKrd,"Polonaise-Fantaisie in A-Flat Major, Op. 61","['Frédéric Chopin', 'Vladimir Horowitz']",687733,1928,1928,0.99,0.21,0.204,0.908,0.098,-16.829,0.0424,62.149,0.0693,1,11,1,0


In [4]:
from pyspark.sql import SparkSession

In [5]:
# Create a SparkSession, or reuse the existing one if a session is already running
spark = SparkSession.builder.appName('my-learning-klp-1').getOrCreate()

In [6]:
spark

In [7]:
# No options passed: the header row is read as data and every column is typed as string
spark_df = spark.read.csv('spotify-data.csv')

In [9]:
spark_df

DataFrame[_c0: string, _c1: string, _c2: string, _c3: string, _c4: string, _c5: string, _c6: string, _c7: string, _c8: string, _c9: string, _c10: string, _c11: string, _c12: string, _c13: string, _c14: string, _c15: string, _c16: string, _c17: string, _c18: string]
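
The generic `_c0` … `_c18` names and all-`string` types above come from reading the file without options: by default, Spark's CSV reader treats the first row as data and does not infer column types. A minimal sketch of the alternative follows, assuming the first row of `spotify-data.csv` is the header. The helper name `read_csv_with_header` is hypothetical; `header` and `inferSchema` are standard options of `spark.read.csv`.

```python
def read_csv_with_header(spark, path):
    """Read a CSV using its first row as column names and letting Spark infer types."""
    # header=True: use the first row as column names instead of _c0, _c1, ...
    # inferSchema=True: scan the data to choose column types (costs an extra pass over the file)
    return spark.read.csv(path, header=True, inferSchema=True)

# Usage with the session created earlier in the notebook:
# spark_df = read_csv_with_header(spark, 'spotify-data.csv')
# spark_df.printSchema()
```

With these options the columns should carry the names from the file (`id`, `name`, `artists`, and so on) rather than positional placeholders.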

In [10]:
spark_df.show()

+--------------------+--------------------+--------------------+-----------+------------+----+------------+------------+-------+----------------+--------+--------+-----------+-------+-------+----+----+----------+--------+
|                 _c0|                 _c1|                 _c2|        _c3|         _c4| _c5|         _c6|         _c7|    _c8|             _c9|    _c10|    _c11|       _c12|   _c13|   _c14|_c15|_c16|      _c17|    _c18|
+--------------------+--------------------+--------------------+-----------+------------+----+------------+------------+-------+----------------+--------+--------+-----------+-------+-------+----+----+----------+--------+
|                  id|                name|             artists|duration_ms|release_date|year|acousticness|danceability| energy|instrumentalness|liveness|loudness|speechiness|  tempo|valence|mode| key|popularity|explicit|
|6KbQ3uYMLKb5jDxLF...|Singende Bataillo...| ['Carl Woitschach']|     158648|        1928|1928|       0.995|     