# Data Download and Extraction

This notebook downloads a ZIP file from Google Drive, saves it under `data/download/`, and extracts its contents into `data/csv_files/`.

## Directory Structure

```
project_root/
│── data/
│   │── download/data.zip   # Downloaded ZIP file
│   │── csv_files/          # Extracted CSV files
│── setup_project.ipynb     # Jupyter Notebook for downloading and extracting data
```

In [None]:
import os
import gdown
import zipfile

# Define directories
data_dir = "data"
download_dir = os.path.join(data_dir, "download")
csv_dir = os.path.join(data_dir, "csv_files")


# Create both directories if they do not already exist
os.makedirs(download_dir, exist_ok=True)
os.makedirs(csv_dir, exist_ok=True)

zip_path = os.path.join(download_dir, "data.zip")

# Download ZIP file from Google Drive (plain string: no f-string placeholders needed)
url = "https://drive.google.com/uc?id=1H_T6Z74iMs0_VXMSD7wwcQ_ctB5kalNg"
gdown.download(url, zip_path, quiet=False)

# Extract ZIP file into csv_files directory
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(csv_dir)

print("Download and extraction complete.")
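
If the Drive link is unavailable or needs sign-in, the saved file may not be a real archive. That failure mode is an assumption about the download, not something the notebook reports. Below is a minimal sketch of a defensive extraction step. The helper `extract_if_valid` is hypothetical and not part of the original notebook:

```python
import os
import zipfile


def extract_if_valid(zip_path, target_dir):
    """Extract zip_path into target_dir and return the archive's member names.

    Hypothetical helper for illustration; raises ValueError if the file
    is missing or is not a ZIP archive.
    """
    # is_zipfile returns False for missing files and for non-ZIP content
    if not zipfile.is_zipfile(zip_path):
        raise ValueError(f"{zip_path} is not a valid ZIP archive")

    os.makedirs(target_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(target_dir)
        return zip_ref.namelist()
```

In the cell above, a call such as `extract_if_valid(zip_path, csv_dir)` could stand in for the `with zipfile.ZipFile(...)` block.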

In [None]:
import pandas as pd

df_albums = pd.read_csv('./data/csv_files/SpotGenTrack/Data Sources/spotify_albums.csv')
df_tracks = pd.read_csv('./data/csv_files/SpotGenTrack/Data Sources/spotify_tracks.csv')
df_artists = pd.read_csv('./data/csv_files/SpotGenTrack/Data Sources/spotify_artists.csv')

## 🎲 Random Track Recommender using scikit-learn DummyClassifier

This section builds a simple random track recommender using scikit-learn's `DummyClassifier`.
The model is fitted on the track names in `df_tracks['name']` and returns a randomly chosen track name regardless of the input.

### 🔹 Step 1: Import and Train the Model

In [None]:
import pickle
from sklearn.dummy import DummyClassifier

# Use track names as both input and target; DummyClassifier ignores X,
# which only needs to have the same number of rows as y
X = df_tracks[['name']]
y = df_tracks['name']

# Create a DummyClassifier that selects output uniformly at random
model = DummyClassifier(strategy='uniform', random_state=42)
model.fit(X, y)

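The model ignores the input and picks uniformly at random among the unique track names seen during training, which simulates a random discovery mechanism. Because `random_state=42` is fixed, each `predict` call starts from the same seed, so the same query yields the same recommendation on every run.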
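
To see this behavior in isolation, here is a minimal sketch on a toy dataset. The track names and the `toy_*` variables are made up for illustration and are not part of the dataset loaded above. It also shows one consequence of an integer `random_state`: every `predict` call is reseeded, so two different single-row queries get the same pick.

```python
from sklearn.dummy import DummyClassifier

# Hypothetical toy data; the model ignores X, only y matters
toy_names = ["Track A", "Track B", "Track C", "Track D"]
toy_X = [[name] for name in toy_names]

toy_model = DummyClassifier(strategy="uniform", random_state=42)
toy_model.fit(toy_X, toy_names)

# Two unrelated inputs: the model never looks at their content
pred_known = toy_model.predict([["Track A"]])
pred_unknown = toy_model.predict([["never seen before"]])

# With an integer random_state, each predict call starts from the same seed,
# so both single-row queries return the same "random" pick
print(pred_known[0], pred_unknown[0])
print(pred_known[0] == pred_unknown[0])
```

Passing `random_state=None` instead lets the pick vary between calls, which may be closer to what a discovery feature needs.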

In [None]:
# Step 1: Search for a track name
search_term = "highway"

# Step 2: Find all matches
matches = df_tracks[df_tracks['name'].str.contains(search_term, case=False, na=False)]
matches[['id', 'name', 'artists_id']]

In [None]:
# Use the first match as the input track (raises IndexError if the search found nothing)
input_track = [[matches['name'].iloc[0]]]
prediction = model.predict(input_track)

print(f"\nRandom recommendation based on '{input_track[0][0]}' -> {prediction[0]}")

### Store the model as a pickle file

In [None]:
import pickle

# Use a context manager so the file handle is closed after writing
with open("dummy_model.pkl", "wb") as f:
    pickle.dump(model, f)

### Load the pickle file

In [None]:
# Load the model (only unpickle files from a source you trust)
with open("dummy_model.pkl", "rb") as f:
    loaded_model = pickle.load(f)

# Use the loaded model to make predictions
loaded_prediction = loaded_model.predict(input_track)
print(f"\nLoaded model prediction based on '{input_track[0][0]}' -> {loaded_prediction[0]}")