# Data Download and Extraction

This notebook downloads a ZIP archive from Google Drive, saves it as `data/download/data.zip`, and extracts its contents into `data/csv_files/`.

## Directory Structure

```
project_root/
│── data/
│   │── download/data.zip   # Contains the downloaded ZIP file
│   │── csv_files/          # Contains extracted CSV files
│── setup_project.ipynb     # Jupyter Notebook for downloading and extracting data
```

In [6]:
import os
import gdown
import zipfile

# Define directories
data_dir = "data"
download_dir = os.path.join(data_dir, "download")
csv_dir = os.path.join(data_dir, "csv_files")

# Create the directories if they do not already exist
os.makedirs(download_dir, exist_ok=True)
os.makedirs(csv_dir, exist_ok=True)

zip_path = os.path.join(download_dir, "data.zip")

# Download ZIP file from Google Drive (the id parameter identifies the shared file)
url = "https://drive.google.com/uc?id=1H_T6Z74iMs0_VXMSD7wwcQ_ctB5kalNg"
gdown.download(url, zip_path, quiet=False)

# Extract ZIP file into csv_files directory
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(csv_dir)

print("Download and extraction complete.")

Downloading...
From (original): https://drive.google.com/uc?id=1H_T6Z74iMs0_VXMSD7wwcQ_ctB5kalNg
From (redirected): https://drive.google.com/uc?id=1H_T6Z74iMs0_VXMSD7wwcQ_ctB5kalNg&confirm=t&uuid=a6ab51bd-1e2c-4033-9e01-fa1ed2e501b6
To: /workspaces/3_streamlit/3a-spotify-data/data/download/data.zip
100%|██████████| 287M/287M [00:02<00:00, 133MB/s]  


Download and extraction complete.
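
The cell above extracts the archive without checking it first. A failed or interrupted download can leave a file at `data/download/data.zip` that is not a valid archive, in which case `zipfile.ZipFile` raises an error that is harder to read. The following is a minimal sketch of a more defensive extraction step. The helper name `extract_zip_checked` is hypothetical and not part of the original notebook; it uses only the standard-library `os` and `zipfile` modules.

```python
import os
import zipfile


def extract_zip_checked(zip_path, dest_dir):
    """Extract zip_path into dest_dir after basic sanity checks.

    Hypothetical helper for illustration; not part of the original notebook.
    Returns the list of member names in the archive.
    """
    # Confirm the file exists and has a ZIP signature before opening it.
    if not os.path.isfile(zip_path) or not zipfile.is_zipfile(zip_path):
        raise ValueError(f"{zip_path} is missing or is not a valid ZIP archive")

    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        # testzip() returns the name of the first corrupt member, or None.
        bad_member = zip_ref.testzip()
        if bad_member is not None:
            raise ValueError(f"Corrupt member in archive: {bad_member}")
        zip_ref.extractall(dest_dir)
        return zip_ref.namelist()
```

With the variables defined earlier, this could be called as `extract_zip_checked(zip_path, csv_dir)` in place of the `with zipfile.ZipFile(...)` block. Note that `testzip()` reads every member, so it adds some time for an archive of this size.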


In [7]:
import pandas as pd

# Folder inside the extracted archive that holds the CSV files
sources_dir = './data/csv_files/SpotGenTrack/Data Sources'

df_albums = pd.read_csv(f'{sources_dir}/spotify_albums.csv')
df_tracks = pd.read_csv(f'{sources_dir}/spotify_tracks.csv')
df_artists = pd.read_csv(f'{sources_dir}/spotify_artists.csv')
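
The `read_csv` calls above assume the archive unpacks to `SpotGenTrack/Data Sources/` under `data/csv_files/`. If extraction was skipped or the archive layout differs, pandas raises `FileNotFoundError` on the first missing file. As a rough sketch, the expected files can be checked up front. The helper `find_missing_files` and the constants below are hypothetical names introduced for illustration; the folder and file names are taken from the calls above.

```python
from pathlib import Path

# Folder and file names copied from the read_csv calls in the notebook.
SOURCES_DIR = Path("data") / "csv_files" / "SpotGenTrack" / "Data Sources"
EXPECTED_FILES = ["spotify_albums.csv", "spotify_tracks.csv", "spotify_artists.csv"]


def find_missing_files(base_dir, filenames):
    """Return the names in `filenames` that do not exist under `base_dir`.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    base = Path(base_dir)
    return [name for name in filenames if not (base / name).is_file()]


missing = find_missing_files(SOURCES_DIR, EXPECTED_FILES)
if missing:
    print(f"Missing from {SOURCES_DIR}: {missing}")
else:
    print("All expected CSV files are present.")
```

Running this before the load cell reports every missing file at once, rather than stopping at the first failed read.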