# 🚜 **Data Collection**


### **NOTE**:

**Spotify has [updated](https://developer.spotify.com/blog/2024-11-27-changes-to-the-web-api) its Web API policy and no longer lets new apps retrieve `Audio Features`.**

In this notebook, I used the `requests` library to collect data from the Spotify Web API and saved each raw JSON response to its own file in the `data/raw/` directory.

**Key content**:
* `get_token()` (from `auth.py`): fetches the API access token
* `process_playlist(playlist_name, playlist_id, token)` (from `my_functions_01.py`): saves a playlist's raw `tracks` and `features` JSON files

In [1]:
import logging
from tqdm import tqdm          # progress bar for the download loop
from auth import get_token     # local module (auth.py) that requests the access token

In [2]:
# Re-run this cell after editing my_functions_01.py so the notebook picks up the changes without a kernel restart
import importlib
import my_functions_01
importlib.reload(my_functions_01)

<module 'my_functions_01' from '/Users/shl/Documents/lse/ds105/w10-summative-so-hl/code/my_functions_01.py'>

### 💡Understanding the **Client Credentials** Flow

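The Client Credentials Flow is used for server-to-server authentication: the client (here, the app I registered on Spotify for Developers) authenticates as itself rather than on behalf of a user. Because no user grants permission, only endpoints that do not access user information can be called. The main steps are: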

1. **Client sends a request to the authorisation server**: The request includes the `client_id` and `client_secret` that Spotify issues when the app is created, together with `grant_type=client_credentials`.
2. **Authorisation server validates the client**: Spotify's authorisation server checks the credentials and, if they are valid, issues an access token.
3. **Client uses the access token**: The client sends the token in the `Authorization: Bearer <token>` header of later requests to retrieve resources such as playlists and albums from Spotify's Web API.

![Client Credentials Flow](../assets/ccf.png)
*Source: [Spotify for Developers](https://developer.spotify.com/documentation/web-api/tutorials/client-credentials-flow)*
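Steps 2 and 3 can be sketched as a small token holder. This assumes the token response contains the `access_token`, `token_type` and `expires_in` fields shown in Spotify's Client Credentials tutorial; the `AccessToken` class and its 60-second safety margin are illustrative and are not part of this project's `auth.py`.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AccessToken:
    """A Client Credentials access token plus the moment it stops being valid."""

    value: str
    expires_at: float  # Unix timestamp in seconds

    @classmethod
    def from_response(cls, payload: Dict, now: Optional[float] = None) -> "AccessToken":
        # Step 2: the token response carries access_token, token_type and expires_in (seconds)
        now = time.time() if now is None else now
        return cls(value=payload["access_token"], expires_at=now + payload["expires_in"])

    def is_expired(self, now: Optional[float] = None, margin: float = 60.0) -> bool:
        # Treat the token as expired `margin` seconds early to avoid racing the deadline
        now = time.time() if now is None else now
        return now >= self.expires_at - margin

    def auth_header(self) -> Dict[str, str]:
        # Step 3: each API request carries the token as a Bearer credential
        return {"Authorization": f"Bearer {self.value}"}


token = AccessToken.from_response(
    {"access_token": "example-token", "token_type": "Bearer", "expires_in": 3600}
)
print(token.auth_header())
```

Tokens from this flow expire (the tutorial's example shows `expires_in` of 3600 seconds) and no refresh token is issued, so a long collection run would simply request a fresh token once `is_expired()` returns `True`.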


## **01 Authenticate and Get Access Token**  
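In `get_token()`, I used the `requests` library to send a `POST` request to Spotify's token endpoint (`https://accounts.spotify.com/api/token`); the function returns the access token used by the cells below.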
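Since `auth.py` is not shown here, the following is only a hypothetical sketch of what such a `get_token()` might look like. It follows Spotify's documented Client Credentials request (HTTP Basic auth with `client_id:client_secret`, form body `grant_type=client_credentials`). The helper names and the environment variables `SPOTIFY_CLIENT_ID` / `SPOTIFY_CLIENT_SECRET` are assumptions.

```python
import base64
import os
from typing import Dict, Tuple

# Spotify's token endpoint for the Client Credentials flow
TOKEN_URL = "https://accounts.spotify.com/api/token"


def build_token_request(client_id: str, client_secret: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (headers, form_data) for a Client Credentials token request."""
    # HTTP Basic auth: base64-encode "client_id:client_secret"
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    encoded = base64.b64encode(raw).decode("ascii")
    headers = {
        "Authorization": f"Basic {encoded}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    # The grant type tells the server which OAuth 2.0 flow is being used
    data = {"grant_type": "client_credentials"}
    return headers, data


def get_token_sketch() -> str:
    """Hypothetical stand-in for auth.get_token(): POST credentials, return the token."""
    import requests  # third-party; imported here so build_token_request() needs only the stdlib

    # ASSUMPTION: credentials are stored in these environment variables
    client_id = os.environ["SPOTIFY_CLIENT_ID"]
    client_secret = os.environ["SPOTIFY_CLIENT_SECRET"]

    headers, data = build_token_request(client_id, client_secret)
    # requests form-encodes a dict passed as data=
    response = requests.post(TOKEN_URL, headers=headers, data=data, timeout=10)
    response.raise_for_status()  # surface 400/401 errors, e.g. wrong credentials
    return response.json()["access_token"]
```

Reading the secret from the environment keeps it out of the notebook and the repository; separating `build_token_request()` from the network call lets the request format be checked without contacting Spotify.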

In [3]:
# Get token
token = get_token()

## **02 Collect Raw Data**  
I fetched data from playlists of the **top songs globally** and in the **UK**, **US** and **Singapore**.

**Playlists used:**

* **Global**: Top 50-Global, Viral 50-Global, Today's Top Hits
* **UK**: Top 50-UK, Viral 50-UK, Hot Hits UK
* **USA**: Top 50-USA, Viral 50-USA, Top Songs-USA
* **Singapore**: Top 50-Singapore, Viral 50-Singapore, Top Songs-Singapore

In [4]:
# Spotify playlist IDs, grouped by region: {region: {playlist_name: playlist_id}}
playlist_ids = {
    "Global": {
        "Top50-Global": "37i9dQZEVXbMDoHDwVN2tF",
        "Viral50-Global": "37i9dQZEVXbLiRSasKsNU9",
        "Today's_Top_Hits": "37i9dQZF1DXcBWIGoYBM5M"
    },
    "UK": {
        "Top50-UK": "37i9dQZEVXbLnolsZ8PSNw",
        "Viral50-UK": "37i9dQZEVXbL3DLHfQeDmV",
        "Hot_Hits_UK": "37i9dQZF1DWY4lFlS4Pnso"
    }, 
    "USA": {
        "Top50-USA": "37i9dQZEVXbLRQDuF5jeBp",
        "Viral50-USA": "37i9dQZEVXbKuaTI1Z1Afx",
        "Top_Songs-USA": "37i9dQZEVXbLp5XoPON0wI"
    },
    "Singapore": {
        "Top50-Singapore": "37i9dQZEVXbK4gjvS1FjPY",
        "Viral50-Singapore": "37i9dQZEVXbJVi45MafAu0",
        "Top_Songs-Singapore": "37i9dQZEVXbN66FupT0MuX"
    }
}
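Because `playlist_ids` is nested, it can help to flatten it into `(region, playlist_name, playlist_id)` rows, for example to check for a copy-pasted duplicate ID before making any requests. The helpers below are a hypothetical sketch, shown on an excerpt of the dictionary above.

```python
from typing import Dict, Iterator, Set, Tuple

Nested = Dict[str, Dict[str, str]]


def iter_playlists(nested: Nested) -> Iterator[Tuple[str, str, str]]:
    """Yield (region, playlist_name, playlist_id) for every playlist in the nested dict."""
    for region, playlists in nested.items():
        for playlist_name, playlist_id in playlists.items():
            yield region, playlist_name, playlist_id


def duplicate_ids(nested: Nested) -> Set[str]:
    """Return any playlist ID that appears more than once (likely a copy-paste slip)."""
    seen: Set[str] = set()
    dupes: Set[str] = set()
    for _, _, playlist_id in iter_playlists(nested):
        if playlist_id in seen:
            dupes.add(playlist_id)
        seen.add(playlist_id)
    return dupes


# Excerpt of the dictionary above, for illustration
sample = {
    "Global": {"Top50-Global": "37i9dQZEVXbMDoHDwVN2tF", "Viral50-Global": "37i9dQZEVXbLiRSasKsNU9"},
    "UK": {"Top50-UK": "37i9dQZEVXbLnolsZ8PSNw"},
}
rows = list(iter_playlists(sample))
print(len(rows), duplicate_ids(sample))
```

With the full `playlist_ids`, `list(iter_playlists(playlist_ids))` has 12 rows (4 regions × 3 playlists), so wrapping that list in `tqdm(...)` would advance the progress bar once per playlist rather than once per region.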

In [5]:
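# Save raw JSON files for every playlist
# Note: the progress bar advances once per region (4 steps), not once per playlist
for region, playlists in tqdm(playlist_ids.items(), desc="Downloading albums"):
    for playlist_name, playlist_id in playlists.items():
        # Only displayed if logging is configured at INFO level, e.g. logging.basicConfig(level=logging.INFO)
        logging.info(f"Processing playlist: {playlist_name} ({region})")
        my_functions_01.process_playlist(playlist_name, playlist_id, token)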
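The source of `process_playlist()` in `my_functions_01.py` is not shown, so here is a hypothetical sketch of the pattern such a function might follow: request `GET /v1/playlists/{playlist_id}/tracks`, follow each paging object's `next` URL until it is null, and write the collected items to `<out_dir>/<playlist_name>_tracks.json`. Passing a `fetch_json` function instead of the raw token is a choice made for this sketch so the paging logic can be tested offline; `make_fetch_json(token)` adapts a token to that interface. It does not attempt the `features` file, since Audio Features are no longer available.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Callable, List, Optional

API_BASE = "https://api.spotify.com/v1"
FetchJson = Callable[[str], dict]


def playlist_tracks_url(playlist_id: str) -> str:
    # Playlist items endpoint; further pages are reached through each response's "next" URL
    return f"{API_BASE}/playlists/{playlist_id}/tracks"


def fetch_all_items(first_url: str, fetch_json: FetchJson) -> List[dict]:
    """Follow Spotify paging objects ('items' plus 'next') until 'next' is null."""
    items: List[dict] = []
    url: Optional[str] = first_url
    while url:
        page = fetch_json(url)
        items.extend(page.get("items", []))
        url = page.get("next")  # None (JSON null) on the last page
    return items


def make_fetch_json(token: str) -> FetchJson:
    """Wrap requests.get with the Bearer header (requests imported only when called)."""
    import requests

    def fetch_json(url: str) -> dict:
        response = requests.get(url, headers={"Authorization": f"Bearer {token}"}, timeout=10)
        response.raise_for_status()
        return response.json()

    return fetch_json


def save_playlist_tracks(playlist_name: str, playlist_id: str, fetch_json: FetchJson,
                         out_dir: str = "../data/raw") -> Path:
    """Hypothetical stand-in for process_playlist(): save every track item as raw JSON."""
    items = fetch_all_items(playlist_tracks_url(playlist_id), fetch_json)
    out_path = Path(out_dir) / f"{playlist_name}_tracks.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(items, indent=2), encoding="utf-8")
    # Same style of confirmation message as the recorded output below
    print(f"Playlist data collected and saved to {out_path} at {datetime.now()}.")
    return out_path
```

Usage would look like `save_playlist_tracks("Top50-Global", "37i9dQZEVXbMDoHDwVN2tF", make_fetch_json(token))`.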

Downloading albums:   0%|          | 0/4 [00:00<?, ?it/s]

Playlist data collected and saved to ../data/raw/Top50-Global_tracks.json at 2024-11-27 10:17:11.274625.
Playlist data collected and saved to ../data/raw/Viral50-Global_tracks.json at 2024-11-27 10:17:11.959730.


Downloading albums:  25%|██▌       | 1/4 [00:01<00:05,  1.99s/it]

Playlist data collected and saved to ../data/raw/Today's_Top_Hits_tracks.json at 2024-11-27 10:17:12.604731.
Playlist data collected and saved to ../data/raw/Top50-United_Kingdom_tracks.json at 2024-11-27 10:17:13.155032.
Playlist data collected and saved to ../data/raw/Viral50-United_Kingdom_tracks.json at 2024-11-27 10:17:13.769491.
Playlist data collected and saved to ../data/raw/Hot_Hits_UK_tracks.json at 2024-11-27 10:17:14.379364.


Downloading albums:  50%|█████     | 2/4 [00:03<00:03,  1.92s/it]

Playlist data collected and saved to ../data/raw/Top50-USA_tracks.json at 2024-11-27 10:17:15.061672.
Playlist data collected and saved to ../data/raw/Viral50-USA_tracks.json at 2024-11-27 10:17:15.601774.
Playlist data collected and saved to ../data/raw/Top_Songs-USA_tracks.json at 2024-11-27 10:17:16.111979.


Downloading albums:  75%|███████▌  | 3/4 [00:05<00:01,  1.80s/it]

Playlist data collected and saved to ../data/raw/Top50-Singapore_tracks.json at 2024-11-27 10:17:16.703075.
Playlist data collected and saved to ../data/raw/Viral50-Singapore_tracks.json at 2024-11-27 10:17:17.310503.
Playlist data collected and saved to ../data/raw/Top_Songs-Singapore_tracks.json at 2024-11-27 10:17:17.834681.


Downloading albums: 100%|██████████| 4/4 [00:07<00:00,  1.88s/it]
