# Lab | Extending the internal databases with audio features

At this point, you have the **hot_songs** and the **not_hot_songs** databases. However, you don't have any acoustic information about the songs. 
The purpose of this lab is to use Spotify's API to extend both databases with this information to use it later.

# Instructions

* Create a function to search a given **single** song in the Spotify API: **search_song(title, artist, limit)**. 

Later, you can will use this function in the song recommender to get the audio features of each song in the database (considering only the first match, even though it might not be the best match because your time is limited and you can spend time determining the best match for each song). Keep in mind, that a given song might not be available on Spotify's API (make sure to use the song's title and artist searching the song). If the song is not found, the function must return an empty string as the href/id/uri. Also, in this case, you should remove this song from the database. You should consider using a try: except: clause like:

```python
def search_song(title, artist, limit):
  ...

list_of_ids = []
try:
  id = search_song(title, artist, limit)
  list_of_ids.append(id)
except:
  print("Song not found!")
  list_of_ids.append("")

df["id"] = list_of_ids

# Code to remove songs without IDs from the databases.
```

On the other hand, you can also use this function in the song recommender to search for the **user's song ID** in the Spotify API. However, this time you want to make sure that you get the right match. Therefore, you would like to create dataframe with a list of five matches, present them to the user, and let him select the right one like:

|   | Title | Artist |
|---|--------|-------|
| 0 | Giorgia on My Mind | Carmichaels |
| 1 | Giorgia on My Mind | Ray Charles |

Once the desired song is located, **the function should return the href/id/uri of the song to the code** (not to the user) to get the audio features.

* Create a function **get_audio_features(list_of_song_ids)** to obtain the audio features of a given list of songs (the content of list_of_songs can be the href/id/uri or a list with a single song IDs). 

Be careful to not exceed the number of calls to the API otherwise, you will be banned and you will have to wait several hours before launching a new request [see here](https://developer.spotify.com/documentation/web-api/guides/rate-limits/).

A good strategy to prevent this problem is to split the list of song IDs into "chunks" of 50 song IDs and wait 20 seconds before asking for the audio features of the next "chunk" (for your own peace of mind add a "print("Collecting IDs for chunk...") message to show the progress). To create chunks of song IDs consider using [np.split](https://numpy.org/doc/stable/reference/generated/numpy.split.html)

Then, use this function to create a Pandas Dataframe with the audio features of all the songs in the databases. Hint: create a dictionary with the song's audio features as keys and an **empty list as values**. Then, fill in the lists with the corresponding audio features of each song. Finally, create a data frame with the audio features from the dictionary.

* Once the previous function has been created, create another function **add_audio_features(df, audio_features_df)** to concat a given dataframe with the audio features dataframe and return the extended data frame.

* Finally, replace the old internal files of songs (hot and not hot) with the extended data frames with the audio features and save them into separate files on the disk.

* Remember to store your functions inside a "functions.py" library in order to be used by your final song recommender.

In [1]:
import sys
from config import *

import numpy as np
import pandas as pd
import time

import spotipy
import json
from spotipy.oauth2 import SpotifyClientCredentials

#Initialize SpotiPy with user credentias #
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=Client_ID,
                                                           client_secret=Client_Secret))

In [2]:
#Loading the DataBase of Hot 100

hot100 = pd.read_csv('hot100.csv')

In [3]:
hot100

Unnamed: 0,artist,song
0,Jack Harlow,Lovin On Me
1,Taylor Swift,Cruel Summer
2,Tate McRae,Greedy
3,Doja Cat,Paint The Town Red
4,Zach Bryan Featuring Kacey Musgraves,I Remember Everything
...,...,...
95,Zach Bryan,Tourniquet
96,Junior H,Y Lloro
97,Sophie Ellis-Bextor,Murder On The Dancefloor
98,Karol G,Amargura


In [4]:
#Function giving a title and an artist an a limit 

def search_song(title, artist, limit=1):
    list_of_ids = []

    try:
        song_name = title
        artist_name = artist
        query = f" {song_name} song_name  {artist_name} artist_name "

        results = sp.search(q=query, limit=1)
        id_song = results['tracks']['items'][0]['uri']
        list_of_ids.append(id_song)

    except:
      print("Song not found!")
      list_of_ids.append("")

    return list_of_ids

# Code to remove songs without IDs from the databases.

In [5]:
# Creating Chunks of size = 50

chunk_size = 50
num_chunks = int(np.ceil(len(hot100) / chunk_size))

# Create multiple lists of DataFrames
list_of_chunks = []

for i in range(num_chunks):
    start_idx = i * chunk_size
    end_idx = (i + 1) * chunk_size
    chunk_df = hot100.iloc[start_idx:end_idx]
    list_of_chunks.append(chunk_df)


In [6]:
#For Loop for passing through the function to each row of each chunk, and storing that info into a new df

hot100_list = []
hot100_df = []

for i, chunk in enumerate(list_of_chunks, 1):
    chunk['id'] = chunk.apply(lambda row: search_song(row['song'], row['artist'])[0], axis=1)
    time.sleep(35)
    print(f"Chunk {i}:\n{chunk}\n")
    hot100_list.append(chunk)
hot100_df = pd.concat(hot100_list, ignore_index=True)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  chunk['id'] = chunk.apply(lambda row: search_song(row['song'], row['artist'])[0], axis=1)


Chunk 1:
                                  artist  \
0                            Jack Harlow   
1                           Taylor Swift   
2                             Tate McRae   
3                               Doja Cat   
4   Zach Bryan Featuring Kacey Musgraves   
5                                    SZA   
6                                   Tyla   
7                          Morgan Wallen   
8                             Luke Combs   
9                               Doja Cat   
10                         Morgan Wallen   
11                           Teddy Swims   
12                          Taylor Swift   
13                          Paul Russell   
14                           Miley Cyrus   
15       Drake Featuring Sexyy Red & SZA   
16                           Nicki Minaj   
17                       Chris Stapleton   
18                            Noah Kahan   
19                         Billie Eilish   
20                        Olivia Rodrigo   
21                     

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  chunk['id'] = chunk.apply(lambda row: search_song(row['song'], row['artist'])[0], axis=1)


Chunk 2:
                                     artist  \
50                     Drake Featuring Yeat   
51               Fuerza Regida & Marshmello   
52                          Parker McCollum   
53                      Peso Pluma & Anitta   
54                                     Xavi   
55                        Sabrina Carpenter   
56                                Lil Tecca   
57                     Karol G & Peso Pluma   
58                               Kane Brown   
59  Zach Bryan Featuring The War And Treaty   
60                             Taylor Swift   
61                             George Birge   
62                           Olivia Rodrigo   
63         Riley Green Featuring Luke Combs   
64                      Maluma & Carin Leon   
65      Morgan Wallen Featuring Eric Church   
66             Grupo Frontera & Grupo Firme   
67                                 310babii   
68      The Weeknd, JENNIE & Lily Rose Depp   
69                                Jung Kook   
70  

In [9]:
# Function to retrieve audio features for a list of track IDs
def get_audio_features(track_ids):
    try:
        audio_features = sp.audio_features(track_ids)
        return audio_features
    except Exception as e:
        print(f"Error fetching audio features: {e}")
        return []


In [10]:
# Split hot100_df into chunks of 50 rows
chunk_size = 50
hot100_chunks = np.array_split(hot100_df, np.ceil(len(hot100_df) / chunk_size))


In [16]:
# List to store audio features DataFrames
audio_features_list = []

# Iterate through chunks and run get_audio_features
for i, chunk in enumerate(hot100_chunks, 1):
    track_ids = chunk['id'].tolist()
    
    try:
        audio_features = get_audio_features(track_ids)
        
        # Check if audio_features is not empty
        if audio_features:
            audio_features_df = pd.DataFrame(audio_features)
            audio_features_list.append(audio_features_df)
        else:
            print(f"No audio features retrieved for chunk {i}.")
            
    except Exception as e:
        print(f"Error fetching audio features for chunk {i}: {e}")
    
    time.sleep(30)


In [17]:
# Concatenate the list of audio features DataFrames into a new DataFrame
if audio_features_list:
    audio_features_df = pd.concat(audio_features_list, ignore_index=True)
    print("Audio features retrieval completed.")
else:
    print("No audio features data to concatenate.")

Audio features retrieval completed.


In [19]:
#Concatenating list of songs + audio features

hot100_total_df = pd.concat([hot100_df, audio_features_df], axis=1)

In [20]:
hot100_total_df

Unnamed: 0,artist,song,id,danceability,energy,key,loudness,mode,speechiness,acousticness,...,liveness,valence,tempo,type,id.1,uri,track_href,analysis_url,duration_ms,time_signature
0,Jack Harlow,Lovin On Me,spotify:track:4xhsWYTOGcal8zt0J161CU,0.943,0.558,2,-4.911,1,0.0568,0.0026,...,0.0937,0.606,104.983,audio_features,4xhsWYTOGcal8zt0J161CU,spotify:track:4xhsWYTOGcal8zt0J161CU,https://api.spotify.com/v1/tracks/4xhsWYTOGcal...,https://api.spotify.com/v1/audio-analysis/4xhs...,138411,4
1,Taylor Swift,Cruel Summer,spotify:track:1BxfuPKGuaTgP7aM0Bbdwr,0.552,0.702,9,-5.707,1,0.1570,0.1170,...,0.1050,0.564,169.994,audio_features,1BxfuPKGuaTgP7aM0Bbdwr,spotify:track:1BxfuPKGuaTgP7aM0Bbdwr,https://api.spotify.com/v1/tracks/1BxfuPKGuaTg...,https://api.spotify.com/v1/audio-analysis/1Bxf...,178427,4
2,Tate McRae,Greedy,spotify:track:3rUGC1vUpkDG9CZFHMur1t,0.750,0.733,6,-3.180,0,0.0319,0.2560,...,0.1140,0.844,111.018,audio_features,3rUGC1vUpkDG9CZFHMur1t,spotify:track:3rUGC1vUpkDG9CZFHMur1t,https://api.spotify.com/v1/tracks/3rUGC1vUpkDG...,https://api.spotify.com/v1/audio-analysis/3rUG...,131872,1
3,Doja Cat,Paint The Town Red,spotify:track:2IGMVunIBsBLtEQyoI1Mu7,0.868,0.538,5,-8.603,1,0.1740,0.2690,...,0.0901,0.732,99.968,audio_features,2IGMVunIBsBLtEQyoI1Mu7,spotify:track:2IGMVunIBsBLtEQyoI1Mu7,https://api.spotify.com/v1/tracks/2IGMVunIBsBL...,https://api.spotify.com/v1/audio-analysis/2IGM...,231750,4
4,Zach Bryan Featuring Kacey Musgraves,I Remember Everything,spotify:track:4KULAymBBJcPRpk1yO4dOG,0.429,0.453,0,-7.746,1,0.0459,0.5540,...,0.1020,0.155,77.639,audio_features,4KULAymBBJcPRpk1yO4dOG,spotify:track:4KULAymBBJcPRpk1yO4dOG,https://api.spotify.com/v1/tracks/4KULAymBBJcP...,https://api.spotify.com/v1/audio-analysis/4KUL...,227196,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,Zach Bryan,Tourniquet,spotify:track:3EvZ03hGAFwGZ2Ebcu86YH,0.593,0.397,6,-8.309,1,0.0329,0.6840,...,0.0982,0.320,76.703,audio_features,3EvZ03hGAFwGZ2Ebcu86YH,spotify:track:3EvZ03hGAFwGZ2Ebcu86YH,https://api.spotify.com/v1/tracks/3EvZ03hGAFwG...,https://api.spotify.com/v1/audio-analysis/3EvZ...,189053,4
96,Junior H,Y Lloro,spotify:track:6RcAHyC5sAUIbPTkhOQwd8,0.728,0.589,7,-7.115,1,0.0376,0.4240,...,0.1660,0.767,77.475,audio_features,6RcAHyC5sAUIbPTkhOQwd8,spotify:track:6RcAHyC5sAUIbPTkhOQwd8,https://api.spotify.com/v1/tracks/6RcAHyC5sAUI...,https://api.spotify.com/v1/audio-analysis/6RcA...,179013,4
97,Sophie Ellis-Bextor,Murder On The Dancefloor,spotify:track:6yfU5QHRcUD4TG4P6tFXRS,0.577,0.526,4,-8.312,1,0.0477,0.8080,...,0.0829,0.445,117.863,audio_features,6yfU5QHRcUD4TG4P6tFXRS,spotify:track:6yfU5QHRcUD4TG4P6tFXRS,https://api.spotify.com/v1/tracks/6yfU5QHRcUD4...,https://api.spotify.com/v1/audio-analysis/6yfU...,226227,4
98,Karol G,Amargura,spotify:track:505v13epFXodT9fVAJ6h8k,0.920,0.696,6,-3.356,0,0.0742,0.1830,...,0.1490,0.545,106.966,audio_features,505v13epFXodT9fVAJ6h8k,spotify:track:505v13epFXodT9fVAJ6h8k,https://api.spotify.com/v1/tracks/505v13epFXod...,https://api.spotify.com/v1/audio-analysis/505v...,170480,4


In [22]:
hot100_total_df['H_or_N'] = 'H'

In [23]:
hot100_total_df.to_csv('hot100_full.csv', index=False)