# Extending the internal databases with audio features
## Instructions
1. Create a function to search a given single song in the Spotify API: `search_song(title, artist)`
2. Create a function `get_audio_features(list_of_songs)` to obtain the audio features of a given list of songs (the content of list_of_songs can be the href/id/uri).
3. Once the previous function has been created, create another function `add_audio_features(df, audio_features_df)` to concat a given dataframe with the dataframe containing the audio features alongside any other desired info, and return the extended data frame. Replace the old internal files of songs (hot and not hot) with the extended data frames with the audio features and save them into separate files on the disk.


## Create a function to search a given single song in the Spotify API: 

`search_song(title, artist)`

Take into account that Spotify's API sometimes returns several matches for the same song title (different artists, a different album by the same artist, another version of the song, etc.). In that case, display the list of matches to the user and let them select the right one. Once the desired song is located, the function should return the song's href/id/uri to the code (not to the user). Keep in mind that a given song might not be available on Spotify's API (make sure to search using both the song's title and artist). If the song is not found, the function must return an empty string as the href/id/uri; in that case, also remove the song from the database.
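
The behaviour described above could be sketched as follows. This is a minimal illustration, not the notebook's own implementation: it assumes `sp` is an authenticated `spotipy.Spotify` client, and the extra `sp`, `limit` and `choose` parameters are hypothetical additions to the `search_song(title, artist)` signature (`choose` only exists so the prompt can be replaced in tests).

```python
def search_song(title, artist, sp, limit=5, choose=input):
    """Return the Spotify track id picked by the user, or "" if nothing matches.

    `sp` is assumed to be a spotipy.Spotify client (any object with a
    compatible `search` method works). `choose` is a hypothetical hook that
    defaults to `input`.
    """
    query = f"track:{title} artist:{artist}"
    results = sp.search(q=query, type="track", limit=limit)
    items = results["tracks"]["items"]
    if not items:
        return ""  # not found on Spotify: the caller can drop this song

    # Show every candidate so the user can pick the right version.
    for n, item in enumerate(items, start=1):
        artists = ", ".join(a["name"] for a in item["artists"])
        print(f"{n}. {item['name']} | {artists} | {item['album']['name']}")

    answer = choose(f"Select 1-{len(items)} (anything else skips): ").strip()
    if answer.isdecimal() and 1 <= int(answer) <= len(items):
        return items[int(answer) - 1]["id"]
    return ""
```

Returning the id rather than printing it keeps the function usable from other code; a caller can treat the empty string as the signal to remove the row from the data frame.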

In [94]:
from config import *
# 'config' is the name of the file where we saved the credentials

In [95]:
# import libraries
import spotipy
import json
from spotipy.oauth2 import SpotifyClientCredentials
import pandas as pd
from time import sleep
from random import randint

In [96]:
# Initialize SpotiPy with user credentials
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=client_id,
                                                           client_secret=client_secret_id))


## Create a function to obtain the audio features of a given list of songs (the content of list_of_songs can be the href/id/uri)

`get_audio_features(list_of_songs)`

Be careful not to exceed the API's rate limit; otherwise you will be banned and will have to wait several hours before sending a new request. A good strategy to prevent this problem is to split the list of song IDs into "chunks" of 50 IDs and wait 20 seconds before requesting the audio features of the next chunk. Then, use this function to create a Pandas DataFrame with the audio features of the list of songs. Hint: create a dictionary with the names of the audio features as keys and empty lists as values. Then fill in the lists with the corresponding audio features of each song. Finally, create your data frame from the dictionary.
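
The chunk-and-wait strategy and the dictionary-of-lists hint could look roughly like this. It is a sketch under assumptions: `chunked` and `collect_features` are hypothetical helper names, `fetch` is assumed to behave like `sp.audio_features` (a list of ids in, one dictionary or `None` per id out), and every returned dictionary is assumed to carry the same keys.

```python
from time import sleep

def chunked(ids, size=50):
    """Yield consecutive slices of `ids` holding at most `size` items."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

def collect_features(ids, fetch, size=50, pause=20, wait=sleep):
    """Build a {feature name: [values]} dictionary, one request per chunk.

    `fetch` is assumed to work like `sp.audio_features`. `wait` defaults to
    `time.sleep` and is only a parameter so tests can skip the real pause.
    """
    columns = {}
    chunks = list(chunked(ids, size))
    for n, chunk in enumerate(chunks):
        for features in fetch(chunk):
            if features is None:
                continue  # skip ids that have no audio features
            for name, value in features.items():
                columns.setdefault(name, []).append(value)
        if n < len(chunks) - 1:
            wait(pause)  # pause between chunks, not after the last one
    return columns
```

With a real client, `pd.DataFrame(collect_features(ids, sp.audio_features))` would then turn the dictionary into a data frame.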

In [97]:
import pandas as pd

In [98]:
# use this to build the query
chunks = df.iloc[:100,0] # column for titles
#df.iloc[:100,1] # column for artists

df.iloc[100:200,0]
df.iloc[11:200,1]

# similar later for IDs

11                         Drake
12                        Toosii
13                    Luke Combs
14                        Miguel
15                     Coi Leray
                 ...            
95               Scotty McCreery
96                         Jimin
97                  Taylor Swift
98                    Luke Combs
99    YoungBoy Never Broke Again
Name: artists, Length: 89, dtype: object

## Concatenate dataframe and audio features

Once the previous function has been created, create another function `add_audio_features(df, audio_features_df)` to concat a given dataframe with the dataframe containing the audio features alongside any other desired info, and return the extended data frame. Replace the old internal files of songs (hot and not hot) with the extended data frames with the audio features and save them into separate files on the disk.
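
A minimal sketch of such a function is given here, assuming both data frames hold one row per song in the same order. The optional `path` argument is a hypothetical addition for saving the result; it is not part of the signature given in the instructions.

```python
import pandas as pd

def add_audio_features(df, audio_features_df, path=None):
    """Return `df` extended column-wise with `audio_features_df`.

    Assumes the two frames are row-aligned (same songs, same order).
    `path` is a hypothetical optional argument: when given, the extended
    frame is also written to that CSV file.
    """
    extended = pd.concat(
        [df.reset_index(drop=True), audio_features_df.reset_index(drop=True)],
        axis=1,
    )
    if path is not None:
        extended.to_csv(path, index=False)
    return extended
```

Resetting both indexes makes `pd.concat` align rows by position rather than by index label. When the row order cannot be trusted, merging on the song id, as the cells further down do with `pd.merge(..., on='ids')`, is the safer choice.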

In [99]:
# dictionary of hot songs, keyed by artist
# note: dict keys are unique, so an artist with several songs keeps only the last one
df = pd.read_csv('hot_songs.csv')
dict_hot_songs = dict(zip(df["artists"], df["titles"]))
dict_hot_songs


{'Morgan Wallen': 'Sunrise',
 'SZA': 'Snooze',
 'Miley Cyrus': 'Flowers',
 'Eslabon Armado X Peso Pluma': 'Ella Baila Sola',
 'Grupo Frontera X Bad Bunny': 'Un x100to',
 'Rema & Selena Gomez': 'Calm Down',
 'Metro Boomin, The Weeknd & 21 Savage': "Creepin'",
 'The Weeknd & Ariana Grande': 'Die For You',
 'PinkPantheress & Ice Spice': "Boy's A Liar, Pt. 2",
 'Taylor Swift': 'All Of The Girls You Loved Before',
 'Yng Lvcas x Peso Pluma': 'La Bebe',
 'Drake': 'Search & Rescue',
 'Toosii': 'Favorite Song',
 'Luke Combs': '5 Leaf Clover',
 'Miguel': 'Sure Thing',
 'Coi Leray': 'Players',
 'Bailey Zimmerman': 'Rock And A Hard Place',
 'The Weeknd Featuring Future': 'Double Fantasy',
 'Harry Styles': 'As It Was',
 'Post Malone': 'Chemical',
 'David Guetta & Bebe Rexha': "I'm Good (Blue)",
 'Zach Bryan': 'Something In The Orange',
 'Kane Brown With Katelyn Brown': 'Thank God',
 'Peso Pluma': 'Por Las Noches',
 'Ice Spice & Nicki Minaj': 'Princess Diana',
 'RAYE Featuring 070 Shake': 'Escapism'

In [100]:
import re
import numpy as np

def get_songs_ids(df):
    """Look up a Spotify ID for every row and store it in df['ids'] (in place)."""
    ids = []

    for i in range(0, len(df), 100):
        # Get batch of songs
        chunk = df.iloc[i:i+100]

        # Iterate through songs in batch, getting Spotify IDs
        for index, row in chunk.iterrows():
            # Search for song on Spotify
            title = row["titles"]
            artist = row["artists"]

            # extract first artist name: split on a standalone '&', 'x' or 'X'
            # (whitespace on both sides, so an 'x' inside a name is left alone)
            first_artist = re.split(r'\s+[&xX]\s+', artist)[0].strip()
            query = f'track:{title} artist:{first_artist}'

            try:
                results = sp.search(q=query, limit=1)
                song_id = results["tracks"]["items"][0]["id"]
                #print("ID found for {}: {}".format(first_artist, title))
            except Exception:
                # no match (empty result list) or a failed request
                song_id = np.nan
                print("ID not found for {}: {}".format(first_artist, title))
            ids.append(song_id)

        # Wait 25 seconds before getting the next batch of songs
        sleep(25)

    df['ids'] = ids

In [101]:
get_songs_ids(df) # note: Spotify's API won't recognise duets with 'Featuring' in the artist field

ID not found for The Weeknd Featuring Future: Double Fantasy
ID not found for Kane Brown With Katelyn Brown: Thank God
ID not found for RAYE Featuring 070 Shake: Escapism
ID not found for Gabito Ballesteros, Peso Pluma: AMG
ID not found for HARDY Featuring Lainey Wilson: Wait In The Truck
ID not found for Tyler, The Creator Featuring Kali Uchis: See You Again
ID not found for Kim Petras Featuring Nicki Minaj: Alone
ID not found for Morgan Wallen Featuring ERNEST: Cowgirls
ID not found for Morgan Wallen Featuring Eric Church: Man Made A Bar
ID not found for Zach Bryan Featuring Maggie Rogers: Dawns
ID not found for Lil Baby Featuring Fridayy: Forever
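
Most of the misses above still carry "Featuring" or "With" in the artist string. One possible way to handle them is to cut the string at the first such separator before building the query. The sketch below is an assumption-laden illustration: `primary_artist` is a hypothetical helper, the separator list only covers the patterns visible in this data, and whether Spotify then finds each track has not been verified here.

```python
import re

# Separators seen in the chart data: "Featuring", "With", "&" and "x"/"X".
# Whitespace is required on both sides so letters inside a name are not split.
_SEPARATORS = re.compile(r"\s+(?:featuring|with|&|x)\s+", flags=re.IGNORECASE)

def primary_artist(artist):
    """Return the first credited artist from a chart-style artist string."""
    return _SEPARATORS.split(artist, maxsplit=1)[0].strip()
```

Commas are deliberately not treated as separators, because names such as "Tyler, The Creator" contain one; a string like "Gabito Ballesteros, Peso Pluma" is therefore returned unchanged and would need separate handling.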


In [102]:
df.head()

| | titles | artists | ids |
|---|---|---|---|
| 0 | Last Night | Morgan Wallen | 59uQI0PADDKeE6UZDTJEe8 |
| 1 | Kill Bill | SZA | 1Qrg8KqiBpW07V7PNxwwwL |
| 2 | Flowers | Miley Cyrus | 0yLdNVWF3Srea0uzk55zFn |
| 3 | Ella Baila Sola | Eslabon Armado X Peso Pluma | 3dnP0JxCgygwQH9Gm7q7nb |
| 4 | Un x100to | Grupo Frontera X Bad Bunny | 6pD0ufEQq0xdHSsRbg9LBK |


## Create a function get_audio_features(list_of_songs) to obtain the audio features of a given list of songs (the content of list_of_songs can be the href/id/uri).


In [103]:
def get_audio_features(df):
    """Return df merged with the Spotify audio features of the songs in df['ids']."""
    # create empty dictionary
    audio_features_dict = {}

    for i in range(0, len(df), 100):
        # get batch of songs
        chunk = df.iloc[i:i+100]

        # create a list of ids for the songs in each chunk
        id_list = chunk['ids'].tolist()

        # get the audio features for the songs in this chunk
        audio_features_list = sp.audio_features(id_list)

        # iterate through the audio features and add them to the dictionary
        for features in audio_features_list:
            if features is None:
                # the API returns None for an id it cannot resolve
                continue
            audio_features_dict[features['id']] = features

        # Wait 25 seconds before getting the next batch of songs
        sleep(25)

    # cast dictionary into dataframe
    features_df = pd.DataFrame.from_dict(audio_features_dict, orient='index')

    # reset the index
    features_df.reset_index(inplace=True)

    # rename the 'index' column to 'ids' so it matches the key in df
    features_df.rename(columns={'index': 'ids'}, inplace=True)

    # Merge the original dataframe with the audio features dataframe
    features_df = pd.merge(df, features_df, on='ids', how='left')

    return features_df

In [104]:
#removing NaN values before running the second function
df1 = df.dropna()

In [106]:
#getting new dataframe with the audio features

hot_songs_feats = get_audio_features(df1) 

In [107]:
#removing the duplicated id column

hot_songs_feats = hot_songs_feats.drop(['id'], axis = 1)
hot_songs_feats.columns

Index(['titles', 'artists', 'ids', 'danceability', 'energy', 'key', 'loudness',
       'mode', 'speechiness', 'acousticness', 'instrumentalness', 'liveness',
       'valence', 'tempo', 'type', 'uri', 'track_href', 'analysis_url',
       'duration_ms', 'time_signature'],
      dtype='object')

In [108]:
hot_songs_feats.head() # ta-daa

| | titles | artists | ids | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | type | uri | track_href | analysis_url | duration_ms | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Last Night | Morgan Wallen | 59uQI0PADDKeE6UZDTJEe8 | 0.517 | 0.675 | 6 | -5.382 | 1 | 0.0357 | 0.459 | 0.0 | 0.151 | 0.518 | 203.853 | audio_features | spotify:track:59uQI0PADDKeE6UZDTJEe8 | https://api.spotify.com/v1/tracks/59uQI0PADDKe... | https://api.spotify.com/v1/audio-analysis/59uQ... | 163855 | 4 |
| 1 | Kill Bill | SZA | 1Qrg8KqiBpW07V7PNxwwwL | 0.644 | 0.735 | 8 | -5.747 | 1 | 0.0391 | 0.0521 | 0.144 | 0.161 | 0.418 | 88.98 | audio_features | spotify:track:1Qrg8KqiBpW07V7PNxwwwL | https://api.spotify.com/v1/tracks/1Qrg8KqiBpW0... | https://api.spotify.com/v1/audio-analysis/1Qrg... | 153947 | 4 |
| 2 | Flowers | Miley Cyrus | 0yLdNVWF3Srea0uzk55zFn | 0.707 | 0.681 | 0 | -4.325 | 1 | 0.0668 | 0.0632 | 5e-06 | 0.0322 | 0.646 | 117.999 | audio_features | spotify:track:0yLdNVWF3Srea0uzk55zFn | https://api.spotify.com/v1/tracks/0yLdNVWF3Sre... | https://api.spotify.com/v1/audio-analysis/0yLd... | 200455 | 4 |
| 3 | Ella Baila Sola | Eslabon Armado X Peso Pluma | 3dnP0JxCgygwQH9Gm7q7nb | 0.668 | 0.758 | 5 | -5.176 | 0 | 0.0332 | 0.483 | 1.9e-05 | 0.0837 | 0.834 | 147.989 | audio_features | spotify:track:3dnP0JxCgygwQH9Gm7q7nb | https://api.spotify.com/v1/tracks/3dnP0JxCgygw... | https://api.spotify.com/v1/audio-analysis/3dnP... | 165671 | 3 |
| 4 | Un x100to | Grupo Frontera X Bad Bunny | 6pD0ufEQq0xdHSsRbg9LBK | 0.569 | 0.724 | 6 | -4.076 | 0 | 0.0474 | 0.228 | 0.0 | 0.27 | 0.562 | 83.118 | audio_features | spotify:track:6pD0ufEQq0xdHSsRbg9LBK | https://api.spotify.com/v1/tracks/6pD0ufEQq0xd... | https://api.spotify.com/v1/audio-analysis/6pD0... | 194563 | 4 |
