# Extending the internal databases with audio features
## Instructions
1. Create a function to search a given single song in the Spotify API: `search_song(title, artist)`
2. Create a function `get_audio_features(list_of_songs)` to obtain the audio features of a given list of songs (the content of list_of_songs can be the href/id/uri).
3. Once the previous function has been created, create another function `add_audio_features(df, audio_features_df)` to concat a given dataframe with the dataframe containing the audio features alongside any other desired info, and return the extended data frame. Replace the old internal files of songs (hot and not hot) with the extended data frames with the audio features and save them into separate files on the disk.

## Create a function to search a given single song in the Spotify API: 

`search_song(title, artist)`

Sometimes Spotify's API returns several matches for the same song title (different artists, a different album by the same artist, another version of the song, etc.). In that case it is useful to display the list of matches and let the user select the right one. Once the desired song is located, the function should return its href/id/uri to the calling code (not to the user). Keep in mind that a given song might not be available through Spotify's API, so make sure to search with both the song's title and artist. If the song is not found, the function must return an empty string as the href/id/uri, and the song should be removed from the database.
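A minimal sketch of the selection step described above, assuming the response has the shape that `sp.search` returns for tracks (`results["tracks"]["items"]`, where each item carries `name`, `artists`, `album` and `id`). The names `choose_track` and `ask` are hypothetical and introduced only for illustration; the sample response is hand-written, not real API output.

```python
def choose_track(results, ask=input):
    """Show the candidate tracks and return the chosen track's ID.

    Returns an empty string when there are no matches or the user skips.
    `ask` is injectable so the function can be exercised without a keyboard.
    """
    items = results.get("tracks", {}).get("items", [])
    if not items:
        return ""

    # One numbered line per candidate: title | artists | album
    for n, item in enumerate(items, start=1):
        artists = ", ".join(a["name"] for a in item["artists"])
        print(f"{n}. {item['name']} | {artists} | {item['album']['name']}")

    answer = ask("Number of the right match (Enter to skip): ").strip()
    if answer.isdigit() and 1 <= int(answer) <= len(items):
        return items[int(answer) - 1]["id"]
    return ""


# Hand-written response in the assumed shape (not real API output)
fake_results = {"tracks": {"items": [
    {"name": "Song A", "artists": [{"name": "Artist X"}],
     "album": {"name": "Album 1"}, "id": "id_a"},
    {"name": "Song A (Live)", "artists": [{"name": "Artist X"}],
     "album": {"name": "Album 2"}, "id": "id_b"},
]}}

picked = choose_track(fake_results, ask=lambda prompt: "2")
```

In the notebook, the same function could presumably be called as `choose_track(sp.search(q=query, limit=5))`, leaving `ask` at its default so the user types the number.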

In [1]:
from config import *
# 'config' is the name of the file where we saved the credentials

In [2]:
# import libraries
import spotipy
import json
from spotipy.oauth2 import SpotifyClientCredentials
import pandas as pd
from time import sleep
from random import randint
import numpy as np

In [3]:
# Initialize SpotiPy with user credentials
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=client_id,
                                                           client_secret=client_secret_id))

### User's song ID(s)

In [4]:
def search_song(title, artist, limit=5):
    id_song = []
    # Build the query with Spotify's field filters (no space after the colon)
    query = "artist:" + artist + " track:" + title

    # Get the ID of the first match if found, else add NaN
    try:  # if everything goes well
        results = sp.search(q=query, limit=limit)
        song_id = results["tracks"]["items"][0]["id"]
        id_song.append(song_id)

    except Exception:  # no match (empty 'items') or the request failed
        song_id = np.nan
        id_song.append(song_id)
        print("ID not found for {} and {}".format(artist, title))

    return id_song

In [5]:
search_song("quarter past midnight", "bastille")

['0Tl408cie1ksydV7s6KuWD']

### IDs from 'hot-songs' and 'not-hot-songs' dataframes
- Read both dataframes
- Apply function to extract IDs of songs
- Store IDs as new column in dataframe

In [6]:
data_not_hot = pd.read_csv('not_hot_songs.csv') # big dataframe (with not-hot songs)
data_not_hot.head()

Unnamed: 0,titles,artists
0,Twelfth Street Rag,Pee Wee Hunt
1,Mañana (Is Soon Enough for Me),Peggy Lee
2,Now Is the Hour,Bing Crosby
3,A Tree in the Meadow,Margaret Whiting
4,"You Can't Be True, Dear",Ken Griffin


In [7]:
data_hot = pd.read_csv('hot_songs.csv') # small dataframe (with hot songs)
data_hot.head()

Unnamed: 0,titles,artists
0,Kill Bill,SZA
1,Last Night,Morgan Wallen
2,Flowers,Miley Cyrus
3,Princess Diana,Ice Spice & Nicki Minaj
4,Ella Baila Sola,Eslabon Armado X Peso Pluma


In [8]:
def get_songs_ids(df):
    ids = []  # avoid shadowing the built-in id()

    for i in range(0, len(df), 100):
        # get batch (chunk) of songs
        chunk = df.iloc[i:i+100]

        # iterate through songs in chunk, getting Spotify IDs
        for index, row in chunk.iterrows():
            # Search for song on Spotify (field filters take no space after the colon)
            title = row["titles"]
            artist = row["artists"]
            query = "artist:" + artist + " track:" + title

            # Get ID if found, else add NaN
            try:  # if everything goes well
                results = sp.search(q=query, limit=1)
                song_id = results["tracks"]["items"][0]["id"]
                ids.append(song_id)

            except Exception:  # no match (empty 'items') or the request failed
                song_id = np.nan
                ids.append(song_id)
                print("ID not found for {} and {}".format(row['artists'], row['titles']))

        # Wait 25 seconds before getting the next batch of songs
        sleep(25)
    df['ids'] = ids  # add ids of songs as new column (modifies df in place)

In [9]:
get_songs_ids(data_hot)
display(data_hot.shape)
display(data_hot.head())

(100, 3)

Unnamed: 0,titles,artists,ids
0,Kill Bill,SZA,1Qrg8KqiBpW07V7PNxwwwL
1,Last Night,Morgan Wallen,59uQI0PADDKeE6UZDTJEe8
2,Flowers,Miley Cyrus,0yLdNVWF3Srea0uzk55zFn
3,Princess Diana,Ice Spice & Nicki Minaj,0ZxhtATQ0a6EvQNZ0z6NKV
4,Ella Baila Sola,Eslabon Armado X Peso Pluma,3dnP0JxCgygwQH9Gm7q7nb


In [10]:
get_songs_ids(data_not_hot)
display(data_not_hot.shape)
data_not_hot.head()

(2241, 3)

Unnamed: 0,titles,artists,ids
0,Twelfth Street Rag,Pee Wee Hunt,0Ec85AbUzt5SRDnsiBodml
1,Mañana (Is Soon Enough for Me),Peggy Lee,3baBjQISAznjkg8WwzipPM
2,Now Is the Hour,Bing Crosby,5eKva6ek6QTxwtYAioGWqX
3,A Tree in the Meadow,Margaret Whiting,1y4crDPPIzcTiOhBE5KRDJ
4,"You Can't Be True, Dear",Ken Griffin,13J3nj3RP6YjcG6bgFD9gj
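The instructions also ask for songs that were not found to be removed from the database. `get_songs_ids` stores `NaN` for those songs, so one possible clean-up step is to drop the rows whose ID is missing. The sketch below uses toy data in the same column layout as the notebook's dataframes; the names `songs` and `cleaned` are placeholders, and treating an empty string as "not found" is an assumption taken from the task description.

```python
import numpy as np
import pandas as pd

# Toy data in the same layout as the notebook's dataframes
songs = pd.DataFrame({
    "titles": ["Song A", "Song B", "Song C"],
    "artists": ["Artist X", "Artist Y", "Artist Z"],
    "ids": ["id_a", np.nan, ""],
})

# A song counts as "not found" when its ID is NaN or an empty string
missing = songs["ids"].isna() | (songs["ids"] == "")

# Keep only the songs that have an ID and renumber the rows
cleaned = songs[~missing].reset_index(drop=True)
```

Applied to the notebook's data, the same two lines would run on `data_hot` and `data_not_hot` before requesting audio features.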



## Create a function to obtain the audio features of a given list of songs (the content of list_of_songs can be the href/id/uri)

`get_audio_features(list_of_songs)`

Be careful not to exceed the API's rate limit; otherwise you will be temporarily banned and will have to wait several hours before sending a new request. A good strategy to prevent this is to split the list of song IDs into "chunks" of 50 IDs and wait 20 seconds before asking for the audio features of the next chunk. Then use this function to create a pandas DataFrame with the audio features of the list of songs. Hint: create a dictionary with the audio features' names as keys and empty lists as values, fill in the lists with the corresponding audio features of each song, and finally create your data frame from the dictionary.
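The chunk-and-wait strategy and the dictionary-of-lists hint can be sketched without touching the API. In the block below, `chunks`, `collect_features` and `fake_fetch` are hypothetical names; `fetch` stands for any callable that takes a list of IDs and returns one feature dictionary (or `None`) per ID, which is how `sp.audio_features` is used in this notebook. The sketch assumes every feature dictionary has the same keys, otherwise the lists would fall out of step.

```python
from time import sleep


def chunks(ids, size=50):
    """Yield consecutive slices of `ids` with at most `size` elements."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def collect_features(ids, fetch, size=50, pause=20):
    """Build a dict of lists (one list per audio feature), chunk by chunk."""
    columns = {}
    batches = list(chunks(ids, size))
    for n, batch in enumerate(batches):
        for features in fetch(batch):
            if features is None:      # song without audio features
                continue
            for key, value in features.items():
                columns.setdefault(key, []).append(value)
        if n < len(batches) - 1:      # no need to wait after the last chunk
            sleep(pause)
    return columns


# Stand-in for the API call, used here only to exercise the logic
def fake_fetch(batch):
    return [None if i == "missing" else {"id": i, "tempo": 120.0} for i in batch]


columns = collect_features(["a", "b", "missing", "c"], fake_fetch, size=2, pause=0)
```

Passing the resulting dictionary to `pd.DataFrame(columns)` would give one row per song and one column per audio feature.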

In [11]:
def get_audio_features(df):
    '''
    Takes a dataframe with an 'ids' column of Spotify song IDs and returns
    that dataframe merged with ALL audio features of each song.
    '''
    # create empty dictionary (song ID -> audio features)
    audio_features_dict = {}

    for i in range(0, len(df), 100):
        # get batch of songs
        chunk = df.iloc[i:i+100]

        # create a list of ids for the songs in each chunk, skipping missing IDs
        id_list = chunk['ids'].dropna().tolist()

        # get the audio features for the songs in this chunk
        audio_features_list = sp.audio_features(id_list)

        # iterate through the audio features and add them to the dictionary
        for features in audio_features_list:
            if features:  # skip songs for which no audio features came back
                audio_features_dict[features['id']] = features

        # wait 11 seconds before iterating through the next group (chunk) of songs
        sleep(11)

    # cast dictionary into dataframe; the song ID index becomes the 'index' column
    features_df = pd.DataFrame.from_dict(audio_features_dict, orient='index').reset_index()

    # merge the original dataframe with the audio features dataframe
    # (inner join: songs without audio features are dropped)
    features_df = df.merge(features_df, left_on="ids", right_on="id")

    return features_df

In [12]:
data_hot_audio_features = get_audio_features(data_hot)
display(data_hot_audio_features.shape)
display(data_hot_audio_features.head())

(100, 22)

Unnamed: 0,titles,artists,ids,index,danceability,energy,key,loudness,mode,speechiness,...,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,Kill Bill,SZA,1Qrg8KqiBpW07V7PNxwwwL,1Qrg8KqiBpW07V7PNxwwwL,0.644,0.735,8,-5.747,1,0.0391,...,0.161,0.418,88.98,audio_features,1Qrg8KqiBpW07V7PNxwwwL,spotify:track:1Qrg8KqiBpW07V7PNxwwwL,https://api.spotify.com/v1/tracks/1Qrg8KqiBpW0...,https://api.spotify.com/v1/audio-analysis/1Qrg...,153947,4
1,Last Night,Morgan Wallen,59uQI0PADDKeE6UZDTJEe8,59uQI0PADDKeE6UZDTJEe8,0.517,0.675,6,-5.382,1,0.0357,...,0.151,0.518,203.853,audio_features,59uQI0PADDKeE6UZDTJEe8,spotify:track:59uQI0PADDKeE6UZDTJEe8,https://api.spotify.com/v1/tracks/59uQI0PADDKe...,https://api.spotify.com/v1/audio-analysis/59uQ...,163855,4
2,Flowers,Miley Cyrus,0yLdNVWF3Srea0uzk55zFn,0yLdNVWF3Srea0uzk55zFn,0.707,0.681,0,-4.325,1,0.0668,...,0.0322,0.646,117.999,audio_features,0yLdNVWF3Srea0uzk55zFn,spotify:track:0yLdNVWF3Srea0uzk55zFn,https://api.spotify.com/v1/tracks/0yLdNVWF3Sre...,https://api.spotify.com/v1/audio-analysis/0yLd...,200455,4
3,Princess Diana,Ice Spice & Nicki Minaj,0ZxhtATQ0a6EvQNZ0z6NKV,0ZxhtATQ0a6EvQNZ0z6NKV,0.898,0.676,9,-5.196,1,0.187,...,0.101,0.742,147.991,audio_features,0ZxhtATQ0a6EvQNZ0z6NKV,spotify:track:0ZxhtATQ0a6EvQNZ0z6NKV,https://api.spotify.com/v1/tracks/0ZxhtATQ0a6E...,https://api.spotify.com/v1/audio-analysis/0Zxh...,172125,4
4,Ella Baila Sola,Eslabon Armado X Peso Pluma,3dnP0JxCgygwQH9Gm7q7nb,3dnP0JxCgygwQH9Gm7q7nb,0.668,0.758,5,-5.176,0,0.0332,...,0.0837,0.834,147.989,audio_features,3dnP0JxCgygwQH9Gm7q7nb,spotify:track:3dnP0JxCgygwQH9Gm7q7nb,https://api.spotify.com/v1/tracks/3dnP0JxCgygw...,https://api.spotify.com/v1/audio-analysis/3dnP...,165671,3


In [13]:
data_not_hot.isna().value_counts()

titles  artists  ids  
False   False    False    2241
dtype: int64

In [14]:
data_not_hot_audio_features = get_audio_features(data_not_hot)
display(data_not_hot_audio_features.shape)
display(data_not_hot_audio_features.head())

(2240, 22)

Unnamed: 0,titles,artists,ids,index,danceability,energy,key,loudness,mode,speechiness,...,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,Twelfth Street Rag,Pee Wee Hunt,0Ec85AbUzt5SRDnsiBodml,0Ec85AbUzt5SRDnsiBodml,0.53,0.57,10,-7.942,1,0.0559,...,0.102,0.943,105.621,audio_features,0Ec85AbUzt5SRDnsiBodml,spotify:track:0Ec85AbUzt5SRDnsiBodml,https://api.spotify.com/v1/tracks/0Ec85AbUzt5S...,https://api.spotify.com/v1/audio-analysis/0Ec8...,174027,4
1,Mañana (Is Soon Enough for Me),Peggy Lee,3baBjQISAznjkg8WwzipPM,3baBjQISAznjkg8WwzipPM,0.751,0.566,2,-13.624,0,0.102,...,0.272,0.935,122.282,audio_features,3baBjQISAznjkg8WwzipPM,spotify:track:3baBjQISAznjkg8WwzipPM,https://api.spotify.com/v1/tracks/3baBjQISAznj...,https://api.spotify.com/v1/audio-analysis/3baB...,178733,4
2,Now Is the Hour,Bing Crosby,5eKva6ek6QTxwtYAioGWqX,5eKva6ek6QTxwtYAioGWqX,0.193,0.0465,1,-21.226,1,0.0326,...,0.0993,0.244,83.573,audio_features,5eKva6ek6QTxwtYAioGWqX,spotify:track:5eKva6ek6QTxwtYAioGWqX,https://api.spotify.com/v1/tracks/5eKva6ek6QTx...,https://api.spotify.com/v1/audio-analysis/5eKv...,189293,3
3,A Tree in the Meadow,Margaret Whiting,1y4crDPPIzcTiOhBE5KRDJ,1y4crDPPIzcTiOhBE5KRDJ,0.303,0.0912,9,-14.795,1,0.0336,...,0.0917,0.0822,88.313,audio_features,1y4crDPPIzcTiOhBE5KRDJ,spotify:track:1y4crDPPIzcTiOhBE5KRDJ,https://api.spotify.com/v1/tracks/1y4crDPPIzcT...,https://api.spotify.com/v1/audio-analysis/1y4c...,160392,4
4,"You Can't Be True, Dear",Ken Griffin,13J3nj3RP6YjcG6bgFD9gj,13J3nj3RP6YjcG6bgFD9gj,0.239,0.293,0,-9.26,1,0.0346,...,0.378,0.44,104.71,audio_features,13J3nj3RP6YjcG6bgFD9gj,spotify:track:13J3nj3RP6YjcG6bgFD9gj,https://api.spotify.com/v1/tracks/13J3nj3RP6Yj...,https://api.spotify.com/v1/audio-analysis/13J3...,157814,4


Save dataframes with audio features


In [15]:
data_hot_audio_features.to_csv('hot_songs_features.csv', index=False)
data_not_hot_audio_features.to_csv('not_hot_songs_features.csv', index=False)

## Concatenate dataframe and audio features

Once the previous function has been created, create another function `add_audio_features(df, audio_features_df)` to concat a given dataframe with the dataframe containing the audio features alongside any other desired info, and return the extended data frame. Replace the old internal files of songs (hot and not hot) with the extended data frames with the audio features and save them into separate files on the disk.

In [16]:
# already done within the previous function: get_audio_features merges the audio features into the original dataframe
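For reference, a standalone `add_audio_features(df, audio_features_df)` could look like the sketch below. It assumes, as in this notebook, that `df` has an `ids` column and that the audio features dataframe has an `id` column. It uses `merge` on the song ID rather than a positional `concat`, so rows are matched even if the two dataframes are in a different order. The toy dataframes are made up for illustration.

```python
import pandas as pd


def add_audio_features(df, audio_features_df):
    """Return `df` extended with the columns of `audio_features_df`.

    Assumes `df` has an 'ids' column and `audio_features_df` an 'id' column.
    Songs without audio features are dropped (inner join).
    """
    extended = df.merge(audio_features_df, left_on="ids", right_on="id", how="inner")
    return extended.drop(columns="id")  # 'id' duplicates 'ids' after the merge


# Toy data in the notebook's column layout
songs = pd.DataFrame({"titles": ["Song A", "Song B"],
                      "artists": ["Artist X", "Artist Y"],
                      "ids": ["id_a", "id_b"]})
features = pd.DataFrame({"id": ["id_a"], "tempo": [120.0], "energy": [0.7]})

extended = add_audio_features(songs, features)
```

The extended dataframe can then be written to disk with `to_csv`, as done in cell 15.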