# Extending the internal databases with audio features


- **Create a function to search for a given song in the Spotify API: search_song().** Keep in mind that Spotify's API can return several matches for the same song title (different artists, a different album by the same artist, a different version of the song, etc.). It is therefore useful to display the list of matches to the user and let them select the right one. Once the desired song is located, the function should return the href/id/uri of the song to the code (not to the user).
- **Create a function "get_audio_features(list_of_songs)"** to obtain the audio features of a given list of songs (the content of list_of_songs can be the href/id/uri). Then use this function to create a Pandas DataFrame with the audio features of the list of songs. Hint: create a dictionary with the song's audio features as keys and empty lists as values. Then fill in the lists with the corresponding audio features of each song. Finally, create your data frame from the dictionary. Bear in mind that this API restricts the number of requests per minute, so consider launching the search in smaller groups of songs.
- Once the previous function has been created, **create another function "add_audio_features(df, audio_features_df)" to concatenate a given data frame with the data frame containing the audio features, alongside any other desired info, and return the extended data frame.**
- **Replace the old internal files of songs (hot and not hot) with the extended data frames with the audio features and save them into separate files on the disk.**

In [2]:
import sys
# sys.path entries must be directories, so add the folder that contains config.py
sys.path.insert(1, '/Users/mariasoriano/Desktop/Ironhack/lab-web-scraping-single-page')
from config import client_id, client_secret_id

In [3]:
import spotipy
import json
from spotipy.oauth2 import SpotifyClientCredentials
import pandas as pd

In [4]:
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id= client_id,
                                                           client_secret= client_secret_id))

**Function to search a given song in the Spotify API: search_song()**

In [5]:
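def search_song(song):
    # Return the Spotify id of the first track that matches the title, or "" if none is found
    try:
        results = sp.search(q=song, type="track", limit=1)
        return results["tracks"]["items"][0]["id"]
    except Exception:
        # Covers API errors as well as an empty result list
        return ""

# The id is returned to the calling code, not displayed to the user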
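The task also asks for the user to pick the right match when Spotify returns several candidates. A minimal sketch of that step follows. The helper names `list_candidates` and `choose_track` are hypothetical, and `sample_response` is a made-up dictionary that only mimics the shape of a track search response, so the block runs without credentials.

```python
def list_candidates(search_response):
    """Flatten a Spotify-style search response into (label, track_id) pairs."""
    candidates = []
    for item in search_response["tracks"]["items"]:
        artists = ", ".join(artist["name"] for artist in item["artists"])
        label = f'{item["name"]} | {artists} | {item["album"]["name"]}'
        candidates.append((label, item["id"]))
    return candidates


def choose_track(candidates, ask=input):
    """Print numbered candidates and return the id the user picks ("" if none)."""
    if not candidates:
        return ""
    for number, (label, _) in enumerate(candidates, start=1):
        print(f"{number}. {label}")
    answer = ask("Which one is the right match? (number, or Enter to skip) ").strip()
    if answer.isdecimal() and 1 <= int(answer) <= len(candidates):
        return candidates[int(answer) - 1][1]
    return ""


# Made-up response with the same nesting as a track search result
sample_response = {"tracks": {"items": [
    {"id": "id_1", "name": "Baby", "artists": [{"name": "Artist A"}],
     "album": {"name": "Album X"}},
    {"id": "id_2", "name": "Baby",
     "artists": [{"name": "Artist B"}, {"name": "Artist C"}],
     "album": {"name": "Album Y"}},
]}}

candidates = list_candidates(sample_response)
# The lambda simulates a user typing "2"; in the notebook, leave ask=input
picked = choose_track(candidates, ask=lambda prompt: "2")
print(picked)
```

In the notebook, the same helpers could be fed with a real response, for example `choose_track(list_candidates(sp.search(q=song, type="track", limit=5)))`, so that only the selected id goes back to the code.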

In [8]:
# Quick check: look up one song and fetch its audio features
baby = search_song('baby')
sp.audio_features(baby)[0]

{'danceability': 0.728,
 'energy': 0.859,
 'key': 5,
 'loudness': -5.237,
 'mode': 0,
 'speechiness': 0.137,
 'acousticness': 0.0401,
 'instrumentalness': 0,
 'liveness': 0.111,
 'valence': 0.535,
 'tempo': 65.043,
 'type': 'audio_features',
 'id': '6epn3r7S14KUqlReYr77hA',
 'uri': 'spotify:track:6epn3r7S14KUqlReYr77hA',
 'track_href': 'https://api.spotify.com/v1/tracks/6epn3r7S14KUqlReYr77hA',
 'analysis_url': 'https://api.spotify.com/v1/audio-analysis/6epn3r7S14KUqlReYr77hA',
 'duration_ms': 214240,
 'time_signature': 4}

**Function to obtain the audio features of a given list of songs: get_audio_features()**

In [9]:
def get_audio_features(list_of_songs):
    # Resolve each title to a Spotify track id ("" when the search fails)
    song_ids = [search_song(song) for song in list_of_songs]

    # Dictionary with the song's audio features as keys and empty lists as values
    feature_names = ['danceability', 'energy', 'key', 'loudness', 'mode',
                     'speechiness', 'acousticness', 'instrumentalness',
                     'liveness', 'valence', 'tempo', 'duration_ms',
                     'time_signature']
    features = {name: [] for name in feature_names}

    for song_id in song_ids:
        print("Getting audio features for song id: ", song_id)
        # sp.audio_features returns a list with one entry per id (None if unavailable)
        result = sp.audio_features(song_id) if song_id != "" else [None]
        song_features = result[0] if result and result[0] else {}
        for name in feature_names:
            # Songs without a match get None so every column keeps the same length
            features[name].append(song_features.get(name))

    # Create a Pandas DataFrame
    audio_features_df = pd.DataFrame(features)
    return audio_features_df
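The task hint about the request limit suggests working in smaller groups of songs. A rough sketch of that idea is below. `chunked`, `fetch_in_chunks`, and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for `sp.audio_features` (which, as far as I know, accepts a list of up to 100 ids per call) so the block runs without credentials. The chunk size and pause are arbitrary values, not documented limits.

```python
import time


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_in_chunks(song_ids, fetch, size=50, pause_seconds=1.0):
    """Call `fetch` once per chunk of ids and collect the results in order."""
    results = []
    chunks = list(chunked(song_ids, size))
    for position, chunk in enumerate(chunks):
        results.extend(fetch(chunk))
        if position < len(chunks) - 1:
            time.sleep(pause_seconds)  # wait between requests, not after the last one
    return results


def fake_fetch(ids):
    """Stand-in for sp.audio_features: one made-up dict per id."""
    return [{"id": song_id, "tempo": 120.0} for song_id in ids]


ids = [f"id_{n}" for n in range(7)]
rows = fetch_in_chunks(ids, fake_fetch, size=3, pause_seconds=0)
chunk_sizes = [len(chunk) for chunk in chunked(ids, 3)]
print(len(rows), chunk_sizes)
```

With a real client, `fetch=sp.audio_features` could replace `fake_fetch`, which turns one request per song into one request per group.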


In [10]:
not_hot = pd.read_csv('/Users/mariasoriano/Desktop/Ironhack/lab-web-scraping-single-page/nothot_100.csv')

In [12]:
songs_list = not_hot.sample(frac=0.2, replace=False, random_state=1)
songs_list

| Unnamed: 0 | name | album | artists | year |
|---|---|---|---|---|
| 55897 | Sketch | Sound Puzzle | '40 Winks' | 2007 |
| 215062 | EK ISHQ EK ROOH | EK ISHQ EK ROOH | 'Sufi Parveen' | 2019 |
| 70429 | ComfyCozy | Chicago,Detroit,Redruth | 'Luke Vibert' | 2007 |
| 103209 | See You Later Fuckface | Pleasant Screams | 'The Queers' | 2002 |
| 679327 | Stop Running | no sense in waiting | 'cerulean' | 2005 |
| ... | ... | ... | ... | ... |
| 581255 | Symphony in F Major, VB 145: III. Presto | Kraus: Symphonies, Vol. 4 | 'Joseph Martin Kraus', 'Swedish Chamber Orches... | 2002 |
| 604391 | Organ Sonata in B-Flat Major, Wq. 70/2: III. A... | C.P.E. Bach: Organ Sonatas | 'Carl Philipp Emanuel Bach', 'Iain Quinn' | 2016 |
| 757216 | Death | Opus Arcana | 'Ian Ring' | 2018 |
| 119023 | Riderbrow | Styles Of The Unexpected | 'Andy Votel' | 2000 |


In [None]:
#list(songs_list['name'])[:7]
get_audio_features(songs_list['name'])
#get_audio_features(list(songs_list['name'])[0])
#get_audio_features(list(songs_list['name'])[:7])

**Function to concat a given data frame with the data frame containing the audio features alongside any other desired info, and return the extended data frame: "add_audio_features(df, audio_features_df)"**

In [None]:
hot100 = pd.read_csv('/Users/mariasoriano/Desktop/Ironhack/lab-web-scraping-single-page/hot100.csv')

In [None]:
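def add_audio_features(df, audio_features_df):
    # pd.concat takes a list of frames; reset both indexes so rows pair up by position
    frames = [df.reset_index(drop=True), audio_features_df.reset_index(drop=True)]
    concated = pd.concat(frames, axis=1)
    return concated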
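One thing to watch when concatenating along columns: pandas pairs rows by index label, not by position. A sample drawn with `sample()` keeps its original labels, while a freshly built audio-features frame starts at 0, so the two may not line up. The small illustration below uses made-up tempo values and two labels taken from the sample output above.

```python
import pandas as pd

# Song frame with non-contiguous labels, as left behind by DataFrame.sample()
songs = pd.DataFrame({"name": ["Sketch", "Death"]}, index=[55897, 757216])
# Audio features with the default 0..n-1 index (values are made up)
audio = pd.DataFrame({"tempo": [90.0, 120.0]})

# Labels do not overlap, so pandas creates four half-empty rows
misaligned = pd.concat([songs, audio], axis=1)

# Dropping the old labels first pairs the rows by position
aligned = pd.concat([songs.reset_index(drop=True), audio], axis=1)

print(misaligned.shape)
print(aligned.shape)
```

The first shape has twice as many rows as songs, with `NaN` filling the gaps, while the second keeps one complete row per song.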

In [None]:
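# audio_features_df has to exist before it can be attached; here it is built for the not-hot sample.
# The same two steps apply to hot100, using whichever column holds its song titles.
audio_features_df = get_audio_features(songs_list['name'])
add_audio_features(songs_list, audio_features_df)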

Replace the old internal files of songs (hot and not hot) with the extended data frames with the audio features and save them into separate files on the disk.
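A possible way to carry out this last step is sketched below. The helper `save_extended`, the file name `nothot_extended.csv`, and the one-row frame are all placeholders, and the file goes to a temporary folder so the example does not touch the real data. Passing `index=False` keeps the row index out of the file, which is likely what produces columns such as `Unnamed: 0` when a CSV is read back.

```python
import os
import tempfile

import pandas as pd


def save_extended(df, path):
    """Write the extended data frame to CSV without the index column."""
    df.to_csv(path, index=False)
    return path


# Placeholder for an extended frame (made-up values)
extended = pd.DataFrame({"name": ["Sketch"], "tempo": [90.0]})

out_dir = tempfile.mkdtemp()
path = save_extended(extended, os.path.join(out_dir, "nothot_extended.csv"))

# Read the file back to confirm that only the real columns were stored
reloaded = pd.read_csv(path)
print(list(reloaded.columns))
```

For the real files, the hot and not-hot frames would each be written to their own path in the same way.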