# Getting Details that Describe Each Album

Spotify's API describes each track with audio features such as acousticness, danceability, energy, and instrumentalness. Aggregating these features over all tracks on an album gives one overall value per feature for the whole album. These values can then be used to create a feature vector for each album.
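
As a rough illustration of what "feature vector" means here, the sketch below turns one album's aggregated values into a fixed-order list of numbers. The name `album_feature_vector`, the ordering in `FEATURE_ORDER`, and the sample values are assumptions made for this example only; they are not data from the notebook.

```python
# Hypothetical fixed ordering, so every album's vector lines up position by position.
FEATURE_ORDER = ['acousticness', 'danceability', 'energy', 'instrumentalness',
                 'liveness', 'speechiness', 'valence', 'loudness', 'tempo']


def album_feature_vector(album_row):
    """Return the album's feature values as a list, in FEATURE_ORDER."""
    return [float(album_row[name]) for name in FEATURE_ORDER]


# Made-up values for illustration only.
sample_album = {'acousticness': 0.25, 'danceability': 0.5, 'energy': 0.75,
                'instrumentalness': 0.0, 'liveness': 0.125, 'speechiness': 0.0625,
                'valence': 0.5, 'loudness': -7.0, 'tempo': 120.0}
vector = album_feature_vector(sample_album)
```

Loudness and tempo are on different scales from the other seven features, so they would likely need rescaling before two vectors are compared directly.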

In [45]:
import requests
import pandas as pd
import sqlite3
from utils import get_request_headers

db_con = sqlite3.connect('../albums.db')
request_headers = get_request_headers()

Load all albums from the `albums` table in `albums.db`.

In [36]:
df = pd.read_sql_query('SELECT * FROM albums', db_con)

# initialize album features to zero
df['acousticness'] = df['danceability'] = df['energy'] = df['instrumentalness'] = df['liveness'] = df['loudness'] = df['speechiness'] = df['valence'] = df['tempo'] = 0.0

Features to consider for each album: acousticness, danceability, energy, instrumentalness, liveness, speechiness, valence, loudness, and tempo.
For each track, every feature except loudness and tempo takes a value between 0 and 1.

Do the following for each album:
1. Get all tracks on the album.
2. Get audio features for all tracks.
3. Calculate the total of acousticness, danceability, energy, instrumentalness, liveness, speechiness, and valence for each album using the audio features from each track.
4. For tempo and loudness, a total value across all tracks does not make sense, so a weighted average (based on track playtime) can be used to calculate an "average" tempo or loudness.
5. Since different albums have different numbers of tracks, each total is divided by the number of tracks that returned audio features. This gives a per-track mean between 0 and 1 that can be compared between two albums.
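
Steps 3 to 5 can be sketched as a small pure function that needs no network access. This is a minimal sketch, not the notebook's own code: the function name `aggregate_album_features` and the sample tracks are assumptions for illustration, and the input is assumed to be a list of per-track feature dicts in which a track without features appears as `None`.

```python
# Features averaged per track vs. features weighted by track duration.
MEAN_FEATURES = ['acousticness', 'danceability', 'energy', 'instrumentalness',
                 'liveness', 'speechiness', 'valence']
WEIGHTED_FEATURES = ['loudness', 'tempo']


def aggregate_album_features(audio_features):
    """Combine per-track audio features into one dict of album-level values."""
    tracks = [t for t in audio_features if t]  # skip tracks with no features
    total_ms = sum(t['duration_ms'] for t in tracks)
    if not tracks or total_ms == 0:
        return None  # nothing to aggregate

    album = {}
    for name in MEAN_FEATURES:
        # total divided by track count: a per-track mean in [0, 1]
        album[name] = sum(t[name] for t in tracks) / len(tracks)
    for name in WEIGHTED_FEATURES:
        # duration-weighted average: longer tracks count for more
        album[name] = sum(t[name] * t['duration_ms'] for t in tracks) / total_ms
    return album


# Made-up tracks: a short quiet one, a missing one, and a long louder one.
example_tracks = [
    {**dict.fromkeys(MEAN_FEATURES, 0.2), 'loudness': -10.0, 'tempo': 100.0, 'duration_ms': 100000},
    None,
    {**dict.fromkeys(MEAN_FEATURES, 0.6), 'loudness': -6.0, 'tempo': 140.0, 'duration_ms': 300000},
]
album = aggregate_album_features(example_tracks)
```

In this example the longer track pulls the weighted tempo to 130 rather than the unweighted midpoint of 120, which is the effect the duration weighting is meant to have.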

In [37]:
SPOTIFY_GET_ALBUM_TRACKS_URL = 'https://api.spotify.com/v1/albums/{id}/tracks?limit=50'
SPOTIFY_GET_TRACK_FEATURES_URL = 'https://api.spotify.com/v1/audio-features'

for i in range(len(df)):
    album_id = df['spotify_id'][i]
    result = requests.get(SPOTIFY_GET_ALBUM_TRACKS_URL.format(id=album_id), headers=request_headers).json()
    track_ids = [track['id'] for track in result['items']]
    result = requests.get(SPOTIFY_GET_TRACK_FEATURES_URL, params={'ids': ','.join(track_ids)}, headers=request_headers).json()

    # total playtime of the tracks that returned audio features (used as weights below)
    album_duration = sum(t['duration_ms'] for t in result['audio_features'] if t)

    # initialize feature sums with 0 value
    acousticness = danceability = energy = instrumentalness = liveness = loudness = speechiness = valence = tempo = 0
    null_tracks = 0
    for track_features in result['audio_features']:
        if (not track_features):
            null_tracks += 1
            continue
        acousticness += track_features['acousticness']
        danceability += track_features['danceability']
        energy += track_features['energy']
        instrumentalness += track_features['instrumentalness']
        liveness += track_features['liveness']
        loudness += (track_features['loudness'] * track_features['duration_ms'] / album_duration) # weighted avg for loudness based on track duration
        speechiness += track_features['speechiness']
        valence += track_features['valence']
        tempo += (track_features['tempo'] * track_features['duration_ms'] / album_duration) # weighted avg for tempo based on track duration

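    num_tracks = len(result['audio_features']) - null_tracks
    if num_tracks == 0:
        continue  # no audio features for this album; leave its values at 0.0

    # divide each total by the number of tracks to get a mean between 0 and 1
    # (df.loc avoids pandas chained assignment, which may not write to df)
    df.loc[i, 'acousticness'] = acousticness / num_tracks
    df.loc[i, 'danceability'] = danceability / num_tracks
    df.loc[i, 'energy'] = energy / num_tracks
    df.loc[i, 'instrumentalness'] = instrumentalness / num_tracks
    df.loc[i, 'liveness'] = liveness / num_tracks
    df.loc[i, 'speechiness'] = speechiness / num_tracks
    df.loc[i, 'valence'] = valence / num_tracks

    # loudness and tempo are already duration-weighted averages
    df.loc[i, 'loudness'] = loudness
    df.loc[i, 'tempo'] = tempo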
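
The tracks request above asks for a single page with `limit=50`, so an album with more than 50 tracks would be only partly covered. Spotify's paged responses include a `next` URL that is null on the last page, and the sketch below follows it. The names `collect_track_ids`, `chunked`, and `fetch_json` are hypothetical, and the batch size of 100 for the audio-features request is an assumption to check against the API reference.

```python
def collect_track_ids(first_page_url, fetch_json):
    """Follow `next` links page by page and return every track ID.

    `fetch_json` is a hypothetical callable that takes a URL and returns the
    decoded JSON page, for example a thin wrapper around requests.get.
    """
    track_ids = []
    url = first_page_url
    while url:
        page = fetch_json(url)
        track_ids.extend(track['id'] for track in page['items'])
        url = page.get('next')  # None on the last page ends the loop
    return track_ids


def chunked(items, size=100):
    """Split a list into batches of at most `size` items (assumed per-request cap)."""
    return [items[start:start + size] for start in range(0, len(items), size)]


# Stand-in for the network so the sketch runs offline: two fake pages.
fake_pages = {
    'page1': {'items': [{'id': 'a'}, {'id': 'b'}], 'next': 'page2'},
    'page2': {'items': [{'id': 'c'}], 'next': None},
}
all_ids = collect_track_ids('page1', fake_pages.__getitem__)
batches = chunked(all_ids, size=2)
```

Passing the fetcher in as a parameter keeps the paging logic testable without real credentials; in the notebook it could wrap `requests.get(url, headers=request_headers).json()`.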

Save the DataFrame back to the `albums` table in `albums.db`. This replaces the existing table, so each album's row now includes its feature values.

In [42]:
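# index=False keeps the DataFrame's positional index from being written as an extra column
df.to_sql('albums', db_con, if_exists='replace', index=False)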

Close database connection.

In [46]:
db_con.close()