# Audio Features KNN 
## Build Feature dataframe

This notebook explores the concept of converting songs to tensors/vectors according to their audio features, as annotated by Spotify. The set of numeric features includes:

- `acousticness`
- `danceability`
- `duration_ms`
- `energy`
- `key`
- `instrumentalness`
- `liveness`
- `loudness`
- `mode`
- `speechiness`
- `tempo`
- `time_signature`
- `valence`

In addition to these, each data point also has an `id`, `uri`, `track_href`, and `analysis_url`. 
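
As a rough illustration of what "converting a song to a vector" could look like, the sketch below turns one feature dictionary into a NumPy array. The name `track_to_vector` and the column order in `NUMERIC_FEATURES` are assumptions made for this example, not something defined elsewhere in the notebook. The values are taken from the Spotify response for a single track shown later in this notebook.

```python
import numpy as np

# Assumed ordering of the numeric features listed above (any fixed order works,
# as long as the same order is used for every track).
NUMERIC_FEATURES = [
    "acousticness", "danceability", "duration_ms", "energy", "key",
    "instrumentalness", "liveness", "loudness", "mode", "speechiness",
    "tempo", "time_signature", "valence",
]


def track_to_vector(track):
    """Hypothetical helper: map a feature dict to a 1-D float array."""
    return np.array([track[name] for name in NUMERIC_FEATURES], dtype=float)


# Feature values copied from the response for track 1GfBLbAhZUWdseuDqhocmn.
example_track = {
    "danceability": 0.81, "energy": 0.836, "key": 1, "loudness": -7.721,
    "mode": 0, "speechiness": 0.0645, "acousticness": 0.0515,
    "instrumentalness": 0.229, "liveness": 0.0598, "valence": 0.0585,
    "tempo": 139.979, "duration_ms": 146571, "time_signature": 4,
    "id": "1GfBLbAhZUWdseuDqhocmn",  # non-numeric fields are simply ignored
}

vector = track_to_vector(example_track)
print(vector.shape)
```

Note that the features live on very different scales (`duration_ms` is in the hundreds of thousands while most others lie between 0 and 1), so some form of scaling would likely be needed before comparing vectors.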

The idea is to connect these features to heart rate zones pulled from training data. Since no integration with health data has been built yet, this functionality will be developed in a subsequent notebook.
Many of the methods communicating with the Spotify API were developed [as part of this project](https://github.com/pmhalvor/website/tree/master/radio) hosted on my website: [https://perhalvorsen.com/radio](https://perhalvorsen.com/radio).

## Set up notebook for local development

In [5]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [6]:
from tools import cd_up_dir
from tools import cd_to

cd_up_dir()

cd_to(full_path="/Users/per.morten.halvorsen@schibsted.com/personal/website/radio")
# /Users/per.morten.halvorsen@schibsted.com/personal/website/radio

Moving to /Users/per.morten.halvorsen@schibsted.com/personal/website...
Current working directory: /Users/per.morten.halvorsen@schibsted.com/personal/website
Moving to /Users/per.morten.halvorsen@schibsted.com/personal/website/radio...
Current working directory: /Users/per.morten.halvorsen@schibsted.com/personal/website/radio


## Load historic music data 

In [95]:
import base64
import json
import os 
import pandas as pd
import requests 
import six 
import time

from datetime import datetime, timedelta
from worker.authorize import get_client_token
from worker.song_history import load_df
from worker.plot import select_month

# NOTE Credentials are read from the environment so they are never stored in the notebook
SPOTIFY_CLIENT_ID = os.environ.get("SPOTIFY_CLIENT_ID", "")
SPOTIFY_CLIENT_SECRET = os.environ.get("SPOTIFY_CLIENT_SECRET", "")
ROOT = os.getcwd()

In [76]:
df, mdf = load_df(root="/Users/per.morten.halvorsen@schibsted.com/personal/")
df = select_month(df, 7)

loaded from /Users/per.morten.halvorsen@schibsted.com/personal//data/history.csv
Max played at 2023-01-07T19:14:58.996Z


In [77]:
df.head()

Unnamed: 0,played_at,id,artist,name
33621,2022-06-12T09:49:56.983Z,02MWAaffLxlfxAUY7c5dvx,Glass Animals,Heat Waves
33622,2022-06-12T09:54:00.768Z,2H4zj3LYFEVBV1JHNtozRA,Roberto Bronco,Small World
33623,2022-06-12T10:02:22.006Z,02uUhbsPgXFvsALSXIo1uH,DJ BORING,6 AM Mimosa
33624,2022-06-12T10:09:08.093Z,1BkZ1luDGfao6wGbUU9vbQ,Shuggie Otis,Oxford Gray
33625,2022-06-12T10:12:04.207Z,5cnsoV2GXggZXhC27SqYpv,Jungle,All Of The Time


In [109]:
df.tail()

Unnamed: 0,played_at,id,artist,name
40992,2023-01-07T14:22:48.145Z,72Qac2FQd9e21XCibSOhfY,Kermesse,Sunday Glide
40993,2023-01-07T14:49:23.973Z,1GfBLbAhZUWdseuDqhocmn,Skrillex,Rumble
40994,2023-01-07T15:02:59.228Z,4CWJ6V6Y5XBjM2STX6z9a0,Quasar,I Never Thought
40995,2023-01-07T15:06:25.029Z,6bwCLBbmmnuxjzCMOt3zfX,weird inside,Wishing Well
40996,2023-01-07T19:14:58.996Z,1AI7UPw3fgwAFkvAlZWhE0,Ed Sheeran,Take Me Back to London (feat. Stormzy)


In [110]:
ids = list(df.tail().id)
ids

['72Qac2FQd9e21XCibSOhfY',
 '1GfBLbAhZUWdseuDqhocmn',
 '4CWJ6V6Y5XBjM2STX6z9a0',
 '6bwCLBbmmnuxjzCMOt3zfX',
 '1AI7UPw3fgwAFkvAlZWhE0']

## Get audio features

Build a method that requests audio features for one or more track ids.

In [216]:
def get_features(id="", ids=None):
    # copy the input so neither the caller's list nor a shared default is mutated
    ids = list(ids) if ids else []
    if id != "":
        ids.append(id)
    if not ids:
        return []  # nothing to request

    token = get_client_token(SPOTIFY_CLIENT_ID, SPOTIFY_CLIENT_SECRET)

    URL = "https://api.spotify.com/v1/audio-features"  # api-endpoint for audio features
    HEAD = {
        "Authorization": "Bearer " + token.get("access_token"),  # provide auth. credentials
        "Content-Type": "application/json"
    }

    content = requests.get(url=URL, headers=HEAD, params={"ids": ",".join(ids)})
    if content.status_code == 200:
        return content.json().get("audio_features")
    else:
        return []  # same type as the success case, so callers can always iterate
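
The implementation of `get_client_token` lives in `worker.authorize` and is not shown in this notebook. As a hedged sketch of what such a helper typically has to do, the block below builds the request for Spotify's client credentials flow: a POST to the accounts service with the client id and secret base64-encoded in a `Basic` authorization header. The function name `build_token_request` and the placeholder credentials are made up for this example, and the actual helper may differ.

```python
import base64

TOKEN_URL = "https://accounts.spotify.com/api/token"


def build_token_request(client_id, client_secret):
    """Hypothetical helper: return (url, headers, data) for a token request."""
    # "<id>:<secret>" is base64-encoded and sent as HTTP Basic credentials.
    credentials = f"{client_id}:{client_secret}".encode("ascii")
    encoded = base64.b64encode(credentials).decode("ascii")
    headers = {
        "Authorization": "Basic " + encoded,
        "Content-Type": "application/x-www-form-urlencoded",
    }
    data = {"grant_type": "client_credentials"}
    return TOKEN_URL, headers, data


# Placeholder credentials, for illustration only.
url, headers, data = build_token_request("my-client-id", "my-client-secret")

# Sending it would look like this (left commented out to avoid a network call):
# token = requests.post(url, headers=headers, data=data).json()
# token.get("access_token") is then used as the Bearer token.
```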
    

In [217]:
get_features(id="1GfBLbAhZUWdseuDqhocmn")

[{'danceability': 0.81,
  'energy': 0.836,
  'key': 1,
  'loudness': -7.721,
  'mode': 0,
  'speechiness': 0.0645,
  'acousticness': 0.0515,
  'instrumentalness': 0.229,
  'liveness': 0.0598,
  'valence': 0.0585,
  'tempo': 139.979,
  'type': 'audio_features',
  'id': '1GfBLbAhZUWdseuDqhocmn',
  'uri': 'spotify:track:1GfBLbAhZUWdseuDqhocmn',
  'track_href': 'https://api.spotify.com/v1/tracks/1GfBLbAhZUWdseuDqhocmn',
  'analysis_url': 'https://api.spotify.com/v1/audio-analysis/1GfBLbAhZUWdseuDqhocmn',
  'duration_ms': 146571,
  'time_signature': 4}]

In [218]:
features = get_features(ids=ids)

type(features)

list

In [219]:
features[0]

{'danceability': 0.735,
 'energy': 0.621,
 'key': 4,
 'loudness': -9.407,
 'mode': 0,
 'speechiness': 0.0428,
 'acousticness': 0.127,
 'instrumentalness': 0.537,
 'liveness': 0.101,
 'valence': 0.364,
 'tempo': 105.034,
 'type': 'audio_features',
 'id': '72Qac2FQd9e21XCibSOhfY',
 'uri': 'spotify:track:72Qac2FQd9e21XCibSOhfY',
 'track_href': 'https://api.spotify.com/v1/tracks/72Qac2FQd9e21XCibSOhfY',
 'analysis_url': 'https://api.spotify.com/v1/audio-analysis/72Qac2FQd9e21XCibSOhfY',
 'duration_ms': 347529,
 'time_signature': 4}

In [220]:
feature_columns = list(features[0].keys())
feature_columns

['danceability',
 'energy',
 'key',
 'loudness',
 'mode',
 'speechiness',
 'acousticness',
 'instrumentalness',
 'liveness',
 'valence',
 'tempo',
 'type',
 'id',
 'uri',
 'track_href',
 'analysis_url',
 'duration_ms',
 'time_signature']

In [232]:
# column order copied from the output of the cell above
feature_columns = ['danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness', 'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo', 'type', 'id', 'uri', 'track_href', 'analysis_url', 'duration_ms', 'time_signature']

def build_features_df(df, max_ids = 100):
    """
    Build a dataframe of audio features for every unique track id in df.

    https://developer.spotify.com/documentation/web-api/reference/#/operations/get-several-audio-features

    Maximum 100 IDs per call
    """
    ids = list(df.id.unique())

    rows = []
    for start in range(0, len(ids), max_ids):
        # next batch of at most max_ids ids; no empty batch is ever requested
        ids_to_request = ids[start:start + max_ids]

        # api call
        features = get_features(ids=ids_to_request)

        # keep only non-empty entries (a failed call contributes nothing)
        rows.extend(f for f in features if f)

    # build the frame once, which avoids merging an empty frame with mixed dtypes
    return pd.DataFrame(rows, columns=feature_columns)


In [233]:
build_features_df(df.tail(10), max_ids=3)



Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,0.792,0.704,6,-7.181,0,0.0631,0.00213,0.564,0.123,0.472,124.003,audio_features,7i08AhQcrdD4GLlr2Pmamg,spotify:track:7i08AhQcrdD4GLlr2Pmamg,https://api.spotify.com/v1/tracks/7i08AhQcrdD4...,https://api.spotify.com/v1/audio-analysis/7i08...,391591,4
1,0.73,0.838,6,-8.732,1,0.0679,0.0223,0.786,0.106,0.174,127.005,audio_features,1UopEDdYwbavBAvlY1gA6b,spotify:track:1UopEDdYwbavBAvlY1gA6b,https://api.spotify.com/v1/tracks/1UopEDdYwbav...,https://api.spotify.com/v1/audio-analysis/1Uop...,307142,4
2,0.601,0.679,7,-7.366,1,0.0336,0.0587,0.828,0.306,0.0804,126.021,audio_features,3iYkfb81V73IYt03sF123B,spotify:track:3iYkfb81V73IYt03sF123B,https://api.spotify.com/v1/tracks/3iYkfb81V73I...,https://api.spotify.com/v1/audio-analysis/3iYk...,166565,4
3,0.803,0.51,11,-11.276,0,0.0807,0.281,0.899,0.0983,0.281,120.001,audio_features,5xH2JnZg5i43AGKTpSDO34,spotify:track:5xH2JnZg5i43AGKTpSDO34,https://api.spotify.com/v1/tracks/5xH2JnZg5i43...,https://api.spotify.com/v1/audio-analysis/5xH2...,496191,4
4,0.799,0.746,6,-11.141,0,0.0415,0.277,0.921,0.102,0.0589,110.046,audio_features,3TBKul6UKjnf5jcv46hSyR,spotify:track:3TBKul6UKjnf5jcv46hSyR,https://api.spotify.com/v1/tracks/3TBKul6UKjnf...,https://api.spotify.com/v1/audio-analysis/3TBK...,276952,4
5,0.735,0.621,4,-9.407,0,0.0428,0.127,0.537,0.101,0.364,105.034,audio_features,72Qac2FQd9e21XCibSOhfY,spotify:track:72Qac2FQd9e21XCibSOhfY,https://api.spotify.com/v1/tracks/72Qac2FQd9e2...,https://api.spotify.com/v1/audio-analysis/72Qa...,347529,4
6,0.81,0.836,1,-7.721,0,0.0645,0.0515,0.229,0.0598,0.0585,139.979,audio_features,1GfBLbAhZUWdseuDqhocmn,spotify:track:1GfBLbAhZUWdseuDqhocmn,https://api.spotify.com/v1/tracks/1GfBLbAhZUWd...,https://api.spotify.com/v1/audio-analysis/1GfB...,146571,4
7,0.773,0.647,11,-11.726,0,0.076,0.0192,0.879,0.108,0.348,122.008,audio_features,4CWJ6V6Y5XBjM2STX6z9a0,spotify:track:4CWJ6V6Y5XBjM2STX6z9a0,https://api.spotify.com/v1/tracks/4CWJ6V6Y5XBj...,https://api.spotify.com/v1/audio-analysis/4CWJ...,426944,4
8,0.542,0.355,5,-10.27,1,0.058,0.794,0.567,0.112,0.148,82.977,audio_features,6bwCLBbmmnuxjzCMOt3zfX,spotify:track:6bwCLBbmmnuxjzCMOt3zfX,https://api.spotify.com/v1/tracks/6bwCLBbmmnux...,https://api.spotify.com/v1/audio-analysis/6bwC...,213960,4
9,0.885,0.762,8,-5.513,0,0.216,0.219,0.0,0.162,0.605,138.058,audio_features,1AI7UPw3fgwAFkvAlZWhE0,spotify:track:1AI7UPw3fgwAFkvAlZWhE0,https://api.spotify.com/v1/tracks/1AI7UPw3fgwA...,https://api.spotify.com/v1/audio-analysis/1AI7...,189733,4
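
Because the endpoint accepts at most 100 ids per call, the ids have to be split into batches. The sketch below isolates that batching step so its edge cases are easy to check; `batched` is a name chosen for this example and is not part of the notebook's code.

```python
def batched(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Ten ids with a batch size of three give three full batches and one partial one.
ten = list(batched(list(range(10)), 3))

# When the length is an exact multiple of the size, no empty batch is produced.
nine = list(batched(list(range(9)), 3))

# An empty input yields no batches at all, so no API call would be made.
none = list(batched([], 3))
```

Stepping the start index by the batch size handles the final partial batch and the exact-multiple case with the same loop.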


## Next steps 
- Build the feature dataframe for all unique ids in history.csv
- Store the feature dataframe
- Develop join calls for tensor building
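
One possible shape for the join and tensor-building step is sketched below, followed by a simple nearest-neighbour lookup in the spirit of the notebook title. It assumes the history and feature dataframes share an `id` column, uses only three of the numeric features to stay short, and standardizes each column before measuring Euclidean distance. These are illustrative choices, not decisions made in this notebook. The rows are copied from the outputs above.

```python
import numpy as np
import pandas as pd

# Three plays from the df.tail() output above.
history = pd.DataFrame({
    "id": ["72Qac2FQd9e21XCibSOhfY", "1GfBLbAhZUWdseuDqhocmn", "1AI7UPw3fgwAFkvAlZWhE0"],
    "artist": ["Kermesse", "Skrillex", "Ed Sheeran"],
    "name": ["Sunday Glide", "Rumble", "Take Me Back to London (feat. Stormzy)"],
})

# Matching rows (subset of columns) from the features dataframe above.
features = pd.DataFrame({
    "id": ["72Qac2FQd9e21XCibSOhfY", "1GfBLbAhZUWdseuDqhocmn", "1AI7UPw3fgwAFkvAlZWhE0"],
    "danceability": [0.735, 0.81, 0.885],
    "energy": [0.621, 0.836, 0.762],
    "tempo": [105.034, 139.979, 138.058],
})

# Join: attach the audio features to every play in the history.
joined = history.merge(features, on="id", how="left")

# Tensor building: one row per play, one column per numeric feature.
numeric_columns = ["danceability", "energy", "tempo"]
matrix = joined[numeric_columns].to_numpy(dtype=float)

# Standardize each column so tempo does not dominate the distance.
scaled = (matrix - matrix.mean(axis=0)) / matrix.std(axis=0)

# Nearest neighbour of "Rumble" (row 1), excluding the track itself.
query = 1
distances = np.linalg.norm(scaled - scaled[query], axis=1)
distances[query] = np.inf
nearest_name = joined.loc[int(np.argmin(distances)), "name"]
print(nearest_name)
```

With real listening history the same track appears many times, so deduplicating on `id` before the neighbour search would probably be sensible.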