# Exploratory Analysis

## Exploring SpotiPy
- Explore available data
- Select features of interest
- Generate initial dataframe and database

###### Note:
###### - Potentially build a recommender system that takes a set of one user's most frequently played songs and matches them against a second user's profile, for example between courting couples or friends
###### - Consider supporting podcasts as a feature for users who might be interested in them
###### - Also, consider calling the playlists `"{user_1} and {user_2}'s Playlist Baby"`
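
One way the matching idea in these notes could be sketched: average each user's top-track audio features into a single profile vector and compare the two profiles with cosine similarity. The feature subset, the numbers, and the user names below are illustrative placeholders rather than data from either user, and cosine similarity is only an assumed choice of metric.

```python
import math

# Hypothetical feature subset; all values below are made up for illustration.
FEATURES = ["danceability", "energy", "valence"]


def user_profile(tracks):
    """Average each feature over a user's tracks to get one profile vector."""
    return [sum(t[f] for t in tracks) / len(tracks) for f in FEATURES]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


user_1_tracks = [
    {"danceability": 0.8, "energy": 0.2, "valence": 0.2},
    {"danceability": 0.7, "energy": 0.1, "valence": 0.3},
]
user_2_tracks = [
    {"danceability": 0.6, "energy": 0.5, "valence": 0.7},
]

similarity = cosine_similarity(user_profile(user_1_tracks), user_profile(user_2_tracks))
playlist_name = "{user_1} and {user_2}'s Playlist Baby".format(user_1="Ana", user_2="Luis")
print(round(similarity, 3), playlist_name)
```

A higher similarity would suggest more overlap in taste; how that score feeds into track selection for the shared playlist is left open here.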

In [1]:
# imports

import os
import sys
import json
import spotipy
import webbrowser
import numpy as np
import pandas as pd
from os import getenv
import spotipy.util as util
from dotenv import load_dotenv
from json.decoder import JSONDecodeError
from spotipy.oauth2 import SpotifyClientCredentials

In [2]:
# We are using the client module from the Python library for the Spotify API
# (https://spotipy.readthedocs.io/en/2.13.0/#module-spotipy.client)
# Authorization Code Flow: user scopes such as 'user-top-read' require the user to log in and approve access


load_dotenv()  # loads the variables defined in the .env file into the environment

# Setting up env variables to connect to API
uri = getenv('uri')  # redirect URI; must match the one registered in the Spotify app dashboard
SPOTIFY_CLIENT_ID = getenv('SPOTIFY_CLIENT_ID')
SPOTIFY_CLIENT_SECRET = getenv('SPOTIFY_CLIENT_SECRET')
username = getenv('USER_ID')  # user whose data we are collecting
scope = 'user-top-read'  # determines the kind of access you have to a user profile

# Access token to obtain user info
token = util.prompt_for_user_token(username=username,
                                   scope=scope,
                                   client_id=SPOTIFY_CLIENT_ID,
                                   client_secret=SPOTIFY_CLIENT_SECRET,
                                   redirect_uri=uri)

# activating spotify session
spotify_session = spotipy.Spotify(auth=token)

## Goals 
- Connect to user library using [scopes](https://developer.spotify.com/documentation/general/guides/scopes/)
- Scopes to connect to are [user-library-read](https://developer.spotify.com/documentation/general/guides/scopes/#user-library-read), [playlist-modify-public](https://developer.spotify.com/documentation/general/guides/scopes/#playlist-modify-public), and [user-top-read](https://developer.spotify.com/documentation/general/guides/scopes/#user-top-read)
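
The goals list three scopes, while the session set up earlier requests only `user-top-read`. Spotify's authorization request takes scopes as a single space-separated string, so a combined value could be built as in this sketch (the variable names are illustrative):

```python
# Scopes named in the goals above.
REQUIRED_SCOPES = [
    "user-library-read",
    "playlist-modify-public",
    "user-top-read",
]

# The authorization request expects one space-separated string.
scope = " ".join(REQUIRED_SCOPES)
print(scope)
```

The resulting `scope` could then be passed in place of the single-scope string when requesting the token.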

##### **The goal here is to connect to the users' respective libraries, analyze them, and create a new playlist.**
- For this analysis I will explore both the Audio Analysis Objects and the Audio Features Objects. From the looks of it, the Audio Features Objects might provide better insight for the machine learning model.

In [3]:
# Playing with the api: accessing user top read, modify playlist, read library

# User top artists
top_artists = spotify_session.current_user_top_artists(limit=1)

# User top tracks (limit=10 to match the ten tracks explored below)
top_tracks = spotify_session.current_user_top_tracks(limit=10, time_range='medium_term')

# Track id of the top track, e.g. "0akyEssGRVHstqCSWXusJL"
top_track_id = top_tracks['items'][0]['id']

# Top ten track ids
top_10_tracks_id = [track['id'] for track in top_tracks['items']]
top_10_tracks_id

# Top Track Audio Analysis Object
audio_anal = spotify_session.audio_analysis(top_track_id)

# Top track Audio Features Object
audio_feat = spotify_session.audio_features(tracks=top_10_tracks_id)
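
The audio features endpoint accepts at most 100 track IDs per request, so a longer ID list would need to be sent in batches. A minimal batching sketch follows; `chunked` is a hypothetical helper, not part of Spotipy, and the IDs are placeholders.

```python
def chunked(items, size=100):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder IDs standing in for real track IDs.
track_ids = ["track_{}".format(n) for n in range(250)]
batches = list(chunked(track_ids))
print([len(batch) for batch in batches])  # [100, 100, 50]

# With a live session, each batch could then be requested in turn, e.g.:
# features = []
# for batch in chunked(track_ids):
#     features.extend(spotify_session.audio_features(tracks=batch))
```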

In [48]:
# Obtaining track name and artist name

top_tracks.keys()
top_tracks['items'][0].keys()
top_tracks['items'][0]['name']                # Generates track name
top_tracks['items'][0]['artists'][0]['name']  # Generates artist name

'Ed Maverick'

In [58]:
# Generating lists containing Artist names and Track names

# Lists to be populated
track_names_lst = []
artist_names_lst = []

# Iterates over top tracks to append track/artist name to the corresponding list
# (only the first listed artist of each track is kept)
for track in top_tracks['items']:
    track_names_lst.append(track['name'])
    artist_names_lst.append(track['artists'][0]['name'])

print(track_names_lst, '\n', artist_names_lst)

['Fuentes de Ortiz', 'Llevo', 'River', 'Wru - (donde estás)', 'Diez Pasos Hacia Ti', 'Baby Blue', 'Catorce', 'Tú', 'Gracias Por Nada', 'Somos Algo'] 
 ['Ed Maverick', 'Luke Martinez', 'Leon Bridges', 'Ed Maverick', 'Daniel, Me Estás Matando', 'Kevin Kaarl', 'Sebastián Romero', 'maye', 'Jordano', 'Daniel, Me Estás Matando']
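
The loop above keeps only the first credited artist. Because a track's `artists` field is a list, collaborations could be preserved by joining every name. The sketch below runs on a small hand-written stand-in for `top_tracks` (the second track and its artists are invented for illustration); `track_and_artists` is a hypothetical helper.

```python
def track_and_artists(track):
    """Return (track name, all credited artist names joined with ' & ')."""
    artist_names = [artist["name"] for artist in track["artists"]]
    return track["name"], " & ".join(artist_names)


# Hand-written stand-in with the same nesting as the API response.
sample_top_tracks = {
    "items": [
        {"name": "Fuentes de Ortiz", "artists": [{"name": "Ed Maverick"}]},
        {"name": "Example Duet", "artists": [{"name": "Artist A"}, {"name": "Artist B"}]},
    ]
}

pairs = [track_and_artists(track) for track in sample_top_tracks["items"]]
for name, artists in pairs:
    print(name, "-", artists)
```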


In [59]:
# Creating a dataframe from the previously created lists

track_artist_names = pd.DataFrame(list(zip(track_names_lst, artist_names_lst)), columns=['track name', 'artist'])
track_artist_names

Unnamed: 0,track name,artist
0,Fuentes de Ortiz,Ed Maverick
1,Llevo,Luke Martinez
2,River,Leon Bridges
3,Wru - (donde estás),Ed Maverick
4,Diez Pasos Hacia Ti,"Daniel, Me Estás Matando"
5,Baby Blue,Kevin Kaarl
6,Catorce,Sebastián Romero
7,Tú,maye
8,Gracias Por Nada,Jordano
9,Somos Algo,"Daniel, Me Estás Matando"


In [4]:
# Audio Analysis Objects

# See the following link for descriptive information on the objects
# (https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-analysis/#time-interval-object)

# Each of the following is one of three kinds of objects: 'Time interval' (TI), 'Sections' (Sc), or 'Segments' (Sg)
audio_anal['bars']      # (TI) A bar is a segment of time defined as a given number of beats
audio_anal['beats']     # (TI) Time intervals of beats throughout the track
audio_anal['sections']  # (Sc) Defined by large variations in rhythm
audio_anal['segments']  # (Sg) Song subdivisions, each containing a roughly consistent sound
audio_anal['tatums']    # (TI) Beats are subdivisions of bars, and tatums are subdivisions of beats

# Idea: consider including podcasts; name the "Playlist Baby" with a fill-in-the-blank template
print(audio_anal.keys(),
      '\n'*2,
      audio_anal['track'].keys(),
      '\n'*2,
      top_10_tracks_id)

dict_keys(['meta', 'track', 'bars', 'beats', 'sections', 'segments', 'tatums']) 

 dict_keys(['num_samples', 'duration', 'sample_md5', 'offset_seconds', 'window_seconds', 'analysis_sample_rate', 'analysis_channels', 'end_of_fade_in', 'start_of_fade_out', 'loudness', 'tempo', 'tempo_confidence', 'time_signature', 'time_signature_confidence', 'key', 'key_confidence', 'mode', 'mode_confidence', 'codestring', 'code_version', 'echoprintstring', 'echoprint_version', 'synchstring', 'synch_version', 'rhythmstring', 'rhythm_version']) 

 ['0akyEssGRVHstqCSWXusJL', '02gaYAEdeR6poHcBH1KUQF', '0NeJjNlprGfZpeX2LQuN6c', '6plO0gM4tUvRC9TKFGIuaN', '54KsfVVnN4YWI2mMrnyUcC', '57mLRN6tfXwTRvp9oPWpop', '1s3WD4gbNoEXHiuSTmAKaK', '1jecO8NeYLsVWVptITz4c1', '6KseaEAFSS63N2NPZtDnRL', '5iSpfk6cDOSYePagAoG639']
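
Each time interval object carries a `start` and a `duration` in seconds, plus a `confidence`, so the bar/beat hierarchy described in the comments can be checked by counting the beats that begin inside each bar. The sketch below uses invented intervals rather than the real `audio_anal` contents; `beats_per_bar` is a hypothetical helper.

```python
def beats_per_bar(bars, beats):
    """Count the beats whose start time falls inside each bar's interval."""
    counts = []
    for bar in bars:
        bar_end = bar["start"] + bar["duration"]
        counts.append(sum(1 for beat in beats if bar["start"] <= beat["start"] < bar_end))
    return counts


# Invented intervals: two 2-second bars, one beat every half second.
sample_bars = [
    {"start": 0.0, "duration": 2.0, "confidence": 0.9},
    {"start": 2.0, "duration": 2.0, "confidence": 0.8},
]
sample_beats = [
    {"start": n * 0.5, "duration": 0.5, "confidence": 0.7} for n in range(8)
]

print(beats_per_bar(sample_bars, sample_beats))  # [4, 4]
```

With the real response, the call would presumably be `beats_per_bar(audio_anal['bars'], audio_anal['beats'])`.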


In [5]:
# Audio Features Objects

# See the reference README file for a description of the Audio Features Objects, or explore the following link
# (https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-features/)

# Note: 'key' is -1 if no key is detected.  Consider when training model, or processing data

# The following explores the keys of the Audio Features Object for a single track:
print("The following are the key value pairs contained in the audio features for 10 tracks:", '\n')

k_lst = list(audio_feat[0].keys())    # feature names; these will become the column names
lst_v_lst = []                        # a list of lists of values, one inner list per track
for feat in audio_feat:               # loop over each track's Audio Features Object
    lst_v_lst.append(list(feat.values()))  # values in the same order as k_lst
print(k_lst)
lst_v_lst[0]

The following are the key value pairs contained in the audio features for 10 tracks: 

['danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness', 'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo', 'type', 'id', 'uri', 'track_href', 'analysis_url', 'duration_ms', 'time_signature']


[0.83,
 0.159,
 1,
 -14.461,
 1,
 0.0383,
 0.946,
 2.02e-05,
 0.362,
 0.189,
 104.95,
 'audio_features',
 '0akyEssGRVHstqCSWXusJL',
 'spotify:track:0akyEssGRVHstqCSWXusJL',
 'https://api.spotify.com/v1/tracks/0akyEssGRVHstqCSWXusJL',
 'https://api.spotify.com/v1/audio-analysis/0akyEssGRVHstqCSWXusJL',
 207400,
 4]
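
The `key` value uses standard pitch class notation (0 = C, 1 = C♯/D♭, and so on), with -1 meaning no key was detected. A small lookup, sketched below, could make the column readable and turn -1 into a missing value before modeling; `key_name` is a hypothetical helper.

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def key_name(key):
    """Map a pitch-class integer to a note name; -1 (no key detected) becomes None."""
    if key == -1:
        return None
    if not 0 <= key < len(PITCH_CLASSES):
        raise ValueError("unexpected key value: {}".format(key))
    return PITCH_CLASSES[key]


# Sample key values, plus the -1 sentinel.
print([key_name(k) for k in [1, 5, 8, -1]])  # ['C#', 'F', 'G#', None]
```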

In [6]:
# Generating the dataframe for the tracks

# Turn the two lists (k_lst, lst_v_lst) into the DataFrame: k_lst supplies the column names
# and lst_v_lst supplies the row values; rows keep the default integer index (track ids stay in the 'id' column)

top_tracks_df = pd.DataFrame(lst_v_lst, columns=k_lst)
top_tracks_df

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,0.83,0.159,1,-14.461,1,0.0383,0.946,2e-05,0.362,0.189,104.95,audio_features,0akyEssGRVHstqCSWXusJL,spotify:track:0akyEssGRVHstqCSWXusJL,https://api.spotify.com/v1/tracks/0akyEssGRVHs...,https://api.spotify.com/v1/audio-analysis/0aky...,207400,4
1,0.726,0.125,5,-9.194,0,0.0803,0.835,0.0,0.131,0.277,92.23,audio_features,02gaYAEdeR6poHcBH1KUQF,spotify:track:02gaYAEdeR6poHcBH1KUQF,https://api.spotify.com/v1/tracks/02gaYAEdeR6p...,https://api.spotify.com/v1/audio-analysis/02ga...,183711,4
2,0.658,0.179,8,-10.866,1,0.0448,0.689,0.0,0.17,0.191,128.128,audio_features,0NeJjNlprGfZpeX2LQuN6c,spotify:track:0NeJjNlprGfZpeX2LQuN6c,https://api.spotify.com/v1/tracks/0NeJjNlprGfZ...,https://api.spotify.com/v1/audio-analysis/0NeJ...,238560,4
3,0.78,0.23,4,-12.706,1,0.0448,0.913,0.00279,0.0798,0.125,123.937,audio_features,6plO0gM4tUvRC9TKFGIuaN,spotify:track:6plO0gM4tUvRC9TKFGIuaN,https://api.spotify.com/v1/tracks/6plO0gM4tUvR...,https://api.spotify.com/v1/audio-analysis/6plO...,240307,4
4,0.77,0.325,7,-11.301,1,0.0322,0.899,0.000556,0.22,0.721,103.085,audio_features,54KsfVVnN4YWI2mMrnyUcC,spotify:track:54KsfVVnN4YWI2mMrnyUcC,https://api.spotify.com/v1/tracks/54KsfVVnN4YW...,https://api.spotify.com/v1/audio-analysis/54Ks...,209652,4
5,0.556,0.127,10,-18.559,1,0.0318,0.988,0.824,0.11,0.109,104.963,audio_features,57mLRN6tfXwTRvp9oPWpop,spotify:track:57mLRN6tfXwTRvp9oPWpop,https://api.spotify.com/v1/tracks/57mLRN6tfXwT...,https://api.spotify.com/v1/audio-analysis/57mL...,274293,4
6,0.646,0.307,6,-11.669,1,0.0333,0.822,0.00305,0.0971,0.25,155.918,audio_features,1s3WD4gbNoEXHiuSTmAKaK,spotify:track:1s3WD4gbNoEXHiuSTmAKaK,https://api.spotify.com/v1/tracks/1s3WD4gbNoEX...,https://api.spotify.com/v1/audio-analysis/1s3W...,224046,3
7,0.732,0.555,5,-7.973,1,0.0326,0.625,0.0,0.112,0.729,114.981,audio_features,1jecO8NeYLsVWVptITz4c1,spotify:track:1jecO8NeYLsVWVptITz4c1,https://api.spotify.com/v1/tracks/1jecO8NeYLsV...,https://api.spotify.com/v1/audio-analysis/1jec...,206612,4
8,0.491,0.0677,1,-19.942,0,0.227,0.982,0.0,0.1,0.157,203.366,audio_features,6KseaEAFSS63N2NPZtDnRL,spotify:track:6KseaEAFSS63N2NPZtDnRL,https://api.spotify.com/v1/tracks/6KseaEAFSS63...,https://api.spotify.com/v1/audio-analysis/6Kse...,196692,4
9,0.836,0.519,7,-8.962,1,0.0355,0.686,0.00236,0.186,0.638,93.08,audio_features,5iSpfk6cDOSYePagAoG639,spotify:track:5iSpfk6cDOSYePagAoG639,https://api.spotify.com/v1/tracks/5iSpfk6cDOSY...,https://api.spotify.com/v1/audio-analysis/5iSp...,234128,4


In [81]:
# Combining the 'track_artist_names' and 'top_tracks_df' DataFrames and dropping unneeded columns

# Dropping columns from 'top_tracks_df'
drop_col = ['type', 'track_href', 'analysis_url']
top_tracks_df = top_tracks_df.drop(drop_col, axis=1)
top_tracks_df

# Creating the DataFrame that will be used for analysis
df = pd.concat([track_artist_names, top_tracks_df.reindex(track_artist_names.index)], axis=1)
df

Unnamed: 0,track name,artist,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,id,uri,duration_ms,time_signature
0,Fuentes de Ortiz,Ed Maverick,0.83,0.159,1,-14.461,1,0.0383,0.946,2e-05,0.362,0.189,104.95,0akyEssGRVHstqCSWXusJL,spotify:track:0akyEssGRVHstqCSWXusJL,207400,4
1,Llevo,Luke Martinez,0.726,0.125,5,-9.194,0,0.0803,0.835,0.0,0.131,0.277,92.23,02gaYAEdeR6poHcBH1KUQF,spotify:track:02gaYAEdeR6poHcBH1KUQF,183711,4
2,River,Leon Bridges,0.658,0.179,8,-10.866,1,0.0448,0.689,0.0,0.17,0.191,128.128,0NeJjNlprGfZpeX2LQuN6c,spotify:track:0NeJjNlprGfZpeX2LQuN6c,238560,4
3,Wru - (donde estás),Ed Maverick,0.78,0.23,4,-12.706,1,0.0448,0.913,0.00279,0.0798,0.125,123.937,6plO0gM4tUvRC9TKFGIuaN,spotify:track:6plO0gM4tUvRC9TKFGIuaN,240307,4
4,Diez Pasos Hacia Ti,"Daniel, Me Estás Matando",0.77,0.325,7,-11.301,1,0.0322,0.899,0.000556,0.22,0.721,103.085,54KsfVVnN4YWI2mMrnyUcC,spotify:track:54KsfVVnN4YWI2mMrnyUcC,209652,4
5,Baby Blue,Kevin Kaarl,0.556,0.127,10,-18.559,1,0.0318,0.988,0.824,0.11,0.109,104.963,57mLRN6tfXwTRvp9oPWpop,spotify:track:57mLRN6tfXwTRvp9oPWpop,274293,4
6,Catorce,Sebastián Romero,0.646,0.307,6,-11.669,1,0.0333,0.822,0.00305,0.0971,0.25,155.918,1s3WD4gbNoEXHiuSTmAKaK,spotify:track:1s3WD4gbNoEXHiuSTmAKaK,224046,3
7,Tú,maye,0.732,0.555,5,-7.973,1,0.0326,0.625,0.0,0.112,0.729,114.981,1jecO8NeYLsVWVptITz4c1,spotify:track:1jecO8NeYLsVWVptITz4c1,206612,4
8,Gracias Por Nada,Jordano,0.491,0.0677,1,-19.942,0,0.227,0.982,0.0,0.1,0.157,203.366,6KseaEAFSS63N2NPZtDnRL,spotify:track:6KseaEAFSS63N2NPZtDnRL,196692,4
9,Somos Algo,"Daniel, Me Estás Matando",0.836,0.519,7,-8.962,1,0.0355,0.686,0.00236,0.186,0.638,93.08,5iSpfk6cDOSYePagAoG639,spotify:track:5iSpfk6cDOSYePagAoG639,234128,4
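
The concatenation above lines the two frames up by row position, which works only while both keep the API's original order. Joining on the track `id` instead would be more robust. The dependency-free sketch below shows the idea on plain dictionaries, using two rows from the data above; `join_on_id` is a hypothetical helper, and with pandas the same effect would come from merging on an `id` column.

```python
def join_on_id(name_rows, feature_rows):
    """Combine name and feature records that share the same track id."""
    features_by_id = {row["id"]: row for row in feature_rows}
    joined = []
    for row in name_rows:
        match = features_by_id.get(row["id"])
        if match is not None:  # skip tracks that have no feature record
            joined.append({**row, **match})
    return joined


names = [
    {"id": "0akyEssGRVHstqCSWXusJL", "track name": "Fuentes de Ortiz", "artist": "Ed Maverick"},
    {"id": "02gaYAEdeR6poHcBH1KUQF", "track name": "Llevo", "artist": "Luke Martinez"},
]
# Feature rows deliberately listed in a different order.
features = [
    {"id": "02gaYAEdeR6poHcBH1KUQF", "danceability": 0.726, "energy": 0.125},
    {"id": "0akyEssGRVHstqCSWXusJL", "danceability": 0.83, "energy": 0.159},
]

for row in join_on_id(names, features):
    print(row["track name"], row["danceability"])
```

This would require keeping the track `id` alongside the names when the name lists are first built.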


# Baseline Explorations
- Select the problem type (classification/regression)
- Determine model baselines
- Model evaluations/comparisons
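
Until the problem type is settled, a baseline for either case can be sketched with the standard library: always predict the mean for regression, or the most common class for classification, and score later models against those numbers. The targets below reuse the `valence` and `mode` columns from the table above purely as stand-ins; which column actually becomes the target is an open assumption.

```python
from collections import Counter
from statistics import mean


def mean_baseline_mae(targets):
    """Mean absolute error when always predicting the mean of the targets."""
    prediction = mean(targets)
    return mean(abs(t - prediction) for t in targets)


def majority_baseline_accuracy(labels):
    """Accuracy when always predicting the most frequent label."""
    _, count = Counter(labels).most_common(1)[0]
    return count / len(labels)


valence = [0.189, 0.277, 0.191, 0.125, 0.721, 0.109, 0.25, 0.729, 0.157, 0.638]
mode = [1, 0, 1, 1, 1, 1, 1, 1, 0, 1]

print(round(mean_baseline_mae(valence), 3))
print(majority_baseline_accuracy(mode))  # 0.8
```

Both scores here are computed on the same ten rows they were fitted to; a real comparison would use a held-out split.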