# Project Name: Recognize genre of a song

### Data Source: Spotify Web API

## Steps for fetching the data


### Step-1. Setting up Spotify Developer account
##### Logging In to Spotify Developer
##### Creating a Client ID
##### Retrieving Client ID and Client Secret
### Step-2. Reviewing Spotify Web API resources
### Step-3. Selecting a particular OAuth Flow
### Step-4. Using Spotipy, a Python library for the Spotify API
##### Installing Spotipy
##### Importing and Setting Up Spotipy
##### Retrieving Data from Spotify Web API
##### Loading Data into DataFrame for Exploratory Data Analysis

Reference:
https://developer.spotify.com/documentation/web-api/

In [48]:
# importing necessary libraries
import numpy as np
import pandas as pd

# Using Spotipy

Spotipy: a lightweight Python library for Spotify Web API

With Spotipy we get full access to all of the music data provided by the Spotify platform.

Spotipy can read the client credentials from the SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET environment variables; this notebook refers to them as cid and secret.
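
As a minimal sketch of how the two environment variables could be read and validated before any API call, the helper below looks them up and fails early with a clear message. `load_spotify_credentials` is a hypothetical name introduced here for illustration; it is not part of Spotipy.

```python
import os


def load_spotify_credentials(env=None):
    """Return (client_id, client_secret) read from environment variables.

    Hypothetical helper for illustration; it is not a Spotipy function.
    `env` defaults to os.environ and can be any mapping, which eases testing.
    """
    env = os.environ if env is None else env
    names = ("SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET")
    # Collect every variable that is unset or empty so the error lists them all
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["SPOTIPY_CLIENT_ID"], env["SPOTIPY_CLIENT_SECRET"]
```

Keeping the credentials out of the notebook itself means the file can be shared without exposing the client secret.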

Reference:
https://spotipy.readthedocs.io/en/2.19.0/

In [49]:
pip install spotipy --upgrade

Note: you may need to restart the kernel to use updated packages.


# Authorization

Reference:
https://developer.spotify.com/documentation/general/guides/authorization/

Authorization refers to the process of granting a user or application access permissions to Spotify data and features. Spotify implements the OAuth 2.0 authorization framework.

The flow involves three parties:

1. End User: the Spotify user, who grants access to the protected resources (e.g. playlists, personal information, etc.).
2. My App: the client that requests access to the protected resources (e.g. a mobile or web app).
3. Server: hosts the protected resources and provides authentication and authorization via OAuth 2.0.

Access to the protected resources is determined by one or several scopes. Scopes enable your application to access specific functionality (e.g. read a playlist, modify your library or just stream) on behalf of a user. The set of scopes you request during authorization determines the access permissions that the user is asked to grant. Detailed information about scopes is available in Spotify's scopes guide.

The authorization process requires valid client credentials: a client ID and a client secret. You can follow the App settings guide to learn how to generate them.

Once the authorization is granted, the authorization server issues an access token, which is used to make API calls on behalf of the user or application.

The OAuth2 standard defines four grant types (or flows) to request and get an access token. Spotify implements the following ones:

1. Authorization code + PKCE extension
2. Client credentials
3. Implicit grant

### Option 2 (Client Credentials) is used in this project

# Client Credentials Flow:

The Client Credentials flow is used in server-to-server authentication. Since this flow does not include authorization, only endpoints that do not access user information can be accessed.

1. The application requests an access token from the Spotify Accounts Service, sending its client_id and client_secret.
2. The Spotify Accounts Service returns an access token.
3. The application uses the access token in its requests to the Spotify Web API.
4. The Spotify Web API returns the requested data as a JSON object.
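
The token exchange in this flow can be sketched as follows. The code only builds the request pieces and sends nothing. The token endpoint URL and the Basic/Bearer header layout are not stated in this notebook; they are assumptions based on Spotify's authorization guide, and `build_token_request` and `bearer_header` are hypothetical helper names.

```python
import base64
from urllib.parse import urlencode

# Assumed token endpoint of the Spotify Accounts Service (not given in this notebook)
TOKEN_URL = "https://accounts.spotify.com/api/token"


def build_token_request(client_id, client_secret):
    """Return (url, headers, body) for a Client Credentials token request."""
    # The credentials travel as base64("client_id:client_secret") in a Basic header
    raw = f"{client_id}:{client_secret}".encode("ascii")
    headers = {
        "Authorization": "Basic " + base64.b64encode(raw).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urlencode({"grant_type": "client_credentials"})
    return TOKEN_URL, headers, body


def bearer_header(access_token):
    """Header attached to each Web API request once a token has been issued."""
    return {"Authorization": f"Bearer {access_token}"}
```

Spotipy performs these steps internally, so the notebook never has to build the requests by hand.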

# OAuth flow usage:

https://developer.spotify.com/documentation/general/guides/authorization/

OAuth flow:
Client Credentials is used. It involves no user authorization; the access token is issued to the application itself.

In [50]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

### The app is registered through the link below, which provides the client_id and secret

Reference:
https://developer.spotify.com/documentation/general/guides/authorization/app-settings/

In [51]:
import os
# Read the credentials from environment variables rather than hard-coding them
cid = os.environ['SPOTIPY_CLIENT_ID']
secret = os.environ['SPOTIPY_CLIENT_SECRET']

Reference: https://spotipy.readthedocs.io/en/2.19.0/

In [52]:
client_credentials_manager = SpotifyClientCredentials(client_id=cid, client_secret=secret)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)

### Example: fetching Spotify's audio features for a single track

In [53]:
a=sp.audio_features(tracks=['1YwoYPfhuZlLDbJUH1cKSi'])

In [54]:
a

[{'danceability': 0.744,
  'energy': 0.434,
  'key': 5,
  'loudness': -10.807,
  'mode': 0,
  'speechiness': 0.0845,
  'acousticness': 0.583,
  'instrumentalness': 0.0151,
  'liveness': 0.0715,
  'valence': 0.939,
  'tempo': 173.871,
  'type': 'audio_features',
  'id': '1YwoYPfhuZlLDbJUH1cKSi',
  'uri': 'spotify:track:1YwoYPfhuZlLDbJUH1cKSi',
  'track_href': 'https://api.spotify.com/v1/tracks/1YwoYPfhuZlLDbJUH1cKSi',
  'analysis_url': 'https://api.spotify.com/v1/audio-analysis/1YwoYPfhuZlLDbJUH1cKSi',
  'duration_ms': 144360,
  'time_signature': 3}]

### The genre list is obtained with sp.recommendation_genre_seeds(), as described in the documentation below

Reference: https://spotipy.readthedocs.io/en/2.14.0/

# Full Genre List

In [55]:
genre_list = sp.recommendation_genre_seeds()['genres']

In [56]:
genre_list

['acoustic',
 'afrobeat',
 'alt-rock',
 'alternative',
 'ambient',
 'anime',
 'black-metal',
 'bluegrass',
 'blues',
 'bossanova',
 'brazil',
 'breakbeat',
 'british',
 'cantopop',
 'chicago-house',
 'children',
 'chill',
 'classical',
 'club',
 'comedy',
 'country',
 'dance',
 'dancehall',
 'death-metal',
 'deep-house',
 'detroit-techno',
 'disco',
 'disney',
 'drum-and-bass',
 'dub',
 'dubstep',
 'edm',
 'electro',
 'electronic',
 'emo',
 'folk',
 'forro',
 'french',
 'funk',
 'garage',
 'german',
 'gospel',
 'goth',
 'grindcore',
 'groove',
 'grunge',
 'guitar',
 'happy',
 'hard-rock',
 'hardcore',
 'hardstyle',
 'heavy-metal',
 'hip-hop',
 'holidays',
 'honky-tonk',
 'house',
 'idm',
 'indian',
 'indie',
 'indie-pop',
 'industrial',
 'iranian',
 'j-dance',
 'j-idol',
 'j-pop',
 'j-rock',
 'jazz',
 'k-pop',
 'kids',
 'latin',
 'latino',
 'malay',
 'mandopop',
 'metal',
 'metal-misc',
 'metalcore',
 'minimal-techno',
 'movies',
 'mpb',
 'new-age',
 'new-release',
 'opera',
 'pagode',

In [57]:
len(genre_list)

126

### 126 genres are obtained from Spotify Web API

Reference: https://spotipy.readthedocs.io/en/2.19.0/#features

In [58]:
# creating empty lists for extraction of some specific desired features
track_name = []
artist_name = []
album_name = []
genre = []
duration_ms = []
popularity = []
explicit = []
track_id = []
artist_id = []

#### Note: 
Spotify caps the search offset at 1,000 (previously 10,000), and sp.search() returns at most 50 results per query, so a nested for loop is used to page through the results of each genre.

#### limit (integer)
The maximum number of items to return. Default: 20. Minimum: 1. Maximum: 50.

#### offset (integer)
The index of the first item to return. Default: 0 (the first item).
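
To make the paging arithmetic concrete, the sketch below lists the offsets needed to cover a given number of results at a given page size. `page_offsets` is a hypothetical helper, and the bounds it checks are the ones quoted above.

```python
def page_offsets(total, limit=50):
    """Return the offsets needed to fetch `total` items, `limit` items per request."""
    if not 1 <= limit <= 50:
        raise ValueError("limit must be between 1 and 50")
    return list(range(0, total, limit))


# 1000 tracks per genre at 50 tracks per request
offsets = page_offsets(1000)
requests_per_genre = len(offsets)
```

With 1000 tracks and a limit of 50, this gives 20 requests per genre (offsets 0 through 950), or 2,520 requests for 126 genres.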

In [60]:
# Iterate over every genre seed
for gen in genre_list:

    # Each request returns at most 50 tracks, so 20 requests (offsets 0 to 950) are needed to collect up to 1000 songs per genre
    for offset in range(0, 1000, 50):

        q = 'genre:' + str(gen)

        # Store the API response for this page in genre_results
        genre_results = sp.search(q=q, type='track', limit=50, offset=offset)

        # Iterate through the returned tracks and store the relevant fields in lists
        for t in genre_results['tracks']['items']:
            track_name.append(t['name'])
            artist_name.append(t['artists'][0]['name'])
            album_name.append(t['album']['name'])
            genre.append(gen)
            duration_ms.append(t['duration_ms'])
            popularity.append(t['popularity'])
            explicit.append(t['explicit'])
            track_id.append(t['id'])
            artist_id.append(t['artists'][0]['id'])
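
An alternative to nine parallel lists is to build one dictionary per track, which keeps each row's fields together. The sketch below assumes a track item shaped like the fields the loop above reads. `extract_track_fields` is a hypothetical helper and `sample_item` is made-up data, not real API output.

```python
def extract_track_fields(track, genre):
    """Flatten one track item from a search response into a row dictionary."""
    # Only the first listed artist is kept, as in the loop above
    first_artist = track["artists"][0]
    return {
        "track_name": track["name"],
        "artist_name": first_artist["name"],
        "album_name": track["album"]["name"],
        "genre": genre,
        "duration_ms": track["duration_ms"],
        "popularity": track["popularity"],
        "explicit": track["explicit"],
        "track_id": track["id"],
        "artist_id": first_artist["id"],
    }


# Made-up item for illustration only
sample_item = {
    "name": "Example Song",
    "artists": [{"name": "Example Artist", "id": "artist123"}],
    "album": {"name": "Example Album"},
    "duration_ms": 200000,
    "popularity": 50,
    "explicit": False,
    "id": "track123",
}
row = extract_track_fields(sample_item, "folk")
```

A list of such rows can be passed directly to pd.DataFrame, which yields the same nine columns.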

## Initializing DataFrame 'df' with data

In [61]:
df = pd.DataFrame({'track_name':track_name,'artist_name':artist_name,
                   'album_name':album_name,'genre':genre,'duration_ms':duration_ms,
                   'popularity':popularity,'explicit':explicit,
                   'track_id' : track_id,'artist_id':artist_id,})

In [62]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150550 entries, 0 to 150549
Data columns (total 9 columns):
 #   Column       Non-Null Count   Dtype 
---  ------       --------------   ----- 
 0   track_name   150550 non-null  object
 1   artist_name  150550 non-null  object
 2   album_name   150550 non-null  object
 3   genre        150550 non-null  object
 4   duration_ms  150550 non-null  int64 
 5   popularity   150550 non-null  int64 
 6   explicit     150550 non-null  bool  
 7   track_id     150550 non-null  object
 8   artist_id    150550 non-null  object
dtypes: bool(1), int64(2), object(6)
memory usage: 9.3+ MB


## Number of unique genres in the collected data

In [63]:
len(df.genre.unique())

114
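
Only 114 of the 126 requested genres appear in the data, so 12 genres contributed no rows. The notebook does not show which ones. A minimal way to list them is sketched below with toy values; `genres_without_results` is a hypothetical helper that would be called with genre_list and df.genre in the notebook.

```python
def genres_without_results(requested, collected):
    """Return the requested genres that never appear in the collected data."""
    seen = set(collected)
    # Preserve the order of the requested list
    return [g for g in requested if g not in seen]


# Toy values for illustration only
requested = ["acoustic", "afrobeat", "ambient"]
collected = ["acoustic", "acoustic", "ambient"]
missing = genres_without_results(requested, collected)
```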

## Number of unique tracks across all genres

In [64]:
len(df.track_id.unique())

87028

## Shuffling the pandas DataFrame without resetting the index

In [65]:
df=df.sample(frac=1)

In [66]:
df

Unnamed: 0,track_name,artist_name,album_name,genre,duration_ms,popularity,explicit,track_id,artist_id
126160,Yo Perreo Sola - Remix,Bad Bunny,Yo Perreo Sola (Remix),reggaeton,174027,61,True,2cpteAYHcd4cjSxAeCkA52,4q3ewBCX7sLwd24euuV69X
140356,Sitting In The Park,Billy Stewart,I Do Love You,soul,194226,52,False,0Iwbts8iqaJJh2kPiXbtgV,21llKqnS025UdaAMslJS4J
149427,Hold On To Me,Gokhan Akkas,Hold On To Me,turkish,171428,37,False,6NvaEJ6IhUWTN5XIXSb89o,5eqGxrMJ31RttUaQF9QtUP
128447,The Train Kept A Rollin',Johnny Burnette & The Rock 'N' Roll Trio,Johnny Burnette And The Rock 'N Roll Trio,rock-n-roll,139453,42,False,7s8spWIA65WP6vG18bXDP1,1neKWNZP74NEuvHZmvMS58
77212,If God / Nothing But the Blood,Casey J,The Gathering,gospel,505120,42,False,6hre6Y63XrzzTPOBT9NM6e,0B0NzcRnTARbZc83a34cDd
...,...,...,...,...,...,...,...,...,...
50155,Love Desire - Mike Dunn Blackball Love MixX,Mike Dunn,Traxx That Jack U,chicago-house,377419,16,False,44BjWwxKQM4h4MD2ZC6Yca,55UOywvWbUD9c6C3NSGdft
77993,Kiss,London After Midnight,Psycho Magnet,goth,384733,39,False,7clzk3tSYv6VkQWByftu24,51mEqzVhG2n9nD2kBAnWer
115328,Hug A Hedgy,Wonder Pets,Wonder Pets,party,165933,15,False,3Cii7zTRELKneIXMFUXlkU,7ni4mcsUHE4P5g7Qd15DHD
107544,愛的魔法,金莎,他不愛我,mandopop,191895,42,False,4X5SAZ4JKejo2tPhvCJhZf,0wK3rMwogVowmeXArZN29T


In [67]:
df.head()

Unnamed: 0,track_name,artist_name,album_name,genre,duration_ms,popularity,explicit,track_id,artist_id
126160,Yo Perreo Sola - Remix,Bad Bunny,Yo Perreo Sola (Remix),reggaeton,174027,61,True,2cpteAYHcd4cjSxAeCkA52,4q3ewBCX7sLwd24euuV69X
140356,Sitting In The Park,Billy Stewart,I Do Love You,soul,194226,52,False,0Iwbts8iqaJJh2kPiXbtgV,21llKqnS025UdaAMslJS4J
149427,Hold On To Me,Gokhan Akkas,Hold On To Me,turkish,171428,37,False,6NvaEJ6IhUWTN5XIXSb89o,5eqGxrMJ31RttUaQF9QtUP
128447,The Train Kept A Rollin',Johnny Burnette & The Rock 'N' Roll Trio,Johnny Burnette And The Rock 'N Roll Trio,rock-n-roll,139453,42,False,7s8spWIA65WP6vG18bXDP1,1neKWNZP74NEuvHZmvMS58
77212,If God / Nothing But the Blood,Casey J,The Gathering,gospel,505120,42,False,6hre6Y63XrzzTPOBT9NM6e,0B0NzcRnTARbZc83a34cDd


## Shuffling the pandas DataFrame and resetting the index

In [68]:
df=df.sample(frac=1).reset_index(drop=True)

In [69]:
df.head()

Unnamed: 0,track_name,artist_name,album_name,genre,duration_ms,popularity,explicit,track_id,artist_id
0,Balanço Zona Sul,Wilson Simonal,"Tem ""Algo Mais""",pagode,152960,38,False,73P2hHCGMDlET853vUOq08,6DqFCzjARUV3xH9meu3Bya
1,If You Could Read My Mind,Gordon Lightfoot,If You Could Read My Mind,folk,228840,64,False,57ct8jKi6trntXiRV0NnXi,23rleGXVOVVgTk3xgtmfE4
2,Let Me Out,Future Leaders of the World,LVL IV,grunge,243173,51,False,5Qyywa7BZCkXE5K6O6OHPb,25xjytYjVKPvHiDIgrFodf
3,Days to Come (feat. Fiora),Seven Lions,Days To Come EP,progressive-house,284262,44,False,1FRFgyPNuqfXIRYlCJ1kwR,6fcTRFpz0yH79qSKfof7lp
4,Summer Jam,Matt Sassari,Glasgow Underground Ibiza 2022,french,179000,0,False,32DKwq28DvO0NNE7l6M3KJ,21dVknSLCsK37cWozWDZZS


# Pulling Spotify's audio features for the collected tracks

tqdm is a Python library for adding a progress bar.
It lets us configure and display a progress bar with the metrics we want to track.

In [71]:
from tqdm.notebook import tqdm

In [72]:
# Pulling audio features
tracks = df.track_id.to_list()
audio_features = []
batchsize = 100

In [73]:
# Iterating over 100 song batches (due to API limit per request)
for i in tqdm(range(0,len(tracks),batchsize)):
    batch = tracks[i:i+batchsize]
    # Collecting features for 100 tracks
    feature_results = sp.audio_features(batch)
    # Storing individual track info in list
    for track in feature_results:
        if track is not None:
            audio_features.append(track)

  0%|          | 0/1506 [00:00<?, ?it/s]
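
The batching used above can be isolated into a small generator, sketched here with placeholder IDs. `chunked` is a hypothetical helper name. The last lines check that 150,550 track IDs in batches of 100 give the 1,506 iterations shown in the progress bar.

```python
import math


def chunked(items, size):
    """Yield consecutive slices of `items` holding at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder IDs for illustration only
ids = [f"id{i}" for i in range(250)]
batch_sizes = [len(batch) for batch in chunked(ids, 100)]

# Number of requests needed for the 150,550 track IDs in df
n_requests = math.ceil(150550 / 100)
```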

## Accumulated audio features extracted from the tracks

In [74]:
audio_features[0]

{'danceability': 0.511,
 'energy': 0.321,
 'key': 5,
 'loudness': -11.835,
 'mode': 1,
 'speechiness': 0.0386,
 'acousticness': 0.425,
 'instrumentalness': 0,
 'liveness': 0.141,
 'valence': 0.705,
 'tempo': 139.489,
 'type': 'audio_features',
 'id': '73P2hHCGMDlET853vUOq08',
 'uri': 'spotify:track:73P2hHCGMDlET853vUOq08',
 'track_href': 'https://api.spotify.com/v1/tracks/73P2hHCGMDlET853vUOq08',
 'analysis_url': 'https://api.spotify.com/v1/audio-analysis/73P2hHCGMDlET853vUOq08',
 'duration_ms': 152960,
 'time_signature': 4}

In [75]:
af_df = pd.DataFrame(audio_features)

In [76]:
af_df

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,0.511,0.3210,5,-11.835,1,0.0386,0.42500,0.000000,0.1410,0.705,139.489,audio_features,73P2hHCGMDlET853vUOq08,spotify:track:73P2hHCGMDlET853vUOq08,https://api.spotify.com/v1/tracks/73P2hHCGMDlE...,https://api.spotify.com/v1/audio-analysis/73P2...,152960,4
1,0.612,0.2400,9,-12.821,1,0.0330,0.74500,0.000079,0.1150,0.223,122.565,audio_features,57ct8jKi6trntXiRV0NnXi,spotify:track:57ct8jKi6trntXiRV0NnXi,https://api.spotify.com/v1/tracks/57ct8jKi6trn...,https://api.spotify.com/v1/audio-analysis/57ct...,228840,4
2,0.553,0.8710,1,-5.310,1,0.0589,0.02460,0.000107,0.1810,0.474,144.064,audio_features,5Qyywa7BZCkXE5K6O6OHPb,spotify:track:5Qyywa7BZCkXE5K6O6OHPb,https://api.spotify.com/v1/tracks/5Qyywa7BZCkX...,https://api.spotify.com/v1/audio-analysis/5Qyy...,243173,4
3,0.467,0.7230,0,-7.101,0,0.0536,0.11700,0.000003,0.1860,0.207,139.995,audio_features,1FRFgyPNuqfXIRYlCJ1kwR,spotify:track:1FRFgyPNuqfXIRYlCJ1kwR,https://api.spotify.com/v1/tracks/1FRFgyPNuqfX...,https://api.spotify.com/v1/audio-analysis/1FRF...,284263,4
4,0.786,0.7980,9,-6.854,0,0.0420,0.03080,0.013400,0.3670,0.651,130.022,audio_features,32DKwq28DvO0NNE7l6M3KJ,spotify:track:32DKwq28DvO0NNE7l6M3KJ,https://api.spotify.com/v1/tracks/32DKwq28DvO0...,https://api.spotify.com/v1/audio-analysis/32DK...,179000,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
150545,0.804,0.6550,0,-8.668,1,0.0435,0.01140,0.316000,0.5850,0.572,119.001,audio_features,47AiGjtI7cOLyPwPSCqTuO,spotify:track:47AiGjtI7cOLyPwPSCqTuO,https://api.spotify.com/v1/tracks/47AiGjtI7cOL...,https://api.spotify.com/v1/audio-analysis/47Ai...,423529,4
150546,0.429,0.8130,8,-6.252,1,0.0661,0.00285,0.000025,0.0883,0.615,113.283,audio_features,6LhYshEM4feJKQTr7riTBB,spotify:track:6LhYshEM4feJKQTr7riTBB,https://api.spotify.com/v1/tracks/6LhYshEM4feJ...,https://api.spotify.com/v1/audio-analysis/6LhY...,238627,4
150547,0.624,0.8050,9,-2.632,1,0.0290,0.04660,0.000000,0.3280,0.770,123.991,audio_features,2dRPQFwPqAmc42mDRnsDQu,spotify:track:2dRPQFwPqAmc42mDRnsDQu,https://api.spotify.com/v1/tracks/2dRPQFwPqAmc...,https://api.spotify.com/v1/audio-analysis/2dRP...,212173,4
150548,0.549,0.9530,5,-6.205,1,0.0587,0.42900,0.000000,0.7110,0.908,170.212,audio_features,3OciZ7iaTPVWCw7yKzrMZi,spotify:track:3OciZ7iaTPVWCw7yKzrMZi,https://api.spotify.com/v1/tracks/3OciZ7iaTPVW...,https://api.spotify.com/v1/audio-analysis/3Oci...,231467,4


## Renaming the audio-feature id column to track_id so that it matches df

In [77]:
af_df.rename(columns={'id':'track_id'}, inplace=True)

In [78]:
af_df

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,track_id,uri,track_href,analysis_url,duration_ms,time_signature
0,0.511,0.3210,5,-11.835,1,0.0386,0.42500,0.000000,0.1410,0.705,139.489,audio_features,73P2hHCGMDlET853vUOq08,spotify:track:73P2hHCGMDlET853vUOq08,https://api.spotify.com/v1/tracks/73P2hHCGMDlE...,https://api.spotify.com/v1/audio-analysis/73P2...,152960,4
1,0.612,0.2400,9,-12.821,1,0.0330,0.74500,0.000079,0.1150,0.223,122.565,audio_features,57ct8jKi6trntXiRV0NnXi,spotify:track:57ct8jKi6trntXiRV0NnXi,https://api.spotify.com/v1/tracks/57ct8jKi6trn...,https://api.spotify.com/v1/audio-analysis/57ct...,228840,4
2,0.553,0.8710,1,-5.310,1,0.0589,0.02460,0.000107,0.1810,0.474,144.064,audio_features,5Qyywa7BZCkXE5K6O6OHPb,spotify:track:5Qyywa7BZCkXE5K6O6OHPb,https://api.spotify.com/v1/tracks/5Qyywa7BZCkX...,https://api.spotify.com/v1/audio-analysis/5Qyy...,243173,4
3,0.467,0.7230,0,-7.101,0,0.0536,0.11700,0.000003,0.1860,0.207,139.995,audio_features,1FRFgyPNuqfXIRYlCJ1kwR,spotify:track:1FRFgyPNuqfXIRYlCJ1kwR,https://api.spotify.com/v1/tracks/1FRFgyPNuqfX...,https://api.spotify.com/v1/audio-analysis/1FRF...,284263,4
4,0.786,0.7980,9,-6.854,0,0.0420,0.03080,0.013400,0.3670,0.651,130.022,audio_features,32DKwq28DvO0NNE7l6M3KJ,spotify:track:32DKwq28DvO0NNE7l6M3KJ,https://api.spotify.com/v1/tracks/32DKwq28DvO0...,https://api.spotify.com/v1/audio-analysis/32DK...,179000,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
150545,0.804,0.6550,0,-8.668,1,0.0435,0.01140,0.316000,0.5850,0.572,119.001,audio_features,47AiGjtI7cOLyPwPSCqTuO,spotify:track:47AiGjtI7cOLyPwPSCqTuO,https://api.spotify.com/v1/tracks/47AiGjtI7cOL...,https://api.spotify.com/v1/audio-analysis/47Ai...,423529,4
150546,0.429,0.8130,8,-6.252,1,0.0661,0.00285,0.000025,0.0883,0.615,113.283,audio_features,6LhYshEM4feJKQTr7riTBB,spotify:track:6LhYshEM4feJKQTr7riTBB,https://api.spotify.com/v1/tracks/6LhYshEM4feJ...,https://api.spotify.com/v1/audio-analysis/6LhY...,238627,4
150547,0.624,0.8050,9,-2.632,1,0.0290,0.04660,0.000000,0.3280,0.770,123.991,audio_features,2dRPQFwPqAmc42mDRnsDQu,spotify:track:2dRPQFwPqAmc42mDRnsDQu,https://api.spotify.com/v1/tracks/2dRPQFwPqAmc...,https://api.spotify.com/v1/audio-analysis/2dRP...,212173,4
150548,0.549,0.9530,5,-6.205,1,0.0587,0.42900,0.000000,0.7110,0.908,170.212,audio_features,3OciZ7iaTPVWCw7yKzrMZi,spotify:track:3OciZ7iaTPVWCw7yKzrMZi,https://api.spotify.com/v1/tracks/3OciZ7iaTPVW...,https://api.spotify.com/v1/audio-analysis/3Oci...,231467,4


In [79]:
final_df = pd.merge(df,af_df,how='inner',on='track_id')

In [81]:
final_df.shape

(366684, 26)
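
The merged frame has 366,684 rows, more than the 150,550 rows in df. That is consistent with a many-to-many join: track_id is not unique in df (87,028 unique values), and the audio features were fetched once per row, so the same ID repeats in af_df as well. The sketch below computes the row count of an inner join from the key counts alone, using toy keys. `inner_join_row_count` is a hypothetical helper, not a pandas function.

```python
from collections import Counter


def inner_join_row_count(left_keys, right_keys):
    """Number of rows an inner join on these keys produces."""
    left, right = Counter(left_keys), Counter(right_keys)
    # Each key contributes (occurrences on the left) * (occurrences on the right)
    return sum(n * right[key] for key, n in left.items())


# Toy keys: "a" appears twice on both sides
left = ["a", "a", "b", "c"]
right = ["a", "a", "b", "c"]

duplicated = inner_join_row_count(left, right)                 # 2*2 + 1 + 1
deduplicated = inner_join_row_count(left, sorted(set(right)))  # 2*1 + 1 + 1
```

In the notebook, one option would be to call af_df.drop_duplicates(subset='track_id') before the merge, which would keep the result at one row per row of df for every track that has audio features. Whether tracks listed under several genres should also be collapsed in df depends on the modeling goal. The 26 columns arise because duration_ms exists in both frames and pandas keeps both copies with the suffixes _x and _y (9 + 18 - 1 = 26).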

In [82]:
# index=False avoids writing the row index as an extra unnamed column
final_df.to_csv('extracted_audio_dataset.csv', index=False)