# Song Features

**Danceability**: Describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity.<br>

**Valence**: Describes the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry). <br>


**Energy**: Represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale.<br>

**Tempo**: The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece, and derives directly from the average beat duration. <br>

**Loudness**: The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. <br>

**Speechiness**: This detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. <br>

**Instrumentalness**: Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. <br>

**Liveness**: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. <br>

**Acousticness**: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. <br>

**Key**: The estimated overall key of the track. Integers map to pitches using standard Pitch Class notation, e.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on. <br>

**Mode**: Indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0. <br>

**Duration**: The duration of the track in milliseconds. <br>

**Time Signature**: An estimated overall time signature of a track. The time signature (meter) is a notational convention to specify how many beats are in each bar (or measure). <br>
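
Several of these fields are stored as numeric codes. The block below is a minimal sketch, not part of the original analysis, of how `key`, `mode`, and the duration in milliseconds could be turned into readable labels. The helper names `describe_key` and `format_duration` are hypothetical, and treating a negative key as "no key detected" is an assumption based on the Spotify API convention rather than on the descriptions above.

```python
# Pitch Class notation: index 0 = C, 1 = C♯/D♭, 2 = D, ...
PITCH_CLASSES = ["C", "C♯/D♭", "D", "D♯/E♭", "E", "F",
                 "F♯/G♭", "G", "G♯/A♭", "A", "A♯/B♭", "B"]


def describe_key(key, mode):
    """Combine the integer key and mode (1 = major, 0 = minor) into a label."""
    key = int(key)
    if key < 0 or key >= len(PITCH_CLASSES):
        return "unknown"  # assumption: out-of-range values mean no key was detected
    scale = "major" if int(mode) == 1 else "minor"
    return f"{PITCH_CLASSES[key]} {scale}"


def format_duration(duration_ms):
    """Convert a duration in milliseconds to a m:ss string."""
    minutes, seconds = divmod(int(duration_ms) // 1000, 60)
    return f"{minutes}:{seconds:02d}"


print(describe_key(5.0, 1.0))     # F major
print(format_duration(166106.0))  # 2:46
```

Both helpers accept the float values that pandas produces when a column contains missing entries.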




In [7]:
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm
import spotipy
from spotipy.oauth2 import SpotifyOAuth
import re
from sklearn.neighbors import NearestNeighbors
from sqlalchemy import create_engine
import pymysql

In [5]:
sns.set(rc = {'figure.figsize' : (10,10)})

In [8]:
audio_features = pd.read_excel("Hot 100 Audio Features.xlsx")

In [9]:
print(f"audio features : {audio_features.shape} ")


audio features : (29503, 22) 


### Dropping unneeded features

In [10]:
audio_features.head()

Unnamed: 0,SongID,Performer,Song,spotify_genre,spotify_track_id,spotify_track_preview_url,spotify_track_duration_ms,spotify_track_explicit,spotify_track_album,danceability,...,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,time_signature,spotify_track_popularity
0,-twistin'-White Silver SandsBill Black's Combo,Bill Black's Combo,-twistin'-White Silver Sands,[],,,,,,,...,,,,,,,,,,
1,¿Dònde Està Santa Claus? (Where Is Santa Claus...,Augie Rios,¿Dònde Està Santa Claus? (Where Is Santa Claus?),['novelty'],,,,,,,...,,,,,,,,,,
2,......And Roses And RosesAndy Williams,Andy Williams,......And Roses And Roses,"['adult standards', 'brill building pop', 'eas...",3tvqPPpXyIgKrm4PR9HCf0,https://p.scdn.co/mp3-preview/cef4883cfd1e0e53...,166106.0,0.0,The Essential Andy Williams,0.154,...,-14.063,1.0,0.0315,0.911,0.000267,0.112,0.15,83.969,4.0,38.0
3,...And Then There Were DrumsSandy Nelson,Sandy Nelson,...And Then There Were Drums,"['rock-and-roll', 'space age pop', 'surf music']",1fHHq3qHU8wpRKHzhojZ4a,,172066.0,0.0,Compelling Percussion,0.588,...,-17.278,0.0,0.0361,0.00256,0.745,0.145,0.801,121.962,4.0,11.0
4,...Baby One More TimeBritney Spears,Britney Spears,...Baby One More Time,"['dance pop', 'pop', 'post-teen pop']",3MjUtNVVq3C8Fn0MP3zhXa,https://p.scdn.co/mp3-preview/da2134a161f1cb34...,211066.0,0.0,...Baby One More Time (Digital Deluxe Version),0.759,...,-5.745,0.0,0.0307,0.202,0.000131,0.443,0.907,92.96,4.0,77.0


In [11]:
audio_features.columns

Index(['SongID', 'Performer', 'Song', 'spotify_genre', 'spotify_track_id',
       'spotify_track_preview_url', 'spotify_track_duration_ms',
       'spotify_track_explicit', 'spotify_track_album', 'danceability',
       'energy', 'key', 'loudness', 'mode', 'speechiness', 'acousticness',
       'instrumentalness', 'liveness', 'valence', 'tempo', 'time_signature',
       'spotify_track_popularity'],
      dtype='object')

In [12]:
drop_features = ['SongID','spotify_track_preview_url', 'spotify_track_duration_ms', 'spotify_track_explicit', 'spotify_track_album','spotify_track_popularity']
audio_features1 = audio_features.drop(drop_features,axis = 1)

In [13]:
audio_features1.isna().sum()

Performer              0
Song                   0
spotify_genre       1600
spotify_track_id    5106
danceability        5169
energy              5169
key                 5169
loudness            5169
mode                5169
speechiness         5169
acousticness        5169
instrumentalness    5169
liveness            5169
valence             5169
tempo               5169
time_signature      5169
dtype: int64

In [14]:
NullVAl_index = audio_features1[audio_features1['danceability'].isna()].index

### Collecting missing feature values
Step 1: Look up each song's Spotify track ID from the artist name and song name. <br>
Step 2: Fetch the audio features for each track ID.

In [16]:
# Spotify Authentication

SPOTIPY_CLIENT_ID = ""      # enter your client ID
SPOTIPY_CLIENT_SECRET = ""  # enter your client secret
SPOTIPY_REDIRECT_URI = ""   # enter your redirect URI
SCOPE = "user-library-read"

sp = spotipy.Spotify(auth_manager=SpotifyOAuth(client_id=SPOTIPY_CLIENT_ID, 
                                               client_secret=SPOTIPY_CLIENT_SECRET, 
                                               redirect_uri=SPOTIPY_REDIRECT_URI, 
                                               scope=SCOPE))
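
Credentials typed into a notebook cell are easy to leak when the notebook is shared. As a hedged alternative, the sketch below reads them from environment variables instead. The helper `load_spotify_credentials` is hypothetical; the variable names are the ones used in the cell above, and the sketch assumes they have been exported in the shell before the notebook starts.

```python
import os

REQUIRED_VARS = ("SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET", "SPOTIPY_REDIRECT_URI")


def load_spotify_credentials(env=None):
    """Return the credentials as keyword arguments, failing early if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "client_id": env["SPOTIPY_CLIENT_ID"],
        "client_secret": env["SPOTIPY_CLIENT_SECRET"],
        "redirect_uri": env["SPOTIPY_REDIRECT_URI"],
    }
```

The returned dictionary could then be unpacked into the auth manager, for example `SpotifyOAuth(scope=SCOPE, **load_spotify_credentials())`.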

**Step 1**

In [17]:
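def get_song_id(artist, track):
    """Return the Spotify ID of the first search hit, or None if nothing is found."""
    try:
        # Field filters take the form field:value, with no space after the colon
        q = 'artist:{} track:{}'.format(artist, track)
        results = sp.search(q=q, limit=1, type='track')
        return results['tracks']['items'][0]['id']
    except Exception:
        # No match or a failed request: leave the ID missing
        return None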
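
The lookup combines two separable pieces: building the query string and pulling the ID out of the response. The sketch below isolates both so they can be checked without calling the API. `build_query` and `first_track_id` are hypothetical helper names, and the sample response is hand-written to mirror only the `tracks -> items -> id` nesting that the lookup function relies on.

```python
def build_query(artist, track):
    """Build a search string using field filters for the artist and track."""
    return f"artist:{artist.strip()} track:{track.strip()}"


def first_track_id(results):
    """Return the ID of the first hit, or None when the search found nothing."""
    items = results.get("tracks", {}).get("items", [])
    return items[0]["id"] if items else None


# Hand-written stand-ins for a search response (not real API output)
found = {"tracks": {"items": [{"id": "3MjUtNVVq3C8Fn0MP3zhXa"}]}}
empty = {"tracks": {"items": []}}

print(build_query("Britney Spears", "...Baby One More Time"))
print(first_track_id(found))  # 3MjUtNVVq3C8Fn0MP3zhXa
print(first_track_id(empty))  # None
```

Returning `None` for an empty result keeps the "not found" case explicit, so it does not have to be inferred from a caught exception.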

In [18]:
# Collect song IDs in one list
song_id = []
for index in tqdm(NullVAl_index):
    # Look rows up by label and column name rather than by position
    artist = audio_features1.loc[index, 'Performer']
    song = audio_features1.loc[index, 'Song']
    id1 = get_song_id(artist, song)
    song_id.append(id1)

100%|██████████████████████████████████████████████████████████████████████████████| 5169/5169 [15:12<00:00,  5.67it/s]


In [19]:
Song_id = pd.Series(song_id)
Song_id.index = NullVAl_index

In [20]:
# Assign the collected track IDs to the DataFrame
audio_features2 = audio_features1.copy()
audio_features2.loc[Song_id.index,'spotify_track_id'] = Song_id

# Drop rows with no track ID, then duplicate (Performer, Song) pairs

audio_features2.dropna(subset=['spotify_track_id'],inplace=True)
audio_features2.drop_duplicates(subset=['Performer','Song'],inplace = True)

In [22]:
# Null IDs and duplicates were dropped in the previous cell; only the index needs resetting
audio_features2.reset_index(drop= True,inplace = True)

**Step 2**

In [23]:
NullVAl_index = audio_features2[audio_features2['danceability'].isna()].index

In [24]:
def get_feature(track_features,trackName,artistName,id):
    """Fetch the audio features of one track and append them as a row to track_features."""
    clms = ['trackName','artistName','id','danceability','energy','key','loudness','mode','speechiness','acousticness','instrumentalness','liveness','valence','tempo','time_signature']
    track = {'trackName':trackName,'artistName':artistName}
    # audio_features returns a list, even for a single ID
    meta = sp.audio_features(id)
    track.update(meta[0])
    tempDf = pd.DataFrame.from_dict([track])[clms]
    track_features = pd.concat([track_features,tempDf],ignore_index=True)
    return track_features

In [26]:
def get_features(df, id_column, batch_size=100):
    """Fill the audio-feature columns of df, querying Spotify in batches of IDs."""
    clms = ['danceability','energy','key','loudness','mode','speechiness','acousticness','instrumentalness','liveness','valence','tempo','time_signature']
    for start in range(0, len(df), batch_size):
        # Index labels and track IDs for this batch; the last batch may be shorter
        batch_index = df.index[start:start + batch_size]
        ids = df.loc[batch_index, id_column].tolist()
        meta = sp.audio_features(ids)

        for idx, data in zip(batch_index, meta):
            if data is None:
                # No features were returned for this track; leave the row unchanged
                continue
            df.loc[idx, clms] = [data[c] for c in clms]
    return df
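
The batching itself is independent of Spotify: the IDs are cut into slices of at most 100, and the final slice holds whatever remains. As a small illustration, the sketch below does this with a generator. `chunked` is a hypothetical helper name and the IDs are placeholders.

```python
def chunked(items, size=100):
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


track_ids = [f"id{i}" for i in range(250)]  # placeholder IDs, not real tracks
batch_sizes = [len(batch) for batch in chunked(track_ids)]
print(batch_sizes)  # [100, 100, 50]
```

Because `range` stops before `len(items)`, the last partial batch needs no special handling and an empty input produces no batches at all.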
        

    

    

In [27]:
audio_features3 = get_features(audio_features2.loc[NullVAl_index],'spotify_track_id')

In [28]:
clms = ['danceability','energy','key','loudness','mode','speechiness','acousticness','instrumentalness','liveness','valence','tempo','time_signature']
# Transfer the newly fetched features from audio_features3 to audio_features2.
# The assignment aligns on index labels, so only the rows present in audio_features3 change.
audio_features2.loc[audio_features3.index, clms] = audio_features3[clms]

In [39]:
audio_features2.isna().sum()

Performer             0
Song                  0
spotify_genre       423
spotify_track_id      0
danceability          0
energy                0
key                   0
loudness              0
mode                  0
speechiness           0
acousticness          0
instrumentalness      0
liveness              0
valence               0
tempo                 0
time_signature        0
dtype: int64

In [40]:
audio_features4 = audio_features2.dropna(subset=['spotify_genre'])

In [42]:
audio_features4.isna().sum()

Performer           0
Song                0
spotify_genre       0
spotify_track_id    0
danceability        0
energy              0
key                 0
loudness            0
mode                0
speechiness         0
acousticness        0
instrumentalness    0
liveness            0
valence             0
tempo               0
time_signature      0
dtype: int64

In [43]:
audio_features4.to_csv('Cleaned_data.csv')

### Correcting the format of genres

In [8]:
audio_features4 = pd.read_csv("Cleaned_data.csv")
audio_features4.drop('Unnamed: 0',axis = 1 ,inplace = True)

In [9]:
def clean_genre(name):
    # Keep only letters and spaces, e.g. " 'dance pop'" -> "dance pop"
    return re.sub("[^a-z A-Z]", "", name).strip()

total_genres = []
for index, row in tqdm(audio_features4.iterrows()):
    for j in row['spotify_genre'].split(','):
        total_genres.append(clean_genre(j))

Selected_genres = pd.Series(total_genres).value_counts().head(100).index  # selecting the top 100 genres


# Keep the top 100 genres only. A track whose single genre is outside the top 100
# is labelled 'other'; for multi-genre tracks, genres outside the top 100 are dropped.
spotify_genres = []
for index, row in tqdm(audio_features4.iterrows()):
    raw_genres = row['spotify_genre'].split(',')
    temp = []
    for j in raw_genres:
        j = clean_genre(j)
        if j in Selected_genres:
            temp.append(j)
        elif len(raw_genres) == 1:
            temp.append('other')
    spotify_genres.append(temp)

audio_features4['spotify_genre'] = spotify_genres

25436it [00:01, 17209.75it/s]
25436it [00:01, 14099.81it/s]
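
Stripping every non-letter character also removes hyphens, which is why `rock-and-roll` becomes `rockandroll` in the output. As a hedged alternative, and not the method used in this notebook, the sketch below parses the stringified list with `ast.literal_eval` and counts genres with `collections.Counter`. The helper names are hypothetical, and unlike the cell above it labels every track with no top genre as `other`, including multi-genre tracks.

```python
import ast
from collections import Counter


def parse_genres(raw):
    """Turn a stringified list such as "['dance pop', 'pop']" into a list of names."""
    return list(ast.literal_eval(raw))


def keep_top_genres(rows, top_n):
    """Keep only the top_n most frequent genres; tracks left with none get ['other']."""
    counts = Counter(genre for genres in rows for genre in genres)
    top = {name for name, _ in counts.most_common(top_n)}
    result = []
    for genres in rows:
        kept = [genre for genre in genres if genre in top]
        result.append(kept if kept else ["other"])
    return result


raw_column = ["['dance pop', 'pop']", "['pop', 'rock-and-roll']", "['novelty']", "[]"]
parsed = [parse_genres(raw) for raw in raw_column]
print(parsed[1])                   # ['pop', 'rock-and-roll']
print(keep_top_genres(parsed, 1))  # [['pop'], ['pop'], ['other'], ['other']]
```

`ast.literal_eval` only evaluates Python literals, so it can parse the saved list strings without executing arbitrary code.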


In [10]:
audio_features4

Unnamed: 0,Performer,Song,spotify_genre,spotify_track_id,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,time_signature
0,Augie Rios,¿Dònde Està Santa Claus? (Where Is Santa Claus?),[other],4NR0ELiNVPvPfjWrqugY9c,0.713,0.547,0.0,-6.419,1.0,0.0430,0.26400,0.000000,0.1600,0.870,130.396,4.0
1,Andy Williams,......And Roses And Roses,"[adult standards, brill building pop, easy lis...",3tvqPPpXyIgKrm4PR9HCf0,0.154,0.185,5.0,-14.063,1.0,0.0315,0.91100,0.000267,0.1120,0.150,83.969,4.0
2,Sandy Nelson,...And Then There Were Drums,[rockandroll],1fHHq3qHU8wpRKHzhojZ4a,0.588,0.672,11.0,-17.278,0.0,0.0361,0.00256,0.745000,0.1450,0.801,121.962,4.0
3,Britney Spears,...Baby One More Time,"[dance pop, pop, postteen pop]",3MjUtNVVq3C8Fn0MP3zhXa,0.759,0.699,0.0,-5.745,0.0,0.0307,0.20200,0.000131,0.4430,0.907,92.960,4.0
4,Taylor Swift,...Ready For It?,"[pop, postteen pop]",2yLa0QULdQr0qAIvVwN6B5,0.613,0.764,2.0,-6.509,1.0,0.1360,0.05270,0.000000,0.1970,0.417,160.015,4.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
25431,Bob B. Soxx And The Blue Jeans,Zip-A-Dee Doo-Dah,[brill building pop],58EJRUQRaS9GXE86Kda7bY,0.875,0.785,10.0,-8.253,1.0,0.0402,0.54100,0.867000,0.2340,0.980,106.824,4.0
25432,Bad Wolves,Zombie,"[alternative metal, metal, postgrunge]",1vNoA9F5ASnlBISFekDmg3,0.448,0.826,2.0,-3.244,0.0,0.0319,0.00756,0.000000,0.1170,0.190,77.093,4.0
25433,Future,Zoom,"[atl hip hop, hip hop, pop rap, rap, southern ...",2IG6Te7JyvrtqhFeOF7le4,0.852,0.438,9.0,-7.673,1.0,0.4260,0.01450,0.000000,0.2630,0.627,150.945,4.0
25434,Herb Alpert & The Tijuana Brass,Zorba The Greek,"[adult standards, easy listening, lounge]",3WLEVNohakzZmMpN5W7mHK,0.531,0.642,5.0,-12.702,1.0,0.3230,0.15400,0.279000,0.0584,0.192,82.107,4.0


In [11]:
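# The genre column now holds lists, so comparing it with '' never matches.
# Drop rows whose list is empty or contains only empty strings.
null_genre_index = audio_features4[audio_features4['spotify_genre'].apply(lambda g: not any(g))].index
audio_features4.drop(null_genre_index,inplace = True)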


In [12]:
audio_features4.reset_index(drop= True,inplace = True)

In [13]:
audio_features4

Unnamed: 0,Performer,Song,spotify_genre,spotify_track_id,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,time_signature
0,Augie Rios,¿Dònde Està Santa Claus? (Where Is Santa Claus?),[other],4NR0ELiNVPvPfjWrqugY9c,0.713,0.547,0.0,-6.419,1.0,0.0430,0.26400,0.000000,0.1600,0.870,130.396,4.0
1,Andy Williams,......And Roses And Roses,"[adult standards, brill building pop, easy lis...",3tvqPPpXyIgKrm4PR9HCf0,0.154,0.185,5.0,-14.063,1.0,0.0315,0.91100,0.000267,0.1120,0.150,83.969,4.0
2,Sandy Nelson,...And Then There Were Drums,[rockandroll],1fHHq3qHU8wpRKHzhojZ4a,0.588,0.672,11.0,-17.278,0.0,0.0361,0.00256,0.745000,0.1450,0.801,121.962,4.0
3,Britney Spears,...Baby One More Time,"[dance pop, pop, postteen pop]",3MjUtNVVq3C8Fn0MP3zhXa,0.759,0.699,0.0,-5.745,0.0,0.0307,0.20200,0.000131,0.4430,0.907,92.960,4.0
4,Taylor Swift,...Ready For It?,"[pop, postteen pop]",2yLa0QULdQr0qAIvVwN6B5,0.613,0.764,2.0,-6.509,1.0,0.1360,0.05270,0.000000,0.1970,0.417,160.015,4.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
25431,Bob B. Soxx And The Blue Jeans,Zip-A-Dee Doo-Dah,[brill building pop],58EJRUQRaS9GXE86Kda7bY,0.875,0.785,10.0,-8.253,1.0,0.0402,0.54100,0.867000,0.2340,0.980,106.824,4.0
25432,Bad Wolves,Zombie,"[alternative metal, metal, postgrunge]",1vNoA9F5ASnlBISFekDmg3,0.448,0.826,2.0,-3.244,0.0,0.0319,0.00756,0.000000,0.1170,0.190,77.093,4.0
25433,Future,Zoom,"[atl hip hop, hip hop, pop rap, rap, southern ...",2IG6Te7JyvrtqhFeOF7le4,0.852,0.438,9.0,-7.673,1.0,0.4260,0.01450,0.000000,0.2630,0.627,150.945,4.0
25434,Herb Alpert & The Tijuana Brass,Zorba The Greek,"[adult standards, easy listening, lounge]",3WLEVNohakzZmMpN5W7mHK,0.531,0.642,5.0,-12.702,1.0,0.3230,0.15400,0.279000,0.0584,0.192,82.107,4.0
