# Contents
* [Data Collection](#collection)
    * [Spotify Charts](#spotifycharts)
        * [Spotipy](#spotipy)
        * [Spotify Charts csv](#csv)
    * [Spotify API](#spotifyapi)
        * [Audio features](#audiofeats)
        * [Album release date](#albumrelease)
    * [Genius Lyrics](#geniuslyrics)
* [Data Cleaning](#cleaning)

# Data Collection <a class="anchor" id="collection"></a>

This analysis draws on three data sources: Spotify Charts, the Spotify Web API, and the Genius API. Spotify Charts provides each country's weekly top songs (the top 100 per country are kept here), the Spotify API supplies the audio features and album release dates of those songs, and the Genius API supplies their lyrics. Together, these sources cover the chart positions, audio characteristics, and lyrics needed to address the research questions of this analysis.

## Spotify Charts <a class="anchor" id="spotifycharts"></a>

Two approaches were tried for collecting each country's top songs: (I) querying "Top 50" playlists through Spotipy, a Python client for the Spotify Web API, and (II) downloading the weekly chart CSV files published on the Spotify Charts website. The second approach is the one used in the rest of this analysis, because the CSV files include chart ranks and stream counts, which are not available with standard client credentials.

In [7]:
# IMPORT DEPENDENCIES 
import pandas as pd
import numpy as np
import requests
import base64
import os
import time
from bs4 import BeautifulSoup as bs
import pprint
import re
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import spotipy.util as util
import pycountry

### I. Regional top 100 songs (Spotipy) <a class="anchor" id="spotipy"></a>

When Spotipy is used to retrieve regional top songs, the number of tracks returned varies by country. Requesting up to 100 tracks per playlist returns anywhere from 23 to 100 tracks, and even with the limit set to 50, 25 of the 184 markets return fewer than 50 tracks (usually 23). Capping every country at its top 20 songs would give equal counts, but it would leave too little data for the subsequent analysis.

In addition, standard client credentials do not provide access to data such as stream counts and chart rankings, so the CSV files available from Spotify Charts are used instead. There is currently no official documentation for a Spotify Charts API, which suggests that no public API is available.
    

In [8]:
# SET CREDENTIALS
client_id_spotify = ''
client_secret_spotify = ''
client_credentials_manager = SpotifyClientCredentials(client_id=client_id_spotify, 
                                                      client_secret=client_secret_spotify)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)

In [9]:
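# GET ALL MARKETS WHERE SPOTIFY IS AVAILABLE (ISO 3166-1 ALPHA-2 CODES)
# _get() is a private Spotipy helper; recent Spotipy versions also expose sp.available_markets()
country_codes = sp._get("markets")['markets']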
country_codes

['AD',
 'AE',
 'AG',
 'AL',
 'AM',
 'AO',
 'AR',
 'AT',
 'AU',
 'AZ',
 'BA',
 'BB',
 'BD',
 'BE',
 'BF',
 'BG',
 'BH',
 'BI',
 'BJ',
 'BN',
 'BO',
 'BR',
 'BS',
 'BT',
 'BW',
 'BY',
 'BZ',
 'CA',
 'CD',
 'CG',
 'CH',
 'CI',
 'CL',
 'CM',
 'CO',
 'CR',
 'CV',
 'CW',
 'CY',
 'CZ',
 'DE',
 'DJ',
 'DK',
 'DM',
 'DO',
 'DZ',
 'EC',
 'EE',
 'EG',
 'ES',
 'ET',
 'FI',
 'FJ',
 'FM',
 'FR',
 'GA',
 'GB',
 'GD',
 'GE',
 'GH',
 'GM',
 'GN',
 'GQ',
 'GR',
 'GT',
 'GW',
 'GY',
 'HK',
 'HN',
 'HR',
 'HT',
 'HU',
 'ID',
 'IE',
 'IL',
 'IN',
 'IQ',
 'IS',
 'IT',
 'JM',
 'JO',
 'JP',
 'KE',
 'KG',
 'KH',
 'KI',
 'KM',
 'KN',
 'KR',
 'KW',
 'KZ',
 'LA',
 'LB',
 'LC',
 'LI',
 'LK',
 'LR',
 'LS',
 'LT',
 'LU',
 'LV',
 'LY',
 'MA',
 'MC',
 'MD',
 'ME',
 'MG',
 'MH',
 'MK',
 'ML',
 'MN',
 'MO',
 'MR',
 'MT',
 'MU',
 'MV',
 'MW',
 'MX',
 'MY',
 'MZ',
 'NA',
 'NE',
 'NG',
 'NI',
 'NL',
 'NO',
 'NP',
 'NR',
 'NZ',
 'OM',
 'PA',
 'PE',
 'PG',
 'PH',
 'PK',
 'PL',
 'PS',
 'PT',
 'PW',
 'PY',
 'QA',
 'RO',
 'RS',

In [16]:
def get_IDs(query, country_codes, sp):
    '''get the ID of the first playlist matching `query` in each market'''
    playlist_ids = []
    for c in country_codes:
        # only the top search result is used, so request a single item
        search_result = sp.search(q=query, type='playlist', market=c, limit=1)
        playlist_id = search_result['playlists']['items'][0]['id']
        playlist_ids.append(playlist_id)
    return playlist_ids

In [18]:
def top_hits(playlist_ids, country_codes, limit, sp):
    '''get track names and first-listed artist names from each market's playlist'''
    track_names = []
    artist_names = []
    for playlist_id, market in zip(playlist_ids, country_codes):
        # a single request returns at most 100 playlist items
        results = sp.playlist_tracks(playlist_id, market=market, limit=limit)
        track_name = []
        artist_name = []
        for item in results['items']:
            if item['track'] is None:  # skip items whose track data is missing
                continue
            track_name.append(item['track']['name'])
            artist_name.append(item['track']['artists'][0]['name'])
        track_names.append(track_name)
        artist_names.append(artist_name)
    return track_names, artist_names

In [17]:
# PLAYLIST IDs FOR TOP 50 
playlist_ids = get_IDs('Top 50', country_codes, sp) 
playlist_ids

['3Eg5vT3aqp476DCjof3QVn',
 '2JfjfXDnKFzJEhtR6MtBlR',
 '3Eg5vT3aqp476DCjof3QVn',
 '3Eg5vT3aqp476DCjof3QVn',
 '3Eg5vT3aqp476DCjof3QVn',
 '3Eg5vT3aqp476DCjof3QVn',
 '1q3yDrg9VG9eHmfk0j4eNa',
 '2JfjfXDnKFzJEhtR6MtBlR',
 '3Eg5vT3aqp476DCjof3QVn',
 '3Eg5vT3aqp476DCjof3QVn',
 '3Eg5vT3aqp476DCjof3QVn',
 '3Eg5vT3aqp476DCjof3QVn',
 '3Eg5vT3aqp476DCjof3QVn',
 '3Eg5vT3aqp476DCjof3QVn',
 '3Eg5vT3aqp476DCjof3QVn',
 '3Eg5vT3aqp476DCjof3QVn',
 '2JfjfXDnKFzJEhtR6MtBlR',
 '3Eg5vT3aqp476DCjof3QVn',
 '3Eg5vT3aqp476DCjof3QVn',
 '3Eg5vT3aqp476DCjof3QVn',
 '3Eg5vT3aqp476DCjof3QVn',
 '2JfjfXDnKFzJEhtR6MtBlR',
 '3Eg5vT3aqp476DCjof3QVn',
 '3Eg5vT3aqp476DCjof3QVn',
 '3Eg5vT3aqp476DCjof3QVn',
 '3Eg5vT3aqp476DCjof3QVn',
 '3Eg5vT3aqp476DCjof3QVn',
 '2JfjfXDnKFzJEhtR6MtBlR',
 '3Eg5vT3aqp476DCjof3QVn',
 '3Eg5vT3aqp476DCjof3QVn',
 '3Eg5vT3aqp476DCjof3QVn',
 '3Eg5vT3aqp476DCjof3QVn',
 '3Eg5vT3aqp476DCjof3QVn',
 '3Eg5vT3aqp476DCjof3QVn',
 '2JfjfXDnKFzJEhtR6MtBlR',
 '3Eg5vT3aqp476DCjof3QVn',
 '3Eg5vT3aqp476DCjof3QVn',
 

In [24]:
%%time
# GET TOP HITS
track_names, artist_names = top_hits(playlist_ids, country_codes, 100, sp)
print(len(track_names))
print(len(artist_names))

184
184
CPU times: total: 1.75 s
Wall time: 37.7 s


In [25]:
# PRINT NUMBER OF TRACKS RETURNED FOR EACH MARKET
for tracks in track_names:
    print(len(tracks))

55
23
55
55
55
55
100
23
55
55
55
55
55
55
55
55
23
55
55
55
55
23
55
55
55
55
55
23
55
55
55
55
55
55
23
55
55
55
23
55
23
55
55
55
55
55
55
55
55
23
55
55
55
55
23
55
23
55
55
55
55
55
55
55
55
55
55
55
55
55
55
55
55
55
55
55
55
55
23
55
55
23
55
55
55
55
55
55
55
23
55
55
55
55
55
55
55
55
55
23
55
55
55
55
55
55
55
55
55
55
100
55
55
55
55
55
55
55
23
55
55
55
50
55
23
55
55
50
23
55
55
55
55
23
23
23
55
55
55
100
55
55
55
55
55
55
55
25
23
55
55
55
55
55
55
55
55
55
55
55
55
55
55
55
55
55
55
55
55
55
55
55
100
55
55
55
55
36
55
55
55
23
55
55


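Note: The number of tracks returned per market ranges from 23 to 100, so a playlist found with the "Top 50" query does not necessarily contain exactly 50 songs. The playlist IDs above also show that the same playlist (`3Eg5vT3aqp476DCjof3QVn`) is returned for most markets, so searching by market does not reliably retrieve each country's own chart playlist.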
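
The playlist IDs printed earlier repeat across many markets. A minimal sketch for making that visible with only the standard library groups market codes by the playlist ID that the search returned for them; `shared_playlists` is a hypothetical helper, not part of the original notebook.

```python
from collections import Counter


def shared_playlists(playlist_ids, country_codes):
    """Group market codes by the playlist ID returned for them.

    Only playlist IDs returned for more than one market are kept, and the
    playlist shared by the most markets comes first.
    """
    counts = Counter(playlist_ids)
    groups = {}
    for pid, code in zip(playlist_ids, country_codes):
        if counts[pid] > 1:
            groups.setdefault(pid, []).append(code)
    # Sort so that the most widely shared playlist is listed first
    return dict(sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True))
```

In this notebook it would be called as `shared_playlists(playlist_ids, country_codes)`; a playlist ID mapped to dozens of markets suggests that the search did not return a country-specific chart.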

In [26]:
%%time
# GET TOP 50
track_names, artist_names = top_hits(playlist_ids, country_codes, 50, sp)
print(len(track_names))
print(len(artist_names))

184
184
CPU times: total: 1.25 s
Wall time: 39.7 s


In [33]:
# DETERMINE HOW MANY COUNTRIES HAVE FEWER THAN 50 SONGS
count = 0
index_list = []
for i, tracks in enumerate(track_names):
    if len(tracks) < 50:
        count += 1
        index_list.append(i)
        print(len(tracks))
count

23
23
23
23
23
23
23
23
23
23
23
23
23
23
23
23
23
23
23
23
23
25
23
36
23


25

In [41]:
# COUNTRIES WITH FEWER THAN 50 HITS
print(index_list)  # indices of countries with fewer than 50 hits
c = [country_codes[i] for i in index_list]
print(c)

[1, 7, 16, 21, 27, 34, 38, 40, 49, 54, 56, 78, 81, 89, 99, 118, 124, 128, 133, 134, 135, 147, 148, 177, 181]
['AE', 'AT', 'BH', 'BR', 'CA', 'CO', 'CY', 'DE', 'ES', 'FR', 'GB', 'IT', 'JP', 'KW', 'LU', 'MY', 'NL', 'NZ', 'PH', 'PK', 'PL', 'SE', 'SG', 'VN', 'ZA']


#### Combine into a dataframe

In [21]:
# COMBINE LIST OF LISTS INTO A LIST 
track_names_list = [i for sublist in track_names for i in sublist]
artist_names_list = [i for sublist in artist_names for i in sublist]
print(len(track_names_list))
print(len(artist_names_list))

8540
8540


In [42]:
# ASSIGN COUNTRY CODE FOR EACH TRACK IN A NEW COLUMN 
countrycodes_list = [] 
for i in range(len(track_names)): 
    cc = [country_codes[i]]*len(track_names[i])
    countrycodes_list.append(cc) 
print(len(countrycodes_list))

#turn list of lists into list 
country_codes_list = [i for sublist in countrycodes_list for i in sublist]
print(len(country_codes_list))

184
8540


In [43]:
# COMBINE INTO A DATAFRAME 
tophits_df = pd.DataFrame(list(zip(country_codes_list, track_names_list, artist_names_list)),
               columns =['countrycode_iso2', 'track_names', 'artist_names'])
tophits_df

|      | countrycode_iso2 | track_names | artist_names |
|------|------------------|-------------|--------------|
| 0 | AD | Bijlee Bijlee | Harrdy Sandhu |
| 1 | AD | Excuses | AP Dhillon |
| 2 | AD | Beliya | Gurnam Bhullar |
| 3 | AD | Punjabi Mutiyaran | Jasmine Sandlas |
| 4 | AD | Punjabiyan Di Dhee | Guru Randhawa |
| ... | ... | ... | ... |
| 8535 | ZW | Dabbi Kale Maal Di | Jassi Khalar |
| 8536 | ZW | Zero | Fateh Shergill |
| 8537 | ZW | Maa Da Laadla | Romey Maan |
| 8538 | ZW | Akhiyaan | Mitraz |


In [56]:
# ADD COUNTRY NAME COLUMN TO DF BY CONVERTING ISO2 TO FULL NAME
country_name = []
for c in tophits_df['countrycode_iso2']:
    if c == 'XK':
        country_name.append('Kosovo')  # XK is a user-assigned code, not included in pycountry
    else:
        country = pycountry.countries.get(alpha_2=str(c))
        country_name.append(country.name)
tophits_df['country_name'] = country_name
tophits_df

|      | countrycode_iso2 | track_names | artist_names | country_name |
|------|------------------|-------------|--------------|--------------|
| 0 | AD | Bijlee Bijlee | Harrdy Sandhu | Andorra |
| 1 | AD | Excuses | AP Dhillon | Andorra |
| 2 | AD | Beliya | Gurnam Bhullar | Andorra |
| 3 | AD | Punjabi Mutiyaran | Jasmine Sandlas | Andorra |
| 4 | AD | Punjabiyan Di Dhee | Guru Randhawa | Andorra |
| ... | ... | ... | ... | ... |
| 8535 | ZW | Dabbi Kale Maal Di | Jassi Khalar | Zimbabwe |
| 8536 | ZW | Zero | Fateh Shergill | Zimbabwe |
| 8537 | ZW | Maa Da Laadla | Romey Maan | Zimbabwe |
| 8538 | ZW | Akhiyaan | Mitraz | Zimbabwe |


In [None]:
# SET CREDENTIALS (USER AUTHORIZATION FLOW ONLY; NOT NEEDED WITH CLIENT CREDENTIALS)
# from spotipy.oauth2 import SpotifyOAuth
# redirect_uri = ''
# scope = 'user-read-private user-read-email user-read-playback-state user-library-read user-library-modify'
# sp = spotipy.Spotify(auth_manager=SpotifyOAuth(client_id=client_id_spotify, client_secret=client_secret_spotify, redirect_uri=redirect_uri, scope=scope))

In [61]:
# GET POPULARITY SCORE FOR A TRACK
# The track object's 'popularity' field is a 0-100 score, not a stream count;
# stream counts are not available through the Web API.
# track_id = '4y3OI86AEP6PQoDE6olYhO'  # replace with the ID of the track to look up
# track_info = sp.track(track_id)
# popularity = track_info['popularity']
# print(popularity)

### II. Regional top 100 songs (Spotify Charts) <a class="anchor" id="csv"></a>

Spotify Charts provides free CSV downloads of its weekly and daily charts for many countries. Each file lists the top 200 songs of the chart together with their artist names, stream counts, and chart history (peak rank, previous rank, weeks on chart). Because the files can be downloaded directly from the Spotify Charts website, they are a convenient source of popular songs. They do not, however, include the audio features or lyrics needed for the sentiment analysis, so the Spotify API and the Genius API are used to obtain that information. For this project, the weekly top-200 CSV files for the week of 2/16/2023 were downloaded for 73 countries. To keep the collection of lyrics manageable, only the top 100 songs of each country are kept, which gives 7,300 tracks. The code below merges the individual CSV files into a single dataframe.
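
The merge below relies on two hand-maintained lists of country codes and country names. A minimal sketch of an alternative that discovers the downloaded files on disk instead, assuming every file follows the `regional-<code>-weekly-<date>.csv` naming used in this notebook; `find_chart_files` is a hypothetical helper.

```python
import re
from pathlib import Path

# Assumed file-name pattern, matching the files read in this notebook:
# regional-<two-letter country code>-weekly-<YYYY-MM-DD>.csv
CHART_FILE = re.compile(r"^regional-([a-z]{2})-weekly-(\d{4}-\d{2}-\d{2})\.csv$")


def find_chart_files(folder, week="2023-02-16"):
    """Map each lowercase ISO2 country code to its chart CSV for one week."""
    files = {}
    for path in sorted(Path(folder).glob("regional-*-weekly-*.csv")):
        match = CHART_FILE.match(path.name)
        # Skip files for other weeks and names without a two-letter code
        if match and match.group(2) == week:
            files[match.group(1)] = path
    return files
```

Each path could then be read with `pd.read_csv(path).head(100)` as in the loop below. A global chart file, if present, is skipped because its name does not contain a two-letter code; full country names would still need a lookup such as `pycountry`.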

In [132]:
# COUNTRY CODES 
countries = ['ae', 'ar', 'at', 'au', 
             'be', 'bg', 'bo', 'br', 'by', 
             'ca',  'ch', 'cl', 'co', 'cr', 'cy', 'cz', 
             'de', 'dk', 'do', 
             'ec', 'ee', 'eg', 'es', 
             'fi', 'fr', 
            'gb', 'gr', 'gt', 
            'hk', 'hn', 'hu', 
            'id', 'ie', 'il', 'in', 'is', 'it', 
            'jp', 
            'kr', 'kz', 
            'lt', 'lu', 'lv', 
            'ma', 'mx', 'my', 
            'ng', 'ni', 'nl', 'no', 'nz',
            'pa', 'pe', 'ph', 'pk','pl', 'pt', 'py', 
            'ro', 
            'sa', 'se', 'sg', 'sk', 'sv', 
            'th', 'tr', 'tw', 
            'ua', 'us', 'uy', 
            've', 'vn', 
            'za']
len(countries)

73

In [140]:
# COUNTRY NAMES 
countries_ls = ['United Arab Emirates', 'Argentina', 'Austria', 'Australia', 
             'Belgium', 'Bulgaria', 'Bolivia', 'Brazil', 'Belarus', 
             'Canada',  'Switzerland', 'Chile', 'Colombia', 'Costa Rica', 'Cyprus', 'Czech Republic', 
             'Germany', 'Denmark', 'Dominican Republic', 
             'Ecuador', 'Estonia', 'Egypt', 'Spain', 
             'Finland', 'France', 
            'United Kingdom', 'Greece', 'Guatemala', 
            'Hong Kong', 'Honduras', 'Hungary', 
            'Indonesia', 'Ireland', 'Israel', 'India', 'Iceland', 'Italy', 
            'Japan', 
            'South Korea', 'Kazakhstan', 
            'Lithuania', 'Luxembourg', 'Latvia', 
            'Morocco', 'Mexico', 'Malaysia', 
            'Nigeria', 'Nicaragua', 'Netherlands', 'Norway', 'New Zealand',
            'Panama', 'Peru', 'Philippines', 'Pakistan','Poland', 'Portugal', 'Paraguay', 
            'Romania', 
            'Saudi Arabia', 'Sweden', 'Singapore', 'Slovakia', 'El Salvador', 
            'Thailand', 'Turkey', 'Taiwan', 
            'Ukraine', 'USA', 'Uruguay', 
            'Venezuela', 'Vietnam', 
            'South Africa']

In [144]:
%%time
# READ DOWNLOADED CSVs INTO A DATAFRAME 
frames = []
for code, name in zip(countries, countries_ls):
    df = pd.read_csv(f"../Data/Regional_Weekly_Top200/regional-{code}-weekly-2023-02-16.csv")
    df['country'] = name
    frames.append(df.head(100))  # top 100 rows, same as df.loc[0:99]
top_region = pd.concat(frames, ignore_index=True)  # one concat instead of one per file
top_region.insert(2, 'track_id', top_region['uri'].str.replace('spotify:track:', '', regex=False))
top_region

CPU times: total: 172 ms
Wall time: 190 ms


|      | rank | uri | track_id | artist_names | track_name | source | peak_rank | previous_rank | weeks_on_chart | streams | country |
|------|------|-----|----------|--------------|------------|--------|-----------|---------------|----------------|---------|---------|
| 0 | 1 | spotify:track:0yLdNVWF3Srea0uzk55zFn | 0yLdNVWF3Srea0uzk55zFn | Miley Cyrus | Flowers | Columbia | 1 | 1 | 5 | 124198 | United Arab Emirates |
| 1 | 2 | spotify:track:1Qrg8KqiBpW07V7PNxwwwL | 1Qrg8KqiBpW07V7PNxwwwL | SZA | Kill Bill | Top Dawg Entertainment/RCA Records | 1 | 2 | 10 | 106927 | United Arab Emirates |
| 2 | 3 | spotify:track:6AQbmUe0Qwf5PZnt4HmTXv | 6AQbmUe0Qwf5PZnt4HmTXv | PinkPantheress, Ice Spice | Boy's a liar Pt. 2 | Warner Records | 3 | 59 | 2 | 83627 | United Arab Emirates |
| 3 | 4 | spotify:track:0WtM2NBVQNNJLh6scP13H8 | 0WtM2NBVQNNJLh6scP13H8 | Rema, Selena Gomez | Calm Down (with Selena Gomez) | Mavin Records / Jonzing World | 2 | 4 | 25 | 79714 | United Arab Emirates |
| 4 | 5 | spotify:track:2dHHgzDwk4BJdRwy9uXhTO | 2dHHgzDwk4BJdRwy9uXhTO | Metro Boomin, The Weeknd, 21 Savage | Creepin' (with The Weeknd & 21 Savage) | Republic Records | 1 | 3 | 11 | 79488 | United Arab Emirates |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7295 | 96 | spotify:track:7ErtOGQ9DwyQa3lwP77j4u | 7ErtOGQ9DwyQa3lwP77j4u | Ruger | Asiwaju | Columbia | 96 | 130 | 4 | 54026 | South Africa |
| 7296 | 97 | spotify:track:4EI8VuxUuIHKfafU72emqz | 4EI8VuxUuIHKfafU72emqz | Mariah Carey | We Belong Together | Island Records | 97 | 115 | 50 | 53828 | South Africa |
| 7297 | 98 | spotify:track:3Puq6i4xIRH4lrPvJxIC83 | 3Puq6i4xIRH4lrPvJxIC83 | Deep London, Nkosazana Daughter, Murumba Pitch... | Piano Ngijabulise | Cycad Wave | 37 | 85 | 14 | 53752 | South Africa |
| 7298 | 99 | spotify:track:7DQMBUK4oX9gV1qIzpoRz6 | 7DQMBUK4oX9gV1qIzpoRz6 | Aymos | Mama | DJs Production | 54 | 86 | 14 | 53733 | South Africa |


In [145]:
# DISPLAY ROWS WITH MISSING VALUES 
m = top_region.isnull()
top_region[m.any(axis=1)]

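|      | rank | uri | track_id | artist_names | track_name | source | peak_rank | previous_rank | weeks_on_chart | streams | country |
|------|------|-----|----------|--------------|------------|--------|-----------|---------------|----------------|---------|---------|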


## Spotify API <a class="anchor" id="spotifyapi"></a>

### I. Spotify track audio features <a class="anchor" id="audiofeats"></a>

The Spotify API is used to obtain the audio features of every track in the dataframe displayed above. With a Spotify developer account, an access token is requested through the client-credentials flow, and the track IDs are then sent in batches of up to 100 to the audio-features endpoint, which returns features such as danceability, energy, and loudness for each track. Audio features were returned for all tracks except one, and the results are collected into a dataframe for further analysis.

In [146]:
# GET ACCESS TOKEN 
auth_url = 'https://accounts.spotify.com/api/token'
auth_response = requests.post(auth_url, {
    'grant_type': 'client_credentials',
    'client_id': client_id_spotify,
    'client_secret': client_secret_spotify})
auth_response_data = auth_response.json()
access_token = auth_response_data['access_token']
headers = {'Authorization': 'Bearer {token}'.format(token=access_token)}
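
Spotify's documentation for the client-credentials flow describes passing the client ID and secret in a base64-encoded `Authorization: Basic` header rather than in the request body, as the cell above does (which worked for this notebook). A minimal sketch of building that header with the standard library; `basic_auth_header` is a hypothetical helper name.

```python
import base64


def basic_auth_header(client_id, client_secret):
    """Return the value of an HTTP Basic Authorization header for the token request."""
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")
```

It could be used as `requests.post(auth_url, data={'grant_type': 'client_credentials'}, headers={'Authorization': basic_auth_header(client_id_spotify, client_secret_spotify)})`. The token response also includes an `expires_in` field (in seconds), so a long-running collection loop may need to request a new token.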

In [147]:
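def get_features(ids, headers):
    '''get audio features for a list of track IDs, 100 IDs per request'''
    ids = list(ids)  # positional slicing regardless of the input's index
    audio_features = []
    for i in range(0, len(ids), 100):
        ids_slice = ids[i:i+100]  # the endpoint accepts at most 100 track IDs per request
        track_ids_str = ','.join(ids_slice)  # comma-separated batch of IDs
        response = requests.get(f'https://api.spotify.com/v1/audio-features?ids={track_ids_str}', headers=headers)
        response.raise_for_status()  # stop on errors such as an expired token or rate limiting
        audio_features += response.json()['audio_features']  # None for tracks without features
        print(f"Track {i} to {i+100} done.")
        time.sleep(0.1)
    return audio_features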
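
`get_features` sends its requests back to back, so a long run may hit Spotify's rate limit, which the API signals with status 429 and a `Retry-After` header giving the number of seconds to wait. A minimal retry sketch under those assumptions; `get_with_retry` is hypothetical, and the request function is passed in so that the sketch does not depend on `requests` directly.

```python
import time


def get_with_retry(send, url, max_retries=3, default_wait=1.0, sleep=time.sleep):
    """Call send(url), retrying while the response status is 429 (Too Many Requests).

    `send` is any callable returning an object with `status_code` and `headers`
    attributes, such as a requests.Response. The last response is returned,
    which may still be a 429 if all retries are used up.
    """
    response = send(url)
    retries = 0
    while response.status_code == 429 and retries < max_retries:
        # Retry-After holds the number of seconds to wait; fall back to a default
        wait = float(response.headers.get("Retry-After", default_wait))
        sleep(wait)
        response = send(url)
        retries += 1
    return response
```

Inside `get_features`, `get_with_retry(lambda u: requests.get(u, headers=headers), url)` would take the place of the plain `requests.get` call.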

In [148]:
%%time 
# GET AUDIO FEATURES 
audio = get_features(top_region['track_id'], headers)

Track 0 to 100 done.
Track 100 to 200 done.
Track 200 to 300 done.
Track 300 to 400 done.
Track 400 to 500 done.
Track 500 to 600 done.
Track 600 to 700 done.
Track 700 to 800 done.
Track 800 to 900 done.
Track 900 to 1000 done.
Track 1000 to 1100 done.
Track 1100 to 1200 done.
Track 1200 to 1300 done.
Track 1300 to 1400 done.
Track 1400 to 1500 done.
Track 1500 to 1600 done.
Track 1600 to 1700 done.
Track 1700 to 1800 done.
Track 1800 to 1900 done.
Track 1900 to 2000 done.
Track 2000 to 2100 done.
Track 2100 to 2200 done.
Track 2200 to 2300 done.
Track 2300 to 2400 done.
Track 2400 to 2500 done.
Track 2500 to 2600 done.
Track 2600 to 2700 done.
Track 2700 to 2800 done.
Track 2800 to 2900 done.
Track 2900 to 3000 done.
Track 3000 to 3100 done.
Track 3100 to 3200 done.
Track 3200 to 3300 done.
Track 3300 to 3400 done.
Track 3400 to 3500 done.
Track 3500 to 3600 done.
Track 3600 to 3700 done.
Track 3700 to 3800 done.
Track 3800 to 3900 done.
Track 3900 to 4000 done.
Track 4000 to 4100 do

In [151]:
# CHECK FOR EMPTY RESPONSES
for i in range(len(audio)):
    if audio[i] is None:
        print(i)  # index of the track without audio features

4342


In [152]:
# TRACK WITHOUT AUDIO FEATURES
print(top_region.loc[4342])

rank                                                43
uri               spotify:track:1ThaPy4W188i5SFdjXl64J
track_id                        1ThaPy4W188i5SFdjXl64J
artist_names                Demi Portion, ElGrandeToto
track_name                                  Casablanca
source                            La bulle corporation
peak_rank                                            2
previous_rank                                       32
weeks_on_chart                                      41
streams                                          41343
country                                        Morocco
Name: 4342, dtype: object


In [153]:
# SEND INDIVIDUAL REQUEST FOR TRACK 4342
# Note: this endpoint expects `ids=`; the single-track form is /v1/audio-features/{id}
response = requests.get('https://api.spotify.com/v1/audio-features?id=1ThaPy4W188i5SFdjXl64J', headers=headers)
response.json()['audio_features']

[]

In [154]:
# REQUEST AUDIO FEATURES FOR TRACK 4342 (VIA SPOTIPY)
track_id = '1ThaPy4W188i5SFdjXl64J'
audio_features = sp.audio_features(track_id)
print(audio_features)

[None]


In [155]:
# PRINT KEYS 
audio[0].keys()

dict_keys(['danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness', 'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo', 'type', 'id', 'uri', 'track_href', 'analysis_url', 'duration_ms', 'time_signature'])

In [156]:
# ASSIGN NA VALUES TO TRACKS WITH NO AVAILABLE AUDIO FEATURES
empty_audio = dict.fromkeys(audio[0].keys())  # same keys as a valid entry, all values None
audio[4342] = empty_audio

In [157]:
# DOUBLE CHECK EMPTY AUDIO FEATURES
for i in range(len(audio)):
    if audio[i] is None:
        print(i)
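
The cells above locate and patch the single missing entry by its hard-coded index, 4342. A minimal sketch that handles any number of missing entries, assuming every non-None entry has the same keys as `audio[0]`; both helper names are hypothetical.

```python
def fill_missing_features(features):
    """Return a copy of `features` with every None replaced by an all-None dict.

    The keys are copied from the first non-None entry, so every row of the
    resulting DataFrame has the same columns.
    """
    template = next((f for f in features if f is not None), None)
    if template is None:
        return list(features)  # no valid entry to copy keys from
    empty = dict.fromkeys(template)  # same keys, every value None
    return [dict(empty) if f is None else f for f in features]


def missing_indices(features):
    """Positions of the tracks whose audio features came back as None."""
    return [i for i, f in enumerate(features) if f is None]
```

With these helpers, `audiofeats_df = pd.DataFrame(fill_missing_features(audio))` would produce an all-missing row for every track without audio features, without needing to know their indices in advance.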

In [158]:
# CONVERT TO DATAFRAME 
audiofeats_df = pd.DataFrame(audio)
audiofeats_df

|      | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | type | id | uri | track_href | analysis_url | duration_ms | time_signature |
|------|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.707 | 0.681 | 0.0 | -4.325 | 1.0 | 0.0668 | 0.0632 | 0.000005 | 0.0322 | 0.646 | 117.999 | audio_features | 0yLdNVWF3Srea0uzk55zFn | spotify:track:0yLdNVWF3Srea0uzk55zFn | https://api.spotify.com/v1/tracks/0yLdNVWF3Sre... | https://api.spotify.com/v1/audio-analysis/0yLd... | 200455.0 | 4.0 |
| 1 | 0.644 | 0.735 | 8.0 | -5.747 | 1.0 | 0.0391 | 0.0521 | 0.144000 | 0.1610 | 0.418 | 88.980 | audio_features | 1Qrg8KqiBpW07V7PNxwwwL | spotify:track:1Qrg8KqiBpW07V7PNxwwwL | https://api.spotify.com/v1/tracks/1Qrg8KqiBpW0... | https://api.spotify.com/v1/audio-analysis/1Qrg... | 153947.0 | 4.0 |
| 2 | 0.696 | 0.809 | 5.0 | -8.254 | 1.0 | 0.0500 | 0.2520 | 0.000128 | 0.2480 | 0.857 | 132.962 | audio_features | 6AQbmUe0Qwf5PZnt4HmTXv | spotify:track:6AQbmUe0Qwf5PZnt4HmTXv | https://api.spotify.com/v1/tracks/6AQbmUe0Qwf5... | https://api.spotify.com/v1/audio-analysis/6AQb... | 131013.0 | 4.0 |
| 3 | 0.801 | 0.806 | 11.0 | -5.206 | 1.0 | 0.0381 | 0.3820 | 0.000669 | 0.1140 | 0.802 | 106.999 | audio_features | 0WtM2NBVQNNJLh6scP13H8 | spotify:track:0WtM2NBVQNNJLh6scP13H8 | https://api.spotify.com/v1/tracks/0WtM2NBVQNNJ... | https://api.spotify.com/v1/audio-analysis/0WtM... | 239318.0 | 4.0 |
| 4 | 0.715 | 0.620 | 1.0 | -6.005 | 0.0 | 0.0484 | 0.4170 | 0.000000 | 0.0822 | 0.172 | 97.950 | audio_features | 2dHHgzDwk4BJdRwy9uXhTO | spotify:track:2dHHgzDwk4BJdRwy9uXhTO | https://api.spotify.com/v1/tracks/2dHHgzDwk4BJ... | https://api.spotify.com/v1/audio-analysis/2dHH... | 221520.0 | 4.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7295 | 0.727 | 0.600 | 8.0 | -4.799 | 1.0 | 0.2400 | 0.6360 | 0.000005 | 0.1060 | 0.754 | 199.796 | audio_features | 7ErtOGQ9DwyQa3lwP77j4u | spotify:track:7ErtOGQ9DwyQa3lwP77j4u | https://api.spotify.com/v1/tracks/7ErtOGQ9DwyQ... | https://api.spotify.com/v1/audio-analysis/7Ert... | 216000.0 | 4.0 |
| 7296 | 0.840 | 0.476 | 0.0 | -7.918 | 1.0 | 0.0629 | 0.0264 | 0.000000 | 0.0865 | 0.767 | 139.987 | audio_features | 4EI8VuxUuIHKfafU72emqz | spotify:track:4EI8VuxUuIHKfafU72emqz | https://api.spotify.com/v1/tracks/4EI8VuxUuIHK... | https://api.spotify.com/v1/audio-analysis/4EI8... | 201400.0 | 4.0 |
| 7297 | 0.835 | 0.454 | 8.0 | -10.670 | 0.0 | 0.0628 | 0.0141 | 0.000823 | 0.0241 | 0.433 | 112.010 | audio_features | 3Puq6i4xIRH4lrPvJxIC83 | spotify:track:3Puq6i4xIRH4lrPvJxIC83 | https://api.spotify.com/v1/tracks/3Puq6i4xIRH4... | https://api.spotify.com/v1/audio-analysis/3Puq... | 416037.0 | 4.0 |
| 7298 | 0.802 | 0.469 | 1.0 | -13.865 | 0.0 | 0.0503 | 0.0166 | 0.136000 | 0.0895 | 0.314 | 113.008 | audio_features | 7DQMBUK4oX9gV1qIzpoRz6 | spotify:track:7DQMBUK4oX9gV1qIzpoRz6 | https://api.spotify.com/v1/tracks/7DQMBUK4oX9g... | https://api.spotify.com/v1/audio-analysis/7DQM... | 450304.0 | 4.0 |


In [159]:
# PRINT NA AUDIO FEATURES
print(audiofeats_df.loc[4342])

danceability         NaN
energy               NaN
key                  NaN
loudness             NaN
mode                 NaN
speechiness          NaN
acousticness         NaN
instrumentalness     NaN
liveness             NaN
valence              NaN
tempo                NaN
type                None
id                  None
uri                 None
track_href          None
analysis_url        None
duration_ms          NaN
time_signature       NaN
Name: 4342, dtype: object


### II. Album release date <a class="anchor" id="albumrelease"></a>

To examine whether a song's release date is related to its popularity, measured by the number of streams, the album ID and release date of each track are retrieved from the Spotify API's tracks endpoint, which accepts up to 50 track IDs per request.
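
The release dates returned below are not uniform: most are full dates, but some are year-only (for example `2005`). Spotify's album objects also carry a `release_date_precision` field (`year`, `month` or `day`). A minimal sketch for turning these strings into dates and into an album's age at the chart week, assuming a year-only or month-only date is mapped to the first day of that period; the helper names are hypothetical.

```python
from datetime import date

CHART_WEEK = date(2023, 2, 16)  # week of the downloaded chart files


def parse_release_date(value):
    """Parse 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' into a date; None if missing."""
    if not value:
        return None
    parts = [int(p) for p in value.split("-")]
    if parts[0] < 1:
        return None  # guard against placeholder years such as '0000'
    year = parts[0]
    month = parts[1] if len(parts) > 1 else 1  # year-only: assume January
    day = parts[2] if len(parts) > 2 else 1  # month-only: assume the 1st
    return date(year, month, day)


def days_before_chart(value, chart_week=CHART_WEEK):
    """Number of days between the album release and the chart week."""
    released = parse_release_date(value)
    return None if released is None else (chart_week - released).days
```

The first-day assumption overstates the age of year-only releases by up to a year, so such rows may be worth flagging separately in any release-date analysis.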

In [160]:
def get_releasedates(track_ids):
    '''get album ID and release date for each track (uses the global `headers`)'''
    track_ids = list(track_ids)  # positional slicing regardless of the Series index
    dates = []
    album_ids = []
    for i in range(0, len(track_ids), 50):
        track_ids_str = ','.join(track_ids[i:i+50])  # the tracks endpoint accepts at most 50 IDs
        response = requests.get(f'https://api.spotify.com/v1/tracks?ids={track_ids_str}', headers=headers)
        response.raise_for_status()
        for track in response.json()['tracks']:  # parse the JSON once per batch
            dates.append(track['album']['release_date'])
            album_ids.append(track['album']['id'])
        time.sleep(0.001)
    return pd.DataFrame({'album_id': album_ids, 'release_date': dates})

In [162]:
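# CHECK IF TRACK 4342 HAS ALBUM INFO
# Note: the endpoint expects `ids=` (the single-track form is /v1/tracks/{id}), so the
# 400 response below reflects the query parameter rather than missing album info.
response = requests.get('https://api.spotify.com/v1/tracks?id=1ThaPy4W188i5SFdjXl64J', headers=headers)
response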

<Response [400]>

In [164]:
%%time
# GET REQUEST WITHOUT TRACK 4342
album_release = get_releasedates(top_region['track_id'].drop([4342]))
album_release

CPU times: total: 32.9 s
Wall time: 1min 23s


|      | album_id | release_date |
|------|----------|--------------|
| 0 | 7I0tjwFtxUwBC1vgyeMAax | 2023-01-13 |
| 1 | 1nrVofqDRs7cpWXJ49qTnP | 2022-12-08 |
| 2 | 6cVfHBcp3AdpYY0bBglkLN | 2023-02-03 |
| 3 | 2b2GHWESCWEuHiCZ2Skedp | 2022-08-25 |
| 4 | 7txGsnDSqVMoRl6RQ9XyZP | 2022-12-02 |
| ... | ... | ... |
| 7294 | 5xqEVPQeBA9GUnEFJhyCtt | 2022-11-14 |
| 7295 | 6ek7Y68IlB6CoFkkc2gEQb | 2005 |
| 7296 | 6wJ5Kb1e2gPqqXTgumyn8K | 2022-09-30 |
| 7297 | 2lc6GfPXhRMVrJsBKq1WjU | 2022-08-12 |


In [496]:
# REINSERT DROPPED ROW (TRACK 4342) WITH NULL VALUES 
row_4342 = pd.DataFrame({'album_id': [None], 'release_date': [None]})
album_release_df = pd.concat([album_release.loc[:4341]
                              , row_4342, album_release.loc[4342:]]).reset_index(drop=True)
print(album_release_df)
print(album_release_df.iloc[4341:4344,:])

                    album_id release_date
0     7I0tjwFtxUwBC1vgyeMAax   2023-01-13
1     1nrVofqDRs7cpWXJ49qTnP   2022-12-08
2     6cVfHBcp3AdpYY0bBglkLN   2023-02-03
3     2b2GHWESCWEuHiCZ2Skedp   2022-08-25
4     7txGsnDSqVMoRl6RQ9XyZP   2022-12-02
...                      ...          ...
7295  5xqEVPQeBA9GUnEFJhyCtt   2022-11-14
7296  6ek7Y68IlB6CoFkkc2gEQb         2005
7297  6wJ5Kb1e2gPqqXTgumyn8K   2022-09-30
7298  2lc6GfPXhRMVrJsBKq1WjU   2022-08-12
7299  0ceNIR1fRMz6vRGvccv3eS   2023-02-03

[7300 rows x 2 columns]
                    album_id release_date
4341  4DjuD48lhHAsL3tOklxQrC   2023-01-06
4342                    None         None
4343  1q3j12Y1sp2eqqffSnyA93   2020-03-13


### III. Song genre

Determining a song's genre is difficult because most sources only provide genres for an artist or an album. Although the Spotify API documentation lists a `genres` field on several objects, only a small percentage of artists and albums actually have one assigned; in this dataset, every album returned an empty `genres` list. A more viable option may be to train a machine learning model on song lyrics labeled with genres and use it to predict each song's genre, but gathering such training data would be a separate undertaking. Whether the genre attribute is included in this analysis will depend on the time available.
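
Since genres are mostly attached to artists, one rough proxy, not tested in this notebook, is to label each track with the union of its credited artists' genres. The artist IDs would come from each track object's `artists` list and the genre lists from the artists' `genres` field (for example via Spotipy's `sp.artists()`, which accepts up to 50 IDs per call). A minimal sketch of the aggregation step, with a hypothetical function name and plain dictionaries as input:

```python
def track_genres(track_artists, artist_genres):
    """Collect the genres of all credited artists for each track.

    track_artists: dict mapping track ID -> list of artist IDs
    artist_genres: dict mapping artist ID -> list of genre strings
    Returns a dict mapping track ID -> sorted list of unique genres
    (empty when none of the artists has a genre).
    """
    result = {}
    for track_id, artist_ids in track_artists.items():
        genres = set()
        for artist_id in artist_ids:
            genres.update(artist_genres.get(artist_id, []))
        result[track_id] = sorted(genres)
    return result
```

Tracks whose artists have no genres still end up with an empty list, so this proxy reduces the gap rather than closing it.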

In [165]:
%%time
# GET ALBUM GENRE 
album_genres = []
album_ids = list(album_release['album_id'])
for i in range(0, len(album_ids), 20):
    batch_ids = ','.join(album_ids[i:i+20])  # the albums endpoint accepts at most 20 IDs
    response = requests.get(f'https://api.spotify.com/v1/albums?ids={batch_ids}', headers=headers)
    for album in response.json()['albums']:  # parse the JSON once per batch
        album_genres.append(album['genres'])
    time.sleep(0.001)
print(album_genres)  # every album returned an empty genres list

[[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [],

In [167]:
# GENRES AVAILABLE IN SPOTIFY
genres = sp.recommendation_genre_seeds()['genres']
genres

['acoustic',
 'afrobeat',
 'alt-rock',
 'alternative',
 'ambient',
 'anime',
 'black-metal',
 'bluegrass',
 'blues',
 'bossanova',
 'brazil',
 'breakbeat',
 'british',
 'cantopop',
 'chicago-house',
 'children',
 'chill',
 'classical',
 'club',
 'comedy',
 'country',
 'dance',
 'dancehall',
 'death-metal',
 'deep-house',
 'detroit-techno',
 'disco',
 'disney',
 'drum-and-bass',
 'dub',
 'dubstep',
 'edm',
 'electro',
 'electronic',
 'emo',
 'folk',
 'forro',
 'french',
 'funk',
 'garage',
 'german',
 'gospel',
 'goth',
 'grindcore',
 'groove',
 'grunge',
 'guitar',
 'happy',
 'hard-rock',
 'hardcore',
 'hardstyle',
 'heavy-metal',
 'hip-hop',
 'holidays',
 'honky-tonk',
 'house',
 'idm',
 'indian',
 'indie',
 'indie-pop',
 'industrial',
 'iranian',
 'j-dance',
 'j-idol',
 'j-pop',
 'j-rock',
 'jazz',
 'k-pop',
 'kids',
 'latin',
 'latino',
 'malay',
 'mandopop',
 'metal',
 'metal-misc',
 'metalcore',
 'minimal-techno',
 'movies',
 'mpb',
 'new-age',
 'new-release',
 'opera',
 'pagode',

In [168]:
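# SEARCH FOR ITEMS (TRACKS, ALBUMS, ARTISTS, PLAYLISTS) THAT MATCH THE QUERY
# Spotify's field filters are normally space-separated field:value pairs (track:, artist:, genre:),
# so the commas, `track_name:` and `artist_name=` below are a likely reason for the empty result
q1 = 'genre:pop, track_name:Flowers, artist_name=Miley Cyrus' #query
sw = sp.search(q=q1)
len(sw['tracks'])  # counts the 7 keys of the paging object, not matches; sw['tracks']['total'] gives matches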

7

In [169]:
sw['tracks']

{'href': 'https://api.spotify.com/v1/search?query=genre%3Apop%2C+track_name%3AFlowers%2C+artist_name%3DMiley+Cyrus&type=track&offset=0&limit=10',
 'items': [],
 'limit': 10,
 'next': None,
 'offset': 0,
 'previous': None,
 'total': 0}
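
A minimal sketch of building a track query in the `field:value` form that Spotify's search documentation describes; `build_track_query` is hypothetical and was not run against the API in this notebook, so whether multi-word values need quoting is left as an assumption to check.

```python
def build_track_query(track, artist=None, genre=None):
    """Build a Spotify search query from space-separated field filters."""
    parts = [f"track:{track}"]
    if artist:
        parts.append(f"artist:{artist}")
    if genre:
        parts.append(f"genre:{genre}")
    return " ".join(parts)
```

It could be passed as `sp.search(q=build_track_query('Flowers', 'Miley Cyrus'), type='track')`, after which `['tracks']['total']` reports the number of matching tracks.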

## Genius Lyrics <a class="anchor" id="geniuslyrics"></a>

For this project, the Python library `lyricsgenius` is used to access the Genius API and retrieve lyrics for a total of 7,300 songs. The library can search for songs and return their lyrics, artist information, and other metadata. Using it requires registering an account on the Genius website and generating an access token to authenticate API requests; once authenticated, songs can be searched by title and artist name to retrieve the corresponding lyrics. This makes the library a convenient way to collect lyrics for popular songs and to combine them with other data sources for sentiment analysis or other text-based analyses of music.

One of the most challenging tasks in data collection is dealing with the issue of unavailable lyrics for some popular songs in certain countries. Copyright restrictions often limit the public availability of lyrics for certain songs, making it impossible to collect lyrics for every song. Given that popular sources such as Spotify do not provide API endpoints for lyrics, incomplete data is unavoidable in this analysis. One approach to addressing this issue is to exclude the songs with unavailable lyrics, resulting in unequal numbers of popular songs for each country.
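
Excluding songs without lyrics leaves each country with a different number of usable songs. A minimal sketch for measuring how uneven this is, assuming the lyrics are stored alongside each song's country and that unavailable lyrics are recorded as the string `'None'` (as `get_request` does below) or as `None`; `lyrics_coverage` is a hypothetical helper.

```python
from collections import Counter


def lyrics_coverage(rows):
    """Return, per country, the share of chart songs that have lyrics.

    rows: iterable of (country, lyrics) pairs; lyrics is None or the string
    'None' when no lyrics were found.
    """
    total = Counter()
    found = Counter()
    for country, lyrics in rows:
        total[country] += 1
        if lyrics not in (None, "None"):
            found[country] += 1
    return {country: found[country] / total[country] for country in total}
```

With a dataframe holding a hypothetical `lyrics` column, it could be called as `lyrics_coverage(zip(df['country'], df['lyrics']))`.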

A further challenge when using `Genius.search_song()` is that the search inputs may not match how a song is listed on Genius. If a song features multiple artists, searching with a single artist name, or with all artist names combined, may not return the correct song. To address this, the functions below search for a song using different inputs: if the song is not found in the initial search, each artist name is searched individually until lyrics are found. Track names are also simplified by removing extraneous information such as "(feat. artist_name)" or "(with artist_name)", which otherwise leads to empty search results for some songs. Results whose text begins with "Top Artists of" or "New Music" are also treated as failed searches, since they appear to be Genius list pages rather than song lyrics.
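The fallback strategy can be summarized as an ordered list of (track, artist) inputs to try. The sketch below is a simplified, hypothetical restatement rather than the exact search order used by the functions in this notebook. It uses a regular expression (an assumption of this sketch) that also removes bracketed `[with ...]` suffixes, which the `split(' (', 1)` approach used below does not handle.

```python
import re

# Matches a "(feat. ...)", "(with ...)", "[feat. ...]" or "[with ...]" part of a title.
# The whitespace required after the keyword keeps titles like "(Without Me)" intact.
FEAT_PATTERN = re.compile(r'\s*[\(\[](?:feat\.?|with)\s[^\)\]]*[\)\]]', re.IGNORECASE)


def simplify_track_name(track):
    """Remove featured-artist annotations from a track title (hypothetical helper)."""
    return FEAT_PATTERN.sub('', track).strip()


def candidate_queries(track, artist_names):
    """Ordered, de-duplicated (track, artist) pairs to try against Genius (hypothetical helper)."""
    artists = [a.strip() for a in artist_names.split(', ') if a.strip()]
    simple = simplify_track_name(track)
    candidates = [(track, artist_names), (simple, artist_names)]  # all artists first
    candidates += [(simple, a) for a in artists]                  # then one artist at a time
    ordered = []
    for pair in candidates:
        if pair not in ordered:  # skip repeats, e.g. for single-artist songs
            ordered.append(pair)
    return ordered


for track, artist in candidate_queries('Calm Down (with Selena Gomez)', 'Rema, Selena Gomez'):
    print(f'{track} | {artist}')
```

Each pair would be passed to `genius.search_song(track, artist)` in turn, stopping at the first result that contains real lyrics.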

In addition, lyricsgenius can fail on broken song URLs. One song in the dataset is assigned a `genius.com` URL that is broken, so extracting its lyrics raises an `HTTPError`. The `get_request` function therefore catches `HTTPError` and records the lyrics as unavailable instead of interrupting the collection loop. A manual search on the genius.com website also returned no results for this song, so its lyrics are treated as not available on Genius.

In [170]:
# IMPORT DEPENDENCIES  
import lyricsgenius
from lyricsgenius import OAuth2, Genius
import urllib.request 
import urllib.error
from requests.exceptions import HTTPError, Timeout

In [171]:
# SET CREDENTIALS 
client_id_genius = ''
client_secret_genius = ''
redirect_uri_genius = ''
access_token_genius = ''

In [97]:
#auth = OAuth2.client_only_app(client_id_genius,client_secret_genius,redirect_uri_genius, scope = 'all')
#token = auth.prompt_user()

In [172]:
def get_request(track, artist, token): 
    '''Search Genius for one (track, artist) pair; return the lyrics or the string 'None'.'''
    genius = Genius(token,
                    verbose=False,                # turn off status messages
                    remove_section_headers=True,  # remove section headers (e.g. [Chorus]) from lyrics
                    sleep_time=0.001,
                    retries=3)
    try: 
        song = genius.search_song(track, artist)
    except (HTTPError, Timeout):
        # broken genius.com URL, or the request still timed out after the retries:
        # record the lyrics as unavailable instead of stopping the whole loop
        return 'None'
    if song is None: 
        return 'None'  # no matching song: lyrics not available
    return song.lyrics

In [173]:
def get_modified(artist, track, token): 
    '''Find lyrics by simplifying the track name and/or searching one artist at a time.'''
    if ('(with' in track) or ('(feat' in track):  # track name lists featured artists
        track_simp = track.split(' (', 1)[0]  # e.g. "Calm Down (with Selena Gomez)" -> "Calm Down"
        # 2. search using all artists + simplified track name
        lyric = get_request(track_simp, artist, token)
        # 'None' means nothing was found; "Top Artists of"/"New Music" pages are not song lyrics
        if lyric == 'None' or lyric.startswith(("Top Artists of", "New Music")):
            # 3. search using one artist at a time + simplified track name
            lyric = get_lyric_artist(artist, track_simp, token)
        return lyric
    # 4. search using one artist at a time + full track name
    return get_lyric_artist(artist, track, token)

In [174]:
def get_lyric_artist(artist, track, token): 
    '''Search with one artist at a time until lyrics are found.'''
    lyric = 'None'
    for a in artist.split(', '):  # multiple artists: search with each one separately
        lyric = get_request(track, a, token)
        # skip empty results and Genius list pages returned instead of a song
        if lyric == 'None' or lyric.startswith(("Top Artists of", "New Music")):
            continue
        break  # lyrics found
    return lyric 

In [176]:
def get_song_lyrics(artist, track, token): 
    '''Get the lyrics of a song, falling back to alternative search inputs.'''
    if ', ' not in artist:  # single artist: one direct search
        return get_request(track, artist, token)
    # 1. search using the first two artists (get_modified simplifies the track name if it lists featured artists)
    first_two = ', '.join(artist.split(', ')[:2])
    lyric = get_modified(first_two, track, token)
    if lyric == 'None' or lyric.startswith(("Top Artists of", "New Music")):
        # not found: retry with the full artist list
        return get_modified(artist, track, token)
    return lyric


In [177]:
def get_lyrics(unique_songs, token): 
    '''Get lyrics for every row of unique_songs as [track_id, artist_names, track_name, lyrics] lists.'''
    lyrics = [] 
    for i, row in enumerate(unique_songs.itertuples(index=False)):
        print(f'{i}: {row.artist_names} | {row.track_name}')  # progress log
        lyric = get_song_lyrics(row.artist_names, row.track_name, token)
        lyrics.append([row.track_id, row.artist_names, row.track_name, lyric])
    return lyrics 

In [178]:
# GET UNIQUE SONGS 
unique_songs = top_region[['track_id', 'artist_names', 'track_name']].drop_duplicates().reset_index(drop=True)
unique_songs

| | track_id | artist_names | track_name |
|---|---|---|---|
| 0 | 0yLdNVWF3Srea0uzk55zFn | Miley Cyrus | Flowers |
| 1 | 1Qrg8KqiBpW07V7PNxwwwL | SZA | Kill Bill |
| 2 | 6AQbmUe0Qwf5PZnt4HmTXv | PinkPantheress, Ice Spice | Boy's a liar Pt. 2 |
| 3 | 0WtM2NBVQNNJLh6scP13H8 | Rema, Selena Gomez | Calm Down (with Selena Gomez) |
| 4 | 2dHHgzDwk4BJdRwy9uXhTO | Metro Boomin, The Weeknd, 21 Savage | Creepin' (with The Weeknd & 21 Savage) |
| ... | ... | ... | ... |
| 3284 | 0oNkR5J4qmQxNVwLeA55y7 | Sjava, Nontokozo Mkhize | Thixo |
| 3285 | 4EI8VuxUuIHKfafU72emqz | Mariah Carey | We Belong Together |
| 3286 | 3Puq6i4xIRH4lrPvJxIC83 | Deep London, Nkosazana Daughter, Murumba Pitch... | Piano Ngijabulise |
| 3287 | 7DQMBUK4oX9gV1qIzpoRz6 | Aymos | Mama |


Since the Genius API allows searching for songs in various languages, including Arabic, it's unnecessary to translate non-English song titles while searching for a song to fetch its lyrics.

In [181]:
%%time
# GET LYRICS 
song_lyrics_uni = get_lyrics(unique_songs, access_token_genius)

0: Miley Cyrus | Flowers
1: SZA | Kill Bill
2: PinkPantheress, Ice Spice | Boy's a liar Pt. 2
3: Rema, Selena Gomez | Calm Down (with Selena Gomez)
4: Metro Boomin, The Weeknd, 21 Savage | Creepin' (with The Weeknd & 21 Savage)
5: RAYE, 070 Shake | Escapism.
6: 3GAR BABY | HUSTLE NA MUST
7: Harry Styles | As It Was
8: Sam Smith, Kim Petras | Unholy (feat. Kim Petras)
9: The Weeknd | Die For You
10: Miguel | Sure Thing
11: David Guetta, Bebe Rexha | I'm Good (Blue)
12: d4vd | Here With Me
13: JVKE | golden hour
14: Taylor Swift | Anti-Hero
15: Libianca | People
16: Chris Brown | Under The Influence
17: OneRepublic | I Ain't Worried
18: The Weeknd, Daft Punk | Starboy
19: SZA | Snooze
20: Tory Lanez | The Color Violet
21: Coi Leray | Players
22: NewJeans | OMG
23: Kaifi Khalil | Kahani Suno 2.0
24: Vishal-Shekhar, Shilpa Rao, Caralisa Monteiro, Vishal Dadlani, Shekhar Ravjiani, Kumaar | Besharam Rang (From "Pathaan")
25: Meghan Trainor | Made You Look
26: Drake, 21 Savage | Rich Flex
...
200: Luciano, Aitch, BIA | Bamba (feat. Aitch & BIA)
201: Linkin Park | Lost
202: MEDUZA, James Carter, Elley Duhé, FAST BOY | Bad Memories (feat. Elley Duhé & FAST BOY)
203: Nina Chuba | Mangos mit Chili
204: Samra | 1995
205: CRO | Sie
206: Dardan, Hava | mailbox (feat. Hava)
207: Nina Chuba | Wildberry Lillet
208: Felix Jaehn, Ray Dalton | Call It Love
209: CRO | Steht Mir
210: Miksu / Macloud, makko | Nachts wach (Lila Wolken Bootleg)
211: Olexesh | Matador
212: RAF Camora, Bonez MC | Blaues Licht
213: Peter Fox, Inéz | Zukunft Pink (feat. Inéz)
214: Ion Miles, SIRA, BHZ | Powerade
215: Luciano | Beautiful Girl
216: Dean Lewis | How Do I Say Goodbye
217: James Hype, Miggy Dela Rosa | Ferrari
218: George Ezra | Green Green Grass
219: Miksu / Macloud, makko | Nachts wach
220: Miksu / Macloud, t-low | Sehnsucht
221: Bonez MC, Gzuz | YumYum
222: Macklemore & Ryan Lewis, Macklemore, Ryan Lewis, Ray Dalton | Can't Hold Us (feat. Ray Dalton)
...
422: Myke Towers, Daddy Yankee | ULALA (OOH LA LA)
423: Rauw Alejandro, Subelo NEO | RON COLA
424: Christian Nodal | Ya No Somos Ni Seremos
425: Bad Bunny, Jhayco | Tarot
426: Maluma, Feid | Mojando Asientos (feat. Feid)
427: Feid | Si Te La Encuentras Por Ahí
428: Tiago PZK, LIT killah | Entre Nosotros
429: Jay Wheeler, DJ Nelson, Myke Towers | La Curiosidad
430: Bad Bunny | Si Estuviésemos Juntos
431: Morat, Feid | Salir Con Vida
432: Anuel AA, Bad Bunny | Hasta Que Dios Diga
433: Leo Santana | Zona De Perigo
434: Marília Mendonça | Leão
435: Zé Felipe | Facilita Aí
436: Israel & Rodolffo, Ana Castela | Bombonzinho - Ao Vivo
437: WIU | Coração de Gelo
438: Gustavo Mioto, Mari Fernandez | Eu Gosto Assim - Ao Vivo
439: Mc Tato, DJ Ak beats | Luz do Luar
440: Teto, WIU, Matuê | Flow Espacial
441: Maiara & Maraisa | A Culpa É Nossa - Ao Vivo
442: Henrique & Juliano | Traumatizei - Ao Vivo Em Brasília
443: AgroPlay, Ana Castela | Nosso Quadro
...
607: Папин Олимпос | Тёмно-оранжевый закат
608: uglystephan | Снова обогнал их
609: LIL KRYSTALLL, Лоя (5sta Family), OBLADAET, Markul | Я БУДУ - Remix
610: Quest Pistols | Ты так красива
611: Три дня дождя | Демоны
612: XXXTENTACION | Revenge
613: Kxllswxtch | WASTE
614: SOSKA 69 | Басы долбят
615: INSTASAMKA | BALANCE
616: MiyaGi & Endspiel, Rem Digga | I Got Love
617: ooes | зима
618: Bailey Zimmerman | Rock and A Hard Place
619: Rihanna | Bitch Better Have My Money
620: Morgan Wallen | Wasted On You
621: Rihanna | Diamonds
622: JAY-Z, Rihanna, Kanye West | Run This Town
623: Morgan Wallen | You Proof
624: TALK | Run Away to Mars
625: Rihanna | Love On The Brain
626: Rihanna | Only Girl (In The World)
627: Marshmello, Khalid | Numb
628: Future, Drake, Tems | WAIT FOR U (feat. Drake & Tems)
629: Luke Combs | Growin' Up and Gettin' Old
630: Lizzy McAlpine | ceilings
631: Rihanna | Needed Me
632: Rihanna | Don't Stop The Music
633: Raaka | 52 bars
634: Rihanna | S&M
...
820: Calin, Viktor Sheen | Berlín
821: Calin, Viktor Sheen | Soundtrack
822: Ektor | Upgrade
823: Calin, Viktor Sheen | Double Time
824: Calin, Viktor Sheen | Plán B
825: Calin, Viktor Sheen | Luna
826: Calin, Viktor Sheen | Kudlu
827: Calin, Viktor Sheen | Limonáda
828: Calin, Viktor Sheen | Southside
829: Calin | Hannah Montana
830: Calin | Santé
831: Calin | Praha/Vídeň
832: Viktor Sheen | Stíny
833: Vesna | My Sister's Crown
834: Viktor Sheen, Calin, Hasan, Nik Tendo | Až na měsíc
835: Viktor Sheen | Blessed
836: P T K | NOCI JAK NA MOŘI
837: Calin | Dilema (Over U)
838: CA$HANOVA BULHAR | Praha den a noc
839: ŠKWOR | Síla Starejch Vín
840: Kabát | Malá dáma
841: Koky, Viktor Sheen, Robin Zoot | SOS
842: Viktor Sheen | Rozdělený světy
843: Ektor | Než bude po všem
844: Ektor | To neni hra
845: Ben Cristovao, Calin | PRŠÍ, PRŠÍ
846: Kabát | RUMCAJS MILOVAL MANKU
847: TWISTED, Oliver Tree | WORTH NOTHING - Fast & Furious: Drift Tape/Phonk Vol 1
848: Samey | Valeriya
...
1035: Lil Peep | Star Shopping
1036: Måneskin, Tom Morello | GOSSIP (feat. Tom Morello)
1037: Måneskin | BABY SAID
1038: Beach Weather | Sex, Drugs, Etc.
1039: Andreas | Why Do You Love Me
1040: 5MIINUST | ?mis sa tegid
1041: INTERWORLD | METAMORPHOSIS - Sped Up
1042: Yeat | Out thë way
1043: kaw, nublu | MINNA KOOS (feat. Nublu)
1044: Bru-C | No Excuses
1045: The Neighbourhood | Softcore
1046: Kendrick Lamar, Jay Rock | Money Trees
1047: ALIKA | C’est La Vie
1048: Essam Sasa | مسا مني ليكوا
1049: Mohammed Saeed | Alo Aleky
1050: Wingii, Lil Noby, Tommy, FL EX, Husayn | Freedom Music: Scene Cypher 3
1051: Lege-Cy | Msh Da Elle Ekhtarto
1052: Bahaa Sultan | Beraha Ya Sheekha
1053: Wegz | البخت
1054: Muslim - مُسلِم | Aleb Fel Dafater
1055: Ahmed Saad, NORDO, Ahmed Zaeem | Ya 3araf
1056: FL EX, Husayn | MESAMA3EEN
1057: Wegz, Ash | Amira
1058: Essam Sasa | عره فافي عامل مغامر - بت انتي حب حياتي
1059: Cairokee | Basrah w Atooh
1060: Afroto | 7ALA
1061: Farid | بأمارة مين
...
1268: Kerchak, Ziak | Peur (feat. Ziak)
1269: RIDSA | Nous Deux
1270: Josman | Intro
1271: Niro, Niska | A qui la faute (feat. Niska)
1272: Tayc | Carry Me
1273: Ninho | Jefe
1274: Jul, Omah Lay | Namek
1275: Mister V, Kerchak | Match
1276: Boris Way, Shibui | Kings & Queens (feat. Shibui)
1277: Lomepal | À peu près
1278: Damso | Θ. Macarena
1279: Imagine Dragons | Symphony
1280: SCH | Autobahn
1281: ZEG P, Hamza, SCH | FADE UP
1282: Kekra, Alpha Wann, La Fève | Ingé son
1283: Tiakola | La clé
1284: Kaza | HRTBRK #6
1285: Aya Nakamura, SDM | Daddy
1286: Jul | J'ai tout su
1287: GIMS, Soolking | APRÈS-VOUS MADAME
1288: Jul | La bandite
1289: Jul | Cœur blanc
1290: Calema, Dj Youcef | Te Amo - DJ Youcef Remix
1291: Lorenzo | Coco
1292: Sofia Carson | Come Back Home
1293: Zed | Joli
1294: Rim'K, Freeze corleone | Metaverse
1295: Damso | Coeur De Pirate
1296: Ninho | Lettre à une femme
1297: PNL | J’comprends pas
1298: Timal, Gazo | Filtré
1299: Mig, Tiakola | Quand j'y repense
...
1510: DESH, Azahriah | Papa
1511: Azahriah | four moods
1512: Kolg8eight, Csoky, Beton.Hofi, Pogány Induló | Rizikó
1513: Manuel, T. Danny | Legnagyobb Rapper / Lamborghini Álmok
1514: Manuel | Rossz kéz
1515: KKevin, Bruno X Spacc | Topshit
1516: Ekhoe | Forog a világ
1517: Kiss Kevin | Csinibaba
1518: Dzsúdló, Azahriah | Várnék
1519: Manuel, Mihályfi Luca | Rendben, Pt. 2 - Bonus Track
1520: VALMAR | SZÍNVAK
1521: Pogány Induló | Pogi Hip-Hop
1522: Azahriah, DESH | Pullup
1523: Grasa | Tilidin
1524: DESH | Malibu
1525: Manuel, Figura, Ekhoe | Balenciaga, Pt. 2
1526: Beton.Hofi, ajsa luna, Hundred Sins, Beatrick | TISZALÖK
1527: Manuel | Balenciaga
1528: Azahriah | figyelj
1529: Manuel | 21
1530: ByeAlex és a Slepp, LUCA | Rózsaszín Mustang
1531: Manuel, Young Fly | Hosszú út
1532: Ekhoe | Costa Rica
1533: Bruno X Spacc | Aha-Aha
1534: Manuel | Zombi
1535: Ekhoe | Tenger
1536: Manuel, Mxrci | Terapeuta
1537: Azahriah, DESH | Miafasz
1538: Pogány Induló | Gettó csirke
...
1738: G. V. Prakash, Shweta Mohan | Vaa Vaathi
1739: Jassa Dhillon, thiarajxtt | SPAIN
1740: Silambarasan TR, Thaman S | Thee Thalapathy (From "Varisu")
1741: Asees Kaur, Stebin Ben | Tu Mile Dil Khile
1742: Shubh | We Rollin
1743: Vilen | Kyun Dhunde
1744: Tanishk Bagchi, Anu Malik, Abhijeet | Main Khiladi - From "Selfiee"
1745: Shubh | Elevated
1746: Anirudh Ravichander, Jonita Gandhi | Jimikki Ponnu
1747: Karan Aujla, Ikky | Take It Easy
1748: Akhil Sachdeva, Mansheel Gujral | Channa Ve
1749: Shubh | Her
1750: Laddi Chahal, Parmish Verma, Gurlez Akhtar | Rubicon Drill
1751: Prashant Katheriya | Tum Chhupa Na Sakogi - Unplugged
1752: Gajendra Verma | Mann Mera (From "Table No. 21")
1753: Karan Aujla | On Top
1754: Vishal Mishra | Kaise Hua (From "Kabir Singh")
1755: AP Dhillon | Summer High
1756: Karan Aujla, Avvy Sra, Jaani | White Brown Black
1757: Prince Narula, Munawar Faruqui, Rony Ajnali | Todh
1758: Taaruk Raina, Mismatched | Kho Gaye
1759: Sidhu Moose Wala | 295
...
1938: Mrs. GREEN APPLE, Sonoko Inoue | 点描の唄
1939: Aimer | 残響散歌
1940: back number | ベルベットの詩
1941: Lilas Ikuta | スパークル
1942: ZUTOMAYO, Mori Calliope | 綺羅キラー
1943: YOASOBI | 群青
1944: yama | 色彩
1945: Vaundy | CHAINSAW BLOOD
1946: RADWIMPS | KANATA HALUKA
1947: Yorushika | アルジャーノン
1948: Kanaria | Yoidoreshirazu
1949: Vaundy | 不可幸力
1950: Vaundy | 花占い
1951: back number | HAPPY BIRTHDAY
1952: Mrs. GREEN APPLE | 私は最強
1953: LANA, Candee, ZOT on the WAVE | TURN IT UP - feat. Candee & ZOT on the WAVE
1954: back number | オールドファッション
1955: OFFICIAL HIGE DANDISM | Pretender
1956: Vaundy | 恋風邪にのせて
1957: Vaundy | 踊り子
1958: Chilli Beans., Vaundy | rose - feat. Vaundy
1959: BTS | Dynamite
1960: Saucy Dog | Itsuka
1961: SugLawd Familiar, CHICO CARLITO, Awich | LONGINESS REMIX
1962: back number | 花束
1963: Vaundy | napori
1964: TWICE | Talk that Talk
1965: Ado | Backlight
1966: Stray Kids | CASE 143
1967: Yuuri | シャッター
1968: OFFICIAL HIGE DANDISM | ノーダウト
1969: Yuuri | レオ
1970: YOASOBI | 夜に駆ける
...
2189: Anys, Stormy | Si tu savais
2190: Pause | Sociopath
2191: Maestro | Ha Mamma
2192: Tagne | Nadi Canadi
2193: Mocci | 9ortass
2194: ElGrandeToto | Salade Coco
2195: Assala Nasri, Asma Lmnawar | Sid Lghram
2196: Mc Artisan, Didine Canon 16 | ELGHIRA
2197: Rubio | JOANA
2198: Draganov | Tikitaka
2199: Marwan Moussa, Stormy | DOUBLEZUKSH
2200: Don Bigg | Arahmini
2201: Beny Jr, Morad, K y B | Sigue
2202: Dizzy DROS | Outro
2203: Baby Gang | Marocchino
2204: ASHE 22, ElGrandeToto | Low
2205: L'morphine | Papillon
2206: Junior H, Ovi | COLOGNE
2207: Junior H, Peso Pluma | El Azul
2208: Kenia OS | Malas Decisiones
2209: Junior H, Gabito Ballesteros | Vamos Para Arriba
2210: Alfredo Olivas | El Precio De La Soledad
2211: Lenin Ramírez, Luis R Conriquez | Solita (En Vivo)
2212: Virlan Garcia, Angel Cervantes | El Chamaquito
2213: Junior H | Extssy Model
2214: Codiciado | Vamos Aclarando Muchas Cosas - En Vivo
2215: Fuerza Regida, Peso Pluma | Igualito a Mi Apá
...
2405: Marstein | Frida Kahlo
2406: Amara | Leve
2407: Ulrikke | Honestly
2408: Rike Venner | Tenerife Tapes
2409: Kris Winther, Dolla$Bae, DJ Black | Barra Brava
2410: UNDERGRUNN | Italia
2411: Emma Steinbakken, Hver gang vi møtes | BlimE
2412: Emma Steinbakken | Jeg glemmer deg aldri (fra Rådebank)
2413: Selma Ibrahim | BlimE! - Den Ene
2414: Broiler, Kamelen, Emma Steinbakken | BAP
2415: Ballinciaga, Kris Winther | Beklager (Guttaklubben)
2416: Hagle | Norge rundt
2417: Broiler, Papito MIERDA | åtte shots
2418: Emma Steinbakken, Hver gang vi møtes | Night In Oslo
2419: El Papi | Hjerteløs
2420: Kaizers Orchestra | Hjerteknuser
2421: Kamelen | Creme De La Creme
2422: Beathoven, Capow x 2G | SØR-AFRIKA
2423: Phill, MAFAQ, DYRET, MDMArius | Charabanc 2023 (Hjemmesnekk)
2424: Halva Priset, Maria Mena | Den fineste Chevy'n
2425: Kamelen | Mama
2426: Benson Boone | In The Stars
2427: Kristian Kristensen, Hver gang vi møtes | 1 natt
2428: Alesso, Zara Larsson | Words (feat. Zara Larsson)
...
2605: Oki, @atutowy | Doja Cat
2606: 2115, Bedoes 2115, Blacha 2115, White 2115, Kuqe 2115 | KETCHUP
2607: sanah, Dawid Podsiadło | ostatnia nadzieja
2608: Kuban, Szpaku | jak nie wrócę po północy
2609: Mata | </3
2610: Miszel, Kabe, Premixm | dres
2611: WŁODAR, Pedro | SIĘ WJEŻDŻA
2612: Oki, Bodhi | Sonic Skit
2613: Chivas | anyżowe żelki
2614: Mata | JESTEM POJ384NY
2615: 2115, Blacha 2115, Kuqe 2115, Flexxy 2115, Bedoes 2115, WERSOW, @atutowy | GLOW UP
2616: Chivas | koleżanko mojej byłej
2617: White 2115 | California
2618: PRO8L3M | Ground Zero
2619: Mortal, Jonatan | SZKŁO
2620: ReTo, Avi, PSR | BMW
2621: Team X, Natsu World | KAPITAN
2622: Malik Montana, DaChoyce, SRNO, The Plug | Jetlag (feat. The Plug)
2623: Michael Patrick Kelly, Rakim | Wonders (feat. Rakim)
2624: Mata, Pedro, francis | Kiss cam (podryw roku)
2625: Sobel | Piękni Ludzie
2626: 2115, Bedoes 2115, White 2115, Kuqe 2115, @atutowy | NA KRAŃCU ŚWIATA
2627: White 2115 | 18
2628: Tribbs, Kubańczyk | Ostatni raz zatań
...
2821: Rasmus Gozzi, FRÖKEN SNUSK | STRIPPA I DITT VARDAGSRUM
2822: Sticky, 01an | KYSST
2823: Yasin | Young & Heartless
2824: Hooja | NÄR RADION DÅNAR
2825: Elov & Beny | CHEVA PÅ FREDAG
2826: Hooja | Donkey Kong
2827: LOAM, estraden | ENSAM (feat. estraden)
2828: Hov1, Einár | Gamora
2829: Dizzy, Manny Flaco | ZUTTLUKTEN
2830: Ringnes-Ronny, FRÖKEN SNUSK, Rasmus Gozzi | TURN ME ON
2831: Humlan Djojj, Josefine Götestam | Somna
2832: Ant Wan | Komplicerat
2833: Ricky Rich | RATATA
2834: Veronica Maggio | Välkommen in
2835: Young Earth Sauce, Markoolio | KASSANOVA
2836: estraden | Gråter tillsammans över varandra
2837: A36 | Blicky
2838: Yasin | Hiphop N RnB
2839: Olivia Lobato | Det mesta regnar bort
2840: 张远 | 嘉宾
2841: Ren Ran | 飞鸟和蝉
2842: 小阿七 | 从前说
2843: Marshmello, Jonas Brothers | Leave Before You Love Me (with Jonas Brothers)
2844: BIBI | BIBI Vengeance
2845: Eric Chou | 你,好不好? - TVBS連續劇【遺憾拼圖】片尾曲
2846: Jay Chou | 晴天
2847: Maroon 5 | Memories
2848: Pil C, Viktor Sheen | Inverzia
...
3059: Jay Chou | 擱淺
3060: Crowd Lu | 刻在我心底的名字 (Your Name Engraved Herein) - 電影<刻在你心底的名字>主題曲
3061: en | 间距
3062: 星野 | 晚风告白
3063: Eric Chou | 你不屬於我 - 《比悲傷更悲傷的故事》影集版片尾曲
3064: Fish Leong | 慢冷
3065: 告五人 | 唯一 (三立/台視戲劇《戀愛是科學》插曲)
3066: 高爾宣 OSN, 李浩瑋 Howard Lee | Drowning
3067: 王ADEN | 想了妳6次
3068: Sodagreen | 小情歌
3069: 告五人 | 給你一瓶魔法藥水
3070: 李浩瑋 Howard Lee | Crush On
3071: 告五人 | 在這座城市遺失了你 (戲劇《他們創業的那些鳥事》插曲)
3072: Jay Chou, Ashin Chen | 說好不哭
3073: Stefanie Sun | 遇見
3074: 1K | 就忘了吧
3075: NICKTHEREAL | 愛上你算我賤
3076: Vicky Chen | 沒有關係
3077: Sophie Chen | 500天
3078: Tanya Chua | 紅色高跟鞋
3079: A-Lin | 摯友
3080: G.E.M. | 多遠都要在一起
3081: Nicky Lee | 如常
3082: Eric Chou | 怎麼了
3083: 队长 | 哪里都是你
3084: Xiao Bing Chih | 毒藥
3085: 理想混蛋 雞丁 | 習慣不習慣
3086: G.E.M. | 天空沒有極限
3087: Jay Chou, Lara Liang | 珊瑚海
3088: Cyndi Wang | 當你
3089: Ronghao Li | 年少有為
3090: Mayday | 為你寫下這首情歌
3091: EggPlantEgg | 浪流連
3092: The Crane | 不介意
3093: 阿冗 | 你的答案
3094: 高爾宣 OSN | 最後一次
3095: 棉子 | 勇气
3096: 艾薇 | 失重前幸福
3097: SadSvit, СТРУКТУРА ЩАСТЯ | Силуети
...
3271: Deep London, Boohle | Hamba Wena
3272: King Monada, CK THE DJ | AYE KUWA (feat. CK THE DJ)
3273: Nasty C | Blackout
3274: Kelvin Momo, MaWhoo, Babalwa M, Chley | Izono
3275: Kabza De Small, Young Stunna, Nobuhle, Ze2 | Xola (feat. Nobuhle, Ze2 & Young Stunna)
3276: Ch'cco, Focalistic, Mellow & Sleazy | Pele Pele
3277: Luxury SA | Crazy Vibez
3278: Kabza De Small, Msaki | Khusela (feat. Msaki)
3279: Riaan Benadé | Spontaan
3280: Sam Deep, MaWhoo | Thokoza
3281: Bryson Tiller | Don't
3282: Pabi Cooper, Mellow & Sleazy | Waga Bietjie
3283: AKA, K.O | Run Jozi (Godly) (feat. K.O)
3284: Sjava, Nontokozo Mkhize | Thixo
3285: Mariah Carey | We Belong Together
3286: Deep London, Nkosazana Daughter, Murumba Pitch, Janda_K1 | Piano Ngijabulise
3287: Aymos | Mama
3288: TOSS, Young Stunna, Tyler ICU | Tetema
CPU times: total: 2min 34s
Wall time: 45min 14s


In [182]:
# PRINT LENGTH OF SONG LYRICS OBTAINED 
len(song_lyrics_uni)

3289

In [451]:
# CONVERT TO DATAFRAME 
song_lyrics_df = pd.DataFrame(song_lyrics_uni)
song_lyrics_df.columns = ['track_id','artist_names', 'track_name',  'lyrics']
song_lyrics_df

| | track_id | artist_names | track_name | lyrics |
|---|---|---|---|---|
| 0 | 0yLdNVWF3Srea0uzk55zFn | Miley Cyrus | Flowers | `TranslationsEspañolPortuguêsKiswahiliDeutschIt...` |
| 1 | 1Qrg8KqiBpW07V7PNxwwwL | SZA | Kill Bill | `TranslationsEspañolPortuguêsItalianoTürkçeDeut...` |
| 2 | 6AQbmUe0Qwf5PZnt4HmTXv | PinkPantheress, Ice Spice | Boy's a liar Pt. 2 | `TranslationsPortuguêsTürkçeBoy’s a liar Pt. 2 ...` |
| 3 | 0WtM2NBVQNNJLh6scP13H8 | Rema, Selena Gomez | Calm Down (with Selena Gomez) | `TranslationsPortuguêsCalm Down (Remix) Lyrics\...` |
| 4 | 2dHHgzDwk4BJdRwy9uXhTO | Metro Boomin, The Weeknd, 21 Savage | Creepin' (with The Weeknd & 21 Savage) | `TranslationsPortuguêsEspañolTürkçeفارسیCreepin...` |
| ... | ... | ... | ... | ... |
| 3284 | 0oNkR5J4qmQxNVwLeA55y7 | Sjava, Nontokozo Mkhize | Thixo | `Thixo Lyrics\nThixo Bawo\nNgabe uyong’bona na?...` |
| 3285 | 4EI8VuxUuIHKfafU72emqz | Mariah Carey | We Belong Together | `We Belong Together Lyrics\nSweet love, yeah\n\...` |
| 3286 | 3Puq6i4xIRH4lrPvJxIC83 | Deep London, Nkosazana Daughter, Murumba Pitch... | Piano Ngijabulise | `Piano Ngijabulise Lyrics\nOkokuqala ukuhlakani...` |
| 3287 | 7DQMBUK4oX9gV1qIzpoRz6 | Aymos | Mama | `Mama Lyrics\nMama mama mama mama\nMama mama ma...` |


In [452]:
# DISPLAY SONGS WITHOUT LYRICS  
song_lyrics_df.loc[song_lyrics_df['lyrics'] == 'None'] #no lyrics available yet

| | track_id | artist_names | track_name | lyrics |
|---|---|---|---|---|
| 6 | 3l6K9SW5VFJyA5jBtioFFt | 3GAR BABY | HUSTLE NA MUST | None |
| 24 | 0CtZpaOhtzvLV3FfcsVpQo | Vishal-Shekhar, Shilpa Rao, Caralisa Monteiro,... | Besharam Rang (From "Pathaan") | None |
| 31 | 6FAYpZ4jve8vpvTwUvjK6H | Vishal-Shekhar, Arijit Singh, Sukriti Kakar, V... | Jhoome Jo Pathaan | None |
| 76 | 72zHuDxFQTjbL51qJQSA7j | Jasleen Royal, B Praak, Romy, Anvita Dutt | Ranjha (From "Shershaah") | None |
| 139 | 3eUtQSdde3wNmXOW2OESKi | El Polaco, La China | Ya No Quiero Verte | None |
| ... | ... | ... | ... | ... |
| 3167 | 3a2Oftcs10wtzw6AmxuTMU | O.lew | Rồi Ta Sẽ Ngắm Pháo Hoa Cùng Nhau | None |
| 3169 | 27fqy8VruqYZlKiK1qfwEd | tlinh, 2pillz, Wokeupat4am | ghệ iu dấu của em ơi | None |
| 3170 | 3ukrFH17Zl6iEZ2QJ1Zwiy | RPT Orijinn, Ronboogz | Don't Côi | None |
| 3198 | 3wUp8eCTshIrJcYbjWaoyP | Phuong Ly | ThichThich | None |


In [453]:
# EXAMINE FIRST FEW WORDS IN EACH LYRICS 
# prints the row index followed by the first 50 characters with all whitespace removed
for i, lyric in enumerate(song_lyrics_df['lyrics']): 
    print(f"{i}{''.join(lyric[:50].split())}")

0TranslationsEspañolPortuguêsKiswahiliDeutschItalia
1TranslationsEspañolPortuguêsItalianoTürkçeDeutschN
2TranslationsPortuguêsTürkçeBoy’saliarPt.2Lyri
3TranslationsPortuguêsCalmDown(Remix)LyricsVibe
4TranslationsPortuguêsEspañolTürkçeفارسیCreepin’Ly
5TranslationsTürkçeEspañolPortuguêsEscapism.Lyrics
6None
7TranslationsTürkçeEspañolPortuguês日本語ItalianoΕλλην
8TranslationsFrançaisEspañolNederlandsDeutschالعربي
9TranslationsPortuguêsفارسیTürkçeEspañolDeutschDie
10TranslationsEnglishSureThingLyricsLoveyoulike
11TranslationsPortuguêsEspañolTürkçeI’mGood(Blue)
12TranslationsTürkçeHereWithMeLyricsWatchthesu
13TranslationsFrançaisEspañolPortuguês​goldenhourL
14TranslationsPortuguêsEspañolTürkçeFrançais中文Nederl
15PeopleLyricsOh-oh-oh-ohMhmm,mhmmFromBamenda,
16UndertheInfluenceLyricsKido,KidoK-K-Kidoon
17IAin’tWorriedLyricsIdon'tknowwhatyou'vebe
18TranslationsTürkçeEspañolРусскийPortuguêsFrançaisI
19TranslationsEspañolPortuguêsSnoozeLyricsOohIth
20TheColorVioletLyricsItookmydrugsandtookm
...

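The output above suggests that many scraped lyrics begin with a header consisting of a "Translations" language list and the song title followed by "Lyrics" (e.g. `Calm Down (Remix) Lyrics\n...` in the table above). The following minimal sketch removes that first line. It assumes the header always ends with "Lyrics" followed by a newline. The function name is hypothetical, and other page text that Genius may add is not handled.

```python
def strip_genius_header(lyric):
    """Drop a leading '<...> Lyrics' header line from scraped Genius lyrics (hypothetical helper)."""
    first_line, newline, rest = lyric.partition('\n')
    if newline and first_line.rstrip().endswith('Lyrics'):
        return rest  # header found: keep only the text after it
    return lyric     # no recognizable header (e.g. the 'None' placeholder): leave unchanged


print(strip_genius_header('TranslationsEspañolFlowers Lyrics\nfirst line of the song'))
```

Applied to the whole column, this would be `song_lyrics_df['lyrics'].map(strip_genius_header)`.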
In [458]:
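# REPLACE LYRICS WITH UNCOMMON STARTING LINE (INCORRECT)
# `genius` is otherwise only defined in the Data Cleaning section (In [460]); create it here so this cell runs in order
genius = Genius(access_token_genius, verbose=False, remove_section_headers=True, sleep_time=0.001, retries=3)
song_lyrics_df.loc[96, 'lyrics'] = genius.search_song("Lift Me Up", "Rihanna").lyrics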

In [457]:
# print(song_lyrics_df.loc[127] )
# print(song_lyrics_df['lyrics'][127]) 

In [456]:
# genius.search_song("TU AMOR", "DJ Alex").lyrics

In [459]:
# EXPORT DATA 
song_lyrics_df.to_csv('song-lyrics-top100.csv') 

# Data Cleaning <a class="anchor" id="cleaning"></a>

### Incorrect lyrics 

Inspection of the data shows that some lyrics returned by the Genius API are incorrect. Songs with unusually long lyrics are especially likely to be wrong. In addition, when no lyrics are available for a song, Genius sometimes returns suggestions for other songs instead of lyrics or an error message. Where the correct song can be found with a simplified title or a single artist, its lyrics are fetched again; otherwise the lyrics are set to the placeholder string `'None'`. Reviewing these cases manually also helps catch other inaccuracies that might otherwise be overlooked.
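The length check used below (lyrics longer than 10,000 characters) can also be written as a vectorized pandas filter instead of a Python loop. A minimal sketch: the function name is hypothetical, while the 10,000-character threshold and the column names are the ones used in this notebook.

```python
import pandas as pd


def flag_long_lyrics(df, max_chars=10_000):
    """Return songs whose lyrics exceed max_chars characters, longest first (hypothetical helper)."""
    lengths = df['lyrics'].str.len()
    mask = lengths > max_chars
    flagged = df.loc[mask, ['artist_names', 'track_name']].copy()
    flagged['n_chars'] = lengths[mask]
    return flagged.sort_values('n_chars', ascending=False)


demo = pd.DataFrame({
    'artist_names': ['A', 'B', 'C'],
    'track_name': ['short', 'very long', 'long'],
    'lyrics': ['x' * 5, 'x' * 20_000, 'x' * 12_000],
})
print(flag_long_lyrics(demo))
```

Sorting by length puts the most suspicious entries (such as the 584,705-character result below) at the top for manual review.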

In [460]:
# DEFINE GENIUS OBJECT 
genius = Genius(access_token_genius) 
genius.verbose = False #Turn off status messages
genius.sleep_time = 0.001
genius.remove_section_headers = True #Remove section headers (e.g. [Chorus]) from lyrics when searching
genius.retries = 3 

In [54]:
#print length of each lyrics: incorrect if extremely long 
#for i in  range(len(song_lyrics_df['lyrics'])): 
#    n = len(song_lyrics_df['lyrics'][i])
 #   print(song_lyrics_df['artist_names'][i] + ' | ' + song_lyrics_df['track_name'][i]) 
 #   print( str(i)+ ': '+ str(n) )

In [461]:
# PRINT LENGTHS OF SONG LYRICS
for i in  range(len(song_lyrics_df['lyrics'])): 
    n = len(song_lyrics_df['lyrics'][i])
    if n>10000:
        print(song_lyrics_df['artist_names'][i] + ' | ' + song_lyrics_df['track_name'][i]) 
        print( str(i)+ ': '+ str(n) )

Metro Boomin, Future, Chris Brown | Superhero (Heroes & Villains) [with Future & Chris Brown]
32: 56323
Post Malone, Doja Cat | I Like You (A Happier Song) (with Doja Cat)
75: 15916
Drake, 21 Savage | Jimmy Cooks (feat. 21 Savage)
81: 17051
Miguel | Sure Thing - Sped Up
86: 36628
Luck Ra, La K'onga, Ke Personajes | Ya No Vuelvas (Versión Cuarteto)
100: 57144
La T y La M, Ke Personajes | Messirve Mix 5
137: 16667
Creeds | Push Up - Original Mix
304: 86162
Emilia, Galin | Alcohol
331: 22855
Alisia | Телефона
340: 11383
Galena, DJ Damyan, Costi | Welcome to Bulgaria
347: 584705
Galena | Euphoria
351: 60792
Trap19 Connection, FYRE | BEZ CHUVSTVA
356: 68045
FYRE | DJIGIT DRILL
364: 67544
Molec, Mila Robert | Силна
366: 12889
DJ Damyan, Azis | Airport
370: 27057
Metro Boomin, Travis Scott, 21 Savage | Niagara Falls (Foot or 2) [with Travis Scott & 21 Savage]
381: 144322
Nico Hernández | El Malo
404: 16311
DJ Cayoo, DJ ESCOBAR, Nilo, MC Myres | Sorrisin De Puto
477: 22485
...

In [462]:
# CHANGE SOME SONGS WITH INCORRECT LYRICS 
# .loc writes into song_lyrics_df itself; chained indexing (df['lyrics'][i] = ...) can silently modify a copy
song_lyrics_df.loc[75, 'lyrics'] = genius.search_song('I Like You (A Happier Song) ', 'Post Malone, Doja Cat').lyrics
song_lyrics_df.loc[32, 'lyrics'] = genius.search_song('Superhero', 'Metro Boomin, Future, Chris Brown').lyrics
song_lyrics_df.loc[81, 'lyrics'] = genius.search_song('Jimmy Cooks', 'Drake').lyrics
song_lyrics_df.loc[86, 'lyrics'] = genius.search_song('Sure Thing', 'Miguel').lyrics
song_lyrics_df.loc[100, 'lyrics'] = genius.search_song('Ya No Vuelvas', "Luck Ra").lyrics
song_lyrics_df.loc[304, 'lyrics'] = genius.search_song('Push Up', "Creeds").lyrics
song_lyrics_df.loc[331, 'lyrics'] = genius.search_song('Alcohol', "Galin").lyrics
song_lyrics_df.loc[347, 'lyrics'] = genius.search_song('Welcome to Bulgaria', "Costi").lyrics
song_lyrics_df.loc[351, 'lyrics'] = 'None'  # incorrect lyrics: mark as unavailable
song_lyrics_df.loc[356, 'lyrics'] = 'None'
song_lyrics_df.loc[364, 'lyrics'] = 'None'
song_lyrics_df.loc[366, 'lyrics'] = 'None'
song_lyrics_df.loc[370, 'lyrics'] = 'None'
song_lyrics_df.loc[381, 'lyrics'] = genius.search_song('Niagara Falls', "Metro Boomin").lyrics
song_lyrics_df.loc[404, 'lyrics'] = 'None'
song_lyrics_df.loc[477, 'lyrics'] = 'None'
song_lyrics_df.loc[484, 'lyrics'] = 'None'
song_lyrics_df.loc[489, 'lyrics'] = genius.search_song('Áudio Que Te Entrega', "Leo Santana").lyrics
song_lyrics_df.loc[495, 'lyrics'] = 'None'
song_lyrics_df.loc[509, 'lyrics'] = 'None'
song_lyrics_df.loc[633, 'lyrics'] = 'None'
song_lyrics_df.loc[656, 'lyrics'] = genius.search_song('Bam Bam', "Camila Cabello").lyrics
#song_lyrics_df.loc[684, 'lyrics'] = 'None'

In [463]:
# CHANGE SOME SONGS WITH INCORRECT LYRICS 
song_lyrics_df.loc[717, 'lyrics'] = 'None'
song_lyrics_df.loc[784, 'lyrics'] = genius.search_song('EMMONH', "RICTA").lyrics
song_lyrics_df.loc[861, 'lyrics'] = 'None'
song_lyrics_df.loc[954, 'lyrics'] = 'None'
song_lyrics_df.loc[961, 'lyrics'] = genius.search_song('Alle Skuffer Over Tid', "The Minds Of 99").lyrics
song_lyrics_df.loc[1004, 'lyrics'] = 'None'
song_lyrics_df.loc[1018, 'lyrics'] = 'None'
song_lyrics_df.loc[1033, 'lyrics'] = 'None'
song_lyrics_df.loc[1050, 'lyrics'] = genius.search_song('Freedom Music', "Wingii").lyrics
song_lyrics_df.loc[1133, 'lyrics'] = 'None'
song_lyrics_df.loc[1168, 'lyrics'] = 'None'
song_lyrics_df.loc[1244, 'lyrics'] = 'None'
song_lyrics_df.loc[1327, 'lyrics'] = genius.search_song("Don't Look Back In Anger", "Oasis").lyrics
song_lyrics_df.loc[1345, 'lyrics'] = genius.search_song("Epiloges", "HGEMONA$").lyrics
song_lyrics_df.loc[1408, 'lyrics'] = genius.search_song("永順街39號", "盧瀚霆").lyrics
song_lyrics_df.loc[1421, 'lyrics'] = 'None'
song_lyrics_df.loc[1576, 'lyrics'] = 'None'
song_lyrics_df.loc[1593, 'lyrics'] = 'None'

In [464]:
# FIX SONGS WITH INCORRECT LYRICS (re-query Genius, or mark as 'None' when unavailable)
# .loc[row, 'lyrics'] avoids chained assignment, which may not write back to the DataFrame
song_lyrics_df.loc[1626, 'lyrics'] = 'None'
song_lyrics_df.loc[1757, 'lyrics'] = 'None'
song_lyrics_df.loc[1764, 'lyrics'] = 'None'
song_lyrics_df.loc[1777, 'lyrics'] = 'None'
song_lyrics_df.loc[1785, 'lyrics'] = 'None'
song_lyrics_df.loc[1864, 'lyrics'] = genius.search_song("Il Doc 3", "VillaBanks").lyrics
song_lyrics_df.loc[1899, 'lyrics'] = genius.search_song("VIOLA ", "Fedez").lyrics
song_lyrics_df.loc[2043, 'lyrics'] = genius.search_song("Boy With Luv ", "Halsey").lyrics
song_lyrics_df.loc[2105, 'lyrics'] = 'None'
song_lyrics_df.loc[2108, 'lyrics'] = 'None'
song_lyrics_df.loc[2113, 'lyrics'] = genius.search_song("Shut up My Moms Calling", "Hotel Ugly").lyrics
song_lyrics_df.loc[2175, 'lyrics'] = 'None'
song_lyrics_df.loc[2220, 'lyrics'] = genius.search_song("Fuentes De Ortiz", "Norteño Banda").lyrics
song_lyrics_df.loc[2357, 'lyrics'] = 'None'
song_lyrics_df.loc[2375, 'lyrics'] = 'None'
song_lyrics_df.loc[2427, 'lyrics'] = 'None'
song_lyrics_df.loc[2450, 'lyrics'] = genius.search_song("Go Your Own Way", "Fleetwood Mac").lyrics
song_lyrics_df.loc[2458, 'lyrics'] = 'None'
song_lyrics_df.loc[2460, 'lyrics'] = 'None'
song_lyrics_df.loc[2463, 'lyrics'] = 'None'
song_lyrics_df.loc[2539, 'lyrics'] = 'None'
song_lyrics_df.loc[2652, 'lyrics'] = genius.search_song("Como Antes", "Matias").lyrics
song_lyrics_df.loc[2741, 'lyrics'] = 'None'
song_lyrics_df.loc[2756, 'lyrics'] = 'None'
song_lyrics_df.loc[2763, 'lyrics'] = genius.search_song("Those Eyes", "New West").lyrics
song_lyrics_df.loc[2809, 'lyrics'] = 'None'
song_lyrics_df.loc[2837, 'lyrics'] = 'None'
song_lyrics_df.loc[2876, 'lyrics'] = 'None'
song_lyrics_df.loc[2944, 'lyrics'] = 'None'
song_lyrics_df.loc[3039, 'lyrics'] = 'None'
song_lyrics_df.loc[3077, 'lyrics'] = 'None'
song_lyrics_df.loc[3093, 'lyrics'] = 'None'
song_lyrics_df.loc[3109, 'lyrics'] = 'None'
song_lyrics_df.loc[3132, 'lyrics'] = genius.search_song("Where Is My Mind", "Pixies").lyrics
song_lyrics_df.loc[3207, 'lyrics'] = 'None'
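The cells above re-query Genius one row at a time and read `.lyrics` directly from the result. A minimal sketch of a more defensive version is shown below, assuming that `lyricsgenius`' `Genius.search_song(title, artist)` can return `None` when it finds no match (in which case `.lyrics` would raise `AttributeError`). `fetch_lyrics`, `apply_corrections`, and `fake_search` are hypothetical helpers; `fake_search` only stands in for `genius.search_song` so the example runs offline.

```python
from types import SimpleNamespace

import pandas as pd


def fetch_lyrics(search, title, artist, placeholder='None'):
    """Return lyrics from a search callable, or a placeholder when nothing is found.

    `search` is assumed to behave like genius.search_song: it returns an object
    with a `.lyrics` attribute, or None when there is no match.
    """
    song = search(title, artist)
    return song.lyrics if song is not None else placeholder


def apply_corrections(df, search, fixes, missing):
    """Return a copy of df with lyrics overwritten row by row via .loc.

    fixes:   {row_label: (title, artist)} rows to re-query
    missing: row labels whose lyrics should be marked unavailable
    """
    df = df.copy()
    for label, (title, artist) in fixes.items():
        df.loc[label, 'lyrics'] = fetch_lyrics(search, title, artist)
    df.loc[list(missing), 'lyrics'] = 'None'
    return df


def fake_search(title, artist):
    # Offline stand-in: only one title "exists" in this fake catalogue
    if title == 'Where Is My Mind':
        return SimpleNamespace(lyrics=f'{title} by {artist}')
    return None


demo = pd.DataFrame({'lyrics': ['wrong', 'wrong', 'wrong']}, index=[3132, 3207, 42])
fixed = apply_corrections(demo, fake_search,
                          fixes={3132: ('Where Is My Mind', 'Pixies'), 42: ('Unknown', 'Nobody')},
                          missing=[3207])
print(fixed['lyrics'].tolist())
```

Keeping the corrections in a dictionary and a list also makes it easy to review which rows were changed and why.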

In [432]:
#print(song_lyrics_df.loc[3207] )
#print(song_lyrics_df['lyrics'][3207]) 

In [433]:
#genius.search_song("YOU ", "AK49").lyrics 

In [455]:
#get_song_lyrics(song_lyrics_df['artist_names'][287], song_lyrics_df['track_name'][287], access_token_genius)

In [276]:
#top20_lyrics = pd.read_csv('../Data/song_lyrics_orig.csv')
#top20_lyrics['lyrics'] = top20_lyrics.lyrics.astype(str)
#top20_lyrics['lyrics']=top20_lyrics['lyrics'].replace('None',' ')
#top20_lyrics = top20_lyrics.reset_index(drop=True)
#top20_lyrics

In [465]:
# SONGS WITH UNAVAILABLE LYRICS 
song_lyrics_df.loc[song_lyrics_df['lyrics'] == 'None'] 

Unnamed: 0,track_id,artist_names,track_name,lyrics
6,3l6K9SW5VFJyA5jBtioFFt,3GAR BABY,HUSTLE NA MUST,
24,0CtZpaOhtzvLV3FfcsVpQo,"Vishal-Shekhar, Shilpa Rao, Caralisa Monteiro,...","Besharam Rang (From ""Pathaan"")",
31,6FAYpZ4jve8vpvTwUvjK6H,"Vishal-Shekhar, Arijit Singh, Sukriti Kakar, V...",Jhoome Jo Pathaan,
76,72zHuDxFQTjbL51qJQSA7j,"Jasleen Royal, B Praak, Romy, Anvita Dutt","Ranjha (From ""Shershaah"")",
139,3eUtQSdde3wNmXOW2OESKi,"El Polaco, La China",Ya No Quiero Verte,
...,...,...,...,...
3169,27fqy8VruqYZlKiK1qfwEd,"tlinh, 2pillz, Wokeupat4am",ghệ iu dấu của em ơi,
3170,3ukrFH17Zl6iEZ2QJ1Zwiy,"RPT Orijinn, Ronboogz",Don't Côi,
3198,3wUp8eCTshIrJcYbjWaoyP,Phuong Ly,ThichThich,
3207,2fjqdDz6jJn6VPgrSDDMvp,"GPG msmy, AK49",YOU (feat. AK49),


### Refine lyrics

**Remove special characters and non-lyric lines/words**

In [466]:
# CLEAN LYRICS
lyrics = song_lyrics_df['lyrics'].astype(str)                  # convert to string type
lyrics = lyrics.str.replace(r'^.*?Lyrics', '', regex=True)      # remove header ending with 'Lyrics' (first line only)
for token in ['Embed', 'Lyrics', 'I32']:                        # leftover Genius page artifacts, e.g. at the end of some lyrics
    lyrics = lyrics.str.replace(token, '', regex=False)
lyrics = lyrics.str.replace(r'\n{2,}', '\n', regex=True)        # collapse runs of blank lines
lyrics = lyrics.str.replace(r' {2,}', ' ', regex=True)          # collapse runs of spaces
song_lyrics_df['lyrics'] = lyrics.str.strip()                   # remove leading/trailing whitespace
song_lyrics_df

Unnamed: 0,track_id,artist_names,track_name,lyrics
0,0yLdNVWF3Srea0uzk55zFn,Miley Cyrus,Flowers,"We were good, we were gold\nKinda dream that c..."
1,1Qrg8KqiBpW07V7PNxwwwL,SZA,Kill Bill,I'm still a fan even though I was salty\nHate ...
2,6AQbmUe0Qwf5PZnt4HmTXv,"PinkPantheress, Ice Spice",Boy's a liar Pt. 2,Take a look inside your heart\nIs there any ro...
3,0WtM2NBVQNNJLh6scP13H8,"Rema, Selena Gomez",Calm Down (with Selena Gomez),"Vibez\nOh, no\nAnother banger\nBaby, calm down..."
4,2dHHgzDwk4BJdRwy9uXhTO,"Metro Boomin, The Weeknd, 21 Savage",Creepin' (with The Weeknd & 21 Savage),"Ooh, ooh-ooh\nOoh-ooh-ooh, ooh, ooh-ooh (Just ..."
...,...,...,...,...
3284,0oNkR5J4qmQxNVwLeA55y7,"Sjava, Nontokozo Mkhize",Thixo,Thixo Bawo\nNgabe uyong’bona na?\nEmasangweni ...
3285,4EI8VuxUuIHKfafU72emqz,Mariah Carey,We Belong Together,"Sweet love, yeah\nI didn't mean it when I said..."
3286,3Puq6i4xIRH4lrPvJxIC83,"Deep London, Nkosazana Daughter, Murumba Pitch...",Piano Ngijabulise,Okokuqala ukuhlakanipha\nUkumesaba uJehova\nAy...
3287,7DQMBUK4oX9gV1qIzpoRz6,Aymos,Mama,Mama mama mama mama\nMama mama mama mama\nMama...
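To check the cleaning rules on a single string, the steps above can be wrapped in one function. This is a sketch: `clean_lyrics` is a hypothetical helper, it collapses runs of blank lines and spaces with regular expressions, and the sample string is made up to resemble a Genius result.

```python
import re


def clean_lyrics(text: str) -> str:
    """Apply the lyric-cleaning steps to one string."""
    text = re.sub(r'^.*?Lyrics', '', text)     # drop the header ending in 'Lyrics' ('.' stops at the first newline)
    for token in ('Embed', 'Lyrics', 'I32'):   # Genius artifacts seen in this dataset
        text = text.replace(token, '')
    text = re.sub(r'\n{2,}', '\n', text)       # collapse runs of blank lines
    text = re.sub(r' {2,}', ' ', text)         # collapse runs of spaces
    return text.strip()


sample = "Flowers Lyrics We were good, we were gold\n\n\nKinda dream  that can't be sold Embed"
print(clean_lyrics(sample))
```

Applied to the whole column, this would be `song_lyrics_df['lyrics'].astype(str).map(clean_lyrics)`. Note that the global `'Lyrics'` removal also deletes the word if it occurs inside the actual lyrics.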


**Remove numbers from lyrics**


In [467]:
# REMOVE NUMBER CHARACTERS FROM LYRICS 
song_lyrics_df['lyrics'] = [re.sub(r'\d+', '', i) for i in song_lyrics_df['lyrics']]

### Lyrics translation

To simplify working with the lyrics data, all non-English lyrics are translated into English.

**Issues encountered:**
- Intermittent errors with `googletrans==4.0.0-rc1`:
    - Errors reporting a `NoneType` object even though the input was valid (resolved by switching to `googletrans==3.1.0a0`)
    - Runtime errors (resolved by pausing between requests with `time.sleep`)
    - `AttributeError: 'Translator' object has no attribute 'raise_exception'` (resolved by switching to `googletrans==3.1.0a0`)
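The `time.sleep` workaround can be generalized into a retry with exponential backoff. The sketch below is an alternative, not what this notebook runs: `call_with_retry` and `flaky_translate` are hypothetical names, and which exception types are worth retrying is an assumption that should be narrowed to the errors actually observed.

```python
import time


def call_with_retry(func, *args, retries=3, base_delay=0.5, retry_on=(Exception,), **kwargs):
    """Call func(*args, **kwargs), retrying with exponentially growing pauses on failure."""
    for attempt in range(retries + 1):
        try:
            return func(*args, **kwargs)
        except retry_on:
            if attempt == retries:
                raise                                # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)    # e.g. 0.5 s, 1 s, 2 s, ...


# Offline demo: a stand-in for a translation call that fails twice, then succeeds
calls = {'n': 0}


def flaky_translate(text, dest='en'):
    calls['n'] += 1
    if calls['n'] < 3:
        raise RuntimeError('transient failure')
    return text.upper()


print(call_with_retry(flaky_translate, 'hola', dest='en', base_delay=0.01))
```

In the notebook it could wrap a call such as `call_with_retry(translator.translate, text=s, dest='en')`, while keeping the fixed pause between songs.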


In [417]:
!pip3 install googletrans==3.1.0a0
#!pip3 install googletrans==4.0.0-rc1



In [468]:
# IMPORT DEPENDENCIES
import time  # pause between translation requests
from googletrans import Translator

In [None]:
# DEFINE TRANSLATOR OBJECT
translator = Translator()

In [469]:
def split_list(ls):
    '''Split a list into 3 consecutive parts of roughly equal length (the last part absorbs any remainder).'''
    t = len(ls)//3
    return ls[:t], ls[t:t*2], ls[t*2:]

In [470]:
def translate_lyrics(songs):
    '''Translate each song's lyrics to English; 'None' placeholders pass through unchanged.'''
    translated = []
    for n, s in enumerate(songs):
        if s != 'None':
            if len(s) > 5000:  # avoid request errors by splitting long lyrics into 3 parts
                parts = split_list(s.splitlines())
                # join with '\n' (not ' ') so line breaks survive inside each part and between parts
                chunks = ['\n'.join(p) for p in parts]
                translated.append('\n'.join(translator.translate(text=c, dest='en').text for c in chunks))
            else:
                translated.append(translator.translate(text=s, dest='en').text)
        else:
            translated.append(s)
        print('Song ' + str(n) + ' translated.')
        time.sleep(0.5)  # pause between requests to avoid the runtime errors noted above
    return translated
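Splitting long lyrics into three parts with an equal number of lines can still produce a part longer than 5000 characters when a song is very long. A sketch of an alternative that packs whole lines into newline-joined chunks under a character budget is shown below; `chunk_lines` is a hypothetical helper, and the 5000 default simply mirrors the threshold used in `translate_lyrics`.

```python
def chunk_lines(lines, max_chars=5000):
    """Greedily pack lines into newline-joined chunks of at most max_chars characters.

    A single line longer than max_chars becomes its own (oversized) chunk.
    """
    chunks, current, size = [], [], 0
    for line in lines:
        extra = len(line) + (1 if current else 0)  # +1 for the '\n' separator
        if current and size + extra > max_chars:
            chunks.append('\n'.join(current))      # flush the full chunk
            current, size = [], 0
            extra = len(line)
        current.append(line)
        size += extra
    if current:
        chunks.append('\n'.join(current))
    return chunks


lines = ['aaaa', 'bbbb', 'cccc', 'dd']
print(chunk_lines(lines, max_chars=9))
```

Because chunks are joined with `'\n'`, rejoining the translated chunks with `'\n'` keeps the original line structure, for example `'\n'.join(translator.translate(text=c, dest='en').text for c in chunk_lines(s.splitlines()))`.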

In [471]:
# TRANSLATE ALL LYRICS TO ENGLISH 
song_lyrics_df['lyrics_trans'] = translate_lyrics(song_lyrics_df['lyrics'])


Song 0 translated.
Song 1 translated.
Song 2 translated.
...
Song 3078 translated.
Song 3079 translated.

**Remove punctuation**

Punctuation and line breaks are removed only after translation, because they help googletrans produce more accurate translations. For example, `\n` splits the text into separate lines, and the translations appear more accurate when the translator handles each line as a unit rather than one continuous run of words.


In [472]:
# REMOVE PUNCTUATION
song_lyrics_df['lyrics_trans'] = song_lyrics_df['lyrics_trans'].str.replace('\n', ' ')  # line breaks -> spaces
song_lyrics_df['lyrics_trans'] = [re.sub(r'[^\w\s]', '', i) for i in song_lyrics_df['lyrics_trans']]  # drop punctuation
song_lyrics_df['lyrics_trans'] = song_lyrics_df['lyrics_trans'].str.lower()  # lowercase
song_lyrics_df['lyrics_trans']

0       we were good we were gold kinda dream that can...
1       im still a fan even though i was salty hate to...
2       take a look inside your heart is there any roo...
3       vibez oh no another banger baby calm down calm...
4       ooh oohooh oohoohooh ooh oohooh just cant beli...
                              ...                        
3284    your god are you going to see me at the gates ...
3285    sweet love yeah i didnt mean it when i said i ...
3286    first is wisdom to fear jehovah they dont hear...
3287    mother mother mother mother mother mother moth...
3288    okay lets go dude i know the work im going to ...
Name: lyrics_trans, Length: 3289, dtype: object
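The same three steps, applied to one string, show what survives. In Python 3, `\w` matches Unicode letters, so accented and non-Latin characters are kept, while apostrophes (including the curly `’`) are dropped, which is why "I'm" becomes "im" in the output above. `normalize_translation` is a hypothetical helper name used only for this sketch.

```python
import re


def normalize_translation(text: str) -> str:
    """Line breaks -> spaces, drop punctuation, lowercase."""
    text = text.replace('\n', ' ')
    text = re.sub(r'[^\w\s]', '', text)  # keeps Unicode letters/digits/underscore and whitespace
    return text.lower()


print(normalize_translation("Don't Côi?\nNgabe uyong’bona na?"))
```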

In [473]:
song_lyrics_df

Unnamed: 0,track_id,artist_names,track_name,lyrics,lyrics_trans
0,0yLdNVWF3Srea0uzk55zFn,Miley Cyrus,Flowers,"We were good, we were gold\nKinda dream that c...",we were good we were gold kinda dream that can...
1,1Qrg8KqiBpW07V7PNxwwwL,SZA,Kill Bill,I'm still a fan even though I was salty\nHate ...,im still a fan even though i was salty hate to...
2,6AQbmUe0Qwf5PZnt4HmTXv,"PinkPantheress, Ice Spice",Boy's a liar Pt. 2,Take a look inside your heart\nIs there any ro...,take a look inside your heart is there any roo...
3,0WtM2NBVQNNJLh6scP13H8,"Rema, Selena Gomez",Calm Down (with Selena Gomez),"Vibez\nOh, no\nAnother banger\nBaby, calm down...",vibez oh no another banger baby calm down calm...
4,2dHHgzDwk4BJdRwy9uXhTO,"Metro Boomin, The Weeknd, 21 Savage",Creepin' (with The Weeknd & 21 Savage),"Ooh, ooh-ooh\nOoh-ooh-ooh, ooh, ooh-ooh (Just ...",ooh oohooh oohoohooh ooh oohooh just cant beli...
...,...,...,...,...,...
3284,0oNkR5J4qmQxNVwLeA55y7,"Sjava, Nontokozo Mkhize",Thixo,Thixo Bawo\nNgabe uyong’bona na?\nEmasangweni ...,your god are you going to see me at the gates ...
3285,4EI8VuxUuIHKfafU72emqz,Mariah Carey,We Belong Together,"Sweet love, yeah\nI didn't mean it when I said...",sweet love yeah i didnt mean it when i said i ...
3286,3Puq6i4xIRH4lrPvJxIC83,"Deep London, Nkosazana Daughter, Murumba Pitch...",Piano Ngijabulise,Okokuqala ukuhlakanipha\nUkumesaba uJehova\nAy...,first is wisdom to fear jehovah they dont hear...
3287,7DQMBUK4oX9gV1qIzpoRz6,Aymos,Mama,Mama mama mama mama\nMama mama mama mama\nMama...,mother mother mother mother mother mother moth...


## Final aggregated data

In [475]:
# SELECT SPECIFIC FEATURES FROM EACH DATAFRAME
top = top_region[['track_id', 'artist_names','track_name','source', 'rank', 
                      'peak_rank','previous_rank','weeks_on_chart','streams','country']]
audio = audiofeats_df[['id', 'danceability','energy','key','loudness','mode',
                       'speechiness','acousticness','instrumentalness','liveness',
                       'valence','tempo','duration_ms','time_signature']]

In [477]:
# COMBINE TOP HITS AND AUDIO FEATURES SIDE BY SIDE (rows are aligned by index)
merged = pd.concat([top, audio], axis = 1)
merged

Unnamed: 0,track_id,artist_names,track_name,source,rank,peak_rank,previous_rank,weeks_on_chart,streams,country,...,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature
0,0yLdNVWF3Srea0uzk55zFn,Miley Cyrus,Flowers,Columbia,1,1,1,5,124198,United Arab Emirates,...,-4.325,1.0,0.0668,0.0632,0.000005,0.0322,0.646,117.999,200455.0,4.0
1,1Qrg8KqiBpW07V7PNxwwwL,SZA,Kill Bill,Top Dawg Entertainment/RCA Records,2,1,2,10,106927,United Arab Emirates,...,-5.747,1.0,0.0391,0.0521,0.144000,0.1610,0.418,88.980,153947.0,4.0
2,6AQbmUe0Qwf5PZnt4HmTXv,"PinkPantheress, Ice Spice",Boy's a liar Pt. 2,Warner Records,3,3,59,2,83627,United Arab Emirates,...,-8.254,1.0,0.0500,0.2520,0.000128,0.2480,0.857,132.962,131013.0,4.0
3,0WtM2NBVQNNJLh6scP13H8,"Rema, Selena Gomez",Calm Down (with Selena Gomez),Mavin Records / Jonzing World,4,2,4,25,79714,United Arab Emirates,...,-5.206,1.0,0.0381,0.3820,0.000669,0.1140,0.802,106.999,239318.0,4.0
4,2dHHgzDwk4BJdRwy9uXhTO,"Metro Boomin, The Weeknd, 21 Savage",Creepin' (with The Weeknd & 21 Savage),Republic Records,5,1,3,11,79488,United Arab Emirates,...,-6.005,0.0,0.0484,0.4170,0.000000,0.0822,0.172,97.950,221520.0,4.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7295,7ErtOGQ9DwyQa3lwP77j4u,Ruger,Asiwaju,Columbia,96,96,130,4,54026,South Africa,...,-4.799,1.0,0.2400,0.6360,0.000005,0.1060,0.754,199.796,216000.0,4.0
7296,4EI8VuxUuIHKfafU72emqz,Mariah Carey,We Belong Together,Island Records,97,97,115,50,53828,South Africa,...,-7.918,1.0,0.0629,0.0264,0.000000,0.0865,0.767,139.987,201400.0,4.0
7297,3Puq6i4xIRH4lrPvJxIC83,"Deep London, Nkosazana Daughter, Murumba Pitch...",Piano Ngijabulise,Cycad Wave,98,37,85,14,53752,South Africa,...,-10.670,0.0,0.0628,0.0141,0.000823,0.0241,0.433,112.010,416037.0,4.0
7298,7DQMBUK4oX9gV1qIzpoRz6,Aymos,Mama,DJs Production,99,54,86,14,53733,South Africa,...,-13.865,0.0,0.0503,0.0166,0.136000,0.0895,0.314,113.008,450304.0,4.0
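`pd.concat(..., axis=1)` aligns rows by index label, so the cell above relies on `top` and `audio` having matching indexes in the same row order. A key-based merge on `track_id`/`id` does not depend on that. A minimal sketch with toy frames (the values are made up):

```python
import pandas as pd

# Same tracks, but the audio-feature rows come back in a different order
top = pd.DataFrame({'track_id': ['a', 'b'], 'rank': [1, 2]})
audio = pd.DataFrame({'id': ['b', 'a'], 'energy': [0.9, 0.1]})

# concat(axis=1) pairs rows by index label; with default RangeIndex that is by position
by_position = pd.concat([top, audio], axis=1)

# Key-based alignment: each chart row gets the features whose id matches its track_id
by_key = top.merge(audio, left_on='track_id', right_on='id', how='left')

print(by_position[['track_id', 'id', 'energy']])
print(by_key[['track_id', 'id', 'energy']])
```

If `audiofeats_df` holds one row per chart entry, the same track appears several times, so `audio.drop_duplicates('id')` would be needed before a key-based merge to avoid multiplying rows.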


In [497]:
# ADD ALBUM RELEASE INFO TO DATAFRAME 
merged = pd.concat([merged, album_release_df['release_date']], axis = 1).rename(columns={"release_date": "album_release_date"})
merged

Unnamed: 0,track_id,artist_names,track_name,source,rank,peak_rank,previous_rank,weeks_on_chart,streams,country,...,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature,album_release_date
0,0yLdNVWF3Srea0uzk55zFn,Miley Cyrus,Flowers,Columbia,1,1,1,5,124198,United Arab Emirates,...,1.0,0.0668,0.0632,0.000005,0.0322,0.646,117.999,200455.0,4.0,2023-01-13
1,1Qrg8KqiBpW07V7PNxwwwL,SZA,Kill Bill,Top Dawg Entertainment/RCA Records,2,1,2,10,106927,United Arab Emirates,...,1.0,0.0391,0.0521,0.144000,0.1610,0.418,88.980,153947.0,4.0,2022-12-08
2,6AQbmUe0Qwf5PZnt4HmTXv,"PinkPantheress, Ice Spice",Boy's a liar Pt. 2,Warner Records,3,3,59,2,83627,United Arab Emirates,...,1.0,0.0500,0.2520,0.000128,0.2480,0.857,132.962,131013.0,4.0,2023-02-03
3,0WtM2NBVQNNJLh6scP13H8,"Rema, Selena Gomez",Calm Down (with Selena Gomez),Mavin Records / Jonzing World,4,2,4,25,79714,United Arab Emirates,...,1.0,0.0381,0.3820,0.000669,0.1140,0.802,106.999,239318.0,4.0,2022-08-25
4,2dHHgzDwk4BJdRwy9uXhTO,"Metro Boomin, The Weeknd, 21 Savage",Creepin' (with The Weeknd & 21 Savage),Republic Records,5,1,3,11,79488,United Arab Emirates,...,0.0,0.0484,0.4170,0.000000,0.0822,0.172,97.950,221520.0,4.0,2022-12-02
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7295,7ErtOGQ9DwyQa3lwP77j4u,Ruger,Asiwaju,Columbia,96,96,130,4,54026,South Africa,...,1.0,0.2400,0.6360,0.000005,0.1060,0.754,199.796,216000.0,4.0,2022-11-14
7296,4EI8VuxUuIHKfafU72emqz,Mariah Carey,We Belong Together,Island Records,97,97,115,50,53828,South Africa,...,1.0,0.0629,0.0264,0.000000,0.0865,0.767,139.987,201400.0,4.0,2005
7297,3Puq6i4xIRH4lrPvJxIC83,"Deep London, Nkosazana Daughter, Murumba Pitch...",Piano Ngijabulise,Cycad Wave,98,37,85,14,53752,South Africa,...,0.0,0.0628,0.0141,0.000823,0.0241,0.433,112.010,416037.0,4.0,2022-09-30
7298,7DQMBUK4oX9gV1qIzpoRz6,Aymos,Mama,DJs Production,99,54,86,14,53733,South Africa,...,0.0,0.0503,0.0166,0.136000,0.0895,0.314,113.008,450304.0,4.0,2022-08-12
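The `album_release_date` column mixes precisions (for example `2005` next to `2023-01-13`); the Spotify Web API reports which one applies in the album object's `release_date_precision` field (`year`, `month`, or `day`). A sketch of two ways to normalize such values, on made-up data:

```python
import pandas as pd

release = pd.Series(['2023-01-13', '2005', '2022-09'])

# The first four characters are the year at every precision level
release_year = release.str[:4].astype(int)

# Full dates only where the value is day-precise (YYYY-MM-DD); others become NaT
release_day = pd.to_datetime(release.where(release.str.len() == 10), errors='coerce')

print(release_year.tolist())
print(release_day.isna().tolist())
```

A year column is enough for comparing old and new releases; the day-precise column keeps exact dates without inventing a month or day for coarser values.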


In [499]:
# ADD LYRICS TO DATAFRAME 
df=pd.merge(merged,song_lyrics_df, on=['track_id', 'artist_names', 'track_name'], how='left')
df

Unnamed: 0,track_id,artist_names,track_name,source,rank,peak_rank,previous_rank,weeks_on_chart,streams,country,...,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature,album_release_date,lyrics,lyrics_trans
0,0yLdNVWF3Srea0uzk55zFn,Miley Cyrus,Flowers,Columbia,1,1,1,5,124198,United Arab Emirates,...,0.0632,0.000005,0.0322,0.646,117.999,200455.0,4.0,2023-01-13,"We were good, we were gold\nKinda dream that c...",we were good we were gold kinda dream that can...
1,1Qrg8KqiBpW07V7PNxwwwL,SZA,Kill Bill,Top Dawg Entertainment/RCA Records,2,1,2,10,106927,United Arab Emirates,...,0.0521,0.144000,0.1610,0.418,88.980,153947.0,4.0,2022-12-08,I'm still a fan even though I was salty\nHate ...,im still a fan even though i was salty hate to...
2,6AQbmUe0Qwf5PZnt4HmTXv,"PinkPantheress, Ice Spice",Boy's a liar Pt. 2,Warner Records,3,3,59,2,83627,United Arab Emirates,...,0.2520,0.000128,0.2480,0.857,132.962,131013.0,4.0,2023-02-03,Take a look inside your heart\nIs there any ro...,take a look inside your heart is there any roo...
3,0WtM2NBVQNNJLh6scP13H8,"Rema, Selena Gomez",Calm Down (with Selena Gomez),Mavin Records / Jonzing World,4,2,4,25,79714,United Arab Emirates,...,0.3820,0.000669,0.1140,0.802,106.999,239318.0,4.0,2022-08-25,"Vibez\nOh, no\nAnother banger\nBaby, calm down...",vibez oh no another banger baby calm down calm...
4,2dHHgzDwk4BJdRwy9uXhTO,"Metro Boomin, The Weeknd, 21 Savage",Creepin' (with The Weeknd & 21 Savage),Republic Records,5,1,3,11,79488,United Arab Emirates,...,0.4170,0.000000,0.0822,0.172,97.950,221520.0,4.0,2022-12-02,"Ooh, ooh-ooh\nOoh-ooh-ooh, ooh, ooh-ooh (Just ...",ooh oohooh oohoohooh ooh oohooh just cant beli...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7295,7ErtOGQ9DwyQa3lwP77j4u,Ruger,Asiwaju,Columbia,96,96,130,4,54026,South Africa,...,0.6360,0.000005,0.1060,0.754,199.796,216000.0,4.0,2022-11-14,Cook that thing\nMan getting high till I fade ...,cook that thing man getting high till i fade o...
7296,4EI8VuxUuIHKfafU72emqz,Mariah Carey,We Belong Together,Island Records,97,97,115,50,53828,South Africa,...,0.0264,0.000000,0.0865,0.767,139.987,201400.0,4.0,2005,"Sweet love, yeah\nI didn't mean it when I said...",sweet love yeah i didnt mean it when i said i ...
7297,3Puq6i4xIRH4lrPvJxIC83,"Deep London, Nkosazana Daughter, Murumba Pitch...",Piano Ngijabulise,Cycad Wave,98,37,85,14,53752,South Africa,...,0.0141,0.000823,0.0241,0.433,112.010,416037.0,4.0,2022-09-30,Okokuqala ukuhlakanipha\nUkumesaba uJehova\nAy...,first is wisdom to fear jehovah they dont hear...
7298,7DQMBUK4oX9gV1qIzpoRz6,Aymos,Mama,DJs Production,99,54,86,14,53733,South Africa,...,0.0166,0.136000,0.0895,0.314,113.008,450304.0,4.0,2022-08-12,Mama mama mama mama\nMama mama mama mama\nMama...,mother mother mother mother mother mother moth...
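With `how='left'`, a key that appears twice in `song_lyrics_df` would silently duplicate the matching chart rows. pandas can check this during the merge: `validate='many_to_one'` raises `MergeError` if the right-hand keys are not unique, and `indicator=True` adds a `_merge` column recording which rows found a match. The sketch uses a single key and toy data for brevity; the notebook joins on `track_id`, `artist_names`, and `track_name`.

```python
import pandas as pd

chart = pd.DataFrame({'track_id': ['a', 'a', 'b', 'c'],
                      'country': ['AE', 'ZA', 'AE', 'VN']})
lyrics = pd.DataFrame({'track_id': ['a', 'b'], 'lyrics': ['la la', 'None']})

# Many chart rows may share one track, but each track must have at most one lyrics row
out = chart.merge(lyrics, on='track_id', how='left',
                  validate='many_to_one', indicator=True)
print(out[['track_id', 'country', 'lyrics', '_merge']])
```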


In [500]:
# EXPORT DATA 
df.to_csv('merged_final_top100.csv') 

In [501]:
# DISPLAY TRACKS WITH NO AVAILABLE LYRICS 
df.loc[df['lyrics']=='None']

Unnamed: 0,track_id,artist_names,track_name,source,rank,peak_rank,previous_rank,weeks_on_chart,streams,country,...,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature,album_release_date,lyrics,lyrics_trans
6,3l6K9SW5VFJyA5jBtioFFt,3GAR BABY,HUSTLE NA MUST,TGFG ENTERTAINMENT,7,7,-1,1,60525,United Arab Emirates,...,0.1000,0.01350,0.1610,0.865,104.996,152040.0,4.0,2023-02-10,,none
24,0CtZpaOhtzvLV3FfcsVpQo,"Vishal-Shekhar, Shilpa Rao, Caralisa Monteiro,...","Besharam Rang (From ""Pathaan"")",YRF Music,25,14,15,9,36508,United Arab Emirates,...,0.0587,0.00237,0.1540,0.649,115.997,258474.0,4.0,2022-12-12,,none
31,6FAYpZ4jve8vpvTwUvjK6H,"Vishal-Shekhar, Arijit Singh, Sukriti Kakar, V...",Jhoome Jo Pathaan,YRF Music,32,18,23,7,32846,United Arab Emirates,...,0.0964,0.00000,0.3310,0.616,104.964,208164.0,4.0,2022-12-22,,none
76,72zHuDxFQTjbL51qJQSA7j,"Jasleen Royal, B Praak, Romy, Anvita Dutt","Ranjha (From ""Shershaah"")",Sony Music Entertainment India Pvt. Ltd.,77,58,-1,29,20935,United Arab Emirates,...,0.4780,0.00000,0.0971,0.236,82.941,228855.0,4.0,2021-08-05,,none
141,3eUtQSdde3wNmXOW2OESKi,"El Polaco, La China",Ya No Quiero Verte,Columbia,42,23,43,16,1088456,Argentina,...,0.1360,0.00000,0.0444,0.604,139.875,165635.0,3.0,2022-10-28,,none
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7109,27fqy8VruqYZlKiK1qfwEd,"tlinh, 2pillz, Wokeupat4am",ghệ iu dấu của em ơi,Universal Music Indochina Distributed Labels,10,10,-1,1,311416,Vietnam,...,0.6020,0.02080,0.1720,0.276,105.076,205996.0,4.0,2023-02-10,,none
7110,3ukrFH17Zl6iEZ2QJ1Zwiy,"RPT Orijinn, Ronboogz",Don't Côi,Rapital,11,3,3,8,303515,Vietnam,...,0.8160,0.00000,0.1020,0.361,110.079,148880.0,4.0,2022-11-20,,none
7144,3wUp8eCTshIrJcYbjWaoyP,Phuong Ly,ThichThich,Phuong Ly,45,9,43,30,171778,Vietnam,...,0.5630,0.00000,0.0994,0.619,124.072,241935.0,4.0,2022-07-24,,none
7153,2fjqdDz6jJn6VPgrSDDMvp,"GPG msmy, AK49",YOU (feat. AK49),GePolyG,54,54,81,2,156338,Vietnam,...,0.8220,0.00000,0.1800,0.611,130.074,147692.0,4.0,2023-02-02,,none
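`df['lyrics'] == 'None'` finds only the placeholder set during lyric collection. A chart row that found no match in the left merge holds `NaN` instead and would not be listed. A sketch of a mask covering both cases, on a toy stand-in for `df`:

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({'track_name': ['ThichThich', 'Flowers', 'Unmatched track'],
                        'lyrics': ['None', 'We were good, we were gold', np.nan]})

# Missing = the 'None' placeholder OR NaN from an unmatched left merge
missing = df_demo['lyrics'].isna() | df_demo['lyrics'].eq('None')
print(df_demo.loc[missing, 'track_name'].tolist())
print(int(missing.sum()), 'of', len(df_demo), 'tracks have no lyrics')
```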
