This notebook requires **MusixMatch** and **Genius** API keys. Store them in a Python file named *api_keys.py* under the variables `musixmatch_api_key` and `genius_api_key`, respectively.

**However**, the Genius API requests executed in this notebook do not need a working API key; the key is only used to initialize the API object. Therefore, `genius_api_key` can be set either to a valid API key or to any arbitrary string.
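
As a minimal sketch, *api_keys.py* could look like the following. Both values are placeholders for illustration, not real keys.

```python
# api_keys.py
# Placeholder value (an assumption for illustration); replace it with your own key.
musixmatch_api_key = "YOUR_MUSIXMATCH_API_KEY"

# Only used to initialize the Genius API object in this notebook,
# so a valid key or any other string can go here.
genius_api_key = "any-string"
```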

In [1]:
from api_tools import *

# Collect data from MusixMatch

## Get the parent genres from MusixMatch

This returns a list of JSON objects, each containing the `genre_id` and `genre_name` of a genre.

In [2]:
genres = get_genres()
len(genres)

48

Remove genres that are not useful for this data science project, such as the music of other countries (e.g. K-Pop, French Pop, Chinese) and 'Instrumental'.

In [9]:
to_remove_genres = [3, 10, 12, 25, 27, 28, 29, 30, 51, 52, 53, 1122, 1197, 1290, 1291, 100024, 50000063, 50000064, 
                    50000061, 50000066, 50000068, 1232, 1243, 1262]

genres = [x for x in genres if x['genre_id'] not in to_remove_genres]
len(genres)

24

## Collect songs from each genre

The function `get_top_tracks()` returns a list of at most 100 JSON objects (tracks), unless a different `page_size` is specified.

This project aims to collect at least 200 lyrics for each genre, so three (3) pages are requested per genre to leave some allowance. The requests begin at page three to check that it is not empty (that is, the genre has more than 200 songs) and then decrement down to page one, which minimizes the number of API requests.

The default values of the function parameters follow the project's scope:

> `f_track_release_group_first_release_date_min` = '20110101'  
> `f_track_release_group_first_release_date_max` = '20201231'  
> `f_lyrics_language` = 'tl'  
> `page_size` = 100  
> `s_track_rating` = 'desc'

Additionally, if fewer than 200 songs are collected for a genre, that genre is not included in the DataFrame.
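
The paging order described above can be sketched roughly as follows. This is not the code in `api_tools`: `collect_top_tracks`, `fetch_page`, and `make_fake_fetcher` are hypothetical names, and `fetch_page` stands in for whatever call returns one page of tracks. The sketch also assumes that an empty page three disqualifies the genre immediately, which is only one reading of the description.

```python
def collect_top_tracks(fetch_page, last_page=3, min_tracks=200):
    """Probe the last page first, then walk back to page one.

    fetch_page is a hypothetical callable: page number -> list of tracks.
    """
    probe = fetch_page(last_page)
    if not probe:
        # Assumption: an empty last page means too few songs,
        # so the genre is dropped after a single request.
        return []

    pages = {last_page: probe}
    for page in range(last_page - 1, 0, -1):  # page 2, then page 1
        pages[page] = fetch_page(page)

    # Reassemble the tracks in ascending page order.
    tracks = [track for page in sorted(pages) for track in pages[page]]
    return tracks if len(tracks) >= min_tracks else []


def make_fake_fetcher(total, page_size=100):
    """Build an in-memory stand-in for the API that records requested pages."""
    calls = []

    def fetch_page(page):
        calls.append(page)
        start = (page - 1) * page_size
        stop = min(start + page_size, total)
        return [{"track_id": i} for i in range(start, stop)]

    return fetch_page, calls


fetch_page, calls = make_fake_fetcher(total=250)
tracks = collect_top_tracks(fetch_page)
print(len(tracks), calls)  # 250 [3, 2, 1]
```

With a genre of only 150 tracks, the same fake fetcher returns an empty page three, so the sketch stops after one request and returns an empty list.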

The genres that met the requirement are the following:

In [2]:
with open('CSV Files/genres_collection.json') as infile:
    genres_collection = json.load(infile)

# Create the DataFrames

## Import existing CSV files to be updated, if available

In [28]:
try:
    pop_df = pd.read_csv('CSV Files/pop_music.csv').drop('Unnamed: 0', axis=1)
    pop_json = json.loads(pop_df.to_json(orient='records'))

    # Convert strings to list    
    for song in pop_json:
        song['genre_id'] = song['genre_id'][1:-1].split(', ')
        song['genre_names'] = [x[1:-1] for x in song['genre_names'][1:-1].split(', ')]

except FileNotFoundError:  # no saved CSV yet, so start with an empty list
    pop_json = []

In [14]:
try:
    rbSoul_df = pd.read_csv('CSV Files/rbSoul_music.csv').drop('Unnamed: 0', axis=1)
    rbSoul_json = json.loads(rbSoul_df.to_json(orient='records'))
    
    # Convert strings to list
    for song in rbSoul_json:
        song['genre_id'] = song['genre_id'][1:-1].split(', ')
        song['genre_names'] = [x[1:-1] for x in song['genre_names'][1:-1].split(', ')]
        
except FileNotFoundError:  # no saved CSV yet, so start with an empty list
    rbSoul_json = []

In [15]:
try:
    hiphop_df = pd.read_csv('CSV Files/hiphop_music.csv').drop('Unnamed: 0', axis=1)
    hiphop_json = json.loads(hiphop_df.to_json(orient='records'))

    # Convert strings to list
    for song in hiphop_json:
        song['genre_id'] = song['genre_id'][1:-1].split(', ')
        song['genre_names'] = [x[1:-1] for x in song['genre_names'][1:-1].split(', ')]

except FileNotFoundError:  # no saved CSV yet, so start with an empty list
    hiphop_json = []

In [16]:
try:
    alt_df = pd.read_csv('CSV Files/alternative_music.csv').drop('Unnamed: 0', axis=1)
    alt_json = json.loads(alt_df.to_json(orient='records'))

    # Convert strings to list
    for song in alt_json:
        song['genre_id'] = song['genre_id'][1:-1].split(', ')
        song['genre_names'] = [x[1:-1] for x in song['genre_names'][1:-1].split(', ')]
        
except FileNotFoundError:  # no saved CSV yet, so start with an empty list
    alt_json = []

In [17]:
try:
    rock_df = pd.read_csv('CSV Files/rock_music.csv').drop('Unnamed: 0', axis=1)
    rock_json = json.loads(rock_df.to_json(orient='records'))

    # Convert strings to list
    for song in rock_json:
        song['genre_id'] = song['genre_id'][1:-1].split(', ')
        song['genre_names'] = [x[1:-1] for x in song['genre_names'][1:-1].split(', ')]

except FileNotFoundError:  # no saved CSV yet, so start with an empty list
    rock_json = []

In [26]:
try:
    christian_df = pd.read_csv('CSV Files/christian_music_noAZLyrics.csv').drop('Unnamed: 0', axis=1)
    christian_json = json.loads(christian_df.to_json(orient='records'))

    # Convert strings to list
    for song in christian_json:
        song['genre_id'] = song['genre_id'][1:-1].split(', ')
        song['genre_names'] = [x[1:-1] for x in song['genre_names'][1:-1].split(', ')]
        
except FileNotFoundError:  # no saved CSV yet, so start with an empty list
    christian_json = []
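
The string-to-list conversion repeated in the cells above could also be written with `ast.literal_eval` from the standard library. The sketch below is an alternative, not the notebook's method: `parse_list_cell` is a hypothetical helper, and unlike the slicing approach it returns the genre IDs as integers rather than strings.

```python
import ast


def parse_list_cell(cell):
    """Turn a list that was saved to CSV as text back into a Python list."""
    if isinstance(cell, list):
        return cell  # already a list, nothing to convert
    return ast.literal_eval(cell)


# Example row shaped like the CSV contents shown in this notebook.
row = {"genre_id": "[1133, 20]", "genre_names": "['Pop/Rock', 'Alternative']"}
row = {key: parse_list_cell(value) for key, value in row.items()}
print(row)  # {'genre_id': [1133, 20], 'genre_names': ['Pop/Rock', 'Alternative']}
```

Because `literal_eval` parses the quoted names directly, a genre name that itself contains a comma would not be split in two.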

## Format the collected data from MusixMatch and obtain the lyrics

In [10]:
get_genre_songs(pop_json, genres_collection[0], 202)
pop_df = pd.DataFrame(pop_json)
pop_df.to_csv('CSV Files/pop_music.csv')

print(pop_df['lyrics'].isna().sum(), "lyrics missing out of", len(pop_df), "in total.")
pop_df

92 lyrics missing out of 300 in total.


Unnamed: 0,track_id,track_name,artist_name,genre_id,genre_names,lyrics
0,163709256,Ikaw At Ako,Moira Dela Torre feat. Jason Marvin,[14],[Pop],Sabi nila\nBalang araw darating\nAng iyong tan...
1,167116583,Hanggang Dito Na Lang,Jaya,[14],[Pop],Di ko alam kung tama ba ito.\nLilisan ako upan...
2,114473971,Ikaw,Yeng Constantino,[14],[Pop],Sa pagpatak ng bawat oras ay ikaw\nAng iniisip...
3,114797567,Chinito,Yeng Constantino,[14],[Pop],Mapapansin mo ba\nKaya ang tulad ko\nKahit nas...
4,160018049,Dati (Cover Version),Ben&Ben,[14],[Pop],Datirati sabay pa nating pinangarap ang lahat\...
...,...,...,...,...,...,...
295,116159495,Bitter Song,Callalily feat. Maysh Baay,[14],[Pop],Di ako bitter\nKung akala mo'y nasasaktan\nSa ...
296,146162702,Trio,IMAGO,[14],[Pop],
297,86029897,San Ka Galing Kagabi (Bonus Track),Mayonnaise,[14],[Pop],
298,157224822,Alam Na,IMAGO,[14],[Pop],


In [7]:
get_genre_songs(rbSoul_json, genres_collection[1])
rbSoul_df = pd.DataFrame(rbSoul_json)
rbSoul_df.to_csv('CSV Files/rbSoul_music.csv')

print(rbSoul_df['lyrics'].isna().sum(), "lyrics missing out of", len(rbSoul_df), "in total.")
rbSoul_df

176 lyrics missing out of 239 in total.


Unnamed: 0,track_id,track_name,artist_name,genre_id,genre_names,lyrics
0,182569161,Lykkelig (Acoustic),D.Sound,"[1066, 1057, 15]","[New Acoustic, Downtempo, R&B/Soul]",
1,159046130,Sa Susunod Na Lang,PDL feat. Skusta Clee & Yuri,[15],[R&B/Soul],Ha! Gusto ko lang naman makausap ka eh.\nYou k...
2,43600153,Bespren,Coach Jungee feat. Yeng Constantino,[15],[R&B/Soul],
3,189995224,Exchange Gift,ALLMO$T,[15],[R&B/Soul],"Naaalala mo pa ba ko\nOh, pwede bang bati na t..."
4,157573520,Rubberband,ALLMO$T feat. FTD,"[15, 18]","[R&B/Soul, Hip Hop/Rap]",'Di ba sabi mo noon\nWalang bibitaw saating da...
...,...,...,...,...,...,...
234,198107668,Bakit Wala Na (feat. Chano),Peter Miranda,[15],[R&B/Soul],Bakit wala na\nAno na nangyari?\nBakit wala na...
235,205521123,Maganda ang giseng,Eric Ruckus,[15],[R&B/Soul],
236,196719526,"Stardust, Vol. 2",Yalien Dahlen,[15],[R&B/Soul],
237,204549151,Pinaglayo (feat. Kxle & Kyra),Poisonhxrzy,[15],[R&B/Soul],


In [10]:
get_genre_songs(hiphop_json, genres_collection[2])
hiphop_df = pd.DataFrame(hiphop_json)
hiphop_df.to_csv('CSV Files/hiphop_music.csv')

print(hiphop_df['lyrics'].isna().sum(), "lyrics missing out of", len(hiphop_df), "in total.")
hiphop_df

184 lyrics missing out of 300 in total.


Unnamed: 0,track_id,track_name,artist_name,genre_id,genre_names,lyrics
0,126834098,Onoff,Eevee,"[34, 18]","[Music, Hip Hop/Rap]",
1,125921810,lily,Eevee,[18],[Hip Hop/Rap],
2,169917164,Kailan Kaya Ako?,Kiara feat. Matthaios,"[34, 18]","[Music, Hip Hop/Rap]",
3,160904741,Ewan Ko Ba,ALLMO$T feat. Crakky,[18],[Hip Hop/Rap],
4,177235378,Kahit Na,ALLMO$T feat. Roberto Bello,[18],[Hip Hop/Rap],"Ayan na naman\nNakasimangot, tampo’y ‘di maiwa..."
...,...,...,...,...,...,...
295,69144184,A.T. We On Parlay,AT feat. Antonio Maxie,[18],[Hip Hop/Rap],
296,155870686,Husgado,Apoc the Death Architect,[18],[Hip Hop/Rap],
297,133336466,Love,Leon Marin,[18],[Hip Hop/Rap],
298,115635434,Warum ich Musik mach,O.R.C.A,[18],[Hip Hop/Rap],


In [13]:
get_genre_songs(alt_json, genres_collection[3])
alt_df = pd.DataFrame(alt_json)
alt_df.to_csv('CSV Files/alternative_music.csv')

print(alt_df['lyrics'].isna().sum(), "lyrics missing out of", len(alt_df), "in total.")
alt_df

142 lyrics missing out of 300 in total.


Unnamed: 0,track_id,track_name,artist_name,genre_id,genre_names,lyrics
0,206436336,Paubaya,Moira Dela Torre,[20],[Alternative],Saan nagsimulang magbago ang lahat\nKailan nun...
1,152025019,Buwan,Juan Karlos Labajo,[20],[Alternative],"Ako'y sayo, ikaw ay akin\nGanda mo sa paningin..."
2,88489270,Tadhana,Up Dharma Down,[20],[Alternative],Sa hindi inaaasahang\nPagtatagpo ng mga mundo\...
3,78880323,Orange,Parokya Ni Edgar,"[1133, 20]","[Pop/Rock, Alternative]","Gusto kong kumain ng lemon,\nKahit ano, kahit ..."
4,170029087,Pagtingin,Ben&Ben,"[20, 14]","[Alternative, Pop]",Dami pang gustong sabihin\nNgunit wag nalang m...
...,...,...,...,...,...,...
295,129569951,Lutang,Ely Buendia feat. the itchyworms,[20],[Alternative],
296,188322731,Ipanumpa ko,Oh! Caraga,[20],[Alternative],
297,112026444,Silakbo,MilesExperience,[20],[Alternative],
298,178700922,Sigurado,Udd,"[7, 20]","[Electronic, Alternative]",Natatakot ikaw ay mawala sa aking tabi\nNamumu...


In [16]:
get_genre_songs(rock_json, genres_collection[4])
rock_df = pd.DataFrame(rock_json)
rock_df.to_csv('CSV Files/rock_music.csv')

print(rock_df['lyrics'].isna().sum(), "lyrics missing out of", len(rock_df), "in total.")
rock_df

212 lyrics missing out of 300 in total.


Unnamed: 0,track_id,track_name,artist_name,genre_id,genre_names,lyrics
0,136802912,Kahit Di Mo Alam,December Avenue,[21],[Rock],Ipikit mo man ang iyong mata\n'Di pa rin naman...
1,84257639,Parang Mali,Siakol,[21],[Rock],
2,104448893,Eroplanong Papel,December Avenue,[21],[Rock],"Sandali, 'wag kang mapupuno sa paghihirap\nDar..."
3,84257635,Basag,Siakol,[21],[Rock],
4,161144075,Bangin,Mayonnaise,[21],[Rock],
...,...,...,...,...,...,...
295,149147068,Huli,Viente,[21],[Rock],
296,110219982,Sayo'ng Sa'yo,Zoom Zoom Lunacy,[21],[Rock],
297,162636949,Antok,Agaw Agimat,[21],[Rock],
298,124514919,Lo-Fi,Pinoy Rock Station,[21],[Rock],


In [6]:
get_genre_songs(christian_json, genres_collection[5], 120)
christian_df = pd.DataFrame(christian_json)
christian_df.to_csv('CSV Files/christian_music_noAZLyrics.csv')

print(christian_df['lyrics'].isna().sum(), "lyrics missing out of", len(christian_df), "in total.")
christian_df

278 lyrics missing out of 293 in total.


Unnamed: 0,track_id,track_name,artist_name,genre_id,genre_names,lyrics
0,201311852,Walang Katulad,Victory Worship,[22],[Christian & Gospel],Ika'y aking liwanag\nSa dilim ng landas\nSa ba...
1,73676809,Shine Upon the Philippines,Victory Worship,[22],[Christian & Gospel],"You are the light, You are the hope\nYou are t..."
2,189377564,Pagbabalik,Victory Worship,[22],[Christian & Gospel],"Ngayon ay aahon, at kakalimutan ang nakaraan\n..."
3,189377568,Tagumpay,Victory Worship,[22],[Christian & Gospel],"Sa sigaw ng alon, 'di mangangamba\nSa gitna ng..."
4,189377567,Maghari,Victory Worship,[22],[Christian & Gospel],"Sa gitna ng kaguluhan, ang tinig Mo ay hanap\n..."
...,...,...,...,...,...,...
288,83929781,Gagawa Ang Diyos (God Will Make a Way),Lito Magnaye,[22],[Christian & Gospel],
289,79505181,Ang Panginoon Ay Awitan,Tony Rodeo,[22],[Christian & Gospel],
290,136863016,Ikaw Lamang (Live),Rommel Guevara,[22],[Christian & Gospel],
291,78391977,Mga Pangako Mo (Live),Jesus One Generation,[22],[Christian & Gospel],


### Check for Lyrics with Translations

In [29]:
withTrans = []
music_dfs = [pop_df, rbSoul_df, hiphop_df, alt_df, rock_df, christian_df]

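for df in music_dfs:
    # Rows whose lyrics mention "translation" (case-insensitive);
    # na=False treats missing lyrics as non-matches.
    has_translation = df['lyrics'].str.contains("translation", case=False, na=False)
    withTrans.append(df.index[has_translation].tolist())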
    
withTrans

[[43, 70], [], [], [], [], []]

### Manually remove translated sections

In [20]:
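# Pop

# Row 70: keep only the text between the 'Tagalog' header and the 'English Translation' section
lyrics = pop_df.loc[70, 'lyrics']
start = lyrics.index('Tagalog') + len('Tagalog')
pop_df.loc[70, 'lyrics'] = lyrics[start:lyrics.index('English Translation', 20)]

# Row 43: drop everything from the 'English Translation' section onward
lyrics = pop_df.loc[43, 'lyrics']
pop_df.loc[43, 'lyrics'] = lyrics[:lyrics.index('English Translation')]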
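
If more translated lyrics turn up, the simpler of the two cases above (cutting at the translation header) could be wrapped in a small helper. This is only a sketch: `strip_translation` is a hypothetical function, the default marker is assumed to match how the header appears in the lyrics, and the sample text is made up.

```python
def strip_translation(lyrics, marker="English Translation"):
    """Return the lyrics up to the marker, or unchanged if it is absent."""
    cut = lyrics.find(marker)
    if cut == -1:
        return lyrics
    # Also trim the blank lines left just before the marker.
    return lyrics[:cut].rstrip()


sample = "Unang linya\nIkalawang linya\n\nEnglish Translation\nFirst line\nSecond line"
print(strip_translation(sample))
```

Unlike `str.index`, `str.find` returns -1 instead of raising when the marker is missing, so the helper can be applied to every row safely.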

## Compile all genres into one DataFrame

In [30]:
all_music_json = pop_json.copy()
all_music_json.extend(rbSoul_json)
all_music_json.extend(hiphop_json)
all_music_json.extend(alt_json)
all_music_json.extend(rock_json)
all_music_json.extend(christian_json)

In [31]:
all_music_df = pd.DataFrame(all_music_json)
all_music_df.to_csv('CSV Files/all_music.csv')
all_music_df

Unnamed: 0,track_id,track_name,artist_name,genre_id,genre_names,lyrics
0,163709256,Ikaw At Ako,Moira Dela Torre feat. Jason Marvin,[14],[Pop],Sabi nila\nBalang araw darating\nAng iyong tan...
1,167116583,Hanggang Dito Na Lang,Jaya,[14],[Pop],Di ko alam kung tama ba ito.\nLilisan ako upan...
2,114473971,Ikaw,Yeng Constantino,[14],[Pop],Sa pagpatak ng bawat oras ay ikaw\nAng iniisip...
3,114797567,Chinito,Yeng Constantino,[14],[Pop],Mapapansin mo ba\nKaya ang tulad ko\nKahit nas...
4,160018049,Dati (Cover Version),Ben&Ben,[14],[Pop],Datirati sabay pa nating pinangarap ang lahat\...
...,...,...,...,...,...,...
1727,83929781,Gagawa Ang Diyos (God Will Make a Way),Lito Magnaye,[22],[Christian & Gospel],
1728,79505181,Ang Panginoon Ay Awitan,Tony Rodeo,[22],[Christian & Gospel],
1729,136863016,Ikaw Lamang (Live),Rommel Guevara,[22],[Christian & Gospel],
1730,78391977,Mga Pangako Mo (Live),Jesus One Generation,[22],[Christian & Gospel],


In [72]:
print("Number of Songs with Lyrics")
print()

print("Pop:                ", pop_df['lyrics'].notnull().sum())
print("R&B / Soul:         ", rbSoul_df['lyrics'].notnull().sum())
print("Hiphop:             ", hiphop_df['lyrics'].notnull().sum())
print("Alternative:        ", alt_df['lyrics'].notnull().sum())
print("Rock:               ", rock_df['lyrics'].notnull().sum())
print("Christian & Gospel: ", christian_df['lyrics'].notnull().sum())
print("Total:              ", all_music_df['lyrics'].notnull().sum())

Number of Songs with Lyrics

Pop:                 179
R&B / Soul:          56
Hiphop:              110
Alternative:         139
Rock:                78
Christian & Gospel:  12
Total:               574
