This notebook requires **MusixMatch** and **Genius** API keys. Store them in a Python file named *api_keys.py* under the variables `musixmatch_api_key` and `genius_api_key`, respectively.

**However**, the Genius API requests executed in this notebook do not need a working API key; the key is only used to initialize the API object. Therefore, `genius_api_key` can be set either to a valid API key or to any arbitrary string.
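
For illustration, a minimal *api_keys.py* might look like the sketch below. Both values are placeholders rather than real keys, and the layout is only inferred from the two variable names given above.

```python
# api_keys.py (illustrative sketch; both values are placeholders, not real keys)

# Key used for the MusixMatch requests in this notebook.
musixmatch_api_key = "YOUR_MUSIXMATCH_API_KEY"

# Only used to initialize the Genius API object, so any string works here.
genius_api_key = "any-string"
```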

In [1]:
from tools import *

# Collect data from MusixMatch

## Get the parent genres from MusixMatch

This returns a list of JSON objects, each containing the `genre_id` and `genre_name` of one genre.

In [2]:
genres = get_genres()
len(genres)

48

Remove genres that are not useful for this project, such as genres tied to other countries' music (K-Pop, French Pop, Chinese) and 'Instrumental'.

In [3]:
to_remove_genres = [10, 27, 28, 29, 30, 51, 53, 1122, 1197, 100024, 50000064, 
                    50000061, 50000066, 50000068, 1232, 1243, 1262]

genres = [x for x in genres if x['genre_id'] not in to_remove_genres]
len(genres)

31
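
The drop from 48 to 31 genres matches the 17 IDs in `to_remove_genres`, which suggests every listed ID was found. A small sketch of how that check could be made explicit is shown below. The helper `filter_genres` is hypothetical and not part of `tools`; the ID/name pairs in the sample are taken from the tables later in this notebook, and `999` is a made-up ID.

```python
def filter_genres(genres, ids_to_remove):
    """Return (kept genres, IDs that matched nothing in the input)."""
    ids_to_remove = set(ids_to_remove)
    kept = [g for g in genres if g["genre_id"] not in ids_to_remove]
    present = {g["genre_id"] for g in genres}
    return kept, ids_to_remove - present


# Sample records only; the real list comes from get_genres().
sample = [
    {"genre_id": 10, "genre_name": "Singer/Songwriter"},
    {"genre_id": 14, "genre_name": "Pop"},
    {"genre_id": 21, "genre_name": "Rock"},
]

kept, unmatched = filter_genres(sample, [10, 999])  # 999 is a made-up ID
print([g["genre_name"] for g in kept])  # ['Pop', 'Rock']
print(unmatched)                         # {999}
```

An empty `unmatched` set would confirm that no ID in the removal list was mistyped.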

## Collect songs from each genre

The function `get_top_tracks()` returns a response whose `track_list` holds at most 100 JSON objects (tracks) per call, unless the `page_size` parameter says otherwise. This project aims to collect 200 songs per genre, so `max_pages` is set to 2. The function's default parameter values follow the project's scope:

> `f_track_release_group_first_release_date_min` = '20110101'  
> `f_track_release_group_first_release_date_max` = '20201231'  
> `f_lyrics_language` = 'tl'  
> `page_size` = 100  
> `s_track_rating` = 'desc'

Additionally, a genre is excluded from the DataFrame if fewer than 200 songs are collected for it.
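
As a rough sketch of how the defaults listed above could combine with per-call arguments, the snippet below builds a request URL without sending it. The helper `build_top_tracks_url` is hypothetical, and the `track.search` endpoint is an assumption: the notebook does not show which endpoint `get_top_tracks()` actually calls.

```python
from urllib.parse import urlencode

# Assumed endpoint; the notebook does not show what get_top_tracks() uses.
BASE_URL = "https://api.musixmatch.com/ws/1.1/track.search"

# Defaults copied from the list above.
DEFAULT_PARAMS = {
    "f_track_release_group_first_release_date_min": "20110101",
    "f_track_release_group_first_release_date_max": "20201231",
    "f_lyrics_language": "tl",
    "page_size": 100,
    "s_track_rating": "desc",
}


def build_top_tracks_url(f_music_genre_id, page, apikey, **overrides):
    """Hypothetical helper: merge defaults with per-call values into a URL."""
    params = {**DEFAULT_PARAMS, **overrides}
    params.update({"f_music_genre_id": f_music_genre_id, "page": page, "apikey": apikey})
    return BASE_URL + "?" + urlencode(params)


# Genre 14 is Pop in this notebook's data; the key is a placeholder.
url = build_top_tracks_url(f_music_genre_id=14, page=1, apikey="PLACEHOLDER")
print(url)
```

With `page_size` at 100 and two pages per genre, a genre can contribute at most 200 tracks, which is why the threshold and the maximum coincide.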

In [4]:
genres_collection = []
max_pages = 2

for genre in genres:
    print("Collecting songs from " + genre['genre_name'], ' ', end='')

    genre_tracks = []

    # Request pages 1..max_pages, stopping early if a page comes back empty
    for page in range(1, max_pages + 1):
        results = get_top_tracks(f_music_genre_id=genre['genre_id'], page=page, apikey=musixmatch_api_key)
        track_list = results['message']['body']['track_list']
        if not track_list:
            break
        genre_tracks.extend(track_list)

    print()
    # Keep only genres that filled every page (2 pages x 100 tracks = 200)
    if len(genre_tracks) == 200:
        genres_collection.append({
            'genre_name': genre['genre_name'],
            'genre_tracks': genre_tracks
        })

Collecting songs from Blues  
Collecting songs from Comedy  
Collecting songs from Children's Music  
Collecting songs from Classical  
Collecting songs from Country  
Collecting songs from Electronic  
Collecting songs from Holiday  
Collecting songs from Opera  
Collecting songs from Folk  
Collecting songs from Orchestral  
Collecting songs from Jazz  
Collecting songs from Marching Bands  
Collecting songs from Latin  
Collecting songs from New Age  
Collecting songs from Pop  
Collecting songs from R&B/Soul  
Collecting songs from Soundtrack  
Collecting songs from Dance  
Collecting songs from Hip Hop/Rap  
Collecting songs from World  
Collecting songs from Alternative  
Collecting songs from Rock  
Collecting songs from Christian & Gospel  
Collecting songs from Vocal  
Collecting songs from Reggae  
Collecting songs from Easy Listening  
Collecting songs from Fitness & Workout  
Collecting songs from Karaoke  
Collecting songs from Hip-Hop/Rap  
Collecting songs from Inspirati

The genres that met the requirement are the following:

In [5]:
for i in genres_collection:
    print(i['genre_name'], '  (', len(i['genre_tracks']), ' songs)', sep='')

Pop  (200 songs)
R&B/Soul  (200 songs)
Hip Hop/Rap  (200 songs)
Alternative  (200 songs)
Rock  (200 songs)
Christian & Gospel  (200 songs)


# Create the DataFrames

## Import existing CSV files to be updated, if available

In [12]:
try:
    pop_df = pd.read_csv('pop_music.csv').drop('Unnamed: 0', axis=1)
    pop_json = json.loads(pop_df.to_json(orient='records'))

    # Convert strings to list    
    for song in pop_json:
        song['genre_id'] = song['genre_id'][1:-1].split(', ')
        song['genre_names'] = [x[1:-1] for x in song['genre_names'][1:-1].split(', ')]

except: 
    pop_json = []

In [13]:
try:
    rbSoul_df = pd.read_csv('rbSoul_music.csv').drop('Unnamed: 0', axis=1)
    rbSoul_json = json.loads(rbSoul_df.to_json(orient='records'))
    
    # Convert strings to list
    for song in rbSoul_json:
        song['genre_id'] = song['genre_id'][1:-1].split(', ')
        song['genre_names'] = [x[1:-1] for x in song['genre_names'][1:-1].split(', ')]
        
except:
    rbSoul_json = []

In [14]:
try:
    hiphop_df = pd.read_csv('hiphop_music.csv').drop('Unnamed: 0', axis=1)
    hiphop_json = json.loads(hiphop_df.to_json(orient='records'))

    # Convert strings to list
    for song in hiphop_json:
        song['genre_id'] = song['genre_id'][1:-1].split(', ')
        song['genre_names'] = [x[1:-1] for x in song['genre_names'][1:-1].split(', ')]

except:
    hiphop_json = []

In [15]:
try:
    alt_df = pd.read_csv('alternative_music.csv').drop('Unnamed: 0', axis=1)
    alt_json = json.loads(alt_df.to_json(orient='records'))

    # Convert strings to list
    for song in alt_json:
        song['genre_id'] = song['genre_id'][1:-1].split(', ')
        song['genre_names'] = [x[1:-1] for x in song['genre_names'][1:-1].split(', ')]
        
except:
    alt_json = []

In [16]:
try:
    rock_df = pd.read_csv('rock_music.csv').drop('Unnamed: 0', axis=1)
    rock_json = json.loads(rock_df.to_json(orient='records'))

    # Convert strings to list
    for song in rock_json:
        song['genre_id'] = song['genre_id'][1:-1].split(', ')
        song['genre_names'] = [x[1:-1] for x in song['genre_names'][1:-1].split(', ')]

except:
    rock_json = []

In [17]:
try:
    christian_df = pd.read_csv('christian_music.csv').drop('Unnamed: 0', axis=1)
    christian_json = json.loads(christian_df.to_json(orient='records'))

    # Convert strings to list
    for song in christian_json:
        song['genre_id'] = song['genre_id'][1:-1].split(', ')
        song['genre_names'] = [x[1:-1] for x in song['genre_names'][1:-1].split(', ')]
        
except:
    christian_json = []
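
The six cells above rebuild the list columns by slicing off the brackets and splitting on `', '`. One possible alternative, sketched below, is to parse each cell with `ast.literal_eval`, which keeps IDs as integers and handles empty lists. The helper name `parse_list_cell` is hypothetical and is not part of `tools`.

```python
import ast


def parse_list_cell(cell):
    """Turn a stringified list such as "[34, 14]" back into a Python list."""
    # Missing cells arrive as None/NaN rather than str; treat them as empty.
    if not isinstance(cell, str):
        return []
    value = ast.literal_eval(cell)
    return list(value) if isinstance(value, (list, tuple)) else [value]


# Cell values as they would appear after a CSV round trip.
print(parse_list_cell("[34, 14]"))          # [34, 14]
print(parse_list_cell("['Music', 'Pop']"))  # ['Music', 'Pop']
print(parse_list_cell("[]"))                # []

# For comparison, slicing keeps IDs as strings and turns "[]" into [''].
print("[34, 14]"[1:-1].split(', '))         # ['34', '14']
print("[]"[1:-1].split(', '))               # ['']
```

A helper like this could be applied to both `genre_id` and `genre_names` in a single loop over the six CSV files instead of repeating the cell per genre.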

## Format the collected data from MusixMatch and obtain the lyrics

In [27]:
get_genre_songs(pop_json, genres_collection[0])
pop_df = pd.DataFrame(pop_json)
pop_df.to_csv('pop_music.csv')

print(pop_df['lyrics'].isna().sum(), "lyrics missing out of", len(pop_df), "in total.")
pop_df

20 lyrics missing out of 200 in total.


Unnamed: 0,track_id,track_name,artist_name,genre_id,genre_names,lyrics
0,163709256,Ikaw At Ako,Moira Dela Torre feat. Jason Marvin,[14],[Pop],Sabi nila\nBalang araw darating\nAng i'yong ta...
1,167116583,Hanggang Dito Na Lang,Jaya,[14],[Pop],'Di ko alam kung tama ba ito\nLilisan ako upan...
2,114473971,Ikaw,Yeng Constantino,[14],[Pop],Sa pagpatak ng bawat oras ay ikaw\nAng iniisip...
3,114797567,Chinito,Yeng Constantino,[14],[Pop],Napapansin mo ba\nKaya ang tulad ko\nKahit nas...
4,170308366,"Ba'T Gano'N? (Theme From the Movie ""Family His...",Miguel Tanfelix feat. Mikoy Morales & Jemwell ...,"[34, 14]","[Music, Pop]",
...,...,...,...,...,...,...
195,114724390,Wag Mo Akong Iwan Mag Isa,Angeline Quinto,[14],[Pop],Aalis ka na naman\nAt ako'y iiwan mo\nLagi kan...
196,116159495,Bitter Song,Callalily feat. Maysh Baay,[14],[Pop],"We are forced into world-wide brutality, leade..."
197,84193775,Muli,Kyla,[14],[Pop],"Sana Maulit Muli, Ang Mga Oras Nating Nakaraan..."
198,42006822,Muling Buksan Ang Puso,Erik Santos,[14],[Pop],Walang hindi man lang dumanas kailanman\nMagma...


In [20]:
get_genre_songs(rbSoul_json, genres_collection[1], 199)
rbSoul_df = pd.DataFrame(rbSoul_json)
rbSoul_df.to_csv('rbSoul_music.csv')

print(rbSoul_df['lyrics'].isna().sum(), "lyrics missing out of", len(rbSoul_df), "in total.")
rbSoul_df

73 lyrics missing out of 200 in total.


Unnamed: 0,track_id,track_name,artist_name,genre_id,genre_names,lyrics
0,159046130,Sa Susunod Na Lang,PDL feat. Skusta Clee & Yuri,[15],[R&B/Soul],Ha! Gusto ko lang naman makausap ka eh\nYou kn...
1,43600153,Bespren,Coach Jungee feat. Yeng Constantino,[15],[R&B/Soul],
2,189995224,Exchange Gift,ALLMO$T,[15],[R&B/Soul],"Naaalala mo pa ba ko\nOh, pwede bang bati na t..."
3,157573520,Rubberband,ALLMO$T feat. FTD,"[15, 18]","[R&B/Soul, Hip Hop/Rap]",'Di ba sabi mo noon\nWalang bibitaw saating da...
4,148559015,Paano Na,JP Bacallan feat. Because,[15],[R&B/Soul],Binigay kong lahat sayo\nInubos ko lahat ng pa...
...,...,...,...,...,...,...
195,193138261,Ako Na Lang,Alfie A,[15],[R&B/Soul],
196,191791217,Malaya (feat. Steven Peregrina),Nrprods,[15],[R&B/Soul],
197,207074132,Hagap,Jobe Derick feat. Emcy Grey,[],[],
198,202218443,Lakas Tama,Pieces,[15],[R&B/Soul],"F-F-F-Fractious Frank\n\nDolce Gabbana, jalois..."


In [21]:
get_genre_songs(hiphop_json, genres_collection[2])
hiphop_df = pd.DataFrame(hiphop_json)
hiphop_df.to_csv('hiphop_music.csv')

print(hiphop_df['lyrics'].isna().sum(), "lyrics missing out of", len(hiphop_df), "in total.")
hiphop_df

26 lyrics missing out of 200 in total.


Unnamed: 0,track_id,track_name,artist_name,genre_id,genre_names,lyrics
0,126834098,Onoff,Eevee,"[34, 18]","[Music, Hip Hop/Rap]",
1,125921810,lily,Eevee,[18],[Hip Hop/Rap],
2,169917164,Kailan Kaya Ako?,Kiara feat. Matthaios,"[34, 18]","[Music, Hip Hop/Rap]",Matthaios be wonderin'\n\nTinatanong ko sa sar...
3,160904741,Ewan Ko Ba,ALLMO$T feat. Crakky,[18],[Hip Hop/Rap],
4,177235378,Kahit Na,ALLMO$T feat. Roberto Bello,[18],[Hip Hop/Rap],"Ayan na naman\nNakasimangot, tampo’y ‘di maiwa..."
...,...,...,...,...,...,...
195,180460400,Paliparan,Ron Henley feat. Jameson,[18],[Hip Hop/Rap],'Di sa pagdadrama\nMas malungkot pa sa bagsak...
196,34534273,Inspirasyon,Shehyee,"[18, 14]","[Hip Hop/Rap, Pop]",Verse 1:\nAkala ko sa teleserye lang nangyayar...
197,73276010,Tayo Pa Kaya,Crazy As Pinoy,[18],[Hip Hop/Rap],
198,166252693,Guwantes,Nero,[18],[Hip Hop/Rap],Real G Remix\nKiubbah\nMany Malon\n\nYo me est...


In [22]:
get_genre_songs(alt_json, genres_collection[3])
alt_df = pd.DataFrame(alt_json)
alt_df.to_csv('alternative_music.csv')

print(alt_df['lyrics'].isna().sum(), "lyrics missing out of", len(alt_df), "in total.")
alt_df

21 lyrics missing out of 200 in total.


Unnamed: 0,track_id,track_name,artist_name,genre_id,genre_names,lyrics
0,206436336,Paubaya,Moira Dela Torre,[20],[Alternative],Saan nagsimulang magbago ang lahat\nKailan nun...
1,152025019,Buwan,Juan Karlos Labajo,[20],[Alternative],"[""\n\r\nAko'y sayo ikaw ay akin\nGanda mo sa p..."
2,78880323,Orange,Parokya Ni Edgar,"[1133, 20]","[Pop/Rock, Alternative]","Gusto kong kumain ng lemon\nKahit ano, kahit m..."
3,170029087,Pagtingin,Ben&Ben,"[20, 14]","[Alternative, Pop]",Dami pang gustong sabihin\nNgunit 'wag nalang ...
4,170029098,Araw-Araw,Ben&Ben,"[20, 14]","[Alternative, Pop]",Umaga na sa ating duyan\n'Wag nang mawawala\nU...
...,...,...,...,...,...,...
195,170321189,Stranded,Tanya Markova,[20],[Alternative],Lagi kitang naaalala\nNag-iisip kung bakit nga...
196,208861208,First Love,April Boy Regino,[20],[Alternative],"Gentlemen,\nA copy of the Address delivered by..."
197,130453548,Tahanan,Nica del Rosario,[20],[Alternative],(Verse 1)\nHuwag mangamba\nAlam kong ika’y pag...
198,205690739,Tinadhana Sa 'Yo,Zephanie,[20],[Alternative],"[""\n\r\nAnong meron s'yang\nWala sa akin\nBaki..."


In [23]:
get_genre_songs(rock_json, genres_collection[4])
rock_df = pd.DataFrame(rock_json)
rock_df.to_csv('rock_music.csv')

print(rock_df['lyrics'].isna().sum(), "lyrics missing out of", len(rock_df), "in total.")
rock_df

56 lyrics missing out of 200 in total.


Unnamed: 0,track_id,track_name,artist_name,genre_id,genre_names,lyrics
0,136802912,Kahit Di Mo Alam,December Avenue,[21],[Rock],Ipikit mo man ang iyong mata\n'Di pa rin naman...
1,84257639,Parang Mali,Siakol,[21],[Rock],['\n\r\nParang Mali\nKung kinausap mo lang siy...
2,104448893,Eroplanong Papel,December Avenue,[21],[Rock],"Sandali, 'wag kang mapupuno sa paghihirap\nDar..."
3,84257635,Basag,Siakol,[21],[Rock],['\n\r\nKung hindi na masaya\nKung hindi na ma...
4,161144075,Bangin,Mayonnaise,[21],[Rock],"I wiggle and wobble, bouncing that jelly, oh m..."
...,...,...,...,...,...,...
195,128488806,Agwat,Popoy Dela Cruz,"[10, 21]","[Singer/Songwriter, Rock]",
196,64424947,Tsismosa,Kley,[21],[Rock],
197,205097427,Pagitan,Papapeta,[21],[Rock],Sa pagitan ng araw at buwan\nSa pagitan ng sig...
198,175392195,Suntok Sa Buwan,Jaywalkers,[21],[Rock],


In [24]:
get_genre_songs(christian_json, genres_collection[5])
christian_df = pd.DataFrame(christian_json)
christian_df.to_csv('christian_music.csv')

print(christian_df['lyrics'].isna().sum(), "lyrics missing out of", len(christian_df), "in total.")
christian_df

178 lyrics missing out of 200 in total.


Unnamed: 0,track_id,track_name,artist_name,genre_id,genre_names,lyrics
0,201311852,Walang Katulad,Victory Worship,[22],[Christian & Gospel],Ika'y aking liwanag\nSa dilim ng landas\nSa ba...
1,73676809,Shine Upon the Philippines,Victory Worship,[22],[Christian & Gospel],"You are the light, You are the hope\nYou are t..."
2,189377564,Pagbabalik,Victory Worship,[22],[Christian & Gospel],"Ngayon ay aahon, at kakalimutan ang nakaraan\n..."
3,189377568,Tagumpay,Victory Worship,[22],[Christian & Gospel],"Sa sigaw ng alon, 'di mangangamba\nSa gitna ng..."
4,189377567,Maghari,Victory Worship,[22],[Christian & Gospel],"Sa gitna ng kaguluhan, ang tinig Mo ay hanap\n..."
...,...,...,...,...,...,...
195,74789949,Bakit Kailangan Mag-Pray?,Musikatha Kids,[22],[Christian & Gospel],
196,200856862,Tagpuan,Jay Tolentino,"[14, 22]","[Pop, Christian & Gospel]",
197,79505192,Tayo's Sumamba,Tony Rodeo,[22],[Christian & Gospel],
198,131978085,Biyayang Ganap,Faithmusic Manila,[22],[Christian & Gospel],
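
To put the six "lyrics missing" counts side by side, the short sketch below converts them into shares. The counts are copied from the printed outputs above; nothing is recomputed from the data itself.

```python
# Missing-lyrics counts as printed by the cells above (each genre has 200 tracks).
missing = {
    "Pop": 20,
    "R&B/Soul": 73,
    "Hip Hop/Rap": 26,
    "Alternative": 21,
    "Rock": 56,
    "Christian & Gospel": 178,
}
TRACKS_PER_GENRE = 200

for genre, count in missing.items():
    print(f"{genre}: {count / TRACKS_PER_GENRE:.1%} missing")

total_missing = sum(missing.values())
total_tracks = TRACKS_PER_GENRE * len(missing)
print(f"Overall: {total_missing} of {total_tracks} ({total_missing / total_tracks:.1%})")
```

Across all six genres this comes to 374 missing lyrics out of 1200 tracks, with Christian & Gospel accounting for 178 of them.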


## Compile all genres into one DataFrame

In [25]:
all_music_json = pop_json.copy()
all_music_json.extend(rbSoul_json)
all_music_json.extend(hiphop_json)
all_music_json.extend(alt_json)
all_music_json.extend(rock_json)
all_music_json.extend(christian_json)

In [26]:
all_music_df = pd.DataFrame(all_music_json)
all_music_df.to_csv('all_music.csv')
all_music_df

Unnamed: 0,track_id,track_name,artist_name,genre_id,genre_names,lyrics
0,163709256,Ikaw At Ako,Moira Dela Torre feat. Jason Marvin,[14],[Pop],Sabi nila\nBalang araw darating\nAng i'yong ta...
1,167116583,Hanggang Dito Na Lang,Jaya,[14],[Pop],'Di ko alam kung tama ba ito\nLilisan ako upan...
2,114473971,Ikaw,Yeng Constantino,[14],[Pop],Sa pagpatak ng bawat oras ay ikaw\nAng iniisip...
3,114797567,Chinito,Yeng Constantino,[14],[Pop],Napapansin mo ba\nKaya ang tulad ko\nKahit nas...
4,170308366,"Ba'T Gano'N? (Theme From the Movie ""Family His...",Miguel Tanfelix feat. Mikoy Morales & Jemwell ...,"[34, 14]","[Music, Pop]",
...,...,...,...,...,...,...
1195,74789949,Bakit Kailangan Mag-Pray?,Musikatha Kids,[22],[Christian & Gospel],
1196,200856862,Tagpuan,Jay Tolentino,"[14, 22]","[Pop, Christian & Gospel]",
1197,79505192,Tayo's Sumamba,Tony Rodeo,[22],[Christian & Gospel],
1198,131978085,Biyayang Ganap,Faithmusic Manila,[22],[Christian & Gospel],
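
Because some tracks carry more than one genre tag (for example `[Hip Hop/Rap, Pop]`), the same `track_id` could in principle appear in more than one genre list. Whether that actually happens in this dataset is not shown above, so the sketch below is only a possible sanity check on made-up records. The helper `find_duplicate_track_ids` is hypothetical.

```python
from collections import Counter


def find_duplicate_track_ids(*genre_lists):
    """Return the sorted track IDs that occur more than once across all lists."""
    counts = Counter(song["track_id"] for songs in genre_lists for song in songs)
    return sorted(track_id for track_id, n in counts.items() if n > 1)


# Made-up records purely for illustration.
pop_sample = [{"track_id": 1}, {"track_id": 2}]
hiphop_sample = [{"track_id": 2}, {"track_id": 3}]

print(find_duplicate_track_ids(pop_sample, hiphop_sample))  # [2]
```

In the notebook, the same call could take `pop_json`, `rbSoul_json`, and the other per-genre lists as arguments before they are merged.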
