# **MATH2020 - Discrete Mathematics - Mini Project**
##### **Project title**: A lyric-based genre classification – A Naïve Bayes approach from naïve students.
##### **Team**: Grindset
##### **Members**
- Nguyễn Đại Nghĩa - 20nghia.nd@vinuni.edu.vn
- Bùi Đức Khánh An – 20an.bdk@vinuni.edu.vn
- Khau Liên Kiệt – 20kiet.kl@vinuni.edu.vn
- Vương Đỗ Tuấn Thành – 20thanh.vdt@vinuni.edu.vn



---
The key idea of the project is to apply Naïve Bayes’ Theorem to classify songs into genres based on their lyrics. The work is divided into three steps:
- Handling raw data
- Featurizing
- Implementing Naïve Bayes’ Theorem

For raw data handling, the dataset is obtained from [Tmthyjames](https://github.com/tmthyjames/cypher). It contains the song name, the lyrics, the year of publication, the artist, the genre and other fields. The dataset is pre-processed and converted into the data structures needed in later steps. Featurizing computes how often each feature (word or bigram) occurs in each genre, and these frequencies are used to estimate the likelihoods. Finally, Naïve Bayes’ Theorem is applied to predict the genre a song most likely belongs to.
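Written out, the decision rule implemented in Step 3 is sketched below (the notation is ours, not from the dataset or a library). For a song with features $w_1, \dots, w_n$ (bigrams in this notebook), Bayes’ Theorem with the naïve independence assumption gives $P(g \mid w_1, \dots, w_n) \propto P(g) \prod_{i} P(w_i \mid g)$, so the predicted genre is the one with the largest log2 score. Here $N_{w,g}$ is the number of occurrences of feature $w$ in the training lyrics of genre $g$, $N_g$ is the total number of features in genre $g$, $K$ is the vocabulary size and $\alpha$ is the Laplace smoothing parameter; features outside the training vocabulary are left out of the sum.

$$
\hat{g} = \arg\max_{g} \left[ \log_2 P(g) + \sum_{i=1}^{n} \log_2 P(w_i \mid g) \right],
\qquad
P(w \mid g) = \frac{N_{w,g} + \alpha}{N_g + \alpha K}
$$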

### **Step 0: Importing**

In [1]:
# Import library
import pandas as pd
import numpy as np

import nltk
from nltk import ngrams
nltk.download('wordnet')
from nltk.corpus import stopwords
nltk.download('stopwords')

from nltk.stem import WordNetLemmatizer

from collections import Counter

# Mount Google Drive to access the dataset
from google.colab import drive
drive.mount("/content/drive")

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
# Import dataset
data_csv = pd.read_csv("/content/drive/MyDrive/lyrics_classification.csv") # The CSV must first be uploaded to the top level of Google Drive (MyDrive)
data_csv

| | album | artist | id | lyric | song | year | genre | ranker_genre | album_genre |
|---|---|---|---|---|---|---|---|---|---|
| 0 | The Johnny Cash Show (1970) | Johnny Cash | 0 | These hands aren't the hands of a gentleman th... | These Hands | 1970.0 | Gospel | Country | |
| 1 | Rock the World (2000) | Bubbles | 0 | (Spoken) Every night I look up into the sky an... | I Have A Dream | 2000.0 | | pop | Bubblegum Pop |
| 2 | The Johnny Cash Show (1970) | Johnny Cash | 1 | These hands raised a family these hands built ... | These Hands | 1970.0 | Gospel | Country | |
| 3 | Recipe for Hate (1993) | Bad Religion | 1 | I know a man who doesn't have many friends | My Poor Friend Me | 1993.0 | Melodic Hardcore\|Punk Rock | punk rock | Punk Rock |
| 4 | Punk in Drublic (1994) | NOFX | 1 | Friday night we'll be drinkin' Maneshevitz | The Brews | 1994.0 | Hardcore Punk\|Melodic Hardcore\|Punk Rock\|Skate... | punk rock | Punk Rock |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2778354 | Edit | Usher | 633518 | Bring it all together like contact | Don't Hurt Em | | R&B | rhythm and blues | |
| 2778355 | The Hangover (2015) | Obie_Trice | 633519 | Ain't about college now it's about a dollar | Chuuuurch | 2015.0 | Hip Hop | Hip Hop | |
| 2778356 | Edit | Usher | 633519 | Fold it up two times then creep back | Don't Hurt Em | | R&B | rhythm and blues | |
| 2778357 | The Hangover (2015) | Obie_Trice | 633520 | She turnt up, hottest in the city, most popular | Chuuuurch | 2015.0 | Hip Hop | Hip Hop | |


### **Step 1: Handling raw data**

In [3]:
# Merge screamo, punk rock and heavy metal into alt rock to simplify classification
# Closely related genres are hard to tell apart from lyrics alone, so fewer, broader classes are used
data_csv['ranker_genre'] = np.where(
    (data_csv['ranker_genre'] == 'screamo')|
    (data_csv['ranker_genre'] == 'punk rock')|
    (data_csv['ranker_genre'] == 'heavy metal'), 
    'alt rock', 
    data_csv['ranker_genre']
)

# Convert the remaining capitalized labels to lower case
data_csv['ranker_genre'] = np.where((data_csv['ranker_genre'] == 'Country'), 'country',  data_csv['ranker_genre'])
data_csv['ranker_genre'] = np.where((data_csv['ranker_genre'] == 'Hip Hop'), 'hip hop',  data_csv['ranker_genre'])

# For debugging
data_csv['ranker_genre'].value_counts()

country             838537
alt rock            742989
hip hop             632979
rhythm and blues    331234
pop                 232620
Name: ranker_genre, dtype: int64
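The two relabeling steps in this cell can also be written as a single mapping with `Series.replace`. Below is a minimal sketch on a handful of toy labels; `GENRE_MAP` and the sample labels are hypothetical and not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy labels standing in for data_csv['ranker_genre']
labels = pd.Series(['screamo', 'punk rock', 'heavy metal', 'Country', 'Hip Hop', 'pop', 'alt rock'])

# One mapping covers both the genre merge and the lower-casing;
# values not in the mapping (e.g. 'pop') are left unchanged
GENRE_MAP = {
    'screamo': 'alt rock',
    'punk rock': 'alt rock',
    'heavy metal': 'alt rock',
    'Country': 'country',
    'Hip Hop': 'hip hop',
}
merged = labels.replace(GENRE_MAP)
print(merged.value_counts())
```

Keeping the mapping in one dictionary makes it easy to see, and later change, which raw labels end up in which class.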

In [4]:
# The data has one lyric line per row; join the lines into one lower-cased lyric per song
group = ['song', 'year', 'album', 'genre', 'artist', 'ranker_genre']
lyrics_by_song = data_csv.sort_values(group).groupby(group).lyric.apply(' '.join).str.lower().reset_index(name='lyric')

# Remove non-alphanumeric characters (keep only word characters and whitespace)
# regex=True is required: since pandas 2.0, str.replace treats the pattern as a literal string by default
lyrics_by_song["lyric"] = lyrics_by_song['lyric'].str.replace(r'[^\w\s]+', '', regex=True)

# For debugging
lyrics_by_song

| | song | year | album | genre | artist | ranker_genre | lyric |
|---|---|---|---|---|---|---|---|
| 0 | Bad Girl | 2004.0 | Rollercoaster (2004) | Punk Rock\|Oi-Punk | The Adicts | alt rock | theres a rumor youre the talk of the town ther... |
| 1 | Cheese Tomato Man | 2004.0 | Rollercoaster (2004) | Punk Rock\|Oi-Punk | The Adicts | alt rock | cheese tomato flan crazy little man quiche lor... |
| 2 | Daydreamers Night | 2004.0 | Rollercoaster (2004) | Punk Rock\|Oi-Punk | The Adicts | alt rock | shroud cloak veil smoke in camera masquerade i... |
| 3 | Do It To Me | 2004.0 | Rollercoaster (2004) | Punk Rock\|Oi-Punk | The Adicts | alt rock | theres a kid who just found that love is not a... |
| 4 | Hello Farewell Goodbye | 2004.0 | Rollercoaster (2004) | Punk Rock\|Oi-Punk | The Adicts | alt rock | oh we kiss and we fight then we make up all ni... |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 62150 | www.memory | 2003.0 | Greatest Hits Volume II (2003) | Bluegrass | Alan Jackson | country | i know your leavin i see the signs your gonna ... |
| 62151 | xOn Our Kneesx | 2012.0 | Short Songs (2012) | Heavy_Metal | Silverstein | alt rock | i dont want it i dont want it i dont want it i... |
| 62152 | ¡Olé! | 1999.0 | Hopeless Romantic (1999) | Punk Rock | The Bouncing Souls | alt rock | olexs 12 oleall xs 12 the bouncing souls no on... |
| 62153 | ¡Viva La Gloria! | 2009.0 | 21st Century Breakdown (2009) | Alternative Rock\|Pop Punk\|Punk Rock | Green Day | alt rock | hey gloria are you standing close to the edge ... |
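A toy version of this merge-and-clean step, assuming a tiny hypothetical DataFrame in place of `data_csv`. It joins the lines of each song in their row order, lower-cases the result and strips punctuation with a regular expression. One behavior worth knowing: `groupby` drops rows whose grouping key contains NaN by default (`dropna=True`), so in the cell above songs with a missing `year`, `album` or `genre` appear to be excluded from `lyrics_by_song`.

```python
import pandas as pd

# Hypothetical toy data: one lyric line per row, as in the raw CSV
lines = pd.DataFrame({
    'song': ['A', 'A', 'B'],
    'ranker_genre': ['pop', 'pop', 'country'],
    'lyric': ["Hello, World!", "It's me", "Take me HOME"],
})

# Join each song's lines with spaces, then lower-case the joined text
songs = (lines.groupby(['song', 'ranker_genre']).lyric
         .apply(' '.join)
         .str.lower()
         .reset_index(name='lyric'))

# Strip every run of characters that is neither a word character nor whitespace
songs['lyric'] = songs['lyric'].str.replace(r'[^\w\s]+', '', regex=True)
print(songs)
```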


In [5]:
# Drop unnecessary columns
columns = ['song', 'ranker_genre', 'lyric']
lyrics_by_song = lyrics_by_song[columns]

# For debugging
print(lyrics_by_song.shape)
lyrics_by_song

(62155, 3)


| | song | ranker_genre | lyric |
|---|---|---|---|
| 0 | Bad Girl | alt rock | theres a rumor youre the talk of the town ther... |
| 1 | Cheese Tomato Man | alt rock | cheese tomato flan crazy little man quiche lor... |
| 2 | Daydreamers Night | alt rock | shroud cloak veil smoke in camera masquerade i... |
| 3 | Do It To Me | alt rock | theres a kid who just found that love is not a... |
| 4 | Hello Farewell Goodbye | alt rock | oh we kiss and we fight then we make up all ni... |
| ... | ... | ... | ... |
| 62150 | www.memory | country | i know your leavin i see the signs your gonna ... |
| 62151 | xOn Our Kneesx | alt rock | i dont want it i dont want it i dont want it i... |
| 62152 | ¡Olé! | alt rock | olexs 12 oleall xs 12 the bouncing souls no on... |
| 62153 | ¡Viva La Gloria! | alt rock | hey gloria are you standing close to the edge ... |


### **Step 2: Featurizing**

In [6]:
# Constants
MIN_LYRIC_LEN = 400
TRAIN_FRACTION = 0.8
RANDOM_SEED = 150

# Keep only songs with more than MIN_LYRIC_LEN characters,
# since very short lyrics are often fragments with little genre information
lyrics_by_song = lyrics_by_song[lyrics_by_song.lyric.str.len() > MIN_LYRIC_LEN]
lyrics_by_song = lyrics_by_song.reset_index(drop=True)

# Split each lyric into words
lyrics_by_song['words'] = lyrics_by_song['lyric'].str.split()

# Optional cleaning: stop-word removal and lemmatizing
lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words('english'))  # a set gives fast membership tests

def lemmatize_stop(lyric_words):
  # Drop stop words first, then lemmatize each remaining word once
  return [lemmatizer.lemmatize(word) for word in lyric_words if word not in stop_words]

# Uncomment the next line to apply stop-word removal and lemmatizing
#lyrics_by_song['words'] = lyrics_by_song['words'].apply(lemmatize_stop)

# For debugging
lyrics_by_song

| | song | ranker_genre | lyric | words |
|---|---|---|---|---|
| 0 | Bad Girl | alt rock | theres a rumor youre the talk of the town ther... | [theres, a, rumor, youre, the, talk, of, the, ... |
| 1 | Cheese Tomato Man | alt rock | cheese tomato flan crazy little man quiche lor... | [cheese, tomato, flan, crazy, little, man, qui... |
| 2 | Daydreamers Night | alt rock | shroud cloak veil smoke in camera masquerade i... | [shroud, cloak, veil, smoke, in, camera, masqu... |
| 3 | Do It To Me | alt rock | theres a kid who just found that love is not a... | [theres, a, kid, who, just, found, that, love,... |
| 4 | Hello Farewell Goodbye | alt rock | oh we kiss and we fight then we make up all ni... | [oh, we, kiss, and, we, fight, then, we, make,... |
| ... | ... | ... | ... | ... |
| 59904 | www.memory | country | i know your leavin i see the signs your gonna ... | [i, know, your, leavin, i, see, the, signs, yo... |
| 59905 | www.memory | country | i know your leavin i see the signs your gonna ... | [i, know, your, leavin, i, see, the, signs, yo... |
| 59906 | ¡Olé! | alt rock | olexs 12 oleall xs 12 the bouncing souls no on... | [olexs, 12, oleall, xs, 12, the, bouncing, sou... |
| 59907 | ¡Viva La Gloria! | alt rock | hey gloria are you standing close to the edge ... | [hey, gloria, are, you, standing, close, to, t... |
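Why the filtering order in `lemmatize_stop` matters: with its default noun part of speech, NLTK's WordNet lemmatizer turns some stop words into non-words (for example, 'was' becomes 'wa'), which then slip past the stop-word filter if lemmatizing runs first. The sketch below reproduces that effect without NLTK; `STOP_WORDS`, `TOY_LEMMAS` and `toy_lemmatize` are hypothetical stand-ins for `stopwords.words('english')` and `WordNetLemmatizer`.

```python
# Hypothetical stand-ins for NLTK's stop-word list and WordNet lemmatizer
STOP_WORDS = {'i', 'was', 'the', 'a'}
TOY_LEMMAS = {'was': 'wa', 'dreams': 'dream', 'nights': 'night'}

def toy_lemmatize(word):
    # Mimics a noun lemmatizer for a few words; other words are returned unchanged
    return TOY_LEMMAS.get(word, word)

def lemmatize_then_filter(words):
    # Lemmatize first: 'was' becomes 'wa' and is no longer recognized as a stop word
    return [w for w in map(toy_lemmatize, words) if w not in STOP_WORDS]

def filter_then_lemmatize(words):
    # Filter first: stop words are matched in their original form
    return [toy_lemmatize(w) for w in words if w not in STOP_WORDS]

words = ['i', 'was', 'dreaming', 'of', 'the', 'nights']
print(lemmatize_then_filter(words))  # ['wa', 'dreaming', 'of', 'night']
print(filter_then_lemmatize(words))  # ['dreaming', 'of', 'night']
```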


In [7]:
# Bigram featurizer
def bigram(lyric_words):
  return list(ngrams(lyric_words, 2))

lyrics_by_song['lyric_words'] = lyrics_by_song['words'].apply(bigram)
#lyrics_by_song['lyric_words'] = lyrics_by_song['words'] # Uncomment this line to use unigram instead

# For debugging
lyrics_by_song

| | song | ranker_genre | lyric | words | lyric_words |
|---|---|---|---|---|---|
| 0 | Bad Girl | alt rock | theres a rumor youre the talk of the town ther... | [theres, a, rumor, youre, the, talk, of, the, ... | [(theres, a), (a, rumor), (rumor, youre), (you... |
| 1 | Cheese Tomato Man | alt rock | cheese tomato flan crazy little man quiche lor... | [cheese, tomato, flan, crazy, little, man, qui... | [(cheese, tomato), (tomato, flan), (flan, craz... |
| 2 | Daydreamers Night | alt rock | shroud cloak veil smoke in camera masquerade i... | [shroud, cloak, veil, smoke, in, camera, masqu... | [(shroud, cloak), (cloak, veil), (veil, smoke)... |
| 3 | Do It To Me | alt rock | theres a kid who just found that love is not a... | [theres, a, kid, who, just, found, that, love,... | [(theres, a), (a, kid), (kid, who), (who, just... |
| 4 | Hello Farewell Goodbye | alt rock | oh we kiss and we fight then we make up all ni... | [oh, we, kiss, and, we, fight, then, we, make,... | [(oh, we), (we, kiss), (kiss, and), (and, we),... |
| ... | ... | ... | ... | ... | ... |
| 59904 | www.memory | country | i know your leavin i see the signs your gonna ... | [i, know, your, leavin, i, see, the, signs, yo... | [(i, know), (know, your), (your, leavin), (lea... |
| 59905 | www.memory | country | i know your leavin i see the signs your gonna ... | [i, know, your, leavin, i, see, the, signs, yo... | [(i, know), (know, your), (your, leavin), (lea... |
| 59906 | ¡Olé! | alt rock | olexs 12 oleall xs 12 the bouncing souls no on... | [olexs, 12, oleall, xs, 12, the, bouncing, sou... | [(olexs, 12), (12, oleall), (oleall, xs), (xs,... |
| 59907 | ¡Viva La Gloria! | alt rock | hey gloria are you standing close to the edge ... | [hey, gloria, are, you, standing, close, to, t... | [(hey, gloria), (gloria, are), (are, you), (yo... |
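`nltk.ngrams(words, 2)` pairs every word with the one that follows it. The same bigrams can be produced in plain Python with `zip`; this minimal sketch is handy for checking the featurizer on small inputs (the function name `bigrams` is ours). A lyric of n words yields n - 1 bigrams, and a one-word lyric yields none.

```python
def bigrams(words):
    # Pair each word with its successor, like list(nltk.ngrams(words, 2))
    return list(zip(words, words[1:]))

print(bigrams(['theres', 'a', 'rumor', 'youre']))
# [('theres', 'a'), ('a', 'rumor'), ('rumor', 'youre')]
```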


In [8]:
from sklearn.utils import shuffle

# List of genres for classifying
genres = list(set(lyrics_by_song["ranker_genre"]))
print(genres)

# Collect the per-genre train and test parts, then concatenate them
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
train_parts = []
test_parts = []

for genre in genres:
  # Stratified split: 80% of each genre's songs for training, the remaining 20% for testing
  lyrics_by_song_qualified = lyrics_by_song[(lyrics_by_song.ranker_genre==genre)]
  train_set = lyrics_by_song_qualified.sample(frac = TRAIN_FRACTION, random_state = RANDOM_SEED)
  test_set = lyrics_by_song_qualified.drop(train_set.index)
  train_parts.append(train_set)
  test_parts.append(test_set)

# Shuffle and reset train - test index
train_dataframe = shuffle(pd.concat(train_parts)).reset_index(drop=True)
test_dataframe = shuffle(pd.concat(test_parts)).reset_index(drop=True)

# For debugging
print(train_dataframe.head())
print(train_dataframe.shape)
print(test_dataframe.head())
print(test_dataframe.shape)

['hip hop', 'country', 'alt rock', 'rhythm and blues', 'pop']
                       song  ...                                        lyric_words
0  Everything That You Want  ...  [(in, the), (the, beginning), (beginning, the)...
1          Power Of My Love  ...  [(oh, break), (break, it), (it, burn), (burn, ...
2             So Much Betta  ...  [(tired, of), (of, being), (being, number), (n...
3         I Want You So Bad  ...  [(when, the), (the, wind), (wind, blows), (blo...
4      It's Christmas Again  ...  [(while, everyones), (everyones, dreamin), (dr...

[5 rows x 5 columns]
(47926, 5)
                    song  ...                                        lyric_words
0                  Maria  ...  [(she, moves), (moves, like), (like, she), (sh...
1  Angel from Montgomery  ...  [(i, am), (am, an), (an, old), (old, woman), (...
2             Good Times  ...  [(when, i), (i, ran), (ran, to), (to, the), (t...
3  Hawaiian Wedding Song  ...  [(this, is), (is, the), (the, moment), (moment
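The loop in this cell performs a stratified split: each genre is sampled separately, so the 80/20 ratio holds within every genre and the test set keeps the training set's genre proportions. A small sketch of the same idea on hypothetical toy data, using `groupby` to iterate over genres and `pd.concat` to combine the parts:

```python
import pandas as pd

TRAIN_FRACTION = 0.8
RANDOM_SEED = 150

# Hypothetical toy data: 10 pop songs and 5 country songs
songs = pd.DataFrame({
    'song': [f's{i}' for i in range(15)],
    'ranker_genre': ['pop'] * 10 + ['country'] * 5,
})

train_parts, test_parts = [], []
for genre, genre_songs in songs.groupby('ranker_genre'):
    # Sample 80% of this genre for training; the remaining rows go to the test set
    train = genre_songs.sample(frac=TRAIN_FRACTION, random_state=RANDOM_SEED)
    train_parts.append(train)
    test_parts.append(genre_songs.drop(train.index))

train_df = pd.concat(train_parts)
test_df = pd.concat(test_parts)
print(train_df['ranker_genre'].value_counts().to_dict())  # {'pop': 8, 'country': 4}
print(test_df['ranker_genre'].value_counts().to_dict())   # {'pop': 2, 'country': 1}
```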

In [9]:
train_dataframe

| | song | ranker_genre | lyric | words | lyric_words |
|---|---|---|---|---|---|
| 0 | Everything That You Want | country | in the beginning the nights were long under th... | [in, the, beginning, the, nights, were, long, ... | [(in, the), (the, beginning), (beginning, the)... |
| 1 | Power Of My Love | country | oh break it burn it drag it all around twist i... | [oh, break, it, burn, it, drag, it, all, aroun... | [(oh, break), (break, it), (it, burn), (burn, ... |
| 2 | So Much Betta | rhythm and blues | tired of being number 2 i can do what she cant... | [tired, of, being, number, 2, i, can, do, what... | [(tired, of), (of, being), (being, number), (n... |
| 3 | I Want You So Bad | alt rock | when the wind blows through your hair i want y... | [when, the, wind, blows, through, your, hair, ... | [(when, the), (the, wind), (wind, blows), (blo... |
| 4 | It's Christmas Again | pop | while everyones dreamin sweet christmas dreams... | [while, everyones, dreamin, sweet, christmas, ... | [(while, everyones), (everyones, dreamin), (dr... |
| ... | ... | ... | ... | ... | ... |
| 47921 | I Love a Man in a Uniform (dub) | alt rock | time with my girl i spent it well i had to be ... | [time, with, my, girl, i, spent, it, well, i, ... | [(time, with), (with, my), (my, girl), (girl, ... |
| 47922 | About the Money | hip hop | bustin out the bando a nigga jewelry will melt... | [bustin, out, the, bando, a, nigga, jewelry, w... | [(bustin, out), (out, the), (the, bando), (ban... |
| 47923 | God Is Love | rhythm and blues | oh dont go and talk about my father god is my ... | [oh, dont, go, and, talk, about, my, father, g... | [(oh, dont), (dont, go), (go, and), (and, talk... |
| 47924 | Modern Man | alt rock | ive got nothing to say ive got nothing to do a... | [ive, got, nothing, to, say, ive, got, nothing... | [(ive, got), (got, nothing), (nothing, to), (t... |


In [10]:
# For debugging
train_dataframe[(train_dataframe["ranker_genre"] == "alt rock")]

| | song | ranker_genre | lyric | words | lyric_words |
|---|---|---|---|---|---|
| 3 | I Want You So Bad | alt rock | when the wind blows through your hair i want y... | [when, the, wind, blows, through, your, hair, ... | [(when, the), (the, wind), (wind, blows), (blo... |
| 5 | Get off the Phone | alt rock | 10 years old anything goes all you ever knew w... | [10, years, old, anything, goes, all, you, eve... | [(10, years), (years, old), (old, anything), (... |
| 14 | Join Us For Pong | alt rock | its the gay nineties and you cant watch nothin... | [its, the, gay, nineties, and, you, cant, watc... | [(its, the), (the, gay), (gay, nineties), (nin... |
| 15 | I've Had It | alt rock | i cant go to work the boss is a jerk i aint go... | [i, cant, go, to, work, the, boss, is, a, jerk... | [(i, cant), (cant, go), (go, to), (to, work), ... |
| 22 | Beat Surrender | alt rock | beat surrender come on boy come on girl succum... | [beat, surrender, come, on, boy, come, on, gir... | [(beat, surrender), (surrender, come), (come, ... |
| ... | ... | ... | ... | ... | ... |
| 47911 | Til Wrong Feels Right | alt rock | i took a pounding from the radio today i heard... | [i, took, a, pounding, from, the, radio, today... | [(i, took), (took, a), (a, pounding), (poundin... |
| 47916 | 91 Freestyle | alt rock | alone i feel abused its house music reminds me... | [alone, i, feel, abused, its, house, music, re... | [(alone, i), (i, feel), (feel, abused), (abuse... |
| 47918 | Tomorrow's Industry | alt rock | young kids in catholic schools elderly parents... | [young, kids, in, catholic, schools, elderly, ... | [(young, kids), (kids, in), (in, catholic), (c... |
| 47921 | I Love a Man in a Uniform (dub) | alt rock | time with my girl i spent it well i had to be ... | [time, with, my, girl, i, spent, it, well, i, ... | [(time, with), (with, my), (my, girl), (girl, ... |


In [11]:
# Build the vocabulary: the set of unique features (bigrams here) in the training set
word_vocab = set()
for words in train_dataframe['lyric_words']:
  word_vocab.update(words)

# For debugging
word_vocab

{('chains', 'you'),
 ('rockwilder', 'on'),
 ('the', 'talks'),
 ('out', 'skeezin'),
 ('good', 'die'),
 ('byebye', 'sweet'),
 ('miss', 'er'),
 ('formed', 'a'),
 ('touch', 'você'),
 ('makin', 'neither'),
 ('bad', 'most'),
 ('stifle', 'ya'),
 ('stairs', 'any'),
 ('it', 'tickle'),
 ('knockin', 'other'),
 ('machine', 'ah'),
 ('reminiscing', 'of'),
 ('deep', 'the'),
 ('bring', 'machine'),
 ('hoodoo', 'you'),
 ('michelangelo', 'picasso'),
 ('champ', 'cause'),
 ('rendered', 'i'),
 ('and', 'dudes'),
 ('heart', 'beguiled'),
 ('to', 'mommy'),
 ('curious', 'shorty'),
 ('colas', 'in'),
 ('trip', 'on'),
 ('mother', 'longing'),
 ('coulda', 'stood'),
 ('dick', 'calling'),
 ('grow', 'yes'),
 ('little', 'shops'),
 ('whoaoh', 'its'),
 ('chair', 'on'),
 ('bone', 'shallow'),
 ('food', 'hands'),
 ('lead', 'grinnin'),
 ('arranged', 'you'),
 ('class', 'put'),
 ('gabbanas', 'hey'),
 ('i', 'ingest'),
 ('remark', 'rings'),
 ('lotta', 'ass'),
 ('on', 'star'),
 ('bake', 'she'),
 ('flicks', 'give'),
 ('ventures', 't

In [12]:
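# Per-genre statistics estimated from the training set
genre_song_counts = train_dataframe['ranker_genre'].value_counts()
genre_data = dict()
for genre in genres:
  # Gather all features (bigrams) from this genre's training songs
  words = []
  for lyric_words in train_dataframe.loc[train_dataframe['ranker_genre'] == genre, 'lyric_words']:
    words.extend(lyric_words)
  genre_data[genre] = {
      'word_occurence': Counter(words), # number of occurrences of each feature in the genre
      'count': len(words),              # total number of features in the genre
      'prob': genre_song_counts[genre] / train_dataframe.shape[0]  # prior P(genre): share of training songs
  }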

# For debugging
genre_data

{'alt rock': {'count': 3023831,
  'prob': 0.28539832241372115,
  'word_occurence': Counter({('when', 'the'): 1274,
           ('the', 'wind'): 405,
           ('wind', 'blows'): 66,
           ('blows', 'through'): 6,
           ('through', 'your'): 305,
           ('your', 'hair'): 86,
           ('hair', 'i'): 37,
           ('i', 'want'): 2554,
           ('want', 'you'): 694,
           ('you', 'so'): 578,
           ('so', 'bad'): 240,
           ('bad', 'want'): 8,
           ('bad', 'i'): 65,
           ('i', 'see'): 1704,
           ('see', 'your'): 352,
           ('your', 'smile'): 67,
           ('smile', 'boy'): 2,
           ('boy', 'everywhere'): 2,
           ('everywhere', 'i'): 69,
           ('i', 'never'): 1415,
           ('never', 'thought'): 270,
           ('thought', 'this'): 34,
           ('this', 'could'): 139,
           ('could', 'happen'): 21,
           ('happen', 'to'): 80,
           ('to', 'me'): 2797,
           ('me', 'if'): 260,
           ('if', 'i
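What `genre_data` stores, in miniature: a minimal sketch on a hypothetical three-song training set, keeping the notebook's key names (including the `word_occurence` spelling) so the structure matches the cell above.

```python
from collections import Counter

# Hypothetical toy training set: (genre, list of bigram features) per song
train = [
    ('pop', [('i', 'love'), ('love', 'you')]),
    ('pop', [('i', 'love'), ('love', 'it')]),
    ('country', [('dirt', 'road')]),
]

genre_data = {}
for genre in {g for g, _ in train}:
    # All features from this genre's songs, flattened into one list
    features = [f for g, feats in train if g == genre for f in feats]
    genre_data[genre] = {
        'word_occurence': Counter(features),                      # count of each feature in the genre
        'count': len(features),                                   # total features in the genre
        'prob': sum(g == genre for g, _ in train) / len(train),   # prior P(genre)
    }

print(genre_data['pop']['word_occurence'][('i', 'love')])  # 2
print(genre_data['pop']['count'], genre_data['country']['prob'])
```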

### **Step 3: Implementing Naive Bayes**

In [13]:
# Multiplying many small probabilities underflows to 0 in floating point,
# so we sum log2 probabilities instead of multiplying probabilities

# Parameters for Laplace smoothing
alpha = 1            # pseudo-count added to every feature count
K = len(word_vocab)  # number of distinct features in the vocabulary

# log2 P(lyric | genre) = sum of log2 P(feature | genre) over the lyric's features
def prob_log2_lyric_in_genre(lyric_words, genre):
  prob_lyric = 0
  for word in lyric_words:
    # Features outside the vocabulary are simply skipped (they contribute nothing to any genre's score)
    if word in word_vocab:
      # Laplace smoothing: (count(word, genre) + alpha) / (count(genre) + alpha * K)
      prob_lyric += np.log2(genre_data[genre]['word_occurence'][word] + alpha) - np.log2(genre_data[genre]['count'] + alpha * K)
  return prob_lyric
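Two numeric details behind this cell, illustrated with hypothetical numbers. First, a product of many small probabilities underflows: 2,000 factors of $10^{-4}$ would be $10^{-8000}$, far below the smallest positive float64 (about $4.9 \times 10^{-324}$), so NumPy returns 0.0, while the sum of log2 probabilities stays finite. Second, Laplace smoothing gives an unseen feature a small non-zero probability; without it, the probability would be 0 and its log2 would be minus infinity.

```python
import numpy as np

# 1) Underflow: 2,000 hypothetical feature probabilities of 1e-4 each
probs = np.full(2000, 1e-4)
product = np.prod(probs)            # the true value 1e-8000 is not representable in float64
log2_sum = np.sum(np.log2(probs))   # 2000 * log2(1e-4), a finite negative number
print(product)   # 0.0
print(log2_sum)

# 2) Laplace smoothing with hypothetical counts
alpha = 1        # pseudo-count
K = 10           # vocabulary size
count_wg = 0     # the feature never occurs in this genre's training lyrics
count_g = 40     # total number of features in the genre
p = (count_wg + alpha) / (count_g + alpha * K)
print(p)           # 0.02
print(np.log2(p))  # finite, unlike log2 of an unsmoothed zero probability
```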

In [14]:
def genre_prediction(lyric_words):
  # Return the genre with the highest log2 posterior score:
  # log2 P(genre) + log2 P(lyric | genre)
  best_score = -np.inf
  best_genre = None
  for genre in genres:
    score = prob_log2_lyric_in_genre(lyric_words, genre) + np.log2(genre_data[genre]['prob'])
    if score > best_score:
      best_score = score
      best_genre = genre
  return best_genre
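An end-to-end toy run of the same decision rule: Laplace-smoothed log2 likelihoods plus a log2 prior, maximized over genres. It is a self-contained sketch on hypothetical data and uses unigram features for readability (the notebook uses bigrams, but the scoring does not depend on the feature type); `train`, `stats` and `predict` are illustrative names, not the notebook's.

```python
import numpy as np
from collections import Counter

# Hypothetical toy training data: genre -> list of songs, each a list of unigram features
train = {
    'pop': [['love', 'you', 'baby'], ['love', 'me', 'baby']],
    'country': [['truck', 'road', 'love'], ['whiskey', 'road']],
}
alpha = 1
vocab = {w for songs in train.values() for song in songs for w in song}
K = len(vocab)
n_songs = sum(len(songs) for songs in train.values())

# Per genre: feature counts, total feature count, prior
stats = {}
for genre, songs in train.items():
    words = [w for song in songs for w in song]
    stats[genre] = (Counter(words), len(words), len(songs) / n_songs)

def predict(words):
    best_genre, best_score = None, -np.inf
    for genre, (occ, count, prior) in stats.items():
        # log2 prior + smoothed log2 likelihood of each in-vocabulary feature
        score = np.log2(prior) + sum(
            np.log2(occ[w] + alpha) - np.log2(count + alpha * K)
            for w in words if w in vocab)
        if score > best_score:
            best_genre, best_score = genre, score
    return best_genre

print(predict(['road', 'whiskey', 'love']))  # country
print(predict(['baby', 'love']))             # pop
```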

### **Step 4: Testing**

In [15]:
test_dataframe

| | song | ranker_genre | lyric | words | lyric_words |
|---|---|---|---|---|---|
| 0 | Maria | alt rock | she moves like she dont care smooth as silk co... | [she, moves, like, she, dont, care, smooth, as... | [(she, moves), (moves, like), (like, she), (sh... |
| 1 | Angel from Montgomery | country | i am an old woman named after my mother my old... | [i, am, an, old, woman, named, after, my, moth... | [(i, am), (am, an), (an, old), (old, woman), (... |
| 2 | Good Times | country | when i ran to the store with a penny and when ... | [when, i, ran, to, the, store, with, a, penny,... | [(when, i), (i, ran), (ran, to), (to, the), (t... |
| 3 | Hawaiian Wedding Song | country | this is the moment ive waited for i can hear m... | [this, is, the, moment, ive, waited, for, i, c... | [(this, is), (is, the), (the, moment), (moment... |
| 4 | Full Disclosure | alt rock | oh i dont want one i dont want one i dont want... | [oh, i, dont, want, one, i, dont, want, one, i... | [(oh, i), (i, dont), (dont, want), (want, one)... |
| ... | ... | ... | ... | ... | ... |
| 11978 | I. Pink Toes | hip hop | rainbows sunshine everywhere i go rainbows sun... | [rainbows, sunshine, everywhere, i, go, rainbo... | [(rainbows, sunshine), (sunshine, everywhere),... |
| 11979 | Millyrokk | hip hop | no ceilings no ceilings no ceilings my ceiling... | [no, ceilings, no, ceilings, no, ceilings, my,... | [(no, ceilings), (ceilings, no), (no, ceilings... |
| 11980 | Still Crazy After All These Years | rhythm and blues | i met my old lover on the street last night sh... | [i, met, my, old, lover, on, the, street, last... | [(i, met), (met, my), (my, old), (old, lover),... |
| 11981 | Let's Do It For Love | pop | i love you with everything thats in me did yo... | [i, love, you, with, everything, thats, in, me... | [(i, love), (love, you), (you, with), (with, e... |


In [16]:
test_dataframe['predicted'] = test_dataframe['lyric_words'].apply(genre_prediction)
test_dataframe

| | song | ranker_genre | lyric | words | lyric_words | predicted |
|---|---|---|---|---|---|---|
| 0 | Maria | alt rock | she moves like she dont care smooth as silk co... | [she, moves, like, she, dont, care, smooth, as... | [(she, moves), (moves, like), (like, she), (sh... | alt rock |
| 1 | Angel from Montgomery | country | i am an old woman named after my mother my old... | [i, am, an, old, woman, named, after, my, moth... | [(i, am), (am, an), (an, old), (old, woman), (... | country |
| 2 | Good Times | country | when i ran to the store with a penny and when ... | [when, i, ran, to, the, store, with, a, penny,... | [(when, i), (i, ran), (ran, to), (to, the), (t... | country |
| 3 | Hawaiian Wedding Song | country | this is the moment ive waited for i can hear m... | [this, is, the, moment, ive, waited, for, i, c... | [(this, is), (is, the), (the, moment), (moment... | country |
| 4 | Full Disclosure | alt rock | oh i dont want one i dont want one i dont want... | [oh, i, dont, want, one, i, dont, want, one, i... | [(oh, i), (i, dont), (dont, want), (want, one)... | alt rock |
| ... | ... | ... | ... | ... | ... | ... |
| 11978 | I. Pink Toes | hip hop | rainbows sunshine everywhere i go rainbows sun... | [rainbows, sunshine, everywhere, i, go, rainbo... | [(rainbows, sunshine), (sunshine, everywhere),... | country |
| 11979 | Millyrokk | hip hop | no ceilings no ceilings no ceilings my ceiling... | [no, ceilings, no, ceilings, no, ceilings, my,... | [(no, ceilings), (ceilings, no), (no, ceilings... | hip hop |
| 11980 | Still Crazy After All These Years | rhythm and blues | i met my old lover on the street last night sh... | [i, met, my, old, lover, on, the, street, last... | [(i, met), (met, my), (my, old), (old, lover),... | country |
| 11981 | Let's Do It For Love | pop | i love you with everything thats in me did yo... | [i, love, you, with, everything, thats, in, me... | [(i, love), (love, you), (you, with), (with, e... | country |


In [17]:
# Accuracy: percentage of test songs whose predicted genre matches the true label
accuracy = (test_dataframe['predicted'] == test_dataframe['ranker_genre']).mean() * 100
print(accuracy)
test_dataframe['predicted'] == test_dataframe['ranker_genre']

83.12609530167737


0         True
1         True
2         True
3         True
4         True
         ...  
11978    False
11979     True
11980    False
11981    False
11982     True
Length: 11983, dtype: bool
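A single accuracy figure can hide uneven performance across genres. In the rows displayed above, several misclassified songs are predicted as country, the genre with the most lyric lines in the raw data. A confusion matrix and per-genre accuracy make such patterns visible. Below is a sketch with `pd.crosstab` on hypothetical toy results; applied to the notebook, `results` would be replaced by `test_dataframe`.

```python
import pandas as pd

# Hypothetical toy results standing in for test_dataframe
results = pd.DataFrame({
    'ranker_genre': ['pop', 'pop', 'country', 'hip hop', 'hip hop'],
    'predicted':    ['country', 'pop', 'country', 'hip hop', 'country'],
})

# Rows: true genre, columns: predicted genre, cells: number of songs
confusion = pd.crosstab(results['ranker_genre'], results['predicted'])
print(confusion)

# Per-genre accuracy (recall): fraction of each genre's songs predicted correctly
per_genre = (results['predicted'] == results['ranker_genre']).groupby(results['ranker_genre']).mean()
print(per_genre)
```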