### Sentiment Analysis Model - Musical Instruments Review
### COMP262 - Group 3 - Phase 2
### Team Members

- Devanshi Shah (301175169)
- Hitesh Dharmadhikari (301150694)
- Jefil Tasna John Mohan (301149710)
- Nestor Romero (301133331)
- Shrikant Kale (301150258)

### 1. Dataset data exploration

In [1]:
import json
import pandas as pd
df = pd.read_json(r'Musical_Instruments_5.json',lines = True)

Each review record in the dataset has the following fields:

1. reviewerID - ID of the reviewer, e.g. A2SUAM1J3GNN3B
2. asin - ID of the product, e.g. 0000013714
3. reviewerName - name of the reviewer
4. helpful - helpfulness rating of the review, e.g. 2/3
5. reviewText - text of the review
6. overall - rating of the product
7. summary - summary of the review
8. unixReviewTime - time of the review (unix time)
9. reviewTime - time of the review (raw)

### 2. Text pre-processing

## Select the data used in the following sections of code: either the complete data set or a stratified sample

In [2]:
## THE FOLLOWING CODE WILL REFERENCE THE study_data VARIABLE AS THE DATA SOURCE FOR 
## TEXT REPRESENTATION AND LEXICON ANALYSIS, LEAVE THE DESIRED CONTENTS UNCOMMENTED

# 1. USE FULL SET OF DATA

#study_data = df

# 2. USE A SAMPLE OF DATA WITH STRATIFIED SAMPLING
# Take 200 samples from each value of the overall rating (1 to 5 stars)
study_data = df.groupby('overall', group_keys=False).apply(lambda x: x.sample(200))

study_data['overall'].value_counts()

1    200
2    200
3    200
4    200
5    200
Name: overall, dtype: int64
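
Because `sample(200)` is called without a seed, every run draws different reviews, so the scores reported later will change from run to run. The sketch below shows the same stratified draw in plain Python with a fixed seed. The toy `reviews` list, the `stratified_sample` helper, and the seed value are illustrative assumptions and are not part of the notebook; in pandas, passing `random_state` to `sample` should give the same repeatability.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(rows, key, n_per_group, seed=0):
    """Draw n_per_group rows from each group defined by row[key]."""
    rng = random.Random(seed)              # fixed seed -> repeatable sample
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    sample = []
    for value in sorted(groups):           # deterministic group order
        sample.extend(rng.sample(groups[value], n_per_group))
    return sample

# Hypothetical toy data: 50 reviews for each star rating 1..5.
reviews = [{"overall": stars, "reviewText": f"review {i}"}
           for stars in range(1, 6) for i in range(50)]

picked = stratified_sample(reviews, "overall", n_per_group=20, seed=42)
counts = Counter(row["overall"] for row in picked)
print(counts)
```

Calling the helper twice with the same seed returns the same rows, which makes later metric comparisons meaningful.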

In [3]:
# Create labels for "rating of the product"
def product_ratings(x):
    if x['overall'] == 5 or x['overall'] == 4:
        x['ratings'] = 'Positive'
    elif x['overall'] == 3:
        x['ratings'] = 'Neutral'
    elif x['overall'] == 2 or x['overall'] == 1:
        x['ratings'] = 'Negative'
    return x

study_data = study_data.apply(product_ratings, axis = 1)
study_data['ratings'].value_counts()

Negative    400
Positive    400
Neutral     200
Name: ratings, dtype: int64
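
The labelling rule above is a fixed lookup from star rating to sentiment, so it can also be written as a dictionary. The sketch below uses plain Python to show why the Neutral class ends up half the size of the other two: it covers one star value, while Positive and Negative each cover two. `RATING_TO_LABEL` and `label_for` are illustrative names; with pandas, mapping the `overall` column through the same dictionary with `Series.map` should be an equivalent, vectorized alternative to the row-wise `apply`.

```python
from collections import Counter

# Star rating -> sentiment label, same rule as product_ratings above.
RATING_TO_LABEL = {
    1: "Negative",
    2: "Negative",
    3: "Neutral",
    4: "Positive",
    5: "Positive",
}

def label_for(overall):
    """Return the sentiment label for a 1..5 star rating."""
    return RATING_TO_LABEL[overall]

# 200 reviews per star rating, as in the stratified sample.
ratings = [stars for stars in range(1, 6) for _ in range(200)]
label_counts = Counter(label_for(r) for r in ratings)
print(label_counts)
```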

In [4]:
study_data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000 entries, 5555 to 9662
Data columns (total 10 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   reviewerID      1000 non-null   object
 1   asin            1000 non-null   object
 2   reviewerName    997 non-null    object
 3   helpful         1000 non-null   object
 4   reviewText      1000 non-null   object
 5   overall         1000 non-null   int64 
 6   summary         1000 non-null   object
 7   unixReviewTime  1000 non-null   int64 
 8   reviewTime      1000 non-null   object
 9   ratings         1000 non-null   object
dtypes: int64(2), object(8)
memory usage: 85.9+ KB


In [5]:
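# 2c. Column selection
# Note: drop() raises KeyError if any listed column is missing,
# and in that case none of the listed columns is removed.
try: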
    study_data.drop(['reviewerName','helpful','helpful_rating','reviewTextLength','unixReviewTime','reviewTime'], axis=1, inplace= True)
except KeyError as ke:
    print(f'Column removal not possible: {ke}')
    
study_data.head()

Column removal not possible: "['helpful_rating', 'reviewTextLength'] not found in axis"


Unnamed: 0,reviewerID,asin,reviewerName,helpful,reviewText,overall,summary,unixReviewTime,reviewTime,ratings
5555,A1QRF5KISDOKPA,B000SAC5PA,santos,"[6, 8]",ok when i saw the bag i was impressed nice loo...,1,hate it,1324598400,"12 23, 2011",Negative
7085,AA8SWH4Y5SN8H,B0025V1REU,S. Grider,"[0, 1]",I had a friend with one of these and he loves ...,1,Oh god,1356652800,"12 28, 2012",Negative
7670,A1M957IA3QNX7X,B0037MC786,Kyle D.,"[0, 1]",I got a few of these as a Christmas gift and t...,1,Doesn't cut it for me.,1391731200,"02 7, 2014",Negative
6695,A3DAURGJAL0Y0S,B001LJUVO4,"Marc LaBelle ""NevermoreFU""","[2, 2]",Doesn't wotk with V-Amp 3. Doubt it works at ...,1,Doesn't work,1295913600,"01 25, 2011",Negative
10015,A3K9OQPCI8UJE,B00AZUAORE,MQ,"[0, 1]",Amazon and TC Electronics will not tell you th...,1,Where's my power supply?,1402444800,"06 11, 2014",Negative


### Data Cleanup

In [6]:
# lowercasing
study_data['reviewText'] = study_data['reviewText'].str.lower()

# Remove punctuation
# (raw string so that \w and \s reach the regex engine unchanged)
study_data['reviewText'] = study_data['reviewText'].str.replace(r'[^\w\s]', '', regex=True)

study_data.head()

Unnamed: 0,reviewerID,asin,reviewerName,helpful,reviewText,overall,summary,unixReviewTime,reviewTime,ratings
5555,A1QRF5KISDOKPA,B000SAC5PA,santos,"[6, 8]",ok when i saw the bag i was impressed nice loo...,1,hate it,1324598400,"12 23, 2011",Negative
7085,AA8SWH4Y5SN8H,B0025V1REU,S. Grider,"[0, 1]",i had a friend with one of these and he loves ...,1,Oh god,1356652800,"12 28, 2012",Negative
7670,A1M957IA3QNX7X,B0037MC786,Kyle D.,"[0, 1]",i got a few of these as a christmas gift and t...,1,Doesn't cut it for me.,1391731200,"02 7, 2014",Negative
6695,A3DAURGJAL0Y0S,B001LJUVO4,"Marc LaBelle ""NevermoreFU""","[2, 2]",doesnt wotk with vamp 3 doubt it works at all...,1,Doesn't work,1295913600,"01 25, 2011",Negative
10015,A3K9OQPCI8UJE,B00AZUAORE,MQ,"[0, 1]",amazon and tc electronics will not tell you th...,1,Where's my power supply?,1402444800,"06 11, 2014",Negative
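
Deleting punctuation outright fuses words that were separated only by a hyphen, slash, or apostrophe, which is why the output shows tokens such as `vamp` and `doesnt`. The sketch below compares that rule with a variant that replaces punctuation with a space. It uses only the standard `re` module. The function names are illustrative, and the second half of the sample string is a made-up example, not a quote from the data.

```python
import re

def clean_drop(text):
    """Same rule as the cell above: lowercase, then delete punctuation."""
    return re.sub(r"[^\w\s]", "", text.lower())

def clean_space(text):
    """Variant: replace punctuation with a space, then collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())

raw = "Doesn't wotk with V-Amp 3. Reverb/chorus/delay!"
print(clean_drop(raw))   # punctuation removed, neighbouring words fused
print(clean_space(raw))  # punctuation becomes a word boundary
```

Neither variant is strictly better: the space variant keeps `reverb`, `chorus`, and `delay` as separate tokens but splits contractions into fragments such as `doesn` and `t`.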


In [7]:
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', 50)
study_data[['reviewText']].head(10)

Unnamed: 0,reviewText
5555,ok when i saw the bag i was impressed nice looking and perfect style so i decided to buy it but the bag just arrive home 122311 and is way different from the one on the picture is a black bag with one ugly red line on the middle and it looks like is not going to protect my guitar i am really upset i would not returned because i dont feel like fighting with no body but they shouldnt do that to the costumer we pay how much they asking so a least we should get what we pay for
7085,i had a friend with one of these and he loves it i got one on his recommendation and when i put the batteries in it it started smoking it did work for awhile i pulled the batteries out and tried to ac adapter and it smoked again this time it did not work i am one of those people who gives an item one try and if it does not work holds a grudge i would not advise anybody to get this
7670,i got a few of these as a christmas gift and theyre just bad i put them on my pedalboard where i previously had longer cables and these just didnt stay in the jack i have no other problems with these jacks with any other cable but these if i play loudly and my room vibrates a bit the cable even works its way out of the jack a little bit and cuts my signal awful
6695,doesnt wotk with vamp 3 doubt it works at all red light always stays on and ive never seen the green light
10015,amazon and tc electronics will not tell you that you need a 9v power supply i didnt see any warning or advise in the point of purchase nor in the package this is an unfair practice from amazon and tc electronics
7656,the material used is fine the looks are fine the length is horrible this is too short for most adults that play guitar it felt like i was holding my guitar around my neck instead of off my shoulders i attached it by the longest way possible and it was well short of what i could use to me this was a worthless purchase i just tucked it into a drawer and it will sit there and not be used again
8493,my reviews for this have changed a few times but no more when i first got it i liked the sound though it did sound a little cold and digital and this is from someone wanting to record shoegazedreampop which by its nature is full of cold tones reverbchorusdelay shimmering a la cocteau twins chapterhouse lush so at first i thought it was great then the fuse software was even cooler all this customizing the sound though it did sound noisy and needed the compressor to take some of the noise awayafter all of that i finally realized i was trying to ignore the fizzcracklehair its called a few things go to the fender forums and look it up or search it on yahoogooglethe thing happening with this amp i think is a few things fender guitars sound best through it which is silly if you dont own a fender and i dont and refuse to get a squire as that body is so played out i wanted anything but a strat body unless i have a grand to pay for an older one im not going to get a rinky dink one and i would only get one after i got like 5 other dream guitars firstso everyone has different problems with pickup quality and guitar quality also playing into a modeling amp is like playing into a computer that may be fine for some younger folks but to a lot of people ive read the complaints i was trying not to realize as i didnt feel like having to pay for shipping to return it but i didand i am so glad i did i almost got a peavey vypyr and i am glad i didnt this amp to be fair is really close to being really cool but that sound problem which i read on fenders own forumwebsite fender acknowledges it exists but does nothing to fix it in future models sorry but thats some big company bsive been reading of people saying it sounds cold and lifeless and all that and tube this and tube that and i was like what music snobs and what have you but i realized that im a music snob i love the music i listen to more than anything and feel like the music i create should have a life to it as well so i 
returned the mustang 2 and almost got a marshall mg 30 watt cause i wanted an amp with a clean sound that didnt have that crackle sound but thankfully i caught myself before that return as after that it would have been finalthe decision i did make i think will be great for me as id rather start with a good real tone then get the effects i need i kept reading reviews of these teenagers that hadnt played for more than a year or two or so saying throw away your pedals you wont need them and i was going to get pedals but dont need to now thats kind of the danger of these amps they end up making everyone sound the same no matter what tweaking i was doing on the mustang it sounded the same but in a different costumei think this amp and modeling amps are best for beginning guitarists and ive seen some old timers getting an amp for home use after years maybe thats for them but i needed something to record with and practice and get the best most real tone possible so that means tube ampi think if i played this at a gc i wouldnt of gotten it and i just tried to make myself like it
4678,this strap is uncomfortable and too flimsy to feel good also doesnt fit an instrument where theres no place to hook it i returned mine as it just didnt fit well for my uke or mandolin
6222,i had some issues with my 18watt all tube amp so i experimented and replaced all the tubes issues not only continued but worsened hum noise volume fade unwanted distortion at low volumes i eventually put back the original el84s and that didnt help once i removed the tungsol 12ax7 preamp tubes and replaced the hum stopped sound quality of the amp was consistent quiet and back to par my experience with the tungsol tubes was very very negative they caused me to go out and buy another amp i had no idea that brand new tubes could sound so badly that fast ill never buy this brand again
3792,i bought this becausehey a wireless mic for 13 why not i might be able to use this in a public address systemthe connections all work and it makes sounds but the static is unbearable no matter how much i fiddle with different the adjustment screw in the receiver nothing works the audio is always clipped dont even think of using this for music


In [8]:
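# Lemmatization. WordNetLemmatizer treats each token as a noun unless a part of
# speech is passed, so some verbs are clipped in the output below ("was" becomes "wa").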
import nltk
from nltk.stem import WordNetLemmatizer
nltk.download('omw-1.4')
lemmatizer = WordNetLemmatizer()
study_data['reviewText'] = study_data['reviewText'].apply(lambda x : ' '.join([lemmatizer.lemmatize(word) for word in x.split()]))
study_data.head()

[nltk_data] Downloading package omw-1.4 to
[nltk_data]     C:\Users\romer\AppData\Roaming\nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


Unnamed: 0,reviewerID,asin,reviewerName,helpful,reviewText,overall,summary,unixReviewTime,reviewTime,ratings
5555,A1QRF5KISDOKPA,B000SAC5PA,santos,"[6, 8]",ok when i saw the bag i wa impressed nice looking and perfect style so i decided to buy it but the bag just arrive home 122311 and is way different from the one on the picture is a black bag with one ugly red line on the middle and it look like is not going to protect my guitar i am really upset i would not returned because i dont feel like fighting with no body but they shouldnt do that to the costumer we pay how much they asking so a least we should get what we pay for,1,hate it,1324598400,"12 23, 2011",Negative
7085,AA8SWH4Y5SN8H,B0025V1REU,S. Grider,"[0, 1]",i had a friend with one of these and he love it i got one on his recommendation and when i put the battery in it it started smoking it did work for awhile i pulled the battery out and tried to ac adapter and it smoked again this time it did not work i am one of those people who give an item one try and if it doe not work hold a grudge i would not advise anybody to get this,1,Oh god,1356652800,"12 28, 2012",Negative
7670,A1M957IA3QNX7X,B0037MC786,Kyle D.,"[0, 1]",i got a few of these a a christmas gift and theyre just bad i put them on my pedalboard where i previously had longer cable and these just didnt stay in the jack i have no other problem with these jack with any other cable but these if i play loudly and my room vibrates a bit the cable even work it way out of the jack a little bit and cut my signal awful,1,Doesn't cut it for me.,1391731200,"02 7, 2014",Negative
6695,A3DAURGJAL0Y0S,B001LJUVO4,"Marc LaBelle ""NevermoreFU""","[2, 2]",doesnt wotk with vamp 3 doubt it work at all red light always stay on and ive never seen the green light,1,Doesn't work,1295913600,"01 25, 2011",Negative
10015,A3K9OQPCI8UJE,B00AZUAORE,MQ,"[0, 1]",amazon and tc electronics will not tell you that you need a 9v power supply i didnt see any warning or advise in the point of purchase nor in the package this is an unfair practice from amazon and tc electronics,1,Where's my power supply?,1402444800,"06 11, 2014",Negative


In [9]:
from sklearn.utils import shuffle
study_data = shuffle(study_data)

In [10]:
## Create training and test datasets

X = study_data.iloc[:,:-1]
y = study_data.iloc[:,-1]

# Split data 70-30
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
        X, y,stratify=y, test_size=0.3)

In [11]:
y_train.value_counts()

Positive    280
Negative    280
Neutral     140
Name: ratings, dtype: int64

In [12]:
y_test.value_counts()

Negative    120
Positive    120
Neutral      60
Name: ratings, dtype: int64

In [13]:
study_data['ratings'].value_counts()

Negative    400
Positive    400
Neutral     200
Name: ratings, dtype: int64

In [14]:
X_train

Unnamed: 0,reviewerID,asin,reviewerName,helpful,reviewText,overall,summary,unixReviewTime,reviewTime
9151,A3AOPVQ7EZHTWA,B005FKF1PY,"frankp93 ""frankp93""","[20, 22]",the n mini chromatic headstock tuner quickly recognized twelve pitch across a multioctave range of electric bass guitar and mandolin the accidental appear a sharp or flat toggled using the alternate function of the frequency buttonthe tuner view window is slightly larger than a chiclet but brightness and resolution are fine for dimlylit space one thing thats le than ideal is the charcoal color of the tuner itself if youve ever dropped stuff on dark stage you know what i mean id wager the back of most headstock skew to lighter tone than darker so if discretion is what the manufacturer is emphasizing a lighter color may work betterthe note appear red when out of tune their color change to green a a rough marker you then continue fine tuning using the vertical bar that appear on both side of the note theyve gone to length to make the tuner stagefriendly but id much prefer the green color mean fine tuned and done that way i can look from a distance i should only have to bother with the bar if im struggling to home in and in my experience thats the exception rather than the rule and usually caused by new string slack under the bridge pin a slipping tuner etc midlife string tend to stretch and loosen predictably so long a you dont use a lot of altered tuningsthe n is designed to be clipped discretely to your instrument headstock and viewed from the back im a lefty and unlike some other tuner i appreciate the swiveling base that allows me to view the screen rightside upthe viewer doesnt extend far above the headstock surface and felt a bit cramped on my l4 with large new yorkerstyle tuner on headstock with more bare wood like a strat or even my astyle mandolin it wa easier to find a comfortable spot for viewing that didnt obstruct my hand while turning keysat first i didnt like the ratchet clamp that attache to the headstock i could attach the tuner with one hand by squeezing the clamp tight but it felt 
like i needed two to remove it one to press the release and another to pull the clamp out after a little practice however i found this wasnt the case and the touch required isnt difficult at allthe tuner constructed in a highimpact composite similar to those parachutestyle clamp you see on backpack it feel durable and very unlikely to break although i didnt try stomping on iti wa impressed the battery come uninstalled in a separate small bag the battery door ha a concave slot for easy opening with a pick or a coin the tuner ha an autoshutoff function to preserve battery lifethe power and frequency button are just nib really id prefer they be larger with more surface and a bit of click to them the tuner can be calibrated in integer step from 430 to 450 if 440 doesnt work for youone advantage of a chromatic tuner is you can tune to a chord and arent limited to usually diatonic open string for myself i only use tuner to set the pitch of one string usually the low a for guitar or bass and low g for mandolin in my opinion tuning a fretted instrument completely by open string is only slightly better than tuning it completely with harmonic and neither doe the job well intonation is subject to a bunch of variable string action and thickness neck relief fret condition nut and bridge conditiononce i have a single string to pitch i match fretted and open string unison and octave after a while you learn which combination work best for each instrumenti used to love staring at those pignosesized strobatuners sitting on amp and never faulted a musician who took their time to tune accurately i wish many more did im not quite sure how we went from we tune because we care to im so embarrassed to be standing here tuning i hope no one notice but the n mini headstock tuner questionable design choice and all doe what it say with a minimum of fuss,3,"Detects Pitch Well, But Could Be Friendlier to Use",1319587200,"10 26, 2011"
7955,A1EMD22HC94SI1,B003HGFRO8,"D. Dulin ""Dave D""","[1, 2]",good but it not fit properly defenley better then bone thogh i did have to sand it to try to make it fit good and i did put it on my epiphone lp it doe give a nice blue tone beause it not bone but i had to take it off case it just did not fit right,4,good,1354492800,"12 3, 2012"
1562,A1EWYZI4ZHU9XZ,B0002E2G24,Mark J,"[3, 3]",this switch doe work but you should look before you leap the switch travel is greater than it is for a standard 3way switch and if you use the common fender replacement control plate or one that wa only intended for a 3way switch it may not fit you could fairly easily alter the plate but if you want the cleanest look possible you may want to investigate a new control plate,3,Make sure your control plate fits,1358467200,"01 18, 2013"
5392,A3FA334BIFOMRD,B000RKVH0K,"K. Meagher ""MFT""","[2, 2]",yes this ie every bit a good a the shure sm 58 but you dont get the cool pouch the 57 version is also like it shure counterpart great sound and great deal yes i am a musician too,5,It's an SM 58 for 1/5th the price!!!,1356220800,"12 23, 2012"
5256,AKHWZ3S1UVZAO,B000P5LVSK,Hagen LeBray,"[0, 0]",nice pick a bit small for my preference great for picking kind of harsh for strumming quality seems topnotch some guitarist will love these,4,Nice Picks,1396656000,"04 5, 2014"
...,...,...,...,...,...,...,...,...,...
9777,A2053ZJUGCKUA5,B0087UPSLQ,LARRY,"[0, 0]",update received 6 of these 5 are ok but one ha a bad weld joint at the top retainer it wa held by only one small tac weldi got what i paid for cheap price and cheap product no excuse to make this kind of low quality guitar stand putting a 500 guitar on this is asking for guitar damage when the stand fall apart not worth the time to send it back to amazon one star only for this product good chance that you will get a crappy one if you order this le than one star for this piece of,1,"LIGHTWEIGHT,ATTRACTIVE, INEXPENSIVE",1366243200,"04 18, 2013"
5325,A3CSWB0L9ZLD94,B000PO30QM,"Filipe N. Marques ""fnmphoto""","[0, 1]",5 foot is a weird length for a cable it really only ha one use when youre sitting right next to your practice amp just dont wander too far off this cable is perfect for a kid who is going to hisher guitar lesson and ha to plug ini bought two of these for the home studio to jack in a guitar with a pedal or two in between i feel chained to my workstation just my experience get a 10 foot and be freehope it help,3,Too long and Too short,1402272000,"06 9, 2014"
1035,AA169UZEJYAV1,B0002D0E8S,Luckystar,"[0, 0]",i like purple and it is purple the strap is sturdy it work what else need to be said it arrived timely is decent quality and a nice color,4,Strap for the Guitar,1385251200,"11 24, 2013"
8077,A1FCX548TD6DLP,B003QTM9O2,Cooper the Beagle,"[0, 0]",at the time i bought wa 16 mine arrived broken so it had gone back reality is that it is not that good regardless of mine being broken not upset about that stuff happens it is just not a good stand light construcion unstable topples easily and a likely candidate to be broken if dropped the protective materails on the stand to keep product from scratching your instrument are the soft shiny runner like materail and it stick to my instrument neck so i pick up uke and i get a stand until the stickiness is seprated by gravity happened every time yes i tried out the stand even though broken and it proved to be a bad choice back it ha gone 5 star to amazon return policy thouh,1,"Poorly Made, Flimsy. Buy Another Product",1389225600,"01 9, 2014"


### 3. Text Representation

### 3.1 TF-IDF

In [15]:
#TF-IDF

#Import TfidfVectorizer from the scikit-learn library
from sklearn.feature_extraction.text import TfidfVectorizer

#Define a TF-IDF Vectorizer Object with default settings (English stop words are not removed)
tfidf = TfidfVectorizer()

#Construct the required TF-IDF matrix by applying the fit_transform method on the reviewText feature
tfidf_matrix = tfidf.fit_transform(X_train['reviewText'])

#Output the shape of tfidf_matrix
tfidf_matrix.shape

(700, 6512)

In [16]:
tfidf_test = tfidf.transform(X_test['reviewText'])

In [17]:
tfidf_test.shape

(300, 6512)
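
To make the numbers inside `tfidf_matrix` less opaque, the following sketch computes TF-IDF weights by hand in plain Python. It assumes the scikit-learn defaults as documented: smoothed IDF, `ln((1 + n) / (1 + df)) + 1`, followed by L2 normalisation of each document vector, with tokens of two or more word characters. The three short documents and the helper names are illustrative assumptions. This is a simplified illustration, not the library's implementation.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase tokens of two or more word characters."""
    return re.findall(r"\b\w\w+\b", text.lower())

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in df}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights = {t: tf[t] * idf[t] for t in tf}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors, idf

# Hypothetical mini-corpus in the style of the cleaned reviews.
docs = [
    "the strap is sturdy",
    "the cable is too short",
    "the tuner work well",
]
vectors, idf = tfidf_vectors(docs)
print(idf["the"], idf["strap"])
```

A term that occurs in every document (`the`) gets the minimum IDF of 1.0, while a term unique to one document (`strap`) is weighted higher. The test matrix above has the same 6512 columns as the training matrix because `transform` reuses the vocabulary and IDF values learned from the training text.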

In [18]:
#Bag of words for Text Representation

from sklearn.feature_extraction.text import CountVectorizer

CountVec = CountVectorizer()

# Fit the vocabulary and transform the training text
Count_data = CountVec.fit_transform(X_train['reviewText'])
Count_data.shape



(700, 6512)

In [19]:
from sklearn.preprocessing import LabelEncoder

enc = LabelEncoder()
actual_labels=enc.fit_transform(y_train)
actual_labels_test = enc.transform(y_test)
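
`LabelEncoder` assigns integer codes in sorted order of the class names, which determines how to read the confusion matrices and the `class_weight` keys used later. The plain-Python sketch below mimics that behaviour; `fit_label_encoder` is an illustrative helper, not a scikit-learn function.

```python
def fit_label_encoder(labels):
    """Map each distinct label to its position in sorted order."""
    classes = sorted(set(labels))
    return {label: index for index, label in enumerate(classes)}

mapping = fit_label_encoder(["Positive", "Negative", "Neutral", "Negative"])
print(mapping)
```

With this ordering, code 0 is Negative, 1 is Neutral, and 2 is Positive, so the rows and columns of each confusion matrix follow that order.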

# SUPPORT VECTOR MACHINE

In [20]:
from sklearn.svm import SVC

In [21]:
svc_linear = SVC(kernel='linear', C=1, decision_function_shape='ovo').fit(tfidf_matrix, actual_labels)
svc_rbf = SVC(kernel='rbf', gamma=1, C=1, decision_function_shape='ovo').fit(tfidf_matrix, actual_labels)
svc_poly = SVC(kernel='poly', degree=3, C=1, decision_function_shape='ovo').fit(tfidf_matrix, actual_labels)
svc_sig = SVC(kernel='sigmoid', C=1, decision_function_shape='ovo').fit(tfidf_matrix, actual_labels)

In [22]:
pred_linear = svc_linear.predict(tfidf_test)
pred_rbf = svc_rbf.predict(tfidf_test)
pred_poly = svc_poly.predict(tfidf_test)
pred_sig = svc_sig.predict(tfidf_test)

In [23]:
pred_sig.shape

(300,)

In [24]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

# SVM LINEAR KERNEL

In [25]:
acc_linear = accuracy_score(actual_labels_test, pred_linear)
precision_linear = precision_score(actual_labels_test, pred_linear, average='weighted')
recall_linear = recall_score(actual_labels_test, pred_linear, average= 'weighted')
f1_linear = f1_score(actual_labels_test, pred_linear, average= 'weighted')
cm_linear = confusion_matrix(actual_labels_test, pred_linear)

print(f'Accuracy using Linear SVM: {acc_linear}')
print(f'Precision using Linear SVM: {precision_linear}')
print(f'Recall using Linear SVM: {recall_linear}')
print(f'F1 using Linear SVM: {f1_linear}')
print(f'Confusion Matrix using Linear SVM: {cm_linear}')

Accuracy using Linear SVM: 0.5533333333333333
Precision using Linear SVM: 0.5258288253911249
Recall using Linear SVM: 0.5533333333333333
F1 using Linear SVM: 0.5160978740891364
Confusion Matrix using Linear SVM: [[83  1 36]
 [31  5 24]
 [35  7 78]]
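
The weighted scores above can be reproduced from the confusion matrix alone, which helps when checking what `average='weighted'` means. In the sketch below each class's precision, recall, and F1 is weighted by its share of the true labels. The function name is illustrative, and rows are assumed to be true classes and columns predicted classes, as in scikit-learn's `confusion_matrix`.

```python
def weighted_metrics(cm):
    """Accuracy and support-weighted precision, recall, F1.

    cm[i][j] = number of samples with true class i predicted as class j.
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n_classes)) / total
    precision = recall = f1 = 0.0
    for i in range(n_classes):
        tp = cm[i][i]
        support = sum(cm[i])                    # true samples of class i
        predicted = sum(row[i] for row in cm)   # samples predicted as class i
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return accuracy, precision, recall, f1

# Confusion matrix reported for the linear kernel above.
cm_linear_reported = [[83, 1, 36], [31, 5, 24], [35, 7, 78]]
accuracy, precision, recall, f1 = weighted_metrics(cm_linear_reported)
print(accuracy, precision, recall, f1)
```

Weighted recall always equals accuracy, which is why those two lines match in every result block of this notebook.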


# SVM RBF KERNEL

In [26]:
acc_rbf = accuracy_score(actual_labels_test, pred_rbf)
precision_rbf = precision_score(actual_labels_test, pred_rbf, average='weighted')
recall_rbf = recall_score(actual_labels_test, pred_rbf, average= 'weighted')
f1_rbf = f1_score(actual_labels_test, pred_rbf, average= 'weighted')
cm_rbf = confusion_matrix(actual_labels_test, pred_rbf)

print(f'Accuracy using rbf SVM: {acc_rbf}')
print(f'Precision using rbf SVM: {precision_rbf}')
print(f'Recall using rbf SVM: {recall_rbf}')
print(f'F1 using rbf SVM: {f1_rbf}')
print(f'Confusion Matrix using rbf SVM: {cm_rbf}')

Accuracy using rbf SVM: 0.58
Precision using rbf SVM: 0.46451383011892566
Recall using rbf SVM: 0.58
F1 using rbf SVM: 0.5155948442711836
Confusion Matrix using rbf SVM: [[89  0 31]
 [33  0 27]
 [35  0 85]]
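
The middle column of this confusion matrix is all zeros: the RBF model never predicts the second class. Precision for that class is therefore 0/0, which scikit-learn reports as 0 by default and flags with a warning. Weighted averages partly hide this, so the sketch below prints per-class precision and recall and the unweighted (macro) recall. The label order is assumed to be alphabetical, as produced by `LabelEncoder`, and the helper names are illustrative.

```python
def per_class_recall(cm):
    """Recall per class: diagonal entry divided by the row sum."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(cm)]

def per_class_precision(cm):
    """Precision per class: diagonal entry divided by the column sum."""
    result = []
    for j in range(len(cm)):
        predicted = sum(row[j] for row in cm)
        # A class that is never predicted has undefined precision; use 0.
        result.append(cm[j][j] / predicted if predicted else 0.0)
    return result

labels = ["Negative", "Neutral", "Positive"]   # assumed encoder order
cm_rbf_reported = [[89, 0, 31], [33, 0, 27], [35, 0, 85]]

recalls = per_class_recall(cm_rbf_reported)
precisions = per_class_precision(cm_rbf_reported)
macro_recall = sum(recalls) / len(recalls)

for name, p, r in zip(labels, precisions, recalls):
    print(f"{name:8s} precision={p:.3f} recall={r:.3f}")
print(f"macro recall={macro_recall:.3f}")
```

The macro recall (about 0.48) is well below the weighted recall of 0.58 because it counts the missed Neutral class as heavily as the other two.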




# SVM POLY KERNEL

In [27]:
acc_poly = accuracy_score(actual_labels_test, pred_poly)
precision_poly = precision_score(actual_labels_test, pred_poly, average='weighted')
recall_poly = recall_score(actual_labels_test, pred_poly, average= 'weighted')
f1_poly = f1_score(actual_labels_test, pred_poly, average= 'weighted')
cm_poly = confusion_matrix(actual_labels_test, pred_poly)

print(f'Accuracy using poly SVM: {acc_poly}')
print(f'Precision using poly SVM: {precision_poly}')
print(f'Recall using poly SVM: {recall_poly}')
print(f'F1 using poly SVM: {f1_poly}')
print(f'Confusion Matrix using poly SVM: {cm_poly}')

Accuracy using poly SVM: 0.5333333333333333
Precision using poly SVM: 0.4272447477585976
Recall using poly SVM: 0.5333333333333333
F1 using poly SVM: 0.47400815721171674
Confusion Matrix using poly SVM: [[77  0 43]
 [27  0 33]
 [37  0 83]]




# SVM SIGMOID KERNEL

In [28]:
acc_sig = accuracy_score(actual_labels_test, pred_sig)
precision_sig = precision_score(actual_labels_test, pred_sig, average='weighted')
recall_sig = recall_score(actual_labels_test, pred_sig, average= 'weighted')
f1_sig = f1_score(actual_labels_test, pred_sig, average= 'weighted')
cm_sig = confusion_matrix(actual_labels_test, pred_sig)

print(f'Accuracy using sigmoid SVM: {acc_sig}')
print(f'Precision using sigmoid SVM: {precision_sig}')
print(f'Recall using sigmoid SVM: {recall_sig}')
print(f'F1 using sigmoid SVM: {f1_sig}')
print(f'Confusion Matrix using sigmoid SVM: {cm_sig}')

Accuracy using sigmoid SVM: 0.5566666666666666
Precision using sigmoid SVM: 0.5020572366877786
Recall using sigmoid SVM: 0.5566666666666666
F1 using sigmoid SVM: 0.5080010142704526
Confusion Matrix using sigmoid SVM: [[83  1 36]
 [31  2 27]
 [33  5 82]]


# LOGISTIC REGRESSION

In [29]:
from sklearn.linear_model import LogisticRegression

lg_sag = LogisticRegression(solver='sag', class_weight= {0:2 , 1:3, 2:2 }, max_iter=1400).fit(tfidf_matrix, actual_labels)
pred_sag = lg_sag.predict(tfidf_test)

In [30]:
acc_sag = accuracy_score(actual_labels_test, pred_sag)
precision_sag = precision_score(actual_labels_test, pred_sag, average='weighted')
recall_sag = recall_score(actual_labels_test, pred_sag, average= 'weighted')
f1_sag = f1_score(actual_labels_test, pred_sag, average= 'weighted')
cm_sag = confusion_matrix(actual_labels_test, pred_sag)

print(f'Accuracy using Logistic Regression: {acc_sag}')
print(f'Precision using Logistic Regression: {precision_sag}')
print(f'Recall using Logistic Regression: {recall_sag}')
print(f'F1 using Logistic Regression: {f1_sag}')
print(f'Confusion Matrix using Logistic Regression: {cm_sag}')

Accuracy using Logistic Regression: 0.56
Precision using Logistic Regression: 0.5396382428940568
Recall using Logistic Regression: 0.56
F1 using Logistic Regression: 0.5453082919914953
Confusion Matrix using Logistic Regression: [[78  9 33]
 [24 12 24]
 [27 15 78]]
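
The `class_weight` used above gives label 1 (Neutral, assuming the alphabetical encoder order) 1.5 times the weight of the other classes, and this is the first model to predict Neutral with any frequency. For comparison, the sketch below computes the weights that scikit-learn documents for `class_weight='balanced'`, namely `n_samples / (n_classes * count)`, using the training label counts shown earlier. The helper name is illustrative.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weights inversely proportional to class frequency."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * counts[c]) for c in sorted(counts)}

# Encoded training labels: 280 Negative (0), 140 Neutral (1), 280 Positive (2).
train_labels = [0] * 280 + [1] * 140 + [2] * 280
weights = balanced_class_weights(train_labels)
print(weights)
```

Under this heuristic Neutral would get twice the weight of the other classes, somewhat more than the hand-picked ratio of 3 to 2.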


# NAIVE BAYES 

In [31]:
from sklearn.naive_bayes import MultinomialNB

In [32]:
nb = MultinomialNB(alpha=0.1).fit(tfidf_matrix, actual_labels)
pred_nb = nb.predict(tfidf_test)

In [33]:
acc_nb = accuracy_score(actual_labels_test, pred_nb)
precision_nb = precision_score(actual_labels_test, pred_nb, average='weighted')
recall_nb = recall_score(actual_labels_test, pred_nb, average= 'weighted')
f1_nb = f1_score(actual_labels_test, pred_nb, average= 'weighted')
cm_nb = confusion_matrix(actual_labels_test, pred_nb)

print(f'Accuracy using Naïve Bayes: {acc_nb}')
print(f'Precision using Naïve Bayes: {precision_nb}')
print(f'Recall using Naïve Bayes: {recall_nb}')
print(f'F1 using Naïve Bayes: {f1_nb}')
print(f'Confusion Matrix using Naïve Bayes: {cm_nb}')

Accuracy using Naïve Bayes: 0.5933333333333334
Precision using Naïve Bayes: 0.4771882574737409
Recall using Naïve Bayes: 0.5933333333333334
F1 using Naïve Bayes: 0.5285371702637889
Confusion Matrix using Naïve Bayes: [[87  1 32]
 [25  0 35]
 [29  0 91]]


# GRADIENT BOOSTING 

In [34]:
from sklearn.ensemble import GradientBoostingClassifier

In [35]:
gb_clf2 = GradientBoostingClassifier(n_estimators=170, learning_rate=0.1, max_features=4, max_depth=2, random_state=0)

In [36]:
gb_clf2.fit(tfidf_matrix, actual_labels)

GradientBoostingClassifier(max_depth=2, max_features=4, n_estimators=170,
                           random_state=0)

In [37]:
pred_gb = gb_clf2.predict(tfidf_test)

In [38]:
acc_gb = accuracy_score(actual_labels_test, pred_gb)
precision_gb = precision_score(actual_labels_test, pred_gb, average='weighted')
recall_gb = recall_score(actual_labels_test, pred_gb, average='weighted')
f1_gb = f1_score(actual_labels_test, pred_gb, average='weighted')
cm_gb = confusion_matrix(actual_labels_test, pred_gb)

print(f'Accuracy using Gradient Boosting: {acc_gb}')
print(f'Precision using Gradient Boosting: {precision_gb}')
print(f'Recall using Gradient Boosting: {recall_gb}')
print(f'F1 using Gradient Boosting: {f1_gb}')
print(f'Confusion Matrix using Gradient Boosting: {cm_gb}')

Accuracy using Gradient Boosting: 0.4866666666666667
Precision using Gradient Boosting: 0.5890378476972185
Recall using Gradient Boosting: 0.4866666666666667
F1 using Gradient Boosting: 0.43562814287108875
Confusion Matrix using Gradient Boosting: [[64  0 56]
 [26  1 33]
 [39  0 81]]
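
With seven models evaluated, a side-by-side ranking makes the results easier to compare. The sketch below copies the weighted scores printed above (rounded to four decimals) and sorts them, alongside a simple baseline that always predicts one of the two largest classes (120 of the 300 test reviews). These figures come from this single unseeded run and would differ on a rerun.

```python
# Weighted scores copied from the outputs above, rounded to 4 decimals.
reported = {
    "SVM linear":          {"accuracy": 0.5533, "f1": 0.5161},
    "SVM rbf":             {"accuracy": 0.5800, "f1": 0.5156},
    "SVM poly":            {"accuracy": 0.5333, "f1": 0.4740},
    "SVM sigmoid":         {"accuracy": 0.5567, "f1": 0.5080},
    "Logistic Regression": {"accuracy": 0.5600, "f1": 0.5453},
    "Naive Bayes":         {"accuracy": 0.5933, "f1": 0.5285},
    "Gradient Boosting":   {"accuracy": 0.4867, "f1": 0.4356},
}

# Baseline: always predict Negative (or Positive), 120 of 300 test reviews.
baseline_accuracy = 120 / 300

by_f1 = sorted(reported, key=lambda m: reported[m]["f1"], reverse=True)
by_accuracy = sorted(reported, key=lambda m: reported[m]["accuracy"], reverse=True)

print(f"baseline accuracy: {baseline_accuracy:.4f}")
for name in by_f1:
    scores = reported[name]
    print(f"{name:20s} f1={scores['f1']:.4f} accuracy={scores['accuracy']:.4f}")
```

Naive Bayes has the highest accuracy, but Logistic Regression has the highest weighted F1, largely because it is the only model that labels a meaningful number of reviews as Neutral.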


# GRID SEARCH - MODELS

In [54]:
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import KFold

In [55]:
# SVM Combinations
svc_linear_search = SVC(random_state=0)
svc_params = {
    # C defaults to 1.0
    'C' : np.arange(0.1,1.1,0.1),
    'kernel' : ['linear', 'rbf', 'poly', 'sigmoid'],
    'degree' : np.arange(1,4,1),
    'decision_function_shape' : ['ovo','ovr']
}

# Logistic Regression
# class_weight is fixed here and is not part of the search
lrg_search = LogisticRegression(class_weight={0:2,1:3,2:2}, random_state=0)
lrg_params = {
    'solver' : ['lbfgs', 'sag', 'saga'],
    'C' : np.arange(0.1,1.1,0.1),
    'max_iter' : np.arange(1000,2000,200)
}

# Naive Bayes
# No parameters are searched for this model

# Gradient Boost
# max_features is fixed here and is not part of the search
gb_search = GradientBoostingClassifier(max_features=4, random_state=0)
gb_params = {
    "learning_rate" : np.arange(0.01,0.1,0.01),
    "n_estimators" : np.arange(100,200,10),
    "max_depth" : np.arange(2,5,1),
}
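
The size of each grid is the product of the number of values per parameter, and that drives the running time of the search. The sketch below counts the combinations for the three grids defined above using only `itertools`; the value lists are written out by hand to mirror the `np.arange` calls, and the fold count of 5 is an assumption taken from the grid search log further down. It also counts how many SVC combinations are actually distinct, given that scikit-learn documents `degree` as ignored by every kernel except `poly`.

```python
from itertools import product

def grid_size(param_grid):
    """Number of parameter combinations in one grid."""
    return sum(1 for _ in product(*param_grid.values()))

c_values = [round(0.1 * k, 1) for k in range(1, 11)]   # 0.1, 0.2, ..., 1.0
grids = {
    "svc": {
        "C": c_values,
        "kernel": ["linear", "rbf", "poly", "sigmoid"],
        "degree": [1, 2, 3],
        "decision_function_shape": ["ovo", "ovr"],
    },
    "lrg": {
        "solver": ["lbfgs", "sag", "saga"],
        "C": c_values,
        "max_iter": list(range(1000, 2000, 200)),
    },
    "gb": {
        "learning_rate": [round(0.01 * k, 2) for k in range(1, 10)],
        "n_estimators": list(range(100, 200, 10)),
        "max_depth": [2, 3, 4],
    },
}

folds = 5  # assumed from the "Fitting 5 folds" lines in the search log
sizes = {name: grid_size(grid) for name, grid in grids.items()}
for name, n in sizes.items():
    print(f"{name}: {n} candidates, {n * folds} fits")

# degree only matters for the poly kernel, so keep one degree for the others.
svc = grids["svc"]
first_degree = svc["degree"][0]
distinct_svc = sum(
    1
    for c, kernel, degree, shape in product(
        svc["C"], svc["kernel"], svc["degree"], svc["decision_function_shape"])
    if kernel == "poly" or degree == first_degree
)
print(f"distinct SVC settings: {distinct_svc}")
```

Half of the 240 SVC candidates repeat a model that another candidate already covers. Splitting the grid into one dictionary per kernel, which `GridSearchCV` accepts as a list of grids, would avoid the redundant fits.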

In [56]:
# Build collection of models to test
models = { 
    'svc_linear' : {'model': svc_linear_search, 'params' : svc_params},
    "lrg" : {'model': lrg_search, 'params' : lrg_params},
    "gb" : {'model': gb_search, 'params' : gb_params}
}

In [58]:
# GridSearch >> find best hyperparameters
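# cv=None makes GridSearchCV use its default 5-fold cross-validation;
# uncomment the KFold line to control shuffling and the seed instead
# cv = KFold(n_splits=5, shuffle=True, random_state=31)
cv = None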

# training >> tfidf_matrix, actual_labels
# testing >> tfidf_test, actual_labels_test

print('\n\n#GRID SEARCH')
for model_id in models:
    
    classifier = models[model_id]
    print('\nGridSearch Analysis', classifier['model'])
    gs = GridSearchCV(classifier['model'], classifier['params'], 
                      cv=cv, n_jobs=-1, verbose=2)
    gs.fit(tfidf_matrix, actual_labels)
    gs_pred = gs.predict(tfidf_test)
    
    # Scores
    accuracy = accuracy_score(actual_labels_test, gs_pred)
    precision = precision_score(actual_labels_test, gs_pred, average='weighted')
    recall = recall_score(actual_labels_test, gs_pred, average='weighted')
    f1 = f1_score(actual_labels_test, gs_pred, average='weighted')
    cm = confusion_matrix(actual_labels_test, gs_pred)
    
    print(f'Accuracy: {accuracy}')
    print(f'Precision: {precision}')
    print(f'Recall: {recall}')
    print(f'F1: {f1}')
    print(f'Confusion Matrix: \n{cm}\n')
    
    print('Best Parameters')
    print(gs.best_params_)





#GRID SEARCH

GridSearch Analysis SVC(random_state=0)
Fitting 5 folds for each of 240 candidates, totalling 1200 fits




Accuracy: 0.58
Precision: 0.46451383011892566
Recall: 0.58
F1: 0.5155948442711836
Confusion Matrix: 
[[89  0 31]
 [33  0 27]
 [35  0 85]]

Best Parameters
{'C': 1.0, 'decision_function_shape': 'ovo', 'degree': 1, 'kernel': 'rbf'}

GridSearch Analysis LogisticRegression(class_weight={0: 2, 1: 3, 2: 2}, random_state=0)
Fitting 5 folds for each of 150 candidates, totalling 750 fits
Accuracy: 0.56
Precision: 0.5348405033655254
Recall: 0.56
F1: 0.5401383420822398
Confusion Matrix: 
[[79  8 33]
 [26 10 24]
 [29 12 79]]

Best Parameters
{'C': 0.5, 'max_iter': 1000, 'solver': 'lbfgs'}

GridSearch Analysis GradientBoostingClassifier(max_features=4, random_state=0)
Fitting 5 folds for each of 270 candidates, totalling 1350 fits
Accuracy: 0.48333333333333334
Precision: 0.45483831530044627
Recall: 0.48333333333333334
F1: 0.43441801445475664
Confusion Matrix: 
[[65  2 53]
 [27  1 32]
 [41  0 79]]

Best Parameters
{'learning_rate': 0.09, 'max_depth': 3, 'n_estimators': 190}
