## About Dataset
Context
This is a small subset of dataset of Book reviews from Amazon Kindle Store category.

Content
5-core dataset of product reviews from Amazon Kindle Store category from May 1996 - July 2014. Contains total of 982619 entries. Each reviewer has at least 5 reviews and each product has at least 5 reviews in this dataset.
Columns

- asin - ID of the product, like B000FA64PK
- helpful - helpfulness rating of the review - example: 2/3.
- overall - rating of the product.
- reviewText - text of the review (heading).
- reviewTime - time of the review (raw).
- reviewerID - ID of the reviewer, like A3SPTOKDG7WBLN
- reviewerName - name of the reviewer.
- summary - summary of the review (description).
- unixReviewTime - unix timestamp.

Acknowledgements
This dataset is taken from Amazon product data, Julian McAuley, UCSD website. http://jmcauley.ucsd.edu/data/amazon/

License to the data files belong to them.

Inspiration
- Sentiment analysis on reviews.
- Understanding how people rate usefulness of a review/ What factors influence helpfulness of a review.
- Fake reviews/ outliers.
- Best rated product IDs, or similarity between products based on reviews alone (not the best idea ikr).
- Any other interesting analysis

#### Best Practises
1. Preprocessing And Cleaning
2. Train Test Split
3. BOW,TFIDF,Word2vec
4. Train ML algorithms

In [59]:
import pandas as pd
import numpy as np

df = pd.read_csv('./all_kindle_review.csv')
df.head()

Unnamed: 0.2,Unnamed: 0.1,Unnamed: 0,asin,helpful,rating,reviewText,reviewTime,reviewerID,reviewerName,summary,unixReviewTime
0,0,11539,B0033UV8HI,"[8, 10]",3,"Jace Rankin may be short, but he's nothing to ...","09 2, 2010",A3HHXRELK8BHQG,Ridley,Entertaining But Average,1283385600
1,1,5957,B002HJV4DE,"[1, 1]",5,Great short read. I didn't want to put it dow...,"10 8, 2013",A2RGNZ0TRF578I,Holly Butler,Terrific menage scenes!,1381190400
2,2,9146,B002ZG96I4,"[0, 0]",3,I'll start by saying this is the first of four...,"04 11, 2014",A3S0H2HV6U1I7F,Merissa,Snapdragon Alley,1397174400
3,3,7038,B002QHWOEU,"[1, 3]",3,Aggie is Angela Lansbury who carries pocketboo...,"07 5, 2014",AC4OQW3GZ919J,Cleargrace,very light murder cozy,1404518400
4,4,1776,B001A06VJ8,"[0, 1]",4,I did not expect this type of book to be in li...,"12 31, 2012",A3C9V987IQHOQD,Rjostler,Book,1356912000


In [60]:
df=df[['reviewText','rating']]#only these are required
df.head()

Unnamed: 0,reviewText,rating
0,"Jace Rankin may be short, but he's nothing to ...",3
1,Great short read. I didn't want to put it dow...,5
2,I'll start by saying this is the first of four...,3
3,Aggie is Angela Lansbury who carries pocketboo...,3
4,I did not expect this type of book to be in li...,4


In [4]:
df.shape

(12000, 2)

In [5]:
#missing values
df.isnull().sum()

reviewText    0
rating        0
dtype: int64

In [6]:
df['rating'].unique()#1 to 5 multiclass classification

array([3, 5, 4, 2, 1], dtype=int64)

In [7]:
df['rating'].value_counts()

rating
5    3000
4    3000
3    2000
2    2000
1    2000
Name: count, dtype: int64

In [61]:
## Preprocessing And Cleaning
## postive review is 1 and negative review is 0
df['rating']=df['rating'].apply(lambda x:0 if x<3 else 1)
#making to only two as positive and negative

In [62]:
df['rating'].value_counts()

rating
1    8000
0    4000
Name: count, dtype: int64

In [63]:
## 1. Lower All the cases
df['reviewText']=df['reviewText'].str.lower()

In [11]:
df.head()

Unnamed: 0,reviewText,rating
0,"jace rankin may be short, but he's nothing to ...",1
1,great short read. i didn't want to put it dow...,1
2,i'll start by saying this is the first of four...,1
3,aggie is angela lansbury who carries pocketboo...,1
4,i did not expect this type of book to be in li...,1


In [12]:
import re
import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\User\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [14]:
from bs4 import BeautifulSoup

In [64]:
## Removing special characters
df['reviewText']=df['reviewText'].apply(lambda x:re.sub('[^a-z A-z 0-9-]+', '',x))
## Remove the stopswords
df['reviewText']=df['reviewText'].apply(lambda x:" ".join([y for y in x.split() if y not in stopwords.words('english')]))
## Remove url 
df['reviewText']=df['reviewText'].apply(lambda x: re.sub(r'(http|https|ftp|ssh)://([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?', '' , str(x)))
# ## Remove html tags
# df['reviewText']=df['reviewText'].apply(lambda x: BeautifulSoup(x, 'lxml').get_text())
## Remove any additional spaces
df['reviewText']=df['reviewText'].apply(lambda x: " ".join(x.split()))


In [65]:
df.head()

Unnamed: 0,reviewText,rating
0,jace rankin may short hes nothing mess man hau...,1
1,great short read didnt want put read one sitti...,1
2,ill start saying first four books wasnt expect...,1
3,aggie angela lansbury carries pocketbooks inst...,1
4,expect type book library pleased find price right,1


In [66]:
## Lemmatizer
from nltk.stem import WordNetLemmatizer
lemmatizer=WordNetLemmatizer()

In [67]:
def lemmatize_words(text):
    return " ".join([lemmatizer.lemmatize(word) for word in text.split()])

In [68]:
df['reviewText']=df['reviewText'].apply(lambda x:lemmatize_words(x))

In [69]:
df.head()

Unnamed: 0,reviewText,rating
0,jace rankin may short he nothing mess man haul...,1
1,great short read didnt want put read one sitti...,1
2,ill start saying first four book wasnt expecti...,1
3,aggie angela lansbury carry pocketbook instead...,1
4,expect type book library pleased find price right,1


In [24]:
## Train Test Split
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(df['reviewText'],df['rating'],
                                              test_size=0.20)

In [25]:
from sklearn.feature_extraction.text import CountVectorizer
bow=CountVectorizer()
X_train_bow=bow.fit_transform(X_train).toarray()
X_test_bow=bow.transform(X_test).toarray()

In [26]:
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf=TfidfVectorizer()
X_train_tfidf=tfidf.fit_transform(X_train).toarray()
X_test_tfidf=tfidf.transform(X_test).toarray()

In [27]:
X_train_bow

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int64)

In [28]:
from sklearn.naive_bayes import GaussianNB
nb_model_bow=GaussianNB().fit(X_train_bow,y_train)
nb_model_tfidf=GaussianNB().fit(X_train_tfidf,y_train)

In [29]:
from sklearn.metrics import confusion_matrix,accuracy_score,classification_report
y_pred_bow=nb_model_bow.predict(X_test_bow)
y_pred_tfidf=nb_model_bow.predict(X_test_tfidf)
confusion_matrix(y_test,y_pred_bow)

array([[539, 274],
       [702, 885]], dtype=int64)

In [30]:
print("BOW accuracy: ",accuracy_score(y_test,y_pred_bow))

BOW accuracy:  0.5933333333333334


In [31]:
confusion_matrix(y_test,y_pred_tfidf)

array([[530, 283],
       [694, 893]], dtype=int64)

In [32]:
print("TFIDF accuracy: ",accuracy_score(y_test,y_pred_tfidf))

TFIDF accuracy:  0.5929166666666666


In [33]:
X_train

4831    jessie suffered lot husband dy mugged apartmen...
5983    fun quick read delivers bang buck- would defin...
9058    like author always feel like story get far lef...
1459    note potential reader would probably assign hi...
3586    wonderful page turning ebook started reading 1...
                              ...                        
9654    picked title breezed pretty quickly surprising...
8380    sweet love story worth 266 spent something thi...
2198    charlis perfectly comfortable simple fact shes...
2409    start 1st book series youll find wrapped chara...
1387    art amazing everyone ratedthe idea story decie...
Name: reviewText, Length: 9600, dtype: object

In [34]:
y_train

4831    1
5983    1
9058    1
1459    0
3586    1
       ..
9654    1
8380    1
2198    1
2409    1
1387    0
Name: rating, Length: 9600, dtype: int64

In [35]:
import gensim
from gensim.models import Word2Vec, KeyedVectors

In [70]:
Independent=df['reviewText']
Independent.head()

0    jace rankin may short he nothing mess man haul...
1    great short read didnt want put read one sitti...
2    ill start saying first four book wasnt expecti...
3    aggie angela lansbury carry pocketbook instead...
4    expect type book library pleased find price right
Name: reviewText, dtype: object

In [71]:
depenedent = df['rating']
depenedent

0        1
1        1
2        1
3        1
4        1
        ..
11995    1
11996    1
11997    1
11998    0
11999    1
Name: rating, Length: 12000, dtype: int64

In [72]:
Independent.shape,depenedent.shape

((12000,), (12000,))

In [73]:
from nltk import sent_tokenize
from gensim.utils import simple_preprocess
words=[]
for sent in Independent:
    sent_token=sent_tokenize(sent)
    for sent in sent_token:
        words.append(simple_preprocess(sent))

In [74]:
words

[['jace',
  'rankin',
  'may',
  'short',
  'he',
  'nothing',
  'mess',
  'man',
  'hauled',
  'saloon',
  'undertaker',
  'know',
  'he',
  'famous',
  'bounty',
  'hunter',
  'oregon',
  'shot',
  'man',
  'saloon',
  'finished',
  'year',
  'long',
  'quest',
  'avenge',
  'sister',
  'murder',
  'trying',
  'figure',
  'next',
  'snotty',
  'nosed',
  'farm',
  'boy',
  'rescued',
  'gang',
  'bully',
  'offer',
  'money',
  'kill',
  'man',
  'forced',
  'ranch',
  'reluctantly',
  'agrees',
  'bring',
  'man',
  'justice',
  'kill',
  'outright',
  'first',
  'need',
  'tell',
  'sister',
  'widower',
  'newskyla',
  'kyle',
  'springer',
  'bailey',
  'riding',
  'trail',
  'sleeping',
  'ground',
  'past',
  'month',
  'trying',
  'find',
  'jace',
  'want',
  'revenge',
  'man',
  'killed',
  'husband',
  'took',
  'ranch',
  'amongst',
  'crime',
  'shes',
  'keen',
  'detour',
  'jace',
  'want',
  'take',
  'realizes',
  'shes',
  'option',
  'hide',
  'behind',
  'boy',
 

In [75]:
model=gensim.models.Word2Vec(words)

In [76]:
## To Get All the Vocabulary
model.wv.index_to_key

['book',
 'story',
 'read',
 'one',
 'character',
 'like',
 'good',
 'would',
 'really',
 'love',
 'time',
 'get',
 'author',
 'reading',
 'series',
 'well',
 'much',
 'first',
 'even',
 'didnt',
 'short',
 'know',
 'way',
 'great',
 'could',
 'make',
 'sex',
 'little',
 'dont',
 'two',
 'thing',
 'want',
 'think',
 'find',
 'plot',
 'romance',
 'also',
 'end',
 'life',
 'im',
 'see',
 'enjoyed',
 'go',
 'scene',
 'never',
 'written',
 'take',
 'woman',
 'many',
 'lot',
 'kindle',
 'year',
 'say',
 'thought',
 'work',
 'bit',
 'found',
 'going',
 'give',
 'interesting',
 'liked',
 'writing',
 'novel',
 'loved',
 'another',
 'feel',
 'better',
 'got',
 'come',
 'man',
 'hot',
 'still',
 'back',
 'enough',
 'though',
 'people',
 'star',
 'reader',
 'made',
 'something',
 'review',
 'part',
 'friend',
 'page',
 'cant',
 'bad',
 'world',
 'need',
 'free',
 'keep',
 'new',
 'wasnt',
 'doesnt',
 'relationship',
 'enjoy',
 'recommend',
 'together',
 'next',
 'start',
 'felt',
 'best',
 'put',

In [77]:
model.corpus_count

12000

In [78]:
model.epochs

5

In [79]:
def avg_word2vec(doc):
    return np.mean([model.wv[word] for word in doc if word in model.wv.index_to_key],axis=0)

In [80]:
#apply for the entire sentences
from tqdm import tqdm
import numpy as np
X=[]
for i in tqdm(range(len(words))):
    X.append(avg_word2vec(words[i]))

100%|██████████| 12000/12000 [00:08<00:00, 1337.40it/s]


In [81]:
len(X)

12000

In [82]:
##independent Features
X_new=np.array(X)

In [84]:
X_new.shape

(12000, 100)

In [85]:
depenedent.shape

(12000,)

In [86]:
## this is the final independent features
df = pd.DataFrame(np.vstack(X_new))

In [87]:
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,90,91,92,93,94,95,96,97,98,99
0,-0.136856,0.294415,0.16795,0.081493,0.072774,-0.41332,0.067672,0.643041,-0.178715,-0.210501,...,0.52596,0.19903,-0.173893,0.056046,0.614693,0.197858,-0.01886,-0.230912,0.034119,0.081583
1,-0.222328,0.375868,0.092912,0.322842,0.241023,-0.587671,0.231921,0.743923,-0.242465,-0.301679,...,0.704224,0.439919,-0.279269,0.009386,0.241405,0.314028,0.073951,-0.047599,-0.027253,0.10892
2,-0.202839,0.364107,0.116809,0.147928,0.146087,-0.610554,0.236524,0.742397,-0.232603,-0.246233,...,0.536425,0.339665,-0.109171,-0.050865,0.35663,0.252303,0.15589,-0.064944,-0.095497,0.17152
3,-0.34604,0.345582,0.118988,0.146529,0.186025,-0.651036,0.230724,0.696143,-0.023003,-0.141846,...,0.454279,0.257656,-0.093801,-0.317059,0.02662,0.209338,0.500109,0.13422,-0.131994,0.151503
4,-0.008377,0.425999,0.0394,0.036013,-0.041662,-0.674357,0.091354,0.94588,-0.075488,-0.341923,...,0.486416,0.356018,0.133871,0.032888,0.341183,0.194147,-0.003606,-0.096355,-0.136248,0.055947


In [88]:
df['Output']=depenedent

In [89]:
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,91,92,93,94,95,96,97,98,99,Output
0,-0.136856,0.294415,0.16795,0.081493,0.072774,-0.41332,0.067672,0.643041,-0.178715,-0.210501,...,0.19903,-0.173893,0.056046,0.614693,0.197858,-0.01886,-0.230912,0.034119,0.081583,1
1,-0.222328,0.375868,0.092912,0.322842,0.241023,-0.587671,0.231921,0.743923,-0.242465,-0.301679,...,0.439919,-0.279269,0.009386,0.241405,0.314028,0.073951,-0.047599,-0.027253,0.10892,1
2,-0.202839,0.364107,0.116809,0.147928,0.146087,-0.610554,0.236524,0.742397,-0.232603,-0.246233,...,0.339665,-0.109171,-0.050865,0.35663,0.252303,0.15589,-0.064944,-0.095497,0.17152,1
3,-0.34604,0.345582,0.118988,0.146529,0.186025,-0.651036,0.230724,0.696143,-0.023003,-0.141846,...,0.257656,-0.093801,-0.317059,0.02662,0.209338,0.500109,0.13422,-0.131994,0.151503,1
4,-0.008377,0.425999,0.0394,0.036013,-0.041662,-0.674357,0.091354,0.94588,-0.075488,-0.341923,...,0.356018,0.133871,0.032888,0.341183,0.194147,-0.003606,-0.096355,-0.136248,0.055947,1


In [90]:
df.dropna(inplace=True)

In [91]:
df.isnull().sum()

0         0
1         0
2         0
3         0
4         0
         ..
96        0
97        0
98        0
99        0
Output    0
Length: 101, dtype: int64

In [93]:
## Train Test Split
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(df.iloc[:,:-1],df.iloc[:,-1],test_size=0.30)

In [96]:
from sklearn.ensemble import RandomForestClassifier
classifier=RandomForestClassifier()
classifier.fit(X_train,y_train)

In [97]:
y_pred=classifier.predict(X_test)

In [98]:
from sklearn.metrics import accuracy_score,classification_report
print(accuracy_score(y_test,y_pred))

0.7688888888888888


In [99]:
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           0       0.67      0.57      0.61      1162
           1       0.81      0.87      0.84      2438

    accuracy                           0.77      3600
   macro avg       0.74      0.72      0.72      3600
weighted avg       0.76      0.77      0.76      3600



In [100]:
def evaluate_model(true,predicted):
    print("Accuracy: ",accuracy_score(true,predicted))
    print(classification_report(true,predicted))

In [101]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import AdaBoostClassifier
from xgboost import XGBClassifier

In [103]:
models = {
    'Random Forest': RandomForestClassifier(),
    'Naive Bayes': GaussianNB(),
    'Logistic Regression': LogisticRegression(),
    'SVM': SVC(),
    'KNN': KNeighborsClassifier(),
    'Decision Tree': DecisionTreeClassifier(),
    'XGBoost': XGBClassifier(),
    'AdaBoost': AdaBoostClassifier(),
    'Gradient Boosting': GradientBoostingClassifier(),
}

for i in range(len(list(models))):
    model = list(models.values())[i]
    model.fit(X_train, y_train) # Train model

    # Make predictions
    y_train_pred = model.predict(X_train)
    y_test_pred = model.predict(X_test)
    
    print(list(models.keys())[i])
    # Evaluate Train and Test dataset
    print('Model performance for Training set')
    evaluate_model(y_train, y_train_pred)
    print('----------------------------------')
    
    print('Model performance for Test set')
    evaluate_model(y_test, y_test_pred)
    
    print('='*35)
    print('\n')

Random Forest
Model performance for Training set
Accuracy:  1.0
              precision    recall  f1-score   support

           0       1.00      1.00      1.00      2838
           1       1.00      1.00      1.00      5562

    accuracy                           1.00      8400
   macro avg       1.00      1.00      1.00      8400
weighted avg       1.00      1.00      1.00      8400

----------------------------------
Model performance for Test set
Accuracy:  0.7722222222222223
              precision    recall  f1-score   support

           0       0.67      0.58      0.62      1162
           1       0.81      0.86      0.84      2438

    accuracy                           0.77      3600
   macro avg       0.74      0.72      0.73      3600
weighted avg       0.77      0.77      0.77      3600



Naive Bayes
Model performance for Training set
Accuracy:  0.7151190476190477
              precision    recall  f1-score   support

           0       0.56      0.74      0.64      283



AdaBoost
Model performance for Training set
Accuracy:  0.7713095238095238
              precision    recall  f1-score   support

           0       0.68      0.60      0.64      2838
           1       0.81      0.86      0.83      5562

    accuracy                           0.77      8400
   macro avg       0.75      0.73      0.74      8400
weighted avg       0.77      0.77      0.77      8400

----------------------------------
Model performance for Test set
Accuracy:  0.7666666666666667
              precision    recall  f1-score   support

           0       0.65      0.59      0.62      1162
           1       0.81      0.85      0.83      2438

    accuracy                           0.77      3600
   macro avg       0.73      0.72      0.73      3600
weighted avg       0.76      0.77      0.76      3600



Gradient Boosting
Model performance for Training set
Accuracy:  0.8120238095238095
              precision    recall  f1-score   support

           0       0.75      0.66   

In [105]:
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

# Define models
models = {
    'Random Forest': RandomForestClassifier(),
    'Naive Bayes': GaussianNB(),
    'Logistic Regression': LogisticRegression(),
    'SVM': SVC(),
    'KNN': KNeighborsClassifier(),
    'XGBoost': XGBClassifier(),
    'AdaBoost': AdaBoostClassifier(),
    'Gradient Boosting': GradientBoostingClassifier(),
}

# Define hyperparameters for GridSearch
params = {
    "Random Forest": {
        'n_estimators': [8, 16, 32, 64, 128, 256]
    },
    "Gradient Boosting": {
        'learning_rate': [0.1, 0.01, 0.05, 0.001],
        'subsample': [0.6, 0.7, 0.75, 0.8, 0.85, 0.9],
        'n_estimators': [8, 16, 32, 64, 128, 256]
    },
    "XGBoost": {
        'learning_rate': [0.1, 0.01, 0.05, 0.001],
        'n_estimators': [8, 16, 32, 64, 128, 256]
    },
    "AdaBoost": {
        'learning_rate': [0.1, 0.01, 0.5, 0.001],
        'n_estimators': [8, 16, 32, 64, 128, 256]
    },
    "KNN": {
        'n_neighbors': [3, 5, 7, 9],
        'weights': ['uniform', 'distance'],
        'algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute']
    }
}

# Function to evaluate model
def evaluate_model(y_true, y_pred):
    print(f'Accuracy: {accuracy_score(y_true, y_pred)}')
    print(f'Precision: {precision_score(y_true, y_pred, average="weighted")}')
    print(f'Recall: {recall_score(y_true, y_pred, average="weighted")}')
    print(f'F1 Score: {f1_score(y_true, y_pred, average="weighted")}')

model_report = {}

# Train and evaluate each model
for name, model in models.items():
    print(name)
    if name in params:
        grid = GridSearchCV(model, params[name], cv=5, scoring='accuracy')
        grid.fit(X_train, y_train)
        best_model = grid.best_estimator_
    else:
        best_model = model
        best_model.fit(X_train, y_train)
    
    # Make predictions
    y_train_pred = best_model.predict(X_train)
    y_test_pred = best_model.predict(X_test)
    
    # Evaluate Train and Test dataset
    print('Model performance for Training set')
    evaluate_model(y_train, y_train_pred)
    print('----------------------------------')
    
    print('Model performance for Test set')
    evaluate_model(y_test, y_test_pred)
    
    print('='*35)
    print('\n')
    
    # Store the test set performance
    model_report[name] = accuracy_score(y_test, y_test_pred)

# Identify the best model
best_model_score = max(model_report.values())
best_model_name = max(model_report, key=model_report.get)
best_model = models[best_model_name]

if best_model_score < 0.6:
    print("No best model found")
else:
    print(f"Best found model on both training and testing dataset is: {best_model_name} with score: {best_model_score}")


Random Forest
Model performance for Training set
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0
----------------------------------
Model performance for Test set
Accuracy: 0.7736111111111111
Precision: 0.7665593296293539
Recall: 0.7736111111111111
F1 Score: 0.7674493447953895


Naive Bayes
Model performance for Training set
Accuracy: 0.7151190476190477
Precision: 0.7456299845842316
Recall: 0.7151190476190477
F1 Score: 0.722113256499473
----------------------------------
Model performance for Test set
Accuracy: 0.7136111111111111
Precision: 0.7524295242894523
Recall: 0.7136111111111111
F1 Score: 0.7222583212317802


Logistic Regression
Model performance for Training set
Accuracy: 0.7755952380952381
Precision: 0.7694922586680291
Recall: 0.7755952380952381
F1 Score: 0.7680277735652609
----------------------------------
Model performance for Test set
Accuracy: 0.7836111111111111
Precision: 0.7771340559956552
Recall: 0.7836111111111111
F1 Score: 0.7775092302352767


SVM
Model perfor



Model performance for Training set
Accuracy: 0.7978571428571428
Precision: 0.7941220894743105
Recall: 0.7978571428571428
F1 Score: 0.7948936344050488
----------------------------------
Model performance for Test set
Accuracy: 0.7733333333333333
Precision: 0.7687705176767676
Recall: 0.7733333333333333
F1 Score: 0.7703283898982535


Gradient Boosting


KeyboardInterrupt: 