# Testing The Nayes Classifier
In this notebook, I compare the accuracy of the "Nayes" multinomial naive Bayes classifier (`MultiNayes`) against scikit-learn's implementation of the same algorithm (`MultinomialNB`).

In [1]:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
from nayes import MultiNayes

In [2]:
# import the movie review data
imdb_cols = ["review", "sentiment"]
imdb = pd.read_csv("imdb_labelled.txt", sep="\t", names=imdb_cols)

# prepare and preprocess the text for vectorization
imdb["review"] = imdb["review"].str.strip()
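# regex=True is required: pandas >= 2.0 treats the pattern as a literal string by default
imdb["review"] = imdb["review"].str.replace(r"[^\w\s-]", "", regex=True)  # drop punctuation except hyphens
imdb["review"] = imdb["review"].str.replace(r"-", " ", regex=True)        # turn hyphens into spaces
imdb["review"] = imdb["review"].str.replace(r"\s{2,}", " ", regex=True)   # collapse repeated whitespace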
imdb["review"] = imdb["review"].str.lower()

# vectorize the text
vectorizer = CountVectorizer()

X = vectorizer.fit_transform(imdb["review"])
y = imdb["sentiment"].values

# split into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.20,
                                                    random_state=42)
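
To make the cleaning steps above concrete, here is a small sketch that applies the same sequence of operations to a single string using the standard `re` module. The `clean_review` helper and the sample sentence are made up for illustration and are not part of the notebook's data.

```python
import re


def clean_review(text: str) -> str:
    """Apply the notebook's cleaning steps to one string (illustrative helper)."""
    text = text.strip()
    text = re.sub(r"[^\w\s-]", "", text)  # drop punctuation except hyphens
    text = re.sub(r"-", " ", text)        # turn hyphens into spaces
    text = re.sub(r"\s{2,}", " ", text)   # collapse repeated whitespace
    return text.lower()


# Hypothetical sample, not taken from imdb_labelled.txt
sample = "  A well-made, slow-burning film!  Loved it.  "
print(clean_review(sample))  # a well made slow burning film loved it
```

Hyphens are kept in the first step and only then replaced with spaces, so hyphenated words such as "well-made" become two tokens rather than one fused token.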

### Fit and Predict

In [3]:
sklearn_classifier = MultinomialNB()
nayes = MultiNayes()

sklearn_classifier.fit(X_train, y_train)
nayes.fit(X_train, y_train)
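
For readers who want a rough picture of what `fit` and `predict` do in a multinomial naive Bayes model, here is a minimal, dependency-free sketch. It is not the `MultiNayes` source, whose internals are not shown in this notebook. The class name `TinyMultinomialNB`, the additive smoothing value `alpha=1.0`, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Hypothetical minimal multinomial naive Bayes (not the Nayes implementation)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # additive (Laplace) smoothing; assumed value

    def fit(self, docs, labels):
        # Vocabulary, per-class document counts, and per-class word counts
        self.vocab = {word for doc in docs for word in doc.split()}
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.split())
        return self

    def _log_posterior(self, doc, label):
        # log P(class) + sum over words of log P(word | class), with smoothing
        total_words = sum(self.word_counts[label].values())
        denom = total_words + self.alpha * len(self.vocab)
        score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        for word in doc.split():
            if word in self.vocab:  # words unseen in training are ignored
                score += math.log((self.word_counts[label][word] + self.alpha) / denom)
        return score

    def predict(self, docs):
        # Pick the class with the highest log posterior for each document
        return [
            max(self.class_counts, key=lambda c: self._log_posterior(doc, c))
            for doc in docs
        ]


# Toy data, invented for this example
train_docs = [
    "great film loved it",
    "wonderful acting great story",
    "terrible plot boring film",
    "awful boring waste",
]
train_labels = [1, 1, 0, 0]

toy_model = TinyMultinomialNB().fit(train_docs, train_labels)
toy_predictions = toy_model.predict(["great story", "boring waste"])
print(toy_predictions)  # [1, 0]
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together, which is why the sketch sums logarithms instead of multiplying raw probabilities.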

In [4]:
sklearn_prediction = sklearn_classifier.predict(X_test)
nayes_prediction = nayes.predict(X_test)

# accuracy_score expects (y_true, y_pred) in that order
sklearn_score = accuracy_score(y_test, sklearn_prediction)
nayes_score = accuracy_score(y_test, nayes_prediction)

print(f"Scikit-Learn Score: {round(sklearn_score, 2)}")
print(f"Nayes Classifier Score: {round(nayes_score, 2)}")

Scikit-Learn Score: 0.77
Nayes Classifier Score: 0.77


As we can see, the classifier holds its own when put to the test against the sklearn library: both reach an accuracy of 0.77 on the held-out test set.
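
Matching accuracy scores do not by themselves show that two classifiers make the same predictions, so a possible follow-up check is to measure how often they agree. The sketch below uses a hypothetical helper, `agreement_rate`, and made-up label lists. In the notebook it could be called on `sklearn_prediction` and `nayes_prediction` instead.

```python
def agreement_rate(pred_a, pred_b):
    """Fraction of positions where two prediction sequences match (illustrative helper)."""
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction sequences must have the same length")
    matches = sum(a == b for a, b in zip(pred_a, pred_b))
    return matches / len(pred_a)


# Made-up labels and predictions, for illustration only
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
model_a = [1, 0, 0, 1, 0, 1, 1, 0]
model_b = [1, 1, 1, 1, 0, 0, 0, 0]

accuracy_a = agreement_rate(y_true, model_a)
accuracy_b = agreement_rate(y_true, model_b)
agreement = agreement_rate(model_a, model_b)

print(accuracy_a, accuracy_b, agreement)  # 0.75 0.75 0.5
```

In this toy case both models score 0.75 against the true labels yet agree with each other on only half of the examples, which is the situation the agreement check is meant to reveal.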