# Feedback Sentiment Analysis

The dataset is dummy data generated by ChatGPT.

### Importing Data

In [25]:
import pandas as pd
import numpy as np
import re

In [26]:
data = pd.read_csv("feedback_old.csv")
data.shape

(1000, 2)

In [27]:
data.head()

Unnamed: 0,feedback,sentiment
0,Rigorous other about for disappointing at you ...,Negative
1,My supportive with inspirational inspiring you...,Positive
2,Generous it dependable or having high-performi...,Positive
3,Menacing exhausting they having how yours unin...,Negative
4,Daunting dispiriting few burdensome yourself w...,Negative


Number of feedback statements per sentiment label:

In [28]:
data["sentiment"].value_counts()

sentiment
Negative    500
Positive    500
Name: count, dtype: int64

### Preprocessing

Stemming is a linguistic method of reducing words to their base form. For example, "walking" and "walked" both reduce to "walk".

In [29]:
from nltk.stem.porter import PorterStemmer

ps = PorterStemmer()

def stem_words(feedback):
    return ' '.join([ps.stem(word) for word in feedback.split()])

stem_words("this is an example of stemming words in a sentence")

'thi is an exampl of stem word in a sentenc'

In [30]:
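def preprocess(feedback):
    # Lowercase, drop everything except letters and whitespace, then stem.
    feedback = feedback.lower()
    feedback = re.sub(r"[^a-z\s]", "", feedback)  # raw string avoids the invalid "\s" escape
    feedback = stem_words(feedback)
    return feedback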

preprocess("This is an example of how the preprocessed data looks like.")

'thi is an exampl of how the preprocess data look like'

In [31]:
data_encoded = data.copy()
data_encoded["feedback"] = data_encoded["feedback"].apply(preprocess)
data_encoded.head()

Unnamed: 0,feedback,sentiment
0,rigor other about for disappoint at you thi ha...,Negative
1,my support with inspir inspir youll versatil j...,Positive
2,gener it depend or have highperform out until ...,Positive
3,menac exhaust they have how your uninterest do...,Negative
4,daunt dispirit few burdensom yourself weari th...,Negative


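`CountVectorizer` counts how often each word occurs in each feedback statement. It is fitted on the training split only, so the vocabulary is learned without looking at the test data.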

In [32]:
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer

X = data_encoded["feedback"]
y = data_encoded["sentiment"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

cv = CountVectorizer()
X_train = cv.fit_transform(X_train)
X_test = cv.transform(X_test)

In [33]:
cv.get_feature_names_out().size

402

In [34]:
word_counts = pd.DataFrame(X_train.toarray(), columns=cv.get_feature_names_out())
word_counts.head()

Unnamed: 0,about,abov,accolad,accomplish,achiev,adapt,admir,after,again,against,...,wonder,wornout,worri,you,youd,youll,your,yourself,yourselv,youv
0,0,0,0,1,0,0,0,0,0,1,...,0,0,0,0,0,0,1,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
2,0,0,0,0,0,0,0,1,1,0,...,0,0,0,0,0,0,0,0,0,1
3,0,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
4,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### Creating the Model

The model is Multinomial Naive Bayes, which estimates the probability of each sentiment from the word counts and predicts the sentiment with the highest probability.
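
As a rough illustration of that scoring rule, the sketch below recomputes a prediction by hand from a fitted model's attributes. The toy sentences and the `toy_*` names are assumptions made up for this example. The score for each class is the log prior plus the count-weighted sum of the per-word log probabilities.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data, not taken from the feedback dataset.
toy_docs = ["great help great", "help terribl", "great team", "terribl slow"]
toy_labels = ["Positive", "Negative", "Positive", "Negative"]

toy_cv = CountVectorizer()
X_toy = toy_cv.fit_transform(toy_docs)
toy_model = MultinomialNB().fit(X_toy, toy_labels)

# Word counts for a new statement, as a dense 1-D vector.
x_new = toy_cv.transform(["great help"]).toarray()[0]

# Per-class score: log P(class) + sum over words of count * log P(word | class).
manual_scores = toy_model.class_log_prior_ + toy_model.feature_log_prob_ @ x_new
manual_label = toy_model.classes_[np.argmax(manual_scores)]

sklearn_label = toy_model.predict(toy_cv.transform(["great help"]))[0]
print(manual_label, sklearn_label)
```

Both labels should agree, since `predict` picks the class with the highest score of the same form.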

In [35]:
from sklearn.naive_bayes import MultinomialNB

model = MultinomialNB()
model.fit(X_train, y_train)

In [36]:
model.score(X_train, y_train)

1.0

In [37]:
model.score(X_test, y_test)

1.0

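Perfect scores on both the training and the test set are suspicious. Since the test score is as high as the training score, this does not look like ordinary overfitting. A more likely explanation is that the generated dummy data is trivially separable, with each sentiment using its own distinctive vocabulary. The dataset is also small and artificial, so these scores probably do not reflect how the model would perform on real feedback.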
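
One possible way to probe whether a dataset is easy to separate by vocabulary alone is to measure how many words occur in only one class. The helper `exclusive_vocab_share` below is a hypothetical function written for this sketch, and the toy sentences are made up. It is a rough diagnostic, not a substitute for evaluating on real feedback.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

def exclusive_vocab_share(texts, labels):
    """Return the fraction of vocabulary words that occur in exactly one class."""
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(texts)
    labels = np.asarray(labels)
    # For each class, flag the words that appear at least once in that class.
    present = np.array([
        np.asarray(X[np.flatnonzero(labels == c)].sum(axis=0)).ravel() > 0
        for c in np.unique(labels)
    ])
    # A word is "exclusive" if exactly one class contains it.
    exclusive = present.sum(axis=0) == 1
    return float(exclusive.mean())

# Hypothetical toy data: "help", "team" and "you" are shared, the rest are exclusive.
toy_texts = ["great help you", "terribl help you", "great team", "slow team"]
toy_labels = ["Positive", "Negative", "Positive", "Negative"]

share = exclusive_vocab_share(toy_texts, toy_labels)
print(share)  # 0.5
```

On the notebook's data this could be called as `exclusive_vocab_share(data_encoded["feedback"], data_encoded["sentiment"])`. A value close to 1 would suggest that a classifier can tell the classes apart just by recognizing which words appear.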

In [38]:
test_feedback = "Manager is doing an awful job and this is a terrible work environment."

test_feedback = cv.transform([preprocess(test_feedback)])
model.predict(test_feedback)[0]

'Negative'

In [39]:
test_feedback = "You are doing a bad job."

test_feedback = cv.transform([preprocess(test_feedback)])
model.predict(test_feedback)[0]

'Negative'

In [40]:
test_feedback = "You are doing a good job."

test_feedback = cv.transform([preprocess(test_feedback)])
model.predict(test_feedback)[0]

'Positive'

### Saving the Model for the Web App

In [41]:
import pickle

# Use context managers so the files are closed once writing finishes.
with open('MNB_feedback_model.pkl', 'wb') as f:
    pickle.dump(model, f)
with open('countvectorizer.pkl', 'wb') as f:
    pickle.dump(cv, f)
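
On the web app side, both files have to be loaded, and new text has to go through the same vectorizer before prediction. The sketch below shows that round trip with a toy model written to a temporary directory. The toy data and the helper `predict_sentiment` are assumptions for illustration, and the real app would also need to apply the notebook's `preprocess` function first. Pickle files should only be loaded from a trusted source.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy stand-ins for the notebook's fitted `cv` and `model`.
toy_cv = CountVectorizer()
toy_model = MultinomialNB().fit(
    toy_cv.fit_transform(["great help", "terribl slow"]),
    ["Positive", "Negative"],
)

with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "MNB_feedback_model.pkl")
    cv_path = os.path.join(tmp, "countvectorizer.pkl")

    # Save both objects, as the notebook does.
    with open(model_path, "wb") as f:
        pickle.dump(toy_model, f)
    with open(cv_path, "wb") as f:
        pickle.dump(toy_cv, f)

    # Load them back, as the web app would at startup.
    with open(model_path, "rb") as f:
        loaded_model = pickle.load(f)
    with open(cv_path, "rb") as f:
        loaded_cv = pickle.load(f)

def predict_sentiment(text, vectorizer, classifier):
    """Vectorize already-preprocessed text and return the predicted label."""
    return classifier.predict(vectorizer.transform([text]))[0]

result = predict_sentiment("great help", loaded_cv, loaded_model)
print(result)  # Positive
```

Saving the vectorizer together with the model matters because the model's columns only make sense with the exact vocabulary it was trained on.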