# Naive Bayes Classifier 
In this project we apply a Naive Bayes classifier to a dataset of customer comments from an online store. After training the model on the train dataset, we predict the recommendation class of each comment in the test dataset. We use the bag-of-words method, which is simple and still gives good accuracy.

## Importing needed libraries
As in most AI projects, numpy and pandas are imported. We also import the hazm library for preprocessing and normalizing the words in the comments.

In [1]:
from __future__ import unicode_literals
import pandas as pd
import numpy as np
from hazm import *
from math import log2

## Tokenizing comments and titles for train dataset
Here we tokenize the sentences in both the comments and the titles, then append the title tokens to the comment tokens so that each row holds one single comment without a separate title, for the sake of simplicity.

In [2]:
%%time
train_table = pd.read_csv("comment_train.csv", encoding="utf-8")
train_table['title'] = train_table.apply(lambda row: word_tokenize(row['title']), axis=1)
train_table['comment'] = train_table.apply(lambda row: word_tokenize(row['comment']) + row['title'], axis=1)
del train_table['title']
train_table

Wall time: 968 ms


Unnamed: 0,comment,recommend
0,"[با, وجود, سابقه, خوبی, که, از, برند, ایرانی, ...",not_recommended
1,"[بسیار, عالی, بسیار, عالی]",recommended
2,"[من, الان, ۳, هفته, هست, استفاده, میکنم\r, برا...",not_recommended
3,"[عمرش, کمه, تا, یه, هفته, بیشتر, نمیشه, استفاد...",not_recommended
4,"[فکر, کنین, کلمن, بخرین, با, ذوق, ., کلی, پولش...",not_recommended
...,...,...
5995,"[خیلی, جنس, پارچش, نرم, ولطیفه, خیلیم, جنسش, خ...",recommended
5996,"[سلام, ., واقعا, فکر, نمی, کردم, به, این, راحت...",recommended
5997,"[من, از, دیجی, کالا, خریدم, خیلی, زود, دستم, ر...",recommended
5998,"[یا, شرکت, نمیدونسته, چای, ماچا, امپریال, چیه,...",not_recommended


## Tokenizing comments and titles for test dataset
We do the exact same process for the test dataset.

In [3]:
%%time
test_table = pd.read_csv("comment_test.csv", encoding="utf-8")
test_table['title'] = test_table.apply(lambda row: word_tokenize(row['title']), axis=1)
test_table['comment'] = test_table.apply(lambda row: word_tokenize(row['comment']) + row['title'], axis=1)
del test_table['title']
test_table

Wall time: 119 ms


Unnamed: 0,comment,recommend
0,"[تازه, خریدم, یه, مدت, کار, بکنه, مشخص, میشه, ...",recommended
1,"[با, این, قیمت, گزینه, های, بهتری, هم, میشه, گ...",not_recommended
2,"[خیلی, عالیه, ،, فقط, کاش, از, اون, سمتش, میشد...",recommended
3,"[من, این, فیس, براس, چند, روز, یپش, به, دستم, ...",not_recommended
4,"[بنده, یه, هارد, اکسترنال, دارم, که, کابل, فاب...",recommended
...,...,...
795,"[طراحیش, قشنگه, ولی, داخل, عکس, خیلی, بزرگتر, ...",not_recommended
796,"[این, لامپ, چینی, هستتش, کیفیت, پایین, ., نور,...",not_recommended
797,"[در, کل, از, این, خریدم, راضی, هستم, و, به, تن...",recommended
798,"[تازع, نصبش, کردم-سرعت, انتقال, و, نصب, بازی, ...",recommended


## Preprocessing the datasets 
We should do some preprocessing on both the test and train datasets. This preprocessing consists of eliminating stop words, which is done a few blocks below, and stemming and lemmatizing the words.
- Stemming: This process converts each word to its simplest form, with no plural signs or other added characters.
- Lemmatization: This process converts each word, mostly the verbs, into its root.
 
These two processes help similar words with the same meaning fall into the same category. Otherwise they would fall into different categories, which leads to more categories and a less accurately categorized dictionary.
In order to see the effect of preprocessing on the results, we keep an instance of our data without preprocessing.
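
The effect of normalization on the vocabulary can be sketched with a toy example. The `toy_stem` function and its `SUFFIXES` rule set below are hypothetical and work on English words; they only stand in for hazm's `Stemmer` and `Lemmatizer`, whose rules for Persian are far richer.

```python
# Toy illustration only: a crude English suffix stripper standing in for
# hazm's Stemmer/Lemmatizer, to show how normalization shrinks the vocabulary.
SUFFIXES = ("ing", "ed", "s")  # hypothetical rule set for this example


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


tokens = ["works", "worked", "working", "work", "great"]
raw_vocab = set(tokens)
stemmed_vocab = {toy_stem(t) for t in tokens}

print(len(raw_vocab), sorted(raw_vocab))          # five distinct surface forms
print(len(stemmed_vocab), sorted(stemmed_vocab))  # fewer categories after stemming
```

With fewer categories, the counts for one meaning are pooled instead of being split across several spellings.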

In [4]:
%%time
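# NOTE: plain assignment does not copy; both names refer to the same DataFrame,
# so the column update below is also visible through train_table_nopreprocessing.
# An independent snapshot would need train_table.copy().
train_table_nopreprocessing = train_table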
lemmatizer = Lemmatizer()
stemmer = Stemmer()
train_table['comment'] = train_table.apply(lambda row: [lemmatizer.lemmatize(stemmer.stem(word)) for word in row['comment'] ], axis=1)
train_table

Wall time: 1.3 s


Unnamed: 0,comment,recommend
0,"[با, وجود, سابقه, خوب, که, از, برد#بر, ایران, ...",not_recommended
1,"[بسیار, عال, بسیار, عال]",recommended
2,"[من, ال, ۳, هفته, هس, استفاده, میکنم\r, برا, ک...",not_recommended
3,"[عمر, کمه, تا, یه, هفته, ب, نمیشه, استفاده, کر...",not_recommended
4,"[فکر, کنین, کلمن, بخرین, با, ذوق, ., کل, پولشو...",not_recommended
...,...,...
5995,"[خیل, جنس, پارچ, نر, ولطیفه, خیل, جنس, خوبه, ا...",recommended
5996,"[سلا, ., واقعا, فکر, نم, کرد#کن, به, این, راحت...",recommended
5997,"[من, از, دیج, کالا, خرید, خیل, زود, دس, رسید, ...",recommended
5998,"[یا, شرک, نمیدونسته, چا, ماچا, امپریال, چیه, ی...",not_recommended


In [5]:
%%time
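# NOTE: plain assignment does not copy; both names refer to the same DataFrame.
# An independent snapshot would need test_table.copy().
test_table_nopreprocessing = test_table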
lemmatizer = Lemmatizer()
stemmer = Stemmer()
test_table['comment'] = test_table.apply(lambda row: [lemmatizer.lemmatize(stemmer.stem(word)) for word in row['comment'] ], axis=1)
test_table

Wall time: 438 ms


Unnamed: 0,comment,recommend
0,"[تازه, خرید, یه, مد, کار, بکنه, مشخص, میشه, کی...",recommended
1,"[با, این, قیم, گزینه, , بهتر, ه, میشه, گرف, .,...",not_recommended
2,"[خیل, عالیه, ،, فقط, کا, از, اون, سمت, میشد, ب...",recommended
3,"[من, این, فیس, براس, چند, روز, یپ, به, دس, رسی...",not_recommended
4,"[بنده, یه, هارد, اکسترنال, دار, که, کابل, فابر...",recommended
...,...,...
795,"[طراح, قشنگه, ول, داخل, عکس, خیل, بزرگ, ب, چ, ...",not_recommended
796,"[این, لامپ, چین, هستت, کیف, پایین, ., نور, ک, ...",not_recommended
797,"[در, کل, از, این, خرید, راض, هس, و, به, تناسب,...",recommended
798,"[تازع, نصب, کردم-سرع, انتقال, و, نصب, باز, رو,...",recommended


## Making a dictionary out of all the words in the comments 
Here we do two things: first we get rid of stop words, which have nothing to do with the comment class, and then we count how many comments of the recommended class and of the not recommended class each word appears in.
During this process a word repeated within a single comment is counted only once for that comment; the repeats are ignored.
In order to see the results for data with no preprocessing, we also keep a dictionary of our raw data.
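
The counting rule above can be sketched in plain Python, without pandas. This is a minimal illustration on made-up English tokens; `count_document_frequency` and `toy_comments` are hypothetical names, not part of the notebook.

```python
from collections import defaultdict


def count_document_frequency(comments, stop_words=frozenset()):
    """Count, per class, how many comments contain each word at least once."""
    counts = defaultdict(lambda: {"recommend": 0, "not recommend": 0, "total": 0})
    for tokens, label in comments:
        # set() drops repeats, so a word counts once per comment
        for word in set(tokens):
            if len(word) <= 1 or word in stop_words:
                continue
            key = "recommend" if label == "recommended" else "not recommend"
            counts[word][key] += 1
            counts[word]["total"] += 1
    return dict(counts)


# Hypothetical data: (tokens, label) pairs
toy_comments = [
    (["very", "good", "very", "good"], "recommended"),
    (["not", "good", "a"], "not_recommended"),
]
freq = count_document_frequency(toy_comments, stop_words={"very"})
print(freq["good"])
```

Here "good" appears twice in the first comment but adds only one to the recommended count, and the one-character token and the stop word are skipped entirely.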

In [6]:
%%time
comment_words = {}
comment_words_nopreprocessing = {}
def get_words(row, name, dic, eliminate_stopwords = True):
    """Add the distinct words of row[name] to dic, counting each once per comment and per class."""
    seen_words = set()
    if eliminate_stopwords:
        # stop_words = set(["و", "در", "از", "با", "که", ".", "به", "این", "اون", "را", "تا", "\n", "\r", ""])
        stop_words = set(stopwords_list())
    else:
        stop_words = set()
    for word in row[name]:
        if len(word) <= 1 or word in stop_words or word in seen_words:
            continue
        seen_words.add(word)
        if word not in dic:
            dic[word] = {'recommend': 0, 'not recommend': 0, 'total': 0}
        dic[word]['total'] += 1
        if row['recommend'] == 'recommended':
            dic[word]['recommend'] += 1
        else:
            dic[word]['not recommend'] += 1
    return

train_table_nopreprocessing.apply(lambda row: get_words(row, 'comment', comment_words_nopreprocessing, False), axis=1)
comment_words_nopreprocessing = sorted(comment_words_nopreprocessing.items(), key=lambda x: x[1]['total'], reverse=True)
comment_words_nopreprocessing_table = pd.DataFrame(comment_words_nopreprocessing, columns=('word', 'details'), index=[words[0] for words in comment_words_nopreprocessing])

train_table.apply(lambda row: get_words(row, 'comment', comment_words), axis=1)
comment_words = sorted(comment_words.items(), key=lambda x: x[1]['total'], reverse=True)
comment_words_table = pd.DataFrame(comment_words, columns=('word', 'details'), index=[words[0] for words in comment_words])

comment_words_table

Wall time: 9.7 s


Unnamed: 0,word,details
خیل,خیل,"{'recommend': 1105, 'not recommend': 805, 'tot..."
خرید,خرید,"{'recommend': 906, 'not recommend': 870, 'tota..."
کیف,کیف,"{'recommend': 658, 'not recommend': 917, 'tota..."
کرد#کن,کرد#کن,"{'recommend': 767, 'not recommend': 800, 'tota..."
برا,برا,"{'recommend': 757, 'not recommend': 553, 'tota..."
...,...,...
امپریال,امپریال,"{'recommend': 0, 'not recommend': 1, 'total': 1}"
ضدسرطان,ضدسرطان,"{'recommend': 0, 'not recommend': 1, 'total': 1}"
اکسید,اکسید,"{'recommend': 0, 'not recommend': 1, 'total': 1}"
مونوه,مونوه,"{'recommend': 1, 'not recommend': 0, 'total': 1}"


## Calculating the conditional probability for each word
Here we calculate the conditional probability of each word given the class of the comment it belongs to.
The prior probability of each class is 50%, since the train dataset has an equal number of comments in each class. Each word in a comment is treated as a piece of evidence that makes the prediction more accurate, and the probability of a class given that we see the words X1...Xn forms the posterior probability. After calculating this for each class, the class with the higher probability is taken as the class of the comment. The columns computed below store log2 of each probability, so the per-word values can be added instead of multiplied.
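
The decision rule can be sketched as a sum of log2 probabilities per class. The word probabilities below are hypothetical values chosen only for illustration, and `classify` is an assumed helper name rather than the notebook's own function.

```python
from math import log2


def classify(tokens, word_log_probs, log_prior_rec, log_prior_notrec):
    """Sum log2 probabilities per class; words missing from the table are skipped."""
    score_rec, score_notrec = log_prior_rec, log_prior_notrec
    for word in tokens:
        if word in word_log_probs:
            lp_rec, lp_notrec = word_log_probs[word]
            score_rec += lp_rec
            score_notrec += lp_notrec
    return "recommended" if score_rec > score_notrec else "not_recommended"


# Hypothetical (log2 p(x | rec), log2 p(x | not rec)) pairs
word_log_probs = {
    "great": (log2(0.30), log2(0.05)),
    "broken": (log2(0.02), log2(0.25)),
}
prior = log2(0.5)  # equal priors for both classes

first = classify(["great", "unknown"], word_log_probs, prior, prior)
second = classify(["broken", "great", "broken"], word_log_probs, prior, prior)
print(first, second)

# Multiplying many small probabilities directly underflows to zero,
# which is one reason to work with sums of logarithms instead.
underflow = 0.01 ** 200
print(underflow)
```

Because the logarithm is monotonic, comparing the summed logs picks the same class as comparing the products would.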

### Additive smoothing
This method is used for situations where a certain word occurs in one class but not in the other. To keep our model from jumping to the conclusion that having this word means the comment belongs to that class, we add a certain value (1, for example) to the number of occurrences of every word in the dictionary for both classes. This way p(x | class) won't be zero for any word, and so p(class | X) won't be zero either.
Just like in the previous sections, in order to see the effect of smoothing we also keep probabilities computed with no smoothing.
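
A minimal sketch of the smoothed and unsmoothed log-likelihoods follows. The helper `log_likelihood` is hypothetical, and the class size of 3000 is an assumption consistent with an equal split of the training comments.

```python
from math import log2


def log_likelihood(count, class_size, alpha=1):
    """Return log2((count + alpha) / class_size); alpha=0 disables smoothing."""
    if count + alpha == 0:
        return float("-inf")  # log2(0) is undefined, so treat it as -infinity
    return log2((count + alpha) / class_size)


CLASS_SIZE = 3000  # assumed number of training comments per class

smoothed_unseen = log_likelihood(0, CLASS_SIZE)             # word never seen in this class
unsmoothed_unseen = log_likelihood(0, CLASS_SIZE, alpha=0)  # same word, no smoothing
smoothed_once = log_likelihood(1, CLASS_SIZE)               # word seen in one comment
print(smoothed_unseen, unsmoothed_unseen, smoothed_once)
```

Without smoothing, a single unseen word contributes negative infinity and rules the class out no matter what the other words say. Note that this form adds to the numerator only; textbook Laplace smoothing for a present/absent feature also adds 2 * alpha to the denominator.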

In [7]:
%%time
rec_comments = train_table.loc[train_table['recommend'] == 'recommended'].shape[0]
notrec_comments = train_table.loc[train_table['recommend'] == 'not_recommended'].shape[0]
P_REC = rec_comments / train_table.shape[0]
P_NOTREC = 1 - P_REC

comment_words_nopreprocessing_table['p(x | rec)'] = comment_words_nopreprocessing_table.apply(lambda row: log2((row['details']['recommend']+1) / rec_comments), axis=1)
comment_words_nopreprocessing_table['p(x | not rec)'] = comment_words_nopreprocessing_table.apply(lambda row: log2((row['details']['not recommend']+1) / notrec_comments), axis=1)
comment_words_nopreprocessing_table['p(x | rec) no smoothing'] = comment_words_nopreprocessing_table.apply(lambda row: log2((row['details']['recommend']) / rec_comments) if row['details']['recommend'] != 0 else -float('inf'), axis=1)
comment_words_nopreprocessing_table['p(x | not rec) no smoothing'] = comment_words_nopreprocessing_table.apply(lambda row: log2((row['details']['not recommend']) / notrec_comments) if row['details']['not recommend'] != 0 else -float('inf'), axis=1)

comment_words_table['p(x | rec)'] = comment_words_table.apply(lambda row: log2((row['details']['recommend']+1) / rec_comments), axis=1)
comment_words_table['p(x | not rec)'] = comment_words_table.apply(lambda row: log2((row['details']['not recommend']+1) / notrec_comments), axis=1)
comment_words_table['p(x | rec) no smoothing'] = comment_words_table.apply(lambda row: log2((row['details']['recommend']) / rec_comments) if row['details']['recommend'] != 0 else -float('inf'), axis=1)
comment_words_table['p(x | not rec) no smoothing'] = comment_words_table.apply(lambda row: log2((row['details']['not recommend']) / notrec_comments) if row['details']['not recommend'] != 0 else -float('inf'), axis=1)

comment_words_table

Wall time: 2.42 s


Unnamed: 0,word,details,p(x | rec),p(x | not rec),p(x | rec) no smoothing,p(x | not rec) no smoothing
خیل,خیل,"{'recommend': 1105, 'not recommend': 805, 'tot...",-1.439611,-1.896111,-1.440916,-1.897902
خرید,خرید,"{'recommend': 906, 'not recommend': 870, 'tota...",-1.725788,-1.784218,-1.727380,-1.785875
کیف,کیف,"{'recommend': 658, 'not recommend': 917, 'tota...",-2.186612,-1.708396,-2.188803,-1.709969
کرد#کن,کرد#کن,"{'recommend': 767, 'not recommend': 800, 'tota...",-1.965784,-1.905088,-1.967664,-1.906891
برا,برا,"{'recommend': 757, 'not recommend': 553, 'tota...",-1.984693,-2.437005,-1.986597,-2.439611
...,...,...,...,...,...,...
امپریال,امپریال,"{'recommend': 0, 'not recommend': 1, 'total': 1}",-11.550747,-10.550747,-inf,-11.550747
ضدسرطان,ضدسرطان,"{'recommend': 0, 'not recommend': 1, 'total': 1}",-11.550747,-10.550747,-inf,-11.550747
اکسید,اکسید,"{'recommend': 0, 'not recommend': 1, 'total': 1}",-11.550747,-10.550747,-inf,-11.550747
مونوه,مونوه,"{'recommend': 1, 'not recommend': 0, 'total': 1}",-10.550747,-11.550747,-11.550747,-inf


## Getting results for the test dataset
Here we do the final step: we use our dictionary, which contains the conditional probability of each word, to determine which class is more likely for each comment. We get the results for each comment in the test dataset with the four dictionaries generated in the previous sections.
The prediction column is the main result, calculated using both preprocessing and smoothing; the other columns use only one of these methods or neither of them.
After getting the results we calculate some performance percentages to see how well our model performed.
- Higher Precision means fewer comments that belong to the not recommended class were labeled as recommended.
- Higher Recall means the number of comments we correctly labeled as recommended is closer to the actual number of comments in the recommended class.

Consider a model that tends to label comments as recommended less often than it should. It is then less likely to label a not recommended comment as recommended, so Precision goes really high. But the model is not good enough, and Recall stays low because the number of correctly detected recommended comments is much lower than the actual number of recommended comments.

Now consider the opposite case, where the model tends to label comments as recommended more often than it should. The number of correctly detected recommended comments inevitably increases, so Recall gets bigger even though the model is not performing very well. In this case Precision reveals the deficiency, since the number of wrongly detected recommended comments increases and this lowers Precision.

The F1 parameter is the harmonic mean of Recall and Precision. Its final value is (2 * Correctly detected recommended) / (All detected recommended + Total recommended). A high F1 means the number of correctly detected recommended comments is close to both the number of all detected recommended comments and the total number of recommended comments. This means neither of the above cases happened, so F1 is a good way to evaluate the performance of the model.
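
The two ways of writing F1 can be checked against each other with a short sketch. The counts below are hypothetical, and `precision_recall_f1` is an assumed helper name.

```python
def precision_recall_f1(true_pos, detected_pos, actual_pos):
    """Return precision, recall, and F1 computed in two equivalent ways."""
    precision = true_pos / detected_pos   # correct among all detected as recommended
    recall = true_pos / actual_pos        # correct among all truly recommended
    f1_harmonic = 2 * precision * recall / (precision + recall)
    f1_counts = 2 * true_pos / (detected_pos + actual_pos)
    return precision, recall, f1_harmonic, f1_counts


# Hypothetical counts: 90 correct out of 100 detected, with 120 truly recommended
p, r, f1_h, f1_c = precision_recall_f1(true_pos=90, detected_pos=100, actual_pos=120)
print(p, r, f1_h, f1_c)
```

Both F1 expressions agree, and the value lies between Precision and Recall but closer to the lower of the two, which is why a model cannot score well on F1 by inflating only one of them.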

In [8]:
%%time
def evaluate(row, comment_words_table, rec_probs, notrec_probs):
    p_rec = log2(P_REC)
    p_notrec = log2(P_NOTREC)
    for word in row['comment']:
        if word in comment_words_table.index:
            detail = comment_words_table.loc[word]
            p_rec += detail[rec_probs]
            p_notrec += detail[notrec_probs]
    if p_rec > p_notrec:
        return 'recommended'
    else:
        return 'not_recommended'

test_table['prediction'] = test_table.apply(lambda row: evaluate(row, comment_words_table, 'p(x | rec)', 'p(x | not rec)'), axis=1)
test_table['no preprocessing'] = test_table.apply(lambda row: evaluate(row, comment_words_nopreprocessing_table, 'p(x | rec)', 'p(x | not rec)'), axis=1)
test_table['no smoothing'] = test_table.apply(lambda row: evaluate(row, comment_words_table, 'p(x | rec) no smoothing', 'p(x | not rec) no smoothing'), axis=1)
test_table['no preprocessing or smoothing'] = test_table.apply(lambda row: evaluate(row, comment_words_nopreprocessing_table, 'p(x | rec) no smoothing', 'p(x | not rec) no smoothing'), axis=1)

test_table

Wall time: 17 s


Unnamed: 0,comment,recommend,prediction,no preprocessing,no smoothing,no preprocessing or smoothing
0,"[تازه, خرید, یه, مد, کار, بکنه, مشخص, میشه, کی...",recommended,recommended,recommended,recommended,recommended
1,"[با, این, قیم, گزینه, , بهتر, ه, میشه, گرف, .,...",not_recommended,recommended,recommended,recommended,recommended
2,"[خیل, عالیه, ،, فقط, کا, از, اون, سمت, میشد, ب...",recommended,recommended,recommended,recommended,recommended
3,"[من, این, فیس, براس, چند, روز, یپ, به, دس, رسی...",not_recommended,not_recommended,not_recommended,not_recommended,not_recommended
4,"[بنده, یه, هارد, اکسترنال, دار, که, کابل, فابر...",recommended,recommended,recommended,recommended,recommended
...,...,...,...,...,...,...
795,"[طراح, قشنگه, ول, داخل, عکس, خیل, بزرگ, ب, چ, ...",not_recommended,not_recommended,not_recommended,not_recommended,not_recommended
796,"[این, لامپ, چین, هستت, کیف, پایین, ., نور, ک, ...",not_recommended,not_recommended,not_recommended,not_recommended,not_recommended
797,"[در, کل, از, این, خرید, راض, هس, و, به, تناسب,...",recommended,recommended,recommended,recommended,recommended
798,"[تازع, نصب, کردم-سرع, انتقال, و, نصب, باز, رو,...",recommended,recommended,recommended,recommended,recommended


In [9]:
def get_results(test_table, result_colname):
    accuracy = 0
    correct_detected_recommended = 0
    all_detected_recommended = 0
    total_recommended = test_table.loc[test_table['recommend'] == 'recommended'].shape[0]
    for indx, row in test_table.iterrows():
        if row['recommend'] == row[result_colname]:
            accuracy += 1
        if row[result_colname] == 'recommended':
            all_detected_recommended += 1
            if row['recommend'] == "recommended":
                correct_detected_recommended += 1

    accuracy = accuracy / test_table.shape[0] * 100
    precision = correct_detected_recommended / all_detected_recommended * 100
    recall = correct_detected_recommended / total_recommended * 100
    F1 = 2 * (precision * recall) / (precision + recall)
    print("\tAccuracy: ", accuracy, " %")
    print("\tPrecision: ", precision, " %")
    print("\tRecall: ", recall, " %")
    print("\tF1: ", F1)

print("Results with preprocessing and additive smoothing:")
get_results(test_table, 'prediction')
print()
print("Results with preprocessing only:")
get_results(test_table, 'no smoothing')
print()
print("Results with additive smoothing only:")
get_results(test_table, 'no preprocessing')
print()
print("Results with no preprocessing or additive smoothing:")
get_results(test_table, 'no preprocessing or smoothing')

Results with preprocessing and additive smoothing:
	Accuracy:  93.625  %
	Precision:  90.86651053864169  %
	Recall:  97.0  %
	F1:  93.83313180169287

Results with preprocessing only:
	Accuracy:  89.5  %
	Precision:  89.30348258706468  %
	Recall:  89.75  %
	F1:  89.52618453865337

Results with additive smoothing only:
	Accuracy:  91.625  %
	Precision:  87.24832214765101  %
	Recall:  97.5  %
	F1:  92.08972845336481

Results with no preprocessing or additive smoothing:
	Accuracy:  89.0  %
	Precision:  88.04878048780488  %
	Recall:  90.25  %
	F1:  89.1358024691358


## Conclusion
The numbers above show that smoothing has more effect on the results than preprocessing: dropping smoothing lowers accuracy from 93.6% to 89.5%, while dropping preprocessing lowers it only to 91.6%. Preprocessing simplifies the data by eliminating words that aren't directly related to the class of a comment and by normalizing words, for example by gathering words with the same meaning and origin into the same category. This process has an effect on the results, but it is not that significant. Smoothing, on the other hand, eliminates the possibility that our model decides based on just one or two words in the comment that showed up in only one of the classes.

In [14]:
test_table.to_csv("final results.csv", encoding='utf-8-sig')
wrongs_talbe = test_table.loc[test_table['prediction'] != test_table['recommend']]
wrongs_talbe

Unnamed: 0,comment,recommend,prediction,no preprocessing,no smoothing,no preprocessing or smoothing
1,"[با, این, قیم, گزینه, , بهتر, ه, میشه, گرف, .,...",not_recommended,recommended,recommended,recommended,recommended
8,"[سلا, ،, راح, شد#شو, از, کابل, شارژ, ،, توصیه,...",recommended,not_recommended,not_recommended,not_recommended,not_recommended
69,"[من, خود, جزو, افراد, بود#باش, که, نزدیک, سیزد...",not_recommended,recommended,recommended,recommended,recommended
83,"[سلا, دوس, بعد, از, استفاده, چراغ, چک, تویوتا,...",recommended,not_recommended,not_recommended,not_recommended,not_recommended
102,"[ایراد, دستگاه, ایراد, دستگاه]",not_recommended,recommended,recommended,recommended,recommended
119,"[برد, بالا, نداره, کیف, صدا, معمولیه, برا, تا,...",not_recommended,recommended,recommended,recommended,recommended
138,"[روتخت, نسب, به, بقیه, روتختیا, کوچیکتره, جور,...",recommended,not_recommended,not_recommended,not_recommended,not_recommended
142,"[والله, من, اینو, فقط, بخاطر, برد#بر, خرید, .,...",not_recommended,recommended,recommended,not_recommended,not_recommended
157,"[با, قابل, تعویض, و, فقط, به, درد, اصلاه, با, ...",not_recommended,recommended,recommended,not_recommended,not_recommended
166,"[من, برا, هدیه, خرید, راست, اینقدر, که, هزینه,...",not_recommended,recommended,recommended,recommended,recommended


## Five comments that weren't labeled correctly

In [17]:
print(wrongs_talbe.iloc[2]['comment'])
wrongs_talbe.iloc[2]

['من', 'خود', 'جزو', 'افراد', 'بود#باش', 'که', 'نزدیک', 'سیزده', 'ساله', 'از', 'انواع', 'فیل', 'سرک', 'اع', 'از', 'روغن', '،', 'هوا', 'و', 'اتاق', 'استفاده', 'میکرد', 'ول', 'به', 'تازگ', 'متوجه', 'و', 'اطلاع', 'یاف', 'که', 'فیل', 'گاج', 'باکیف', '', 'از', 'فیل', 'سرک', 'بود#باش', 'و', 'ه', 'چنین', 'قیم', 'بمراتب', 'مناسب', 'ترب', 'نسب', 'به', 'سرک', 'داشت#دار', 'و', 'طرف', 'فروشنده', 'که', 'دا', 'روغن', 'فیل', 'را', 'به', 'من', 'می\u200cفروخ', '.', 'واقعا', 'به', 'اثب', 'کرد#کن', 'که', 'گاج', 'باکیف', '', 'از', 'سرک', 'بود#باش', '.', 'بررس', 'فیل', 'سرک']


comment                          [من, خود, جزو, افراد, بود#باش, که, نزدیک, سیزد...
recommend                                                          not_recommended
prediction                                                             recommended
no preprocessing                                                       recommended
no smoothing                                                           recommended
no preprocessing or smoothing                                          recommended
Name: 69, dtype: object

In [18]:
print(wrongs_talbe.iloc[3]['comment'])
wrongs_talbe.iloc[3]

['سلا', 'دوس', 'بعد', 'از', 'استفاده', 'چراغ', 'چک', 'تویوتا', 'کمر', '۲۰۰۷', 'خامو', 'شد#شو', 'چراغ', 'چک', 'موتور', 'خامو', 'شد#شو']


comment                          [سلا, دوس, بعد, از, استفاده, چراغ, چک, تویوتا,...
recommend                                                              recommended
prediction                                                         not_recommended
no preprocessing                                                   not_recommended
no smoothing                                                       not_recommended
no preprocessing or smoothing                                      not_recommended
Name: 83, dtype: object

In [20]:
print(wrongs_talbe.iloc[5]['comment'])
wrongs_talbe.iloc[5]

['برد', 'بالا', 'نداره', 'کیف', 'صدا', 'معمولیه', 'برا', 'تا', 'تمرین', 'و', 'ورز', 'مناسبه', 'از', 'رو', 'گو', 'نمیفته', 'ظریف', 'نیس', 'م', 'نمونه', '', 'ه', 'رده', 'خود', 'خیل', 'خوب', 'نیس', 'پیشنهاد', 'نمیکن']


comment                          [برد, بالا, نداره, کیف, صدا, معمولیه, برا, تا,...
recommend                                                          not_recommended
prediction                                                             recommended
no preprocessing                                                       recommended
no smoothing                                                           recommended
no preprocessing or smoothing                                          recommended
Name: 119, dtype: object

In [21]:
print(wrongs_talbe.iloc[6]['comment'])
wrongs_talbe.iloc[6]

['روتخت', 'نسب', 'به', 'بقیه', 'روتختیا', 'کوچیکتره', 'جور', 'که', 'از', 'تخ', 'آویز', 'نمیشه', 'لب', 'به', 'لب', 'تخ', 'اندازس', 'ایکا', 'اون', 'عکس', 'که', 'میزارن', 'با', 'اون', 'که', 'میفرستن', 'یک', 'بود#باش', 'روک', 'بالشتا', 'اصلا', 'شبیه', 'عکس', 'بود#باش', 'صورت', 'ساده', 'بود#باش']


comment                          [روتخت, نسب, به, بقیه, روتختیا, کوچیکتره, جور,...
recommend                                                              recommended
prediction                                                         not_recommended
no preprocessing                                                   not_recommended
no smoothing                                                       not_recommended
no preprocessing or smoothing                                      not_recommended
Name: 138, dtype: object

In [26]:
print(wrongs_talbe.iloc[11]['comment'])
wrongs_talbe.iloc[11]

['قیم', 'بالا', 'نسب', 'به', 'موارد', 'دیگه', 'با', 'این', 'خصوص', '.', 'ب', 'کیف']


comment                          [قیم, بالا, نسب, به, موارد, دیگه, با, این, خصو...
recommend                                                          not_recommended
prediction                                                             recommended
no preprocessing                                                       recommended
no smoothing                                                           recommended
no preprocessing or smoothing                                          recommended
Name: 185, dtype: object

This method is not perfect. Considering each word without paying attention to its context can lead to a wrong interpretation of the comment. For example, the words خوب and بد by themselves have positive and negative meanings, but a comment like خوب نیست is negative and بد نیست is positive.
Also, some kinds of preprocessing make words lose their true meaning; for example, verbs lose their negative meaning when converted to their base form. So preprocessing doesn't always improve the performance.

The solution might be to consider the role of each word in a sentence and apply preprocessing based on that role, or to consider each adjective together with its verb to see the true meaning of the sentence.
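
One simple way to pair a word with its neighbour is to add bigram features next to the single words. This is only a sketch of one possible approach, not something the notebook implements; `with_bigrams` is a hypothetical helper.

```python
def with_bigrams(tokens):
    """Return the original tokens followed by adjacent pairs joined with '_'."""
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return list(tokens) + bigrams


negative = with_bigrams(["خوب", "نیست"])
positive = with_bigrams(["بد", "نیست"])
print(negative)
print(positive)
```

With such features, the pair for خوب نیست gets its own counts per class, so it can point toward the not recommended class even though خوب alone points the other way. The cost is a much larger and sparser dictionary, so smoothing matters even more.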