### **Naive Bayes Classifier Task**
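### Predicting the emotion expressed in a sentence
##### Multiclass Classification
- As a remote (non-face-to-face) counselor, we collected emotion data on patients who sent messages.
- Each message is labeled with an emotion.
- The goal is to build a model that helps determine which psychological treatment may suit patients who send similar messages in the future.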

In [1]:
import pandas as pd

feeling_df = pd.read_csv('./datasets/feeling.csv', sep=";")
feeling_df

,message,feeling
0,im feeling quite sad and sorry for myself but ...,sadness
1,i feel like i am still looking at a blank canv...,sadness
2,i feel like a faithful servant,love
3,i am just feeling cranky and blue,anger
4,i can have for a treat or if i am feeling festive,joy
...,...,...
17995,i just had a very brief time in the beanbag an...,sadness
17996,i am now turning and i feel pathetic that i am...,sadness
17997,i feel strong and good overall,joy
17998,i feel like this was such a rude comment and i...,anger


In [2]:
feeling_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18000 entries, 0 to 17999
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   message  18000 non-null  object
 1   feeling  18000 non-null  object
dtypes: object(2)
memory usage: 281.4+ KB


In [3]:
feeling_df.isna().sum()

message    0
feeling    0
dtype: int64

In [4]:
# Check the distribution of feeling values
feeling_df['feeling'].value_counts()

feeling
joy         6066
sadness     5216
anger       2434
fear        2149
love        1482
surprise     653
Name: count, dtype: int64

In [5]:
# Copy the original data
feel_df = feeling_df.copy()

In [6]:
# Encode the feeling labels as integer targets
from sklearn.preprocessing import LabelEncoder

feel_encoder = LabelEncoder()
targets = feel_encoder.fit_transform(feel_df.loc[:, 'feeling'])
feel_df['Target'] = targets

In [7]:
# Look up the original feeling label for an encoded target value
# feel_mapping = dict(zip(feel_encoder.classes_, feel_encoder.transform(feel_encoder.classes_)))
# inverse_feel_mapping = {v: k for k, v in feel_mapping.items()}
# original_value_of_4 = inverse_feel_mapping[4]
# print(original_value_of_4)

- 0, anger
- 1, fear
- 2, joy
- 3, love
- 4, sadness
- 5, surprise
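
The integer codes come from `LabelEncoder`, which sorts the class names alphabetically. The sketch below uses toy labels (not the real dataset) to show one way to read the mapping from the encoder instead of typing it by hand. The `labels` list and the `code_to_label` name are illustrative assumptions.

```python
from sklearn.preprocessing import LabelEncoder

# Toy labels standing in for the 'feeling' column (illustrative only)
labels = ['sadness', 'love', 'anger', 'joy', 'fear', 'surprise', 'joy']

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

# classes_ is sorted, and the position of each class is its integer code
code_to_label = dict(enumerate(encoder.classes_))
print(code_to_label)

# inverse_transform maps integer codes back to the original strings
print(encoder.inverse_transform([4, 5]))
```

In the notebook itself, the same calls could be made on `feel_encoder`, which was already fitted in cell 6.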

In [8]:
# Drop the original feeling column
feel_df = feel_df.drop(labels=['feeling'], axis=1)
feel_df

,message,Target
0,im feeling quite sad and sorry for myself but ...,4
1,i feel like i am still looking at a blank canv...,4
2,i feel like a faithful servant,3
3,i am just feeling cranky and blue,0
4,i can have for a treat or if i am feeling festive,2
...,...,...
17995,i just had a very brief time in the beanbag an...,4
17996,i am now turning and i feel pathetic that i am...,4
17997,i feel strong and good overall,2
17998,i feel like this was such a rude comment and i...,0


In [9]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = \
train_test_split(feel_df.message,
                feel_df.Target,
                stratify=feel_df.Target,
                test_size=0.2,
                random_state=124)
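
Because the classes are imbalanced, `stratify` keeps each class's share roughly the same in the train and test sets. The sketch below checks this on a hypothetical 80/20 toy label list, not the feeling data.

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Toy imbalanced labels (hypothetical): 80 of class 0, 20 of class 1
y = [0] * 80 + [1] * 20
X = [f"message {i}" for i in range(len(y))]

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=124
)

# Count how many samples of each class landed in each split
train_counts = Counter(y_tr)
test_counts = Counter(y_te)
print(train_counts, test_counts)
```

Both splits keep the 4:1 ratio of the toy labels. Without `stratify`, a rare class could end up under-represented in the test set.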

In [10]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

m_nb_pipe = Pipeline([('count_vectorizer', CountVectorizer()), ('multinomial_NB', MultinomialNB())])
m_nb_pipe.fit(X_train, y_train)

In [11]:
m_nb_pipe.score(X_test, y_test)

0.7536111111111111
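
A single accuracy figure can hide weak performance on the rarer classes. As a sketch, per-class results can be inspected with a confusion matrix and a classification report. The `y_true` and `y_pred` values below are made up for illustration and are not results from this model.

```python
from sklearn.metrics import confusion_matrix, classification_report

# Hypothetical true and predicted codes, for illustration only
y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 2, 1, 2, 2, 2, 2, 2]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
print(cm)

# Precision, recall and F1 for each class
print(classification_report(y_true, y_pred, zero_division=0))
```

In the notebook, the same two calls could take `y_test` and `m_nb_pipe.predict(X_test)` as arguments.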

In [12]:
feel_df.iloc[1].message

'i feel like i am still looking at a blank canvas blank pieces of paper'

In [14]:
prediction = m_nb_pipe.predict(X_test)
result = m_nb_pipe.predict([feel_df.iloc[1].message])
print(result)

[4]
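
`predict` returns only the most likely class. To see how confident the model is, `predict_proba` returns one probability per class. The sketch below uses a tiny hypothetical training set with string labels, not the feeling dataset, so its numbers say nothing about the real model.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy training data (hypothetical), not the feeling dataset
texts = ["i feel sad and blue", "i feel so sad today",
         "i feel happy and strong", "i feel good and happy"]
labels = ["sadness", "sadness", "joy", "joy"]

pipe = Pipeline([("count_vectorizer", CountVectorizer()),
                 ("multinomial_NB", MultinomialNB())])
pipe.fit(texts, labels)

# One probability per class, in the order given by pipe.classes_
proba = pipe.predict_proba(["i feel sad"])[0]
for label, p in zip(pipe.classes_, proba):
    print(label, round(p, 3))
```

Because this toy pipeline is trained on string labels directly, `predict` returns the label names and no decoding step is needed.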


In [15]:
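def map_number_to_feeling(number):
    # LabelEncoder assigns codes in alphabetical order of the six labels
    mapping = {
        0: 'anger',
        1: 'fear',
        2: 'joy',
        3: 'love',
        4: 'sadness',
        5: 'surprise'
    }
    return mapping[int(number)]

# Convert the predicted code back to its original string label.
# predict() returns a NumPy array, which cannot be used as a dict key,
# so take its first element before the lookup.
predicted_feeling = map_number_to_feeling(result[0])
print(predicted_feeling)

sadness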