- Developed a machine learning model that classifies the sentiment (positive, negative, or neutral) of a news comment written in Bangla.
- Used a publicly available dataset of 12k news comments for the implementation.
- Built the system on TF-IDF feature extraction with n-gram features.
- Analysed the performance of different machine learning algorithms on each n-gram feature using accuracy, precision, recall, and F1-score.
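
As a rough illustration of TF-IDF extraction over n-grams, here is a minimal sketch using scikit-learn's `TfidfVectorizer`. The toy comments, the helper name `build_vectorizer`, the choice of exact-length n-grams (`ngram_range=(n, n)`), and the whitespace token pattern are all assumptions made for this example and may differ from the project's actual settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy comments for illustration only; they are not taken from the dataset.
comments = [
    "খবরটা খুব ভালো",
    "খবরটা খুব খারাপ",
    "খবরটা ঠিক আছে",
]


def build_vectorizer(n):
    """Hypothetical helper: TF-IDF over n-grams of exactly length n."""
    # Assumption: split on whitespace. The default token_pattern relies on \w,
    # which does not match Bangla vowel signs and would break words apart.
    return TfidfVectorizer(ngram_range=(n, n), token_pattern=r"\S+")


for n in (1, 2, 3):
    vectorizer = build_vectorizer(n)
    features = vectorizer.fit_transform(comments)
    # features is a sparse matrix: one row per comment, one column per n-gram.
    print(n, features.shape, sorted(vectorizer.vocabulary_)[:3])
```

Passing `ngram_range=(1, n)` instead would combine all n-gram lengths up to `n` in a single feature set.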
The dataset consists of 12k news comments in five sentiment categories. For ease of implementation, these five categories were converted into three (positive, negative, and neutral).
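
The conversion from five categories to three could be sketched as a simple lookup table. The five label names below are hypothetical placeholders, since the dataset's actual label names and the exact mapping are not stated here; `FIVE_TO_THREE` and `collapse_labels` are likewise illustrative names.

```python
# Hypothetical five-way label names; the dataset's real labels may differ.
FIVE_TO_THREE = {
    "strongly positive": "positive",
    "positive": "positive",
    "neutral": "neutral",
    "negative": "negative",
    "strongly negative": "negative",
}


def collapse_labels(labels):
    """Map each five-way label to its three-way class.

    An unknown label raises KeyError, so unexpected values are not
    silently dropped.
    """
    return [FIVE_TO_THREE[label] for label in labels]


print(collapse_labels(["strongly positive", "neutral", "strongly negative"]))
```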
Dataset summary (total number of words and unique words in each class):
Different machine learning classifiers were trained and evaluated to assess the system's efficacy. The experiments were run for each n-gram feature, and performance was measured using accuracy, precision, recall, and F1-score.
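
One way such a comparison might be set up with scikit-learn is sketched below. Only Multinomial Naive Bayes is named in this README; `LogisticRegression`, the weighted averaging of precision, recall, and F1, the toy data, and the helper name `evaluate` are assumptions for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy data for illustration only; not taken from the dataset.
train_x = [
    "খবরটা খুব ভালো", "কাজটা খুব ভালো",
    "খবরটা খুব খারাপ", "কাজটা খুব খারাপ",
    "খবরটা ঠিক আছে", "কাজটা ঠিক আছে",
]
train_y = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
test_x = ["এটা খুব ভালো", "এটা খুব খারাপ", "এটা ঠিক আছে"]
test_y = ["positive", "negative", "neutral"]


def evaluate(train_x, train_y, test_x, test_y, n):
    """Hypothetical helper: fit each classifier on n-gram TF-IDF features."""
    classifiers = {
        "MultinomialNB": MultinomialNB(),
        "LogisticRegression": LogisticRegression(max_iter=1000),  # assumed choice
    }
    results = {}
    for name, classifier in classifiers.items():
        # The vectorizer is fitted on the training split only, inside the pipeline.
        model = make_pipeline(
            TfidfVectorizer(ngram_range=(n, n), token_pattern=r"\S+"),
            classifier,
        )
        model.fit(train_x, train_y)
        predicted = model.predict(test_x)
        # Assumption: weighted averaging across the three classes.
        precision, recall, f1, _ = precision_recall_fscore_support(
            test_y, predicted, average="weighted", zero_division=0
        )
        results[name] = {
            "accuracy": float(accuracy_score(test_y, predicted)),
            "precision": float(precision),
            "recall": float(recall),
            "f1": float(f1),
        }
    return results


for n in (1, 2, 3):
    print(n, evaluate(train_x, train_y, test_x, test_y, n))
```

Scores on a toy set this small are not meaningful; the sketch only shows the shape of the experiment.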
Performance on Unigram feature:
Performance on Bigram feature:
Performance on Trigram feature:
From the above analysis, it is observed that with the trigram feature, Multinomial Naive Bayes performs well across all evaluation metrics.
Accuracy and F1-score Plot:
- Python Version: 3.7
- Packages: scikit-learn, NumPy, pandas, Matplotlib, seaborn