
A state-of-the-art Transformer-based sentiment model for Vietnamese; other languages are also supported.


minhquan6203/Vietnamese-sentiment-analysis


Vietnamese sentiment analysis is an important and widely studied task in natural language processing (NLP) and artificial intelligence (AI). In this project, we select three machine learning and deep learning models (SVM, CNN, and Transformer) to solve the problem of Vietnamese sentiment classification, and we use statistical testing to compare their performance. We trained these models on three datasets: UIT-VSFC, UIT-VSMEC, and UIT-ViHSD. For each dataset, we applied the models both to the original data and to a word-segmented version of that data produced with the py-vncorenlp tool. For each dataset, we also used four feature extraction methods (Count Vectorizer, TF-IDF, and two PhoBERT architectures) to satisfy the replication principle of experimental design. Based on the statistical analysis, we conclude that the Transformer model achieves the best accuracy of the three models.
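To illustrate the kind of comparison described above, here is a minimal sketch of a paired statistical test between two models. This README does not state which test the project uses, so the exact sign-flip permutation test below is an assumption chosen for illustration. The function name `paired_permutation_test` and the placeholder scores are hypothetical and are not results from this project. Each pair stands for one experimental setting (for example, one combination of dataset, segmentation, and feature extraction method) evaluated with both models.

```python
from itertools import product


def paired_permutation_test(scores_a, scores_b):
    """Two-sided exact sign-flip permutation test on paired scores.

    Returns the p-value for the null hypothesis that both models
    perform equally well on average across the paired settings.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired, one per setting")
    if not scores_a:
        raise ValueError("at least one pair of scores is required")

    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)

    # Under the null hypothesis each difference is equally likely to be
    # positive or negative, so we enumerate all 2**n sign assignments.
    # This is only practical for a small number of pairs.
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=n):
        stat = abs(sum(s * d for s, d in zip(signs, diffs)) / n)
        if stat >= observed - 1e-12:  # tolerance for float rounding
            extreme += 1
        total += 1
    return extreme / total


if __name__ == "__main__":
    # Placeholder accuracies, made up for illustration only.
    placeholder_a = [0.90, 0.88, 0.91, 0.87, 0.89, 0.92]
    placeholder_b = [0.85, 0.86, 0.88, 0.84, 0.87, 0.90]
    p_value = paired_permutation_test(placeholder_a, placeholder_b)
    print(f"p-value: {p_value:.5f}")
```

A small p-value suggests that the difference between the two models is unlikely to come from chance alone. With many settings, a Monte Carlo approximation or a library routine would be more practical than full enumeration.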

Datasets:

UIT-VSFC (version 1.0) - Vietnamese Students’ Feedback Corpus (https://nlp.uit.edu.vn/datasets#h.p_4Brw8L-cbfTe)

UIT-VSMEC (version 1.0) - Vietnamese Social Media Emotion Corpus (https://nlp.uit.edu.vn/datasets#h.p_FxJKMfavctsJ)

UIT-ViHSD - Vietnamese Hate Speech Detection Dataset (https://nlp.uit.edu.vn/datasets#h.fs21gpd5w6p1)
