A Comparative Analysis of Noise Reduction Methods in Sentiment Analysis on Noisy Bangla Texts

This repository contains the code and datasets used in the paper "A Comparative Analysis of Noise Reduction Methods in Sentiment Analysis on Noisy Bangla Texts", accepted at the 9th Workshop on Noisy and User-generated Text (W-NUT), co-located with EACL 2024.

Paper Link: https://www.arxiv.org/abs/2401.14360

Models

The code used to train and test the models is available in this repository in the folder NC-SentNoB Codes.

Datasets

The NC-SentNoB dataset is available in this repository in the folder NC-SentNoB Dataset.
It is also available on: [HuggingFace] | [Paperswithcode] | [Kaggle]

The SentNoB dataset with back-translation applied is also available in the folder Back-Translated Data. The 1000 ground truths used to evaluate the denoising methods are available in the file 1000 Ground Truths.xlsx.

Training & Evaluation

We used seven pretrained transformer models for sentiment analysis, and SVM, BiLSTM, and BanglaBERT for noise identification.

Benchmarks

Noise Identification

Model	Precision	Recall	F1-Score
SVM (C)	0.76	0.45	0.57
SVM (W)	0.64	0.38	0.48
SVM (C+W)	0.75	0.45	0.56
Bi-LSTM	0.36	0.18	0.24
Bangla-BERT-base	0.73	0.54	0.62
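
As a quick sanity check on the table above, the following sketch recomputes each F1-Score as the harmonic mean of the reported precision and recall. This README does not state the averaging scheme, so the check assumes micro-averaged (or single-class) scores, where that identity holds; a macro-averaged F1 would not need to match. The function names and the 0.01 tolerance are illustrative choices, not part of the repository code.

```python
# Reported (precision, recall, F1) values copied from the Noise Identification table.
reported = {
    "SVM (C)": (0.76, 0.45, 0.57),
    "SVM (W)": (0.64, 0.38, 0.48),
    "SVM (C+W)": (0.75, 0.45, 0.56),
    "Bi-LSTM": (0.36, 0.18, 0.24),
    "Bangla-BERT-base": (0.73, 0.54, 0.62),
}


def harmonic_f1(precision, recall):
    """Return F1 as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def check_rows(rows, tolerance=0.01):
    """Map each model to (recomputed F1, whether it is within tolerance of the reported F1).

    The tolerance is an assumption that allows for the two-decimal rounding
    of the reported precision and recall.
    """
    results = {}
    for model, (precision, recall, f1) in rows.items():
        recomputed = harmonic_f1(precision, recall)
        results[model] = (recomputed, abs(recomputed - f1) <= tolerance)
    return results


if __name__ == "__main__":
    for model, (recomputed, consistent) in check_rows(reported).items():
        print(f"{model}: recomputed F1 = {recomputed:.4f}, consistent = {consistent}")
```

Under that assumption, every row is consistent with the reported F1-Score to within rounding.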

Sentiment Analysis on Noisy Text

Model	Precision	Recall	F1-Score
Bangla-BERT-Base	0.72	0.72	0.72
BanglaBERT	0.75	0.75	0.75
BanglaBERT Large	0.74	0.74	0.74
BanglaBERT Generator	0.72	0.72	0.72
sahajBERT	0.72	0.72	0.72
Bangla-Electra	0.68	0.68	0.68
MuRIL	0.73	0.73	0.73

Sentiment Analysis after Noise Reduction

Model	Precision	Recall	F1-Score
Bangla-BERT-Base	0.69	0.69	0.69
BanglaBERT	0.72	0.72	0.72
BanglaBERT Large	0.73	0.73	0.73
BanglaBERT Generator	0.70	0.70	0.70
sahajBERT	0.70	0.70	0.70
Bangla-Electra	0.66	0.66	0.66
MuRIL	0.71	0.71	0.71
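
To make the comparison between the two sentiment analysis tables explicit, the sketch below computes the per-model change in F1-Score after noise reduction. The values are copied from the tables above; the function name `f1_change` is illustrative, and this README does not say which noise reduction method produced the second table, so the numbers should be read only as "before" versus "after".

```python
# F1-Scores copied from "Sentiment Analysis on Noisy Text".
noisy_f1 = {
    "Bangla-BERT-Base": 0.72,
    "BanglaBERT": 0.75,
    "BanglaBERT Large": 0.74,
    "BanglaBERT Generator": 0.72,
    "sahajBERT": 0.72,
    "Bangla-Electra": 0.68,
    "MuRIL": 0.73,
}

# F1-Scores copied from "Sentiment Analysis after Noise Reduction".
denoised_f1 = {
    "Bangla-BERT-Base": 0.69,
    "BanglaBERT": 0.72,
    "BanglaBERT Large": 0.73,
    "BanglaBERT Generator": 0.70,
    "sahajBERT": 0.70,
    "Bangla-Electra": 0.66,
    "MuRIL": 0.71,
}


def f1_change(before, after):
    """Return the per-model F1 difference (after - before), rounded to two decimals."""
    return {model: round(after[model] - before[model], 2) for model in before}


if __name__ == "__main__":
    for model, delta in f1_change(noisy_f1, denoised_f1).items():
        print(f"{model}: {delta:+.2f}")
```

In these tables every model scores lower after noise reduction, with drops between 0.01 and 0.03 in F1-Score.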

Future Research Directions

Develop robust noise reduction models for Bangla texts.
Investigate and develop noise-specific reduction techniques.
Leverage LLMs for noise reduction.
Compare performance of Sentiment Analysis using LLMs in both settings: with and without noise.
Implement character-level NMT models for back-translation to reduce noise.

License

Contents of this repository are restricted to non-commercial research purposes only under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).

Citation

If you use any of the datasets, models or code modules, please cite the following paper:

@misc{elahi2024comparative,
      title={A Comparative Analysis of Noise Reduction Methods in Sentiment Analysis on Noisy Bangla Texts}, 
      author={Kazi Toufique Elahi and Tasnuva Binte Rahman and Shakil Shahriar and Samir Sarker and Md. Tanvir Rouf Shawon and G. M. Shahariar},
      year={2024},
      eprint={2401.14360},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
