Ben-Sarc

This repository contains the dataset of the paper titled "Ben-Sarc: A Self-Annotated Corpus for Sarcasm Detection from Bengali Social Media Comments and Its Baseline Evaluation" published in Natural Language Processing journal (formerly known as Natural Language Engineering) by Cambridge University Press. The Ben-Sarc is also available at HuggingFace.

Dataset Details

We are releasing a large-scale self-annotated Bengali corpus for sarcasm detection research problem in the Bengali language named Ben-Sarc containing 25,636 comments, manually collected from different public Facebook pages and evaluated by external evaluators.

Data Instances

The Ben-Sarc dataset is in .xlsx format. One example from the Ben-Sarc dataset is given below:

|----|------------------------------------------------------------------------------------------------------------|----------|
| id |                                           Text                                                             | Polarity |
|----|------------------------------------------------------------------------------------------------------------|----------|
| 589|তোমারে ভাবিয়া সারারাত জাগিয়া ঘুম মোর হয়েছে নষ্ট বুকের বামপাশে চিনচিন ব্যাথা করে একি গ্যাস্ট্রিক না প্রেম হচ্ছে না স্পষ্ট ।  |    1     |
|----|------------------------------------------------------------------------------------------------------------|----------|

Data Fields

id: A string representing the text ID.
Text: A string containing the text.
Polarity : A number containing the polarity of the text

Polarity of the Ben-Sarc is defined as follows:

            `0` indicates *Non-Sarcastic Text* 
            
            `1` indicates *Sarcastic Text*

Authors

Sanzana Karim Lora
G. M. Shahariar
Tamanna Nazmin
Noor Nafeur Rahman
Rafsan Rahman
Miyad Bhuiyan
Faisal Muhammad Shah

License

Contents of this repository are restricted to only non-commercial research purposes under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).

Citation

If you use our dataset, please cite the following paper:

@article{Lora_Shahariar_Nazmin_Rahman_Rahman_Bhuiyan_Shah_2024,
  title={Ben-Sarc: A self-annotated corpus for sarcasm detection from Bengali social media comments and its baseline evaluation},
  DOI={10.1017/nlp.2024.11},
  journal={Natural Language Processing},
  author={Lora, Sanzana Karim and Shahariar, G. M. and Nazmin, Tamanna and Rahman, Noor Nafeur and Rahman, Rafsan and Bhuiyan, Miyad and Shah, Faisal Muhammad},
  year={2024},
  pages={1–26}}

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
Ben-Sarc_ Bengali Sarcasm Detection Corpus.xlsx		Ben-Sarc_ Bengali Sarcasm Detection Corpus.xlsx
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ben-Sarc_ Bengali Sarcasm Detection Corpus.xlsx

Ben-Sarc_ Bengali Sarcasm Detection Corpus.xlsx

LICENSE

LICENSE

README.md

README.md

Repository files navigation

Ben-Sarc

Table of Contents

Dataset Details

Data Instances

Data Fields

Authors

License

Citation

About

Releases

Packages

License

sanzanalora/Ben-Sarc

Folders and files

Latest commit

History

Repository files navigation

Ben-Sarc

Table of Contents

Dataset Details

Data Instances

Data Fields

Authors

License

Citation

About

Resources

License

Stars

Watchers

Forks