
Towards Identifying Social Bias in Dialogue Systems

This repository contains a detailed description of the CDial-Bias Dataset.

This task aims to measure social bias in dialogue scenarios. Because biased utterances can be subtle in expression and subjective in nature, measuring social bias requires rigorous analysis and normative reasoning. Competitors are therefore provided with a well-annotated training dataset whose detailed annotations cover context sensitivity, data type, targeted group, and implied attitude. At the test stage, the task offers a more practical setting: only the dialogues are provided, and competitors must predict a fine-grained category (i.e., irrelevant, anti-bias, neutral, or biased) with respect to dialogue social bias.

Authors: Jingyan Zhou, Jiawen Deng, Fei Mi, Yitong Li, Yasheng Wang, Minlie Huang, Xin Jiang, Qun Liu, Helen M. Meng

Detailed Dataset Descriptions and Baselines

http://arxiv.org/abs/2202.08011 (We refined the annotations and constructed the CDial-Bias Dataset 2.0; the statistics and baseline performance may differ to some extent from the paper.)

Dataset

Format

The CDial-Bias Dataset 2.0 has the following entries.

| Entry | Explanation |
| --- | --- |
| Q | Dialogue turn 1. |
| A | Dialogue turn 2. |
| Topic | The topic of the dialogue: Race, Gender, Region, or Occupation. |
| Context Sensitivity | 0 - Context-independent; 1 - Context-sensitive. |
| Data Type | 0 - Irrelevant; 1 - Bias-expressing; 2 - Bias-discussing. |
| Bias Attitudes | 0 - NA (irrelevant data); 1 - Anti-bias; 2 - Neutral; 3 - Biased. |
| Referenced Groups | Free text; multiple groups are separated by '/'. |
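For illustration, here is a minimal loading sketch assuming the release ships as tab-separated files whose header uses the column names above (the file name and exact header strings are assumptions, not part of the official release notes):

```python
import csv

# Human-readable names for the coded annotation fields (codes as defined in the table above).
CONTEXT = {0: "context-independent", 1: "context-sensitive"}
DATA_TYPE = {0: "irrelevant", 1: "bias-expressing", 2: "bias-discussing"}
ATTITUDE = {0: "irrelevant", 1: "anti-bias", 2: "neutral", 3: "biased"}  # 4-way test-stage label


def load_cdial_bias(path):
    """Read one split of CDial-Bias 2.0 from a tab-separated file (assumed layout)."""
    rows = []
    with open(path, encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        for r in reader:
            rows.append({
                "q": r["Q"],                                        # dialogue turn 1
                "a": r["A"],                                        # dialogue turn 2
                "topic": r["Topic"],
                "context": CONTEXT[int(r["Context Sensitivity"])],
                "data_type": DATA_TYPE[int(r["Data Type"])],
                "attitude": ATTITUDE[int(r["Bias Attitudes"])],
                "groups": r["Referenced Groups"].split("/") if r["Referenced Groups"] else [],
            })
    return rows


# Example (hypothetical file name):
# train = load_cdial_bias("train.tsv")
```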

Statistics

| Topic | Context-Independent / Sensitive | Irrelevant | Bias-expressing | Bias-discussing | Anti | Neutral | Biased | Group # |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Race | 6,451 / 4,420 | 4,725 | 2,772 | 3,374 | 155 | 3,115 | 2,876 | 70 |
| Gender | 5,093 / 3,291 | 3,895 | 1,441 | 3,048 | 78 | 2,631 | 1,780 | 40 |
| Region | 2,985 / 2,046 | 1,723 | 2,217 | 1,091 | 197 | 1,525 | 1,586 | 41 |
| Occupation | 2,842 / 1,215 | 2,006 | 1,231 | 820 | 24 | 1,036 | 991 | 20 |
| Overall | 17,371 / 10,972 | 12,349 | 7,659 | 8,333 | 454 | 8,307 | 7,233 | - |

The dataset is randomly shuffled and split into training, validation, and test sets at a ratio of 8:1:1.
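The release does not specify the shuffling procedure or random seed; for reference, an 8:1:1 random split of this kind can be reproduced along the following lines (the seed value is an assumption):

```python
import random


def split_8_1_1(rows, seed=42):
    """Shuffle a list of examples and split it into 80% train, 10% validation, 10% test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n = len(rows)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


# train, val, test = split_8_1_1(load_cdial_bias("cdial_bias_2.tsv"))  # hypothetical file name
```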

Notes

Before you download the dataset, please be aware that the CDial-Bias Dataset is released for research purposes only; other usages require further permission. If you want to publish experimental results with this dataset, please cite the following article:

@misc{cdial2022zhou,
  url = {https://arxiv.org/abs/2202.08011},
  author = {Zhou, Jingyan and Deng, Jiawen and Mi, Fei and Li, Yitong and Wang, Yasheng and Huang, Minlie and Jiang, Xin and Liu, Qun and Meng, Helen},
  title = {Towards Identifying Social Bias in Dialog Systems: Frame, Datasets, and Benchmarks},
  publisher = {arXiv},
  year = {2022}
}

Also, we held NLPCC 2022 Shared Task 7 based on the proposed resources. Many talented participants contributed to the investigation of this problem; for more information, please check the webpage and the task overview here:

@inproceedings{zhou2022overview,
  title={Overview of NLPCC 2022 Shared Task 7: Fine-Grained Dialogue Social Bias Measurement},
  author={Zhou, Jingyan and Mi, Fei and Meng, Helen and Deng, Jiawen},
  booktitle={CCF International Conference on Natural Language Processing and Chinese Computing},
  pages={342--350},
  year={2022},
  organization={Springer}
}
