
Towards Identifying Social Bias in Dialogue Systems

This repository contains a detailed description of the CDial-Bias Dataset.

This task aims to measure social bias in dialogue scenarios. Because biased utterances can be subtle in expression and subjective in nature, measuring social bias requires rigorous analysis and normative reasoning. Competitors are therefore provided with a well-annotated training dataset whose detailed annotations cover context sensitivity, data type, targeted group, and implied attitude. At the test stage, the task offers a more practical setting: only the dialogues are provided, and competitors must predict a fine-grained category (i.e., irrelevant, anti-bias, neutral, or biased) with respect to dialogue social bias.

Authors: Jingyan Zhou, Jiawen Deng, Fei Mi, Yitong Li, Yasheng Wang, Minlie Huang, Xin Jiang, Qun Liu, Helen M. Meng

Detailed Dataset Descriptions and Baselines

http://arxiv.org/abs/2202.08011 (We refined the annotations and constructed the CDial-Bias Dataset 2.0; the statistics and baseline performance may differ to some extent from the paper.)

Dataset

Format

The CDial-Bias Dataset 2.0 has the following entries.

| Entry | Explanation |
| --- | --- |
| Q | Dialogue turn 1. |
| A | Dialogue turn 2. |
| Topic | The topic of the dialogue: Race, Gender, Region, or Occupation. |
| Context Sensitivity | 0 - Context-independent; 1 - Context-sensitive. |
| Data Type | 0 - Irrelevant; 1 - Bias-expressing; 2 - Bias-discussing. |
| Bias Attitudes | 0 - NA (irrelevant data); 1 - Anti-bias; 2 - Neutral; 3 - Biased. |
| Referenced Groups | Free text; multiple groups are separated by '/'. |
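For illustration, here is a minimal loading sketch assuming the release ships as tab-separated files whose header uses the column names above (the file name and exact header strings are assumptions, not part of the official release notes):

```python
import csv

# Human-readable names for the coded annotation fields (codes as defined in the table above).
CONTEXT = {0: "context-independent", 1: "context-sensitive"}
DATA_TYPE = {0: "irrelevant", 1: "bias-expressing", 2: "bias-discussing"}
ATTITUDE = {0: "irrelevant", 1: "anti-bias", 2: "neutral", 3: "biased"}  # 4-way test-stage label


def load_cdial_bias(path):
    """Read one split of CDial-Bias 2.0 from a tab-separated file (assumed layout)."""
    rows = []
    with open(path, encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        for r in reader:
            rows.append({
                "q": r["Q"],                                        # dialogue turn 1
                "a": r["A"],                                        # dialogue turn 2
                "topic": r["Topic"],
                "context": CONTEXT[int(r["Context Sensitivity"])],
                "data_type": DATA_TYPE[int(r["Data Type"])],
                "attitude": ATTITUDE[int(r["Bias Attitudes"])],
                "groups": r["Referenced Groups"].split("/") if r["Referenced Groups"] else [],
            })
    return rows


# Example (hypothetical file name):
# train = load_cdial_bias("train.tsv")
```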

Statistics

| Topic | Context-Independent / Sensitive | Irrelevant | Bias-expressing | Bias-discussing | Anti | Neutral | Biased | Group # |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Race | 6,451 / 4,420 | 4,725 | 2,772 | 3,374 | 155 | 3,115 | 2,876 | 70 |
| Gender | 5,093 / 3,291 | 3,895 | 1,441 | 3,048 | 78 | 2,631 | 1,780 | 40 |
| Region | 2,985 / 2,046 | 1,723 | 2,217 | 1,091 | 197 | 1,525 | 1,586 | 41 |
| Occupation | 2,842 / 1,215 | 2,006 | 1,231 | 820 | 24 | 1,036 | 991 | 20 |
| Overall | 17,371 / 10,972 | 12,349 | 7,659 | 8,333 | 454 | 8,307 | 7,233 | - |

The dataset is randomly shuffled and split into training, validation, and test sets at a ratio of 8:1:1.
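The release does not specify the shuffling procedure or random seed; for reference, an 8:1:1 random split of this kind can be reproduced along the following lines (the seed value is an assumption):

```python
import random


def split_8_1_1(rows, seed=42):
    """Shuffle a list of examples and split it into 80% train, 10% validation, 10% test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n = len(rows)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


# train, val, test = split_8_1_1(load_cdial_bias("cdial_bias_2.tsv"))  # hypothetical file name
```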

Notes

Before you download the dataset, please be aware that the CDial-Bias Dataset is released for research purposes only; other usages require further permission. If you want to publish experimental results with this dataset, please cite the following article:

@misc{cdial2022zhou,
  url = {https://arxiv.org/abs/2202.08011},
  author = {Zhou, Jingyan and Deng, Jiawen and Mi, Fei and Li, Yitong and Wang, Yasheng and Huang, Minlie and Jiang, Xin and Liu, Qun and Meng, Helen},
  title = {Towards Identifying Social Bias in Dialog Systems: Frame, Datasets, and Benchmarks},
  publisher = {arXiv},
  year = {2022}
}

Also, we held NLPCC 2022 Shared Task 7 based on the proposed resources. Many talented participants contributed to the investigation of this problem; for more information, please check the webpage and the task overview here:

@inproceedings{zhou2022overview,
  title={Overview of NLPCC 2022 Shared Task 7: Fine-Grained Dialogue Social Bias Measurement},
  author={Zhou, Jingyan and Mi, Fei and Meng, Helen and Deng, Jiawen},
  booktitle={CCF International Conference on Natural Language Processing and Chinese Computing},
  pages={342--350},
  year={2022},
  organization={Springer}
}
