KoBBQ Datasets

KoBBQ_templates.tsv

  • Data organized by template (268 templates across 12 categories of social bias).
  • Online Viewer: Google Spreadsheets

KoBBQ_all_samples.tsv

  • Sample-level data with placeholders ([N1], [N2], [W1], etc.) filled with attributes.
  • A file containing information necessary for model evaluation.

KoBBQ_test_samples.tsv

  • Test set of KoBBQ_all_samples.tsv.
  • The evaluation set consists of one randomly sampled example from each template.

KoBBQ_survey_result.jsonl

  • Result of the social bias verification survey.
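
A minimal loading sketch for the files listed above, assuming the `.tsv` files are tab-separated with a header row and the `.jsonl` file holds one JSON object per line. The helper names `read_tsv` and `read_jsonl` and the in-memory demo rows (including `hypothetical_field`) are illustrative assumptions, not part of the dataset.

```python
import csv
import io
import json


def read_tsv(handle):
    """Read a tab-separated file with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def read_jsonl(handle):
    """Read a JSON Lines file: one JSON object per non-empty line."""
    return [json.loads(line) for line in handle if line.strip()]


# In-memory stand-ins so the sketch runs without the real files.
# The row and field values below are hypothetical.
tsv_demo = io.StringIO("sample_id\tanswer\nage-001a-001-amb-bsd\tunknown\n")
jsonl_demo = io.StringIO('{"hypothetical_field": 1}\n{"hypothetical_field": 2}\n')

samples = read_tsv(tsv_demo)
survey = read_jsonl(jsonl_demo)
print(samples[0]["sample_id"], len(survey))
```

With the real files, the same helpers would be called on a handle such as `open("KoBBQ_test_samples.tsv", encoding="utf-8", newline="")`.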

Data Description

  • Please note that the English translations were generated by GPT-4 and should be used for reference only.

  • sample_id: {category}-{template ID}{context: a, c (counter-biased) / b, d (biased)}-{sample ID}-{context: amb (ambiguous) / dis (disambiguated)}-{question: bsd (biased) / cnt (counter-biased)}

  • label_annotation: Categorization of BBQ templates

    • SR: Sample-Removed
    • TM: Target-Modified
    • ST: Simply-Transferred
    • NC: Newly-Created
  • context: A scenario where two individuals from different social groups engage in behavior related to the given stereotype

  • question:

    • A biased question asks which group conforms to a given stereotype
    • A counter-biased question asks which group goes against it
  • choices: Related social group options for the given context

  • biased_answer: The answer conforming to social biases

  • answer: The correct answer for the given context and question

  • bbq_id: The ID of the original sample in the BBQ dataset

  • bbq_category: The category of the original sample in the BBQ dataset

  • prediction: Model output to be evaluated (not filled)
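
The `sample_id` layout described above can be unpacked with a small parser, which in turn makes it easy to score filled-in `prediction` values separately for ambiguous and disambiguated contexts. This is a sketch under stated assumptions: the example IDs and answers are hypothetical, `parse_sample_id` and `accuracy_by_context` are illustrative names, and plain accuracy is used here only as an example metric, not as the benchmark's official bias score.

```python
from collections import defaultdict
from typing import NamedTuple


class SampleId(NamedTuple):
    category: str
    template_id: str
    context_variant: str  # a, c = counter-biased context; b, d = biased context
    sample_no: str
    context_type: str     # amb (ambiguous) or dis (disambiguated)
    question_type: str    # bsd (biased) or cnt (counter-biased)


def parse_sample_id(sample_id: str) -> SampleId:
    """Split a sample_id into its documented components."""
    # Split from the right so a category containing '-' would stay intact.
    head, sample_no, context_type, question_type = sample_id.rsplit("-", 3)
    category, template = head.rsplit("-", 1)
    if context_type not in ("amb", "dis") or question_type not in ("bsd", "cnt"):
        raise ValueError(f"unexpected sample_id: {sample_id!r}")
    if template[-1:] not in ("a", "b", "c", "d"):
        raise ValueError(f"unexpected context variant in: {sample_id!r}")
    return SampleId(category, template[:-1], template[-1],
                    sample_no, context_type, question_type)


def accuracy_by_context(rows):
    """Share of rows whose prediction equals answer, split by amb/dis."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for row in rows:
        ctx = parse_sample_id(row["sample_id"]).context_type
        totals[ctx] += 1
        hits[ctx] += int(row["prediction"] == row["answer"])
    return {ctx: hits[ctx] / totals[ctx] for ctx in totals}


# Hypothetical rows with the prediction column already filled in.
rows = [
    {"sample_id": "age-001a-001-amb-bsd", "answer": "unknown", "prediction": "unknown"},
    {"sample_id": "age-001a-001-amb-cnt", "answer": "unknown", "prediction": "old"},
    {"sample_id": "age-001b-001-dis-bsd", "answer": "old", "prediction": "old"},
]
parsed = parse_sample_id(rows[0]["sample_id"])
scores = accuracy_by_context(rows)
print(parsed)
print(scores)
```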


Details of Social Bias Verification Survey

Questionnaire

  • Type 1 (Target or non-target groups must be modified or newly designated.)

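    한국 사회의 통념 상 '{stereotype}'(이)라는 고정관념에 해당하는 사회적 집단으로 적절한 것을 모두 고르시오.

    (※ 응답자 개인의 견해를 묻는 질문이 아닙니다. 한국 사회에서 나타나는 고정관념 혹은 통념에 기반하여 답변해주세요.)

    보기
    □ {group_1}
    □ {group_2}
    □ ...
    □ {group_n}
    □ 한국 사회에 관련된 고정관념 없음

    English translation:

    Based on common perceptions in Korean society, select all social groups that fit the stereotype '{stereotype}'.

    (※ This question does not ask for the respondent's personal opinion. Please answer based on stereotypes or common perceptions found in Korean society.)

    Options
    □ {group_1}
    □ {group_2}
    □ ...
    □ {group_n}
    □ No related stereotype in Korean society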

  • Type 2 (Only the stereotype needs to be validated with a fixed target group.)

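    한국 사회의 통념 상 {group_1}{group_2}를 비교할 때, '{stereotype}'(이)라는 고정관념에 해당하는 사회적 집단으로 적절한 것을 고르시오.

    (※ 응답자 개인의 견해를 묻는 질문이 아닙니다. 한국 사회에서 나타나는 고정관념 혹은 통념에 기반하여 답변해주세요.)

    보기
    〇 {group_1}
    〇 {group_2}
    〇 한국 사회에 관련된 고정관념 없음

    English translation:

    Based on common perceptions in Korean society, when comparing {group_1} and {group_2}, select the social group that fits the stereotype '{stereotype}'.

    (※ This question does not ask for the respondent's personal opinion. Please answer based on stereotypes or common perceptions found in Korean society.)

    Options
    〇 {group_1}
    〇 {group_2}
    〇 No related stereotype in Korean society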

Demographic Statistics

Group                                    Count  Percent

Gender
  Male                                     800    50.0%
  Female                                   800    50.0%
Age
  18-24                                    320    20.0%
  25-34                                    320    20.0%
  35-44                                    320    20.0%
  45-54                                    320    20.0%
  55+                                      320    20.0%
Domestic Area of Origin
  Seoul                                    468    29.3%
  Gyeonggi, Incheon                        350    21.9%
  Gyeongsang, Daegu, Busan, Ulsan          411    25.7%
  Jeolla, Gwangju                          151     9.4%
  Chungcheong, Daejeon, Sejong             156     9.8%
  Gangwon                                   48     3.0%
  Jeju                                      16     1.0%
Level of Education
  Below high school level                   29     1.8%
  High school graduate or equivalent       378    23.6%
  College dropout                           45     2.8%
  Associate degree                         209    13.1%
  Bachelor's degree                        808    50.5%
  Graduate degree                          131     8.2%
Sexual Orientation
  Straight                                1474    92.1%
  LGBTQ+                                    31     1.9%
  Prefer not to mention                     95     6.0%
Disability
  No                                      1508    94.3%
  Yes                                       64     4.0%
  Prefer not to mention                     28     1.8%
Religion
  Christian                                275    17.2%
  Catholic                                 122     7.6%
  Buddhist                                 182    11.4%
  Islamic                                    1     0.1%
  No religion                              979    61.2%
  Prefer not to mention                     41     2.6%
Political Orientation
  Conservative                             223    13.9%
  Progressive                              314    19.6%
  Moderate                                 903    56.4%
  Prefer not to mention                    160    10.0%
Marital Status
  No                                       795    49.7%
  Yes                                      805    50.3%
Employment Status
  Employed - less than 40h/week            361    22.6%
  Employed - more than 40h/week            748    46.8%
  Unemployed - Seeking employment          182    11.4%
  Unemployed - Not seeking employment      249    15.6%
  Retired                                   54     3.4%
  Disabled - Unable to work                  6     0.4%
Annual Income
  Below 13 million KRW                     139     8.7%
  13 million-30 million KRW                249    15.6%
  30 million-50 million KRW                447    27.9%
  50 million-76 million KRW                374    23.4%
  76 million-150 million KRW               355    22.2%
  150+ million KRW                          36     2.3%

Total                                     1600     100%