Data Statement for PolStance - Political Stance in Danish

Data set name: PolStance

Citation (if available): Lehmann, R., & Derczynski, L. (2019). Political Stance in Danish. In Proceedings of the 22nd Nordic Conference on Computational Linguistics (pp. 197-207).

Data set developer(s): Rasmus Lehmann

Data statement author(s): Leon Derczynski

Others who contributed to this document:

A. CURATION RATIONALE

This dataset contains quotes by Danish politicians curated to capture their opinions on various political issues. The goal is to build data-driven systems for automatic analysis of political sentiment.

B. LANGUAGE VARIETY/VARIETIES

  • BCP-47 language tag: da-DK
  • Language variety description: Standard Danish

C. SPEAKER DEMOGRAPHIC

  • Description: Danish politicians sitting in parliament (Folketinget)
  • Age: 25-70
  • Gender: Mixed. 52 males and 38 females (58%/42%)
  • Race/ethnicity (according to locally appropriate categories): Mixed, mostly white with Scandinavian background.
  • First language(s): Danish
  • Socioeconomic status: Privileged; minimum 56494.17 DKK per month (8470 USD).
  • Number of different speakers represented: 63
  • Presence of disordered speech: Not prevalent, since most quotes are curated.

D. ANNOTATOR DEMOGRAPHIC

  • Description: One L1 speaker annotator, supervised by one L2 speaker
  • Age: 25-35
  • Gender: Male
  • Race/ethnicity (according to locally appropriate categories): White northern European
  • First language(s): primary annotator L1 da-DK
  • Training in linguistics/other relevant discipline: master's student in NLP; bachelor's degree in Communications

E. SPEECH SITUATION

  • Description: Public statements made by politicians in the Danish parliament during debate or discussion, during verbal interviews or in writing, transcribed and then published in edited newswire
  • Time and place: 2018, Denmark
  • Modality (spoken/signed, written): Spoken
  • Scripted/edited vs. spontaneous: Mixture
  • Synchronous vs. asynchronous interaction: Mixture
  • Intended audience: Danish voters

F. TEXT CHARACTERISTICS

Politicians talking about politics and policy while doing their job, addressing the public.

G. RECORDING QUALITY

Quotes are intended to be verbatim, but they may have passed through several stages of transcription and editing before publication.

H. OTHER

I. PROVENANCE APPENDIX

Originally taken from Ritzau; quotes are short enough to constitute "reasonable use" (see Ophavsret).

About this template

Based on the worksheets distributed at the 2020 LREC workshop on Data Statements, by Emily M. Bender, Batya Friedman, and Angelina McMillan-Major. Adapted to Markdown by Leon Derczynski.