eric-zeng/chi-bad-ads-data

Ad Perceptions Dataset

This is the dataset used in Survey 2 of our paper:

Eric Zeng, Tadayoshi Kohno, Franziska Roesner. "What Makes a 'Bad' Ad? User Perceptions of Problematic Online Advertising." In CHI '21: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3411764.3445459.

To explore this dataset, visit https://badads.cs.washington.edu/ad-perceptions-dataset/table.html

Overview

This repository contains a dataset of 500 ads, labeled by 1025 annotators with their opinions of the ads: what they liked and disliked about each ad.

What's in the Dataset

  • Screenshots of 500 ads randomly sampled from the web
  • Users' labels for those ads (10+ users labeled each ad)
  • Clusters of ads based on distributions of users' labels, generated using Population Label Distribution Learning

Required Software

  • WebP-compatible image viewer (e.g. macOS Preview, any modern browser)
  • All other data is plaintext CSV or JSON; we suggest using pandas to work with the data
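As a starting point for working with the CSV and JSON files, the sketch below loads a file based on its extension. The helper name `load_data_file` is hypothetical and not part of this repository. JSON files are returned as plain Python objects rather than DataFrames, since their nesting is not described here.

```python
import json
from pathlib import Path

import pandas as pd


def load_data_file(path):
    """Load a dataset file by extension (hypothetical helper, not from the repo).

    CSV files are returned as a pandas DataFrame; JSON files are returned as
    the parsed Python object, because their structure may be nested.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix == ".json":
        with path.open(encoding="utf-8") as f:
            return json.load(f)
    raise ValueError(f"Unsupported file type: {path.name}")
```

For example, `load_data_file("data/ads_all_labels.json")` would return the parsed JSON, which can then be inspected before deciding how to convert it to a DataFrame.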

Methodology

For a full description of our data collection methodology, please refer to the paper.

Description of Files

data/ - This directory contains our full dataset from Survey 2.

Main files

  • ads_all_labels.json: Start here. This file contains 500 ads, annotated with the distribution of subjective opinion labels from participants, content labels, user ratings, and opinion label cluster IDs.
  • participant_context.csv: This file includes supplemental information about participants' demographics, general attitudes towards ads, and ad blocker usage.
  • ad_parent_urls.csv: This file contains the URLs of the pages that the ads in our dataset appeared on.
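One possible use of the parent URLs is to group ads by the site they appeared on. The sketch below derives a hostname from each URL. The column name `parent_url` is an assumption for illustration; check the actual header in ad_parent_urls.csv before using it. It also assumes every URL value is a non-missing string.

```python
from urllib.parse import urlparse

import pandas as pd


def add_site_column(df, url_column="parent_url"):
    """Return a copy of df with a 'site' column holding each URL's hostname.

    `url_column` is a hypothetical column name; adjust it to match the file.
    """
    out = df.copy()
    out["site"] = out[url_column].map(lambda u: urlparse(u).hostname)
    return out
```

Calling `add_site_column(urls)["site"].value_counts()` would then give the number of ads per site.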

Additional files (the raw data joined into ads_all_labels.json)

  • cluster_id_to_paper_id.json: Mapping of the cluster IDs in these files to the letter names used in the paper
  • content_labels.csv: Table containing the researcher-generated content labels assigned to each ad
  • fmm_results.json: Results of the FMM clustering algorithm. Contains a mapping of Ad IDs to cluster IDs (and other metadata)
  • label_dists_by_cluster.csv: The opinion label distributions for each ad, grouped by cluster, and aggregated (mean).
  • opinion_label_dist.csv: Table containing the opinion label distribution for each ad, as counts for each label. Also includes mappings of ad IDs to screenshot file name.
  • opinion_label_dist_norm.csv: Table containing the opinion label distribution for each ad, normalized by the number of labelers per ad.
  • opinion_labels_per_participant.csv: Raw opinion label data for each participant for each ad.
  • overall_ad_ratings.csv: Table containing participants' overall ratings for each ad, and their optional free response opinion of the ad.
  • participant_context.csv: Data from participants on their general feelings towards ads, and whether they use ad blockers.
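The relationship between the raw per-participant labels, the per-ad label counts, and the normalized distribution can be sketched as follows. This is only an illustration of the idea under stated assumptions, not the code used to produce the files: the column names `ad_id`, `participant_id`, and `label`, and the long format with one row per participant, ad, and label, are all hypothetical.

```python
import pandas as pd


def label_counts(raw):
    """Count how many times each label was applied to each ad.

    Assumes `raw` is in long format with hypothetical columns
    'ad_id' and 'label' (one row per label a participant chose).
    """
    return pd.crosstab(raw["ad_id"], raw["label"])


def normalize_by_labelers(counts, raw):
    """Divide each ad's label counts by the number of distinct labelers.

    Assumes a hypothetical 'participant_id' column identifies labelers.
    """
    n_labelers = raw.groupby("ad_id")["participant_id"].nunique()
    return counts.div(n_labelers, axis=0)
```

Because a participant may apply more than one label to an ad, rows of the normalized table need not sum to 1; each value is the fraction of that ad's labelers who chose the label.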

screenshots/ - This directory contains screenshots of ads in our study. The filename of the screenshot is the ad_id referenced in all data files.
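Since each screenshot is named after its ad_id, a path can be built directly from an ID. The sketch below assumes the files carry a `.webp` extension (suggested by the WebP viewer requirement, but not stated explicitly); the helper name `screenshot_path` is hypothetical.

```python
from pathlib import Path


def screenshot_path(ad_id, root="screenshots", ext=".webp"):
    """Build the expected screenshot path for an ad.

    The '.webp' extension is an assumption; change `ext` if the files differ.
    """
    return Path(root) / f"{ad_id}{ext}"
```

Checking `screenshot_path(some_id).exists()` before opening the file is a cheap way to confirm the naming assumption holds.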
