WCDO studies and gives solutions to knowledge inequalities in Wikipedia.
datasets_sample
docs
research_publications
src_data
src_viz
.gitignore
LICENSE
README.md

WCDO
==========

The Wikipedia Cultural Diversity Observatory (WCDO) is a research project that aims to raise awareness of Wikipedia’s current state of cultural diversity by (1) providing datasets, (2) publishing sites with visualizations and statistics, and (3) pointing out solutions to improve intercultural coverage and reduce knowledge inequalities among languages and geographical places.

Data: Cultural Context Content datasets and WCDO stats

Cultural Context Content (CCC) is the group of articles in a Wikipedia language edition that relates to the editors' geographical and cultural context (places, traditions, language, politics, agriculture, biographies, events, etc.). In other words, these are articles related to the territories where the language is spoken, either because it is indigenous there or because it is official.

The method to obtain this group of articles is divided into two steps.

  • language_territories_mapping.py creates the first version of the database language_territories_mapping.csv, which lists the territories where a language is spoken because it is either official or native.
  • ccc_selection.py uses this database as a reference and retrieves and processes data from the Wikidata JSON dump and the Wikipedia language edition databases (MySQL replicas) to create the final CCC dataset. It is run monthly by a cron job that executes a bash script (ccc_selection.sh).
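
The two steps can be pictured with a minimal Python sketch. It is not the code in language_territories_mapping.py or ccc_selection.py: the CSV column names, the article records, and the helper functions `load_territories` and `select_ccc` are all assumptions made for illustration, and the real selection relies on far more signals than a single location field.

```python
import csv
import io

# Hypothetical excerpt of language_territories_mapping.csv; the real column
# names and contents may differ.
MAPPING_CSV = """language_code,territory_qid,territory_name,status
ca,Q5705,Catalonia,native
ca,Q228,Andorra,official
is,Q189,Iceland,official
"""


def load_territories(csv_text):
    """Step 1 output: map each language code to the set of its territory QIDs."""
    territories = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        territories.setdefault(row["language_code"], set()).add(row["territory_qid"])
    return territories


def select_ccc(articles, territory_qids):
    """Step 2 (simplified): keep the articles located in one of the territories."""
    return [a["title"] for a in articles if a["located_in"] in territory_qids]


# Hypothetical article records, standing in for data read from the Wikidata
# dump and the Wikipedia replicas.
articles = [
    {"title": "Sagrada Família", "located_in": "Q5705"},
    {"title": "Reykjavík", "located_in": "Q189"},
    {"title": "Andorra la Vella", "located_in": "Q228"},
]

territories = load_territories(MAPPING_CSV)
ccc_ca = select_ccc(articles, territories["ca"])
print(ccc_ca)
```

Keeping the mapping in a separate CSV means the list of territories per language can be reviewed and corrected by hand before the more expensive selection step runs.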

The method is built with:

The datasets are generated monthly at wcdo.wmflabs.org in CSV format (more info). A sample of the generated CCC datasets is stored in the datasets_sample folder, and the historical archive is at wcdo.wmflabs.org/datasets.
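
Because the datasets are plain CSV, they can be read with Python's standard library. The sketch below assumes hypothetical column names (`qitem`, `page_title`, `ccc_binary`) and uses an inline sample instead of a real file; check the header of an actual file in datasets_sample before relying on any of these names.

```python
import csv
import io

# Inline stand-in for one CCC dataset file; the columns are assumptions.
SAMPLE = """qitem,page_title,ccc_binary
Q1492,Barcelona,1
Q90,París,0
Q5705,Catalunya,1
"""


def read_ccc_rows(handle):
    """Return only the rows flagged as CCC in the (assumed) ccc_binary column."""
    return [row for row in csv.DictReader(handle) if row["ccc_binary"] == "1"]


ccc_rows = read_ccc_rows(io.StringIO(SAMPLE))
ccc_titles = [row["page_title"] for row in ccc_rows]
print(ccc_titles)
```

To read a downloaded file instead, pass `open(path, newline="", encoding="utf-8")` to `read_ccc_rows`, where `path` points at the CSV.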

In order to be able to answer questions on Wikipedia cultural diversity, it is necessary to compute several statistics based on CCC and other groups of articles.

  • stats_generation.py computes these statistics and ranks the articles in order to create valuable lists of articles for each Wikipedia language edition. It stores the results in wcdo_stats.db on a monthly basis so that they can be used to create tables and graphs.
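
As a rough illustration of this kind of statistic, the following sketch computes the percentage of CCC articles per language edition and stores it in SQLite. The table name `ccc_share`, its columns, the period label, and the counts are all invented for the example; the actual schema of wcdo_stats.db is not described here and may look quite different.

```python
import sqlite3

# Hypothetical per-language counts: (language, total articles, CCC articles).
counts = [("ca", 1000, 150), ("is", 400, 100), ("en", 5000, 1200)]

# An in-memory database keeps the sketch self-contained; the real pipeline
# writes to wcdo_stats.db.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ccc_share ("
    "language TEXT PRIMARY KEY, period TEXT, total INTEGER, "
    "ccc INTEGER, ccc_percent REAL)"
)
for lang, total, ccc in counts:
    conn.execute(
        "INSERT INTO ccc_share VALUES (?, ?, ?, ?, ?)",
        (lang, "2018-09", total, ccc, round(100.0 * ccc / total, 2)),
    )
conn.commit()

# Rank the language editions by their share of CCC articles.
ranking = conn.execute(
    "SELECT language, ccc_percent FROM ccc_share ORDER BY ccc_percent DESC"
).fetchall()
print(ranking)
```

Storing one row per language and period would let tables and graphs be rebuilt from the database without recomputing the statistics.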

Site(s): Meta page (WCDO home) and external website (WCDO visualizations)

These are the scripts that create the tables and visualizations for the WCDO, both the meta page and the external website visualizations.

Research: Main papers and presentations

So far, one paper has been published and several talks have been given on the usefulness of a Cultural Context Content dataset and the importance of exchanging content across language editions in order to reduce knowledge inequalities.

  • Miquel-Ribé, M., & Laniado, D. (2018). Wikipedia Culture Gap: Quantifying Content Imbalances Across 40 Language Editions. Frontiers in Physics (pdf).
  • Miquel-Ribé, M. (2017). Identity-based motivation in digital engagement: the influence of community and cultural identity on participation in Wikipedia (Doctoral dissertation, Universitat Pompeu Fabra).
  • Presentation Wikipedia Cultural Diversity Observatory (WCDO) (Wikimania 2018, Cape Town) (pdf).

Community

To get involved in WCDO development, find tasks on the Get involved page or get in touch at tools.wcdo@tools.wmflabs.org.

Copyright

All data, charts, and other content are available under the Creative Commons CC0 dedication.
