The Wikipedia Cultural Diversity Observatory (WCDO) is a research project whose purpose is to raise awareness of the current state of cultural diversity in Wikipedia by (1) providing datasets, (2) offering sites with visualizations and statistics, and (3) pointing out solutions to improve intercultural coverage and reduce knowledge inequalities among languages and geographical places.
Data: Cultural Context Content datasets and WCDO stats
Cultural Context Content (CCC) is the group of articles in a Wikipedia language edition that relate to the editors' geographical and cultural context (places, traditions, language, politics, agriculture, biographies, events, etc.). In other words, these are articles related to the territories where the language is spoken, either because it is indigenous there or because it is official.
The method to obtain this group of articles is divided into two steps.
language_territories_mapping.py creates the first version of the database language_territories_mapping.csv, which lists the territories where a language is spoken because it is either official or native.
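As an illustration of how such a mapping can be consumed, the sketch below groups territories by language code. It is a minimal sketch under stated assumptions: the column names `languagecode`, `territoryname` and `QitemTerritory` and the sample rows are hypothetical, and the real CSV may use other or additional fields.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt: the column names below are assumptions, and the real
# language_territories_mapping.csv may contain different or additional fields.
SAMPLE_CSV = """languagecode,territoryname,QitemTerritory
ca,Catalonia,Q5705
ca,Andorra,Q228
es,Spain,Q29
"""


def load_language_territories(csv_file):
    """Group the territory Wikidata Q-ids by language code."""
    mapping = defaultdict(set)
    for row in csv.DictReader(csv_file):
        mapping[row["languagecode"]].add(row["QitemTerritory"])
    return dict(mapping)


territories = load_language_territories(io.StringIO(SAMPLE_CSV))
print(sorted(territories["ca"]))  # ['Q228', 'Q5705']
```

Keying the mapping by language code makes the territory set for one language edition a single dictionary lookup, which is what the selection step needs.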
ccc_selection.py uses this database as a reference, retrieves and processes data from the Wikidata JSON dump and the Wikipedia language edition databases (MySQL replicas), and creates the final CCC dataset. It runs monthly through a cron job that executes a bash script (ccc_selection.sh).
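One ingredient of this kind of selection can be sketched by streaming the Wikidata JSON dump (a single JSON array with one entity per line) and keeping items that have an article in the target language edition and are located in one of its territories. This is a simplified sketch, not the actual ccc_selection.py logic: using only P17 (country) and P131 (located in the administrative territorial entity) as criteria is an assumption, and the helper names and toy entities are hypothetical.

```python
import json

# Assumption for this sketch: an article is a CCC candidate if its Wikidata
# item points to one of the language's territories through P17 (country) or
# P131 (located in the administrative territorial entity). ccc_selection.py
# uses its own criteria, which this sketch does not reproduce.
GEO_PROPERTIES = ("P17", "P131")


def claim_targets(entity, prop):
    """Yield the Q-ids that the entity's statements for `prop` point to."""
    for statement in entity.get("claims", {}).get(prop, []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # skip "no value" / "unknown value" snaks
        value = snak.get("datavalue", {}).get("value")
        if isinstance(value, dict) and "id" in value:
            yield value["id"]


def iter_dump_entities(lines):
    """Parse the Wikidata JSON dump: one big array, one entity per line."""
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)


def ccc_candidates(lines, lang, territory_qids):
    """Yield {lang}wiki titles whose item is located in one of the territories."""
    for entity in iter_dump_entities(lines):
        sitelink = entity.get("sitelinks", {}).get(f"{lang}wiki")
        if sitelink is None:
            continue  # no article in this language edition
        located = any(
            qid in territory_qids
            for prop in GEO_PROPERTIES
            for qid in claim_targets(entity, prop)
        )
        if located:
            yield sitelink["title"]


def _item_claim(qid):
    """Build a minimal wikibase-item statement (toy data helper)."""
    return {"mainsnak": {"snaktype": "value",
                         "datavalue": {"type": "wikibase-entityid",
                                       "value": {"entity-type": "item", "id": qid}}}}


# Toy dump with two trimmed-down, made-up entities. Q5705 is Catalonia,
# Q228 is Andorra and Q142 is France.
SAMPLE_LINES = [
    "[",
    json.dumps({"claims": {"P131": [_item_claim("Q5705")]},
                "sitelinks": {"cawiki": {"site": "cawiki", "title": "Exemple A"}}}) + ",",
    json.dumps({"claims": {"P17": [_item_claim("Q142")]},
                "sitelinks": {"cawiki": {"site": "cawiki", "title": "Exemple B"}}}),
    "]",
]

print(list(ccc_candidates(SAMPLE_LINES, "ca", {"Q5705", "Q228"})))  # ['Exemple A']
```

Processing the dump line by line as a generator avoids loading the whole file into memory; for the real (compressed) dump, the `lines` argument could come from an opened bz2 or gzip file.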
The method is built with:
The datasets are generated monthly in CSV format at wcdo.wmflabs.org. A sample of the generated CCC datasets is stored in the datasets_sample folder, and the historical archive is available at wcdo.wmflabs.org/datasets.
To answer questions about cultural diversity in Wikipedia, several statistics must be computed based on CCC and other groups of articles.
stats_generation.py computes these statistics and ranks the articles to create valuable lists of articles for each Wikipedia language edition. It stores the results in wcdo_stats.db on a monthly basis so they can be used to create tables and graphs.
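To make the two tasks concrete, the sketch below computes one statistic (the share of a language edition's articles that belong to its CCC), ranks articles by a single feature, and stores a monthly row in SQLite. Several points are assumptions rather than the project's actual schema: that wcdo_stats.db is an SQLite file, the hypothetical `ccc_stats` table and its columns, and ranking by a hypothetical `inlinks` feature. All numbers are toy values.

```python
import sqlite3


def ccc_percentage(ccc_count, total_articles):
    """Percentage of a language edition's articles that belong to its CCC."""
    return 100.0 * ccc_count / total_articles if total_articles else 0.0


def top_articles(articles, feature, n=10):
    """Rank articles by one numeric feature (descending) and keep the top n."""
    return sorted(articles, key=lambda a: a[feature], reverse=True)[:n]


def store_ccc_stats(conn, period, lang, ccc_count, total_articles):
    """Upsert one monthly row per language into a hypothetical ccc_stats table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS ccc_stats (
               period TEXT, languagecode TEXT, ccc_count INTEGER,
               total_articles INTEGER, percentage REAL,
               PRIMARY KEY (period, languagecode))"""
    )
    conn.execute(
        "INSERT OR REPLACE INTO ccc_stats VALUES (?, ?, ?, ?, ?)",
        (period, lang, ccc_count, total_articles,
         ccc_percentage(ccc_count, total_articles)),
    )
    conn.commit()


# Toy numbers, not real WCDO results. An in-memory database keeps the example
# self-contained; a file path such as "wcdo_stats.db" would persist it.
conn = sqlite3.connect(":memory:")
store_ccc_stats(conn, "2018-07", "ca", 150, 600)
print(conn.execute("SELECT percentage FROM ccc_stats").fetchone())  # (25.0,)

ARTICLES = [{"title": "A", "inlinks": 10},
            {"title": "B", "inlinks": 42},
            {"title": "C", "inlinks": 7}]
print([a["title"] for a in top_articles(ARTICLES, "inlinks", n=2)])  # ['B', 'A']
```

The composite primary key on period and language means that rerunning the monthly job overwrites that month's row instead of duplicating it.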
Site(s): Meta page (WCDO home) and external website (WCDO visualizations)
The following scripts create the tables and visualizations for the WCDO, both on the Meta page and on the external website.
meta_updates.py presents most of the results through tables on the [WCDO meta pages](https://meta.wikimedia.org/wiki/Wikipedia_Cultural_Diversity_Observatory), with results for all languages together and for each language individually. The MediaWiki pages are posted and updated with Pywikibot.
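A minimal sketch of this kind of update: render statistics as MediaWiki table markup, then save the page with Pywikibot. The helper names, the page title and the row values are hypothetical. Saving requires an installed and logged-in Pywikibot, so the import is deferred to the posting function.

```python
def wikitable(headers, rows):
    """Render rows as a sortable MediaWiki table."""
    lines = ['{| class="wikitable sortable"']
    lines.append("! " + " !! ".join(headers))
    for row in rows:
        lines.append("|-")
        lines.append("| " + " || ".join(str(cell) for cell in row))
    lines.append("|}")
    return "\n".join(lines)


def post_to_meta(title, text, summary):
    """Replace the text of a Meta-Wiki page (needs a configured Pywikibot login)."""
    import pywikibot  # deferred so wikitable() works without Pywikibot installed

    site = pywikibot.Site("meta", "meta")
    page = pywikibot.Page(site, title)
    page.text = text
    page.save(summary=summary)


# Toy values, not real WCDO results.
TABLE = wikitable(["Language", "CCC %"], [["ca", 25.0], ["eu", 12.5]])
print(TABLE)
# post_to_meta("Hypothetical/WCDO test page", TABLE, "Update CCC statistics")
```

Generating the table markup separately from the posting step makes the output easy to inspect or test before any edit is made on the wiki.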
Research: Main papers and presentations
So far, one paper has been published and several talks have been given on the usefulness of a Cultural Context Content dataset and on the importance of exchanging content across language editions to reduce knowledge inequalities.
- Miquel-Ribé, M., & Laniado, D. (2018). Wikipedia Culture Gap: Quantifying Content Imbalances Across 40 Language Editions. Frontiers in Physics (pdf).
- Miquel-Ribé, M. (2017). Identity-based motivation in digital engagement: the influence of community and cultural identity on participation in Wikipedia (Doctoral dissertation, Universitat Pompeu Fabra).
- Wikipedia Cultural Diversity Observatory (WCDO). Presentation at Wikimania 2018, Cape Town (pdf).
All data, charts, and other content are available under the Creative Commons CC0 dedication.