1.0 Create male-female pairs of profession names (German).ipynb
1.1 NewDataset. Create male-femalepairs and match Wiki articles [not used in thesis].ipynb
2.1 Match profession names with Wiki articles (male and female).ipynb
2.2 Match profession names with Wiki articles (neutral prof names).ipynb
2.3 Match profession names with Wiki articles using Levenstein dist and ratio .ipynb
2.4.Plot tables with number of matched articles and redirections. Save redirection bias groups.ipynb
3. Collect Google hits.ipynb
4. Match labor market statistics to professions .ipynb
5. Collect text and mentioned persons using links from Wiki articles.ipynb
6. Collect images from Wiki articles.ipynb
6.1 Create file with image links for Crowdflower task .ipynb
7. Mine persons mentioned in wiki articles.ipynb
7.1 Get date of birth of mentioned people (plot ratios of men).ipynb
7.2 Merge dataset of persons from links and dataset mined with polyglot.ipynb
9.1.1 Logistic regresion (Google hits).ipynb
9.1.2 Logistic regression (Labour market statistics) .ipynb
9.1.22 Logistic regression (Labour market statistics)with correction for not balanced groups [not used in thesis].ipynb
9.1.3 Logistic regresion (Google results and Labor market) [not used in thesis].ipynb
9.2 Analysis of mentioned people in articles.ipynb
9.2.1 Analysis of mentioned people (restricted by BirthDate).ipynb
9.3 Analysis of images.ipynb
9.39 Old Analysis of images.ipynb
French Wikipedia.ipynb
Hypothetical violin plots.ipynb
Russian Wikipedia.ipynb
profession_images (2).txt


Measuring Gender Inequalities of German Professions on Wikipedia



Master's thesis project "Measuring Gender Inequalities of German Professions on Wikipedia"


Wikipedia is a community-created online encyclopedia and arguably the largest and most popular knowledge resource on the Internet, so its reliability and neutrality are of high importance. Previous research [3] revealed gender bias in Google search results for many professions and occupations. Wikipedia has also been criticized for gender bias in its biographies [4] and for a gender gap in its editor community [5, 6]. One could therefore expect gender bias related to professions and occupations to be present in Wikipedia as well. The term gender bias is used here in the sense of conscious or unconscious favoritism towards one gender over another [47] with respect to professions and occupations. The objective of this work is to identify and assess this bias. To this end, the German Wikipedia articles about professions and occupations were analyzed along three dimensions: redirections, images, and people mentioned in the articles. This work provides evidence for a systematic overrepresentation of men in all three dimensions; female bias is present for only a few professions.
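
To make the redirection dimension concrete, here is a minimal sketch of how male-female title pairs could be grouped by the direction of the redirect between them. It is an illustration only, not the code from the notebooks: the function `classify_pair`, the label strings, and the toy data are assumptions made for this example. Only the Journalistin to Journalist redirect is taken from the title of the paper cited below; the other titles are invented placeholders.

```python
from collections import Counter


def classify_pair(male, female, redirects):
    """Label a male/female title pair by the redirect between the two titles.

    `redirects` maps a page title to the title it redirects to. Titles that
    have their own article are simply absent from the mapping.
    """
    if redirects.get(female) == male:
        return "female redirects to male"
    if redirects.get(male) == female:
        return "male redirects to female"
    return "no redirect between pair"


# Toy input (hypothetical). Only the first pair reflects a redirect named in
# the paper title; "Beruf_B_*" and "Beruf_C_*" are made-up placeholders.
pairs = [
    ("Journalist", "Journalistin"),
    ("Beruf_B_m", "Beruf_B_f"),
    ("Beruf_C_m", "Beruf_C_f"),
]
redirects = {
    "Journalistin": "Journalist",
    "Beruf_B_m": "Beruf_B_f",
}

# Count how many pairs fall into each redirection group.
counts = Counter(classify_pair(m, f, redirects) for m, f in pairs)
for label, n in counts.items():
    print(f"{label}: {n}")
```

In practice the `redirects` mapping would have to be built from Wikipedia itself, for example from a dump or an API query; that step is left out here to keep the sketch self-contained.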

Supervised by: Claudia Wagner, Fabian Flöck

Further Reading


My slides

Slides (by Claudia Wagner)

Thesis on ArXiv

How to cite

Olga Zagovora, Fabian Flöck, and Claudia Wagner. 2017. "(Weitergeleitet von Journalistin)": The Gendered Presentation of Professions on Wikipedia. In Proceedings of the 2017 ACM on Web Science Conference (WebSci '17). ACM, New York, NY, USA, 83-92. DOI: https://doi.org/10.1145/3091478.3091488 Download preprint


Olga Zagovora olga.zagovora (at) gesis (dot) org


This work is licensed under the MIT license. See the LICENSE file in this repository.

Developed at the Computational Social Science department of GESIS - Leibniz Institute for the Social Sciences, Cologne (Germany) and the WeST Institute for Web Science and Technologies of the University of Koblenz-Landau, Koblenz (Germany).