emilebourban/ADA-Project

Evolution of women's recognition over time using Wikipedia data on influential people

Abstract

Over the past century, women in many countries have gained more rights, and society has moved steadily towards greater equality. This project aims to highlight how social gender inequalities have evolved across different domains over time. We focus on the achievements of women in different fields, and on the recognition their work has received, as far back as the data goes. Using Wikipedia data, we will gather the number of women referenced, their contributions to their domain, and other parameters, and compare them with the same data for men. We will also investigate whether the country of origin and the date at which women acquired these rights have a significant impact.

Research questions

  • Can the Wikipedia database be used to accurately show gender inequalities through time?
  • Is there any evidence that equality between men and women has been reached?
  • In which domains is there more or less equality? Does this change with region, country, or language?
  • Since some countries granted women's rights later than others, is the evolution similar in terms of timeframe and extent?

Dataset

Wikipedia data: Wikimedia dumps, https://dumps.wikimedia.org/. The dumps are in XML, which can be parsed with existing tools. They can also be imported into SQL for easier querying. Existing projects on GitHub already use the same dataset, and we could use them to guide our data analysis. We will filter the pages to extract a "list" of influential people, divided into categories corresponding to what they are famous for. We can then split the data by gender, nationality, and the period in which they lived to draw further conclusions.
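
As an illustration of the XML parsing step, here is a minimal sketch in Python that streams pages out of a MediaWiki XML export without loading the whole file into memory. It is not the project's actual pipeline: the helper names `local_name` and `iter_pages` are hypothetical, and the two sample pages are made up for the example.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) pairs from a MediaWiki XML export stream.

    `source` is a binary file-like object. Elements are processed as soon
    as they are fully parsed, so the dump is never held in memory at once.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # drop the page's children to limit memory use


# Made-up sample standing in for a real dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Person A</title>
    <revision><text>[[Category:Physicists]]</text></revision>
  </page>
  <page>
    <title>Person B</title>
    <revision><text>[[Category:Painters]]</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
for title, text in pages:
    print(title, text)
```

For a real compressed dump, the same generator could be fed a file object opened with `bz2.open(path, "rb")`, where `path` is wherever the dump was downloaded.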

A list of internal milestones up until project milestone 2

11.11

  • Download the required data
  • Understand how to use the cluster to manipulate our data
  • Understand the structure of the data from Wikipedia

13.11

  • Sort the data so as to keep only what is useful to us
  • Clean the data and convert it to an easily usable format
  • Define what parameters we will use to quantify gender equality
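
The parameters for quantifying gender equality are still to be defined. One candidate, sketched below purely as an assumption, is the share of women among referenced people per century of birth. The function name `female_share_by_century`, the `(gender, birth_year)` record layout, and the sample records are all hypothetical.

```python
from collections import defaultdict


def female_share_by_century(people):
    """Return {century_start_year: share of women} for the given records.

    `people` is an iterable of (gender, birth_year) pairs, where gender is
    'female' or 'male'. Records with another or missing gender, or with no
    birth year, are skipped; this is a simplification for the sketch.
    """
    counts = defaultdict(lambda: {"female": 0, "male": 0})
    for gender, birth_year in people:
        if gender not in ("female", "male") or birth_year is None:
            continue
        century = (birth_year // 100) * 100
        counts[century][gender] += 1
    return {
        century: c["female"] / (c["female"] + c["male"])
        for century, c in sorted(counts.items())
    }


# Made-up records, for illustration only.
sample = [
    ("female", 1867),
    ("male", 1879),
    ("male", 1856),
    ("female", 1815),
    ("male", 1791),
    (None, 1900),  # skipped: gender unknown
]
shares = female_share_by_century(sample)
print(shares)
```

A value of 0.5 for a century would mean women and men are equally represented among the people born in it; other parameters (for example, article length or number of links) could be aggregated the same way.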

20.11

  • Analyze the data collected
  • Think about the best visualisation for the data

23.11

  • Clean up the code and proofread the report

A list of internal milestones up until project milestone 3

01.12

  • Extract all human entities from Wikidata with a JSON script
  • Clean this data to repeat the analysis
  • Start redoing the analysis in more depth with the new data
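
The extraction of human entities could look roughly like the sketch below. It assumes the Wikidata JSON dump layout (a JSON array with one entity per line) and relies on the Wikidata identifiers P31 (instance of), Q5 (human), and P21 (sex or gender). The helpers `claim_ids`, `iter_humans`, and `make_entity`, and the entity IDs in the sample, are hypothetical; this is not the project's actual script.

```python
import json

HUMAN = "Q5"            # Wikidata item for "human"
INSTANCE_OF = "P31"     # Wikidata property "instance of"
SEX_OR_GENDER = "P21"   # Wikidata property "sex or gender"


def claim_ids(entity, prop):
    """Return the item IDs that an entity's claims give for a property."""
    ids = []
    for claim in entity.get("claims", {}).get(prop, []):
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # "novalue" / "somevalue" snaks carry no item
        value = snak.get("datavalue", {}).get("value")
        if isinstance(value, dict) and "id" in value:
            ids.append(value["id"])
    return ids


def iter_humans(lines):
    """Yield (entity_id, gender_ids) for every human in a dump.

    Assumes one JSON entity per line, each followed by a comma, with the
    whole file wrapped in '[' and ']' lines.
    """
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("[", "]", ""):
            continue
        entity = json.loads(line)
        if HUMAN in claim_ids(entity, INSTANCE_OF):
            yield entity["id"], claim_ids(entity, SEX_OR_GENDER)


def make_entity(qid, claims):
    """Build a made-up entity in the dump's shape (illustration only)."""
    return json.dumps({
        "id": qid,
        "claims": {
            prop: [{"mainsnak": {"snaktype": "value",
                                 "datavalue": {"value": {"id": item}}}}]
            for prop, item in claims.items()
        },
    })


# Q6581072 is "female" and Q6581097 is "male" on Wikidata; the entity IDs
# Q900001..Q900003 are placeholders, not real people.
sample_lines = [
    "[",
    make_entity("Q900001", {"P31": "Q5", "P21": "Q6581072"}) + ",",
    make_entity("Q900002", {"P31": "Q515"}) + ",",  # not a human
    make_entity("Q900003", {"P31": "Q5", "P21": "Q6581097"}),
    "]",
]
humans = list(iter_humans(sample_lines))
print(humans)
```

Reading line by line keeps memory use flat, which matters because the full dump is far too large to load with a single `json.load` call.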

08.12

  • Finish the analysis and clean the code
  • Start the report

16.12

  • Finish report

Contribution of each member and road map until presentation

Florian: first part, showing differences across cultures. Emile: parsing the JSON and analysis across fields of work. We will both work on the final poster before ML4.
