Evolution of women's recognition over time using Wikipedia data on influential people

Abstract

Over the past century, women in many countries have gained more rights, and societies have been moving steadily towards greater equality. Our goal in this project is to highlight how social gender inequalities have evolved across different domains over time. We focus on the achievements of women and the recognition of their work in different fields, as far back as the data goes. We will use Wikipedia data to gather the number of women referenced, their contribution to their domain, and other parameters, and compare these with the same data for men. We will also examine whether the country of origin and the date at which women acquired rights have a significant impact.

Research questions

Can we reliably use the Wikipedia database to show gender inequalities through time?
Is there any evidence that equality between men and women has been reached?
In which domains is there more or less equality? Does this change by region, country, or language?
Since some countries granted women's rights later than others, is the evolution similar in terms of timeframe and extent?

Dataset

Wikipedia data: Wikimedia dumps (https://dumps.wikimedia.org/). The dumps are in XML, which can be parsed using existing tools. They can also be imported into SQL for easier querying. Some existing projects on GitHub use the same dataset and could guide our data analysis. We will filter the pages to extract a list of influential people, divided into categories corresponding to what they are famous for. We can then split the data by gender, nationality, and the period in which they lived to draw further conclusions.
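As a rough illustration of the parsing step described above, here is a minimal sketch that streams pages out of a MediaWiki XML export and pulls out their category links. The function names (`iter_pages`, `categories`) and the tiny `SAMPLE` document are assumptions made up for this example, not part of the project code; a real dump is compressed and far larger, which is why the sketch reads it incrementally.

```python
import io
import re
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) for every <page> element in a dump stream."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # release the page subtree; dumps do not fit in memory


def categories(wikitext):
    """Return the category names linked from a page's wikitext."""
    return [c.strip() for c in re.findall(r"\[\[Category:([^\]|]+)", wikitext)]


# Hypothetical two-page sample standing in for a real dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Marie Curie</title>
    <revision><text>[[Category:Women physicists]]</text></revision>
  </page>
  <page>
    <title>Niels Bohr</title>
    <revision><text>[[Category:Danish physicists]]</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
for title, text in pages:
    print(title, categories(text))
```

For an actual dump, the same generator could be fed a file object opened with `bz2.open(path, "rb")` instead of the in-memory sample.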

A list of internal milestones up until project milestone 2

11.11

Download the required data
Understand how to use the cluster to manipulate our data
Understand the structure of the data from wikipedia

13.11

Filter the data to keep only what is useful to us
Clean the data and convert it to an easily usable format
Define what parameters we will use to quantify gender equality
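
One candidate parameter, given purely as an illustration, is the share of women among the people referenced for each period. The sketch below computes it per decade of birth; the function name `female_share_by_decade`, the `(birth_year, gender)` record layout, and the sample records are all assumptions for this example and do not come from the project data.

```python
from collections import Counter


def female_share_by_decade(people):
    """Map each birth decade to the fraction of its entries that are women.

    `people` is an iterable of (birth_year, gender) pairs, where gender is
    a string such as "female" or "male" (assumed layout).
    """
    totals, women = Counter(), Counter()
    for birth_year, gender in people:
        decade = (birth_year // 10) * 10
        totals[decade] += 1
        if gender == "female":
            women[decade] += 1
    return {decade: women[decade] / totals[decade] for decade in sorted(totals)}


# Made-up records, only to show the shape of the input and output.
sample_people = [
    (1801, "male"), (1805, "female"), (1809, "male"),
    (1903, "female"), (1907, "female"),
    (1911, "male"),
]
shares = female_share_by_decade(sample_people)
print(shares)
```

A ratio like this is easy to compare across countries or fields, but it measures presence in the data rather than the extent of each person's contribution, so it would need to be combined with other parameters.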

20.11

Analyze the data collected
Think about the best visualisation for the data

23.11

Clean up the code and proofread the report.

A list of internal milestones up until project milestone 3

01.12

Extract all human entities from Wikidata with a JSON script
Clean this data to repeat the analysis
Start redoing the analysis in more depth with the new data
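
The extraction step above could look roughly like the following sketch, which is not the project's own parser. It assumes the Wikidata JSON dump layout (one entity per line inside a JSON array) and relies on the Wikidata identifiers P31 ("instance of"), Q5 ("human"), P21 ("sex or gender"), Q6581097 ("male") and Q6581072 ("female"). The helper names and the sample lines, including their placeholder entity ids, are assumptions for illustration.

```python
import json

HUMAN = "Q5"            # Wikidata item "human"
INSTANCE_OF = "P31"     # Wikidata property "instance of"
SEX_OR_GENDER = "P21"   # Wikidata property "sex or gender"
GENDER_LABELS = {"Q6581097": "male", "Q6581072": "female"}


def claim_ids(entity, prop):
    """Return the item ids that an entity's claims for `prop` point to."""
    ids = []
    for claim in entity.get("claims", {}).get(prop, []):
        value = claim.get("mainsnak", {}).get("datavalue", {}).get("value")
        if isinstance(value, dict) and "id" in value:
            ids.append(value["id"])
    return ids


def iter_humans(lines):
    """Yield (entity_id, gender) for each human found in dump lines."""
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue  # skip the array brackets wrapping the dump
        entity = json.loads(line)
        if HUMAN not in claim_ids(entity, INSTANCE_OF):
            continue
        genders = claim_ids(entity, SEX_OR_GENDER)
        gender = GENDER_LABELS.get(genders[0], "other") if genders else "unknown"
        yield entity["id"], gender


# Hypothetical, heavily trimmed dump lines; the entity ids are placeholders.
SAMPLE_DUMP = [
    "[",
    '{"id": "Q1", "claims": {"P31": [{"mainsnak": {"datavalue": {"value": {"id": "Q5"}}}}], '
    '"P21": [{"mainsnak": {"datavalue": {"value": {"id": "Q6581072"}}}}]}},',
    '{"id": "Q2", "claims": {"P31": [{"mainsnak": {"datavalue": {"value": {"id": "Q515"}}}}]}},',
    '{"id": "Q3", "claims": {"P31": [{"mainsnak": {"datavalue": {"value": {"id": "Q5"}}}}]}}',
    "]",
]

humans = list(iter_humans(SAMPLE_DUMP))
print(humans)
```

Reading line by line keeps memory use flat, since each entity is decoded and discarded before the next one is read.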

08.12

Finish analysis and clean the code
Start the report

16.12

Finish report

Contribution of each member and road map until presentation

Florian: first part, showing differences across cultures.
Emile: parsing the JSON and analysis across fields of work.
We will both work on the final poster before ML4.

emilebourban/ADA-Project

Folders and files

Latest commit

History

Repository files navigation

Evolution of women's recognition over time using Wikipedia data for influencing people

Abstract

Research questions

Dataset

A list of internal milestones up until project milestone 2

A list of internal milestones up until project milestone 3

Contribution of each member and road map until presentation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages