ADSgenderclustering

It is commonly thought that the representation of minorities among mentors helps address the underrepresentation of minorities in the sciences. While this is a very reasonable assumption, it has not, to my knowledge, been quantified in the literature. One way to test it is to measure whether minorities tend to publish together. This code is designed to quantify the clustering of female authorship: if a woman is one of the main authors of a paper, are other women more likely to be among the first authors?

This code was created at the AAS2015 hackday.

As of Sept 2016, it is very much a work in progress.

I analyze 5000 articles extracted from ADS in January 2015.

Of those, I only consider papers with >= 3 authors in which the first names of all three lead authors can be read and identified with a gender at a confidence level of at least 75% (that is, the ratio of gender usage for that first name is >0.75 for either males or females).
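The selection cut described above can be sketched as follows. This is a minimal illustration, not the code in this repository: the `NAME_FEMALE_FRACTION` table, its values, and the helper names `classify` and `lead_genders` are all hypothetical.

```python
# Hypothetical name database: first name -> fraction of people with that
# name who are female. The values are made up for illustration only.
NAME_FEMALE_FRACTION = {
    "maria": 0.99,
    "anna": 0.98,
    "john": 0.01,
    "paul": 0.02,
    "alex": 0.40,
}

THRESHOLD = 0.75  # minimum gender-usage ratio, as stated in the text


def classify(first_name, db=NAME_FEMALE_FRACTION, threshold=THRESHOLD):
    """Return 'F', 'M', or None if the name is an initial, unknown, or ambiguous."""
    name = first_name.strip().rstrip(".").lower()
    if len(name) < 2 or name not in db:
        return None  # initials and unknown names cannot be classified
    frac_female = db[name]
    if frac_female > threshold:
        return "F"
    if 1.0 - frac_female > threshold:
        return "M"
    return None  # ambiguous name


def lead_genders(first_names, n_lead=3):
    """Return the genders of the first n_lead authors, or None if the paper is cut."""
    if len(first_names) < n_lead:
        return None  # fewer than three authors
    genders = [classify(name) for name in first_names[:n_lead]]
    return None if None in genders else genders
```

A paper is kept only when `lead_genders` returns a list; a single initial or ambiguous name among the first three authors removes it from the sample.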

From the original set of 5000 papers the final sample includes 1288 papers.

I check this with a Monte Carlo (MC) simulation.

The null hypothesis that female authors are distributed randomly among papers is strongly rejected by both the Kolmogorov-Smirnov (KS) and Anderson-Darling (AD) tests at p<0.003 (3-sigma)!
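One way to build such a null distribution is to shuffle the gender labels across papers while keeping the total number of women fixed. The sketch below assumes that approach. It does not reproduce the KS and AD tests used for the result above: it substitutes a simple empirical Monte Carlo p-value on a hypothetical statistic (the number of papers with zero or three female lead authors). All function names are illustrative.

```python
import random
from collections import Counter


def count_histogram(papers):
    """Number of papers with 0, 1, 2, and 3 female lead authors."""
    counts = Counter(sum(1 for g in paper if g == "F") for paper in papers)
    return [counts.get(k, 0) for k in range(4)]


def shuffled_papers(papers, rng):
    """Randomly reassign the observed gender labels to papers (three per paper)."""
    labels = [g for paper in papers for g in paper]
    rng.shuffle(labels)
    return [labels[i:i + 3] for i in range(0, len(labels), 3)]


def clustering_statistic(hist):
    """Hypothetical statistic: papers whose lead authors are all men or all women."""
    return hist[0] + hist[3]


def mc_p_value(papers, n_sim=1000, seed=0):
    """Fraction of shuffles at least as clustered as the observed sample."""
    rng = random.Random(seed)
    observed = clustering_statistic(count_histogram(papers))
    hits = 0
    for _ in range(n_sim):
        simulated = clustering_statistic(
            count_histogram(shuffled_papers(papers, rng))
        )
        if simulated >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_sim + 1)
```

Because the labels are shuffled rather than redrawn, the overall female fraction of the sample is preserved in every simulation, so only the way women are grouped into papers is tested.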

Notice that this result is obtained after counting female and male authors in the final set of papers, so this analysis does not address the underrepresentation of women in the field at all. We find that women are about 40% of the top 3 authors in our final sample.

Notice also that our sample is biased in many ways: for example, Asian names cannot be associated with a gender as easily as non-Asian names, and long author lists are likely to use initials, so large collaborations are removed from our sample. There are several other biases as well.

There is a significant excess of papers with no female lead authors and of papers with all three (100%) female lead authors, and a deficit of papers with a single female lead author, compared to a random gender distribution!
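For a rough sense of what a random gender distribution would look like, the expected number of papers with k female lead authors can be approximated with a binomial distribution. This is a back-of-the-envelope sketch, not the repository's calculation: it assumes a female fraction of exactly 0.4 (the text says "about 40%") and treats the three lead authors as independent draws. The function name is hypothetical.

```python
from math import comb


def expected_counts(n_papers, p_female, n_lead=3):
    """Expected number of papers with k = 0..n_lead female lead authors
    if each lead author is independently female with probability p_female."""
    return [
        n_papers * comb(n_lead, k) * p_female**k * (1 - p_female) ** (n_lead - k)
        for k in range(n_lead + 1)
    ]


# Assumed inputs: 1288 papers (from the text) and a female fraction of 0.4.
expected = expected_counts(1288, 0.4)
```

Under these assumptions, the expected counts are about 278, 556, 371, and 82 papers with zero, one, two, and three female lead authors, respectively. The excess and deficit described above are departures from a baseline of this kind.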


fedhere/ADSgenderclustering