# News stories in which gender plays a key role

### Some stories are based on knowing gender distribution. Here are some examples:

* **Washington Post**: Here’s how Hillary Clinton knows that 61 percent of her donors were women [<a href="https://www.washingtonpost.com/news/the-fix/wp/2015/07/16/heres-how-hillary-clinton-knows-that-61-percent-of-her-donors-were-women">Link</a>]

* **The Atlantic**: When Will the Gender Gap in Science Disappear? [<a href="https://www.theatlantic.com/science/archive/2018/04/when-will-the-gender-gap-in-science-disappear/558413/">Link</a>]

* **Bloomberg News**: Record Numbers of Women Running for Office [<a href="https://www.bloomberg.com/graphics/2018-women-candidates/">Link</a>]

* **The Guardian**: How we analysed 70m comments on the Guardian website [<a href="https://www.theguardian.com/technology/2016/apr/12/how-we-analysed-70m-comments-guardian-website">Link</a>]

For these pieces, gender was **estimated** based on each person's first name.

### An acknowledgment to our Non-Binary community

The Python library, ```Genderize```, is based on the theory that analyzing a first name can help estimate someone’s gender. But that really applies only in a binary world in which a name is either male or female. We **don’t** live in a binary world, and this approach risks erasing the identities of our non-binary community members.

Many non-binary people understand the systems of classification that dominate our world and choose names that are outside those systems. For example, American Artist changed their legal name in a way that makes them memorable yet indecipherable to an algorithm.

The reality is that one’s name does not determine one’s gender.

As journalists, we may find ourselves working on critical projects where we need to know gender identities. For example:

- How many refugees in a camp are male and how many are female? 
- What is the gender diversity of top executive leadership in an industry? 
- How much gender equality is there in tenured science professorships or in major prizes?

We could certainly ask each individual, but ```Genderize``` is currently the most effective way of approaching massive datasets of names to estimate gender.

## ```pip install Genderize``` (a library available for many programming languages)

In [None]:
!pip install Genderize

In [None]:
## import necessary libraries
import pandas as pd
from genderize import Genderize as gd
# from google.colab import files  ## to export our files to our computer drive

# ```gd().get(list_name)```

In [None]:
### requires a list


### From Genderize site:
The **probability** indicates the certainty of the assigned gender. Basically the ratio of male to females. 

The **count** represents the number of <a href="https://genderize.io/our-data">data rows examined</a> in order to calculate the response.
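
As a rough illustration of how these two fields might be read together, the sketch below works on a hand-written record shaped like the dictionaries the library returns (`name`, `gender`, `probability`, `count`). The values are invented placeholders, not real Genderize output, and `summarize` is a hypothetical helper rather than part of the library.

```python
# Placeholder record shaped like one item in the list returned by gd().get([...]).
# The numbers are invented for illustration; they are not real Genderize output.
sample = {"name": "Pat", "gender": "female", "probability": 0.6, "count": 1000}

def summarize(record):
    """Describe one record in plain words (hypothetical helper)."""
    # .get() avoids a KeyError if a field is missing for an unknown name
    probability = record.get("probability")
    count = record.get("count", 0)
    if record.get("gender") is None or probability is None:
        return f"{record['name']}: no estimate"
    return (f"{record['name']}: {record['gender']} "
            f"(probability {probability:.0%}, based on {count} records)")

print(summarize(sample))
```

A high probability built on a very small count deserves less trust than the same probability built on thousands of rows, so it helps to look at both numbers.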



In [None]:
## does not work on an individual name that is not in a list


In [None]:
## But you can make a single name into a list
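
One likely reason a bare string misbehaves is plain Python: a string is itself an iterable of characters, so code that loops over its input sees letters rather than one name. The sketch below shows only that Python behavior and the one-item-list fix; it makes no call to the service.

```python
name = "Sandeep"

# Iterating over a bare string yields one character at a time
as_characters = list(name)

# Wrapping the string in square brackets gives a one-item list instead
as_list = [name]

print(as_characters)
print(as_list)
```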




In [None]:
## you can now call the genderize method on the list


In [None]:
## Genderize works by analyzing first names and estimating their gender probability 
## run this cell
f_names = ['Rarin','Sandeep', 'Sahar', 'Yoshiko','Susan', 'Nabila','Pat', "Lupita"]
f_names

In [None]:
## run it on f_names


In [None]:
## We can pull out specific data by specifying the keys using a for loop
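
A minimal sketch of that loop, assuming the results arrive as a list of dictionaries with `name` and `gender` keys. The records here are hand-written placeholders with invented values, standing in for what `gd().get(f_names)` would return.

```python
# Placeholder records shaped like the list returned by gd().get(f_names).
# Values are invented for illustration only.
results = [
    {"name": "Susan", "gender": "female", "probability": 0.98, "count": 500},
    {"name": "Pat", "gender": "female", "probability": 0.6, "count": 1000},
]

pairs = []
for record in results:
    # Pull out only the keys we care about
    pairs.append((record["name"], record["gender"]))
    print(record["name"], record["gender"])
```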


In [None]:
## FUNCTION to get gender data from genderize
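
One possible shape for this function is sketched below. Passing the lookup in as an argument is an assumption of this sketch, made so the code can run offline; `fake_lookup` is a hypothetical stand-in, and in the notebook the real lookup would be `gd().get`.

```python
def gender_data(names, lookup):
    """Return the list of result dictionaries for a list of names.

    `lookup` is any callable that takes a list of names; in the notebook
    this would be gd().get. Passing it in is an assumption of this sketch.
    """
    if isinstance(names, str):
        # Guard against the single-string mistake by wrapping it in a list
        names = [names]
    return lookup(names)

def fake_lookup(names):
    """Offline stand-in that mimics the response shape with no estimates."""
    return [{"name": n, "gender": None, "probability": 0.0, "count": 0}
            for n in names]

print(gender_data("Sandeep", fake_lookup))
# With the real service: gender_data(["Sandeep"], gd().get)
```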


In [None]:
## test it on "Sandeep"


In [None]:
## now use the function in our for loop


## Apply to a Pandas dataframe

In [None]:
## COLAB ONLY
## upload Excel file names.xlsx
# files.upload()

In [None]:
## read the uploaded file into a pandas dataframe
## see the head


In [None]:
## see the tail


## What's the problem here?

In [None]:
## Split the first and last name into separate columns
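
A minimal sketch of the split, assuming the full names sit in a single column. The column name `name` and the rows are placeholders, since the real layout of `names.xlsx` is not shown here.

```python
import pandas as pd

# Placeholder rows; the column name "name" is an assumption about names.xlsx
df = pd.DataFrame({"name": ["Sandeep Example", "Pat Sample Person"]})

# n=1 splits on the first space only, so everything after it stays together
df[["first_name", "last_name"]] = df["name"].str.split(" ", n=1, expand=True)

print(df)
```

Splitting on the first space is a simplification: names with a title or a two-word first name would need extra handling.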


In [None]:
## reorder the columns


In [None]:
## function to take a string name, convert to list and return gender
## NOTICE it taps our earlier gender_data() function


In [None]:
## Test on "Sandeep"


In [None]:
## apply as a lambda expression on our dataframe
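
The sketch below shows one way the lambda could look. To keep it runnable offline it uses `fake_lookup`, a hypothetical stand-in for `gd().get` with invented answers, and `get_gender`, a hypothetical name for the string-to-gender function described above.

```python
import pandas as pd

# Invented placeholder answers so the sketch runs offline; not real output
FAKE_RESULTS = {"Sandeep": "male", "Pat": "female"}

def fake_lookup(names):
    """Stand-in for gd().get that mimics the shape of its return value."""
    return [{"name": n, "gender": FAKE_RESULTS.get(n)} for n in names]

def get_gender(name, lookup=fake_lookup):
    """Wrap one name in a list, look it up, and return only the gender."""
    return lookup([name])[0]["gender"]

df = pd.DataFrame({"first_name": ["Sandeep", "Pat"]})

# apply() runs the lambda once per value in the column
df["Gender"] = df["first_name"].apply(lambda name: get_gender(name))
print(df)
```

Because `apply()` triggers one lookup per row, a large dataframe means many separate requests when the real service is plugged in.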


## But we need a sense of the probability

In [None]:
## function to return probability
## NOTICE it taps our earlier gender_data() function


In [None]:
## test probability on "Sandeep"


In [None]:
## create new column called "Probability" in our df


In [None]:
### FUNCTION to return certainty
## NOTICE it taps our earlier gender_data() AND gender_probability() functions
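
One way to think about certainty is as a cutoff on the probability. The sketch below is an assumption about what such a function might do: the name `certainty_label`, the labels, and the 0.9 threshold are all hypothetical, and it takes a result dictionary directly so it can run without the service.

```python
def certainty_label(record, threshold=0.9):
    """Label one result as 'ok' or 'check' (hypothetical helper and cutoff)."""
    # Treat a missing or empty probability as zero
    probability = record.get("probability") or 0.0
    if record.get("gender") is None or probability < threshold:
        return "check"
    return "ok"

# Placeholder records with invented values
print(certainty_label({"name": "Susan", "gender": "female", "probability": 0.98}))
print(certainty_label({"name": "Pat", "gender": "female", "probability": 0.6}))
```

The right cutoff depends on the story: a stricter threshold sends more names to a manual check but lowers the risk of misgendering someone.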



In [None]:
## get gender on name "Pat"


In [None]:
## get probability on name "Pat"


In [None]:
## get gender certainty on name "Pat"


In [None]:
## create a column called "Certainty" in our df


## Slice flagged items for a manual check

In [None]:
## write pandas to create a slice
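
A minimal sketch of the slice using a boolean mask. The `Probability` and `Certainty` column names follow the cells above; the `first_name` column, the rows, and the `"check"` label are placeholders invented for illustration.

```python
import pandas as pd

# Placeholder dataframe; values and the "check" label are assumptions
df = pd.DataFrame({
    "first_name": ["Susan", "Pat"],
    "Probability": [0.98, 0.6],
    "Certainty": ["ok", "check"],
})

# A boolean mask keeps only rows flagged for a manual check
flagged = df[df["Certainty"] == "check"]
print(flagged)
```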
