# 05. Genderize API

This notebook assigns a gender to the first and the last author of each article in the DataFrame, based on their first names. It uses the `genderize.io` API, which is free for up to 1000 names per day. The predicted gender and its probability are added to the dataset for both the first and the last author.

In [1]:
# Libraries Importation

import time
import requests
import pandas as pd

from tqdm.notebook import tqdm

import warnings
warnings.filterwarnings('ignore')

In [2]:
# Import the clean data

articles = pd.read_csv('../data/pubmed_articles_clean.csv')

In [3]:
articles.head(2)

| | DOI | title | authors | affiliations | journal | year | month | volume | first_page | last_page | PMID | PMCID | abstract | href | json_href | first_author | last_author |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10.1177/17585732221102399 | Surgical management of the spastic elbow | ['Weisang Luo 1 ', ' Matthew Nixon 1 2'] | ['Countess of Chester Hospital, Chester, UK.',... | Shoulder Elbow | 2023.0 | Oct | 15(5) | 534 | 543 | 37811394.0 | PMC10557929 | ['Background: We performed a retrospective rev... | https://doi.org/10.1177/17585732221102399 | https://api.crossref.org/works/10.1177/1758573... | Weisang Luo 1 | Matthew Nixon 1 2 |
| 1 | 10.1007/s11571-022-09871-6 | Three-dimensional memristive Morris-Lecar mode... | ['Han Bao 1 ', ' Xihong Yu 1 ', ' Quan Xu 1... | ["School of Microelectronics and Control Engin... | Cogn Neurodyn | 2023.0 | Aug | 17(4) | 1079 | 1092 | 37522038.0 | PMC10374513 | ['To characterize the magnetic induction flow ... | https://doi.org/10.1007/s11571-022-09871-6 | https://api.crossref.org/works/10.1007/s11571-... | Han Bao 1 | Bocheng Bao 1 |


In [4]:
articles.shape

(783, 17)

___

## genderize.io

This API predicts the gender most often associated with a given first name.

It is free, but limited to 1000 names per day; each name in a batched request counts towards that limit.
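Because the quota is small compared to the 783 articles per author position, it is worth watching how much of it is left. genderize.io documents rate-limit response headers; the sketch below assumes the header is named `X-Rate-Limit-Remaining` and only parses it from a mapping of headers (for example `res.headers` from `requests`), so it can be tested without calling the API.

```python
def remaining_quota(headers):
    """Return the remaining daily quota reported in the response headers, or None.

    Assumes genderize.io sends an `X-Rate-Limit-Remaining` header. `headers`
    can be any mapping; `requests.Response.headers` is case-insensitive,
    whereas a plain dict (as in the usage below) is not.
    """
    value = headers.get('X-Rate-Limit-Remaining')
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        # Header present but not an integer: treat as unknown
        return None


# Usage with plain dicts standing in for real response headers
print(remaining_quota({'X-Rate-Limit-Remaining': '217'}))  # 217
print(remaining_quota({}))  # None
```

Inside a request loop, a check such as `if remaining_quota(res.headers) == 0: break` would stop before the API starts rejecting requests.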


### Names can be queried one at a time or up to ten per request

    # One name per request (the form used in this notebook)
    url = f'https://api.genderize.io/?name[]={name}'

    # Up to ten names per request: repeat the name[] parameter
    url = 'https://api.genderize.io/?' + '&'.join(f'name[]={n}' for n in names)
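The batched form can be generated for a whole list of names. The sketch below splits the names into chunks of at most ten and percent-encodes each name with `urllib.parse.quote`; `batch_urls` is a hypothetical helper, not part of any library. As far as we know the daily quota is still counted per name, so batching mainly saves HTTP round trips.

```python
from urllib.parse import quote

BASE_URL = 'https://api.genderize.io/'


def batch_urls(names, batch_size=10):
    """Build one genderize.io URL per batch of at most `batch_size` names.

    Each name becomes a repeated `name[]=` parameter; `quote` percent-encodes
    characters that are not URL-safe (spaces, accented letters, ...).
    """
    urls = []
    for start in range(0, len(names), batch_size):
        batch = names[start:start + batch_size]
        query = '&'.join(f'name[]={quote(name)}' for name in batch)
        urls.append(f'{BASE_URL}?{query}')
    return urls


urls = batch_urls(['Weisang', 'Han', 'Euan', 'Quan', 'Akiyo'], batch_size=2)
print(len(urls))  # 3
print(urls[0])    # https://api.genderize.io/?name[]=Weisang&name[]=Han
```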
    

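### The response looks like this

For a single `name=` query the API returns one JSON object; with the `name[]=` syntax used below, it returns a list of such objects, one per name, which is why the code reads `res_api[0]`.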

    {
      "name": "peter",
      "gender": "male",
      "probability": 0.99,
      "count": 165452
    }
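Turning such a list response into the two columns kept in this notebook (gender and probability) takes one pass over the result dicts. A minimal sketch, assuming results come back in the order the names were sent; the second sample entry is illustrative, modeled on the unresolved name `Weisang` shown further down.

```python
def parse_results(results):
    """Split a genderize.io list response into parallel gender/probability lists.

    `results` is the decoded JSON of a `name[]=` query: a list with one dict
    per name. `gender` is None when the API has no data for a name.
    """
    genders = [item.get('gender') for item in results]
    probabilities = [item.get('probability', 0.0) for item in results]
    return genders, probabilities


# Illustrative response: the documented example plus an unknown name
sample = [
    {'name': 'peter', 'gender': 'male', 'probability': 0.99, 'count': 165452},
    {'name': 'weisang', 'gender': None, 'probability': 0.0, 'count': 0},
]
genders, probabilities = parse_results(sample)
print(genders)        # ['male', None]
print(probabilities)  # [0.99, 0.0]
```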

### Day 1: First Authors

In [5]:
# Keep only the first token (the given name) of each first author
fst_auth_name = [name.split()[0] for name in articles['first_author']]
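Author strings may carry leading spaces (the `authors` lists contain entries such as `' Matthew Nixon 1 2'`) or be missing, in which case pandas yields `NaN`. A slightly more defensive extraction is sketched below: `str.split()` without arguments ignores surrounding whitespace, whereas `split(' ')` would return an empty first token for a leading space. The helper name `first_name` is hypothetical.

```python
import math


def first_name(full_name):
    """Return the first whitespace-separated token of an author string.

    Missing values (None, or NaN as produced by pandas for empty CSV cells)
    and blank strings give None instead of raising an error.
    """
    if full_name is None or (isinstance(full_name, float) and math.isnan(full_name)):
        return None
    tokens = str(full_name).split()
    return tokens[0] if tokens else None


print(first_name(' Matthew Nixon 1 2'))  # Matthew
print(first_name('Weisang Luo 1'))       # Weisang
print(first_name(float('nan')))          # None
```

It would be applied as `[first_name(n) for n in articles['first_author']]`; `None` entries should then be skipped before querying the API.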

In [13]:
fst_auth_gd = []
fst_auth_gd_prb = []

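for name in tqdm(fst_auth_name):
    url = f'https://api.genderize.io/?name[]={name}'
    res = requests.get(url)
    res.raise_for_status()  # stop on HTTP errors, e.g. 429 once the daily quota is used up
    res_api = res.json()    # list with one result dict because of the name[] syntax
    fst_auth_gd.append(res_api[0]['gender'])  # None when the name is unknown
    fst_auth_gd_prb.append(res_api[0]['probability'])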
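The loop above sends one request per author, even when the same first name occurs several times. Querying each distinct name once and mapping the results back saves quota. In the sketch below, `lookup` stands for any callable returning `(gender, probability)` for one name, such as a small wrapper around the request in the loop; both `genderize_unique` and the fake lookup are hypothetical.

```python
def genderize_unique(names, lookup):
    """Query each distinct name once and map the results back to every row.

    `lookup` is any callable returning (gender, probability) for one name.
    The output lists keep the order (and length) of `names`.
    """
    cache = {}
    for name in names:
        if name not in cache:
            cache[name] = lookup(name)
    genders = [cache[name][0] for name in names]
    probabilities = [cache[name][1] for name in names]
    return genders, probabilities


# Usage with a fake lookup that records how often it is called
calls = []


def fake_lookup(name):
    calls.append(name)
    return ('male', 0.73) if name == 'Han' else (None, 0.0)


genders, probabilities = genderize_unique(['Han', 'Weisang', 'Han'], fake_lookup)
print(genders)     # ['male', None, 'male']
print(len(calls))  # 2
```

Passing the first and the last author names together would also avoid querying a name twice across the two days.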


In [14]:
genderize = {'_id': [],
             'fst_auth_name': [],
             'fst_auth_gd': [],
             'fst_auth_gd_prb': [],
             'lst_auth_name': [],
             'lst_auth_gd': [],
             'lst_auth_gd_prb': []}

In [17]:
genderize['_id'] = list(articles['DOI'])
genderize['fst_auth_name'] = fst_auth_name
genderize['fst_auth_gd'] = fst_auth_gd
genderize['fst_auth_gd_prb'] = fst_auth_gd_prb

In [25]:
columns = ['_id', 'fst_auth_name', 'fst_auth_gd', 'fst_auth_gd_prb']
genderize_df = pd.DataFrame(genderize, columns = columns)
genderize_df.head()

| | index | fst_auth_name | fst_auth_gd | fst_auth_gd_prb |
|---|---|---|---|---|
| 0 | 0 | Weisang | | 0.0 |
| 1 | 1 | Han | male | 0.73 |
| 2 | 2 | Euan | male | 1.0 |
| 3 | 3 | Quan | male | 0.9 |
| 4 | 4 | Akiyo | female | 0.91 |


In [14]:
genderize_df.to_csv('../data/genderize_df_fst.csv', index = False)

### Day 2: Last Authors

Reload `genderize_df_fst.csv` and complete it with the last authors.
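Splitting the work by author position across two days fits the daily quota. A more general alternative is a resumable lookup that stores every result in a cache file as soon as it arrives, so any run can stop when the quota is spent and continue later. The sketch below assumes a JSON cache file and a `lookup` callable returning `(gender, probability)`; `lookup_with_cache` and the file name are hypothetical, and a fake lookup replaces the API in the usage.

```python
import json
import os
import tempfile


def lookup_with_cache(names, lookup, cache_path, max_calls):
    """Resolve names through a JSON cache file, making at most `max_calls` new lookups.

    Every new result is written back to `cache_path` immediately, so a run that
    stops (daily quota reached, kernel restarted) can resume later.
    Returns the names that are still unresolved.
    """
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path, encoding='utf-8') as fh:
            cache = json.load(fh)
    calls = 0
    for name in names:
        if name in cache:
            continue  # already resolved on a previous run
        if calls >= max_calls:
            break  # budget for this run used up
        cache[name] = list(lookup(name))  # JSON has no tuples, store a list
        calls += 1
        with open(cache_path, 'w', encoding='utf-8') as fh:
            json.dump(cache, fh)
    return [name for name in names if name not in cache]


def fake_lookup(name):
    """Stand-in for a real API call (hypothetical)."""
    return (None, 0.0)


# Two "days" with a budget of two lookups each
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'genderize_cache.json')
    first_run = lookup_with_cache(['Han', 'Euan', 'Quan'], fake_lookup, path, max_calls=2)
    second_run = lookup_with_cache(['Han', 'Euan', 'Quan'], fake_lookup, path, max_calls=2)

print(first_run)   # ['Quan']
print(second_run)  # []
```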

In [3]:
genderize_df = pd.read_csv('../data/genderize_df_fst.csv')

In [4]:
genderize_df.head(2)

| | _id | fst_auth_name | fst_auth_gd | fst_auth_gd_prb |
|---|---|---|---|---|
| 0 | 10.1177/17585732221102399 | Weisang | | 0.0 |
| 1 | 10.1007/s11571-022-09871-6 | Han | male | 0.73 |


In [5]:
lst_auth_name = [name.split()[0] for name in articles['last_author']]

In [8]:
lst_auth_gd = []
lst_auth_gd_prb = []

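for name in tqdm(lst_auth_name):
    url = f'https://api.genderize.io/?name[]={name}'
    res = requests.get(url)
    res.raise_for_status()  # stop on HTTP errors, e.g. 429 once the daily quota is used up
    res_api = res.json()    # list with one result dict because of the name[] syntax
    lst_auth_gd.append(res_api[0]['gender'])  # None when the name is unknown
    lst_auth_gd_prb.append(res_api[0]['probability'])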


In [9]:
genderize_df['lst_auth_name'] = lst_auth_name
genderize_df['lst_auth_gd'] = lst_auth_gd
genderize_df['lst_auth_gd_prb'] = lst_auth_gd_prb
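These assignments rely on `articles` and the reloaded file listing the articles in exactly the same order. Joining on the DOI (`_id`) removes that assumption. The sketch below uses two toy frames with values taken from the outputs in this notebook; in practice the right-hand frame would be built from `articles['DOI']` and the three `lst_auth_*` lists.

```python
import pandas as pd

# Toy frames standing in for the Day 1 file and the Day 2 results
first = pd.DataFrame({'_id': ['10.1177/17585732221102399', '10.1007/s11571-022-09871-6'],
                      'fst_auth_name': ['Weisang', 'Han']})
last = pd.DataFrame({'_id': ['10.1007/s11571-022-09871-6', '10.1177/17585732221102399'],
                     'lst_auth_name': ['Bocheng', 'Matthew']})

# Join on the DOI instead of row order; validate raises a MergeError if an _id repeats
merged = first.merge(last, on='_id', how='left', validate='one_to_one')
print(merged[['fst_auth_name', 'lst_auth_name']].values.tolist())
# [['Weisang', 'Matthew'], ['Han', 'Bocheng']]
```

Even though `last` lists the articles in the opposite order, each first author ends up next to the correct last author.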

In [10]:
genderize_df.head(2)

| | _id | fst_auth_name | fst_auth_gd | fst_auth_gd_prb | lst_auth_name | lst_auth_gd | lst_auth_gd_prb |
|---|---|---|---|---|---|---|---|
| 0 | 10.1177/17585732221102399 | Weisang | | 0.0 | Matthew | male | 1.0 |
| 1 | 10.1007/s11571-022-09871-6 | Han | male | 0.73 | Bocheng | male | 0.83 |


In [12]:
genderize_df.to_csv('../data/genderize_df.csv', index = False)
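Before using the file downstream, it can help to check how many names stayed unresolved and how many predictions have a low probability. A minimal sketch on rows copied from the first-author output above; the 0.8 threshold is an arbitrary assumption, not a value from genderize.io.

```python
import pandas as pd

# Rows shaped like genderize_df, with values copied from the first-author output
df = pd.DataFrame({'fst_auth_gd': [None, 'male', 'male', 'male', 'female'],
                   'fst_auth_gd_prb': [0.0, 0.73, 1.0, 0.9, 0.91]})

# Count predictions per class; names the API could not resolve become 'unknown'
counts = df['fst_auth_gd'].fillna('unknown').value_counts()
print(counts.to_dict())

# Flag predictions below a probability threshold (0.8 is an arbitrary choice)
THRESHOLD = 0.8
low_confidence = df['fst_auth_gd_prb'] < THRESHOLD
print(int(low_confidence.sum()))  # 2
```

The same two lines apply to `lst_auth_gd` and `lst_auth_gd_prb`.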