# Name-gender coding

> A thin wrapper around the World Gender Name Dictionary 2.0 (WGND 2.0).

In [None]:
#| default_exp wgnd

In [None]:
#| hide
from nbdev.showdoc import *

In [None]:
#| export
import pandas as pd
import requests
from io import StringIO

In [None]:
#| export
#| hide

# This class downloads data from the WGND 2.0 site (Harvard Dataverse).
# It includes a method to look up a name in the downloaded data.
class wgnd:
    def __init__(self):
        print('Downloading data from WGND2.0')
        print('WGND 2.0 name-gender (_i.e._ No code) contains 3,491,141 unique name observations. \nThis file is based on WGND 2.0 name-gender-code but it omits all known conflicting names across sources, geography and gender.')
        print('\nRead about the project here: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/MSEGSJ')
        print('\nDataset citation: Raffo, Julio, 2021, "WGND 2.0", https://doi.org/10.7910/DVN/MSEGSJ, Harvard Dataverse, V1, UNF:6:5rI3h1mXzd6zkVhHurelLw== [fileUNF]')
        self.url = 'https://dataverse.harvard.edu/api/access/datafile/4750351'
        response = requests.get(self.url)
        response.raise_for_status()  # fail loudly if the download did not succeed
        self.names = pd.read_csv(StringIO(response.content.decode('utf-8')), sep='\t')
        print('Data downloaded')
    
    def get_gender(self, name):
        # Look up a name in the dataset's 'name' column and return the value from the 'gender' column.
        # The name is lowercased before matching. If it is not found, return 'unknown'.

        matches = self.names[self.names['name'] == name.lower()]
        if len(matches) == 0:
            return 'unknown'
        return matches.iloc[0]["gender"]

This is a thin wrapper around the [World Gender Name Dictionary 2.0 (WGND)](https://www.wipo.int/publications/en/details.jsp?id=4554). The WGND is a dataset of name-gender pairs. It was originally produced to help historians of science and intellectual property to measure "women’s contribution to all fields of innovation and creativity." The WGND 2.0 contains "26 million records linking given names and 195 different countries and territories."

This implementation is limited. It only includes name-gender pairs when there is no conflict between names across sources, geography, and gender. Put differently, this wrapper only reports the gendered valence of a name when the larger WGND 2.0 database contains no conflicting records for it. Any other name is reported as `unknown`.
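
The lookup itself is a case-insensitive exact match against the `name` column. The sketch below reproduces that logic on a small in-memory table so it can be tried without the download. The function name `lookup_gender` and the rows in `toy_names` are made up for illustration and are not part of the package or the dataset.

```python
import pandas as pd

# Hypothetical stand-in for the downloaded table (assumed columns: name, gender).
toy_names = pd.DataFrame({"name": ["simon", "dana"], "gender": ["M", "F"]})


def lookup_gender(names: pd.DataFrame, name: str) -> str:
    """Return the gender code for `name`, or 'unknown' if it is absent."""
    # Names are lowercased before comparison, as in wgnd.get_gender.
    matches = names[names["name"] == name.lower()]
    if matches.empty:
        return "unknown"
    # If several rows match, the first one wins.
    return matches.iloc[0]["gender"]


print(lookup_gender(toy_names, "Simon"))
print(lookup_gender(toy_names, "Vic"))
```

Because the match is exact, a name with surrounding whitespace or a compound name would fall through to `unknown` unless it is cleaned first.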

## Demonstration

### Initialize the program
```python
from obiter.wgnd import *
```

In [None]:
database = wgnd()

Downloading data from WGND2.0
WGND 2.0 name-gender (_i.e._ No code) contains 3,491,141 unique name observations. 
This file is based on WGND 2.0 name-gender-code but it omits all known conflicting names across sources, geography and gender.

Read about the project here: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/MSEGSJ

Dataset citation: Raffo, Julio, 2021, "WGND 2.0", https://doi.org/10.7910/DVN/MSEGSJ, Harvard Dataverse, V1, UNF:6:5rI3h1mXzd6zkVhHurelLw== [fileUNF]
Data downloaded


In [None]:
print(database.get_gender('Simon'))

M


In [None]:
print(database.get_gender('Dana'))

F


In [None]:
print(database.get_gender('Vic'))

unknown
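
Calling `get_gender` once per name filters the whole table each time, which may be slow for a long list of names. One possible alternative, sketched here under assumptions, is to build a name-to-gender mapping once and apply it to a whole pandas Series. The table `toy_names` and the variable names are hypothetical; with the real data, `database.names` would take the place of `toy_names`.

```python
import pandas as pd

# Hypothetical stand-in for database.names (assumed columns: name, gender).
toy_names = pd.DataFrame({"name": ["simon", "dana"], "gender": ["M", "F"]})

# Keep the first row per name, mirroring the .iloc[0] choice in get_gender.
gender_by_name = toy_names.drop_duplicates("name").set_index("name")["gender"]

# Code a whole column of names in one pass; unmatched names become 'unknown'.
people = pd.Series(["Simon", "Dana", "Vic"])
coded = people.str.lower().map(gender_by_name).fillna("unknown")
print(coded.tolist())
```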


In [None]:
#| hide
import nbdev; nbdev.nbdev_export()