# Parsing Pubmed XML file

Data sets can be found [here](https://github.com/kescobo/gender-comp-bio/tree/master/data).

## Goals

Raw data from pubmed is contained in xml files, and we'd like to extract author and date information into a spreadsheet for easier analysis. 

The first thing to do is to make sure to set the working directory to where the data is.

In [1]:
import os

print(os.getcwd())

/Users/KBLaptop/computation/gender-comp-bio/src


In [2]:
os.chdir("../data/")
os.listdir()

['.DS_Store',
 'author_list.csv',
 'biology-1997-2014.xml',
 'comp-bio-1997-2014.xml',
 'github_pubs.xml',
 'README.md']

Next, we'll need to parse the xml files. We can do this using the built-in [python xml module](https://docs.python.org/3.5/library/xml.etree.elementtree.html). 

In [168]:
import xml.etree.ElementTree as ET
import datetime

xml_handle = ET.parse('github_pubs.xml')
root = xml_handle.getroot()

for citation in root.iter("MedlineCitation"):
    pmid = citation[0].text
    pubdate = datetime.date(
        int(citation[1][0].text),  # year
        int(citation[1][1].text),  # month
        int(citation[1][2].text)  # day
        )
    
    Journal = next(citation.iter("Journal"))

    journal_title = Journal.find("ISOAbbreviation").text
    
    abstract = next(citation.iter("AbstractText")).text
    
    # some articles don't have author fields - ignoring those
    try:
        authors = [{
                "Last": author.find("LastName").text,
                "First": author.find("ForeName").text
                   } for author in Citation.iter("Author")]
    except:
        continue
    
    print("PMID: {}\nJournal: {}\nAuthors: {}\n".format(
        pmid, journal_title, [(x['Last'], x['First']) for x in authors]))

PMID: 26357045
Journal: IEEE/ACM Trans Comput Biol Bioinform
Authors: [('Piipari', 'Matias'), ('Down', 'Thomas A'), ('Saini', 'Harpreet'), ('Enright', 'Anton'), ('Hubbard', 'Tim J P')]

PMID: 25601296
Journal: JMIR Med Inform
Authors: [('Piipari', 'Matias'), ('Down', 'Thomas A'), ('Saini', 'Harpreet'), ('Enright', 'Anton'), ('Hubbard', 'Tim J P')]

PMID: 25558360
Journal: Ecol Evol
Authors: [('Piipari', 'Matias'), ('Down', 'Thomas A'), ('Saini', 'Harpreet'), ('Enright', 'Anton'), ('Hubbard', 'Tim J P')]

PMID: 25553811
Journal: J Bioinform Comput Biol
Authors: [('Piipari', 'Matias'), ('Down', 'Thomas A'), ('Saini', 'Harpreet'), ('Enright', 'Anton'), ('Hubbard', 'Tim J P')]

PMID: 25549775
Journal: Ann Biomed Eng
Authors: [('Piipari', 'Matias'), ('Down', 'Thomas A'), ('Saini', 'Harpreet'), ('Enright', 'Anton'), ('Hubbard', 'Tim J P')]

PMID: 25543048
Journal: Bioinformatics
Authors: [('Piipari', 'Matias'), ('Down', 'Thomas A'), ('Saini', 'Harpreet'), ('Enright', 'Anton'), ('Hubbard', 'T

### Class Definition

Just because I need the practice, I'm going to set up an `Article` class to hold the data and make working with it easier, and an `Author` class that we can use to deal with author names

In [42]:
class Article(object):
    """Container for publication info"""
    def __init__(self, pmid, pubdate, journal, title, abstract, authors):
        self.pmid = pmid
        self.pubdate = pubdate
        self.journal = journal
        self.title = title
        self.abstract = abstract
        self.authors = authors
    def __repr__(self):
        return "<Article PMID: {}>".format(self.pmid)

    def get_authors(self):
        for author in self.authors:
            yield author

class Author(object):
    def __init__(self, last_name, first_name):
        assert type(last_name) == str
        assert type(first_name) == str
        
        self.last_name = last_name
        self.first_name = first_name.split()[0]
        try:
            self.initials = " ".join(first_name.split()[1:])
        except IndexError:
            self.initials = None

### Generator Function

And... we can turn the code above into a generator function that yields an `Article` for each document

In [43]:
def parse_pubmed_xml(xml_file):
    xml_handle = ET.parse(xml_file)
    root = xml_handle.getroot()

    for citation in root.iter("MedlineCitation"):
        pmid = citation[0].text
        pubdate = datetime.date(
            int(citation[1][0].text),  # year
            int(citation[1][1].text),  # month
            int(citation[1][2].text)  # day
            )
        
        journal = next(citation.iter("Journal"))

        journal_title = journal.find("ISOAbbreviation").text
        article_title = next(citation.iter("ArticleTitle")).text
        
        abstract = next(citation.iter("AbstractText")).text
        try:
            authors = [Author(author.find("LastName").text, author.find("ForeName").text)
                for author in citation.iter("Author")]
        except:
           continue
        
        yield Article(pmid, pubdate, journal_title, article_title, abstract, authors)

Usage:

In [44]:
for article in parse_pubmed_xml('github_pubs.xml'):
    print(article)
    print(article.pubdate)
    for author in article.get_authors():
        print("{}, {} {}".format(author.last_name, author.first_name, author.initials))
    print()

<Article PMID: 26357045>
2015-09-11
Shiraishi, Fumihide 
Yoshida, Erika 
Voit, Eberhard O

<Article PMID: 25601296>
2015-01-20
Dixit, Abhishek 
Dobson, Richard J B

<Article PMID: 25558360>
2015-01-05
Tuck, Sean L
Phillips, Helen Rp
Hintzen, Rogier E
Scharlemann, Jörn Pw
Purvis, Andy 
Hudson, Lawrence N

<Article PMID: 25553811>
2015-01-02
Chen, Junfang 
Lutsik, Pavlo 
Akulenko, Ruslan 
Walter, Jörn 
Helms, Volkhard 

<Article PMID: 25549775>
2015-06-09
Manini, Simone 
Antiga, Luca 
Botti, Lorenzo 
Remuzzi, Andrea 

<Article PMID: 25543048>
2015-04-28
Bouvier, Guillaume 
Desdouits, Nathan 
Ferber, Mathias 
Blondel, Arnaud 
Nilges, Michael 

<Article PMID: 25540185>
2015-04-28
Meinicke, Peter 

<Article PMID: 25527832>
2015-04-12
Lindberg, Michael R
Hall, Ira M
Quinlan, Aaron R

<Article PMID: 25526884>
2015-04-27
Barton, Carl 
Heliou, Alice 
Mouchard, Laurent 
Pissis, Solon P

<Article PMID: 25524895>
2015-04-28
Mu, John C
Mohiyuddin, Marghoob 
Li, Jian 
Bani Asadi, Narges 
Gerstein, M

### Getting Author Order

Author position matters, but it matters in sort of a weird way - first author and last author are most important, then decreasing as you work your way in to the middle of the list. But practically, there's not much distinction between 3rd and 4th author (or 3rd from last and 4th from last), so we'll generate scores for first, second, last, penultimate and everyone else. The trick is to avoid index errors if the author list is smaller than 5, so we need to write up some special cases. 

In [57]:
def score_authors(author_list):
    first = author_list[0]
    others, penultimate, second, last = None, None, None, None
    
    list_length = len(author_list)
    if list_length > 4:
        others = [author for author in author_list[2:-2]]
    if list_length > 3:
        penultimate = author_list[-2]
    if list_length > 2:
        second = author_list[1]
    if list_length > 1:
        last = author_list[-1]
        

    return first, last, second, penultimate, others

### DataFrame generation

In order to get the data into a usable spreadsheet-like form, and for later analysis, I'm going to use the `DataFrame`s from the [pandas](http://pandas.pydata.org/) package. This might be overkill, but I know how to use it (sort of). 

In [62]:
import pandas as pd

col_names = ["Date", "Journal", "First Author", "Last Author", "Second Author", "Penultimate Author", "Other Authors"]

df = pd.DataFrame()

for article in parse_pubmed_xml('github_pubs.xml'):
    first, last, second, penultimate, others = score_authors(article.authors)
    first = first.first_name
    try:
        last = last.first_name
    except:
        pass
    try:
        second = second.first_name
    except:
        pass
    try:
        penultimate = penultimate.first_name
    except:
        pass
    try:
        others = [x.first_name for x in others]
    except:
        pass
    
    row = pd.Series([article.pubdate, article.journal, first, last, second, penultimate, others],
                    name=article.pmid, index=col_names)
    df = df.append(row)

print(df)

                Date First Author                               Journal  \
26357045  2015-09-11     Fumihide  IEEE/ACM Trans Comput Biol Bioinform   
25601296  2015-01-20     Abhishek                       JMIR Med Inform   
25558360  2015-01-05         Sean                             Ecol Evol   
25553811  2015-01-02      Junfang               J Bioinform Comput Biol   
25549775  2015-06-09       Simone                        Ann Biomed Eng   
25543048  2015-04-28    Guillaume                        Bioinformatics   
25540185  2015-04-28        Peter                        Bioinformatics   
25527832  2015-04-12      Michael                        Bioinformatics   
25526884  2015-04-27         Carl                    BMC Bioinformatics   
25524895  2015-04-28         John                        Bioinformatics   
25521965  2014-12-19     Chengkun                      BMC Med Genomics   
25520192  2015-01-24         Paul                    Nucleic Acids Res.   
25514851  2015-02-06     

## Getting Genders - Prep
Now the tough part - getting genders. 

I played around trying to get `sexmachine` and `GenderComputer` to work, but ran into some issues, and those projects don't seem like they're being maintained, so I thought i'd try [genderize.io](http://genderize.io). The trouble is this is a web api, which takes more time than something run locally, and they have a limit to the number of requests you can make. Since there are probably a lot of duplicate names, I thought it might be worth collapsing the names into a set.

In [75]:
unique_names = set([name for name in df["First Author"]])
unique_names = unique_names.union(set(name for name in df["Last Author"]))
unique_names = unique_names.union(set(name for name in df["Second Author"]))
unique_names = unique_names.union(set(name for name in df["Penultimate Author"]))

for names in df["Other Authors"]:
    if names:
        unique_names.union([name for name in names])
        
print(len(unique_names))

790


So now... let's check the other, larger datasets:

In [81]:
print(len(unique_names))
print(unique_names)

823
{'Dingcheng', 'Abraham', 'Zhe', 'Brian', 'Elior', 'Matthias', 'Yusuke', 'Melissa', 'Whitney', 'Kazuki', 'Aaron', 'Bruno', 'Ralf', 'Preeti', 'Chen', 'Daniela', 'Tracy', 'Bogdan', 'Jason', 'Haibao', 'Jonathan', 'Santiago', 'Xiao', 'Charles', 'Kay', 'Kerstin', 'Mathieu', 'Antoine', 'Graham', 'Zhandong', 'Harvey', 'Mihai', 'Gareth', 'Stuart', 'Elena', 'O', 'Eamonn', 'Punita', 'Nicolas', 'Marie', 'Rob', 'Yinan', 'Christina', 'Jiantao', 'Tālis', 'Raoul', 'Alexey', 'Sergey', 'Ilinca', 'Jon', 'Yungang', 'v', 'Clare', 'Ole', 'Chengkun', 'Guy', 'Deb', 'Shintaro', 'Steve', 'Johann', 'Eduard', 'Ryan', "Jun'ichi", 'Zhi-Min', 'Manfred', 'Theo', 'Marnix', 'd', 'Nur', 'Luminita', 'Brad', 'Alvis', 'Thilo', 'Pekka', 'Kristoffer', 'Susan', 'Bianca', 'Julio', 'Chase', 'Yotsawat', 'Jeffrey', 'Bret', 'Koh-Ichiro', 'Hong', 'Jingde', 'Søren', 'Sarven', 'Avigail', 'Lilian', 'Yuanfeng', 'Pierre-Yves', 'Gergely', 'Giacomo', 'Dongming', 'Wei', 'Yadong', 'Zalman', 'Bobbie-Jo', 'Luis', 'Bradley', 'Devdatt', 'Ad

It took a while to parse all of the biology publications, so I'm just going to save the output from the last cell... shockingly there are only 823 names total. Also, a few names came out as single characters, which I'm going to remove.

In [88]:
save_unique_names = {'Dingcheng', 'Abraham', 'Zhe', 'Brian', 'Elior', 'Matthias', 'Yusuke', 'Melissa', 'Whitney', 'Kazuki', 'Aaron', 'Bruno', 'Ralf', 'Preeti', 'Chen', 'Daniela', 'Tracy', 'Bogdan', 'Jason', 'Haibao', 'Jonathan', 'Santiago', 'Xiao', 'Charles', 'Kay', 'Kerstin', 'Mathieu', 'Antoine', 'Graham', 'Zhandong', 'Harvey', 'Mihai', 'Gareth', 'Stuart', 'Elena', 'O', 'Eamonn', 'Punita', 'Nicolas', 'Marie', 'Rob', 'Yinan', 'Christina', 'Jiantao', 'Tālis', 'Raoul', 'Alexey', 'Sergey', 'Ilinca', 'Jon', 'Yungang', 'v', 'Clare', 'Ole', 'Chengkun', 'Guy', 'Deb', 'Shintaro', 'Steve', 'Johann', 'Eduard', 'Ryan', "Jun'ichi", 'Zhi-Min', 'Manfred', 'Theo', 'Marnix', 'd', 'Nur', 'Luminita', 'Brad', 'Alvis', 'Thilo', 'Pekka', 'Kristoffer', 'Susan', 'Bianca', 'Julio', 'Chase', 'Yotsawat', 'Jeffrey', 'Bret', 'Koh-Ichiro', 'Hong', 'Jingde', 'Søren', 'Sarven', 'Avigail', 'Lilian', 'Yuanfeng', 'Pierre-Yves', 'Gergely', 'Giacomo', 'Dongming', 'Wei', 'Yadong', 'Zalman', 'Bobbie-Jo', 'Luis', 'Bradley', 'Devdatt', 'Adrian', 'Hongjie', 'Eldon', 'Michal', 'Milad', 'Hamid', 'Hector', 'Rongxia', 'Gos', 'Carlo', 'Stefan', 'Julien', 'Kiran', 'Hákon', 'Vinhthuy', 'Kathryn', 'Steffen', 'Min', 'Chao', 'Sophia', 'Fatemeh', 'Damiano', 'Tanja', 'Ladislav', 'A', 'Joseph', 'Fiona', 'Anaïs', 'n', 'Ravindra', 'Kishore', 'Hans', 'Björn', 'Elisabet', 'Gang', 'Gregory', 'Hongfang', 'Wing', 'Eduardo', 'Arndt', 'Bing', 'h', 'Fritz', 'Pontus', 'Carlos', 'Nancy', 'Jaap', 'Aron', 'Mehdi', 'Tobias', 'R', 'Matthew', 'Adedapo', 'Helge', 'Debora', 'Marten', 'Neda', 'Bernd', 'Ola', 'Ludovic', 'Minxian', 'Filip', 'Richard', 'Israel', 'Hua-Lin', 'Alison', 'Patrick', 'k', 'Alexander', 'Goran', 'Ira', 'Fabian', 'Donovan', 'Jie', 'Yu', 'Ulrich', 'Sonia', 'Tatyana', 'Kristina', 'Mila', 'Ruping', 'Waraluk', 'Aristotelis', 'Zhong', 'Naoki', 'Mingwen', 'Lei', 'Maryam', 'Soile', 'Russ', 'Nansheng', 'Ahmed', 'Francesca', 'Javier', 'Hilmar', 'Reda', 'Brendan', 'Jay', 'Nathaniel', 'Nicola', 'Arlin', 'Douglas', 'Mikhail', 'Pavlo', 'P', 'Vicente', 'Guillaume', 'Jikai', 'Connor', 'Hiroshi', 'M', 'Sébastien', 'Denise', 'w', 'Harald', 'Gerard', 'V', 'Yanbo', 'Konrad', 'Serafim', 'Hongseok', 'r', 'Sebastian', 'Weifan', 'Anthony', 'Raquel', 'Tallulah', 'Dongjun', 'Basuthkar', 'Kevin', 'Qing', 'Raphael', 'Ugis', 'Mirco', 'Karissa', 'Armin', 'Görel', 'Kazuharu', 'Christian', 'Alexis', 'Niko', 'Erik', 'Maxim', 'Rajiv', 'Zhixing', 'Mikko', 'Neil', 'Jotun', 'Derek', 'Paula', 'Arran', 'Kranti', 'Harriet', 'Alexei', 'Jiang', 'Jean-Marc', 'Kenichi', 'Ramil', 'Gabriel', 'Maria-Jesus', 'Harish', 'Tom', 'Vincent', 'Laura', 'Jan-Ming', 'Julia', 'Aryan', 'Bortolo', 'Fumihide', 'Nelson', 'David', 'Seyed', 'Ying-Wooi', 'Andre', 'Mario', 'Sampo', 'Joel', 'Eberhard', 'Hai', 'José', 'Travers', 'Mohammad', 'Lea', 'Volkhard', 'Sirko', 'Murat', 'Zheng', 'Sanghyuk', 'Ian', 'Michel', 'Sean', 'Andrea', 'Man', 'Hongyu', 'Iztok', 'Andrey', 'u', 'William', 'Sam', 'Philip', 'Steven', 'Eriko', 'Katerina', 'Burkhard', 'Jared', 'Alyssa', 'Marcus', 'Jacob', 'Bernardo', 'Hugo', 'Dennis', 'Giles', 'J', 'p', 'Lorenzo', 'Michelle', 'Ignas', 'H', 'Sahand', 'Ilya', 'Natalja', 'Marcelo', 'Michael', 'Anders', 'Rakesh', 'Marcin', 'Janez', 'Thomas', 'W', 'Niclas', 'Theodore', 'Lisa', 'Abhinav', 'Jose', 'Jeet', 'Antti', 'Xavier', 'Scott', 'Kyowon', 'g', 'Jianxin', 'Shilin', 'Ulf', 'Rutger', 'Elana', 'Tony', 'Xiaowu', 'Daniel', 'Chiranjib', 'Canh', 'Bojan', 'Hailiang', 'Gustavo', 'Mengjun', 'Abhaya', 'Tim', 'Leonardo', 'Eitan', 'Yi', 'Benjamin', 'Rebecca', 'Francisco', 'Juho', 'Yongchao', 'Wubin', 'Florence', 'Jean-Philippe', 'Frederick', 'Trevor', 'Chuen-Liang', 'Berthold', 'Hari', 'Christoph', 'Ngan', 'Wojciech', 'Simone', 'Wen', 'Sandeep', 'Vassily', 'Owen', 'Oksana', 'Sangwoo', "Ya'ara", 'Quanhu', 'Jiali', 'Tomasz', 'Tomi', 'Chenggang', 'Carl', 'Marjo', 'm', 'Kai', 'Jarl', 'Qingpeng', 'Ajasja', 'Nicole', 'Quaid', 'Ernest', 'Stephen', 'Xuebin', 'Adam', 'Till', 'Ence', 'Markus', 'Nils', 'Tiejun', 'Tamara', 'Bosco', 'Valentin', 'Jillian', 'Adrià', 'Giovanna', 'Todd', 'Phillip', 'Luca', 'Koji', 'Hamidreza', 'Sten', 'Zhengdeng', 'Justin', 'Yi-Bao', 'Stefano', 'Nicoló', 'Allan', 'Xiaohui', 'Victoria', 'Marinka', 'Shibu', 'Andy', 'Bjarni', 'Ueli', 'Bryce', 'Eske', 'Pelin', 'Evan', 'Marco', 'Lior', 'Dan', 'Heiko', 'Miroslav', 'Adnane', 'Alessio', 'Arne', 'Colby', 'Perry', 'Ana', 'Weidong', 'Erika', 's', 'Eran', 'Rong', 'Fabien', 'René', 'Adina', 'Nathan', 'Yardena', 'Bei', 'Ben', 'Zsuzsanna', 'Julian', 'Edmund', 'Alice', 'Can', 'Joakim', 'Mick', 'Glenn', 'Didier', 'Konstantin', 'Yan', 'John', 'Terry', 'Lawrence', 'Poul', 'Bernhard', 'Damon', 'Spencer', 'Barbara', 'Cristin', 'B', 'Mannis', 'Jai', 'Roderic', 'Yanli', 'Harold', 'Pablo', 'Morris', 'Irmgard', 'Jeroen', 'Bo', 'Alistair', 'Ludwig', 'Gianluigi', 'Ingemar', 'Mengyao', 'Peter', 'Hiroyuki', 'Roger', 'Colin', 'Quang', 'Jun', 'Li', 'Radek', 'Marc', 'Zhiping', 'Andreas', 'Jessica', 'Sohan', 'Maximilian', 'Qiyun', 'Raymond', 'Katrina', 'Vinay', 'Balaji', 'Oliver', 'Maria', 'Jordan', 'Hilary', 'Ioannis', 'Kasper', 'Gilles', 'e', 'Julie', 'Nikhil', 'i', 'Nick', 'Ross', 'Daryl', 'Yanni', 'Lasse', 'Jens', 'Emily', 'Robert', 'Kim', 'Mehmet', 'Jan', 'Marghoob', 'Virpi', 'N', 'Werner', 'Sashank', 'Benedict', 'Kaname', 'Gary', 'Lifang', 'Joaquin', 'Bastien', 'Winfried', 'Esa', 'Lars', 'Sarah', 'Mayumi', 'Noah', 'Arnaud', 'o', 'Trisha', 'Chi', 'Joshua', 'Yuichi', 'Chien-Chih', 'Son', 'Janna', 'G', 'Clemens', 'Zasha', 'Helen', 'Kensaku', 'Nam', 'Caleb', 'Kyungmin', 'Bastian', 'Seiya', 'Vassilios', 'Chaitanya', 'Fred', 'Martin', 'Yannick', 'Guo-Cheng', 'Dimitrios-Georgios', 'Roman', 'Volodymyr', 'Denis', 'Mostafa', 'Lana', 'Edda', 'Yong-Huan', 'Nobal', 'Akash', 'Morgan', 'Luke', 'Franck', 'Colbert', 'Naama', 'Jeremy', 'Hyatt', 'Davey', 'Rita', 'Jeff', 'Sujai', 'Larisa', 'Lynda', 'Alexandre', 'Abhishek', 'Rasmus', 'Juliane', 'Dmitri', 'Pavel', 'Andrei', 'L', 'Juha', 'Xin', 'Yang', 'Dorota', 'Solon', 'Estela', 'Ruchi', 'Brent', 'T', 'Giosuè', 'Joaquín', 'Rodrigo', 'Satoru', 'Allen', 'Aki', 'Henna', 'Ashwin', 'Mattias', 'Masao', 'Tamil', 'Philippe', 'Lincoln', 'I', 'Pedro', 'Omar', 'See-Kiong', 'Henry', 'Nyasha', 'D', 'Katharina', 'Enis', 'Piero', 'Sven', 'James', 'Jagir', 'Roderick', 'Henning', 'Ignacio', 'Masaru', 'Junfang', 'Murray', 'Bertrand', 'Mitchell', 'Sophie', 'Matteo', 'Angel', 'Sadique', 'K', 'Sitanshu', 'Alex', 'Elsa', 'Hugh', 'Robyn', 'Grzegorz', 'Kunyaluk', 'Augustin', 'Fernando', 'Anne', 'Nuria', 'Anna', 'Vijay', 'Sikander', 'Naoto', 'Ayton', 'Laurent', 'Andriy', 'Simon', 'Seishi', 'Hui', 'Lin', 'Rafael', 'Bastiaan', 'Layla', 'Mark', 'Manuel', 'Hisham', 'Aurélien', 'Noam', 'Chris', 'Brett', 'Beifang', 'Sol', 'Macha', 'Lusine', 'Yves', 'Nolan', 'Dilan', 'Philipp', 'Osamu', 'Anna-Sapfo', 'Ramamurthy', 'Johannes', 'Yu-Jung', 'Carole', 'Miguel', 'Igor', 'Fabio', 'Tyler', 'Heng', 'Piotr', 't', 'C', 'Vineet', 'Filipe', 'S', 'Anton', 'Greg', 'Samuel', 'Liyang', 'Maarten', 'Matias', 'Louis', 'Zhi', 'Moustapha', 'Shankar', 'Hongjun', 'Lukas', 'Leopold', 'Koh-ichiro', 'Sofie', 'Alessandro', 'Georg', 'Andrew', 'Lisle', 'Aristotle', 'Marta', 'Gabor', 'Sergei', 'Xiao-Li', 'Hitesh', 'Yongkyu', 'Carson', None, 'Dongqing', 'Rainer', 'Susanna-Assunta', 'Christopher', 'Wolfgang', 'Xiahan', 'Gernot', 'Joost', 'Kurt', 'Come', 'Hidetoshi', 'Aidan', 'Ergude', 'Tao', 'Josh', 'Guido', 'Ali', 'Janusz', 'Daan', 'Lutz', 'Silvio', 'Fei-Yang', 'Stephan', 'Lakshmi', 'Alexandra', 'Hayssam', 'Hyeshik', 'Kyle', 'Jennifer', 'Natalie', 'Virginie', 'Emmanuel', 'Gunnar', 'Leyla', 'Jonas', 'Nuwan', 'Jordi', 'c', 'Dinesh', 'Stefanie', 'Paul', 'Wan-Ping', 'Ivana', 'Eric', 'Konstantinos', 'Gerd', 'Nicholas', 'F', 'Francis', 'Arturo', 'Pratap', 'Pierre', 'Susana', 'Ingolfur', 'Jörn', 'Han', 'Blaž', 'Insu', 'Felix', 'Frank', 'Alan', 'Cali', 'Florian', 'Claire', 'Ezequiel', 'Goutham', 'Tzung-Fu', 'Youri', 'Yelena', 'Adva', 'Jamison', 'Dietrich', 'Yurong', 'Olivier', '-', 'Alejandra', 'a', 'Aureliano', 'Emre', 'Timothy', 'Alexandros'}

In [92]:
fixed_names = []
for name in save_unique_names:
    if name:
        if len(name) > 1:
            fixed_names.append(name)
fixed_names.sort()
print(len(fixed_names))
print(fixed_names)

783
['Aaron', 'Abhaya', 'Abhinav', 'Abhishek', 'Abraham', 'Adam', 'Adedapo', 'Adina', 'Adnane', 'Adrian', 'Adrià', 'Adva', 'Ahmed', 'Aidan', 'Ajasja', 'Akash', 'Aki', 'Alan', 'Alejandra', 'Alessandro', 'Alessio', 'Alex', 'Alexander', 'Alexandra', 'Alexandre', 'Alexandros', 'Alexei', 'Alexey', 'Alexis', 'Ali', 'Alice', 'Alison', 'Alistair', 'Allan', 'Allen', 'Alvis', 'Alyssa', 'Ana', 'Anaïs', 'Anders', 'Andre', 'Andrea', 'Andreas', 'Andrei', 'Andrew', 'Andrey', 'Andriy', 'Andy', 'Angel', 'Anna', 'Anna-Sapfo', 'Anne', 'Anthony', 'Antoine', 'Anton', 'Antti', 'Aristotelis', 'Aristotle', 'Arlin', 'Armin', 'Arnaud', 'Arndt', 'Arne', 'Aron', 'Arran', 'Arturo', 'Aryan', 'Ashwin', 'Augustin', 'Aureliano', 'Aurélien', 'Avigail', 'Ayton', 'Balaji', 'Barbara', 'Bastiaan', 'Bastian', 'Bastien', 'Basuthkar', 'Bei', 'Beifang', 'Ben', 'Benedict', 'Benjamin', 'Bernardo', 'Bernd', 'Bernhard', 'Berthold', 'Bertrand', 'Bianca', 'Bing', 'Bjarni', 'Björn', 'Blaž', 'Bo', 'Bobbie-Jo', 'Bogdan', 'Bojan', 'Bort

So now that we have a list of names, I'm going to use the [`genderize`](https://github.com/SteelPangolin/genderize)
package to call the genderize.io API and get names. The API only allows us to call 10 names at a time. Again, I'll save the output here.

In [98]:
genders_save = [{'name': 'Aaron', 'count': 2299, 'gender': 'male', 'probability': 1.0}, {'name': 'Abhaya', 'gender': None}, {'name': 'Abhinav', 'count': 53, 'gender': 'male', 'probability': 1.0}, {'name': 'Abhishek', 'count': 308, 'gender': 'male', 'probability': 1.0}, {'name': 'Abraham', 'count': 292, 'gender': 'male', 'probability': 1.0}, {'name': 'Adam', 'count': 3963, 'gender': 'male', 'probability': 1.0}, {'name': 'Adedapo', 'gender': None}, {'name': 'Adina', 'count': 100, 'gender': 'female', 'probability': 1.0}, {'name': 'Adnane', 'count': 16, 'gender': 'male', 'probability': 1.0}, {'name': 'Adrian', 'count': 1515, 'gender': 'male', 'probability': 0.97}, {'name': 'Adrià', 'count': 59, 'gender': 'male', 'probability': 0.76}, {'name': 'Adva', 'count': 4, 'gender': 'female', 'probability': 1.0}, {'name': 'Ahmed', 'count': 2639, 'gender': 'male', 'probability': 0.99}, {'name': 'Aidan', 'count': 167, 'gender': 'male', 'probability': 0.98}, {'name': 'Ajasja', 'gender': None}, {'name': 'Akash', 'count': 84, 'gender': 'male', 'probability': 1.0}, {'name': 'Aki', 'count': 81, 'gender': 'male', 'probability': 0.63}, {'name': 'Alan', 'count': 2079, 'gender': 'male', 'probability': 1.0}, {'name': 'Alejandra', 'count': 1418, 'gender': 'female', 'probability': 0.99}, {'name': 'Alessandro', 'count': 1025, 'gender': 'male', 'probability': 1.0}, {'name': 'Alessio', 'count': 326, 'gender': 'male', 'probability': 1.0}, {'name': 'Alex', 'count': 5856, 'gender': 'male', 'probability': 0.87}, {'name': 'Alexander', 'count': 1645, 'gender': 'male', 'probability': 1.0}, {'name': 'Alexandra', 'count': 1723, 'gender': 'female', 'probability': 1.0}, {'name': 'Alexandre', 'count': 903, 'gender': 'male', 'probability': 1.0}, {'name': 'Alexandros', 'count': 79, 'gender': 'male', 'probability': 1.0}, {'name': 'Alexei', 'count': 30, 'gender': 'male', 'probability': 1.0}, {'name': 'Alexey', 'count': 45, 'gender': 'male', 'probability': 1.0}, {'name': 'Alexis', 'count': 1224, 'gender': 'male', 'probability': 0.52}, {'name': 'Ali', 'count': 3351, 'gender': 'male', 'probability': 0.85}, {'name': 'Alice', 'count': 1414, 'gender': 'female', 'probability': 1.0}, {'name': 'Alison', 'count': 1288, 'gender': 'female', 'probability': 0.99}, {'name': 'Alistair', 'count': 98, 'gender': 'male', 'probability': 1.0}, {'name': 'Allan', 'count': 590, 'gender': 'male', 'probability': 1.0}, {'name': 'Allen', 'count': 561, 'gender': 'male', 'probability': 0.99}, {'name': 'Alvis', 'count': 6, 'gender': 'male', 'probability': 0.67}, {'name': 'Alyssa', 'count': 728, 'gender': 'female', 'probability': 1.0}, {'name': 'Ana', 'count': 3621, 'gender': 'female', 'probability': 0.99}, {'name': 'Anaïs', 'count': 288, 'gender': 'female', 'probability': 0.99}, {'name': 'Anders', 'count': 603, 'gender': 'male', 'probability': 1.0}, {'name': 'Andre', 'count': 1210, 'gender': 'male', 'probability': 0.95}, {'name': 'Andrea', 'count': 5812, 'gender': 'female', 'probability': 0.79}, {'name': 'Andreas', 'count': 1021, 'gender': 'male', 'probability': 1.0}, {'name': 'Andrei', 'count': 168, 'gender': 'male', 'probability': 0.99}, {'name': 'Andrew', 'count': 5168, 'gender': 'male', 'probability': 1.0}, {'name': 'Andrey', 'count': 147, 'gender': 'male', 'probability': 0.99}, {'name': 'Andriy', 'count': 14, 'gender': 'male', 'probability': 1.0}, {'name': 'Andy', 'count': 3139, 'gender': 'male', 'probability': 0.95}, {'name': 'Angel', 'count': 1668, 'gender': 'male', 'probability': 0.61}, {'name': 'Anna', 'count': 4755, 'gender': 'female', 'probability': 1.0}, {'name': 'Anna-Sapfo', 'gender': None}, {'name': 'Anne', 'count': 2461, 'gender': 'female', 'probability': 1.0}, {'name': 'Anthony', 'count': 3029, 'gender': 'male', 'probability': 1.0}, {'name': 'Antoine', 'count': 411, 'gender': 'male', 'probability': 1.0}, {'name': 'Anton', 'count': 537, 'gender': 'male', 'probability': 1.0}, {'name': 'Antti', 'count': 111, 'gender': 'male', 'probability': 1.0}, {'name': 'Aristotelis', 'count': 9, 'gender': 'male', 'probability': 1.0}, {'name': 'Aristotle', 'count': 1, 'gender': 'male', 'probability': 1.0}, {'name': 'Arlin', 'count': 2, 'gender': 'female', 'probability': 1.0}, {'name': 'Armin', 'count': 131, 'gender': 'male', 'probability': 0.99}, {'name': 'Arnaud', 'count': 375, 'gender': 'male', 'probability': 0.99}, {'name': 'Arndt', 'count': 4, 'gender': 'male', 'probability': 1.0}, {'name': 'Arne', 'count': 112, 'gender': 'male', 'probability': 0.99}, {'name': 'Aron', 'count': 81, 'gender': 'male', 'probability': 0.84}, {'name': 'Arran', 'count': 19, 'gender': 'male', 'probability': 1.0}, {'name': 'Arturo', 'count': 515, 'gender': 'male', 'probability': 1.0}, {'name': 'Aryan', 'count': 12, 'gender': 'male', 'probability': 1.0}, {'name': 'Ashwin', 'count': 142, 'gender': 'male', 'probability': 1.0}, {'name': 'Augustin', 'count': 24, 'gender': 'male', 'probability': 0.96}, {'name': 'Aureliano', 'count': 11, 'gender': 'male', 'probability': 1.0}, {'name': 'Aurélien', 'count': 179, 'gender': 'male', 'probability': 1.0}, {'name': 'Avigail', 'count': 3, 'gender': 'female', 'probability': 1.0}, {'name': 'Ayton', 'gender': None}, {'name': 'Balaji', 'count': 71, 'gender': 'male', 'probability': 1.0}, {'name': 'Barbara', 'count': 2843, 'gender': 'female', 'probability': 1.0}, {'name': 'Bastiaan', 'count': 6, 'gender': 'male', 'probability': 1.0}, {'name': 'Bastian', 'count': 66, 'gender': 'male', 'probability': 0.98}, {'name': 'Bastien', 'count': 97, 'gender': 'male', 'probability': 1.0}, {'name': 'Basuthkar', 'gender': None}, {'name': 'Bei', 'count': 6, 'gender': 'female', 'probability': 0.83}, {'name': 'Beifang', 'gender': None}, {'name': 'Ben', 'count': 3363, 'gender': 'male', 'probability': 0.99}, {'name': 'Benedict', 'count': 59, 'gender': 'male', 'probability': 1.0}, {'name': 'Benjamin', 'count': 1475, 'gender': 'male', 'probability': 1.0}, {'name': 'Bernardo', 'count': 265, 'gender': 'male', 'probability': 1.0}, {'name': 'Bernd', 'count': 34, 'gender': 'male', 'probability': 1.0}, {'name': 'Bernhard', 'count': 47, 'gender': 'male', 'probability': 1.0}, {'name': 'Berthold', 'count': 4, 'gender': 'male', 'probability': 1.0}, {'name': 'Bertrand', 'count': 117, 'gender': 'male', 'probability': 0.97}, {'name': 'Bianca', 'count': 684, 'gender': 'female', 'probability': 1.0}, {'name': 'Bing', 'count': 37, 'gender': 'female', 'probability': 0.62}, {'name': 'Bjarni', 'count': 10, 'gender': 'male', 'probability': 1.0}, {'name': 'Björn', 'count': 248, 'gender': 'male', 'probability': 1.0}, {'name': 'Blaž', 'count': 17, 'gender': 'male', 'probability': 1.0}, {'name': 'Bo', 'count': 230, 'gender': 'male', 'probability': 0.84}, {'name': 'Bobbie-Jo', 'count': 1, 'gender': 'female', 'probability': 1.0}, {'name': 'Bogdan', 'count': 88, 'gender': 'male', 'probability': 1.0}, {'name': 'Bojan', 'count': 93, 'gender': 'male', 'probability': 1.0}, {'name': 'Bortolo', 'gender': None}, {'name': 'Bosco', 'count': 17, 'gender': 'male', 'probability': 0.88}, {'name': 'Brad', 'count': 1286, 'gender': 'male', 'probability': 0.99}, {'name': 'Bradley', 'count': 394, 'gender': 'male', 'probability': 1.0}, {'name': 'Brendan', 'count': 517, 'gender': 'male', 'probability': 1.0}, {'name': 'Brent', 'count': 595, 'gender': 'male', 'probability': 1.0}, {'name': 'Bret', 'count': 88, 'gender': 'male', 'probability': 0.99}, {'name': 'Brett', 'count': 976, 'gender': 'male', 'probability': 0.99}, {'name': 'Brian', 'count': 4600, 'gender': 'male', 'probability': 1.0}, {'name': 'Bruno', 'count': 926, 'gender': 'male', 'probability': 1.0}, {'name': 'Bryce', 'count': 214, 'gender': 'male', 'probability': 0.98}, {'name': 'Burkhard', 'count': 5, 'gender': 'male', 'probability': 1.0}, {'name': 'Caleb', 'count': 354, 'gender': 'male', 'probability': 0.99}, {'name': 'Cali', 'count': 41, 'gender': 'female', 'probability': 0.76}, {'name': 'Can', 'count': 248, 'gender': 'male', 'probability': 0.93}, {'name': 'Canh', 'count': 4, 'gender': 'male', 'probability': 1.0}, {'name': 'Carl', 'count': 929, 'gender': 'male', 'probability': 1.0}, {'name': 'Carlo', 'count': 423, 'gender': 'male', 'probability': 0.99}, {'name': 'Carlos', 'count': 5149, 'gender': 'male', 'probability': 1.0}, {'name': 'Carole', 'count': 630, 'gender': 'female', 'probability': 1.0}, {'name': 'Carson', 'count': 83, 'gender': 'male', 'probability': 0.81}, {'name': 'Chaitanya', 'count': 30, 'gender': 'male', 'probability': 0.9}, {'name': 'Chao', 'count': 16, 'gender': 'male', 'probability': 0.81}, {'name': 'Charles', 'count': 1798, 'gender': 'male', 'probability': 1.0}, {'name': 'Chase', 'count': 306, 'gender': 'male', 'probability': 0.96}, {'name': 'Chen', 'count': 239, 'gender': 'male', 'probability': 0.57}, {'name': 'Chenggang', 'gender': None}, {'name': 'Chengkun', 'gender': None}, {'name': 'Chi', 'count': 86, 'gender': 'female', 'probability': 0.55}, {'name': 'Chien-Chih', 'gender': None}, {'name': 'Chiranjib', 'gender': None}, {'name': 'Chris', 'count': 8631, 'gender': 'male', 'probability': 0.93}, {'name': 'Christian', 'count': 2548, 'gender': 'male', 'probability': 0.99}, {'name': 'Christina', 'count': 2400, 'gender': 'female', 'probability': 1.0}, {'name': 'Christoph', 'count': 192, 'gender': 'male', 'probability': 1.0}, {'name': 'Christopher', 'count': 2339, 'gender': 'male', 'probability': 1.0}, {'name': 'Chuen-Liang', 'gender': None}, {'name': 'Claire', 'count': 2587, 'gender': 'female', 'probability': 1.0}, {'name': 'Clare', 'count': 555, 'gender': 'female', 'probability': 1.0}, {'name': 'Clemens', 'count': 20, 'gender': 'male', 'probability': 1.0}, {'name': 'Colbert', 'gender': None}, {'name': 'Colby', 'count': 100, 'gender': 'male', 'probability': 0.8}, {'name': 'Colin', 'count': 954, 'gender': 'male', 'probability': 0.99}, {'name': 'Come', 'count': 5, 'gender': 'male', 'probability': 1.0}, {'name': 'Connor', 'count': 447, 'gender': 'male', 'probability': 0.99}, {'name': 'Cristin', 'count': 14, 'gender': 'female', 'probability': 0.93}, {'name': 'Daan', 'count': 25, 'gender': 'male', 'probability': 1.0}, {'name': 'Damiano', 'count': 61, 'gender': 'male', 'probability': 1.0}, {'name': 'Damon', 'count': 159, 'gender': 'male', 'probability': 0.99}, {'name': 'Dan', 'count': 3240, 'gender': 'male', 'probability': 0.98}, {'name': 'Daniel', 'count': 8186, 'gender': 'male', 'probability': 1.0}, {'name': 'Daniela', 'count': 2105, 'gender': 'female', 'probability': 0.99}, {'name': 'Daryl', 'count': 272, 'gender': 'male', 'probability': 0.92}, {'name': 'Davey', 'count': 29, 'gender': 'male', 'probability': 0.97}, {'name': 'David', 'count': 12597, 'gender': 'male', 'probability': 1.0}, {'name': 'Deb', 'count': 542, 'gender': 'female', 'probability': 0.99}, {'name': 'Debora', 'count': 336, 'gender': 'female', 'probability': 0.99}, {'name': 'Denis', 'count': 577, 'gender': 'male', 'probability': 0.93}, {'name': 'Denise', 'count': 1844, 'gender': 'female', 'probability': 1.0}, {'name': 'Dennis', 'count': 1279, 'gender': 'male', 'probability': 0.99}, {'name': 'Derek', 'count': 1169, 'gender': 'male', 'probability': 1.0}, {'name': 'Devdatt', 'gender': None}, {'name': 'Didier', 'count': 204, 'gender': 'male', 'probability': 0.99}, {'name': 'Dietrich', 'count': 6, 'gender': 'male', 'probability': 1.0}, {'name': 'Dilan', 'count': 80, 'gender': 'male', 'probability': 0.56}, {'name': 'Dimitrios-Georgios', 'gender': None}, {'name': 'Dinesh', 'count': 154, 'gender': 'male', 'probability': 1.0}, {'name': 'Dingcheng', 'gender': None}, {'name': 'Dmitri', 'count': 21, 'gender': 'male', 'probability': 1.0}, {'name': 'Dongjun', 'gender': None}, {'name': 'Dongming', 'count': 1, 'gender': 'male', 'probability': 1.0}, {'name': 'Dongqing', 'gender': None}, {'name': 'Donovan', 'count': 173, 'gender': 'male', 'probability': 0.99}, {'name': 'Dorota', 'count': 70, 'gender': 'female', 'probability': 1.0}, {'name': 'Douglas', 'count': 611, 'gender': 'male', 'probability': 1.0}, {'name': 'Eamonn', 'count': 20, 'gender': 'male', 'probability': 1.0}, {'name': 'Eberhard', 'count': 2, 'gender': 'male', 'probability': 1.0}, {'name': 'Edda', 'count': 16, 'gender': 'female', 'probability': 1.0}, {'name': 'Edmund', 'count': 129, 'gender': 'male', 'probability': 1.0}, {'name': 'Eduard', 'count': 105, 'gender': 'male', 'probability': 1.0}, {'name': 'Eduardo', 'count': 1644, 'gender': 'male', 'probability': 1.0}, {'name': 'Eitan', 'count': 10, 'gender': 'male', 'probability': 1.0}, {'name': 'Elana', 'count': 45, 'gender': 'female', 'probability': 1.0}, {'name': 'Eldon', 'count': 18, 'gender': 'male', 'probability': 1.0}, {'name': 'Elena', 'count': 1645, 'gender': 'female', 'probability': 1.0}, {'name': 'Elior', 'count': 5, 'gender': 'male', 'probability': 1.0}, {'name': 'Elisabet', 'count': 57, 'gender': 'female', 'probability': 1.0}, {'name': 'Elsa', 'count': 404, 'gender': 'female', 'probability': 1.0}, {'name': 'Emily', 'count': 3765, 'gender': 'female', 'probability': 1.0}, {'name': 'Emmanuel', 'count': 761, 'gender': 'male', 'probability': 1.0}, {'name': 'Emre', 'count': 800, 'gender': 'male', 'probability': 0.99}, {'name': 'Ence', 'gender': None}, {'name': 'Enis', 'count': 66, 'gender': 'male', 'probability': 0.94}, {'name': 'Eran', 'count': 74, 'gender': 'male', 'probability': 0.99}, {'name': 'Ergude', 'gender': None}, {'name': 'Eric', 'count': 4110, 'gender': 'male', 'probability': 1.0}, {'name': 'Erik', 'count': 1072, 'gender': 'male', 'probability': 0.99}, {'name': 'Erika', 'count': 1544, 'gender': 'female', 'probability': 0.99}, {'name': 'Eriko', 'count': 18, 'gender': 'female', 'probability': 0.89}, {'name': 'Ernest', 'count': 147, 'gender': 'male', 'probability': 0.99}, {'name': 'Esa', 'count': 22, 'gender': 'male', 'probability': 0.55}, {'name': 'Eske', 'count': 1, 'gender': 'male', 'probability': 1.0}, {'name': 'Estela', 'count': 138, 'gender': 'female', 'probability': 1.0}, {'name': 'Evan', 'count': 657, 'gender': 'male', 'probability': 0.97}, {'name': 'Ezequiel', 'count': 190, 'gender': 'male', 'probability': 1.0}, {'name': 'Fabian', 'count': 689, 'gender': 'male', 'probability': 0.99}, {'name': 'Fabien', 'count': 237, 'gender': 'male', 'probability': 1.0}, {'name': 'Fabio', 'count': 911, 'gender': 'male', 'probability': 0.99}, {'name': 'Fatemeh', 'count': 12, 'gender': 'female', 'probability': 1.0}, {'name': 'Fei-Yang', 'gender': None}, {'name': 'Felix', 'count': 692, 'gender': 'male', 'probability': 1.0}, {'name': 'Fernando', 'count': 2012, 'gender': 'male', 'probability': 1.0}, {'name': 'Filip', 'count': 270, 'gender': 'male', 'probability': 0.99}, {'name': 'Filipe', 'count': 132, 'gender': 'male', 'probability': 1.0}, {'name': 'Fiona', 'count': 926, 'gender': 'female', 'probability': 1.0}, {'name': 'Florence', 'count': 538, 'gender': 'female', 'probability': 1.0}, {'name': 'Florian', 'count': 489, 'gender': 'male', 'probability': 1.0}, {'name': 'Francesca', 'count': 1122, 'gender': 'female', 'probability': 1.0}, {'name': 'Francis', 'count': 680, 'gender': 'male', 'probability': 0.89}, {'name': 'Francisco', 'count': 1513, 'gender': 'male', 'probability': 1.0}, {'name': 'Franck', 'count': 327, 'gender': 'male', 'probability': 1.0}, {'name': 'Frank', 'count': 1565, 'gender': 'male', 'probability': 1.0}, {'name': 'Fred', 'count': 966, 'gender': 'male', 'probability': 0.98}, {'name': 'Frederick', 'count': 294, 'gender': 'male', 'probability': 1.0}, {'name': 'Fritz', 'count': 68, 'gender': 'male', 'probability': 0.93}, {'name': 'Fumihide', 'gender': None}, {'name': 'Gabor', 'count': 61, 'gender': 'male', 'probability': 1.0}, {'name': 'Gabriel', 'count': 1676, 'gender': 'male', 'probability': 0.99}, {'name': 'Gang', 'count': 5, 'gender': 'male', 'probability': 1.0}, {'name': 'Gareth', 'count': 475, 'gender': 'male', 'probability': 0.97}, {'name': 'Gary', 'count': 2132, 'gender': 'male', 'probability': 1.0}, {'name': 'Georg', 'count': 64, 'gender': 'male', 'probability': 1.0}, {'name': 'Gerard', 'count': 461, 'gender': 'male', 'probability': 1.0}, {'name': 'Gerd', 'count': 54, 'gender': 'female', 'probability': 0.52}, {'name': 'Gergely', 'count': 14, 'gender': 'male', 'probability': 0.93}, {'name': 'Gernot', 'count': 17, 'gender': 'male', 'probability': 1.0}, {'name': 'Giacomo', 'count': 192, 'gender': 'male', 'probability': 1.0}, {'name': 'Gianluigi', 'count': 25, 'gender': 'male', 'probability': 1.0}, {'name': 'Giles', 'count': 77, 'gender': 'male', 'probability': 0.86}, {'name': 'Gilles', 'count': 225, 'gender': 'male', 'probability': 1.0}, {'name': 'Giosuè', 'count': 13, 'gender': 'male', 'probability': 1.0}, {'name': 'Giovanna', 'count': 347, 'gender': 'female', 'probability': 1.0}, {'name': 'Glenn', 'count': 652, 'gender': 'male', 'probability': 0.99}, {'name': 'Goran', 'count': 190, 'gender': 'male', 'probability': 1.0}, {'name': 'Gos', 'count': 1, 'gender': 'male', 'probability': 1.0}, {'name': 'Goutham', 'count': 3, 'gender': 'male', 'probability': 1.0}, {'name': 'Graham', 'count': 603, 'gender': 'male', 'probability': 1.0}, {'name': 'Greg', 'count': 1854, 'gender': 'male', 'probability': 1.0}, {'name': 'Gregory', 'count': 722, 'gender': 'male', 'probability': 1.0}, {'name': 'Grzegorz', 'count': 56, 'gender': 'male', 'probability': 0.98}, {'name': 'Guido', 'count': 245, 'gender': 'male', 'probability': 1.0}, {'name': 'Guillaume', 'count': 779, 'gender': 'male', 'probability': 1.0}, {'name': 'Gunnar', 'count': 136, 'gender': 'male', 'probability': 1.0}, {'name': 'Guo-Cheng', 'gender': None}, {'name': 'Gustavo', 'count': 1161, 'gender': 'male', 'probability': 1.0}, {'name': 'Guy', 'count': 445, 'gender': 'male', 'probability': 1.0}, {'name': 'Görel', 'count': 2, 'gender': 'female', 'probability': 1.0}, {'name': 'Hai', 'count': 45, 'gender': 'male', 'probability': 0.84}, {'name': 'Haibao', 'gender': None}, {'name': 'Hailiang', 'gender': None}, {'name': 'Hamid', 'count': 157, 'gender': 'male', 'probability': 0.98}, {'name': 'Hamidreza', 'count': 9, 'gender': 'male', 'probability': 1.0}, {'name': 'Han', 'count': 145, 'gender': 'male', 'probability': 0.61}, {'name': 'Hans', 'count': 431, 'gender': 'male', 'probability': 0.99}, {'name': 'Harald', 'count': 80, 'gender': 'male', 'probability': 1.0}, {'name': 'Hari', 'count': 101, 'gender': 'male', 'probability': 0.94}, {'name': 'Harish', 'count': 79, 'gender': 'male', 'probability': 1.0}, {'name': 'Harold', 'count': 364, 'gender': 'male', 'probability': 1.0}, {'name': 'Harriet', 'count': 168, 'gender': 'female', 'probability': 0.99}, {'name': 'Harvey', 'count': 116, 'gender': 'male', 'probability': 0.98}, {'name': 'Hayssam', 'count': 1, 'gender': 'male', 'probability': 1.0}, {'name': 'Hector', 'count': 1112, 'gender': 'male', 'probability': 1.0}, {'name': 'Heiko', 'count': 46, 'gender': 'male', 'probability': 1.0}, {'name': 'Helen', 'count': 2232, 'gender': 'female', 'probability': 0.99}, {'name': 'Helge', 'count': 28, 'gender': 'male', 'probability': 0.93}, {'name': 'Heng', 'count': 27, 'gender': 'male', 'probability': 0.85}, {'name': 'Henna', 'count': 59, 'gender': 'female', 'probability': 1.0}, {'name': 'Henning', 'count': 77, 'gender': 'male', 'probability': 1.0}, {'name': 'Henry', 'count': 1067, 'gender': 'male', 'probability': 1.0}, {'name': 'Hidetoshi', 'gender': None}, {'name': 'Hilary', 'count': 315, 'gender': 'female', 'probability': 0.87}, {'name': 'Hilmar', 'count': 38, 'gender': 'male', 'probability': 1.0}, {'name': 'Hiroshi', 'count': 19, 'gender': 'male', 'probability': 1.0}, {'name': 'Hiroyuki', 'count': 29, 'gender': 'male', 'probability': 1.0}, {'name': 'Hisham', 'count': 125, 'gender': 'male', 'probability': 1.0}, {'name': 'Hitesh', 'count': 58, 'gender': 'male', 'probability': 1.0}, {'name': 'Hong', 'count': 93, 'gender': 'male', 'probability': 0.51}, {'name': 'Hongfang', 'gender': None}, {'name': 'Hongjie', 'gender': None}, {'name': 'Hongjun', 'count': 1, 'gender': 'male', 'probability': 1.0}, {'name': 'Hongseok', 'count': 1, 'gender': 'male', 'probability': 1.0}, {'name': 'Hongyu', 'count': 1, 'gender': 'male', 'probability': 1.0}, {'name': 'Hua-Lin', 'gender': None}, {'name': 'Hugh', 'count': 174, 'gender': 'male', 'probability': 1.0}, {'name': 'Hugo', 'count': 836, 'gender': 'male', 'probability': 1.0}, {'name': 'Hui', 'count': 100, 'gender': 'female', 'probability': 0.79}, {'name': 'Hyatt', 'count': 1, 'gender': 'male', 'probability': 1.0}, {'name': 'Hyeshik', 'gender': None}, {'name': 'Hákon', 'count': 42, 'gender': 'male', 'probability': 1.0}, {'name': 'Ian', 'count': 2159, 'gender': 'male', 'probability': 0.99}, {'name': 'Ignacio', 'count': 451, 'gender': 'male', 'probability': 1.0}, {'name': 'Ignas', 'count': 5, 'gender': 'male', 'probability': 1.0}, {'name': 'Igor', 'count': 345, 'gender': 'male', 'probability': 1.0}, {'name': 'Ilinca', 'count': 1, 'gender': 'female', 'probability': 1.0}, {'name': 'Ilya', 'count': 42, 'gender': 'male', 'probability': 0.98}, {'name': 'Ingemar', 'count': 5, 'gender': 'male', 'probability': 1.0}, {'name': 'Ingolfur', 'count': 1, 'gender': 'male', 'probability': 1.0}, {'name': 'Insu', 'gender': None}, {'name': 'Ioannis', 'count': 53, 'gender': 'male', 'probability': 1.0}, {'name': 'Ira', 'count': 101, 'gender': 'female', 'probability': 0.76}, {'name': 'Irmgard', 'count': 5, 'gender': 'female', 'probability': 1.0}, {'name': 'Israel', 'count': 385, 'gender': 'male', 'probability': 0.99}, {'name': 'Ivana', 'count': 509, 'gender': 'female', 'probability': 1.0}, {'name': 'Iztok', 'count': 7, 'gender': 'male', 'probability': 1.0}, {'name': 'Jaap', 'count': 14, 'gender': 'male', 'probability': 1.0}, {'name': 'Jacob', 'count': 1309, 'gender': 'male', 'probability': 1.0}, {'name': 'Jagir', 'gender': None}, {'name': 'Jai', 'count': 86, 'gender': 'male', 'probability': 0.7}, {'name': 'James', 'count': 6359, 'gender': 'male', 'probability': 0.99}, {'name': 'Jamison', 'count': 25, 'gender': 'male', 'probability': 0.92}, {'name': 'Jan', 'count': 1692, 'gender': 'male', 'probability': 0.6}, {'name': 'Jan-Ming', 'gender': None}, {'name': 'Janez', 'count': 10, 'gender': 'male', 'probability': 1.0}, {'name': 'Janna', 'count': 118, 'gender': 'female', 'probability': 1.0}, {'name': 'Janusz', 'count': 15, 'gender': 'male', 'probability': 1.0}, {'name': 'Jared', 'count': 667, 'gender': 'male', 'probability': 1.0}, {'name': 'Jarl', 'count': 4, 'gender': 'male', 'probability': 1.0}, {'name': 'Jason', 'count': 4927, 'gender': 'male', 'probability': 1.0}, {'name': 'Javier', 'count': 1975, 'gender': 'male', 'probability': 1.0}, {'name': 'Jay', 'count': 1882, 'gender': 'male', 'probability': 0.9}, {'name': 'Jean-Marc', 'count': 81, 'gender': 'male', 'probability': 1.0}, {'name': 'Jean-Philippe', 'count': 79, 'gender': 'male', 'probability': 1.0}, {'name': 'Jeet', 'count': 15, 'gender': 'male', 'probability': 0.93}, {'name': 'Jeff', 'count': 2925, 'gender': 'male', 'probability': 1.0}, {'name': 'Jeffrey', 'count': 932, 'gender': 'male', 'probability': 1.0}, {'name': 'Jennifer', 'count': 6717, 'gender': 'female', 'probability': 1.0}, {'name': 'Jens', 'count': 292, 'gender': 'male', 'probability': 1.0}, {'name': 'Jeremy', 'count': 2160, 'gender': 'male', 'probability': 0.99}, {'name': 'Jeroen', 'count': 113, 'gender': 'male', 'probability': 1.0}, {'name': 'Jessica', 'count': 6696, 'gender': 'female', 'probability': 1.0}, {'name': 'Jiali', 'count': 2, 'gender': 'female', 'probability': 1.0}, {'name': 'Jiang', 'count': 14, 'gender': 'male', 'probability': 0.71}, {'name': 'Jiantao', 'gender': None}, {'name': 'Jianxin', 'gender': None}, {'name': 'Jie', 'count': 29, 'gender': 'female', 'probability': 0.69}, {'name': 'Jikai', 'gender': None}, {'name': 'Jillian', 'count': 388, 'gender': 'female', 'probability': 0.99}, {'name': 'Jingde', 'gender': None}, {'name': 'Joakim', 'count': 213, 'gender': 'male', 'probability': 1.0}, {'name': 'Joaquin', 'count': 369, 'gender': 'male', 'probability': 1.0}, {'name': 'Joaquín', 'count': 369, 'gender': 'male', 'probability': 1.0}, {'name': 'Joel', 'count': 1472, 'gender': 'male', 'probability': 0.99}, {'name': 'Johann', 'count': 118, 'gender': 'male', 'probability': 0.99}, {'name': 'Johannes', 'count': 199, 'gender': 'male', 'probability': 1.0}, {'name': 'John', 'count': 9931, 'gender': 'male', 'probability': 0.99}, {'name': 'Jon', 'count': 1805, 'gender': 'male', 'probability': 1.0}, {'name': 'Jonas', 'count': 630, 'gender': 'male', 'probability': 1.0}, {'name': 'Jonathan', 'count': 3702, 'gender': 'male', 'probability': 1.0}, {'name': 'Joost', 'count': 28, 'gender': 'male', 'probability': 1.0}, {'name': 'Jordan', 'count': 1774, 'gender': 'male', 'probability': 0.77}, {'name': 'Jordi', 'count': 400, 'gender': 'male', 'probability': 1.0}, {'name': 'Jose', 'count': 5109, 'gender': 'male', 'probability': 0.99}, {'name': 'Joseph', 'count': 2213, 'gender': 'male', 'probability': 0.99}, {'name': 'Josh', 'count': 2597, 'gender': 'male', 'probability': 1.0}, {'name': 'Joshua', 'count': 1570, 'gender': 'male', 'probability': 0.99}, {'name': 'José', 'count': 5109, 'gender': 'male', 'probability': 0.99}, {'name': 'Jotun', 'gender': None}, {'name': 'Juha', 'count': 97, 'gender': 'male', 'probability': 1.0}, {'name': 'Juho', 'count': 68, 'gender': 'male', 'probability': 1.0}, {'name': 'Julia', 'count': 2171, 'gender': 'female', 'probability': 0.99}, {'name': 'Julian', 'count': 1080, 'gender': 'male', 'probability': 1.0}, {'name': 'Juliane', 'count': 59, 'gender': 'female', 'probability': 1.0}, {'name': 'Julie', 'count': 4263, 'gender': 'female', 'probability': 1.0}, {'name': 'Julien', 'count': 1066, 'gender': 'male', 'probability': 1.0}, {'name': 'Julio', 'count': 1052, 'gender': 'male', 'probability': 1.0}, {'name': 'Jun', 'count': 189, 'gender': 'male', 'probability': 0.94}, {'name': "Jun'ichi", 'gender': None}, {'name': 'Junfang', 'gender': None}, {'name': 'Justin', 'count': 2871, 'gender': 'male', 'probability': 1.0}, {'name': 'Jörn', 'count': 25, 'gender': 'male', 'probability': 1.0}, {'name': 'Kai', 'count': 207, 'gender': 'male', 'probability': 0.87}, {'name': 'Kaname', 'count': 2, 'gender': 'female', 'probability': 1.0}, {'name': 'Karissa', 'count': 68, 'gender': 'female', 'probability': 1.0}, {'name': 'Kasper', 'count': 110, 'gender': 'male', 'probability': 0.98}, {'name': 'Katerina', 'count': 392, 'gender': 'female', 'probability': 0.99}, {'name': 'Katharina', 'count': 244, 'gender': 'female', 'probability': 1.0}, {'name': 'Kathryn', 'count': 827, 'gender': 'female', 'probability': 1.0}, {'name': 'Katrina', 'count': 729, 'gender': 'female', 'probability': 1.0}, {'name': 'Kay', 'count': 709, 'gender': 'female', 'probability': 0.88}, {'name': 'Kazuharu', 'gender': None}, {'name': 'Kazuki', 'count': 16, 'gender': 'male', 'probability': 1.0}, {'name': 'Kenichi', 'count': 7, 'gender': 'male', 'probability': 1.0}, {'name': 'Kensaku', 'gender': None}, {'name': 'Kerstin', 'count': 151, 'gender': 'female', 'probability': 1.0}, {'name': 'Kevin', 'count': 5362, 'gender': 'male', 'probability': 1.0}, {'name': 'Kim', 'count': 3561, 'gender': 'female', 'probability': 0.88}, {'name': 'Kiran', 'count': 274, 'gender': 'female', 'probability': 0.51}, {'name': 'Kishore', 'count': 67, 'gender': 'male', 'probability': 0.99}, {'name': 'Koh-Ichiro', 'gender': None}, {'name': 'Koh-ichiro', 'gender': None}, {'name': 'Koji', 'count': 10, 'gender': 'male', 'probability': 0.9}, {'name': 'Konrad', 'count': 62, 'gender': 'male', 'probability': 1.0}, {'name': 'Konstantin', 'count': 66, 'gender': 'male', 'probability': 1.0}, {'name': 'Konstantinos', 'count': 87, 'gender': 'male', 'probability': 1.0}, {'name': 'Kranti', 'count': 1, 'gender': 'male', 'probability': 1.0}, {'name': 'Kristina', 'count': 913, 'gender': 'female', 'probability': 1.0}, {'name': 'Kristoffer', 'count': 141, 'gender': 'male', 'probability': 1.0}, {'name': 'Kunyaluk', 'gender': None}, {'name': 'Kurt', 'count': 421, 'gender': 'male', 'probability': 1.0}, {'name': 'Kyle', 'count': 1944, 'gender': 'male', 'probability': 0.99}, {'name': 'Kyowon', 'gender': None}, {'name': 'Kyungmin', 'count': 2, 'gender': 'female', 'probability': 0.5}, {'name': 'Ladislav', 'count': 26, 'gender': 'male', 'probability': 0.96}, {'name': 'Lakshmi', 'count': 68, 'gender': 'female', 'probability': 0.9}, {'name': 'Lana', 'count': 277, 'gender': 'female', 'probability': 0.99}, {'name': 'Larisa', 'count': 42, 'gender': 'female', 'probability': 1.0}, {'name': 'Lars', 'count': 450, 'gender': 'male', 'probability': 1.0}, {'name': 'Lasse', 'count': 259, 'gender': 'male', 'probability': 1.0}, {'name': 'Laura', 'count': 7953, 'gender': 'female', 'probability': 1.0}, {'name': 'Laurent', 'count': 571, 'gender': 'male', 'probability': 0.99}, {'name': 'Lawrence', 'count': 393, 'gender': 'male', 'probability': 1.0}, {'name': 'Layla', 'count': 99, 'gender': 'female', 'probability': 0.99}, {'name': 'Lea', 'count': 513, 'gender': 'female', 'probability': 0.97}, {'name': 'Lei', 'count': 54, 'gender': 'female', 'probability': 0.56}, {'name': 'Leonardo', 'count': 886, 'gender': 'male', 'probability': 1.0}, {'name': 'Leopold', 'count': 10, 'gender': 'male', 'probability': 1.0}, {'name': 'Leyla', 'count': 183, 'gender': 'female', 'probability': 0.97}, {'name': 'Li', 'count': 274, 'gender': 'female', 'probability': 0.66}, {'name': 'Lifang', 'count': 1, 'gender': 'female', 'probability': 1.0}, {'name': 'Lilian', 'count': 278, 'gender': 'female', 'probability': 0.94}, {'name': 'Lin', 'count': 192, 'gender': 'female', 'probability': 0.74}, {'name': 'Lincoln', 'count': 91, 'gender': 'male', 'probability': 0.98}, {'name': 'Lior', 'count': 85, 'gender': 'male', 'probability': 0.61}, {'name': 'Lisa', 'count': 6395, 'gender': 'female', 'probability': 1.0}, {'name': 'Lisle', 'count': 3, 'gender': 'female', 'probability': 1.0}, {'name': 'Liyang', 'gender': None}, {'name': 'Lorenzo', 'count': 496, 'gender': 'male', 'probability': 1.0}, {'name': 'Louis', 'count': 654, 'gender': 'male', 'probability': 0.99}, {'name': 'Luca', 'count': 1078, 'gender': 'male', 'probability': 0.99}, {'name': 'Ludovic', 'count': 187, 'gender': 'male', 'probability': 1.0}, {'name': 'Ludwig', 'count': 24, 'gender': 'male', 'probability': 1.0}, {'name': 'Luis', 'count': 4053, 'gender': 'male', 'probability': 1.0}, {'name': 'Lukas', 'count': 333, 'gender': 'male', 'probability': 1.0}, {'name': 'Luke', 'count': 1408, 'gender': 'male', 'probability': 1.0}, {'name': 'Luminita', 'count': 36, 'gender': 'female', 'probability': 1.0}, {'name': 'Lusine', 'count': 13, 'gender': 'female', 'probability': 1.0}, {'name': 'Lutz', 'count': 9, 'gender': 'male', 'probability': 1.0}, {'name': 'Lynda', 'count': 432, 'gender': 'female', 'probability': 1.0}, {'name': 'Maarten', 'count': 69, 'gender': 'male', 'probability': 1.0}, {'name': 'Macha', 'count': 9, 'gender': 'male', 'probability': 0.78}, {'name': 'Man', 'count': 152, 'gender': 'male', 'probability': 0.7}, {'name': 'Manfred', 'count': 39, 'gender': 'male', 'probability': 1.0}, {'name': 'Mannis', 'count': 1, 'gender': 'male', 'probability': 1.0}, {'name': 'Manuel', 'count': 1954, 'gender': 'male', 'probability': 1.0}, {'name': 'Marc', 'count': 1555, 'gender': 'male', 'probability': 1.0}, {'name': 'Marcelo', 'count': 810, 'gender': 'male', 'probability': 1.0}, {'name': 'Marcin', 'count': 132, 'gender': 'male', 'probability': 1.0}, {'name': 'Marco', 'count': 2493, 'gender': 'male', 'probability': 0.99}, {'name': 'Marcus', 'count': 970, 'gender': 'male', 'probability': 1.0}, {'name': 'Marghoob', 'gender': None}, {'name': 'Maria', 'count': 8467, 'gender': 'female', 'probability': 0.99}, {'name': 'Maria-Jesus', 'count': 1, 'gender': 'female', 'probability': 1.0}, {'name': 'Marie', 'count': 2299, 'gender': 'female', 'probability': 0.99}, {'name': 'Marinka', 'count': 12, 'gender': 'female', 'probability': 1.0}, {'name': 'Mario', 'count': 2056, 'gender': 'male', 'probability': 0.99}, {'name': 'Marjo', 'count': 71, 'gender': 'female', 'probability': 0.97}, {'name': 'Mark', 'count': 6178, 'gender': 'male', 'probability': 1.0}, {'name': 'Markus', 'count': 424, 'gender': 'male', 'probability': 1.0}, {'name': 'Marnix', 'count': 6, 'gender': 'male', 'probability': 1.0}, {'name': 'Marta', 'count': 1408, 'gender': 'female', 'probability': 1.0}, {'name': 'Marten', 'count': 37, 'gender': 'male', 'probability': 1.0}, {'name': 'Martin', 'count': 3568, 'gender': 'male', 'probability': 1.0}, {'name': 'Maryam', 'count': 302, 'gender': 'female', 'probability': 1.0}, {'name': 'Masao', 'count': 2, 'gender': 'male', 'probability': 1.0}, {'name': 'Masaru', 'count': 3, 'gender': 'male', 'probability': 1.0}, {'name': 'Mathieu', 'count': 545, 'gender': 'male', 'probability': 1.0}, {'name': 'Matias', 'count': 648, 'gender': 'male', 'probability': 1.0}, {'name': 'Matteo', 'count': 555, 'gender': 'male', 'probability': 1.0}, {'name': 'Matthew', 'count': 3338, 'gender': 'male', 'probability': 1.0}, {'name': 'Matthias', 'count': 257, 'gender': 'male', 'probability': 1.0}, {'name': 'Mattias', 'count': 106, 'gender': 'male', 'probability': 1.0}, {'name': 'Maxim', 'count': 68, 'gender': 'male', 'probability': 0.99}, {'name': 'Maximilian', 'count': 69, 'gender': 'male', 'probability': 1.0}, {'name': 'Mayumi', 'count': 41, 'gender': 'female', 'probability': 0.98}, {'name': 'Mehdi', 'count': 329, 'gender': 'male', 'probability': 0.99}, {'name': 'Mehmet', 'count': 2004, 'gender': 'male', 'probability': 0.99}, {'name': 'Melissa', 'count': 4541, 'gender': 'female', 'probability': 1.0}, {'name': 'Mengjun', 'gender': None}, {'name': 'Mengyao', 'count': 1, 'gender': 'female', 'probability': 1.0}, {'name': 'Michael', 'count': 11160, 'gender': 'male', 'probability': 1.0}, {'name': 'Michal', 'count': 360, 'gender': 'male', 'probability': 0.75}, {'name': 'Michel', 'count': 729, 'gender': 'male', 'probability': 0.96}, {'name': 'Michelle', 'count': 5658, 'gender': 'female', 'probability': 1.0}, {'name': 'Mick', 'count': 273, 'gender': 'male', 'probability': 0.99}, {'name': 'Miguel', 'count': 2065, 'gender': 'male', 'probability': 1.0}, {'name': 'Mihai', 'count': 89, 'gender': 'male', 'probability': 0.98}, {'name': 'Mikhail', 'count': 57, 'gender': 'male', 'probability': 1.0}, {'name': 'Mikko', 'count': 186, 'gender': 'male', 'probability': 1.0}, {'name': 'Mila', 'count': 116, 'gender': 'female', 'probability': 0.98}, {'name': 'Milad', 'count': 38, 'gender': 'male', 'probability': 0.95}, {'name': 'Min', 'count': 127, 'gender': 'female', 'probability': 0.8}, {'name': 'Mingwen', 'gender': None}, {'name': 'Minxian', 'gender': None}, {'name': 'Mirco', 'count': 58, 'gender': 'male', 'probability': 1.0}, {'name': 'Miroslav', 'count': 97, 'gender': 'male', 'probability': 1.0}, {'name': 'Mitchell', 'count': 345, 'gender': 'male', 'probability': 0.99}, {'name': 'Mohammad', 'count': 937, 'gender': 'male', 'probability': 1.0}, {'name': 'Morgan', 'count': 847, 'gender': 'female', 'probability': 0.64}, {'name': 'Morris', 'count': 49, 'gender': 'male', 'probability': 1.0}, {'name': 'Mostafa', 'count': 383, 'gender': 'male', 'probability': 1.0}, {'name': 'Moustapha', 'count': 21, 'gender': 'male', 'probability': 0.95}, {'name': 'Murat', 'count': 1531, 'gender': 'male', 'probability': 1.0}, {'name': 'Murray', 'count': 72, 'gender': 'male', 'probability': 0.96}, {'name': 'Naama', 'count': 12, 'gender': 'female', 'probability': 1.0}, {'name': 'Nam', 'count': 79, 'gender': 'male', 'probability': 0.78}, {'name': 'Nancy', 'count': 2716, 'gender': 'female', 'probability': 1.0}, {'name': 'Nansheng', 'gender': None}, {'name': 'Naoki', 'count': 6, 'gender': 'male', 'probability': 1.0}, {'name': 'Naoto', 'count': 3, 'gender': 'male', 'probability': 1.0}, {'name': 'Natalie', 'count': 2033, 'gender': 'female', 'probability': 1.0}, {'name': 'Natalja', 'count': 3, 'gender': 'female', 'probability': 1.0}, {'name': 'Nathan', 'count': 1632, 'gender': 'male', 'probability': 1.0}, {'name': 'Nathaniel', 'count': 234, 'gender': 'male', 'probability': 1.0}, {'name': 'Neda', 'count': 41, 'gender': 'female', 'probability': 0.98}, {'name': 'Neil', 'count': 1036, 'gender': 'male', 'probability': 1.0}, {'name': 'Nelson', 'count': 609, 'gender': 'male', 'probability': 0.99}, {'name': 'Ngan', 'count': 27, 'gender': 'female', 'probability': 0.93}, {'name': 'Nicholas', 'count': 1176, 'gender': 'male', 'probability': 1.0}, {'name': 'Nick', 'count': 3326, 'gender': 'male', 'probability': 0.99}, {'name': 'Niclas', 'count': 120, 'gender': 'male', 'probability': 0.99}, {'name': 'Nicola', 'count': 1226, 'gender': 'female', 'probability': 0.71}, {'name': 'Nicolas', 'count': 1953, 'gender': 'male', 'probability': 1.0}, {'name': 'Nicole', 'count': 4042, 'gender': 'female', 'probability': 1.0}, {'name': 'Nicoló', 'count': 58, 'gender': 'male', 'probability': 1.0}, {'name': 'Nikhil', 'count': 154, 'gender': 'male', 'probability': 1.0}, {'name': 'Niko', 'count': 183, 'gender': 'male', 'probability': 0.93}, {'name': 'Nils', 'count': 149, 'gender': 'male', 'probability': 0.99}, {'name': 'Noah', 'count': 309, 'gender': 'male', 'probability': 0.98}, {'name': 'Noam', 'count': 41, 'gender': 'male', 'probability': 0.8}, {'name': 'Nobal', 'gender': None}, {'name': 'Nolan', 'count': 95, 'gender': 'male', 'probability': 1.0}, {'name': 'Nur', 'count': 308, 'gender': 'female', 'probability': 0.89}, {'name': 'Nuria', 'count': 288, 'gender': 'female', 'probability': 1.0}, {'name': 'Nuwan', 'count': 12, 'gender': 'male', 'probability': 1.0}, {'name': 'Nyasha', 'count': 3, 'gender': 'male', 'probability': 1.0}, {'name': 'Oksana', 'count': 68, 'gender': 'female', 'probability': 1.0}, {'name': 'Ola', 'count': 300, 'gender': 'female', 'probability': 0.52}, {'name': 'Ole', 'count': 160, 'gender': 'male', 'probability': 0.99}, {'name': 'Oliver', 'count': 801, 'gender': 'male', 'probability': 1.0}, {'name': 'Olivier', 'count': 715, 'gender': 'male', 'probability': 0.99}, {'name': 'Omar', 'count': 1471, 'gender': 'male', 'probability': 1.0}, {'name': 'Osamu', 'count': 3, 'gender': 'male', 'probability': 1.0}, {'name': 'Owen', 'count': 240, 'gender': 'male', 'probability': 0.99}, {'name': 'Pablo', 'count': 1653, 'gender': 'male', 'probability': 1.0}, {'name': 'Patrick', 'count': 2877, 'gender': 'male', 'probability': 1.0}, {'name': 'Paul', 'count': 5931, 'gender': 'male', 'probability': 1.0}, {'name': 'Paula', 'count': 2298, 'gender': 'female', 'probability': 0.99}, {'name': 'Pavel', 'count': 217, 'gender': 'male', 'probability': 0.99}, {'name': 'Pavlo', 'count': 2, 'gender': 'male', 'probability': 1.0}, {'name': 'Pedro', 'count': 1631, 'gender': 'male', 'probability': 0.99}, {'name': 'Pekka', 'count': 57, 'gender': 'male', 'probability': 1.0}, {'name': 'Pelin', 'count': 104, 'gender': 'female', 'probability': 0.99}, {'name': 'Perry', 'count': 154, 'gender': 'male', 'probability': 0.94}, {'name': 'Peter', 'count': 4373, 'gender': 'male', 'probability': 1.0}, {'name': 'Philip', 'count': 1097, 'gender': 'male', 'probability': 1.0}, {'name': 'Philipp', 'count': 205, 'gender': 'male', 'probability': 1.0}, {'name': 'Philippe', 'count': 605, 'gender': 'male', 'probability': 0.99}, {'name': 'Phillip', 'count': 463, 'gender': 'male', 'probability': 1.0}, {'name': 'Piero', 'count': 104, 'gender': 'male', 'probability': 0.99}, {'name': 'Pierre', 'count': 852, 'gender': 'male', 'probability': 0.99}, {'name': 'Pierre-Yves', 'count': 23, 'gender': 'male', 'probability': 1.0}, {'name': 'Piotr', 'count': 176, 'gender': 'male', 'probability': 1.0}, {'name': 'Pontus', 'count': 76, 'gender': 'male', 'probability': 0.99}, {'name': 'Poul', 'count': 30, 'gender': 'male', 'probability': 1.0}, {'name': 'Pratap', 'count': 5, 'gender': 'male', 'probability': 1.0}, {'name': 'Preeti', 'count': 73, 'gender': 'female', 'probability': 0.84}, {'name': 'Punita', 'gender': None}, {'name': 'Qing', 'count': 27, 'gender': 'female', 'probability': 1.0}, {'name': 'Qingpeng', 'gender': None}, {'name': 'Qiyun', 'gender': None}, {'name': 'Quaid', 'count': 4, 'gender': 'male', 'probability': 1.0}, {'name': 'Quang', 'count': 23, 'gender': 'male', 'probability': 0.96}, {'name': 'Quanhu', 'gender': None}, {'name': 'Radek', 'count': 71, 'gender': 'male', 'probability': 0.97}, {'name': 'Rafael', 'count': 1245, 'gender': 'male', 'probability': 1.0}, {'name': 'Rainer', 'count': 84, 'gender': 'male', 'probability': 0.99}, {'name': 'Rajiv', 'count': 60, 'gender': 'male', 'probability': 1.0}, {'name': 'Rakesh', 'count': 191, 'gender': 'male', 'probability': 1.0}, {'name': 'Ralf', 'count': 86, 'gender': 'male', 'probability': 1.0}, {'name': 'Ramamurthy', 'gender': None}, {'name': 'Ramil', 'count': 11, 'gender': 'male', 'probability': 1.0}, {'name': 'Raoul', 'count': 70, 'gender': 'male', 'probability': 1.0}, {'name': 'Raphael', 'count': 430, 'gender': 'male', 'probability': 1.0}, {'name': 'Raquel', 'count': 878, 'gender': 'female', 'probability': 1.0}, {'name': 'Rasmus', 'count': 171, 'gender': 'male', 'probability': 0.99}, {'name': 'Ravindra', 'count': 55, 'gender': 'male', 'probability': 1.0}, {'name': 'Raymond', 'count': 770, 'gender': 'male', 'probability': 1.0}, {'name': 'Rebecca', 'count': 3188, 'gender': 'female', 'probability': 1.0}, {'name': 'Reda', 'count': 120, 'gender': 'male', 'probability': 0.77}, {'name': 'René', 'count': 640, 'gender': 'male', 'probability': 0.85}, {'name': 'Richard', 'count': 4381, 'gender': 'male', 'probability': 1.0}, {'name': 'Rita', 'count': 1097, 'gender': 'female', 'probability': 1.0}, {'name': 'Rob', 'count': 2262, 'gender': 'male', 'probability': 0.99}, {'name': 'Robert', 'count': 5245, 'gender': 'male', 'probability': 1.0}, {'name': 'Robyn', 'count': 641, 'gender': 'female', 'probability': 0.97}, {'name': 'Roderic', 'count': 6, 'gender': 'male', 'probability': 1.0}, {'name': 'Roderick', 'count': 76, 'gender': 'male', 'probability': 1.0}, {'name': 'Rodrigo', 'count': 1277, 'gender': 'male', 'probability': 1.0}, {'name': 'Roger', 'count': 1220, 'gender': 'male', 'probability': 1.0}, {'name': 'Roman', 'count': 449, 'gender': 'male', 'probability': 1.0}, {'name': 'Rong', 'count': 26, 'gender': 'female', 'probability': 0.92}, {'name': 'Rongxia', 'gender': None}, {'name': 'Ross', 'count': 692, 'gender': 'male', 'probability': 0.95}, {'name': 'Ruchi', 'count': 50, 'gender': 'female', 'probability': 1.0}, {'name': 'Ruping', 'gender': None}, {'name': 'Russ', 'count': 246, 'gender': 'male', 'probability': 1.0}, {'name': 'Rutger', 'count': 26, 'gender': 'male', 'probability': 1.0}, {'name': 'Ryan', 'count': 4755, 'gender': 'male', 'probability': 0.99}, {'name': 'Sadique', 'count': 2, 'gender': 'male', 'probability': 1.0}, {'name': 'Sahand', 'count': 1, 'gender': 'male', 'probability': 1.0}, {'name': 'Sam', 'count': 3336, 'gender': 'male', 'probability': 0.76}, {'name': 'Sampo', 'count': 4, 'gender': 'male', 'probability': 1.0}, {'name': 'Samuel', 'count': 1411, 'gender': 'male', 'probability': 1.0}, {'name': 'Sandeep', 'count': 272, 'gender': 'male', 'probability': 0.89}, {'name': 'Sanghyuk', 'count': 1, 'gender': 'male', 'probability': 1.0}, {'name': 'Sangwoo', 'gender': None}, {'name': 'Santiago', 'count': 567, 'gender': 'male', 'probability': 1.0}, {'name': 'Sarah', 'count': 8371, 'gender': 'female', 'probability': 1.0}, {'name': 'Sarven', 'gender': None}, {'name': 'Sashank', 'count': 2, 'gender': 'male', 'probability': 1.0}, {'name': 'Satoru', 'count': 4, 'gender': 'male', 'probability': 1.0}, {'name': 'Scott', 'count': 3450, 'gender': 'male', 'probability': 1.0}, {'name': 'Sean', 'count': 2515, 'gender': 'male', 'probability': 1.0}, {'name': 'Sebastian', 'count': 1459, 'gender': 'male', 'probability': 1.0}, {'name': 'See-Kiong', 'gender': None}, {'name': 'Seishi', 'gender': None}, {'name': 'Seiya', 'count': 5, 'gender': 'male', 'probability': 1.0}, {'name': 'Serafim', 'count': 4, 'gender': 'male', 'probability': 1.0}, {'name': 'Sergei', 'count': 38, 'gender': 'male', 'probability': 0.97}, {'name': 'Sergey', 'count': 117, 'gender': 'male', 'probability': 0.99}, {'name': 'Seyed', 'count': 8, 'gender': 'male', 'probability': 1.0}, {'name': 'Shankar', 'count': 35, 'gender': 'male', 'probability': 1.0}, {'name': 'Shibu', 'count': 7, 'gender': 'male', 'probability': 1.0}, {'name': 'Shilin', 'gender': None}, {'name': 'Shintaro', 'count': 7, 'gender': 'male', 'probability': 1.0}, {'name': 'Sikander', 'count': 5, 'gender': 'male', 'probability': 1.0}, {'name': 'Silvio', 'count': 183, 'gender': 'male', 'probability': 1.0}, {'name': 'Simon', 'count': 2444, 'gender': 'male', 'probability': 0.99}, {'name': 'Simone', 'count': 1086, 'gender': 'female', 'probability': 0.55}, {'name': 'Sirko', 'gender': None}, {'name': 'Sitanshu', 'count': 1, 'gender': 'male', 'probability': 1.0}, {'name': 'Sofie', 'count': 299, 'gender': 'female', 'probability': 1.0}, {'name': 'Sohan', 'count': 7, 'gender': 'male', 'probability': 1.0}, {'name': 'Soile', 'count': 11, 'gender': 'female', 'probability': 0.91}, {'name': 'Sol', 'count': 286, 'gender': 'female', 'probability': 0.89}, {'name': 'Solon', 'count': 6, 'gender': 'male', 'probability': 1.0}, {'name': 'Son', 'count': 37, 'gender': 'male', 'probability': 0.89}, {'name': 'Sonia', 'count': 1555, 'gender': 'female', 'probability': 1.0}, {'name': 'Sophia', 'count': 473, 'gender': 'female', 'probability': 0.99}, {'name': 'Sophie', 'count': 1836, 'gender': 'female', 'probability': 1.0}, {'name': 'Spencer', 'count': 334, 'gender': 'male', 'probability': 0.99}, {'name': 'Stefan', 'count': 923, 'gender': 'male', 'probability': 1.0}, {'name': 'Stefanie', 'count': 466, 'gender': 'female', 'probability': 1.0}, {'name': 'Stefano', 'count': 732, 'gender': 'male', 'probability': 1.0}, {'name': 'Steffen', 'count': 120, 'gender': 'male', 'probability': 0.98}, {'name': 'Sten', 'count': 57, 'gender': 'male', 'probability': 1.0}, {'name': 'Stephan', 'count': 284, 'gender': 'male', 'probability': 1.0}, {'name': 'Stephen', 'count': 2608, 'gender': 'male', 'probability': 1.0}, {'name': 'Steve', 'count': 3965, 'gender': 'male', 'probability': 1.0}, {'name': 'Steven', 'count': 2600, 'gender': 'male', 'probability': 1.0}, {'name': 'Stuart', 'count': 759, 'gender': 'male', 'probability': 0.99}, {'name': 'Sujai', 'count': 1, 'gender': 'male', 'probability': 1.0}, {'name': 'Susan', 'count': 3203, 'gender': 'female', 'probability': 1.0}, {'name': 'Susana', 'count': 729, 'gender': 'female', 'probability': 1.0}, {'name': 'Susanna-Assunta', 'gender': None}, {'name': 'Sven', 'count': 253, 'gender': 'male', 'probability': 0.97}, {'name': 'Sébastien', 'count': 801, 'gender': 'male', 'probability': 1.0}, {'name': 'Søren', 'count': 148, 'gender': 'male', 'probability': 1.0}, {'name': 'Tallulah', 'count': 4, 'gender': 'female', 'probability': 1.0}, {'name': 'Tamara', 'count': 999, 'gender': 'female', 'probability': 1.0}, {'name': 'Tamil', 'count': 5, 'gender': 'female', 'probability': 0.6}, {'name': 'Tanja', 'count': 365, 'gender': 'female', 'probability': 0.99}, {'name': 'Tao', 'count': 27, 'gender': 'male', 'probability': 0.7}, {'name': 'Tatyana', 'count': 78, 'gender': 'female', 'probability': 1.0}, {'name': 'Terry', 'count': 1429, 'gender': 'male', 'probability': 0.78}, {'name': 'Theo', 'count': 219, 'gender': 'male', 'probability': 0.97}, {'name': 'Theodore', 'count': 46, 'gender': 'male', 'probability': 0.96}, {'name': 'Thilo', 'count': 11, 'gender': 'male', 'probability': 1.0}, {'name': 'Thomas', 'count': 3753, 'gender': 'male', 'probability': 1.0}, {'name': 'Tiejun', 'gender': None}, {'name': 'Till', 'count': 25, 'gender': 'male', 'probability': 0.92}, {'name': 'Tim', 'count': 2949, 'gender': 'male', 'probability': 0.99}, {'name': 'Timothy', 'count': 809, 'gender': 'male', 'probability': 1.0}, {'name': 'Tobias', 'count': 352, 'gender': 'male', 'probability': 1.0}, {'name': 'Todd', 'count': 847, 'gender': 'male', 'probability': 1.0}, {'name': 'Tom', 'count': 3736, 'gender': 'male', 'probability': 1.0}, {'name': 'Tomasz', 'count': 122, 'gender': 'male', 'probability': 1.0}, {'name': 'Tomi', 'count': 106, 'gender': 'male', 'probability': 0.92}, {'name': 'Tony', 'count': 3071, 'gender': 'male', 'probability': 1.0}, {'name': 'Tracy', 'count': 1709, 'gender': 'female', 'probability': 0.95}, {'name': 'Travers', 'count': 9, 'gender': 'male', 'probability': 1.0}, {'name': 'Trevor', 'count': 758, 'gender': 'male', 'probability': 1.0}, {'name': 'Trisha', 'count': 253, 'gender': 'female', 'probability': 1.0}, {'name': 'Tyler', 'count': 1497, 'gender': 'male', 'probability': 0.98}, {'name': 'Tzung-Fu', 'gender': None}, {'name': 'Tālis', 'count': 2, 'gender': 'female', 'probability': 1.0}, {'name': 'Ueli', 'count': 3, 'gender': 'male', 'probability': 1.0}, {'name': 'Ugis', 'count': 1, 'gender': 'male', 'probability': 1.0}, {'name': 'Ulf', 'count': 71, 'gender': 'male', 'probability': 1.0}, {'name': 'Ulrich', 'count': 40, 'gender': 'male', 'probability': 1.0}, {'name': 'Valentin', 'count': 299, 'gender': 'male', 'probability': 1.0}, {'name': 'Vassilios', 'count': 3, 'gender': 'male', 'probability': 1.0}, {'name': 'Vassily', 'count': 1, 'gender': 'male', 'probability': 1.0}, {'name': 'Vicente', 'count': 267, 'gender': 'male', 'probability': 1.0}, {'name': 'Victoria', 'count': 1879, 'gender': 'female', 'probability': 1.0}, {'name': 'Vijay', 'count': 319, 'gender': 'male', 'probability': 0.98}, {'name': 'Vinay', 'count': 217, 'gender': 'male', 'probability': 1.0}, {'name': 'Vincent', 'count': 1424, 'gender': 'male', 'probability': 1.0}, {'name': 'Vineet', 'count': 60, 'gender': 'male', 'probability': 0.95}, {'name': 'Vinhthuy', 'gender': None}, {'name': 'Virginie', 'count': 437, 'gender': 'female', 'probability': 1.0}, {'name': 'Virpi', 'count': 17, 'gender': 'female', 'probability': 1.0}, {'name': 'Volkhard', 'gender': None}, {'name': 'Volodymyr', 'count': 3, 'gender': 'male', 'probability': 1.0}, {'name': 'Wan-Ping', 'gender': None}, {'name': 'Waraluk', 'gender': None}, {'name': 'Wei', 'count': 203, 'gender': 'female', 'probability': 0.54}, {'name': 'Weidong', 'count': 3, 'gender': 'male', 'probability': 1.0}, {'name': 'Weifan', 'gender': None}, {'name': 'Wen', 'count': 73, 'gender': 'female', 'probability': 0.77}, {'name': 'Werner', 'count': 80, 'gender': 'male', 'probability': 1.0}, {'name': 'Whitney', 'count': 497, 'gender': 'female', 'probability': 0.99}, {'name': 'William', 'count': 2626, 'gender': 'male', 'probability': 1.0}, {'name': 'Winfried', 'count': 6, 'gender': 'male', 'probability': 1.0}, {'name': 'Wing', 'count': 104, 'gender': 'female', 'probability': 0.74}, {'name': 'Wojciech', 'count': 30, 'gender': 'male', 'probability': 1.0}, {'name': 'Wolfgang', 'count': 95, 'gender': 'male', 'probability': 1.0}, {'name': 'Wubin', 'gender': None}, {'name': 'Xavier', 'count': 686, 'gender': 'male', 'probability': 0.99}, {'name': 'Xiahan', 'gender': None}, {'name': 'Xiao', 'count': 102, 'gender': 'female', 'probability': 0.74}, {'name': 'Xiao-Li', 'gender': None}, {'name': 'Xiaohui', 'count': 3, 'gender': 'female', 'probability': 0.67}, {'name': 'Xiaowu', 'gender': None}, {'name': 'Xin', 'count': 35, 'gender': 'female', 'probability': 0.63}, {'name': 'Xuebin', 'count': 2, 'gender': 'female', 'probability': 1.0}, {'name': "Ya'ara", 'gender': None}, {'name': 'Yadong', 'gender': None}, {'name': 'Yan', 'count': 203, 'gender': 'male', 'probability': 0.51}, {'name': 'Yanbo', 'gender': None}, {'name': 'Yang', 'count': 93, 'gender': 'male', 'probability': 0.65}, {'name': 'Yanli', 'gender': None}, {'name': 'Yanni', 'count': 24, 'gender': 'female', 'probability': 0.67}, {'name': 'Yannick', 'count': 282, 'gender': 'male', 'probability': 0.99}, {'name': 'Yardena', 'gender': None}, {'name': 'Yelena', 'count': 25, 'gender': 'female', 'probability': 1.0}, {'name': 'Yi', 'count': 68, 'gender': 'female', 'probability': 0.56}, {'name': 'Yi-Bao', 'gender': None}, {'name': 'Yinan', 'count': 2, 'gender': 'female', 'probability': 1.0}, {'name': 'Ying-Wooi', 'gender': None}, {'name': 'Yong-Huan', 'gender': None}, {'name': 'Yongchao', 'gender': None}, {'name': 'Yongkyu', 'gender': None}, {'name': 'Yotsawat', 'gender': None}, {'name': 'Youri', 'count': 20, 'gender': 'male', 'probability': 1.0}, {'name': 'Yu', 'count': 110, 'gender': 'female', 'probability': 0.53}, {'name': 'Yu-Jung', 'count': 1, 'gender': 'female', 'probability': 1.0}, {'name': 'Yuanfeng', 'gender': None}, {'name': 'Yuichi', 'count': 8, 'gender': 'male', 'probability': 1.0}, {'name': 'Yungang', 'gender': None}, {'name': 'Yurong', 'gender': None}, {'name': 'Yusuke', 'count': 28, 'gender': 'male', 'probability': 1.0}, {'name': 'Yves', 'count': 265, 'gender': 'male', 'probability': 0.98}, {'name': 'Zalman', 'count': 1, 'gender': 'male', 'probability': 1.0}, {'name': 'Zasha', 'count': 10, 'gender': 'female', 'probability': 1.0}, {'name': 'Zhandong', 'gender': None}, {'name': 'Zhe', 'count': 7, 'gender': 'male', 'probability': 0.86}, {'name': 'Zheng', 'count': 30, 'gender': 'male', 'probability': 0.87}, {'name': 'Zhengdeng', 'gender': None}, {'name': 'Zhi', 'count': 26, 'gender': 'male', 'probability': 0.73}, {'name': 'Zhi-Min', 'gender': None}, {'name': 'Zhiping', 'gender': None}, {'name': 'Zhixing', 'gender': None}, {'name': 'Zhong', 'count': 4, 'gender': 'male', 'probability': 0.75}, {'name': 'Zsuzsanna', 'count': 21, 'gender': 'female', 'probability': 1.0}]

In [104]:
genders_fixed = {}
for name in genders_save:
    try:
        count = name['count']
        probability = name['probability']
    except KeyError:
        count = None
        probability = None
    genders_fixed[name['name']] = {'count':count, 'gender':name['gender'], 'probability':probability}

no_gender = []
for name in genders_fixed:
    if not genders_fixed[name]['gender']:
        no_gender.append(name)

no_gender.sort()
print(len(no_gender))
print(no_gender)


98
['Abhaya', 'Adedapo', 'Ajasja', 'Anna-Sapfo', 'Ayton', 'Basuthkar', 'Beifang', 'Bortolo', 'Chenggang', 'Chengkun', 'Chien-Chih', 'Chiranjib', 'Chuen-Liang', 'Colbert', 'Devdatt', 'Dimitrios-Georgios', 'Dingcheng', 'Dongjun', 'Dongqing', 'Ence', 'Ergude', 'Fei-Yang', 'Fumihide', 'Guo-Cheng', 'Haibao', 'Hailiang', 'Hidetoshi', 'Hongfang', 'Hongjie', 'Hua-Lin', 'Hyeshik', 'Insu', 'Jagir', 'Jan-Ming', 'Jiantao', 'Jianxin', 'Jikai', 'Jingde', 'Jotun', "Jun'ichi", 'Junfang', 'Kazuharu', 'Kensaku', 'Koh-Ichiro', 'Koh-ichiro', 'Kunyaluk', 'Kyowon', 'Liyang', 'Marghoob', 'Mengjun', 'Mingwen', 'Minxian', 'Nansheng', 'Nobal', 'Punita', 'Qingpeng', 'Qiyun', 'Quanhu', 'Ramamurthy', 'Rongxia', 'Ruping', 'Sangwoo', 'Sarven', 'See-Kiong', 'Seishi', 'Shilin', 'Sirko', 'Susanna-Assunta', 'Tiejun', 'Tzung-Fu', 'Vinhthuy', 'Volkhard', 'Wan-Ping', 'Waraluk', 'Weifan', 'Wubin', 'Xiahan', 'Xiao-Li', 'Xiaowu', "Ya'ara", 'Yadong', 'Yanbo', 'Yanli', 'Yardena', 'Yi-Bao', 'Ying-Wooi', 'Yong-Huan', 'Yongchao', 

... For these names that didn't have genders in genderize.io, I tried [GenderAPI](https://gender-api.com)

In [146]:
import json
from urllib.request import urlopen

key = "insert your key here"

In [None]:
for name in no_gender:
    url = "https://gender-api.com/get?key=" + key + "&name={}".format(name.lower())
    response = urlopen(url)
    decoded = response.read().decode('utf-8')
    data = json.loads(decoded)
    print(data)
    genders_fixed[name] = {'count':data['samples'], 'probability':(float(data['accuracy'])/100.0), 'gender':data['gender']}

In [147]:
print(genders_fixed['Abhaya'])

{'probability': 0.7, 'gender': 'male', 'count': 119}


In [151]:
no_gender = []
for key in genders_fixed:
    gender = genders_fixed[key]['gender']
    if not gender or gender == "unknown":
        no_gender.append(key)

no_gender.sort()
print(len(no_gender))
print(no_gender)

37
['Ajasja', 'Anna-Sapfo', 'Basuthkar', 'Chien-Chih', 'Chuen-Liang', 'Dimitrios-Georgios', 'Ergude', 'Fei-Yang', 'Fumihide', 'Guo-Cheng', 'Hongfang', 'Hua-Lin', 'Hyeshik', 'Jan-Ming', "Jun'ichi", 'Koh-Ichiro', 'Koh-ichiro', 'Kunyaluk', 'Mingwen', 'Minxian', 'Nansheng', 'Quanhu', 'See-Kiong', 'Susanna-Assunta', 'Tzung-Fu', 'Vinhthuy', 'Wan-Ping', 'Xiahan', 'Xiao-Li', "Ya'ara", 'Yi-Bao', 'Ying-Wooi', 'Yong-Huan', 'Yotsawat', 'Zhandong', 'Zhengdeng', 'Zhi-Min']


With only 37 names, I will try to enter genders manually by doing a google image search...

In [170]:
manual_genders = {
    'Ajasja': 'male',
    'Anna-Sapfo': 'female',
    'Basuthkar': 'androgenous',
    'Chien-Chih': 'male',
    'Chuen-Liang': 'male',
    'Dimitrios-Georgios': 'male',
    'Ergude': 'female',
    'Fei-Yang': 'androgenous',
    'Fumihide': 'male',
    'Guo-Cheng': 'male',
    'Hongfang': 'androgenous',
    'Hua-Lin': 'female',
    'Hyeshik': 'male',
    'Jan-Ming': 'male',
    "Jun'ichi": 'male',
    'Koh-Ichiro': 'male',
    'Koh-ichiro': 'male',
    'Kunyaluk': 'female',
    'Mingwen': 'female',
    'Minxian': 'androgenous',
    'Nansheng': 'female',
    'Quanhu': 'unknown',
    'Rogier': 'male',
    'See-Kiong': 'male',
    'Susanna-Assunta': 'female',
    'Tzung-Fu': 'male',
    'Vinhthuy': 'male',
    'Wan-Ping': 'female',
    'Xiahan': 'androgenous',
    'Xiao-Li': 'androgenous',
    "Ya'ara": 'female',
    'Yi-Bao': 'male',
    'Ying-Wooi': 'androgenous',
    'Yong-Huan': 'male',
    'Yotsawat': 'male',
    'Zhandong': 'male',
    'Zhengdeng': 'male',
    'Zhi-Min': 'androgenous'}

## Getting Genders - Stats

Now it's time to start working on stats. Best to get gender assignments into columns along with names. Going to generate csv files with `pandas` in order to do analysis elsewhere. Using similar code from [Data Frame Generation](xml_parsing.ipynb#DataFrame-generation)

In [162]:
def get_gender_probability(name):
    """
    Requires `genders_fixed` and `manual_genders` from above
    """
    try:
        stats = genders_fixed[name]
        if stats['gender'] == 'male':
            p_male = stats['probability']
        elif stats['gender'] == 'female':
            p_male = 1 - stats['probability']
        else:
            p_male = None
    except KeyError:
        gender = manual_genders[name]
        if gender == 'male':
            p_male = 1.0
        elif gender == 'female':
            p_male = 0
        else:
            p_male = None
    
    return name, p_male

def make_csv_from_xml(xml_file, name):
    col_names = ["Date", "Journal", 
                 "First Author", "Last Author", "Second Author", "Penultimate Author", "Other Authors"]
    df = pd.DataFrame()

    for article in parse_pubmed_xml(xml_file):
        first, last, second, penultimate, others = score_authors(article.authors)
        fa = get_gender_probability(first.first_name)
        if last:
            la = get_gender_probability(last.first_name)
        else:
            la = None
        if second:
            sa = get_gender_probability(second.first_name)
        else:
            sa = None
        if penultimate:
            pa = get_gender_probability(penultimate.first_name)
        else:
            pa = None
        if others:
            oa = [get_gender_probability(o.first_name) for o in others]
        else:
            oa = []
        
        row = pd.Series([article.pubdate, article.journal, fa, la, sa, pa, oa],
                        name=article.pmid, index=col_names)
        df = df.append(row)

    df.to_csv(name)
    


In [171]:
make_csv_from_xml('github_pubs.xml', 'github_pubs.csv')

KeyError: 'Ruslan'