# Gender Prediction using Sound

GitHub Repository: https://github.com/skhiearth/Gender-Prediction-using-Sound

Analysing the gender distribution of children's book writers, using sound to match names to gender. Inspired by the DataCamp project of the same name by Tufool Alnuaimi.

We use the Python package Fuzzy to infer the genders of authors who have appeared on the New York Times Best Seller list for Children's Picture Books. Matching names by how they sound gives us a better way to match names than exact spelling. First, using fuzzy (sound-based) name matching, we look up the authors' first names in a dataset provided by the US Social Security Administration that contains the names and genders of all individuals who have applied for Social Security cards. Next, we augment the author dataset with the inferred gender. Finally, we use the new dataset to plot the gender distribution of children's picture book authors over time.
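
To make the idea of matching by sound concrete, here is a minimal pure-Python sketch of American Soundex, a simpler relative of the NYSIIS encoding used in this notebook. It is a stand-in for illustration only: the notebook itself relies on `fuzzy.nysiis`, and the `soundex` function below is a hypothetical helper, not part of the Fuzzy package.

```python
def soundex(name: str) -> str:
    """Encode a name with American Soundex (illustrative stand-in, not NYSIIS)."""
    # Map consonants to their Soundex digit; vowels, H, W and Y have no digit.
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""

    encoded = letters[0]                 # the first letter is kept as-is
    prev = codes.get(letters[0], "")     # its digit still suppresses a repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue                     # H and W do not separate equal digits
        digit = codes.get(ch, "")
        if digit and digit != prev:
            encoded += digit
        prev = digit                     # a vowel resets prev to ""
    return (encoded + "000")[:4]         # pad or truncate to four characters


# Differently spelled names that sound alike receive the same code.
for a, b in [("Stephen", "Steven"), ("John", "Jon")]:
    print(a, b, soundex(a) == soundex(b))
```

Comparing codes instead of raw strings is what lets spelling variants of the same name fall into one group; NYSIIS follows the same principle with a different set of rewriting rules.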

In [1]:
# Importing the required libraries
import fuzzy as fz
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
# Reading in dataset for the New York Times Best Seller list for Children's Picture books
author_df = pd.read_csv('datasets/nytkids_yearly.csv', sep=';')
author_df.head(3)

Unnamed: 0,Year,Book Title,Author,Besteller this year
0,2017,DRAGONS LOVE TACOS,Adam Rubin,49
1,2017,THE WONDERFUL THINGS YOU WILL BE,Emily Winfield Martin,48
2,2017,THE DAY THE CRAYONS QUIT,Drew Daywalt,44


In [3]:
# Extracting the authors' first names
first_name = [name.split()[0] for name in author_df['Author']]

# Adding first_name as a column to author_df
author_df['first_name'] = first_name
author_df.head(3)

Unnamed: 0,Year,Book Title,Author,Besteller this year,first_name
0,2017,DRAGONS LOVE TACOS,Adam Rubin,49,Adam
1,2017,THE WONDERFUL THINGS YOU WILL BE,Emily Winfield Martin,48,Emily
2,2017,THE DAY THE CRAYONS QUIT,Drew Daywalt,44,Drew
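
The expression `name.split()[0]` assumes every `Author` value is a non-empty string. It raises `IndexError` for an empty or whitespace-only string and `AttributeError` for a missing value that was read in as a float `NaN`. If the data might contain such rows, a defensive variant could look like the sketch below; `first_name_of` is a hypothetical helper name, not something from the original notebook.

```python
from typing import Optional


def first_name_of(author: object) -> Optional[str]:
    """Return the first whitespace-separated token of an author string, or None."""
    if not isinstance(author, str):
        return None  # e.g. a missing value read in as NaN
    parts = author.split()
    return parts[0] if parts else None  # None for empty or whitespace-only input


# Author strings taken from the table above, plus two problematic inputs.
for value in ["Adam Rubin", "Emily Winfield Martin", "   ", float("nan")]:
    print(repr(value), "->", first_name_of(value))
```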


In [4]:
# Extracting the nysiis equivalent of the authors' first name
nysiis_name = [fz.nysiis(name) for name in author_df['first_name']]

# Adding nysiis_name as a column to author_df
author_df['nysiis_name'] = nysiis_name
author_df.head(3)

Unnamed: 0,Year,Book Title,Author,Besteller this year,first_name,nysiis_name
0,2017,DRAGONS LOVE TACOS,Adam Rubin,49,Adam,ADAN
1,2017,THE WONDERFUL THINGS YOU WILL BE,Emily Winfield Martin,48,Emily,ENALY
2,2017,THE DAY THE CRAYONS QUIT,Drew Daywalt,44,Drew,DR


In [5]:
# Reading in dataset containing unique NYSIIS versions of baby names
babies_df = pd.read_csv('datasets/babynames_nysiis.csv', sep=';')
babies_df.head(3)

Unnamed: 0,babynysiis,perc_female,perc_male
0,,62.5,37.5
1,RAX,63.64,36.36
2,ESAR,44.44,55.56


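Here, `perc_female` and `perc_male` are the percentages of occurrences in which a NYSIIS name was recorded as female and as male, respectively. We use these values to assign a single gender to each NYSIIS name: `F` if the female percentage is larger, `M` if the male percentage is larger, and `N` if the two are equal.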

In [6]:
# Extracting the NYSIIS versions' definitive gender
gender = []
for perc_female, perc_male in zip(babies_df['perc_female'], babies_df['perc_male']):
    if perc_female > perc_male:
        gender.append('F')
    elif perc_female < perc_male:
        gender.append('M')
    else:
        gender.append('N')  # equal percentages: no majority gender

# Adding gender as a column to babies_df
babies_df['gender'] = gender
babies_df.head(3)

Unnamed: 0,babynysiis,perc_female,perc_male,gender
0,,62.5,37.5,F
1,RAX,63.64,36.36,F
2,ESAR,44.44,55.56,M
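
The assignment rule from the cell above can also be written as a small standalone function, which makes it easy to check in isolation. This is a sketch under the same rule as the notebook; `assign_gender` is a hypothetical name, and the `TIE` row is made-up data used only to exercise the equal-percentage branch.

```python
def assign_gender(perc_female: float, perc_male: float) -> str:
    """Return 'F', 'M', or 'N' using the majority rule from the notebook."""
    if perc_female > perc_male:
        return "F"
    if perc_female < perc_male:
        return "M"
    # Equal percentages land here; so do NaN inputs, because every
    # comparison involving NaN evaluates to False.
    return "N"


rows = [
    ("RAX", 63.64, 36.36),   # from the table above
    ("ESAR", 44.44, 55.56),  # from the table above
    ("TIE", 50.0, 50.0),     # hypothetical row for the equal case
]
for code, perc_female, perc_male in rows:
    print(code, assign_gender(perc_female, perc_male))
```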
