# Comparison

By William Conley

---

This file compares the proportion of men to women on the sunshine list in 1996 and in 2020, using name-gender data joined to both data frames.

In [53]:
#importing used libraries
import csv
import re
from functools import reduce
import numpy as np
import pandas as pd

# this aux function reads the CSV file and returns its rows as a list of dictionaries
def get_data_csv(link):
    collection = []
    # newline='' is the setting the csv module recommends for files it reads
    with open(link, 'r', newline='') as f:
        for line in csv.DictReader(f):
            collection.append(line)
    return collection

sunshine_2020 = pd.DataFrame(get_data_csv("sunshine2020.csv"))
sunshine_1996 = pd.DataFrame(get_data_csv("sunshine1996.csv"))
name_gender = pd.DataFrame(get_data_csv("name_gender.csv"))

# Add the gender and probability columns to each row depending on the name
sunshine_1996 = sunshine_1996.merge(name_gender, left_on="First Name", right_on="name", how="left")
sunshine_1996.drop(columns=["name"], inplace=True)
sunshine_2020 = sunshine_2020.merge(name_gender, left_on="First name", right_on="name", how="left")
sunshine_2020.drop(columns=["name"], inplace=True)

# Strip "$" and "," from the salary strings and convert them to floats
sunshine_1996["Salary Paid"] = sunshine_1996["Salary Paid"].replace(r'[\$,]', '', regex=True).astype(float)
sunshine_2020["Salary paid"] = sunshine_2020["Salary paid"].replace(r'[\$,]', '', regex=True).astype(float)
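
Because the merges above use `how="left"`, any first name with no match in `name_gender` stays in the data frame with a missing `gender`. The proportions computed in the following cells only count rows labelled "M" or "F", so it may be worth checking how many rows were actually matched. The following is a minimal sketch on made-up data; the helper `gender_coverage` and the toy frames are hypothetical and not part of the original analysis.

```python
import pandas as pd

def gender_coverage(df, gender_col="gender"):
    """Return the fraction of rows whose gender label is "M" or "F"."""
    matched = df[gender_col].isin(["M", "F"])
    return matched.mean()

# Toy stand-ins for the real tables (made-up names, not taken from the data).
people = pd.DataFrame({"First Name": ["Alex", "Maria", "Zyx", "John"]})
names = pd.DataFrame({"name": ["Alex", "Maria", "John"],
                      "gender": ["M", "F", "M"]})

# Same join pattern as the cell above: unmatched names keep a missing gender.
merged = people.merge(names, left_on="First Name", right_on="name", how="left")
merged = merged.drop(columns=["name"])

coverage = gender_coverage(merged)
print(f"{coverage:.0%} of rows have an inferred gender")
```

Applied to the real frames, a call such as `gender_coverage(sunshine_2020)` would show how much of the list the reported percentages actually describe.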

In [42]:
n_men = sum(sunshine_1996["gender"] == "M")
n_women = sum(sunshine_1996["gender"] == "F")
p_men = n_men / (n_men + n_women) * 100
p_women = n_women / (n_men + n_women) * 100
print("In 1996 there were ", len(sunshine_1996.index), " men and women on this sunshine list")
print("Proportionally, ", np.round(p_men, 2), "% of the list were men, and ", np.round(p_women, 2), "% were women")

In 1996 there were  4501  men and women on this sunshine list
Proportionally,  77.31 % of the list were men, and  22.69 % were women


In [59]:
n_men = sum(sunshine_2020["gender"] == "M")
n_women = sum(sunshine_2020["gender"] == "F")
p_men = n_men / (n_men + n_women) * 100
p_women = n_women / (n_men + n_women) * 100
print("In 2020 there are ", len(sunshine_2020.index), " men and women on this sunshine list")
print("Proportionally, ", np.round(p_men, 2), "% of the list are men, and ", np.round(p_women, 2), "% are women")

In 2020 there are  205606  men and women on this sunshine list
Proportionally,  49.9 % of the list are men, and  50.1 % are women


Looking at the difference between these two sunshine lists, we can see that in 1996 there were more than three men for every woman on the list (about 3.4:1), while in 2020 the ratio is essentially even (about 1:1).
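
The ratios follow directly from the percentages printed above. As a small worked sketch (the function name `men_per_woman` is only illustrative), dividing the share of men by the share of women gives the number of men per woman:

```python
def men_per_woman(pct_men, pct_women):
    """Convert two percentage shares into a men-to-women ratio."""
    return pct_men / pct_women

# Shares copied from the printed outputs (22.69 is the 1996 figure rounded).
ratio_1996 = men_per_woman(77.31, 22.69)
ratio_2020 = men_per_woman(49.9, 50.1)

print(f"1996: {ratio_1996:.2f} men per woman")  # 1996: 3.41 men per woman
print(f"2020: {ratio_2020:.2f} men per woman")  # 2020: 1.00 men per woman
```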

In [78]:
sal_men = sum(sunshine_1996[sunshine_1996["gender"] == "M"]["Salary Paid"])
sal_women = sum(sunshine_1996[sunshine_1996["gender"] == "F"]["Salary Paid"])
n_men = sum(sunshine_1996["gender"] == "M")
n_women = sum(sunshine_1996["gender"] == "F")
p_men = sal_men / (sal_men + sal_women) * 100
p_women = sal_women / (sal_men + sal_women) * 100
print("In 1996 there was $", sum(sunshine_1996["Salary Paid"]), " in total salaries paid on the sunshine list")
print("Proportionally, ", np.round(p_men, 2), "% of the money was made by men, and ", np.round(p_women, 2), "%  of the money was made by women")
print("On average, men made $", np.round(sal_men / n_men, 2), " while on average women made $", np.round(sal_women / n_women, 2))

In 1996 there was $ 546849796.3099989  in total salaries paid on the sunshine list
Proportionally,  77.58 % of the money was made by men, and  22.42 %  of the money was made by women
On average, men made $ 123616.49  while on average women made $ 121702.74


In [76]:
sal_men = sum(sunshine_2020[sunshine_2020["gender"] == "M"]["Salary paid"])
sal_women = sum(sunshine_2020[sunshine_2020["gender"] == "F"]["Salary paid"])
n_men = sum(sunshine_2020["gender"] == "M")
n_women = sum(sunshine_2020["gender"] == "F")
p_men = sal_men / (sal_men + sal_women) * 100
p_women = sal_women / (sal_men + sal_women) * 100
total_money = sum(sunshine_2020["Salary paid"])
print("In 2020 there was $", total_money, " in total salaries paid on the sunshine list")
print("Proportionally, ", np.round(p_men, 2), "% of the money was made by men, and ", np.round(p_women, 2), "%  of the money was made by women")
print("On average, men made $", np.round(sal_men / n_men, 2), " while on average women made $", np.round(sal_women / n_women, 2))

In 2020 there was $ 25879686837.354095  in total salaries paid on the sunshine list
Proportionally,  51.81 % of the money was made by men, and  48.19 %  of the money was made by women
On average, men made $ 130320.7  while on average women made $ 120737.52
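
The four cells above repeat the same counting and summing steps for each year. One possible way to get the same figures in a single pass is a `groupby` summary. This is a sketch under assumptions, not the method used above: the helper `summarize_by_gender` is hypothetical, and the toy frame holds made-up salaries.

```python
import pandas as pd

def summarize_by_gender(df, salary_col, gender_col="gender"):
    """Count, total, mean and percentage shares of salary per gender label."""
    # Keep only rows with a known gender, as the cells above do implicitly.
    known = df[df[gender_col].isin(["M", "F"])]
    summary = known.groupby(gender_col)[salary_col].agg(["count", "sum", "mean"])
    summary["share_of_people_pct"] = summary["count"] / summary["count"].sum() * 100
    summary["share_of_salary_pct"] = summary["sum"] / summary["sum"].sum() * 100
    return summary

# Made-up rows; the last one has no inferred gender and is excluded.
toy = pd.DataFrame({
    "gender": ["M", "M", "F", None],
    "Salary paid": [120000.0, 140000.0, 110000.0, 150000.0],
})

summary = summarize_by_gender(toy, "Salary paid")
print(summary.round(2))
```

With the real data, `summarize_by_gender(sunshine_1996, "Salary Paid")` and `summarize_by_gender(sunshine_2020, "Salary paid")` would be the corresponding calls; note the different capitalization of the salary column in the two files.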


Although women are now represented on the sunshine list in nearly equal proportion to men, that parity in representation still comes with a gap in average pay of roughly $9,600 in 2020 (about $130,321 for men versus $120,738 for women).
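
The size of the gap can be checked from the averages printed in the two salary cells. The short sketch below uses only those reported figures; the function name `pay_gap` is illustrative.

```python
# Average salaries copied from the printed outputs above.
averages = {
    1996: {"men": 123616.49, "women": 121702.74},
    2020: {"men": 130320.70, "women": 120737.52},
}

def pay_gap(men_avg, women_avg):
    """Return the gap in dollars and as a percentage of the men's average."""
    gap = men_avg - women_avg
    return gap, gap / men_avg * 100

gaps = {year: pay_gap(a["men"], a["women"]) for year, a in averages.items()}
for year, (gap, pct) in gaps.items():
    print(f"{year}: gap of ${gap:,.2f} ({pct:.1f}% of the men's average)")
# 1996: gap of $1,913.75 (1.5% of the men's average)
# 2020: gap of $9,583.18 (7.4% of the men's average)
```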