Given the shifting political landscape in the U.S. and ongoing changes to diversity-related curricula in higher education, we wanted to ask: **How does racial homogeneity impact the presence and quality of gender studies programs?**

To begin, we obtained a CSV file with information about universities and colleges from the [National Center for Education Statistics (NCES)](https://nces.ed.gov/), a U.S. government website.

In [None]:
import pandas as pd
from google.colab import drive
drive.mount('/content/drive')

df_colleges = pd.read_csv('/content/drive/My Drive/colleges.csv')
df_colleges = df_colleges.rename(columns={'Institution Name': 'Name', 'Total men': 'Men total', 'Total women': 'Women total'})

df_colleges.head()

Mounted at /content/drive


Unnamed: 0,UnitID,Name,State,Affiliation,Urbanization,Total,Men total,Women total,American Indian or Alaska Native total,Asian total,Black or African American total,Hispanic total,Native Hawaiian or Other Pacific Islander total,White total,Two or more races total
0,177834,A T Still University of Health Sciences,MO,3.0,33,,,,,,,,,,
1,180203,Aaniiih Nakoda College,MT,1.0,43,133.0,59.0,74.0,121.0,0.0,0.0,0.0,0.0,12.0,0.0
2,222178,Abilene Christian University,TX,4.0,12,3188.0,1279.0,1909.0,16.0,62.0,214.0,570.0,0.0,2000.0,142.0
3,497037,Abilene Christian University-Undergraduate Online,TX,4.0,21,918.0,146.0,772.0,3.0,13.0,179.0,217.0,0.0,413.0,31.0
4,138558,Abraham Baldwin Agricultural College,GA,1.0,32,3768.0,1489.0,2279.0,12.0,49.0,312.0,374.0,2.0,2936.0,52.0


In [None]:
values_df = pd.read_csv('/content/drive/My Drive/valuelabels.csv')
values_df.head()

Unnamed: 0,VariableName,Value,ValueLabel
0,Urbanization,11,City: Large
1,Urbanization,12,City: Midsize
2,Urbanization,13,City: Small
3,Urbanization,21,Suburb: Large
4,Urbanization,22,Suburb: Midsize


The numerical values of the 'Urbanization' column encode the degree of urbanization of each college's location. Broadly, a lower number signifies a more densely populated, urban area, while a higher number indicates a more rural, less populated one. Within a category, the second digit distinguishes subtypes: for example, 11 is a large city, 12 a midsize city, and 13 a small city.
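
As a rough illustration of how these codes can be read, the sketch below groups a code by its tens digit. The `LOCALE_TYPES` dictionary and the `locale_type` helper are hypothetical names introduced here, and the "Town" and "Rural" categories are an assumption based on the NCES locale classification, since the output above only shows the City and Suburb labels.

```python
# Assumption: the tens digit of an Urbanization code gives the broad locale type
# (1 = City, 2 = Suburb, 3 = Town, 4 = Rural), following the NCES locale scheme.
LOCALE_TYPES = {1: "City", 2: "Suburb", 3: "Town", 4: "Rural"}

def locale_type(code):
    """Return the broad locale type for a two-digit Urbanization code."""
    return LOCALE_TYPES.get(int(code) // 10, "Unknown")

# Codes taken from the first rows of the colleges table above.
examples = [12, 21, 32, 43]
labels = [locale_type(c) for c in examples]
print(labels)
```

In the notebook itself, the full labels could instead be looked up from `values_df`, in the same way the 'Affiliation' column is mapped.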

In [None]:
affiliation_mapping = dict(zip(values_df[values_df['VariableName'] == 'Affiliation']['Value'],
                               values_df[values_df['VariableName'] == 'Affiliation']['ValueLabel']))
df_colleges['Affiliation'] = df_colleges['Affiliation'].map(affiliation_mapping)

df_colleges.head()

Unnamed: 0,UnitID,Name,State,Affiliation,Urbanization,Total,Men total,Women total,American Indian or Alaska Native total,Asian total,Black or African American total,Hispanic total,Native Hawaiian or Other Pacific Islander total,White total,Two or more races total
0,177834,A T Still University of Health Sciences,MO,Private not-for-profit (no religious affiliation),33,,,,,,,,,,
1,180203,Aaniiih Nakoda College,MT,Public,43,133.0,59.0,74.0,121.0,0.0,0.0,0.0,0.0,12.0,0.0
2,222178,Abilene Christian University,TX,Private not-for-profit (religious affiliation),12,3188.0,1279.0,1909.0,16.0,62.0,214.0,570.0,0.0,2000.0,142.0
3,497037,Abilene Christian University-Undergraduate Online,TX,Private not-for-profit (religious affiliation),21,918.0,146.0,772.0,3.0,13.0,179.0,217.0,0.0,413.0,31.0
4,138558,Abraham Baldwin Agricultural College,GA,Public,32,3768.0,1489.0,2279.0,12.0,49.0,312.0,374.0,2.0,2936.0,52.0


In [None]:
# Adding new columns representing the percentages of each gender + racial group for each college
group_cols = ['Men', 'Women', 'American Indian or Alaska Native', 'Asian', 'Black or African American', 'Hispanic', 'Native Hawaiian or Other Pacific Islander', 'White', 'Two or more races']

for col in group_cols:
  df_colleges[col + ' %'] = ((df_colleges[col + ' total'] / df_colleges['Total']) * 100).round(1)

df_colleges.head()

Unnamed: 0,UnitID,Name,State,Affiliation,Urbanization,Total,Men total,Women total,American Indian or Alaska Native total,Asian total,...,Two or more races total,Men %,Women %,American Indian or Alaska Native %,Asian %,Black or African American %,Hispanic %,Native Hawaiian or Other Pacific Islander %,White %,Two or more races %
0,177834,A T Still University of Health Sciences,MO,Private not-for-profit (no religious affiliation),33,,,,,,...,,,,,,,,,,
1,180203,Aaniiih Nakoda College,MT,Public,43,133.0,59.0,74.0,121.0,0.0,...,0.0,44.4,55.6,91.0,0.0,0.0,0.0,0.0,9.0,0.0
2,222178,Abilene Christian University,TX,Private not-for-profit (religious affiliation),12,3188.0,1279.0,1909.0,16.0,62.0,...,142.0,40.1,59.9,0.5,1.9,6.7,17.9,0.0,62.7,4.5
3,497037,Abilene Christian University-Undergraduate Online,TX,Private not-for-profit (religious affiliation),21,918.0,146.0,772.0,3.0,13.0,...,31.0,15.9,84.1,0.3,1.4,19.5,23.6,0.0,45.0,3.4
4,138558,Abraham Baldwin Agricultural College,GA,Public,32,3768.0,1489.0,2279.0,12.0,49.0,...,52.0,39.5,60.5,0.3,1.3,8.3,9.9,0.1,77.9,1.4


How can we determine which schools are the most and least ***diverse***? We will analyze gender and racial demographic data to identify the 20 most and least diverse four-year colleges and universities in the United States.





We chose Simpson's Diversity Index (SDI) to measure the racial diversity of each university, then sorted the DataFrame from most to least diverse by SDI.
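
For reference, the form of the index computed in the next cell can be written as follows, where $x_i$ is the percentage of students in racial group $i$ and $k$ is the number of groups. Because the proportions are normalized by the sum of the listed groups rather than by 100, students outside these seven categories do not enter the calculation. The symbols $x_i$, $p_i$, and $k$ are notation introduced here for illustration.

$$
\text{SDI} = 1 - \sum_{i=1}^{k} p_i^2, \qquad p_i = \frac{x_i}{\sum_{j=1}^{k} x_j}
$$

As a worked example, Aaniiih Nakoda College in the table above has 91.0% of students in one group and 9.0% in another, with all other groups at 0:

$$
\text{SDI} = 1 - (0.91^2 + 0.09^2) = 1 - (0.8281 + 0.0081) = 0.1638
$$

A value near 0 means almost all students fall in a single group; values rise toward $1 - 1/k$ as the groups become more evenly represented.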

In [None]:
import numpy as np

def calculate_simpsons(data):
  proportions = data / np.sum(data)
  sdi = 1 - np.sum(proportions**2)
  return sdi

df_colleges['SDI'] = df_colleges[['American Indian or Alaska Native %', 'Asian %', 'Black or African American %',
                                  'Hispanic %', 'Native Hawaiian or Other Pacific Islander %', 'White %',
                                  'Two or more races %']].apply(calculate_simpsons, axis=1)

df_colleges.dropna(inplace=True)

df_diverse = df_colleges[(df_colleges['Total'] >= 1000) & (abs(df_colleges['Women %'] - df_colleges['Men %']) <= 15)]

df_diverse = df_diverse.sort_values(by='SDI', ascending=False)
df_diverse.head()

Unnamed: 0,UnitID,Name,State,Affiliation,Urbanization,Total,Men total,Women total,American Indian or Alaska Native total,Asian total,...,Men %,Women %,American Indian or Alaska Native %,Asian %,Black or African American %,Hispanic %,Native Hawaiian or Other Pacific Islander %,White %,Two or more races %,SDI
955,170286,Hillsdale College,MI,Private not-for-profit (religious affiliation),32,1688.0,838.0,850.0,0.0,0.0,...,49.6,50.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
1854,236513,Seattle Central College,WA,Public,11,5210.0,2242.0,2968.0,12.0,913.0,...,43.0,57.0,0.2,17.5,15.5,12.3,0.3,24.2,7.3,0.77627
91,168740,Andrews University,MI,Private not-for-profit (religious affiliation),31,1312.0,637.0,675.0,2.0,158.0,...,48.6,51.4,0.2,12.0,16.9,20.4,0.6,22.0,5.6,0.77448
1661,121345,Pomona College,CA,Private not-for-profit (no religious affiliation),21,1664.0,752.0,912.0,0.0,298.0,...,45.2,54.8,0.0,17.9,10.6,16.3,0.5,29.1,9.4,0.767334
1998,243744,Stanford University,CA,Private not-for-profit (no religious affiliation),21,8054.0,3900.0,4154.0,63.0,2175.0,...,48.4,51.6,0.8,27.0,8.1,18.5,0.2,23.8,9.3,0.767208
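
Hillsdale College's SDI of 1.0 in the output above looks like an artifact rather than a real result: every racial group is reported as 0, so the proportions are 0/0 (NaN), and pandas most likely skips those NaN values when summing, leaving `1 - 0`. One possible way to avoid this is sketched below. The function name `simpsons_guarded` is hypothetical and is not part of the original notebook, which instead removes the school by name in a later cell.

```python
import numpy as np
import pandas as pd

def simpsons_guarded(data):
    """Simpson's Diversity Index; returns NaN when no group has any students."""
    total = np.sum(data)
    if pd.isna(total) or total == 0:
        # No racial data reported, so the index is undefined.
        return np.nan
    proportions = data / total
    return 1 - np.sum(proportions**2)

# An all-zero row (like Hillsdale College) and a two-group row (like Aaniiih Nakoda College).
all_zero = pd.Series([0.0] * 7)
two_groups = pd.Series([91.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0])

zero_sdi = simpsons_guarded(all_zero)
two_sdi = simpsons_guarded(two_groups)
print(zero_sdi, round(two_sdi, 4))
```

With a guard like this, rows without racial data would receive NaN and be removed by the existing `dropna` call instead of sorting to the top of the list.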


We restricted our search to four-year universities and colleges, excluding institutions that primarily offer two-year degrees with only limited four-year programs, such as Seattle Central College and Laredo College.

In [None]:
# Dropping Hillsdale College because its student count is 0 for every racial group, so its SDI of 1.0 is not meaningful
# Dropping Solano Community College and Seattle Central College because they have been mistakenly counted as four-year universities
df_most_diverse = df_diverse[~df_diverse['Name'].isin(['Hillsdale College', 'Solano Community College', 'Seattle Central College'])]

df_most_diverse = df_most_diverse.head(20)
df_most_diverse.head()

Unnamed: 0,UnitID,Name,State,Affiliation,Urbanization,Total,Men total,Women total,American Indian or Alaska Native total,Asian total,...,Men %,Women %,American Indian or Alaska Native %,Asian %,Black or African American %,Hispanic %,Native Hawaiian or Other Pacific Islander %,White %,Two or more races %,SDI
91,168740,Andrews University,MI,Private not-for-profit (religious affiliation),31,1312.0,637.0,675.0,2.0,158.0,...,48.6,51.4,0.2,12.0,16.9,20.4,0.6,22.0,5.6,0.77448
1661,121345,Pomona College,CA,Private not-for-profit (no religious affiliation),21,1664.0,752.0,912.0,0.0,298.0,...,45.2,54.8,0.0,17.9,10.6,16.3,0.5,29.1,9.4,0.767334
1998,243744,Stanford University,CA,Private not-for-profit (no religious affiliation),21,8054.0,3900.0,4154.0,63.0,2175.0,...,48.4,51.6,0.8,27.0,8.1,18.5,0.2,23.8,9.3,0.767208
2060,216287,Swarthmore College,PA,Private not-for-profit (no religious affiliation),21,1644.0,795.0,849.0,7.0,288.0,...,48.4,51.6,0.4,17.5,9.4,15.1,0.1,29.6,11.0,0.765415
589,190549,CUNY Brooklyn College,NY,Public,11,11330.0,5000.0,6330.0,11.0,2671.0,...,44.1,55.9,0.1,23.6,22.8,22.7,0.0,24.7,3.2,0.765344


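We applied the same gender-balance constraint to the least diverse colleges (a gap of at most 15 percentage points between the share of women and the share of men, a threshold we chose heuristically) because we wanted to focus specifically on how *racial* diversity affects Gender Studies curricula.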

If we had not introduced this constraint, our dataset would incorporate universities with significant gender imbalances (e.g. women's colleges), and our data analysis would not account for these differences.
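
To make the effect of the constraint concrete, here is a small sketch that applies the same two conditions used to build `df_diverse` (at least 1,000 students and a gender gap of at most 15 percentage points) to three rows copied from the tables above. The `toy` DataFrame and the variable names are illustrative only.

```python
import pandas as pd

# Three rows copied from the outputs above, keeping only the columns the filter needs.
toy = pd.DataFrame({
    "Name": [
        "Andrews University",
        "Abilene Christian University",
        "Abilene Christian University-Undergraduate Online",
    ],
    "Total": [1312.0, 3188.0, 918.0],
    "Men %": [48.6, 40.1, 15.9],
    "Women %": [51.4, 59.9, 84.1],
})

# Absolute gap between the share of women and the share of men, in percentage points.
gap = (toy["Women %"] - toy["Men %"]).abs()

# Same thresholds as in the notebook's filter.
kept = toy[(toy["Total"] >= 1000) & (gap <= 15)]
kept_names = kept["Name"].tolist()
print(kept_names)
```

Only Andrews University passes: Abilene Christian University has a gap of about 19.8 points, and its online campus is both below 1,000 students and far more imbalanced.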

For the purposes of this project, we also excluded colleges whose websites could not be readily scraped with tools such as Beautiful Soup or Selenium. For example, some dynamic webpages still had not loaded the elements we needed after 120 seconds, even when we used Selenium.

In [None]:
df_least_diverse = df_diverse.sort_values(by='SDI', ascending=True)
df_least_diverse.head(10)

Unnamed: 0,UnitID,Name,State,Affiliation,Urbanization,Total,Men total,Women total,American Indian or Alaska Native total,Asian total,...,Men %,Women %,American Indian or Alaska Native %,Asian %,Black or African American %,Hispanic %,Native Hawaiian or Other Pacific Islander %,White %,Two or more races %,SDI
1055,198756,Johnson C Smith University,NC,Private not-for-profit (no religious affiliation),11,1058.0,519.0,539.0,6.0,2.0,...,49.1,50.9,0.6,0.2,77.1,0.4,0.1,0.6,0.2,0.052173
1129,226134,Laredo College,TX,Public,11,10191.0,4353.0,5838.0,7.0,26.0,...,42.7,57.3,0.1,0.3,0.1,95.2,0.0,2.4,0.0,0.057639
1909,409315,South Texas College,TX,Public,12,26440.0,11635.0,14805.0,24.0,168.0,...,44.0,56.0,0.1,0.6,0.4,95.7,0.0,2.0,0.1,0.063201
1315,101675,Miles College,AL,Private not-for-profit (religious affiliation),21,1151.0,542.0,609.0,0.0,1.0,...,47.1,52.9,0.0,0.1,95.1,1.8,0.0,2.5,0.3,0.091008
1901,218733,South Carolina State University,SC,Public,32,2762.0,1209.0,1553.0,5.0,5.0,...,43.8,56.2,0.2,0.2,92.7,0.1,0.1,1.4,2.6,0.091386
2711,197708,Yeshiva University,NY,Private not-for-profit (no religious affiliation),11,3091.0,1709.0,1382.0,1.0,3.0,...,55.3,44.7,0.0,0.1,0.2,2.2,0.0,60.0,0.7,0.097355
2141,157748,The Southern Baptist Theological Seminary,KY,Private not-for-profit (religious affiliation),11,1004.0,553.0,451.0,2.0,26.0,...,55.1,44.9,0.2,2.6,2.0,0.6,0.1,90.0,0.0,0.110642
223,217721,Benedict College,SC,Private not-for-profit (religious affiliation),12,1694.0,798.0,896.0,18.0,9.0,...,47.1,52.9,1.1,0.5,74.7,2.8,0.1,0.8,0.0,0.126556
668,153250,Dordt University,IA,Private not-for-profit (religious affiliation),33,1695.0,845.0,850.0,2.0,18.0,...,49.9,50.1,0.1,1.1,0.9,2.6,0.1,79.7,1.7,0.143553
2158,228796,The University of Texas at El Paso,TX,Public,11,20609.0,9379.0,11230.0,33.0,143.0,...,45.5,54.5,0.2,0.7,1.8,87.7,0.1,3.8,0.7,0.145706


In [None]:
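# df_diverse is sorted by SDI in descending order, so the last 20 rows are the least diverse colleges
df_least_diverse = df_diverse[~df_diverse['Name'].isin(['Laredo College', 'South Texas College', 'The Southern Baptist Theological Seminary', 'Alpena Community College'])].tail(20)
df_least_diverse.head(10)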

Unnamed: 0,UnitID,Name,State,Affiliation,Urbanization,Total,Men total,Women total,American Indian or Alaska Native total,Asian total,...,Men %,Women %,American Indian or Alaska Native %,Asian %,Black or African American %,Hispanic %,Native Hawaiian or Other Pacific Islander %,White %,Two or more races %,SDI
2598,237950,West Virginia University Institute of Technology,WV,Public,13,1448.0,733.0,715.0,4.0,16.0,...,50.6,49.4,0.3,1.1,2.8,2.2,0.2,81.5,3.2,0.200245
2377,183044,University of New Hampshire-Main Campus,NH,Public,31,11376.0,4984.0,6392.0,5.0,296.0,...,43.8,56.2,0.0,2.6,0.8,4.1,0.0,84.9,2.5,0.196262
751,237367,Fairmont State University,WV,Public,32,3060.0,1316.0,1744.0,12.0,13.0,...,43.0,57.0,0.4,0.4,4.6,0.7,0.0,88.1,4.1,0.192748
2478,240329,University of Wisconsin-La Crosse,WI,Public,13,9378.0,4036.0,5342.0,13.0,198.0,...,43.0,57.0,0.1,2.1,0.7,4.0,0.1,88.6,3.1,0.191055
418,153108,Central College,IA,Private not-for-profit (religious affiliation),32,1095.0,583.0,512.0,2.0,9.0,...,53.2,46.8,0.2,0.8,1.7,4.2,0.0,87.5,3.0,0.18977
714,133526,Edward Waters University,FL,Private not-for-profit (religious affiliation),11,1113.0,587.0,526.0,4.0,3.0,...,52.7,47.3,0.4,0.3,84.0,2.0,0.2,4.9,1.7,0.189317
808,220215,Freed-Hardeman University,TN,Private not-for-profit (religious affiliation),32,1878.0,833.0,1045.0,10.0,24.0,...,44.4,55.6,0.5,1.3,4.8,0.1,0.1,86.3,2.6,0.183332
1338,176044,Mississippi Valley State University,MS,Public,42,2005.0,895.0,1110.0,3.0,8.0,...,44.6,55.4,0.1,0.4,83.8,2.4,0.1,5.0,0.7,0.175589
1904,219356,South Dakota State University,SD,Public,33,10177.0,4507.0,5670.0,116.0,121.0,...,44.3,55.7,1.1,1.2,1.2,2.9,0.1,85.5,2.2,0.174228
784,133979,Florida Memorial University,FL,Private not-for-profit (religious affiliation),21,1299.0,604.0,695.0,1.0,2.0,...,46.5,53.5,0.1,0.2,77.4,3.8,0.0,2.9,0.5,0.165663


In [None]:
df_most_diverse.to_csv('most_diverse_colleges.csv', index=False)

In [None]:
df_least_diverse.to_csv('least_diverse_colleges.csv', index=False)

In [None]:
df_diverse.to_csv('df_diverse.csv', index=False)