# Determining college campus diversity

The top 10 colleges by diversity index, from a separate (miscellaneous) college dataset:

In [2]:
import numpy as np
import pandas as pd

In [5]:
college = pd.read_csv('../data/college_diversity.csv')
college

|   | School | Diversity Index |
|---|---|---|
| 0 | Rutgers University--Newark Newark, NJ | 0.76 |
| 1 | Andrews University Berrien Springs, MI | 0.74 |
| 2 | Stanford University Stanford, CA | 0.74 |
| 3 | University of Houston Houston, TX | 0.74 |
| 4 | University of Nevada--Las Vegas Las Vegas, NV | 0.74 |
| 5 | University of San Francisco San Francisco, CA | 0.74 |
| 6 | San Francisco State University San Francisco, CA | 0.73 |
| 7 | University of Illinois--Chicago Chicago, IL | 0.73 |
| 8 | New Jersey Institute of Technology Newark, NJ | 0.72 |
| 9 | Texas Woman's University Denton, TX | 0.72 |


Our college dataset classifies race into nine different categories. When trying to quantify
something without an obvious definition, such as diversity, it helps to start with something
very simple. In this recipe, our diversity metric is the number of race categories that make up
at least 15% of a college's undergraduate student population.

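To make the definition concrete, here is a minimal sketch on a small hypothetical DataFrame. The school names and percentages are made up; only the `UGDS_` column prefix mirrors the real dataset, and just four of the nine race columns are included to keep it short.

```python
import pandas as pd

# Hypothetical toy data: three made-up schools; each row's shares sum to 1.
toy = pd.DataFrame(
    {
        "UGDS_WHITE": [0.70, 0.40, 0.20],
        "UGDS_BLACK": [0.10, 0.30, 0.20],
        "UGDS_HISP": [0.10, 0.20, 0.25],
        "UGDS_ASIAN": [0.10, 0.10, 0.35],
    },
    index=["School A", "School B", "School C"],
)

# Diversity metric: number of race columns with a share of at least 15%.
toy_metric = (toy >= 0.15).sum(axis="columns")
print(toy_metric)
```

School B's 10% Asian share falls below the cutoff, so it scores 3 rather than 4.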
In [9]:
# Read in the college dataset and keep only the undergraduate race columns:
college = pd.read_csv('../data/college.csv', index_col='INSTNM')
college_ugds_ = college.filter(like='UGDS_')

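`filter(like='UGDS_')` keeps every column whose name contains the substring `UGDS_`. The trailing underscore matters if the data also has a plain `UGDS` column (for example, a total undergraduate count). The sketch below uses a hypothetical one-row frame to show this, along with an equivalent anchored regular expression:

```python
import pandas as pd

# Hypothetical one-row frame: a plain UGDS total plus two race-share columns.
df = pd.DataFrame(
    [[1000, 0.6, 0.4, 0.3]],
    columns=["UGDS", "UGDS_WHITE", "UGDS_BLACK", "PCTPELL"],
    index=pd.Index(["School A"], name="INSTNM"),
)

by_substring = df.filter(like="UGDS_")  # substring match; skips plain "UGDS"
by_regex = df.filter(regex=r"^UGDS_")   # same columns, anchored at the start
print(by_substring.columns.tolist())
print(by_regex.columns.tolist())
```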
In [10]:
# Many of these colleges have missing values for all their race columns. We can
# count the missing values in each row and sort the resulting Series from highest
# to lowest. This reveals the colleges that have missing values:
(college_ugds_.isnull()
    .sum(axis=1)
    .sort_values(ascending=False)
    .head())

INSTNM
Excel Learning Center-San Antonio South         9
Philadelphia College of Osteopathic Medicine    9
Assemblies of God Theological Seminary          9
Episcopal Divinity School                       9
Phillips Graduate Institute                     9
dtype: int64
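
Counting missing values per row relies on `isnull()` returning a boolean DataFrame whose `True` values sum as 1. A minimal sketch on a hypothetical three-school frame, which also shows `all(axis=1)` as a direct way to flag rows where every race column is missing:

```python
import numpy as np
import pandas as pd

# Hypothetical race columns: School B is missing everything, School C one value.
df = pd.DataFrame(
    {
        "UGDS_WHITE": [0.5, np.nan, 0.6],
        "UGDS_BLACK": [0.5, np.nan, np.nan],
        "UGDS_HISP": [0.0, np.nan, 0.4],
    },
    index=["School A", "School B", "School C"],
)

# isnull() marks each NaN as True; summing across columns counts NaNs per row.
missing_per_row = df.isnull().sum(axis=1).sort_values(ascending=False)
# all(axis=1) is True only for rows in which every column is missing.
all_missing = df.isnull().all(axis=1)
print(missing_per_row)
print(all_missing)
```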

In [11]:
# Now that we have seen the colleges that are missing all their race columns, we
# can use the dropna method to drop every row that has all nine race percentages
# missing. We can then count the remaining missing values in each column:
college_ugds_ = college_ugds_.dropna(how='all')
college_ugds_.isnull().sum()

UGDS_WHITE    0
UGDS_BLACK    0
UGDS_HISP     0
UGDS_ASIAN    0
UGDS_AIAN     0
UGDS_NHPI     0
UGDS_2MOR     0
UGDS_NRA      0
UGDS_UNKN     0
dtype: int64
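
`how='all'` removes only rows in which every value is missing, whereas the default, `how='any'`, removes a row as soon as one value is missing. On the real data both choices drop the same rows, because no missing values remain after `how='all'`. On a hypothetical frame with a partially missing row, they differ:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: School B is fully missing, School C partially missing.
df = pd.DataFrame(
    {
        "UGDS_WHITE": [0.5, np.nan, 0.6],
        "UGDS_BLACK": [0.5, np.nan, np.nan],
        "UGDS_HISP": [0.0, np.nan, 0.4],
    },
    index=["School A", "School B", "School C"],
)

kept_all = df.dropna(how="all")  # drops only rows where every value is NaN
kept_any = df.dropna(how="any")  # drops rows with at least one NaN
print(kept_all.index.tolist())
print(kept_any.index.tolist())
print(kept_all.isnull().sum())   # School C's missing UGDS_BLACK survives
```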

In [12]:
# There are no missing values left in the dataset. We can now calculate our
# diversity metric. To get started, we use the DataFrame method ge (greater than
# or equal to) to convert each value to a boolean:
college_ugds_.ge(.15)

| INSTNM | UGDS_WHITE | UGDS_BLACK | UGDS_HISP | UGDS_ASIAN | UGDS_AIAN | UGDS_NHPI | UGDS_2MOR | UGDS_NRA | UGDS_UNKN |
|---|---|---|---|---|---|---|---|---|---|
| Alabama A & M University | False | True | False | False | False | False | False | False | False |
| University of Alabama at Birmingham | True | True | False | False | False | False | False | False | False |
| Amridge University | True | True | False | False | False | False | False | False | True |
| University of Alabama in Huntsville | True | False | False | False | False | False | False | False | False |
| Alabama State University | False | True | False | False | False | False | False | False | False |
| The University of Alabama | True | False | False | False | False | False | False | False | False |
| Central Alabama Community College | True | True | False | False | False | False | False | False | False |
| Athens State University | True | False | False | False | False | False | False | False | False |
| Auburn University at Montgomery | True | True | False | False | False | False | False | False | False |
| Auburn University | True | False | False | False | False | False | False | False | False |

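The choice between `ge` and `gt` decides how a share of exactly 15% is treated. This recipe uses `ge`, so such a share counts toward the metric. A minimal sketch with hypothetical values placed on the boundary:

```python
import pandas as pd

# Hypothetical shares chosen so that one value in each row sits exactly on 0.15.
shares = pd.DataFrame({"UGDS_WHITE": [0.15, 0.85], "UGDS_BLACK": [0.85, 0.15]})

at_least = shares.ge(0.15)  # 0.15 counts as meeting the threshold
strictly = shares.gt(0.15)  # 0.15 does not
print(at_least.sum(axis="columns").tolist())  # [2, 2]
print(strictly.sum(axis="columns").tolist())  # [1, 1]
```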

In [13]:
# From here, we can use the sum method to count the True values for each college
# (True is treated as 1 and False as 0):
diversity_metric = college_ugds_.ge(.15).sum(axis='columns')
diversity_metric.head()


INSTNM
Alabama A & M University               1
University of Alabama at Birmingham    2
Amridge University                     3
University of Alabama in Huntsville    1
Alabama State University               1
dtype: int64
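
Summing a boolean DataFrame works because pandas treats `True` as 1 and `False` as 0, so `sum(axis='columns')` counts the `True` values in each row. A small sketch with hypothetical flags, checked against an explicit integer cast:

```python
import pandas as pd

# Hypothetical boolean flags for two rows.
flags = pd.DataFrame({"a": [True, False], "b": [True, True], "c": [False, False]})

counts = flags.sum(axis="columns")        # True counts as 1, False as 0
explicit = flags.astype(int).sum(axis=1)  # the same count with an explicit cast
print(counts.tolist(), explicit.tolist())
```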

In [16]:
# How many colleges have each diversity score?
diversity_metric.value_counts()

1    3042
2    2884
3     876
4      63
0       7
5       2
dtype: int64
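
`value_counts` orders its result by frequency. To read it as a distribution, it can help to sort by the score itself with `sort_index`, or to pass `normalize=True` to get proportions instead of counts. A sketch on hypothetical scores:

```python
import pandas as pd

# Hypothetical diversity scores for six schools.
metric = pd.Series([1, 2, 2, 1, 3, 1], index=list("ABCDEF"))

counts = metric.value_counts().sort_index()                # ordered by score
shares = metric.value_counts(normalize=True).sort_index()  # proportions
print(counts)
print(shares)
```

Applied to `diversity_metric`, `value_counts().sort_index()` would list the same six counts in order of score, from 0 to 5.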

In [18]:
# Colleges with the highest diversity scores:
diversity_metric.sort_values(ascending=False).head()

INSTNM
Regency Beauty Institute-Austin          5
Central Texas Beauty College-Temple      5
Sullivan and Cogliano Training Center    4
Ambria College of Nursing                4
Berkeley College-New York                4
dtype: int64
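
Because 63 colleges score 4, `head()` can show only three of them, and which three is not meaningful. `Series.nlargest` with `keep='all'` instead returns every value tied at the cutoff, so the result may be longer than `n`. A minimal sketch on hypothetical scores with a tie at the cutoff:

```python
import pandas as pd

# Hypothetical scores with a tie at the cutoff (C and E both score 4).
metric = pd.Series([1, 5, 4, 5, 4], index=list("ABCDE"))

top_first = metric.nlargest(3)            # keeps only the first tied value
top_all = metric.nlargest(3, keep="all")  # keeps every value tied at the cutoff
print(top_first.index.tolist())
print(top_all.index.tolist())
```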

In [19]:
# Inspect the race percentages of the two top-scoring colleges:
college_ugds_.loc[['Regency Beauty Institute-Austin',
                   'Central Texas Beauty College-Temple']]

| INSTNM | UGDS_WHITE | UGDS_BLACK | UGDS_HISP | UGDS_ASIAN | UGDS_AIAN | UGDS_NHPI | UGDS_2MOR | UGDS_NRA | UGDS_UNKN |
|---|---|---|---|---|---|---|---|---|---|
| Regency Beauty Institute-Austin | 0.1867 | 0.2133 | 0.16 | 0.0 | 0.0 | 0.0 | 0.1733 | 0.0 | 0.2667 |
| Central Texas Beauty College-Temple | 0.1616 | 0.2323 | 0.2626 | 0.0202 | 0.0 | 0.0 | 0.1717 | 0.0 | 0.1515 |

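As a check, the sketch below rebuilds the two rows from the percentages printed above and lists which columns clear the 15% threshold. Both colleges reach 5 partly because `UGDS_UNKN` (unknown race) counts as one of the categories, and Central Texas Beauty College-Temple's 0.1515 in that column only just clears the cutoff.

```python
import pandas as pd

# Race shares copied from the two rows displayed above.
columns = ["UGDS_WHITE", "UGDS_BLACK", "UGDS_HISP", "UGDS_ASIAN", "UGDS_AIAN",
           "UGDS_NHPI", "UGDS_2MOR", "UGDS_NRA", "UGDS_UNKN"]
rows = pd.DataFrame(
    [
        [0.1867, 0.2133, 0.16, 0.0, 0.0, 0.0, 0.1733, 0.0, 0.2667],
        [0.1616, 0.2323, 0.2626, 0.0202, 0.0, 0.0, 0.1717, 0.0, 0.1515],
    ],
    index=["Regency Beauty Institute-Austin", "Central Texas Beauty College-Temple"],
    columns=columns,
)

meets = rows.ge(0.15)
# For each college, the names of the columns at or above 15%.
columns_meeting = {
    school: meets.columns[meets.loc[school].to_numpy()].tolist()
    for school in meets.index
}
scores = meets.sum(axis="columns")
print(columns_meeting)
print(scores)
```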