# Enrollment Diversity at Wellesley
#### by Selina and Whitney

We explore data on student enrollment for each department, broken down by course level and by the race/ethnicity of enrolled students (e.g., the number of students of each race in BISC 100, BISC 200, and BISC 300), for the past 4 academic years (Fall 2011 to Spring 2015).

We are wondering whether the diversity of students in courses varies by course level and department. It could be that, regardless of department, many 100 level courses have diverse students. However, this diversity may decrease as course level increases (i.e., there is less diversity in 200 level courses and even less in 300 level courses). This could also vary by department. We suspect that many of the upper level courses in the STEM departments have fewer non-white and non-Asian students, but it could be that there are more of these students in upper level courses in other departments (e.g., WGST or POLS).

In [1]:
import pandas as pd
import io
import xlrd
from pandas import Series, DataFrame

We started by loading the Excel file and creating a dataframe for each sheet. We replaced any missing values with zeros.

In [2]:
#xls = pd.ExcelFile('Diversity.xlsx')
xls = pd.ExcelFile('/Users/ssotomayor93/Desktop/CS/CS 249/Course_Data.xlsx')
Fall_2011 = xls.parse('Fall 2011')
Fall_2011['Semester'] = 'Fall'
Fall_2011['Year'] = '2011'
Fall_2011 = Fall_2011.fillna(0)

In [3]:
Spring_2012 = xls.parse('Spring 2012')
Spring_2012['Semester'] = 'Spring'
Spring_2012['Year'] = '2012'
Spring_2012 = Spring_2012.fillna(0)

In [4]:
Fall_2012 = xls.parse('Fall 2012')
Fall_2012['Semester'] = 'Fall'
Fall_2012['Year'] = '2012'
Fall_2012 = Fall_2012.fillna(0)

In [5]:
Spring_2013 = xls.parse('Spring 2013')
Spring_2013['Semester'] = 'Spring'
Spring_2013['Year'] = '2013'
Spring_2013 = Spring_2013.fillna(0)

In [6]:
Fall_2013 = xls.parse("Fall 2013")
Fall_2013["Year"] = "2013"
Fall_2013["Semester"] = "Fall"
Fall_2013=Fall_2013.fillna(0)

In [7]:
Spring_2014 = xls.parse("Spring 2014")
Spring_2014["Year"] = "2014"
Spring_2014["Semester"] = "Spring"
Spring_2014=Spring_2014.fillna(0)

In [8]:
Fall_2014 = xls.parse("Fall 2014")
Fall_2014["Year"] = "2014"
Fall_2014["Semester"] = "Fall"
Fall_2014=Fall_2014.fillna(0)

In [9]:
Spring_2015 = xls.parse("Spring 2015")
Spring_2015["Year"] = "2015"
Spring_2015["Semester"] = "Spring"
Spring_2015=Spring_2015.fillna(0)
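
The eight loading cells above follow one pattern: parse a sheet, tag it with its semester and year, and fill missing values with zeros. A minimal sketch of the same steps as a loop is shown below. It assumes the sheets are already available as a dictionary of dataframes keyed by sheet name (for example, what `pd.read_excel(path, sheet_name=None)` returns in recent pandas versions). `tag_and_combine` is a hypothetical helper name that is not part of the original notebook, and the `demo` sheets are made up for illustration.

```python
import pandas as pd

def tag_and_combine(sheets):
    """Tag each sheet with its semester and year, then stack the sheets.

    `sheets` maps a sheet name such as 'Fall 2011' to a DataFrame.
    (Hypothetical helper; the notebook does this by hand, one cell per sheet.)
    """
    frames = []
    for name, frame in sheets.items():
        semester, year = name.split()   # 'Fall 2011' -> 'Fall', '2011'
        frame = frame.fillna(0)         # returns a new frame; the input is untouched
        frame['Semester'] = semester
        frame['Year'] = year
        frames.append(frame)
    # ignore_index=True renumbers the rows 0..n-1 in the combined frame
    return pd.concat(frames, ignore_index=True)

# Tiny made-up sheets, for illustration only
demo = {
    'Fall 2011': pd.DataFrame({'Department': ['A'], 'Asian': [2.0]}),
    'Spring 2012': pd.DataFrame({'Department': ['B'], 'Asian': [float('nan')]}),
}
combined = tag_and_combine(demo)
```

Because `pd.concat` is called with `ignore_index=True`, the combined frame would not need a separate index reset.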

Here we combine all dataframes for the sheets into one large dataframe.

In [10]:
frames = [Fall_2011, Spring_2012, Fall_2012, Spring_2013, Fall_2013, Spring_2014, Fall_2014, Spring_2015]
df = pd.concat(frames)
df = df.reindex(index=None, columns=['Department','Course Level','Semester','Year','All IPEDS Ethnicities','American Indian or Alaska Native','Asian','Black or African American','Hispanics of any race','Native Hawaiian or Other Pacific Islander','Nonresident Alien','Two or more races','White','None Reported'])
df = df.fillna(0)
df.head()

Unnamed: 0,Department,Course Level,Semester,Year,All IPEDS Ethnicities,American Indian or Alaska Native,Asian,Black or African American,Hispanics of any race,Native Hawaiian or Other Pacific Islander,Nonresident Alien,Two or more races,White,None Reported
0,Africana Studies (AFR),200 Level,Fall,2011,79,0,8,35,9,0,1,6,20,0
1,Africana Studies (AFR),300 Level,Fall,2011,18,0,0,8,1,0,6,2,1,0
2,American Studies (AMST),100 Level,Fall,2011,24,0,2,4,3,0,0,1,14,0
3,American Studies (AMST),300 Level,Fall,2011,14,0,1,1,3,0,0,0,9,0
4,Anthropology (ANTH),100 Level,Fall,2011,36,0,5,5,5,0,3,2,16,0


We need to reset the indices since we combined multiple dataframes.

In [11]:
indices = range(len(df))
df2 = df.reset_index(drop=True)
df2.head()

Unnamed: 0,Department,Course Level,Semester,Year,All IPEDS Ethnicities,American Indian or Alaska Native,Asian,Black or African American,Hispanics of any race,Native Hawaiian or Other Pacific Islander,Nonresident Alien,Two or more races,White,None Reported
0,Africana Studies (AFR),200 Level,Fall,2011,79,0,8,35,9,0,1,6,20,0
1,Africana Studies (AFR),300 Level,Fall,2011,18,0,0,8,1,0,6,2,1,0
2,American Studies (AMST),100 Level,Fall,2011,24,0,2,4,3,0,0,1,14,0
3,American Studies (AMST),300 Level,Fall,2011,14,0,1,1,3,0,0,0,9,0
4,Anthropology (ANTH),100 Level,Fall,2011,36,0,5,5,5,0,3,2,16,0


We needed to remove any courses that had fewer than 6 people (marked with a * in the department title).

In [12]:
def asterisk(df):
    '''Finds courses that have * in the department title.
    Returns a list of the row positions of these courses.'''
    remove = []
    for index in range(len(df)):
        dept = df.iloc[index]['Department']
        if '*' in dept:
            remove.append(index)
    return remove

toBeRemoved = asterisk(df2)

df3 = df2
# Removes courses containing *
df3 = df3.drop(df3.index[toBeRemoved])

# Resets indices
indices = range(len(df3))
df3 = df3.reset_index(drop=True)
df3.head()

Unnamed: 0,Department,Course Level,Semester,Year,All IPEDS Ethnicities,American Indian or Alaska Native,Asian,Black or African American,Hispanics of any race,Native Hawaiian or Other Pacific Islander,Nonresident Alien,Two or more races,White,None Reported
0,Africana Studies (AFR),200 Level,Fall,2011,79,0,8,35,9,0,1,6,20,0
1,Africana Studies (AFR),300 Level,Fall,2011,18,0,0,8,1,0,6,2,1,0
2,American Studies (AMST),100 Level,Fall,2011,24,0,2,4,3,0,0,1,14,0
3,American Studies (AMST),300 Level,Fall,2011,14,0,1,1,3,0,0,0,9,0
4,Anthropology (ANTH),100 Level,Fall,2011,36,0,5,5,5,0,3,2,16,0
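
The row-by-row loop in `asterisk` can also be written as a boolean mask. The sketch below is an alternative under the same assumption as the cell above (a `*` somewhere in the `Department` string marks a small course). `drop_starred` is a hypothetical name, and the starred department in `demo` is made up for illustration.

```python
import pandas as pd

def drop_starred(df, column='Department'):
    """Return the rows whose `column` value does not contain '*'."""
    # regex=False treats '*' as a literal character rather than a regex operator
    starred = df[column].astype(str).str.contains('*', regex=False)
    return df[~starred].reset_index(drop=True)

# Made-up rows, for illustration only
demo = pd.DataFrame({
    'Department': ['Africana Studies (AFR)', 'Hypothetical Dept (HYP)*'],
    'Asian': [8, 1],
})
kept = drop_starred(demo)
```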


In [21]:
from numpy import inf
def makeVector(df):
    '''Creates a vector for each course containing the enrollment 
    counts of each racial/ethnic group. Returns a list of vectors.'''
    vectorList = []
    for index in range(len(df)): 
        row = list(df.iloc[index]["American Indian or Alaska Native":"None Reported"])
        vectorList.append(row)
    return vectorList

vector = makeVector(df3)
vector[:10]

[[0.0, 8.0, 35.0, 9.0, 0.0, 1.0, 6.0, 20.0, 0.0],
 [0.0, 0.0, 8.0, 1.0, 0.0, 6.0, 2.0, 1.0, 0.0],
 [0.0, 2.0, 4.0, 3.0, 0.0, 0.0, 1.0, 14.0, 0.0],
 [0.0, 1.0, 1.0, 3.0, 0.0, 0.0, 0.0, 9.0, 0.0],
 [0.0, 5.0, 5.0, 5.0, 0.0, 3.0, 2.0, 16.0, 0.0],
 [0.0, 3.0, 3.0, 7.0, 0.0, 2.0, 0.0, 47.0, 0.0],
 [0.0, 4.0, 2.0, 4.0, 0.0, 1.0, 0.0, 16.0, 0.0],
 [0.0, 44.0, 9.0, 13.0, 0.0, 35.0, 7.0, 72.0, 0.0],
 [0.0, 39.0, 6.0, 8.0, 0.0, 26.0, 9.0, 86.0, 0.0],
 [0.0, 17.0, 7.0, 7.0, 0.0, 7.0, 3.0, 28.0, 1.0]]

In [26]:
import math
def normalizeVectors(v):
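    '''Takes in a list of count vectors and returns a list of normalized vectors.
    Each value is divided by the combined Asian, Black, Latina, and White count.
    Vectors that are all zeros after normalization are dropped.'''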
    vectorList = []
    rowsAll = []
    for index in v:
        rows = []
        # Asian students
        row1 = index[1]
        # Black students 
        row2 = index[2]
        # Latina students 
        row3 = index[3]
        # White students
        row4 = index[7]
        rows.append(row1)
        rows.append(row2)
        rows.append(row3)
        rows.append(row4)
        rowsAll.append(rows)
    # List for each sum of 4 groups 
    sumList = []
    for i in rowsAll:
        sumList.append(sum(i))

    sumList2 = []
    for s in sumList:
        sList = []
        for i in range(9):
            sList.append(s)
        sumList2.append(sList)
        
    normalList = []
    for vector,s in zip(v,sumList2):
        # Divides each value in vector by sum 
        normalList.append(map(lambda x,y: x/y, vector,s))

    # Replaces inf and nan (from division by zero) with zeros
    for nlist in normalList:
        for z in range(len(nlist)):
            if nlist[z] == inf or math.isnan(nlist[z]):
                nlist[z] = 0.0
    # Intended to round to 2 decimal places, but the comprehension only
    # rebinds n, so the values stored in normalList stay unrounded
    for n in normalList:
        n = [round(i,2) for i in n]
    for x in normalList:
        # Adds lists of non-zero values to vector list
        if sum(x) != 0:
            vectorList.append(x)
    return vectorList

In [27]:
# Removes rows that contain all zeros
df4 = df3.drop(df3.index[[655,848]])
indices = range(len(df4))
df4 = df4.reset_index(drop=True)
df4.head()

Unnamed: 0,Department,Course Level,Semester,Year,All IPEDS Ethnicities,American Indian or Alaska Native,Asian,Black or African American,Hispanics of any race,Native Hawaiian or Other Pacific Islander,Nonresident Alien,Two or more races,White,None Reported
0,Africana Studies (AFR),200 Level,Fall,2011,79,0,8,35,9,0,1,6,20,0
1,Africana Studies (AFR),300 Level,Fall,2011,18,0,0,8,1,0,6,2,1,0
2,American Studies (AMST),100 Level,Fall,2011,24,0,2,4,3,0,0,1,14,0
3,American Studies (AMST),300 Level,Fall,2011,14,0,1,1,3,0,0,0,9,0
4,Anthropology (ANTH),100 Level,Fall,2011,36,0,5,5,5,0,3,2,16,0
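
The row positions 655 and 848 in the cell above are hard-coded. A minimal sketch of how such rows could be located programmatically follows. It assumes that a row should be dropped when its Asian, Black, Latina, and White counts sum to zero; that sum is the denominator `normalizeVectors` uses, so these are the rows whose normalized vectors come out as all zeros. `find_zero_rows` is a hypothetical name, and the second vector in `demo` is made up.

```python
# Positions of Asian, Black, Latina, and White in each count vector,
# matching the indices used in normalizeVectors
FOUR_GROUPS = (1, 2, 3, 7)

def find_zero_rows(count_vectors, groups=FOUR_GROUPS):
    """Return the positions of vectors whose selected counts sum to zero."""
    return [
        position
        for position, counts in enumerate(count_vectors)
        if sum(counts[g] for g in groups) == 0
    ]

demo = [
    [0.0, 8.0, 35.0, 9.0, 0.0, 1.0, 6.0, 20.0, 0.0],  # first row of the data
    [0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0],    # made-up row with a zero denominator
]
zero_rows = find_zero_rows(demo)
```

The resulting list could then replace the literal `[655,848]` in the `drop` call.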


In [28]:
vectors = normalizeVectors(vector)
vectors[:3]

[[0.0,
  0.1111111111111111,
  0.4861111111111111,
  0.125,
  0.0,
  0.013888888888888888,
  0.083333333333333329,
  0.27777777777777779,
  0.0],
 [0.0,
  0.0,
  0.80000000000000004,
  0.10000000000000001,
  0.0,
  0.59999999999999998,
  0.20000000000000001,
  0.10000000000000001,
  0.0],
 [0.0,
  0.086956521739130432,
  0.17391304347826086,
  0.13043478260869565,
  0.0,
  0.0,
  0.043478260869565216,
  0.60869565217391308,
  0.0]]
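
As the output above shows, each of the nine counts is divided by the combined enrollment of four groups (Asian, Black, Latina, and White), so the nine shares need not sum to 1. A compact sketch of that calculation for a single vector is below. `normalize_counts` is a hypothetical name, and returning all zeros for a zero denominator is an assumption that mirrors the `inf`/`nan` replacement in `normalizeVectors`.

```python
def normalize_counts(counts, groups=(1, 2, 3, 7)):
    """Divide every count by the total of the selected groups."""
    denominator = sum(counts[g] for g in groups)
    if denominator == 0:
        # Assumption: treat an empty denominator as an all-zero vector
        return [0.0] * len(counts)
    return [c / float(denominator) for c in counts]

# First count vector from the makeVector output
first_row = [0.0, 8.0, 35.0, 9.0, 0.0, 1.0, 6.0, 20.0, 0.0]
shares = normalize_counts(first_row)
```

For the first row the denominator is 8 + 35 + 9 + 20 = 72, so the Asian share is 8/72, which agrees with the 0.111 shown above.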

Now we add a column for the vectors in the dataframe. 

In [30]:
df4["Vectors"] = vectors
df4.head()

Unnamed: 0,Department,Course Level,Semester,Year,All IPEDS Ethnicities,American Indian or Alaska Native,Asian,Black or African American,Hispanics of any race,Native Hawaiian or Other Pacific Islander,Nonresident Alien,Two or more races,White,None Reported,Vectors
0,Africana Studies (AFR),200 Level,Fall,2011,79,0,8,35,9,0,1,6,20,0,"[0.0, 0.111111111111, 0.486111111111, 0.125, 0..."
1,Africana Studies (AFR),300 Level,Fall,2011,18,0,0,8,1,0,6,2,1,0,"[0.0, 0.0, 0.8, 0.1, 0.0, 0.6, 0.2, 0.1, 0.0]"
2,American Studies (AMST),100 Level,Fall,2011,24,0,2,4,3,0,0,1,14,0,"[0.0, 0.0869565217391, 0.173913043478, 0.13043..."
3,American Studies (AMST),300 Level,Fall,2011,14,0,1,1,3,0,0,0,9,0,"[0.0, 0.0714285714286, 0.0714285714286, 0.2142..."
4,Anthropology (ANTH),100 Level,Fall,2011,36,0,5,5,5,0,3,2,16,0,"[0.0, 0.161290322581, 0.161290322581, 0.161290..."


In [31]:
df5 = df4.reindex(columns = ["Department", "Course Level","Semester","Year","Vectors"])
df5.head()

Unnamed: 0,Department,Course Level,Semester,Year,Vectors
0,Africana Studies (AFR),200 Level,Fall,2011,"[0.0, 0.111111111111, 0.486111111111, 0.125, 0..."
1,Africana Studies (AFR),300 Level,Fall,2011,"[0.0, 0.0, 0.8, 0.1, 0.0, 0.6, 0.2, 0.1, 0.0]"
2,American Studies (AMST),100 Level,Fall,2011,"[0.0, 0.0869565217391, 0.173913043478, 0.13043..."
3,American Studies (AMST),300 Level,Fall,2011,"[0.0, 0.0714285714286, 0.0714285714286, 0.2142..."
4,Anthropology (ANTH),100 Level,Fall,2011,"[0.0, 0.161290322581, 0.161290322581, 0.161290..."


In [32]:
def makeDict(df):
    '''Takes in a dataframe and returns a list of dictionaries, one per row.'''
    dictList = []
    for index in range(len(df)): 
        dictionary = {}
        dictionary["Department"] = df.iloc[index]["Department"]
        dictionary['Course Level'] = df.iloc[index]["Course Level"]
        dictionary['Semester'] = df.iloc[index]["Semester"]
        dictionary['Year'] = df.iloc[index]["Year"]
        dictionary['Vectors'] = df.iloc[index]["Vectors"]
        dictList.append(dictionary)
    return dictList

dictList = makeDict(df5)
dictList[:10]

[{'Course Level': u'200 Level',
  'Department': u'Africana Studies (AFR)',
  'Semester': 'Fall',
  'Vectors': [0.0,
   0.1111111111111111,
   0.4861111111111111,
   0.125,
   0.0,
   0.013888888888888888,
   0.083333333333333329,
   0.27777777777777779,
   0.0],
  'Year': '2011'},
 {'Course Level': u'300 Level',
  'Department': u'Africana Studies (AFR)',
  'Semester': 'Fall',
  'Vectors': [0.0,
   0.0,
   0.80000000000000004,
   0.10000000000000001,
   0.0,
   0.59999999999999998,
   0.20000000000000001,
   0.10000000000000001,
   0.0],
  'Year': '2011'},
 {'Course Level': u'100 Level',
  'Department': u'American Studies (AMST)',
  'Semester': 'Fall',
  'Vectors': [0.0,
   0.086956521739130432,
   0.17391304347826086,
   0.13043478260869565,
   0.0,
   0.0,
   0.043478260869565216,
   0.60869565217391308,
   0.0],
  'Year': '2011'},
 {'Course Level': u'300 Level',
  'Department': u'American Studies (AMST)',
  'Semester': 'Fall',
  'Vectors': [0.0,
   0.071428571428571425,
   0.07142857

In [33]:
dictList2 = dictList

def removeLowCounts(dList):
    '''Removes columns of racial/ethnic groups w/ low counts.
    Returns a list of dictionaries w/ vectors for Asian, 
    Black, Latina, and White.'''
    for dictionary in dList:
        vectors = dictionary['Vectors']
        indices = [0,4,5,6,8]
        for index in sorted(indices, reverse=True):
            del vectors[index]
    return dList

dictList2 = removeLowCounts(dictList2)

def combineBlackLatina(dList):
    '''Combines the vectors for Black and Latina.
    Returns a list of dictionaries'''
    for dictionary in dList:
        vectors = dictionary['Vectors']
        vectors[1] = vectors[1] + vectors[2]
        vectors[1] = round(vectors[1],2)
        del vectors[2]
    return dList

dictList2 = combineBlackLatina(dictList2)
dictList2[:10]

[{'Course Level': u'200 Level',
  'Department': u'Africana Studies (AFR)',
  'Semester': 'Fall',
  'Vectors': [0.1111111111111111, 0.61, 0.27777777777777779],
  'Year': '2011'},
 {'Course Level': u'300 Level',
  'Department': u'Africana Studies (AFR)',
  'Semester': 'Fall',
  'Vectors': [0.0, 0.9, 0.10000000000000001],
  'Year': '2011'},
 {'Course Level': u'100 Level',
  'Department': u'American Studies (AMST)',
  'Semester': 'Fall',
  'Vectors': [0.086956521739130432, 0.3, 0.60869565217391308],
  'Year': '2011'},
 {'Course Level': u'300 Level',
  'Department': u'American Studies (AMST)',
  'Semester': 'Fall',
  'Vectors': [0.071428571428571425, 0.29, 0.6428571428571429],
  'Year': '2011'},
 {'Course Level': u'100 Level',
  'Department': u'Anthropology (ANTH)',
  'Semester': 'Fall',
  'Vectors': [0.16129032258064516, 0.32, 0.5161290322580645],
  'Year': '2011'},
 {'Course Level': u'200 Level',
  'Department': u'Anthropology (ANTH)',
  'Semester': 'Fall',
  'Vectors': [0.050000000000000
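
Note that `dictList2 = dictList` does not copy the list, so both names refer to the same dictionaries and the two functions above edit their vectors in place. A sketch of the same reduction that builds a new three-element vector instead is below. `collapse_groups` is a hypothetical name; the index positions and the rounding of the combined share follow `removeLowCounts` and `combineBlackLatina`.

```python
def collapse_groups(vector):
    """Reduce a nine-element share vector to [Asian, Black + Latina, White]."""
    asian, black, latina, white = vector[1], vector[2], vector[3], vector[7]
    # The combined share is rounded to 2 places, as in combineBlackLatina
    return [asian, round(black + latina, 2), white]

# First normalized vector from the output above
original = [0.0, 0.1111111111111111, 0.4861111111111111, 0.125, 0.0,
            0.013888888888888888, 0.08333333333333333, 0.2777777777777778, 0.0]
reduced = collapse_groups(original)
```

Because the input list is left untouched, the full nine-element vectors remain available for later checks.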

Next, we categorized the departments into 3 groups (Humanities, STEM, and Social Sciences). The grouping was subjective, but it was based primarily on the course browser.

In [34]:
Humanities = ["Humanities",'Africana Studies','American Studies','Art','Anthropology','Cinema and Media Studies','Classical Studies','Comparative Literature','East Asian Languages','English','French','German','History','Italian Studies','Jewish Studies','Medieval Renaissance Studies','Middle Eastern Studies','Music','Peace and Justice Studies', 'Philosophy','Physical Education','Religion','Russian','Russian Area Studies','Spanish',"South Asia Studies",'Theatre Studies',"Women's and Gender Studies (WGST)",'Writing']
STEM = ["STEM", "Astronomy","Biological Sciences","Biochemistry","Chemistry","Computer Science","Environmental Sciences (ES)","Geosciences","Mathematics","Neuroscience","Physics","Quantitative Reasoning","Sustainability"]
Social_Sciences = ["Social_Sciences","Cognitive and Linguistic Sci","Economics","Education","Political Science","Psychology","Sociology"]

dictList3 = dictList2

# Changes 'Environmental Studies' to 'Environmental Sciences'
# Creates Category key
for dictionary in dictList3:
    dictionary["Category"] = ""
    if dictionary["Department"] == "Environmental Studies (ES)":
        dictionary["Department"] = "Environmental Sciences (ES)"
        
def findCategory(df,ls,dictList): 
    '''Takes in a dataframe, a grouping list, and a list of dictionaries.
    Returns the list of dictionaries with Category set to the group name
    (the first element of the grouping list) for each matching department.'''
    dfList = map(list, df.values)
    i = 0
    for d in dfList:
        for department in ls:
            if department in str(d[0]):
                dictList[i]["Category"] = ls[0]
        i = i+1
    return dictList


dictList3 = findCategory(df5,Humanities,dictList3)
dictList3 = findCategory(df5,STEM,dictList3)
dictList3 = findCategory(df5,Social_Sciences,dictList3)

newList = []
# Removes Extradepartmental because we didn't know which group this belonged to
for d in dictList3:
    if d["Department"] != "Extradepartmental (EXTD)":
        newList.append(d)
# The Category stays blank for Environmental Sciences, most likely because
# findCategory matches against df5, which still holds the original department
# name. This fills any remaining blank Category with STEM.
for n in newList:
    if n["Category"] =="":
         n["Category"] = "STEM"

dictList4=newList
dictList4[:10]

[{'Category': 'Humanities',
  'Course Level': u'200 Level',
  'Department': u'Africana Studies (AFR)',
  'Semester': 'Fall',
  'Vectors': [0.1111111111111111, 0.61, 0.27777777777777779],
  'Year': '2011'},
 {'Category': 'Humanities',
  'Course Level': u'300 Level',
  'Department': u'Africana Studies (AFR)',
  'Semester': 'Fall',
  'Vectors': [0.0, 0.9, 0.10000000000000001],
  'Year': '2011'},
 {'Category': 'Humanities',
  'Course Level': u'100 Level',
  'Department': u'American Studies (AMST)',
  'Semester': 'Fall',
  'Vectors': [0.086956521739130432, 0.3, 0.60869565217391308],
  'Year': '2011'},
 {'Category': 'Humanities',
  'Course Level': u'300 Level',
  'Department': u'American Studies (AMST)',
  'Semester': 'Fall',
  'Vectors': [0.071428571428571425, 0.29, 0.6428571428571429],
  'Year': '2011'},
 {'Category': 'Humanities',
  'Course Level': u'100 Level',
  'Department': u'Anthropology (ANTH)',
  'Semester': 'Fall',
  'Vectors': [0.16129032258064516, 0.32, 0.5161290322580645],
  'Y
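
The substring matching in `findCategory` depends on the exact department strings. An alternative sketch, which is not the notebook's method, is to strip the trailing course code and look the name up in a dictionary. The mapping below contains only a few entries taken from the lists above for illustration, and `categorize` and `CATEGORY_BY_DEPARTMENT` are hypothetical names.

```python
import re

# Partial mapping for illustration; a full version would list every department
CATEGORY_BY_DEPARTMENT = {
    'Africana Studies': 'Humanities',
    'Biological Sciences': 'STEM',
    'Economics': 'Social_Sciences',
}

def categorize(department, lookup=CATEGORY_BY_DEPARTMENT):
    """Return the category for a label like 'Africana Studies (AFR)', or None."""
    # Drop a trailing parenthesised code such as ' (AFR)'
    name = re.sub(r'\s*\([^)]*\)\s*$', '', department)
    return lookup.get(name)

category = categorize('Africana Studies (AFR)')
```

Returning `None` for unknown departments makes unmatched rows easy to spot instead of silently assigning them to a default category.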

We take the previous list of dictionaries, create a dataframe from it, extract the vectors, and split them into one column per group in a new dataframe. We also include the Category and Course Level.

In [35]:
df7 = DataFrame(dictList4)

v = df7['Vectors'].values
a = []
bL = []
w = []
for x in v:
    a.append(x[0])
    bL.append(x[1])
    w.append(x[2])
        
df8 = DataFrame(a, columns=['Asian'])
df8['Black, Latina'] = bL
df8['White'] = w
df8['Category'] = df7['Category'].values
df8["Course Level"] = df7["Course Level"].values
df8.head()

Unnamed: 0,Asian,"Black, Latina",White,Category,Course Level
0,0.111111,0.61,0.277778,Humanities,200 Level
1,0.0,0.9,0.1,Humanities,300 Level
2,0.086957,0.3,0.608696,Humanities,100 Level
3,0.071429,0.29,0.642857,Humanities,300 Level
4,0.16129,0.32,0.516129,Humanities,100 Level


We add the department to the dataframe and then split it into one dataframe per course level.

In [36]:
df8["Department"] = df7["Department"].values
df8.head()

Unnamed: 0,Asian,"Black, Latina",White,Category,Course Level,Department
0,0.111111,0.61,0.277778,Humanities,200 Level,Africana Studies (AFR)
1,0.0,0.9,0.1,Humanities,300 Level,Africana Studies (AFR)
2,0.086957,0.3,0.608696,Humanities,100 Level,American Studies (AMST)
3,0.071429,0.29,0.642857,Humanities,300 Level,American Studies (AMST)
4,0.16129,0.32,0.516129,Humanities,100 Level,Anthropology (ANTH)


In [37]:
df100 = df8[df8["Course Level"]=="100 Level"]
df200 = df8[df8["Course Level"]=="200 Level"]
df300 = df8[df8["Course Level"]=="300 Level"]

In [38]:
df300.head()

Unnamed: 0,Asian,"Black, Latina",White,Category,Course Level,Department
1,0.0,0.9,0.1,Humanities,300 Level,Africana Studies (AFR)
3,0.071429,0.29,0.642857,Humanities,300 Level,American Studies (AMST)
6,0.153846,0.23,0.615385,Humanities,300 Level,Anthropology (ANTH)
9,0.288136,0.24,0.474576,Humanities,300 Level,Art (ART)
14,0.355556,0.09,0.555556,STEM,300 Level,Biological Sciences (BISC)


Here we create a 3D Cluster graph using plotly

In [39]:
import plotly.plotly as py
%matplotlib inline 

py.sign_in('wfahnbul', 'vijrqi8zag')

scatter = dict(
    mode = "markers",
    name = "y",
    type = "scatter3d",    
    x = df8['Asian'], y = df8['Black, Latina'], z = df8['White'],
    marker = dict( size=2, color="rgb(23, 190, 207)" )
)
clusters = dict(
    alphahull = 3,
    name = "y",
    opacity = 0.1,
    type = "mesh3d",    
    x = df8['Asian'], y = df8['Black, Latina'], z = df8['White']
)
layout = dict(
    title = 'Diversity at Wellesley',
    scene = dict(
        xaxis = dict(title = 'Asian', zeroline=False ),
        yaxis = dict(title = 'Black, Latina', zeroline=False ),
        zaxis = dict(title = 'White', zeroline=False ),
    )
)
fig = dict( data=[scatter, clusters], layout=layout )
# Use py.iplot() for IPython notebook
py.iplot(fig, filename='Diversity at Wellesley')





The graph above shows all the courses, but it does not distinguish them by category.

Here is our second attempt at the 3D clustering graph for all courses, this time colored by category.

In [40]:
import plotly.graph_objs as go
%matplotlib inline 

py.sign_in('wfahnbul', 'vijrqi8zag')

data = []
clusters = []
colors = ['rgb(228,26,28)','rgb(55,126,184)','rgb(77,175,74)']

for i in range(len(df8['Category'].unique())):
    category = df8['Category'].unique()[i]
    color = colors[i]
    x = df8[ df8['Category'] == category ]['Asian']
    y = df8[ df8['Category'] == category ]['Black, Latina']
    z = df8[ df8['Category'] == category ]['White']
    
    trace = dict(
        name = category,
        x = x, y = y, z = z,
        type = "scatter3d",    
        mode = 'markers',
        marker = dict( size=3, color=color, line=dict(width=0) ) )
    data.append( trace )

layout = dict(
    width=800,
    height=550,
    autosize=False,
    title='Diversity dataset',
    scene=dict(
        xaxis=dict(
            title="Asian",
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        yaxis=dict(
            title="Black, Latina",
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        zaxis=dict(
            title = "White",
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        aspectratio = dict( x=1, y=1, z=0.7 ),
        aspectmode = 'manual'        
    ),
)

fig = dict(data=data, layout=layout)

# IPython notebook
py.iplot(fig, filename='Diversity dataset', validate=False)

#url = py.plot(fig, filename='pandas-3d-scatter-iris', validate=False)
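
The plotting cells in this notebook differ only in which dataframe they read and in the title. A minimal sketch of a helper that builds the per-category traces is below. It works on plain lists of row dictionaries so that it runs without a plotly account. `build_traces` is a hypothetical name, and the returned list is assumed to be used as `data` in the figure dictionary, as in the cell above.

```python
COLORS = ['rgb(228,26,28)', 'rgb(55,126,184)', 'rgb(77,175,74)']

def build_traces(rows, colors=COLORS):
    """Build one scatter3d trace dict per category, in order of first appearance."""
    categories = []
    for row in rows:
        if row['Category'] not in categories:
            categories.append(row['Category'])
    traces = []
    for category, color in zip(categories, colors):
        subset = [r for r in rows if r['Category'] == category]
        traces.append(dict(
            name=category,
            x=[r['Asian'] for r in subset],
            y=[r['Black, Latina'] for r in subset],
            z=[r['White'] for r in subset],
            type='scatter3d',
            mode='markers',
            marker=dict(size=3, color=color, line=dict(width=0)),
        ))
    return traces

# Rows copied from the dataframe outputs shown earlier
rows = [
    {'Asian': 0.111111, 'Black, Latina': 0.61, 'White': 0.277778, 'Category': 'Humanities'},
    {'Asian': 0.355556, 'Black, Latina': 0.09, 'White': 0.555556, 'Category': 'STEM'},
    {'Asian': 0.0, 'Black, Latina': 0.9, 'White': 0.1, 'Category': 'Humanities'},
]
traces = build_traces(rows)
```

With a dataframe such as `df8`, the rows could be obtained with `df8.to_dict('records')`.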


We chose plotly for this graph because it is interactive. The turntable rotation tool was the most helpful for viewing the graph from different angles. Many points are clustered in the middle. Clicking one or more of the categories in the legend hides the corresponding points.

**Humanities selected:**
* White: Most of the points are clustered towards the "White" axis.
* Asian: It seems that Asian students are underrepresented in Humanities.
* Black, Latina: It seems that Black and Latina students are underrepresented in Humanities.

**STEM selected:**
* White: A large cluster of points may suggest that White students make up less than 50% of these courses.
* Black, Latina: There seems to be an underrepresentation of Black and Latina students.
* Asian: These students seem to be well represented in STEM.

**Social_Sciences selected:**
* White: These students are well represented.
* Black, Latina: There is slightly more representation of these students than in STEM.
* Asian: There is less representation of these students here than in STEM.

In [43]:
%matplotlib inline 

py.sign_in('wfahnbul', 'vijrqi8zag')

data = []
clusters = []
colors = ['rgb(228,26,28)','rgb(55,126,184)','rgb(77,175,74)']

for i in range(len(df100['Category'].unique())):
    category = df100['Category'].unique()[i]
    color = colors[i]
    x = df100[ df100['Category'] == category ]['Asian']
    y = df100[ df100['Category'] == category ]['Black, Latina']
    z = df100[ df100['Category'] == category ]['White']
    
    trace = dict(
        name = category,
        x = x, y = y, z = z,
        type = "scatter3d",    
        mode = 'markers',
        marker = dict( size=3, color=color, line=dict(width=0) ) )
    data.append( trace )

layout = dict(
    width=800,
    height=550,
    autosize=False,
    title='Diversity 100 Level Courses dataset',
    scene=dict(
        xaxis=dict(
            title = "Asian",
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        yaxis=dict(
            title="Black, Latina",
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        zaxis=dict(
            title = "White",
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        aspectratio = dict( x=1, y=1, z=0.7 ),
        aspectmode = 'manual'        
    ),
)

fig = dict(data=data, layout=layout)

# IPython notebook
py.iplot(fig, filename='Diversity 100 Level Courses dataset', validate=False)

**Humanities selected:**
* White: Most of the points are clustered towards the middle (0.5) of the "White" axis.
* Asian: There seems to be less representation of these students here, but there are a few outliers towards the higher end (above 0.3) of the "Asian" axis.
* Black, Latina: The points seem more spread out, but there is a higher concentration of points towards the lower end (0.3) of the "Black, Latina" axis. 

**STEM selected:**
* White: These students are well represented. The points seem to cluster more towards the higher end (above 0.3) of the "White" axis.
* Black, Latina: There is an underrepresentation of Black and Latina students, with a few outliers. 
* Asian: The points are more spread out for these students, but are slightly more concentrated towards the higher end (above 0.2) of the "Asian" axis.

**Social_Sciences selected:**
* White: These students are well represented.
* Black, Latina: These students are more represented here than in STEM.
* Asian: The points are concentrated more towards the lower end (below 0.3).

In [42]:
%matplotlib inline 

py.sign_in('wfahnbul', 'vijrqi8zag')

data = []
clusters = []
colors = ['rgb(228,26,28)','rgb(55,126,184)','rgb(77,175,74)']

for i in range(len(df200['Category'].unique())):
    category = df200['Category'].unique()[i]
    color = colors[i]
    x = df200[ df200['Category'] == category ]['Asian']
    y = df200[ df200['Category'] == category ]['Black, Latina']
    z = df200[ df200['Category'] == category ]['White']
    
    trace = dict(
        name = category,
        x = x, y = y, z = z,
        type = "scatter3d",    
        mode = 'markers',
        marker = dict( size=3, color=color, line=dict(width=0) ) )
    data.append( trace )

layout = dict(
    width=800,
    height=550,
    autosize=False,
    title='Diversity 200 Level Courses dataset',
    scene=dict(
        xaxis=dict(
            title = "Asian",
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        yaxis=dict(
            title="Black, Latina",
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        zaxis=dict(
            title = "White",
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        aspectratio = dict( x=1, y=1, z=0.7 ),
        aspectmode = 'manual'        
    ),
)

fig = dict(data=data, layout=layout)

# IPython notebook
py.iplot(fig, filename='Diversity 200 Level Courses dataset', validate=False)

**Humanities selected:**
* White: Most of the points are clustered towards the higher end (above 0.4) of the "White" axis.
* Asian: There seems to be less representation of these students here, but there are a few outliers towards the higher end (above 0.4) of the "Asian" axis.
* Black, Latina: There is a higher concentration of points towards the lower half (below 0.3) of the "Black, Latina" axis. 

**STEM selected:**
* White: The points seem to be concentrated at or below the middle (0.5 and below) of the "White" axis.
* Black, Latina: There is a lack of representation here, with most of the points clustered towards the lower half (below 0.15) of the "Black, Latina" axis.
* Asian: The points seem to be spread out, but slightly more concentrated towards the middle (0.3) of the "Asian" axis.

**Social_Sciences selected:**
* White: The points seem slightly more concentrated towards the middle (0.45) of the "White" axis.
* Black, Latina: The points seem fairly spread out. There may be a slightly higher concentration of points towards the upper half (above 0.2) of the "Black, Latina" axis.
* Asian: The points are spread out evenly for these students. 

In [45]:
%matplotlib inline 

py.sign_in('wfahnbul', 'vijrqi8zag')

data = []
clusters = []
colors = ['rgb(228,26,28)','rgb(55,126,184)','rgb(77,175,74)']
for i in range(len(df300['Category'].unique())):
    category = df300['Category'].unique()[i]
    color = colors[i]
    x = df300[ df300['Category'] == category ]['Asian']
    y = df300[ df300['Category'] == category ]['Black, Latina']
    z = df300[ df300['Category'] == category ]['White']
    trace = dict(
        name = category,
        x = x, y = y, z = z,
        type = "scatter3d", 
        mode = 'markers',
        marker = dict( size=3, color=color, line=dict(width=0) ) )
    data.append( trace )

layout = dict(
    width=800,
    height=550,
    autosize=False,
    title='Diversity 300 Level Courses dataset',
    scene=dict(
        xaxis=dict(
            title = "Asian",
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        yaxis=dict(
            title="Black, Latina",
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        zaxis=dict(
            title = "White",
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        aspectratio = dict( x=1, y=1, z=0.7 ),
        aspectmode = 'manual'        
    ),
)

fig = dict(data=data, layout=layout)

# IPython notebook
py.iplot(fig, filename='Diversity 300 Level Courses dataset', validate=False)

**Humanities selected:**
* White: Most of the points are towards the lower half (below 0.45) of the "White" axis. However, there is a cluster of points above the middle of the axis that have a high concentration of White students and a very low concentration of Black, Latina, and Asian students.
* Asian: There is a lack of representation of these students. Most of the points are in the lower half (below 0.45) of the "Asian" axis.
* Black, Latina: There is a higher concentration of points towards the lower half (below 0.4) of the "Black, Latina" axis. There are a few outliers in which these students are overrepresented (i.e., there are significantly fewer White and Asian students).

**STEM selected:**
* White: The points seem slightly more concentrated towards the middle (0.5) of the "White" axis.
* Black, Latina: There is a lack of representation here, with most of the points clustered towards the lower half (below 0.2) of the "Black, Latina" axis.
* Asian: The points seem evenly spread out for these students. 

**Social_Sciences selected:**
* White: The points seem slightly more concentrated towards the upper half (0.35) of the "White" axis.
* Black, Latina: The points seem fairly spread out. There may be a slightly higher concentration of points towards the lower half (below 0.2) of the "Black, Latina" axis.
* Asian: The points are spread out and evenly distributed. 
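
The observations above are read off the plots by eye. One way to cross-check them, sketched here under the assumption that `df8` has the columns shown earlier, is to average each group's share by category and course level. The rows in `demo` are copied from the `df8` output shown earlier, so the result is illustrative only and is not a finding about the full dataset.

```python
import pandas as pd

# First four rows of df8, as displayed earlier in the notebook
demo = pd.DataFrame({
    'Asian': [0.111111, 0.0, 0.086957, 0.071429],
    'Black, Latina': [0.61, 0.9, 0.3, 0.29],
    'White': [0.277778, 0.1, 0.608696, 0.642857],
    'Category': ['Humanities'] * 4,
    'Course Level': ['200 Level', '300 Level', '100 Level', '300 Level'],
})

shares = ['Asian', 'Black, Latina', 'White']
# One row per (Category, Course Level) pair, holding the mean share of each group
means = demo.groupby(['Category', 'Course Level'])[shares].mean()
```

Applied to the full `df8`, the same `groupby` call would give one row of mean shares per category and course level to compare against the clusters in the plots.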