# Enrollment Diversity at Wellesley
#### by Selina and Whitney

We will be exploring data that contains student enrollments for each department, broken down by course level and by the race/ethnicity of enrolled students (e.g., the number of students of each race in BISC 100, BISC 200, and BISC 300), for the past four academic years (Fall 2011 to Spring 2015).

We want to know whether the diversity of students in courses varies by course level and department. It could be that, regardless of department, many 100-level courses have diverse enrollments, but that this diversity decreases as course level increases (i.e., there is less diversity in 200-level courses and even less in 300-level courses). This pattern could also vary by department. We suspect that many upper-level STEM courses have fewer students who are neither White nor Asian, but that there may be more of these students in upper-level courses in other departments (e.g., WGST or POLS).
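The plots in this notebook place each course on axes that run from 0 to 1, which suggests they show each group's share of a course's enrollment rather than raw counts. A minimal sketch of that conversion, assuming a hypothetical table of per-course counts that uses the same column names as the plotting code (`Asian`, `Black, Latina`, `White`, `Category`); the `Other` column, the course rows, and all numbers are made up for illustration:

```python
import pandas as pd

# Hypothetical per-course enrollment counts (illustrative numbers only).
# "Other" stands in for any remaining race/ethnicity columns in the raw data.
counts = pd.DataFrame({
    "Course": ["BISC 100", "BISC 200", "BISC 300"],
    "Category": ["STEM", "STEM", "STEM"],
    "Asian": [30, 12, 5],
    "Black, Latina": [20, 6, 2],
    "White": [40, 18, 10],
    "Other": [10, 4, 3],
})

group_cols = ["Asian", "Black, Latina", "White", "Other"]

# Divide each group's count by the course's total enrollment so every
# course is described by shares between 0 and 1.
totals = counts[group_cols].sum(axis=1)
shares = counts[["Course", "Category"]].join(counts[group_cols].div(totals, axis=0))

print(shares.round(2))
```

Dividing by the total across all groups, not just the three plotted ones, means the plotted shares need not sum to 1 for a course.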

In [1]:
import pandas as pd
import io
import xlrd
from pandas import Series, DataFrame

In [1]:
from IPython.display import HTML

HTML('''<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to toggle on/off the raw code."></form>''')

Here we create a 3D cluster graph of all courses using Plotly.

In [39]:
# Legacy online-plotting module (Plotly < 4); newer releases moved it to chart_studio.plotly
import plotly.plotly as py

py.sign_in('wfahnbul', 'vijrqi8zag')

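# Individual courses as points: x = Asian, y = Black/Latina, z = White
scatter = dict(
    mode = "markers",
    name = "Courses",
    type = "scatter3d",
    x = df8['Asian'], y = df8['Black, Latina'], z = df8['White'],
    marker = dict( size=2, color="rgb(23, 190, 207)" )
)
# Translucent alpha-shape mesh enclosing the same points to show the overall cluster
clusters = dict(
    alphahull = 3,
    name = "Cluster",
    opacity = 0.1,
    type = "mesh3d",
    x = df8['Asian'], y = df8['Black, Latina'], z = df8['White']
)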
layout = dict(
    title = 'Diversity at Wellesley',
    scene = dict(
        xaxis = dict(title = 'Asian', zeroline=False ),
        yaxis = dict(title = 'Black, Latina', zeroline=False ),
        zaxis = dict(title = 'White', zeroline=False ),
    )
)
fig = dict( data=[scatter, clusters], layout=layout )
# Use py.iplot() for IPython notebook
py.iplot(fig, filename='Diversity at Wellesley')


The graph above shows all of the courses, but it does not distinguish them by category.

Here is our second attempt at the 3D cluster graph, this time showing all courses colored by category.

In [40]:
py.sign_in('wfahnbul', 'vijrqi8zag')

data = []
colors = ['rgb(228,26,28)','rgb(55,126,184)','rgb(77,175,74)']

# One trace per category, so each category can be toggled from the legend
for i, category in enumerate(df8['Category'].unique()):
    color = colors[i]
    subset = df8[ df8['Category'] == category ]
    x = subset['Asian']
    y = subset['Black, Latina']
    z = subset['White']
    
    trace = dict(
        name = category,
        x = x, y = y, z = z,
        type = "scatter3d",    
        mode = 'markers',
        marker = dict( size=3, color=color, line=dict(width=0) ) )
    data.append( trace )

layout = dict(
    width=800,
    height=550,
    autosize=False,
    title='Diversity dataset',
    scene=dict(
        xaxis=dict(
            title="Asian",
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        yaxis=dict(
            title="Black, Latina",
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        zaxis=dict(
            title = "White",
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        aspectratio = dict( x=1, y=1, z=0.7 ),
        aspectmode = 'manual'        
    ),
)

fig = dict(data=data, layout=layout)

# Render inline in the notebook
py.iplot(fig, filename='Diversity dataset', validate=False)


We chose Plotly for this graph because it is interactive. We found the turntable rotation tool the most helpful for viewing the graph from different angles. Many of the points are clustered in the middle. Clicking a category in the legend hides its points (clicking again shows them), which makes it possible to examine one category at a time.

**Humanities selected:**
* White: Most of the points are clustered towards the "White" axis.
* Asian: It seems that Asian students are underrepresented in Humanities.
* Black, Latina: It seems that Black and Latina students are underrepresented in Humanities.

**STEM selected:**
* White: A large cluster of points may suggest that many of these courses are composed of less than 50% White students.
* Black, Latina: There seems to be an underrepresentation of Black and Latina students.
* Asian: These students seem to be well represented in STEM.

**Social_Sciences selected:**
* White: These students are well represented.
* Black, Latina: These students are slightly better represented here than in STEM.
* Asian: There is less representation of these students here than in STEM.
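The readings above come from rotating the plot by eye. A rough numerical check is to compare each category's median share and spread per group. A minimal sketch, assuming `df8` has one row per course with share columns `Asian`, `Black, Latina`, `White` and a `Category` column, as the plotting code implies; the toy frame below is hypothetical stand-in data, not the Wellesley numbers:

```python
import pandas as pd

# Hypothetical stand-in for df8: one row per course, shares between 0 and 1.
df8 = pd.DataFrame({
    "Category": ["Humanities", "Humanities", "STEM", "STEM",
                 "Social_Sciences", "Social_Sciences"],
    "Asian": [0.10, 0.20, 0.40, 0.30, 0.20, 0.10],
    "Black, Latina": [0.20, 0.10, 0.10, 0.05, 0.20, 0.30],
    "White": [0.60, 0.60, 0.40, 0.55, 0.50, 0.50],
})

share_cols = ["Asian", "Black, Latina", "White"]
grouped = df8.groupby("Category")[share_cols]

# Median share of each group per category; the median is less sensitive
# than the mean to the outlying courses visible in the 3D plot.
medians = grouped.median()

# Interquartile range per category, as a rough measure of how spread out
# the points are along each axis.
iqr = grouped.quantile(0.75) - grouped.quantile(0.25)

print(medians.round(2))
print(iqr.round(2))
```

Run on the real `df8`, a low median on the "Black, Latina" column for STEM would back up the underrepresentation read off the plot, and a small IQR would match a tight cluster.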

In [43]:
py.sign_in('wfahnbul', 'vijrqi8zag')

data = []
colors = ['rgb(228,26,28)','rgb(55,126,184)','rgb(77,175,74)']

# One trace per category, so each category can be toggled from the legend
for i, category in enumerate(df100['Category'].unique()):
    color = colors[i]
    subset = df100[ df100['Category'] == category ]
    x = subset['Asian']
    y = subset['Black, Latina']
    z = subset['White']
    
    trace = dict(
        name = category,
        x = x, y = y, z = z,
        type = "scatter3d",    
        mode = 'markers',
        marker = dict( size=3, color=color, line=dict(width=0) ) )
    data.append( trace )

layout = dict(
    width=800,
    height=550,
    autosize=False,
    title='Diversity 100 Level Courses dataset',
    scene=dict(
        xaxis=dict(
            title = "Asian",
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        yaxis=dict(
            title="Black, Latina",
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        zaxis=dict(
            title = "White",
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        aspectratio = dict( x=1, y=1, z=0.7 ),
        aspectmode = 'manual'        
    ),
)

fig = dict(data=data, layout=layout)

# IPython notebook
py.iplot(fig, filename='Diversity 100 Level Courses dataset', validate=False)

**Humanities selected:**
* White: Most of the points are clustered towards the middle (0.5) of the "White" axis.
* Asian: There seems to be less representation of these students here, but there are a few outliers towards the higher end (above 0.3) of the "Asian" axis.
* Black, Latina: The points seem more spread out, but there is a higher concentration of points towards the lower end (0.3) of the "Black, Latina" axis. 

**STEM selected:**
* White: These students are well represented. The points seem to cluster more towards the higher end (above 0.3) of the "White" axis.
* Black, Latina: There is an underrepresentation of Black and Latina students, with a few outliers. 
* Asian: The points are more spread out for these students, but are slightly more concentrated towards the higher end (above 0.2) of the "Asian" axis.

**Social_Sciences selected:**
* White: These students are well represented.
* Black, Latina: These students are more represented here than in STEM.
* Asian: The points are concentrated towards the lower end (below 0.3) of the "Asian" axis.
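The 100-, 200-, and 300-level cells repeat the same per-category loop and layout, changing only the DataFrame and the title. A minimal sketch of a reusable helper, assuming each frame has the `Category`, `Asian`, `Black, Latina`, and `White` columns used above; `category_traces` and `scene_layout` are hypothetical names, and the dicts mirror the trace and layout settings in the original cells so they could be passed to `py.iplot` the same way:

```python
import pandas as pd

COLORS = ['rgb(228,26,28)', 'rgb(55,126,184)', 'rgb(77,175,74)']
AXES = [("xaxis", "Asian"), ("yaxis", "Black, Latina"), ("zaxis", "White")]


def category_traces(df, colors=COLORS):
    """Build one scatter3d trace dict per category in df (hypothetical helper)."""
    categories = df["Category"].unique()
    if len(categories) > len(colors):
        raise ValueError("need at least one color per category")
    traces = []
    for category, color in zip(categories, colors):
        subset = df[df["Category"] == category]
        traces.append(dict(
            name=category,
            x=subset["Asian"], y=subset["Black, Latina"], z=subset["White"],
            type="scatter3d",
            mode="markers",
            marker=dict(size=3, color=color, line=dict(width=0)),
        ))
    return traces


def scene_layout(title):
    """Layout dict matching the grey-background 3D scene used in each cell."""
    axis_style = dict(gridcolor='rgb(255, 255, 255)',
                      zerolinecolor='rgb(255, 255, 255)',
                      showbackground=True,
                      backgroundcolor='rgb(230, 230,230)')
    scene = {name: dict(title=label, **axis_style) for name, label in AXES}
    scene.update(aspectratio=dict(x=1, y=1, z=0.7), aspectmode='manual')
    return dict(width=800, height=550, autosize=False, title=title, scene=scene)
```

With these, each per-level cell could shrink to something like `fig = dict(data=category_traces(df200), layout=scene_layout('Diversity 200 Level Courses dataset'))` followed by the same `py.iplot(...)` call. Raising on too few colors avoids the silent `IndexError` (or dropped category) the index-based loop would otherwise produce.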

In [42]:
py.sign_in('wfahnbul', 'vijrqi8zag')

data = []
colors = ['rgb(228,26,28)','rgb(55,126,184)','rgb(77,175,74)']

# One trace per category, so each category can be toggled from the legend
for i, category in enumerate(df200['Category'].unique()):
    color = colors[i]
    subset = df200[ df200['Category'] == category ]
    x = subset['Asian']
    y = subset['Black, Latina']
    z = subset['White']
    
    trace = dict(
        name = category,
        x = x, y = y, z = z,
        type = "scatter3d",    
        mode = 'markers',
        marker = dict( size=3, color=color, line=dict(width=0) ) )
    data.append( trace )

layout = dict(
    width=800,
    height=550,
    autosize=False,
    title='Diversity 200 Level Courses dataset',
    scene=dict(
        xaxis=dict(
            title = "Asian",
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        yaxis=dict(
            title="Black, Latina",
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        zaxis=dict(
            title = "White",
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        aspectratio = dict( x=1, y=1, z=0.7 ),
        aspectmode = 'manual'        
    ),
)

fig = dict(data=data, layout=layout)

# IPython notebook
py.iplot(fig, filename='Diversity 200 Level Courses dataset', validate=False)

**Humanities selected:**
* White: Most of the points are clustered towards the higher end (above 0.4) of the "White" axis.
* Asian: There seems to be less representation of these students here, but there are a few outliers towards the higher end (above 0.4) of the "Asian" axis.
* Black, Latina: There is a higher concentration of points towards the lower half (below 0.3) of the "Black, Latina" axis. 

**STEM selected:**
* White: The points seem to be concentrated at or below the middle (0.5 and below) of the "White" axis.
* Black, Latina: There is a lack of representation here, with most of the points clustered towards the lower end (below 0.15) of the "Black, Latina" axis.
* Asian: The points seem to be spread out, but slightly more concentrated towards the middle (0.3) of the "Asian" axis.

**Social_Sciences selected:**
* White: The points seem slightly more concentrated towards the middle (0.45) of the "White" axis.
* Black, Latina: The points seem fairly spread out. There may be a slightly higher concentration of points towards the upper half (above 0.2) of the "Black, Latina" axis.
* Asian: The points are spread out evenly for these students. 

In [45]:
py.sign_in('wfahnbul', 'vijrqi8zag')

data = []
colors = ['rgb(228,26,28)','rgb(55,126,184)','rgb(77,175,74)']

# One trace per category, so each category can be toggled from the legend
for i, category in enumerate(df300['Category'].unique()):
    color = colors[i]
    subset = df300[ df300['Category'] == category ]
    x = subset['Asian']
    y = subset['Black, Latina']
    z = subset['White']
    trace = dict(
        name = category,
        x = x, y = y, z = z,
        type = "scatter3d", 
        mode = 'markers',
        marker = dict( size=3, color=color, line=dict(width=0) ) )
    data.append( trace )

layout = dict(
    width=800,
    height=550,
    autosize=False,
    title='Diversity 300 Level Courses dataset',
    scene=dict(
        xaxis=dict(
            title = "Asian",
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        yaxis=dict(
            title="Black, Latina",
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        zaxis=dict(
            title = "White",
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        aspectratio = dict( x=1, y=1, z=0.7 ),
        aspectmode = 'manual'        
    ),
)

fig = dict(data=data, layout=layout)

# IPython notebook
py.iplot(fig, filename='Diversity 300 Level Courses dataset', validate=False)

**Humanities selected:**
* White: Most of the points are towards the lower half (below 0.45) of the "White" axis. However, there is a cluster of points above the middle of the axis, representing courses with a high proportion of White students and very low proportions of Black, Latina, and Asian students.
* Asian: There is a lack of representation of these students. Most of the points fall below 0.45 on the "Asian" axis.
* Black, Latina: There is a higher concentration of points towards the lower end (below 0.4) of the "Black, Latina" axis. There are a few outliers in which these students are overrepresented (i.e., there are significantly fewer White and Asian students).

**STEM selected:**
* White: The points seem slightly more concentrated towards the middle (0.5) of the "White" axis.
* Black, Latina: There is a lack of representation here, with most of the points clustered towards the lower end (below 0.2) of the "Black, Latina" axis.
* Asian: The points seem evenly spread out for these students. 

**Social_Sciences selected:**
* White: The points seem slightly more concentrated towards the upper half (around 0.35) of the "White" axis.
* Black, Latina: The points seem fairly spread out. There may be a slightly higher concentration of points towards the lower end (below 0.2) of the "Black, Latina" axis.
* Asian: The points are spread out and evenly distributed. 
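The three per-level readings can be lined up to check the hypothesis from the introduction that diversity drops as course level rises. A minimal sketch, assuming `df100`, `df200`, and `df300` share the `Category` column and the share columns used in the plots; the small frames here are hypothetical placeholders, so the printed numbers say nothing about the real data:

```python
import pandas as pd

share_cols = ["Asian", "Black, Latina", "White"]

# Hypothetical placeholders for df100, df200 and df300.
df100 = pd.DataFrame({"Category": ["STEM", "Humanities"],
                      "Asian": [0.30, 0.15], "Black, Latina": [0.20, 0.25],
                      "White": [0.40, 0.50]})
df200 = pd.DataFrame({"Category": ["STEM", "Humanities"],
                      "Asian": [0.35, 0.10], "Black, Latina": [0.10, 0.20],
                      "White": [0.45, 0.60]})
df300 = pd.DataFrame({"Category": ["STEM", "Humanities"],
                      "Asian": [0.30, 0.10], "Black, Latina": [0.05, 0.15],
                      "White": [0.50, 0.65]})

# Stack the three levels into one frame, tagging each row with its level.
by_level = pd.concat(
    [frame.assign(Level=level)
     for level, frame in [(100, df100), (200, df200), (300, df300)]],
    ignore_index=True,
)

# Mean share of each group for every (category, level) pair; reading down
# one category shows whether a group's share falls as the level rises.
summary = by_level.groupby(["Category", "Level"])[share_cols].mean()
print(summary.round(2))
```

A single table like this makes the level-to-level comparison explicit instead of relying on memory of three separately rotated plots.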