# Diversity in EECS at UC Berkeley: Part 1
This notebook contains the code for the article at https://shomil.me/eecs-diversity/.
<br>
<br>
All data for this project has been sourced from Cal Answers (http://calanswers.berkeley.edu). If you're interested in downloading the data, please shoot me an email (shomil@berkeley.edu) and I'll help walk you through it. 
<br>
<br>
The data files should be stored in the `/data/eecs-admissions/` folder, split across three sub-folders: `applied/`, `admitted/`, and `committed/`. Each sub-folder should contain one or more CSV files exported directly from Cal Answers.
<br>
<br>
Recommended: use pipenv to install and run this notebook. Running `pipenv install && pipenv run jupyter notebook` should install every package these visualizations require and then launch Jupyter.
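
Before running the cells below, it can help to confirm that the data folder matches the layout described above. The following is a minimal sketch and not part of the original project: the helper name `list_admissions_csvs` and its `data_root` parameter are hypothetical, and the only thing it assumes is the three sub-folder names given in the text.

```python
from pathlib import Path

# Sub-folder names taken from the layout described above.
CATEGORIES = ("applied", "admitted", "committed")


def list_admissions_csvs(data_root):
    """Return {category: sorted list of CSV paths} for each expected sub-folder.

    Hypothetical helper: raises FileNotFoundError if a sub-folder is missing.
    """
    root = Path(data_root)
    found = {}
    for category in CATEGORIES:
        folder = root / category
        if not folder.is_dir():
            raise FileNotFoundError(f"Missing data folder: {folder}")
        # Only files ending in .csv are collected; anything else is ignored.
        found[category] = sorted(folder.glob("*.csv"))
    return found
```

Calling `list_admissions_csvs(...)` with the path to your `eecs-admissions` folder returns the CSV paths grouped by category, which makes an empty or misnamed sub-folder easy to spot before any data is loaded.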

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from utils import admissions, export, helpers

In [3]:
import os
from dotenv import load_dotenv
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import glob
from tabulate import tabulate
import chart_studio
import chart_studio.plotly as py

In [23]:
# Set the Academic Year
ACADEMIC_YEAR = '2020-21'

# Choose included majors.
INCLUDED_MAJORS = ['Electrical Eng & Comp Sci', 'L&S Computer Science']

# For treemaps, set an overarching bucket name.
TOP_LEVEL = f'All SIR\'ed Students in EECS/L&S CS (Academic Year: {ACADEMIC_YEAR})'

In [None]:
# If this is set to true, graphs/tables will be exported to EXPORT_FOLDER.
SHOULD_WRITE = False
EXPORT_FOLDER = 'exported_graphs/'

# If this is set to true, graphs will be published to Plotly Chart Studio.
SHOULD_PUBLISH = False
PUBLISH_PREFIX = 'eecs-diversity-'
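
Writing to `EXPORT_FOLDER` may fail if that folder does not exist yet. A small guard like the one below is one way to handle this. It is only a sketch: `export_path` is a hypothetical helper, not a function from this project's `utils` package.

```python
import os


def export_path(folder, file_name):
    """Create `folder` if needed and return the full path for `file_name`."""
    os.makedirs(folder, exist_ok=True)  # no error if the folder already exists
    return os.path.join(folder, file_name)
```

With this in place, a call such as `fig.write_html(export_path(EXPORT_FOLDER, 'graph1.html'), include_plotlyjs='cdn')` would create the folder on first use.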

In [5]:
export.setup_chart_studio()

In [6]:
applied, admitted, committed = admissions.load_admissions_df(majors=INCLUDED_MAJORS)
frames = {'Applied': applied, 'Admitted': admitted, 'SIR\'ed': committed}

Columns: 
 • ('Admitted')
 • ('Applied')
 • ('SIRed')
 • Academic Division
 • Academic Yr
 • Admit Rate
 • Applicant Headcounts (renamed to: Headcount)
 • College/School
 • Department
 • Derived Residency (renamed to: Residency)
 • Gender
 • High School API Rank
 • Income Range Amt 2 - Parent (renamed to: Family Income)
 • Intended Major
 • LCFF+ Flg
 • Neither Parent has 4 Year College Degree
 • Neither Parent has Attended College (renamed to: First Generation Student)
 • Prior School Type
 • Short Ethnic Desc (renamed to: Ethnicity L3)
 • Ucb Level1 Ethnic Rollup Desc (renamed to: Ethnicity L1)
 • Ucb Level2 Ethnic Rollup Desc (renamed to: Ethnicity L2)
 • Yield Rate

Years Present: 2000-01, 2001-02, 2002-03, 2003-04, 2004-05, 2005-06, 2006-07, 2007-08, 2008-09, 2009-10, 2010-11, 2011-12, 2012-13, 2013-14, 2014-15, 2015-16, 2016-17, 2017-18, 2018-19, 2019-20, 2020-21

Total # Applied:  118885.0 students
Total # Admitted: 16184.0 students
Total # SIRed:    8519.0 students

Latest Acad
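
The totals printed above can be turned into aggregate rates with simple arithmetic. This sketch assumes the conventional definitions (admit rate = admitted / applied, yield rate = SIR'ed / admitted); the `Admit Rate` and `Yield Rate` columns in the Cal Answers export may be computed differently. Because the totals span every academic year in the data, these are long-run aggregates, not figures for a single year.

```python
# Totals printed above (summed over every academic year present in the data).
total_applied = 118885
total_admitted = 16184
total_sired = 8519

# Assumed definitions; the dataset's own rate columns may differ.
admit_rate = total_admitted / total_applied  # share of applicants admitted
yield_rate = total_sired / total_admitted    # share of admits who SIR'ed

print(f"Aggregate admit rate: {admit_rate:.1%}")
print(f"Aggregate yield rate: {yield_rate:.1%}")
```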

# A Historical Look

In [7]:
# Plot applications, admissions, and SIRs over time.
fig = go.Figure()
for name, df in frames.items():
    # Sum only the Headcount column so non-numeric columns are never aggregated.
    data = df.groupby('Academic Yr')['Headcount'].sum().reset_index()
    fig.add_trace(go.Scatter(
        x=data['Academic Yr'],
        y=data['Headcount'],
        mode='lines+markers',
        name=name
    ))

helpers.style_figure(fig, 
                     title='Number of Applied, Admitted, and SIR\'ed Students over Time', 
                     x_title='Academic Year', 
                     y_title='Number of Students')

if SHOULD_WRITE:
    fig.write_html(EXPORT_FOLDER + 'graph1.html', include_plotlyjs='cdn')
else:
    fig.show()
    
if SHOULD_PUBLISH:
    py.plot(fig, filename=PUBLISH_PREFIX + 'graph1', auto_open=True)

In [8]:
helpers.plot_line_graph(frames=frames,
                        column='Gender',
                        title='Gender Ratios over Time (Applied, Admitted & SIR\'ed Students)',
                        file_name='graph2',
                        write=SHOULD_WRITE,
                        write_folder=EXPORT_FOLDER,
                        publish=SHOULD_PUBLISH,
                        publish_prefix=PUBLISH_PREFIX,
                        categories=['Applied', 'Admitted', 'SIR\'ed'])
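
The internals of `helpers.plot_line_graph` are not shown in this notebook. As a rough sketch of the computation a "ratios over time" chart implies, the snippet below takes a ratio to mean each category's share of that year's total headcount. That reading is an assumption, and `category_shares` is a hypothetical name. The toy data is made up for illustration and is not drawn from Cal Answers.

```python
import pandas as pd


def category_shares(df, column, year_col="Academic Yr", value_col="Headcount"):
    """Return each category's share of the yearly headcount as a wide table."""
    counts = df.groupby([year_col, column])[value_col].sum().unstack(fill_value=0)
    # Divide each row by its total so every year's shares sum to 1.
    return counts.div(counts.sum(axis=1), axis=0)


# Toy data for illustration only (not real Cal Answers figures).
toy = pd.DataFrame({
    "Academic Yr": ["2019-20", "2019-20", "2020-21", "2020-21"],
    "Gender": ["Female", "Male", "Female", "Male"],
    "Headcount": [25, 75, 40, 60],
})
shares = category_shares(toy, "Gender")
```

Each row of `shares` is one academic year and each column is one category, so the table could be passed to a line plot with one trace per column.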

In [9]:
helpers.write_table(frames=frames,
                    category='Gender',
                    title='Gender Breakdowns',
                    file_name='table1.txt',
                    year=ACADEMIC_YEAR,
                    write=SHOULD_WRITE,
                    write_folder=EXPORT_FOLDER)

In [10]:
helpers.plot_line_graph(frames=frames,
                        column='Ethnicity L1',
                        title='Ethnicity Ratios over Time (Applications)',
                        file_name='graph3',
                        write=SHOULD_WRITE,
                        write_folder=EXPORT_FOLDER,
                        publish=SHOULD_PUBLISH,
                        publish_prefix=PUBLISH_PREFIX,
                        categories=['Applied'])

In [11]:
helpers.plot_line_graph(frames=frames,
                        column='Ethnicity L1',
                        title='Ethnicity Ratios over Time (Admitted & SIR\'ed Students)',
                        file_name='graph4',
                        write=SHOULD_WRITE,
                        write_folder=EXPORT_FOLDER,
                        publish=SHOULD_PUBLISH,
                        publish_prefix=PUBLISH_PREFIX,
                        categories=['Admitted', 'SIR\'ed'])

In [12]:
helpers.write_table(frames=frames,
                    category='Ethnicity L1',
                    title='Ethnic Breakdowns (L1) for the Applicant Pool of ' + ACADEMIC_YEAR,
                    file_name='table2.txt',
                    year=ACADEMIC_YEAR,
                    write=SHOULD_WRITE,
                    write_folder=EXPORT_FOLDER)

helpers.write_table(frames=frames,
                    category='Ethnicity L2',
                    title='Ethnic Breakdowns (L2) for the Applicant Pool of ' + ACADEMIC_YEAR,
                    file_name='table3.txt',
                    year=ACADEMIC_YEAR,
                    write=SHOULD_WRITE,
                    write_folder=EXPORT_FOLDER)

helpers.write_table(frames=frames,
                    category='Ethnicity L3',
                    title='Ethnic Breakdowns (L3) for the Applicant Pool of ' + ACADEMIC_YEAR,
                    file_name='table4.txt',
                    year=ACADEMIC_YEAR,
                    write=SHOULD_WRITE,
                    write_folder=EXPORT_FOLDER)

In [13]:
helpers.plot_line_graph(frames=frames,
                        column='Family Income',
                        title='Family Income Breakdown over Time (Applications)',
                        file_name='graph5',
                        write=SHOULD_WRITE,
                        write_folder=EXPORT_FOLDER,
                        publish=SHOULD_PUBLISH,
                        publish_prefix=PUBLISH_PREFIX,
                        categories=['Applied'])

In [14]:
helpers.plot_line_graph(frames=frames,
                        column='Family Income',
                        title='Family Income Breakdown over Time (Admitted & SIR\'ed Students)',
                        file_name='graph6',
                        write=SHOULD_WRITE,
                        write_folder=EXPORT_FOLDER,
                        publish=SHOULD_PUBLISH,
                        publish_prefix=PUBLISH_PREFIX,
                        categories=['Admitted', 'SIR\'ed'])

In [15]:
helpers.write_table(frames=frames,
                    category='Family Income',
                    title='Family Income Breakdowns for the Applicant Pool of ' + ACADEMIC_YEAR,
                    file_name='table5.txt',
                    write=SHOULD_WRITE,
                    year=ACADEMIC_YEAR,
                    write_folder=EXPORT_FOLDER)

In [17]:
helpers.plot_treemap(data=committed,
                     title='Gender/Ethnicity Breakdown (Shaded by Headcount)',
                     path=['Ethnicity L3', 'Gender'],
                     file_name='graph7',
                     color_col='Headcount',
                     write=SHOULD_WRITE,
                     publish=SHOULD_PUBLISH,
                     top_level=TOP_LEVEL,
                     year=ACADEMIC_YEAR,
                     write_folder=EXPORT_FOLDER,
                     publish_prefix=PUBLISH_PREFIX)

In [18]:
helpers.plot_treemap(data=committed,
                     title='Gender/Ethnicity Breakdown (Shaded by Admit Rate)',
                     path=['Ethnicity L3', 'Gender'],
                     file_name='graph8',
                     color_col='Admit Rate',
                     write=SHOULD_WRITE,
                     publish=SHOULD_PUBLISH,
                     top_level=TOP_LEVEL,
                     year=ACADEMIC_YEAR,
                     write_folder=EXPORT_FOLDER,
                     publish_prefix=PUBLISH_PREFIX)

In [19]:
helpers.plot_treemap(data=committed,
                     title='Family Income/Ethnicity Breakdown (Shaded by Headcount)',
                     path=['Family Income', 'Ethnicity L3'],
                     file_name='graph9',
                     color_col='Headcount',
                     write=SHOULD_WRITE,
                     publish=SHOULD_PUBLISH,
                     top_level=TOP_LEVEL,
                     year=ACADEMIC_YEAR,
                     write_folder=EXPORT_FOLDER,
                     publish_prefix=PUBLISH_PREFIX)

In [20]:
helpers.plot_treemap(data=committed,
                     title='Family Income/Ethnicity Breakdown (Shaded by Admit Rate)',
                     path=['Family Income', 'Ethnicity L3'],
                     file_name='graph10',
                     color_col='Admit Rate',
                     write=SHOULD_WRITE,
                     publish=SHOULD_PUBLISH,
                     top_level=TOP_LEVEL,
                     year=ACADEMIC_YEAR,
                     write_folder=EXPORT_FOLDER,
                     publish_prefix=PUBLISH_PREFIX)

In [21]:
helpers.plot_treemap(data=committed,
                     title='Comparing Ethnicity & First Generation College Students (Shaded by Admit Rate)',
                     path=['Ethnicity L3', 'First Generation Student'],
                     file_name='graph11',
                     color_col='Admit Rate',
                     write=SHOULD_WRITE,
                     publish=SHOULD_PUBLISH,
                     top_level=TOP_LEVEL,
                     year=ACADEMIC_YEAR,
                     write_folder=EXPORT_FOLDER,
                     publish_prefix=PUBLISH_PREFIX)

In [22]:
helpers.plot_treemap(data=committed,
                     title='Stacking Demographic Inequalities (Shaded by Headcount)',
                     path=['Gender', 'Ethnicity L3', 'Family Income', 'First Generation Student'],
                     file_name='graph12',
                     color_col='Headcount',
                     write=SHOULD_WRITE,
                     publish=SHOULD_PUBLISH,
                     top_level=TOP_LEVEL,
                     year=ACADEMIC_YEAR,
                     write_folder=EXPORT_FOLDER,
                     publish_prefix=PUBLISH_PREFIX)