Skip to content


Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?

Latest commit


Git stats


Failed to load latest commit information.
Latest commit message
Commit time
NOTE: "unknown" ethnic background could also represent people who identified
as multiracial (e.g. in Amherst before 2010).

//About the Project:
This project was created by Jamie Hand ('16.5) and Emily Sarich ('16) for
CS465, Information Visualization.

To deploy, make sure the file 'college_data.csv' is in the same directory
as the main visualization file, 'diversity_vis.html'. The main visualization file
can be opened locally or deployed on a server. It is also available through its
github page at

Our data is comprised of the Common Data Sets (CDS) of the NESCAC colleges.
All colleges are required to fill out the CDS every year to give statistics on
admissions and the student body. This data was originally in the form of PDFs
published each year by each college, so we converted the sections that we wanted
into a .csv file that we could manipulate and visualize with D3. Some colleges only
had certain years available online, so we reached out to those colleges and requested
the data. We were somewhat successful, but couldn't get all of the data from all of
the colleges, so there are some gaps in the data.

Our visualization is targeted towards a high school student who is investigating the
diversity of the student body at the colleges they are applying to.
It is designed to show the ethnic breakdown of each NESCAC college so a student can
compare the diversity of each school, and show the ethnic breakdown of a college over
time so a student can see how diversity has changed over time.

We focused on a student who is looking to find an ethnically diverse student body,
so we chose to exclude the students who identify as 'white'; otherwise, unfortunately,
the other categories would be significantly smaller and harder to analyze.

//Description of the Vis:
Our first and main visualization is a stacked bar chart that shows all colleges
for a specific year. It shows each ethnic category and the total number of students that
make up the incoming first-year class for that year. Each bar can be brought down to
the axis so that a student can get a better comparison of how one college compares
to the other. We also added a tooltip that shows the actual number of students
in each ethnic category as well as the percentage of the first-year class that
the category makes up.

In this visualization, a user can use a drop-down menu to select the year they
wish to visualize. We initially considered several options for this selection
(check boxes, radio buttons, a text box, etc.) but decided that a drop-down was best
for this selection feature because it was the least cluttered and made it clear
that a user could only select one year at a time.

Our second visualization is a different type of visualization that chooses to focus on
a specific school rather than a specific year. The idea behind this is that a user could
look at our first visualization, see something they want to investigate in a specific college,
and click on it to show that college's ethnic breakdown over the years to see if
it has changed over time.

For example:
A student looking at the data from Trinity College notices that it has the highest
number of students who identify as Hispanic compared to other colleges.
They are curious as to whether or not Trinity always has a high number of Hispanic
students, so they look at Trinity's ethnic breakdown over the years and see that
the number of Hispanic students has been steadily increasing since 2005.

One of our biggest difficulties in this process was the collection of the data.
Collecting data from pdfs is, not surprisingly, a pain. We were initially going
to include many more colleges (other small liberal arts colleges similar to Middlebury),
but we found that the process of collecting data was very time-consuming and not
worth the amount of time necessary to gather data from a larger number of schools.
Our code is scalable and could handle a larger data set, but because of our limited
time frame we chose to reduce the list down to only NESCAC schools.

Another slightly frustrating part of the data was that in 2010, the ethnic breakdown
of the CDS changes slightly to be more inclusive. This changes some of the categories,
and it was difficult to figure out what to do when the data suddenly changed formats.

One of our biggest successes was the ability to draw our second visualization based on
data in our main visualization. Adding the action listener to the tick marks on the
x-axis of our main vis makes it possible for a user to visualize a college without
having to use a less intuitive drop-down or button selection. It keeps the user
in the visualization and keeps the interface cleaner.

If we could change anything about the project, we would have obviously included
more data. If we had more time, we would have also added some more detail to our
second visualization. The main goal of this visualization is to show general trends
and not specific instances, so we thought that this was an acceptable amount of detail.


No description, website, or topics provided.






No releases published


No packages published