# CS 109 Final Project: GreatSchools
<i>Project team: Samuel Barrows, Joseph Davin, Brian Feeny, Michael Lackner</i>
<br>
<br>

<img src="http://www.newschools.org/wp/wp-content/uploads/Bill-Jackson.jpg">



## Overview

This project analyzes data from the nearly 1 million ratings and reviews submitted to
greatschools.org, the largest online review site for schools in the world.

Over the past 15 years, GreatSchools has emerged as one of the primary sources of information that
parents turn to in selecting schools. According to GreatSchools, more than half of
the families in America use GreatSchools to find information on individual schools when deciding where
to send their children. Anyone is free to write a review, and parents can then search for
information at the school level. But what if instead we zoom out and look at information
from all reviews? Could GreatSchools help us to understand the issues that matter to
parents and children at different schools?

Understanding what matters to reviewers at different schools could help educators to develop policies that satisfy parents and children. Reviews may also tell us about the different pressures faced by schools. Reviews are a direct form of feedback, and may also put pressure on schools indirectly, by influencing the school choices of parents. In addition, reviews may give an indication of what parents and children are talking to schools about more broadly. Despite the potential importance of the information contained in reviews, however, to the
best of our knowledge no one has yet explored what this data can tell us. We therefore
set out to understand what reviewers are talking about, and how this varies across
schools.

We were given access to every GreatSchools review ever written, going back 15 years
and covering the entire United States. In our analysis, we used
machine-learning techniques to analyze not only parental ratings but also the text of
reviews. We also combined results from state-administered exams to develop the
first measure of school performance that allows schools to be compared nationwide at
the level of the individual school.

## Motivation and Goals

A first motivation for studying GreatSchools reviews is that they may serve as an
important new form of “crowd-sourced accountability”, putting pressure on schools both
as a direct form of feedback, and indirectly via their influence on people’s school choices.

Since the early 1990s, governments around the world have placed great emphasis on
accountability in public services, introducing the publication of
performance measures in numerous areas. A central component of the No Child Left
Behind Act (2001), for example, was that US states were required to regularly test the
performance of school students and make data on performance at the school level
publicly available. The theory behind these conventional systems of accountability is that
arming citizens with information about the performance of public services will both
empower citizens to choose between service providers, thus pressuring poorly performing
providers to improve, and also encourage citizens to pressure failing service providers
directly.

At the same time as governments have been introducing systems of formal reporting in
public services, “crowd-sourced accountability” in the form of online reviews has
been emerging in many areas of the public sector. From a website in the UK that allows
patients to review primary care physicians, to individuals in Massachusetts reviewing
their local public libraries on Yelp or Google, citizens have increasingly been putting
feedback about public services online. Reviews provide a direct form of feedback to the
organizations concerned. In addition, like conventional forms of public accountability,
these reviews may put pressure on providers to improve by influencing the choices of other
consumers.

The broad goal of our project, therefore, is to better understand what reviewers on the
GreatSchools website are saying about their local schools and how this varies across
schools. More specifically, how positive are individuals’ ratings, what topics do reviewers talk
about, and how do these vary over time, by geography, with school performance, and
with the demographics of a school’s pupils and other school characteristics?

## Related Work

Exploring what online reviewers are saying about their local schools allows us to
substantially extend a large literature in the social sciences on individual responses to
organizational performance.

Hirschman’s (1970) seminal work Exit, Voice, and Loyalty proposed that, in response to a
decline in organizational performance, consumers would either exit or voice their
concerns, and offered multiple hypotheses concerning when consumers would exercise
these different responses. Many subsequent studies have sought to test Hirschman’s
theories empirically, particularly in the context of public services, where opportunities for
exit are often limited. However, whilst data on exit in response to organizational
performance is often available, existing studies have struggled to measure different types
of voice.

Public schooling is one area in which Hirschman’s work has received considerable attention from
empirical researchers, and offers a good example of the limitations of existing studies of
voice. Researchers have frequently used administrative data to explore the determinants
of individuals’ actual behavior with respect to exit (for example, Henderson 2010).
However, studies of the determinants of voice in public schooling have either focused on
voting or relied on surveys of self-reported behavior (Berry and Howell 2007; Cox and
Witko 2010; Fleming 2011). We know of no study that has sought to directly measure
what individuals are actually saying about public schools.

A number of recent studies have made considerable progress in analyzing data from
online reviews (for example, Anderson and Magruder 2012; Dai et al. 2012). However,
these studies have focused on restaurants and other commercial services. We know of
no study that has analyzed online reviews in the context of a public service.

Online reviews are an important form of voice in their own right, and may also serve as a proxy for what people are talking about more broadly. In analyzing the nearly 1 million ratings and reviews of schools contributed to GreatSchools, therefore, we have the opportunity to go far beyond existing studies in both the scale and the detail with which we are able to measure voice, whether in public education specifically or in public services more broadly.

## Initial Questions

The first question that we wanted to answer was whether there was any useful signal at all in the GreatSchools data provided by reviewers, or if we were just dealing with a lot of noise. Although there are a large number of reviews on the GreatSchools website, the website has existed since 1998 and there are around 200,000 schools in the US, so the data is sparse. Furthermore, online reviewers are hardly a representative sample of pupils, parents, or any other group. We therefore thought it quite possible that there would be nothing in the data: that reviews would reflect a pupil here or there grumbling about their teachers, or the odd very enthusiastic parent inspired to write a review, but with no real pattern or useful information.

Assuming that there was useful information in the data, we were then faced with a huge number of possible questions to address. What are the characteristics of reviews and ratings at the national or state level? Does school performance predict ratings or reviews? If so, what aspects of school performance? Do reviews vary with school type? Are reviews influenced by the other schooling options in the neighborhood? What other school characteristics influence reviews? On the other hand, one might consider whether reviews can help predict outcomes of interest, such as school test score performance. One might also consider whether reviews have a causal effect on other outcomes of interest, such as school enrollment.

We began by simply describing the distribution of review characteristics. We then explored how reviews varied over time, with geography, and with test scores, school type, and other school characteristics. This initial exploration quickly revealed that there was a great deal of interesting information in the data; however, there was also a lot of noise. We felt it was unlikely that we would be able to use school reviews to predict other outcomes, such as test scores or enrollment, with sufficient precision to be of much interest. However, we also felt that it was fair to assume that reviews matter in their own right, as discussed above.

We therefore decided to focus on analyzing how school characteristics predict school reviews. In particular, we were interested in whether high-performing schools received different reviews from low-performing schools, and whether the behavior of reviewers varies with the wealth of the school’s student body. We focused on these variables because we felt that, if our broader goal was to shed light on the different pressures that schools face, then whether good schools face different pressures from bad ones, and rich schools different pressures from poor ones, were the questions of greatest substantive concern.
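
As a rough illustration of the kind of comparison described above, the sketch below averages ratings within groups of schools. It is only a minimal example under stated assumptions: the records, the field names (`performance`, `low_income_share`, `mean_rating`), the helper `mean_rating_by`, and the 0.5 cutoff are all hypothetical and invented for illustration. They are not GreatSchools data and not necessarily the method used in this project.

```python
from statistics import mean

# Hypothetical toy records, invented purely for illustration (not GreatSchools data).
schools = [
    {"performance": "high", "low_income_share": 0.2, "mean_rating": 4.5},
    {"performance": "high", "low_income_share": 0.6, "mean_rating": 4.0},
    {"performance": "low", "low_income_share": 0.3, "mean_rating": 3.0},
    {"performance": "low", "low_income_share": 0.7, "mean_rating": 3.5},
]


def mean_rating_by(records, group_of):
    """Average the mean_rating field within each group returned by group_of."""
    groups = {}
    for record in records:
        groups.setdefault(group_of(record), []).append(record["mean_rating"])
    return {group: mean(values) for group, values in groups.items()}


# Compare ratings across performance tiers.
by_performance = mean_rating_by(schools, lambda r: r["performance"])

# Compare ratings across a crude wealth proxy; the 0.5 cutoff is an arbitrary assumption.
by_income = mean_rating_by(
    schools,
    lambda r: "poorer" if r["low_income_share"] >= 0.5 else "wealthier",
)

print(by_performance)
print(by_income)
```

Group averages like these only describe differences between groups. Separating the role of performance from that of wealth would require a model that includes both variables at once, such as a regression.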

## References

Anderson, M., and Magruder, J. 2012. “Learning From the Crowd: Regression Discontinuity Estimates of the Effect of an Online Review Database.” The Economic Journal.

Berry, Christopher R., and Howell, William G. 2007. “Accountability and Local Elections: Rethinking Retrospective Voting.” Journal of Politics Vol. 69 No. 3

Cox, James H. and Witko, Chris. 2010. “School Choice, Exit and Voice: Competition and Parental School Decision-making.” APSA 2010 Annual Meeting Paper.

Dai, Weijia et al. 2012. “Optimal Aggregation of Consumer Ratings: An Application to Yelp.” HBS Working Paper 13-042.

Dee, Thomas S. and Jacob, Brian A. 2011. “The Impact of No Child Left Behind on Student Achievement.” Journal of Policy Analysis and Management. 30(3): 418-446.

Fleming, David J. 2011. “Choice, Voice & Exit: School Vouchers in Milwaukee.” APSA 2011 Annual Meeting Paper.

Henderson, Michael. 2010. “Does Information Help Families Choose Schools? Evidence from a Regression Discontinuity Design.” PEPG Working Paper Series 10-17.

Hirschman, Albert O. 1970. Exit, Voice, and Loyalty: Responses to Decline in Firms, Organizations, and States. Cambridge, MA: Harvard University Press.

Jacob, B. 2005. “Accountability, Incentives and Behavior: Evidence from School Reform in Chicago.” Journal of Public Economics. 89(5-6): 761-796.