Materials from StanCon
StanCon’s version of conference proceedings is a collection of contributed talks based on interactive notebooks. Every submission is peer reviewed by at least two reviewers. The reviewers are members of the Stan Conference Organizing Committee and the Stan Development Team. This repository contains all of the accepted notebooks as well as any supplementary materials required for building the notebooks. The slides presented at the conference are also included.
StanCon 2017 | January 21, Columbia University, New York
2017 Peer reviewed contributed talks
Twelve Cities: Does lowering speed limits save pedestrian lives?
- Authors: Jonathan Auerbach, Rob Trangucci (Columbia University)
We investigate whether American cities can expect to achieve a meaningful reduction in pedestrian deaths by lowering the posted speed limit. We find some evidence that a lower speed limit does in fact reduce fatality rates, and our estimated causal effect is similar to the traditional before-after analysis espoused by policy analysts. Nevertheless, we conclude that adjusting the posted speed limit in urban environments does not correspond with a reliable reduction in pedestrian fatalities.
Hierarchical Bayesian Modeling of the English Premier League
- Authors: Milad Kharratzadeh (Columbia University)
In this case study, we provide a hierarchical Bayesian model for the 2015/2016 season of the English Premier League. The league consists of 20 teams, and each pair of teams plays two games against each other (one home and one away). In total, there are 38 weeks and 380 games. We model the score difference (home team goals − away team goals) in each match. The main parameters of the model are the teams’ abilities, which are assumed to vary over the course of the 38 weeks. The initial abilities are determined by performance in the previous season plus some variation.
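The fixture counts quoted in this abstract can be checked with a few lines of Python. This is only an arithmetic sketch, not code from the notebook; the team labels are hypothetical placeholders, and it assumes every team plays exactly once per week.

```python
from itertools import permutations


def double_round_robin(teams):
    """Return every (home, away) fixture: each ordered pair of distinct teams meets once."""
    return list(permutations(teams, 2))


teams = [f"team_{i}" for i in range(1, 21)]  # hypothetical labels for the 20 clubs
fixtures = double_round_robin(teams)

games_per_team = 2 * (len(teams) - 1)  # home and away against each of the 19 opponents
n_weeks = games_per_team               # assumption: each team plays once per week
games_per_week = len(teams) // 2       # 20 teams paired off into 10 matches

print(len(fixtures), n_weeks, games_per_week)
```

Each fixture would contribute one score difference (home goals minus away goals) to the model described above.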
Advertising Attribution Modeling in the Movie Industry
- Authors: Victor Lei, Nathan Sanders, Abigail Dawson (Legendary Entertainment)
We present a Bayesian method for inferring advertising platform effectiveness as applied to the movie industry, and show some possibilities for drawing inferences by analyzing model parameters at different levels of the hierarchy. In addition, we show some common ways to check model efficacy, and possibilities for comparing between different models.
hBayesDM: Hierarchical Bayesian modeling of decision-making tasks
- Authors: Woo-Young Ahn, Nate Haines, Lei Zhang (Ohio State University)
hBayesDM (hierarchical Bayesian modeling of Decision-Making tasks) is a user-friendly R package that offers hierarchical Bayesian analysis of various computational models on an array of decision-making tasks.
Differential Equation Based Models in Stan
- Authors: Charles Margossian, Bill Gillespie (Metrum Research Group)
Differential equations can help us model sophisticated processes in biology, physics, and many other fields. Over the past year, the Stan team has developed many tools to tackle models based on differential equations.
How to Test IRT Models Using Simulated Data
- Authors: Teddy Groves (Football Radar)
This notebook explains how to code some IRT models using Stan and test whether they can recover input parameters when given simulated data.
Models of Retrieval in Sentence Comprehension
- Authors: Bruno Nicenboim, Shravan Vasishth (University of Potsdam)
This work presents an evaluation of two well-known models of retrieval processes in sentence comprehension, the activation-based model and the direct-access model. We implemented these models in a Bayesian hierarchical framework and showed that some aspects of the data can be explained better by the direct-access model. Specifically, the activation-based model cannot predict that, on average, incorrect retrievals would be faster than correct ones. More generally, our work leverages the capabilities of Stan to provide a powerful framework for flexibly developing computational models of competing theories of retrieval, and demonstrates how these models’ predictions can be compared in a Bayesian setting.
Hierarchical Gaussian Processes in Stan
- Authors: Rob Trangucci (Columbia University)
Stan’s library has been expanded with functions that facilitate adding Gaussian processes (GPs) to Stan models. I will share the best practices for coding GPs in Stan, and demonstrate how GPs can be added as one component of a larger model.
Modeling the Rate of Public Mass Shootings with Gaussian Processes
- Authors: Nathan Sanders, Victor Lei (Legendary Entertainment)
We have used Stan to develop a new model for the annualized rate of public mass shootings in the United States based on a Gaussian process with a time-varying mean function. This design yields a predictive model with the full non-parametric flexibility of a Gaussian process, while retaining the direct interpretability of a parametric model for long-term evolution of the mass shooting rate. We apply this model to the Mother Jones database of public mass shootings and explore the posterior consequences of different prior choices and of correlations between hyperparameters. We reach conclusions about the long term evolution of the rate of public mass shootings in the United States and short-term periods deviating from this trend.
StanCon 2018 | January 10-12, Asilomar, California
2018 Peer reviewed contributed talks
Does the New York City Police Department rely on quotas?
- Authors: Jonathan Auerbach (Columbia University)
This submission investigates whether the New York City Police Department (NYPD) uses productivity targets or quotas to manage officers in contravention of New York State Law. The analysis is presented in three parts. First, the NYPD's employee evaluation system is introduced, and the criticism that it constitutes a quota is summarized. Second, a publicly available dataset of traffic tickets issued by NYPD officers in 2014 and 2015 is described. Finally, a generative model to describe how officers write traffic tickets is proposed. The fitted model is consistent with the criticism that police officers substantially alter their ticket writing to coincide with departmental targets. The submission concludes by discussing the implications of these findings and offering directions for further research.
Diagnosing Alzheimer’s the Bayesian way
- Authors: Arya A. Pourzanjani, Benjamin B. Bales, Linda R. Petzold, Michael Harrington (UC Santa Barbara)
Alzheimer's Disease is one of the most debilitating diseases, but how do we diagnose it accurately? Researchers have been trying to answer this question by building generative models to describe how patient biomarkers, such as MRI scans, psychological tests, and lab tests, relate over time to the underlying brain deterioration that's present in Alzheimer's Disease. In this notebook we show how we translated these models to the Bayesian framework in Stan and how this allowed for several model improvements that can ultimately improve our understanding of Alzheimer's and help physicians in diagnosis. In particular, we describe how we hierarchically model patient disease trajectories to obtain stable estimates for patients who lack data. We describe how fitting in Stan yields uncertainties on these disease trajectories, and why that is important for weighing the pros and cons of risky treatment. Lastly, we describe a new method for Bayesian modeling of these monotonic disease trajectories in Stan using I-Splines.
Joint longitudinal and time-to-event models via Stan
- Authors: Sam Brilleman, Michael Crowther, Margarita Moreno-Betancur, Jacqueline Buros Novik, Rory Wolfe (Monash University, Columbia University)
The joint modelling of longitudinal and time-to-event data has received much attention in the biostatistical literature in recent years. In this notebook (and talk), we describe the implementation of a shared parameter joint model for longitudinal and time-to-event data in Stan. The methods described in the notebook are a simplified version of those underpinning the stan_jm modeling function that has recently been contributed to the rstanarm R package.
A tutorial on Hidden Markov Models using Stan
- Authors: Luis Damiano, Brian Peterson, Michael Weylandt
We implement a standard Hidden Markov Model (HMM) and the Input-Output Hidden Markov Model for unsupervised learning of time series dynamics in Stan. We begin by reviewing three commonly used algorithms for inference and parameter estimation, as well as a number of computational techniques and modeling strategies that make full Bayesian inference practical. For both models, we demonstrate the effectiveness of our proposed approach in simulations. Finally, we give an example of embedding an HMM within a larger model using an example from the econometrics literature.
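The abstract does not name the algorithms it reviews. As an illustration only, the forward algorithm is one standard way to compute the likelihood of an observation sequence under a discrete HMM by summing over hidden states. The sketch below is plain Python rather than Stan, and all parameter values are hypothetical.

```python
from itertools import product


def forward_likelihood(initial, transition, emission, observations):
    """Return p(observations) for a discrete HMM via the forward recursion."""
    n_states = len(initial)
    # alpha[k] = p(first observation, hidden state k at time 1)
    alpha = [initial[k] * emission[k][observations[0]] for k in range(n_states)]
    for obs in observations[1:]:
        # Propagate through the transition matrix, then weight by the emission probability.
        alpha = [
            emission[k][obs] * sum(alpha[j] * transition[j][k] for j in range(n_states))
            for k in range(n_states)
        ]
    return sum(alpha)


# Hypothetical two-state, two-symbol model; each row sums to one.
initial = [0.6, 0.4]
transition = [[0.7, 0.3], [0.2, 0.8]]
emission = [[0.9, 0.1], [0.3, 0.7]]

likelihood = forward_likelihood(initial, transition, emission, [0, 1, 1])

# Sanity check: the likelihoods of all length-3 sequences sum to one.
total = sum(
    forward_likelihood(initial, transition, emission, list(seq))
    for seq in product(range(2), repeat=3)
)
print(likelihood, total)
```

This version multiplies raw probabilities, which underflows on long sequences; practical implementations usually work on the log scale.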
Student Ornstein-Uhlenbeck models served three ways (with applications for population dynamics data)
- Authors: Aaron Goodman (Stanford University)
Ornstein-Uhlenbeck (OU) processes are mean-reverting processes used to model dynamics in biology, physics, and finance. I fit an extension of the OU process that is driven by a Lévy process with Student's t-marginals rather than Brownian motion with Gaussian marginals, which allows for heavy-tailed increments. I implement four formulations of the Student-t OU-type model in Stan and compare the sampling performance on both real and simulated population dynamic data.
- Video (coming soon)
- Notebook, code, slides
SlicStan: a blockless Stan-like language
- Authors: Maria I. Gorinova, Andrew D. Gordon, Charles Sutton (University of Edinburgh)
We present SlicStan, a probabilistic programming language that compiles to Stan and uses static analysis techniques to allow for more abstract and flexible models. SlicStan is novel in two ways: (1) it allows variable declarations and statements to be automatically shredded into different components needed for efficient Hamiltonian Monte Carlo inference, and (2) it introduces more flexible user-defined functions that allow for new model parameters to be declared as local variables. This work demonstrates that efficient automatic inference can be the result of the joint efforts of the machine learning and programming languages communities.
- Notebook, code, slides
- homepages.inf.ed.ac.uk/s1207807, microsoft.com/en-us/research/people/adg, http://homepages.inf.ed.ac.uk/csutton
idealstan: an R package for ideal point modeling with Stan
- Authors: Robert Kubinec (University of Virginia)
idealstan offers item-response theory (IRT) ideal-point scaling/dimension reduction methods that incorporate additional response categories and missing/censored values, including absences and abstentions, for roll call voting data (or any other kind of binary or ordinal item-response theory data). Full and approximate Bayesian inference is done via Stan.
Computing steady states with Stan’s nonlinear algebraic solver
- Authors: Charles C. Margossian (Metrum, Columbia University)
Stan’s numerical algebraic solver can be used to solve systems of nonlinear algebraic equations with no closed-form solutions. One of its key applications in scientific and engineering fields is the computation of equilibrium states (equivalently, steady states). This case study illustrates the use of the algebraic solver by applying it to a problem in pharmacometrics. In particular, I show that the algebraic system we solve can be quite complex and embed, for instance, numerical solutions to ordinary differential equations. Code in R and Stan is provided, and a Bayesian model is fitted to simulated data.
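To make the idea of a steady state concrete, here is a minimal sketch in plain Python. It is not the case study's model and does not call Stan's solver. It assumes a hypothetical one-compartment system with constant input `rate_in` and saturable elimination, dx/dt = rate_in - v_max * x / (k_m + x), and finds the root of the right-hand side with a basic Newton iteration. This toy system was chosen because it does have a closed-form steady state, which makes the numerical answer easy to check.

```python
def newton_steady_state(f, dfdx, x0, tol=1e-10, max_iter=50):
    """Find x with f(x) = 0 using Newton's method, starting from x0."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / dfdx(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton iteration did not converge")


# Hypothetical parameter values, chosen only for illustration.
rate_in, v_max, k_m = 1.0, 2.0, 3.0


def rhs(x):
    """Right-hand side of dx/dt; the steady state is where this equals zero."""
    return rate_in - v_max * x / (k_m + x)


def rhs_derivative(x):
    """Derivative of rhs with respect to x."""
    return -v_max * k_m / (k_m + x) ** 2


x_star = newton_steady_state(rhs, rhs_derivative, x0=1.0)
closed_form = rate_in * k_m / (v_max - rate_in)  # valid because rate_in < v_max
print(x_star, closed_form)
```

In a real application the system would have several unknowns and no closed form, which is where a general-purpose algebraic solver becomes necessary.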
Bayesian estimation of mechanical elastic constants
- Authors: Ben Bales, Brent Goodlet, Tresa Pollock, Linda Petzold (UC Santa Barbara)
This outlines a Bayesian approach to resonance ultrasound spectroscopy (RUS), a technique for estimating elastic constants of a material from a sample's measured resonance modes. The notebook includes an example of how to take advantage of custom automatic differentiation in specialized Stan models (either for numerical or efficiency reasons).
Aggregate random coefficients logit — a generative approach
- Authors: Jim Savage (Lendable Marketplace), Shoshana Vasserman (Harvard University)
This notebook illustrates how to fit aggregate random coefficient logit models in Stan, using Bayesian techniques. It’s far easier to learn and implement than the standard BLP algorithm, and has the benefits of being robust to mismeasurement of market shares, and giving limited-sample posterior uncertainty of all parameters (and demand shocks). This comes at the cost of modeling firms’ price-setting process, including how unobserved product-market demand shocks affect prices.
The threshold test: Testing for racial bias in vehicle searches by police
- Authors: Camelia Simoiu, Sam Corbett-Davies, Sharad Goel, Emma Pierson (Stanford University)
We develop a new statistical test to detect bias in decision making, the threshold test, which mitigates the problem of infra-marginality by jointly estimating decision thresholds and risk distributions.
- Notebook, code, slides
- web.stanford.edu/~csimoiu, samcorbettdavies.com, cs.stanford.edu/~emmap1, 5harad.com
Assessing the safety of Rosiglitazone for the treatment of type II diabetes
- Authors: Konstantinos Vamvourellis, K. Kalogeropoulos, L. Phillips (London School of Economics and Political Science)
A Bayesian paradigm for making drug approval decisions. Case study in the treatment of Diabetes (Type 2).
- Notebook, code, slides
Causal inference with the g-formula in Stan
- Authors: Leah Comment (Harvard University)
The potential outcomes framework often uses one or more parametric outcome models to learn about underlying causal processes. In Stan, parameter estimation using observed data takes place in the model block, while simulation-based estimation of causal parameters using the g-formula can be done separately with generated quantities. Bayesian estimation allows for data-driven sensitivity analysis regarding the assumption of no unmeasured confounding. This presentation shows some simple causal models, then outlines a basic sensitivity analysis using prior information derived from an external data source.
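The simulation step described above can be sketched outside Stan as follows. This is an illustrative assumption, not the presentation's code: it takes a logistic outcome model with one binary confounder, uses a single hypothetical set of coefficients standing in for one posterior draw, and averages predicted risks over the observed confounder values under each treatment level (standardization, the simplest form of the g-formula).

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def g_formula_risk(treatment, confounders, beta):
    """Average predicted risk if every unit were assigned the given treatment."""
    intercept, b_treatment, b_confounder = beta
    predicted = [
        expit(intercept + b_treatment * treatment + b_confounder * z)
        for z in confounders
    ]
    return sum(predicted) / len(predicted)


confounders = [0, 0, 0, 1, 1]  # hypothetical observed binary confounder values
beta = (-1.0, 0.8, 0.5)        # hypothetical coefficients (one stand-in posterior draw)

risk_treated = g_formula_risk(1, confounders, beta)
risk_untreated = g_formula_risk(0, confounders, beta)
risk_difference = risk_treated - risk_untreated
print(risk_treated, risk_untreated, risk_difference)
```

Repeating this calculation for each posterior draw of the coefficients would give a posterior distribution for the risk difference, which is the role generated quantities plays in the Stan workflow described above.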
Bayesian estimation of ETAS models with Rstan
- Authors: Fausto Fabian Crespo Fernandez (Universidad San Francisco de Quito)
Earthquake modeling with Stan. Applied to seismic recurrence in Ecuador in 2016.
2018 Invited talks
Predictive information criteria in hierarchical Bayesian models for clustered data
- Presenters: Sophia Rabe-Hesketh, Daniel Furr (UC Berkeley)
- Slides and code
- gse.berkeley.edu/people/sophia-rabe-hesketh, github.com/danielcfurr
Stan applications in physics: Testing quantum mechanics and modeling neutrino masses
Forecasting at scale: How and why we developed Prophet for forecasting at Facebook
- Presenters: Sean Taylor, Ben Letham (Facebook)
Stan applications in human genetics: Prioritizing genetic mutations that protect individuals from human disease
Statistics using geometry to show uncertainties and integrate graph information
A brief history of Stan
Model assessment, model selection and inference after model selection
Spatial models in Stan: intrinsic auto-regressive models for areal data
Some problems I'd like to solve in Stan, and what we'll need to do to get there
- Presenter: Andrew Gelman (Columbia University)