This repository contains historical SF housing data and R scripts to graph that data. The data here was used to generate the graphs and analysis in the blog post "Employment, construction, and the cost of San Francisco apartments", and was recently used in a paper by Stanford researchers, "The Effects of Rent Control Expansion on Tenants, Landlords, and Inequality: Evidence from San Francisco".
Data for each year lives in a file named after the year. Later years may be listed as "craigslist-X", where X is the year.
To extract the rents from a data file, run (for example):

    ./extract-craigslist craigslist-2016

Note that the data is not perfect. Here are some samples from the 2016
Craigslist data:

    799000 Apr 29 Exceptional Pacific Heights TIC $799000 / 2br - (Pacific Heights) pic
    800 Apr 29 Awesome 5 Bedroom Available $800 / 5br - 3895ft2 - (2483 N Smiderle, San Bernardino, CA) pic
    99 Apr 29 Jr. 1 BD. Washer & Dryer in unit! $99 deposit $3425 / 1br - 550ft2 - (nob hill) pic map

It's not clear whether prices like these were stripped before the averages were
generated.
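
As a rough illustration of how such entries could be screened, here is a minimal Python sketch. It is not part of this repository. It assumes each extracted line has the form "price date title", as the samples above suggest, and the names `parse_line`, `is_plausible`, `MIN_RENT`, and `MAX_RENT` (and the bounds themselves) are hypothetical choices made for the example.

```python
import re

# Hypothetical plausibility bounds for a monthly rent, in dollars.
# These are assumptions for illustration, not values used by this repository.
MIN_RENT = 300
MAX_RENT = 30000

# Assumed line shape: "<price> <Mon> <day> <rest of listing title>"
LINE_RE = re.compile(r"^(\d+)\s+(\w{3}\s+\d{1,2})\s+(.*)$")


def parse_line(line):
    """Split an extracted line into (price, date, title), or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return int(match.group(1)), match.group(2), match.group(3)


def is_plausible(price, low=MIN_RENT, high=MAX_RENT):
    """Return True if the price falls inside the assumed rent range."""
    return low <= price <= high


samples = [
    "799000 Apr 29 Exceptional Pacific Heights TIC $799000 / 2br - (Pacific Heights) pic",
    "800 Apr 29 Awesome 5 Bedroom Available $800 / 5br - 3895ft2 - (2483 N Smiderle, San Bernardino, CA) pic",
    "99 Apr 29 Jr. 1 BD. Washer & Dryer in unit! $99 deposit $3425 / 1br - 550ft2 - (nob hill) pic map",
]

parsed = [parse_line(s) for s in samples]
kept = [p for p in parsed if p is not None and is_plausible(p[0])]
```

With these bounds, the sale listing (799000) and the deposit amount (99) are dropped, but the San Bernardino listing is kept: its price is in range, and a price filter alone cannot tell that it is outside San Francisco.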
You can combine a bunch of data sources by running the "combine" script,
./combine. This generates the
combined file in this repository.
The charts in the blog post are generated by running the model script in this
repository.
The calc-medians script computes the medians for each year in the file. It prints
the median, 95th percentile, and 5th percentile for each year in the dataset.
These values are recorded in the medians file in this repository.
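
For readers who want to reproduce this kind of per-year summary outside the repository's scripts, the following Python sketch shows one way to do it. It is not the calc-medians implementation. The `summarize` and `percentile` helpers are hypothetical, the input is assumed to be (year, rent) pairs, and the nearest-rank percentile definition is an assumption, so results may differ slightly from a tool that interpolates.

```python
from collections import defaultdict
from statistics import median


def percentile(sorted_values, pct):
    """Nearest-rank percentile of a sorted, non-empty list (pct is an integer 1-100)."""
    n = len(sorted_values)
    # Integer ceiling of pct * n / 100, so no floating-point rounding is involved.
    rank = max(1, (pct * n + 99) // 100)
    return sorted_values[rank - 1]


def summarize(rows):
    """Map each year to (median, 95th percentile, 5th percentile) of its rents."""
    by_year = defaultdict(list)
    for year, rent in rows:
        by_year[year].append(rent)

    summary = {}
    for year, rents in sorted(by_year.items()):
        rents.sort()
        summary[year] = (median(rents), percentile(rents, 95), percentile(rents, 5))
    return summary


# Made-up rents purely to exercise the functions; not data from this repository.
example_rows = [(2015, rent) for rent in range(1, 101)]
example_summary = summarize(example_rows)
```

Reporting the 5th and 95th percentiles alongside the median gives a sense of the spread while staying robust to extreme entries like the imperfect listings described earlier.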