The Machine Learning Algorithms that power Smoke Signals.


This repository contains code and documentation for generating scores that indicate whether the residents of a census block group are at high risk of not having smoke alarms. You can read an overview of the analysis here. The analysis is made possible by mapping common variables in the American Housing Survey (AHS) and the American Community Survey (ACS). You can see details on how these mappings are done in this repository.

Getting Started


First clone the repository and navigate to the project's root directory:

git clone
cd smoke-alarm-risk

This project is written in R and depends on the following packages:

  • bit64
  • plyr
  • ggplot2
  • data.table
  • knitr
  • reshape2
  • scales
  • bigrf
  • pROC

You can install these packages by running the following command in the project's root directory:

$ make init
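If `make init` fails partway through, you can confirm your environment before fetching data with a quick base-R check. It reports which of the listed packages cannot be loaded. This is a sketch of our own, not part of the project's Makefile or requirements.R, and the helper name `missing_packages` is hypothetical.

```r
# Packages listed in this README as project dependencies.
required_pkgs <- c("bit64", "plyr", "ggplot2", "data.table", "knitr",
                   "reshape2", "scales", "bigrf", "pROC")

# Hypothetical helper: return the subset of `pkgs` whose namespace
# cannot be loaded in the current R session.
missing_packages <- function(pkgs) {
  loadable <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!loadable]
}

not_installed <- missing_packages(required_pkgs)
if (length(not_installed) > 0) {
  message("Missing packages: ", paste(not_installed, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

Any package reported here can be installed individually before re-running `make init`.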

Get the data

This project also requires six CSV files (two of which, the ACS and AHS exports, are generated by this project). You can download these files by running the following command:

$ make fetch_data

WARNING: This may take a while; the ACS file is roughly 2 GB.

Once this is finished, you should see six files in data/:

  • acs-bg-at-risk-population.csv - percent of the population under the age of 5 or over the age of 65, per block group.
  • acs-bg-population.csv - total population per block group.
  • acs-bg-pop-density.csv - population density per block group.
  • msa80-bg.csv - a lookup of 1980 MSA IDs to 2010 block group IDs.
  • acs.csv - an export of the ACS with variables mapped to the AHS (see this repo).
  • ahs.csv - an export of the AHS with variables mapped to the ACS (see this repo).
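Because the download is large and can be interrupted, it may help to confirm that all six inputs are present before running the model. The sketch below is our own and assumes you run it from the project root so that `data/` resolves correctly. The helper `missing_data_files` is hypothetical.

```r
# Input files expected in data/, as listed above.
expected_files <- c("acs-bg-at-risk-population.csv", "acs-bg-population.csv",
                    "acs-bg-pop-density.csv", "msa80-bg.csv",
                    "acs.csv", "ahs.csv")

# Hypothetical helper: return the expected files that are not present
# under `data_dir` (assumed to be data/ relative to the project root).
missing_data_files <- function(data_dir = "data", files = expected_files) {
  files[!file.exists(file.path(data_dir, files))]
}

absent <- missing_data_files()
if (length(absent) > 0) {
  message("Missing from data/: ", paste(absent, collapse = ", "))
}
```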

Once you have these files, you should be all set to generate the risk scores.

Generate the risk scores

First, open up the file containing the line below and change the path to your working directory (the root of your local clone):

WD <- '/path/to/this/directory'
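A mistyped `WD` usually surfaces only later, as a missing-file error partway through the model. A small guard can catch it immediately. This is a hedged sketch, not code from this repository: the helper `use_project_dir` is hypothetical and assumes index.Rmd sits in the repository root.

```r
# Hypothetical helper: validate a candidate project root, then switch to it.
use_project_dir <- function(wd) {
  wd <- normalizePath(wd, mustWork = TRUE)  # errors if the path does not exist
  # Assumption: index.Rmd lives at the root of the repository.
  if (!file.exists(file.path(wd, "index.Rmd"))) {
    stop("index.Rmd not found in ", wd, "; is this the repository root?")
  }
  setwd(wd)
  invisible(wd)
}

WD <- '/path/to/this/directory'
# use_project_dir(WD)  # uncomment once WD points at your local clone
```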

Execute the model using this command:

$ make model

Under the hood, this command executes index.Rmd, an R Markdown file that contains notes on each step of our process and generates plots visualizing our results. You can see the finalized output of the modeling process by running this command:

$ make view

If you open a web browser and navigate to http://localhost:8000/ you should see the report on the modeling process.

Get the output

When the modeling script has finished executing, the risk scores per block group are written to data/smoke-alarm-risk-scores.csv. The output also includes the total population and the at-risk population (under 5 or over 65 years old) for each block group.
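Census block group GEOIDs are 12-digit codes that can begin with a zero (for example, states whose FIPS code starts with 0). Parsing them as numbers silently drops that zero. The sketch below reads the output with base R and keeps the ID column as text. It is our own illustration: the column name `bg_geoid` is a hypothetical placeholder, so check the CSV header and pass the real name. The project itself uses data.table, and base `read.csv` is used here only to keep the example self-contained.

```r
# Hypothetical loader for the risk-score output. `id_col` is a placeholder
# name for the block group ID column; replace it with the actual header.
load_risk_scores <- function(path = "data/smoke-alarm-risk-scores.csv",
                             id_col = "bg_geoid") {
  # A named colClasses entry applies only to that column; the rest are
  # type-detected as usual. Reading IDs as character keeps leading zeros.
  classes <- setNames("character", id_col)
  read.csv(path, colClasses = classes, stringsAsFactors = FALSE)
}

# scores <- load_risk_scores()
# str(scores)
```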

Known Issues

bigrf appears to have a memory leak when run inside RStudio. You can avoid it by running the model with the make model command instead. See: