Prediction of Zika outbreaks using supervised machine learning, PostgreSQL, and visualization with D3
figures
map
notebooks Update notebook for D3 Sep 25, 2016
presentation
.gitignore
README.md
environment.yml
rsync.sh
unison_sync.sh

Prediction of Zika Outbreaks

by Michelle L. Gill

Summary

This is my third project for the Summer 2016 Metis Data Science Bootcamp. It incorporated supervised machine learning, PostgreSQL, and D3 for visualization. The project predicted whether or not a region would have an outbreak of Zika virus.

Blog Post

A blog post on themodernscientist.com provides further details about this project.

Repo Contents

  • environment.yml: list of the conda Python libraries used during analysis.
  • figures: images used in the presentation.
  • map: D3 animated timeline used during the presentation. A movie of the animation is also available.
  • notebooks: Jupyter notebooks used for analysis.
  • presentation: PDF version of the final presentation.

Data Sources

This project made extensive use of external data sources, including data from GitHub repos and data scraped from various websites.

  1. Zika outbreak data was pulled from the CDC Epidemic Prediction Initiative GitHub repo. My project used data that was pulled on 07/30/2016, which corresponds to commit d44c5d1ca3af633224c8b8b490b1a3aafa9bcc8e. A clone of this commit is available here.
  2. Latitude and longitude data for Zika outbreaks were obtained from three sources: the Google Maps API, Google Search (scraped via four proxies), and LatLong (scraped).
  3. Airport location information was scraped from Falling Rain.
  4. Worldwide historical weather data was scraped from Wunderground using closest airport code as the key.
  5. Aedes aegypti and Aedes albopictus occurrences were from Dryad. See references below for manuscripts related to this data.
  6. Worldwide population density was from the NASA Socioeconomic Data and Applications Center (SEDAC) Gridded Population and Population Density of the World.
  7. World GDP and purchasing power parity (PPP) adjusted GDP from 2015 were scraped from Knoema.
  8. Flight patterns were scraped from FlightRadar24, but this data was not incorporated into the model due to time limitations.
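
Item 4 above joins weather data to outbreak locations using the closest airport code as the key. The notebooks contain the actual implementation; the following is only a minimal sketch of one way such a lookup could work, assuming great-circle (haversine) distance as the measure of "closest". The function names, the airport codes, and the approximate coordinates are illustrative assumptions and are not taken from the project data.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before sqrt/asin.
    return 2 * EARTH_RADIUS_KM * asin(sqrt(min(1.0, a)))


def closest_airport(lat, lon, airports):
    """Return the code of the airport nearest to (lat, lon).

    `airports` maps an airport code to a (lat, lon) tuple in degrees.
    """
    return min(airports, key=lambda code: haversine_km(lat, lon, *airports[code]))


# Hypothetical inputs with approximate coordinates, for illustration only.
airports = {
    "GIG": (-22.81, -43.25),
    "BOG": (4.70, -74.15),
    "SJU": (18.44, -66.00),
}
locations = {
    "Rio de Janeiro": (-22.91, -43.17),
    "Medellin": (6.24, -75.58),
    "Ponce": (18.01, -66.61),
}

# Map each outbreak location to the airport code used as the weather join key.
nearest = {
    name: closest_airport(lat, lon, airports)
    for name, (lat, lon) in locations.items()
}
```

A linear scan like this is fine for a few thousand locations and airports; for much larger inputs, a spatial index would be a reasonable alternative.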

References for Aedes aegypti and Aedes albopictus occurrences

Kraemer MUG, Sinka ME, Duda KA, Mylne A, Shearer FM, Brady OJ, Messina JP, Barker CM, Moore CG, Carvalho RG, Coelho GE, Van Bortel W, Hendrickx G, Schaffner F, Wint GRW, Elyazar IRF, Teng H, Hay SI (2015) The global compendium of Aedes aegypti and Ae. albopictus occurrence. Scientific Data 2(7): 150035. http://dx.doi.org/10.1038/sdata.2015.35

Kraemer MUG, Sinka ME, Duda KA, Mylne A, Shearer FM, Brady OJ, Messina JP, Barker CM, Moore CG, Carvalho RG, Coelho GE, Van Bortel W, Hendrickx G, Schaffner F, Wint GRW, Elyazar IRF, Teng H, Hay SI (2015) Data from: The global compendium of Aedes aegypti and Ae. albopictus occurrence. Dryad Digital Repository. http://dx.doi.org/10.5061/dryad.47v3c