
Prediction of Zika outbreaks using supervised machine learning, PostgreSQL, and visualization with D3


mlgill/zika_prediction


Prediction of Zika Outbreaks

by Michelle L. Gill

Summary

This is my third project for the Summer 2016 Metis Data Science Bootcamp. It combined supervised machine learning, PostgreSQL, and D3 visualization to predict whether or not a region would have an outbreak of Zika virus.

Blog Post

A blog post on themodernscientist.com provides further details about this project.

Repo Contents

  • environment.yml: list of the conda Python libraries used during the analysis.
  • figures: images used in the presentation.
  • map: D3 animated timeline used during the presentation. A movie of the animation is also available.
  • notebooks: Jupyter notebooks used for analysis.
  • presentation: PDF version of the final presentation.

Data Sources

This project made extensive use of external data sources, including data from GitHub repositories and data scraped from various websites.

  1. Zika outbreak data was pulled from the CDC Epidemic Prediction Initiative GitHub repo. My project used data that was pulled on 07/30/2016, which corresponds to commit d44c5d1ca3af633224c8b8b490b1a3aafa9bcc8e. A clone of this commit is available here.
  2. Latitude and longitude data for Zika outbreak locations was obtained from the Google Maps API, scraped from Google Search via four proxies, and scraped from LatLong.
  3. Airport location information was scraped from Falling Rain.
  4. Worldwide historical weather data was scraped from Wunderground, using the code of the closest airport as the key.
  5. Aedes aegypti and Aedes albopictus occurrence data were obtained from Dryad. See the references below for the related manuscripts.
  6. Worldwide population density was obtained from the NASA Socioeconomic Data and Applications Center (SEDAC) Gridded Population and Population Density of the World.
  7. World GDP and purchasing power parity (PPP) adjusted GDP from 2015 were scraped from Knoema.
  8. Flight patterns were scraped from FlightRadar24; however, these data were not incorporated into the model due to time limitations.
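Item 4 keys each location's weather history on the closest airport code. One way to compute that key is a brute-force nearest-neighbor search using great-circle (haversine) distance between a location's latitude/longitude and each airport's coordinates. The sketch below illustrates this under stated assumptions; it is not necessarily how the project's notebooks did it. The names `haversine_km`, `closest_airport`, and `AIRPORTS` are hypothetical, and the airport coordinates are approximate values for illustration only.

```python
import math
from typing import Dict, Tuple

# Mean Earth radius in kilometers (assumption: spherical Earth is accurate enough here).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def closest_airport(lat: float, lon: float, airports: Dict[str, Tuple[float, float]]) -> str:
    """Return the code of the airport nearest to (lat, lon) via brute-force search."""
    return min(airports, key=lambda code: haversine_km(lat, lon, *airports[code]))


# Hypothetical lookup table: airport code -> (approximate latitude, longitude).
AIRPORTS = {
    "GRU": (-23.43, -46.47),  # Sao Paulo-Guarulhos
    "GIG": (-22.81, -43.25),  # Rio de Janeiro-Galeao
    "MIA": (25.79, -80.29),   # Miami
}

if __name__ == "__main__":
    # A point in central Rio de Janeiro maps to GIG.
    print(closest_airport(-22.91, -43.17, AIRPORTS))  # → GIG
```

A linear scan is simple and adequate for a few thousand airports; for many more locations, a spatial index (for example a ball tree with a haversine metric) would avoid comparing every location against every airport.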

References for Aedes aegypti and Aedes albopictus occurrences

Kraemer MUG, Sinka ME, Duda KA, Mylne A, Shearer FM, Brady OJ, Messina JP, Barker CM, Moore CG, Carvalho RG, Coelho GE, Van Bortel W, Hendrickx G, Schaffner F, Wint GRW, Elyazar IRF, Teng H, Hay SI (2015) The global compendium of Aedes aegypti and Ae. albopictus occurrence. Scientific Data 2(7): 150035. http://dx.doi.org/10.1038/sdata.2015.35

Kraemer MUG, Sinka ME, Duda KA, Mylne A, Shearer FM, Brady OJ, Messina JP, Barker CM, Moore CG, Carvalho RG, Coelho GE, Van Bortel W, Hendrickx G, Schaffner F, Wint GRW, Elyazar IRF, Teng H, Hay SI (2015) Data from: The global compendium of Aedes aegypti and Ae. albopictus occurrence. Dryad Digital Repository. http://dx.doi.org/10.5061/dryad.47v3c
