The data comes from the City of Chicago's open data portal:
https://data.cityofchicago.org/Health-Human-Services/West-Nile-Virus-WNV-Mosquito-Test-Results/jqe8-8r6s/about_data

This dataset tracks the presence of West Nile Virus* (WNV) in mosquito populations across Chicago. 
Traps are placed around the city to track the number of mosquitoes in different areas.
The trapped mosquitoes are then grouped into pools and tested in a lab to determine whether any in the pool carry WNV. 
Each row represents one test of a mosquito pool from a specific trap on a specific date, with 
results indicating whether WNV was detected. This surveillance data helps public health officials 
monitor mosquito-borne virus activity, assess risk to humans, and guide mosquito control efforts.
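
To make the row structure concrete, here is a minimal sketch built on a few made-up rows. The column names (`trap`, `date`, `number_of_mosquitoes`, `result`), the `"positive"`/`"negative"` labels, and the trap IDs are assumptions for illustration and should be checked against what the API actually returns.

```python
import pandas as pd

# Illustrative rows only: column names and values are assumptions, not real data.
toy_pools = pd.DataFrame(
    {
        "trap": ["T001", "T001", "T002", "T002"],
        "date": pd.to_datetime(
            ["2014-07-01", "2014-07-01", "2014-07-01", "2014-07-08"]
        ),
        "number_of_mosquitoes": [50, 12, 7, 30],
        "result": ["negative", "positive", "negative", "positive"],
    }
)

# Flag the pools in which WNV was detected
toy_pools["wnv_present"] = toy_pools["result"].eq("positive")

# Share of tested pools that were positive on each collection date
positive_share = toy_pools.groupby("date")["wnv_present"].mean()
```

A per-date (or per-week) share of positive pools like this is one possible way to summarize pool-level rows before modeling; it is not necessarily the summary used later in this notebook.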

The main goal is to build an accurate model that predicts spikes in WNV from the historical 
data gathered thus far, which can help city officials allocate resources more efficiently and 
effectively towards preventing transmission of this potentially deadly virus.


*West Nile Virus (WNV) is an infectious disease that was discovered in 1937 in the West Nile region 
of Uganda. It was first detected in the United States in 1999 and is spread by infected mosquitoes. 
Most people who are infected do not have symptoms (at least initially), but some do; these symptoms 
include fever, headaches, body aches, and skin rashes. In rare circumstances, WNV can be 
life-threatening if it enters the brain, causing encephalitis (inflammation of the brain). 
Unfortunately, there are no human vaccines or specific treatments available, so the best way to 
prevent getting WNV is to avoid mosquito bites altogether.

The City of Chicago and the Chicago Department of Public Health (CDPH) have resorted to spraying 
areas where WNV is detected to kill infected mosquitoes and prevent the virus's spread. 

In [None]:
# Library imports
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import re
import requests
import time

from matplotlib import cm
from sklearn import preprocessing, tree, ensemble, linear_model, metrics, model_selection, svm
from sodapy import Socrata

%matplotlib inline


Below, I get the mosquito trap data from Chicago's Open Data API.

In [None]:
# TODO: Pull new data from API and see how results differ. 
# TODO: Check if data format changes. API states that data may be formatted differently in the 
# future.

# Define client to access the City of Chicago's Open Data API
client = Socrata("data.cityofchicago.org", None)

# Get data from API, limit of first 50k records (returned as a list of dictionaries)
# String parameter is the dataset identifier based on the URL
# https://data.cityofchicago.org/Health-Human-Services/West-Nile-Virus-WNV-Mosquito-Test-Results/jqe8-8r6s/about_data
results = client.get("jqe8-8r6s", limit=50000)

# Convert to pandas DataFrame
results_df = pd.DataFrame.from_records(results)

# Parse the date column so rows can be filtered by date
results_df['date'] = pd.to_datetime(results_df['date'])

# Weather data beyond 2014 was problematic, so keep only rows dated before 2015
results_df = results_df[results_df['date'] < '2015-01-01']

# For convenience, save a static copy (to ensure the data stays the same throughout my work on 
# this project). Ideally, the latest data would be pulled automatically from the API.
results_df.to_pickle("../data/mosquito_data.pkl")
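
The "static copy now, fresh pull later" idea above could be wrapped in a small helper. This is only a sketch under stated assumptions: `load_or_fetch` and its `fetch_records` argument are hypothetical names, and `fetch_records` is assumed to be a zero-argument callable that returns a list of dictionaries, as `client.get` does.

```python
from pathlib import Path

import pandas as pd


def load_or_fetch(cache_path, fetch_records, refresh=False):
    """Return the cached DataFrame if it exists, otherwise fetch and cache it.

    `fetch_records` is a hypothetical zero-argument callable that returns
    a list of dicts (the shape returned by the API client).
    """
    cache_path = Path(cache_path)

    # Reuse the static copy unless a refresh is explicitly requested
    if cache_path.exists() and not refresh:
        return pd.read_pickle(cache_path)

    # Otherwise pull the records, convert them, and overwrite the cache
    df = pd.DataFrame.from_records(fetch_records())
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    df.to_pickle(cache_path)
    return df
```

In this notebook, the call might look like `load_or_fetch("../data/mosquito_data.pkl", lambda: client.get("jqe8-8r6s", limit=50000))`, with the date parsing and pre-2015 filter applied to the result. Passing `refresh=True` would force a new pull, so the data only changes when that is chosen deliberately.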



In [None]:
# Retrieve the static copy of the mosquito trap data
mosquito_data = pd.read_pickle("../data/mosquito_data.pkl")
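
Since the API notes that the data format may change, a quick column check after loading can catch a changed schema early. The sketch below is an illustration only: `missing_columns` is a hypothetical helper, and the names in `EXPECTED_COLUMNS` are assumptions to be replaced with the columns the rest of the notebook actually relies on.

```python
import pandas as pd

# Assumed column names; adjust to whatever the static copy actually contains.
EXPECTED_COLUMNS = {"date", "trap", "number_of_mosquitoes", "result"}


def missing_columns(df, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from `df`, sorted."""
    return sorted(set(expected) - set(df.columns))


# Demo on an empty frame that lacks one of the assumed columns
demo = pd.DataFrame(columns=["date", "trap", "result"])
print(missing_columns(demo))  # → ['number_of_mosquitoes']
```

Calling `missing_columns(mosquito_data)` and getting an empty list back would mean every assumed column is present.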

Below, I load the associated weather and spray data for the Chicago area. I had issues getting 
the right data through an API connection, so I saved the data manually, specifically between 
2007 and 2014, where available.

In [None]:
# TODO: Retrieve weather data from NOAA API, and spray data from GIS API

# Data from https://www.ncdc.noaa.gov/cdo-web/datatools/findstation, for Chicago O'Hare and Midway
# TODO: Put this in readme, not here
weather_data = pd.read_pickle("../data/weather.pkl")
spray_data = pd.read_pickle("../data/spray.pkl")