Project 4 - West Nile Virus in the City of Chicago

by Mudassir Mayet, Rajan Davis and Riordan Tenney

Our final project is located in the completed directory.

All of our Jupyter Notebooks can be found within the code directory.

All of our Kaggle submissions can be found within the kaggle directory.

All of our pickles and similar artifacts (although most are not included due to size) can be found within the assets directory.

All of the data we used, along with the maps we plotted on, can be found within the data directory.

Our data dictionary is located in the data dictionary directory.

Data Science Problem / Mission Statement

In this group project, we set out to determine which factors most heavily influenced West Nile Virus in mosquito populations in the city of Chicago.

Relying on datasets from a Kaggle competition and some outside research, we were able to determine which factors contributed to the likelihood of WNV appearing in a given trap. We hypothesized that there would be a strong correlation between the number of mosquitos in a given trap and the presence of WNV. Moreover, we were able to determine, with some degree of accuracy, whether or not the virus would be present. This information would be extremely valuable to the CDC as well as the Chicago Department of Environmental Health, which jointly manage the problem of WNV in the city.


We began by cleaning the three datasets on which we would base all of our models--weather data, mosquito spray data and our training data. The weather data contained detailed weather information from the years 2007-2014; spray data contained spatial and temporal information from the years 2011 and 2013; training data contained spatial and temporal information from the odd years of 2007-2013.
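As a rough illustration of what cleaning the weather data can involve, the sketch below converts string columns to numbers and fills the gaps. It assumes that missing readings are coded as "M" and trace precipitation as "T". Those codes, the column names, the toy rows, the trace value of 0.005 and the helper clean_numeric are all assumptions made for illustration, not code taken from our notebooks.

```python
import pandas as pd

# Toy weather rows. The "M" (missing) and "T" (trace) codes are assumed
# placeholders, and the values are made up for illustration.
weather = pd.DataFrame({
    "Date": ["2011-07-01", "2011-07-02", "2011-07-03", "2011-07-04"],
    "Tavg": ["75", "M", "79", "81"],
    "PrecipTotal": ["0.00", "  T", "0.12", "M"],
})
weather["Date"] = pd.to_datetime(weather["Date"])


def clean_numeric(series, trace_value=0.005):
    """Hypothetical helper: map trace to a small number, anything unparseable to NaN."""
    stripped = series.str.strip().replace("T", str(trace_value))
    return pd.to_numeric(stripped, errors="coerce")


for col in ["Tavg", "PrecipTotal"]:
    weather[col] = clean_numeric(weather[col])
    # Fill a gap with the previous day's reading, then back-fill any leading gap.
    weather[col] = weather[col].ffill().bfill()

print(weather)
```

Forward-filling is only one option; a column mean or an interpolation between neighbouring days would be equally reasonable choices for this step.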

After reducing features and filling in missing values, we combined these dataframes for modeling. We each developed our initial models independently, with varying degrees of success. We then built on our strongest model, which used a RandomForestClassifier, by engineering rolling-mean features from the weather data and by applying Principal Component Analysis (PCA). These are the scores of all of our models:

Model                    ROC AUC Score
ExtraTreesClassifier     0.53299
LogisticRegression       0.64892
RandomForestClassifier   0.71922
RandomForest w/ PCA      0.71974
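For readers unfamiliar with how a "RandomForest w/ PCA" model is scored by ROC AUC, here is a minimal sketch using scikit-learn. It runs on synthetic, imbalanced data from make_classification rather than on our merged dataset. The number of components, the number of trees and the train/test split are assumptions chosen only to keep the example small, so the score it prints is not comparable to the table above.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the merged trap/weather features (not the project data).
# Roughly 10% positives, to mimic a rare "virus present" label.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=5,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# PCA reduces the feature space before the forest is fit (assumed settings).
model = make_pipeline(
    PCA(n_components=8),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(X_train, y_train)

# ROC AUC is computed from predicted probabilities, not hard class labels.
proba = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba)
print(f"ROC AUC: {auc:.3f}")
```

Scoring on predicted probabilities matters here: passing the output of predict instead of predict_proba collapses the ranking and usually understates the ROC AUC.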


In our final model, these were the features with the greatest weights:

     Feature          Weight
0    Species          0.008183
1    Street           0.343346
2    Trap             0.389918
3    Lat_int          0.014845
4    Long_int         0.000000
5    ResultSpeed_21   0.044472
6    PrecipTotal_15   0.030347
7    DewPoint_16      0.047255
8    AvgSpeed_19      0.034524
9    Heat_28          0.030297
10   Tmax_4           0.029762
11   Tmin_8           0.027050
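Feature names such as Tmax_4 and PrecipTotal_15 appear to combine a weather column with a rolling-window length in days. Under that reading, which is an assumption, a rolling-mean feature could be built roughly as follows with pandas. The helper add_rolling_mean, the toy temperatures and the choice of min_periods=1 are illustrative only.

```python
import pandas as pd

# Toy daily weather data; values are made up for illustration.
weather = pd.DataFrame({
    "Date": pd.date_range("2013-06-01", periods=6, freq="D"),
    "Tmax": [80.0, 82.0, 84.0, 86.0, 88.0, 90.0],
})


def add_rolling_mean(df, column, window):
    """Hypothetical helper: add a trailing rolling mean named '<column>_<window>'."""
    out = df.sort_values("Date").copy()
    # min_periods=1 lets the first rows average over however many days exist so far.
    out[f"{column}_{window}"] = (
        out[column].rolling(window=window, min_periods=1).mean()
    )
    return out


weather = add_rolling_mean(weather, "Tmax", 4)
print(weather)
```

A trailing window uses only the current and earlier days, so each row's feature never depends on weather that had not yet been observed on that date.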

Recommendations/Cost-Benefit Analysis

Chicago's Department of Environmental Health and Safety, which oversees mosquito abatement, spent ~$1m attacking the problem of WNV in 2014, the last year for which we have information.

By 2017, the budget of this department had grown to over $2 million.

We think that this money would be better spent early in the mosquito season, rather than spraying insecticide after adult mosquito populations peak.

In fact, we want to rethink the spraying approach entirely. In 2013, for instance, WNV had already appeared by late June, whereas spraying did not begin until July 17th, well after a sizable mosquito population was established in the city. For spraying to be at all effective, it must begin earlier in the season--we recommend the first week of June.

We've also noticed that each year, mosquito populations first appear around O'Hare International Airport (ORD)--this may be due to a topographic feature of the landscape or some other factor, but in any case populations seem to "begin" around this point and "spread" east to the areas immediately adjacent to Lake Michigan. Focusing our initial spraying at this location would make sense.

Looking at the mosquito population alone doesn't give us the best insight into the spread of WNV. We also need to take into account the sparrow population--this species of invasive bird is involved in a cyclical transmission of WNV to mosquitos and vice versa. Mitigating the sparrow population might reduce the impact of WNV on human populations.

Long-Term Goals

Given the recent advancements in CRISPR technologies, we feel that there is a strong case to be made for further research and development in this area. This would be a great deal more expensive in the short term (something along the lines of $100m instead of $1m), but would save money in the long run after implementation. We also expect the cost of these technologies to come down significantly in the coming years or decades.

