# Project 4: Predict West Nile Virus
### Section 7: Cost-Benefit Analysis

## Problem Statement

1. As employees of the Disease And Treatment Agency, division of Societal Cures In Epidemiology and New Creative Engineering (DATA-SCIENCE), we are tasked with better understanding the mosquito population and advising on interventions that are beneficial and cost-effective for the city.


2. Through this project, we hope to:
- Identify the features that are most important for predicting the presence of West Nile Virus (which can be done by ranking the coefficients of each feature in a logistic regression model)
- Predict the probability of West Nile Virus by location, giving decision makers an effective plan for deploying pesticides throughout the city and thereby helping to reduce cost.

## Import Libraries

In [60]:
#!pip install shapely
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from shapely import geometry
from shapely.geometry import Point, Polygon
import geopandas as gpd
from datetime import timedelta
import math
import datetime as dt

In [61]:
# Set chart style
plt.style.use('fivethirtyeight')

## Using the geo_traintest dataframe

We use the geo_traintest dataframe from notebook 2 to apply our analysis of infection rates and sensitivity to a target year of our choice, 2014. The top half of this notebook is therefore extracted from notebook 2 to achieve the same dataframe.

In [62]:
geo_traintest = pd.read_csv('../data/geo_traintest.csv', index_col='Unnamed: 0')

## Calculation of infection rate

#### Trap prevalence of WNV

For each year, this shows the proportion of surveyed trap records in which WNV was present. Years shown as NaN have no `wnvpresent` labels in the dataframe.

In [63]:
geo_traintest.groupby('year')['wnvpresent'].mean()

year
2007    0.070182
2008         NaN
2009    0.010096
2010         NaN
2011    0.029709
2012         NaN
2013    0.097263
2014         NaN
Name: wnvpresent, dtype: float64

In [64]:
prev_07 = 0.070182
prev_09 = 0.010096
prev_11 = 0.029709
prev_13 = 0.097263

#### Population Size
https://www.biggestuscities.com/il/2007
Gov data on population count
https://www.opendatanetwork.com/entity/1600000US1714000/Chicago_IL/demographics.population.count?year=2011

In [65]:
pop_2007 = 2_811_035
pop_2009 = 2_824_064
pop_2011 = 2_700_741
pop_2013 = 2_706_101

#### Number of people with WNV by years 

https://www.chicago.gov/content/dam/city/depts/cdph/food_env/general/West_Nile_Virus/WNV_2018databrief_FINALJan102019.pdf
Number of people infected in previous years
https://www.chicago.gov/content/dam/city/depts/cdph/statistics_and_reports/CDInfo_2013_JULY_WNV.pdf

In [66]:
inf_2007 = 10
inf_2009 = 1
inf_2011 = 8
inf_2013 = 37

#### Drawing relations: Rate of infection by proportion of traps that had WNV

Multiplying the prevalence of WNV-positive traps by the population of Chicago does not by itself tell us how many people will be infected with WNV; we also need an infection rate that links the two. We define the infection rate for a year as the number of infected people divided by the product of that year's trap prevalence and that year's population. Multiplying prevalence, population and this rate then gives the estimated number of infected people.
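
Written as a formula, the definition above can be sketched as follows. The symbols are introduced here for illustration only: $I_y$ is the number of reported human cases, $p_y$ the trap prevalence and $N_y$ the population in year $y$, and $r$ is the rate averaged over the years with data.

$$
r_y = \frac{I_y}{p_y \, N_y}, \qquad \hat{I}_y = r \, p_y \, N_y
$$

For 2007, for example, $r_{2007} = 10 / (0.070182 \times 2{,}811{,}035) \approx 5.07 \times 10^{-5}$.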

In [67]:
a = inf_2007 / (prev_07 * pop_2007)
a

5.068833276556648e-05

In [68]:
b = inf_2009 / (prev_09 * pop_2009)
b

3.507325909316161e-05

In [69]:
c = inf_2011 / (prev_11 * pop_2011)
c

9.970547703884652e-05

In [70]:
d  = inf_2013 / (prev_13 * pop_2013)
d

0.0001405756374071024

We will take the average of these 4 years to get an estimated infection rate in Chicago.

In [71]:
infection_rate = (a+b+c+d)/4
infection_rate

8.151067657616925e-05
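
The four cells above repeat the same arithmetic, so they could be folded into one loop. The following is a minimal sketch of that idea using the values already entered in this notebook; the dictionary names and the helper `yearly_infection_rate` are illustrative and not part of the original analysis.

```python
# Yearly inputs copied from the cells above.
cases = {2007: 10, 2009: 1, 2011: 8, 2013: 37}
prevalence = {2007: 0.070182, 2009: 0.010096, 2011: 0.029709, 2013: 0.097263}
population = {2007: 2_811_035, 2009: 2_824_064, 2011: 2_700_741, 2013: 2_706_101}


def yearly_infection_rate(year):
    """Reported cases divided by (trap prevalence * population) for one year."""
    return cases[year] / (prevalence[year] * population[year])


# One rate per labelled year, then the simple average used in the notebook.
rates = {year: yearly_infection_rate(year) for year in cases}
mean_rate = sum(rates.values()) / len(rates)

for year, rate in rates.items():
    print(f"{year}: {rate:.3e}")
print(f"mean: {mean_rate:.3e}")
```

The yearly rates range from about 3.5e-05 (2009) to about 1.4e-04 (2013), so the average should be read as a rough estimate rather than a stable constant.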

## Cost of spraying

Type of spray: Zenivex
https://www.chicago.gov/city/en/depts/cdph/provdrs/healthy_living/news/2021/august/city-to-spray-insecticide-wednesday-to-kill-mosquitoes.html

In [73]:
# conversion factor: square metres in one acre
acre_to_m2 = 4046.86

Spraying one acre of land costs 0.67 dollars (67 cents), according to information from Zenivex, the pesticide chosen by the government for mosquito control due to its non-toxicity to humans.

In [74]:
cost_to_spray = 0.67 #for each acre

In [75]:
# cost to spray one square metre
cost_to_spray_m2 = 0.67 / acre_to_m2 

In [76]:
cost_to_spray_m2

0.00016556045922023495

### Medical Cost per individual

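We take the medical cost per case from a study of the cost of WNV to the United States over 1999-2012. The estimate combines several factors: short-term costs such as physician fees, diagnostic tests and lost productivity, and long-term costs such as caregiving. Dividing the total cost (778 million dollars) by the number of cases (37,088) gives the average cost per individual.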

https://www.sciencedaily.com/releases/2014/02/140210184713.htm#:~:text=In%20a%20study%20of%20the,care%20expenditures%20and%20lost%20productivity

In [77]:
cost_indiv = 778_000_000 / 37_088
cost_indiv

20977.13546160483

## 2014 test data (with 50% threshold to be counted as positive)

In [78]:
pop_2014 = 2712608

In [79]:
# Prevalence of WNV-positive traps in the 2014 test data, as predicted by
# our model at the 50% threshold (about 0.2)
prev_14_50 = 0.201006

In [80]:
inf_2014_50 = infection_rate * prev_14_50 * pop_2014
inf_2014_50
# Estimated number of people infected in 2014

44.44373582563198

#### Number of lives impacted 

In [81]:
num_impact_50 = inf_2014_50 * 0.772 # 0.772 is the sensitivity of our final model at the default (50%) threshold, i.e. the share of true positives it catches
num_impact_50

34.31056405738789

#### Medical cost saved for the lives impacted

In [82]:
num_impact_50 * cost_indiv

719737.3499958956

Suppose we spray only within a 1 km radius of each trap that is predicted positive for WNV:

In [83]:
# 608 traps considered positive at the 50% threshold; each is sprayed over a circle of radius 1,000 m (area = pi * r^2)
cost_radius = 608 * 1000 * 1000 * 3.1415 * cost_to_spray_m2
cost_radius

316225.7750453438
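
The spraying cost above is the area of one circle per positive trap multiplied by the cost per square metre. A minimal sketch of that calculation as a reusable function is given below; the function name `spray_cost` and its parameters are illustrative, and it assumes the circles around different traps do not overlap, which would otherwise make the figure an overestimate of the area actually sprayed.

```python
import math

ACRE_IN_M2 = 4046.86    # square metres per acre, as used in this notebook
COST_PER_ACRE = 0.67    # spray cost per acre, as used in this notebook


def spray_cost(n_traps, radius_m=1000.0):
    """Cost of spraying a circle of radius_m around each of n_traps.

    Assumption: circles are treated as non-overlapping, so the total
    area is simply n_traps * pi * radius_m ** 2.
    """
    area_m2 = n_traps * math.pi * radius_m ** 2
    return area_m2 * COST_PER_ACRE / ACRE_IN_M2


print(f"{spray_cost(608):,.0f}")
```

Because this sketch uses `math.pi` rather than the rounded 3.1415, its result is a few dollars higher than the value in the cell above.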

In [84]:
719737.3499958956 - 316225.7750453438


403511.57495055185

The net saving is close to 403,511 dollars, so on this estimate spraying mosquito adulticide is worthwhile.


## 2014 test data (with 30% threshold to be counted as positive)

In [85]:
prev_14_30 = 0.245156 # prevalence of WNV-positive traps in the 2014 test data, as predicted by our model at the 30% threshold

In [86]:
inf_2014_30 = infection_rate * prev_14_30 * pop_2014
inf_2014_30
# Estimated number of people infected in 2014

54.205588390737766

In [87]:
num_impact_30 = inf_2014_30 * 0.893 #our sensitivity value is 0.893 for our final model with 30% threshold
num_impact_30

48.405590432928825

In [88]:
num_impact_30 * cost_indiv

1015410.6276105108

In [89]:
# 736 traps considered positive due to the increased sensitivity at the 30% threshold
cost_radius = 736 * 1000 * 1000 * 3.1415 * cost_to_spray_m2
cost_radius

382799.62242331094

In [90]:
1015410.6276105108 - 382799.62242331094

632611.0051871999

We save about 632,611 dollars net, which is roughly 229K more than with a 50% threshold. We therefore prefer the higher sensitivity that comes with a threshold of about 30%, and would apply the spray accordingly.
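
The two threshold scenarios follow the same steps, so they can be compared side by side. The sketch below reuses only numbers that appear in this notebook; the function `net_saving` and the `scenarios` dictionary are illustrative names, and the model keeps the notebook's simplifications (one non-overlapping 1 km circle per positive trap, pi rounded to 3.1415).

```python
INFECTION_RATE = 8.151067657616925e-05   # average infection rate from above
POP_2014 = 2_712_608                     # Chicago population used for 2014
COST_PER_CASE = 778_000_000 / 37_088     # medical cost per individual
COST_PER_M2 = 0.67 / 4046.86             # spray cost per square metre
CIRCLE_AREA_M2 = 3.1415 * 1000 * 1000    # 1 km radius, notebook's value of pi


def net_saving(prevalence, sensitivity, n_positive_traps):
    """Medical cost avoided minus spraying cost for one threshold."""
    expected_cases = INFECTION_RATE * prevalence * POP_2014
    medical_saved = expected_cases * sensitivity * COST_PER_CASE
    spray = n_positive_traps * CIRCLE_AREA_M2 * COST_PER_M2
    return medical_saved - spray


# (predicted prevalence, sensitivity, traps considered positive)
scenarios = {
    "50% threshold": (0.201006, 0.772, 608),
    "30% threshold": (0.245156, 0.893, 736),
}
savings = {name: net_saving(*args) for name, args in scenarios.items()}

for name, value in savings.items():
    print(f"{name}: {value:,.0f}")
```

Keeping the inputs in one place makes it easier to test other thresholds once their prevalence, sensitivity and positive-trap counts are known.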

# Conclusion and Recommendations

#### Weeks and interaction of weather features which are most important to predict presence of West Nile Virus 

- Probability of WNV increases exponentially from week 31 to 35, especially when there is a combination of high temperature and precipitation (rainfall). 
- Certain traps have a higher chance of testing positive for WNV (e.g. T900, T003, T028)

#### We recommend using Logistic Regression with 30% threshold to predict WNV

- 89% of WNV+ cases are captured by our model (sensitivity)
- 15% of our positive predictions are correct (precision)
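
For reference, the two metrics quoted above are computed from confusion-matrix counts as sketched below. The counts used here are hypothetical, chosen only so that the results land near the quoted figures; they are not the actual counts from our model.

```python
def sensitivity(tp, fn):
    """Share of actual positives that the model flags (recall)."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Share of positive predictions that are actually positive."""
    return tp / (tp + fp)


# Hypothetical counts for illustration only (not the notebook's results).
tp, fn, fp = 50, 6, 283

print(f"sensitivity: {sensitivity(tp, fn):.3f}")
print(f"precision:   {precision(tp, fp):.3f}")
```

Lowering the threshold trades precision for sensitivity: more true cases are flagged, at the price of more traps sprayed unnecessarily.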

#### More cost-effective to spray the city 

- Benefits to public health are economically significant when compared to the cost of eradicating mosquitoes
- About 14 more infections can be prevented in 2014 than with the 50% threshold, avoiding the associated medical costs
- Cost of spray is lower than short and long term medical costs


## Next steps:

- Further tune parameters to improve accuracy
    - More weather features and their interaction
    - Grouping of traps based on risk level (likelihood of WNV)
