In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

In [2]:
weather = pd.read_csv('../../data/weather_clean.csv', parse_dates=['date'])

In [3]:
weather

Unnamed: 0,date,tmax,tmin,tavg,depart,dewpoint,wetbulb,heat,cool,sunrise,sunset,codesum,preciptotal,stnpressure,sealevel,resultspeed,resultdir,avgspeed
0,2007-05-01,83,51,67,14,51,56,0,2,448,1849,set(),0.00,29.14,29.82,2.20,26,9.40
1,2007-05-02,59,42,51,-2,42,47,13,0,447,1850,"{'HZ', 'BR'}",0.00,29.41,30.08,13.15,3,13.40
2,2007-05-03,66,47,57,3,40,49,8,0,446,1851,{'HZ'},0.00,29.42,30.12,12.30,6,12.55
3,2007-05-04,72,50,61,7,41,50,4,0,444,1852,{'RA'},0.00,29.34,30.04,10.25,7,10.60
4,2007-05-05,66,53,60,5,38,49,5,0,443,1853,set(),0.01,29.43,30.10,11.45,7,11.75
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1467,2014-10-27,78,52,65,17,51,58,0,1,618,1653,{'RA'},0.01,28.96,29.66,12.35,19,13.25
1468,2014-10-28,67,46,57,10,39,47,8,0,619,1651,{'RA'},0.02,29.19,29.85,14.40,26,15.10
1469,2014-10-29,49,38,44,-3,33,41,21,0,620,1650,set(),0.00,29.39,30.06,9.00,29,9.45
1470,2014-10-30,52,34,43,-2,34,41,21,0,622,1649,{'RA'},0.00,29.38,30.10,5.50,23,6.00


## Relative Humidity

[Studies have shown](https://www.cabdirect.org/cabdirect/abstract/19302901857) that relative humidity has a noticeable effect on the biting stimulus, lifespan and aestivation period of mosquitoes. We will create a new feature capturing the [relative humidity](https://www.wikihow.com/Calculate-Humidity) of each observation, derived from the `dewpoint` and `tavg` columns.
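A minimal sketch of how such a feature could be derived. It assumes `tavg` and `dewpoint` are recorded in degrees Fahrenheit, and it uses the Magnus approximation for saturation vapour pressure (coefficients 17.625 and 243.04 °C), which is an assumption here and may not be the exact formula from the linked guide. The helper names and the small demo frame are illustrative only.

```python
import numpy as np
import pandas as pd


def fahrenheit_to_celsius(temp_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def relative_humidity(tavg_f, dewpoint_f):
    """Approximate relative humidity (%) from air temperature and dew point.

    Assumption: inputs are in Fahrenheit; Magnus approximation is used.
    """
    t_c = fahrenheit_to_celsius(tavg_f)
    td_c = fahrenheit_to_celsius(dewpoint_f)
    a, b = 17.625, 243.04  # Magnus coefficients (b in degrees Celsius)
    saturation = np.exp(a * t_c / (b + t_c))   # proportional to saturation vapour pressure
    actual = np.exp(a * td_c / (b + td_c))     # proportional to actual vapour pressure
    return 100.0 * actual / saturation


# Illustrative values only, taken from the first two rows shown above
demo = pd.DataFrame({"tavg": [67, 51], "dewpoint": [51, 42]})
# Clip at 100 in case rounding leaves the dew point above the average temperature
demo["rel_humid"] = relative_humidity(demo["tavg"], demo["dewpoint"]).clip(upper=100)
```

Applied to the full frame, the same call would be `weather['rel_humid'] = relative_humidity(weather['tavg'], weather['dewpoint']).clip(upper=100)`.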





## Time Lagged Temperature

According to a [study](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4342965/#:~:text=The%20effects%20of%20weather%20fluctuations,infection%20%5B43%2C81%5D), increases in mean weekly temperature were associated with a significantly higher incidence of reported West Nile fever (WNF) infection. To capture this potential relationship in our prediction model, we will create a new column holding the change in mean weekly temperature relative to the week before.
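One possible way to build this column is sketched below. It interprets "mean weekly temperature" as a trailing 7-day mean of `tavg` and compares it with the same mean 7 calendar days earlier; that interpretation, the function name and the column name `tavg_wk_change` are assumptions for illustration. The sketch also assumes one row per date, and uses calendar-based windows so that any gaps in the date range do not silently mix distant observations.

```python
import pandas as pd


def add_weekly_temp_change(df, temp_col="tavg", date_col="date"):
    """Add the change in trailing 7-day mean temperature versus one week earlier."""
    out = df.sort_values(date_col).copy()
    series = out.set_index(date_col)[temp_col]

    # Trailing 7-day mean; require a full week of observations
    weekly_mean = series.rolling("7D", min_periods=7).mean()

    # The same weekly mean as it stood 7 calendar days earlier
    prev_week = weekly_mean.shift(freq="7D").reindex(weekly_mean.index)

    # Hypothetical column name; rows without a full prior week stay NaN
    out["tavg_wk_change"] = (weekly_mean - prev_week).to_numpy()
    return out


# Illustrative data only: three weeks of steadily rising temperatures
demo = pd.DataFrame({
    "date": pd.date_range("2007-05-01", periods=21, freq="D"),
    "tavg": range(21),
})
demo = add_weekly_temp_change(demo)
```

The first 13 rows are left as `NaN` because a full current week and a full previous week are both required; whether to drop or impute those rows is a separate modelling choice.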

## Rolling Temporal Features

The typical incubation period for _Culex_ species mosquitoes is [7-10 days](https://www.cdc.gov/mosquitoes/about/life-cycles/culex.html). The length of each stage is reportedly affected by external factors, and the life cycle may extend to as long as [a month](https://www.mosquito.org/page/lifecycle). Our data, however, only describes conditions on the day of each observation, so it cannot capture effects that build up over the preceding days or weeks.

To capture any relationships tied to the mosquitoes' incubation period, we will create new columns tracking the rolling average of the following features over three windows: 5, 14 and 28 days.

1. `tavg`
   - Ambient temperature is known to be one of the most important environmental factors affecting mosquito lifespan and breeding characteristics.
2. `rel_humid`
   - As discussed above, relative humidity has a noticeable effect on certain characteristics of mosquitoes.
3. `avgspeed`
   - Humans and mosquitoes are not the only hosts of West Nile virus (WNV). [Birds](https://academic.oup.com/jme/article/56/6/1467/5572129) have also been shown to be one of the main carriers of the virus, so it is worth exploring features that may affect bird populations in the area, such as average wind speed.
4. `preciptotal`
   - Precipitation has been shown to have both [positive](https://www.sciencedirect.com/science/article/abs/pii/S0022519317301431) and [negative](https://pubmed.ncbi.nlm.nih.gov/18283939/) effects on mosquito populations, depending on rainfall intensity. This non-linear relationship may be explored further across time periods.
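The rolling averages described above could be sketched as follows. The function name and the `<feature>_roll<days>` column naming are hypothetical, and the demo values are made up. Windows are defined in calendar days (`'5D'`, `'14D'`, `'28D'`) rather than row counts, on the assumption that the dates may not be contiguous; early rows are averaged over whatever observations fall inside the window.

```python
import pandas as pd

WINDOWS = (5, 14, 28)  # window lengths in days
FEATURES = ("tavg", "rel_humid", "avgspeed", "preciptotal")


def add_rolling_means(df, features=FEATURES, windows=WINDOWS, date_col="date"):
    """Add trailing rolling-mean columns for each feature and window length."""
    out = df.sort_values(date_col).reset_index(drop=True)
    indexed = out.set_index(date_col)
    for col in features:
        for days in windows:
            # Time-based window covering the current day and the days before it
            rolled = indexed[col].rolling(f"{days}D").mean()
            out[f"{col}_roll{days}"] = rolled.to_numpy()  # hypothetical naming
    return out


# Illustrative data only
demo = pd.DataFrame({
    "date": pd.date_range("2007-05-01", periods=30, freq="D"),
    "tavg": range(30),
    "rel_humid": [60.0] * 30,
    "avgspeed": [10.0] * 30,
    "preciptotal": [0.0] * 30,
})
demo = add_rolling_means(demo)
```

This adds twelve columns (four features times three windows). Passing `min_periods=days` to `rolling` would instead leave rows without a full window as `NaN`.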