# Introduction

This notebook is an attempt at basic EDA and map plotting. We were asked to do a causal analysis on this dataset as an assignment.
Looking at the data, I did not know what interesting plots I could come up with until I saw the latitude and longitude columns.

In [None]:
!unzip -oq ../input/predict-west-nile-virus/spray.csv.zip
!unzip -oq ../input/predict-west-nile-virus/train.csv.zip
!unzip -oq ../input/predict-west-nile-virus/weather.csv.zip

I am using plotly+mapbox for plotting maps.
Read more about it [here](https://plotly.com/python/mapbox-layers/)

To run this notebook, you will need to register for a free account at https://mapbox.com/ and obtain a Mapbox Access token. <br></br>
Paste the access token into the MAPBOX_TOKEN variable below.
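
If you would rather not paste the token into the notebook, one option is to read it from an environment variable. This is a minimal sketch, not part of the original notebook: the helper name `load_mapbox_token` and the environment variable name `MAPBOX_TOKEN` are assumptions chosen for illustration.

```python
import os


def load_mapbox_token(env_var: str = "MAPBOX_TOKEN") -> str:
    """Return the token stored in the given environment variable, or '' if unset.

    Both the helper and the default variable name are hypothetical;
    use whatever name you exported the token under.
    """
    return os.environ.get(env_var, "").strip()


MAPBOX_TOKEN = load_mapbox_token()
if not MAPBOX_TOKEN:
    # Styles hosted by Mapbox need a token; without one those maps may stay blank.
    print("No Mapbox token found in the environment.")
```

This keeps the token out of the saved notebook, which helps if the notebook is shared publicly.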

In [None]:
import os
import plotly.express as px
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.graph_objects as go

MAPBOX_TOKEN = ''

In [None]:
# List the CSV files extracted into the working directory
for file_name in os.listdir():
    if file_name.endswith('.csv'):
        print(file_name)

In [None]:
train = pd.read_csv('train.csv')
spray = pd.read_csv('spray.csv')
weather = pd.read_csv('weather.csv')

In [None]:
train.columns

In [None]:
train.describe()

In [None]:
train.WnvPresent.value_counts()

This is an imbalanced dataset: in most records West Nile Virus is absent.
We will explore the areas of the city where West Nile Virus is present.
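
To put a number on the imbalance, the class shares can be shown next to the raw counts. The sketch below uses a tiny made-up frame in place of `train` (the values are invented for illustration); on the real data the same two `value_counts` calls would be applied to `train['WnvPresent']`.

```python
import pandas as pd

# Toy stand-in for train.csv; these values are made up for illustration only.
toy = pd.DataFrame({"WnvPresent": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]})

counts = toy["WnvPresent"].value_counts()                # absolute counts per class
shares = toy["WnvPresent"].value_counts(normalize=True)  # fraction of rows per class

# Both series are indexed by class label, so they align column-wise.
summary = pd.DataFrame({"count": counts, "share": shares})
print(summary)
```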

# Plotting the satellite traps on the map

In [None]:
# If maps are not being displayed, please re-run the cell

In [None]:
px.set_mapbox_access_token(MAPBOX_TOKEN)
fig = px.scatter_mapbox(train, lat = 'Latitude', lon  = 'Longitude',
                        size_max=15, zoom = 10)

fig.update_layout(title = 'Traps',
    autosize=False,
    width=500,
    height=700,)

fig.show()

In [None]:
spray.describe()

# Areas where mosquito repellent was sprayed

In [None]:
px.set_mapbox_access_token(MAPBOX_TOKEN)

fig = px.scatter_mapbox(spray, lat = 'Latitude', lon  = 'Longitude',
                        animation_frame="Date",
                        size_max=15, zoom = 9)

fig.update_layout(
    title="Spray day-wise",
    width=500,
    height=700,
)

fig.show()

In [None]:
mosquito_count = train.groupby(['Address'], as_index = False)[['NumMosquitos']].sum()

In [None]:
areas = train.groupby(['Address'], as_index = False)[['Latitude','Longitude']].median()

In [None]:
wnv = train.groupby(['Address'], as_index = False)[['WnvPresent']].sum()
# sum() works because WnvPresent is either 0 or 1, so adding the ones gives the total cases per address.

In [None]:
mosquito_areas_wnv = pd.concat([mosquito_count,areas, wnv], axis = 1)

In [None]:
mosquito_areas_wnv.drop('Address', axis = 1, inplace = True)
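
The three group-bys above rely on each result having its rows in the same `Address` order before they are concatenated. As an alternative sketch, the same per-address table can be built in one pass with named aggregation. The toy frame below is invented for illustration; on the real data `train` would take its place.

```python
import pandas as pd

# Toy stand-in for train.csv; addresses and values are made up for illustration.
toy = pd.DataFrame({
    "Address": ["A", "A", "B"],
    "Latitude": [41.90, 41.90, 41.70],
    "Longitude": [-87.70, -87.70, -87.60],
    "NumMosquitos": [5, 7, 2],
    "WnvPresent": [0, 1, 0],
})

# One groupby, one output column per (source column, aggregation) pair.
per_address = toy.groupby("Address", as_index=False).agg(
    NumMosquitos=("NumMosquitos", "sum"),
    Latitude=("Latitude", "median"),
    Longitude=("Longitude", "median"),
    WnvPresent=("WnvPresent", "sum"),
)
print(per_address)
```

Because every column comes from the same grouping, no alignment between separate frames is needed, and `Address` can be kept for hover labels or dropped afterwards.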

# Plotting number of Mosquitos and areas where West Nile Virus is present

In [None]:
fig = px.scatter_mapbox(mosquito_areas_wnv, lat = 'Latitude', lon  = 'Longitude', color = 'WnvPresent',
                        size = 'NumMosquitos', color_continuous_scale=px.colors.cyclical.IceFire,
                        hover_data = ['NumMosquitos', 'WnvPresent'],
                       zoom = 9)
fig.show()


Two regions stand out, colored black and yellow. Hover over them for more details. <br></br>
Black region: 66 cases <br></br>
Yellow region: 41 cases <br></br>

# Analyzing effectiveness of spray

In [None]:
fig = px.scatter_mapbox(spray, lat = 'Latitude', lon  = 'Longitude',#animation_frame = 'Date',
                        size_max=15, zoom = 9,color_discrete_sequence=["palegoldenrod"],  opacity = 0.5)

fig2 = px.scatter_mapbox(mosquito_areas_wnv, lat = 'Latitude', lon  = 'Longitude', color = 'WnvPresent',
                        size = 'NumMosquitos', color_continuous_scale=px.colors.cyclical.IceFire,
                        hover_data = ['NumMosquitos', 'WnvPresent'],
                       zoom = 9)

fig.add_trace(fig2.data[0],)

fig.update_layout( title = 'Spray - West Nile Virus and Mosquito clusters',
                width=500,
    height=700,
)

Spraying looks effective. Areas that were sprayed have very few virus cases, but the areas with the most West Nile Virus cases were not sprayed. <br></br>
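
The map comparison is visual. One way to back it with numbers is to compute, for each trap cluster, the distance to the nearest spray point. The sketch below is only an illustration under stated assumptions: the helper names `haversine_km` and `nearest_spray_km` are hypothetical, the coordinates are made up, and the 1 km cut-off for "sprayed nearby" is an arbitrary choice. It also ignores spray dates, so it says nothing about timing.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs in degrees, NumPy broadcasting applies."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def nearest_spray_km(clusters, spray_points):
    """For each cluster row, return the distance to the closest spray point."""
    distances = haversine_km(
        clusters["Latitude"].to_numpy()[:, None],
        clusters["Longitude"].to_numpy()[:, None],
        spray_points["Latitude"].to_numpy()[None, :],
        spray_points["Longitude"].to_numpy()[None, :],
    )  # shape: (n_clusters, n_spray_points)
    return distances.min(axis=1)


# Made-up coordinates for illustration only.
toy_clusters = pd.DataFrame({"Latitude": [41.0, 42.0], "Longitude": [-87.0, -87.0]})
toy_spray = pd.DataFrame({"Latitude": [41.0], "Longitude": [-87.0]})

toy_clusters["nearest_spray_km"] = nearest_spray_km(toy_clusters, toy_spray)
toy_clusters["sprayed_nearby"] = toy_clusters["nearest_spray_km"] <= 1.0  # assumed cut-off
print(toy_clusters)
```

On the real data, `mosquito_areas_wnv` and `spray` would take the place of the toy frames, and comparing `WnvPresent` between the two `sprayed_nearby` groups would give a rough check of the visual impression.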

In [None]:
fig2 = px.scatter_mapbox(mosquito_areas_wnv, lat = 'Latitude', lon  = 'Longitude', color = 'WnvPresent',
                        size = 'NumMosquitos', color_continuous_scale=px.colors.cyclical.IceFire,
                        hover_data = ['NumMosquitos', 'WnvPresent'],
                       zoom = 9)

fig3 = px.scatter_mapbox(train, lat = 'Latitude', lon  = 'Longitude',
                        size_max=15, zoom = 10, color_discrete_sequence = ['lemonchiffon'])

# Adding another figure's trace is one way to overlay multiple plots on the same map.
# Print the figure object as is to inspect the elements inside.
fig2.add_trace(fig3.data[0]) 

fig2.update_layout(mapbox_style='dark')

fig2.update_layout( title = 'Traps - West Nile Virus and Mosquito clusters',
                width=500,
    height=700,)


However, traps were set up in all places with West Nile Virus, especially in the big clusters.
We can't say conclusively whether the traps are helping.

# Analyzing species

In [None]:
species_vs_virus = train[['Species', 'WnvPresent']].groupby('Species', as_index = False).sum()

In [None]:
species_vs_virus

In [None]:
fig = px.bar(species_vs_virus, x = 'Species', y = 'WnvPresent')
fig.update_layout(
    title="West Nile Virus count vs Species",
    xaxis_title="Species",
    yaxis_title="West Nile Virus Present",)
fig.show()


Records involving CULEX PIPIENS (alone or pooled as CULEX PIPIENS/RESTUANS) account for most West Nile Virus cases.
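
A raw count of positives also reflects how often each species was trapped, so it can help to look at the positive rate per species next to the count. The sketch below uses placeholder species names and invented values; on the real data `train` would take the place of the toy frame.

```python
import pandas as pd

# Placeholder species and made-up values, for illustration only.
toy = pd.DataFrame({
    "Species": ["SPECIES A"] * 8 + ["SPECIES B"] * 2,
    "WnvPresent": [1, 1, 0, 0, 0, 0, 0, 0, 1, 0],
})

species_rate = toy.groupby("Species", as_index=False).agg(
    positives=("WnvPresent", "sum"),       # number of positive records
    records=("WnvPresent", "size"),        # number of records for the species
    positive_rate=("WnvPresent", "mean"),  # share of records that are positive
)
print(species_rate)
```

In this toy example the species with more positives has the lower positive rate, which is the kind of difference a count-only bar chart would hide.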

# Analyzing impact of weather on mosquitos and disease

In [None]:
weather.head()

In [None]:
weather['Tavg'].unique()

In [None]:
weather[weather['Tavg']=='M']
# Only 11 rows, so drop these for now.

In [None]:
weather.drop(weather[weather['Tavg']=='M'].index, axis = 0, inplace = True)

In [None]:
# reset_index returns a new DataFrame, so assign it back to keep the result
weather = weather.reset_index(drop = True)

In [None]:
weather.columns

In [None]:
weather['Tavg'] = weather['Tavg'].astype(int) 
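
An alternative to dropping the `'M'` rows is to convert them to missing values, so the other columns of those rows stay available. This is a sketch on a made-up frame (the dates and temperatures are invented). Note that `errors="coerce"` turns every non-numeric entry into `NaN`, not only `'M'`.

```python
import pandas as pd

# Toy stand-in for weather.csv; two stations report on the first date.
toy_weather = pd.DataFrame({
    "Date": ["2000-01-01", "2000-01-01", "2000-01-02"],
    "Tavg": ["67", "M", "51"],
})

# Non-numeric entries such as 'M' become NaN instead of raising an error.
toy_weather["Tavg"] = pd.to_numeric(toy_weather["Tavg"], errors="coerce")

# mean() skips NaN by default, so the first date uses the one valid reading.
daily_tavg = toy_weather.groupby("Date", as_index=False)["Tavg"].mean()
print(daily_tavg)
```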

In [None]:
weather_imp = weather.groupby(['Date'], as_index = False)[['Tavg']].mean()

In [None]:
weather_imp

In [None]:
mosquitos_date_wise = train.groupby(['Date'], as_index = False)[['NumMosquitos']].sum()
wnv_date_wise = train.groupby(['Date'], as_index = False)[['WnvPresent']].sum()

In [None]:
wnv_mosquitos_dw = pd.merge(mosquitos_date_wise,wnv_date_wise, on = 'Date')
weather_df = pd.merge(wnv_mosquitos_dw, weather_imp, on = 'Date')

In [None]:
weather_df

In [None]:
fig = px.scatter(weather_df, x="Tavg", y="NumMosquitos",
                 size='WnvPresent')

fig.update_layout(
    title="Mosquitos vs Average temperature",
    xaxis_title="Average Temperature in Fahrenheit",
    yaxis_title="Number of Mosquitos",)
fig.show()

Mosquitos and West Nile Virus are less prevalent on colder days; higher temperatures seem to suit them. A similar analysis can be done on heat, precipitation, etc.
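
The scatter plot can be complemented with a rank correlation, which only assumes a monotonic relationship and is less sensitive to a few very large mosquito counts. The values below are made up for illustration; on the real data `weather_df` would take the place of the toy frame.

```python
import pandas as pd

# Made-up daily values, for illustration only.
toy_daily = pd.DataFrame({
    "Tavg": [55, 60, 65, 70, 75, 80],
    "NumMosquitos": [10, 30, 25, 60, 80, 120],
    "WnvPresent": [0, 0, 1, 1, 3, 5],
})

# Spearman correlation between temperature, mosquito count and positive count.
rank_corr = toy_daily[["Tavg", "NumMosquitos", "WnvPresent"]].corr(method="spearman")
print(rank_corr)
```

A coefficient close to 1 between `Tavg` and `NumMosquitos` would support the reading of the plot, though a correlation alone does not establish a causal effect of temperature.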

**Conclusions:** <br></br>
Spraying needs to be targeted effectively. The areas with the most West Nile Virus cases were not sprayed. <br></br>
The disease and the mosquitos thrive on hotter days. <br></br>
The CULEX PIPIENS/RESTUANS species group accounts for most cases.