# Problem Statement

Welcome to your first week of work at the Disease And Treatment Agency, division of Societal Cures In Epidemiology and New Creative Engineering (DATA-SCIENCE). Time to get to work!

Due to the recent epidemic of West Nile Virus in the Windy City, we've had the Department of Public Health set up a surveillance and control system. We're hoping it will let us learn something from the mosquito population as we collect data over time. Pesticides are a necessary evil in the fight for public health and safety, and an expensive one! We need to derive an effective plan to deploy pesticides throughout the city, and that is exactly where you come in!

In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

In [6]:
spray = pd.read_csv('assets/spray.csv')
test = pd.read_csv('assets/test.csv')
train = pd.read_csv('assets/train.csv')
weather = pd.read_csv('assets/weather.csv')

In [5]:
spray

,Date,Time,Latitude,Longitude
0,2011-08-29,6:56:58 PM,42.391623,-88.089163
1,2011-08-29,6:57:08 PM,42.391348,-88.089163
2,2011-08-29,6:57:18 PM,42.391022,-88.089157
3,2011-08-29,6:57:28 PM,42.390637,-88.089158
4,2011-08-29,6:57:38 PM,42.390410,-88.088858
...,...,...,...,...
14830,2013-09-05,8:34:11 PM,42.006587,-87.812355
14831,2013-09-05,8:35:01 PM,42.006192,-87.816015
14832,2013-09-05,8:35:21 PM,42.006022,-87.817392
14833,2013-09-05,8:35:31 PM,42.005453,-87.817423


In [20]:
print(f'spray shape is {spray.shape}')
print(f'train shape is {train.shape}')
print(f'test shape is {test.shape}')
print(f'weather shape is {weather.shape}')

spray shape is (14835, 4)
train shape is (10506, 12)
test shape is (116293, 11)
weather shape is (2944, 22)


In [22]:
print(f'spray has {spray.isnull().sum().sum()} null values')
print(f'train has {train.isnull().sum().sum()} null values')
print(f'test has {test.isnull().sum().sum()} null values')
print(f'weather has {weather.isnull().sum().sum()} null values')

spray has 584 null values
train has 0 null values
test has 0 null values
weather has 0 null values
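
The totals above do not show which columns the missing values sit in. A minimal sketch of a per-column breakdown follows; the helper name `null_report` is hypothetical, and the toy frame uses made-up values rather than the real files. On the real data one would call it as `null_report(spray)`.

```python
import numpy as np
import pandas as pd


def null_report(df: pd.DataFrame) -> pd.Series:
    """Return the null count per column, keeping only columns that have nulls."""
    counts = df.isnull().sum()
    return counts[counts > 0].sort_values(ascending=False)


# Toy frame standing in for one of the datasets (values are made up).
toy = pd.DataFrame({
    "Date": ["2011-08-29", "2011-08-29", "2011-09-07"],
    "Time": ["6:56:58 PM", None, None],
    "Latitude": [42.39, 42.39, np.nan],
})

report = null_report(toy)
print(report)
```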


In [24]:
weather.columns

Index(['Station', 'Date', 'Tmax', 'Tmin', 'Tavg', 'Depart', 'DewPoint',
       'WetBulb', 'Heat', 'Cool', 'Sunrise', 'Sunset', 'CodeSum', 'Depth',
       'Water1', 'SnowFall', 'PrecipTotal', 'StnPressure', 'SeaLevel',
       'ResultSpeed', 'ResultDir', 'AvgSpeed'],
      dtype='object')

In [25]:
spray.columns

Index(['Date', 'Time', 'Latitude', 'Longitude'], dtype='object')

In [26]:
train.columns

Index(['Date', 'Address', 'Species', 'Block', 'Street', 'Trap',
       'AddressNumberAndStreet', 'Latitude', 'Longitude', 'AddressAccuracy',
       'NumMosquitos', 'WnvPresent'],
      dtype='object')

In [42]:
train.loc[:,['Date', 'Block', 'Street', 'Latitude', 'Longitude', 'Trap', 'AddressAccuracy', 'WnvPresent']]

,Date,Block,Street,Latitude,Longitude,Trap,AddressAccuracy,WnvPresent
0,2007-05-29,41,N OAK PARK AVE,41.954690,-87.800991,T002,9,0
1,2007-05-29,41,N OAK PARK AVE,41.954690,-87.800991,T002,9,0
2,2007-05-29,62,N MANDELL AVE,41.994991,-87.769279,T007,9,0
3,2007-05-29,79,W FOSTER AVE,41.974089,-87.824812,T015,8,0
4,2007-05-29,79,W FOSTER AVE,41.974089,-87.824812,T015,8,0
...,...,...,...,...,...,...,...,...
10501,2013-09-26,51,W 72ND ST,41.763733,-87.742302,T035,8,1
10502,2013-09-26,58,N RIDGE AVE,41.987280,-87.666066,T231,8,0
10503,2013-09-26,17,N ASHLAND AVE,41.912563,-87.668055,T232,9,0
10504,2013-09-26,71,N HARLEM AVE,42.009876,-87.807277,T233,9,0


In [41]:
train.loc[0:1,:]

,Date,Address,Species,Block,Street,Trap,AddressNumberAndStreet,Latitude,Longitude,AddressAccuracy,NumMosquitos,WnvPresent
0,2007-05-29,"4100 North Oak Park Avenue, Chicago, IL 60634,...",CULEX PIPIENS/RESTUANS,41,N OAK PARK AVE,T002,"4100 N OAK PARK AVE, Chicago, IL",41.95469,-87.800991,9,1,0
1,2007-05-29,"4100 North Oak Park Avenue, Chicago, IL 60634,...",CULEX RESTUANS,41,N OAK PARK AVE,T002,"4100 N OAK PARK AVE, Chicago, IL",41.95469,-87.800991,9,1,0


In [28]:
weather['Date']

0       2007-05-01
1       2007-05-01
2       2007-05-02
3       2007-05-02
4       2007-05-03
           ...    
2939    2014-10-29
2940    2014-10-30
2941    2014-10-30
2942    2014-10-31
2943    2014-10-31
Name: Date, Length: 2944, dtype: object
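
The `Date` column is stored as `object` (plain strings), and each date appears twice in the output above, presumably once per `Station`. A sketch of converting the column to a datetime dtype and counting rows per date is shown here on a toy frame; the `Tmax` values are made up, and the one-row-per-station reading is an assumption that should be checked against the real data.

```python
import pandas as pd

# Toy frame mimicking the layout of weather.csv (Tmax values are made up).
toy_weather = pd.DataFrame({
    "Station": [1, 2, 1, 2],
    "Date": ["2007-05-01", "2007-05-01", "2007-05-02", "2007-05-02"],
    "Tmax": [83, 84, 59, 60],
})

# Parse the ISO-formatted strings into datetime64 values.
toy_weather["Date"] = pd.to_datetime(toy_weather["Date"], format="%Y-%m-%d")

# Count how many rows share each date.
rows_per_date = toy_weather.groupby("Date").size()

print(toy_weather.dtypes["Date"])
print(rows_per_date)
```

With a datetime dtype, accessors such as `.dt.year` and `.dt.month` become available for later grouping or merging.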

In [34]:
test.columns

Index(['Id', 'Date', 'Address', 'Species', 'Block', 'Street', 'Trap',
       'AddressNumberAndStreet', 'Latitude', 'Longitude', 'AddressAccuracy'],
      dtype='object')

In [37]:
test['Date']

0         2008-06-11
1         2008-06-11
2         2008-06-11
3         2008-06-11
4         2008-06-11
             ...    
116288    2014-10-02
116289    2014-10-02
116290    2014-10-02
116291    2014-10-02
116292    2014-10-02
Name: Date, Length: 116293, dtype: object
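
The first and last rows shown for `train` (2007 and 2013) and `test` (2008 and 2014) hint that the two sets may cover different years, though the truncated output is not enough to confirm it. A sketch of how one might check is given below; `years_covered` is a hypothetical helper, and the toy series hold only the dates visible above. On the real data one would pass `train['Date']` and `test['Date']`.

```python
import pandas as pd


def years_covered(dates) -> set:
    """Return the set of calendar years present in a column of date strings."""
    return set(pd.to_datetime(dates).dt.year.unique().tolist())


# Only the first and last dates visible in the notebook output.
toy_train = pd.Series(["2007-05-29", "2013-09-26"])
toy_test = pd.Series(["2008-06-11", "2014-10-02"])

shared = years_covered(toy_train) & years_covered(toy_test)
print(sorted(years_covered(toy_train)), sorted(years_covered(toy_test)), shared)
```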

In [4]:
plt.figure(figsize=(15,20))
origin = [41.6, -88.0]              # lat/long of origin (lower left corner)
upperRight = [42.1, -87.5]          # lat/long of upper right corner

mapdata = np.loadtxt("./assets/mapdata_copyright_openstreetmap_contributors.txt")

intersection = [41.909614, -87.746134]  # co-ordinates of intersection of IL64 / IL50 according to Google Earth

# coordinates to overlay, defined here so this cell does not depend on a later one
spray_longs, spray_lats = spray['Longitude'], spray['Latitude']
train_longs, train_lats = train['Longitude'], train['Latitude']

# generate plot: spray locations as red dots, trap locations as blue squares
plt.imshow(mapdata, cmap=plt.get_cmap('gray'), extent=[origin[1], upperRight[1], origin[0], upperRight[0]])
plt.scatter(x=spray_longs, y=spray_lats, c='r', s=20)
plt.scatter(x=train_longs, y=train_lats, c='b', s=60, marker='s')

#plt.show()
plt.savefig('map.png')

OSError: ./assets/mapdata_copyright_openstreetmap_contributors.txt not found.

<Figure size 1080x1440 with 0 Axes>

In [5]:
train_longs = train['Longitude']
train_lats = train['Latitude']

spray_longs = spray['Longitude']
spray_lats = spray['Latitude']

NameError: name 'train' is not defined

In [6]:
plt.figure(figsize=(15,20))
origin = [41.6, -88.0]              # lat/long of origin (lower left corner)
upperRight = [42.1, -87.5]          # lat/long of upper right corner

mapdata = np.loadtxt("./assets/mapdata_copyright_openstreetmap_contributors.txt")


# generate some data to overlay
numPoints = 50
lats = (upperRight[0] - origin[0]) * np.random.random_sample(numPoints) + origin[0]
longs = (upperRight[1] - origin[1]) * np.random.random_sample(numPoints) + origin[1]

intersection = [41.909614, -87.746134]  # co-ordinates of intersection of IL64 / IL50 according to Google Earth


# generate plot
plt.imshow(mapdata, cmap=plt.get_cmap('gray'), extent=[origin[1], upperRight[1], origin[0], upperRight[0]])
plt.scatter(x=longs, y=lats, c='r', s=20)
plt.scatter(x=intersection[1], y=intersection[0], c='b', s=60, marker='s')

#plt.show()
plt.savefig('map.png')

OSError: ./assets/mapdata_copyright_openstreetmap_contributors.txt not found.

<Figure size 1080x1440 with 0 Axes>
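
Both plotting cells stop at `np.loadtxt` because the map file is not found at the given path. One way to make the failure easier to handle is to check for the file before loading it. The sketch below assumes a hypothetical helper named `load_mapdata` that returns `None` when the file is missing, so the caller can decide whether to skip the background image.

```python
from pathlib import Path

import numpy as np


def load_mapdata(path):
    """Load the map array from a text file, or return None if the file is missing."""
    path = Path(path)
    if not path.exists():
        print(f"map data not found at {path}; skipping background image")
        return None
    return np.loadtxt(path)


mapdata = load_mapdata("./assets/mapdata_copyright_openstreetmap_contributors.txt")
```

A plotting cell could then call `plt.imshow(...)` only when `mapdata is not None` and still draw the scatter points on an empty background.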
