# IEEE MEGA PROJECT

**Team Name: BetaTech**                           
**Team Leader: Mollika Garg**                           
**Email Id: mollika.garg@gmail.com**

**Team Member: Shreya Sharma**                
**Email Id: shreyasharma.1510001@gmail.com**

**Team Member: Koushiki Chakrabarti**                     
**Email Id: koushikichakrabarti@gmail.com**

### PROJECT DETAILS

**Domain: Machine Learning**                                              
**Project Name: Tackling Dengue Cases**   

### PROJECT DESCRIPTION
Predict dengue cases from climate data and identify potential dengue hotspots by detecting stagnant water in satellite imagery. Machine learning models predict the number of dengue cases from climate factors, and thresholding techniques applied to satellite data locate likely stagnant-water hotspots.
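The notebook below covers only the climate-based prediction; the satellite thresholding step is not shown. As a minimal sketch, assuming a water index such as NDWI = (green - NIR) / (green + NIR) computed per pixel (the index, the band inputs and the 0.0 threshold are assumptions, not taken from this project), thresholding could look like this:

```python
import numpy as np

def water_mask(green: np.ndarray, nir: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Boolean mask of likely water pixels.

    Assumption: NDWI = (green - nir) / (green + nir), and pixels whose
    NDWI exceeds `threshold` are treated as water. Both the index and the
    default threshold are illustrative choices.
    """
    green = green.astype(float)
    nir = nir.astype(float)
    denom = green + nir
    # Pixels where both bands are 0 get NDWI = 0 instead of a division error.
    ndwi = np.divide(green - nir, denom, out=np.zeros_like(denom), where=denom != 0)
    return ndwi > threshold

# Toy 2x2 reflectance tiles (made-up values for illustration only).
green = np.array([[0.30, 0.10], [0.25, 0.05]])
nir = np.array([[0.10, 0.40], [0.05, 0.30]])
mask = water_mask(green, nir)
print(mask)
```

In practice the threshold would need tuning against known water bodies in the area of interest.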

In [1]:
##IMPORTS

# used for manipulating directory paths
import os

# used to analyze data
import pandas as pd

# scientific and vector computation for python
import numpy as np

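# for plotting (not used in the cells below)
from matplotlib import pyplot as plt

# encodes categorical labels as integers (not used below; months are mapped by hand)
from sklearn.preprocessing import LabelEncoder

# standardizes features to zero mean and unit variance
from sklearn.preprocessing import StandardScaler

# mean absolute error, the metric used to compare the models
from sklearn.metrics import mean_absolute_error

# support vector regression (the regression form of SVM)
from sklearn.svm import SVR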

# used for training KNN
from sklearn.neighbors import KNeighborsRegressor

# used for training Random Forest
from sklearn.ensemble import RandomForestRegressor

### READING DATA

In [2]:
## Read Data

dengue_features = pd.read_excel("C:\\Users\\molli\\OneDrive\\Documents\\Dengue_Data.xlsx")
dengue_labels= dengue_features["No. of cases"]
dengue_features=dengue_features.drop(labels="No. of cases",axis=1)

In [3]:
## Displaying head of the data

dengue_features.head()

|   | City | Year | Month | Precipitation Amt.(mm) | Humidity(%) | Avg Temp.(F) | Previous Cases |
|---|---|---|---|---|---|---|---|
| 0 | New Delhi | 2016 | Feb | 1.4 | 68 | 66.2 | 0 |
| 1 | New Delhi | 2016 | Mar | 17.8 | 61 | 77.0 | 0 |
| 2 | New Delhi | 2016 | Apr | 0.6 | 37 | 89.6 | 2 |
| 3 | New Delhi | 2016 | May | 30.8 | 47 | 93.2 | 5 |
| 4 | New Delhi | 2016 | Jun | 17.4 | 59 | 93.2 | 6 |


In [4]:
## Convert average temperature from Fahrenheit to Celsius

# pop() removes the Fahrenheit column and returns it, so no separate drop is needed
dengue_features["Avg Temp.(C)"] = (dengue_features.pop("Avg Temp.(F)") - 32) * 5 / 9
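The conversion applies C = (F - 32) x 5/9 element-wise. A quick check on the Fahrenheit values from the first rows shown above (66.2, 77.0, 89.6, 93.2) reproduces the Celsius values in the later table; the helper name `fahrenheit_to_celsius` is illustrative, not part of the notebook:

```python
import pandas as pd

def fahrenheit_to_celsius(temp_f: pd.Series) -> pd.Series:
    # C = (F - 32) * 5/9, applied to every element of the Series
    return (temp_f - 32) * 5 / 9

temps_f = pd.Series([66.2, 77.0, 89.6, 93.2])
# Round to one decimal to hide floating-point noise such as 31.999999999999996
temps_c = fahrenheit_to_celsius(temps_f).round(1)
print(temps_c.tolist())  # [19.0, 25.0, 32.0, 34.0]
```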

In [5]:
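## Encoding month names as integers (Jan = 0, ..., Dec = 11)

lmap={"Jan":0,"Feb":1,"Mar":2,"Apr":3,"May":4,"Jun":5, "Jul":6, "Aug":7, "Sep":8, "Oct":9, "Nov":10, "Dec":11}
# "Month " (with a trailing space) matches the column header in the spreadsheet
dengue_features["Month "]=dengue_features["Month "].map(lmap)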
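The hand-written `lmap` can also be generated from Python's `calendar` module, assuming an English locale (`calendar.month_abbr` is locale-dependent). One pitfall worth checking: `Series.map` with a dict returns `NaN` for any value missing from the dict, so a stray space such as `"Feb "` silently becomes `NaN`. Stripping whitespace first avoids that:

```python
import calendar
import pandas as pd

# {"Jan": 0, ..., "Dec": 11}; month_abbr[0] is an empty string, so it is skipped
lmap = {abbr: i - 1 for i, abbr in enumerate(calendar.month_abbr) if abbr}

months = pd.Series(["Feb", "Mar", "Apr", "Feb "])
encoded = months.map(lmap)
print(encoded.tolist())        # [1.0, 2.0, 3.0, nan]
print(encoded.isna().sum())    # 1 unmatched value

# Strip surrounding whitespace before mapping so every month matches
cleaned = months.str.strip().map(lmap)
print(cleaned.tolist())        # [1, 2, 3, 1]
```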

In [6]:
## Dropping the feature 'City'

dengue_features=dengue_features.drop("City", axis=1)

In [7]:
## Displaying head of the data

dengue_features.head()

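|   | Year | Month | Precipitation Amt.(mm) | Humidity(%) | Previous Cases | Avg Temp.(C) |
|---|---|---|---|---|---|---|
| 0 | 2016 | 1 | 1.4 | 68 | 0 | 19.0 |
| 1 | 2016 | 2 | 17.8 | 61 | 0 | 25.0 |
| 2 | 2016 | 3 | 0.6 | 37 | 2 | 32.0 |
| 3 | 2016 | 4 | 30.8 | 47 | 5 | 34.0 |
| 4 | 2016 | 5 | 17.4 | 59 | 6 | 34.0 |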


In [8]:
## storing feature values in X and labels in Y

X=dengue_features.values
Y=dengue_labels.values

In [9]:
# ## shuffling data (disabled: train_test_split below already shuffles by default)

# rand_col=np.arange(0,len(Y),dtype=int)
# np.random.shuffle(rand_col)
# Y=Y[rand_col]
# X=X[rand_col]
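If a manual shuffle is ever wanted instead of relying on `train_test_split`, the key requirement is to apply one permutation to both arrays so each feature row keeps its label. A minimal sketch using NumPy's `Generator` API (the helper name `shuffle_together` and the seed 0 are illustrative choices):

```python
import numpy as np

def shuffle_together(X: np.ndarray, Y: np.ndarray, seed: int = 0):
    # One permutation of row indices, applied to both arrays so rows stay paired
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(Y))
    return X[idx], Y[idx]

X_demo = np.arange(10).reshape(5, 2)   # row i is [2i, 2i + 1]
Y_demo = np.arange(5) * 10             # label for row i is 10i
X_s, Y_s = shuffle_together(X_demo, Y_demo)
# Every shuffled label still matches its row: label == 5 * first feature
print(all(y == 5 * x[0] for x, y in zip(X_s, Y_s)))  # True
```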

In [10]:
## splitting data into training and testing sets

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X,Y,test_size = 0.2,random_state = 0) 
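`train_test_split` shuffles by default (`shuffle=True`), and `random_state=0` makes the split repeatable across runs. With `test_size=0.2` the number of test rows is 20% of the data rounded up. A quick check on synthetic arrays (the 70-row and 66-row sizes are arbitrary):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X_demo = np.arange(140).reshape(70, 2)
Y_demo = np.arange(70)

# Same random_state -> identical split; test size = ceil(0.2 * 70) = 14
a = train_test_split(X_demo, Y_demo, test_size=0.2, random_state=0)
b = train_test_split(X_demo, Y_demo, test_size=0.2, random_state=0)
x_tr, x_te, y_tr, y_te = a
print(len(x_tr), len(x_te))          # 56 14
print(np.array_equal(a[3], b[3]))    # True

# Rounding up: 0.2 * 66 = 13.2 -> 14 test rows
print(len(train_test_split(np.arange(66), test_size=0.2, random_state=0)[1]))  # 14
```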

In [11]:
print(y_test)

[  12  141  338   72    2  190 1362   10  217   18   10   58 1062  374]


In [12]:
## scaling the data (the scaler's mean and variance come from the training set only)

scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)
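`StandardScaler` computes z = (x - mean) / std per column, with the mean and the population standard deviation (ddof = 0) estimated on the training set; `transform` then reuses those training statistics on the test set, so no test information leaks into preprocessing. The sketch below checks this against the formula on random data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
x_train_demo = rng.normal(loc=50, scale=10, size=(20, 3))
x_test_demo = rng.normal(loc=50, scale=10, size=(5, 3))

scaler = StandardScaler()
z_train = scaler.fit_transform(x_train_demo)
z_test = scaler.transform(x_test_demo)

# Manual z-scores using the *training* mean and population std (ddof=0)
mu = x_train_demo.mean(axis=0)
sigma = x_train_demo.std(axis=0)
print(np.allclose(z_train, (x_train_demo - mu) / sigma))  # True
print(np.allclose(z_test, (x_test_demo - mu) / sigma))    # True
```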

### Training the Models and Choosing the Best Hyperparameters

#### 1) K Nearest Neighbours

In [13]:
knn = KNeighborsRegressor(n_neighbors=5)
knn.fit(x_train, y_train)
y_pred = knn.predict(x_test)
print(mean_absolute_error(y_test, y_pred))

120.54285714285717
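Here `n_neighbors=5` is fixed by hand, as are `n_estimators` and `C` for the other models. One way to actually choose such values is a cross-validated grid search on the training data only, keeping the test set for the final comparison. A sketch with `GridSearchCV`, using synthetic data as a stand-in for the dengue features and an arbitrary candidate grid; the scaler sits inside a `Pipeline` so each fold is scaled with its own training statistics:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the training set (not the dengue data)
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(60, 5))
y_demo = 100 * x_demo[:, 0] + rng.normal(scale=10, size=60)

pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsRegressor())])
search = GridSearchCV(
    pipe,
    param_grid={"knn__n_neighbors": [1, 3, 5, 7, 9]},
    scoring="neg_mean_absolute_error",  # sklearn maximises scores, so MAE is negated
    cv=5,
)
search.fit(x_demo, y_demo)
print(search.best_params_, -search.best_score_)  # best k and its cross-validated MAE
```

The same pattern applies to `RandomForestRegressor` or `SVR` by swapping the estimator and the parameter grid.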


#### 2) Random Forest

In [14]:
rf = RandomForestRegressor(n_estimators=50)
rf.fit(x_train, y_train)
y_pred1 = rf.predict(x_test)
print(mean_absolute_error(y_test, y_pred1))

129.47
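`RandomForestRegressor` is created here without `random_state`, so the bootstrap samples differ between runs and this score will not be exactly reproducible. Fixing the seed (0 below, an arbitrary value) makes repeated fits identical, as this sketch on synthetic data shows:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
x_demo = rng.normal(size=(50, 4))
y_demo = 3 * x_demo[:, 0] + rng.normal(size=50)

def fit_predict(seed):
    # Same hyperparameters as the notebook, plus an explicit seed
    rf = RandomForestRegressor(n_estimators=50, random_state=seed)
    rf.fit(x_demo, y_demo)
    return rf.predict(x_demo[:5])

# Same seed -> identical forests and therefore identical predictions
print(np.array_equal(fit_predict(0), fit_predict(0)))  # True
```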


#### 3) Support Vector Machine

In [15]:
clf = SVR(C=1200, tol=1e-3)
clf.fit(x_train, y_train)
y_pred2 = clf.predict(x_test)
print(mean_absolute_error(y_test, y_pred2))

64.64986258167876


### Comparing SVM Predictions with Test Values

In [16]:
## printing predicted and test data values

print(y_pred2)
print(y_test)

[  42.41296749  139.68107506  271.78165449  142.67924131    5.94500793
  294.48041103 1257.38458983   40.59084101   67.71874453    8.90655512
   17.54350292  187.22313303  871.04612987  380.7417203 ]
[  12  141  338   72    2  190 1362   10  217   18   10   58 1062  374]
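Mean absolute error is MAE = (1/n) * sum of |y_i - y_hat_i|. Recomputing it by hand from the printed SVM predictions and test values above reproduces the reported score and shows which sample dominates it (the values are copied from the printed output, so the last digits can differ slightly because of print rounding):

```python
import numpy as np

# Values copied from the printed output above
y_true = np.array([12, 141, 338, 72, 2, 190, 1362, 10, 217, 18, 10, 58, 1062, 374])
y_hat = np.array([42.41296749, 139.68107506, 271.78165449, 142.67924131, 5.94500793,
                  294.48041103, 1257.38458983, 40.59084101, 67.71874453, 8.90655512,
                  17.54350292, 187.22313303, 871.04612987, 380.7417203])

abs_err = np.abs(y_true - y_hat)
mae = abs_err.mean()
print(round(mae, 2))           # 64.65
print(int(abs_err.argmax()))   # 12 -> the 1062-case sample, predicted about 871
print((y_hat < 0).any())       # False: no negative predictions
```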


In [17]:
## Case counts cannot be negative, so clip any negative predictions to 0
## (none of the predictions above are negative, so the MAE is unchanged)

y_pred2 = np.clip(y_pred2, 0, None)
print(mean_absolute_error(y_test, y_pred2))

64.64986258167876
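Because true case counts are never negative, raising a negative prediction to 0 always moves it closer to the true value, so clipping can lower the MAE but never raise it. A sketch with `np.clip` on made-up numbers where one prediction is negative:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([0, 5, 20])
y_hat = np.array([-4.0, 3.0, 25.0])   # illustrative predictions, one negative

clipped = np.clip(y_hat, 0, None)     # negatives -> 0, other values unchanged
raw_mae = mean_absolute_error(y_true, y_hat)        # (4 + 2 + 5) / 3
clipped_mae = mean_absolute_error(y_true, clipped)  # (0 + 2 + 5) / 3
print(raw_mae, clipped_mae)
```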


### Conclusion
We tried three types of models: KNN, Random Forest and SVM (support vector regression). On the test set, SVM gave the best result, with a mean absolute error of about 64.65 cases, compared with 120.54 for KNN and 129.47 for Random Forest.