# IEEE MEGA PROJECT

**Team Name: BetaTech**                           
**Team Leader: Mollika Garg**                           
**Email Id: mollika.garg@gmail.com**

**Team Member: Shreya Sharma**                
**Email Id: shreyasharma.1510001@gmail.com**

**Team Member: Koushiki Chakrabarti**                     
**Email Id: koushikichakrabarti@gmail.com**

### PROJECT DETAILS

**Domain: Machine Learning**                                              
**Project Name: Tackling Dengue Cases**   

### PROJECT DESCRIPTION
This project predicts the number of dengue cases from climate factors using machine learning models. It also aims to locate potential dengue hotspots by detecting stagnant water in satellite data with thresholding techniques.
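
The notebook below covers only the climate-based prediction. As a rough illustration of the thresholding idea, here is a minimal sketch that flags water pixels using the Normalized Difference Water Index, NDWI = (Green - NIR) / (Green + NIR). The choice of NDWI, the function name `water_mask`, the threshold of 0 and the tiny reflectance arrays are all assumptions made for illustration; the project description does not say which index or threshold the team used.

```python
import numpy as np

def water_mask(green, nir, threshold=0.0):
    """Return a boolean mask that is True where NDWI exceeds the threshold.

    Hypothetical helper: NDWI and the default threshold are assumptions,
    not something stated in the project description.
    """
    green = np.asarray(green, dtype=float)
    nir = np.asarray(nir, dtype=float)
    denom = green + nir
    # Where both bands are 0 the ratio is undefined; treat NDWI as 0 there.
    ndwi = np.divide(green - nir, denom, out=np.zeros_like(denom), where=denom != 0)
    return ndwi > threshold

# Made-up 2x2 reflectance patches, for illustration only.
green = np.array([[0.30, 0.10], [0.25, 0.00]])
nir = np.array([[0.10, 0.40], [0.05, 0.00]])

mask = water_mask(green, nir)
print(mask)
```

In practice the threshold would need tuning against known water bodies in the area of interest.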

In [1]:
##IMPORTS

# used for manipulating directory paths
import os

# used to analyze data
import pandas as pd

# scientific and vector computation for python
import numpy as np

# for image visualisation
from matplotlib import pyplot as plt

# encode target labels 
from sklearn.preprocessing import LabelEncoder

# performs the task of Standardization
from sklearn.preprocessing import StandardScaler

# to find the error
from sklearn.metrics import mean_absolute_error

# used for training SVM
from sklearn.svm import SVR

# used for training KNN
from sklearn.neighbors import KNeighborsRegressor

# used for training Random Forest
from sklearn.ensemble import RandomForestRegressor

### READING DATA

In [3]:
## Read Data

malaria_features = pd.read_excel("C:\\Users\\molli\\OneDrive\\Desktop\\Data Set\\Malaria_Data.xlsx")
malaria_labels= malaria_features["No. of cases"]
malaria_features=malaria_features.drop(labels="No. of cases",axis=1)

In [4]:
## Displaying head of the data

malaria_features.head()

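| Unnamed: 0 | City | Year | Month | Precipitation Amt.(mm) | Humidity(%) | Avg Temp.(C) | Previous Cases |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | New Delhi | 2011 | Feb | 49.9 | 74 | 17.0 | 0 |
| 1 | New Delhi | 2011 | Mar | 2.3 | 62 | 23.0 | 1 |
| 2 | New Delhi | 2011 | Apr | 2.2 | 45 | 28.0 | 1 |
| 3 | New Delhi | 2011 | May | 33.4 | 45 | 33.0 | 5 |
| 4 | New Delhi | 2011 | Jun | 104.2 | 63 | 32.0 | 7 |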


In [5]:
## Encoding labels

lmap={"Jan":0,"Feb":1,"Mar":2,"Apr":3,"May":4,"Jun":5, "Jul":6, "Aug":7, "Sep":8, "Oct":9, "Nov":10, "Dec":11}
malaria_features["Month "]=malaria_features["Month "].map(lmap)
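
`Series.map` with a dictionary silently turns any value that is not a key into `NaN`, so a stray space or an unexpected spelling in the month column would go unnoticed. The sketch below shows one way to check for that. The sample series `months` is made up for illustration and is not taken from the dataset.

```python
import pandas as pd

lmap = {"Jan": 0, "Feb": 1, "Mar": 2, "Apr": 3, "May": 4, "Jun": 5,
        "Jul": 6, "Aug": 7, "Sep": 8, "Oct": 9, "Nov": 10, "Dec": 11}

# Made-up sample; the stray spaces and the "June" spelling are deliberate.
months = pd.Series(["Feb", " Mar", "Apr ", "June"])

# Strip surrounding whitespace before mapping.
encoded = months.str.strip().map(lmap)

# Values missing from the dictionary become NaN instead of raising an error.
unmapped = months[encoded.isna()]
print(encoded.tolist())
print(unmapped.tolist())
```

An empty `unmapped` series means every month label was encoded.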

In [6]:
## Dropping the feature 'City'

malaria_features=malaria_features.drop("City", axis=1)

In [7]:
## Displaying head of the data

malaria_features.head()

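| Unnamed: 0 | Year | Month | Precipitation Amt.(mm) | Humidity(%) | Avg Temp.(C) | Previous Cases |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 2011 | 1 | 49.9 | 74 | 17.0 | 0 |
| 1 | 2011 | 2 | 2.3 | 62 | 23.0 | 1 |
| 2 | 2011 | 3 | 2.2 | 45 | 28.0 | 1 |
| 3 | 2011 | 4 | 33.4 | 45 | 33.0 | 5 |
| 4 | 2011 | 5 | 104.2 | 63 | 32.0 | 7 |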


In [8]:
## storing feature values in X and labels in Y

X=malaria_features.values
Y=malaria_labels.values

In [9]:
## splitting data into training and testing sets

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X,Y,test_size = 0.2,random_state = 0) 

In [10]:
print(y_test)

[  4   0   6 214  78   1   3   3   0  28   8   5   3   1  47   3  68   1
   7   1  37   1   5  25]


In [11]:
## scaling the data 

scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)
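
The scaler is fitted on the training set only and then reused on the test set, so the test rows are standardized with the training mean and standard deviation. The following sketch checks this against a manual calculation. The arrays `x_train_demo` and `x_test_demo` are made-up values, not the project data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: three training rows and one test row with two features.
x_train_demo = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
x_test_demo = np.array([[4.0, 40.0]])

scaler_demo = StandardScaler().fit(x_train_demo)

# Manual standardization using the training statistics only.
mean = x_train_demo.mean(axis=0)
std = x_train_demo.std(axis=0)  # population std (ddof=0), which StandardScaler uses
manual = (x_test_demo - mean) / std

print(scaler_demo.transform(x_test_demo))
print(manual)
```

Both printed arrays should agree, which confirms that no test-set statistics enter the transformation.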

### Training the Models and Choosing the Best Hyperparameters

#### 1) K Nearest Neighbours

In [17]:
knn = KNeighborsRegressor(n_neighbors=4)
knn.fit(x_train, y_train)
y_pred = knn.predict(x_test)
print(mean_absolute_error(y_test, y_pred))

16.770833333333332
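
The notebook shows only the final value `n_neighbors=4`, not how it was selected. One common way to choose it is a cross-validated grid search on the training data, sketched below. The synthetic arrays `X_demo` and `y_demo` and the candidate grid are assumptions for illustration; this is not a record of the search the team ran.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for the scaled training data (assumption).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(96, 7))
y_demo = np.abs(X_demo[:, 0] * 10 + rng.normal(size=96))

search = GridSearchCV(
    KNeighborsRegressor(),
    param_grid={"n_neighbors": [2, 3, 4, 5, 6, 8]},  # hypothetical candidates
    scoring="neg_mean_absolute_error",  # scikit-learn maximizes, hence the sign
    cv=5,
)
search.fit(X_demo, y_demo)

print(search.best_params_)
print(-search.best_score_)  # cross-validated MAE of the best setting
```

Selecting the hyperparameter on training folds keeps the test set untouched for the final comparison.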


#### 2) Random Forest

In [22]:
rf = RandomForestRegressor(n_estimators=200)
rf.fit(x_train, y_train)
y_pred1 = rf.predict(x_test)
print(mean_absolute_error(y_test, y_pred1))

17.485208333333333


#### 3) Support Vector Machine

In [29]:
clf = SVR(C=3000, tol=1e-3)
clf.fit(x_train, y_train)
y_pred2 = clf.predict(x_test)
print(mean_absolute_error(y_test, y_pred2))

17.642641738148946
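
The three errors above come from a single split with 24 test rows, so the gaps between them are small relative to what a different split might produce. A cross-validated comparison is one way to get a steadier estimate. The sketch below uses synthetic data (`X_demo`, `y_demo`) as a stand-in for the project dataset, and the fixed `random_state` for the forest is an addition for reproducibility; the hyperparameters otherwise match the cells above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the features and labels (assumption).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 7))
y_demo = np.abs(X_demo[:, 0] * 10 + rng.normal(size=120))

models = {
    "KNN": KNeighborsRegressor(n_neighbors=4),
    "Random Forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "SVM": SVR(C=3000, tol=1e-3),
}

cv_mae = {}
for name, model in models.items():
    # The pipeline refits the scaler inside each fold, on that fold's training part.
    pipeline = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(
        pipeline, X_demo, y_demo, cv=5, scoring="neg_mean_absolute_error"
    )
    cv_mae[name] = -scores.mean()
    print(f"{name}: {cv_mae[name]:.2f}")
```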


### Comparing Predicted and Test Data Values for the KNN Model

In [30]:
## printing predicted and test data values

print(y_pred)
print(y_test)

[  5.5    0.75   0.75  57.25  47.25   8.25   8.     7.25   3.25  10.
  50.25   8.    12.75   0.75  51.25  12.75 120.5   12.75   2.     4.25
  13.75   0.75   3.5   22.  ]
[  4   0   6 214  78   1   3   3   0  28   8   5   3   1  47   3  68   1
   7   1  37   1   5  25]
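
The two printed arrays can be compared element by element to see where the error comes from. The sketch below copies the printed values (the predictions are those of `y_pred`, which was assigned in the KNN cell) and recomputes the mean absolute error with NumPy. The variable names `pred`, `actual` and `abs_err` are introduced here for illustration.

```python
import numpy as np

# Values copied from the printed output above.
pred = np.array([5.5, 0.75, 0.75, 57.25, 47.25, 8.25, 8.0, 7.25, 3.25, 10.0,
                 50.25, 8.0, 12.75, 0.75, 51.25, 12.75, 120.5, 12.75, 2.0, 4.25,
                 13.75, 0.75, 3.5, 22.0])
actual = np.array([4, 0, 6, 214, 78, 1, 3, 3, 0, 28, 8, 5, 3, 1, 47, 3, 68, 1,
                   7, 1, 37, 1, 5, 25])

abs_err = np.abs(actual - pred)
mae = abs_err.mean()
worst = int(abs_err.argmax())

print(mae)
print(worst, actual[worst], pred[worst], abs_err[worst])
```

The largest single error is the test row with 214 actual cases, which is predicted as 57.25. That one row contributes 156.75 of the total absolute error of 402.5.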


In [31]:
## Case counts cannot be negative, so clip any negative predictions to 0
## (the predictions printed above are already non-negative, so the error is unchanged)

y_pred = np.clip(y_pred, 0, None)
print(mean_absolute_error(y_test, y_pred))

16.770833333333332


### Conclusion
We tried three types of models: KNN, Random Forest and SVM. KNN gave the best result on the test set, with a mean absolute error of 16.77, compared with 17.49 for Random Forest and 17.64 for SVM.