# Instructions


In this notebook, you will practice all the classification algorithms that we have learned in this course.


Below, we use these algorithms to build models from the training data and then evaluate them on the test data with the evaluation metrics learned in the course.

We will use some of the algorithms taught in the course, specifically:

1. Linear Regression
2. KNN
3. Decision Trees
4. Logistic Regression
5. SVM

We will evaluate our models using:

1.  Accuracy Score
2.  Jaccard Index
3.  F1-Score
4.  LogLoss
5.  Mean Absolute Error
6.  Mean Squared Error
7.  R2-Score
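To make the classification metrics above concrete, here is a small sketch on hypothetical toy labels (`y_true`, `y_pred` and `p_one` are invented for illustration). It shows that Accuracy, the Jaccard index and F1-Score are computed from hard 0/1 labels, while LogLoss needs the predicted probability of the positive class.

```python
import numpy as np
from sklearn.metrics import accuracy_score, jaccard_score, f1_score, log_loss

# Hypothetical toy data: true labels, hard predictions, and predicted P(class = 1)
y_true = np.array([1, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0])
p_one = np.array([0.9, 0.2, 0.4, 0.8, 0.6, 0.1])

# Confusion counts for the positive class (1)
tp = np.sum((y_true == 1) & (y_pred == 1))
fp = np.sum((y_true == 0) & (y_pred == 1))
fn = np.sum((y_true == 1) & (y_pred == 0))

acc = accuracy_score(y_true, y_pred)   # fraction of matching labels
jac = jaccard_score(y_true, y_pred)    # TP / (TP + FP + FN)
f1 = f1_score(y_true, y_pred)          # 2TP / (2TP + FP + FN)
ll = log_loss(y_true, p_one)           # uses probabilities, not hard labels

# Manual log loss: mean of -[y*log(p) + (1 - y)*log(1 - p)]
ll_manual = -np.mean(y_true * np.log(p_one) + (1 - y_true) * np.log(1 - p_one))

print(acc, jac, f1, ll)
```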

Finally, you will use your models to generate the report at the end.


# About The Dataset


The original source of the data is the Australian Government's Bureau of Meteorology, and the latest data can be gathered from [http://www.bom.gov.au/climate/dwo/](http://www.bom.gov.au/climate/dwo/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkML0101ENSkillsNetwork20718538-2022-01-01).

The dataset used here has extra columns, such as 'RainToday', and our target is 'RainTomorrow'. This dataset was gathered from Rattle at [https://bitbucket.org/kayontoga/rattle/src/master/data/weatherAUS.RData](https://bitbucket.org/kayontoga/rattle/src/master/data/weatherAUS.RData?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkML0101ENSkillsNetwork20718538-2022-01-01).




## **Import the required libraries**


In [None]:
# Suppress warnings by replacing warnings.warn with a no-op for the whole session
import warnings

def warn(*args, **kwargs):
    pass

warnings.warn = warn
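Overwriting `warnings.warn` silences every warning for the rest of the session. If you only want to hide warnings around particular calls, the standard library's `warnings.catch_warnings` context manager is a scoped alternative. A minimal sketch; `noisy` is a hypothetical function standing in for any call that warns:

```python
import warnings

def noisy():
    # Hypothetical function that emits a warning and returns a value
    warnings.warn("this is a demo warning", UserWarning)
    return 42

# Warnings are ignored only inside this block; the previous filters are restored on exit
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = noisy()
```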

In [None]:
import numpy as np
import pandas as pd
from sklearn import preprocessing, svm
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
import sklearn.metrics as metrics
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             jaccard_score, log_loss, r2_score)

### Importing the Dataset


In [None]:
df = pd.read_csv("Weather_Data.csv")
df.head()

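| | Date | MinTemp | MaxTemp | Rainfall | Evaporation | Sunshine | WindGustDir | WindGustSpeed | WindDir9am | WindDir3pm | ... | Humidity9am | Humidity3pm | Pressure9am | Pressure3pm | Cloud9am | Cloud3pm | Temp9am | Temp3pm | RainToday | RainTomorrow |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2/1/2008 | 19.5 | 22.4 | 15.6 | 6.2 | 0.0 | W | 41 | S | SSW | ... | 92 | 84 | 1017.6 | 1017.4 | 8 | 8 | 20.7 | 20.9 | Yes | Yes |
| 1 | 2/2/2008 | 19.5 | 25.6 | 6.0 | 3.4 | 2.7 | W | 41 | W | E | ... | 83 | 73 | 1017.9 | 1016.4 | 7 | 7 | 22.4 | 24.8 | Yes | Yes |
| 2 | 2/3/2008 | 21.6 | 24.5 | 6.6 | 2.4 | 0.1 | W | 41 | ESE | ESE | ... | 88 | 86 | 1016.7 | 1015.6 | 7 | 8 | 23.5 | 23.0 | Yes | Yes |
| 3 | 2/4/2008 | 20.2 | 22.8 | 18.8 | 2.2 | 0.0 | W | 41 | NNE | E | ... | 83 | 90 | 1014.2 | 1011.8 | 8 | 8 | 21.4 | 20.9 | Yes | Yes |
| 4 | 2/5/2008 | 19.7 | 25.7 | 77.4 | 4.8 | 0.0 | W | 41 | NNE | W | ... | 88 | 74 | 1008.3 | 1004.8 | 8 | 8 | 22.5 | 25.5 | Yes | Yes |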


### Training Data and Test Data


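Next, we define our features (the X values) and our target variable Y. The cells below work on `df_sydney_processed`, a preprocessed copy of `df` in which the text columns other than `Date` have already been encoded as numbers (otherwise the `astype(float)` cast would fail); the cell that creates it is not included in this notebook.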
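One plausible version of the missing preprocessing is sketched below. It assumes the only text columns besides `Date` are `WindGustDir`, `WindDir9am`, `WindDir3pm`, `RainToday` and `RainTomorrow`, as visible in `df.head()` (the `...` in that output could hide others). `preprocess_weather` is a hypothetical helper: it one-hot encodes the categorical columns with `pd.get_dummies` and maps the target to 0/1, leaving `Date` for the next cell to drop.

```python
import pandas as pd

# Assumed list of text columns to one-hot encode (taken from the columns visible in df.head())
CATEGORICAL = ['RainToday', 'WindGustDir', 'WindDir9am', 'WindDir3pm']

def preprocess_weather(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one-hot encode CATEGORICAL and encode RainTomorrow as 0/1."""
    out = pd.get_dummies(df, columns=CATEGORICAL)
    # Target encoding: 'No' -> 0, 'Yes' -> 1
    out['RainTomorrow'] = out['RainTomorrow'].map({'No': 0, 'Yes': 1})
    # 'Date' is kept here because the notebook drops it in the next cell
    return out

# Hypothetical two-row frame using a subset of the real column names
toy = pd.DataFrame({
    'Date': ['2/1/2008', '2/2/2008'],
    'MinTemp': [19.5, 19.5],
    'WindGustDir': ['W', 'W'],
    'WindDir9am': ['S', 'W'],
    'WindDir3pm': ['SSW', 'E'],
    'RainToday': ['Yes', 'No'],
    'RainTomorrow': ['Yes', 'No'],
})

processed = preprocess_weather(toy)
print(processed.columns.tolist())
```

Under these assumptions, the notebook would call `df_sydney_processed = preprocess_weather(df)` before the cell that drops `Date`.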


In [None]:
df_sydney_processed.drop('Date',axis=1,inplace=True)

In [None]:
df_sydney_processed = df_sydney_processed.astype(float)

In [None]:
features = df_sydney_processed.drop(columns='RainTomorrow')
Y = df_sydney_processed['RainTomorrow']

### Linear Regression


In [None]:
x_train, x_test, y_train, y_test = train_test_split(features,Y,test_size=0.2,random_state=10)
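`random_state` fixes the shuffle used by `train_test_split`, so the same value always reproduces the same split. Linear Regression, KNN and the Decision Tree share the split made here with `random_state=10`, while Logistic Regression and SVM are re-split later with `random_state=1`, so their test sets generally differ. A small sketch on hypothetical data (`X` and `y` are invented):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 20 samples with one feature, labels alternating 0/1
X = np.arange(20).reshape(-1, 1)
y = np.arange(20) % 2

# Same data and test_size; only random_state changes
_, X_test_10, _, _ = train_test_split(X, y, test_size=0.2, random_state=10)
_, X_test_1, _, _ = train_test_split(X, y, test_size=0.2, random_state=1)
_, X_test_10_again, _, _ = train_test_split(X, y, test_size=0.2, random_state=10)

# Repeating random_state=10 reproduces the same test rows
print(sorted(X_test_10.ravel()), sorted(X_test_1.ravel()))
```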

In [None]:
LinearReg = LinearRegression().fit(x_train,y_train)

In [None]:
predictions = LinearReg.predict(x_test)

In [None]:
# Regression metrics; r2_score expects (y_true, y_pred)
LinearRegression_MAE = np.mean(abs(predictions - y_test))
LinearRegression_MSE = np.mean((predictions - y_test) ** 2)
LinearRegression_R2 = r2_score(y_test, predictions)
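The MAE and MSE above are computed by hand; `sklearn.metrics` provides `mean_absolute_error` and `mean_squared_error` for the same quantities. `r2_score` takes `(y_true, y_pred)`, and because R² = 1 - SS_res / SS_tot normalizes by the variance of the first argument, swapping the arguments changes the result. A sketch with hypothetical numbers:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical 0/1 targets and continuous predictions from a linear model
y_true = np.array([0.0, 1.0, 0.0, 1.0])
y_pred = np.array([0.1, 0.7, 0.3, 0.9])

mae = mean_absolute_error(y_true, y_pred)   # mean |y - y_hat|
mse = mean_squared_error(y_true, y_pred)    # mean (y - y_hat)^2
r2 = r2_score(y_true, y_pred)               # 1 - SS_res / SS_tot, SS_tot from y_true

# Manual versions matching the np.mean expressions in the notebook
mae_manual = np.mean(np.abs(y_pred - y_true))
mse_manual = np.mean((y_pred - y_true) ** 2)
r2_manual = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

# Swapping the arguments makes SS_tot come from y_pred instead
r2_swapped = r2_score(y_pred, y_true)
```

With these toy values, R² is 0.8 in the documented argument order and 0.5 when the arguments are swapped.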

In [None]:
Report = {"Metrics":["MAE","MSE","R2"],"Result":
    [LinearRegression_MAE,LinearRegression_MSE,LinearRegression_R2]}
pd.DataFrame(Report)

| | Metrics | Result |
|---|---|---|
| 0 | MAE | 0.256309 |
| 1 | MSE | 0.115719 |
| 2 | R2 | -0.384252 |


### KNN


In [None]:
KNN = KNeighborsClassifier(n_neighbors=4).fit(x_train, y_train)

In [None]:
predictions = KNN.predict(x_test)

In [None]:
# sklearn metrics take (y_true, y_pred)
KNN_Accuracy_Score = accuracy_score(y_test, predictions)
KNN_JaccardIndex = jaccard_score(y_test, predictions)
KNN_F1_Score = f1_score(y_test, predictions)
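`n_neighbors=4` is a fixed choice. One common way to pick `k` is to compare cross-validated accuracy on the training split and then score the chosen model once on the test split, so the test set is not used for tuning. A sketch on hypothetical synthetic data from `make_classification` (the range 1 to 10 is arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical synthetic binary data standing in for the weather features
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=10)

# Mean 5-fold cross-validated accuracy on the training split for k = 1..10
cv_scores = {}
for k in range(1, 11):
    model = KNeighborsClassifier(n_neighbors=k)
    cv_scores[k] = cross_val_score(model, X_tr, y_tr, cv=5).mean()

best_k = max(cv_scores, key=cv_scores.get)

# Refit with the chosen k and score once on the untouched test split
final = KNeighborsClassifier(n_neighbors=best_k).fit(X_tr, y_tr)
test_acc = accuracy_score(y_te, final.predict(X_te))
print(best_k, test_acc)
```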

### Decision Tree


In [None]:
Tree = DecisionTreeClassifier().fit(x_train, y_train)

In [None]:
predictions = Tree.predict(x_test)

In [None]:
# sklearn metrics take (y_true, y_pred)
Tree_Accuracy_Score = accuracy_score(y_test, predictions)
Tree_JaccardIndex = jaccard_score(y_test, predictions)
Tree_F1_Score = f1_score(y_test, predictions)
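`DecisionTreeClassifier()` is created here without `random_state`. scikit-learn permutes the features at each split, so when several splits are equally good the chosen one can differ between runs; passing `random_state` makes the fit reproducible, and `export_text` prints the learned rules. A sketch on hypothetical synthetic data (`max_depth=3` and the feature names are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical synthetic data with 4 features
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
names = ['f0', 'f1', 'f2', 'f3']  # illustrative feature names

# Same random_state and data give identical trees; max_depth=3 keeps the printout short
tree_a = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
tree_b = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

rules = export_text(tree_a, feature_names=names)
print(rules)
```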

### Logistic Regression


In [None]:
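# Re-split with random_state=1: Logistic Regression and SVM use a different test set than KNN and the Decision Tree
x_train, x_test, y_train, y_test = train_test_split(features, Y, test_size=0.2, random_state=1)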

In [None]:
LR = LogisticRegression(solver='liblinear').fit(x_train, y_train)

In [None]:
predictions = LR.predict(x_test)

In [None]:
predict_proba = LR.predict_proba(x_test)

In [None]:
LR_Accuracy_Score = accuracy_score(y_test, predictions)
LR_JaccardIndex = jaccard_score(y_test, predictions)
LR_F1_Score = f1_score(y_test, predictions)
# log_loss needs the true labels and the predicted probabilities, not hard labels
LR_Log_Loss = log_loss(y_test, predict_proba)
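`log_loss` expects the true labels first and predicted probabilities second, which is what `predict_proba` returns (one column per class, in the order of `classes_`). The sketch below, on hypothetical synthetic data, checks `log_loss` against the formula -mean(log p(true class)) and also computes the value obtained by passing hard 0/1 predictions instead: each misclassified sample then gets probability 0 for its true class, which is clipped to a tiny value and contributes a very large penalty.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

# Hypothetical synthetic binary data
X, y = make_classification(n_samples=200, n_features=5, random_state=1)
clf = LogisticRegression(solver='liblinear').fit(X, y)

proba = clf.predict_proba(X)   # shape (n_samples, 2); columns follow clf.classes_ = [0, 1]
hard = clf.predict(X)          # hard 0/1 labels

# Intended use: true labels first, predicted probabilities second
ll_proba = log_loss(y, proba)

# Manual check: negative mean log of the probability assigned to the true class
ll_manual = -np.mean(np.log(proba[np.arange(len(y)), y]))

# Hard labels treated as probabilities: every mistake hits the clipping limit
ll_hard = log_loss(y, hard)
print(ll_proba, ll_hard)
```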

### SVM


In [None]:
SVM = svm.SVC().fit(x_train, y_train)

In [None]:
predictions = SVM.predict(x_test)

In [None]:
SVM_Accuracy_Score = accuracy_score(y_test, predictions)
SVM_JaccardIndex = jaccard_score(y_test, predictions)
SVM_F1_Score = f1_score(y_test, predictions)
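In the report below, SVM has an accuracy of about 0.72 but a Jaccard index and F1-Score of 0.0. Both metrics are 0.0 exactly when there are no true positives, for example when the model never predicts the positive class (rain tomorrow). `confusion_matrix`, imported at the top but not otherwise used, is a quick way to check. A sketch with hypothetical labels and an always-0 predictor:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, jaccard_score

# Hypothetical test labels (2 positives out of 8) and a model that always predicts 0
y_true = np.array([0, 0, 0, 1, 0, 0, 1, 0])
y_pred = np.zeros_like(y_true)

# Rows are true classes, columns predicted classes: [[TN, FP], [FN, TP]]
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

acc = accuracy_score(y_true, y_pred)            # comes entirely from the negatives
jac = jaccard_score(y_true, y_pred)             # TP / (TP + FP + FN) with TP = 0
f1 = f1_score(y_true, y_pred, zero_division=0)  # no true positives, so 0
print(cm, acc, jac, f1)
```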

### Report


#### Q19) Show the Accuracy, Jaccard Index, F1-Score and LogLoss in a tabular format using a DataFrame for all of the above models.

\*LogLoss applies only to the Logistic Regression model.


In [None]:
# Linear Regression is left out: it is evaluated with MAE, MSE and R2 in its own report above,
# not with these classification metrics
Report = {"Metrics": ["Accuracy_Score", "JaccardIndex", "F1_Score", "log loss"],
          "Result DT": [Tree_Accuracy_Score, Tree_JaccardIndex, Tree_F1_Score, '-'],
          "Result KNN": [KNN_Accuracy_Score, KNN_JaccardIndex, KNN_F1_Score, '-'],
          "Result SVM": [SVM_Accuracy_Score, SVM_JaccardIndex, SVM_F1_Score, '-'],
          "Result LR": [LR_Accuracy_Score, LR_JaccardIndex, LR_F1_Score, LR_Log_Loss]}
pd.DataFrame(Report)

| | Metrics | Result DT | Result KNN | Result SVM | Result LR |
|---|---|---|---|---|---|
| 0 | Accuracy_Score | 0.755725 | 0.818321 | 0.722137 | 0.836641 |
| 1 | JaccardIndex | 0.402985 | 0.425121 | 0.0 | 0.509174 |
| 2 | F1_Score | 0.574468 | 0.59661 | 0.0 | 0.674772 |
| 3 | log loss | - | - | - | 5.888047 |
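Writing `'-'` into the table turns each model column into an object column of mixed strings and floats. A hypothetical helper, `classification_report_frame` (not a scikit-learn function), could build a similar table from predictions with `NaN` for a missing log loss, keeping every column numeric. The toy inputs below are invented for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score, jaccard_score, log_loss

def classification_report_frame(y_true, results):
    """Hypothetical helper: results maps model name -> (hard predictions, P(class=1) or None)."""
    rows = {}
    for name, (y_pred, p_one) in results.items():
        rows[name] = {
            'Accuracy_Score': accuracy_score(y_true, y_pred),
            'JaccardIndex': jaccard_score(y_true, y_pred),
            'F1_Score': f1_score(y_true, y_pred, zero_division=0),
            # NaN instead of '-' keeps the column float-typed
            'log loss': log_loss(y_true, p_one) if p_one is not None else np.nan,
        }
    # Outer keys become columns (models), inner keys become the index (metrics)
    return pd.DataFrame(rows)

y_true = np.array([1, 0, 1, 0])
report = classification_report_frame(y_true, {
    'Result KNN': (np.array([1, 0, 0, 0]), None),
    'Result LR': (np.array([1, 0, 1, 1]), np.array([0.8, 0.3, 0.7, 0.6])),
})
print(report)
```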
