# **Rain Prediction in Australia using Classification**

Algorithms that will be used in this project:

1. Linear Regression
2. KNN
3. Decision Trees
4. Logistic Regression
5. SVM

Evaluation methods that will be used in this project:

1.  Accuracy Score
2.  Jaccard Index
3.  F1-Score
4.  LogLoss
5.  Mean Absolute Error
6.  Mean Squared Error
7.  R2-Score
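
As a rough illustration of what the first three classification metrics measure, the sketch below computes them by hand from the counts of true/false positives and negatives. The function name and the example labels are made up for this illustration; the notebook itself relies on the `sklearn.metrics` functions.

```python
def binary_classification_metrics(y_true, y_pred):
    """Return accuracy, Jaccard index and F1 for 0/1 labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Jaccard: overlap of predicted and actual positives over their union.
    jaccard = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    # F1: harmonic mean of precision and recall, written in count form.
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return {"accuracy": accuracy, "jaccard": jaccard, "f1": f1}


# Made-up labels for illustration only (not taken from the weather data).
example_true = [1, 0, 1, 1, 0, 0, 1, 0]
example_pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = binary_classification_metrics(example_true, example_pred)
print(scores)
```

For binary labels the two overlap scores are linked by F1 = 2J / (1 + J), so they always rank models in the same order.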

## About The Dataset


The original source of the data is the Australian Government's Bureau of Meteorology, and the latest data can be gathered from [http://www.bom.gov.au/climate/dwo/](http://www.bom.gov.au/climate/dwo/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkML0101ENSkillsNetwork20718538-2022-01-01).

The dataset used here has extra columns such as 'RainToday' and our target, 'RainTomorrow'. It was gathered from Rattle at [https://bitbucket.org/kayontoga/rattle/src/master/data/weatherAUS.RData](https://bitbucket.org/kayontoga/rattle/src/master/data/weatherAUS.RData?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkML0101ENSkillsNetwork20718538-2022-01-01).




This dataset contains observations of weather metrics for each day from 2008 to 2017. The **weatherAUS.csv** dataset includes the following fields:

| Field         | Description                                           | Unit            | Type   |
| ------------- | ----------------------------------------------------- | --------------- | ------ |
| Date          | Date of the Observation in YYYY-MM-DD                 | Date            | object |
| Location      | Location of the Observation                           | Location        | object |
| MinTemp       | Minimum temperature                                   | Celsius         | float  |
| MaxTemp       | Maximum temperature                                   | Celsius         | float  |
| Rainfall      | Amount of rainfall                                    | Millimeters     | float  |
| Evaporation   | Amount of evaporation                                 | Millimeters     | float  |
| Sunshine      | Amount of bright sunshine                             | hours           | float  |
| WindGustDir   | Direction of the strongest gust                       | Compass Points  | object |
| WindGustSpeed | Speed of the strongest gust                           | Kilometers/Hour | float  |
| WindDir9am    | Wind direction averaged over 10 minutes prior to 9am  | Compass Points  | object |
| WindDir3pm    | Wind direction averaged over 10 minutes prior to 3pm  | Compass Points  | object |
| WindSpeed9am  | Wind speed averaged over 10 minutes prior to 9am      | Kilometers/Hour | float  |
| WindSpeed3pm  | Wind speed averaged over 10 minutes prior to 3pm      | Kilometers/Hour | float  |
| Humidity9am   | Humidity at 9am                                       | Percent         | float  |
| Humidity3pm   | Humidity at 3pm                                       | Percent         | float  |
| Pressure9am   | Atmospheric pressure reduced to mean sea level at 9am | Hectopascal     | float  |
| Pressure3pm   | Atmospheric pressure reduced to mean sea level at 3pm | Hectopascal     | float  |
| Cloud9am      | Fraction of the sky obscured by cloud at 9am          | Eighths         | float  |
| Cloud3pm      | Fraction of the sky obscured by cloud at 3pm          | Eighths         | float  |
| Temp9am       | Temperature at 9am                                    | Celsius         | float  |
| Temp3pm       | Temperature at 3pm                                    | Celsius         | float  |
| RainToday     | If there was rain today                               | Yes/No          | object |
| RainTomorrow  | If there is rain tomorrow                             | Yes/No          | object |

Column definitions were gathered from [http://www.bom.gov.au/climate/dwo/IDCJDW0000.shtml](http://www.bom.gov.au/climate/dwo/IDCJDW0000.shtml?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkML0101ENSkillsNetwork20718538-2022-01-01)



## Import libraries


In [1]:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import LinearRegression
from sklearn import preprocessing
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn import svm
from sklearn.metrics import jaccard_score
from sklearn.metrics import f1_score
from sklearn.metrics import log_loss
from sklearn.metrics import confusion_matrix, accuracy_score
import sklearn.metrics as metrics

### Importing the Dataset


In [20]:
df = pd.read_csv('/Users/rafihidayat/Desktop/python learning/ML with Python/Weather_Data.csv')
pd.options.display.max_columns = 100
df.head()

Unnamed: 0,Date,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine,WindGustDir,WindGustSpeed,WindDir9am,WindDir3pm,WindSpeed9am,WindSpeed3pm,Humidity9am,Humidity3pm,Pressure9am,Pressure3pm,Cloud9am,Cloud3pm,Temp9am,Temp3pm,RainToday,RainTomorrow
0,2/1/2008,19.5,22.4,15.6,6.2,0.0,W,41,S,SSW,17,20,92,84,1017.6,1017.4,8,8,20.7,20.9,Yes,Yes
1,2/2/2008,19.5,25.6,6.0,3.4,2.7,W,41,W,E,9,13,83,73,1017.9,1016.4,7,7,22.4,24.8,Yes,Yes
2,2/3/2008,21.6,24.5,6.6,2.4,0.1,W,41,ESE,ESE,17,2,88,86,1016.7,1015.6,7,8,23.5,23.0,Yes,Yes
3,2/4/2008,20.2,22.8,18.8,2.2,0.0,W,41,NNE,E,22,20,83,90,1014.2,1011.8,8,8,21.4,20.9,Yes,Yes
4,2/5/2008,19.7,25.7,77.4,4.8,0.0,W,41,NNE,W,11,6,88,74,1008.3,1004.8,8,8,22.5,25.5,Yes,Yes


### Data Preprocessing


#### One Hot Encoding


Use one-hot encoding to convert the categorical variables into binary variables.


In [21]:
df_processed = pd.get_dummies(data=df, columns=['RainToday', 'WindGustDir', 'WindDir9am', 'WindDir3pm'])
df_processed

Unnamed: 0,Date,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine,WindGustSpeed,WindSpeed9am,WindSpeed3pm,Humidity9am,Humidity3pm,Pressure9am,Pressure3pm,Cloud9am,Cloud3pm,Temp9am,Temp3pm,RainTomorrow,RainToday_No,RainToday_Yes,WindGustDir_E,WindGustDir_ENE,WindGustDir_ESE,WindGustDir_N,WindGustDir_NE,WindGustDir_NNE,WindGustDir_NNW,WindGustDir_NW,WindGustDir_S,WindGustDir_SE,WindGustDir_SSE,WindGustDir_SSW,WindGustDir_SW,WindGustDir_W,WindGustDir_WNW,WindGustDir_WSW,WindDir9am_E,WindDir9am_ENE,WindDir9am_ESE,WindDir9am_N,WindDir9am_NE,WindDir9am_NNE,WindDir9am_NNW,WindDir9am_NW,WindDir9am_S,WindDir9am_SE,WindDir9am_SSE,WindDir9am_SSW,WindDir9am_SW,WindDir9am_W,WindDir9am_WNW,WindDir9am_WSW,WindDir3pm_E,WindDir3pm_ENE,WindDir3pm_ESE,WindDir3pm_N,WindDir3pm_NE,WindDir3pm_NNE,WindDir3pm_NNW,WindDir3pm_NW,WindDir3pm_S,WindDir3pm_SE,WindDir3pm_SSE,WindDir3pm_SSW,WindDir3pm_SW,WindDir3pm_W,WindDir3pm_WNW,WindDir3pm_WSW
0,2/1/2008,19.5,22.4,15.6,6.2,0.0,41,17,20,92,84,1017.6,1017.4,8,8,20.7,20.9,Yes,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False
1,2/2/2008,19.5,25.6,6.0,3.4,2.7,41,9,13,83,73,1017.9,1016.4,7,7,22.4,24.8,Yes,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
2,2/3/2008,21.6,24.5,6.6,2.4,0.1,41,17,2,88,86,1016.7,1015.6,7,8,23.5,23.0,Yes,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False
3,2/4/2008,20.2,22.8,18.8,2.2,0.0,41,22,20,83,90,1014.2,1011.8,8,8,21.4,20.9,Yes,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
4,2/5/2008,19.7,25.7,77.4,4.8,0.0,41,11,6,88,74,1008.3,1004.8,8,8,22.5,25.5,Yes,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3266,6/21/2017,8.6,19.6,0.0,2.0,7.8,37,22,20,73,52,1025.9,1025.3,2,2,10.5,17.9,No,True,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False
3267,6/22/2017,9.3,19.2,0.0,2.0,9.2,30,20,7,78,53,1028.5,1024.6,2,2,11.0,18.7,No,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False
3268,6/23/2017,9.4,17.7,0.0,2.4,2.7,24,15,13,85,56,1020.8,1015.0,6,6,10.2,17.3,No,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False
3269,6/24/2017,10.1,19.3,0.0,1.4,9.3,43,17,19,56,35,1017.3,1015.1,5,2,12.4,19.0,No,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False
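
To make the transformation concrete, here is a hand-rolled sketch of one-hot encoding for a single column. It mimics the `<column>_<category>` naming seen in the output above, but the helper `one_hot` and the sample values are hypothetical and are not part of pandas or of the dataset.

```python
def one_hot(column_name, values):
    """Expand a list of category labels into one boolean list per category."""
    categories = sorted(set(values))
    return {
        f"{column_name}_{category}": [value == category for value in values]
        for category in categories
    }


# Made-up sample of a wind-direction column (illustrative values only).
wind_gust_dir = ["W", "E", "W", "SSE"]
encoded = one_hot("WindGustDir", wind_gust_dir)
for name, flags in encoded.items():
    print(name, flags)
```

Each row ends up with exactly one `True` among the new columns, which is why a single categorical column with 16 compass points becomes 16 boolean columns.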


Replace the 'No'/'Yes' values of the 'RainTomorrow' column with 0/1, changing it from a categorical column to a binary column.


In [22]:
df_processed.replace(['No', 'Yes'], [0,1], inplace=True)
df_processed

Unnamed: 0,Date,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine,WindGustSpeed,WindSpeed9am,WindSpeed3pm,Humidity9am,Humidity3pm,Pressure9am,Pressure3pm,Cloud9am,Cloud3pm,Temp9am,Temp3pm,RainTomorrow,RainToday_No,RainToday_Yes,WindGustDir_E,WindGustDir_ENE,WindGustDir_ESE,WindGustDir_N,WindGustDir_NE,WindGustDir_NNE,WindGustDir_NNW,WindGustDir_NW,WindGustDir_S,WindGustDir_SE,WindGustDir_SSE,WindGustDir_SSW,WindGustDir_SW,WindGustDir_W,WindGustDir_WNW,WindGustDir_WSW,WindDir9am_E,WindDir9am_ENE,WindDir9am_ESE,WindDir9am_N,WindDir9am_NE,WindDir9am_NNE,WindDir9am_NNW,WindDir9am_NW,WindDir9am_S,WindDir9am_SE,WindDir9am_SSE,WindDir9am_SSW,WindDir9am_SW,WindDir9am_W,WindDir9am_WNW,WindDir9am_WSW,WindDir3pm_E,WindDir3pm_ENE,WindDir3pm_ESE,WindDir3pm_N,WindDir3pm_NE,WindDir3pm_NNE,WindDir3pm_NNW,WindDir3pm_NW,WindDir3pm_S,WindDir3pm_SE,WindDir3pm_SSE,WindDir3pm_SSW,WindDir3pm_SW,WindDir3pm_W,WindDir3pm_WNW,WindDir3pm_WSW
0,2/1/2008,19.5,22.4,15.6,6.2,0.0,41,17,20,92,84,1017.6,1017.4,8,8,20.7,20.9,1,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False
1,2/2/2008,19.5,25.6,6.0,3.4,2.7,41,9,13,83,73,1017.9,1016.4,7,7,22.4,24.8,1,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
2,2/3/2008,21.6,24.5,6.6,2.4,0.1,41,17,2,88,86,1016.7,1015.6,7,8,23.5,23.0,1,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False
3,2/4/2008,20.2,22.8,18.8,2.2,0.0,41,22,20,83,90,1014.2,1011.8,8,8,21.4,20.9,1,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
4,2/5/2008,19.7,25.7,77.4,4.8,0.0,41,11,6,88,74,1008.3,1004.8,8,8,22.5,25.5,1,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3266,6/21/2017,8.6,19.6,0.0,2.0,7.8,37,22,20,73,52,1025.9,1025.3,2,2,10.5,17.9,0,True,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False
3267,6/22/2017,9.3,19.2,0.0,2.0,9.2,30,20,7,78,53,1028.5,1024.6,2,2,11.0,18.7,0,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False
3268,6/23/2017,9.4,17.7,0.0,2.4,2.7,24,15,13,85,56,1020.8,1015.0,6,6,10.2,17.3,0,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False
3269,6/24/2017,10.1,19.3,0.0,1.4,9.3,43,17,19,56,35,1017.3,1015.1,5,2,12.4,19.0,0,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False


### Training Data and Test Data


In [23]:
df_processed.drop('Date',axis=1,inplace=True)

In [24]:
df_processed = df_processed.astype(float)

In [25]:
X = df_processed.drop(columns='RainTomorrow')
y = df_processed['RainTomorrow']

### Linear Regression


Split `X` and `y` into training and test sets. The KNN and Decision Tree sections reuse this split.

In [27]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 10)
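
The idea behind a seeded split can be sketched in a few lines of plain Python: shuffle the row indices with a fixed seed, then cut off the requested fraction as the test set. This is only an illustration under simplifying assumptions (the helper name and the rounding rule are choices of this sketch), and it will not reproduce the exact partition that `train_test_split` returns.

```python
import random


def simple_train_test_split(rows, test_size=0.2, seed=10):
    """Shuffle row indices with a fixed seed and cut off a test fraction."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # same seed -> same shuffle
    n_test = round(len(rows) * test_size)  # rounding rule assumed for this sketch
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


rows = list(range(20))  # stand-in for 20 observations
train_rows, test_rows = simple_train_test_split(rows)
print(len(train_rows), len(test_rows))
```

Fixing the seed matters because every metric reported later depends on which rows land in the test set.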

Create and train a Linear Regression model called LinearReg using the training data (`X_train`, `y_train`)


In [30]:
from sklearn import linear_model
LinearReg = linear_model.LinearRegression()
LinearReg.fit(X_train, y_train)

Use the `predict` method on the testing data (`X_test`) and save it to the array `prediction`.


In [35]:
prediction = LinearReg.predict(X_test)

Using `prediction` and `y_test`, calculate the value of each metric with the appropriate function.


In [55]:
from sklearn.metrics import r2_score
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error

In [56]:
LinearRegression_MAE = mean_absolute_error(y_test, prediction)
LinearRegression_MSE = mean_squared_error(y_test, prediction)
LinearRegression_R2 = r2_score(y_test, prediction)

In [57]:
Evaluation = pd.DataFrame({
    'Metric': ['Mean Absolute Error', 'Mean Squared Error', 'R² Score'],
    'Value': [LinearRegression_MAE, LinearRegression_MSE, LinearRegression_R2]
})
Evaluation

Unnamed: 0,Metric,Value
0,Mean Absolute Error,0.256319
1,Mean Squared Error,0.11572
2,R² Score,0.427136
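
For reference, the three regression metrics can be written out directly from their definitions. The sketch below uses made-up targets and predictions, assumes the true values are not all identical (otherwise R² is undefined), and is not the scikit-learn implementation.

```python
def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, R^2) for two equal-length lists of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n          # mean absolute error
    mse = sum(e * e for e in errors) / n           # mean squared error
    mean_true = sum(y_true) / n
    total_ss = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - (mse * n) / total_ss                  # 1 - residual SS / total SS
    return mae, mse, r2


# Made-up 0/1 targets and continuous predictions (illustrative only).
y_true_demo = [0.0, 1.0, 0.0, 1.0]
y_pred_demo = [0.25, 0.75, 0.25, 0.75]
mae_demo, mse_demo, r2_demo = regression_metrics(y_true_demo, y_pred_demo)
print(mae_demo, mse_demo, r2_demo)
```

Because the target here is a 0/1 label, these numbers describe how far the continuous predictions sit from the labels rather than how often the predicted class is right.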


### KNN


Create and train a KNN model called KNN using the training data (`X_train`, `y_train`) with the `n_neighbors` parameter set to `4`.

In [59]:
KNN = KNeighborsClassifier(n_neighbors = 4).fit(X_train,y_train)
KNN
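
A minimal sketch of the voting rule behind k-nearest neighbours, assuming Euclidean distance and unweighted votes. The points, labels and the tie-breaking rule are assumptions of this example; `KNeighborsClassifier` uses optimized neighbour search and its own tie handling.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=4):
    """Label `query` by majority vote among its k nearest training points."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in by_distance[:k])
    # Ties are broken toward the smaller label; this is a choice of the sketch.
    best = max(votes.values())
    return min(label for label, count in votes.items() if count == best)


# Made-up 2-D points on a 0..1 scale (illustrative only).
points = [(0.9, 0.2), (0.8, 0.3), (0.85, 0.25), (0.2, 0.8), (0.1, 0.9), (0.3, 0.7)]
labels = [1, 1, 1, 0, 0, 0]
near_rainy = knn_predict(points, labels, (0.8, 0.2))
near_dry = knn_predict(points, labels, (0.2, 0.9))
print(near_rainy, near_dry)
```

With an even `k` such as 4, a 2-2 vote is possible, so the tie-breaking rule can influence predictions near the class boundary.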

Use the `predict` method on the testing data (`X_test`) and save it to the array `prediction`.

In [60]:
prediction = KNN.predict(X_test)

Using `prediction` and `y_test`, calculate the value of each metric with the appropriate function.


In [61]:
KNN_Accuracy_Score = metrics.accuracy_score(y_test, prediction)
KNN_JaccardIndex = metrics.jaccard_score(y_test, prediction)
KNN_F1_Score = metrics.f1_score(y_test, prediction)

In [62]:
Evaluation = pd.DataFrame({
    'Metric': ['Accuracy Score', 'Jaccard Score', 'F1 Score'],
    'Value': [KNN_Accuracy_Score, KNN_JaccardIndex, KNN_F1_Score]
})
Evaluation

Unnamed: 0,Metric,Value
0,Accuracy Score,0.818321
1,Jaccard Score,0.425121
2,F1 Score,0.59661


### Decision Tree


Create and train a Decision Tree model called Tree using the training data (`X_train`, `y_train`).

In [65]:
Tree = DecisionTreeClassifier(criterion = 'entropy', max_depth = 4)
Tree.fit(X_train, y_train)
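
With `criterion = 'entropy'`, candidate splits are scored by how much they reduce label entropy. The sketch below shows that calculation for a single hypothetical split; the labels and the humidity threshold are invented for illustration and do not come from the fitted `Tree`.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    result = 0.0
    for value in set(labels):
        p = labels.count(value) / n
        result -= p * math.log2(p)
    return result


def information_gain(parent, left, right):
    """Entropy reduction achieved by splitting `parent` into `left` and `right`."""
    weight_left = len(left) / len(parent)
    weight_right = len(right) / len(parent)
    return entropy(parent) - weight_left * entropy(left) - weight_right * entropy(right)


# Made-up RainTomorrow labels for eight days, split by a hypothetical humidity threshold.
parent = [1, 1, 1, 1, 0, 0, 0, 0]
humid_days = [1, 1, 1, 0]
dry_days = [1, 0, 0, 0]
gain = information_gain(parent, humid_days, dry_days)
print(entropy(parent), gain)
```

A tree grown this way picks, at each node, the split with the largest gain, and `max_depth = 4` stops that process after four levels.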

Use the `predict` method on the testing data (`X_test`) and save it to the array `prediction`.

In [66]:
prediction = Tree.predict(X_test)

Using `prediction` and `y_test`, calculate the value of each metric with the appropriate function.

In [67]:
Tree_Accuracy_Score = metrics.accuracy_score(y_test, prediction)
Tree_JaccardIndex = metrics.jaccard_score(y_test, prediction)
Tree_F1_Score = metrics.f1_score(y_test, prediction)

In [68]:
Evaluation = pd.DataFrame({
    'Metric': ['Accuracy Score', 'Jaccard Score', 'F1 Score'],
    'Value': [Tree_Accuracy_Score, Tree_JaccardIndex, Tree_F1_Score]
})
Evaluation

Unnamed: 0,Metric,Value
0,Accuracy Score,0.818321
1,Jaccard Score,0.480349
2,F1 Score,0.648968


### Logistic Regression


Use the `train_test_split` function to split `X` and `y` again, with a `test_size` of `0.2` and the `random_state` set to `1`.


In [69]:
X_train, X_test, y_train, y_test = train_test_split( X, y, test_size = 0.2, random_state = 1)

Create and train a LogisticRegression model called LR using the training data (`X_train`, `y_train`) with the `solver` parameter set to `liblinear`.


In [70]:
LR = LogisticRegression(solver = 'liblinear').fit(X_train, y_train)
LR

Use the `predict` and `predict_proba` methods on the testing data (`X_test`) and save the results as two arrays, `prediction` and `predict_proba`.


In [72]:
prediction = LR.predict(X_test)

In [73]:
predict_proba = LR.predict_proba(X_test)
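
The relationship between the two outputs can be sketched as follows: for a binary logistic model, the probability of the positive class is the sigmoid of a linear score, and the predicted class is 1 when that score is positive. The weights, bias and feature values below are hypothetical, not the coefficients of the fitted `LR` model.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba_row(weights, bias, features):
    """Return [P(class 0), P(class 1)] for one observation."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    p_rain = sigmoid(z)
    return [1.0 - p_rain, p_rain]


def predict_row(weights, bias, features):
    """Predict class 1 when P(class 1) exceeds 0.5, i.e. when the score is positive."""
    return 1 if predict_proba_row(weights, bias, features)[1] > 0.5 else 0


# Hypothetical two-feature model (values chosen only for illustration).
demo_weights, demo_bias = [1.0, -1.0], -1.0
wet_day = predict_proba_row(demo_weights, demo_bias, [3.0, 1.0])
print(wet_day, predict_row(demo_weights, demo_bias, [3.0, 1.0]))
print(predict_row(demo_weights, demo_bias, [0.0, 1.0]))
```

The two-column layout mirrors what `predict_proba` returns for a binary problem, with the positive class in the second column.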

Using `prediction`, `predict_proba` and `y_test`, calculate the value of each metric with the appropriate function.


In [75]:
LR_Accuracy_Score = accuracy_score(y_test, prediction)
LR_JaccardIndex = jaccard_score(y_test, prediction)
LR_F1_Score = f1_score(y_test, prediction)
LR_Log_Loss = log_loss(y_test, predict_proba)

In [76]:
Evaluation = pd.DataFrame({
    'Metric': ['Accuracy Score', 'Jaccard Score', 'F1 Score','Log Loss'],
    'Value': [LR_Accuracy_Score, LR_JaccardIndex, LR_F1_Score, LR_Log_Loss]
})
Evaluation

Unnamed: 0,Metric,Value
0,Accuracy Score,0.836641
1,Jaccard Score,0.506912
2,F1 Score,0.672783
3,Log Loss,0.379477
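
Unlike the other three metrics, log loss is computed from predicted probabilities rather than predicted classes. A minimal sketch of the binary case follows, taking only the probability of the positive class (the second column of `predict_proba`); the clipping constant and the example values are assumptions of this sketch.

```python
import math


def binary_log_loss(y_true, p_positive, eps=1e-15):
    """Average negative log-likelihood of 0/1 labels under predicted P(class 1)."""
    total = 0.0
    for t, p in zip(y_true, p_positive):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


# Made-up labels with confident and hesitant probability estimates.
confident = binary_log_loss([1, 0, 1], [0.9, 0.1, 0.8])
hesitant = binary_log_loss([1, 0, 1], [0.6, 0.4, 0.5])
print(confident, hesitant)
```

Both sets of probabilities would give the same class predictions for the first two observations, yet the confident one earns a lower loss; this is the extra information log loss adds over accuracy.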


### SVM


Create and train an SVM model called SVM using the training data (`X_train`, `y_train`).


In [77]:
SVM = svm.SVC(kernel = 'rbf')
SVM.fit(X_train, y_train)
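
The `'rbf'` kernel measures similarity between two observations as exp(-gamma * squared Euclidean distance). The sketch below evaluates that formula for a couple of made-up points; the `gamma` value is arbitrary here and is not the value `SVC` derives from the training data by default.

```python
import math


def rbf_kernel(x, y, gamma):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Illustrative points only; gamma = 0.5 is an arbitrary choice for this sketch.
same_point = rbf_kernel((1.0, 2.0), (1.0, 2.0), gamma=0.5)
nearby = rbf_kernel((0.0, 0.0), (1.0, 1.0), gamma=0.5)
far_away = rbf_kernel((0.0, 0.0), (3.0, 3.0), gamma=0.5)
print(same_point, nearby, far_away)
```

Because the kernel depends on raw distances, features on large scales (pressure in hectopascals, for example) can dominate it unless the inputs are scaled.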

Use the `predict` method on the testing data (`X_test`) and save it to the array `prediction`.


In [78]:
prediction = SVM.predict(X_test)

Using `prediction` and `y_test`, calculate the value of each metric with the appropriate function.


In [79]:
SVM_Accuracy_Score = accuracy_score(y_test, prediction)
SVM_JaccardIndex = jaccard_score(y_test, prediction)
SVM_F1_Score = f1_score(y_test, prediction)

In [80]:
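Evaluation = pd.DataFrame({
    'Metric': ['Accuracy Score', 'Jaccard Score', 'F1 Score'],
    'Value': [SVM_Accuracy_Score, SVM_JaccardIndex, SVM_F1_Score]
})
Evaluation

The output recorded below is identical to the Logistic Regression table because the original cell passed the `LR_*` variables; rerun the corrected cell to obtain the SVM scores.

Unnamed: 0,Metric,Value
0,Accuracy Score,0.836641
1,Jaccard Score,0.506912
2,F1 Score,0.672783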
