# Rain Prediction in Australia 🌦️
## [Kaggle Main Reference](https://www.kaggle.com/code/ahmedbaalash/rain-prediction-in-australia-using-ml-91)
#### Courtesy of Ahmed Baalash

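The dataset (`weatherAUS.csv`) contains daily weather observations from weather stations across Australia. Key variables include:

- Date – Observation date
- Location – Weather station name
- MinTemp / MaxTemp – Minimum and maximum temperature (°C)
- Rainfall – Amount of rain recorded for the day (mm)
- Wind & Humidity – Wind direction, wind speed, and humidity recorded at 9am and 3pm
- RainToday / RainTomorrow – Yes/No indicators of whether it rained today and whether it rains the next day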
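Because `RainToday` and `RainTomorrow` are stored as the strings `Yes`/`No`, they need converting to numbers before modelling. The snippet below is a minimal sketch on a few hand-written rows (not taken from the dataset); the mapping dictionary `binary_map` is an illustrative name. It keeps missing labels as `NaN` so they can be dealt with explicitly during cleaning.

```python
import pandas as pd

# Hand-written example rows in the dataset's Yes/No format (illustrative only);
# None stands in for a missing observation.
df = pd.DataFrame({
    "RainToday": ["No", "Yes", None, "No"],
    "RainTomorrow": ["Yes", "No", "No", None],
})

# Hypothetical mapping: "Yes" -> 1, "No" -> 0. Values not in the dict
# (here the missing ones) become NaN, so missingness is preserved.
binary_map = {"Yes": 1, "No": 0}
for col in ["RainToday", "RainTomorrow"]:
    df[col] = df[col].map(binary_map)

print(df)
```

Rows with a missing `RainTomorrow` cannot serve as training examples, so a common choice is to drop them rather than impute the target.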

This notebook aims to:

1. Clean and preprocess the data:
    - Handle missing values
    - Convert categorical data to numerical formats
    - Normalize or scale relevant features
1. Explore the dataset:
    - Perform univariate and multivariate analysis
    - Visualize distributions, trends, and correlations
1. Model the data:
    - Predict the target variable: `RainTomorrow`
    - Train and evaluate classification models (e.g., Logistic Regression and Random Forest)
1. Evaluate model performance using:
    - Accuracy, Precision, Recall, F1-Score, ROC-AUC
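The three cleaning steps in the plan above can be sketched with plain pandas. This is one possible approach, not necessarily the one the notebook uses: it assumes median imputation for numeric columns, most-frequent-value imputation for categoricals, one-hot encoding, and z-score standardization, all on a tiny toy frame that borrows a few of the dataset's column names.

```python
import pandas as pd

# Toy frame (assumed values) reusing a few column names from the dataset.
df = pd.DataFrame({
    "MinTemp": [13.4, 7.4, None, 9.2],
    "Humidity3pm": [22.0, 25.0, 30.0, None],
    "WindGustDir": ["W", "WNW", None, "W"],
})

numeric_cols = df.select_dtypes(include="number").columns
categorical_cols = df.select_dtypes(exclude="number").columns

# 1. Missing values: column median for numeric features,
#    most frequent value (mode) for categorical features.
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
for col in categorical_cols:
    df[col] = df[col].fillna(df[col].mode()[0])

# 2. Categorical -> numeric: one 0/1 column per category.
df = pd.get_dummies(df, columns=list(categorical_cols), dtype=int)

# 3. Scale numeric features to zero mean and unit (sample) standard deviation.
df[numeric_cols] = (df[numeric_cols] - df[numeric_cols].mean()) / df[numeric_cols].std()

print(df)
```

On real data, the medians, modes, means, and standard deviations should be computed on the training split only and then applied to the test split, so that no information from the test set leaks into training.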
  
#### Target Variable
The main target for prediction is:
- `RainTomorrow`: whether it will rain the next day (Yes or No)

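This is a binary classification problem on imbalanced real-world data: most days are followed by a dry day (`No`), so accuracy alone can look good even for a model that never predicts rain. Precision, recall, F1-score, and ROC-AUC are therefore needed to compare models fairly.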
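To see why accuracy alone is not enough on an imbalanced target, here is a small sketch that computes the metrics from the plan by hand on made-up labels (1 = rain tomorrow). The function `binary_metrics` is a hypothetical helper; in practice a library such as scikit-learn would normally be used. By the convention chosen here, precision, recall, and F1 are reported as 0 when their denominator is 0, and ROC-AUC is computed as the probability that a randomly chosen rainy day is scored above a randomly chosen dry day.

```python
def binary_metrics(y_true, y_pred, y_score):
    """Accuracy, precision, recall, F1 and ROC-AUC for 0/1 labels (1 = rain)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # ROC-AUC: share of (positive, negative) pairs where the positive day gets
    # the higher score; ties count as half a win.
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    roc_auc = wins / (len(pos) * len(neg))

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "roc_auc": roc_auc}


# Made-up labels: 2 rainy days out of 10, i.e. an imbalanced target.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# Baseline that always predicts "No" and gives every day the same score.
always_no = binary_metrics(y_true, [0] * 10, [0.0] * 10)
print(always_no)
# {'accuracy': 0.8, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'roc_auc': 0.5}
```

The trivial baseline reaches 80% accuracy while catching none of the rainy days, which the recall, F1, and ROC-AUC values expose immediately.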

```python
import os
import numpy as np
import pandas as pd
```

#### Data Loading & Initial Cleaning

```python
data_df = pd.read_csv(os.path.join('data', 'weatherAUS.csv'))
```
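As written, the `Date` column is loaded as plain text. A possible variant, sketched below, passes `parse_dates` to `pd.read_csv` so that dates become `datetime64` values and date parts can be pulled out with the `.dt` accessor. The in-memory CSV is a stand-in for `data/weatherAUS.csv` built from the first rows of the `head()` output, and the derived `Month` column is an illustrative feature, not something the original notebook is known to create.

```python
import io
import pandas as pd

# Stand-in for data/weatherAUS.csv: a few rows/columns from the head() output.
csv_text = """Date,Location,MinTemp,MaxTemp,Rainfall,RainToday,RainTomorrow
2008-12-01,Albury,13.4,22.9,0.6,No,No
2008-12-02,Albury,7.4,25.1,0.0,No,No
2008-12-03,Albury,12.9,25.7,0.0,No,No
"""

# parse_dates converts Date to datetime64 while loading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# Hypothetical seasonal feature derived from the parsed date.
df["Month"] = df["Date"].dt.month

print(df.dtypes)
```

With the real file, the equivalent call would be `pd.read_csv(os.path.join('data', 'weatherAUS.csv'), parse_dates=['Date'])`.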

```python
data_df.head()
```

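| | Date | Location | MinTemp | MaxTemp | Rainfall | Evaporation | Sunshine | WindGustDir | WindGustSpeed | WindDir9am | ... | Humidity9am | Humidity3pm | Pressure9am | Pressure3pm | Cloud9am | Cloud3pm | Temp9am | Temp3pm | RainToday | RainTomorrow |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2008-12-01 | Albury | 13.4 | 22.9 | 0.6 | NaN | NaN | W | 44.0 | W | ... | 71.0 | 22.0 | 1007.7 | 1007.1 | 8.0 | NaN | 16.9 | 21.8 | No | No |
| 1 | 2008-12-02 | Albury | 7.4 | 25.1 | 0.0 | NaN | NaN | WNW | 44.0 | NNW | ... | 44.0 | 25.0 | 1010.6 | 1007.8 | NaN | NaN | 17.2 | 24.3 | No | No |
| 2 | 2008-12-03 | Albury | 12.9 | 25.7 | 0.0 | NaN | NaN | WSW | 46.0 | W | ... | 38.0 | 30.0 | 1007.6 | 1008.7 | NaN | 2.0 | 21.0 | 23.2 | No | No |
| 3 | 2008-12-04 | Albury | 9.2 | 28.0 | 0.0 | NaN | NaN | NE | 24.0 | SE | ... | 45.0 | 16.0 | 1017.6 | 1012.8 | NaN | NaN | 18.1 | 26.5 | No | No |
| 4 | 2008-12-05 | Albury | 17.5 | 32.3 | 1.0 | NaN | NaN | W | 41.0 | ENE | ... | 82.0 | 33.0 | 1010.8 | 1006.0 | 7.0 | 8.0 | 17.8 | 29.7 | No | No |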
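Even these first five rows show gaps: `Evaporation` and `Sunshine` are empty throughout, and `Cloud9am`/`Cloud3pm` are only partly filled. A quick way to quantify this is the share of missing values per column. The sketch below rebuilds just those columns from the rows shown; `missing_share` is an illustrative variable name.

```python
import numpy as np
import pandas as pd

# Values copied from the first five rows above (selected columns only).
df = pd.DataFrame({
    "MinTemp": [13.4, 7.4, 12.9, 9.2, 17.5],
    "Evaporation": [np.nan] * 5,
    "Sunshine": [np.nan] * 5,
    "Cloud9am": [8.0, np.nan, np.nan, np.nan, 7.0],
    "Cloud3pm": [np.nan, np.nan, 2.0, np.nan, 8.0],
})

# isna() marks missing cells as True; the column mean of those booleans
# is the fraction of missing values. Sort to see the worst columns first.
missing_share = df.isna().mean().sort_values(ascending=False)
print(missing_share)
```

Running `data_df.isna().mean().sort_values(ascending=False)` on the full frame gives the same view for the whole dataset and helps decide, column by column, whether imputing or dropping is more sensible.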


```python
data_df.columns
```

```
Index(['Date', 'Location', 'MinTemp', 'MaxTemp', 'Rainfall', 'Evaporation',
       'Sunshine', 'WindGustDir', 'WindGustSpeed', 'WindDir9am', 'WindDir3pm',
       'WindSpeed9am', 'WindSpeed3pm', 'Humidity9am', 'Humidity3pm',
       'Pressure9am', 'Pressure3pm', 'Cloud9am', 'Cloud3pm', 'Temp9am',
       'Temp3pm', 'RainToday', 'RainTomorrow'],
      dtype='object')
```
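The columns fall into numeric measurements (temperatures, pressures, humidity, and so on) and text categories (`Location`, the wind directions, `RainToday`), which need different preprocessing. One way to split them, sketched here on a two-row toy frame, is to drop the target and `Date` and then group the rest by dtype with `select_dtypes`. Dropping `Date` from the feature set is an assumption made for the illustration.

```python
import pandas as pd

# Two-row toy frame using a subset of the column names listed above.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2008-12-01", "2008-12-02"]),
    "Location": ["Albury", "Albury"],
    "MinTemp": [13.4, 7.4],
    "WindGustDir": ["W", "WNW"],
    "RainToday": ["No", "No"],
    "RainTomorrow": ["No", "No"],
})

target = "RainTomorrow"
# Assumption: Date is excluded from the model features in this sketch.
features = df.drop(columns=[target, "Date"])

# Numeric columns go to imputation/scaling; text columns need encoding first.
numeric_features = features.select_dtypes(include="number").columns.tolist()
categorical_features = features.select_dtypes(exclude="number").columns.tolist()

print(numeric_features)      # ['MinTemp']
print(categorical_features)  # ['Location', 'WindGustDir', 'RainToday']
```

Applied to `data_df`, the same two `select_dtypes` calls produce the column lists that the imputation, encoding, and scaling steps operate on.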