# Logistic Regression Analysis for Weather Prediction Using Python and Scikit-Learn

This analysis demonstrates logistic regression for predicting next-day rainfall from Australian weather data. It covers the full machine learning workflow, from data preprocessing to hyperparameter tuning, and reaches about 84% test accuracy with a ROC-AUC of about 0.85 on this binary classification task.

## Introduction to Logistic Regression
- Logistic regression is one of the most fundamental and widely used machine learning algorithms for binary classification. Unlike linear regression, which predicts continuous values, logistic regression applies the sigmoid function to a linear combination of input features, producing a probability between 0 and 1. This makes it well suited to classification tasks where we need the likelihood of a binary outcome, such as whether it will rain tomorrow given current weather conditions.

- The algorithm is popular across many domains because it is interpretable, computationally efficient, and performs well when the classes are approximately linearly separable. In weather prediction, it provides a solid foundation for understanding the relationship between atmospheric variables and precipitation outcomes. Because it outputs probability estimates rather than only binary predictions, it is particularly useful for risk assessment and decision-making in meteorological applications.


## Logistic Regression Intuition
The core intuition behind logistic regression lies in its use of the sigmoid function to model the probability of class membership. The sigmoid function, mathematically expressed as $\sigma(z) = \frac{1}{1 + e^{-z}}$, transforms any real-valued input into a value between 0 and 1, making it ideal for representing probabilities. When applied to a linear combination of features $z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$, the sigmoid function ensures that predicted probabilities remain within valid bounds regardless of input magnitude.
This transformation addresses the fundamental limitation of linear regression for classification tasks, where predicted values could fall outside the [0, 1] range required for probabilities. The logistic regression model learns its coefficients through maximum likelihood estimation, which iteratively adjusts the parameters to maximize the likelihood of observing the actual class labels given the input features. The fitted coefficients are therefore the ones under which the observed pattern of rainy and dry days is most probable.
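
The sigmoid mapping described above can be sketched in a few lines of NumPy. The coefficient and feature values below are made up for illustration and are not taken from the fitted model.

```python
import numpy as np

def sigmoid(z):
    """Map real-valued scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients and one observation (illustrative values only)
beta_0 = -1.0                    # intercept
beta = np.array([0.8, -0.5])     # coefficients for two features
x = np.array([2.0, 1.0])         # feature values

z = beta_0 + beta @ x            # linear combination of the features
p_rain = sigmoid(z)              # probability of the positive class

print(f"z = {z:.2f}, P(rain) = {p_rain:.3f}")
print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # large |z| saturates toward 0 or 1
```

However large or small `z` becomes, the output stays strictly between 0 and 1, which is the property that linear regression lacks.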

## The Problem Statement
The primary objective of this analysis is to develop a robust logistic regression model capable of predicting whether it will rain tomorrow based on current weather observations. This binary classification problem addresses a critical need in meteorological forecasting, where accurate short-term precipitation predictions can significantly impact agricultural planning, transportation logistics, and public safety decisions.
The challenge involves working with complex atmospheric data that includes multiple interconnected variables such as temperature, humidity, pressure, wind patterns, and cloud cover. These variables exhibit intricate relationships that influence precipitation patterns, making it essential to employ sophisticated feature engineering and model optimization techniques. The model must achieve high accuracy while maintaining interpretability, allowing meteorologists to understand which weather factors contribute most significantly to rainfall prediction.

## Dataset Description
The weather dataset utilized in this analysis originates from approximately 10 years of daily weather observations across various locations in Australia. The dataset contains 142,193 records with 24 attributes, providing comprehensive coverage of meteorological variables essential for precipitation modeling. Key features include minimum and maximum temperatures, rainfall amounts, evaporation rates, sunshine hours, wind characteristics, humidity levels, atmospheric pressure, and cloud coverage measurements.
The target variable, "RainTomorrow," represents a binary classification where "Yes" indicates rainfall occurrence and "No" represents no precipitation. This extensive dataset provides sufficient diversity in weather patterns to train robust models capable of generalizing across different climatic conditions. The temporal span of the data ensures coverage of seasonal variations and long-term weather cycles that influence precipitation patterns in the Australian climate.

In [1]:
# Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
import ssl
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    confusion_matrix, classification_report, roc_curve, roc_auc_score
)
from sklearn.feature_selection import RFE
import json

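# Configure settings
warnings.filterwarnings('ignore')  # hides all warnings, including convergence warnings
# Disables HTTPS certificate verification for the download below; avoid outside a sandbox
ssl._create_default_https_context = ssl._create_unverified_context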

The imported libraries provide comprehensive functionality for data manipulation, visualization, machine learning modeling, and performance evaluation. Pandas and NumPy handle data processing operations, while Matplotlib and Seaborn enable sophisticated data visualization. Scikit-learn components support the entire machine learning pipeline from data preprocessing to model evaluation and hyperparameter optimization.


In [2]:
# Import Dataset
# Load the weather dataset
url = 'https://raw.githubusercontent.com/amankharwal/Website-data/master/weatherAUS.csv'
df = pd.read_csv(url)
print(f"Dataset shape: {df.shape}")
print(f"Dataset columns: {df.columns.tolist()}")


Dataset shape: (142193, 24)
Dataset columns: ['Date', 'Location', 'MinTemp', 'MaxTemp', 'Rainfall', 'Evaporation', 'Sunshine', 'WindGustDir', 'WindGustSpeed', 'WindDir9am', 'WindDir3pm', 'WindSpeed9am', 'WindSpeed3pm', 'Humidity9am', 'Humidity3pm', 'Pressure9am', 'Pressure3pm', 'Cloud9am', 'Cloud3pm', 'Temp9am', 'Temp3pm', 'RainToday', 'RISK_MM', 'RainTomorrow']


The dataset loads with 142,193 observations and 24 columns, including the `RainTomorrow` target. It is read directly from a public GitHub repository in CSV format, so the analysis can be rerun in any environment with network access.


## Exploratory Data Analysis
- The exploratory analysis reveals critical insights into the dataset's structure and quality. Initial examination shows significant missing values in several variables, particularly in evaporation, sunshine, and cloud cover measurements. The target variable "RainTomorrow" exhibits class imbalance, with approximately 22% of days experiencing rainfall and 78% remaining dry.

- Temperature variables demonstrate strong correlations with seasonal patterns, while humidity and pressure measurements show complex relationships with precipitation outcomes. Wind direction variables contain categorical data requiring appropriate encoding, and several numerical features exhibit skewed distributions that may benefit from transformation. The analysis identifies outliers in variables such as rainfall amounts and wind speeds, which require careful consideration during preprocessing.

- Missing value patterns suggest systematic data collection issues for certain weather stations, particularly affecting evaporation and sunshine measurements. Geographic variations across Australian locations introduce additional complexity, with coastal and inland regions exhibiting distinct weather patterns. These insights inform subsequent feature engineering and preprocessing decisions to ensure optimal model performance.
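
The two checks summarized above, missing-value shares and target class balance, can be sketched as follows. The helper name `summarize_quality` and the toy frame are assumptions made for illustration; with the real data you would pass `df` instead.

```python
import numpy as np
import pandas as pd

def summarize_quality(frame, target):
    """Return per-column missing fractions (descending) and target class shares."""
    missing = frame.isna().mean().sort_values(ascending=False)
    balance = frame[target].value_counts(normalize=True)
    return missing, balance

# Tiny made-up frame that mimics a few columns; not real observations
toy = pd.DataFrame({
    "Sunshine": [np.nan, np.nan, 7.1, np.nan],
    "Humidity3pm": [22.0, 25.0, np.nan, 16.0],
    "RainTomorrow": ["No", "No", "Yes", "No"],
})

missing, balance = summarize_quality(toy, "RainTomorrow")
print(missing)   # fraction of missing values per column
print(balance)   # share of each class in the target
```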

### Declare Feature Vector and Target Variable

In [3]:

# Select key meteorological features for modeling
feature_cols = [
    'MinTemp', 'MaxTemp', 'Rainfall', 'WindGustSpeed',
    'Humidity9am', 'Humidity3pm', 'Pressure3pm', 'Temp3pm', 'RainToday'
]

# Prepare feature matrix and target vector
X = df[feature_cols].copy()
y = df['RainTomorrow'].copy()


- The feature selection process prioritizes meteorological variables with strong theoretical relationships to precipitation formation. Temperature measurements capture thermal dynamics that influence atmospheric moisture capacity, while humidity readings directly relate to water vapor availability for precipitation. Pressure measurements indicate atmospheric stability, and wind characteristics reflect the transport of moisture-bearing air masses.

- The inclusion of "RainToday" as a feature acknowledges the persistence characteristics of weather patterns, where current precipitation often correlates with subsequent rainfall events. This temporal dependency enhances model predictive capability by incorporating short-term weather pattern continuity. The selected features represent a balanced combination of thermodynamic, hydrodynamic, and temporal variables essential for accurate precipitation prediction.


### Split Data into Separate Training and Test Set

In [4]:
# Remove records with missing target values
clean_indices = y.notna()
X_clean = X[clean_indices]
y_clean = y[clean_indices]

# Create stratified train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X_clean, y_clean, test_size=0.25, random_state=42, stratify=y_clean
)

print(f"Training set size: {X_train.shape}")
print(f"Test set size: {X_test.shape}")
print(f"Class distribution in training: {y_train.value_counts(normalize=True)}")


Training set size: (106644, 9)
Test set size: (35549, 9)
Class distribution in training: RainTomorrow
No     0.775815
Yes    0.224185
Name: proportion, dtype: float64


The stratified splitting approach ensures proportional representation of both rainfall and non-rainfall cases in training and testing sets. This methodology prevents sampling bias that could artificially inflate or deflate model performance metrics. The 75-25 split provides sufficient training data for robust model learning while maintaining adequate test data for reliable performance evaluation.

### Feature Engineering

In [5]:
# Work on explicit copies so the assignments below do not act on views of X_clean
X_train = X_train.copy()
X_test = X_test.copy()

# Handle missing values using median imputation for numerical features
numerical_features = ['MinTemp', 'MaxTemp', 'Rainfall', 'WindGustSpeed',
                      'Humidity9am', 'Humidity3pm', 'Pressure3pm', 'Temp3pm']

# Medians come from the training set only and are reused for the test set
train_medians = X_train[numerical_features].median()
X_train[numerical_features] = X_train[numerical_features].fillna(train_medians)
X_test[numerical_features] = X_test[numerical_features].fillna(train_medians)

# Encode categorical RainToday variable; missing values are treated as "No" (0)
rain_map = {'No': 0, 'Yes': 1}
X_train['RainToday'] = X_train['RainToday'].map(rain_map).fillna(0)
X_test['RainToday'] = X_test['RainToday'].map(rain_map).fillna(0)

# Encode target variable
y_train = y_train.map(rain_map)
y_test = y_test.map(rain_map)


The feature engineering step addresses data quality through missing-value imputation and categorical encoding. Median imputation is robust to outliers, and the medians are computed on the training set only, so no information from the test set leaks into preprocessing. Missing `RainToday` values are filled with 0 ("No"), the majority class. The binary encoding of categorical variables gives logistic regression numerical inputs while keeping the coefficients interpretable.
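
An alternative way to express train-only median imputation is scikit-learn's `SimpleImputer`. This is a sketch on made-up values, not the code used in this notebook.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up values standing in for one numerical column
train_col = np.array([[1.0], [2.0], [np.nan], [100.0]])
test_col = np.array([[np.nan], [5.0]])

imputer = SimpleImputer(strategy="median")
train_filled = imputer.fit_transform(train_col)  # median learned from training data only
test_filled = imputer.transform(test_col)        # the same median is reused for test data

print(imputer.statistics_)   # learned median per column
print(test_filled.ravel())
```

The outlier value 100.0 does not pull the fill value upward, which is the reason for preferring the median over the mean here.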


### Feature Scaling

In [6]:
# Apply standard scaling to all nine features (fit on the training set only)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(f"Scaled training features shape: {X_train_scaled.shape}")
print(f"Feature scaling completed successfully")


Scaled training features shape: (106644, 9)
Feature scaling completed successfully


Standard scaling puts all features on a comparable scale regardless of their original measurement units. This prevents features with large numeric ranges, such as pressure in hPa, from dominating the regularization penalty, and it improves convergence during optimization. Each feature is transformed to zero mean and unit variance using statistics from the training set, and the same transformation is then applied to the test set.
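
As a small check of what `StandardScaler` computes, the sketch below compares it with a manual z-score on made-up numbers. The scaler uses the training mean and the population standard deviation (`ddof=0`).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up two-feature data with very different scales (think hPa and mm)
train = np.array([[1010.0, 0.0], [1020.0, 2.0], [1030.0, 10.0]])
test = np.array([[1015.0, 4.0]])

scaler = StandardScaler().fit(train)

# Manual z-score using training statistics only
manual = (test - train.mean(axis=0)) / train.std(axis=0)

print(scaler.transform(test))
print(manual)
```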

### Model Training

In [7]:
# Initialize and train logistic regression model
log_reg = LogisticRegression(max_iter=1000, random_state=42)
log_reg.fit(X_train_scaled, y_train)

print("Logistic Regression model training completed")
print(f"Iterations until convergence: {log_reg.n_iter_}")


Logistic Regression model training completed
Iterations until convergence: [21]


The logistic regression model is fit by maximizing the log-likelihood of the training labels; scikit-learn adds an L2 penalty by default (`C=1.0`), so strictly this is penalized maximum likelihood. The raised iteration limit gives the solver headroom, although here it converges after 21 iterations. With the default `lbfgs` solver the `random_state` parameter has no effect, since it is only used by the `sag`, `saga`, and `liblinear` solvers, but it is harmless to keep for consistency.
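
To make the likelihood objective concrete, the sketch below computes the mean negative log-likelihood by hand and compares it with `sklearn.metrics.log_loss`. It runs on synthetic data from `make_classification`, which is an assumption for illustration and not the weather data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

X_demo, y_demo = make_classification(n_samples=500, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

# Clip to keep the logarithms finite in case a probability rounds to 0 or 1
p = np.clip(model.predict_proba(X_demo)[:, 1], 1e-12, 1 - 1e-12)

# Mean negative log-likelihood of the labels under the predicted probabilities
nll = -np.mean(y_demo * np.log(p) + (1 - y_demo) * np.log(1 - p))

# Reference point: always predicting the overall positive rate
base = np.full_like(p, y_demo.mean())
nll_base = -np.mean(y_demo * np.log(base) + (1 - y_demo) * np.log(1 - base))

print(f"Model NLL:     {nll:.4f} (log_loss: {log_loss(y_demo, p):.4f})")
print(f"Base-rate NLL: {nll_base:.4f}")
```

A fitted model should score a lower negative log-likelihood than the base-rate reference; how much lower indicates how informative the features are.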

### Predict Results

In [8]:
# Generate predictions and probability estimates
y_pred = log_reg.predict(X_test_scaled)
y_pred_proba = log_reg.predict_proba(X_test_scaled)[:, 1]

print(f"Predictions generated for {len(y_pred)} test samples")
print(f"Probability range: {y_pred_proba.min():.3f} to {y_pred_proba.max():.3f}")


Predictions generated for 35549 test samples
Probability range: 0.002 to 0.998


The prediction process generates both binary classifications and probability estimates for each test sample. Probability scores provide additional insight into prediction confidence and enable threshold adjustment for optimizing specific performance metrics. The probability distribution reveals the model's discrimination capability between rainfall and non-rainfall conditions.

### Check Accuracy Score

In [10]:
# Calculate overall accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.4f} ({accuracy*100:.2f}%)")


Model Accuracy: 0.8387 (83.87%)


The test accuracy of 83.87% should be read against the majority-class baseline: always predicting "No" would already score about 77.6%, because roughly 78% of days are dry. The model therefore improves on the trivial baseline by about six percentage points. Accuracy alone does not show how the errors are split between the two classes, which is why the confusion matrix and per-class metrics that follow matter.
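
A useful reference point for any accuracy figure on imbalanced data is the majority-class baseline. The sketch below uses scikit-learn's `DummyClassifier` on synthetic labels whose class shares roughly match the training split printed earlier (about 77.6% "No"); the labels themselves are made up.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Synthetic labels with roughly the class shares of the training split
y_demo = np.array([0] * 776 + [1] * 224)
X_demo = np.zeros((len(y_demo), 1))   # features are ignored by the dummy model

baseline = DummyClassifier(strategy="most_frequent").fit(X_demo, y_demo)
baseline_acc = accuracy_score(y_demo, baseline.predict(X_demo))

print(f"Majority-class baseline accuracy: {baseline_acc:.3f}")
```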

### Confusion Matrix

In [12]:
# Generate and analyze confusion matrix
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(cm)

# Calculate confusion matrix components
tn, fp, fn, tp = cm.ravel()
print(f"True Negatives: {tn}")
print(f"False Positives: {fp}")
print(f"False Negatives: {fn}")
print(f"True Positives: {tp}")


Confusion Matrix:
[[26120  1460]
 [ 4274  3695]]
True Negatives: 26120
False Positives: 1460
False Negatives: 4274
True Positives: 3695


The confusion matrix breaks the test predictions down by class. The model correctly identifies 26,120 dry days (true negatives) and 3,695 rainy days (true positives). It raises 1,460 false alarms, a false positive rate of about 5.3%, but misses 4,274 rainy days, a false negative rate of about 53.6%. Performance is therefore not balanced across classes: at the default 0.5 threshold the model is conservative and predicts rain too rarely.
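
The error rates follow directly from the four counts in the confusion matrix output. A short worked calculation, using the counts printed above:

```python
# Counts copied from the confusion matrix output
tn, fp, fn, tp = 26120, 1460, 4274, 3695

false_positive_rate = fp / (fp + tn)   # share of dry days flagged as rain
false_negative_rate = fn / (fn + tp)   # share of rainy days that were missed
accuracy = (tp + tn) / (tn + fp + fn + tp)

print(f"False positive rate: {false_positive_rate:.4f}")
print(f"False negative rate: {false_negative_rate:.4f}")
print(f"Accuracy:            {accuracy:.4f}")
```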

### Classification Metrics

In [14]:
# Calculate comprehensive performance metrics
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1-Score: {f1:.4f}")

print("\nDetailed Classification Report:")
print(classification_report(y_test, y_pred, target_names=['No Rain', 'Rain']))


Precision: 0.7168
Recall: 0.4637
F1-Score: 0.5631

Detailed Classification Report:
              precision    recall  f1-score   support

     No Rain       0.86      0.95      0.90     27580
        Rain       0.72      0.46      0.56      7969

    accuracy                           0.84     35549
   macro avg       0.79      0.71      0.73     35549
weighted avg       0.83      0.84      0.83     35549



The classification metrics give a per-class view of model performance. The precision of 71.7% means that when the model predicts rain, it is correct about 72% of the time, while the recall of 46.4% shows that it detects fewer than half of the actual rainfall events. The F1-score of 56.3% is the harmonic mean of precision and recall and summarizes the trade-off in a single number.
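
Precision, recall, and F1 for the "Rain" class can be recomputed from the confusion matrix counts, which makes the harmonic-mean relationship explicit:

```python
# Counts copied from the confusion matrix output
tn, fp, fn, tp = 26120, 1460, 4274, 3695

precision = tp / (tp + fp)   # of the predicted rainy days, how many were rainy
recall = tp / (tp + fn)      # of the actual rainy days, how many were detected
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of the two

print(f"Precision: {precision:.4f}")
print(f"Recall:    {recall:.4f}")
print(f"F1-score:  {f1:.4f}")
```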


### Threshold Tuning

In [16]:
# Evaluate performance across different probability thresholds
thresholds = np.arange(0.1, 1.0, 0.1)
threshold_metrics = []

for threshold in thresholds:
    y_pred_thresh = (y_pred_proba >= threshold).astype(int)
    acc = accuracy_score(y_test, y_pred_thresh)
    prec = precision_score(y_test, y_pred_thresh)
    rec = recall_score(y_test, y_pred_thresh)
    f1 = f1_score(y_test, y_pred_thresh)
    
    threshold_metrics.append({
        'threshold': threshold,
        'accuracy': acc,
        'precision': prec,
        'recall': rec,
        'f1_score': f1
    })

# Find optimal threshold based on F1-score
optimal_threshold = max(threshold_metrics, key=lambda x: x['f1_score'])
print(f"Optimal threshold: {optimal_threshold['threshold']:.1f}")
print(f"Optimal F1-score: {optimal_threshold['f1_score']:.4f}")


Optimal threshold: 0.3
Optimal F1-score: 0.6163


Threshold tuning adapts the model's decision rule to application requirements. Different thresholds trade precision against recall, allowing practitioners to prioritize either conservative predictions or comprehensive detection. In this scan the F1-score peaks at a threshold of 0.3 (F1 = 0.6163), compared with 0.5631 at the default threshold of 0.5. The scan maximizes F1 and does not weigh the costs of the two error types explicitly, and because the threshold is chosen on the test set, the reported F1 is somewhat optimistic.
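
One way to avoid tuning the threshold on the test set is to hold out a validation split from the training data and scan thresholds there. The sketch below shows the idea on synthetic imbalanced data from `make_classification`; the data, split sizes, and variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (about 22% positives) standing in for the weather features
X_demo, y_demo = make_classification(
    n_samples=4000, n_features=8, weights=[0.78, 0.22], random_state=0
)
X_fit, X_val, y_fit, y_val = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_fit, y_fit)
val_proba = model.predict_proba(X_val)[:, 1]

# Scan a grid of thresholds on the validation split only
candidates = np.round(np.arange(0.1, 1.0, 0.1), 1)
scores = [
    f1_score(y_val, (val_proba >= t).astype(int), zero_division=0)
    for t in candidates
]
best_threshold = candidates[int(np.argmax(scores))]

print(f"Best validation threshold: {best_threshold:.1f}")
print(f"Validation F1 at that threshold: {max(scores):.4f}")
```

The chosen threshold would then be applied once to the untouched test set to obtain an unbiased estimate.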

### Adjusting the Threshold Level

In [17]:
# Apply an adjusted threshold to trade some precision for recall
optimal_thresh = 0.4  # Illustrative value between the default 0.5 and the F1-optimal 0.3 found above
y_pred_optimal = (y_pred_proba >= optimal_thresh).astype(int)

# Evaluate performance with adjusted threshold
adj_accuracy = accuracy_score(y_test, y_pred_optimal)
adj_precision = precision_score(y_test, y_pred_optimal)
adj_recall = recall_score(y_test, y_pred_optimal)
adj_f1 = f1_score(y_test, y_pred_optimal)

print(f"Adjusted Threshold Performance:")
print(f"Accuracy: {adj_accuracy:.4f}")
print(f"Precision: {adj_precision:.4f}")
print(f"Recall: {adj_recall:.4f}")
print(f"F1-Score: {adj_f1:.4f}")


Adjusted Threshold Performance:
Accuracy: 0.8343
Precision: 0.6502
Recall: 0.5643
F1-Score: 0.6042


Threshold adjustment gives direct control over the precision-recall trade-off. Lowering the threshold from 0.5 to 0.4 raises recall from 0.4637 to 0.5643, while precision falls from 0.7168 to 0.6502. The F1-score improves from 0.5631 to 0.6042 and accuracy slips only slightly, from 0.8387 to 0.8343. This adjustment is particularly valuable in applications where missing a rainfall event costs more than a false alarm.

### ROC - AUC Analysis

In [18]:
# Calculate ROC curve and AUC score
fpr, tpr, roc_thresholds = roc_curve(y_test, y_pred_proba)  # separate name avoids overwriting `thresholds`
roc_auc = roc_auc_score(y_test, y_pred_proba)

print(f"ROC AUC Score: {roc_auc:.4f}")
print(f"Number of threshold points: {len(fpr)}")


ROC AUC Score: 0.8471
Number of threshold points: 8304


The ROC-AUC score of 0.847 indicates good discriminative ability between rainfall and non-rainfall conditions. The metric measures how well the model rank-orders days by rainfall probability: there is an 84.7% probability that a randomly selected rainy day receives a higher score than a randomly selected dry day.
The area under the ROC curve is a threshold-independent measure of classification performance. Values above 0.8 are commonly taken to indicate strong discrimination, so the observed score of 0.847 shows that the model captures much of the relationship between the weather variables and next-day rain. This makes it a reasonable baseline, although the low recall at the default threshold would need attention before operational use.
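
The ranking interpretation of AUC can be verified on a small example: count the fraction of (rainy, dry) pairs in which the rainy day has the higher score, and compare with `roc_auc_score`. Labels and scores below are made up.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Made-up labels and predicted scores for illustration
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
scores = np.array([0.10, 0.40, 0.35, 0.20, 0.80, 0.70, 0.55, 0.60])

pos = scores[y_true == 1]
neg = scores[y_true == 0]

# Fraction of (positive, negative) pairs ranked correctly; ties count as half
diff = pos[:, None] - neg[None, :]
pairwise_auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

print(pairwise_auc, roc_auc_score(y_true, scores))
```

Here 14 of the 16 pairs are ordered correctly, so both numbers equal 0.875.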


### Recursive Feature Elimination

In [19]:
# Apply RFE for feature selection optimization
rfe_selector = RFE(estimator=LogisticRegression(max_iter=1000), 
                   n_features_to_select=5, step=1)
rfe_selector.fit(X_train_scaled, y_train)

# Identify selected features
feature_names = X_train.columns
selected_features = feature_names[rfe_selector.support_]
feature_rankings = rfe_selector.ranking_

print("RFE Selected Features:")
for i, feature in enumerate(selected_features):
    print(f"{i+1}. {feature}")

print("\nFeature Rankings:")
for feature, rank in zip(feature_names, feature_rankings):
    print(f"{feature}: {rank}")


RFE Selected Features:
1. MaxTemp
2. WindGustSpeed
3. Humidity3pm
4. Pressure3pm
5. RainToday

Feature Rankings:
MinTemp: 5
MaxTemp: 1
Rainfall: 4
WindGustSpeed: 1
Humidity9am: 3
Humidity3pm: 1
Pressure3pm: 1
Temp3pm: 2
RainToday: 1


- Recursive Feature Elimination (RFE) repeatedly fits the model, drops the feature with the smallest coefficient magnitude, and refits until the requested number of features remains. Reducing the feature set can improve interpretability and may help generalization by focusing on the most predictive meteorological variables.
- Here RFE retains MaxTemp, WindGustSpeed, Humidity3pm, Pressure3pm, and RainToday (rank 1). MinTemp is eliminated first (rank 5), followed by Rainfall, Humidity9am, and Temp3pm. The eliminated features are plausibly redundant with retained ones, for example Temp3pm with MaxTemp and Humidity9am with Humidity3pm, rather than uninformative on their own.


### k-Fold Cross Validation

In [20]:
# Perform k-fold cross-validation
cv_scores = cross_val_score(log_reg, X_train_scaled, y_train, 
                           cv=5, scoring='accuracy')

print(f"Cross-Validation Scores: {cv_scores}")
print(f"Mean CV Accuracy: {cv_scores.mean():.4f} (+/- {cv_scores.std() * 2:.4f})")

# Additional scoring metrics
cv_precision = cross_val_score(log_reg, X_train_scaled, y_train, 
                              cv=5, scoring='precision')
cv_recall = cross_val_score(log_reg, X_train_scaled, y_train, 
                           cv=5, scoring='recall')
cv_f1 = cross_val_score(log_reg, X_train_scaled, y_train, 
                        cv=5, scoring='f1')

print(f"Mean CV Precision: {cv_precision.mean():.4f}")
print(f"Mean CV Recall: {cv_recall.mean():.4f}")
print(f"Mean CV F1-Score: {cv_f1.mean():.4f}")


Cross-Validation Scores: [0.83571663 0.83454452 0.8370294  0.84003001 0.83641223]
Mean CV Accuracy: 0.8367 (+/- 0.0037)
Mean CV Precision: 0.7118
Mean CV Recall: 0.4568
Mean CV F1-Score: 0.5565


k-Fold cross-validation estimates model performance by training and evaluating on several different partitions of the training data, which gives a more reliable picture than a single split. Fold accuracies range from 0.8345 to 0.8400 with a mean of 0.8367, close to the 0.8387 measured on the test set.
The mean cross-validated precision (0.7118), recall (0.4568), and F1-score (0.5565) also match the test-set values closely, so there is no sign of overfitting. Note that the folds are partitions of the pooled training rows and are not grouped by location or time, so these results do not by themselves show how the model would transfer to new stations or later years.
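
In the cell above the scaler was fit on the full training set before cross-validation, so each validation fold has already influenced the scaling statistics. The effect is usually small, but a `Pipeline` avoids it by refitting the scaler inside every fold. A minimal sketch on synthetic data (an assumption; the weather features would be passed unscaled instead):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = make_classification(
    n_samples=2000, n_features=9, weights=[0.78, 0.22], random_state=0
)

# Scaler and classifier are refit together inside every fold
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = cross_val_score(pipeline, X_demo, y_demo, cv=folds, scoring="accuracy")

print(f"Fold accuracies: {fold_scores}")
print(f"Mean accuracy: {fold_scores.mean():.4f} (std {fold_scores.std():.4f})")
```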

### Hyperparameter Optimization using GridSearch CV

In [21]:
# Define hyperparameter grid for optimization
param_grid = {
    'C': [0.01, 0.1, 1, 10, 100],
    'penalty': ['l1', 'l2'],
    'solver': ['liblinear', 'saga']
}

# Perform grid search with cross-validation
grid_search = GridSearchCV(
    LogisticRegression(max_iter=1000, random_state=42),
    param_grid,
    cv=5,
    scoring='roc_auc',
    n_jobs=-1
)

grid_search.fit(X_train_scaled, y_train)

print("Best Hyperparameters:")
print(grid_search.best_params_)
print(f"Best Cross-Validation Score: {grid_search.best_score_:.4f}")

# Evaluate best model on test set
best_model = grid_search.best_estimator_
y_pred_best = best_model.predict(X_test_scaled)
y_pred_proba_best = best_model.predict_proba(X_test_scaled)[:, 1]

best_accuracy = accuracy_score(y_test, y_pred_best)
best_roc_auc = roc_auc_score(y_test, y_pred_proba_best)

print(f"\nOptimized Model Performance:")
print(f"Test Accuracy: {best_accuracy:.4f}")
print(f"Test ROC-AUC: {best_roc_auc:.4f}")


Best Hyperparameters:
{'C': 100, 'penalty': 'l2', 'solver': 'liblinear'}
Best Cross-Validation Score: 0.8454

Optimized Model Performance:
Test Accuracy: 0.8387
Test ROC-AUC: 0.8471


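GridSearchCV systematically explores hyperparameter combinations to identify the best-scoring configuration. The parameter C is the inverse of regularization strength: higher values apply weaker regularization and allow larger coefficients, while lower values shrink the coefficients more strongly. The penalty parameter selects L1 or L2 regularization; L1 can drive some coefficients exactly to zero, which acts as a form of feature selection.
The search selects C=100 with an L2 penalty and the liblinear solver, that is, very weak regularization, with a cross-validated ROC-AUC of 0.8454. On the test set the tuned model matches the baseline model's accuracy (0.8387) and ROC-AUC (0.8471), so with nine features and more than 100,000 training rows, regularization has little effect on this problem.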
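
The role of C as an inverse regularization strength can be illustrated by comparing coefficient norms across a few values. This is a sketch on synthetic data using the default L2 penalty and solver, not a rerun of the grid search above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X_demo, y_demo = make_classification(n_samples=1000, n_features=6, random_state=0)

norms = {}
for C in [0.01, 1, 100]:
    # Smaller C means a stronger L2 penalty and therefore smaller coefficients
    model = LogisticRegression(C=C, max_iter=1000).fit(X_demo, y_demo)
    norms[C] = float(np.linalg.norm(model.coef_))
    print(f"C={C:>6}: ||coef|| = {norms[C]:.3f}")
```

The coefficient norm grows as C increases, which is why a selected value of C=100 corresponds to an almost unregularized model.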

# Visual Logistic Regression Workflow for Rain Prediction

This walkthrough demonstrates how to build, verify, and interpret a logistic regression model to predict rainfall using the Australian weather dataset, enhanced with visualizations at each stage.


## 1. Data Import and Inspection
Begin by loading essential Python packages and your dataset:

In [22]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv('weatherAUS.csv')  # Ensure the dataset is in your directory or provide a valid URL
print(df.head())
print(df.info())


         Date Location  MinTemp  MaxTemp  Rainfall  Evaporation  Sunshine  \
0  2008-12-01   Albury     13.4     22.9       0.6          NaN       NaN   
1  2008-12-02   Albury      7.4     25.1       0.0          NaN       NaN   
2  2008-12-03   Albury     12.9     25.7       0.0          NaN       NaN   
3  2008-12-04   Albury      9.2     28.0       0.0          NaN       NaN   
4  2008-12-05   Albury     17.5     32.3       1.0          NaN       NaN   

  WindGustDir  WindGustSpeed WindDir9am  ... Humidity9am  Humidity3pm  \
0           W           44.0          W  ...        71.0         22.0   
1         WNW           44.0        NNW  ...        44.0         25.0   
2         WSW           46.0          W  ...        38.0         30.0   
3          NE           24.0         SE  ...        45.0         16.0   
4           W           41.0        ENE  ...        82.0         33.0   

   Pressure9am  Pressure3pm  Cloud9am  Cloud3pm  Temp9am  Temp3pm  RainToday  \
0       1007.7    

Comment: This step loads your data into a DataFrame and provides a snapshot and overview to identify missing values, columns, and data types.