# Example: Using the Basel Rain Model for a Single-Day Prediction

This short notebook shows how to save the final tuned logistic regression model for Basel to disk and then use it through a simple helper function, `predict_tomorrow_from_features`. The idea is to treat the model like a small "service": given a dictionary of today's Basel weather features (month, whether it rained today, pressure, humidity, temperature, sunshine, and the one-day lags of those measurements), the function builds a one-row table, runs the model, and returns both a label (`"Rain"` or `"No Rain"`) and the corresponding probability of rain tomorrow.

This is not meant to be a full deployment; it is just a clear example of how someone could plug the model into another script or app once training is finished.


## 1. Load the Processed Basel Feature Table


In [1]:
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from joblib import dump, load

# path to the processed Basel feature table
PROC_PATH = "../../data/processed/basel_rain_features.csv"

df = pd.read_csv(PROC_PATH)
df["DATE"] = pd.to_datetime(df["DATE"].astype(str), errors="coerce")
df = df.sort_values("DATE").reset_index(drop=True)

print("Loaded processed Basel data:", df.shape)
df.head()


Loaded processed Basel data: (3653, 12)


| | DATE | MONTH | RainToday | RainTomorrow | BASEL_pressure | BASEL_humidity | BASEL_temp_mean | BASEL_sunshine | BASEL_pressure_lag1 | BASEL_humidity_lag1 | BASEL_temp_mean_lag1 | BASEL_sunshine_lag1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2000-01-02 | 1 | 0 | 0 | 1.0318 | 0.87 | 3.6 | 0.0 | 1.0286 | 0.89 | 2.9 | 0.0 |
| 1 | 2000-01-03 | 1 | 0 | 1 | 1.0314 | 0.81 | 2.2 | 3.7 | 1.0318 | 0.87 | 3.6 | 0.0 |
| 2 | 2000-01-04 | 1 | 1 | 1 | 1.0262 | 0.79 | 3.9 | 6.9 | 1.0314 | 0.81 | 2.2 | 3.7 |
| 3 | 2000-01-05 | 1 | 1 | 0 | 1.0246 | 0.9 | 6.0 | 3.7 | 1.0262 | 0.79 | 3.9 | 6.9 |
| 4 | 2000-01-06 | 1 | 0 | 0 | 1.0244 | 0.85 | 4.2 | 5.7 | 1.0246 | 0.9 | 6.0 | 3.7 |
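
The preview rows suggest that each `_lag1` column holds the previous day's value and that `RainTomorrow` is the next day's `RainToday`. The sketch below is one way to sanity-check that on a few rows. The `sample` frame is retyped by hand from the preview above, and `lag_matches` is a hypothetical helper that is not part of the project code.

```python
import numpy as np
import pandas as pd

# A few columns of the first five rows, copied from the preview above.
sample = pd.DataFrame({
    "DATE": pd.to_datetime(
        ["2000-01-02", "2000-01-03", "2000-01-04", "2000-01-05", "2000-01-06"]
    ),
    "RainToday": [0, 0, 1, 1, 0],
    "RainTomorrow": [0, 1, 1, 0, 0],
    "BASEL_pressure": [1.0318, 1.0314, 1.0262, 1.0246, 1.0244],
    "BASEL_pressure_lag1": [1.0286, 1.0318, 1.0314, 1.0262, 1.0246],
})


def lag_matches(frame, col, lag_col):
    """Return True if lag_col equals col shifted down by one row.

    The first row is skipped because its previous day is outside the frame.
    """
    expected = frame[col].shift(1).iloc[1:]
    actual = frame[lag_col].iloc[1:]
    return bool(np.isclose(actual, expected).all())


pressure_ok = lag_matches(sample, "BASEL_pressure", "BASEL_pressure_lag1")

# RainTomorrow should equal the next row's RainToday (last row has no next day).
next_day_rain = sample["RainToday"].shift(-1).iloc[:-1]
label_ok = bool((sample["RainTomorrow"].iloc[:-1] == next_day_rain).all())

print("lag column consistent:", pressure_ok)
print("label consistent:", label_ok)
```

Running the same check on the full `df` would only be meaningful if the table has no gaps in its dates, which this sketch does not verify.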


## 2. Define Features and Create a Time-Aware Train/Test Split


In [2]:
# Define label and feature columns (same as in the main modeling notebook)
y = df["RainTomorrow"].astype(int)

feature_cols = [
    "MONTH",
    "RainToday",
    "BASEL_pressure",
    "BASEL_humidity",
    "BASEL_temp_mean",
    "BASEL_sunshine",
    "BASEL_pressure_lag1",
    "BASEL_humidity_lag1",
    "BASEL_temp_mean_lag1",
    "BASEL_sunshine_lag1",
]

X = df[feature_cols].copy()

print("Feature columns:", feature_cols)
print("X shape:", X.shape)
print("Label distribution:")
print(y.value_counts(normalize=True).rename("proportion"))

# chronological 80/20 split
n = len(df)
split_idx = int(0.8 * n)

X_train = X.iloc[:split_idx].copy()
y_train = y.iloc[:split_idx].copy()
X_test  = X.iloc[split_idx:].copy()
y_test  = y.iloc[split_idx:].copy()

print("Train shape:", X_train.shape, "Test shape:", X_test.shape)
print("Train dates:", df["DATE"].iloc[0], "→", df["DATE"].iloc[split_idx - 1])
print("Test dates:", df["DATE"].iloc[split_idx], "→", df["DATE"].iloc[-1])


Feature columns: ['MONTH', 'RainToday', 'BASEL_pressure', 'BASEL_humidity', 'BASEL_temp_mean', 'BASEL_sunshine', 'BASEL_pressure_lag1', 'BASEL_humidity_lag1', 'BASEL_temp_mean_lag1', 'BASEL_sunshine_lag1']
X shape: (3653, 10)
Label distribution:
RainTomorrow
0    0.532987
1    0.467013
Name: proportion, dtype: float64
Train shape: (2922, 10) Test shape: (731, 10)
Train dates: 2000-01-02 00:00:00 → 2008-01-01 00:00:00
Test dates: 2008-01-02 00:00:00 → 2010-01-01 00:00:00
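
The split above can be wrapped in a small function so the same cut is easy to reuse. This is a minimal sketch: `chronological_split` is a hypothetical helper name, and the `toy` frame is a stand-in with one row per day, assuming the Basel table has no missing dates.

```python
import pandas as pd


def chronological_split(frame, train_frac=0.8):
    """Split a date-sorted frame into (train, test) without shuffling."""
    split_idx = int(train_frac * len(frame))
    return frame.iloc[:split_idx], frame.iloc[split_idx:]


# Stand-in frame with the same length and start date as the Basel table.
toy = pd.DataFrame({"DATE": pd.date_range("2000-01-02", periods=3653, freq="D")})
train, test = chronological_split(toy)

print("train rows:", len(train), "test rows:", len(test))
# Every training date must come before every test date.
print("no overlap:", train["DATE"].max() < test["DATE"].min())
```

Because the rows are never shuffled, the model is always evaluated on days that come after everything it was trained on.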


## 3. Rebuild the Tuned Logistic Regression Model


In [3]:
# Tuned logistic regression: this matches the final model used in the project
logreg_clf = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("logreg", LogisticRegression(
        C=3.0,
        class_weight="balanced",
        max_iter=2000,
        solver="lbfgs",
    )),
])

logreg_clf.fit(X_train, y_train)
y_pred = logreg_clf.predict(X_test)

acc = accuracy_score(y_test, y_pred)
f1  = f1_score(y_test, y_pred)

print("=== Final Logistic Regression (rebuilt) ===")
print("Accuracy:", f"{acc:.3f}")
print("F1 (Rain):", f"{f1:.3f}")


=== Final Logistic Regression (rebuilt) ===
Accuracy: 0.668
F1 (Rain): 0.668
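
The pipeline uses `class_weight="balanced"`, which weights each class by `n_samples / (n_classes * count_of_class)`. The sketch below shows what that means for labels with roughly the 53/47 split printed earlier. The `y_demo` array is made up for illustration and is not the Basel training set.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Made-up labels: 533 "No Rain" days (0) and 467 "Rain" days (1).
y_demo = np.array([0] * 533 + [1] * 467)

weights = compute_class_weight(
    class_weight="balanced", classes=np.array([0, 1]), y=y_demo
)

# Same formula written out by hand: n_samples / (n_classes * class counts).
manual = len(y_demo) / (2 * np.bincount(y_demo))

print("balanced weights:", weights)
print("manual weights:  ", manual)
```

With classes this close to even, both weights stay near 1, so the setting should change the fit only slightly.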


## 4. Save the Model Bundle to a `.joblib` File


In [7]:
# Choose a probability threshold (we keep 0.5 here, but you could use 0.4 etc.)
best_threshold = 0.5

export_obj = {
    "model": logreg_clf,             # tuned logistic regression pipeline
    "features": feature_cols,        # exact feature order
    "threshold": best_threshold,     # probability cutoff for "Rain"
    "labels": {0: "No Rain", 1: "Rain"},
}

MODEL_PATH = "/Users/purvigarg/Downloads/CMSE492/cmse492_project/data/processed/basel_rain_model.joblib"
dump(export_obj, MODEL_PATH)
print("Saved model bundle to:", MODEL_PATH)


Saved model bundle to: /Users/purvigarg/Downloads/CMSE492/cmse492_project/data/processed/basel_rain_model.joblib
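
Before relying on the saved file, it can help to confirm that the bundle survives a save/load round trip. The sketch below does this with a tiny stand-in pipeline trained on synthetic data and a temporary directory, so it does not touch the real Basel file. All names here (`demo_model`, `bundle`, `demo_bundle.joblib`) are illustrative only.

```python
import tempfile
from pathlib import Path

import numpy as np
from joblib import dump, load
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Tiny stand-in model on synthetic data (not the Basel model).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 2))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)
demo_model = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("logreg", LogisticRegression()),
]).fit(X_demo, y_demo)

# Same bundle layout as export_obj above.
bundle = {
    "model": demo_model,
    "features": ["f0", "f1"],
    "threshold": 0.5,
    "labels": {0: "No Rain", 1: "Rain"},
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo_bundle.joblib"
    dump(bundle, path)
    restored = load(path)

same_keys = set(restored) == set(bundle)
same_probs = np.allclose(
    restored["model"].predict_proba(X_demo),
    demo_model.predict_proba(X_demo),
)
print("keys preserved:", same_keys, "| predictions preserved:", same_probs)
```

Building the path with `pathlib.Path` relative to the project, rather than hard-coding an absolute path, would likely make the bundle easier to load on another machine.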


## 5. Define `predict_tomorrow_from_features`


In [8]:
def predict_tomorrow_from_features(feature_dict, model_path=MODEL_PATH):
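    """
    Predict rain/no-rain for tomorrow from a dict of today's Basel features.

    feature_dict : dict mapping each saved feature name to today's value.
    model_path   : path to the .joblib bundle (reloaded on every call).

    Returns a dict with the label ("Rain" or "No Rain"), the predicted
    probability of rain, and the threshold that was applied.
    """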
    obj = load(model_path)
    model     = obj["model"]
    features  = obj["features"]
    threshold = obj["threshold"]
    labels    = obj["labels"]

    # build a 1-row DataFrame in the exact feature order
    X_new = pd.DataFrame([feature_dict], columns=features)

    # probability that tomorrow = rain
    prob_rain = float(model.predict_proba(X_new)[:, 1][0])

    # apply threshold
    pred_int = int(prob_rain >= threshold)
    pred_label = labels[pred_int]

    return {
        "prediction": pred_label,
        "prob_rain": prob_rain,
        "threshold": threshold,
    }
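
Building the one-row table with `columns=features` silently drops extra keys, and any missing key becomes a NaN column instead of producing a clear message. A small check before prediction can make such mistakes easier to spot. This is a sketch only: `check_feature_dict` is a hypothetical helper and is not part of the function above.

```python
def check_feature_dict(feature_dict, features):
    """Raise ValueError if a required feature is missing; return unused keys."""
    missing = [name for name in features if name not in feature_dict]
    if missing:
        raise ValueError(f"Missing features: {missing}")
    # Keys that the model would ignore, returned so the caller can warn about them.
    return [name for name in feature_dict if name not in features]


required = ["MONTH", "RainToday"]  # shortened list, for illustration only

# A complete dict with one extra key: passes, and the extra key is reported.
unused = check_feature_dict({"MONTH": 4, "RainToday": 1, "note": "demo"}, required)

# An incomplete dict: the error message names the missing feature.
try:
    check_feature_dict({"MONTH": 4}, required)
    error_message = None
except ValueError as err:
    error_message = str(err)

print("unused keys:", unused)
print("error:", error_message)
```

Calling this with `obj["features"]` just before the DataFrame is built would be one place to use it.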


## 6. Example: Predicting Rain Tomorrow from a Single Day of Weather


In [9]:
# Example "today" values for Basel (these are made up for illustration)
example_today = {
    "MONTH": 4,                 # April
    "RainToday": 1,             # it rained today
    "BASEL_pressure": 1.010,
    "BASEL_humidity": 0.75,
    "BASEL_temp_mean": 12.0,
    "BASEL_sunshine": 5.0,
    "BASEL_pressure_lag1": 1.003,
    "BASEL_humidity_lag1": 0.80,
    "BASEL_temp_mean_lag1": 10.5,
    "BASEL_sunshine_lag1": 2.0,
}

result = predict_tomorrow_from_features(example_today)
result


{'prediction': 'Rain', 'prob_rain': 0.7100450926205591, 'threshold': 0.5}
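
The label depends on the stored threshold as well as on the probability. The sketch below applies the same `prob_rain >= threshold` rule to the probability printed above at a few different cutoffs. `label_at` is a hypothetical helper, and the thresholds 0.4 and 0.8 are chosen only for illustration.

```python
def label_at(prob_rain, threshold, labels=None):
    """Apply the same cutoff rule as predict_tomorrow_from_features."""
    labels = labels or {0: "No Rain", 1: "Rain"}
    return labels[int(prob_rain >= threshold)]


prob = 0.7100450926205591  # probability from the example output above

for threshold in (0.4, 0.5, 0.8):
    print(f"threshold={threshold}: {label_at(prob, threshold)}")
```

A lower threshold flags more days as rain and a higher one flags fewer, so the choice depends on which kind of mistake matters more to the user.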

## Conclusion

This notebook shows one possible way to wrap the final Basel rain model into a small, reusable function. By saving the tuned logistic regression pipeline together with its feature order, labels, and chosen probability threshold into a single `.joblib` file, I can later reload it and call `predict_tomorrow_from_features` with a simple Python dictionary of “today’s” Basel weather values. The function then returns both a human-readable label (`"Rain"` or `"No Rain"`) and the underlying probability of rain tomorrow. This example is not required for the core CMSE project, but it helps illustrate how the trained model could be plugged into another script or application.

