# 07_Testing_Inference

## 1. Introduction

This notebook validates the inference pipeline implemented in `src/inference/predict.py`.  
The goal is to confirm that `PredictionPipeline` behaves consistently and robustly, and that it 
faithfully reproduces the model’s predictions across different input scenarios.

The following aspects will be tested:

1. Fidelity with the original model predictions  
2. Dictionary input compatibility  
3. DataFrame batch predictions  
4. JSON input validation  
5. Handling of extra (unused) columns  
6. Missing feature detection  
7. Automatic column ordering  
8. Stability with extreme values  
9. Scaler integrity  
10. Large-batch scalability  

A fully tested inference pipeline is critical for real-world deployment and future API integration.

In [10]:
import sys
sys.path.append("..")

from src.inference.predict import PredictionPipeline
import pandas as pd
import numpy as np
import json

pipeline = PredictionPipeline()

# Load preprocessed test data
X_test = pd.read_csv("../data/processed/X_test_preprocessed.csv")
y_test = pd.read_csv("../data/processed/y_test.csv").squeeze()

sample = X_test.iloc[0].to_dict()   # Reference sample

## 2. Test 1 — Fidelity with Original Model Predictions

This test ensures that the PredictionPipeline produces the same output as the 
model evaluated during Step 6. Any mismatch here would indicate inconsistencies 
in preprocessing, feature ordering, or model loading.


In [11]:
# Prediction using the pipeline
pred_pipeline = pipeline.predict_single(sample)["prediction"]

# Prediction using the model directly (Step 6 reproduction)
# Must load the model bundle manually for this comparison
import joblib
bundle = joblib.load("../src/models/final_model.pkl")
model = bundle["model"]
features = bundle["features"]

pred_original = model.predict(X_test[features].iloc[[0]])[0]

pred_pipeline, pred_original

(0    0
 Name: prediction, dtype: int64,
 0)
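The cell above compares the two predictions by eye, and the displayed output shows that `pred_pipeline` is a one-element Series while `pred_original` is a plain value. A minimal sketch of an explicit comparison follows. `find_mismatches` is a hypothetical helper, not part of `PredictionPipeline`, and the lists are illustrative stand-ins for values that could be obtained with `.tolist()` on the pipeline's `prediction` column and on the output of `model.predict(...)`.

```python
def find_mismatches(pipeline_preds, direct_preds):
    """Return the row positions where the two prediction sequences differ."""
    pipeline_preds = list(pipeline_preds)
    direct_preds = list(direct_preds)
    if len(pipeline_preds) != len(direct_preds):
        raise ValueError(
            f"Length mismatch: {len(pipeline_preds)} vs {len(direct_preds)}"
        )
    return [
        i for i, (a, b) in enumerate(zip(pipeline_preds, direct_preds)) if a != b
    ]


# Illustrative labels only; real values would come from the pipeline and the model.
from_pipeline = [0, 1, 0, 1, 0]
from_model = [0, 1, 0, 1, 0]

mismatches = find_mismatches(from_pipeline, from_model)
assert mismatches == [], f"Predictions differ at rows: {mismatches}"
```

Running the same check over the whole test set, rather than one row, would make a feature-ordering or model-loading discrepancy much harder to miss.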

## 3. Test 2 — Dictionary Input

The pipeline should accept a single sample provided as a Python dictionary.


In [13]:
pipeline.predict_single(sample)

Unnamed: 0,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave_points_mean,symmetry_mean,fractal_dimension_mean,...,fractal_dimension_worst,radius_avg,perimeter_avg,area_avg,concavity_avg,texture_avg,var_total,prediction,prediction_label,probability_malignant
0,-0.770899,-2.006025,-0.764517,-0.71184,-0.158315,-0.6868,-0.662486,-0.574754,-0.535821,-0.227607,...,0.060958,-0.808355,-0.774616,-0.66735,-0.500561,-1.671472,0.280026,0,Benign,0.000335


## 4. Test 3 — DataFrame Batch Prediction

The pipeline should handle multiple samples at once using a Pandas DataFrame.


In [14]:
pipeline.predict_batch(X_test.head(5))

Unnamed: 0,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave_points_mean,symmetry_mean,fractal_dimension_mean,...,fractal_dimension_worst,radius_avg,perimeter_avg,area_avg,concavity_avg,texture_avg,var_total,prediction,prediction_label,probability_malignant
0,-0.770899,-2.006025,-0.764517,-0.71184,-0.158315,-0.6868,-0.662486,-0.574754,-0.535821,-0.227607,...,0.060958,-0.808355,-0.774616,-0.66735,-0.500561,-1.671472,0.280026,0,Benign,0.000335
1,1.894726,0.966489,1.890816,1.956605,0.329304,1.054116,2.230361,2.068127,1.412336,-0.536385,...,-0.310419,1.941636,1.735012,1.96005,2.074827,0.142361,0.883464,1,Malignant,0.999902
2,0.560515,-0.781088,0.57044,0.358094,0.196381,0.742144,-0.277843,0.125003,0.633797,0.431598,...,0.301747,-0.043233,-0.00429,-0.079158,-0.324498,-1.072063,0.273316,0,Benign,0.093691
3,0.071025,0.082201,0.091703,-0.047437,0.106832,0.190621,0.05754,0.2365,0.231854,0.170214,...,0.518292,-0.117726,-0.037602,-0.152099,0.101326,0.681905,0.128028,1,Malignant,0.599948
4,-0.219873,2.637069,-0.237986,-0.284366,-0.247863,-0.549198,-0.747404,-0.413058,-1.593184,-0.366916,...,-0.406294,-0.242851,-0.286566,-0.285952,-0.825198,2.167691,0.865732,0,Benign,0.019918
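The batch output can also be checked programmatically instead of by inspection. The sketch below assumes the three result columns shown above (`prediction`, `prediction_label`, `probability_malignant`) and the label mapping visible in the output; `check_output_rows` is a hypothetical helper that works on plain dictionaries, such as those produced by `DataFrame.to_dict("records")`.

```python
EXPECTED_LABELS = {0: "Benign", 1: "Malignant"}  # mapping seen in the output above


def check_output_rows(rows, n_expected):
    """Return a list of human-readable problems found in prediction rows."""
    problems = []
    if len(rows) != n_expected:
        problems.append(f"expected {n_expected} rows, got {len(rows)}")
    for i, row in enumerate(rows):
        pred = row.get("prediction")
        label = row.get("prediction_label")
        proba = row.get("probability_malignant")
        if pred not in EXPECTED_LABELS:
            problems.append(f"row {i}: unexpected prediction {pred!r}")
        elif label != EXPECTED_LABELS[pred]:
            problems.append(f"row {i}: label {label!r} does not match {pred!r}")
        # NaN fails the range comparison, so it is reported as well.
        if proba is None or not 0.0 <= proba <= 1.0:
            problems.append(f"row {i}: probability {proba!r} outside [0, 1]")
    return problems


# Two rows copied from the output above, reduced to the result columns.
rows = [
    {"prediction": 0, "prediction_label": "Benign", "probability_malignant": 0.000335},
    {"prediction": 1, "prediction_label": "Malignant", "probability_malignant": 0.999902},
]
problems = check_output_rows(rows, n_expected=2)
```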


## 5. Test 4 — JSON Input Support

Deployed services commonly receive data as JSON.  
This test validates that a sample serialized to JSON can be loaded and processed correctly.


In [15]:
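# Save the sample temporarily, then reload it the way a client payload would arrive
with open("sample.json", "w") as f:
    json.dump(sample, f)

# Use a context manager so the file handle is closed after reading
with open("sample.json") as f:
    loaded = json.load(f)

pipeline.predict_single(loaded)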

Unnamed: 0,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave_points_mean,symmetry_mean,fractal_dimension_mean,...,fractal_dimension_worst,radius_avg,perimeter_avg,area_avg,concavity_avg,texture_avg,var_total,prediction,prediction_label,probability_malignant
0,-0.770899,-2.006025,-0.764517,-0.71184,-0.158315,-0.6868,-0.662486,-0.574754,-0.535821,-0.227607,...,0.060958,-0.808355,-0.774616,-0.66735,-0.500561,-1.671472,0.280026,0,Benign,0.000335
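The round trip can be exercised in memory as well, without a temporary file. The sketch below uses only the standard `json` module; the two feature values are illustrative, and `is_strict_json_safe` is a hypothetical helper. It also shows one edge case worth knowing about: by default `json.dumps` writes `NaN` as a non-standard token, while `allow_nan=False` raises `ValueError` instead.

```python
import json

# Illustrative values only; a real sample would carry every feature.
sample = {"radius_mean": -0.770899, "texture_mean": -2.006025}

payload = json.dumps(sample)   # what a client would send
loaded = json.loads(payload)   # what the service would receive
round_trip_ok = loaded == sample


def is_strict_json_safe(record):
    """Return True if the record serializes to standards-compliant JSON."""
    try:
        json.dumps(record, allow_nan=False)
    except ValueError:
        return False
    return True


nan_record_ok = is_strict_json_safe({"radius_mean": float("nan")})
```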


## 6. Test 5 — Extra Columns Handling

The pipeline should ignore any columns that are not part of the training feature set.


In [17]:
df_extra = X_test.head(5).copy()
df_extra["random_noise"] = 12345

pipeline.predict_batch(df_extra)


Unnamed: 0,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave_points_mean,symmetry_mean,fractal_dimension_mean,...,radius_avg,perimeter_avg,area_avg,concavity_avg,texture_avg,var_total,random_noise,prediction,prediction_label,probability_malignant
0,-0.770899,-2.006025,-0.764517,-0.71184,-0.158315,-0.6868,-0.662486,-0.574754,-0.535821,-0.227607,...,-0.808355,-0.774616,-0.66735,-0.500561,-1.671472,0.280026,12345,0,Benign,0.000335
1,1.894726,0.966489,1.890816,1.956605,0.329304,1.054116,2.230361,2.068127,1.412336,-0.536385,...,1.941636,1.735012,1.96005,2.074827,0.142361,0.883464,12345,1,Malignant,0.999902
2,0.560515,-0.781088,0.57044,0.358094,0.196381,0.742144,-0.277843,0.125003,0.633797,0.431598,...,-0.043233,-0.00429,-0.079158,-0.324498,-1.072063,0.273316,12345,0,Benign,0.093691
3,0.071025,0.082201,0.091703,-0.047437,0.106832,0.190621,0.05754,0.2365,0.231854,0.170214,...,-0.117726,-0.037602,-0.152099,0.101326,0.681905,0.128028,12345,1,Malignant,0.599948
4,-0.219873,2.637069,-0.237986,-0.284366,-0.247863,-0.549198,-0.747404,-0.413058,-1.593184,-0.366916,...,-0.242851,-0.286566,-0.285952,-0.825198,2.167691,0.865732,12345,0,Benign,0.019918
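In the output above the predictions match Test 3 and `random_noise` is still echoed back, so the extra column appears to be ignored only for prediction. A minimal sketch of that separation, using plain dictionaries, could look like the following; `split_columns` is a hypothetical helper and the two-feature list is an illustrative subset, not the pipeline's actual feature set.

```python
def split_columns(record, features):
    """Separate a record into model inputs and ignored extras."""
    wanted = set(features)
    model_inputs = {k: v for k, v in record.items() if k in wanted}
    extras = sorted(k for k in record if k not in wanted)
    return model_inputs, extras


features = ["radius_mean", "texture_mean"]  # illustrative subset
record = {"radius_mean": -0.770899, "texture_mean": -2.006025, "random_noise": 12345}

model_inputs, extras = split_columns(record, features)
```

Returning the list of extras, rather than silently dropping them, leaves room to log unexpected fields in a deployed service.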


## 7. Test 6 — Missing Feature Detection

When required features are missing, the pipeline must raise a clear and explicit error.


In [19]:
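# Drop the first required feature to simulate an incomplete payload
sample_missing = X_test.iloc[0].drop(labels=[pipeline.features[0]]).to_dict()

try:
    pipeline.predict_single(sample_missing)
except Exception as e:
    print("Error caught :", e)
else:
    # Without this branch the test would pass silently if no error were raised
    print("No error raised: the missing feature was not detected")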

Error caught : Missing required features: ['concave_points_mean']
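A check that produces a message of this shape can be sketched in a few lines. `require_features` below is a hypothetical stand-in, not the code in `src/inference/predict.py`; the exception type (`ValueError`) is an assumption, while the message format is taken from the output above.

```python
def require_features(record, features):
    """Raise ValueError naming every required feature absent from the record."""
    missing = [name for name in features if name not in record]
    if missing:
        raise ValueError(f"Missing required features: {missing}")


features = ["concave_points_mean", "radius_mean"]  # illustrative subset
incomplete = {"radius_mean": 0.560515}

try:
    require_features(incomplete, features)
    message = None
except ValueError as exc:
    message = str(exc)
```

Collecting all missing names before raising means a caller sees every problem in one error instead of fixing them one at a time.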


## 8. Test 7 — Automatic Column Ordering

Input data may arrive in an arbitrary order.  
The pipeline must reorder columns internally before prediction.


In [20]:
sample_shuffled = dict(reversed(list(sample.items())))
pipeline.predict_single(sample_shuffled)

Unnamed: 0,var_total,texture_avg,concavity_avg,area_avg,perimeter_avg,radius_avg,fractal_dimension_worst,symmetry_worst,concave_points_worst,concavity_worst,...,concavity_mean,compactness_mean,smoothness_mean,area_mean,perimeter_mean,texture_mean,radius_mean,prediction,prediction_label,probability_malignant
0,0.280026,-1.671472,-0.500561,-0.66735,-0.774616,-0.808355,0.060958,0.162547,-0.388057,-0.306687,...,-0.662486,-0.6868,-0.158315,-0.71184,-0.764517,-2.006025,-0.770899,0,Benign,0.000335
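The idea behind this test is that values are looked up by feature name, so the order in which keys arrive does not matter. The sketch below illustrates this with plain dictionaries; `to_feature_vector` is a hypothetical helper and the three features are an illustrative subset. With a DataFrame, selecting columns with a list (`df[features]`) returns them in the order of that list, which gives the same effect.

```python
def to_feature_vector(record, features):
    """Return the record's values in the order given by `features`."""
    return [record[name] for name in features]


features = ["radius_mean", "texture_mean", "perimeter_mean"]  # illustrative subset
sample = {
    "radius_mean": -0.770899,
    "texture_mean": -2.006025,
    "perimeter_mean": -0.764517,
}
shuffled = dict(reversed(list(sample.items())))  # same trick as the cell above

same_vector = to_feature_vector(sample, features) == to_feature_vector(shuffled, features)
```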


## 9. Test 8 — Extreme Values Stability

The pipeline must not break when receiving extremely large or abnormal values.  
This simulates unexpected inputs that may occur in real-world systems.


In [21]:
weird_sample = {f: 99999 for f in pipeline.features}
pipeline.predict_single(weird_sample)

Unnamed: 0,concave_points_mean,concavity_worst,symmetry_worst,radius_avg,perimeter_avg,area_avg,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,symmetry_mean,fractal_dimension_mean,prediction,prediction_label,probability_malignant
0,99999,99999,99999,99999,99999,99999,99999,99999,99999,99999,99999,99999,99999,99999,99999,1,Malignant,0.999946
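The pipeline returns a confident prediction for this input rather than failing. If a deployment also needs to flag such inputs, a guard along the following lines is one option. This is a sketch under assumptions: the preprocessed features are taken to be standardized (as the values in the earlier outputs suggest), the threshold of 10 is arbitrary, and `flag_suspicious_values` is a hypothetical helper that is not part of the pipeline.

```python
import math


def flag_suspicious_values(record, limit=10.0):
    """Return feature names whose value is non-finite or larger than `limit` in magnitude."""
    flagged = []
    for name, value in record.items():
        if not math.isfinite(value) or abs(value) > limit:
            flagged.append(name)
    return flagged


# Illustrative record mixing one plausible and two abnormal values.
weird = {"radius_mean": 99999, "texture_mean": 0.966489, "area_mean": float("inf")}
flagged = flag_suspicious_values(weird)
```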


## 10. Test 9 — Scaler Integrity

Even though data is not re-scaled during inference, the scaler must still load correctly.


In [22]:
pipeline.scaler.mean_.shape, pipeline.scaler.scale_.shape

((30,), (30,))
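Beyond checking the shapes, the loaded parameters themselves can be sanity-checked. The sketch below works on any sequences of numbers, so it could be called with `pipeline.scaler.mean_` and `pipeline.scaler.scale_`; `check_scaler_params` is a hypothetical helper, and the expected count of 30 comes from the output above.

```python
import math


def check_scaler_params(mean, scale, n_expected):
    """Return a list of problems found in fitted scaler parameters."""
    problems = []
    if len(mean) != n_expected or len(scale) != n_expected:
        problems.append(
            f"expected {n_expected} values, got {len(mean)} means and {len(scale)} scales"
        )
    if not all(math.isfinite(m) for m in mean):
        problems.append("non-finite value in mean")
    if not all(math.isfinite(s) and s > 0 for s in scale):
        problems.append("non-finite or non-positive value in scale")
    return problems


# Illustrative parameters; real ones would come from the loaded scaler.
scaler_problems = check_scaler_params([14.1, 19.3, 91.9], [3.5, 4.3, 24.3], n_expected=3)
```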

## 11. Test 10 — Large Batch Scalability

This test checks that the pipeline can process a large batch (the test set repeated 50 times) without errors.


In [24]:
df_big = pd.concat([X_test]*50, ignore_index=True)
pipeline.predict_batch(df_big).head()

Unnamed: 0,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave_points_mean,symmetry_mean,fractal_dimension_mean,...,fractal_dimension_worst,radius_avg,perimeter_avg,area_avg,concavity_avg,texture_avg,var_total,prediction,prediction_label,probability_malignant
0,-0.770899,-2.006025,-0.764517,-0.71184,-0.158315,-0.6868,-0.662486,-0.574754,-0.535821,-0.227607,...,0.060958,-0.808355,-0.774616,-0.66735,-0.500561,-1.671472,0.280026,0,Benign,0.000335
1,1.894726,0.966489,1.890816,1.956605,0.329304,1.054116,2.230361,2.068127,1.412336,-0.536385,...,-0.310419,1.941636,1.735012,1.96005,2.074827,0.142361,0.883464,1,Malignant,0.999902
2,0.560515,-0.781088,0.57044,0.358094,0.196381,0.742144,-0.277843,0.125003,0.633797,0.431598,...,0.301747,-0.043233,-0.00429,-0.079158,-0.324498,-1.072063,0.273316,0,Benign,0.093691
3,0.071025,0.082201,0.091703,-0.047437,0.106832,0.190621,0.05754,0.2365,0.231854,0.170214,...,0.518292,-0.117726,-0.037602,-0.152099,0.101326,0.681905,0.128028,1,Malignant,0.599948
4,-0.219873,2.637069,-0.237986,-0.284366,-0.247863,-0.549198,-0.747404,-0.413058,-1.593184,-0.366916,...,-0.406294,-0.242851,-0.286566,-0.285952,-0.825198,2.167691,0.865732,0,Benign,0.019918
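Displaying the first rows shows that the call completed, but not how long it took or whether every input row produced an output row. A minimal timing sketch follows; `timed_call` is a hypothetical helper, and `stand_in_predict_batch` is a toy function used only so the example runs on its own.

```python
import time


def timed_call(func, *args):
    """Call func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


def stand_in_predict_batch(rows):
    # Toy rule standing in for a real model: predict 1 when the row sum is positive.
    return [1 if sum(row) > 0 else 0 for row in rows]


rows = [[0.1, -0.2, 0.3]] * 5000
preds, seconds = timed_call(stand_in_predict_batch, rows)
rows_match = len(preds) == len(rows)
```

With the real pipeline, the equivalent call would be `timed_call(pipeline.predict_batch, df_big)`, followed by a comparison of `len(result)` with `len(df_big)`.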


## 12. Final Notes

Together, the tests above check whether the inference pipeline is:

- accurate (fidelity test)  
- stable  
- robust to malformed inputs  
- scalable  
- production-ready  

If every test passes, the inference module has been validated against these scenarios and is a 
reasonable candidate for future API deployment or integration into automated ML pipelines.