### Unified Inference Pipeline

We implemented a unified inference pipeline that can switch between two trained models: **Random Forest** and **XGBoost**. The pipeline takes cleaned Java code snippets (from SmellyCode++) and performs the following steps:

1. **Metric Extraction** – using a static analysis tool.
2. **Feature Scaling** – using the corresponding pre-fitted `StandardScaler`.
3. **Prediction** – using the selected model.
4. **Label Mapping** – converting numeric class labels into meaningful smell names.
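
The four steps can be sketched with the standard library alone. This is a toy illustration, not the project's implementation: `extract_metrics`, `FittedScaler`, `toy_model`, `predict_smells`, and the `LABEL_NAMES` mapping are all hypothetical stand-ins, and the two metrics (line count and brace depth) are placeholders for whatever the static analysis tool actually reports.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List

# Hypothetical label names; the real mapping is defined in the project code.
LABEL_NAMES: Dict[int, str] = {0: "smell_A", 1: "smell_B"}


def extract_metrics(snippet: str) -> List[float]:
    """Step 1 (toy): non-empty line count and maximum brace nesting depth."""
    line_count = sum(1 for line in snippet.splitlines() if line.strip())
    depth = max_depth = 0
    for ch in snippet:
        if ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth -= 1
    return [float(line_count), float(max_depth)]


class FittedScaler:
    """Step 2 (toy): standard scaling with statistics fixed at fit time."""

    def __init__(self, training_rows: List[List[float]]) -> None:
        columns = list(zip(*training_rows))
        self.means = [mean(col) for col in columns]
        # Population standard deviation; fall back to 1.0 for constant columns.
        self.stds = [pstdev(col) or 1.0 for col in columns]

    def transform(self, rows: List[List[float]]) -> List[List[float]]:
        return [
            [(x - m) / s for x, m, s in zip(row, self.means, self.stds)]
            for row in rows
        ]


def toy_model(rows: List[List[float]]) -> List[int]:
    """Step 3 (toy): predict class 1 when scaled nesting depth is above average."""
    return [1 if row[1] > 0 else 0 for row in rows]


def predict_smells(
    snippets: List[str],
    scaler: FittedScaler,
    model: Callable[[List[List[float]]], List[int]],
) -> List[str]:
    metrics = [extract_metrics(s) for s in snippets]   # 1. metric extraction
    scaled = scaler.transform(metrics)                 # 2. feature scaling
    numeric_labels = model(scaled)                     # 3. prediction
    return [LABEL_NAMES[n] for n in numeric_labels]    # 4. label mapping


scaler = FittedScaler([[1.0, 1.0], [3.0, 3.0]])
snippets = ["class A {}", "class B {\n  void f() {\n    if (x) {\n    }\n  }\n}"]
print(predict_smells(snippets, scaler, toy_model))
```

The point of the sketch is the ordering: the scaler is fitted once on training data and only applied at inference time, so unseen snippets are scaled with the same statistics the model was trained on.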

Select the model by passing `"random_forest"` or `"xgboost"` as the first argument to `run_inference`. Because both models run through the same sequence of steps, either one can be applied to unseen code without changing the surrounding code, and the runs stay reproducible and comparable.
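
One common way to implement this kind of name-based switching is a small registry that maps each model name to its predictor. The sketch below is an assumption about how such dispatch could look, not the contents of `run_inference`; `MODEL_REGISTRY`, `select_model`, and the two stub predictors are hypothetical.

```python
from typing import Callable, Dict, List

Predictor = Callable[[List[List[float]]], List[int]]


def _random_forest_stub(rows: List[List[float]]) -> List[int]:
    """Placeholder for a loaded Random Forest model."""
    return [0 for _ in rows]


def _xgboost_stub(rows: List[List[float]]) -> List[int]:
    """Placeholder for a loaded XGBoost model."""
    return [1 for _ in rows]


# Hypothetical registry: one entry per supported model name.
MODEL_REGISTRY: Dict[str, Predictor] = {
    "random_forest": _random_forest_stub,
    "xgboost": _xgboost_stub,
}


def select_model(name: str) -> Predictor:
    """Return the predictor registered under `name`, or fail with a clear error."""
    try:
        return MODEL_REGISTRY[name]
    except KeyError:
        valid = ", ".join(sorted(MODEL_REGISTRY))
        raise ValueError(f"Unknown model {name!r}; expected one of: {valid}") from None


model = select_model("xgboost")
print(model([[0.1, 0.2], [0.3, 0.4]]))
```

Failing early on an unknown name keeps a typo in the model string from silently falling back to a default model.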

In [2]:
import pandas as pd
from src.inference.classical_models_predict_smells import run_inference

# Load the cleaned SmellyCode++ snippets
df_cleaned = pd.read_csv("../data/processed/smellycodepp_cleaned.csv")

# Run inference with one model at a time; uncomment the Random Forest line to switch
# result_rf = run_inference("random_forest", df_cleaned, 100)
result_xgb = run_inference("xgboost", df_cleaned, 100)

# Optional inspection and export of the Random Forest results
# result_rf.head()
# result_rf["True_Label"].value_counts()
# result_rf["Predicted_Label"].value_counts()
# result_rf.to_csv("../data/processed/random_forest_inference_result.csv", index=False)

# Optional inspection and export of the XGBoost results;
# only the distribution of predicted labels is displayed
# result_xgb.head()
# result_xgb["True_Label"].value_counts()
result_xgb["Predicted_Label"].value_counts()
# result_xgb.to_csv("../data/processed/xgboost_inference_result.csv", index=False)

Predicted_Label
2    399
4    101
Name: count, dtype: int64
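
To read the counts above as proportions, they can be converted to percentages. The snippet below is a small sketch that copies the two counts from the output by hand; `class_shares` is a hypothetical helper. On the actual `Series`, pandas provides `value_counts(normalize=True)`, which returns the same shares as fractions.

```python
# Counts copied from the XGBoost output above
predicted_counts = {2: 399, 4: 101}


def class_shares(counts):
    """Return each label's share of all predictions, in percent."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


print(class_shares(predicted_counts))  # {2: 79.8, 4: 20.2}
```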