# Multi-Model Training Pipeline

This notebook provides an interactive environment for training, tuning, and evaluating machine learning models for crop recommendation. It reuses the project's `backend` modules for the training logic and uses `MLflow` for experiment tracking.

In [None]:
import sys
import os

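# Add the project root to sys.path so the `backend` package can be imported.
# Assumes the notebook runs from a directory one level below the root (e.g. notebooks/).
current_dir = os.getcwd()
project_root = os.path.dirname(current_dir)
if project_root not in sys.path:
    sys.path.insert(0, project_root)  # prepend so the local `backend` takes precedence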

print(f"Project Root: {project_root}")
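The cell above only works when the notebook runs from a directory exactly one level below the project root. A minimal sketch of a more tolerant alternative walks up the directory tree instead, assuming the project root is the first ancestor that contains a `backend/` directory. The helper name `find_project_root` is hypothetical and not part of the project's code.

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = "backend") -> Path:
    """Return the first directory, from `start` upwards, that contains `marker/`.

    Assumption: the project root is identified by a `backend/` sub-directory.
    Raises FileNotFoundError if no ancestor contains the marker.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No ancestor of {start} contains '{marker}/'")
```

In the notebook you would call `find_project_root(Path.cwd())` and prepend the result with `sys.path.insert(0, str(root))`, so the import keeps working even if the notebook is moved one level deeper.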

In [None]:
from backend.training.trainer import Trainer
from backend.models.config import MODELS

## 1. Initialize Experiment
We initialize the `Trainer`, which sets up the MLflow experiment and loads the processed data.

In [None]:
trainer = Trainer(experiment_name="AgroSense_Crop_Recommendation")

## 2. Configure Models
Select which models you want to train. You can choose from:
*   `rf`: Random Forest
*   `xgb`: XGBoost
*   `svm`: Support Vector Machine
*   `catboost`: CatBoost
*   `lr`: Logistic Regression
*   `ensemble`: Voting Ensemble (RF + XGB + SVM + CatBoost)
*   `stacking`: Stacking Classifier (base-model predictions combined by a trained meta-learner)
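The `ensemble` option combines several classifiers by voting. The notebook does not show whether the backend uses hard (majority) or soft (probability-averaging) voting, so the plain-Python sketch below illustrates hard voting only. The function name `hard_vote` is hypothetical, and the tie-breaking rule (the label seen first in model order wins) is one arbitrary choice among several.

```python
from collections import Counter
from typing import Sequence


def hard_vote(predictions: Sequence[Sequence[str]]) -> list[str]:
    """Combine per-model label predictions by majority (hard) vote.

    `predictions[m][i]` is model m's predicted label for sample i.
    On a tie, the label first encountered in model order wins, because
    Counter.most_common keeps first-encountered order for equal counts.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every model must predict the same number of samples")

    voted = []
    for i in range(n_samples):
        # Count how many models predicted each label for sample i.
        votes = Counter(model_preds[i] for model_preds in predictions)
        voted.append(votes.most_common(1)[0][0])
    return voted
```

For example, if three models predict `["rice", "maize", "coffee"]`, `["rice", "jute", "coffee"]` and `["maize", "jute", "coffee"]` (illustrative labels), `hard_vote` returns `["rice", "jute", "coffee"]`. A stacking classifier replaces this fixed vote with a meta-learner trained on the base models' outputs.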

In [None]:
models_to_train = ["rf", "xgb", "svm", "catboost", "lr", "ensemble", "stacking"]
# models_to_train = ["rf"] # FAST TRACK: Uncomment for a quick test run
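Because a tuned run over all models can take a long time, it may help to catch a mistyped key before training starts. A minimal sketch, assuming the supported keys are exactly those listed in the section above; `SUPPORTED_MODELS` and `validate_model_keys` are hypothetical names. The `MODELS` object imported from `backend.models.config` may be a more authoritative source of keys, but its structure is not shown in this notebook.

```python
# Hypothetical: mirrors the keys documented in "2. Configure Models".
SUPPORTED_MODELS = {"rf", "xgb", "svm", "catboost", "lr", "ensemble", "stacking"}


def validate_model_keys(keys: list[str]) -> list[str]:
    """Return `keys` unchanged, or raise ValueError on unknown or duplicate keys."""
    unknown = sorted(set(keys) - SUPPORTED_MODELS)
    if unknown:
        raise ValueError(
            f"Unknown model key(s): {unknown}; expected one of {sorted(SUPPORTED_MODELS)}"
        )
    duplicates = sorted({k for k in keys if keys.count(k) > 1})
    if duplicates:
        raise ValueError(f"Duplicate model key(s): {duplicates}")
    return keys
```

Usage: `models_to_train = validate_model_keys(models_to_train)` fails immediately on `"xgboost"` instead of partway through a long run.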

## 3. Run Training & Tuning
Run the training loop for every model selected above.

**Hyperparameter Tuning**:
Set `tune=True` to enable `RandomizedSearchCV`, which searches for good hyperparameters for each model. Tuning takes significantly longer than a default run and usually, though not always, improves validation performance.
Set `tune=False` for a quick run using default parameters.
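`RandomizedSearchCV` does not try every hyperparameter combination. It samples a fixed number of candidates (`n_iter`) from the given distributions and keeps the one with the best cross-validated score. The plain-Python sketch below illustrates only that idea, not scikit-learn's implementation; the `random_search` helper, the search space, and the `score` function are hypothetical stand-ins.

```python
import random
from typing import Callable


def random_search(
    space: dict[str, list],
    score: Callable[[dict], float],
    n_iter: int = 10,
    seed: int = 0,
) -> tuple[dict, float]:
    """Sample `n_iter` parameter sets uniformly from `space`; return the best.

    Higher scores are better. A fixed seed makes the search reproducible.
    Candidates are drawn with replacement, so repeats are possible.
    """
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")
    rng = random.Random(seed)
    best_params, best_score = {}, float("-inf")
    for _ in range(n_iter):
        # Pick one value per hyperparameter independently.
        params = {name: rng.choice(values) for name, values in space.items()}
        s = score(params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score
```

For example, `random_search({"n_estimators": [100, 200, 500], "max_depth": [None, 10, 20]}, score)` evaluates ten random combinations, where `score` would in practice be a cross-validated accuracy. The corresponding scikit-learn call is `RandomizedSearchCV(estimator, param_distributions=space, n_iter=10, cv=5, random_state=0).fit(X, y)`, after which `best_params_` holds the winning combination. Unlike this sketch, scikit-learn samples without replacement when every parameter is given as a list.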

In [None]:
# Set tune=True for optimized results (Recommended for final production build)
# Set tune=False for fast iteration
tune_hyperparameters = True 

results = trainer.run_experiment(models_to_train, tune=tune_hyperparameters)

## 4. Results Summary
Below are the validation metrics for all trained models.

In [None]:
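import pandas as pd
from IPython.display import display  # provided automatically in Jupyter; imported explicitly for clarity

# One row per model, best validation accuracy first
metrics_df = pd.DataFrame(results).T
metrics_df = metrics_df.sort_values(by="accuracy", ascending=False)

print("\n--- Model Performance Leaderboard ---")
display(metrics_df)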
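The leaderboard cell assumes `results` is a dictionary keyed by model name whose values map metric names to scores, with an `accuracy` entry for every model; the notebook does not show the exact structure `run_experiment` returns. Under that assumption, `pd.DataFrame(results)` puts models in columns and metrics in rows, and `.T` transposes it so each model becomes one row. The sketch below shows this with dummy placeholder numbers (not results from this pipeline) and a hypothetical `f1_macro` metric.

```python
import pandas as pd

# Dummy placeholder metrics; NOT actual results from this pipeline.
results = {
    "rf": {"accuracy": 0.90, "f1_macro": 0.89},
    "lr": {"accuracy": 0.80, "f1_macro": 0.78},
    "stacking": {"accuracy": 0.95, "f1_macro": 0.94},
}

raw = pd.DataFrame(results)  # outer keys -> columns (models), inner keys -> index (metrics)
leaderboard = raw.T.sort_values(by="accuracy", ascending=False)  # one row per model

print(raw.columns.tolist())          # ['rf', 'lr', 'stacking']
print(leaderboard.index.tolist())    # ['stacking', 'rf', 'lr']
```

Without the `.T`, `sort_values(by="accuracy")` would raise a `KeyError`, because `accuracy` would be a row label rather than a column.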

## 5. Next Steps
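- Run `mlflow ui` in a terminal from the directory where the runs were logged, then open the URL it prints to inspect confusion matrices and detailed run logs in the **MLflow UI**.
- The **Voting Ensemble** and **Stacking** models are the recommended deployment candidates because of their high accuracy; check the leaderboard above for your own run before promoting either one.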