# Creating a New Imputer Model

This notebook demonstrates how to create a new imputation model by extending the `Imputer` and `ImputerResults` abstract base classes in MicroImpute.

## 1. Understanding the MicroImpute Architecture

MicroImpute uses a two-class architecture for imputation models:

1. **Imputer**: The base model class that handles model initialization and fitting
2. **ImputerResults**: Represents a fitted model and handles prediction

This separation cleanly distinguishes the model definition from a fitted model instance. It is similar to statsmodels, where fitting a model returns a separate results object.
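
To make the split concrete, here is a minimal, self-contained sketch of the pattern that uses only `abc` and pandas, not MicroImpute. `ToyImputer`, `ToyResults`, `TrainingQuantile`, and `TrainingQuantileResults` are hypothetical names. The assumption that a public `fit()`/`predict()` delegates to protected `_fit()`/`_predict()` hooks is ours; it mirrors the two methods this notebook asks you to implement.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional

import pandas as pd


class ToyResults(ABC):
    """Fitted-model side (hypothetical): holds fitted state and predicts."""

    def __init__(self, predictors: List[str], imputed_variables: List[str]) -> None:
        self.predictors = predictors
        self.imputed_variables = imputed_variables

    def predict(
        self, X_test: pd.DataFrame, quantiles: Optional[List[float]] = None
    ) -> Dict[float, pd.DataFrame]:
        # Public entry point delegates to the subclass hook
        return self._predict(X_test, quantiles)

    @abstractmethod
    def _predict(
        self, X_test: pd.DataFrame, quantiles: Optional[List[float]]
    ) -> Dict[float, pd.DataFrame]: ...


class ToyImputer(ABC):
    """Model-definition side (hypothetical): fitting returns a results object."""

    def fit(
        self, X_train: pd.DataFrame, predictors: List[str], imputed_variables: List[str]
    ) -> ToyResults:
        return self._fit(X_train, predictors, imputed_variables)

    @abstractmethod
    def _fit(
        self, X_train: pd.DataFrame, predictors: List[str], imputed_variables: List[str]
    ) -> ToyResults: ...


class TrainingQuantileResults(ToyResults):
    def __init__(self, predictors, imputed_variables, train_values: pd.DataFrame):
        super().__init__(predictors, imputed_variables)
        self.train_values = train_values  # fitted state lives on the results object

    def _predict(self, X_test, quantiles):
        quantiles = quantiles if quantiles is not None else [0.5]
        # Ignores the predictors: every test row gets the training quantile
        return {
            q: pd.DataFrame(
                [self.train_values.quantile(q).to_list()] * len(X_test),
                index=X_test.index,
                columns=self.imputed_variables,
            )
            for q in quantiles
        }


class TrainingQuantile(ToyImputer):
    def _fit(self, X_train, predictors, imputed_variables):
        return TrainingQuantileResults(
            predictors, imputed_variables, X_train[imputed_variables]
        )


train = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [10.0, 20.0, 30.0]})
results = TrainingQuantile().fit(train, ["x"], ["y"])
preds = results.predict(pd.DataFrame({"x": [5.0, 6.0]}), [0.5])
print(preds[0.5])
```

The model object holds no fitted state. Everything learned during fitting travels on the results object, so one model definition can produce several independent fits.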

In [None]:
import pandas as pd
from typing import Dict, List, Optional, Any
from pydantic import validate_call

from us_imputation_benchmarking.models.imputer import Imputer, ImputerResults
from us_imputation_benchmarking.config import VALIDATE_CONFIG

## 2. Implementing a Model Results Class

First, implement the `ImputerResults` subclass, which represents the fitted model and handles prediction. Here is a template for a model-specific results class:

In [None]:
class NewModelResults(ImputerResults):
    """
    Fitted Model imputer ready for prediction.
    """

    def __init__(
        self,
        predictors: List[str],
        imputed_variables: List[str],
        **kwargs: Any,
    ) -> None:
        """
        Initialize the NewModelResults parameters.

        Args:
            predictors: List of predictor variable names
            imputed_variables: List of imputed variable names
            **kwargs: Additional keyword arguments for model parameters
        """
        super().__init__(predictors, imputed_variables)
        # Add any additional model specific parameters here

    # You may choose to validate your model parameters with pydantic
    @validate_call(config=VALIDATE_CONFIG)
    def _predict(
        self, X_test: pd.DataFrame, quantiles: Optional[List[float]] = None
    ) -> Dict[float, pd.DataFrame]:
        """
        Predict imputed values at specified quantiles.

        Args:
            X_test: DataFrame containing the test data
            quantiles: List of quantiles to predict. If None, predicts at median

        Returns:
            Dictionary mapping quantiles to DataFrames with predicted values

        Raises:
            RuntimeError: If prediction fails
        """
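        try:
            # Default to the median when no quantiles are requested
            if quantiles is None:
                quantiles = [0.5]

            imputations: Dict[float, pd.DataFrame] = {}
            for q in quantiles:
                # Placeholder: replace with model-specific prediction logic.
                # Assumes the base class stores `imputed_variables` as an attribute.
                imputations[q] = pd.DataFrame(
                    0.0, index=X_test.index, columns=self.imputed_variables
                )

            return imputations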

        except Exception as e:
            self.logger.error(f"Error during Model prediction: {str(e)}")
            raise RuntimeError(
                f"Failed to predict with Model: {str(e)}"
            ) from e
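
What goes inside `_predict` depends on the model. As one illustration (not MicroImpute's own implementation), a model that produces point predictions can turn them into quantile predictions by adding empirical quantiles of its training residuals. The helper `residual_quantile_predictions` is hypothetical. In a real `NewModelResults`, the point predictions and residuals would be computed from state stored in `__init__`.

```python
from typing import Dict, List, Optional

import pandas as pd


def residual_quantile_predictions(
    point_preds: pd.DataFrame,
    residuals: pd.DataFrame,
    quantiles: Optional[List[float]] = None,
) -> Dict[float, pd.DataFrame]:
    """Shift point predictions by the q-th empirical quantile of residuals.

    point_preds: one column per imputed variable, one row per test record.
    residuals: training residuals (observed minus fitted), same columns.
    """
    if quantiles is None:
        quantiles = [0.5]  # default to the median, as the template's docstring says
    out: Dict[float, pd.DataFrame] = {}
    for q in quantiles:
        # Series of per-variable residual quantiles; pandas aligns it on column names
        shift = residuals.quantile(q)
        out[q] = point_preds + shift
    return out


point = pd.DataFrame({"y": [1.0, 2.0]})
resid = pd.DataFrame({"y": [-1.0, 0.0, 1.0]})
preds = residual_quantile_predictions(point, resid, [0.0, 0.5, 1.0])
print(preds[1.0])
```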

## 3. Implementing the Main Model Class

Next, let's implement the main `Imputer` subclass that will handle model initialization and fitting:

In [None]:
class NewModel(Imputer):
    """
    Imputation model to be fitted.
    """

    def __init__(self) -> None:
        """Initialize the model parameters."""
        super().__init__()

    @validate_call(config=VALIDATE_CONFIG)
    def _fit(
        self,
        X_train: pd.DataFrame,
        predictors: List[str],
        imputed_variables: List[str],
        **kwargs: Any,
    ) -> NewModelResults:
        """
        Fit the Model on training data.

        Args:
            X_train: DataFrame containing training data
            predictors: List of predictor variable names
            imputed_variables: List of variable names to impute
            **kwargs: Additional model-specific arguments

        Returns:
            NewModelResults instance with the fitted model

        Raises:
            RuntimeError: If model fitting fails
        """
        try:
            # Implement model specific training functionality...

            # Return the results object with fitted models
            return NewModelResults(
                predictors=predictors,
                imputed_variables=imputed_variables,
                **kwargs,  # Pass any additional model parameters here
            )

        except Exception as e:
            self.logger.error(f"Error fitting Model: {str(e)}")
            raise RuntimeError(f"Failed to fit Model: {str(e)}") from e
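
Likewise, the body of `_fit` is where model-specific training happens. The sketch below fits an ordinary least squares model with NumPy and returns the coefficients and training residuals. This is the kind of state you might forward to your results class through keyword arguments. `fit_linear_with_residuals` is a hypothetical helper, and least squares is only an example choice, not something `NewModel` is required to use.

```python
from typing import List, Tuple

import numpy as np
import pandas as pd


def fit_linear_with_residuals(
    X_train: pd.DataFrame, predictors: List[str], imputed_variables: List[str]
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Return (coefficients, residuals) from least squares with an intercept."""
    # Design matrix: a column of ones (intercept) followed by the predictors
    X = np.column_stack(
        [np.ones(len(X_train)), X_train[predictors].to_numpy(dtype=float)]
    )
    Y = X_train[imputed_variables].to_numpy(dtype=float)
    # lstsq minimizes ||X @ B - Y|| for every target column at once
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    coefs = pd.DataFrame(B, index=["const", *predictors], columns=imputed_variables)
    residuals = pd.DataFrame(
        Y - X @ B, index=X_train.index, columns=imputed_variables
    )
    return coefs, residuals


train = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0], "y": [1.0, 3.0, 5.0, 7.0]})
coefs, resid = fit_linear_with_residuals(train, ["x"], ["y"])
print(coefs)
```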

## 4. Testing the New Model

You can test your newly implemented `NewModel` with a simple example on the Iris dataset, imputing petal width from the other three measurements:

In [None]:
from sklearn.datasets import load_iris
from us_imputation_benchmarking.comparisons.data import preprocess_data

# Load the Iris dataset
iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Define predictors and variables to impute
predictors = ["sepal length (cm)", "sepal width (cm)", "petal length (cm)"]
imputed_variables = ["petal width (cm)"]

# Filter the data
data = iris_df[predictors + imputed_variables]

# Split into train and test
X_train, X_test = preprocess_data(data)

# Initialize our new model
new_imputer = NewModel()

# Fit the model
fitted_model = new_imputer.fit(
    X_train,
    predictors,
    imputed_variables,
)

# Make predictions at different quantiles
test_quantiles = [0.1, 0.5, 0.9]
predictions = fitted_model.predict(X_test, test_quantiles)

# Print sample predictions
for q in test_quantiles:
    print(f"\nPredictions at {q} quantile:")
    print(predictions[q].head())
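
Beyond inspecting the printed heads, a quick sanity check is that predictions at higher quantiles never fall below those at lower quantiles (no "quantile crossing"). The function below is a hypothetical helper. It works on any dictionary shaped like the one `predict()` returns here, mapping each quantile to a DataFrame with the same index and columns.

```python
from typing import Dict

import pandas as pd


def count_quantile_crossings(predictions: Dict[float, pd.DataFrame]) -> int:
    """Count cells where a higher quantile's prediction is below a lower one's."""
    qs = sorted(predictions)
    crossings = 0
    for lower, upper in zip(qs, qs[1:]):
        # Element-wise comparison of adjacent quantiles; frames must be aligned
        crossings += int((predictions[upper] < predictions[lower]).to_numpy().sum())
    return crossings


ok = {0.1: pd.DataFrame({"y": [1.0, 2.0]}), 0.9: pd.DataFrame({"y": [3.0, 4.0]})}
bad = {0.1: pd.DataFrame({"y": [1.0, 5.0]}), 0.9: pd.DataFrame({"y": [3.0, 4.0]})}
print(count_quantile_crossings(ok), count_quantile_crossings(bad))
```

In the notebook you would call `count_quantile_crossings(predictions)`. A nonzero count suggests the quantile models are fitted independently and may need post-hoc sorting.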

## 5. Integrating with the Benchmarking Framework

Once implemented, `NewModel` can be integrated into the MicroImpute benchmarking framework alongside the built-in models. Here's how to compare it with `OLS` and `QRF`:

In [None]:
from us_imputation_benchmarking.models import OLS, QRF
from us_imputation_benchmarking.comparisons import (
    get_imputations,
    compare_quantile_loss,
)
from us_imputation_benchmarking.comparisons.plot import plot_loss_comparison

# Define models to compare
model_classes = [NewModel, OLS, QRF]

# Get test data for evaluation
Y_test = X_test[imputed_variables]

# Get imputations from all models
method_imputations = get_imputations(
    model_classes, X_train, X_test, predictors, imputed_variables
)

# Compare quantile loss
loss_comparison_df = compare_quantile_loss(Y_test, method_imputations)

# Plot the comparison
plot_loss_comparison(loss_comparison_df)
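
The comparison is based on quantile loss. Assuming `compare_quantile_loss` uses the standard pinball loss (check its source for the exact definition and how it averages across variables and quantiles), the per-observation loss at quantile q is `max(q * (y - y_pred), (q - 1) * (y - y_pred))`. A minimal NumPy version, with the hypothetical name `pinball_loss`:

```python
import numpy as np


def pinball_loss(y_true, y_pred, q: float) -> float:
    """Mean quantile (pinball) loss of predictions y_pred at quantile q."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # Under-prediction (diff > 0) costs q per unit; over-prediction costs 1 - q
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))


y_true = [1.0, 2.0, 3.0]
y_pred = [2.0, 2.0, 2.0]
print(pinball_loss(y_true, y_pred, 0.9), pinball_loss(y_true, y_pred, 0.5))
```

Lower is better. At q = 0.5 the loss equals half the mean absolute error, and at other quantiles it penalizes errors on one side more heavily. This asymmetry is why a model can rank differently across quantiles.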

## 6. Best Practices for Implementing New Models

When implementing a new imputation model for MicroImpute, follow these best practices:

### Architecture

1. **Create two classes**:
   - An `Imputer` subclass for model definition and fitting
   - An `ImputerResults` subclass for the fitted model and prediction

2. **Implement required methods**:
   - `_fit()` in your `Imputer` subclass
   - `_predict()` in your `ImputerResults` subclass

### Error Handling

1. **Use thorough error handling**:
   - Wrap model fitting and prediction in try/except blocks
   - Provide informative error messages
   - Use appropriate exception types (e.g., `ValueError` for invalid inputs, `RuntimeError` for failures during fitting or prediction)

2. **Log important information**:
   - Use `self.logger` for consistent logging
   - Log progress, parameters, and errors
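
A minimal sketch of these practices applied to input validation at the start of fitting: missing columns raise a `ValueError` with an informative message, and both the error and the successful check are logged. It uses Python's standard `logging` module directly; inside a MicroImpute model you would use `self.logger` instead. `check_columns` is a hypothetical helper name.

```python
import logging
from typing import List

import pandas as pd

logger = logging.getLogger(__name__)


def check_columns(
    data: pd.DataFrame, predictors: List[str], imputed_variables: List[str]
) -> None:
    """Raise ValueError if any required column is missing from `data`."""
    missing = [c for c in predictors + imputed_variables if c not in data.columns]
    if missing:
        logger.error("Missing columns: %s", missing)
        raise ValueError(f"Columns not found in data: {missing}")
    logger.info(
        "Validated %d predictors and %d imputed variables",
        len(predictors),
        len(imputed_variables),
    )


df = pd.DataFrame({"x": [1.0], "y": [2.0]})
check_columns(df, ["x"], ["y"])  # passes silently
```

Calling a check like this before the `try` block keeps the `ValueError` from being re-wrapped as a `RuntimeError` by the generic `except Exception` handlers in the templates above. Callers can then tell bad input apart from a genuine fitting failure.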

### Parameters and Validation

1. **Use proper type hints**:
   - Add type hints to all methods and parameters
   - Use the `validate_call` decorator (with `VALIDATE_CONFIG`) on model methods

2. **Support model-specific parameters**:
   - Design your `_fit()` and `_predict()` methods to accept the model-specific parameters they need (for example, via `**kwargs`)
   - Document parameters clearly in docstrings
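
The contents of `VALIDATE_CONFIG` are not shown in this notebook. Pydantic only accepts a plain class like `pd.DataFrame` as an annotation when arbitrary types are allowed, so we assume it sets something equivalent to `ConfigDict(arbitrary_types_allowed=True)`. The standalone sketch below uses such a config (`DEMO_CONFIG`, a hypothetical stand-in) to show what `validate_call` buys you: bad arguments are rejected with a `ValidationError` before the function body runs. `describe_quantiles` is a hypothetical function.

```python
from typing import List, Optional

import pandas as pd
from pydantic import ConfigDict, ValidationError, validate_call

# Assumed stand-in for MicroImpute's VALIDATE_CONFIG
DEMO_CONFIG = ConfigDict(arbitrary_types_allowed=True)


@validate_call(config=DEMO_CONFIG)
def describe_quantiles(
    X_test: pd.DataFrame, quantiles: Optional[List[float]] = None
) -> str:
    # Only runs if X_test is a DataFrame and quantiles is a list of floats or None
    qs = quantiles if quantiles is not None else [0.5]
    return f"{len(X_test)} rows at quantiles {qs}"


print(describe_quantiles(pd.DataFrame({"x": [1.0, 2.0]}), [0.1, 0.9]))

try:
    describe_quantiles({"x": [1.0]})  # a dict, not a DataFrame
except ValidationError:
    print("rejected before the function body ran")
```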

### Documentation

1. **Add comprehensive docstrings**:
   - Include class-level docstrings explaining the model's approach
   - Document all methods with proper Args, Returns, and Raises sections
   - Provide examples if helpful

2. **Create unit tests**:
   - Test both basic interface compliance and model-specific functionality