### Ensuring Feature Consistency Between Training & Inference Pipelines

**Task 1**: Consistent Feature Preparation
- Step 1: Write a function for data preprocessing and imputation shared by both training and inference pipelines.
- Step 2: Demonstrate consistent application on both datasets.

In [1]:
# write your code from here
import pandas as pd
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

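class PreprocessingPipeline:
    """Median imputation followed by standardization, fitted once on training data."""

    def __init__(self):
        self.imputer = SimpleImputer(strategy='median')
        self.scaler = StandardScaler()
        self.columns = None

    def fit(self, df: pd.DataFrame) -> "PreprocessingPipeline":
        # Store the training column order, then fit the imputer and scaler in sequence
        self.columns = list(df.columns)
        imputed = self.imputer.fit_transform(df[self.columns])
        self.scaler.fit(imputed)
        return self

    def transform(self, df: pd.DataFrame) -> pd.DataFrame:
        if self.columns is None:
            raise RuntimeError("Call fit() before transform().")
        missing = [col for col in self.columns if col not in df.columns]
        if missing:
            raise ValueError(f"Missing columns at inference: {missing}")
        df = df[self.columns]  # enforce the training column order
        imputed = self.imputer.transform(df)
        scaled = self.scaler.transform(imputed)
        # Keep the caller's index so rows can be matched back to their source
        return pd.DataFrame(scaled, columns=self.columns, index=df.index)

    def fit_transform(self, df: pd.DataFrame) -> pd.DataFrame:
        return self.fit(df).transform(df)


# Simulated training and inference datasets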
train_data = pd.DataFrame({
    'feature1': [1, 2, np.nan, 4],
    'feature2': [10, 15, 10, np.nan]
})

inference_data = pd.DataFrame({
    'feature1': [3, np.nan],
    'feature2': [12, 11]
})

# Initialize and apply the preprocessing pipeline
pipeline = PreprocessingPipeline()

# Fit on training data
train_processed = pipeline.fit_transform(train_data)

# Use the same pipeline on inference data
inference_processed = pipeline.transform(inference_data)

print("Processed Training Data:\n", train_processed)
print("\nProcessed Inference Data:\n", inference_processed)


Processed Training Data:
    feature1  feature2
0 -1.147079 -0.577350
1 -0.229416  1.732051
2 -0.229416 -0.577350
3  1.605910 -0.577350

Processed Inference Data:
    feature1  feature2
0  0.688247   0.34641
1 -0.229416  -0.11547
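
One way to confirm that the inference rows were transformed with statistics learned from the training data, rather than re-fitted, is to inspect the fitted attributes. The following is a minimal sketch that uses `SimpleImputer` and `StandardScaler` directly on the same values as above; the variable names (`train`, `inference`, `train_medians`, `train_means`) are illustrative and not part of the original exercise.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Same values as the simulated datasets in Task 1
train = pd.DataFrame({
    "feature1": [1, 2, np.nan, 4],
    "feature2": [10, 15, 10, np.nan],
})
inference = pd.DataFrame({
    "feature1": [3, np.nan],
    "feature2": [12, 11],
})

# Fit both transformers on training data only
imputer = SimpleImputer(strategy="median").fit(train)
scaler = StandardScaler().fit(imputer.transform(train))

# Statistics learned from training; these are reused unchanged at inference
train_medians = imputer.statistics_   # per-column medians used to fill NaN
train_means = scaler.mean_            # per-column means after imputation

# transform() only applies the stored statistics; it never re-fits
inference_scaled = scaler.transform(imputer.transform(inference))

print("Training medians:", train_medians)
print("Training means:", train_means)
print(inference_scaled)
```

The missing `feature1` value in the inference set is filled with the training median, not with a value computed from the inference rows.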


**Task 2**: Pipeline Integration
- Step 1: Use sklearn pipelines to encapsulate the preprocessing steps.
- Step 2: Configure identical pipelines for both training and inference.

In [2]:
# write your code from here
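
A possible approach to Task 2 is sketched below, assuming the same median imputation and standard scaling used in Task 1. The helper `build_preprocessor`, the target values in `train_target`, and the choice of `LinearRegression` as the estimator are hypothetical and only there to make the example runnable.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_preprocessor() -> Pipeline:
    """Hypothetical factory: one definition of the preprocessing steps."""
    return Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler()),
    ])


train_data = pd.DataFrame({
    "feature1": [1, 2, np.nan, 4],
    "feature2": [10, 15, 10, np.nan],
})
train_target = pd.Series([1.0, 2.0, 2.5, 4.0])  # made-up target values

inference_data = pd.DataFrame({
    "feature1": [3, np.nan],
    "feature2": [12, 11],
})

# Chain preprocessing and the estimator so they are always fitted and applied together
model_pipeline = Pipeline(steps=[
    ("preprocess", build_preprocessor()),
    ("model", LinearRegression()),
])

# Training: every step is fitted on the training data only
model_pipeline.fit(train_data, train_target)

# Inference: the fitted steps are reused; nothing is re-fitted here
predictions = model_pipeline.predict(inference_data)

# The fitted preprocessing step can also be applied on its own
inference_features = model_pipeline.named_steps["preprocess"].transform(inference_data)

print("Predictions:", predictions)
print("Preprocessed inference features:\n", inference_features)
```

Because the estimator and the preprocessing live in one `Pipeline` object, calling `predict` on raw inference data applies exactly the transformations fitted during training.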

**Task 3**: Saving and Loading Preprocessing Models
- Step 1: Save the transformation model after fitting it to the training data.
- Step 2: Load and apply the saved model during inference.

In [3]:
# write your code from here
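
A possible approach to Task 3 is sketched below, assuming `joblib` is used for persistence. The file name `preprocessor.joblib` and the temporary directory are illustrative choices; a real project would write to a stable artifact location instead.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

train_data = pd.DataFrame({
    "feature1": [1, 2, np.nan, 4],
    "feature2": [10, 15, 10, np.nan],
})
inference_data = pd.DataFrame({
    "feature1": [3, np.nan],
    "feature2": [12, 11],
})

# Training side: fit the preprocessing steps on training data
preprocessor = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])
preprocessor.fit(train_data)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical artifact path; the temp directory is removed on exit
    artifact_path = Path(tmp_dir) / "preprocessor.joblib"

    # Step 1: save the fitted transformer
    joblib.dump(preprocessor, artifact_path)

    # Step 2: load it back, as an inference service would at startup
    loaded_preprocessor = joblib.load(artifact_path)

# Inference side: apply the loaded transformer without re-fitting
inference_processed = loaded_preprocessor.transform(inference_data)
expected = preprocessor.transform(inference_data)

print("Loaded transformer output:\n", inference_processed)
print("Matches in-memory transformer:", np.allclose(inference_processed, expected))
```

Files written by `joblib` are pickle-based, so they should only be loaded from trusted sources and with a scikit-learn version compatible with the one used to save them.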