# NexusML Migration Example

This notebook demonstrates how to migrate from the old NexusML architecture to the new, refactored one. It trains a model with each architecture, wraps the old model in an adapter so that the new pipeline can use it, and compares predictions from all three models.

## Setup

First, let's import the necessary modules and set up the environment.

In [None]:
import os
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pickle
import time

# Add the project root to the Python path if needed
project_root = os.path.abspath(os.path.join(os.getcwd(), '..'))
if project_root not in sys.path:
    sys.path.append(project_root)

# Import old NexusML modules
from nexusml.train_model_pipeline import train_model as train_model_old
from nexusml.predict import predict as predict_old

# Import new NexusML modules
from nexusml.core.di.container import DIContainer
from nexusml.core.pipeline.context import PipelineContext
from nexusml.core.pipeline.factory import PipelineFactory
from nexusml.core.pipeline.orchestrator import PipelineOrchestrator
from nexusml.core.pipeline.registry import ComponentRegistry
from nexusml.core.config.provider import ConfigurationProvider
from nexusml.core.pipeline.adapters.model_adapter import OldModelAdapter

# Set up plotting
%matplotlib inline
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_context('notebook')

## Load Sample Data

Let's load some sample data that we'll use for training and prediction.

In [None]:
# Define the path to the data file
data_path = "../examples/sample_data.xlsx"

# Load the data using pandas
data = pd.read_excel(data_path)

# Display the first few rows
print(f"Data shape: {data.shape}")
data.head()
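
Before training, it can help to confirm that the loaded data has the columns the pipelines expect. The helper below is a hypothetical sketch, not part of NexusML. The column names are borrowed from the prediction example in this notebook, and treating them as required for training is an assumption.

```python
from typing import Iterable, List

# Assumed column names, borrowed from the prediction example in this notebook.
EXPECTED_COLUMNS = ["equipment_tag", "manufacturer", "model", "description"]


def missing_columns(
    columns: Iterable[str], expected: Iterable[str] = EXPECTED_COLUMNS
) -> List[str]:
    """Return the expected column names that are absent, in their original order."""
    present = set(columns)
    return [name for name in expected if name not in present]


# Example with a plain list; in the notebook you would pass data.columns instead.
print(missing_columns(["equipment_tag", "description"]))
```

An empty list means every expected column is present.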

## Train a Model with the Old Architecture

First, let's train a model using the old architecture.

In [None]:
# Train a model using the old architecture
try:
    start_time = time.time()
    old_model, old_metrics = train_model_old(
        data_path=data_path,
        test_size=0.3,
        random_state=42,
        output_dir="../outputs/models",
        model_name="old_architecture_model"
    )
    old_training_time = time.time() - start_time
    
    print("Model training with old architecture completed successfully")
    print(f"Training time: {old_training_time:.2f} seconds")
    print("Metrics:")
    for key, value in old_metrics.items():
        print(f"  {key}: {value}")
except Exception as e:
    print(f"Error training model with old architecture: {e}")

## Set Up the New Architecture

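Now, let's set up the new architecture: register each component implementation against its interface, choose a default implementation for each interface, and build the factory and orchestrator.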

In [None]:
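# Import the component interfaces used as registry keys below.
# Assumption: they live in nexusml.core.pipeline.interfaces; adjust the path if your version differs.
from nexusml.core.pipeline.interfaces import (
    DataLoader,
    DataPreprocessor,
    FeatureEngineer,
    ModelBuilder,
    ModelTrainer,
    ModelEvaluator,
    ModelSerializer,
    Predictor,
)

# Import component implementations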
from nexusml.core.pipeline.components.data_loader import CSVDataLoader, ExcelDataLoader
from nexusml.core.pipeline.components.data_preprocessor import StandardPreprocessor
from nexusml.core.pipeline.components.feature_engineer import TextFeatureEngineer
from nexusml.core.pipeline.components.model_builder import RandomForestModelBuilder
from nexusml.core.pipeline.components.model_trainer import StandardModelTrainer
from nexusml.core.pipeline.components.model_evaluator import StandardModelEvaluator
from nexusml.core.pipeline.components.model_serializer import PickleModelSerializer
from nexusml.core.pipeline.components.predictor import StandardPredictor

# Create a registry and container
registry = ComponentRegistry()
container = DIContainer()

# Register components
registry.register(DataLoader, "csv", CSVDataLoader)
registry.register(DataLoader, "excel", ExcelDataLoader)
registry.register(DataPreprocessor, "standard", StandardPreprocessor)
registry.register(FeatureEngineer, "text", TextFeatureEngineer)
registry.register(ModelBuilder, "random_forest", RandomForestModelBuilder)
registry.register(ModelTrainer, "standard", StandardModelTrainer)
registry.register(ModelEvaluator, "standard", StandardModelEvaluator)
registry.register(ModelSerializer, "pickle", PickleModelSerializer)
registry.register(Predictor, "standard", StandardPredictor)

# Set default implementations
registry.set_default_implementation(DataLoader, "excel")
registry.set_default_implementation(DataPreprocessor, "standard")
registry.set_default_implementation(FeatureEngineer, "text")
registry.set_default_implementation(ModelBuilder, "random_forest")
registry.set_default_implementation(ModelTrainer, "standard")
registry.set_default_implementation(ModelEvaluator, "standard")
registry.set_default_implementation(ModelSerializer, "pickle")
registry.set_default_implementation(Predictor, "standard")

# Create a factory and orchestrator
factory = PipelineFactory(registry, container)
context = PipelineContext()
orchestrator = PipelineOrchestrator(factory, context)
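
To make the registration step more concrete, here is a minimal sketch of how a registry of this kind could work: it maps an interface to named implementations and remembers one default per interface. `MiniRegistry`, `Loader`, `CsvLoader`, and `ExcelLoader` are hypothetical stand-ins, and `get_default_implementation` is an assumed method name. This is not the NexusML `ComponentRegistry` implementation.

```python
from typing import Dict


class MiniRegistry:
    """Illustrative stand-in for a component registry (not the NexusML class)."""

    def __init__(self) -> None:
        self._implementations: Dict[type, Dict[str, type]] = {}
        self._defaults: Dict[type, str] = {}

    def register(self, interface: type, name: str, implementation: type) -> None:
        """Store an implementation under (interface, name)."""
        self._implementations.setdefault(interface, {})[name] = implementation

    def set_default_implementation(self, interface: type, name: str) -> None:
        """Mark a previously registered implementation as the default."""
        if name not in self._implementations.get(interface, {}):
            raise KeyError(f"{name!r} is not registered for {interface.__name__}")
        self._defaults[interface] = name

    def get_default_implementation(self, interface: type) -> type:
        """Look up the default implementation class for an interface."""
        return self._implementations[interface][self._defaults[interface]]


class Loader: ...
class CsvLoader(Loader): ...
class ExcelLoader(Loader): ...


demo_registry = MiniRegistry()
demo_registry.register(Loader, "csv", CsvLoader)
demo_registry.register(Loader, "excel", ExcelLoader)
demo_registry.set_default_implementation(Loader, "excel")

print(demo_registry.get_default_implementation(Loader).__name__)
```

Keying on the interface rather than on the concrete class is what lets a factory ask for "a data loader" without knowing which implementation it will get.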

## Train a Model with the New Architecture

Now, let's train a model using the new architecture.

In [None]:
# Train a model using the new architecture
try:
    start_time = time.time()
    new_model, new_metrics = orchestrator.train_model(
        data_path=data_path,
        test_size=0.3,
        random_state=42,
        optimize_hyperparameters=False,
        output_dir="../outputs/models",
        model_name="new_architecture_model",
    )
    new_training_time = time.time() - start_time
    
    print("Model training with new architecture completed successfully")
    print(f"Training time: {new_training_time:.2f} seconds")
    print("Metrics:")
    for key, value in new_metrics.items():
        print(f"  {key}: {value}")
except Exception as e:
    print(f"Error training model with new architecture: {e}")

## Compare Training Performance

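Let's compare training time and evaluation metrics for the old and new architectures. This cell assumes that both training cells above completed without errors; otherwise the timing and metrics variables will be undefined.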

In [None]:
# Compare training times
print(f"Old architecture training time: {old_training_time:.2f} seconds")
print(f"New architecture training time: {new_training_time:.2f} seconds")
print(f"Difference: {new_training_time - old_training_time:.2f} seconds")
print(f"Ratio: {new_training_time / old_training_time:.2f}x")

# Compare metrics
print("\nMetrics comparison:")
all_metrics = set(old_metrics.keys()) | set(new_metrics.keys())
for metric in sorted(all_metrics):  # sorted for a stable output order
    old_value = old_metrics.get(metric, "N/A")
    new_value = new_metrics.get(metric, "N/A")
    print(f"  {metric}: Old = {old_value}, New = {new_value}")
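
The comparison above divides by the old training time and needs both timings to exist. A more defensive version could look like the sketch below. `describe_speed` is a hypothetical helper, and passing `None` for a run that failed is an assumed convention, not something the training functions do for you.

```python
from typing import Optional


def describe_speed(old_seconds: Optional[float], new_seconds: Optional[float]) -> str:
    """Summarize two durations, tolerating a missing run or a zero baseline."""
    if old_seconds is None or new_seconds is None:
        return "Comparison unavailable: at least one run did not finish"
    diff = new_seconds - old_seconds
    if old_seconds == 0:
        return f"Difference: {diff:.2f} seconds (ratio undefined)"
    return f"Difference: {diff:.2f} seconds, ratio: {new_seconds / old_seconds:.2f}x"


print(describe_speed(2.0, 3.0))
print(describe_speed(None, 3.0))
```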

## Convert Old Model to New Architecture

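Now, let's make the old model usable from the new architecture. Rather than retraining, we wrap it in `OldModelAdapter`, which follows the adapter pattern, and save the wrapped model with the new architecture's serializer.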

In [None]:
# Convert old model to new architecture
try:
    # Create an adapter for the old model
    adapted_model = OldModelAdapter(old_model)
    
    # Save the adapted model
    model_serializer = factory.create_model_serializer()
    adapted_model_path = "../outputs/models/adapted_model.pkl"
    model_serializer.save_model(adapted_model, adapted_model_path)
    
    print(f"Old model adapted to new architecture and saved to {adapted_model_path}")
except Exception as e:
    print(f"Error adapting old model: {e}")
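
For readers unfamiliar with the adapter pattern, the sketch below shows the general idea with toy classes: the adapter holds a reference to the legacy object and translates the method the new code calls into the method the legacy object provides. `LegacyClassifier`, `classify_text`, and `LegacyAdapter` are hypothetical names; how `OldModelAdapter` is implemented inside NexusML is not shown in this notebook.

```python
from typing import Iterable, List


class LegacyClassifier:
    """Hypothetical old-style model that exposes its own method name."""

    def classify_text(self, text: str) -> str:
        return "pump" if "pump" in text.lower() else "other"


class LegacyAdapter:
    """Wraps a legacy model behind the predict() method that new code expects."""

    def __init__(self, legacy_model: LegacyClassifier) -> None:
        self._legacy_model = legacy_model

    def predict(self, descriptions: Iterable[str]) -> List[str]:
        # Delegate each item to the legacy method; the model itself is unchanged.
        return [self._legacy_model.classify_text(text) for text in descriptions]


demo_adapter = LegacyAdapter(LegacyClassifier())
print(demo_adapter.predict(["Centrifugal Pump for chilled water", "Air Handling Unit"]))
```

Because the legacy model is wrapped rather than rewritten, its predictions should stay the same; only the calling convention changes.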

## Make Predictions with Both Architectures

Let's make predictions with all three models: the old model, the new model, and the adapted old model.

In [None]:
# Create sample data for prediction
prediction_data = pd.DataFrame({
    "equipment_tag": ["AHU-01", "CHW-01", "P-01"],
    "manufacturer": ["Trane", "Carrier", "Armstrong"],
    "model": ["M-1000", "C-2000", "A-3000"],
    "description": [
        "Air Handling Unit with cooling coil",
        "Centrifugal Chiller for HVAC system",
        "Centrifugal Pump for chilled water",
    ],
})

# Make predictions with old architecture
try:
    start_time = time.time()
    old_predictions = predict_old(
        model=old_model,
        data=prediction_data,
        output_path="../outputs/old_predictions.csv",
    )
    old_prediction_time = time.time() - start_time
    
    print("Predictions with old architecture completed successfully")
    print(f"Prediction time: {old_prediction_time:.2f} seconds")
    print("Sample predictions:")
    display(old_predictions)
except Exception as e:
    print(f"Error making predictions with old architecture: {e}")

# Make predictions with new architecture
try:
    start_time = time.time()
    new_predictions = orchestrator.predict(
        model=new_model,
        data=prediction_data,
        output_path="../outputs/new_predictions.csv",
    )
    new_prediction_time = time.time() - start_time
    
    print("\nPredictions with new architecture completed successfully")
    print(f"Prediction time: {new_prediction_time:.2f} seconds")
    print("Sample predictions:")
    display(new_predictions)
except Exception as e:
    print(f"Error making predictions with new architecture: {e}")

# Make predictions with adapted model
try:
    start_time = time.time()
    adapted_predictions = orchestrator.predict(
        model=adapted_model,
        data=prediction_data,
        output_path="../outputs/adapted_predictions.csv",
    )
    adapted_prediction_time = time.time() - start_time
    
    print("\nPredictions with adapted model completed successfully")
    print(f"Prediction time: {adapted_prediction_time:.2f} seconds")
    print("Sample predictions:")
    display(adapted_predictions)
except Exception as e:
    print(f"Error making predictions with adapted model: {e}")

## Compare Prediction Results

Let's compare the prediction results from the different models.

In [None]:
# Compare prediction times
print(f"Old architecture prediction time: {old_prediction_time:.2f} seconds")
print(f"New architecture prediction time: {new_prediction_time:.2f} seconds")
print(f"Adapted model prediction time: {adapted_prediction_time:.2f} seconds")

# Compare prediction results
print("\nPrediction comparison:")
for i in range(len(prediction_data)):
    print(f"Item {i+1}: {prediction_data.iloc[i]['description']}")
    
    # Get predictions from each model
    old_pred = old_predictions.iloc[i].to_dict() if i < len(old_predictions) else {}
    new_pred = new_predictions.iloc[i].to_dict() if i < len(new_predictions) else {}
    adapted_pred = adapted_predictions.iloc[i].to_dict() if i < len(adapted_predictions) else {}
    
    # Compare predictions
    all_keys = set(old_pred.keys()) | set(new_pred.keys()) | set(adapted_pred.keys())
    for key in sorted(all_keys):  # sorted for a stable output order
        old_val = old_pred.get(key, "N/A")
        new_val = new_pred.get(key, "N/A")
        adapted_val = adapted_pred.get(key, "N/A")
        print(f"  {key}: Old = {old_val}, New = {new_val}, Adapted = {adapted_val}")
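
Reading the printed comparison row by row gets tedious for larger inputs. One way to summarize it is an agreement rate per output column, as in the sketch below. `agreement_rate` is a hypothetical helper and `category_name` is an assumed column name, since the actual prediction columns depend on the model. It works on lists of row dictionaries, which you can get from a DataFrame with `df.to_dict("records")`.

```python
from typing import Dict, List


def agreement_rate(
    rows_a: List[Dict[str, object]], rows_b: List[Dict[str, object]], key: str
) -> float:
    """Fraction of paired rows whose value for `key` is equal in both lists."""
    pairs = list(zip(rows_a, rows_b))
    if not pairs:
        return 0.0
    matches = sum(
        1 for a, b in pairs if key in a and key in b and a[key] == b[key]
    )
    return matches / len(pairs)


# Toy rows standing in for old_predictions and new_predictions.
old_rows = [{"category_name": "AHU"}, {"category_name": "Chiller"}, {"category_name": "Pump"}]
new_rows = [{"category_name": "AHU"}, {"category_name": "Chiller"}, {"category_name": "Fan"}]

print(f"{agreement_rate(old_rows, new_rows, 'category_name'):.2f}")
```

For the adapted model, an agreement rate below 1.0 against the old model would suggest that the adapter changes behavior and is worth investigating.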

## Conclusion

In this notebook, we demonstrated how to migrate from the old NexusML architecture to the new refactored architecture. We trained models with both architectures, converted an old model to the new architecture using the adapter pattern, and made predictions with all three models.

The new architecture provides several advantages:
- More modular and maintainable code
- Better separation of concerns
- Improved testability
- Greater configurability and extensibility

The adapter pattern allows for a smooth transition from the old architecture to the new one, enabling you to continue using existing models while gradually migrating to the new architecture.