# NexusML Modular Notebook Template

This template shows how to use the `nexusml.utils.notebook_utils` module to keep NexusML experiment notebooks modular and maintainable.

In [None]:
import os
import sys
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Add the project root to the Python path if needed
project_root = os.path.abspath(os.path.join(os.getcwd(), '..'))
if project_root not in sys.path:
    sys.path.append(project_root)

# Import the notebook utilities
from nexusml.utils.notebook_utils import (
    setup_notebook_environment,
    discover_and_load_data,
    explore_data,
    setup_pipeline_components,
    visualize_metrics,
    visualize_confusion_matrix
)
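
The path setup above assumes the notebook is started from a directory exactly one level below the project root. As a minimal sketch of a more robust alternative, the helper below walks up the directory tree until it finds a marker file. The name `find_project_root` and the choice of `pyproject.toml` as the marker are assumptions for illustration; neither is part of NexusML.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "pyproject.toml") -> Optional[Path]:
    """Return the nearest ancestor of `start` (inclusive) that contains `marker`."""
    start = start.resolve()
    # Check the starting directory first, then each parent up to the filesystem root.
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    return None  # No marker found anywhere above `start`
```

Calling `find_project_root(Path.cwd())` could then replace the hard-coded `'..'`, with a fallback to the original behavior when it returns `None`.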

## Setup Environment

First, let's set up the notebook environment with common configurations.

In [None]:
# Set up the notebook environment
paths = setup_notebook_environment()
print("Project paths:")
for name, path in paths.items():
    print(f"  {name}: {path}")

## Configuration

Set up the configuration for our experiment.

In [None]:
# Set the NEXUSML_CONFIG environment variable to point to the absolute path of the configuration file
config_file_path = os.path.abspath(os.path.join(project_root, 'nexusml/config/nexusml_config.yml'))
os.environ['NEXUSML_CONFIG'] = config_file_path
print(f"Setting NEXUSML_CONFIG to: {config_file_path}")

# Get the configuration provider
from nexusml.core.config.provider import ConfigurationProvider
config_provider = ConfigurationProvider()

# Get the configuration with error handling
try:
    config = config_provider.config
    print("Configuration loaded successfully")
    print(f"Feature Engineering Configuration: {len(config.feature_engineering.text_combinations)} text combinations, {len(config.feature_engineering.numeric_columns)} numeric columns")
    print(f"Classification Configuration: {len(config.classification.classification_targets)} classification targets")
    print(f"Data Configuration: {len(config.data.required_columns)} required columns")
except Exception as e:
    print(f"Error loading configuration: {e}")
    print("Creating default configuration...")
    from nexusml.core.config.configuration import NexusMLConfig
    config = NexusMLConfig()
    config_provider.set_config(config)
    print("Default configuration created successfully")
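
The fallback above only runs after the provider has already failed, and the error message may not say why. A minimal sketch, using only the standard library, of checking the `NEXUSML_CONFIG` path up front so that a missing file is reported clearly. `resolve_config_path` is a hypothetical helper, not a NexusML function.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_config_path(env: Mapping[str, str], var: str = "NEXUSML_CONFIG") -> Optional[Path]:
    """Return the configured path if the variable is set and the file exists."""
    raw = env.get(var)
    if not raw:
        return None  # Variable unset or empty
    path = Path(raw).expanduser()
    return path if path.is_file() else None


config_path = resolve_config_path(os.environ)
if config_path is None:
    print("NEXUSML_CONFIG is unset or points to a missing file; defaults will be used")
else:
    print(f"Using configuration file: {config_path}")
```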

## Data Loading and Exploration

Now, let's load the data and explore it using the modular utilities.

In [None]:
# Discover and load data
data, data_path = discover_and_load_data()

# Display the first few rows
data.head()

In [None]:
# Explore the data
exploration_results = explore_data(data)

## Pipeline Setup

Let's set up the pipeline components for our experiment.

In [None]:
# Set up pipeline components
pipeline_components = setup_pipeline_components()

# Extract components for easier access
registry = pipeline_components["registry"]
container = pipeline_components["container"]
factory = pipeline_components["factory"]
context = pipeline_components["context"]
orchestrator = pipeline_components["orchestrator"]

print("Pipeline components set up successfully")

## Model Training

Now, let's train a model using the pipeline.

In [None]:
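# Train a model; reset the results first so later cells never reuse stale values from an earlier run
model, metrics = None, None
try: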
    model, metrics = orchestrator.train_model(
        data_path=data_path,
        test_size=0.3,
        random_state=42,
        optimize_hyperparameters=True,
        output_dir="../outputs/models",
        model_name="equipment_classifier_modular",
    )
    
    print("Model training completed successfully")
    print(f"Model saved to: {orchestrator.context.get('model_path')}")
    print(f"Metadata saved to: {orchestrator.context.get('metadata_path')}")
    print("Metrics:")
    for key, value in metrics.items():
        print(f"  {key}: {value}")
except Exception as e:
    print(f"Error training model: {e}")
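
By the usual convention for these parameter names, `test_size=0.3` holds out 30% of the rows for testing and `random_state=42` fixes the shuffle so the split is the same on every run. How `train_model` splits the data internally is not shown in this notebook; the sketch below only illustrates the idea with the standard library, and `split_indices` is a hypothetical helper.

```python
import random
from typing import List, Tuple


def split_indices(n_rows: int, test_size: float, random_state: int) -> Tuple[List[int], List[int]]:
    """Shuffle row indices with a fixed seed and hold out a test fraction."""
    indices = list(range(n_rows))
    # A dedicated Random instance keeps the shuffle reproducible and isolated
    # from any other use of the global random state.
    random.Random(random_state).shuffle(indices)
    n_test = round(n_rows * test_size)
    return indices[n_test:], indices[:n_test]  # (train, test)


train_idx, test_idx = split_indices(n_rows=10, test_size=0.3, random_state=42)
print(len(train_idx), len(test_idx))  # 7 3
```

Re-running the cell with the same `random_state` yields the same two index lists, which is what makes metric comparisons between experiments meaningful.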

## Model Evaluation

Let's evaluate the model in more detail.

In [None]:
# Evaluate the model
try:
    results = orchestrator.evaluate(
        model=model,
        data_path=data_path,
        output_path="../outputs/evaluation_results_modular.json",
    )
    
    print("Evaluation completed successfully")
    print("Evaluation results saved to: ../outputs/evaluation_results_modular.json")
    print("Metrics:")
    for key, value in results["metrics"].items():
        print(f"  {key}: {value}")
except Exception as e:
    print(f"Error evaluating model: {e}")

## Visualization

Let's visualize the results using the modular visualization utilities.

In [None]:
# Visualize the training metrics and, if present, the confusion matrix
try:
    visualize_metrics(metrics)

    # Draw the confusion matrix only if the evaluation results include one
    cm = results.get('analysis', {}).get('confusion_matrix')
    if cm is not None:
        visualize_confusion_matrix(cm)
except Exception as e:
    print(f"Error visualizing results: {e}")
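
A confusion matrix counts, for each true class, how often each class was predicted. The format NexusML stores under `results['analysis']['confusion_matrix']` is not shown here, so the following is only a standard-library sketch of the underlying computation. `build_confusion_matrix` and the sample labels are hypothetical.

```python
from typing import List, Sequence, Tuple


def build_confusion_matrix(
    y_true: Sequence[str], y_pred: Sequence[str]
) -> Tuple[List[str], List[List[int]]]:
    """Return sorted labels and a matrix with true classes as rows, predictions as columns."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return labels, matrix


# Illustrative labels only: one "AHU" row is misclassified as "Pump".
labels, matrix = build_confusion_matrix(
    ["AHU", "AHU", "Pump", "Chiller"],
    ["AHU", "Pump", "Pump", "Chiller"],
)
```

Correct predictions land on the diagonal, so off-diagonal counts show which equipment classes the model confuses with each other.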

## Prediction

Finally, let's make predictions on new data.

In [None]:
# Create sample data for prediction
prediction_data = pd.DataFrame({
    "equipment_tag": ["AHU-01", "CHW-01", "P-01"],
    "manufacturer": ["Trane", "Carrier", "Armstrong"],
    "model": ["M-1000", "C-2000", "A-3000"],
    "description": [
        "Air Handling Unit with cooling coil",
        "Centrifugal Chiller for HVAC system",
        "Centrifugal Pump for chilled water",
    ],
})

# Make predictions
try:
    predictions = orchestrator.predict(
        model=model,
        data=prediction_data,
        output_path="../outputs/predictions_modular.csv",
    )
    
    print("Predictions completed successfully")
    print(f"Predictions saved to: {orchestrator.context.get('output_path')}")
    print("Sample predictions:")
    display(predictions)
except Exception as e:
    print(f"Error making predictions: {e}")
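
Before calling `predict`, it can help to confirm that the input frame carries the columns the model expects. A minimal sketch, assuming the expected names are available as a plain list of strings (for example derived from `config.data.required_columns`, whose element type is not shown in this notebook). `missing_columns` is a hypothetical helper, and the `required` list below is illustrative only.

```python
from typing import Iterable, List


def missing_columns(available: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return the required column names absent from `available`, preserving order."""
    present = set(available)
    return [name for name in required if name not in present]


# Column names from the sample frame above; "service_life" is a made-up requirement.
available = ["equipment_tag", "manufacturer", "model", "description"]
required = ["equipment_tag", "description", "service_life"]
print(missing_columns(available, required))  # ['service_life']
```

With a DataFrame, `available` would be `prediction_data.columns`; an empty result means the frame is safe to pass on.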

## Conclusion

This notebook used the NexusML package for equipment classification with a modular approach: we loaded data, trained and evaluated a model, visualized the results, and made predictions using the new architecture and the `notebook_utils` helpers.

The modular approach provides several benefits:
1. Better code organization and reusability
2. Improved maintainability
3. Consistent error handling
4. Simplified data discovery and loading
5. Standardized visualization
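
The error-handling benefit can be taken one step further: the training, evaluation, visualization, and prediction cells all repeat the same `try`/`except` pattern. A minimal sketch of factoring that pattern into one helper follows. `run_step` is hypothetical and not part of `notebook_utils`.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_step(name: str, func: Callable[..., T], *args, **kwargs) -> Optional[T]:
    """Run one notebook step, report any failure, and return None on error."""
    try:
        result = func(*args, **kwargs)
    except Exception as exc:
        print(f"Error during {name}: {exc}")
        return None
    print(f"{name} completed successfully")
    return result
```

A cell could then call, for example, `run_step("evaluation", orchestrator.evaluate, model=model, data_path=data_path)` and check the result for `None` before continuing.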