# NexusML Enhanced Modular Notebook Template

This enhanced template demonstrates how to use the notebook_utils module to create a more modular and maintainable notebook for NexusML experiments. It includes improved error handling, better documentation, and a more streamlined workflow.

## 1. Environment Setup

First, let's initialize the notebook environment to ensure proper imports and path configuration.

In [None]:
# Initialize the notebook environment
# This ensures that the nexusml package can be imported correctly
%run init_notebook.py
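
`init_notebook.py` is not shown in this template. A script like this usually locates the project root and puts it on `sys.path`. The following is a minimal sketch of that idea. It assumes the project root is the first ancestor directory that contains a `nexusml` package folder. `find_project_root` and `add_to_sys_path` are hypothetical helpers, not part of NexusML, and the real script may use different rules.

```python
import sys
from pathlib import Path


def find_project_root(start: Path, marker: str = "nexusml") -> Path:
    """Walk upward from `start` until a directory containing a `marker` folder is found.

    Hypothetical helper: the real init_notebook.py may detect the root differently.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No ancestor of {start} contains a '{marker}' directory")


def add_to_sys_path(path: Path) -> None:
    """Prepend `path` to sys.path once, so `import nexusml` resolves to the local checkout."""
    path_str = str(path)
    if path_str not in sys.path:
        sys.path.insert(0, path_str)
```

In a notebook you would call `add_to_sys_path(find_project_root(Path.cwd()))` before importing `nexusml`.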

In [None]:
# Import standard libraries
import os
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
from IPython.display import display  # display() renders DataFrames in several cells below

# Import notebook utilities
from nexusml.utils.notebook_utils import (
    setup_notebook_environment,
    discover_and_load_data,
    explore_data,
    setup_pipeline_components,
    visualize_metrics,
    visualize_confusion_matrix
)

In [None]:
# Set up the notebook environment with common configurations
paths = setup_notebook_environment()

# Display available paths
print("Project paths:")
for name, path in paths.items():
    print(f"  {name}: {path}")

## 2. Configuration Setup

Point NexusML at the configuration file for this experiment and load it. If loading fails, the notebook falls back to a default configuration.

In [None]:
# Set the NEXUSML_CONFIG environment variable to point to the absolute path of the configuration file
project_root = paths["project_root"]
config_file_path = os.path.abspath(os.path.join(project_root, 'nexusml/config/nexusml_config.yml'))
os.environ['NEXUSML_CONFIG'] = config_file_path
print(f"Setting NEXUSML_CONFIG to: {config_file_path}")
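
The cell above sets `NEXUSML_CONFIG` even when the file does not exist, so a wrong path only shows up later as a loading error. The following minimal sketch fails early instead, using only the standard library. `set_config_env` is a hypothetical helper, and its default relative path simply mirrors the one used above.

```python
import os
from pathlib import Path


def set_config_env(project_root, relative_path="nexusml/config/nexusml_config.yml",
                   env_var="NEXUSML_CONFIG"):
    """Resolve the config file to an absolute path, verify it exists, then export it.

    Hypothetical helper: NexusML itself only needs the environment variable to be set.
    """
    config_path = (Path(project_root) / relative_path).resolve()
    if not config_path.is_file():
        # Fail before touching the environment, so a bad path is never exported
        raise FileNotFoundError(f"Configuration file not found: {config_path}")
    os.environ[env_var] = str(config_path)
    return config_path
```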

In [None]:
# Load the configuration, falling back to defaults if loading fails
config_provider = None  # defined up front so the fallback can tell whether a provider exists
try:
    from nexusml.core.config.provider import ConfigurationProvider
    config_provider = ConfigurationProvider()
    config = config_provider.config
    print("Configuration loaded successfully")
    
except Exception as e:
    print(f"Error loading configuration: {e}")
    print("Creating default configuration...")
    
    from nexusml.core.config.configuration import NexusMLConfig
    config = NexusMLConfig()
    
    # If the import or the constructor failed, there is no provider to register the default with
    if config_provider is not None:
        config_provider.set_config(config)
    print("Default configuration created successfully")

# Display key configuration sections. This is kept outside the try block above so that
# a display error cannot replace a successfully loaded configuration with defaults.
try:
    if hasattr(config, 'feature_engineering'):
        print(f"Feature Engineering Configuration: {len(config.feature_engineering.text_combinations)} text combinations, "
              f"{len(config.feature_engineering.numeric_columns)} numeric columns")
    
    if hasattr(config, 'classification'):
        print(f"Classification Configuration: {len(config.classification.classification_targets)} classification targets")
    
    if hasattr(config, 'data'):
        print(f"Data Configuration: {len(config.data.required_columns)} required columns")
except (AttributeError, TypeError) as e:
    print(f"Could not summarize configuration: {e}")
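
The configuration cell above uses a load-or-default pattern. The following is a minimal, library-independent sketch of that pattern. It assumes a loader callable that raises on failure and a factory that builds safe defaults. `DefaultConfig` and `load_or_default` are hypothetical stand-ins, not NexusML classes.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


@dataclass
class DefaultConfig:
    """Hypothetical stand-in for a configuration object with safe defaults."""
    required_columns: List[str] = field(default_factory=list)
    classification_targets: List[str] = field(default_factory=list)


def load_or_default(loader: Callable[[], T], default_factory: Callable[[], T]) -> Tuple[T, bool]:
    """Return (config, used_default), falling back to default_factory() if loader() raises."""
    try:
        return loader(), False
    except Exception as exc:  # broad on purpose: any load failure triggers the fallback
        print(f"Error loading configuration: {exc}; using defaults")
        return default_factory(), True
```

The second value in the returned tuple tells later cells whether they are running on defaults. This makes it easier to warn that results may not reflect the intended configuration.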

## 3. Data Loading and Exploration

Now, let's load the data and explore it using the modular utilities. This section demonstrates how to discover available data files, load them, and perform basic exploratory data analysis.

In [None]:
# Discover and load data
try:
    # The discover_and_load_data function will automatically find available data files
    # and load the specified one (or the first one if none is specified)
    data, data_path = discover_and_load_data()
    
    # Display the first few rows
    print("\nFirst few rows of the data:")
    display(data.head())
    
except FileNotFoundError as e:
    print(f"Error: {e}")
    print("Please ensure that data files are available in the expected locations.")
    print("You can specify custom search paths using the search_paths parameter.")
    
except Exception as e:
    print(f"Unexpected error loading data: {e}")
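
According to the comments above, `discover_and_load_data` finds candidate data files and loads the requested one, or the first one it finds. The following minimal sketch covers only the discovery step. It assumes that data files are identified by extension inside a list of search directories. `discover_data_files` and `pick_data_file` are hypothetical, and the real utility's search rules may differ.

```python
from pathlib import Path
from typing import Iterable, List, Optional

DEFAULT_EXTENSIONS = (".csv", ".xlsx", ".xls")  # assumed set of data file types


def discover_data_files(search_paths: Iterable, extensions=DEFAULT_EXTENSIONS) -> List[Path]:
    """Return data files (non-recursive) from the existing search directories, sorted by path."""
    found = []
    for directory in map(Path, search_paths):
        if not directory.is_dir():
            continue  # skip search locations that do not exist
        found.extend(p for p in directory.iterdir()
                      if p.is_file() and p.suffix.lower() in extensions)
    return sorted(found)


def pick_data_file(files: List[Path], name: Optional[str] = None) -> Path:
    """Pick the file whose name equals `name`, or the first file if no name is given."""
    if not files:
        raise FileNotFoundError("No data files found in the search paths")
    if name is None:
        return files[0]
    for path in files:
        if path.name == name:
            return path
    raise FileNotFoundError(f"{name} not found among {[p.name for p in files]}")
```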

In [None]:
# Explore the data
try:
    # The explore_data function provides basic statistics and information about the data
    exploration_results = explore_data(data)
    
    # Additional custom exploration
    print("\nUnique values in categorical columns:")
    for col in data.select_dtypes(include=['object']).columns:
        print(f"  {col}: {data[col].nunique()} unique values")
        
except NameError:
    print("Data not loaded. Please load data first.")
    
except Exception as e:
    print(f"Error exploring data: {e}")
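
The custom exploration above counts unique values in object columns. The following minimal sketch does the same without pandas, for rows read as dictionaries (for example from `csv.DictReader`), and also counts missing values. `profile_columns` is hypothetical, and treating empty strings as missing is an assumption.

```python
from typing import Dict, Iterable


def profile_columns(rows: Iterable[Dict[str, str]]) -> Dict[str, Dict[str, int]]:
    """Count missing and distinct non-missing values for every column."""
    missing: Dict[str, int] = {}
    distinct: Dict[str, set] = {}
    for row in rows:
        for column, value in row.items():
            missing.setdefault(column, 0)
            distinct.setdefault(column, set())
            if value is None or str(value).strip() == "":
                missing[column] += 1  # empty strings count as missing (assumption)
            else:
                distinct[column].add(value)
    return {col: {"missing": missing[col], "unique": len(distinct[col])} for col in missing}
```

With a DataFrame already loaded, `data.isna().sum()` and `data.nunique()` give the pandas equivalents. Note that `nunique` excludes NaN by default.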

## 4. Pipeline Setup

Let's set up the pipeline components for our experiment. This section demonstrates how to create and configure the NexusML pipeline components.

In [None]:
# Set up pipeline components
try:
    # The setup_pipeline_components function creates and configures all necessary pipeline components
    pipeline_components = setup_pipeline_components()
    
    # Extract components for easier access
    registry = pipeline_components["registry"]
    container = pipeline_components["container"]
    factory = pipeline_components["factory"]
    context = pipeline_components["context"]
    orchestrator = pipeline_components["orchestrator"]
    
    print("Pipeline components set up successfully")
    
    # Display registered components
    # Note: _registry is a private attribute and may change between NexusML versions
    print("\nRegistered components:")
    for interface, implementations in registry._registry.items():
        interface_name = interface.__name__
        print(f"  {interface_name}:")
        for name, impl in implementations.items():
            print(f"    - {name}: {impl.__name__}")
    
except Exception as e:
    print(f"Error setting up pipeline components: {e}")
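
The listing above reads `registry._registry`, which is a private attribute. The following minimal sketch shows a registry keyed by interface and name that offers a public listing method instead. `ComponentRegistry`, `register`, `get` and `list_components` are hypothetical and do not describe NexusML's actual registry API.

```python
from typing import Dict, List


class ComponentRegistry:
    """Map an interface (typically an abstract base class) to named implementations."""

    def __init__(self) -> None:
        self._registry: Dict[type, Dict[str, type]] = {}

    def register(self, interface: type, name: str, implementation: type) -> None:
        # Reject classes that do not actually implement the interface
        if not issubclass(implementation, interface):
            raise TypeError(f"{implementation.__name__} does not implement {interface.__name__}")
        self._registry.setdefault(interface, {})[name] = implementation

    def get(self, interface: type, name: str) -> type:
        try:
            return self._registry[interface][name]
        except KeyError:
            raise KeyError(f"No {interface.__name__} registered as '{name}'") from None

    def list_components(self) -> Dict[str, List[str]]:
        """Public, read-only view: interface name -> sorted implementation names."""
        return {iface.__name__: sorted(impls) for iface, impls in self._registry.items()}
```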

## 5. Model Training

Now, let's train a model using the pipeline. This section demonstrates how to use the orchestrator to train a model with the loaded data.

In [None]:
# Train a model
try:
    # Define the output directory for the model
    output_dir = os.path.join(paths["project_root"], "outputs", "models")
    os.makedirs(output_dir, exist_ok=True)
    
    # Train the model using the orchestrator
    model, metrics = orchestrator.train_model(
        data_path=data_path,
        test_size=0.3,
        random_state=42,
        optimize_hyperparameters=True,  # hyperparameter search can make training much slower; use False for a quick run
        output_dir=output_dir,
        model_name="equipment_classifier_enhanced",
    )
    
    print("Model training completed successfully")
    print(f"Model saved to: {orchestrator.context.get('model_path')}")
    print(f"Metadata saved to: {orchestrator.context.get('metadata_path')}")
    
    print("\nMetrics:")
    for key, value in metrics.items():
        print(f"  {key}: {value}")
    
except Exception as e:
    print(f"Error training model: {e}")
    print("\nTroubleshooting tips:")
    print("1. Check that the data contains the expected columns")
    print("2. Verify that the configuration is set up correctly")
    print("3. Ensure that all required dependencies are installed")
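
The `test_size=0.3` and `random_state=42` arguments control how the data is split before training. The following minimal standard-library sketch of a seeded, shuffled hold-out split shows what those two parameters mean. It is an illustration only, not the orchestrator's implementation, which may differ (for example, by stratifying on the target). Rounding the test count up follows scikit-learn's convention for a float `test_size`.

```python
import math
import random
from typing import List, Tuple


def train_test_indices(n_samples: int, test_size: float = 0.3,
                       random_state: int = 42) -> Tuple[List[int], List[int]]:
    """Shuffle row indices with a seeded RNG and hold out ceil(test_size * n) rows for testing."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")
    n_test = math.ceil(test_size * n_samples)
    indices = list(range(n_samples))
    random.Random(random_state).shuffle(indices)  # same seed -> same split on every run
    return indices[n_test:], indices[:n_test]
```

Fixing the seed is what makes metrics comparable across reruns of this notebook. Changing `random_state` gives a different but equally valid split.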

## 6. Model Evaluation

Let's evaluate the model in more detail. This section demonstrates how to use the orchestrator to evaluate the trained model.

In [None]:
# Evaluate the model
try:
    # Define the output path for evaluation results
    evaluation_output_path = os.path.join(paths["project_root"], "outputs", "evaluation_results_enhanced.json")
    
    # Evaluate the model using the orchestrator
    results = orchestrator.evaluate(
        model=model,
        data_path=data_path,
        output_path=evaluation_output_path,
    )
    
    print("Evaluation completed successfully")
    print(f"Evaluation results saved to: {evaluation_output_path}")
    
    print("\nMetrics:")
    for key, value in results["metrics"].items():
        print(f"  {key}: {value}")
    
except NameError as e:
    print(f"Error: {e}")
    print("Model not trained. Please train the model first.")
    
except Exception as e:
    print(f"Error evaluating model: {e}")
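
The exact keys in the evaluation results' `metrics` dictionary depend on NexusML. To help sanity-check the reported numbers, the following minimal sketch computes two common classification metrics, accuracy and macro-averaged F1, from lists of labels. It is not necessarily how NexusML computes or names its metrics.

```python
from collections import Counter
from typing import Sequence


def _check_lengths(y_true: Sequence, y_pred: Sequence) -> None:
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of predictions that exactly match the true label."""
    _check_lengths(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence, y_pred: Sequence) -> float:
    """Unweighted mean over classes of F1 = 2TP / (2TP + FP + FN)."""
    _check_lengths(y_true, y_pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed
    labels = set(y_true) | set(y_pred)  # each label occurs at least once, so denominators are > 0
    scores = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in labels]
    return sum(scores) / len(scores)
```

Macro averaging weights every equipment class equally. This makes it more sensitive than accuracy to poor performance on rare classes.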

## 7. Visualization

Let's visualize the results using the modular visualization utilities. This section demonstrates how to create visualizations of the model metrics and confusion matrix.

In [None]:
# Visualize the metrics
try:
    # Visualize metrics using the utility function
    visualize_metrics(metrics)
    
    # Create a confusion matrix if the evaluation results include one
    analysis = results.get('analysis', {})
    if 'confusion_matrix' in analysis:
        visualize_confusion_matrix(analysis['confusion_matrix'])
    
except NameError as e:
    print(f"Error: {e}")
    print("Metrics or results not available. Please train and evaluate the model first.")
    
except KeyError as e:
    print(f"Error: {e} not found in results")
    print("The confusion matrix may not be available in the results.")
    
except Exception as e:
    print(f"Error visualizing results: {e}")
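
`visualize_confusion_matrix` receives a matrix computed by NexusML. For reference, the following minimal sketch shows how a confusion matrix is built from true and predicted labels. It uses the convention, shared by scikit-learn, that rows are true classes and columns are predicted classes. It is worth confirming which orientation NexusML uses before reading errors off the plot.

```python
from typing import List, Optional, Sequence, Tuple


def confusion_matrix(y_true: Sequence, y_pred: Sequence,
                     labels: Optional[Sequence] = None) -> Tuple[List, List[List[int]]]:
    """Return (labels, matrix) where matrix[i][j] counts true labels[i] predicted as labels[j]."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        if t in index and p in index:  # pairs outside `labels` are ignored
            matrix[index[t]][index[p]] += 1
    return list(labels), matrix
```

The diagonal holds correct predictions. An off-diagonal cell `[i][j]` shows how often class `i` was mistaken for class `j`.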

## 8. Feature Importance

Let's examine the feature importance to understand which features are most influential in the model's predictions.

In [None]:
# Visualize feature importance
try:
    # If the model is a scikit-learn Pipeline, feature importances live on its final step
    estimator = model.steps[-1][1] if hasattr(model, 'steps') else model
    
    # Only some estimators (e.g., tree-based models) expose feature_importances_
    if hasattr(estimator, 'feature_importances_'):
        importances = estimator.feature_importances_
        
        # Get feature names from the orchestrator context, if the pipeline stored them there
        feature_names = orchestrator.context.get('feature_names', None)
        
        # Fall back to generic names if none were stored or if their count does not match
        # (vectorized text features can produce many more columns than input fields)
        if feature_names is None or len(feature_names) != len(importances):
            print("Feature names missing or mismatched. Using generic feature names.")
            feature_names = [f"Feature {i}" for i in range(len(importances))]
        
        # Create a DataFrame for visualization, sorted by importance
        importance_df = pd.DataFrame({
            'Feature': feature_names,
            'Importance': importances
        }).sort_values('Importance', ascending=False)
        
        # Display the top features
        print("Top 10 most important features:")
        display(importance_df.head(10))
        
        # Plot the top 20 features
        plt.figure(figsize=(12, 8))
        sns.barplot(x='Importance', y='Feature', data=importance_df.head(20))
        plt.title('Top 20 Feature Importances')
        plt.tight_layout()
        plt.show()
    else:
        print("This model type doesn't provide feature importance information.")
        
except NameError as e:
    print(f"Error: {e}")
    print("Model not trained. Please train the model first.")
    
except Exception as e:
    print(f"Error visualizing feature importance: {e}")
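
A DataFrame built from a list of names and an array of importances fails with a length-mismatch error when the stored feature names do not line up with `feature_importances_`. The following minimal sketch pairs and ranks importances with an explicit length check. `rank_features` is a hypothetical helper, not part of `notebook_utils`.

```python
from typing import List, Optional, Sequence, Tuple


def rank_features(importances: Sequence[float],
                  names: Optional[Sequence[str]] = None,
                  top_k: int = 10) -> List[Tuple[str, float]]:
    """Return the top_k (name, importance) pairs, highest importance first."""
    if names is None:
        names = [f"Feature {i}" for i in range(len(importances))]
    if len(names) != len(importances):
        raise ValueError(f"{len(names)} feature names for {len(importances)} importances")
    # sorted() is stable, so ties keep their original column order
    pairs = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    return [(name, float(value)) for name, value in pairs[:top_k]]
```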

## 9. Prediction

Finally, let's use the trained model to make predictions on new equipment records that were not part of the training data.

In [None]:
# Create sample data for prediction
try:
    # Create a sample DataFrame; its columns must match the input columns the model was trained on
    prediction_data = pd.DataFrame({
        "equipment_tag": ["AHU-01", "CHW-01", "P-01"],
        "manufacturer": ["Trane", "Carrier", "Armstrong"],
        "model": ["M-1000", "C-2000", "A-3000"],
        "description": [
            "Air Handling Unit with cooling coil",
            "Centrifugal Chiller for HVAC system",
            "Centrifugal Pump for chilled water",
        ],
    })
    
    print("Sample prediction data:")
    display(prediction_data)
    
except Exception as e:
    print(f"Error creating sample data: {e}")
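
The sample DataFrame must contain the columns the trained pipeline expects. Otherwise, a missing column usually surfaces as a harder-to-read error deep inside `predict`. The following minimal sketch checks for missing columns up front. The required column set is an assumption copied from the sample data above. In practice it would come from the configuration (possibly `config.data.required_columns`, which Section 2 prints). `missing_columns` is hypothetical.

```python
from typing import Iterable, List

# Assumed input columns, copied from the sample prediction data; not read from NexusML
ASSUMED_REQUIRED_COLUMNS = ("equipment_tag", "manufacturer", "model", "description")


def missing_columns(available: Iterable[str],
                    required: Iterable[str] = ASSUMED_REQUIRED_COLUMNS) -> List[str]:
    """Return the required column names absent from `available`, in required order."""
    present = set(available)
    return [col for col in required if col not in present]
```

Usage in the notebook: `missing = missing_columns(prediction_data.columns)`, then raise a `ValueError` listing `missing` if it is non-empty.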

In [None]:
# Make predictions
try:
    # Define the output path for predictions
    prediction_output_path = os.path.join(paths["project_root"], "outputs", "predictions_enhanced.csv")
    
    # Make predictions using the orchestrator
    predictions = orchestrator.predict(
        model=model,
        data=prediction_data,
        output_path=prediction_output_path,
    )
    
    print("Predictions completed successfully")
    print(f"Predictions saved to: {prediction_output_path}")
    
    print("\nSample predictions:")
    display(predictions)
    
except NameError as e:
    print(f"Error: {e}")
    print("Model not trained or prediction data not created. Please complete the previous steps first.")
    
except Exception as e:
    print(f"Error making predictions: {e}")
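
If predictions come back as a list of dictionaries rather than a DataFrame, or are written outside the orchestrator, the standard library can save them. The following minimal sketch writes prediction rows to CSV and creates parent folders first. `write_predictions` and the example label values are hypothetical.

```python
import csv
from pathlib import Path
from typing import Dict, List


def write_predictions(rows: List[Dict[str, object]], output_path) -> Path:
    """Write prediction rows to CSV, creating parent folders; columns follow first appearance."""
    output_path = Path(output_path)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    fieldnames: List[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    with output_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)  # keys missing from a row are written as empty cells
    return output_path
```

Keeping an identifier such as `equipment_tag` in every row makes each prediction traceable back to its input record.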

## 10. Conclusion

In this notebook, we demonstrated how to use the NexusML package for equipment classification using an enhanced modular approach. We loaded data, trained a model, evaluated it, and made predictions, using the pipeline orchestrator for the modeling steps and the notebook_utils helpers for setup, data loading, exploration, and visualization.

The enhanced modular approach provides several benefits:

1. Better code organization and reusability
2. Improved error handling and troubleshooting
3. Consistent visualization and reporting
4. Simplified data discovery and loading
5. Additional analysis capabilities like feature importance

This template can be used as a starting point for your own NexusML experiments.

## 11. Next Steps

Here are some potential next steps for further exploration:

1. Try different model types and hyperparameters
2. Experiment with different feature engineering techniques
3. Perform cross-validation for more robust evaluation
4. Deploy the model for production use
5. Create a custom pipeline component for specific needs
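
For step 3, cross-validation repeats the train/evaluate cycle over several folds so that every row is used for testing exactly once. The following minimal standard-library sketch generates shuffled k-fold index splits. It follows the same idea as scikit-learn's `KFold(shuffle=True)`, including giving the first `n_samples % n_splits` folds one extra row. However, the exact fold assignment will not match scikit-learn's, and this is not a NexusML API.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, n_splits: int = 5,
                   random_state: int = 42) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of n_splits shuffled folds."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(random_state).shuffle(indices)
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)  # the first `extra` folds get one more row
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size
```

Averaging a metric over the folds, and reporting its spread, gives a more robust estimate than the single 70/30 split used in Section 5.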