# MDTerp Visualization and Analysis Demo

This notebook demonstrates the new visualization and analysis utilities in MDTerp v1.5.0.

**Features covered:**
- Feature importance visualization
- Transition comparison
- Statistical analysis
- Automated report generation
- Data export for external tools

Reference: "Thermodynamics-inspired explanations of artificial intelligence", Shams Mehdi and Pratyush Tiwary, Nature Communications

## Setup

Import required modules and set up paths to MDTerp results.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from MDTerp import visualization, analysis

# Path to your MDTerp results directory
results_dir = './results'  # Change this to your results directory (no trailing slash)
results_file = f'{results_dir}/MDTerp_results_all.pkl'
feature_names_file = f'{results_dir}/MDTerp_feature_names.npy'

print("MDTerp Visualization and Analysis Demo")
print("=" * 50)
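
Before running the cells below, it can help to confirm that both input files exist. The following is a minimal sketch using only the standard library; `find_missing_files` is a hypothetical helper and not part of MDTerp. The demo runs in a temporary directory so that it works anywhere.

```python
import tempfile
from pathlib import Path


def find_missing_files(paths):
    """Return the paths (as strings) that are not existing regular files."""
    return [str(p) for p in paths if not Path(p).is_file()]


# Self-contained demo in a temporary directory; in this notebook you would
# pass [results_file, feature_names_file] instead.
with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / "MDTerp_results_all.pkl"
    absent = Path(tmp) / "MDTerp_feature_names.npy"
    present.touch()  # create an empty stand-in for the results file
    missing = find_missing_files([present, absent])

print("Missing files:", [Path(p).name for p in missing])
```

An empty list means every expected file was found.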

## 1. Basic Feature Importance Visualization

Visualize the most important features overall or for specific transitions.

In [None]:
# Plot overall feature importance (averaged across all transitions)
fig = visualization.plot_feature_importance(
    results_file,
    feature_names_file,
    transition=None,  # None means average across all transitions
    top_n=15,
    figsize=(12, 6),
    show_std=True
)
plt.show()

In [None]:
# Plot feature importance for a specific transition
# Replace "0_1" with your actual transition name
fig = visualization.plot_feature_importance(
    results_file,
    feature_names_file,
    transition="0_1",  # Specify transition of interest
    top_n=15,
    figsize=(12, 6),
    show_std=True
)
plt.show()
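
If you are unsure which transition names your results contain, one option is to inspect the top-level structure of the results pickle. The sketch below is generic: `describe_pickle` is a hypothetical helper, and the assumption that the file holds a dictionary keyed by transition name is illustrative only and not confirmed by this notebook. Only unpickle files you trust, since loading a pickle can execute arbitrary code.

```python
import pickle
import tempfile
from pathlib import Path


def describe_pickle(path):
    """Load a pickle and return (type name, sorted top-level keys or None)."""
    with open(path, "rb") as fh:
        obj = pickle.load(fh)
    keys = sorted(map(str, obj)) if isinstance(obj, dict) else None
    return type(obj).__name__, keys


# Demo with a stand-in file; this dict layout is invented for illustration
# and may not match the real MDTerp results file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_results.pkl"
    with open(demo_path, "wb") as fh:
        pickle.dump({"0_1": [0.2, 0.8], "1_2": [0.5, 0.5]}, fh)
    kind, keys = describe_pickle(demo_path)

print(kind, keys)
```

In the notebook, calling `describe_pickle(results_file)` would show whether the same approach applies to your data.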

## 2. Transition Heatmap

Visualize how feature importance varies across different transitions using a heatmap.

In [None]:
fig = visualization.plot_transition_heatmap(
    results_file,
    feature_names_file,
    top_n=20,
    figsize=(14, 8),
    cmap='YlOrRd'  # Try 'viridis', 'plasma', 'coolwarm', etc.
)
plt.show()

## 3. Compare Multiple Transitions

Side-by-side comparison of feature importance across specific transitions.

In [None]:
# Compare specific transitions
# Replace with your actual transition names
transitions_to_compare = ["0_1", "1_2"]  # Add more as needed

fig = visualization.plot_transition_comparison(
    results_file,
    feature_names_file,
    transitions=transitions_to_compare,
    top_n=12,
    figsize=(14, 7)
)
plt.show()

## 4. Sample Distribution Analysis

Examine the distribution of importance values across individual samples for a specific feature.

In [None]:
# Analyze distribution for a specific feature
# Replace with your actual feature name
fig = visualization.plot_sample_importance_distribution(
    results_file,
    feature_names_file,
    transition="0_1",
    feature_name="phi",  # Replace with your feature name
    figsize=(10, 6)
)
plt.show()

## 5. Statistical Analysis

Use the analysis module to extract quantitative insights from MDTerp results.

In [None]:
# Get top features with statistics
top_features = analysis.get_top_features(
    results_file,
    feature_names_file,
    transition=None,  # Average across all transitions
    n=10,
    normalize=True
)

print("\nTop 10 Most Important Features (Overall):")
print("=" * 60)
print(f"{'Rank':<6} {'Feature':<25} {'Mean':<12} {'Std':<12}")
print("=" * 60)
for i, (name, mean, std) in enumerate(top_features, 1):
    print(f"{i:<6} {name:<25} {mean:<12.6f} {std:<12.6f}")
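
To make the `(name, mean, std)` tuples concrete, here is a minimal sketch of how per-sample importances could be reduced to a ranked list. It is not MDTerp's implementation: `rank_features` is a hypothetical function, the input layout is invented, and "normalize" is assumed here to mean scaling so that the mean importances sum to 1.

```python
from statistics import mean, pstdev


def rank_features(samples_by_feature, n=10, normalize=True):
    """Rank features by mean importance; return (name, mean, std) tuples."""
    stats = {name: (mean(v), pstdev(v)) for name, v in samples_by_feature.items()}
    if normalize:
        # Assumed meaning of "normalize": scale so the means sum to 1.
        total = sum(m for m, _ in stats.values())
        if total > 0:
            stats = {k: (m / total, s / total) for k, (m, s) in stats.items()}
    ranked = sorted(stats.items(), key=lambda kv: kv[1][0], reverse=True)
    return [(name, m, s) for name, (m, s) in ranked[:n]]


# Toy per-sample importances (invented values).
toy = {"phi": [0.6, 0.4], "psi": [0.3, 0.3], "chi1": [0.1, 0.3]}
top = rank_features(toy, n=2)
for name, m, s in top:
    print(f"{name:<10} {m:.3f} {s:.3f}")
```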

In [None]:
# Get summary for all transitions
summary = analysis.get_transition_summary(
    results_file,
    feature_names_file
)

print("\nTransition Summary:")
print("=" * 80)
for trans_name, info in summary.items():
    print(f"\nTransition {trans_name}:")
    print(f"  Samples: {info['n_samples']}")
    print(f"  Important features: {info['n_important_features']}")
    print(f"  Entropy: {info['mean_entropy']:.4f}")
    print("  Top 3 features:")
    for feat_name, importance in info['top_features'][:3]:
        print(f"    - {feat_name}: {importance:.6f}")

## 6. Compare Two Transitions

Identify features that distinguish between two transitions.

In [None]:
# Compare two transitions
# Replace with your actual transition names
comparison = analysis.compare_transitions(
    results_file,
    feature_names_file,
    transition1="0_1",
    transition2="1_2",
    top_n=15
)

print("\nTransition Comparison:")
print("=" * 60)

print("\nShared important features:")
for name, importance in comparison['shared'][:5]:
    print(f"  {name}: {importance:.6f}")

print("\nFeatures unique to transition 0_1:")
for name, importance in comparison['unique_to_1'][:5]:
    print(f"  {name}: {importance:.6f}")

print("\nFeatures unique to transition 1_2:")
for name, importance in comparison['unique_to_2'][:5]:
    print(f"  {name}: {importance:.6f}")

print("\nFeatures with largest differences:")
for name, diff in comparison['difference'][:5]:
    print(f"  {name}: {diff:.6f}")
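
The four keys printed above can be understood through a set-based comparison of each transition's top features. The sketch below illustrates that idea on plain dictionaries; `compare_top_features` is hypothetical, and the way MDTerp actually scores shared features or differences may differ.

```python
def compare_top_features(imp1, imp2, top_n=15):
    """Compare two {feature: importance} dicts via their top-N feature sets."""
    def top(imp):
        return set(sorted(imp, key=imp.get, reverse=True)[:top_n])

    top1, top2 = top(imp1), top(imp2)
    # Signed difference over every feature seen in either transition.
    diffs = {f: imp1.get(f, 0.0) - imp2.get(f, 0.0) for f in set(imp1) | set(imp2)}
    shared = sorted(top1 & top2, key=lambda f: imp1[f] + imp2[f], reverse=True)
    return {
        "shared": [(f, (imp1[f] + imp2[f]) / 2) for f in shared],
        "unique_to_1": [(f, imp1[f]) for f in sorted(top1 - top2, key=imp1.get, reverse=True)],
        "unique_to_2": [(f, imp2[f]) for f in sorted(top2 - top1, key=imp2.get, reverse=True)],
        "difference": sorted(diffs.items(), key=lambda kv: abs(kv[1]), reverse=True),
    }


# Toy importances (invented values).
a = {"phi": 0.5, "psi": 0.3, "chi1": 0.2}
b = {"phi": 0.4, "omega": 0.35, "psi": 0.25}
result = compare_top_features(a, b, top_n=2)
print(result["shared"], result["unique_to_1"], result["unique_to_2"])
```

In this sketch, a feature counts as "unique" only relative to the top-N cutoff: `psi` appears in both toy dictionaries but is unique to the first because it misses the second's top two.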

## 7. Identify Consensus Features

Find features that are consistently important across multiple transitions.

In [None]:
# Find consensus features
consensus = analysis.identify_consensus_features(
    results_file,
    feature_names_file,
    threshold=0.5,  # Must appear in at least 50% of transitions
    top_n_per_transition=10
)

print("\nConsensus Features (important in ≥50% of transitions):")
print("=" * 60)
print(f"{'Feature':<30} {'Appears in N transitions':<20}")
print("=" * 60)
for name, count in consensus.items():
    print(f"{name:<30} {count:<20}")
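
The interplay of `threshold` and `top_n_per_transition` can be sketched as a simple counting procedure: take each transition's top-N features, count how many transitions each feature appears in, and keep those that reach the required fraction. This is an illustrative sketch under that assumption, with a hypothetical `consensus_features` function and invented data, not MDTerp's implementation.

```python
from collections import Counter


def consensus_features(importance_by_transition, threshold=0.5, top_n_per_transition=10):
    """Count top-N appearances per feature; keep those in >= threshold of transitions."""
    counts = Counter()
    for imp in importance_by_transition.values():
        top = sorted(imp, key=imp.get, reverse=True)[:top_n_per_transition]
        counts.update(top)
    required = threshold * len(importance_by_transition)
    return {f: c for f, c in counts.most_common() if c >= required}


# Toy data for three transitions (invented values).
toy = {
    "0_1": {"phi": 0.5, "psi": 0.3, "chi1": 0.2},
    "1_2": {"phi": 0.4, "omega": 0.35, "psi": 0.25},
    "0_2": {"omega": 0.6, "phi": 0.3, "chi1": 0.1},
}
consensus_demo = consensus_features(toy, threshold=0.5, top_n_per_transition=2)
print(consensus_demo)
```

With three transitions and a threshold of 0.5, a feature must reach the top two in at least two transitions; `psi` does so only once and is dropped.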

## 8. Detailed Feature Statistics

Get comprehensive statistics for a specific feature across all transitions.

In [None]:
# Analyze a specific feature
# Replace with your actual feature name
stats = analysis.get_feature_statistics(
    results_file,
    feature_names_file,
    feature_name="phi"  # Replace with your feature
)

print("\nFeature Statistics for 'phi':")
print("=" * 80)
print(f"{'Transition':<15} {'Mean':<12} {'Std':<12} {'Min':<12} {'Max':<12} {'Samples':<10}")
print("=" * 80)
for trans_name, stat in stats.items():
    print(f"{trans_name:<15} {stat['mean']:<12.6f} {stat['std']:<12.6f} "
          f"{stat['min']:<12.6f} {stat['max']:<12.6f} {stat['n_samples']:<10}")

## 9. Export Results to CSV

Export MDTerp results to CSV format for use in other tools (Excel, R, etc.).

In [None]:
# Export summary statistics (recommended for most use cases)
analysis.export_results_to_csv(
    results_file,
    feature_names_file,
    output_file=f'{results_dir}/mdterp_summary.csv',
    include_raw_data=False
)

# Optionally export raw per-sample data (larger file)
# analysis.export_results_to_csv(
#     results_file,
#     feature_names_file,
#     output_file=f'{results_dir}/mdterp_raw.csv',
#     include_raw_data=True
# )
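
After exporting, a quick look at the CSV header and first rows confirms the file was written as expected. The sketch below uses the standard `csv` module; `preview_csv` is a hypothetical helper, and the column names in the demo file are invented because this notebook does not state the exported schema.

```python
import csv
import tempfile
from pathlib import Path


def preview_csv(path, n_rows=5):
    """Return (column names, first n_rows rows as dicts) of a CSV file."""
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh)
        rows = [row for _, row in zip(range(n_rows), reader)]
        return reader.fieldnames, rows


# Demo on a stand-in CSV; these column names are assumptions for illustration.
with tempfile.TemporaryDirectory() as tmp:
    demo_csv = Path(tmp) / "mdterp_summary.csv"
    with open(demo_csv, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["transition", "feature", "mean", "std"])
        writer.writerow(["0_1", "phi", "0.5", "0.1"])
        writer.writerow(["0_1", "psi", "0.3", "0.0"])
    columns, first_rows = preview_csv(demo_csv, n_rows=1)

print(columns)
print(first_rows)
```

Note that `csv.DictReader` returns every value as a string, so numeric columns need an explicit `float(...)` conversion before further analysis.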

## 10. Generate Comprehensive Report

Automatically create a complete set of visualizations for your results.

In [None]:
# Generate all visualizations at once
visualization.create_summary_report(
    results_dir=results_dir,
    output_dir=f'{results_dir}/visualizations',
    top_n=15
)

print("\nReport generated! Check the 'visualizations' directory for all plots.")
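
To see what the report step produced, you can list the files in the output directory. This is a small standard-library sketch; `list_outputs` is a hypothetical helper, and the file names in the demo are placeholders rather than the names MDTerp writes.

```python
import tempfile
from pathlib import Path


def list_outputs(directory):
    """Return sorted (relative path, size in bytes) for every file under `directory`."""
    root = Path(directory)
    return sorted(
        (p.relative_to(root).as_posix(), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    )


# Demo on a temporary directory with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "plot_a.png").write_bytes(b"1234")
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "sub" / "plot_b.png").write_bytes(b"12")
    outputs = list_outputs(tmp)

for name, size in outputs:
    print(f"{name:<20} {size} bytes")
```

In the notebook, `list_outputs(f'{results_dir}/visualizations')` would give the equivalent listing for the generated report.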

## Summary

This notebook demonstrated:

1. **Visualization tools**: Multiple plot types for feature importance analysis
2. **Statistical analysis**: Quantitative insights from MDTerp results
3. **Comparative analysis**: Tools to compare transitions and identify patterns
4. **Data export**: CSV export for external analysis
5. **Automated reporting**: One-command comprehensive report generation

For more information:
- Documentation: https://shams-mehdi.github.io/MDTerp
- Paper: "Thermodynamics-inspired explanations of artificial intelligence", Nature Communications
- GitHub: https://github.com/shams-mehdi/MDTerp