# Tutorial 3: Advanced Visualization and Analysis

This notebook demonstrates advanced visualization techniques and analysis workflows for pyrene dimer conformational data.

## Learning Objectives

By the end of this tutorial, you will be able to:
1. Create publication-quality figures
2. Analyze correlations between geometric parameters
3. Build energy landscapes
4. Prepare data for QSAR modeling

## 1. Setup

In [None]:
from pyrene_analyzer import PyreneDimerAnalyzer
from pyrene_analyzer.visualization import (
    plot_angle_vs_energy,
    plot_distance_vs_overlap,
    plot_energy_landscape,
    plot_correlation_matrix,
    create_summary_figure
)

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
plt.style.use('seaborn-v0_8-whitegrid')

## 2. Load or Generate Sample Data

In [None]:
# Generate synthetic sample data (replace this cell with your own analyzer results)
np.random.seed(42)
n = 200

# Synthetic data with built-in trends: larger plane angles give
# larger interplane distances and smaller pi-overlap
plane_angles = np.random.uniform(5, 85, n)
distances = 3.3 + 0.02 * plane_angles + np.random.normal(0, 0.3, n)
overlaps = 95 - 1.0 * plane_angles + np.random.normal(0, 10, n)
overlaps = np.clip(overlaps, 0, 100)  # overlap is a percentage

# Energy rises with plane angle and with deviation of the distance from 3.5 Å
energies = 0.1 * plane_angles + 2 * (distances - 3.5)**2 + np.random.normal(0, 1, n)
energies = energies - energies.min()  # relative energy: global minimum set to 0

results_df = pd.DataFrame({
    'molecule': np.random.choice(['Et', 'iPr', 'cHex', 'tBu'], n),
    'conformer_id': range(n),
    'plane_angle_deg': plane_angles,
    'interplane_distance_A': distances,
    'pi_overlap_pct': overlaps,
    'centroid_distance_A': distances + np.random.uniform(0, 1, n),
    'slip_stack_A': np.random.uniform(0, 2, n),
    'energy_kcal_mol': energies + 5,  # arbitrary offset standing in for absolute energies
    'rel_energy_kcal_mol': energies,
})

# Add classification
analyzer = PyreneDimerAnalyzer(verbose=False)
results_df = analyzer.add_classification(results_df)

print(f"Generated {len(results_df)} conformers")
results_df.head()
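
`np.random.seed` seeds NumPy's legacy global generator. A minimal sketch of the same synthetic recipe using the newer `numpy.random.Generator` API, which keeps the random state inside one object instead of global state. The helper name `make_synthetic_conformers` is hypothetical, the sketch keeps only a subset of the columns, and it will not reproduce the exact numbers above because the two APIs draw different random streams.

```python
import numpy as np
import pandas as pd


def make_synthetic_conformers(n: int = 200, seed: int = 42) -> pd.DataFrame:
    """Hypothetical helper: same recipe as the cell above, with a local Generator."""
    rng = np.random.default_rng(seed)

    plane_angles = rng.uniform(5, 85, n)
    # Distance grows with angle; overlap shrinks with angle (both with noise)
    distances = 3.3 + 0.02 * plane_angles + rng.normal(0, 0.3, n)
    overlaps = np.clip(95 - 1.0 * plane_angles + rng.normal(0, 10, n), 0, 100)

    # Energy penalizes large angles and distances away from 3.5 Å
    energies = 0.1 * plane_angles + 2 * (distances - 3.5) ** 2 + rng.normal(0, 1, n)
    rel_energies = energies - energies.min()  # global minimum set to 0

    return pd.DataFrame({
        'molecule': rng.choice(['Et', 'iPr', 'cHex', 'tBu'], n),
        'conformer_id': np.arange(n),
        'plane_angle_deg': plane_angles,
        'interplane_distance_A': distances,
        'pi_overlap_pct': overlaps,
        'rel_energy_kcal_mol': rel_energies,
    })


df = make_synthetic_conformers()
print(df.shape)
```

Calling the function twice with the same `seed` returns identical frames, which makes the sample data easy to regenerate in tests.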

## 3. Correlation Analysis

In [None]:
# Create correlation matrix
fig = plot_correlation_matrix(results_df)
plt.show()
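
`plot_correlation_matrix` is the package's own function, and this tutorial does not state which correlation method it uses. As a sketch under that caveat, pandas can compute both Pearson (linear) and Spearman (rank-based, monotonic) matrices directly, and each unique pair of parameters can be listed in order of correlation strength. The helper name `ranked_pairs` is hypothetical.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def ranked_pairs(df: pd.DataFrame, method: str = 'pearson') -> pd.Series:
    """Hypothetical helper: each pair of numeric columns, sorted by |correlation|."""
    corr = df.select_dtypes('number').corr(method=method)
    # One entry per unordered pair (a, b); the diagonal is skipped
    pairs = pd.Series({(a, b): corr.loc[a, b]
                       for a, b in combinations(corr.columns, 2)})
    return pairs.sort_values(key=lambda v: v.abs(), ascending=False)


rng = np.random.default_rng(0)
angle = rng.uniform(5, 85, 100)
demo = pd.DataFrame({
    'plane_angle_deg': angle,
    'pi_overlap_pct': np.clip(95 - angle + rng.normal(0, 10, 100), 0, 100),
    'interplane_distance_A': 3.3 + 0.02 * angle + rng.normal(0, 0.3, 100),
})
print(ranked_pairs(demo, method='spearman'))
```

Comparing the Pearson and Spearman rankings is a quick check for relationships that are monotonic but not linear, such as the clipped overlap values.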

In [None]:
# Pairplot for key parameters
key_cols = ['plane_angle_deg', 'interplane_distance_A', 'pi_overlap_pct', 'rel_energy_kcal_mol']
g = sns.pairplot(
    results_df[key_cols + ['classification']],
    hue='classification',
    palette='Set2',
    diag_kind='kde'
)
g.figure.suptitle('Pairwise Relationships by Classification', y=1.02)
plt.show()

## 4. Energy Landscape

In [None]:
# Create energy landscape plot
fig = plot_energy_landscape(results_df)
plt.show()
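
How `plot_energy_landscape` bins or interpolates the data is not documented in this tutorial. One common way to build a landscape from scattered conformers, shown here as a sketch, is to bin two geometric coordinates and keep the lowest relative energy in each cell. The helper name `min_energy_grid` and the default bin counts are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def min_energy_grid(df, x='plane_angle_deg', y='interplane_distance_A',
                    energy='rel_energy_kcal_mol', x_bins=8, y_bins=6):
    """Hypothetical helper: lowest energy in each (y, x) bin; empty bins are NaN."""
    binned = df.assign(
        x_bin=pd.cut(df[x], bins=x_bins),
        y_bin=pd.cut(df[y], bins=y_bins),
    )
    # observed=False keeps empty bin combinations, so the grid stays rectangular
    return (binned.groupby(['y_bin', 'x_bin'], observed=False)[energy]
            .min()
            .unstack('x_bin'))


rng = np.random.default_rng(7)
angle = rng.uniform(5, 85, 200)
dist = 3.3 + 0.02 * angle + rng.normal(0, 0.3, 200)
energy = 0.1 * angle + 2 * (dist - 3.5) ** 2
demo = pd.DataFrame({
    'plane_angle_deg': angle,
    'interplane_distance_A': dist,
    'rel_energy_kcal_mol': energy - energy.min(),
})
grid = min_energy_grid(demo)
print(grid.round(2))
```

The resulting grid can be drawn with matplotlib's `imshow` or `pcolormesh`. Taking the minimum per cell highlights the lowest-lying structures; a population-weighted surface is an alternative when averages matter more than minima.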

In [None]:
# Angle vs Energy with excimer threshold
fig = plot_angle_vs_energy(
    results_df,
    color_by='molecule',
    show_excimer_threshold=True
)
plt.show()
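
Relative energies are often converted into Boltzmann populations, $p_i = e^{-\Delta E_i/RT} / \sum_j e^{-\Delta E_j/RT}$, so that an angle cutoff can be read as a population rather than a raw conformer count. This is a sketch, not how `plot_angle_vs_energy` draws its threshold: the 20° cutoff and 298.15 K temperature are illustrative assumptions, and weighting single conformer energies this way ignores entropy differences between conformers.

```python
import numpy as np

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def boltzmann_weights(rel_energy_kcal, temperature_k=298.15):
    """Normalized Boltzmann weights from relative energies in kcal/mol."""
    e = np.asarray(rel_energy_kcal, dtype=float)
    # Shift by the minimum first so exp() never overflows
    w = np.exp(-(e - e.min()) / (R_KCAL * temperature_k))
    return w / w.sum()


def weighted_fraction_below(angles_deg, rel_energy_kcal, cutoff_deg=20.0,
                            temperature_k=298.15):
    """Boltzmann-weighted fraction with angle < cutoff_deg (hypothetical cutoff)."""
    w = boltzmann_weights(rel_energy_kcal, temperature_k)
    return float(w[np.asarray(angles_deg) < cutoff_deg].sum())


angles = np.array([10.0, 15.0, 45.0, 70.0])
energies = np.array([0.0, 0.5, 1.0, 3.0])
print(weighted_fraction_below(angles, energies))
```

With the tutorial data, the same call would take `results_df['plane_angle_deg']` and `results_df['rel_energy_kcal_mol']`.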

## 5. Publication-Quality Summary Figure

In [None]:
# Create comprehensive summary
fig = create_summary_figure(results_df)
fig.savefig('publication_summary.png', dpi=300, bbox_inches='tight')
plt.show()
print("Saved to publication_summary.png")
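
Journals often ask for vector artwork or a specific raster resolution. A minimal sketch that writes one figure in several formats with matplotlib's `Figure.savefig`; the helper name `save_publication_figure`, the default formats and the figure size are assumptions, so check the target journal's requirements.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt


def save_publication_figure(fig, stem, formats=('pdf', 'svg', 'png'), dpi=300):
    """Hypothetical helper: save `fig` as <stem>.<fmt> for each format."""
    paths = []
    for fmt in formats:
        path = Path(f'{stem}.{fmt}')
        # dpi affects raster output (png); pdf/svg keep lines as vectors
        fig.savefig(path, dpi=dpi, bbox_inches='tight')
        paths.append(path)
    return paths


fig, ax = plt.subplots(figsize=(3.5, 2.5))  # size in inches; match the column width
ax.plot([5, 45, 85], [0.0, 4.0, 8.0], marker='o')
ax.set_xlabel('Plane angle (deg)')
ax.set_ylabel('Relative energy (kcal/mol)')

with tempfile.TemporaryDirectory() as tmp:
    saved = save_publication_figure(fig, Path(tmp) / 'summary')
    all_written = all(p.exists() for p in saved)
    print([p.name for p in saved])
plt.close(fig)
```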

## 6. QSAR-Ready Data Preparation

In [None]:
# Calculate aggregate descriptors per molecule
qsar_features = results_df.groupby('molecule').agg({
    'plane_angle_deg': ['mean', 'std', 'min', 'max'],
    'interplane_distance_A': ['mean', 'std', 'min', 'max'],
    'pi_overlap_pct': ['mean', 'std', 'min', 'max'],
    'rel_energy_kcal_mol': ['mean', 'min'],
})

# Flatten column names
qsar_features.columns = ['_'.join(col).strip() for col in qsar_features.columns.values]
qsar_features = qsar_features.reset_index()

print("QSAR-ready features:")
qsar_features
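
Joining the MultiIndex levels works, but pandas' named aggregation produces flat, explicit column names in a single step. A sketch with a shortened set of descriptors on small demo data; the derived `plane_angle_deg_range` column (max minus min) is an illustrative addition, not part of the feature set above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({
    'molecule': np.repeat(['Et', 'iPr', 'cHex', 'tBu'], 10),
    'plane_angle_deg': rng.uniform(5, 85, 40),
    'rel_energy_kcal_mol': rng.uniform(0, 5, 40),
})

# Each keyword becomes one flat output column: name=(source column, function)
features = df.groupby('molecule').agg(
    plane_angle_deg_mean=('plane_angle_deg', 'mean'),
    plane_angle_deg_std=('plane_angle_deg', 'std'),
    plane_angle_deg_min=('plane_angle_deg', 'min'),
    plane_angle_deg_max=('plane_angle_deg', 'max'),
    rel_energy_kcal_mol_min=('rel_energy_kcal_mol', 'min'),
).reset_index()

# Derived descriptor: spread of the angle distribution per molecule
features['plane_angle_deg_range'] = (
    features['plane_angle_deg_max'] - features['plane_angle_deg_min']
)
print(features)
```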

In [None]:
# Fraction of each molecule's conformers classified as 'strong_excimer',
# a candidate QSAR endpoint
excimer_fraction = results_df.groupby('molecule')['classification'].apply(
    lambda x: (x == 'strong_excimer').mean()
).reset_index()
excimer_fraction.columns = ['molecule', 'excimer_fraction']

# Merge with features
qsar_data = pd.merge(qsar_features, excimer_fraction, on='molecule')
qsar_data
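
The endpoint above counts only the `'strong_excimer'` label. To see how every classification label is distributed per molecule, a sketch using `pd.crosstab` with `normalize='index'` gives row fractions that sum to 1. The `'other'` label in the demo data is a placeholder, since the full set of labels that `add_classification` emits is not listed in this tutorial.

```python
import pandas as pd

demo = pd.DataFrame({
    'molecule': ['Et', 'Et', 'Et', 'tBu', 'tBu'],
    'classification': ['strong_excimer', 'strong_excimer', 'other',
                       'other', 'strong_excimer'],
})
# Rows: molecules; columns: labels; values: fraction of that molecule's conformers
fractions = pd.crosstab(demo['molecule'], demo['classification'], normalize='index')
print(fractions)
```

Applied to `results_df`, each column of the result could serve as a separate endpoint alongside `excimer_fraction`.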

In [None]:
# Export QSAR-ready data
qsar_data.to_csv('qsar_features.csv', index=False)
print("QSAR features exported to qsar_features.csv")

## Summary

In this tutorial, we learned how to:
- Create correlation matrices and pairplots
- Build energy landscape visualizations
- Generate publication-quality summary figures
- Prepare aggregate descriptors for QSAR modeling

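Together, these steps connect conformer geometry (plane angle, interplane distance, π-overlap) and relative energies to molecule-level descriptors, which is the starting point for studying structure-property relationships in pyrene dimer systems.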