# Demo 2: Automated Loss Landscape Analysis Workflow

This notebook demonstrates how to use the **automated command-line scripts** for Hessian eigenvector computation and loss landscape generation, providing a streamlined alternative to the manual step-by-step approach shown in Demo 1.

## **What This Demo Covers**
1. **Use configuration files** to specify parameters
2. **Execute automated scripts** via command line

## **Expected Generated Files**

This automated workflow will create the following files and directories:

### **Configuration Files (Demo Directory):**
- `demo2_hessian_config.yml` - Configuration for Hessian eigenvector computation
- `demo2_landscape_config.yml` - Configuration for loss landscape generation

The result files listed below should be identical to the ones generated in Demo 1.

### **Hessian Eigenvector Results (`eigenvectors/demo2_automated_hessian/`):**
- `model_eig_max.pt` - Model with weights set to maximum eigenvector
- `model_eig_min.pt` - Model with weights set to minimum eigenvector
- `config.yml` - Copy of the Hessian computation configuration
- `eigenvalues.txt` - Text file with computed eigenvalue statistics

### **Loss Landscape Results (`computed_loss_landscapes/demo2_automated_landscape/`):**
- `loss_landscapes_df.pkl` - Structured DataFrame with landscapes and metadata (size varies with dataset and grid resolution)
- `raw_loss_landscape_array.npy` - Raw 3D numpy array (samples × grid_x × grid_y) (size varies with dataset and grid resolution)
- `config.yml` - Copy of the landscape generation configuration

## **Scripts We'll Use**

1. **`generate_hessian_eigenvector.py`** - Computes Hessian eigenvectors from config
2. **`generate_loss_landscapes.py`** - Generates 2D loss landscapes from config  

## **Quick Overview**

The automated workflow consists of just two main commands:
```bash
# Step 1: Compute Hessian eigenvectors
python generate_hessian_eigenvector.py hessian_config.yml

# Step 2: Generate loss landscapes  
python generate_loss_landscapes.py landscape_config.yml
```

Each script uses YAML configuration files to specify all parameters, making the process **reproducible, configurable, and suitable for batch processing**.
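
Because each run is fully described by its config file, batch processing can be reduced to looping over configs. The following is a minimal sketch, assuming both scripts take a single positional config path as shown above. `build_commands` and the config file names are hypothetical and not part of the repository; the helper only assembles the commands so a batch can be inspected before anything is launched.

```python
import sys


def build_commands(hessian_configs, landscape_configs, python=sys.executable):
    """Return one argument list per run, with all Hessian runs first.

    Hypothetical helper: it builds the commands but does not execute them.
    Hessian runs come first because the landscape step reads their output.
    """
    commands = []
    for cfg in hessian_configs:
        commands.append([python, "generate_hessian_eigenvector.py", str(cfg)])
    for cfg in landscape_configs:
        commands.append([python, "generate_loss_landscapes.py", str(cfg)])
    return commands


# Placeholder config names (assumed for illustration, not shipped with the demo).
batch = build_commands(
    ["configs/hessian_a.yml", "configs/hessian_b.yml"],
    ["configs/landscape_a.yml"],
)
for cmd in batch:
    print(" ".join(cmd))
```

Each entry in `batch` can be passed to `subprocess.run` in order.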




## 1. Setup and Verify Demo Files

First, let's import the necessary libraries and verify that our demo files are available.


In [None]:
import os
import sys
import yaml
import pandas as pd
import numpy as np
import json
from pathlib import Path

print("Libraries imported successfully!")

# Run from the repository root: step up one level if started inside demo/
if os.path.basename(os.getcwd()) == 'demo':
    os.chdir('..')

# After the optional chdir, all paths are relative to the repository root
base_path = '.'
demo_path = 'demo'

print(f"Current directory: {os.getcwd()}")
print(f"Base path: {base_path}")
print(f"Demo path: {demo_path}")


As in Demo 1, we use the previously selected lowest-error samples for the Hessian eigenvector computation and the full dataset for loss landscape generation.

In [None]:
# Verify demo files exist
demo_files = {
    'model': os.path.join('demo', 'demo_JVDFT_dHf_model.pt'),
    'dataset': os.path.join('demo', 'demo_JVDFT_dHf_dataset_50.pkl'),
    'low_error_dataset': os.path.join('demo', 'demo_JVDFT_dHf_dataset_lowest_20_error_samples_from_50.pkl')
}

print("Checking demo files:")
all_files_exist = True
for name, path in demo_files.items():
    if os.path.exists(path):
        size = os.path.getsize(path)
        print(f"  {name}: {path} ({size:,} bytes)")
    else:
        print(f"  {name}: {path} - NOT FOUND")
        all_files_exist = False

if all_files_exist:
    print("\nAll demo files are available!")
else:
    print("\nSome demo files are missing. Please ensure you have the demo files in the demo/ directory.")

# Quick peek at the dataset
if os.path.exists(demo_files['dataset']):
    df = pd.read_pickle(demo_files['dataset'])
    print(f"\nDataset info:")
    print(f"  Samples: {len(df)}")
    print(f"  Columns: {list(df.columns)}")
    print(f"  Target (formation energy) range: {df['formation_energy_peratom'].min():.3f} to {df['formation_energy_peratom'].max():.3f}")
else:
    print("\nCannot load dataset for inspection.")


## 2. Create Configuration for Hessian Eigenvector Computation

Now we'll create a YAML configuration file for the first step: computing Hessian eigenvectors.


In [None]:
# Create configuration for Hessian eigenvector computation
hessian_config = {
    'model_path': demo_files['model'],
    'data_path': demo_files['low_error_dataset'], 
    'run_id': 'demo2_automated_hessian',
    'target': 'formation_energy_peratom',
    'device': 'cuda'  # Change to 'cpu' if no GPU available
}

# Save configuration to YAML file
hessian_config_path = os.path.join('demo', 'demo2_hessian_config.yml')

with open(hessian_config_path, 'w') as f:
    yaml.dump(hessian_config, f, default_flow_style=False)

print(f"\nConfiguration saved to: {hessian_config_path}")

# Display the generated YAML file
print(f"\nContents of {hessian_config_path}:")
with open(hessian_config_path, 'r') as f:
    hessian_config_dict = yaml.safe_load(f)

display(hessian_config_dict)



### Configuration Explanation

**Key Parameters for Hessian Computation:**

- **`model_path`**: Path to the trained ALIGNN model checkpoint (.pt file)
- **`data_path`**: Path to the dataset pickle file (.pkl) - script will use ALL samples in this file
- **`run_id`**: Unique identifier that creates output folder `eigenvectors/{run_id}/`
- **`target`**: Target property name in the dataset (column name)
- **`device`**: Computation device ('cuda' for GPU, 'cpu' for CPU)

**Important**: The script will use the **entire dataset** specified in `data_path`. For large datasets, this can be memory-intensive. Consider creating a subset if needed.
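
A typo in a key name is cheaper to catch before launching a multi-minute run. The sketch below checks a config dictionary against the five keys listed above. `find_config_problems` is a hypothetical helper, and the device check only covers the two values this notebook mentions; the script itself may accept others (for example `cuda:0`), which is an assumption to verify.

```python
REQUIRED_HESSIAN_KEYS = ("model_path", "data_path", "run_id", "target", "device")


def find_config_problems(config, required_keys=REQUIRED_HESSIAN_KEYS):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = [f"missing key: {key}" for key in required_keys if key not in config]
    device = config.get("device")
    # Assumption: only the two device strings named in this notebook are expected.
    if device is not None and device not in ("cuda", "cpu"):
        problems.append(f"unexpected device: {device!r}")
    return problems


example = {
    "model_path": "demo/demo_JVDFT_dHf_model.pt",
    "data_path": "demo/demo_JVDFT_dHf_dataset_lowest_20_error_samples_from_50.pkl",
    "run_id": "demo2_automated_hessian",
    "target": "formation_energy_peratom",
    "device": "cuda",
}
print(find_config_problems(example))            # complete config
print(find_config_problems({"device": "gpu"}))  # four keys missing, odd device
```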


## 3. Execute Hessian Eigenvector Computation

Now we'll run the automated script to compute the Hessian eigenvectors. This may take several minutes depending on your hardware and dataset size.

```bash
python generate_hessian_eigenvector.py demo/demo2_hessian_config.yml
```

In [None]:
import subprocess
import time

print("Starting Hessian eigenvector computation...")
print("This may take 2-3 minutes depending on your hardware and dataset size.")

start_time = time.time()

try:
    # Run the Hessian eigenvector script with the same interpreter as this kernel
    cmd = [sys.executable, 'generate_hessian_eigenvector.py', hessian_config_path]
    print(f"\nExecuting: {' '.join(cmd)}")
    
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=1800)  # 30 min timeout
    
    end_time = time.time()
    computation_time = end_time - start_time
    
    if result.returncode == 0:
        print(f"\nHessian computation completed successfully!")
        print(f"Total time: {computation_time:.2f} seconds ({computation_time/60:.2f} minutes)")
        print(f"\nScript output:")
        print(result.stdout)
    else:
        print(f"\nError occurred during computation:")
        print(f"Return code: {result.returncode}")
        print(f"Error output: {result.stderr}")
        print(f"Standard output: {result.stdout}")
        
except subprocess.TimeoutExpired:
    print(f"\nScript timed out after 30 minutes")
except Exception as e:
    print(f"\nUnexpected error: {str(e)}")
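
The same run-and-report pattern is needed again for the loss landscape script, so it may be worth factoring out. A minimal sketch using only the standard library follows; `run_script` is a hypothetical helper name, and the demonstration call runs a trivial command with the current interpreter rather than the real script.

```python
import subprocess
import sys
import time


def run_script(args, timeout_s):
    """Run a command and return (returncode, stdout, stderr, elapsed_seconds).

    A returncode of None signals that the timeout was reached.
    """
    start = time.time()
    try:
        result = subprocess.run(args, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None, "", "", time.time() - start
    return result.returncode, result.stdout, result.stderr, time.time() - start


# Self-contained check: run a one-line program instead of the real script.
code, out, err, elapsed = run_script([sys.executable, "-c", "print('ok')"], timeout_s=60)
print(code, out.strip(), f"{elapsed:.2f}s")
```

For the Hessian step, the call would be `run_script([sys.executable, 'generate_hessian_eigenvector.py', hessian_config_path], timeout_s=1800)`.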

## 4. Verify Hessian Results

Let's check what files were created by the Hessian eigenvector computation.


In [None]:
# Check the eigenvector output directory
eigenvector_dir = os.path.join(base_path, 'eigenvectors', hessian_config['run_id'])

print(f"Checking eigenvector output directory: {eigenvector_dir}")

if os.path.exists(eigenvector_dir):
    print(f"Output directory exists!")
    
    # List all files in the directory
    files = os.listdir(eigenvector_dir)
    print(f"\nFiles created:")
    for file in files:
        file_path = os.path.join(eigenvector_dir, file)
        if os.path.isfile(file_path):
            size = os.path.getsize(file_path)
            print(f"  {file} ({size:,} bytes)")
    
    # Load and display eigenvalue information if available
    eigenvalue_file = os.path.join(eigenvector_dir, 'eigenvalues.txt')
    if os.path.exists(eigenvalue_file):
        print(f"\nEigenvalue Information:")
        print("-" * 30)
        with open(eigenvalue_file, 'r') as f:
            print(f.read())
    
    # Load and display config that was saved
    config_file = os.path.join(eigenvector_dir, 'config.yml')
    if os.path.exists(config_file):
        print(f"\nSaved Configuration:")
        print("-" * 25)
        with open(config_file, 'r') as f:
            saved_config = yaml.safe_load(f)
            for key, value in saved_config.items():
                print(f"  {key}: {value}")
    
    print(f"\nHessian eigenvector computation completed successfully!")
    print(f"Results saved in: {eigenvector_dir}")
    
else:
    print(f"Output directory not found: {eigenvector_dir}")
    print("The Hessian computation may have failed or not completed yet.")
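
An existing output directory does not guarantee that every expected file was written. The sketch below compares a directory against the four file names listed under "Hessian Eigenvector Results" at the top of this notebook. `missing_outputs` is a hypothetical helper, demonstrated here on a temporary directory rather than the real output folder.

```python
import os
import tempfile

EXPECTED_HESSIAN_FILES = (
    "model_eig_max.pt",
    "model_eig_min.pt",
    "config.yml",
    "eigenvalues.txt",
)


def missing_outputs(directory, expected=EXPECTED_HESSIAN_FILES):
    """Return the expected file names that are absent from `directory`."""
    return [
        name for name in expected
        if not os.path.isfile(os.path.join(directory, name))
    ]


# Demonstration on a throwaway directory that holds only one of the four files.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "config.yml"), "w").close()
    demo_missing = missing_outputs(tmp)
print(demo_missing)
```

Calling `missing_outputs(eigenvector_dir)` after the run should return an empty list if the computation finished.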


## 5. Create Configuration for Loss Landscape Generation

Now that we have the Hessian eigenvectors, we can create the configuration for generating loss landscapes.


In [None]:
# Create configuration for loss landscape generation
landscape_config = {
    'model_path': demo_files['model'],
    'data_path': demo_files['dataset'],
    'target': 'formation_energy_peratom',
    'eigenvector_folder_path': eigenvector_dir,  # Points to the results from step 1
    'run_id': 'demo2_automated_landscape',
    'steps': 20,  # Creates a 21x21 grid
    'device': 'cuda',  # Change to 'cpu' if no GPU available
    'scale_factor': 1.0,  # Scaling for eigenvector perturbations
    'half': False  # Set to True to skip every other computation for speed
}

# Save configuration to YAML file
landscape_config_path = os.path.join(demo_path, 'demo2_landscape_config.yml')

print("Creating Loss Landscape configuration:")


with open(landscape_config_path, 'w') as f:
    yaml.dump(landscape_config, f, default_flow_style=False)

print(f"\nConfiguration saved to: {landscape_config_path}")

# Display the generated YAML file
print(f"\nContents of {landscape_config_path}:")
print("-" * 40)
with open(landscape_config_path, 'r') as f:
    print(f.read())


### Configuration Explanation

**Key Parameters for Loss Landscape Generation:**

- **`model_path`**: Same model as before
- **`data_path`**: Full dataset on which the loss landscapes are computed
- **`target`**: Same target property
- **`eigenvector_folder_path`**: Path to the eigenvector results from Step 1
- **`run_id`**: Creates output folder `computed_loss_landscapes/{run_id}/`
- **`steps`**: Grid resolution along each of the two eigenvector directions (`20` in this demo)
- **`device`**: Computation device
- **`scale_factor`**: Multiplier for eigenvector perturbation magnitude
- **`half`**: Boolean to skip every other grid point (speeds up computation by ~2x)
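
To get a feel for how `steps` and `half` drive the cost, the arithmetic can be sketched as follows. This assumes `steps + 1` grid points per axis, following the config comment that `steps: 20` creates a 21x21 grid, and treats `half` as roughly halving the work; both are assumptions about the script, and `estimate_evaluations` is a hypothetical helper.

```python
def estimate_evaluations(n_samples, steps, half=False):
    """Rough number of model evaluations: samples times grid points.

    Assumes (steps + 1) points per axis and that half=True computes about
    every other grid point.
    """
    grid_points = (steps + 1) ** 2
    if half:
        grid_points = (grid_points + 1) // 2
    return n_samples * grid_points


# 50 samples is the size suggested by the demo dataset file name.
print(estimate_evaluations(50, 20))             # 50 * 441
print(estimate_evaluations(50, 20, half=True))  # 50 * 221
```

Doubling `steps` roughly quadruples the count, so it is usually the first parameter to lower for a quick trial run.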


## 6. Execute Loss Landscape Generation

Now we'll run the second script to generate the 2D loss landscapes. This step typically takes longer than the Hessian computation.

```bash
python generate_loss_landscapes.py demo/demo2_landscape_config.yml
```

In [None]:
print("Starting Loss Landscape generation...")
grid_points = landscape_config['steps'] + 1  # steps=20 gives a 21x21 grid, per the config comment
print(f"Processing {len(df)} samples with {grid_points}×{grid_points} grids")
print(f"Total model evaluations: ~{len(df) * grid_points**2:,}")

start_time = time.time()

try:
    # Run the loss landscape script with the same interpreter as this kernel
    cmd = [sys.executable, 'generate_loss_landscapes.py', landscape_config_path]
    print(f"\nExecuting: {' '.join(cmd)}")
    
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=3600)  # 60 min timeout
    
    end_time = time.time()
    computation_time = end_time - start_time
    
    if result.returncode == 0:
        print(f"\nLoss landscape generation completed successfully!")
        print(f"Total time: {computation_time:.2f} seconds ({computation_time/60:.2f} minutes)")
        print(f"\nScript output:")
        print(result.stdout)
    else:
        print(f"\nError occurred during generation:")
        print(f"Return code: {result.returncode}")
        print(f"Error output: {result.stderr}")
        print(f"Standard output: {result.stdout}")
        
except subprocess.TimeoutExpired:
    print(f"\nScript timed out after 60 minutes")
except Exception as e:
    print(f"\nUnexpected error: {str(e)}")



## 7. Verify Loss Landscape Results

Let's examine the output files from the loss landscape generation.


In [None]:
# Check the loss landscape output directory
landscape_dir = os.path.join(base_path, 'computed_loss_landscapes', landscape_config['run_id'])

print(f"Checking loss landscape output directory: {landscape_dir}")

if os.path.exists(landscape_dir):
    print(f"Output directory exists!")
    
    # List all files in the directory
    files = os.listdir(landscape_dir)
    print(f"\nFiles created:")
    total_size = 0
    for file in files:
        file_path = os.path.join(landscape_dir, file)
        if os.path.isfile(file_path):
            size = os.path.getsize(file_path)
            total_size += size
            print(f"  {file} ({size:,} bytes)")
    
    print(f"\nTotal output size: {total_size:,} bytes ({total_size/1024/1024:.2f} MB)")
    
    # Load and examine the loss landscapes DataFrame if available
    landscape_pkl = os.path.join(landscape_dir, 'loss_landscapes_df.pkl')
    if os.path.exists(landscape_pkl):
        print(f"\nLoading loss landscapes DataFrame...")
        try:
            landscapes_df = pd.read_pickle(landscape_pkl)
            print(f"  Shape: {landscapes_df.shape}")
            print(f"  Columns: {list(landscapes_df.columns)}")
            print(f"  Sample JIDs: {landscapes_df['jid'].head().tolist()}")
            
            # Check the landscape array shape
            first_landscape = landscapes_df['raw_loss_landscapes'].iloc[0]
            print(f"  Individual landscape shape: {first_landscape.shape}")
            
        except Exception as e:
            print(f"  Error loading DataFrame: {str(e)}")
    
    # Load and display the saved config
    config_file = os.path.join(landscape_dir, 'config.yml')
    if os.path.exists(config_file):
        print(f"\nSaved Configuration:")
        print("-" * 25)
        with open(config_file, 'r') as f:
            saved_config = yaml.safe_load(f)
            for key, value in saved_config.items():
                print(f"  {key}: {value}")
    
    print(f"\nLoss landscape generation completed successfully!")
    print(f"Results saved in: {landscape_dir}")
    
else:
    print(f"Output directory not found: {landscape_dir}")
    print("The loss landscape generation may have failed or not completed yet.")
