# Example

This notebook demonstrates how to use the Neural Adjoint (NA) package with different model architectures for both forward and inverse modeling.

## Overview

The example covers two main scenarios:

### Example 1: Built-in Models
- **Data**: Loads geometry-spectra pairs from CSV files (8D geometry → 300D spectra)
- **Models**: Demonstrates both `ConvModel` and `LinModel` architectures
- **Training**: Trains forward models (geometry → spectra) with automatic checkpointing
- **Inference**: Shows both forward prediction and NA inverse prediction (spectra → geometry)

### Example 2: Custom PyTorch Models
- **Custom Architecture**: Defines a simple feedforward neural network
- **Synthetic Data**: Generates 1000 samples with 200D input → 10D output
- **Training**: Trains the custom model using the NA framework
- **Evaluation**: Demonstrates geometry prediction and evaluation on test data

## **Example 1**

### Imports

In [1]:
from mm_neural_adjoint import NANetwork, ConvModel, LinModel
import torch
import torch.nn as nn
import pandas as pd
from torch.utils.data import DataLoader, TensorDataset
from tqdm import tqdm
import numpy as np

### Read example data

In [2]:
X = pd.read_csv('data/data_x_tiny.csv', header=None, delimiter=' ')
y = pd.read_csv('data/data_y_tiny.csv', header=None, delimiter=' ')

X_tensor = torch.tensor(X.values, dtype=torch.float32)
y_tensor = torch.tensor(y.values, dtype=torch.float32)
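
Before building data loaders, it can help to confirm that the files parse into the expected shapes. The sketch below is a minimal check that uses an in-memory stand-in for the CSV files. The two-row content is made up for illustration; the real files are expected to give 8 and 300 columns.

```python
import io

import pandas as pd
import torch

# Hypothetical stand-in for a space-delimited, header-less file such as data_x_tiny.csv.
csv_text = "0.1 0.2 0.3\n0.4 0.5 0.6\n"

frame = pd.read_csv(io.StringIO(csv_text), header=None, delimiter=" ")
tensor = torch.tensor(frame.values, dtype=torch.float32)

# Rows are samples and columns are features.
print(tensor.shape)  # torch.Size([2, 3])

# Missing values usually mean the delimiter does not match the file.
assert not frame.isna().any().any(), "unexpected missing values (check the delimiter)"
```

If a file uses commas or repeated spaces instead, the column count will typically be wrong or the frame will contain `NaN` values, which this check catches early.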

### Convert to Dataloaders

In [3]:
dataset = TensorDataset(X_tensor, y_tensor)

# Split the dataset into train, validation, and test
total_size = len(dataset)
train_size = int(0.7 * total_size)
val_size = int(0.15 * total_size)
test_size = total_size - train_size - val_size

train_dataset, val_dataset, test_dataset = torch.utils.data.random_split(
    dataset, [train_size, val_size, test_size]
)

# Create dataloaders
train_loader = DataLoader(train_dataset, batch_size=10, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=10, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=1, shuffle=True)
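
The split above is random, so the subsets change between runs unless a seed is fixed. One way to make it repeatable is to pass a seeded generator to `random_split`. The sketch below uses made-up stand-in data rather than the CSV files, and the `split` helper is a hypothetical name.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical stand-in data: 20 samples, 8 geometry values -> 300 spectral points.
dataset = TensorDataset(torch.randn(20, 8), torch.randn(20, 300))

train_size = int(0.7 * len(dataset))
val_size = int(0.15 * len(dataset))
test_size = len(dataset) - train_size - val_size  # the remainder absorbs rounding

def split(seed):
    # A seeded generator fixes which samples land in each subset.
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [train_size, val_size, test_size], generator=generator)

first = split(0)
second = split(0)

# The same seed gives the same partition.
assert first[0].indices == second[0].indices
print([len(part) for part in first])
```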

### Train Convolutional Model

This cell trains a convolutional neural network with the Neural Adjoint framework, as in the original paper.

**Setup:**
- Creates a `ConvModel` with 8 input features and 300 output features
- Wraps it in `NANetwork`, which adds training, checkpointing, and inverse prediction
- Sets up device (GPU if available, otherwise CPU)

**Training:**
- Trains for 50 epochs using the provided train and validation data loaders
- Uses a custom progress bar to track training progress
- Automatically saves the best model during training based on validation loss

In [5]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model_conv = ConvModel(8, 300)

model = NANetwork(model_conv, device=device)

epochs = 50

pbar = tqdm(total=epochs, desc='Training Progress')

model.train(epochs, train_loader, val_loader, progress_bar=pbar, save=True)

pbar.close()


# model.evaluate_geometry(test_loader)

Training Progress: 100%|██████████| 50/50 [00:04<00:00, 11.49it/s, train_loss=0.012768, val_loss=0.017610]
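
The idea of saving the best model based on validation loss can be sketched in plain PyTorch. This illustrates the general pattern, not the package's implementation. The data, model, and variable names are stand-ins chosen for the example.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in data and model, small enough to run in a moment.
x_train, y_train = torch.randn(64, 8), torch.randn(64, 3)
x_val, y_val = torch.randn(16, 8), torch.randn(16, 3)
net = nn.Linear(8, 3)
optimizer = torch.optim.Adam(net.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

best_val_loss = float("inf")
best_state = None

for epoch in range(20):
    # One full-batch training step per epoch.
    net.train()
    optimizer.zero_grad()
    loss_fn(net(x_train), y_train).backward()
    optimizer.step()

    # Measure validation loss without tracking gradients.
    net.eval()
    with torch.no_grad():
        val_loss = loss_fn(net(x_val), y_val).item()

    # Keep a copy of the weights only when validation loss improves.
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        best_state = copy.deepcopy(net.state_dict())

# Restore the best weights seen during training.
net.load_state_dict(best_state)
```

Keeping the best state rather than the last one guards against epochs where the validation loss rises again.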


### Train FC Model

This cell trains a fully connected neural network using the Neural Adjoint framework. This architecture worked better for some data.

**Setup:**
- Creates a `LinModel` with 8 input features and 300 output features
- Wraps it in `NANetwork`, which adds training, checkpointing, and inverse prediction
- Sets up device (GPU if available, otherwise CPU)

**Training:**
- Trains for 50 epochs using the provided train and validation data loaders
- Uses a custom progress bar to track training progress
- Automatically saves the best model during training based on validation loss

In [6]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model_lin = LinModel(8, 300)

model = NANetwork(model_lin, device=device)

epochs = 50

pbar = tqdm(total=epochs, desc='Training Progress')

model.train(epochs, train_loader, val_loader, progress_bar=pbar, save=True)

pbar.close()


# model.evaluate_geometry(test_loader)

Training Progress: 100%|██████████| 50/50 [00:06<00:00,  7.68it/s, train_loss=0.009149, val_loss=0.022766]


### Train FC Model with MLflow Tracking

This cell trains the same fully connected network as above, this time with experiment tracking enabled.

**Setup:**
- Creates a `LinModel` with 8 input features and 300 output features
- Wraps it in `NANetwork` with `tracking=True` to log the run to MLflow
- Sets up device (GPU if available, otherwise CPU)

**Training:**
- Trains for 50 epochs using the provided train and validation data loaders
- Uses a custom progress bar to track training progress
- Automatically saves the best model during training based on validation loss

**Tracking & MLflow:**
- Training metrics are logged to MLflow for experiment tracking
- Records learning rate, batch size, regularization parameters, and geometry bounds
- Tracks training loss, validation loss, and best validation loss over time
- Saves total training time and model checkpoints
- Results stored in local SQLite database (`mlflow.db`)
- View results by running: `mlflow ui --backend-store-uri sqlite:///mlflow.db` in the same directory

In [8]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model_lin = LinModel(8, 300)

model = NANetwork(model_lin, device=device, tracking=True)

epochs = 50

pbar = tqdm(total=epochs, desc='Training Progress')

model.train(epochs, train_loader, val_loader, progress_bar=pbar, save=True)

pbar.close()


# model.evaluate_geometry(test_loader)

Training Progress:   0%|          | 0/50 [00:12<?, ?it/s]
2025/07/03 11:32:48 INFO mlflow.store.db.utils: Creating initial MLflow database tables...
2025/07/03 11:32:48 INFO mlflow.store.db.utils: Updating database tables
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade  -> 451aebb31d03, add metric step
INFO  [alembic.runtime.migration] Running upgrade 451aebb31d03 -> 90e64c465722, migrate user column to tags
INFO  [alembic.runtime.migration] Running upgrade 90e64c465722 -> 181f10493468, allow nulls for metric values
INFO  [alembic.runtime.migration] Running upgrade 181f10493468 -> df50e92ffc5e, Add Experiment Tags Table
INFO  [alembic.runtime.migration] Running upgrade df50e92ffc5e -> 7ac759974ad8, Update run tags with larger limit
INFO  [alembic.runtime.migration] Running upgrade 7ac759974ad8 -> 89d4b8295536, create latest metrics table

### Load Pre-trained Model

This cell loads a previously trained model from a checkpoint file.

**What's Loaded:**
- Model architecture (`LinModel` with 8 input features, 300 output features)
- Trained model weights and parameters
- Training metadata including best validation loss
- Geometry preprocessing parameters (mean, lower/upper bounds)
- Optimizer state (if available)

**Usage:**
- Creates a new `NANetwork` instance with the same architecture
- Loads the saved state from `checkpoints/best_model.pt`
- Prints the geometry lower bounds to verify successful loading
- Model is ready for inference without retraining

In [9]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = LinModel(8, 300)
model = NANetwork(model, device=device)


model.load('checkpoints/best_model.pt')
print(model.geometry_lower_bound)

Successfully loaded model from checkpoints/best_model.pt
tensor([-1., -1., -1., -1., -1., -1., -1., -1.])
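
A checkpoint of this kind is typically a dictionary that bundles the weights with extra metadata. The sketch below shows that round trip with `torch.save` and `torch.load`. The key names are hypothetical, so the layout written by `NANetwork` may differ.

```python
import io

import torch
import torch.nn as nn

net = nn.Linear(8, 300)

# Hypothetical checkpoint layout; the keys written by NANetwork may differ.
checkpoint = {
    "model_state_dict": net.state_dict(),
    "best_validation_loss": 0.0123,
    "geometry_lower_bound": torch.full((8,), -1.0),
    "geometry_upper_bound": torch.full((8,), 1.0),
}

buffer = io.BytesIO()  # in-memory stand-in for a file such as best_model.pt
torch.save(checkpoint, buffer)
buffer.seek(0)

# map_location lets a checkpoint saved on GPU be restored on a CPU-only machine.
restored = torch.load(buffer, map_location="cpu")

fresh = nn.Linear(8, 300)
fresh.load_state_dict(restored["model_state_dict"])
print(restored["geometry_lower_bound"])
```

The architecture must be recreated before loading, because the state dictionary stores only tensors, not the model class.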


### Forward Prediction (Geometry → Spectra)

This cell demonstrates standard forward inference using the trained model.

**Process:**
- Takes a single geometry sample from the dataset (`X_tensor[0]`)
- Adds batch dimension with `unsqueeze(0)` for model input
- Predicts the corresponding spectra using the trained forward model
- Returns predicted spectra with shape `(1, 300)` - one sample with 300 spectral features

**Usage:**
- Standard neural network inference (no Neural Adjoint method)
- Useful for validating model performance on known geometries
- Can be used for batch predictions by passing multiple geometries

In [7]:
spectra = model.predict_spectra(X_tensor[0].unsqueeze(0))
print(spectra.shape)

(1, 300)
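
The batch dimension works the same way for one geometry or many. The sketch below uses a stand-in `nn.Sequential` model rather than `NANetwork`, whose `predict_spectra` internals are not shown in this notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a trained forward model
# (8 geometry values -> 300 spectral points).
forward_model = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 300))
forward_model.eval()

geometries = torch.rand(5, 8) * 2 - 1  # five made-up geometries in [-1, 1]

with torch.no_grad():  # inference only, so gradients are not tracked
    single = forward_model(geometries[0].unsqueeze(0))  # one sample with a batch dimension
    batch = forward_model(geometries)                   # all five samples at once

print(single.shape, batch.shape)
```

The first row of the batch result matches the single-sample prediction, so batching changes throughput only, not the result.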


### Neural Adjoint Prediction (Spectra → Geometry)

This cell demonstrates the core Neural Adjoint method for inverse prediction.

**Process:**
- Takes a target spectrum (`y_tensor[0]`) as input
- Uses gradient-based optimization to find the geometry that best reproduces this spectrum
- Runs multiple optimization trials with different initializations
- Returns the top prediction based on MSE loss

**Outputs:**
- `Xpred_top`: Predicted geometry with shape `(1, 8)` - one sample with 8 geometry features
- `Ypred_top`: Predicted spectra from the best geometry with shape `(1, 300)`
- `MSE_top`: Mean squared error between target and predicted spectra

**Key Features:**
- Inverse modeling using gradient descent on the input space
- Multiple random initializations for robust optimization
- Boundary constraints to keep predictions within valid geometry ranges

In [10]:
Xpred_top, Ypred_top, MSE_top = model.predict_geometry(y_tensor[0])
print(Xpred_top.shape, Ypred_top.shape, MSE_top.shape)


Using first layer extraction with 8 features

(1, 8) (1, 300) (1,)
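
The process described above can be sketched in plain PyTorch. This is an illustration under stated assumptions, not the package's implementation. The forward model is an untrained stand-in, and the `neural_adjoint` function, its parameters, and the simple boundary penalty are choices made for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in forward model (geometry -> spectra); a trained network would go here.
forward_model = nn.Sequential(nn.Linear(8, 32), nn.Tanh(), nn.Linear(32, 300))
forward_model.eval()
for param in forward_model.parameters():
    param.requires_grad_(False)  # weights stay frozen; only the inputs are optimized

def neural_adjoint(target, n_init=64, steps=200, lr=0.05,
                   lower=-1.0, upper=1.0, save_top=1):
    # Start from many random geometries inside the allowed range.
    x = torch.empty(n_init, 8).uniform_(lower, upper).requires_grad_(True)
    optimizer = torch.optim.Adam([x], lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        mse = ((forward_model(x) - target) ** 2).mean(dim=1)
        # Boundary term: penalize how far each coordinate leaves [lower, upper].
        boundary = (torch.relu(x - upper) + torch.relu(lower - x)).mean(dim=1)
        (mse + boundary).sum().backward()
        optimizer.step()

    # Rank the candidates by how well they reproduce the target.
    with torch.no_grad():
        y_pred = forward_model(x)
        mse = ((y_pred - target) ** 2).mean(dim=1)
        best = torch.argsort(mse)[:save_top]
    return x.detach()[best], y_pred[best], mse[best]

# Build a target from a known geometry so that a solution exists.
true_x = torch.empty(1, 8).uniform_(-1.0, 1.0)
target = forward_model(true_x)

x_top, y_top, mse_top = neural_adjoint(target)
print(x_top.shape, y_top.shape, mse_top.shape)
```

Starting from many initializations matters because the loss surface over the inputs is non-convex, so single runs can stall in poor local minima.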




## **Example 2**

### Custom PyTorch Model Definition

This cell defines a custom feedforward neural network architecture.

**Architecture:**
- Input layer: 200 features → 128 hidden units
- Hidden layer 1: 128 → 64 units
- Hidden layer 2: 64 → 32 units
- Output layer: 32 → 10 features
- ReLU activation functions and 20% dropout for regularization

**Flexibility:**
This demonstrates that **any PyTorch model** can be used with the Neural Adjoint framework.

The only requirement is that the model inherits from `nn.Module` and implements a `forward()` method.

In [11]:
class SimpleModel(nn.Module):
    def __init__(self, input_size=200, output_size=10, hidden_size=128):
        super(SimpleModel, self).__init__()
        
        # Define the layers
        self.layers = nn.Sequential(
            # Input layer: 200 -> 128
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Dropout(0.2),
            
            # Hidden layer: 128 -> 64
            nn.Linear(hidden_size, hidden_size // 2),
            nn.ReLU(),
            nn.Dropout(0.2),
            
            # Hidden layer: 64 -> 32
            nn.Linear(hidden_size // 2, hidden_size // 4),
            nn.ReLU(),
            nn.Dropout(0.2),
            
            # Output layer: 32 -> 10
            nn.Linear(hidden_size // 4, output_size)
        )
    
    def forward(self, x):
        return self.layers(x)
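
Because this architecture uses dropout, its output depends on whether the module is in training or evaluation mode. The sketch below uses a smaller stand-in network to show the difference. Whether `NANetwork` switches modes internally is not stated here, so that part is an assumption to verify.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Smaller stand-in network with the same ingredients (Linear, ReLU, Dropout).
net = nn.Sequential(nn.Linear(200, 128), nn.ReLU(), nn.Dropout(0.2), nn.Linear(128, 10))
x = torch.randn(4, 200)

net.train()  # dropout active: repeated calls generally give different outputs
train_a, train_b = net(x), net(x)

net.eval()   # dropout disabled: outputs are deterministic
with torch.no_grad():
    eval_a, eval_b = net(x), net(x)

print(torch.equal(eval_a, eval_b))  # True
```

Deterministic outputs matter for inverse prediction, since gradient descent on the inputs assumes a fixed forward mapping.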

### Synthetic Data Generation

This cell creates synthetic training data to demonstrate the Neural Adjoint framework.

**Data Generation:**
- **1000 samples** with **200 input features** and **10 output features**
- Input features: Random normal distribution (mean=0, std=0.5)
- Output features: Non-linear transformation of inputs with added noise

**Data Loaders:**
- Training: Batch size 32 with shuffling
- Validation: Batch size 32 without shuffling
- Test: Batch size 1 for individual predictions

In [10]:
# Set random seed for reproducibility
torch.manual_seed(42)
np.random.seed(42)

# Generate synthetic training data
n_samples = 1000
n_features = 200
n_outputs = 10

# Create input features (X): normally distributed with mean 0 and std 0.5
X = torch.randn(n_samples, n_features) * 0.5

# Create a simple relationship for outputs (Y)
# This creates a non-linear relationship between inputs and outputs
def generate_outputs(X):
    # Create some non-linear transformations
    Y = torch.zeros(n_samples, n_outputs)
    
    for i in range(n_outputs):
        # Each output depends on different combinations of input features
        feature_indices = torch.randperm(n_features)[:20]  # Use 20 random features per output
        weights = torch.randn(20) * 0.1
        
        # Non-linear transformation
        Y[:, i] = torch.sum(X[:, feature_indices] * weights, dim=1) + \
                  torch.sin(torch.sum(X[:, feature_indices], dim=1)) * 0.1 + \
                  torch.randn(n_samples) * 0.05  # Add some noise
    
    return Y

# Generate outputs
y = generate_outputs(X)

# Convert to pandas DataFrames (optional, for easy viewing)
X_df = pd.DataFrame(X.numpy())
y_df = pd.DataFrame(y.numpy())

print(f"X shape: {X.shape}")
print(f"y shape: {y.shape}")
print(f"X range: [{X.min():.3f}, {X.max():.3f}]")
print(f"y range: [{y.min():.3f}, {y.max():.3f}]")

# Create DataLoader
dataset = TensorDataset(X, y)

# Split into train/validation/test
total_size = len(dataset)
train_size = int(0.7 * total_size)
val_size = int(0.29 * total_size)
test_size = total_size - train_size - val_size

train_dataset, val_dataset, test_dataset = torch.utils.data.random_split(
    dataset, [train_size, val_size, test_size]
)

# Create dataloaders
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=1, shuffle=False)

print(f"\nDataset splits:")
print(f"Train: {len(train_dataset)} samples")
print(f"Validation: {len(val_dataset)} samples")
print(f"Test: {len(test_dataset)} samples")

X shape: torch.Size([1000, 200])
y shape: torch.Size([1000, 10])
X range: [-2.295, 2.315]
y range: [-1.052, 0.850]

Dataset splits:
Train: 700 samples
Validation: 290 samples
Test: 10 samples


In [11]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

sample_X, sample_y = next(iter(train_loader))

model_base = SimpleModel(sample_X.shape[1], sample_y.shape[1])

model = NANetwork(model_base, device=device)

epochs = 50

pbar = tqdm(total=epochs, desc='Training Progress')

model.train(epochs, train_loader, val_loader, progress_bar=pbar, save=True)

pbar.close()
# model.evaluate_geometry(test_loader)

Training Progress: 100%|██████████| 50/50 [00:02<00:00, 23.33it/s, train_loss=0.015782, val_loss=0.017491]


### Evaluate Model Performance

This cell performs comprehensive evaluation of the trained model on the test dataset.

**Process:**
- Evaluates the model on all test samples using the Neural Adjoint method
- For each test sample, performs inverse prediction (spectra → geometry)
- Compares predicted geometries with true geometries
- Saves results to CSV files for analysis

**Output Files:**
- `val_results/val_Ypred.csv`: Predicted spectra from best geometries
- `val_results/val_Ytruth.csv`: True target spectra
- `val_results/val_Xtruth.csv`: True input geometries
- `val_results/val_Xpred.csv`: Predicted geometries

**Note:** The test data loader must use a batch size of 1, because samples are processed individually.

In [None]:
model.evaluate_geometry(test_loader)
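
Once the result files exist, truth and prediction can be compared per sample. The sketch below uses randomly generated arrays as stand-ins for the file contents, since the CSV layout (delimiter, header) is not specified here. In practice the arrays would be loaded from the `val_results` files.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the arrays stored in val_Xtruth.csv / val_Xpred.csv
# and val_Ytruth.csv / val_Ypred.csv (10 test samples).
x_truth = rng.normal(scale=0.5, size=(10, 200))
x_pred = x_truth + rng.normal(scale=0.1, size=(10, 200))
y_truth = rng.normal(scale=0.3, size=(10, 10))
y_pred = y_truth + rng.normal(scale=0.05, size=(10, 10))

# Re-simulation error: how well the predicted inputs reproduce the targets.
output_mse = ((y_pred - y_truth) ** 2).mean(axis=1)
# Input-space error: distance from the inputs that generated the targets.
input_mse = ((x_pred - x_truth) ** 2).mean(axis=1)

print(f"mean output MSE: {output_mse.mean():.4f}")
print(f"mean input MSE:  {input_mse.mean():.4f}")
```

The two errors answer different questions. An inverse problem can have several valid inputs for one target, so a low output error with a higher input error is not necessarily a failure.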

### Single Sample Neural Adjoint Prediction

This cell demonstrates the Neural Adjoint method on a single target sample.

**Process:**
- Takes the target of the first test batch (`next(iter(test_loader))[1]`), which is a single sample because the test batch size is 1
- Uses gradient-based optimization to find the best input that produces this target
- Runs multiple optimization trials with different random initializations
- Returns the top prediction based on MSE loss

**Outputs:**
- `Xpred_top`: Predicted input features, one row per returned candidate (200 features per row for this model)
- `Ypred_top`: Predicted outputs for those candidates (10 features per row for this model)
- `MSE_top`: Mean squared error between the target and each predicted output

**Parameters:**
- `save_top`: Controls how many top predictions to return (default=1 for best prediction only)

**Note:** This call does not save the results to a file.

In [17]:
sample = next(iter(test_loader))[1]

Xpred_top, Ypred_top, MSE_top = model.predict_geometry(sample, save_top=10)
print(Xpred_top.shape, Ypred_top.shape, MSE_top.shape)

Using first layer extraction with 8 features

(10, 8) (10, 300) (10,)


