> Note: Hawkes simulations now call `python.simulate`, which loads the native `hawkes_bridge` shared library from `build/lib`. Before executing simulation cells, run `cmake -S . -B build` and `cmake --build build --target hawkes_bridge`, or set the environment variable `HFT_HAWKES_BRIDGE` to the compiled binary.

# Neural Hawkes Tutorial

This notebook demonstrates how to use `neural_hawkes.py` to train a neural surrogate of Hawkes-process dynamics.
We cover dataset preparation, training, evaluation, and inspection of predictions.


## 1. Environment setup
Ensure you have the required dependencies installed:

```bash
pip install torch numpy matplotlib
```

The `neural_hawkes.py` script is located at the project root and can be executed as a standalone CLI or imported as a module.
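
Since the note at the top of this notebook requires the native `hawkes_bridge` library, it can help to confirm that the library is discoverable before running any simulation cells. The snippet below is a minimal sketch and not part of the project: `find_hawkes_bridge` is a hypothetical helper, and the assumption that the library file name contains `hawkes_bridge` (with a platform-specific prefix and suffix) is not confirmed by the project documentation.

```python
import os
from pathlib import Path


def find_hawkes_bridge(build_dir="build/lib"):
    """Return the path of the bridge library, or None if it cannot be found.

    Hypothetical helper: HFT_HAWKES_BRIDGE takes precedence; otherwise any
    file in build_dir whose name contains 'hawkes_bridge' is accepted.
    """
    override = os.environ.get("HFT_HAWKES_BRIDGE")
    if override:
        candidate = Path(override)
        return candidate if candidate.is_file() else None
    matches = sorted(p for p in Path(build_dir).glob("*hawkes_bridge*") if p.is_file())
    return matches[0] if matches else None


bridge = find_hawkes_bridge()
print(bridge if bridge else "hawkes_bridge not found; build it or set HFT_HAWKES_BRIDGE")
```

If the function returns `None`, rerun the two `cmake` commands from the note or point `HFT_HAWKES_BRIDGE` at the compiled binary.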


## 2. Quick start (CLI)

The script can be run directly. By default it generates synthetic sequences if no dataset is provided:

```bash
python neural_hawkes.py --epochs 5 --batch-size 128 --num-types 4 --backbone mlp --mlp-layers 3
```

Use `--dataset path/to/data.json` to load custom sequences stored as JSON or NPZ, where each sequence is a list of `[timestamp, type]` pairs.
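
To make the expected layout concrete, here is a small sketch that writes a toy JSON file. It assumes the top level is a list of sequences and that each event is a `[timestamp, type]` pair with an integer type id; the exact schema accepted by `--dataset` should be checked against the loader in `neural_hawkes.py`. The file name is arbitrary.

```python
import json
import tempfile
from pathlib import Path

# Two toy sequences; each event is [timestamp, type] with an integer type id.
sequences = [
    [[0.00, 0], [0.35, 2], [0.90, 1]],
    [[0.10, 3], [0.42, 3], [1.25, 0], [1.30, 1]],
]

path = Path(tempfile.gettempdir()) / "toy_hawkes_sequences.json"
path.write_text(json.dumps(sequences))

# Round-trip check: timestamps must be non-decreasing within each sequence.
loaded = json.loads(path.read_text())
for seq in loaded:
    times = [t for t, _ in seq]
    assert times == sorted(times)
print(f"wrote {len(loaded)} sequences to {path}")
```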


## 3. Programmatic usage inside Python

We can import the key components and run training inside this notebook for reproducibility.


In [None]:

import json
from pathlib import Path
import numpy as np
import torch
from torch.utils.data import DataLoader

import neural_hawkes as nh

# For notebook runs, keep deterministic seeds
np.random.seed(0)
torch.manual_seed(0)


### 3.1 Generate synthetic sequences

`generate_synthetic_sequences` creates i.i.d. sequences with exponential inter-arrival times and random event types. If you have real trading data, replace this block with your own dataset loader.
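
For intuition, the description above can be sketched with the standard library alone. This is an illustrative stand-in, not the implementation in `neural_hawkes.py`: the function name `sketch_synthetic_sequences`, the `rate` parameter, and the uniform choice of event types are assumptions.

```python
import random


def sketch_synthetic_sequences(num_sequences, num_events, num_types, rate=1.0, seed=0):
    """Illustrative stand-in: exponential gaps and uniformly random types."""
    rng = random.Random(seed)
    sequences = []
    for _ in range(num_sequences):
        t = 0.0
        events = []
        for _ in range(num_events):
            t += rng.expovariate(rate)  # inter-arrival time ~ Exp(rate)
            events.append((t, rng.randrange(num_types)))
        sequences.append(events)
    return sequences


toy = sketch_synthetic_sequences(num_sequences=3, num_events=5, num_types=4, seed=123)
print(toy[0])
```

Under these assumptions each sequence is a homogeneous Poisson stream with no self-excitation, so it is useful for checking the pipeline but not for judging how well a model captures Hawkes-style clustering.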


In [None]:

sequences = nh.generate_synthetic_sequences(num_sequences=100, num_events=200, num_types=4, seed=123)
dataset = nh.EventSequenceDataset(sequences, window_size=64, stride=32)
train_set, val_set, test_set = nh.split_dataset(dataset, (0.7, 0.15, 0.15))

collate = nh.collate_windows
train_loader = DataLoader(train_set, batch_size=64, shuffle=True, collate_fn=collate)
val_loader = DataLoader(val_set, batch_size=128, shuffle=False, collate_fn=collate)
test_loader = DataLoader(test_set, batch_size=128, shuffle=False, collate_fn=collate)
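
The `window_size=64, stride=32` arguments suggest overlapping sliding windows. As a rough sketch of what that implies for dataset size, assuming windows hold exactly `window_size` events and trailing partial windows are dropped (the real `EventSequenceDataset` may differ, for example by reserving one extra event for the target), the start indices can be computed as follows. `window_starts` is a hypothetical helper.

```python
def window_starts(num_events, window_size, stride):
    """Start indices of full windows; partial trailing windows are dropped (assumption)."""
    if num_events < window_size:
        return []
    return list(range(0, num_events - window_size + 1, stride))


starts = window_starts(num_events=200, window_size=64, stride=32)
print(starts)       # [0, 32, 64, 96, 128]
print(len(starts))  # 5 windows per 200-event sequence under this assumption
```

With 100 sequences this would give on the order of 500 windows before the train/validation/test split.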


### 3.2 Instantiate the neural Hawkes model

The model combines an event-type embedding, an RNN (GRU or LSTM), and two heads for predicting the next event type and inter-arrival time.
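
The architecture described above can be sketched in a few lines of PyTorch. This is a simplified illustration under assumptions, not the `NeuralHawkesModel` source: the class name `TinyNeuralHawkes`, the concatenation of the inter-arrival time with the embedding, and the softplus on the time head are choices made here, and the sketch ignores the `lengths` argument that the real model accepts.

```python
import torch
from torch import nn


class TinyNeuralHawkes(nn.Module):
    """Embedding -> GRU -> (type logits, positive inter-arrival prediction)."""

    def __init__(self, num_types, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(num_types, embed_dim)
        # The GRU sees the type embedding concatenated with the scalar gap.
        self.rnn = nn.GRU(embed_dim + 1, hidden_dim, batch_first=True)
        self.type_head = nn.Linear(hidden_dim, num_types)
        self.delta_head = nn.Linear(hidden_dim, 1)

    def forward(self, types, deltas):
        x = torch.cat([self.embed(types), deltas.unsqueeze(-1)], dim=-1)
        hidden, _ = self.rnn(x)
        logits = self.type_head(hidden)
        # Softplus keeps the predicted inter-arrival time non-negative.
        delta_pred = nn.functional.softplus(self.delta_head(hidden)).squeeze(-1)
        return logits, delta_pred


sketch_model = TinyNeuralHawkes(num_types=4)
types = torch.randint(0, 4, (2, 10))  # (batch, time) integer type ids
deltas = torch.rand(2, 10)            # (batch, time) inter-arrival times
logits, delta_pred = sketch_model(types, deltas)
print(logits.shape, delta_pred.shape)
```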


In [None]:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = nh.NeuralHawkesModel(num_types=4, embed_dim=32, hidden_dim=64, backbone='gru').to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)


### 3.3 Training loop

We train for a small number of epochs and track loss/accuracy/MAE on the validation set.
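
The role of `delta_weight` is easiest to see on a single prediction. The sketch below assumes the surrogate objective is a cross-entropy term on the next event type plus `delta_weight` times an absolute-error term on the inter-arrival time; the actual loss in `neural_hawkes.py` may use a different regression term (for example squared error), so treat `surrogate_loss` as a hypothetical illustration.

```python
import math


def surrogate_loss(logits, target_type, delta_pred, delta_true, delta_weight=1.0):
    """Cross-entropy on the event type plus weighted absolute error on the gap."""
    # Numerically stable log-sum-exp over the type logits.
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    cross_entropy = log_norm - logits[target_type]
    abs_error = abs(delta_pred - delta_true)
    return cross_entropy + delta_weight * abs_error


loss = surrogate_loss([2.0, 0.5, 0.1, -1.0], target_type=0,
                      delta_pred=0.8, delta_true=1.0, delta_weight=1.0)
print(f"{loss:.4f}")
```

Raising `delta_weight` shifts the optimisation toward timing accuracy at the expense of type accuracy, and lowering it does the opposite.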


In [None]:

EPOCHS = 5
delta_weight = 1.0
train_history = []
val_history = []

for epoch in range(1, EPOCHS + 1):
    train_metrics = nh.train_one_epoch(model, train_loader, optimizer, device, delta_weight)
    val_metrics = nh.evaluate(model, val_loader, device, delta_weight)
    train_history.append(train_metrics)
    val_history.append(val_metrics)
    print(
        f"Epoch {epoch:02d}: train loss {train_metrics['loss']:.4f} "
        f"acc {train_metrics['acc']:.4f} mae {train_metrics['mae']:.4f} | "
        f"val loss {val_metrics['loss']:.4f} acc {val_metrics['acc']:.4f} mae {val_metrics['mae']:.4f}"
    )


### 3.4 Evaluation on test set


In [None]:

test_metrics = nh.evaluate(model, test_loader, device, delta_weight)
print(test_metrics)
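
The `acc` and `mae` entries are presumably top-1 accuracy on the next event type and mean absolute error on the next inter-arrival time. A plain-Python sketch of such metrics follows, assuming padded positions are excluded using each window's length; `masked_metrics` is a hypothetical helper and the real `nh.evaluate` may aggregate differently.

```python
def masked_metrics(pred_types, true_types, pred_deltas, true_deltas, lengths):
    """Top-1 accuracy and MAE over the first lengths[i] positions of each window."""
    correct, abs_err, count = 0, 0.0, 0
    for i, n in enumerate(lengths):
        for j in range(n):
            correct += int(pred_types[i][j] == true_types[i][j])
            abs_err += abs(pred_deltas[i][j] - true_deltas[i][j])
            count += 1
    if count == 0:
        raise ValueError("no valid positions to score")
    return {"acc": correct / count, "mae": abs_err / count}


metrics = masked_metrics(
    pred_types=[[0, 1, 2], [3, 3, 0]],
    true_types=[[0, 1, 1], [3, 0, 0]],
    pred_deltas=[[0.5, 1.0, 0.2], [0.3, 0.0, 0.0]],
    true_deltas=[[0.5, 0.5, 0.2], [0.1, 0.0, 0.0]],
    lengths=[3, 1],  # the second window has two padded positions
)
print(metrics)
```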


## 4. Visualise training curves

Plot the training/validation losses and accuracy to inspect convergence.


In [None]:

import matplotlib.pyplot as plt

train_loss = [m['loss'] for m in train_history]
val_loss = [m['loss'] for m in val_history]
val_acc = [m['acc'] for m in val_history]
val_mae = [m['mae'] for m in val_history]

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].plot(train_loss, label='train loss')
axes[0].plot(val_loss, label='val loss')
axes[0].set_title('Loss')
axes[0].legend()

axes[1].plot(val_acc, label='val acc', color='tab:green')
axes[1].set_title('Validation accuracy')
axes[1].set_ylim(0, 1)

axes[2].plot(val_mae, label='val MAE', color='tab:orange')
axes[2].set_title('Validation MAE')
plt.tight_layout()
plt.show()


## 5. Inspect predictions for a single batch


In [None]:

model.eval()
batch = next(iter(test_loader))
batch = nh.move_batch(batch, device)
# Disable autograd so the outputs can be converted to NumPy without .detach()
with torch.no_grad():
    logits, delta_pred = model(batch['input_types'], batch['input_deltas'], batch['lengths'])
probs = torch.softmax(logits, dim=-1)

print('Predicted event type probabilities (first window):')
print(probs[0, :5].cpu().numpy())
print('\nTrue next event types:')
print(batch['target_types'][0, :5].cpu().numpy())
print('\nPredicted inter-arrivals vs true:')
print(list(zip(delta_pred[0, :5].cpu().numpy(), batch['target_deltas'][0, :5].cpu().numpy())))


## 6. Runtime comparison
Use the `measure_runtime` helper to benchmark one pass over the test loader on the CPU and, when available, on the GPU.
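
For reference, a timing helper of this kind can be sketched as below. This is not the `measure_runtime` source; `time_one_pass` and its `sync` parameter are hypothetical. The point it illustrates is that CUDA kernels launch asynchronously, so a GPU timing should synchronise before reading the clock (for example by passing `torch.cuda.synchronize` as `sync`).

```python
import time


def time_one_pass(step, batches, sync=None):
    """Seconds taken by one pass of step(batch) over batches.

    sync is an optional zero-argument callable invoked before each clock
    read, e.g. torch.cuda.synchronize when timing on a GPU.
    """
    if sync is not None:
        sync()
    start = time.perf_counter()
    for batch in batches:
        step(batch)
    if sync is not None:
        sync()
    return time.perf_counter() - start


# Stand-in workload: sum of squares over 100 small "batches".
elapsed = time_one_pass(lambda batch: sum(x * x for x in batch), [range(1000)] * 100)
print(f"one pass: {elapsed:.4f} s")
```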


In [None]:

cpu_time = nh.measure_runtime(model.to('cpu'), test_loader, torch.device('cpu'))
print(f'CPU runtime (1 pass): {cpu_time:.3f} s')
if torch.cuda.is_available():
    # Match the backbone used for the trained model so the state dict loads cleanly
    gpu_model = nh.NeuralHawkesModel(num_types=4, embed_dim=32, hidden_dim=64, backbone='gru')
    gpu_model.load_state_dict(model.state_dict())
    gpu_time = nh.measure_runtime(gpu_model.to('cuda'), test_loader, torch.device('cuda'))
    print(f'GPU runtime (1 pass): {gpu_time:.3f} s')
else:
    print('GPU not available; skipping GPU timing')


## 7. Saving model checkpoints
Optional: save trained weights and optimizer state for reuse.


In [None]:

checkpoint_path = Path('neural_hawkes_checkpoint.pth')
torch.save({'model': model.state_dict(), 'optimizer': optimizer.state_dict()}, checkpoint_path)
checkpoint_path
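
Restoring from such a checkpoint mirrors the save call. The sketch below uses a small `nn.Linear` as a stand-in for the trained model and a temporary file instead of `neural_hawkes_checkpoint.pth`, so it runs on its own; with the real model you would construct `nh.NeuralHawkesModel` using the same hyperparameters before calling `load_state_dict`.

```python
import tempfile
from pathlib import Path

import torch
from torch import nn

# Stand-in for the trained model and its optimizer.
source_model = nn.Linear(4, 2)
source_opt = torch.optim.Adam(source_model.parameters(), lr=1e-3)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "checkpoint.pth"
    torch.save({"model": source_model.state_dict(),
                "optimizer": source_opt.state_dict()}, path)
    # map_location lets a checkpoint saved on GPU be restored on a CPU-only machine.
    checkpoint = torch.load(path, map_location="cpu")

restored_model = nn.Linear(4, 2)
restored_model.load_state_dict(checkpoint["model"])
restored_opt = torch.optim.Adam(restored_model.parameters(), lr=1e-3)
restored_opt.load_state_dict(checkpoint["optimizer"])
print("weights restored:", torch.equal(restored_model.weight, source_model.weight))
```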


## 8. Next steps
- Swap the synthetic generator with real trade streams or LOBSTER feeds preprocessed as `(t_i, type_i)`.
- Incorporate volumes/marks by extending the dataset and augmenting the model inputs.
- Replace the surrogate loss with a proper neural Hawkes likelihood (e.g., log-intensity integration) for better calibration.
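
To illustrate what a likelihood-based objective looks like, the sketch below evaluates the point-process log-likelihood (sum of log-intensities at the events minus the integrated intensity over the observation window) for a classical univariate Hawkes process with an exponential kernel, where the integral has a closed form. It is a parametric illustration only; the function name and parameters `mu`, `alpha`, `beta` are assumptions, and a neural variant would replace the intensity with a network output and usually has to approximate the integral numerically.

```python
import math


def exp_hawkes_loglik(times, horizon, mu, alpha, beta):
    """Log-likelihood of a univariate Hawkes process with exponential kernel.

    Intensity: lambda(t) = mu + alpha * sum_{t_j < t} exp(-beta * (t - t_j)).
    times must be sorted and lie in [0, horizon].
    """
    log_intensity_sum = 0.0
    decay_sum = 0.0  # sum_{j < i} exp(-beta * (t_i - t_j)), updated recursively
    prev = None
    for t in times:
        if prev is not None:
            decay_sum = math.exp(-beta * (t - prev)) * (1.0 + decay_sum)
        log_intensity_sum += math.log(mu + alpha * decay_sum)
        prev = t
    # Closed-form integral of the intensity over [0, horizon].
    compensator = mu * horizon + (alpha / beta) * sum(
        1.0 - math.exp(-beta * (horizon - t)) for t in times
    )
    return log_intensity_sum - compensator


print(exp_hawkes_loglik([0.5, 1.0, 2.0], horizon=3.0, mu=1.0, alpha=0.5, beta=1.0))
```

Setting `alpha=0` removes the self-excitation and recovers the homogeneous Poisson log-likelihood, which is a convenient sanity check.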


## 9. Batch experiments

Use `experiments/run_matrix.py` with a JSON config (see `experiments/configs/multi_symbol_example.json`) to sweep across symbols/backbones, then summarise results via `experiments/aggregate_results.py`.