# PyTorch Lightning Integration

This tutorial shows how to use `projio` (imported as `project_io`) with PyTorch Lightning to manage checkpoints and TensorBoard logs.

## Overview

`projio` provides:

- Built-in paths for Lightning artifacts (checkpoints, TensorBoard, logs)
- `IOCheckpointCallback` for checkpoint management
- `IOLogCallback` for log directory routing
- Consistent datestamp handling across all Lightning paths

## Lightning Directory Structure

By default, checkpoints and TensorBoard logs are organized under a `lightning/` directory inside the project root:

```python
import tempfile
from project_io import ProjectIO

tmp = tempfile.mkdtemp()
io = ProjectIO(root=tmp, use_datestamp=False)

print(f"Lightning root: {io.lightning_root}")
print(f"Checkpoints: {io.checkpoints}")
print(f"TensorBoard: {io.tensorboard}")
```

```text
Lightning root: /private/var/folders/f7/7pcpvrhn0p9gw509gyzmh8fxrwyskv/T/tmperee_qvt/lightning
Checkpoints: /private/var/folders/f7/7pcpvrhn0p9gw509gyzmh8fxrwyskv/T/tmperee_qvt/lightning/checkpoints
TensorBoard: /private/var/folders/f7/7pcpvrhn0p9gw509gyzmh8fxrwyskv/T/tmperee_qvt/lightning/tensorboard
```


## Checkpoint Paths

Use `checkpoint_path()` to build paths for saving model checkpoints:

```python
# Simple checkpoint path
ckpt = io.checkpoint_path('best_model')
print(f"Checkpoint: {ckpt}")

# With run name for organization
ckpt_run = io.checkpoint_path('epoch_10', run='experiment_1')
print(f"With run: {ckpt_run}")

# Custom extension
ckpt_pt = io.checkpoint_path('model', ext='.pt')
print(f"With .pt extension: {ckpt_pt}")
```

```text
Checkpoint: /private/var/folders/f7/7pcpvrhn0p9gw509gyzmh8fxrwyskv/T/tmperee_qvt/lightning/checkpoints/best_model.ckpt
With run: /private/var/folders/f7/7pcpvrhn0p9gw509gyzmh8fxrwyskv/T/tmperee_qvt/lightning/checkpoints/experiment_1/epoch_10.ckpt
With .pt extension: /private/var/folders/f7/7pcpvrhn0p9gw509gyzmh8fxrwyskv/T/tmperee_qvt/lightning/checkpoints/model.pt
```
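
To make the layout concrete, here is a minimal standalone sketch of how paths like the ones above can be composed with `pathlib`. The helper `sketch_checkpoint_path` is hypothetical and is not `projio`'s implementation; it only mirrors the `<root>/lightning/checkpoints/[<run>/]<name><ext>` pattern visible in the output.

```python
from pathlib import Path
from typing import Optional


def sketch_checkpoint_path(root: Path, name: str, run: Optional[str] = None,
                           ext: str = ".ckpt") -> Path:
    """Hypothetical helper mirroring the layout shown above (not projio's code)."""
    directory = Path(root) / "lightning" / "checkpoints"
    if run is not None:
        directory = directory / run  # optional per-run subdirectory
    return directory / f"{name}{ext}"


root = Path("experiments")
print(sketch_checkpoint_path(root, "best_model"))
print(sketch_checkpoint_path(root, "epoch_10", run="experiment_1"))
print(sketch_checkpoint_path(root, "model", ext=".pt"))
```

The run name only adds one directory level, so checkpoints from different runs never collide even when they share a file name.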


## TensorBoard Directories

Use `tensorboard_run()` to get directories for TensorBoard logs:

```python
# Default tensorboard directory
tb = io.tensorboard_run()
print(f"TensorBoard dir: {tb}")

# With run name
tb_run = io.tensorboard_run(run='baseline')
print(f"With run: {tb_run}")
```

```text
TensorBoard dir: /private/var/folders/f7/7pcpvrhn0p9gw509gyzmh8fxrwyskv/T/tmperee_qvt/lightning/tensorboard
With run: /private/var/folders/f7/7pcpvrhn0p9gw509gyzmh8fxrwyskv/T/tmperee_qvt/lightning/tensorboard/baseline
```


## Using Callbacks

`projio` provides Lightning callbacks that integrate with ProjectIO:

```python
from project_io.callbacks import IOCheckpointCallback, IOLogCallback

# Create callbacks sharing the same IO instance
io = ProjectIO(root=tmp, use_datestamp=False)

ckpt_callback = IOCheckpointCallback(io=io, run='experiment_1')
log_callback = IOLogCallback(io=io, run='experiment_1')

print(f"Checkpoint directory: {ckpt_callback.checkpoint_dir}")
print(f"Log directory: {log_callback.log_dir}")
```

```text
Checkpoint directory: /private/var/folders/f7/7pcpvrhn0p9gw509gyzmh8fxrwyskv/T/tmperee_qvt/lightning/checkpoints/experiment_1
Log directory: /private/var/folders/f7/7pcpvrhn0p9gw509gyzmh8fxrwyskv/T/tmperee_qvt/lightning/tensorboard/experiment_1
```


## Custom Checkpoint Filenames

Customize checkpoint naming with format strings:

```python
# Default format: {epoch:02d}-{step:06d}
cb_default = IOCheckpointCallback(io=io)
path = cb_default.get_checkpoint_path(epoch=5, step=1000)
print(f"Default format: {path.name}")

# Custom format
cb_custom = IOCheckpointCallback(
    io=io,
    filename="model_e{epoch:03d}_s{step:08d}"
)
path = cb_custom.get_checkpoint_path(epoch=5, step=1000)
print(f"Custom format: {path.name}")
```

```text
Default format: 05-001000.ckpt
Custom format: model_e005_s00001000.ckpt
```
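
The file names above are consistent with ordinary Python `str.format` expansion of the template. The snippet below reproduces them with the standard library alone. That the callback uses exactly this mechanism internally is an inference from the output, not something this tutorial confirms, and the `.ckpt` suffix is appended by hand here.

```python
# Plain Python format-spec expansion; no projio involved.
default_template = "{epoch:02d}-{step:06d}"
custom_template = "model_e{epoch:03d}_s{step:08d}"

# ".ckpt" is added manually to mirror the file names shown above.
default_name = default_template.format(epoch=5, step=1000) + ".ckpt"
custom_name = custom_template.format(epoch=5, step=1000) + ".ckpt"

print(default_name)  # 05-001000.ckpt
print(custom_name)   # model_e005_s00001000.ckpt
```

Zero-padded fields such as `{step:08d}` keep checkpoint files in training order when sorted by name.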


## Datestamps with Lightning

With `use_datestamp=True` and `datestamp_in="dirs"`, the datestamp becomes its own directory level between the artifact directory and the run name:

```python
io_dated = ProjectIO(
    root=tmp,
    use_datestamp=True,
    datestamp_in="dirs",
    auto_create=False
)
# Pin the datestamp to a fixed value so the output is reproducible
io_dated.datestamp_value = lambda ts=None: "2024_03_15"

# Checkpoint with datestamp
ckpt = io_dated.checkpoint_path('model', run='exp1')
print(f"Dated checkpoint: {ckpt}")

# Callbacks also respect datestamp
cb = IOCheckpointCallback(io=io_dated, run='exp1', datestamp=True)
print(f"Callback checkpoint dir: {cb.checkpoint_dir}")
```

```text
Dated checkpoint: /private/var/folders/f7/7pcpvrhn0p9gw509gyzmh8fxrwyskv/T/tmperee_qvt/lightning/checkpoints/2024_03_15/exp1/model.ckpt
Callback checkpoint dir: /private/var/folders/f7/7pcpvrhn0p9gw509gyzmh8fxrwyskv/T/tmperee_qvt/lightning/checkpoints/2024_03_15/exp1
```
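
As a rough illustration of the directory-style datestamp, the sketch below builds the same `<base>/<datestamp>/<run>` shape with `datetime` and `pathlib`. The function `sketch_dated_dir` is hypothetical, and the `%Y_%m_%d` format is an assumption inferred from the `2024_03_15` value used above; how `projio` actually computes its datestamp is not shown in this tutorial.

```python
from datetime import date
from pathlib import Path


def sketch_dated_dir(base: Path, run: str, day: date) -> Path:
    """Hypothetical: place a YYYY_MM_DD directory between base and run."""
    stamp = day.strftime("%Y_%m_%d")  # assumed format, e.g. 2024_03_15
    return base / stamp / run


dated = sketch_dated_dir(
    Path("experiments/lightning/checkpoints"), "exp1", date(2024, 3, 15)
)
print(dated.as_posix())  # experiments/lightning/checkpoints/2024_03_15/exp1
```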


## Integration with Lightning Trainer

Pass both callbacks to the Lightning `Trainer` and point `default_root_dir` at `io.lightning_root`:

```python
from lightning import Trainer
from project_io import ProjectIO
from project_io.callbacks import IOCheckpointCallback, IOLogCallback

# Set up ProjectIO
io = ProjectIO(root="./experiments", use_datestamp=True)

# Create callbacks
ckpt_callback = IOCheckpointCallback(
    io=io,
    run="baseline_v1",
    filename="{epoch:02d}-{step:06d}"
)
log_callback = IOLogCallback(
    io=io,
    run="baseline_v1"
)

# Create trainer with callbacks
trainer = Trainer(
    callbacks=[ckpt_callback, log_callback],
    default_root_dir=str(io.lightning_root),
    max_epochs=10
)

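# Train your model. `model` and `datamodule` are placeholders for your own
# LightningModule and LightningDataModule instances.
trainer.fit(model, datamodule)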
```

## Producer Tracking for Checkpoints

Set `track_producer=True` and pass `producer_script` to record which script produced each checkpoint:

```python
from pathlib import Path

io = ProjectIO(root=tmp, use_datestamp=False)

cb = IOCheckpointCallback(
    io=io,
    run='exp1',
    track_producer=True,
    producer_script=Path('train.py')
)

print(f"Producer tracking enabled: {cb.track_producer}")
print(f"Producer script: {cb.producer_script}")
```

```text
Producer tracking enabled: True
Producer script: train.py
```
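
This tutorial does not show how or where the producer information is stored. Purely to illustrate the idea of checkpoint provenance, the sketch below writes a small JSON sidecar file next to a checkpoint. The function `write_producer_sidecar` and the `.producer.json` naming are hypothetical and should not be read as `projio`'s actual mechanism.

```python
import json
import tempfile
from pathlib import Path


def write_producer_sidecar(checkpoint: Path, producer_script: Path) -> Path:
    """Hypothetical: record the producing script in '<checkpoint>.producer.json'."""
    sidecar = checkpoint.with_name(checkpoint.name + ".producer.json")
    payload = {"checkpoint": checkpoint.name, "producer": str(producer_script)}
    sidecar.write_text(json.dumps(payload))
    return sidecar


with tempfile.TemporaryDirectory() as tmp_dir:
    ckpt = Path(tmp_dir) / "05-001000.ckpt"
    ckpt.touch()  # stand-in for a real checkpoint file
    sidecar = write_producer_sidecar(ckpt, Path("train.py"))
    record = json.loads(sidecar.read_text())

print(record["producer"])
```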


## Log Paths

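For general log files (not TensorBoard), use `log_path()`. These paths live under `logs/` in the project root rather than under `lightning/`, and the default extension is `.log`: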

```python
io = ProjectIO(root=tmp, use_datestamp=False)

# Training log
train_log = io.log_path('training', run='exp1')
print(f"Training log: {train_log}")

# Validation log
val_log = io.log_path('validation', run='exp1', ext='.json')
print(f"Validation log: {val_log}")
```

```text
Training log: /private/var/folders/f7/7pcpvrhn0p9gw509gyzmh8fxrwyskv/T/tmperee_qvt/logs/exp1/training.log
Validation log: /private/var/folders/f7/7pcpvrhn0p9gw509gyzmh8fxrwyskv/T/tmperee_qvt/logs/exp1/validation.json
```
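
A path like the training log above can be handed directly to Python's standard `logging` module. The sketch below uses a temporary directory and a hand-built path as a stand-in for `io.log_path('training', run='exp1')`. It creates the parent directory itself, since this tutorial does not say whether `projio` does so for you.

```python
import logging
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # Stand-in for io.log_path('training', run='exp1'); layout copied from the output above.
    log_file = Path(tmp_dir) / "logs" / "exp1" / "training.log"
    log_file.parent.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger("training_sketch")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_file)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)

    logger.info("epoch=1 loss=0.42")

    # Close the handler before reading so the file is flushed and released.
    handler.close()
    logger.removeHandler(handler)
    contents = log_file.read_text()

print(contents.strip())
```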


## Best Practices

1. **Share a single ProjectIO instance** across all callbacks to ensure consistent paths

2. **Use run names** to organize experiments: `run="experiment_name"`

3. **Enable datestamps** for long-running projects to track when experiments were run

4. **Use producer tracking** if you need to trace which script created a checkpoint

5. **Set up paths at the start** of your training script:

```python
from project_io import PIO, ProjectIO

PIO.default = ProjectIO(
    root="./experiments",
    use_datestamp=True,
    datestamp_in="dirs"
)
```

## Next Steps

- Explore [templates](04_templates.ipynb) for common file patterns
- Check out [advanced features](05_advanced.ipynb) like dry-run mode and gitignore integration