tensorcas is a checkpoint storage library for machine learning models that deduplicates checkpoints at the tensor level, using content-addressable storage (CAS) as its storage layer.
ML frameworks store checkpoints as complete snapshots — every weight, every parameter, every time you call .save. For warm-start tree models or fine-tuned neural networks, most of those parameters haven't changed since the last checkpoint. You pay full storage cost to write the same bytes repeatedly.
tensorcas solves this by operating on tensors instead of files. Each adapter extracts the model's individual components — trees, layers, weight matrices — as named numpy arrays. tensorcas hashes each array and compares it against the previous checkpoint. Arrays that haven't changed are skipped entirely; only new or updated arrays are written to disk.
For example, a warm-start GBM that adds 10 trees per step writes only those 10 new trees. The previous 90 are already on disk and cost nothing to "save" again.
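The mechanism can be sketched as a hash-compare against the previous checkpoint's manifest. This is a simplified illustration, not tensorcas's actual internals: the `save_step` helper, the plain-dict manifest, and the `write_to_store` stub are all hypothetical, and tensorcas uses BLAKE3 with chunked CAS writes rather than a whole-tensor SHA-256.

```python
import hashlib

import numpy as np


def save_step(tensors: dict, prev_manifest: dict) -> dict:
    """Write only tensors whose content hash differs from the previous step.

    Returns the new manifest mapping tensor name -> content hash.
    (Illustration only; tensorcas hashes with BLAKE3 and writes chunked CAS objects.)
    """
    manifest = {}
    for name, arr in tensors.items():
        digest = hashlib.sha256(arr.tobytes()).hexdigest()
        manifest[name] = digest
        if prev_manifest.get(name) == digest:
            continue  # unchanged since last checkpoint: no write at all
        write_to_store(name, arr)  # new or updated tensor
    return manifest


written = []
def write_to_store(name, arr):  # stand-in for the real storage engine
    written.append(name)


# Step 1: three "trees", all new
step1 = {f"tree_{i}": np.full(4, i, dtype=np.float64) for i in range(3)}
m1 = save_step(step1, {})

# Step 2: the same three trees plus one new one -> only the new tree is written
step2 = dict(step1, tree_3=np.full(4, 3.0))
written.clear()
m2 = save_step(step2, m1)
print(written)  # ['tree_3']
```

The GBM example above follows the same pattern: unchanged trees hash to the same digest as last step and never touch the write path.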
tensorcas replaces your save and load calls. Each adapter handles framework-specific serialization internally.
Measured over 20-step training runs. DVC stores full checkpoints on every save; tensorcas deduplicates at the tensor level.
| Framework | Scenario | DVC stores | tensorcas stores | Savings |
|---|---|---|---|---|
| sklearn | 20 warm-start steps | 1.4 MB | 80 KB | 94% |
| XGBoost | 20 warm-start steps | 721 KB | 150 KB | 79% |
| PyTorch | 20 epochs, frozen backbone | 10.6 MB | 5.4 MB | 49% |
Measured on Apple Silicon (macOS), Python 3.12. Save uses a cold store per repetition; load reuses the same store (warm page cache). Full methodology in benchmark/results/.
| Framework | Model | Save | Load | No-op save |
|---|---|---|---|---|
| sklearn | GBM, 50 trees, 4 KB | 38 ms | 9 ms | 1.4 ms |
| XGBoost | Booster, 50 rounds, 25 KB | 29 ms | 5 ms | 1.9 ms |
| PyTorch | MLP 256×2, 332 KB | 7 ms | 1 ms | 0.5 ms |
| PyTorch | MLP 1024×4, 12 MB | 36 ms | 16 ms | — |
The no-op fast path (tensor hash matches previous checkpoint → zero writes) is the dominant case for warm-start and fine-tuning workflows. At 0.5–1.9 ms per step it is effectively free.
```shell
# pip
pip install tensorcas

# uv
uv add tensorcas
```

Framework adapters are included but their dependencies are optional:
```shell
# pip
pip install "tensorcas[sklearn]"                # scikit-learn
pip install "tensorcas[xgboost]"                # XGBoost
pip install "tensorcas[torch]"                  # torch
pip install "tensorcas[sklearn,xgboost,torch]"  # all of the above

# uv
uv add "tensorcas[sklearn]"
uv add "tensorcas[xgboost]"
uv add "tensorcas[torch]"
```

```python
from pathlib import Path

import torch
import torch.nn as nn

from tensorcas.store import TensorCasStore
from tensorcas.adapters.pytorch import PyTorchAdapter

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

store = TensorCasStore(
    root=Path("./checkpoints"),
    run_id="mlp-run-001",
    adapter=PyTorchAdapter(),
)

# Save a checkpoint at each epoch
for epoch in range(1, 21):
    # ... training loop ...
    store.save(model, step=epoch)

# Load any checkpoint
store.load(step=10, original=model)

# Inspect what was saved
print(store.stats())
# {'run_id': 'mlp-run-001', 'checkpoints': 20, 'total_chunks': 60,
#  'unique_chunks': 8, 'dedup_ratio': 0.1333, 'total_bytes': 245760}
# dedup_ratio = unique/total chunks — lower means more reuse (0.13 = 87% reused)
```

| Framework | Adapter class | Supported model types |
|---|---|---|
| scikit-learn | `tensorcas.adapters.sklearn.SklearnAdapter` | `GradientBoostingClassifier`, `GradientBoostingRegressor`, `LogisticRegression`, any model with `coef_` / `intercept_` |
| XGBoost | `tensorcas.adapters.xgboost.XGBoostAdapter` | `xgb.Booster` |
| PyTorch | `tensorcas.adapters.pytorch.PyTorchAdapter` | Any model with a `state_dict()` |
| Custom | Implement `ModelAdapter` | `extract(model) -> Dict[str, ndarray]` and `reconstruct(tensors, original) -> model` |
```
User / Framework
      │ model object
      ▼
TensorCasStore (store.py)
  save / load / gc / stats
      │ Dict[str, ndarray]               │ registry queries
      ▼                                  ▼
ModelAdapter                       Registry (registry/registry.py)
  extract(model)                     register_checkpoint
  reconstruct(tensors, original)     list_runs / list_checkpoints
                                     delete_checkpoint / gc / stats
      │                              Backed by SQLite (registry.db)
      ▼
StorageEngine (storage.py)
  Per-tensor no-op fast path
  (shape/dtype pre-check → full_hash comparison)
  Tensors fanned out via ThreadPoolExecutor
      │ TensorArrayRecord               │ manifest JSON
      ▼                                 ▼
CASEngine                          ManifestWriter/Reader (manifest.py)
  chunk → hash → batch_has           Atomic write (temp + rename)
  parallel put / get                 Path: manifests/{run_id}/step_{n:06d}.json
  (ThreadPoolExecutor)
      │ bytes
      ▼
FilesystemBackend (cas/filesystem.py)
  objects/{h[:2]}/{h[2:4]}/{h[4:]}.chunk
  zstd compression, atomic writes
```
- **Run and step.** A run is a training experiment, identified by a string `run_id`. A step is a checkpoint within that run, identified by an integer. Multiple runs can share the same store root — deduplication works across runs.
- **Content-addressable storage.** Every tensor is split into fixed-size chunks (default 256 KB) and each chunk is stored once, keyed by its BLAKE3 hash, under `{root}/objects/`. The SQLite registry at `{root}/registry.db` tracks which checkpoints reference which chunks. A grace-period GC sweeps unreferenced chunks after checkpoint deletion.
- **No-op fast path.** Before writing anything, tensorcas hashes the full tensor and compares it against the previous checkpoint's manifest. If the hash matches, the tensor is skipped entirely — no chunking, no CAS write, no disk I/O. This is the primary source of storage savings: frozen trees, frozen layers, and unchanged weight matrices are identified in memory and never written again. At 0.5–1.9 ms per step for typical models, the no-op path is effectively free.
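The on-disk fan-out for chunks that do get written can be sketched with a hypothetical helper that maps each chunk's hex digest to the `objects/` path described above. SHA-256 stands in for BLAKE3 here (BLAKE3 is a third-party dependency); this is an illustration of the layout, not tensorcas's code.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 256 * 1024  # default chunk size noted above


def chunk_paths(data: bytes, root: Path) -> list:
    """Split data into fixed-size chunks and compute each chunk's CAS path.

    Layout mirrors {root}/objects/{h[:2]}/{h[2:4]}/{h[4:]}.chunk.
    SHA-256 stands in for BLAKE3; illustration only.
    """
    paths = []
    for off in range(0, len(data), CHUNK_SIZE):
        h = hashlib.sha256(data[off:off + CHUNK_SIZE]).hexdigest()
        paths.append(root / "objects" / h[:2] / h[2:4] / f"{h[4:]}.chunk")
    return paths


# One full chunk plus a 1-byte tail -> two CAS objects
p = chunk_paths(b"x" * (CHUNK_SIZE + 1), Path("./checkpoints"))
print(len(p))  # 2
```

Two-level fan-out (`h[:2]/h[2:4]/`) keeps any single directory from accumulating millions of files, a common pattern in content-addressed stores such as git's object database.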
PyTorch: BatchNorm running statistics update in train mode.
Freezing layers via requires_grad_(False) does not prevent BatchNorm buffers (running_mean, running_var, num_batches_tracked) from updating — they are updated on every forward pass in model.train() mode regardless. For ResNet-18 this means 80 of 122 state dict tensors change every epoch even when the entire backbone is frozen, capping the no-op rate at ~48% instead of the theoretical ~99%.
If you want these buffers to stop changing, call model.eval() before saving:
```python
model.eval()
store.save(model, step=epoch)
model.train()
```

This freezes running stats to their current values. The trade-off is that saved checkpoints reflect eval-mode statistics, not the running averages from the most recent training batches.
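If you checkpoint frequently, the eval/train toggling can be wrapped in a context manager that also restores the prior mode on exceptions. This is a sketch, not part of tensorcas: the `eval_for_save` helper is hypothetical, and it only assumes the model exposes `eval()`, `train()`, and a `training` flag, as PyTorch modules do (a minimal stand-in class is used below so the example is self-contained).

```python
from contextlib import contextmanager


@contextmanager
def eval_for_save(model):
    """Temporarily switch a model to eval mode; restore the prior mode on exit."""
    was_training = model.training
    model.eval()
    try:
        yield model
    finally:
        if was_training:
            model.train()


class DummyModule:
    """Minimal stand-in with a PyTorch-like train/eval interface."""
    def __init__(self):
        self.training = True
    def train(self):
        self.training = True
    def eval(self):
        self.training = False


m = DummyModule()
with eval_for_save(m):
    assert not m.training  # running stats frozen while saving
    # store.save(m, step=epoch) would go here
assert m.training  # training mode restored afterwards
```

The `finally` block matters: if `save` raises, the model still returns to training mode instead of silently staying in eval.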
```shell
tensorcas --root ./checkpoints list
tensorcas --root ./checkpoints list --run mlp-run-001
tensorcas --root ./checkpoints stats
tensorcas --root ./checkpoints stats --run mlp-run-001 --format json
tensorcas --root ./checkpoints gc --grace 24 --yes
tensorcas --root ./checkpoints delete --run mlp-run-001 --step 3 --yes
```

| Command | What it does |
|---|---|
| `list` | List all runs and their steps. `--run` scopes to one run. |
| `stats` | Dedup statistics: chunk counts, unique chunks, dedup ratio, total bytes. |
| `gc` | Sweep orphaned blobs older than `--grace` hours (default 24). Prompts unless `--yes`. |
| `delete` | Delete a single checkpoint. Blob files are cleaned by the next `gc`. Prompts unless `--yes`. |
Run tests

```shell
uv run pytest tests/ -q
```

The test suite requires xgboost and torch (installed as dev dependencies via uv). All 53 tests run without skips.
Add an adapter
Implement the ModelAdapter protocol from tensorcas.adapters.base:
```python
class ModelAdapter(Protocol):
    def extract(self, model: Any) -> Dict[str, np.ndarray]:
        """Extract tensors from a model. Return a flat dict of named arrays."""
        ...

    def reconstruct(self, tensors: Dict[str, np.ndarray], original: Any) -> Any:
        """Reconstruct a model from tensors. original is the template model, if needed."""
        ...
```

`extract` should return the minimal set of arrays needed to restore model state — not raw serialization bytes. This is what makes tensor-level deduplication effective: the hashes track weight updates between steps, not serialization artifacts.
original is required for adapters that cannot reconstruct a model from tensors alone — for example, sklearn's internal Cython tree structures cannot be built from scratch and need a fitted model as a template. PyTorch adapters typically don't need it.
See src/tensorcas/adapters/sklearn.py for a reference implementation.
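As a sketch of what satisfying the protocol can look like, here is a hypothetical adapter for a toy linear model. The `LinearModel` and `LinearAdapter` names and attributes are invented for illustration and are not part of tensorcas; a real adapter implements the same two methods against its framework's model objects.

```python
from typing import Any, Dict

import numpy as np


class LinearModel:
    """Toy model: y = x @ w + b. Stands in for a real framework model."""
    def __init__(self, w: np.ndarray, b: np.ndarray):
        self.w, self.b = w, b


class LinearAdapter:
    """Structurally satisfies the ModelAdapter protocol: extract + reconstruct."""

    def extract(self, model: Any) -> Dict[str, np.ndarray]:
        # A flat dict of named arrays — each entry is a unit of deduplication.
        return {"w": model.w, "b": model.b}

    def reconstruct(self, tensors: Dict[str, np.ndarray], original: Any) -> Any:
        # original is unused here; adapters that need a fitted template
        # (like sklearn's tree models) would copy it and overwrite its arrays.
        return LinearModel(tensors["w"], tensors["b"])


adapter = LinearAdapter()
m = LinearModel(np.ones((3, 2)), np.zeros(2))
tensors = adapter.extract(m)
restored = adapter.reconstruct(tensors, original=None)
assert np.array_equal(restored.w, m.w)
```

Because the protocol is structural, the adapter class needs no inheritance — any object with matching `extract` and `reconstruct` signatures works.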