Finance-native model implementations for latent-factor estimation, stochastic discount factor learning, direct asset prediction, and end-to-end portfolio learning.
Documentation: https://ml4trading.io/docs/models/
This library is one of six interconnected ML4T libraries supporting the research and production workflow described in *Machine Learning for Trading*.
`ml4t-models` packages paper-faithful model families that are common in modern empirical asset pricing and portfolio learning:
- Latent-factor estimators with explicit structural outputs:
  `PCAModel`, `RPPCAModel`, `IPCAModel`, `CAEModel`
- Weight-native stochastic discount factor modeling:
  `StochasticDiscountFactorModel`
- Direct asset prediction:
  `SAEModel` (SAE = supervised autoencoder)
- End-to-end portfolio learning:
  `LinearFeaturePortfolioModel`, `LSTMPortfolioModel`, `DeepPortfolioModel`
The library is built around finance-native contracts rather than generic tensor trainers:
- `PersistentPanelBatch` for stable-ID panels
- `CrossSectionBatch` for ragged dated cross-sections
- `PortfolioSequenceBatch` for sequence-to-allocation models
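The array shapes these contracts carry can be read off the quickstart examples in this README. As a plain-numpy sketch of the conventions (shapes only, no library code; the dimension names are my shorthand, not library terminology):

```python
import numpy as np

# T = dates, N = assets, F = characteristics,
# B = sequence batches, L = sequence length.

# CrossSectionBatch: per-date characteristics matrix and return vector.
characteristics = np.random.randn(24, 200, 12)  # (T, N, F)
returns = np.random.randn(24, 200)              # (T, N)

# PortfolioSequenceBatch: feature sequences mapped to allocation sequences.
seq_features = np.random.randn(8, 63, 20, 10)   # (B, L, N, F)
seq_returns = np.random.randn(8, 63, 20)        # (B, L, N)

print(characteristics.shape, seq_features.shape)
```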
It also keeps the predictive steps explicit:
- structural extraction
- factor-premium forecasting
- asset mapping
- downstream prediction and weight frames for `ml4t-backtest` and `ml4t-diagnostic`
Install with pip:

```bash
pip install ml4t-models
```

Optional extras:

```bash
pip install ml4t-models[deep]         # torch-backed neural models
pip install ml4t-models[integration]  # polars + ml4t-specs bridges
pip install ml4t-models[docs]         # mkdocs site build
pip install ml4t-models[all]          # all of the above
```

Fit a latent-factor pipeline on a dated cross-section:

```python
import numpy as np

from ml4t.models import (
    BetaLambdaMapper,
    CrossSectionBatch,
    ExpandingMeanFactorForecaster,
    IPCAConfig,
    IPCAModel,
    LatentFactorForecastPipeline,
)

batch = CrossSectionBatch(
    characteristics=np.random.randn(24, 200, 12),
    returns=np.random.randn(24, 200),
    timestamps=tuple(range(24)),
)

pipeline = LatentFactorForecastPipeline(
    model=IPCAModel(IPCAConfig(n_factors=3)),
    forecaster=ExpandingMeanFactorForecaster(),
    mapper=BetaLambdaMapper(),
)

pipeline.fit(batch)
prediction = pipeline.predict(batch)
print(prediction.asset_forecast.expected_returns.shape)
# (24, 200)
```

Learn a weight-native stochastic discount factor:

```python
import numpy as np

from ml4t.models import CrossSectionBatch, StochasticDiscountFactorConfig, StochasticDiscountFactorModel

batch = CrossSectionBatch(
    characteristics=np.random.randn(36, 300, 16),
    returns=np.random.randn(36, 300),
    context_features=np.random.randn(36, 8),
    timestamps=tuple(range(36)),
)

model = StochasticDiscountFactorModel(
    StochasticDiscountFactorConfig(checkpoint_epochs=(256, 512, 768, 1024, 1280))
)
model.fit(batch)
state = model.extract(batch, checkpoint=1280)
print(state.asset_weights.shape)
# (36, 300)
```

Learn allocations end to end from feature sequences:

```python
import numpy as np

from ml4t.models import LSTMPortfolioConfig, LSTMPortfolioModel, PortfolioSequenceBatch

batch = PortfolioSequenceBatch(
    features=np.random.randn(8, 63, 20, 10),
    returns=np.random.randn(8, 63, 20),
    timestamps=tuple(range(63)),
    asset_ids=tuple(f"asset_{i}" for i in range(20)),
)

model = LSTMPortfolioModel(LSTMPortfolioConfig(max_iters=20, checkpoint_every=5))
model.fit(batch)
weights = model.predict(batch, checkpoint=20)
print(weights.weights.shape)
# (8, 63, 20)
```

Export prediction and weight frames for the sibling libraries:

```python
from ml4t.models import predictions_frame_from_asset_forecast, write_backtest_frames

frame = predictions_frame_from_asset_forecast(prediction.asset_forecast)
write_backtest_frames("artifacts/run_001", predictions=frame)
```

These models estimate a structural representation first, then let a separate forecaster produce ex ante factor premia.
| Model | Contract | Native output | Predictive step |
|---|---|---|---|
| `PCAModel` | `PersistentPanelBatch` | static loadings, factor returns | factor-premium forecaster + mapper |
| `RPPCAModel` | `PersistentPanelBatch` | risk-premium-aware latent factors | factor-premium forecaster + mapper |
| `IPCAModel` | `CrossSectionBatch` | characteristic-implied betas, factor history | factor-premium forecaster + mapper |
| `CAEModel` | `CrossSectionBatch` | nonlinear characteristic betas, factor history | factor-premium forecaster + mapper |
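For the linear models, the "forecaster + mapper" predictive step reduces to a beta × lambda product. A minimal numpy sketch (all variable names are my own, not library API; the expanding-mean premium forecast is an assumption suggested by the name `ExpandingMeanFactorForecaster`):

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, K = 24, 200, 3

betas = rng.standard_normal((N, K))           # structural output: asset loadings
factor_history = rng.standard_normal((T, K))  # estimated factor return history

# Expanding-mean forecast of the factor premia (ex ante lambda).
lam = factor_history.mean(axis=0)             # (K,)

# Beta-lambda mapping: expected excess return per asset.
expected_returns = betas @ lam                # (N,)
print(expected_returns.shape)
# (200,)
```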
`StochasticDiscountFactorModel` is not a beta × lambda latent-factor model. It learns a weight-native no-arbitrage object and exposes:
- asset weights
- SDF series
- checkpointed phase-aware training state
Optional return projections are handled by separate mappers.
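To illustrate what "weight-native" means, here is a generic linear-in-weights SDF sketch in numpy, M_t = 1 − wₜ·rₜ, where training would drive the pricing errors E[M rᵢ] toward zero. This is a schematic of the general idea, not the model's actual parameterization or training procedure:

```python
import numpy as np

rng = np.random.default_rng(1)
T, N = 36, 300

returns = rng.standard_normal((T, N)) * 0.02  # per-date excess returns
weights = rng.standard_normal((T, N))         # per-date asset weights (learned in practice)
weights /= np.abs(weights).sum(axis=1, keepdims=True)  # normalize gross exposure

# Weight-native SDF series: M_t = 1 - w_t . r_t
sdf = 1.0 - (weights * returns).sum(axis=1)   # (T,)

# No-arbitrage moment per asset; training pushes these toward zero.
pricing_errors = (sdf[:, None] * returns).mean(axis=0)  # (N,)
print(sdf.shape, pricing_errors.shape)
```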
`SAEModel` is a supervised autoencoder signal model. In this library it is treated as a direct predictor, not a latent-factor model.
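Schematically, a supervised autoencoder compresses characteristics and predicts returns from the same latent code, training both branches jointly. A generic forward-pass sketch in numpy (not `SAEModel`'s actual architecture; all names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
N, F, H = 200, 12, 4  # assets, characteristics, latent width

x = rng.standard_normal((N, F))  # asset characteristics
y = rng.standard_normal(N)       # target returns

W_enc = rng.standard_normal((F, H)) * 0.1
W_dec = rng.standard_normal((H, F)) * 0.1
w_pred = rng.standard_normal(H) * 0.1

z = np.tanh(x @ W_enc)  # shared latent code
x_hat = z @ W_dec       # reconstruction branch (autoencoder)
y_hat = z @ w_pred      # supervised branch (return signal)

# Training would minimize both terms jointly.
loss = np.mean((x - x_hat) ** 2) + np.mean((y - y_hat) ** 2)
print(y_hat.shape)
# (200,)
```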
Portfolio models learn allocations directly:
- `LinearFeaturePortfolioModel` as a deterministic baseline
- `LSTMPortfolioModel` as a sequence baseline
- `DeepPortfolioModel` as a structured DeePM-style allocator
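The sequence-to-allocation idea behind these models can be sketched as a score-to-weight map: score each asset at each step, then normalize (e.g. softmax) so the weights are positive and sum to one. A schematic numpy version, not the library's architectures (the linear scorer stands in for whatever network produces the scores):

```python
import numpy as np

rng = np.random.default_rng(2)
B, L, N, F = 8, 63, 20, 10  # batches, sequence length, assets, features

features = rng.standard_normal((B, L, N, F))
w_lin = rng.standard_normal(F)                  # stand-in for learned parameters

scores = features @ w_lin                       # (B, L, N): per-asset scores
scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)  # long-only allocations per step

print(weights.shape)
# (8, 63, 20)
```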
- Finance-native data contracts rather than generic dataloaders
- Explicit structural and predictive stages
- Checkpoint-aware neural training
- Clear separation between:
- model estimation
- forecasting
- backtest and diagnostic integration
- Integration boundaries with sibling libraries instead of duplicated evaluation logic
