Context
Trained model .pkl files are saved to /app/ml_models/ on the Railway worker container's ephemeral disk (no volume). They survive only until the next worker redeploy and then vanish, but the MLModel DB row (including file_path) persists, so file_path can end up pointing at a file that no longer exists.
This is currently harmless:
- Monitoring (alert_evaluator → ConfidenceBasedRetrainingSystem) does not load the model file.
- Grading is rule-based; HybridQueryGrader is not wired into the request path.
- train_ml_model trains fresh from TrainingData rather than loading a prior model.
So no code path reads the artifact today.
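To gauge how far the DB has drifted from the disk, a check along these lines could list rows whose file_path no longer resolves. This is a minimal sketch: ModelRow and find_dangling_models are hypothetical names, and the real MLModel query is not shown. It only gives a meaningful answer when run on the worker container, since that is where the files were written.

```python
import os
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ModelRow:
    """Hypothetical stand-in for an MLModel DB row; only the fields used here."""
    id: int
    name: str
    file_path: str


def find_dangling_models(rows: Iterable[ModelRow]) -> List[ModelRow]:
    """Return rows whose file_path does not point at an existing file on local disk."""
    return [row for row in rows if not os.path.isfile(row.file_path)]
```

After a worker redeploy, every row created before it would be expected to show up in this list.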
Trigger / when this matters
Before wiring ML prediction into the request path (e.g. activating HybridQueryGrader / model_manager load in the grade flow), the artifact must be durable and reachable by any service that loads it (web for prediction, worker for training/eval).
Proposed approach
Object storage as a model registry:
- Training uploads the .pkl by key on save (analyzer/ml/core/training_pipeline.py:_save_model / _record_training_metrics).
- Loaders download by key (analyzer/ml/core/model_manager.py, hybrid_grader.py).
- MLModel.file_path becomes a storage key rather than a local path.
- Backend: Supabase Storage (S3-compatible, account available) or Railway buckets. Neon is Postgres, so it is not suitable for blob storage.
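The registry idea can be sketched as a thin storage interface with upload-on-save and download-by-key, so the web and worker services share one code path. Everything here is an assumption for illustration: BlobStore, InMemoryBlobStore, model_key, save_model and load_model are hypothetical names, and the key layout is only one possible scheme. For an S3-compatible backend, put/get could wrap boto3's put_object and get_object calls; the in-memory store stands in for that in tests.

```python
import pickle
from typing import Any, Dict, Protocol


class BlobStore(Protocol):
    """Hypothetical minimal interface a storage backend would need to provide."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryBlobStore:
    """Dict-backed stand-in for Supabase Storage or a Railway bucket (tests only)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        try:
            return self._objects[key]
        except KeyError:
            raise FileNotFoundError(f"no object stored under key {key!r}") from None


def model_key(model_name: str, version: int) -> str:
    """Build the storage key MLModel.file_path would hold (hypothetical layout)."""
    return f"ml_models/{model_name}/v{version}.pkl"


def save_model(store: BlobStore, model: Any, model_name: str, version: int) -> str:
    """Pickle the model, upload it, and return the key to persist on the MLModel row."""
    key = model_key(model_name, version)
    store.put(key, pickle.dumps(model))
    return key


def load_model(store: BlobStore, key: str) -> Any:
    """Download by key and unpickle.

    pickle.loads can execute arbitrary code, so only load objects this system wrote.
    """
    return pickle.loads(store.get(key))
```

Because save_model returns the key rather than a path, the value written to MLModel.file_path is valid from any container that can reach the bucket.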
Out of scope / notes
- Until this lands, monitoring works fine on the ephemeral file (it's never read).
- Add a storage client + credentials as env vars (reference-variable pattern on Railway).
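Reading the storage credentials from env vars could look roughly like this. The variable names (MODEL_STORAGE_*) and StorageConfig/load_storage_config are hypothetical; on Railway the values might be defined once and referenced from both the web and worker services so they stay identical. The function fails fast at startup rather than at the first upload or download.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class StorageConfig:
    endpoint_url: str
    bucket: str
    access_key_id: str
    secret_access_key: str


# Hypothetical env var names, mapped to StorageConfig fields.
REQUIRED_VARS = {
    "endpoint_url": "MODEL_STORAGE_ENDPOINT_URL",
    "bucket": "MODEL_STORAGE_BUCKET",
    "access_key_id": "MODEL_STORAGE_ACCESS_KEY_ID",
    "secret_access_key": "MODEL_STORAGE_SECRET_ACCESS_KEY",
}


def load_storage_config(env: Optional[Mapping[str, str]] = None) -> StorageConfig:
    """Read storage settings from the environment; raise if any are missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS.values() if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing storage env vars: {', '.join(missing)}")
    return StorageConfig(**{field: env[name] for field, name in REQUIRED_VARS.items()})
```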