Move ML model artifacts to object storage (model registry) #91

@ringo380

Context

Trained model .pkl files are saved to /app/ml_models/ on the Railway worker container's ephemeral disk (no volume attached). They survive until the next worker redeploy and are then lost. The MLModel DB row, including file_path, persists, so file_path can end up pointing at a file that no longer exists.

This is currently harmless:

  • Monitoring (alert_evaluator, ConfidenceBasedRetrainingSystem) does not load the model file.
  • Grading is rule-based; HybridQueryGrader is not wired into the request path.
  • train_ml_model trains fresh from TrainingData rather than loading a prior model.

So no code path reads the artifact today.
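
If it is useful to confirm how many rows are affected before the migration, a check along these lines could flag dangling paths. This is only a sketch: `artifact_is_dangling` is a hypothetical helper, not something in the codebase, and it assumes file_path holds a local filesystem path as it does today.

```python
from pathlib import Path
from typing import Optional


def artifact_is_dangling(file_path: Optional[str]) -> bool:
    """Return True when a recorded local path no longer points at a file."""
    if not file_path:
        # No path was recorded, so there is nothing that can dangle.
        return False
    return not Path(file_path).is_file()
```

It would have to run on the worker, since that is the only container that ever had the files on disk.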

Trigger / when this matters

Before wiring ML prediction into the request path (e.g. activating HybridQueryGrader / model_manager load in the grade flow), the artifact must be durable and reachable by any service that loads it (web for prediction, worker for training/eval).

Proposed approach

Object storage as a model registry:

  • Training uploads the .pkl by key on save (analyzer/ml/core/training_pipeline.py:_save_model / _record_training_metrics).
  • Loaders download by key (analyzer/ml/core/model_manager.py, hybrid_grader.py).
  • MLModel.file_path becomes a storage key rather than a local path.
  • Backend: Supabase Storage (S3-compatible, account available) or Railway buckets. Neon is Postgres and is not suited to storing blobs.
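
A minimal sketch of the upload-by-key / download-by-key flow described above. Everything named here is an assumption for illustration: the `BlobStore` interface, the `LocalDirStore` stand-in backend, the key layout in `model_key`, and the `save_model` / `load_model` helpers do not exist in the repo. A real backend would wrap whichever storage client is chosen behind the same two methods.

```python
import pickle
from pathlib import Path
from typing import Any, Protocol


class BlobStore(Protocol):
    """Hypothetical minimal interface an object-storage client would satisfy."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class LocalDirStore:
    """Stand-in backend that stores blobs under a directory (local runs and tests)."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def put(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


def model_key(model_name: str, version: int) -> str:
    """Build a storage key; this naming scheme is an assumption."""
    return f"ml_models/{model_name}/v{version}.pkl"


def save_model(store: BlobStore, model: Any, model_name: str, version: int) -> str:
    """Serialize and upload the model; return the key to persist in MLModel.file_path."""
    key = model_key(model_name, version)
    store.put(key, pickle.dumps(model))
    return key


def load_model(store: BlobStore, key: str) -> Any:
    """Download and deserialize a model by key.

    pickle.loads can execute arbitrary code, so only load artifacts
    from a bucket the training job itself writes to.
    """
    return pickle.loads(store.get(key))
```

With this shape, the training side would store the returned key in MLModel.file_path, and both web and worker would resolve it through the same store rather than the local disk.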

Out of scope / notes

  • Until this lands, monitoring is unaffected by the ephemeral file, since it never reads it.
  • Add a storage client + credentials as env vars (reference-variable pattern on Railway).
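
One possible shape for the credentials handling, reading settings from env vars and failing fast when any are missing. The variable names (`MODEL_STORAGE_*`) and the `StorageSettings` / `load_storage_settings` names are placeholders chosen for this sketch, not an existing convention in the project.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class StorageSettings:
    endpoint_url: str
    bucket: str
    access_key_id: str
    secret_access_key: str


# Hypothetical env var names mapped to the settings fields they populate.
_ENV_TO_FIELD = {
    "MODEL_STORAGE_ENDPOINT_URL": "endpoint_url",
    "MODEL_STORAGE_BUCKET": "bucket",
    "MODEL_STORAGE_ACCESS_KEY_ID": "access_key_id",
    "MODEL_STORAGE_SECRET_ACCESS_KEY": "secret_access_key",
}


def load_storage_settings(env: Mapping[str, str] = os.environ) -> StorageSettings:
    """Build settings from the environment, raising if any variable is unset or empty."""
    missing = [name for name in _ENV_TO_FIELD if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing storage env vars: {', '.join(missing)}")
    return StorageSettings(**{field: env[name] for name, field in _ENV_TO_FIELD.items()})
```

Calling this at startup on both web and worker would surface a misconfigured service at deploy time rather than on the first model load.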
