feat: Real AI/ML stack with continuous training, drift detection, and production-grade Lakehouse #38
Open
devin-ai-integration[bot] wants to merge 11 commits into
- Python DeepFace liveness engine (passive + active challenges, anti-spoofing)
- Python document OCR engine (PaddleOCR, VLM classification, Docling parsing)
- Go KYC orchestrator (NIN/BVN/CAC verification, AML screening, risk scoring)
- Rust identity matching engine (embedding comparison, fraud detection)
- TypeScript tRPC routers + comprehensive KYC/KYB frontend pages
- KYC gate integration into Claims flow
- API clients for all 4 backend services
Co-Authored-By: Patrick Munis <pmunis@gmail.com>
Co-Authored-By: Patrick Munis <pmunis@gmail.com>
…e ThemeProvider) Co-Authored-By: Patrick Munis <pmunis@gmail.com>
- Revert vite.ts to use inline config spread (configFile: false) instead of configFile path
- Revert vite.config.ts to remove define/dedupe/optimizeDeps additions that didn't fix the React hooks issue
- These reverts restore the original working configuration from previous PRs
Co-Authored-By: Patrick Munis <pmunis@gmail.com>
…t plugin double-init) Co-Authored-By: Patrick Munis <pmunis@gmail.com>
…oral, PostgreSQL, Keycloak, Permify, Redis, Mojaloop, OpenSearch, OpenAppSec, APISix, TigerBeetle, Lakehouse
Go orchestrator (8085):
- PostgreSQL persistence replacing in-memory maps
- Redis caching for KYC session lookups
- Kafka producer for KYC completion events
- Temporal client for workflow orchestration
- OpenSearch auditor for compliance trail
- APISix gateway with OpenAppSec WAF plugin
- Mojaloop bridge for mobile money KYC-gated transfers
- Keycloak/Permify authorization middleware
- All 9 middleware clients wired into main.go
Rust ledger service (8113):
- TigerBeetle double-entry ledger with KYC-level transfer limits
- Dapr sidecar for state management and pub/sub
- OpenAppSec WAF validation on all requests
- 10 ledger types with KYC level requirements
Python services:
- Lakehouse analytics (8114) with Delta Lake compliance reporting
- Fluvio stream processor (8115) with WebSocket real-time events
TypeScript platform integration:
- KYC gate checks on claims.create, payments.process, wallet.topUp/withdraw
- KYC gate on application.create/submit with level requirements
- Onboarding wired to trigger KYC verification on identity step
- KYB wired to Go orchestrator for CAC/TIN/director/UBO verification
- Middleware integration endpoints (ledger stats, analytics metrics, stream topics, transfer limits, NDPR report)
- New service clients: kycLedgerService, kycAnalyticsService, kycStreamService, checkKYCGate helper
Co-Authored-By: Patrick Munis <pmunis@gmail.com>
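The KYC gate described in this commit (operations blocked below a required KYC level, transfers capped per level) might look roughly like the sketch below. It is written in Python for illustration only; the actual checkKYCGate helper is TypeScript, and the level numbers, limits, and the names REQUIRED_LEVEL, TRANSFER_LIMIT, and check_kyc_gate are all hypothetical, since the commit does not state them.

```python
from dataclasses import dataclass

# Hypothetical minimum KYC level per gated operation (not taken from the PR).
REQUIRED_LEVEL = {
    "claims.create": 2,
    "payments.process": 2,
    "wallet.topUp": 1,
    "wallet.withdraw": 2,
}

# Hypothetical per-transfer limits by KYC level, in NGN (not taken from the PR).
TRANSFER_LIMIT = {0: 0, 1: 50_000, 2: 500_000, 3: 5_000_000}


@dataclass(frozen=True)
class GateDecision:
    allowed: bool
    reason: str


def check_kyc_gate(operation: str, kyc_level: int, amount: float = 0.0) -> GateDecision:
    """Decide whether a user at kyc_level may perform operation for amount."""
    required = REQUIRED_LEVEL.get(operation)
    if required is None:
        # Fail closed: operations without a configured level are rejected.
        return GateDecision(False, f"unknown operation: {operation}")
    if kyc_level < required:
        return GateDecision(
            False, f"{operation} requires KYC level {required}, user has {kyc_level}"
        )
    limit = TRANSFER_LIMIT.get(kyc_level, 0)
    if amount > limit:
        return GateDecision(
            False, f"amount {amount} exceeds level-{kyc_level} limit {limit}"
        )
    return GateDecision(True, "ok")
```

Rejecting unknown operations by default is a design choice in this sketch, so that a newly added endpoint is gated until someone configures a level for it.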
- 6 PyTorch models: fraud detection (residual+attention), churn prediction (GLU), claims adjudication (multi-task), credit scoring (Wide&Deep), anomaly detection (VAE), GNN fraud ring detection (GraphSAGE)
- Synthetic Nigerian insurance data generation (275k+ samples across 6 domains)
- Real training loops with FocalLoss, OneCycleLR, early stopping, metric tracking
- Trained .pt weight files for all 6 models
- ONNX export for CPU-optimized inference (4 models)
- Delta Lake feature store with versioning (6 tables)
- MCMC Bayesian risk modeling with NumPyro/JAX (16 product lines, VaR/CVaR)
- Ray distributed training infrastructure with local fallback
- Neo4j graph schema for fraud ring detection with offline mode
- FastAPI inference server for all models
- All models run on CPU (no GPU required)
Co-Authored-By: Patrick Munis <pmunis@gmail.com>
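For readers unfamiliar with the FocalLoss mentioned above, here is a minimal per-example sketch of binary focal loss in plain Python. The PR's version is a PyTorch module whose hyperparameters are not stated; gamma=2.0 and alpha=0.25 below are assumed defaults, and the function name is illustrative.

```python
import math


def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Binary focal loss for one example.

    p is the predicted probability of the positive class, y is the label (0 or 1).
    gamma and alpha are assumed defaults, not values taken from the PR.
    """
    eps = 1e-7
    p = min(max(p, eps), 1.0 - eps)          # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p           # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With gamma=0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which is why focal loss is often used for imbalanced targets such as fraud labels.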
…sioning, scheduled retraining, platform data ingestion
- drift_detector.py: PSI, KS test, JS divergence for data drift + performance monitoring
- model_registry.py: Champion-challenger versioning with auto-promotion
- data_ingestion.py: Platform data connectors with watermarking and fallback chain
- pipeline.py: 5-step orchestration (ingest → drift → retrain → validate → promote → ONNX export)
- scheduler.py: Cron-based + event-driven triggers with background thread
- api.py: FastAPI endpoints for CT management (/ct/retrain, /ct/drift, /ct/models, /ct/scheduler)
- Fixed api_server.py imports for standalone execution
- All 4 models retrained, promoted, and exported to ONNX with zero errors
Co-Authored-By: Patrick Munis <pmunis@gmail.com>
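As a rough illustration of the PSI check listed for drift_detector.py, the sketch below computes the Population Stability Index between a reference sample and a current sample using equal-width bins. The binning strategy, the epsilon floor, and the function name psi are assumptions; the actual module may bin differently (for example by quantiles).

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index of `actual` relative to `expected`."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all reference values are equal

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values into edge bins
            counts[idx] += 1
        eps = 1e-6  # floor so empty bins do not produce log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.25 as significant drift; whether the PR uses these thresholds is not stated.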
…g in CT API drift check Co-Authored-By: Patrick Munis <pmunis@gmail.com>
E2E Test Results: AI/ML Platform + Continuous Training
12/12 tests passed, 1 bug found and fixed.
Bug Fixed During Testing
Inference API (port 8000): 6/6 Passed
Continuous Training API (port 8001): 6/6 Passed
Pipeline Verification
All 4 models trained, promoted to champion v1, and exported to ONNX with zero errors.
…eaming ingestion, online serving, lineage, RBAC, Feature Store API, Go SDK
Components implemented:
- Storage: Object store abstraction (Local/S3/MinIO) with unified interface
- Schema: Registry with versioning, compatibility checks (backward/forward/full), evolution tracking
- Streaming: Kafka/Fluvio ingestion engine with micro-batching, DLQ, checkpointing
- Computation: Real-time feature engine with sliding windows, EMA, time-decay scoring
- Serving: Online feature server with L1 (LRU) + L2 (Redis) + L3 (Delta Lake) caching
- API: FastAPI REST API with DuckDB SQL queries, CRUD, materialization endpoints
- Lineage: Full DAG tracking (source→table→model), quality metrics, mutation audit
- RBAC: Role-based access control with table/column-level policies, audit logging
- Connectors: Python EventBridge + Go SDK for microservice event publishing
- All components tested with functional verification (9 features computed, 3 events delivered)
Co-Authored-By: Patrick Munis <pmunis@gmail.com>
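The L1 (LRU) + L2 (Redis) + L3 (Delta Lake) lookup order in the serving component can be sketched as a read-through cache. In this minimal example plain dictionaries stand in for Redis and Delta Lake, and the class name TieredFeatureCache and its backfill behavior are assumptions rather than the PR's actual feature_server.py design.

```python
from collections import OrderedDict
from typing import Any, Dict, Optional


class TieredFeatureCache:
    """Read-through lookup: in-process LRU (L1), then L2, then L3."""

    def __init__(self, l2: Dict[str, Any], l3: Dict[str, Any], l1_capacity: int = 2):
        self.l1: "OrderedDict[str, Any]" = OrderedDict()
        self.l2 = l2  # stand-in for Redis
        self.l3 = l3  # stand-in for Delta Lake
        self.l1_capacity = l1_capacity

    def _put_l1(self, key: str, value: Any) -> None:
        self.l1[key] = value
        self.l1.move_to_end(key)  # mark as most recently used
        if len(self.l1) > self.l1_capacity:
            self.l1.popitem(last=False)  # evict the least recently used entry

    def get(self, key: str) -> Optional[Any]:
        if key in self.l1:
            self.l1.move_to_end(key)
            return self.l1[key]
        if key in self.l2:
            value = self.l2[key]
            self._put_l1(key, value)
            return value
        if key in self.l3:
            value = self.l3[key]
            self.l2[key] = value  # backfill L2 so the next miss in L1 is cheaper
            self._put_l1(key, value)
            return value
        return None
```

Backfilling the faster tiers on a slow-tier hit keeps hot keys close to the caller; a real deployment would also need TTLs or invalidation when features are recomputed.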
Co-Authored-By: Patrick Munis <pmunis@gmail.com>
Summary
Adds a complete ai-ml-platform/ directory implementing a real end-to-end AI/ML stack to replace the previously rule-based scoring system (assessed at 2/10 in a prior audit). Includes 6 PyTorch model architectures, synthetic Nigerian insurance data generation, training infrastructure, inference serving, and a continuous training pipeline with drift detection, model versioning, and scheduled retraining.
Additionally implements a production-grade Lakehouse (previously assessed at 3/10, now upgraded to a full implementation) with 10 subsystems: object store abstraction, schema registry, streaming ingestion, real-time feature computation, online feature serving, data lineage, RBAC, REST API, and two microservice connectors (Python EventBridge and Go SDK).
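Models (all trained, .pt weights included):
- Fraud detection (residual + attention)
- Churn prediction (GLU)
- Claims adjudication (multi-task)
- Credit scoring (Wide&Deep)
- Anomaly detection (VAE)
- GNN fraud ring detection (GraphSAGE)
Infrastructure:
- FastAPI inference server (/predict/fraud, /predict/churn, etc.)
- train_all.py runs the full pipeline in ~3.5 min on CPU
Continuous Training Infrastructure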
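ai-ml-platform/continuous_training/ with 7 modules implementing automated retraining:
- drift_detector.py: PSI, KS test, and JS divergence for data drift, plus performance monitoring
- model_registry.py: champion-challenger versioning with auto-promotion
- data_ingestion.py: platform data connectors with watermarking and a fallback chain
- pipeline.py: orchestration (ingest → drift → retrain → validate → promote → ONNX export)
- scheduler.py: cron-based and event-driven triggers with a background thread
- api.py: FastAPI endpoints (/ct/retrain, /ct/drift/{model}, /ct/models, /ct/scheduler)
Pipeline verified: All 4 models (fraud, churn, claims, anomaly) were successfully retrained, promoted to champion, and re-exported to ONNX through the continuous training pipeline with zero errors.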
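The champion-challenger flow can be illustrated with a small in-memory registry: each retrained model is registered as a new version and promoted only if it beats the current champion. This is a sketch under assumptions; the class names, the single "higher is better" metric, and the min_improvement margin are hypothetical and not taken from model_registry.py.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ModelVersion:
    version: int
    metric: float  # higher is better, e.g. validation AUC (assumed)


@dataclass
class ModelRegistry:
    min_improvement: float = 0.0  # hypothetical margin a challenger must exceed
    champions: Dict[str, ModelVersion] = field(default_factory=dict)
    history: Dict[str, List[ModelVersion]] = field(default_factory=dict)

    def register(self, name: str, metric: float) -> ModelVersion:
        """Record a new challenger and promote it if it beats the champion."""
        versions = self.history.setdefault(name, [])
        challenger = ModelVersion(version=len(versions) + 1, metric=metric)
        versions.append(challenger)
        champion = self.champions.get(name)
        if champion is None or metric > champion.metric + self.min_improvement:
            self.champions[name] = challenger  # auto-promotion
        return challenger
```

Keeping every version in history, including challengers that were not promoted, makes it possible to audit why a given champion is serving and to roll back if needed.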
Production-Grade Lakehouse (10 Subsystems)
ai-ml-platform/lakehouse/ upgraded from a basic Delta Lake wrapper to a full production system:
- storage/object_store.py
- schema/registry.py
- streaming/ingestion.py
- streaming/feature_computation.py
- serving/feature_server.py
- lineage/tracker.py
- access_control/rbac.py
- api/feature_store_api.py
- connectors/event_bridge.py
- connectors/go-sdk/lakehouse_client.go
Default computations (10): claims_count_1h, claims_total_amount_24h, avg_claim_amount_7d, max_single_claim_30d, txn_count_1h, txn_rate_5m, txn_stddev_24h, txn_p95_amount_7d, payment_frequency_30d, distinct_payment_days_30d
Default RBAC service accounts (8): claims-engine, fraud-service, kyc-service, payments-service, inference-server, training-pipeline, dashboard-api, audit-service
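Windowed features such as claims_count_1h or claims_total_amount_24h can be maintained incrementally with a time-ordered queue. The sketch below is one possible approach, assuming events arrive in timestamp order; the SlidingWindow class and its method names are hypothetical and do not reflect the actual feature_computation.py API.

```python
from collections import deque
from typing import Deque, Tuple


class SlidingWindow:
    """Keeps (timestamp, amount) events from the last window_seconds."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self.events: Deque[Tuple[float, float]] = deque()

    def add(self, timestamp: float, amount: float) -> None:
        # Assumes timestamps are non-decreasing (in-order delivery).
        self.events.append((timestamp, amount))
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()  # drop events that fell out of the window

    def count(self, now: float) -> int:
        self._evict(now)
        return len(self.events)

    def total(self, now: float) -> float:
        self._evict(now)
        return sum(amount for _, amount in self.events)
```

For example, a claims_count_1h feature would use SlidingWindow(3600) keyed per customer and read count(now) at serving time. Out-of-order events would need a different structure, such as time-bucketed counters.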
Updates since last revision — Bug fixes
- numpy.bool serialization error in drift_detector.py: Pydantic cannot serialize numpy.bool types returned by comparison operators. Added explicit bool() casts on the is_drifted and should_retrain fields in DriftResult.to_dict() and DatasetDriftReport.to_dict().
- CT API drift check (api.py): The /ct/drift/{model_name} endpoint now engineers _enc columns from raw categorical columns (e.g. doc_type → doc_type_enc) before running drift analysis, matching the logic already in pipeline.py.
- … numeric_value=1.0 instead of trying to float-cast string source fields (e.g. claim_id).
Other changes:
- inference/api_server.py: changed relative imports to absolute imports with sys.path manipulation for standalone execution
- … doc_type → doc_type_enc via category codes)
- … .pt weights and .onnx models with retrained versions
Review & Testing Checklist for Human
- rbac.py: the default service accounts use deterministic SHA256 hashes of service names as API keys (e.g. sha256("claims-engine-api-key")). These must be replaced with proper secret management before any production deployment.
- connectors/go-sdk/lakehouse_client.go includes unit tests, but no CI step builds or runs them. Run cd ai-ml-platform/lakehouse/connectors/go-sdk && go test ./... to verify.
- confluent-kafka (librdkafka): The ingestion.py module imports confluent_kafka, which requires the librdkafka C library. This is not installed in the current environment. Verify graceful degradation or install the dependency.
- data_ingestion.py attempts PostgreSQL connections but generates synthetic fallback data when they are unavailable. The "ingested" data is therefore synthetic, not real platform data.
- .pt weights, .onnx models, .parquet data, and .npz posteriors are committed directly to the repo. Consider whether these should use LFS or be generated on demand instead.
Suggested test plan:
- cd ai-ml-platform && python train_all.py: verify the full pipeline completes and produces weight files
- Start the inference API (python -m uvicorn inference.api_server:app --port 8000) and hit /health + each /predict/* endpoint with sample data from the Swagger docs
- Start the continuous training API (python -m uvicorn continuous_training.api:ct_app --port 8001) and verify that the /ct/health, /ct/models, /ct/drift/fraud_detection, and /ct/scheduler/configure-defaults endpoints respond
- Start the Lakehouse feature store API (python -m uvicorn lakehouse.api.feature_store_api:app --port 8002) and verify the /health, /features/get, /query/tables, /schemas, /lineage/graph, and /access/status endpoints
- POST /ct/retrain and verify the pipeline completes
- cd ai-ml-platform/lakehouse/connectors/go-sdk && go test ./...
E2E Test Results
Both APIs were tested locally — 12/12 tests passed after fixing the numpy.bool serialization bug:
- Inference API (:8000): 6/6 passed
- Continuous Training API (:8001): 6/6 passed
Lakehouse components verified via functional tests (import + exercise):
Notes
Link to Devin session: https://app.devin.ai/sessions/0475192a778b45cea30202f85ad52b63