incre-ml

Incremental machine learning in Python. Every algorithm processes one observation at a time — no batches, no retraining, no accumulated history.

from incre_ml.forecasting import HoltWinters

model = HoltWinters(season_length=24)

# `stream` is any iterable yielding (features, target) pairs
for x, y in stream:
    prediction = model.predict_one(x)
    model.learn_one(x, y)

Why incre-ml?

Traditional ML libraries are built around batch training. When data arrives continuously (sensor telemetry, financial ticks, patient vitals, API logs), you need models that update incrementally and predict immediately. incre-ml covers that workflow in one package: forecasting, anomaly detection, classification, clustering, drift detection, uncertainty quantification, federated learning, and physics-informed constraints.

Core API contract — every model implements:

| Method | Purpose |
| --- | --- |
| model.learn_one(x, y) | Update on one observation |
| model.predict_one(x) | Predict from one observation |
| model.explain_one(x) | Feature contributions for one observation |
| model.clear() | Reset internal state |

All models use x: dict[str, Any] for features — sparse, heterogeneous, schema-agnostic by design.
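
To make the contract concrete, here is a minimal, self-contained sketch of a model that satisfies it. `RunningMeanRegressor` is a hypothetical illustration and not part of incre-ml: it ignores the features and predicts the running mean of the targets seen so far. The return type of `explain_one` (a dict of per-feature contributions) is an assumption.

```python
from typing import Any


class RunningMeanRegressor:
    """Hypothetical model: predicts the mean of all targets seen so far."""

    def __init__(self) -> None:
        self._n = 0
        self._mean = 0.0

    def learn_one(self, x: dict[str, Any], y: float) -> "RunningMeanRegressor":
        # Incremental mean update: O(1) memory, no stored history.
        self._n += 1
        self._mean += (y - self._mean) / self._n
        return self  # returning self allows method chaining

    def predict_one(self, x: dict[str, Any]) -> float:
        return self._mean

    def explain_one(self, x: dict[str, Any]) -> dict[str, float]:
        # This model ignores its features, so every contribution is zero.
        return {name: 0.0 for name in x}

    def clear(self) -> None:
        self._n = 0
        self._mean = 0.0


model = RunningMeanRegressor()
# Feature dicts may differ from one observation to the next.
stream = [({"temp": 20.0}, 1.0), ({"temp": 21.0}, 3.0), ({"humidity": 0.4}, 5.0)]
for x, y in stream:
    prediction = model.predict_one(x)  # predict first, then learn
    model.learn_one(x, y)
```

Because features arrive as plain dicts, the loop works unchanged when a key appears or disappears mid-stream.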

Installation

pip install incre-ml

With optional connectors:

pip install "incre-ml[kafka]"       # Confluent Kafka
pip install "incre-ml[mqtt]"        # MQTT / IoT
pip install "incre-ml[connectors]"  # All connectors
pip install "incre-ml[dashboard]"   # Streamlit demo app

Capabilities

Forecasting

8 forecasters: Naive, Holt-Winters, AR/SNARIMAX, Kalman Filter, RLS, Croston/TSB, Bootstrapped ensembles, and model selection.

from incre_ml.forecasting import BootstrappedRegressor, HoltWinters

model = BootstrappedRegressor(HoltWinters(season_length=24), n_models=5)

pred, uncertainty = model.predict_with_uncertainty(x)
model.learn_one(x, y)

Anomaly Detection

Statistical (Z-score), geometric (Half-Space Trees), and predictive detectors — composable via weighted ensemble voting. Includes CUSUM and EWMA industrial detectors.

from incre_ml.anomaly import AnomalyEnsemble, ZScoreDetector, PredictiveAnomalyDetector
from incre_ml.forecasting import HoltWinters

ensemble = AnomalyEnsemble({
    "stat": ZScoreDetector(feature_name="temperature"),
    "pred": PredictiveAnomalyDetector(
        model=HoltWinters(season_length=96),
        feature_name="temperature",
    ),
})

score = ensemble.score_one({"temperature": 95.2})  # 0.0 (normal) to 1.0 (anomalous)
ensemble.learn_one({"temperature": 95.2})
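
For readers unfamiliar with CUSUM, the idea can be sketched in a few lines. The class below is a hypothetical, standalone two-sided CUSUM with a fixed reference mean; the names `CusumDetector`, `target`, `slack`, and `threshold` are assumptions for illustration and do not describe incre-ml's own detector.

```python
class CusumDetector:
    """Hypothetical two-sided CUSUM around a fixed reference mean."""

    def __init__(self, target: float, slack: float, threshold: float) -> None:
        self.target = target        # expected in-control mean
        self.slack = slack          # deviation tolerated without accumulating
        self.threshold = threshold  # alarm level for either cumulative sum
        self.pos = 0.0              # accumulated upward deviation
        self.neg = 0.0              # accumulated downward deviation

    def update(self, value: float) -> bool:
        dev = value - self.target
        self.pos = max(0.0, self.pos + dev - self.slack)
        self.neg = max(0.0, self.neg - dev - self.slack)
        alarm = self.pos > self.threshold or self.neg > self.threshold
        if alarm:
            # Restart accumulation after signalling a shift.
            self.pos = self.neg = 0.0
        return alarm


detector = CusumDetector(target=20.0, slack=0.5, threshold=4.0)
readings = [20.0, 20.5, 19.5, 20.0, 22.0, 22.0, 22.0, 22.0]
alarms = [i for i, value in enumerate(readings) if detector.update(value)]
```

Small fluctuations around the target never accumulate, while the sustained shift to 22.0 trips the alarm on its third consecutive reading.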

Streaming Classification

Hoeffding Tree, Logistic Regression, Naive Bayes, SGD, Adaptive Random Forest, and Windowed KNN — all incremental.

from incre_ml.classification import HoeffdingTreeClassifier

clf = HoeffdingTreeClassifier(grace_period=10)

proba = clf.predict_proba_one(x)    # class probabilities
explanation = clf.explain_one(x)     # feature contributions
clf.learn_one(x, y)

Pipelines

Chain transformers and predictors into unified streaming workflows.

from incre_ml.compose import Pipeline
from incre_ml.preprocessing import StandardScaler, SelectKBest

pipe = Pipeline([StandardScaler(), SelectKBest(k=5), model])
pipe.learn_one(x, y)

Drift Detection

ADWIN (exponential histogram, O(log n) memory), statistical detectors, and DriftAdaptiveWrapper for automatic model adaptation via reset, decay, or replacement strategies.
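
The reset strategy can be illustrated with a short, self-contained sketch. It swaps in a Page-Hinkley test (much simpler than ADWIN) to monitor the absolute prediction error and clears the wrapped model when the error level jumps. `PageHinkley`, `MeanModel`, and `ResetOnDrift` are hypothetical names, not incre-ml classes, and the actual `DriftAdaptiveWrapper` interface may differ.

```python
from typing import Any


class PageHinkley:
    """Hypothetical Page-Hinkley detector for upward shifts in a signal."""

    def __init__(self, delta: float = 0.005, threshold: float = 5.0) -> None:
        self.delta = delta          # tolerated drift magnitude
        self.threshold = threshold  # alarm level
        self.reset()

    def reset(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0      # cumulative deviation from the running mean
        self.cum_min = 0.0  # smallest cumulative deviation seen so far

    def update(self, value: float) -> bool:
        self.n += 1
        self.mean += (value - self.mean) / self.n
        self.cum += value - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold


class MeanModel:
    """Hypothetical stand-in model: predicts the running mean of y."""

    def __init__(self) -> None:
        self.clear()

    def clear(self) -> None:
        self.n, self.mean = 0, 0.0

    def predict_one(self, x: dict[str, Any]) -> float:
        return self.mean

    def learn_one(self, x: dict[str, Any], y: float) -> "MeanModel":
        self.n += 1
        self.mean += (y - self.mean) / self.n
        return self


class ResetOnDrift:
    """Clears the wrapped model when its error stream drifts upward."""

    def __init__(self, model: MeanModel, detector: PageHinkley) -> None:
        self.model, self.detector = model, detector
        self.drift_count = 0

    def predict_one(self, x: dict[str, Any]) -> float:
        return self.model.predict_one(x)

    def learn_one(self, x: dict[str, Any], y: float) -> "ResetOnDrift":
        error = abs(y - self.model.predict_one(x))
        if self.detector.update(error):
            self.model.clear()     # discard state learned before the drift
            self.detector.reset()
            self.drift_count += 1
        self.model.learn_one(x, y)
        return self


wrapped = ResetOnDrift(MeanModel(), PageHinkley())
for y in [0.0] * 50 + [10.0] * 50:  # abrupt level shift halfway through
    wrapped.learn_one({}, y)
```

Without the reset, the running mean would still be dragged toward the old level at the end of the stream; with it, the model tracks the new level immediately after the shift.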

Federated Learning

FederatedEnsemble trains local models per site/region and aggregates via averaging or median — without centralizing raw data.

from incre_ml.federated import FederatedEnsemble
from incre_ml.linear import LinearRegression

fed = FederatedEnsemble(LinearRegression(), ["site_a", "site_b", "site_c"])

fed.learn_one("site_a", x, y)
global_pred = fed.predict_global(x)
fed.sync()  # aggregate local models
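
The two aggregation modes can be sketched independently of the library. The example below assumes each local model is summarized as a dict of named weights, which is an illustrative simplification; `aggregate` is a hypothetical helper, not a function exported by incre-ml.

```python
from statistics import mean, median


def aggregate(
    local_weights: dict[str, dict[str, float]], how: str = "mean"
) -> dict[str, float]:
    """Combine per-site weight dicts into one global weight dict."""
    reducer = {"mean": mean, "median": median}[how]
    # Union of all weight names; a site missing a weight contributes 0.0.
    names = sorted({name for w in local_weights.values() for name in w})
    return {
        name: reducer([w.get(name, 0.0) for w in local_weights.values()])
        for name in names
    }


local = {
    "site_a": {"temp": 1.0, "bias": 0.0},
    "site_b": {"temp": 2.0, "bias": 1.0},
    "site_c": {"temp": 9.0, "bias": 2.0},  # outlier site
}
global_mean = aggregate(local, how="mean")
global_median = aggregate(local, how="median")
```

Only the weights cross site boundaries here, never the raw observations. The median is less sensitive to a single outlying site such as `site_c`.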

Physics-Informed Constraints

Wrap any regressor with domain constraints to prevent physically implausible predictions.

from incre_ml.physics.thermal import NewtonCoolingConstraint
from incre_ml.base.physics import PhysicsInformedWrapper

guard = NewtonCoolingConstraint(k=0.05, ambient_temp=15.0, max_deviation=3.0)
safe_model = PhysicsInformedWrapper(model, guard)
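
One plausible reading of these parameters, sketched below, is that the wrapper clamps a prediction to within `max_deviation` of the temperature expected from Newton's law of cooling. This is an assumption inferred from the parameter names, not a description of how `NewtonCoolingConstraint` is implemented; `newton_cooling_step`, `constrain`, and the `dt` time step are hypothetical.

```python
import math


def newton_cooling_step(
    temp: float, ambient: float, k: float, dt: float = 1.0
) -> float:
    """Exact solution of dT/dt = -k * (T - ambient) over one step of length dt."""
    return ambient + (temp - ambient) * math.exp(-k * dt)


def constrain(
    prediction: float,
    last_temp: float,
    ambient: float = 15.0,
    k: float = 0.05,
    max_deviation: float = 3.0,
) -> float:
    """Clamp a prediction to a band around the physically expected value."""
    expected = newton_cooling_step(last_temp, ambient, k)
    low, high = expected - max_deviation, expected + max_deviation
    return min(max(prediction, low), high)


# At ambient temperature the expected next value is the ambient itself,
# so the allowed band is [12.0, 18.0].
too_hot = constrain(25.0, last_temp=15.0)
plausible = constrain(16.0, last_temp=15.0)
too_cold = constrain(0.0, last_temp=15.0)
```

Clamping keeps the underlying regressor untouched: predictions inside the band pass through unchanged, and only implausible ones are pulled back to the nearest bound.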

Also Included

  • Clustering — OnlineKMeans, DBSTREAM (density-based stream clustering)
  • Uncertainty — Conformal prediction, adaptive conformal intervals (ACI), bootstrapped wrappers
  • Preprocessing — Welford's scalers, online feature selection, temporal features, encoders
  • Evaluation — Prequential (test-then-train) scoring protocol
  • Active Learning — Uncertainty sampling
  • Explainability — Per-prediction feature contributions
  • Metrics — Online regression and classification metrics
  • Simulation — Synthetic data generators for manufacturing, clinical, demand, finance, traffic, building, and plant workforce scenarios
  • Serving — Production serving utilities
  • I/O — CSV, Kafka, and MQTT connectors
  • Model Selection — Bandit-based AutoML for streaming
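
The prequential (test-then-train) protocol listed above can be sketched as a plain loop: each observation is first used to score the model and only then used to update it. `LastValue` and `prequential_mae` are hypothetical helpers written for this illustration, not incre-ml functions.

```python
from typing import Any, Iterable


class LastValue:
    """Hypothetical naive forecaster: predicts the most recent target."""

    def __init__(self) -> None:
        self._last = 0.0

    def predict_one(self, x: dict[str, Any]) -> float:
        return self._last

    def learn_one(self, x: dict[str, Any], y: float) -> "LastValue":
        self._last = y
        return self


def prequential_mae(
    model: LastValue, stream: Iterable[tuple[dict[str, Any], float]]
) -> float:
    """Mean absolute error where every prediction precedes the update."""
    total, n = 0.0, 0
    for x, y in stream:
        y_pred = model.predict_one(x)  # test on the unseen observation
        total += abs(y - y_pred)
        n += 1
        model.learn_one(x, y)          # then train on it
    return total / n if n else 0.0


stream = [({"t": i}, y) for i, y in enumerate([1.0, 2.0, 4.0, 7.0])]
mae = prequential_mae(LastValue(), stream)
```

Because the model never sees a target before predicting it, the score is an honest out-of-sample estimate without a separate hold-out set.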

Interactive Dashboard

Explore all capabilities through 8 real-world scenarios with live streaming data:

pip install "incre-ml[dashboard]"
streamlit run app.py

Scenarios: Manufacturing quality, supply chain demand, sales anomaly monitoring, clinical triage, connected vehicle safety, API security monitoring, smart building energy, and plant workforce orchestration.

Plant Workforce Orchestration

The workforce scenario demonstrates closed-loop control for a multi-line manufacturing plant. When a production line degrades or goes down, the system detects the failure via online anomaly detection (CUSUM + Z-Score), selects a labor reallocation strategy using an epsilon-greedy bandit that learns from throughput-vs-cost rewards, and redistributes workers across remaining lines — all incrementally, one 15-minute tick at a time. Rush orders, quality crises, and sudden equipment failures trigger real-time priority shifts with cost impact tracking.

from incre_ml.simulation.generators import PlantWorkforceGenerator

gen = PlantWorkforceGenerator(total_steps=1200, total_workers=48)

for obs in gen:
    # obs contains per-line health, throughput, quality, crew, status
    # plus plant-level shift, rush orders, and worker pool
    print(obs["A_health"], obs["B_status"], obs["rush_order"])
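
The strategy-selection step can be illustrated with a small epsilon-greedy bandit. This is a standalone sketch: the class `EpsilonGreedy`, the strategy names, and the reward values (standing in for throughput gain minus labor cost) are all hypothetical and are not taken from the dashboard code.

```python
import random


class EpsilonGreedy:
    """Hypothetical epsilon-greedy bandit with incremental value estimates."""

    def __init__(self, arms, epsilon: float = 0.1, seed: int = 0) -> None:
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {arm: 0 for arm in self.arms}
        self.values = {arm: 0.0 for arm in self.arms}

    def select(self) -> str:
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda arm: self.values[arm])

    def update(self, arm: str, reward: float) -> None:
        # Running mean of observed rewards for this arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


# Made-up rewards per reallocation strategy, fixed for the demo.
rewards = {"hold": 0.2, "rebalance": 1.0, "overtime": 0.5}

bandit = EpsilonGreedy(rewards, epsilon=0.1, seed=0)
for _ in range(200):  # one decision per tick
    arm = bandit.select()
    bandit.update(arm, rewards[arm])
```

The bandit keeps only a count and a running mean per strategy, so it fits the same one-observation-at-a-time model as the rest of the library.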

Design Principles

  • Welford's algorithm everywhere — all statistics use O(1) memory incremental computation
  • Lazy state initialization — internal state created on first learn_one(), not __init__
  • Composition over inheritance — shallow hierarchies (1-2 levels), compose via Pipeline and ensembles
  • Strict typing — mypy --strict on all library code
  • Return self from learn_one() — enables method chaining
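
Three of these principles (Welford's algorithm, lazy state initialization, and returning self) fit in one short sketch. `RunningStats` is a hypothetical class written for illustration; incre-ml's internal statistics classes may be structured differently.

```python
from typing import Optional


class RunningStats:
    """Hypothetical running mean and variance via Welford's algorithm."""

    def __init__(self) -> None:
        # Lazy initialization: no state is allocated until the first update.
        self._state: Optional[dict[str, float]] = None

    def learn_one(self, value: float) -> "RunningStats":
        if self._state is None:
            self._state = {"n": 0.0, "mean": 0.0, "m2": 0.0}
        s = self._state
        s["n"] += 1
        delta = value - s["mean"]
        s["mean"] += delta / s["n"]
        # m2 accumulates the sum of squared deviations from the current mean.
        s["m2"] += delta * (value - s["mean"])
        return self  # enables method chaining

    @property
    def mean(self) -> float:
        return self._state["mean"] if self._state else 0.0

    @property
    def variance(self) -> float:
        # Sample variance; undefined for fewer than two observations.
        if not self._state or self._state["n"] < 2:
            return 0.0
        return self._state["m2"] / (self._state["n"] - 1)


stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.learn_one(value)

# Chaining works because learn_one() returns self.
chained = RunningStats().learn_one(1.0).learn_one(3.0)
```

The state is three numbers regardless of stream length, which is what the O(1) memory claim refers to.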

Development

git clone https://github.com/pespila/incre-ml.git
cd incre-ml
python -m venv venv && source venv/bin/activate
pip install -e ".[dev]"
pre-commit install
ruff check . && ruff format .   # lint + format
mypy src                        # strict type checking
pytest                          # tests with coverage

License

MIT — see LICENSE for details.
