In [1]:
from datetime import datetime
from typing import Mapping

import numpy as np

import milvus

# A minimalist system description

Starting off from [our experiments](https://github.com/mlatcl/kafker) with Kafka and [Faust](https://faust.readthedocs.io/en/latest/), I wondered what a Python "DSL" for system descriptions that leans heavily towards data-oriented architectures might look like. At a high level, these are the design goals:

- A system is built from streams that contain state and mostly stateless computational nodes
- We want to capture not only the "real world" system but also the assumptions we make about this system, such as models we want to employ or knowledge that we will need.
- All components in the system need to play nice: No state that is not in a stream!
- Nodes should be black boxes to some degree: We don't really care what's going on inside them and calculations might be really complex. However, the results of computations that are relevant to the system should be exposed to the outside.

Ideally, a complete system description would contain

- The context/environment in which calculations take place
- Deterministic computations that carry semantics but are not ML
- Modelling assumptions such as inputs/outputs, model choices, inference methods, ...
- Model state such as artifacts/training results
- The "production" environment, i.e. monitoring tools, metrics, ...

I'd like to make a distinction between _(computational) infrastructure choices_ which I would not see as part of the system description (it does not really matter which learning framework is used), and _domain choices_ which are what we would like to focus on (what form does the data have, where does it come from, which parameters are relevant, ...). We'd like to capture as much knowledge as possible and avoid imposing computational choices.

# A small system

I'm mostly following Faust's async design here, because it is very succinct and pretty close to how I'd like a DSL to look. This DSL is represented by a dummy library called `milvus`. The following system is pretty much the simplest use case I could come up with: We're ingesting sensor data - temperature and fan rpm readings - from some machine. The data is slightly preprocessed and published to some stream. The machine has a number of properties that will be used to analyze the data.

This part introduces the main data structures, with one important difference from Faust:

- **Record**: A record is a description of the data contained in a stream, pretty much a data class.
- **Stream**: A stream (topic) is a potentially infinite queue that contains the _data_ that flows through the system.
- **Node**: A (computational) node reads data from input streams and publishes data to output streams.
- **Context**: A context is a special kind of stream: It contains meta-information that describes the system and might change, but it is not data generated by the system.

I introduced a distinction between streams and contexts here because I think these kinds of information need to be handled differently on a semantic level. Both things could probably be mapped to a Kafka topic, but a context is not something we would do machine learning on, we would use it to inform our machine learning tasks. A stream would contain data.
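To make the four concepts concrete without relying on the dummy library, here is a minimal sketch in plain `asyncio`. Everything in it (`Reading`, `Stream`, `Context`, `flag_hot_readings`, the `_END` sentinel) is a hypothetical stand-in and not how `milvus` or Faust would implement these ideas. The point is only the semantic split: the node iterates over a stream, but merely looks up the current value of a context.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, AsyncIterator, List

_END = object()  # sentinel that marks the end of a finite demo stream


@dataclass(frozen=True)
class Reading:
    """Record: describes the shape of the values in a stream."""

    temperature: float
    fan_rpm: float


class Stream:
    """Stream: a queue of data that flows through the system."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._queue: asyncio.Queue = asyncio.Queue()

    async def publish(self, value: Any) -> None:
        await self._queue.put(value)

    async def close(self) -> None:
        await self._queue.put(_END)

    async def items(self) -> AsyncIterator[Any]:
        # Yield values until the end-of-stream sentinel shows up.
        while True:
            value = await self._queue.get()
            if value is _END:
                return
            yield value


class Context:
    """Context: the latest known meta-information, not data to learn from."""

    def __init__(self, name: str, value: Any) -> None:
        self.name = name
        self.value = value


async def flag_hot_readings(readings: Stream, limits: Context, alerts: Stream) -> None:
    """Node: reads an input stream, consults a context, writes an output stream."""
    async for reading in readings.items():
        if reading.temperature > limits.value["alert_temperature"]:
            await alerts.publish(f"hot: {reading.temperature}")
    await alerts.close()


async def run_demo() -> List[str]:
    # Queues are created inside the running event loop.
    readings = Stream("readings")
    alerts = Stream("alerts")
    limits = Context("limits", {"alert_temperature": 80.0})
    for temperature in (70.0, 85.0, 90.0):
        await readings.publish(Reading(temperature=temperature, fan_rpm=1200.0))
    await readings.close()
    await flag_hot_readings(readings, limits, alerts)
    return [alert async for alert in alerts.items()]


demo_alerts = asyncio.run(run_demo())
```

Inside a notebook, where an event loop is already running, `await run_demo()` replaces the `asyncio.run` call.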

In [2]:
class Sensors(milvus.Record):
    timestamp: datetime
    temperature: float
    fan_rpm: float


sensor_api = milvus.data_sources.REST("sensor_api", poll=True)
sensor_readings = milvus.stream("sensor_readings", value_type=Sensors)


@milvus.node(inputs=[sensor_api], outputs=[sensor_readings])
async def ingest_sensor_data(self, sensor_events):
    async for event in sensor_events:
        yield Sensors(
            **event,
            timestamp=milvus.parse_time(event["strange_timestamp"]),
        )

In [3]:
class Machine(milvus.Record):
    name: str
    fan_control_curve: Mapping[float, float]
    alert_temperature: float


machine_properties = milvus.context("machine_properties", value_type=Machine)

# Adding a predictive model
The code above fully describes a simple real-world system. We now add machine learning to the mix.
The building blocks above map nicely to the ML context.
Inputs and outputs are streams, and snapshots of streams could nicely be captured in more classical ML datasets.
The configuration of a model (hyperparameters) and artifacts produced by the model (weights) are more context than stream:
We could deploy a model to production with exactly one set of weights, but changing the weights changes the system, and a well-formulated data-oriented architecture could do this on the fly.
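A small sketch of what "on the fly" could mean, assuming a context is simply a holder for the latest value. The `Context` class and the linear `predict` function are hypothetical illustrations, not part of any library. Because the node looks up the weights on every call, replacing the artifacts changes the system's behaviour without redeploying the node.

```python
from typing import Any, Sequence


class Context:
    """Hypothetical stand-in: holds the latest value of some system metadata."""

    def __init__(self, value: Any) -> None:
        self.value = value


def predict(features: Sequence[float], artifacts: Context) -> float:
    # The weights are read from the context at call time, not captured once.
    weights = artifacts.value["weights"]
    return sum(w * x for w, x in zip(weights, features))


artifacts = Context({"weights": [1.0, 0.0]})
features = [2.0, 3.0]

before = predict(features, artifacts)

# Publishing new artifacts swaps the model without touching the node.
artifacts.value = {"weights": [0.0, 1.0]}
after = predict(features, artifacts)
```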

We can map concepts from ML libraries such as Keras to this DSL:
A computational node could be interpreted as a step in data preprocessing, yielding a `Sequential` unit of steps that generate intermediate data.
A `Model` could just be a computational node with well-defined inputs and outputs.

We could also identify a single layer of - say - a deep GP with a model and connect layers via streams.
While this is nice on a conceptual level, it will be quite challenging to do efficient inference in that setting.
Note, however, that if we want to propagate uncertainties between models, the lines will probably blur somewhat here:
We'll have to do "joint" inference in different nodes of the graph.
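The idea of a sequence of named preprocessing steps whose intermediate results stay addressable can be sketched in plain Python. This is only an illustration under simplifying assumptions: windows are count-based rather than time-based, and `sliding_windows`, `drop_nan_windows`, `whiten` and `run_pipeline` are hypothetical helpers, not the dummy library's API.

```python
import math
import statistics
from typing import Callable, Dict, Iterable, Iterator, List

Window = List[float]


def sliding_windows(values: Iterable[float], size: int) -> Iterator[Window]:
    """Yield every run of `size` consecutive values."""
    window: Window = []
    for value in values:
        window.append(value)
        if len(window) > size:
            window.pop(0)
        if len(window) == size:
            yield list(window)


def drop_nan_windows(windows: Iterable[Window]) -> Iterator[Window]:
    """Keep only windows without missing readings."""
    return (w for w in windows if not any(math.isnan(v) for v in w))


def whiten(windows: Iterable[Window]) -> Iterator[Window]:
    """Shift each window to zero mean and scale it to unit variance."""
    for w in windows:
        mean = statistics.fmean(w)
        std = statistics.pstdev(w)
        yield [(v - mean) / std if std > 0 else 0.0 for v in w]


def run_pipeline(steps: Dict[str, Callable], source: Iterable[float]) -> Dict[str, list]:
    """Apply the steps in order and keep every intermediate result by name."""
    results: Dict[str, list] = {}
    current: Iterable = source
    for name, step in steps.items():
        current = list(step(current))
        results[name] = current
    return results


steps = {
    "add_history": lambda values: sliding_windows(values, size=3),
    "clean_data": drop_nan_windows,
    "whiten": whiten,
}
temperatures = [20.0, 21.0, 22.0, float("nan"), 23.0, 24.0, 25.0]
results = run_pipeline(steps, temperatures)
```

Keeping `results["clean_data"]` around is what allows another consumer to tap the pipeline in the middle instead of only at its end.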

In [4]:
predicted_temperatures = milvus.stream("predicted_temperatures", value_type=float)

model_config = milvus.context("model_config", value_type=dict)
model_artifacts = milvus.context("model_artifacts")

preprocessing_pipeline = milvus.Sequential(
    {
        "add_history": milvus.Windowed(minutes=10),
        "clean_data": milvus.Filter(
            lambda window: len(window) > 10 and not np.any(np.isnan(window))
        ),
        "whiten": milvus.Whiten(),
    }
)

is_next_temperature_bad = milvus.Model(
    "is_next_temperature_bad",
    model=milvus.models.MagicTemperatureClassificationModel,
    inputs=preprocessing_pipeline(sensor_readings),
    outputs=predicted_temperatures,
    config=model_config,
    artifacts=model_artifacts,
)

# Monitoring
Besides cleanly formulating ML models, we can use the graph structure of the system description to formulate monitoring tasks.
This example introduces a new context to configure problematic events and then plugs into the system in the right places.
We do not need to change things above to get to what we need - pretty cool!

In [5]:
system_alerts = milvus.stream("system_alerts", value_type=str)
rpm_monitoring_config = milvus.context(
    "rpm_monitoring_config",
    default={
        "acceptable_deviation_ratio": 0.2,
        "acceptable_deviations_per_window": 2,
    },
)


@milvus.node(
    inputs=[preprocessing_pipeline["clean_data"]],
    contexts=[machine_properties, rpm_monitoring_config],
    outputs=[system_alerts],
)
async def check_rpm(windows):
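    def is_rpm_bad(sensor_state):
        # Compare the measured fan speed with the speed the control curve
        # prescribes for the current temperature. The curve is declared as
        # a Mapping, so it is indexed rather than called.
        rpm_deviation = abs(
            sensor_state.fan_rpm
            - machine_properties.fan_control_curve[sensor_state.temperature]
        )
        return (
            rpm_deviation / sensor_state.fan_rpm
            > rpm_monitoring_config["acceptable_deviation_ratio"]
        )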

    async for window in windows:
        num_deviations = sum(is_rpm_bad(sensor_state) for sensor_state in window)
        if num_deviations > rpm_monitoring_config["acceptable_deviations_per_window"]:
            yield "Too many deviations!"
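The deviation check itself can be tried out without any streaming machinery. The sketch below is an assumption-laden illustration: `SensorState`, `expected_rpm` and `too_many_deviations` are hypothetical names, and linearly interpolating between the points of the fan control curve is my own choice, since a plain mapping lookup only works for temperatures that appear as keys.

```python
from bisect import bisect_left
from typing import Mapping, NamedTuple, Sequence


class SensorState(NamedTuple):
    temperature: float
    fan_rpm: float


def expected_rpm(curve: Mapping[float, float], temperature: float) -> float:
    """Linearly interpolate the curve, clamping outside its temperature range."""
    points = sorted(curve.items())
    temperatures = [t for t, _ in points]
    if temperature <= temperatures[0]:
        return points[0][1]
    if temperature >= temperatures[-1]:
        return points[-1][1]
    i = bisect_left(temperatures, temperature)
    (t0, r0), (t1, r1) = points[i - 1], points[i]
    return r0 + (r1 - r0) * (temperature - t0) / (t1 - t0)


def is_rpm_bad(state: SensorState, curve: Mapping[float, float], ratio: float) -> bool:
    # Relative deviation with respect to the measured speed, as in the node
    # above; this assumes the fan never reports exactly 0 rpm.
    deviation = abs(state.fan_rpm - expected_rpm(curve, state.temperature))
    return deviation / state.fan_rpm > ratio


def too_many_deviations(
    window: Sequence[SensorState],
    curve: Mapping[float, float],
    config: Mapping[str, float],
) -> bool:
    num_deviations = sum(
        is_rpm_bad(state, curve, config["acceptable_deviation_ratio"])
        for state in window
    )
    return num_deviations > config["acceptable_deviations_per_window"]


curve = {40.0: 1000.0, 80.0: 3000.0}
config = {"acceptable_deviation_ratio": 0.2, "acceptable_deviations_per_window": 2}

healthy = [SensorState(temperature=60.0, fan_rpm=2000.0)] * 4
stalled = [SensorState(temperature=60.0, fan_rpm=1000.0)] * 4

healthy_alert = too_many_deviations(healthy, curve, config)
stalled_alert = too_many_deviations(stalled, curve, config)
```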

# Deployment

## Technical Level
The system above can be visualized as a graph containing three types of nodes:

- **Green Streams** that contain data
- **Yellow Nodes** that perform calculation
- **Violet Contexts** that specify the interface to the outside world
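As a rough sketch, such a graph can be written down as plain data. The names below come from the code cells above, but the edge list is my reading of those cells rather than a transcription of the figure; the external `sensor_api` source is left out, and the whole preprocessing pipeline is treated as a single node. `KINDS`, `EDGES` and `upstream` are hypothetical helpers.

```python
from typing import Dict, List, Tuple

# Every vertex of the system graph, tagged with one of the three types.
KINDS: Dict[str, str] = {
    "sensor_readings": "stream",
    "predicted_temperatures": "stream",
    "system_alerts": "stream",
    "ingest_sensor_data": "node",
    "preprocessing_pipeline": "node",
    "is_next_temperature_bad": "node",
    "check_rpm": "node",
    "machine_properties": "context",
    "model_config": "context",
    "model_artifacts": "context",
    "rpm_monitoring_config": "context",
}

# Directed edges (source, destination) following the flow of information.
EDGES: List[Tuple[str, str]] = [
    ("ingest_sensor_data", "sensor_readings"),
    ("sensor_readings", "preprocessing_pipeline"),
    ("preprocessing_pipeline", "is_next_temperature_bad"),
    ("model_config", "is_next_temperature_bad"),
    ("model_artifacts", "is_next_temperature_bad"),
    ("is_next_temperature_bad", "predicted_temperatures"),
    ("preprocessing_pipeline", "check_rpm"),
    ("machine_properties", "check_rpm"),
    ("rpm_monitoring_config", "check_rpm"),
    ("check_rpm", "system_alerts"),
]


def upstream(target: str, kind: str) -> List[str]:
    """Names of the given kind that have an edge pointing into `target`."""
    return sorted(src for src, dst in EDGES if dst == target and KINDS[src] == kind)


check_rpm_contexts = upstream("check_rpm", "context")
model_contexts = upstream("is_next_temperature_bad", "context")
```

With the types attached to the vertices, questions like "which contexts influence this computation?" become simple queries over the description.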

These different types of system information are typically handled independently:

- Tools like Kafka care about data and don't know about computation
- Tools like TensorFlow/PyTorch care about computation and don't know about data
- Context is injected through application code and is only an implicit part of the system description or design

I wonder: Is AutoAI about adding Context to the mix to empower us to dynamically reason about data and computation?

## Organization Level
The example above can be thought of as being implemented in independent teams A, B and C.
The technical system information can form an interface for communication between the teams.
This interface could be implemented in a highly coupled fashion through shared code, or through an information broker in a larger organization.

![temperature system](images/temperature_system.png)