# Model Monitoring

![Status](https://img.shields.io/static/v1.svg?label=Status&message=Finished&color=brightgreen)
[![Source](https://img.shields.io/static/v1.svg?label=GitHub&message=Source&color=181717&logo=GitHub)](https://github.com/particle1331/inefficient-networks/blob/master/docs/notebooks/mlops/04-deployment)
[![Stars](https://img.shields.io/github/stars/particle1331/inefficient-networks?style=social)](https://github.com/particle1331/inefficient-networks)

```text
𝗔𝘁𝘁𝗿𝗶𝗯𝘂𝘁𝗶𝗼𝗻: Notes for Module 5 of the MLOps Zoomcamp (2022) by DataTalks.Club.
```

---

## Introduction

Monitoring is a critical but often overlooked phase of the model life cycle[^ref]. During training, a model learns from past examples. But a production model works with new data, which sooner or later deviates from the training data. The effects can range from [silly product recommendations](https://twitter.com/GirlFromBlupo/status/982156453396996096) to [multimillion-dollar losses](https://www.fiddler.ai/blog/zillow-offers-a-case-for-model-risk-management).
The latter case shows that the impact on downstream business KPIs may only be noticed once the damage has already been done. Hence, to operate ML models in production successfully, we need a near real-time view of model performance.

Monitoring can include: tests on model correctness, feature and target drift, prediction probability drift, data outage, schema change, underperforming segments, and so on. Adding visibility to all aspects of the data and model
prediction pipeline gives us a better chance of finding out what causes a change in model performance.


[^ref]: This introduction is based on the [series of blog posts](https://evidentlyai.com/blog#!/tfeeds/393523502011/c/machine%20learning%20monitoring%20series) on ML monitoring by [Evidently AI](https://evidentlyai.com/).


```{margin}
[`evidentlyai.com/blog`](https://evidentlyai.com/blog/machine-learning-monitoring-data-and-concept-drift)
```
```{figure} ../../../img/data_drift.webp
---
width: 40em
---
Gradual model decay due to data drift in source channel distribution.
```

```{margin}
[`evidentlyai.com/blog`](https://evidentlyai.com/blog/machine-learning-monitoring-data-and-concept-drift)
```
```{figure} ../../../img/concept_drift.png
---
width: 40em
---
Concept drift can be observed here as a change in target distribution for a certain feature value. 
```

Deviation of production data from some reference distribution, e.g. the training data, is known as **data drift**. There is also the related notion of **concept drift**, where the feature-target relationship changes over time. In practice, this semantic distinction makes little difference. More often than not, both kinds of drift occur together and degrade the model in subtle ways. Models themselves can also affect the data distribution, e.g. recommender systems.
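
One simple way to quantify data drift for a single numerical feature is the population stability index (PSI), which compares binned frequencies of a reference sample and a production sample. The following is a minimal sketch using only the standard library. The function name `psi`, the synthetic Gaussian data, and the choice of ten quantile bins are assumptions made for illustration, not something prescribed by the course.

```python
import math
import random
from bisect import bisect_right


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population stability index between two 1-D samples (hypothetical helper)."""
    ref_sorted = sorted(reference)
    # Interior bin edges are taken at quantiles of the reference sample.
    edges = [ref_sorted[len(ref_sorted) * i // n_bins] for i in range(1, n_bins)]

    def bin_fractions(sample):
        counts = [0] * n_bins
        for x in sample:
            counts[bisect_right(edges, x)] += 1
        # Clip at eps so that empty bins do not produce log(0).
        return [max(c / len(sample), eps) for c in counts]

    p = bin_fractions(reference)
    q = bin_fractions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Synthetic data (assumption): production either matches or is shifted by one std.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(5000)]
same = [rng.gauss(0.0, 1.0) for _ in range(5000)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]

psi_same = psi(reference, same)
psi_shifted = psi(reference, shifted)
print(f"no drift: {psi_same:.4f}, shifted: {psi_shifted:.4f}")
```

A commonly cited rule of thumb treats PSI below 0.1 as stable and above 0.25 as significant drift, but such thresholds should be tuned per feature and use case.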


```{margin}
[`evidentlyai.com/blog`](https://evidentlyai.com/blog/machine-learning-monitoring-data-and-concept-drift)
```
```{figure} ../../../img/sudden_concept_drift.png
---
width: 40em
---
This scenario makes it difficult to separate concept drift and data drift. Data drift occurs
as users become confined in their homes, resulting in different user behaviors. There 
is also concept drift in that the concept of loungewear as distinct from work clothes 
also begins to change.
```

Another issue for production models is that **data pipelines just break**: data can be malformed, we can lose access to a source or suffer a database outage, and values that were never missing during training can go missing. Or the code may not handle corner cases that only appear in production. If a model receives wrong or unusual input, it will generally make an unreliable prediction.
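
Many of these failures can be caught by validating inputs before they reach the model. The sketch below checks a single record against an expected schema and reports missing columns, missing values, wrong types, and unexpected columns. The schema, its column names, and the `validate_record` helper are all hypothetical and only serve to illustrate the idea.

```python
import math

# Hypothetical schema: column name -> expected Python type.
EXPECTED_SCHEMA = {
    "trip_distance": float,
    "passenger_count": int,
    "pickup_zone": str,
}


def validate_record(record, schema=EXPECTED_SCHEMA):
    """Return a list of human-readable problems found in one input record."""
    problems = []
    for name, expected_type in schema.items():
        if name not in record:
            problems.append(f"missing column: {name}")
            continue
        value = record[name]
        if value is None or (isinstance(value, float) and math.isnan(value)):
            problems.append(f"missing value: {name}")
        elif not isinstance(value, expected_type):
            problems.append(
                f"wrong type for {name}: "
                f"expected {expected_type.__name__}, got {type(value).__name__}"
            )
    # Columns that the model was never trained on hint at a schema change.
    for name in record:
        if name not in schema:
            problems.append(f"unexpected column: {name}")
    return problems


good = {"trip_distance": 2.5, "passenger_count": 1, "pickup_zone": "A"}
bad = {"trip_distance": float("nan"), "passenger_count": "2", "zone": "A"}

good_problems = validate_record(good)
bad_problems = validate_record(bad)
print(good_problems)
print(bad_problems)
```

In a real service, the list of problems would typically be logged and counted as a monitoring metric rather than printed.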

```{margin}
[`evidentlyai.com/blog`](https://evidentlyai.com/blog/machine-learning-monitoring-what-can-go-wrong-with-your-data)
```
```{figure} ../../../img/data_schema_change.webp
---
width: 40em
---
Schema change can cause data processing pipelines to fail.
```

Note that data and model monitoring can be implemented alongside more traditional monitoring of software services, such as **health checks** for broken components, CPU and memory utilization, latency, and so on. Monitoring data-based services just adds an extra layer of complexity.

```{margin}
[`evidentlyai.com/blog`](https://evidentlyai.com/machine-learning-monitoring-how-to-track-data-quality-and-integrity)
```
```{figure} ../../../img/sensor_statistic.png
---
width: 40em
---
Temperature feature statistics indicate that the sensor has started to fail.
```

```{margin}
[`evidentlyai.com/blog`](https://evidentlyai.com/machine-learning-monitoring-how-to-track-data-quality-and-integrity)
```
```{figure} ../../../img/model_calls.png
---
width: 40em
---
Zero model calls can indicate that the service has crashed.
```

In mature systems, monitoring becomes a critical component that connects production with modeling. If we detect a quality drop, we can trigger retraining or step back into the research phase to rebuild the model. Here, a quality drop can be measured in terms of the usual technical metrics such as accuracy or F1 score[^ref2].

[^ref2]: As usual, metric values are only useful when interpreted in the context of business goals.
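
As a rough illustration of such a trigger, the sketch below tracks accuracy over a rolling window of labeled predictions and raises a flag once it falls below a threshold. The class name `AccuracyMonitor`, the window size, and the threshold are assumptions chosen for the example. In practice, ground-truth labels often arrive with a delay, so the window reflects past rather than current performance.

```python
from collections import deque


class AccuracyMonitor:
    """Rolling-window accuracy with a retraining flag (hypothetical helper)."""

    def __init__(self, window_size=100, threshold=0.8):
        self.outcomes = deque(maxlen=window_size)  # True where prediction was correct
        self.threshold = threshold

    def update(self, y_true, y_pred):
        self.outcomes.append(y_true == y_pred)

    @property
    def accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        # Only decide once the window is full, to avoid noisy early alerts.
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.accuracy < self.threshold


monitor = AccuracyMonitor(window_size=10, threshold=0.8)

# Ten correct predictions: the window is full and accuracy is 1.0.
for _ in range(10):
    monitor.update(y_true=1, y_pred=1)
healthy_flag = monitor.needs_retraining()

# Five wrong predictions push out five correct ones: accuracy drops to 0.5.
for _ in range(5):
    monitor.update(y_true=1, y_pred=0)
degraded_flag = monitor.needs_retraining()

print(healthy_flag, degraded_flag, monitor.accuracy)
```

The flag would normally feed an alert or a retraining pipeline instead of being read directly.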

```{margin}
[`evidentlyai.com/blog`](https://evidentlyai.com/blog/machine-learning-monitoring-what-it-is-and-how-it-differs)
```
```{figure} ../../../img/model_lifecycle_2.png
---
width: 40em
---
Monitoring as part of the model lifecycle, connecting production to the research phase of a project.
```

```{margin}
[`evidentlyai.com/blog`](https://evidentlyai.com/blog/machine-learning-monitoring-data-and-concept-drift)
```
```{figure} ../../../img/model_decay_retraining.webp
---
width: 40em
---
Retraining models periodically to maintain performance. This should be effective for tasks with data drift and not much concept drift. Period size generally depends on the model and the task. Large vision and language models can last years without needing an update. But models that perform forecasting typically need to be retrained at shorter intervals.
```