# Concept Drift

Model drift refers to the degradation of a model’s predictive performance over time due to a change in the environment that violates the model’s assumptions. This definition has three parts: predictive performance degrades, the degradation unfolds over some period of time and at some rate, and it is caused by changes in the environment that violate the modeling assumptions. Each of these factors should be taken into account when deciding how to diagnose model drift and how to correct it through model retraining.
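Because drift is defined by *how much* and *how fast* performance degrades, it helps to track a performance metric per time window and estimate its trend. The sketch below is one simple way to do that, assuming labelled predictions arrive in time order and that accuracy is the metric of interest; the function names `windowed_accuracy` and `degradation_rate` are hypothetical, and a least-squares slope is just one illustrative choice of "rate".

```python
from statistics import mean


def windowed_accuracy(y_true, y_pred, window):
    """Accuracy over consecutive, non-overlapping windows of `window` predictions."""
    scores = []
    for start in range(0, len(y_true) - window + 1, window):
        hits = [t == p for t, p in zip(y_true[start:start + window],
                                        y_pred[start:start + window])]
        scores.append(sum(hits) / window)
    return scores


def degradation_rate(scores):
    """Least-squares slope of score vs. window index (negative means degrading)."""
    if len(scores) < 2:
        raise ValueError("need at least two windows to estimate a rate")
    xs = range(len(scores))
    x_bar, y_bar = mean(xs), mean(scores)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, scores))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


# Hypothetical stream: accuracy falls from 100% to 75% to 50% across three windows.
y_true = [1] * 12
y_pred = [1, 1, 1, 1,  1, 1, 1, 0,  1, 1, 0, 0]
scores = windowed_accuracy(y_true, y_pred, window=4)
print(scores, degradation_rate(scores))
```

In this toy stream accuracy drops by about 0.25 per window; in practice, the window size and the slope that should trigger retraining are choices you would tune for your own application.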

:::{admonition} Note
:class: tip
Model drift is a bit of a misnomer because it’s not the model that is changing, but rather the environment in which the model is operating. For that reason, the term [concept drift](https://machinelearningmastery.com/gentle-introduction-concept-drift-machine-learning/) may actually be a better name, but both terms describe the same phenomenon.
:::

As soon as you deploy a machine learning model to production, its performance begins to degrade. This is because the model is sensitive to changes in the real world, and user behaviour keeps changing over time. Although all machine learning models decay, the speed of decay varies. Decay is mostly caused by data drift, concept drift, or both.

<p><center><img src='_images/L194114_1.png'></center></p>

Data drift (covariate shift) is a change in the statistical distribution of production data from the baseline data used to train or build the model. Data from real-time serving can drift from the baseline data due to:

- Changes in the real world,
- Training data not being a representation of the population,
- Data quality issues like outliers in the dataset.

For example, suppose you built a model on temperature data collected from a sensor in degrees Celsius, and the sensor’s unit later changed to Fahrenheit. Your input data has changed, so the data has drifted.
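The Celsius-to-Fahrenheit example can be made concrete with a two-sample statistical test. The sketch below computes the two-sample Kolmogorov-Smirnov (KS) statistic, the largest gap between the empirical CDFs of the baseline and production samples. The KS test is a swapped-in illustration, not a method named in this text, and the temperature readings are hypothetical; `scipy.stats.ks_2samp` provides the same statistic plus a p-value if SciPy is available.

```python
from bisect import bisect_right


def ks_statistic(baseline, production):
    """Two-sample KS statistic: max |F_baseline(x) - F_production(x)|."""
    a, b = sorted(baseline), sorted(production)
    gap = 0.0
    # Both empirical CDFs only jump at observed values, so checking those suffices.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


celsius = [18.0, 19.5, 21.0, 22.5, 24.0]        # hypothetical training readings
fahrenheit = [c * 9 / 5 + 32 for c in celsius]   # same weather, new sensor unit

print(ks_statistic(celsius, celsius))     # identical distributions
print(ks_statistic(celsius, fahrenheit))  # every Fahrenheit value exceeds every Celsius value
```

A statistic of 0 means the samples are indistinguishable by this measure, while 1 means the two samples do not overlap at all, which is exactly what a silent unit change produces.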

### **How to monitor data drift in production**

The best approach to handling data drift is to monitor your data continuously with dedicated MLOps tools rather than relying on traditional rule-based methods. Rule-based methods, such as calculating the data range or comparing data attributes to detect unexpected values, can be time-consuming to maintain and are prone to error.
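To see why rule-based checks are limited, consider a minimal range check of the kind described above. This sketch assumes a single numeric feature and uses the min/max seen in training as the rule; the function name `out_of_range_rate` and the readings are hypothetical.

```python
def out_of_range_rate(train_values, prod_values):
    """Fraction of production values outside the [min, max] range seen in training."""
    lo, hi = min(train_values), max(train_values)
    outside = [v for v in prod_values if v < lo or v > hi]
    return len(outside) / len(prod_values)


train = [18.0, 19.5, 21.0, 22.5, 24.0]   # hypothetical training readings

# Catches gross errors such as unit changes...
print(out_of_range_rate(train, [20.0, 23.0, 75.2, 64.4]))
# ...but misses a shift that stays inside the training range.
print(out_of_range_rate(train, [23.0, 23.5, 24.0, 23.8]))
```

The second call reports no problem even though every production value is bunched at the top of the range, and a rule like this has to be written and maintained separately for every feature. Distribution-level comparisons avoid both issues.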

**Steps you can take to detect data drift:**

1. Use the [Jensen–Shannon (JS) divergence](https://en.wikipedia.org/wiki/Jensen%E2%80%93Shannon_divergence) to identify prediction drift: compare the distribution of real-time model outputs with the distribution of outputs on the training data.
2. Compare the data distributions of both upstream and downstream data to see the actual difference.
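Step 1 can be sketched in a few lines for a classifier whose outputs are class labels. This is a minimal sketch, assuming categorical predictions and base-2 logarithms (so the divergence lies between 0 and 1); the label names and counts are hypothetical, and the threshold at which to raise an alert is left to you.

```python
from collections import Counter
from math import log2


def distribution(labels, classes):
    """Relative frequency of each class, in the fixed order given by `classes`."""
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in classes]


def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


classes = ["approve", "deny"]
train_preds = ["approve"] * 80 + ["deny"] * 20   # hypothetical training outputs
live_preds = ["approve"] * 50 + ["deny"] * 50    # hypothetical production outputs

p = distribution(train_preds, classes)
q = distribution(live_preds, classes)
print(js_divergence(p, q))
```

Unlike plain KL divergence, JS divergence is symmetric and stays finite when one distribution assigns zero probability to a class, because each term is measured against the midpoint `m`. For continuous scores you would first bin the outputs into a shared histogram.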

You can also use the [Fiddler AI](https://www.fiddler.ai/) platform to monitor data drift in production.

### **Data drift vs concept drift**

Data is generated continuously, and because it is collected from multiple sources, the data itself changes over time. This change can stem from the dynamic nature of the data, or it can be caused by changes in the real world.

If the input distribution changes but the relationship between inputs and true labels does not (the probability of the model’s input, P(X), changes, but the probability of the target class given the input, P(Y | X), does not), the change is considered data drift.

Conversely, if the relationship between the input data and the target classes changes, that is, the probability of the target class given the input, P(Y | X), changes, then we are detecting the effect of concept drift. Both data drift and concept drift cause model decay, and each should be addressed separately.
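The distinction between P(X) and P(Y | X) can be illustrated with a toy simulation. This sketch assumes a single input feature, a hypothetical deterministic labelling rule `y = 1 if x > threshold`, and uses two crude proxies: the mean of X stands in for P(X), and the positive rate inside one fixed input band stands in for P(Y | X). None of these names or numbers come from the text above.

```python
def make_labels(xs, threshold):
    """Hypothetical ground-truth rule: label 1 when x exceeds `threshold`."""
    return [int(x > threshold) for x in xs]


def summarize(xs, ys, band=(0.55, 0.65)):
    """Return (mean of X, P(Y=1 | X in band)) as crude proxies for P(X) and P(Y|X)."""
    in_band = [y for x, y in zip(xs, ys) if band[0] <= x < band[1]]
    if not in_band:
        raise ValueError("no samples fall inside the band")
    return sum(xs) / len(xs), sum(in_band) / len(in_band)


grid = [i / 100 for i in range(100)]   # baseline inputs spread over [0, 1)
shifted = [x + 0.3 for x in grid]      # the same inputs moved to the right

baseline = summarize(grid, make_labels(grid, 0.5))
data_drift = summarize(shifted, make_labels(shifted, 0.5))   # P(X) moves, rule unchanged
concept_drift = summarize(grid, make_labels(grid, 0.7))      # P(X) unchanged, rule moves

for name, (x_mean, p_pos) in [("baseline", baseline),
                              ("data drift", data_drift),
                              ("concept drift", concept_drift)]:
    print(f"{name:>13}: mean(X)={x_mean:.3f}  P(Y=1 | X in band)={p_pos:.2f}")
```

Under data drift the input mean moves while the labelling inside the band is unchanged; under concept drift the inputs look identical to the baseline while the labels for the same inputs flip. This is why monitoring input distributions alone cannot detect concept drift: that requires fresh ground-truth labels.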

## References

- [Learning in the Presence of Concept Drift and Hidden Contexts](https://pdfs.semanticscholar.org/4ccc/553d7774748be878002381877d70932b2717.pdf)
- [The problem of concept drift: definitions and related work](http://www-ai.cs.uni-dortmund.de/LEHRE/FACHPROJEKT/SS12/paper/concept-drift/tsymbal2004.pdf)
- [Concept Drift Detection for Streaming Data](https://arxiv.org/abs/1504.01044)
- [Learning under Concept Drift: an Overview](https://arxiv.org/abs/1010.4784)
- [An overview of concept drift applications](http://www.win.tue.nl/~mpechen/publications/pubs/CD_applications15.pdf)
- [What Is Concept Drift and How to Measure It?](https://link.springer.com/chapter/10.1007/978-3-642-16438-5_17)
- [Understanding Concept Drift](https://arxiv.org/abs/1704.00362)
- [Concept drift on Wikipedia](https://en.wikipedia.org/wiki/Concept_drift)
- [Handling Concept Drift: Importance, Challenges and Solutions](https://ieeexplore.ieee.org/document/6042653/)
- [https://mlinproduction.com/model-retraining](https://mlinproduction.com/model-retraining/)
- [Retraining Models on New Data](https://docs.aws.amazon.com/machine-learning/latest/dg/retraining-models-on-new-data.html)
- [Should a machine learning model be retrained each time new observations are available?](https://www.quora.com/Should-a-machine-learning-model-be-retrained-each-time-new-observations-are-available)
- [Machine Learning and Automated Model Retraining with SageMaker](https://www.inawisdom.com/machine-learning/machine-learning-automated-model-retraining-sagemaker/)
- [A Gentle Introduction to Concept Drift in Machine Learning](https://machinelearningmastery.com/gentle-introduction-concept-drift-machine-learning/)
- [Lessons learned turning machine learning models into real products and services](https://www.oreilly.com/ideas/lessons-learned-turning-machine-learning-models-into-real-products-and-services)
- [What’s your ML Test Score? A rubric for ML production systems](https://storage.googleapis.com/pub-tools-public-publication-data/pdf/45742.pdf)
- [Machine Learning: The High-Interest Credit Card of Technical Debt](https://storage.googleapis.com/pub-tools-public-publication-data/pdf/43146.pdf)
- [Machine learning models get stale with time](https://neptune.ai/blog/retraining-model-during-deployment-continuous-training-continuous-testing)