Add `monitor()` method for monitoring model performance in production #179
There will be a new method. Example pseudo-code:

```python
def need_retrain(self, X, y):
    return self.is_drift(X, y) or self.performance_decrease(X, y)
```

Maybe there should also be a summary Markdown file created with the reasons why the model needs to be retrained.
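A runnable sketch of this pseudo-code, with hypothetical stand-ins for `is_drift` and `performance_decrease` (the `RetrainChecker` name, the mean-shift drift test, and the `score_fn` argument are all assumptions for illustration, not the project's actual internals):

```python
import numpy as np

class RetrainChecker:
    """Hypothetical helper illustrating the proposed need_retrain() logic."""

    def __init__(self, score_fn, train_means, baseline_score, decrease=0.1):
        self.score_fn = score_fn  # callable scoring the model on (X, y)
        self.train_means = np.asarray(train_means, dtype=float)
        self.baseline_score = baseline_score
        self.decrease = decrease

    def is_drift(self, X, y):
        # Naive drift test: a feature mean moved by more than 3x its training magnitude.
        new_means = np.asarray(X, dtype=float).mean(axis=0)
        return bool(np.any(np.abs(new_means - self.train_means) > 3 * np.abs(self.train_means)))

    def performance_decrease(self, X, y):
        # Relative performance change against the stored baseline score.
        new_score = self.score_fn(X, y)
        change = np.abs((self.baseline_score - new_score) / self.baseline_score)
        return bool(change > self.decrease)

    def need_retrain(self, X, y):
        return self.is_drift(X, y) or self.performance_decrease(X, y)
```

With a baseline score of 0.9, a new score of 0.8 is an 11% relative drop, so `need_retrain` would return `True` under the default 10% threshold.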
closed by mistake
OK, at the beginning I wanted to make this feature super sensitive to any changes in the input data. But in the end, I finished with a simple approach of just monitoring performance. I hope it will be enough: if there is a change in the data, then the performance of the AutoML predictions will decrease. There is a new method added:

```python
def need_retrain(self, X, y, sample_weight=None, decrease=0.1):
    """Decides about model retraining based on new data.

    Arguments:
        X (numpy.ndarray or pandas.DataFrame):
            New data.
        y (numpy.ndarray or pandas.Series):
            True labels for X.
        sample_weight (numpy.ndarray or pandas.Series):
            Sample weights.
        decrease (float): The ratio of change in the performance used as a
            threshold for the retraining decision. By default, it is set to
            `0.1`, which means that if the performance of AutoML decreases
            by 10% on new data, then there is a need to retrain. This value
            should be set depending on your project needs. Sometimes 10% is
            enough, but for some projects it can be even lower than 1%.

    Returns:
        boolean: Decides if there is a need to retrain the AutoML.
    """
```
The change is computed as `change = np.abs((old_score - new_score) / old_score)`.
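As a concrete check of this formula, here is a minimal standalone version (the function name and the example scores are illustrative only, not part of the actual API):

```python
import numpy as np

def need_retrain_check(old_score, new_score, decrease=0.1):
    # Relative performance change, as in the comment above.
    change = np.abs((old_score - new_score) / old_score)
    return bool(change > decrease)

# Baseline accuracy 0.90 vs. new accuracy 0.78: a 13.3% relative drop,
# above the default 10% threshold, so retraining is suggested.
print(need_retrain_check(0.90, 0.78))  # -> True
print(need_retrain_check(0.90, 0.88))  # -> False
```

Note that because of `np.abs`, a large performance *increase* would also cross the threshold, which may or may not be the desired behavior.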
The `AutoML` API should be extended with a `monitor()` method:

- `monitor()` should track the model performance on new data

I propose to have the following arguments in `monitor()`:

- `X` (new test data)
- `y` (new test data targets)
- `y_predicted` (predictions from the `AutoML`)

The `monitor()` should return a report about incidents in the new data, for example a warnings list with explanations of what the problem was.
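A minimal sketch of how such a `monitor()` could look, assuming accuracy as the metric and a stored baseline score (the `baseline_score` parameter, the NaN check, and the warning texts are assumptions for illustration, not a committed design):

```python
import numpy as np

def monitor(X, y, y_predicted, baseline_score, decrease=0.1):
    """Hypothetical monitor(): returns a list of warning strings
    describing incidents found in the new data."""
    warnings = []
    X_arr = np.asarray(X, dtype=float)
    # Incident 1: missing values in the new data.
    if np.isnan(X_arr).any():
        warnings.append("New data X contains missing values.")
    # Incident 2: performance dropped below the baseline by more than `decrease`.
    new_score = float(np.mean(np.asarray(y) == np.asarray(y_predicted)))
    drop = (baseline_score - new_score) / baseline_score
    if drop > decrease:
        warnings.append(
            f"Performance dropped by {drop:.1%} "
            f"(baseline {baseline_score}, new score {new_score:.3f})."
        )
    return warnings
```

An empty list would mean no incidents were found on the new data.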