
Add monitor() method for monitoring model performance in production #179

Closed
pplonski opened this issue Sep 10, 2020 · 3 comments

The AutoML API should be extended with a monitor() method:

  • the monitor() should track the model performance on new data
  • it should check the prediction distribution on new data and compare it with the distribution from training (out-of-fold predictions)
  • it should detect outliers in new data
  • it should detect data drift in new data

I propose to have the following arguments in monitor():

  • X (new test data)
  • y (new test data targets)
  • y_predicted (predictions from the AutoML)

The monitor() should return a report about incidents in the new data, for example a list of warnings with explanations of what the problem was.
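
A rough sketch of what such a monitor() could look like (purely illustrative, the method was not implemented in this form; the KS test, the 5% thresholds, and the self._oof_predictions / self._train_min / self._train_max attributes are assumptions, not part of the package):

from scipy import stats

def monitor(self, X, y, y_predicted):
    # Hypothetical sketch of the proposed monitor() method.
    warnings = []

    # Compare the distribution of new predictions with the out-of-fold
    # predictions stored at training time (assumed attribute).
    ks_stat, p_value = stats.ks_2samp(self._oof_predictions, y_predicted)
    if p_value < 0.05:
        warnings.append(
            f"Prediction distribution shifted (KS={ks_stat:.3f}, p={p_value:.4f})"
        )

    # Naive outlier / drift check: share of rows with feature values outside
    # the range observed during training (assumed attributes).
    out_of_range = ((X < self._train_min) | (X > self._train_max)).any(axis=1).mean()
    if out_of_range > 0.05:
        warnings.append(
            f"{out_of_range:.1%} of new rows have out-of-range feature values"
        )

    return warnings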

@pplonski pplonski added enhancement New feature or request help wanted Extra attention is needed labels Sep 10, 2020
@pplonski pplonski added this to the 0.9.0 milestone Feb 22, 2021

pplonski commented Feb 22, 2021

There will be a new method, need_retrain(). It will take new data as input and invoke two new methods:

  • is_drift() to check changes in new data
  • performance_decrease() to check the performance on new data

Example pseudo-code:

def need_retrain(self, X, y):
    return self.is_drift(X, y) or self.performance_decrease(X, y)

Maybe there should also be a summary Markdown file created with the reasons why the model needs to be retrained.
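
A minimal sketch of how these two helpers might look (hypothetical; the KS test, the attribute names self._X_train, self._best_model_score, self._eval_metric, and the thresholds are assumptions):

from scipy import stats
import numpy as np

def is_drift(self, X, y, p_threshold=0.05):
    # Hypothetical: per-feature two-sample KS test against the training data
    # kept in an assumed self._X_train attribute.
    for column in X.columns:
        _, p_value = stats.ks_2samp(self._X_train[column], X[column])
        if p_value < p_threshold:
            return True
    return False

def performance_decrease(self, X, y, decrease=0.1):
    # Hypothetical: relative change of the metric on new data versus the score
    # recorded at training time (assumed self._best_model_score attribute).
    new_score = self._eval_metric(y, self.predict(X))
    change = np.abs((self._best_model_score - new_score) / self._best_model_score)
    return change > decrease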

@pplonski pplonski added this to To do in mljar-supervised Feb 23, 2021
@pplonski pplonski moved this from To do to In progress in mljar-supervised Feb 23, 2021
@pplonski pplonski moved this from In progress to Done in mljar-supervised Feb 24, 2021
@pplonski pplonski moved this from Done to In progress in mljar-supervised Feb 24, 2021
pplonski commented:

closed by mistake

pplonski commented:

OK, at the beginning I wanted to make this feature super sensitive to any input data changes, but in the end I settled on a simple approach of just monitoring performance. I hope it will be enough. If there is a change in the data, then the performance of the AutoML predictions will decrease.

A new method has been added:

def need_retrain(self, X, y, sample_weight=None, decrease=0.1):
    """Decides about model retraining based on new data.

    Arguments:
        X (numpy.ndarray or pandas.DataFrame):
            New data.

        y (numpy.ndarray or pandas.Series):
            True labels for X.

        sample_weight (numpy.ndarray or pandas.Series):
            Sample weights.

        decrease (float): The ratio of change in performance used as a threshold for the retraining decision.
            By default, it is set to `0.1`, which means that if the performance of AutoML decreases by 10%
            on new data, then there is a need to retrain. This value should be set depending on your project needs.
            Sometimes 10% is enough, but for some projects it can be even lower than 1%.

    Returns:
        boolean: True if there is a need to retrain the AutoML.
    """

It works as follows:

  • a user calls need_retrain() with new X and y data
  • the metric score is computed on the new data
  • the performance of the best model is restored from its params.json file
  • if there is a decrease, we check whether it is larger than the decrease parameter

The change is computed as follows:

change = np.abs((old_score - new_score) / old_score)
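
A short usage sketch (assuming the package's usual from supervised import AutoML import path; the data variables and the retraining step are illustrative):

from supervised import AutoML

automl = AutoML()
automl.fit(X_train, y_train)

# ... later, with freshly collected, labeled production data ...
if automl.need_retrain(X_new, y_new, decrease=0.1):
    # Example: logloss went from 0.30 at training time to 0.36 on new data,
    # so change = |(0.30 - 0.36) / 0.30| = 0.2 > 0.1 and retraining is triggered.
    automl = AutoML()
    automl.fit(X_updated, y_updated)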

@pplonski pplonski moved this from In progress to Done in mljar-supervised Feb 24, 2021
@pplonski pplonski self-assigned this Feb 24, 2021