Skip training of constant metrics. #12212
Conversation
Detect dimensions whose values do not change, and skip them during training. This allows us to reduce the number of training operations by ~40-50%. Note that we never skip the very first training iteration: a dimension's value might change at any point in time, and we need a trained model in order to compute its anomaly score.
I'll test this today on some devml nodes. I was going to ask about extending this idea to flag when a dimension is too unique or lumpy, in some sense, to be worth retraining, but that would probably get too complicated for now, since we'd have to define some measure of uniqueness. I like the simple implementation here: it has hardly any cost impact, whereas maintaining some more complicated measure to decide whether to retrain would just add complexity at this point. So I think this is a great, easy initial optimization that removes a good chunk of unnecessary computation on the average node, which will always have a large subset of dimensions holding a constant value. In the future we may define some concept of upfront "dimension data validation" checks and use that to decide when to retrain, but for now it makes total sense to start simple like this.
This does indeed look like it saves a lot of netdata CPU usage on retraining, on a node that's running default settings and not doing much. How well this generalizes will of course depend on each specific node, but it can only ever help, and potentially it will help quite a lot. I'll leave my dev node running for a few more hours and follow up on it tomorrow, but so far all looks good to me.
Thanks for the feedback @andrewm4894. Out of curiosity, could you share a screenshot from
major improvement in terms of CPU overhead!
@vkalintiris devml4 is the one with this branch - seems much lower than the others too:
Great work! You rock @vkalintiris!
Do we have a dedicated chart for the training thread's CPU consumption? If not, I think we should add one. The code has a lot of examples of similar threads.
No, we have one for total time spent on training +
I'll create a PR from a local branch now that #12083 got merged. |
Test Plan
Additional Information
Resolves #12180