New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Send ML feature information with UpdateNodeInfo. #11913
Conversation
We achieve this by adding the `ml_{capable,enabled}` fields in `system_info`. When streaming, these fields allow a parent to understand if the child has ML and if it runs ML for itself. The UpdateNodeInfo includes this information about a child, plus a boolean that is set to true when the parent runs ML for the child.
@vkalintiris the following warning is not associated with this PR, but it has relationship with ML: ml/ml-dummy.c: In function 'ml_get_host_info':
ml/ml-dummy.c:13:50: warning: control reaches end of non-void function [-Wreturn-type]
13 | char *ml_get_host_info(RRDHOST *RH) { (void) RH; } |
@vkalintiris last commit does not allow me to compile with the following errors: aclk/schema-wrappers/node_info.cc: In function 'int generate_node_info(nodeinstance::info::v1::NodeInfo*, aclk_node_info*)':
aclk/schema-wrappers/node_info.cc:65:29: error: 'MachineLearningInfo' is not a member of 'nodeinstance::info::v1'
65 | nodeinstance::info::v1::MachineLearningInfo *ml_info = info->mutable_ml_info();
| ^~~~~~~~~~~~~~~~~~~
aclk/schema-wrappers/node_info.cc:65:50: error: 'ml_info' was not declared in this scope; did you mean 'mc_info'?
65 | nodeinstance::info::v1::MachineLearningInfo *ml_info = info->mutable_ml_info();
| ^~~~~~~
| mc_info
aclk/schema-wrappers/node_info.cc:65:66: error: 'class nodeinstance::info::v1::NodeInfo' has no member named 'mutable_ml_info'
65 | nodeinstance::info::v1::MachineLearningInfo *ml_info = info->mutable_ml_info();
| ^~~~~~~~~~~~~~~
aclk/schema-wrappers/node_info.cc: In function 'char* generate_update_node_info_message(size_t*, update_node_info*)':
aclk/schema-wrappers/node_info.cc:93:29: error: 'MachineLearningInfo' is not a member of 'nodeinstance::info::v1'
93 | nodeinstance::info::v1::MachineLearningInfo *ml_info = msg.mutable_ml_info();
| ^~~~~~~~~~~~~~~~~~~
aclk/schema-wrappers/node_info.cc:93:50: error: 'ml_info' was not declared in this scope; did you mean 'mc_info'?
93 | nodeinstance::info::v1::MachineLearningInfo *ml_info = msg.mutable_ml_info();
| ^~~~~~~
| mc_info
aclk/schema-wrappers/node_info.cc:93:64: error: 'class nodeinstance::info::v1::UpdateNodeInfo' has no member named 'mutable_ml_info'
93 | nodeinstance::info::v1::MachineLearningInfo *ml_info = msg.mutable_ml_info(); |
@thiagoftsm this PR updates the aclk-schema submodule. You need to make sure it points to 9fa80e397be191b56320b0cfa7da97b249c3b3eb (ie. netdata/aclk-schemas#28). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything worked as expected, I did not observe any problem with stream during two hours. LGTM!
Hi @vkalintiris ! First test, with a single (no children) agent, ML enabled, connected to staging with new architecture:
|
Same node with ML disabled:
|
This is the UpdateNodeInfo message sent by the parent (winterland) for it's child (6am) to the cloud. Both are compiled with ML but not enabled:
|
(Sorry for the spamming). Parent enabled ml, child not. Parent's UpdateNodeInfo for itself:
Parent's UpdateNodeInfo for it's child:
|
Both parent and child enabled ML, looks good:
|
Child compiled with
|
Same with the last check, if the child is from current master and not from this PR. I think it looks good! @vkalintiris anything else you think we should check? |
Nothing I can think of. |
Summary
We achieve this by adding the
ml_{capable,enabled}
fields insystem_info
. When streaming, these fields allow a parent to understand ifthe child has ML and if it runs ML for itself.
A newly added protobuf message contains the aforementioned field. Inside
NodeInfo
the information is an exact copy ofsystem_info
. InsideUpdateNodeInfo the
ml_capable
field reflects the ability of the localhostto run ML, and the
ml_enabled
field denotes the training status of thespecified node instance.
Component Name
aclk
Test Plan
TBD
Additional Information
Setting the
ml_{capable,enabled}
fields insiderrdhost_create
is a little bitawkward. The reason we have to do this is because there's an ML configuration
option that allows a user to filter by hostname the nodes to be trained. This means
that we need to create/initialize the
RRDHOST
object first and then check if theML component will train it based on the configuration options of the user.