Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update ML-related charts #12574

Merged
merged 5 commits into from Apr 4, 2022
Merged

Update ML-related charts #12574

merged 5 commits into from Apr 4, 2022

Conversation

vkalintiris
Copy link
Contributor

Summary
  • move training/prediction stats charts under the netdata section,
  • update type/id/name for ml charts,
  • assign aforementioned charts to child host instance,
  • do not stream ML charts by default
Test Plan

Inspect parent/child dashboards with build_external/scenarios/parent-child
with ML enabled and with/without streaming of ML charts.

Additional Information

Resolves #12549

@thiagoftsm
Copy link
Contributor

@vkalintiris , please, rebase your PR to fix the conflict.

@andrewm4894 I tested the PR and I think it is reaching the goal you propose in initial discussion, after the rebase, can you take a look to confirm?

I am waiting for rebase and confirmation to retest and approve. Thank you guys!

@andrewm4894
Copy link
Contributor

testing this today. will update when i have nodes created and available in here: https://staging.netdata.cloud/spaces/mlstress/rooms/general/overview

@k-manolis is helping me adapt the mlstress stuff to just run off this branch.

Copy link
Contributor

@andrewm4894 andrewm4894 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is working as expected in this room https://staging.netdata.cloud/spaces/mlstress/rooms/general/overview

anomaly advisor working for all node instances even though the parent is running the ml for all thress children.

Copy link
Contributor

@thiagoftsm thiagoftsm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that @andrewm4894 confirmed my initial impression while I was testing, LGTM!

@andrewm4894
Copy link
Contributor

andrewm4894 commented Apr 3, 2022

@vkalintiris one question. I have the following set up:

# parent will run ml for itself and child 1,2.
# child 0 will run its own ml at the edge.
# child 1 will run its own ml at the edge, even though parent will also run ml for it.
# child 2 will not run ml at the edge, it will be run in the parent.

# parent-ml-ml-stress-0
# run ml on all hosts apart from child-ml-ml-stress-0
[ml]
        enabled = yes
        minimum num samples to train = 900
        train every = 900
        charts to skip from training = !*
        hosts to skip from training = child-ml-ml-stress-0

# child-ml-ml-stress-0
# run ml on child-ml-ml-stress-0
[ml]
        enabled = yes
        minimum num samples to train = 900
        train every = 900

# child-ml-ml-stress-1
# run ml on child-ml-ml-stress-1
[ml]
        enabled = yes
        minimum num samples to train = 900
        train every = 900

# child-ml-ml-stress-2
# don't run ml on child-ml-ml-stress-2
[ml]
        enabled = no

I don't see any anomaly detection charts on parent-ml-ml-stress-0/host/child-ml-ml-stress-0, i think because they are not streamed, i was thinking they would still be streamed but just given the new chart naming conventions would never clash.

I also don't see child0 on anomaly advisor either, this may be perhaps because only the parent is claimed to this room, ill need to double check.

So i think this all is working apart from that we do still want to stream the anomaly detection charts such that i would see the anomaly detection charts for child-ml-ml-stress-0 at parent-ml-ml-stress-0/host/child-ml-ml-stress-0. And then at parent-ml-ml-stress-0/host/child-ml-ml-stress-1 i would expect to see two sets of anomaly detection charts - the ones done at the edge by child-ml-ml-stress-1 itself and also the ones generated by the parent on parent-ml-ml-stress-0/host/child-ml-ml-stress-1. I am expecting the both sets of anomaly detection charts at parent-ml-ml-stress-0/host/child-ml-ml-stress-1 will have an appropriate suffix so they dont clash and will have the same context name such that the anomaly advisor will just consume them. child-ml-ml-stress-1 is a silly configuration where i am running the ml on the parent and the child so i wanted to see if cloud would just handle that as its more important that child-ml-ml-stress-0 work for the anomaly advisor.

image

fyi you can see all this in here:

https://staging.netdata.cloud/spaces/mlstress/rooms/general

@andrewm4894 andrewm4894 self-requested a review April 3, 2022 11:35
Copy link
Contributor

@andrewm4894 andrewm4894 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my last comment, just wanted to undo my approve.

@vkalintiris
Copy link
Contributor Author

vkalintiris commented Apr 4, 2022

@andrewm4894 see #12477 (reply in thread)

IIRC, @ktsaou did not want anomaly_detection.* charts to be streamed by default. His suggestion at the time was to exclude them in the send charts matching option in the stream.conf file.

Instead of modifying the send charts matching option, I decided to add a new one, ie. stream anomaly detection charts = yes|no under the [ml] section in the main netdata.conf file. I opted for that approach, because users that already have a parent/child setup, usually have a modified stream.conf file whose contents are not overwritten when the agent is updated.

IOW, you can explicitly enable streaming of anomaly detection charts by setting the aforementioned option to yes. It's trivial for me to change the default behaviour, if this not the desirable configuration we want our users to have.

@andrewm4894
Copy link
Contributor

@vkalintiris cool, I'll update my config to test this. Will report back here once it's all trained.

@andrewm4894
Copy link
Contributor

@vkalintiris yep - when i add stream anomaly detection charts = yes to child-ml-ml-stress-0 and child-ml-ml-stress-1 i see what i was expecting and the anomaly advisor works as expected.

So we are all good here. I feel like the default of stream anomaly detection charts = no should be yes. I dont see the value the default of no provides. But we can discuss this separately with @ktsaou i think.

# parent will run ml for itself and child 1,2.
# child 0 will run its own ml at the edge.
# child 1 will run its own ml at the edge, even though parent will also run ml for it.
# child 2 will not run ml at the edge, it will be run in the parent.

# parent-ml-ml-stress-0
# run ml on all hosts apart from child-ml-ml-stress-0
[ml]
        enabled = yes
        minimum num samples to train = 900
        train every = 900
        charts to skip from training = !*
        hosts to skip from training = child-ml-ml-stress-0

# child-ml-ml-stress-0
# run ml on child-ml-ml-stress-0
[ml]
        enabled = yes
        minimum num samples to train = 900
        train every = 900
        stream anomaly detection charts = yes

# child-ml-ml-stress-1
# run ml on child-ml-ml-stress-1
[ml]
        enabled = yes
        minimum num samples to train = 900
        train every = 900
        stream anomaly detection charts = yes

# child-ml-ml-stress-2
# don't run ml on child-ml-ml-stress-2
[ml]
        enabled = no

@andrewm4894 andrewm4894 self-requested a review April 4, 2022 11:11
Copy link
Contributor

@andrewm4894 andrewm4894 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@andrewm4894
Copy link
Contributor

andrewm4894 commented Apr 4, 2022

@juacker there may be a little bit of backend work we need then to have the correct one of there read "@parent" instead of "@child"

but i think this is one we can get to once this PR is merged into agent.

(or maybe i'm wrong in assuming this is a BE thing? could just be FE?)

image

@vkalintiris vkalintiris merged commit 9c834ef into netdata:master Apr 4, 2022
@juacker
Copy link
Contributor

juacker commented Apr 4, 2022

@andrewm4894 Yep I think the name after @ is the name associated to the node ID, where this data comes from, so in this case both are from child-ml-ml-stress-1. Maybe for these contexts it makes more sense to take the name of the node from the chart name (anomaly_detection.dimensions_on_xxxx-xxx-xxxx-xxxx-xxxxxxxx), to show the host that generated them.

@andrewm4894
Copy link
Contributor

@juacker yep i think that makes sense.

@ktsaou
Copy link
Member

ktsaou commented Apr 4, 2022

I also like it set to yes by default

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/database area/ml Machine Learning Related Issues area/streaming
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support parent/child setups for ML-related charts.
5 participants