mgr: update default rook_cluster of ServiceMonitor #12293
Conversation
Force-pushed from 7267430 to e487ea3.
This pull request has merge conflicts that must be resolved before it can be merged. @kerryeon please rebase it. https://rook.io/docs/rook/latest/Contributing/development-flow/#updating-your-fork
Force-pushed from 67ac2d8 to dd3a39c.
pkg/operator/ceph/cluster/mgr/mgr.go (Outdated)
```diff
@@ -519,6 +519,7 @@ func (c *Cluster) EnableServiceMonitor() error {
 		return errors.Wrapf(err, "failed to set owner reference to service monitor %q", serviceMonitor.Name)
 	}
 	serviceMonitor.Spec.NamespaceSelector.MatchNames = []string{c.clusterInfo.Namespace}
+	serviceMonitor.Spec.Selector.MatchLabels["rook_cluster"] = c.clusterInfo.Namespace
```
I don't see why the helm configuration falls back to the default value. Is the `rook-ceph` default coded somewhere?
If the installation has the problem, I think the fix should be made in the helm charts.
When it comes to creating a `ServiceMonitor`, the function below is the one concerned.
rook/pkg/operator/ceph/cluster/mgr/mgr.go
Lines 505 to 529 in 4af3f37

```go
func (c *Cluster) EnableServiceMonitor() error {
	serviceMonitor, err := k8sutil.GetServiceMonitor(path.Join(monitoringPath, serviceMonitorFile))
	if err != nil {
		return errors.Wrap(err, "service monitor could not be enabled")
	}
	serviceMonitor.SetName(AppName)
	serviceMonitor.SetNamespace(c.clusterInfo.Namespace)
	cephv1.GetMonitoringLabels(c.spec.Labels).OverwriteApplyToObjectMeta(&serviceMonitor.ObjectMeta)
	if c.spec.External.Enable {
		serviceMonitor.Spec.Endpoints[0].Port = controller.ServiceExternalMetricName
	}
	err = c.clusterInfo.OwnerInfo.SetControllerReference(serviceMonitor)
	if err != nil {
		return errors.Wrapf(err, "failed to set owner reference to service monitor %q", serviceMonitor.Name)
	}
	serviceMonitor.Spec.NamespaceSelector.MatchNames = []string{c.clusterInfo.Namespace}
	applyMonitoringLabels(c, serviceMonitor)
	if _, err = k8sutil.CreateOrUpdateServiceMonitor(c.clusterInfo.Context, serviceMonitor); err != nil {
		return errors.Wrap(err, "service monitor could not be enabled")
	}
	return nil
}
```
At line 506, the function that imports the `ServiceMonitor` template is `k8sutil.GetServiceMonitor`, shown below.
rook/pkg/operator/k8sutil/prometheus.go
Lines 48 to 59 in 4af3f37

```go
func GetServiceMonitor(filePath string) (*monitoringv1.ServiceMonitor, error) {
	file, err := os.ReadFile(filepath.Clean(filePath))
	if err != nil {
		return nil, fmt.Errorf("servicemonitor file could not be fetched. %v", err)
	}
	var servicemonitor monitoringv1.ServiceMonitor
	err = k8sYAML.NewYAMLOrJSONDecoder(bytes.NewBufferString(string(file)), 1000).Decode(&servicemonitor)
	if err != nil {
		return nil, fmt.Errorf("servicemonitor could not be decoded. %v", err)
	}
	return &servicemonitor, nil
}
```
However, as you can see, this code is simply responsible for parsing the file located at `filePath`. Again, at line 506, `filePath` is defined as `path.Join(monitoringPath, serviceMonitorFile)`.
rook/pkg/operator/ceph/cluster/mgr/mgr.go
Lines 57 to 58 in 4af3f37

```go
monitoringPath     = "/etc/ceph-monitoring/"
serviceMonitorFile = "service-monitor.yaml"
```
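As a quick sanity check, the joined path resolves as expected; this is a tiny standalone snippet of mine, not repo code:

```go
package main

import (
	"fmt"
	"path"
)

func main() {
	// Same constants as in mgr.go; path.Join normalizes the trailing slash.
	monitoringPath := "/etc/ceph-monitoring/"
	serviceMonitorFile := "service-monitor.yaml"
	fmt.Println(path.Join(monitoringPath, serviceMonitorFile))
	// Output: /etc/ceph-monitoring/service-monitor.yaml
}
```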
This file location is hard-coded in the Dockerfile, as follows.

Line 30 in 4af3f37

```dockerfile
COPY ceph-monitoring /etc/ceph-monitoring
```
This directory is prepared by the `Makefile`, as follows.

Line 32 in 4af3f37

```makefile
MANIFESTS_DIR=../../deploy/examples
```

Line 75 in 4af3f37

```makefile
@cp -r $(MANIFESTS_DIR)/monitoring $(TEMP)/ceph-monitoring
```
So, `monitoringPath` is `./deploy/examples/monitoring`, `serviceMonitorFile` is `service-monitor.yaml`, and the original file is therefore `./deploy/examples/monitoring/service-monitor.yaml`, as follows.
rook/deploy/examples/monitoring/service-monitor.yaml
Lines 1 to 19 in 4af3f37

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: rook-ceph-mgr
  namespace: rook-ceph
  labels:
    team: rook
spec:
  namespaceSelector:
    matchNames:
      - rook-ceph
  selector:
    matchLabels:
      app: rook-ceph-mgr
      rook_cluster: rook-ceph
  endpoints:
    - port: http-metrics
      path: /metrics
      interval: 5s
```
Yeah, the `rook_cluster` is hard-coded.
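The one-line change in this PR overrides that parsed value in code. Here is a minimal sketch of the idea as a helper function; it is hypothetical, not repo code, and the nil-map guard is my own defensive addition (the shipped template always sets `matchLabels`, so the actual PR does not need it):

```go
// setClusterSelector rewrites the template's hard-coded rook_cluster label
// so the selector follows the cluster's namespace.
func setClusterSelector(sm *monitoringv1.ServiceMonitor, namespace string) {
	if sm.Spec.Selector.MatchLabels == nil {
		// Defensive: only needed if a custom template omits matchLabels.
		sm.Spec.Selector.MatchLabels = map[string]string{}
	}
	sm.Spec.Selector.MatchLabels["rook_cluster"] = namespace
}
```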
Thanks for the great explanation. I feel that what you did is correct, but this is something we need to look into more deeply. What I think is that we should include the YAML directly in the code, so we can have a customizable namespace, instead of having a separate YAML file.
Great! I also see no reason why this YAML template shouldn't be deployed as part of the `rook-ceph-cluster` helm chart.
However, to accomplish this, I believe the following preparations are required:

- For now, the `ServiceMonitor` object is dynamically deployed according to the CephCluster CR. If we want to deploy this template in a declarative way via helm or something else, I think we should remove the relevant functionality from the CephCluster CR and announce it in a future version release.
- If we can statically deploy the `ServiceMonitor` object, creating it via helm will be a breeze.

Luckily, I have some free time this week, so let me cooperate with you to solve the problem in the suggested way :)
We need to think about a design which satisfies both of our needs, the Helm installation and the non-Helm installation. So I was thinking to keep the code as it is, but include the YAML file template in the code, with customizable parameters.
So the design will be: instead of pointing to the specific file, we should directly embed that YAML in the code with customizable values.
@travisn @BlaineEXE @avanthakkar ^^ for more suggestions
OK, so `./deploy/examples/monitoring/service-monitor.yaml` => `func CreateServiceMonitorTemplate(...)`
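For illustration, such a builder could look roughly like the sketch below. The signature, labels, and defaults mirror the YAML template above, but they are my assumptions, not the merged implementation:

```go
package k8sutil

import (
	monitoringv1 "github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// CreateServiceMonitorTemplate builds the ServiceMonitor in code so the
// namespace-dependent fields are parameters instead of hard-coded YAML values.
func CreateServiceMonitorTemplate(name, namespace string) *monitoringv1.ServiceMonitor {
	return &monitoringv1.ServiceMonitor{
		ObjectMeta: metav1.ObjectMeta{
			Name:      name,
			Namespace: namespace,
			Labels:    map[string]string{"team": "rook"},
		},
		Spec: monitoringv1.ServiceMonitorSpec{
			NamespaceSelector: monitoringv1.NamespaceSelector{
				MatchNames: []string{namespace},
			},
			Selector: metav1.LabelSelector{
				MatchLabels: map[string]string{
					"app": "rook-ceph-mgr",
					// Follows the namespace instead of the hard-coded "rook-ceph".
					"rook_cluster": namespace,
				},
			},
			Endpoints: []monitoringv1.Endpoint{{
				Port:     "http-metrics",
				Path:     "/metrics",
				Interval: "5s",
			}},
		},
	}
}
```

With the namespace as a parameter, both the Helm and non-Helm paths could call the same builder.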
Sorry for the long talk. I modified the code as suggested!
Let's squash the commits to a single commit, thanks. Otherwise, just a tiny suggestion:
```diff
 cephv1.GetCephExporterLabels(cephCluster.Spec.Labels).OverwriteApplyToObjectMeta(&serviceMonitor.ObjectMeta)

-	err = controllerutil.SetControllerReference(&cephCluster, serviceMonitor, scheme)
+	var err = controllerutil.SetControllerReference(&cephCluster, serviceMonitor, scheme)
```
Suggested change:

```diff
-	var err = controllerutil.SetControllerReference(&cephCluster, serviceMonitor, scheme)
+	err := controllerutil.SetControllerReference(&cephCluster, serviceMonitor, scheme)
```
Confirmed!
When deploying a cluster with Helm Chart, if the namespace is not the default value `rook-ceph`, and if the monitoring feature is enabled, then the generated ServiceMonitor's `rook_cluster` selector now follows the namespace, not the hard-coded value `rook-ceph`. Signed-off-by: Ho Kim <ho.kim@ulagbulag.io>
Force-pushed from 57b81eb to 42caef1.
```go
	},
},
Selector: metav1.LabelSelector{
	MatchLabels: map[string]string{
```
`mgr_role: active` is one of the match labels.
But the service `rook-ceph-mgr` itself doesn't have the label `mgr_role`. As far as I know, the label `mgr_role` is meant for the deployments, and the service already has a selector with `mgr_role`. So we can ignore it here, because the target service `rook-ceph-mgr` already selects the active mgr.
To simply prove this: if you deploy the `ServiceMonitor` with the `mgr_role` selector enabled, you will see that metrics are not collected.
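To make the mismatch concrete without a live cluster, here is a small self-contained sketch; the label sets are reconstructed from this discussion, not copied from the repo:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)

func main() {
	// The rook-ceph-mgr Service as described above: mgr_role sits in the
	// pod selector, not in the Service's own labels.
	svc := corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Name:   "rook-ceph-mgr",
			Labels: map[string]string{"app": "rook-ceph-mgr", "rook_cluster": "rook-ceph"},
		},
		Spec: corev1.ServiceSpec{
			Selector: map[string]string{"app": "rook-ceph-mgr", "mgr_role": "active"},
		},
	}

	// A ServiceMonitor selects Services by their *labels*; requiring
	// mgr_role there matches nothing, so no metrics would be scraped.
	withMgrRole := labels.SelectorFromSet(labels.Set{"app": "rook-ceph-mgr", "mgr_role": "active"})
	withoutMgrRole := labels.SelectorFromSet(labels.Set{"app": "rook-ceph-mgr"})
	fmt.Println(withMgrRole.Matches(labels.Set(svc.Labels)))    // false
	fmt.Println(withoutMgrRole.Matches(labels.Set(svc.Labels))) // true
}
```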
mgr: update default rook_cluster of ServiceMonitor (backport #12293)
When deploying a cluster with Helm Chart, if the namespace is not the default value `rook-ceph`, and if the monitoring feature is enabled, then the generated ServiceMonitor's `rook_cluster` selector now follows the namespace, not the hard-coded value `rook-ceph`.

Example Scenario:

- Deploy a cluster in the custom namespace `my-rook-ceph` with `monitoring` enabled.
- Inspect the generated selector: `kubectl -n csi-rook-ceph get servicemonitor rook-ceph-mgr -o jsonpath --template '{.spec.selector.matchLabels.rook_cluster}'`
- Without this change, the output is the hard-coded `rook-ceph`; with it, the output follows the namespace: `my-rook-ceph`.