
mgr: update default rook_cluster of ServiceMonitor #12293

Merged

Conversation


@HoKim98 HoKim98 commented May 27, 2023

When deploying a cluster with Helm Chart, if the namespace is not the default value rook-ceph, and if the monitoring feature is enabled, then the generated ServiceMonitor's rook_cluster selector now follows the namespace, not the hard-coded value rook-ceph.

Example Scenario

  1. Deploy a rook-ceph cluster (rook-ceph, rook-ceph-cluster) with the Helm chart:
    • modifying the namespace to my-rook-ceph
    • enabling the monitoring feature
  2. Run kubectl -n my-rook-ceph get servicemonitor rook-ceph-mgr -o jsonpath --template '{.spec.selector.matchLabels.rook_cluster}'
  3. Before the patch, the command outputs rook-ceph.
    • After the patch, it outputs my-rook-ceph.

Description of your changes:

Which issue is resolved by this Pull Request:
Resolves #

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Skip Tests for Docs: If this is only a documentation change, add the label skip-ci on the PR.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

@HoKim98 HoKim98 force-pushed the fix-rook_cluster-bug-in-service-monitor branch from 7267430 to e487ea3 on May 27, 2023 18:22

mergify bot commented May 27, 2023

This pull request has merge conflicts that must be resolved before it can be merged. @kerryeon please rebase it. https://rook.io/docs/rook/latest/Contributing/development-flow/#updating-your-fork

@HoKim98 HoKim98 force-pushed the fix-rook_cluster-bug-in-service-monitor branch 2 times, most recently from 67ac2d8 to dd3a39c on May 27, 2023 18:26
@@ -519,6 +519,7 @@ func (c *Cluster) EnableServiceMonitor() error {
return errors.Wrapf(err, "failed to set owner reference to service monitor %q", serviceMonitor.Name)
}
serviceMonitor.Spec.NamespaceSelector.MatchNames = []string{c.clusterInfo.Namespace}
serviceMonitor.Spec.Selector.MatchLabels["rook_cluster"] = c.clusterInfo.Namespace
Member
I don't see why the Helm configuration falls back to the default value.

Is the rook-ceph default hard-coded somewhere?

If the installation has this problem, I think the fix should be made in the Helm charts.

Contributor Author

When it comes to creating a ServiceMonitor, the function below is involved.

func (c *Cluster) EnableServiceMonitor() error {
    serviceMonitor, err := k8sutil.GetServiceMonitor(path.Join(monitoringPath, serviceMonitorFile))
    if err != nil {
        return errors.Wrap(err, "service monitor could not be enabled")
    }
    serviceMonitor.SetName(AppName)
    serviceMonitor.SetNamespace(c.clusterInfo.Namespace)
    cephv1.GetMonitoringLabels(c.spec.Labels).OverwriteApplyToObjectMeta(&serviceMonitor.ObjectMeta)
    if c.spec.External.Enable {
        serviceMonitor.Spec.Endpoints[0].Port = controller.ServiceExternalMetricName
    }
    err = c.clusterInfo.OwnerInfo.SetControllerReference(serviceMonitor)
    if err != nil {
        return errors.Wrapf(err, "failed to set owner reference to service monitor %q", serviceMonitor.Name)
    }
    serviceMonitor.Spec.NamespaceSelector.MatchNames = []string{c.clusterInfo.Namespace}
    applyMonitoringLabels(c, serviceMonitor)
    if _, err = k8sutil.CreateOrUpdateServiceMonitor(c.clusterInfo.Context, serviceMonitor); err != nil {
        return errors.Wrap(err, "service monitor could not be enabled")
    }
    return nil
}

At line 506, the ServiceMonitor template is imported by k8sutil.GetServiceMonitor, which is defined as follows.

func GetServiceMonitor(filePath string) (*monitoringv1.ServiceMonitor, error) {
    file, err := os.ReadFile(filepath.Clean(filePath))
    if err != nil {
        return nil, fmt.Errorf("servicemonitor file could not be fetched. %v", err)
    }
    var servicemonitor monitoringv1.ServiceMonitor
    err = k8sYAML.NewYAMLOrJSONDecoder(bytes.NewBufferString(string(file)), 1000).Decode(&servicemonitor)
    if err != nil {
        return nil, fmt.Errorf("servicemonitor could not be decoded. %v", err)
    }
    return &servicemonitor, nil
}

However, as you can see, this code simply parses the file located at the given filePath.

Again, at line 506, the file path is defined as path.Join(monitoringPath, serviceMonitorFile).

monitoringPath = "/etc/ceph-monitoring/"
serviceMonitorFile = "service-monitor.yaml"

This file location is hard-coded in the Dockerfile, as follows.

COPY ceph-monitoring /etc/ceph-monitoring

This directory is prepared by the Makefile, as follows.

MANIFESTS_DIR=../../deploy/examples

@cp -r $(MANIFESTS_DIR)/monitoring $(TEMP)/ceph-monitoring

So monitoringPath corresponds to ./deploy/examples/monitoring in the repository, serviceMonitorFile is service-monitor.yaml, and the original file is ./deploy/examples/monitoring/service-monitor.yaml, as follows.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: rook-ceph-mgr
  namespace: rook-ceph
  labels:
    team: rook
spec:
  namespaceSelector:
    matchNames:
      - rook-ceph
  selector:
    matchLabels:
      app: rook-ceph-mgr
      rook_cluster: rook-ceph
  endpoints:
    - port: http-metrics
      path: /metrics
      interval: 5s

So yes, the rook_cluster selector is hard-coded.

Member

Thanks for the great explanation.

I feel what you did is correct, but this is something we need to look into more deeply. I think we should include the YAML in the code directly, so we can have a customizable namespace instead of a separate YAML file.

Contributor Author
@HoKim98 HoKim98 May 30, 2023

Great! I also see no reason why this YAML template shouldn't be shipped as part of the rook-ceph-cluster Helm chart.

However, to accomplish this, I believe the following preparations are required:

  1. For now, the ServiceMonitor object is deployed dynamically according to the CephCluster CR. If we want to deploy this template declaratively via Helm or something else, I think we should remove the relevant functionality from the CephCluster CR and announce it in a future release.
  2. Once the ServiceMonitor object can be deployed statically, creating it via Helm will be a breeze.

Luckily, I have some free time this week, so let me work with you to solve the problem in the suggested way :)

Member

We need to think about a design which satisfies both of our needs: the Helm installation and the non-Helm installation.

So I was thinking to keep the code as it is, but include the YAML template in the code with customizable parameters.

So the design will be: instead of pointing to a specific file, we directly embed that YAML in the code with customizable values.

@travisn @BlaineEXE @avanthakkar ^^ for more suggestions

Contributor Author
@HoKim98 HoKim98 May 30, 2023

OK, so ./deploy/examples/monitoring/service-monitor.yaml => func CreateServiceMonitorTemplate(...)
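
For illustration, here is a minimal sketch of what such a function could look like, assuming the prometheus-operator monitoringv1 types. The package name, function signature, and default labels below are assumptions for the sketch, not the merged implementation.

package mgr

import (
    monitoringv1 "github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// CreateServiceMonitorTemplate builds in code what was previously read from
// deploy/examples/monitoring/service-monitor.yaml, substituting the cluster
// namespace into the namespace-dependent fields (metadata.namespace, the
// namespaceSelector, and the rook_cluster selector label).
func CreateServiceMonitorTemplate(name, namespace string) *monitoringv1.ServiceMonitor {
    return &monitoringv1.ServiceMonitor{
        ObjectMeta: metav1.ObjectMeta{
            Name:      name,
            Namespace: namespace,
            Labels:    map[string]string{"team": "rook"},
        },
        Spec: monitoringv1.ServiceMonitorSpec{
            NamespaceSelector: monitoringv1.NamespaceSelector{
                MatchNames: []string{namespace},
            },
            Selector: metav1.LabelSelector{
                MatchLabels: map[string]string{
                    "app":          "rook-ceph-mgr",
                    "rook_cluster": namespace, // follows the cluster namespace instead of a hard-coded "rook-ceph"
                },
            },
            Endpoints: []monitoringv1.Endpoint{{
                Port:     "http-metrics",
                Path:     "/metrics",
                Interval: "5s",
            }},
        },
    }
}

The caller could then set the owner reference and apply the result with CreateOrUpdateServiceMonitor, as the existing EnableServiceMonitor code already does.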

Contributor Author

Sorry for the long talk. I modified the code as suggested!

Member
@travisn travisn left a comment

Let's squash the commits into a single commit, thanks. Otherwise, just a tiny suggestion:

cephv1.GetCephExporterLabels(cephCluster.Spec.Labels).OverwriteApplyToObjectMeta(&serviceMonitor.ObjectMeta)

err = controllerutil.SetControllerReference(&cephCluster, serviceMonitor, scheme)
var err = controllerutil.SetControllerReference(&cephCluster, serviceMonitor, scheme)
Member

Suggested change
var err = controllerutil.SetControllerReference(&cephCluster, serviceMonitor, scheme)
err := controllerutil.SetControllerReference(&cephCluster, serviceMonitor, scheme)

Contributor Author

Confirmed!

When deploying a cluster with Helm Chart, if the namespace is not the default value `rook-ceph`, and if the monitoring feature is enabled, then the generated ServiceMonitor's `rook_cluster` selector now follows the namespace, not the hard-coded value `rook-ceph`.

Signed-off-by: Ho Kim <ho.kim@ulagbulag.io>
@HoKim98 HoKim98 force-pushed the fix-rook_cluster-bug-in-service-monitor branch from 57b81eb to 42caef1 on May 30, 2023 15:00
},
},
Selector: metav1.LabelSelector{
MatchLabels: map[string]string{
Member

mgr_role: active is one of the match labels.

Contributor Author
@HoKim98 HoKim98 May 30, 2023

But the service rook-ceph-mgr itself doesn't have the label mgr_role.

As far as I know, the mgr_role label is meant for the deployments, and the service already has a selector with mgr_role.

So we can ignore it here, because the target service rook-ceph-mgr already points at the active mgr.

To prove this simply: if you deploy the ServiceMonitor with the mgr_role selector enabled, you will see that metrics are not collected.
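
To illustrate the reasoning (the label values below are assumptions based on this discussion, not copied from the manifests): a ServiceMonitor matches the labels on the Service object, while mgr_role only appears in the Service's own pod selector, so requiring mgr_role in the ServiceMonitor selector would match no Service.

package main

import (
    "fmt"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
    // Rough shape of the rook-ceph-mgr Service as described above (values assumed).
    svc := corev1.Service{
        ObjectMeta: metav1.ObjectMeta{
            Name: "rook-ceph-mgr",
            Labels: map[string]string{
                "app":          "rook-ceph-mgr",
                "rook_cluster": "my-rook-ceph", // no mgr_role label on the Service itself
            },
        },
        Spec: corev1.ServiceSpec{
            Selector: map[string]string{
                "app":      "rook-ceph-mgr",
                "mgr_role": "active", // mgr_role is only used to pick the active mgr pod
            },
        },
    }

    // A ServiceMonitor selector that requires mgr_role would therefore match no Service.
    _, hasMgrRoleLabel := svc.Labels["mgr_role"]
    fmt.Println("Service carries mgr_role label:", hasMgrRoleLabel) // prints: false
}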

Member

@parth-gr We just removed that in #12269 a few days ago, so it's not needed

@travisn travisn merged commit 44a9863 into rook:master May 30, 2023
48 of 50 checks passed
travisn added a commit that referenced this pull request May 30, 2023
mgr: update default rook_cluster of ServiceMonitor (backport #12293)