Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose critical lbs metrics #538

Merged

Conversation

MaysaMacedo
Copy link
Contributor

@MaysaMacedo MaysaMacedo commented Jun 30, 2021

Includes a new Gauge metric that records the
number of members of load balancers considered
critical. The metric is labeled with the Name
of load balancer and pool name, and the amount
of members. Also includes an Enum with the
current state of the lb.

@openshift-ci openshift-ci bot requested review from dulek and luis5tb June 30, 2021 16:14
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 30, 2021
@MaysaMacedo MaysaMacedo changed the title Expose number of critical lb members metric Expose critical lbs metrics Jul 1, 2021
Includes a new Gauge metric that records the
number of members of load balancers considered
critical. The metric is labeled with the Name
of load balancer and pool name, and the amount
of members. Also includes an Enum with the
current state of the lb.
'kuryr_critical_lb_state', 'Critical Load Balancer State',
labelnames={'lb_name'},
states=['ERROR', 'ACTIVE', 'DELETED', 'PENDING_CREATE',
'PENDING_UPDATE', 'PENDING_DELETE'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, if it's intentional, but for every loadbalancer with certain label (in this case it would be 'namespace/lbname') this will create 6 metrics for Enum type. Those metrics will stay forever (or at least during the service lifetime), which in busy environment could easily grow to hundreds of them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's intentional, we're monitoring only 2 critical load balancers. So, we'll have 2* ['ERROR', 'ACTIVE', 'DELETED', 'PENDING_CREATE', 'PENDING_UPDATE', 'PENDING_DELETE']

lb_id = lb.get('id')
if not lb_id:
continue
lb = self._os_lb.find_load_balancer(lb_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird that you end up using find_load_balancer and not get_load_balancer but I'm confused by the latter method too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

find_load_balancer expects a name_or_id, while get_load_balancer expects the loadbalancer
https://docs.openstack.org/openstacksdk/latest/user/proxies/load_balancer_v2.html

klb = utils.get_kuryrloadbalancer(name, namespace)
lb = klb.get('status', {}).get('loadbalancer', {})
lb_id = lb.get('id')
if not lb_id:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if missing KLB is a problem?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this wouldn't cause any problems:
{}.get('status', {}).get('loadbalancer', {}).get('id')
And eventually the controller takes care of creating a CRD, in case that was deleted manually by the user.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I noticed we'll just ignore it. Okay, makes sense, it's controller's job to take care of this.

@dulek
Copy link
Contributor

dulek commented Jul 7, 2021

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 7, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 7, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dulek, MaysaMacedo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit a126827 into openshift:master Jul 7, 2021
@MaysaMacedo MaysaMacedo deleted the include-lbs-metrics branch January 20, 2022 17:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants