New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose critical lbs metrics #538
Expose critical lbs metrics #538
Conversation
1d9a3dc
to
68fc4ad
Compare
68fc4ad
to
b264ec4
Compare
Includes a new Gauge metric that records the number of members of load balancers considered critical. The metric is labeled with the Name of load balancer and pool name, and the amount of members. Also includes an Enum with the current state of the lb.
b264ec4
to
75153de
Compare
'kuryr_critical_lb_state', 'Critical Load Balancer State', | ||
labelnames={'lb_name'}, | ||
states=['ERROR', 'ACTIVE', 'DELETED', 'PENDING_CREATE', | ||
'PENDING_UPDATE', 'PENDING_DELETE'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, if it's intentional, but for every loadbalancer with certain label (in this case it would be 'namespace/lbname') this will create 6 metrics for Enum type. Those metrics will stay forever (or at least during the service lifetime), which in busy environment could easily grow to hundreds of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's intentional, we're monitoring only 2 critical load balancers. So, we'll have 2* ['ERROR', 'ACTIVE', 'DELETED', 'PENDING_CREATE', 'PENDING_UPDATE', 'PENDING_DELETE']
lb_id = lb.get('id') | ||
if not lb_id: | ||
continue | ||
lb = self._os_lb.find_load_balancer(lb_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weird that you end up using find_load_balancer
and not get_load_balancer
but I'm confused by the latter method too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
find_load_balancer expects a name_or_id, while get_load_balancer expects the loadbalancer
https://docs.openstack.org/openstacksdk/latest/user/proxies/load_balancer_v2.html
klb = utils.get_kuryrloadbalancer(name, namespace) | ||
lb = klb.get('status', {}).get('loadbalancer', {}) | ||
lb_id = lb.get('id') | ||
if not lb_id: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if missing KLB is a problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this wouldn't cause any problems:
{}.get('status', {}).get('loadbalancer', {}).get('id')
And eventually the controller takes care of creating a CRD, in case that was deleted manually by the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I noticed we'll just ignore it. Okay, makes sense, it's controller's job to take care of this.
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dulek, MaysaMacedo The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Includes a new Gauge metric that records the
number of members of load balancers considered
critical. The metric is labeled with the Name
of load balancer and pool name, and the amount
of members. Also includes an Enum with the
current state of the lb.