Egress IP health monitoring over GRPC #3100
/assign @bpickard22
/retest
/retest-failed
}
}

func checkEgressNodesReachabilityIterate(oc *Controller) {
Note: these changes are best viewed with the "ignore whitespace" option enabled.
/retest-failed
/retest-failed
Run this against a linter and address all issues:
[root@ovnkubernetes healthcheck]# golint
egressip_healthcheck.go:15:2: const ServiceEgressIpNode should be ServiceEgressIPNode
egressip_healthcheck.go:15:2: exported const ServiceEgressIpNode should have comment (or a comment on this block) or be unexported
egressip_healthcheck.go:36:2: don't use underscores in Go names; struct field node_mgmt_ip should be nodeMgmtIP
egressip_healthcheck.go:39:2: don't use underscores in Go names; struct field health_check_port should be healthCheckPort
egressip_healthcheck.go:42:1: exported function NewEgressIPHealthServer should have comment or be unexported
egressip_healthcheck.go:42:30: don't use underscores in Go names; func parameter node_mgmt_ip should be nodeMgmtIP
egressip_healthcheck.go:42:51: don't use underscores in Go names; func parameter health_check_port should be healthCheckPort
egressip_healthcheck.go:42:75: exported func NewEgressIPHealthServer returns unexported type *healthcheck.egressIPHealthServer, which can be annoying to use
egressip_healthcheck.go:81:6: exported type EgressIPHealthClient should have comment or be unexported
egressip_healthcheck.go:83:10: don't use underscores in Go names; interface method parameter dial_ctx should be dialCtx
egressip_healthcheck.go:83:54: don't use underscores in Go names; interface method parameter health_check_port should be healthCheckPort
egressip_healthcheck.go:85:8: don't use underscores in Go names; interface method parameter dial_ctx should be dialCtx
egressip_healthcheck.go:95:2: don't use underscores in Go names; struct field probe_failed should be probeFailed
egressip_healthcheck.go:98:1: exported function NewEgressIPHealthClient should have comment or be unexported
egressip_healthcheck.go:106:42: don't use underscores in Go names; method parameter dial_ctx should be dialCtx
egressip_healthcheck.go:106:86: don't use underscores in Go names; method parameter health_check_port should be healthCheckPort
egressip_healthcheck.go:109:6: don't use underscores in Go names; var node_addr should be nodeAddr
egressip_healthcheck.go:112:9: don't use underscores in Go names; range var node_mgmt_ip should be nodeMgmtIP
egressip_healthcheck.go:143:40: don't use underscores in Go names; method parameter dial_ctx should be dialCtx
egressip_healthcheck.go:158:3: don't use underscores in Go names; var prev_probe_failed should be prevProbeFailed
Lint is playing tricks on me! :P
@andreaskaris I made the changes you pointed out, including the lint fixes.
/retest-failed
/retest-failed
I have this here in my bashrc:

TY. I'll address the last issues I found soon.
Sorry, some more linter stuff. Feel free to fix whatever you see fit, but at least change the Python-style underscore names to Go camelCase.
[akaris@linux ovn ((afbe09999...))]$ cd healthcheck/
[akaris@linux healthcheck ((afbe09999...))]$ lint
egressip_healthcheck.go:37:1: comment on exported type EgressIPHealthServer should be of the form "EgressIPHealthServer ..." (with optional leading article)
egressip_healthcheck.go:49:1: comment on exported function NewEgressIPHealthServer should be of the form "NewEgressIPHealthServer ..."
egressip_healthcheck.go:93:1: comment on exported type EgressIPHealthClient should be of the form "EgressIPHealthClient ..." (with optional leading article)
egressip_healthcheck.go:111:1: comment on exported function NewEgressIPHealthClient should be of the form "NewEgressIPHealthClient ..."
[akaris@linux ovn ((afbe09999...))]$ lint | grep egressip.go
egressip.go:1393:6: exported type EgressIPPatchStatus should have comment or be unexported
egressip.go:1982:9: if block ends with a return statement, so drop this else and outdent its block (move short variable declaration to its own line if necessary)
egressip.go:2241:60: don't use underscores in Go names; method parameter health_check_port should be healthCheckPort
egressip.go:2252:2: don't use underscores in Go names; var dial_ctx should be dialCtx
egressip.go:2252:12: don't use underscores in Go names; var dial_cancel should be dialCancel
[akaris@linux node ((afbe09999...))]$ lint | grep node.go
node.go:586:2: don't use underscores in Go names; var health_check_port should be healthCheckPort
node.go:589:7: don't use underscores in Go names; var node_mgmt_ip should be nodeMgmtIP
node.go:602:3: don't use underscores in Go names; var health_server should be healthServer
node.go:619:1: exported method OvnNode.WatchEndpoints should have comment or be unexported
node.go:742:1: exported method OvnNode.WatchNamespaces should have comment or be unexported
looks pretty good except for some test cases and swapping the default config behavior
@martinkennelly below are the places I used to come up to speed on using gRPC. Please take a look and let me know if that is not good for you:
Copied proto definition from https://github.com/grpc/grpc/blob/master/doc/health-checking.md
and generated the associated code.

Steps taken:

$ export PATH="$PATH:$(go env GOPATH)/bin"
$ go version
go version go1.18.3 linux/amd64
$ protoc --version
libprotoc 3.14.0
$ cd go-controller/pkg/ovn/healthcheck
$ protoc --go_out=. --go_opt=paths=source_relative \
    --go-grpc_out=. --go-grpc_opt=paths=source_relative health.proto

Signed-off-by: Flavio Fernandes <flaviof@redhat.com>
Break checkEgressNodesReachability into checkEgressNodesReachabilityIterate calls so the implementation can honor oc.stopChan events and know when egressip should stop probing its egress nodes.

Signed-off-by: Flavio Fernandes <flaviof@redhat.com>
/retest
@flavio-fernandes one of the jobs failed with the ovnkube node crashing 9x:
https://github.com/ovn-org/ovn-kubernetes/runs/8123914416?check_suite_focus=true
My changes broke IPv6. Fixing it now.
When the ovnkube container, in both master and node pods, is started with the newly introduced flag 'egressip-node-healthcheck-port', the egressip implementation will now use gRPC with that parameter.

Signed-off-by: Flavio Fernandes <flaviof@redhat.com>
Extend kind and dist to support a new flag added to ovn-kube: the egress IP health check port.

This flag is used by the ovnkube-master and ovnkube-node templates, so the EIP probe uses gRPC to that port instead of dialing the discard port (9).

NOTE: With this change, kind will use port 9107 by default.

Signed-off-by: Flavio Fernandes <flaviof@redhat.com>
Added a settling change to ensure it is safe to use IPv6. TY @jcaamano and @trozet !!!
/lgtm
/lgtm
/lgtm Thanks for the info flavio.
This PR adds new functionality to egress IP, so it can
use gRPC to determine the health of its egress nodes.
Info on the protobuf definition used for the probing is here:
https://github.com/grpc/grpc/blob/master/doc/health-checking.md
To use this method, both the ovnkube-master and ovnkube-node
pods must be started with a new flag:
--egressip-node-healthcheck-port <TCP_PORT>
The original behavior remains available if the ovnkube-master
pod is started without the new flag.
For the sake of completeness, the original method is to
have ovnkube-master dial the discard port (TCP port 9) and
declare the node up upon receiving a TCP reject answer.
Signed-off-by: Flavio Fernandes <flaviof@redhat.com>
Reported-at: https://issues.redhat.com/browse/SDN-3156