Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add grpc health checking #4779

Merged
merged 33 commits into from Jun 8, 2022
Merged

feat: add grpc health checking #4779

merged 33 commits into from Jun 8, 2022

Conversation

JoanFM
Copy link
Member

@JoanFM JoanFM commented May 11, 2022

This PR aims to change the way health check is done in the core for Executors but also for any runtime using grpc. We aim to use the standard grpc https://github.com/grpc/grpc/blob/master/doc/health-checking.md as suggested by grpc, and which is something that lots of Load Balancers, etc ... will use to make sure grpc-based microservices are healthy.

This also has the benefit that ControlRequest can be removed from the code.

@github-actions github-actions bot added size/M area/core This issue/PR affects the core codebase area/entrypoint This issue/PR affects the entrypoint codebase area/helper This issue/PR affects the helper functionality area/network This issue/PR affects network functionality area/setup This issue/PR affects setting up Jina component/client component/proto labels May 11, 2022
@github-actions
Copy link

github-actions bot commented May 12, 2022

Latency summary

Current PR yields:

  • 🐢🐢 index QPS at 1137, delta to last 2 avg.: -6%
  • 🐢🐢 query QPS at 62, delta to last 2 avg.: -9%
  • 🐢🐢 avg flow time within 1.3451 seconds, delta to last 2 avg.: -9%
  • 🐎🐎🐎🐎 import jina within 0.6029 seconds, delta to last 2 avg.: +18%

Breakdown

Version Index QPS Query QPS Avg Flow Time (s) Import Time (s)
current 1137 62 1.3451 0.6029
3.4.10 1085 55 1.357 0.5723
3.4.9 1346 81 1.6049 0.4471

Backed by latency-tracking. Further commits will update this comment.

@codecov
Copy link

codecov bot commented May 12, 2022

Codecov Report

Merging #4779 (7434ce9) into master (abe735f) will decrease coverage by 0.27%.
The diff coverage is 82.73%.

@@            Coverage Diff             @@
##           master    #4779      +/-   ##
==========================================
- Coverage   88.18%   87.91%   -0.28%     
==========================================
  Files         110      111       +1     
  Lines        8481     8647     +166     
==========================================
+ Hits         7479     7602     +123     
- Misses       1002     1045      +43     
Flag Coverage Δ
jina 87.91% <82.73%> (-0.20%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
jina/clients/helper.py 100.00% <ø> (ø)
jina/orchestrate/pods/helper.py 88.88% <ø> (+6.34%) ⬆️
jina/serve/runtimes/base.py 74.19% <ø> (ø)
jina/proto/jina_pb2.py 58.66% <9.09%> (+0.83%) ⬆️
jina/serve/runtimes/gateway/websocket/app.py 77.85% <53.19%> (-11.68%) ⬇️
jina/clients/base/__init__.py 67.94% <66.66%> (-0.06%) ⬇️
jina/clients/base/helper.py 86.73% <70.58%> (-3.39%) ⬇️
jina/proto/jina_pb2_grpc.py 76.19% <75.00%> (ø)
jina/resources/health_check/pod.py 50.00% <80.00%> (+2.94%) ⬆️
jina/serve/runtimes/asyncio.py 80.95% <83.33%> (+2.30%) ⬆️
... and 36 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4254a90...7434ce9. Read the comment docs.

@github-actions github-actions bot added the area/testing This issue/PR affects testing label May 13, 2022
@github-actions
Copy link

github-actions bot commented Jun 7, 2022

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

f.block()
```

On another terminal, we can install or download [grpcurl](https://github.com/fullstorydev/grpcurl) image to be able to send `rpc` requests to our services.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
On another terminal, we can install or download [grpcurl](https://github.com/fullstorydev/grpcurl) image to be able to send `rpc` requests to our services.
On another terminal, we can install or download [grpcurl](https://github.com/fullstorydev/grpcurl) image to be able to send `rpc` health check requests to our services.

Otherwise it could be confusing what this line is doing


### Gateway health check with http or websocket

When using grpc or http as a procotol for the Gateway, then it exposes and endpoint '/' that one can query to check the status.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
When using grpc or http as a procotol for the Gateway, then it exposes and endpoint '/' that one can query to check the status.
When using grpc or http as a procotol for the Gateway, then it exposes and endpoint `"/"` that one can query to check the status.

Otherwise it is not formatted correctly


### Check status from outside

To understand better what is going on under the hood, we can check how to target the endpoint from outside Jina or the client.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
To understand better what is going on under the hood, we can check how to target the endpoint from outside Jina or the client.
To better understand what is going on under the hood, we can check how to target the endpoint from outside Jina or the client.

Copy link
Contributor

@JohannesMessner JohannesMessner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put the part about checking the entire Flow, dry_run() etc at the top. To me it seems like the most useful and commonly needed feature.

Also, what is the difference between the sections at the top about checking Executor/Gateway, and the sections at the bottom about 'from the outside'? I didn't leave comments on the latter because it seemed the same to me. Could you explain the difference?

docs/index.md Outdated
@@ -50,6 +50,7 @@ fundamentals/architecture-overview
fundamentals/executor/index
fundamentals/flow/index
fundamentals/clean-code
fundamentals/health-check
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the page go under Flow?

docs/fundamentals/health-check.md Outdated Show resolved Hide resolved
docs/fundamentals/health-check.md Outdated Show resolved Hide resolved
docs/fundamentals/health-check.md Outdated Show resolved Hide resolved
docs/fundamentals/health-check.md Outdated Show resolved Hide resolved
docs/fundamentals/health-check.md Outdated Show resolved Hide resolved
docs/fundamentals/health-check.md Outdated Show resolved Hide resolved
docs/fundamentals/health-check.md Outdated Show resolved Hide resolved
docs/fundamentals/health-check.md Outdated Show resolved Hide resolved
docs/fundamentals/health-check.md Outdated Show resolved Hide resolved
@github-actions
Copy link

github-actions bot commented Jun 7, 2022

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

2 similar comments
@github-actions
Copy link

github-actions bot commented Jun 7, 2022

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@github-actions
Copy link

github-actions bot commented Jun 7, 2022

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@github-actions
Copy link

github-actions bot commented Jun 7, 2022

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@github-actions
Copy link

github-actions bot commented Jun 7, 2022

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@github-actions
Copy link

github-actions bot commented Jun 7, 2022

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@github-actions
Copy link

github-actions bot commented Jun 7, 2022

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@github-actions
Copy link

github-actions bot commented Jun 7, 2022

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@github-actions
Copy link

github-actions bot commented Jun 8, 2022

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@github-actions
Copy link

github-actions bot commented Jun 8, 2022

📝 Docs are deployed on https://grpc-based-health-check--jina-docs.netlify.app 🎉

@JoanFM JoanFM merged commit ef662b5 into master Jun 8, 2022
@JoanFM JoanFM deleted the grpc-based-health-check branch June 8, 2022 09:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/core This issue/PR affects the core codebase area/docs This issue/PR affects the docs area/entrypoint This issue/PR affects the entrypoint codebase area/helper This issue/PR affects the helper functionality area/network This issue/PR affects network functionality area/setup This issue/PR affects setting up Jina area/testing This issue/PR affects testing component/client component/proto component/resource component/type size/XL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Document HealthCheck behavior for every Pod and Gateway on Kubernetes and Docker Compose
4 participants