feat(healthcheck): Various healthcheck improvements #1166

Merged
merged 9 commits into from
Feb 26, 2024

Conversation

@slowli slowli commented Feb 21, 2024

What ❔

  • Adds HealthStatus::ShuttingDown, which is set immediately after a component receives a termination signal. This makes the /health endpoint conform to K8s readiness probe expectations.
  • Makes the slow / hard time limits for health checks configurable and decreases their default values.
  • Adds metrics for slow, timed-out, and dropped health checks.
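
As a rough illustration of the first point, the sketch below shows how a shutting-down status could translate into a failing readiness probe. Only the `ShuttingDown` variant is named in this PR; the other variants, the `is_healthy` and `http_status_code` helpers, and the choice of 503 are assumptions made for the example, not the crate's actual API. It relies on the Kubernetes rule that an HTTP probe succeeds only for status codes from 200 up to (but not including) 400.

```rust
/// Hypothetical status enum. Only `ShuttingDown` comes from the PR text;
/// `Ready` and `NotReady` are assumed for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum HealthStatus {
    Ready,
    NotReady,
    /// Set as soon as a component receives a termination signal.
    ShuttingDown,
}

impl HealthStatus {
    /// Only a fully ready component counts as healthy in this sketch.
    fn is_healthy(self) -> bool {
        matches!(self, Self::Ready)
    }

    /// Maps the status to an HTTP status code for a `/health` response.
    /// Any non-healthy status (including `ShuttingDown`) yields 503, which
    /// a K8s readiness probe treats as a failure. The code 503 is an assumption.
    fn http_status_code(self) -> u16 {
        if self.is_healthy() {
            200
        } else {
            503
        }
    }
}
```

With such a mapping, a readiness probe would stop routing traffic to a pod as soon as it reports `ShuttingDown`, rather than waiting for the process to exit.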

Why ❔

Improves healthcheck observability.

Checklist

  • PR title corresponds to the body of PR (we generate changelog entries from PRs).
  • Tests for the changes have been added / updated.
  • Documentation comments have been added / updated.
  • Code has been formatted via zk fmt and zk lint.
  • Spellcheck has been run via zk spellcheck.
  • Linkcheck has been run via zk linkcheck.

@slowli slowli marked this pull request as ready for review February 21, 2024 11:27

slowli commented Feb 21, 2024

Just in case: I've checked that if /health is requested with a small client timeout (e.g., using curl -m ..) so that the request doesn't complete in time, axum drops the handling future together with the pending futures it depends on (in particular, CheckHealth::check_health() implementations). So the drop guard added in this PR will actually be triggered in this case.
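
For readers unfamiliar with the pattern, here is a minimal, synchronous sketch of a drop guard that counts slow and dropped checks. It is not the code from this PR: `CheckMetrics`, `CheckGuard`, and the `start` / `finish` methods are hypothetical names, and plain atomic counters stand in for real metrics. The idea is that the guard is created when a check starts; if it is destroyed without `finish()` having been called (as happens when a future holding it is cancelled), its `Drop` implementation records the check as dropped.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{Duration, Instant};

/// Hypothetical counters standing in for real metrics.
#[derive(Debug, Default)]
struct CheckMetrics {
    slow: AtomicU64,
    dropped: AtomicU64,
}

/// Guard created when a health check starts (hypothetical type).
struct CheckGuard<'a> {
    metrics: &'a CheckMetrics,
    started_at: Instant,
    slow_limit: Duration,
    finished: bool,
}

impl<'a> CheckGuard<'a> {
    fn start(metrics: &'a CheckMetrics, slow_limit: Duration) -> Self {
        Self {
            metrics,
            started_at: Instant::now(),
            slow_limit,
            finished: false,
        }
    }

    /// Marks the check as completed and records it as slow
    /// if it took longer than the configured limit.
    fn finish(mut self) {
        self.finished = true;
        if self.started_at.elapsed() > self.slow_limit {
            self.metrics.slow.fetch_add(1, Ordering::Relaxed);
        }
        // `self` is dropped here; `Drop` sees `finished == true` and does nothing.
    }
}

impl Drop for CheckGuard<'_> {
    fn drop(&mut self) {
        // Reached without `finish()`: the check was abandoned mid-flight.
        if !self.finished {
            self.metrics.dropped.fetch_add(1, Ordering::Relaxed);
        }
    }
}
```

In an async handler, the guard would live inside the future that runs the check, so cancelling that future (for example, on a client timeout) runs `Drop` and increments the dropped counter.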

@RomanBrodetski RomanBrodetski added this pull request to the merge queue Feb 26, 2024
Merged via the queue into main with commit 1e34148 Feb 26, 2024
37 checks passed
@RomanBrodetski RomanBrodetski deleted the aov-pla-804-add-metrics-for-health-endpoints branch February 26, 2024 08:50
RomanBrodetski pushed a commit that referenced this pull request Feb 26, 2024
🤖 I have created a release *beep* *boop*
---


## [20.8.0](core-v20.7.0...core-v20.8.0) (2024-02-26)


### Features

* Add more buckets to call tracer
([#1137](#1137))
([dacd8c9](dacd8c9))
* **api:** add a config flag for disabling filter api
([#1078](#1078))
([b486d7e](b486d7e))
* **api:** Create RPC method to return all tokens
([#1103](#1103))
([b538d1a](b538d1a))
* **api:** Implement TxSink abstraction
([#1204](#1204))
([11a34d4](11a34d4))
* **en:** Add health checks for EN components
([#1088](#1088))
([4ea1520](4ea1520))
* **en:** Start health checks early into EN lifecycle
([#1146](#1146))
([f983e80](f983e80))
* **en:** switch to tree light mode
([#1152](#1152))
([ce6c120](ce6c120))
* **en:** Take into account nonce from tx proxy
([#995](#995))
([22099cb](22099cb))
* **healthcheck:** Various healthcheck improvements
([#1166](#1166))
([1e34148](1e34148))
* Integration tests enhancement for L1
([#1209](#1209))
([a1c866c](a1c866c))
* **node_framework:** Support Eth Watch in the framework
([#1145](#1145))
([4f41b68](4f41b68))
* **shared bridge:** preparation for shared bridge migration (server)
([#1012](#1012))
([2a766a7](2a766a7))
* **vlog:** Remove env getters from vlog
([#1077](#1077))
([00d3429](00d3429))
* **vm:** Add new VM folder
([#1208](#1208))
([66cdefc](66cdefc))
* **vm:** integrate new vm version
([#1215](#1215))
([63d1f52](63d1f52))


### Bug Fixes

* **contract-verifier:** Add force_evmla flag
([#1179](#1179))
([e75aa11](e75aa11))
* **contract-verifier:** allow other zksolc settings
([#1174](#1174))
([72c60bd](72c60bd))
* **state-keeper:** Add GasForBatchTip criterion
([#1096](#1096))
([de4d729](de4d729))


### Performance Improvements

* **db:** Improve `get_logs_by_tx_hashes` query
([#1171](#1171))
([0dda7cc](0dda7cc))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
github-merge-queue bot pushed a commit that referenced this pull request Mar 5, 2024
🤖 I have created a release *beep* *boop*
---


## [12.0.0](prover-v11.0.0...prover-v12.0.0) (2024-03-04)


### ⚠ BREAKING CHANGES

* **prover:** Add EIP4844 support for provers subsystem
([#1200](#1200))
* Set 21 as latest protocol version
([#1262](#1262))

### Features

* Adding ability to generate 4844 setup key and refactor
([#1143](#1143))
([975f54b](975f54b))
* **api:** Remove unused and obsolete token info
([#1071](#1071))
([e920897](e920897))
* **dal:** `zksync_types::Transaction` to use protobuf for wire encoding
(BFT-407)
([#1047](#1047))
([ee94bee](ee94bee))
* **db:** Soft-remove `storage` table
([#982](#982))
([601f893](601f893))
* **en:** Integrate snapshots recovery into EN
([#1032](#1032))
([c7cfaf9](c7cfaf9))
* **healthcheck:** Various healthcheck improvements
([#1166](#1166))
([1e34148](1e34148))
* improving verification key generation
([#1050](#1050))
([6f715c8](6f715c8))
* Prover interface and L1 interface crates
([#959](#959))
([4f7e107](4f7e107))
* **prover:** Add EIP4844 support for provers subsystem
([#1200](#1200))
([6953e89](6953e89))
* **prover:** Added --recompute-if-missing option to key generator
([#1151](#1151))
([cad7278](cad7278))
* **prover:** Added 4844 circuit to verification keys
([#1141](#1141))
([8b0cc4a](8b0cc4a))
* **prover:** Adding first support for 4844 circuit
([#1155](#1155))
([6f63c53](6f63c53))
* **prover:** adding keystore object to handle reading and writing of
prover keys
([#1132](#1132))
([1471615](1471615))
* **prover:** merging key generation into a single binary
([#1101](#1101))
([6de8b84](6de8b84))
* **prover:** Moved setup key generation logic to test harness
([#1113](#1113))
([469ab06](469ab06))
* **prover:** Use new shivini function for 4844 circuits
([#1205](#1205))
([376c09e](376c09e))
* Set 21 as latest protocol version
([#1262](#1262))
([30579ef](30579ef))
* **vlog:** Remove env getters from vlog
([#1077](#1077))
([00d3429](00d3429))


### Bug Fixes

* fix link
([#1007](#1007))
([f1424ce](f1424ce))
* make `zk status prover` use the new prover table
([#1044](#1044))
([9b21d7f](9b21d7f))
* **prover:** Decouple core/ prover database management
([#1029](#1029))
([37674fd](37674fd))
* **prover:** Fix initial prover migration
([#1083](#1083))
([6d54010](6d54010))
* **prover:** QoL socket utilization
([#1020](#1020))
([13a6816](13a6816))
* update harness to include fix to new boojum OOM
([#1053](#1053))
([4976941](4976941))


### Performance Improvements

* bump harness version
([#1003](#1003))
([1cbb4c9](1cbb4c9))
* reduce memory consumption of witness generation
([#696](#696))
([dea6768](dea6768))
* upgrade harness version to improve witness generation memory spike
([#1034](#1034))
([09bbb84](09bbb84))
* use jemalloc in witness generator
([#1014](#1014))
([917b2dc](917b2dc))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).