feat: Archiving of prover in gpu_prover_queue #1537

Merged: 27 commits, Apr 4, 2024


Artemka374 (Contributor) commented Apr 1, 2024

What ❔

Add an archiver for provers in gpu_prover_queue, which moves every prover whose status has been dead for some time to an archive.
Add an availability checker for provers, which checks whether a prover has been marked dead while it is still alive and, if so, shuts it down.
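
The two components described above can be illustrated with a minimal in-memory sketch. It is not the code from this PR: the real components work against the gpu_prover_queue database table, while `ProverRecord`, `ProverStatus`, `archive_dead_provers` and `marked_dead_while_alive` are hypothetical names invented here, and the time threshold is assumed to be a plain duration.

```rust
use std::time::Duration;

/// Hypothetical subset of prover statuses, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProverStatus {
    Available,
    Dead,
}

/// Hypothetical in-memory stand-in for a row of gpu_prover_queue.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ProverRecord {
    pub address: String,
    pub status: ProverStatus,
    /// Time of the last status update, in seconds since an arbitrary epoch.
    pub updated_at_secs: u64,
}

/// Archiver sketch: moves every record that has been `Dead` for at least
/// `archive_after` from `queue` to `archive` and returns how many were moved.
pub fn archive_dead_provers(
    queue: &mut Vec<ProverRecord>,
    archive: &mut Vec<ProverRecord>,
    now_secs: u64,
    archive_after: Duration,
) -> usize {
    let threshold = archive_after.as_secs();
    let mut kept = Vec::with_capacity(queue.len());
    let mut moved = 0;
    for record in queue.drain(..) {
        let dead_for = now_secs.saturating_sub(record.updated_at_secs);
        if record.status == ProverStatus::Dead && dead_for >= threshold {
            archive.push(record);
            moved += 1;
        } else {
            kept.push(record);
        }
    }
    *queue = kept;
    moved
}

/// Availability checker sketch: a running prover looks up its own record and
/// reports `true` if the queue says it is dead, in which case it should shut down.
pub fn marked_dead_while_alive(queue: &[ProverRecord], own_address: &str) -> bool {
    queue
        .iter()
        .any(|r| r.address == own_address && r.status == ProverStatus::Dead)
}
```

In this sketch a prover would call `marked_dead_while_alive` periodically with its own address and exit when it returns `true`, so that the queue and the actual set of running provers agree again.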

Why ❔

To improve prover performance and to prevent incidents in which provers are marked dead while still alive (autoscalers won't scale up more provers, because they see that the prover is alive).

Checklist

  • PR title corresponds to the body of PR (we generate changelog entries from PRs).
  • Tests for the changes have been added / updated.
  • Documentation comments have been added / updated.
  • Code has been formatted via zk fmt and zk lint.
  • Spellcheck has been run via zk spellcheck.
  • Linkcheck has been run via zk linkcheck.

@Artemka374 Artemka374 requested a review from EmilLuta April 1, 2024 15:26
EmilLuta (Contributor) commented Apr 2, 2024

Discussed offline, @Artemka374 will address and add a few clarifications.

# Conflicts:
#	core/lib/config/src/configs/fri_prover.rs
#	core/lib/config/src/configs/house_keeper.rs
#	core/lib/config/src/testonly.rs
#	core/lib/env_config/src/fri_prover.rs
#	core/lib/env_config/src/house_keeper.rs
#	core/lib/protobuf_config/src/fri_prover.rs
#	core/lib/protobuf_config/src/house_keeper.rs
#	core/lib/protobuf_config/src/proto/fri_prover.proto
#	core/lib/protobuf_config/src/proto/house_keeper.proto
#	etc/env/base/house_keeper.toml

RomanBrodetski (Collaborator) left a comment

Also, please add comments explaining the logic of the components that you introduce here.

core/lib/basic_types/src/prover_dal.rs (outdated, resolved)
core/lib/env_config/src/house_keeper.rs (outdated, resolved)
core/lib/zksync_core/src/lib.rs (outdated, resolved)
etc/env/base/house_keeper.toml (outdated, resolved)
prover/prover_fri/src/gpu_prover_availability_checker.rs (outdated, resolved)
prover/prover_fri/src/metrics.rs (resolved)
@Artemka374 Artemka374 added this pull request to the merge queue Apr 4, 2024
Merged via the queue into main with commit a970629 Apr 4, 2024
40 checks passed
@Artemka374 Artemka374 deleted the afo/prover-archiving branch April 4, 2024 09:22
github-merge-queue bot pushed a commit that referenced this pull request Apr 16, 2024
🤖 I have created a release *beep* *boop*
---


##
[23.0.0](core-v22.1.0...core-v23.0.0)
(2024-04-16)


### ⚠ BREAKING CHANGES

* **vm:** 1 5 0 support
([#1508](#1508))

### Features

* **api:** Add `tokens_whitelisted_for_paymaster`
([#1545](#1545))
([6da89cd](6da89cd))
* **api:** Log info about estimated fee
([#1611](#1611))
([daed58c](daed58c))
* Archive old prover jobs
([#1516](#1516))
([201476c](201476c))
* Archiving of prover in gpu_prover_queue
([#1537](#1537))
([a970629](a970629))
* **block-reverter:** only require private key for sending revert
transactions
([#1579](#1579))
([27de6b7](27de6b7))
* **config:** Initialize log config from files as well
([#1566](#1566))
([9e7db59](9e7db59))
* **configs:** Implement new format of configs and implement protobuf
for it ([#1501](#1501))
([086ba5b](086ba5b))
* **db:** Wrap sqlx errors in DAL
([#1522](#1522))
([6e9ed8c](6e9ed8c))
* EN Pruning
([#1418](#1418))
([cea6578](cea6578))
* **en:** add consistency checker condition in db pruner
([#1653](#1653))
([5ed92b9](5ed92b9))
* **en:** add manual vacuum step in db pruning
([#1652](#1652))
([c818be3](c818be3))
* **en:** Rate-limit L2 client requests
([#1500](#1500))
([3f55f1e](3f55f1e))
* **en:** Rework storing and using protective reads
([#1515](#1515))
([13c0c45](13c0c45))
* **en:** support for snapshots recovery in version_sync_task.rs
([#1585](#1585))
([f911276](f911276))
* **eth-watch:** Brush up Ethereum watcher component
([#1596](#1596))
([b0b8f89](b0b8f89))
* Expose component configs as info metrics
([#1584](#1584))
([7c8ae40](7c8ae40))
* **external-node:** external node distributed operation mode
([#1457](#1457))
([777ffca](777ffca))
* Extract commitment generator into a separate crate
([#1636](#1636))
([f763d1f](f763d1f))
* Extract eth_watch and shared metrics into separate crates
([#1572](#1572))
([4013771](4013771))
* Finalize fee address migration
([#1617](#1617))
([713f56b](713f56b))
* fix availability checker
([#1574](#1574))
([b2f21fb](b2f21fb))
* **genesis:** Add genesis config generator
([#1671](#1671))
([45164fa](45164fa))
* **genesis:** mark system contracts bytecodes as known
([#1554](#1554))
([5ffec51](5ffec51))
* Migrate gas limit to u64
([#1538](#1538))
([56dc049](56dc049))
* **node-framework:** Add consensus support
([#1546](#1546))
([27fe475](27fe475))
* **node-framework:** Add consistency checker
([#1527](#1527))
([3c28c25](3c28c25))
* remove unused variables in prover configs
([#1564](#1564))
([d32a019](d32a019))
* Remove zksync-rs SDK
([#1559](#1559))
([cc78e1d](cc78e1d))
* soft removal of `events_queue` table
([#1504](#1504))
([5899bc6](5899bc6))
* **sqlx:** Use offline mode by default
([#1539](#1539))
([af01edd](af01edd))
* Use config for max number of circuits
([#1573](#1573))
([9fcb87e](9fcb87e))
* Validium
([#1461](#1461))
([132a169](132a169))
* **vm:** 1 5 0 support
([#1508](#1508))
([a6ccd25](a6ccd25))


### Bug Fixes

* **api:** Change error code for Web3Error::NotImplemented
([#1521](#1521))
([0a13602](0a13602))
* **cache:** use factory deps cache correctly
([#1547](#1547))
([a923e11](a923e11))
* **CI:** Less flaky CI
([#1536](#1536))
([2444b53](2444b53))
* **configs:** Make genesis fields optional
([#1555](#1555))
([2d0ef46](2d0ef46))
* contract verifier config test
([#1583](#1583))
([030d447](030d447))
* **contract-verifier-api:** permissive cors for contract verifier api
server ([#1525](#1525))
([423f4a7](423f4a7))
* **db:** Fix "values cache update task failed" panics
([#1561](#1561))
([f7c5c14](f7c5c14))
* **en:** do not log error when whitelisted_tokens_for_aa is not
supported
([#1600](#1600))
([06c87f5](06c87f5))
* **en:** Fix DB pool for Postgres metrics on EN
([#1675](#1675))
([c51ca91](c51ca91))
* **en:** improved tree recovery logs
([#1619](#1619))
([ef12df7](ef12df7))
* **en:** Reduce amount of data in snapshot header
([#1528](#1528))
([afa1cf1](afa1cf1))
* **eth-client:** Use local FeeHistory type
([#1552](#1552))
([5a512e8](5a512e8))
* instruction count diff always N/A in VM perf comparison
([#1608](#1608))
([c0f3104](c0f3104))
* **vm:** Fix storage oracle and estimation
([#1634](#1634))
([932b14b](932b14b))
* **vm:** Increase log demuxer cycles on far calls
([#1575](#1575))
([90eb9d8](90eb9d8))


### Performance Improvements

* **db:** rework "finalized" block SQL query
([#1524](#1524))
([2b27290](2b27290))
* **merkle tree:** Manage indices / filters in RocksDB
([#1550](#1550))
([6bbfa06](6bbfa06))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: romanbrodetskiy <rb@matterlabs.dev>
github-merge-queue bot pushed a commit that referenced this pull request Apr 22, 2024
🤖 I have created a release *beep* *boop*
---


##
[13.0.0](prover-v12.2.0...prover-v13.0.0)
(2024-04-22)


### ⚠ BREAKING CHANGES

* **vm:** 1 5 0 support
([#1508](#1508))

### Features

* Archive old prover jobs
([#1516](#1516))
([201476c](201476c))
* Archiving of prover in gpu_prover_queue
([#1537](#1537))
([a970629](a970629))
* **configs:** Implement new format of configs and implement protobuf
for it ([#1501](#1501))
([086ba5b](086ba5b))
* **db:** Wrap sqlx errors in DAL
([#1522](#1522))
([6e9ed8c](6e9ed8c))
* fix availability checker
([#1574](#1574))
([b2f21fb](b2f21fb))
* Prover CLI Scaffoldings
([#1609](#1609))
([9a22fa0](9a22fa0))
* Remove zksync-rs SDK
([#1559](#1559))
([cc78e1d](cc78e1d))
* **sqlx:** Use offline mode by default
([#1539](#1539))
([af01edd](af01edd))
* **vm:** 1 5 0 support
([#1508](#1508))
([a6ccd25](a6ccd25))


### Bug Fixes

* **en:** Fix miscellaneous snapshot recovery nits
([#1701](#1701))
([13bfecc](13bfecc))
* made consensus store certificates asynchronously from statekeeper
([#1711](#1711))
([d1032ab](d1032ab))


### Performance Improvements

* **merkle tree:** Manage indices / filters in RocksDB
([#1550](#1550))
([6bbfa06](6bbfa06))


### Reverts

* **env:** Remove `ZKSYNC_HOME` env var from server
([#1713](#1713))
([aed23e1](aed23e1))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: perekopskiy <53865202+perekopskiy@users.noreply.github.com>
3 participants