-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Move prover DAL to prover workspace #1350
Conversation
# Conflicts: # core/lib/dal/src/blocks_dal.rs # core/lib/zksync_core/src/api_server/web3/namespaces/zks.rs # core/lib/zksync_core/src/consistency_checker/tests/mod.rs # core/lib/zksync_core/src/eth_sender/aggregator.rs # core/lib/zksync_core/src/eth_sender/publish_criterion.rs
# Conflicts: # core/lib/dal/src/connection/mod.rs
CI seems red. I assume something might've changed in the way you start the server? |
It shouldn't, I didn't touch it, all the previous runs were successful though |
Note for Emil -- run the unit tests locally. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had a pretty shallow review for now, but overall the PR feels to be in a WIP state.
@@ -32,6 +32,8 @@ num_enum = "0.6" | |||
hex = "0.4" | |||
prost = "0.12.1" | |||
|
|||
sqlx = "0.7.3" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very dangerous. One of the main purposes of DAL crate was not to give direct access to SQLx to any dependent crates and make sure that all of our DB-related stuff is well encapsulated.
At the same time, zksync_types
is a crate where people move stuff when they need to share something but don't know where to put it. Which means that virtually everything in zkSync stack depends on zksync_types
. This crate is already pretty toxic in APIs it exposes, and it makes it even more toxic: people use what they can use and in a rush they may do hacks that are hard to fix later.
AFAICS you only need sqlx to expose chrono
types, which are just reexports. Instead you can depend on the same version of chrono and simply use these types directly. Which seemingly we already do so probably you don't need this dependency at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, nice suggestion
@@ -21,6 +21,8 @@ zksync_health_check = { path = "../health_check" } | |||
zksync_consensus_roles = { version = "0.1.0", git = "https://github.com/matter-labs/era-consensus.git", rev = "97d139969476a004c50f8b4a31ece748e5bee14e" } | |||
zksync_consensus_storage = { version = "0.1.0", git = "https://github.com/matter-labs/era-consensus.git", rev = "97d139969476a004c50f8b4a31ece748e5bee14e" } | |||
zksync_protobuf = { version = "0.1.0", git = "https://github.com/matter-labs/era-consensus.git", rev = "97d139969476a004c50f8b4a31ece748e5bee14e" } | |||
zksync_db_connection = { path = "../db_connection" } | |||
zksync_prover_dal = { path = "../../../prover/prover_dal" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to minimize the diff and will be removed in follow-up PRs, right?
@@ -16,7 +16,7 @@ use crate::{ | |||
sync_layer::{fetcher::FetchedBlock, sync_action::ActionQueueSender}, | |||
}; | |||
|
|||
/// Context-aware `zksync_dal::StorageProcessor` wrapper. | |||
/// Context-aware `zksync_dal::StorageProcessorWrapper` wrapper. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glob replace?
protocol_version_id, | ||
) | ||
.await; | ||
} | ||
drop(latency); | ||
// drop(latency); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WIP?
// .instrument("save_fri_proof") | ||
// .report_latency() | ||
// .with_arg("id", &id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same
// TODO: value is hardcoded for now, but should be in sync with ConnectionPool::global_config() | ||
if lifetime > Duration::from_millis(5_000) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally I'd expect TODOs that aren't considered to be in scope for this exact PR to be labeled with a linear ticket ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also will be done in the next PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I certainly get the conception of "smaller digestible PRs", but I strongly believe that if PR is going to be shipped in parts, then each part should either be complete on its own (e.g. the set of changes is isolated well enough) or all the leftover work has to be properly marked with TODOs and relevant task IDs.
The reason is that the work is very often reprioritized, and there is a non-zero chance that right after this PR you'll have to switch to something more urgent. All the leftovers, even if now you have them in mind, will be forgotten (or maybe someone else will pick this task up).
So it'd be great if you would create required tasks in linear and properly mark planned fixes with TODOs in this PR, just so that we're prepared for whatever future we'll have to meet. This is bureaucracy, of course, but IMHO useful one.
We have too many examples of people having to switch to something else. The latest I remember is the vm_utils
crate, which was introduced with an idea of a cool high-level reusable interface for VM, but was left in a weird state because reasons.
@@ -8,7 +8,7 @@ use anyhow::Context as _; | |||
use async_trait::async_trait; | |||
use tokio::{task::JoinHandle, time::sleep}; | |||
use zksync_config::configs::FriWitnessVectorGeneratorConfig; | |||
use zksync_dal::{fri_prover_dal::types::GpuProverInstanceStatus, ConnectionPool}; | |||
use zksync_dal::ConnectionPool; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels clunky that ConnectionPool
is not a part of zksync_db_connection
crate and you cannot really use zksync_prover_dal
without it.
I thought that the logic would be as follows:
zksync_db_connection
defines all the machinery required to interact with the database, and only it.zksync_dal
(optionally renamed tozksync_sequencer_dal
) defines the sequencer DALs and only them.zksync_prover_dal
defines the prover DALs and only them.
With segregations completed, it'd mean that crates in the core workspace would depend only on zksync_dal
and crates in the prover workspace would depend only on zksync_prover_dal
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's true, but it will be done in the next PR, where we will separate ConnectionPool's, in this PR we would need a separate crates for both storage processor and connection pool, which will be avoided in the next one - so consider this one more as a preparation PR to make it more easy to review(although this one is big too)
Generally some things in this PR will be reworked in the follow-up PR, so this one is kind of preparation to decrease diff and make it easier to review. And right, I just missed some comments, that I should've uncomment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that (doubtlessly noble) considerations about minimizing the changeset should lead to noticeably worsening code quality in main
(leaky abstractions etc.). If we go with this logic, it could make sense to create a feature branch and target it.
|
||
#[derive(Debug, Metrics)] | ||
#[metrics(prefix = "sql_connection")] | ||
pub(crate) struct ConnectionMetrics { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd expect that all metrics defined in the DAL crate would be moved here, together with the instrument
module and macros (the macros would need to be exported in order to work). Are there arguments against sharing these among DAL crates?
@@ -90,7 +92,7 @@ impl TracedConnections { | |||
} | |||
} | |||
|
|||
struct PooledStorageProcessor<'a> { | |||
pub struct PooledStorageProcessor<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's worrisome that these internal structs need to be made public; looks like inappropriate abstraction boundaries. As a benchmark, this crate should export the least amount of types possible; ideally, only ConnectionPool
/ StorageProcessor
equivalents, and helpers like macros, instrumentation etc.
BlocksDal { | ||
storage: self.storage, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dumb question: Could Dal
structs wrap &mut StorageProcessor
instead? (Obviously, it would make sense to implement fn conn(&mut self) -> &mut PgConnection
to minimize the changeset, and DerefMut<Target = BasicStorageProcessor
for the same purpose; although in the latter case, I'd argue that replacing self.storage
with &mut self.storage.0
doesn't make the PR that harder to review.) Having to construct DALs manually looks quite clunky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can't wrap StorageProcessor - that's the most problematic part of this implementation. Since we don't want to actually separate types of ConnectionPool -> StorageProcessor
in this PR(that is why this exact PR was made) - we aren't able to implement these get_dal
methods for StorageProcessor because we will have some cyclic dependencies there - so we are using one type that we have inside of Dal structs - and the other one to actually be able to access Dal structs. I agree that it doesn't look good though - but I couldn't find a better way to implement this, but it should return to a normal state after separating the ConnectionPool's
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this module unused? (If so, please remove.) It's logically part of prover_dal
, isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was checking this module, I have no idea what is its purpose. I reckon this code that can go altogether.
@@ -21,6 +21,8 @@ zksync_health_check = { path = "../health_check" } | |||
zksync_consensus_roles = { version = "0.1.0", git = "https://github.com/matter-labs/era-consensus.git", rev = "97d139969476a004c50f8b4a31ece748e5bee14e" } | |||
zksync_consensus_storage = { version = "0.1.0", git = "https://github.com/matter-labs/era-consensus.git", rev = "97d139969476a004c50f8b4a31ece748e5bee14e" } | |||
zksync_protobuf = { version = "0.1.0", git = "https://github.com/matter-labs/era-consensus.git", rev = "97d139969476a004c50f8b4a31ece748e5bee14e" } | |||
zksync_db_connection = { path = "../db_connection" } | |||
zksync_prover_dal = { path = "../../../prover/prover_dal" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds quite counter-intuitive to depend on a local crate outside the workspace. IMO, it'd make sense to temporarily place prover_dal
in the main workspace and place it in core/lib
directory, and move it to a proper directory / workspace in a follow-up PR.
@slowli Generally I agree with the point of making another branch as a target - but then we won't be able to test if everything works good with this DAL separation. So, my plan was to merge this one first, assure that everything works as intended - and then the one with ConnectionPool separation (I already started to work on it here). |
Thanks for reviewing though, I will try to get it in a better state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done a review of the prover part. I think this is a great start, but it needs more work. Left a few comments. Let's iterate on that.
} | ||
} | ||
|
||
impl Drop for MethodLatency { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like generic code. I reckon we have a very similar version in Core DAL. Maybe it makes sense to abstract this logic elsewhere?
|
||
use sqlx::{postgres::types::PgInterval, types::chrono::NaiveTime}; | ||
|
||
pub const fn pg_interval_from_duration(processing_timeout: Duration) -> PgInterval { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't those some general functions? I'm sure there's a counterpart somewhere in core. Can we make it more abstract?
@@ -249,27 +26,23 @@ impl FriProverDal<'_, '_> { | |||
depth: u16, | |||
protocol_version_id: FriProtocolVersionId, | |||
) { | |||
let latency = MethodLatency::new("save_fri_prover_jobs"); | |||
//let latency = MethodLatency::new("save_fri_prover_jobs"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this missed out by mistake?
time_utils::{duration_to_naive_time, pg_interval_from_duration}, | ||
StorageProcessor, | ||
}; | ||
use crate::{duration_to_naive_time, pg_interval_from_duration}; | ||
|
||
// TODO (PLA-775): Should not be an embedded submodule in a concrete DAL file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for cleaning this debt! Also, can we get rid of the comment as well?
@@ -8,6 +8,7 @@ vise = { git = "https://github.com/matter-labs/vise.git", version = "0.1.0", rev | |||
|
|||
zksync_types = { path = "../../core/lib/types" } | |||
zksync_dal = { path = "../../core/lib/dal" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need the DAL? I'd expect us not to. I'd check on all areas of the code for this. Prover DAL should be used in all prover areas and core DAL only on Core side.
@@ -7347,6 +7369,7 @@ dependencies = [ | |||
"serde", | |||
"serde_json", | |||
"serde_with", | |||
"sqlx", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very concerning. Why do types need sqlx?
@@ -7789,6 +7789,21 @@ dependencies = [ | |||
"windows_x86_64_msvc 0.48.5", | |||
] | |||
|
|||
[[package]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder where is this coming from.
@@ -9096,6 +9178,7 @@ dependencies = [ | |||
"serde", | |||
"serde_json", | |||
"serde_with", | |||
"sqlx", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went through most of core, but at a slightly faster speed (I see Alex and Igor insisted more here).
@@ -1,202 +0,0 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious. I was expecting a lot more queries to be gone in here. Maybe something was missed?
.await | ||
.unwrap_err(); | ||
assert_matches!( | ||
err, | ||
sqlx::Error::Database(db_err) if db_err.message().contains("statement timeout") | ||
); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the tests, big kudos!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was checking this module, I have no idea what is its purpose. I reckon this code that can go altogether.
@@ -1,11 +1,13 @@ | |||
use std::{convert::TryFrom, str::FromStr}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same story about this module. Do we use it anywhere? I think this is just leftovers from some other times. Happy to remove it in a different PR (but could also go with this PR).
.sync_dal() | ||
.sync_block_inner(block_number) | ||
.await? | ||
let Some(block) = SyncDal { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've seen this all over the code. Maybe I'm wrong, but couldn't SyncDal have the Wrapper as storage and internally use the underlying structure? It'd make this code way nice.
@@ -0,0 +1,218 @@ | |||
//! Types exposed by the prover DAL for general-purpose use. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't belong here. I'd expect it to live closer to prover dal. Maybe types in there, or a prover_types crate inside prover namespace (I'll let you look around and understand it's best place). I expect everything in here to be prover instrumentation with 0 usage on the core side.
Closing in favor of #1334 to prevent decreasing code quality |
What ❔
Why ❔
For further creating typesafe connections and prevent accessing tables from other databases with one connection pool.
Checklist
zk fmt
andzk lint
.zk spellcheck
.zk linkcheck
.