Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Move prover DAL to prover workspace #1350

Closed
wants to merge 30 commits into from
Closed

Conversation

Artemka374
Copy link
Contributor

@Artemka374 Artemka374 commented Mar 5, 2024

What ❔

  • Move prover methods to prover DAL crate
  • Generalize StorageProcessor and move it to another crate
  • Rename StorageProcessor used in ConnectionPool to ConnectionOperator

Why ❔

For further creating typesafe connections and prevent accessing tables from other databases with one connection pool.

Checklist

  • PR title corresponds to the body of PR (we generate changelog entries from PRs).
  • Tests for the changes have been added / updated.
  • Documentation comments have been added / updated.
  • Code has been formatted via zk fmt and zk lint.
  • Spellcheck has been run via zk spellcheck.
  • Linkcheck has been run via zk linkcheck.

@Artemka374 Artemka374 changed the title Move prover DAL to prover workspace feat: Move prover DAL to prover workspace Mar 5, 2024
@Artemka374 Artemka374 marked this pull request as ready for review March 7, 2024 12:58
@Artemka374 Artemka374 requested a review from EmilLuta March 7, 2024 12:58
@Artemka374 Artemka374 requested a review from slowli March 7, 2024 13:10
@EmilLuta
Copy link
Contributor

EmilLuta commented Mar 7, 2024

CI seems red. I assume something might've changed in the way you start the server?

@Artemka374
Copy link
Contributor Author

It shouldn't, I didn't touch it, all the previous runs were successful though

@EmilLuta
Copy link
Contributor

EmilLuta commented Mar 8, 2024

Note for Emil -- run the unit tests locally.

@Artemka374 Artemka374 requested a review from popzxc March 8, 2024 08:14
Copy link
Member

@popzxc popzxc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a pretty shallow review for now, but overall the PR feels to be in a WIP state.

@@ -32,6 +32,8 @@ num_enum = "0.6"
hex = "0.4"
prost = "0.12.1"

sqlx = "0.7.3"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very dangerous. One of the main purposes of DAL crate was not to give direct access to SQLx to any dependent crates and make sure that all of our DB-related stuff is well encapsulated.

At the same time, zksync_types is a crate where people move stuff when they need to share something but don't know where to put it. Which means that virtually everything in zkSync stack depends on zksync_types. This crate is already pretty toxic in APIs it exposes, and it makes it even more toxic: people use what they can use and in a rush they may do hacks that are hard to fix later.

AFAICS you only need sqlx to expose chrono types, which are just reexports. Instead you can depend on the same version of chrono and simply use these types directly. Which seemingly we already do so probably you don't need this dependency at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, nice suggestion

@@ -21,6 +21,8 @@ zksync_health_check = { path = "../health_check" }
zksync_consensus_roles = { version = "0.1.0", git = "https://github.com/matter-labs/era-consensus.git", rev = "97d139969476a004c50f8b4a31ece748e5bee14e" }
zksync_consensus_storage = { version = "0.1.0", git = "https://github.com/matter-labs/era-consensus.git", rev = "97d139969476a004c50f8b4a31ece748e5bee14e" }
zksync_protobuf = { version = "0.1.0", git = "https://github.com/matter-labs/era-consensus.git", rev = "97d139969476a004c50f8b4a31ece748e5bee14e" }
zksync_db_connection = { path = "../db_connection" }
zksync_prover_dal = { path = "../../../prover/prover_dal" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to minimize the diff and will be removed in follow-up PRs, right?

@@ -16,7 +16,7 @@ use crate::{
sync_layer::{fetcher::FetchedBlock, sync_action::ActionQueueSender},
};

/// Context-aware `zksync_dal::StorageProcessor` wrapper.
/// Context-aware `zksync_dal::StorageProcessorWrapper` wrapper.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glob replace?

protocol_version_id,
)
.await;
}
drop(latency);
// drop(latency);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WIP?

Comment on lines +264 to +266
// .instrument("save_fri_proof")
// .report_latency()
// .with_arg("id", &id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

Comment on lines +118 to +119
// TODO: value is hardcoded for now, but should be in sync with ConnectionPool::global_config()
if lifetime > Duration::from_millis(5_000) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally I'd expect TODOs that aren't considered to be in scope for this exact PR to be labeled with a linear ticket ID.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also will be done in the next PR

Copy link
Member

@popzxc popzxc Mar 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I certainly get the conception of "smaller digestible PRs", but I strongly believe that if PR is going to be shipped in parts, then each part should either be complete on its own (e.g. the set of changes is isolated well enough) or all the leftover work has to be properly marked with TODOs and relevant task IDs.

The reason is that the work is very often reprioritized, and there is a non-zero chance that right after this PR you'll have to switch to something more urgent. All the leftovers, even if now you have them in mind, will be forgotten (or maybe someone else will pick this task up).

So it'd be great if you would create required tasks in linear and properly mark planned fixes with TODOs in this PR, just so that we're prepared for whatever future we'll have to meet. This is bureaucracy, of course, but IMHO useful one.

We have too many examples of people having to switch to something else. The latest I remember is the vm_utils crate, which was introduced with an idea of a cool high-level reusable interface for VM, but was left in a weird state because reasons.

@@ -8,7 +8,7 @@ use anyhow::Context as _;
use async_trait::async_trait;
use tokio::{task::JoinHandle, time::sleep};
use zksync_config::configs::FriWitnessVectorGeneratorConfig;
use zksync_dal::{fri_prover_dal::types::GpuProverInstanceStatus, ConnectionPool};
use zksync_dal::ConnectionPool;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels clunky that ConnectionPool is not a part of zksync_db_connection crate and you cannot really use zksync_prover_dal without it.

I thought that the logic would be as follows:

  • zksync_db_connection defines all the machinery required to interact with the database, and only it.
  • zksync_dal (optionally renamed to zksync_sequencer_dal) defines the sequencer DALs and only them.
  • zksync_prover_dal defines the prover DALs and only them.

With segregations completed, it'd mean that crates in the core workspace would depend only on zksync_dal and crates in the prover workspace would depend only on zksync_prover_dal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's true, but it will be done in the next PR, where we will separate ConnectionPool's, in this PR we would need a separate crates for both storage processor and connection pool, which will be avoided in the next one - so consider this one more as a preparation PR to make it more easy to review(although this one is big too)

@Artemka374
Copy link
Contributor Author

Generally some things in this PR will be reworked in the follow-up PR, so this one is kind of preparation to decrease diff and make it easier to review. And right, I just missed some comments, that I should've uncomment

Copy link
Contributor

@slowli slowli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that (doubtlessly noble) considerations about minimizing the changeset should lead to noticeably worsening code quality in main (leaky abstractions etc.). If we go with this logic, it could make sense to create a feature branch and target it.


#[derive(Debug, Metrics)]
#[metrics(prefix = "sql_connection")]
pub(crate) struct ConnectionMetrics {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect that all metrics defined in the DAL crate would be moved here, together with the instrument module and macros (the macros would need to be exported in order to work). Are there arguments against sharing these among DAL crates?

@@ -90,7 +92,7 @@ impl TracedConnections {
}
}

struct PooledStorageProcessor<'a> {
pub struct PooledStorageProcessor<'a> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worrisome that these internal structs need to be made public; looks like inappropriate abstraction boundaries. As a benchmark, this crate should export the least amount of types possible; ideally, only ConnectionPool / StorageProcessor equivalents, and helpers like macros, instrumentation etc.

Comment on lines +572 to +574
BlocksDal {
storage: self.storage,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dumb question: Could Dal structs wrap &mut StorageProcessor instead? (Obviously, it would make sense to implement fn conn(&mut self) -> &mut PgConnection to minimize the changeset, and DerefMut<Target = BasicStorageProcessor for the same purpose; although in the latter case, I'd argue that replacing self.storage with &mut self.storage.0 doesn't make the PR that harder to review.) Having to construct DALs manually looks quite clunky.

Copy link
Contributor Author

@Artemka374 Artemka374 Mar 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can't wrap StorageProcessor - that's the most problematic part of this implementation. Since we don't want to actually separate types of ConnectionPool -> StorageProcessor in this PR(that is why this exact PR was made) - we aren't able to implement these get_dal methods for StorageProcessor because we will have some cyclic dependencies there - so we are using one type that we have inside of Dal structs - and the other one to actually be able to access Dal structs. I agree that it doesn't look good though - but I couldn't find a better way to implement this, but it should return to a normal state after separating the ConnectionPool's

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this module unused? (If so, please remove.) It's logically part of prover_dal, isn't it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was checking this module, I have no idea what is its purpose. I reckon this code that can go altogether.

@@ -21,6 +21,8 @@ zksync_health_check = { path = "../health_check" }
zksync_consensus_roles = { version = "0.1.0", git = "https://github.com/matter-labs/era-consensus.git", rev = "97d139969476a004c50f8b4a31ece748e5bee14e" }
zksync_consensus_storage = { version = "0.1.0", git = "https://github.com/matter-labs/era-consensus.git", rev = "97d139969476a004c50f8b4a31ece748e5bee14e" }
zksync_protobuf = { version = "0.1.0", git = "https://github.com/matter-labs/era-consensus.git", rev = "97d139969476a004c50f8b4a31ece748e5bee14e" }
zksync_db_connection = { path = "../db_connection" }
zksync_prover_dal = { path = "../../../prover/prover_dal" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds quite counter-intuitive to depend on a local crate outside the workspace. IMO, it'd make sense to temporarily place prover_dal in the main workspace and place it in core/lib directory, and move it to a proper directory / workspace in a follow-up PR.

@Artemka374
Copy link
Contributor Author

@slowli Generally I agree with the point of making another branch as a target - but then we won't be able to test if everything works good with this DAL separation. So, my plan was to merge this one first, assure that everything works as intended - and then the one with ConnectionPool separation (I already started to work on it here).

@Artemka374
Copy link
Contributor Author

Thanks for reviewing though, I will try to get it in a better state

Copy link
Contributor

@EmilLuta EmilLuta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done a review of the prover part. I think this is a great start, but it needs more work. Left a few comments. Let's iterate on that.

}
}

impl Drop for MethodLatency {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like generic code. I reckon we have a very similar version in Core DAL. Maybe it makes sense to abstract this logic elsewhere?


use sqlx::{postgres::types::PgInterval, types::chrono::NaiveTime};

pub const fn pg_interval_from_duration(processing_timeout: Duration) -> PgInterval {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't those some general functions? I'm sure there's a counterpart somewhere in core. Can we make it more abstract?

@@ -249,27 +26,23 @@ impl FriProverDal<'_, '_> {
depth: u16,
protocol_version_id: FriProtocolVersionId,
) {
let latency = MethodLatency::new("save_fri_prover_jobs");
//let latency = MethodLatency::new("save_fri_prover_jobs");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this missed out by mistake?

time_utils::{duration_to_naive_time, pg_interval_from_duration},
StorageProcessor,
};
use crate::{duration_to_naive_time, pg_interval_from_duration};

// TODO (PLA-775): Should not be an embedded submodule in a concrete DAL file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for cleaning this debt! Also, can we get rid of the comment as well?

@@ -8,6 +8,7 @@ vise = { git = "https://github.com/matter-labs/vise.git", version = "0.1.0", rev

zksync_types = { path = "../../core/lib/types" }
zksync_dal = { path = "../../core/lib/dal" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need the DAL? I'd expect us not to. I'd check on all areas of the code for this. Prover DAL should be used in all prover areas and core DAL only on Core side.

@@ -7347,6 +7369,7 @@ dependencies = [
"serde",
"serde_json",
"serde_with",
"sqlx",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very concerning. Why do types need sqlx?

@@ -7789,6 +7789,21 @@ dependencies = [
"windows_x86_64_msvc 0.48.5",
]

[[package]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder where is this coming from.

@@ -9096,6 +9178,7 @@ dependencies = [
"serde",
"serde_json",
"serde_with",
"sqlx",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks off.

Copy link
Contributor

@EmilLuta EmilLuta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went through most of core, but at a slightly faster speed (I see Alex and Igor insisted more here).

@@ -1,202 +0,0 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious. I was expecting a lot more queries to be gone in here. Maybe something was missed?

.await
.unwrap_err();
assert_matches!(
err,
sqlx::Error::Database(db_err) if db_err.message().contains("statement timeout")
);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the tests, big kudos!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was checking this module, I have no idea what is its purpose. I reckon this code that can go altogether.

@@ -1,11 +1,13 @@
use std::{convert::TryFrom, str::FromStr};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same story about this module. Do we use it anywhere? I think this is just leftovers from some other times. Happy to remove it in a different PR (but could also go with this PR).

.sync_dal()
.sync_block_inner(block_number)
.await?
let Some(block) = SyncDal {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen this all over the code. Maybe I'm wrong, but couldn't SyncDal have the Wrapper as storage and internally use the underlying structure? It'd make this code way nice.

@@ -0,0 +1,218 @@
//! Types exposed by the prover DAL for general-purpose use.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't belong here. I'd expect it to live closer to prover dal. Maybe types in there, or a prover_types crate inside prover namespace (I'll let you look around and understand it's best place). I expect everything in here to be prover instrumentation with 0 usage on the core side.

@Artemka374
Copy link
Contributor Author

Artemka374 commented Mar 8, 2024

Closing in favor of #1334 to prevent decreasing code quality

@Artemka374 Artemka374 closed this Mar 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants