Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented protobuf equivalent of network_protocol.rs #6672

Merged
merged 10 commits into from
May 4, 2022

Conversation

pompon0
Copy link
Contributor

@pompon0 pompon0 commented Apr 22, 2022

See the design: https://docs.google.com/document/d/1gCWmt9O-h_-5JDXIqbKxAaSS3Q9pryB1f9DDY1mMav4/edit
Abstracted away the network protocol format for smooth borsh -> protobuf transition.
Implemented tests of cross-version communication.

@pompon0 pompon0 requested a review from a team as a code owner April 22, 2022 15:10
@pompon0 pompon0 requested a review from matklad April 22, 2022 15:10
Copy link
Contributor

@matklad matklad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really, really great! I finally feel like we are fixing stuff, rather than piling on work-arounds, major kudos here!

One thing I am really unsure here is error handling strategy. I don't like how anyhow::Result makes it impossible to understand what actually goes wrong. Like, did we critically messed up our serialization logic, or is it just a transient socket error? At the same time, network is close enough to the edge of the system, where "just log and move on" is the right error handling strategy, so I also can't say that I am 100% sure that anyhow is a wrong choice here.

I wonder what @nagisa thinks here, they do have opinions on error handling: https://kazlauskas.me/entries/errors :-)

Note: I didn't do line by line review for correctness here, as I am relatively unfamiliar with the net (hence no approval). Can do one if needed!

@@ -0,0 +1,3 @@
fn main() -> std::io::Result<()> {
prost_build::compile_protos(&["src/network_protocol/network.proto"], &["src/"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a pure rust solution, or would it depend on protoc and stuff being in path? If it is the latter, I would suggest to maybe just commit generated protobuf code, to make sure that git clone && cargo build works without extra steps?

No super strong opinion here.

Copy link
Contributor Author

@pompon0 pompon0 Apr 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not pure, I somehow missed that fact, sorry. We have the following options available:

  • commit the generated code - it is not that bad due to the fact that the generated files are small (they use "derive" a lot - yay a 2-staged code generation :P).
  • enable "vendored" feature of the "prost-build" package - it removes the dependency on "protoc" in PATH, but introduces dependency on "cmake" and increases the build time
  • [INFEASIBLE - dynamically linked] add a pre-pagacked protoc binary as a dependency
  • get rid of prost package and use protobuf package instead. It contains a rust implementation of protoc.

I really dislike committing generated code to repo. Having a dependency on cmake also sounds bad to me. I'd go with the prepackaged protoc, as a well tested proto compiler, but maybe such a binary blob is not welcome in our open source blockchain technology? WDYT?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tokio-rs/prost#182 is a nasty failure mode I had encountered in the past with prost but it seems that they no longer execute arbitrary code downloaded off internet at build time by default. I'd still petition for vendoring. build.rs is a terrible horrible thing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nagisa could you clarify what you mean by "vendoring"? "vendoring" source of prococ and compiling that (prost-build+vendored) or "vendoring" compiled protoc binary ("protobuf" package)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI protoc-bin-vendored doesn't make much sense, because it is dynamically linked. We would have to create our own package with statically linked protoc for that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Err, I meant checking in the generated sources, my bad.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To unblock this, let's maybe leave this thing as is? "Do what the docs suggest, even if you think you know better" is a strong heuristic. We can revisit this later.

If we do revisit:

  • I'd be against adding binaries: I'd rather have protoc in PATH than in the repo :-)
  • I do like checking in generated sources, provided that there's a test verifying the generated sources are fresh. But that's a contrarian opinion which is a bit exhaustive to argue :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've implemented a checksum protection for both the proto file and the checked-in generated sources (will push it tomorrow with the rest of stuff). In the meantime, PTAL at: https://github.com/pompon0/protoc-static . I've implemented a github workflow statically compiling and releasing a protoc binary. It supports now x64: linux and macOS, with some bash tuning Windows can be supported as well, and with some more work I could get it to cross compile as well to other architectures (with zig-cc). It is a 6MB binary which could be downloaded and checksummed from build.rs (in a bazel-like fashion). WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aaand, it turns out that the thing is worse than I expected.
prost-build (the package needed for code generation) actually checks for the presence of protoc in the system during build of the package itself and panics if it doesn't find it. I had to make the dependency on protoc optional and use conditional compilation to make it work :/

chain/network/src/network_protocol/borsh_conv.rs Outdated Show resolved Hide resolved
chain/network/src/network_protocol/mod.rs Outdated Show resolved Hide resolved
chain/network/src/network_protocol/mod.rs Outdated Show resolved Hide resolved
chain/network/src/network_protocol/mod.rs Outdated Show resolved Hide resolved
chain/network/src/network_protocol/proto_conv.rs Outdated Show resolved Hide resolved
chain/network/src/network_protocol/proto_conv.rs Outdated Show resolved Hide resolved
chain/network/src/peer/peer_actor.rs Outdated Show resolved Hide resolved
chain/network/src/peer/peer_actor.rs Outdated Show resolved Hide resolved
chain/network/src/tests/util.rs Outdated Show resolved Hide resolved
@matklad matklad requested a review from nagisa April 22, 2022 18:40
Copy link
Collaborator

@nagisa nagisa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My feedback on error stuff ^^

I definitely think a concrete error type would be preferable, but also I can see there not necessarily being much value in defining one if we don't anticipate callers to actually introspect the contents of it. Were we work on adding error structs here I would also prefer having a dedicated type for each conversion method, which is a ton of work, so if there isn't an immediate use-case for it, it is probably not super worth the effort.

chain/network/src/network_protocol/borsh_conv.rs Outdated Show resolved Hide resolved
Comment on lines 41 to 42
hash: (|| CryptoHash::try_from(p.hash.as_ref().context("missing")?))()
.context("hash")?,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the context of how this context could be presented to the user, I think these messages should be expanded to be more descriptive than a single word. Perhaps something like

  • the hash field in the GenesisId value is missing
  • could not convert CryptoHash / could not parse CryptoHash 1

in this specific case. More descriptive messages would also obviate the need for the (||)() pattern.

Footnotes

  1. note that the error context should be describing the operation that occurs in the fallible expression, not the surrounding code. That is, in hash: foo.context(...)? the context should be describing what kind of operation foo is, and not hash: foo.context(...)? or even a broader context. In practice not maintaining a specific polarity in adding context can lead to situations where you end up with both missing context and duplicated context. While not a problem here, contextualizing strictly the failing expression also helps to avoid guessing the callers' intent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re [1]: that doesn't make sense to me. If I have a function:
fn bar() -> Result {
...
foo().context(...)? <- X
...
}
Then while passing the error from line X to the caller of bar() (via operator "?") I need to add some information why foo() was called at X in the first place - otherwise if I have 2 similar calls done for different reasons from bar(), the error message won't help me determine the precise root cause. I agree that a consistent polarity should be established, but IMO the whole context that the error is returned from (in that case, the call to bar()) should be included.

impl TryFrom<&proto::PublicKey> for PeerId {
type Error = anyhow::Error;
fn try_from(p: &proto::PublicKey) -> anyhow::Result<Self> {
Ok(Self::try_from_slice(&p.borsh)?)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing context?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call is so trivial that IMO there is no extra context to add that could be useful (i.e. it would rather obscure the information in the cause chain). The proto message types that I introduced solely for wrapping borsh should rather be treated just like one-field tuple types in rust / newtype in Haskell - extra information to the type system, but not really for the user.

chain/network/src/network_protocol/network.proto Outdated Show resolved Hide resolved
chain/network/src/peer/peer_actor.rs Show resolved Hide resolved
chain/network/src/network_protocol/mod.rs Outdated Show resolved Hide resolved
@@ -105,6 +103,11 @@ pub(crate) struct PeerActor {
routed_message_cache: LruCache<(PeerId, PeerIdOrHash, Signature), Instant>,
/// A helper data structure for limiting reading
throttle_controller: ThrottleController,
/// Whether we detected support for protocol buffers during handshake.
protocol_buffers_supported: bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in theory this should be Optional, right?

(as there are 3 states: we don't know, supported, not supported

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the full name of this field should be rather: protocol_buffers_support_detected (true/false)

chain/network/src/peer/peer_actor.rs Outdated Show resolved Hide resolved
chain/network/src/peer/peer_actor.rs Outdated Show resolved Hide resolved
EpochSyncFinalizationRequest(EpochId),
}

struct FakeActor {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you added couple cool test classes (FakeActor, ActixSystem etc) -- let's document them (either within each .rs file or create a README.md within this directory).

With info on what these classes do and couple examples on how they can be used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FakeActor is private and I'm not sure whether it should be reusable in a way different than through the public interface which is currently exposed. I would like to make ActixSystem reusable, but I'd have to move it outside of the network package probably - let's do that in a separate PR.

chain/network/src/tests/util.rs Show resolved Hide resolved
Comment on lines 23 to 32
let up_to_date = (|| {
let generated = std::fs::read_to_string(generated_path).ok()?;
let generated = generated.strip_prefix(&want_prefix)?;
let (prefix, content) = generated.split_once('\n')?;
if format!("{prefix}\n") != content_prefix(content) {
return None;
}
Some(())
})()
.is_some();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you didn't use starts_with?

Suggested change
let up_to_date = (|| {
let generated = std::fs::read_to_string(generated_path).ok()?;
let generated = generated.strip_prefix(&want_prefix)?;
let (prefix, content) = generated.split_once('\n')?;
if format!("{prefix}\n") != content_prefix(content) {
return None;
}
Some(())
})()
.is_some();
let up_to_date = std::fs::read_to_string(generated_path).and_then(|s| {
s.starts_with(&want_prefix)
}).unwrap_or(false);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is more tricky than that. In addition to the hash of the proto file, I'm prepending the hash of the generated code to the file with the generated code ITSELF. So that if someone modifies the generated code by mistake, the compiler will should on them. To verify this hash, I need to first strip it from the file content, then compute the hash of the content remainder, and only then compare hashes.

core/primitives-core/src/hash.rs Outdated Show resolved Hide resolved
@nagisa
Copy link
Collaborator

nagisa commented Apr 28, 2022

Overall LGTM from my end.

Copy link
Contributor

@matklad matklad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple of non-blocking comments, but overall it looks good to me, and, given its size, we might want to merge it sooner and do any fixes, if any, in follow-ups.

chain/network/build.rs Outdated Show resolved Hide resolved
@@ -0,0 +1,49 @@
use sha2::Digest;

fn content_prefix(content: &str) -> String {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a weak consensus for top-down code: placing entry-points and public API at the top, and helpers and impl details at the bottom:

https://near.zulipchat.com/#narrow/stream/300659-Rust-.F0.9F.A6.80/topic/Code.20style.20question/near/260109589

Comment on lines +68 to +120
pub fn make_account_id<R: Rng>(rng: &mut R) -> AccountId {
format!("account{}", rng.gen::<u32>()).parse().unwrap()
}

pub fn make_signer<R: Rng>(rng: &mut R) -> InMemorySigner {
let account_id = make_account_id(rng);
InMemorySigner::from_seed(account_id.clone(), KeyType::ED25519, &account_id)
}

pub fn make_validator_signer<R: Rng>(rng: &mut R) -> InMemoryValidatorSigner {
let account_id = make_account_id(rng);
InMemoryValidatorSigner::from_seed(account_id.clone(), KeyType::ED25519, &account_id)
}

pub fn make_addr<R: Rng>(rng: &mut R) -> SocketAddr {
SocketAddr::V4(SocketAddrV4::new(Ipv4Addr::LOCALHOST, rng.gen()))
}

pub fn make_peer_info<R: Rng>(rng: &mut R) -> PeerInfo {
let signer = make_signer(rng);
PeerInfo {
id: PeerId::new(signer.public_key),
addr: Some(make_addr(rng)),
account_id: Some(signer.account_id),
}
}

pub fn make_announce_account<R: Rng>(rng: &mut R) -> AnnounceAccount {
let peer_id = PeerId::new(make_signer(rng).public_key);
let validator_signer = make_validator_signer(rng);
let signature = validator_signer.sign_account_announce(
validator_signer.validator_id(),
&peer_id,
&EpochId::default(),
);
AnnounceAccount {
account_id: validator_signer.validator_id().clone(),
peer_id: peer_id,
epoch_id: EpochId::default(),
signature,
}
}

pub fn make_partial_edge<R: Rng>(rng: &mut R) -> PartialEdgeInfo {
let a = make_signer(rng);
let b = make_signer(rng);
PartialEdgeInfo::new(
&PeerId::new(a.public_key),
&PeerId::new(b.public_key),
rng.gen(),
&a.secret_key,
)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using https://docs.rs/arbitrary/latest/arbitrary/ for generating random data.

If we want to stop at a single randomized test here, using rand is fine. If we want to grow this infrae to a long term randomizied testing infrastructure, I would feel rather strongly that we should base the thing on arbitrary.

arbitrary is like rand, but:

  • it is a finite PRNG, so it's easy to vary length of input to vary the length of the test
  • it can be easily plugged into coverage-guided fuzzing
  • it provides derive for simple cases
  • it's a better crate than rand overall (simpler API, no generics(compile time), no dependencies)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if I need more parameters than just an entropy source to construct test input? Like consistent clock timestamps for example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT arbitrary assumes that data is generated independently for each branch, which is a feeble assumption IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in a sense, that depending on the property being fuzzed, a different distribution of input data might be desirable. Otherwise you just test whether your code is able to reject obviously invalid data gracefully.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a different distribution of input data might be desirable.

Indeed, and that's feasible with arbitrary: you don't have to implement Arbitrary trait, the main value comes from the Unstructured type:

fn make_strings(u: &mut arbitrary::Unstructured<'_>, alphabet: &str) -> arbitrary::Result<Vec<String>>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.rs/wasm-smith/latest/wasm_smith/ would be a cool example of using highly-skewed distributions, esp SwarmConfig: https://docs.rs/wasm-smith/latest/wasm_smith/struct.SwarmConfig.html

chain/network/src/tests/data.rs Outdated Show resolved Hide resolved
Comment on lines +92 to +96
// Encodings should never be compatible.
for (from, to) in [(Encoding::Proto, Encoding::Borsh), (Encoding::Borsh, Encoding::Proto)] {
for m in &msgs {
let bytes = &m.serialize(from);
match PeerMessage::deserialize(to, bytes) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you'd like to fuzz this property, ask @Ekleog

chain/network/src/tests/peer_actor.rs Outdated Show resolved Hide resolved
chain/network/src/tests/util.rs Outdated Show resolved Hide resolved
@@ -7,9 +7,20 @@ edition = "2021"
rust-version = "1.60.0"
publish = false

[build-dependencies]
prost-build = { version = "0.10"}
sha2 = "0.9"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use near_stable_hasher here given that we already have it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sha256 is very stable as well :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not about stability, it's about not having logical duplicates. We already use near_stable_hasher for near-network so there's no need to add sha2 on top of that. For this use-case, stable-hasher and sha2 play equivalent roles -- provide some hash you can use for fingerprinting.

@pompon0 pompon0 force-pushed the gprusak-network-protobufs branch 2 times, most recently from 15e9d1d to 85e82ae Compare May 4, 2022 08:33
pompon0 and others added 9 commits May 4, 2022 10:33
Abstracted away the network protocol format for smooth borsh -> protobuf
transition.
Implemented tests of cross-version communication.
Co-authored-by: Aleksey Kladov <aleksey.kladov@gmail.com>
hash-protected proto code generation
@pompon0 pompon0 force-pushed the gprusak-network-protobufs branch 3 times, most recently from 4375ce5 to 432c20c Compare May 4, 2022 09:10
@near-bulldozer near-bulldozer bot merged commit 0d879ac into master May 4, 2022
@near-bulldozer near-bulldozer bot deleted the gprusak-network-protobufs branch May 4, 2022 09:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants