
Kademlia RecordFiltering and filter events #2163

Merged
merged 19 commits into libp2p:master on Aug 17, 2021

Conversation

rubdos
Contributor

@rubdos rubdos commented Jul 27, 2021

This is a first version of a record filtering system, which filters both PutRecord and AddProvider on top of the existing TTL and key filters. First comments would be welcome. Currently untested (waiting for my previous PR to be merged, then I can rebase and test 😉)

  • What about reporting storage failures? Document that the caller should issue Reset on the connection?

Fixes #2140
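
The idea above can be sketched with a small, self-contained model (not the actual libp2p-kad types; the real API uses `KademliaConfig::set_record_filtering` with `KademliaStoreInserts`, and the names `StoreInserts`/`Outcome` below are simplified stand-ins for illustration):

```rust
// Simplified model of the two insertion modes proposed in this PR.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum StoreInserts {
    // Inbound records go straight into the record store (previous behaviour).
    Unfiltered,
    // Inbound PutRecord and AddProvider requests are only surfaced as events;
    // the caller validates them and decides whether to insert.
    FilterBoth,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Stored(String),       // record was inserted directly
    EmittedEvent(String), // an InboundPutRecordRequest-style event was emitted
}

// Decide what happens to an inbound put request under a given mode.
fn handle_inbound_put(mode: StoreInserts, record: String) -> Outcome {
    match mode {
        StoreInserts::Unfiltered => Outcome::Stored(record),
        StoreInserts::FilterBoth => Outcome::EmittedEvent(record),
    }
}

fn main() {
    let record = "key=value".to_string();
    println!("{:?}", handle_inbound_put(StoreInserts::Unfiltered, record.clone()));
    println!("{:?}", handle_inbound_put(StoreInserts::FilterBoth, record));
}
```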

@rubdos rubdos marked this pull request as ready for review July 27, 2021 14:58
@rubdos
Contributor Author

rubdos commented Jul 28, 2021

I've tested it now, by adding

KademliaEvent::InboundPutRecordRequest { source, record, .. } => {
    log::info!(
        "Inbound put record request from {}, record {:?}",
        source,
        record
    );
    self.kademlia.store_mut().put(record).ok();
}

to my NetworkBehaviourEventProcess<KademliaEvent> handler. Seems to work as intended.

More complete demo can be observed here.

Member

@mxinden mxinden left a comment

Just a couple of smaller suggestions. This looks great already, though it would need at least one unit test and changelog entry.

/// and if deemed correct, should call [`RecordStore::put`].
///
/// Provider records are forwarded directly to the [`RecordStore`].
FilterRecords,
Member

In which case would one always want to accept provider records, but double check normal records? Would one not always want to check both? In case there is no concrete use-case today, I would prefer not offering the additional option.

Contributor Author

Should the FilterBoth then be renamed to Filter?

Contributor Author

Looks like my provider_received change wasn't even correct... if self.record_filtering != KademliaStoreInserts::Unfiltered

@rubdos
Contributor Author

rubdos commented Aug 3, 2021

Note to self wrt 55cc8d5 , Rust Analyzer "rename" doesn't take into account doc links yet.

I left one last question in your review.

Still remaining:

  • What kind of unit test do we want here? Some kind of copy-paste from the put test, which uses the filtering setting instead?
  • What about failed put requests: should we have the end user signal such a failure to the requesting node? Currently we return OK, even before the record is properly inserted into the record store.

Member

@mxinden mxinden left a comment

Should the FilterBoth then be renamed to Filter?

I am fine either way. I guess FilterBoth allows us to introduce FilterRecord and FilterProvider in the future with only a semi-breaking change.

What kind of unit test do we want here? Some kind of copy-paste from the put test, which uses the filtering setting instead?

👍

What about failed put requests: should we have the end user signal such a failure to the requesting node? Currently we return OK, even before the record is properly inserted into the record store.

I don't think this is necessary and might even open up an attack vector. That said, I am happy to revisit this in case someone has the need for it in the future.
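
The caller-side contract discussed here ("if deemed correct, should call `RecordStore::put`", with no failure signalled back to the requester) can be sketched in a self-contained way. This is not libp2p code: `MemoryStore` is a toy stand-in for the real `RecordStore`, and `is_valid` is a hypothetical application-specific check:

```rust
use std::collections::HashMap;

// Toy in-memory store standing in for libp2p-kad's RecordStore.
struct MemoryStore {
    records: HashMap<String, Vec<u8>>,
}

impl MemoryStore {
    fn new() -> Self {
        MemoryStore { records: HashMap::new() }
    }

    // Stand-in for RecordStore::put; a real store may fail (e.g. capacity).
    fn put(&mut self, key: String, value: Vec<u8>) -> Result<(), &'static str> {
        self.records.insert(key, value);
        Ok(())
    }
}

// Hypothetical validation rule: reject empty values.
fn is_valid(value: &[u8]) -> bool {
    !value.is_empty()
}

// Handle an inbound put request event: store only records that pass
// validation. As discussed above, a rejected or failed put is NOT signalled
// back to the requesting node; the request already looked successful to it.
fn on_inbound_put(store: &mut MemoryStore, key: String, value: Vec<u8>) -> bool {
    if is_valid(&value) {
        store.put(key, value).is_ok()
    } else {
        false
    }
}

fn main() {
    let mut store = MemoryStore::new();
    let accepted = on_inbound_put(&mut store, "good".to_string(), vec![1, 2, 3]);
    let rejected = on_inbound_put(&mut store, "bad".to_string(), Vec::new());
    println!("accepted: {}, rejected: {}", accepted, rejected);
}
```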

@mxinden
Member

mxinden commented Aug 17, 2021

Thank you for the latest changes.

Note that we merged #2188 recently. To reduce merge conflict size you might want to try first formatting libp2p-kad and then merging latest master.

Only thing missing from my side would be a simple unit test.

@rubdos
Contributor Author

rubdos commented Aug 17, 2021

Awesome, rustfmt! Thanks for that :-)

Expect a test soon!

@rubdos
Contributor Author

rubdos commented Aug 17, 2021

Note that we merged #2188 recently. To reduce merge conflict size you might want to try first formatting libp2p-kad and then merging latest master.

I've instead rebased and formatted every commit, because rustfmt+merge was a daunting merge conflict...

@rubdos
Contributor Author

rubdos commented Aug 17, 2021

I've added two tests, one that filters out the records and asserts that they cannot be found (this is only true on an honest network, which is the case in the test), and another that persists the record.

I'm not proud of the copy-pasting that's going on, but I feel that refactoring those tests is maybe something for another day?

@@ -695,6 +695,407 @@ fn put_record() {
QuickCheck::new().tests(3).quickcheck(prop as fn(_, _) -> _)
}

/// A node joining a fully connected network cannot store a record if the nodes ignore it through
Member

A bit late to suggest this, sorry about that. Still: How about integrating this test into put_record by adding something along the lines of:

        if rng.gen() {
            config.set_record_filtering(KademliaStoreInserts::FilterBoth);
        }

And then, in case one receives an InboundPutRecordRequest, double check that config.record_filtering was set to FilterBoth.

That would reduce the amount of code required, no?

Contributor Author

Do you have some coverage analysis in place to check that these code paths are hit? Using an rng for that sounds scary :'-)

I'll implement it like that though, sounds a lot cleaner. I'll test it using provider records too.

Member

You are right. Let's get rid of the RNG. How about the diff below + optionally setting set_record_filtering in the TestConfig Arbitrary implementation?

diff --git a/protocols/kad/src/behaviour/test.rs b/protocols/kad/src/behaviour/test.rs
index 110c1e22..b27fd076 100644
--- a/protocols/kad/src/behaviour/test.rs
+++ b/protocols/kad/src/behaviour/test.rs
@@ -488,24 +488,28 @@ fn get_record_not_found() {
 /// is equal to the configured replication factor.
 #[test]
 fn put_record() {
-    fn prop(records: Vec<Record>, seed: Seed) {
-        let mut rng = StdRng::from_seed(seed.0);
-        let replication_factor =
-            NonZeroUsize::new(rng.gen_range(1, (K_VALUE.get() / 2) + 1)).unwrap();
-        // At least 4 nodes, 1 under test + 3 bootnodes.
-        let num_total = usize::max(4, replication_factor.get() * 2);
-
-        let mut config = KademliaConfig::default();
-        config.set_replication_factor(replication_factor);
-        if rng.gen() {
-            config.disjoint_query_paths(true);
+    #[derive(Clone, Debug)]
+    struct TestConfig(KademliaConfig);
+
+    impl Arbitrary for TestConfig {
+        fn arbitrary<G: Gen>(g: &mut G) -> TestConfig {
+            let mut config = KademliaConfig::default();
+            config.set_replication_factor(NonZeroUsize::new(g.gen_range(1, (K_VALUE.get() / 2) + 1)).unwrap());
+            config.disjoint_query_paths(g.gen());
+            TestConfig(config)
         }
+    }
+
+    fn prop(records: Vec<Record>, config: TestConfig) {
+        let config = config.0;
+        // At least 4 nodes, 1 under test + 3 bootnodes.
+        let num_total = usize::max(4, config.query_config.replication_factor.get() * 2);
 
         let mut swarms = {
             let mut fully_connected_swarms =
                 build_fully_connected_nodes_with_config(num_total - 1, config.clone());
 
-            let mut single_swarm = build_node_with_config(config);
+            let mut single_swarm = build_node_with_config(config.clone());
             // Connect `single_swarm` to three bootnodes.
             for i in 0..3 {
                 single_swarm.1.behaviour_mut().add_address(
@@ -584,7 +588,7 @@ fn put_record() {
                         ))) => {
                             assert!(qids.is_empty() || qids.remove(&id));
                             assert!(stats.duration().is_some());
-                            assert!(stats.num_successes() >= replication_factor.get() as u32);
+                            assert!(stats.num_successes() >= config.query_config.replication_factor.get() as u32);
                             assert!(stats.num_requests() >= stats.num_successes());
                             assert_eq!(stats.num_failures(), 0);
                             match res {
@@ -635,7 +639,7 @@ fn put_record() {
 
                 let expected = expected
                     .into_iter()
-                    .take(replication_factor.get())
+                    .take(config.query_config.replication_factor.get())
                     .collect::<HashSet<_>>();
 
                 let actual = swarms
@@ -650,7 +654,7 @@ fn put_record() {
                     })
                     .collect::<HashSet<_>>();
 
-                assert_eq!(actual.len(), replication_factor.get());
+                assert_eq!(actual.len(), config.query_config.replication_factor.get());
 
                 let actual_not_expected = actual.difference(&expected).collect::<Vec<&PeerId>>();
                 assert!(

Contributor Author

With only 3 tests, I have a test case that doesn't get hit. With four, it does seem to hit all paths. I'll add two boolean parameters to the test properties instead, although I still have to increase the test count.

Comment on lines 1050 to 1051
// Consume the results, checking that each record was replicated
// correctly to the closest peers to the key.
Member

This is outdated, right?

Contributor Author

Not any more with the new approach, I think? Or do you mean to rewrite the comment itself?

Comment on lines 1061 to 1077
let mut expected = swarms
.iter()
.skip(1)
.map(Swarm::local_peer_id)
.cloned()
.collect::<Vec<_>>();
expected.sort_by(|id1, id2| {
kbucket::Key::from(*id1)
.distance(&key)
.cmp(&kbucket::Key::from(*id2).distance(&key))
});

let expected = expected
.into_iter()
.take(replication_factor.get())
.collect::<HashSet<_>>();

Member

Is this needed?

@mxinden
Member

mxinden commented Aug 17, 2021

With only 3 tests, I have a test case that doesn't get hit. With four, it does seem to hit all paths. I'll add two boolean parameters to the test properties instead, although I still have to increase the test count.

Note that this one test (put_record) includes many more permutations than 3. The goal of quickcheck here is not to exhaust all permutations on each execution, but instead to probabilistically exhaust all permutations across many runs (e.g. locally, on each pull request, on each merge, ...). It is a trade-off between running time and coverage. Just want to make sure there is no confusion. I am not opposed to increasing it to 4.

I am fine with either the additional bool parameters or the TestConfig. As you wish.
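
The coverage trade-off described above can be illustrated with a self-contained sketch (not libp2p or quickcheck code; the tiny LCG below just mimics a deterministically seeded generator): with two boolean test parameters there are 4 combinations, and random sampling only probabilistically covers all 4 within a small number of runs, while exhaustive enumeration always does.

```rust
// Count how many of the 4 (bool, bool) combinations a set of samples covers.
fn combos_hit(samples: &[(bool, bool)]) -> usize {
    let mut seen = [false; 4];
    for &(a, b) in samples {
        seen[((a as usize) << 1) | b as usize] = true;
    }
    seen.iter().filter(|&&s| s).count()
}

// Draw n pseudo-random parameter pairs from a minimal linear congruential
// generator, standing in for a seeded quickcheck run. Constants are arbitrary.
fn random_samples(n: usize, mut seed: u64) -> Vec<(bool, bool)> {
    (0..n)
        .map(|_| {
            seed = seed.wrapping_mul(6364136223846793005).wrapping_add(1);
            ((seed >> 62) & 1 == 1, (seed >> 61) & 1 == 1)
        })
        .collect()
}

// Exhaustive enumeration always covers all 4 combinations.
fn exhaustive_samples() -> Vec<(bool, bool)> {
    vec![(false, false), (false, true), (true, false), (true, true)]
}

fn main() {
    println!("4 random runs hit {} combos", combos_hit(&random_samples(4, 42)));
    println!("exhaustive hits {} combos", combos_hit(&exhaustive_samples()));
}
```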

@rubdos
Contributor Author

rubdos commented Aug 17, 2021

Note that this one test (put_record) includes many more permutations than 3. The goal of quickcheck here is not exhaust all permutations on each execution, but instead probablistically exhaust all permutations across many runs

Aha. I got the feeling, however, that it used a deterministically seeded RNG behind the scenes, because I couldn't reproduce on 3 but it was stable on 4...

Are you fine with the way it turned out now? I've added two parameters and increased the test count to 4...

Member

@mxinden mxinden left a comment

Thanks Ruben for the contributions and for bearing with me :)

@mxinden mxinden merged commit c58f697 into libp2p:master Aug 17, 2021
@rubdos
Contributor Author

rubdos commented Aug 17, 2021

Thanks Ruben for the contributions and for bearing with me :)

Very glad to be of use. It's already been a pleasure to work with you folks!

@rubdos rubdos deleted the kademlia-filtering branch August 17, 2021 14:58
dvc94ch added a commit to dvc94ch/rust-libp2p that referenced this pull request Sep 14, 2021
* protocols/gossipsub: Fix inconsistency in mesh peer tracking (libp2p#2189)

Co-authored-by: Age Manning <Age@AgeManning.com>

* misc/metrics: Add auxiliary crate to record events as OpenMetrics (libp2p#2063)

This commit adds an auxiliary crate recording protocol and Swarm events
and exposing them as metrics in the OpenMetrics format.

* README: Mention security@ipfs.io

* examples/: Add file sharing example (libp2p#2186)

Basic file sharing application with peers either providing or locating
and getting files by name.

While obviously showcasing how to build a basic file sharing
application, the actual goal of this example is **to show how to
integrate rust-libp2p into a larger application**.

Architectural properties

- Clean clonable async/await interface ([`Client`]) to interact with the
network layer.

- Single task driving the network layer, no locks required.

* examples/README: Give an overview over the many examples (libp2p#2194)

* protocols/kad: Enable filtering of (provider) records (libp2p#2163)

Introduce `KademliaStoreInserts` option, which allows filtering of records.

Co-authored-by: Max Inden <mail@max-inden.de>

* swarm/src/protocols_handler: Impl ProtocolsHandler on either::Either (libp2p#2192)

Implement ProtocolsHandler on either::Either representing either of two
ProtocolsHandler implementations.

Co-authored-by: Thomas Eizinger <thomas@eizinger.io>

* *: Make libp2p-core default features optional (libp2p#2181)

Co-authored-by: Max Inden <mail@max-inden.de>

* core/: Remove DisconnectedPeer::set_connected and Pool::add (libp2p#2195)

This logic seems to be a leftover of
libp2p#889 and unused today.

* protocols/gossipsub: Use ed25519 in tests (libp2p#2197)

With f2905c0 the secp256k1 feature is
disabled by default. Instead of enabling it in the dev-dependency,
simply use ed25519.

* build(deps): Update minicbor requirement from 0.10 to 0.11 (libp2p#2200)

Updates the requirements on [minicbor](https://gitlab.com/twittner/minicbor) to permit the latest version.
- [Release notes](https://gitlab.com/twittner/minicbor/tags)
- [Changelog](https://gitlab.com/twittner/minicbor/blob/master/CHANGELOG.md)
- [Commits](https://gitlab.com/twittner/minicbor/compare/minicbor-v0.10.0...minicbor-v0.11.0)

---
updated-dependencies:
- dependency-name: minicbor
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps): Update salsa20 requirement from 0.8 to 0.9 (libp2p#2206)

* build(deps): Update salsa20 requirement from 0.8 to 0.9

Updates the requirements on [salsa20](https://github.com/RustCrypto/stream-ciphers) to permit the latest version.
- [Release notes](https://github.com/RustCrypto/stream-ciphers/releases)
- [Commits](RustCrypto/stream-ciphers@ctr-v0.8.0...salsa20-v0.9.0)

---
updated-dependencies:
- dependency-name: salsa20
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* *: Bump pnet to v0.22

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Max Inden <mail@max-inden.de>

* *: Dial with handler and return handler on error and closed (libp2p#2191)

Require `NetworkBehaviourAction::{DialPeer,DialAddress}` to contain a
`ProtocolsHandler`. This allows a behaviour to attach custom state to its
handler. The behaviour would no longer need to track this state separately
during connection establishment, thus reducing state required in a behaviour.
E.g. in the case of `libp2p-kad` the behaviour can include a `GetRecord` request
in its handler, or e.g. in the case of `libp2p-request-response` the behaviour
can include the first request in the handler.

Return `ProtocolsHandler` on connection error and close. This allows a behaviour
to extract its custom state previously included in the handler on connection
failure and connection closing. E.g. in the case of `libp2p-kad` the behaviour
could extract the attached `GetRecord` from the handler of the failed connection
and then start another connection attempt with a new handler with the same
`GetRecord` or bubble up an error to the user.

Co-authored-by: Thomas Eizinger <thomas@eizinger.io>

* core/: Remove deprecated read/write functions (libp2p#2213)

Co-authored-by: Max Inden <mail@max-inden.de>

* protocols/ping: Revise naming of symbols (libp2p#2215)

Co-authored-by: Max Inden <mail@max-inden.de>

* protocols/rendezvous: Implement protocol (libp2p#2107)

Implement the libp2p rendezvous protocol.

> A lightweight mechanism for generalized peer discovery. It can be used for
bootstrap purposes, real time peer discovery, application specific routing, and
so on.

Co-authored-by: rishflab <rishflab@hotmail.com>
Co-authored-by: Daniel Karzel <daniel@comit.network>

* core/src/network/event.rs: Fix typo (libp2p#2218)

* protocols/mdns: Do not fire all timers at the same time. (libp2p#2212)

Co-authored-by: Max Inden <mail@max-inden.de>

* misc/metrics/src/kad: Set query_duration lowest bucket to 0.1 sec (libp2p#2219)

Probability for a Kademlia query to return in less than 100 milliseconds
is low, thus increasing the lower bucket to improve accuracy within the
higher ranges.

* misc/metrics/src/swarm: Expose role on connections_closed (libp2p#2220)

Expose whether closed connection was a Dialer or Listener.

* .github/workflows/ci.yml: Use clang 11 (libp2p#2233)

* protocols/rendezvous: Update prost (libp2p#2226)

Co-authored-by: Max Inden <mail@max-inden.de>

* *: Fix clippy warnings (libp2p#2227)

* swarm-derive/: Make event_process = false the default (libp2p#2214)

Co-authored-by: Max Inden <mail@max-inden.de>

Co-authored-by: Max Inden <mail@max-inden.de>
Co-authored-by: Age Manning <Age@AgeManning.com>
Co-authored-by: Ruben De Smet <ruben.de.smet@rubdos.be>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: rishflab <rishflab@hotmail.com>
Co-authored-by: Daniel Karzel <daniel@comit.network>
Co-authored-by: David Craven <david@craven.ch>
Successfully merging this pull request may close these issues.

Kademlia: record validation