Skip to content
This repository has been archived by the owner on Oct 23, 2022. It is now read-only.

Update libp2p to v0.43.0 #499

Merged
merged 27 commits into from
Apr 1, 2022
Merged

Update libp2p to v0.43.0 #499

merged 27 commits into from
Apr 1, 2022

Conversation

rand0m-cloud
Copy link
Contributor

@rand0m-cloud rand0m-cloud commented Mar 6, 2022

This PR updates the dependency on libp2p. This is needed because currently building ipfs-http fails with an ambiguous type error inside of libp2p.

@rand0m-cloud rand0m-cloud force-pushed the libp2p_update branch 2 times, most recently from e049b6a to a0ae83d Compare March 6, 2022 18:56
@rand0m-cloud
Copy link
Contributor Author

rand0m-cloud commented Mar 6, 2022

error[E0053]: method `poll` has an incompatible type for trait
   --> src/p2p/pubsub.rs:328:10
    |
328 |     ) -> Poll<PubsubNetworkBehaviourAction> {
    |          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |          |
    |          expected enum `Void`, found enum `floodsub::layer::InnerMessage`
    |          help: change the output type to match the trait: `Poll<libp2p::libp2p_swarm::NetworkBehaviourAction<Void, OneShotHandler<FloodsubProtocol, FloodsubRpc, floodsub::layer::InnerMessage>, FloodsubRpc>>`
    |
    = note: expected fn pointer `fn(&mut Pubsub, &mut std::task::Context<'_>, &mut impl PollParameters) -> Poll<libp2p::libp2p_swarm::NetworkBehaviourAction<Void, _, _>>`
               found fn pointer `fn(&mut Pubsub, &mut std::task::Context<'_>, &mut impl PollParameters) -> Poll<libp2p::libp2p_swarm::NetworkBehaviourAction<floodsub::layer::InnerMessage, _, _>>`

I'm not sure how to approach this error. Resolved.

@rand0m-cloud
Copy link
Contributor Author

rand0m-cloud commented Mar 7, 2022

@mxinden I believe these are issues stemming from libp2p.

error[E0599]: no method named `into_inner` found for struct `KademliaHandler` in the current scope
  --> src/p2p/behaviour.rs:25:10
   |
25 | #[derive(libp2p::NetworkBehaviour)]
   |          ^^^^^^^^^^^^^^^^^^^^^^^^ method not found in `KademliaHandler<QueryId>`
   |
   = note: this error originates in the derive macro `libp2p::NetworkBehaviour` (in Nightly builds, run with -Z macro-backtrace for more info)

error[E0599]: no method named `into_inner` found for struct `KademliaHandlerProto` in the current scope
  --> src/p2p/behaviour.rs:25:10
   |
25 | #[derive(libp2p::NetworkBehaviour)]
   |          ^^^^^^^^^^^^^^^^^^^^^^^^ method not found in `KademliaHandlerProto<QueryId>`
   |
   = note: this error originates in the derive macro `libp2p::NetworkBehaviour` (in Nightly builds, run with -Z macro-backtrace for more info)

For this struct

#[derive(libp2p::NetworkBehaviour)]
#[behaviour(out_event = "BehaviourEvent")]
pub struct Behaviour<Types: IpfsTypes> {
    #[behaviour(ignore)]
    repo: Arc<Repo<Types>>,
    // mdns: Toggle<TokioMdns>,
    kademlia: Kademlia<MemoryStore>,
    #[behaviour(ignore)]
    kad_subscriptions: SubscriptionRegistry<KadResult, String>,
    bitswap: Bitswap,
    ping: Ping,
    identify: Identify,
    pubsub: Pubsub,
    pub swarm: SwarmApi,
}

@rand0m-cloud
Copy link
Contributor Author

https://github.com/libp2p/rust-libp2p/blob/a168410dbed0d0941f2e5a14543206044ccb2260/swarm/Cargo.toml#L6

Isn't libp2p-swarm supposed to be version 0.35.0?

error: failed to select a version for the requirement `libp2p-swarm = "^0.35"`
candidate versions found which didn't match: 0.34.0, 0.33.0, 0.32.0, ...
location searched: crates.io index

@mxinden
Copy link

mxinden commented Mar 7, 2022

https://github.com/libp2p/rust-libp2p/blob/a168410dbed0d0941f2e5a14543206044ccb2260/swarm/Cargo.toml#L6

Isn't libp2p-swarm supposed to be version 0.35.0?

error: failed to select a version for the requirement `libp2p-swarm = "^0.35"`
candidate versions found which didn't match: 0.34.0, 0.33.0, 0.32.0, ...
location searched: crates.io index

v0.35.0 is not yet released, i.e. only in master thus far.

bitswap/src/behaviour.rs Outdated Show resolved Hide resolved
@koivunej
Copy link
Collaborator

Thanks for doing this @rand0m-cloud. I was on a vacation but back now. Let me know when this is ready, or if you need any help with it.

@rand0m-cloud
Copy link
Contributor Author

I've had a chance to play around more with the code and I can't figure out to how to solve the error. I've compared it to the file-sharing example of libp2p, and I can't make sense of what is wrong. The error is from the derive macro libp2p::NetworkBehaviour expecting some handler to implement into_inner, but it's my understanding that into_inner only exists for the ConnectionHandlerSelect that the macro creates.

I am lost but it feels like a bug in the derive macro.

@mxinden
Copy link

mxinden commented Mar 14, 2022

@rand0m-cloud mind posting the compiler error here?

@rand0m-cloud
Copy link
Contributor Author

Sure, the errors in the PR description are the current ones but here it is again:

error[E0599]: no method named `into_inner` found for struct `KademliaHandler` in the current scope
  --> src/p2p/behaviour.rs:25:10
   |
25 | #[derive(libp2p::NetworkBehaviour)]
   |          ^^^^^^^^^^^^^^^^^^^^^^^^ method not found in `KademliaHandler<QueryId>`
   |
   = note: this error originates in the derive macro `libp2p::NetworkBehaviour` (in Nightly builds, run with -Z macro-backtrace for more info)

error[E0599]: no method named `into_inner` found for struct `KademliaHandlerProto` in the current scope
  --> src/p2p/behaviour.rs:25:10
   |
25 | #[derive(libp2p::NetworkBehaviour)]
   |          ^^^^^^^^^^^^^^^^^^^^^^^^ method not found in `KademliaHandlerProto<QueryId>`
   |
   = note: this error originates in the derive macro `libp2p::NetworkBehaviour` (in Nightly builds, run with -Z macro-backtrace for more info)

For more information about this error, try `rustc --explain E0599`.

Copy link

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this fix your compile time error?

bitswap/src/behaviour.rs Outdated Show resolved Hide resolved
src/p2p/behaviour.rs Outdated Show resolved Hide resolved
@rand0m-cloud
Copy link
Contributor Author

Thanks to @mxinden's workaround, I was able to continue working on the PR. Now the PR builds, lints, but fails testing:

failures:

---- p2p::swarm::tests::racy_connecting_attempts stdout ----
thread 'p2p::swarm::tests::racy_connecting_attempts' panicked at 'assertion failed: `(left == right)`
  left: `[Err(Some("addresses exhausted")), Err(Some("addresses exhausted"))]`,
 right: `[Ok(()), Err(Some("finished connecting to another address"))]`', src/p2p/swarm.rs:517:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- p2p::swarm::tests::wrong_peerid stdout ----
thread 'p2p::swarm::tests::wrong_peerid' panicked at 'assertion failed: `(left == right)`
  left: `Some("addresses exhausted")`,
 right: `Some("Pending connection: Invalid peer ID.")`', src/p2p/swarm.rs:466:21


failures:
    p2p::swarm::tests::racy_connecting_attempts
    p2p::swarm::tests::wrong_peerid

test result: FAILED. 85 passed; 2 failed; 1 ignored; 0 measured; 0 filtered out; finished in 0.66s

@rand0m-cloud
Copy link
Contributor Author

@koivunej I need some guidance on these test failures. I've traced the error message to src/p2p/swarm.rs: 378. Can you verify if the tests need to be updated, or if the underlying behavior has changed?

Some grepping I did,

$ git checkout v0.39.0 && find . -not -path "*/target/*" -name "*.rs" | xargs -n1 grep -n "Pending connection" /dev/null

./core/src/connection/error.rs:91:                write!(f, "Pending connection: I/O error: {}", err),
./core/src/connection/error.rs:93:                write!(f, "Pending connection: Transport error: {}", err),
./core/src/connection/error.rs:95:                write!(f, "Pending connection: Invalid peer ID."),

$ git checkout v0.43.0 && find . -not -path "*/target/*" -name "*.rs" | xargs -n1 grep -n "Pending connection" /dev/null
./swarm/src/connection/error.rs:100:    /// Pending connection attempt has been aborted.
./swarm/src/connection/error.rs:137:            PendingConnectionError::IO(err) => write!(f, "Pending connection: I/O error: {}", err),
./swarm/src/connection/error.rs:138:            PendingConnectionError::Aborted => write!(f, "Pending connection: Aborted."),
./swarm/src/connection/error.rs:142:                    "Pending connection: Transport error on connection: {}",
./swarm/src/connection/error.rs:152:                    "Pending connection: Unexpected peer ID {} at {:?}.",
./swarm/src/lib.rs:1381:    /// Pending connection attempt has been aborted.
./swarm/src/lib.rs:1426:                "Dial error: Pending connection attempt has been aborted."

The "finished connecting to another address" error seems to be from this crate, so I'll take that test failure as broken from this PR.

@mxinden
Copy link

mxinden commented Mar 17, 2022

Thanks to @mxinden's workaround, I was able to continue working on the PR. Now the PR builds, lints, but fails testing:

Sorry, but I don't think I am of much help on the rust-ipfs specific tests.

@koivunej
Copy link
Collaborator

@rand0m-cloud Yeah ... these connection tests ... They are most likely extra, and become more and more useless now that libp2p must've evolved. Back in the day it was a bit more tricky trying to get the similar behaviour as to go-ipfs. I'll try to check the branch locally tomorrow! Thanks for your work so far.

src/p2p/swarm.rs Outdated
Comment on lines 372 to 381
for failed in self
.pending_connections
.remove(&peer_id)
.unwrap_or_default()
{
self.connect_registry
.finish_subscription(failed.into(), Err("addresses exhausted".into()));
}
Copy link
Collaborator

@koivunej koivunej Mar 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing this unconditionally creates the failure for p2p::swarm::tests::racy_connecting_attempts.

I wonder ... why were changes around this necessary? Seems this logic would had been easiest to do handle with the old structure using the entry api?

I think this change also affects the p2p::swarm::tests::wrong_peerid test case.

@koivunej
Copy link
Collaborator

So in total I see 3 test failures:

  • swarm_api (really bad name)
  • racy_connection_attempts
  • wrong_peerid

For the last two, I think a solution should come out of the discussion thread I already started.

For the first one, I think the near-correct answer is to add biased; to the tokio::select! so the events are observed in the assumed order always. Though, this is prone to break because we are racing with the idle disconnection timer. However, I remember looking at the idle disconnection should be done right away so this might be ok. I did not yet push this fix, let me know if it should be pushed. If this doesn't fail for you, please let me know!

Copy link
Collaborator

@koivunej koivunej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from my comment on the inject_dial_failure I think this is looking good!

@rand0m-cloud
Copy link
Contributor Author

Oh, sorry for the noise. I was hoping Github would collapse it all. Just reauthored those commits with my email.

@rand0m-cloud
Copy link
Contributor Author

I've been experiencing swarm_api failing maybe 1 in 10 times. I'll apply that biased; to the select, but I don't really know if it fixes the test.

@rand0m-cloud
Copy link
Contributor Author

@koivunej Okay, I think I've re-added the code fragment I removed and would make inject_dial_failure work correctly.

The problem is the original code was given the Multiaddr it failed to dial from inject_addr_reach_failure and the new inject_dial_failure doesn't provide that. I can't find a clear way to go from the PeerId to MultiaddrWithPeerId.

@rand0m-cloud
Copy link
Contributor Author

Do we need a new ConnectionHandler that can store the Multiaddr?

src/p2p/swarm.rs Outdated
// it is possible that these addresses have not been tried yet; they will be asked
// for soon.
let handler = self.new_handler();
self.events.push_back(swarm::NetworkBehaviourAction::Dial {
Copy link
Contributor Author

@rand0m-cloud rand0m-cloud Mar 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this needs to be removed? I'm not sure how to adapt the behavior because inject_dial_failure only tells us when a connection failed to dial.

I think this current block was made for attempting another connection when one fails.

Copy link
Collaborator

@koivunej koivunej Mar 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All right ... Looking at the networkbehaviour, I think this is how it used to work:

  1. dial event is given to swarm
  2. swarm collects all of the known addresses with NetworkBehaviour::addresses_of_peer, dials them somehow
  3. for each of the dials, we used to get a signal that this multiaddr failed and we would signal that future as failure
  4. there used to be another trait method for having exhausted the addresses, when we'd know to launch a new dial if we'd still have addresses

While writing this @mxinden replied. Oki yeah it would appear the failures have now moved within the dialerror, which I did not expect, AND there is only one "notification" for all of the attempts.

However I think the idea with the original impl was that since there would be one gathering of addresses for the peer (NetworkBehaviour::addresses_of_peer) per dial events (caused either by "swarm_api" or by any other place) it would be possible to add new addresses during a dial attempt, and those would be noticed at (4) in the above ordered list, and thus get dialed afterwards.

I now realize that this all should had been in the swarm api implementation as comments, but I guess I was expecting the datastructures and network behaviour api to make this "apparent" and did not account for possible future changes in the network behaviour api.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for the next steps would be to gather the failed addresses from the error (if found), then continue dialing to the remaining addresses, if any.

rand0m-cloud and others added 21 commits April 1, 2022 12:11
this might take care of the node-gyp problem, which might also be fixed
by updating it's version.
forgot, you must not use backticks outside apostrophes...
originally created in 8eae8e1 by
altering the single topic test, included in this commit as duplicating
version.

Co-authored-by: Addy Bryant <rand0m-cloud@outlook.com>
simplify away the use of hashset's for messages along with any
filtering, instead simply assert that who witnessed what message and
include the sent message in the assertion as well.

comment as in use less broad technical names and more context specific
names.

also removes some of the duplicate comments.
@koivunej
Copy link
Collaborator

koivunej commented Apr 1, 2022

I would rather not delay this further and a lot of work has been put into this already. Thanks @rand0m-cloud! I ended up ignoring the problematic windows tests pending a) packet capture b) additional logging in the js test. Also added a FIXME for my concern.

bors r+

@bors
Copy link
Contributor

bors bot commented Apr 1, 2022

Build succeeded:

@bors bors bot merged commit e8f6d66 into rs-ipfs:master Apr 1, 2022
This was referenced Apr 1, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants