Problem with use #648

Open
driftluo opened this Issue Nov 16, 2018 · 15 comments

Comments

3 participants

driftluo commented Nov 16, 2018

After reading the libp2p API and the Substrate network-libp2p source code, I tried to write a repo modelled on Substrate, using only the ping protocol for testing.

However, when running this repo, the inject_substream method of NodeHandler is never called, which means that no substream is ever upgraded.

I can't tell whether the problem is in my code or in the way I'm calling the library; after all, there is almost no documentation for this library right now.

repo: https://github.com/driftluo/p2p-test

output:

success /ip4/127.0.0.1/tcp/1337
dial true
/ip4/127.0.0.1/tcp/40240, /ip4/127.0.0.1/tcp/1337
sever connected PeerId(QmXh9VaHR2Fbh1QKBECbp45sh46onc4AuRn6T6L4eQA1p3)
accept one link
client connected PeerId(QmVdFTSJcUcys1zSM8Jx8qGf4CiGge5V4hdq4Qo4bW5A2i)

@tomaka tomaka added the discussion label Nov 16, 2018

tomaka (Member) commented Nov 16, 2018

You need to manually open substreams. This is done by returning a NodeHandlerEvent::OutboundSubstreamRequest from poll(). Every time you return this value, a new substream is opened.

Note that the high level API is slowly coming along, and documentation is in progress.
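A minimal sketch of what this looks like inside a NodeHandler implementation (signatures are approximate for the libp2p-core revision in use, with Poll/Async from futures 0.1; the requested_substream flag is a hypothetical field on the handler, and OutboundOpenInfo is assumed to be ()):

// Ask the swarm to open exactly one outbound substream from poll().
fn poll(&mut self) -> Poll<Option<NodeHandlerEvent<Self::OutboundOpenInfo, Self::OutEvent>>, io::Error> {
    if !self.requested_substream {
        self.requested_substream = true;
        // The swarm reacts by negotiating a new outbound substream and then
        // calls inject_substream() with the upgraded stream (e.g. ping).
        return Ok(Async::Ready(Some(NodeHandlerEvent::OutboundSubstreamRequest(()))));
    }
    Ok(Async::NotReady)
}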

driftluo commented Nov 16, 2018

Oh, yeah, it works, thank you very much.

I didn't expect to have missed that; I assumed outbound substreams would be opened automatically at the lower level.

I'm looking forward to a more ergonomic API and better documentation.

Looking at the recent PRs and discussion, it seems the transport will be split into listener and dialer parts, further separating outbound and inbound.

I'll keep following the project, and then I'll try to implement a custom protocol.

driftluo commented Dec 5, 2018

@tomaka I've found a memory leak, but I can't pin down what causes it. The code is still in the repo above.

In one shell:

$ cargo run --bin node_1

In another shell:

$ cargo run --bin node_2

Node 1 listen on /ip4/127.0.0.1/tcp/1337, dial to /ip4/127.0.0.1/tcp/1338
Node 2 listen on /ip4/127.0.0.1/tcp/1338, dial to /ip4/127.0.0.1/tcp/1337

Now, it works well.

On my machine, do the following:

$ ps -aux | rg node_
luoc      6632  1.6  0.0  71124 11308 pts/3    Sl+  16:50   0:00 ./target/release/node_1
luoc      6647  2.7  0.0  71124 11420 pts/4    Sl+  16:50   0:00 ./target/release/node_2
luoc      6664  0.0  0.0  24880  6272 pts/5    S+   16:50   0:00 rg node_

$ kill -9 6647

Now watch the node_1 process: memory and CPU usage keep rising until it is killed by the system's OOM killer. I enabled libp2p's trace-level logging and found no useful output.

I can see that the tokio poll is constantly being woken up, which pushes CPU usage above 200%.

tomaka (Member) commented Dec 5, 2018

What's the code you're running? I don't see that with the chat example.

driftluo commented Dec 5, 2018

https://github.com/cryptape/cita-p2p

Sorry, it's kill -9, not kill -QUIT.

The trace log gets stuck looping on this:

[2018-12-05T11:45:45Z TRACE tokio_threadpool::pool]     -> submit internal; idx=7
[2018-12-05T11:45:45Z TRACE tokio_threadpool::pool] signal_work -- wakeup; idx=7
[2018-12-05T11:45:45Z TRACE tokio_reactor] loop process - 1 events, 0.000s
[2018-12-05T11:45:45Z TRACE tokio_threadpool::worker]     -> wakeup; idx=7
[2018-12-05T11:45:45Z TRACE tokio_threadpool::task] Task::run; state=Running
[2018-12-05T11:45:45Z TRACE yamux::connection] Server: eof: Connection { mode: Server, streams: 1, incoming: 0, pending: 0, next_id: 8, tasks: 0 }
[2018-12-05T11:45:45Z DEBUG yamux::connection] Server: destroying connection
[2018-12-05T11:45:45Z DEBUG yamux::connection] stream 2 is gone, cannot read
[2018-12-05T11:45:45Z TRACE tokio_threadpool::notifier] Notifier::notify; id=0x7fcf14a14010
[2018-12-05T11:45:45Z TRACE tokio_threadpool::pool]     -> submit internal; idx=7
[2018-12-05T11:45:45Z TRACE tokio_threadpool::pool] signal_work -- wakeup; idx=3
[2018-12-05T11:45:45Z TRACE tokio_reactor] event Readable Token(4194303)
[2018-12-05T11:45:45Z TRACE tokio_reactor] loop process - 1 events, 0.000s
[2018-12-05T11:45:45Z TRACE tokio_threadpool::worker]     -> wakeup; idx=3
[2018-12-05T11:45:45Z TRACE tokio_threadpool::worker] stole task
[2018-12-05T11:45:45Z TRACE tokio_threadpool::task] Task::run; state=Running
[2018-12-05T11:45:45Z TRACE tokio_threadpool::notifier] Notifier::notify; id=0x7fcf13e1b0f0
[2018-12-05T11:45:45Z TRACE tokio_threadpool::pool]     -> submit internal; idx=7
[2018-12-05T11:45:45Z TRACE tokio_threadpool::pool] signal_work -- wakeup; idx=7
[2018-12-05T11:45:45Z TRACE tokio_reactor] loop process - 1 events, 0.000s
[2018-12-05T11:45:45Z TRACE tokio_threadpool::worker]     -> wakeup; idx=7
[2018-12-05T11:45:45Z TRACE tokio_threadpool::task] Task::run; state=Running
[2018-12-05T11:45:45Z TRACE yamux::connection] Server: eof: Connection { mode: Server, streams: 1, incoming: 0, pending: 0, next_id: 8, tasks: 0 }
[2018-12-05T11:45:45Z DEBUG yamux::connection] Server: destroying connection
[2018-12-05T11:45:45Z DEBUG yamux::connection] stream 2 is gone, cannot read
[2018-12-05T11:45:45Z TRACE tokio_threadpool::notifier] Notifier::notify; id=0x7fcf14a14010
[2018-12-05T11:45:45Z TRACE tokio_threadpool::pool]     -> submit internal; idx=7
[2018-12-05T11:45:45Z TRACE tokio_threadpool::pool] signal_work -- wakeup; idx=3
[2018-12-05T11:45:45Z TRACE tokio_reactor] event Readable Token(4194303)
[2018-12-05T11:45:45Z TRACE tokio_reactor] loop process - 1 events, 0.000s
[2018-12-05T11:45:45Z TRACE tokio_threadpool::worker]     -> wakeup; idx=3
[2018-12-05T11:45:45Z TRACE tokio_threadpool::worker] stole task
[2018-12-05T11:45:45Z TRACE tokio_threadpool::task] Task::run; state=Running
[2018-12-05T11:45:45Z TRACE tokio_threadpool::notifier] Notifier::notify; id=0x7fcf13e1b0f0

driftluo commented Dec 6, 2018

@tomaka @twittner I have found out where the problem is.

The problem is in the yamux library.

The function that creates the transport is as follows:

fn build_transport(key_pair: secio::SecioKeyPair) -> Boxed<(PeerId, StreamMuxerBox)> {
    let mut mplex_config = mplex::MplexConfig::new();
    mplex_config.max_buffer_len_behaviour(mplex::MaxBufferBehaviour::Block);
    mplex_config.max_buffer_len(usize::MAX);

    let base = libp2p::CommonTransport::new()
        .with_upgrade(secio::SecioConfig::new(key_pair))
        .and_then(move |out, endpoint| {
            let upgrade = upgrade::or(
                upgrade::map(yamux::Config::default(), either::EitherOutput::First),
                upgrade::map(mplex_config, either::EitherOutput::Second),
            );
            let peer_id = out.remote_key.into_peer_id();
            let upgrade = upgrade::map(upgrade, move |muxer| (peer_id, muxer));
            upgrade::apply(out.stream, upgrade, endpoint.into())
        }).map(|(id, muxer), _| (id, StreamMuxerBox::new(muxer)));

    TransportTimeout::new(base, Duration::from_secs(20)).boxed()
}

A transport created this way prefers yamux, and if I follow the steps described above, the problem occurs.

But when I switch to mplex, the same steps cause no problem:

let upgrade = upgrade::or(
    upgrade::map(mplex_config.clone(), either::EitherOutput::First),
    upgrade::map(mplex_config, either::EitherOutput::Second),
);

dvdplm (Contributor) commented Dec 6, 2018

Now watch the node_1 process: memory and CPU usage keep rising until it is killed by the system's OOM killer.

I tried this and see the same thing. However, if I remove the logging here and here and repeat, I see no CPU or memory increase.

driftluo commented Dec 6, 2018

@dvdplm

However, if I remove the logging here and here and repeat, I see no CPU or memory increase.

Deleting the logger init didn't solve the problem; in my case there is still a memory leak.

when I switch to mplex, the same steps cause no problem

With mplex there really is no memory leak. The change is here.

dvdplm (Contributor) commented Dec 6, 2018

Ok, it's too bad we're not seeing the same thing; that makes it harder to investigate. What OS and version are you on, and what Rust version are you using? I ran my tests on OS X 10.14.2 using rustc 1.32.0-nightly (14997d56a 2018-12-05).

Do you think you could prepare a minimal test for us to run and see what is going on? The code in your repo does not run against the latest master and contains things that are probably not relevant to this problem.

driftluo commented Dec 6, 2018

OS: Ubuntu 18.04
rustc: 1.30.1 stable

I'll try to write a minimal reproducible version, but due to the complexity of libp2p itself, quite a lot of code is needed just to get something running.

By the way, I use the same libp2p commit as Substrate does, but with yamux upgraded to 0.1.1.

driftluo commented Dec 11, 2018

@dvdplm With the latest master branch code, the chat example uses mplex by default; if you change it to the yamux protocol, it becomes immediately unusable. The error log is as follows:

Local peer id: PeerId(QmUy7RDhAGizuuuySfxUVvX6LJSAKn5XGsz1T7qxC1uBWo)
Listening on "/ip4/0.0.0.0/tcp/42969"
[2018-12-11T02:16:40Z ERROR yamux::connection] invalid stream id 1
[2018-12-11T02:16:40Z ERROR libp2p_yamux] connection error: i/o error: Broken pipe (os error 32)

change:

let upgrade = mplex::MplexConfig::new().map_outbound(move |muxer| (peer_id, muxer) );

to

let upgrade = yamux::Config::default().map_outbound(move |muxer| (peer_id, muxer) );

For the other issue, I need to use RawSwarm to reproduce it; I'm trying that now...

By the way, why does the Swarm API use xxx(me: &mut Self, ...)? It has to be called as libp2p::Swarm::listen_on(&mut swarm, addr); I think swarm.listen_on(addr) would be better.
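For illustration only (a hypothetical snippet, assuming a swarm built as in the chat example; the address here is arbitrary):

// Swarm's functions take `me: &mut Self`, so they are called as associated
// functions on the type rather than as methods on the value.
let addr: libp2p::Multiaddr = "/ip4/127.0.0.1/tcp/1337".parse().unwrap();
let _ = libp2p::Swarm::listen_on(&mut swarm, addr);

// The method-call style suggested above would instead read:
// swarm.listen_on(addr);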

dvdplm (Contributor) commented Dec 11, 2018

Hi @driftluo, so sorry you're having issues. We are currently working on stabilizing the APIs and adding more ergonomic high-level helpers for common tasks, but there are plenty of rough edges still, as you have discovered. To use Yamux with the chat example, try this:

    let transport = libp2p::CommonTransport::new()
        .with_upgrade(secio::SecioConfig::new(local_key))
        .and_then(move |out, cp| {
            let peer_id1 = out.remote_key.into_peer_id();
            let peer_id2 = peer_id1.clone();
            let upgrade = libp2p::yamux::Config::default()
                .map_inbound(move |muxer| (peer_id1, muxer))
                .map_outbound(move |muxer| (peer_id2, muxer));
            upgrade::apply(out.stream, upgrade, cp).map_err(|e| e.into_io_error())
        });

(The cp is a ConnectedPoint)

driftluo commented Dec 11, 2018

@dvdplm Hey, I've successfully reproduced two problems; the code is here: https://github.com/cryptape/cita-p2p/tree/p2p-bug-report

This code uses RawSwarm because the Connected event only exists on RawSwarmEvent.

Only the ping protocol is mounted in the handler, pinging every 5 seconds, nothing else. This is the simplest possible example, using the latest master code.
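Roughly, the event loop in these binaries looks like this (a sketch only; the variant and field names are approximate for the libp2p-core revision used here, and the surrounding tokio future setup is omitted):

// Drain events from the RawSwarm until there is nothing left to report.
loop {
    match swarm.poll() {
        Async::Ready(RawSwarmEvent::Connected { peer_id, .. }) => {
            // This is where the node handler gets attached and where, for
            // node_3 below, the self-connection is rejected.
            println!("connected {:?}", peer_id);
        }
        Async::Ready(_) => {
            // Other events (listeners, dial errors, node closed, ...) are ignored here.
        }
        Async::NotReady => break,
    }
}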

Ok, here are the two problems I found. node_1 and node_2 use yamux; node_3 uses mplex.

First, yamux has a bug:

In one shell:

$ RUST_LOG=trace cargo run --bin node_1

In another shell:

$ cargo run --bin node_2

You will see log output like the following, which is the same situation as:

With the latest master branch code, the chat example uses mplex by default; if you change it to the yamux protocol, it becomes immediately unusable. The error log is as follows:

[2018-12-11T07:41:02Z ERROR yamux::connection] invalid stream id 1
[2018-12-11T07:41:02Z ERROR core_p2p::cita_handler] While upgrading dial to Ping: Select(MultistreamSelectError(IoError(Custom { kind: WriteZero, error: StringError("connection is closed") })))

Second, I tried to make the node dial itself within the same swarm and rejected the connection in the Connected event (code is here), but the connection doesn't drop, and CPU usage climbs above 200%.

Run node_3:

$ RUST_LOG=libp2p=trace cargo run --bin node_3

Network status; the connection is not dropped:

$ netstat -anp | rg :1337
tcp        0      0 127.0.0.1:1337          0.0.0.0:*               LISTEN      5042/./target/debug 
tcp        0      0 127.0.0.1:34554         127.0.0.1:1337          ESTABLISHED 5042/./target/debug 
tcp        0      0 127.0.0.1:1337          127.0.0.1:34554         ESTABLISHED 5042/./target/debug

log output:

[2018-12-11T07:51:15Z DEBUG libp2p_core::upgrade::apply] Successfully applied negotiated protocol
[2018-12-11T07:51:15Z TRACE libp2p_secio::handshake] checking encryption by sending back remote's nonce
[2018-12-11T07:51:15Z DEBUG libp2p_core::upgrade::apply] Successfully applied negotiated protocol
[2018-12-11T07:51:15Z DEBUG libp2p_core::upgrade::apply] Successfully applied negotiated protocol
[2018-12-11T07:51:15Z TRACE core_p2p::cita_handler] shutdown
[2018-12-11T07:51:15Z DEBUG libp2p_core::upgrade::apply] Successfully applied negotiated protocol
[2018-12-11T07:51:15Z TRACE core_p2p::cita_handler] shutdown

driftluo commented Dec 11, 2018

@dvdplm Ok, I used this to create the transport:

let peer_id2 = peer_id1.clone();
let upgrade = libp2p::yamux::Config::default()
    .map_inbound(move |muxer| (peer_id1, muxer))
    .map_outbound(move |muxer| (peer_id2, muxer));

After node_1 and node_2 connect, kill -9 node_2 and node_1's CPU usage goes to 100%, the same as the previous situation. If you switch to mplex it works fine; change here to false.

node_3's connection doesn't drop, and CPU usage is above 200%.

rustc: rustc 1.31.0 (abe02cefd 2018-12-04)

dvdplm (Contributor) commented Dec 11, 2018

Excellent, this will definitely help.
