
[2.x.x] Use blocking IO in P2P to reduce CPU load #2855

Merged

merged 4 commits into mimblewimble:milestone/2.x.x from cyclefortytwo:blocking-io-p2p on Jul 12, 2019

Conversation

@hashmap (Member) commented May 27, 2019

Instead we create two threads, one for reading and one for writing; both block for up to 1 second, then check the stop flag, and if it is not set the loop continues.
The current master consumes 20-25% CPU on my machine with ~20 peers; this version consumes 9-12% with the same number of peers (with TUI enabled).

This also lets us use all the higher-level reader/writer methods like read_string, read_exact, etc.
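
For illustration, the reading side of that pattern might look roughly like the sketch below. This is not the actual grin code; `spawn_reader`, the buffer size, and the error handling are all assumptions.

```rust
use std::io::{ErrorKind, Read};
use std::net::TcpStream;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

// Illustrative sketch of the blocking-read loop described above;
// `spawn_reader` is a hypothetical helper, not a grin function.
fn spawn_reader(mut stream: TcpStream, stop: Arc<AtomicBool>) -> thread::JoinHandle<()> {
    // A read that receives no data returns a timeout error after 1 second,
    // which gives the loop a chance to observe the stop flag.
    stream
        .set_read_timeout(Some(Duration::from_secs(1)))
        .expect("set_read_timeout");
    thread::spawn(move || {
        let mut buf = [0u8; 8 * 1024];
        while !stop.load(Ordering::Relaxed) {
            match stream.read(&mut buf) {
                Ok(0) => break, // peer closed the connection
                Ok(_n) => {
                    // hand the bytes off to the message handler here
                }
                // Timeout (WouldBlock on Unix, TimedOut on Windows):
                // nothing arrived within 1s, loop and re-check the stop flag.
                Err(e) if e.kind() == ErrorKind::WouldBlock
                    || e.kind() == ErrorKind::TimedOut => {}
                Err(_) => break, // real IO error, drop the connection
            }
        }
    })
}
```

The writing side is symmetric, using `set_write_timeout` instead.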

```diff
-    time::Duration::from_secs(10),
-    true,
-)?;
+self.stream.read_exact(&mut buf[..])?;
```

@garyyu (Member) commented May 31, 2019
Is this now controlled by IO_TIMEOUT (i.e. 1 second)? If the buf can't be fully filled in 1 second, read_exact will return a TIMEOUT error. So the minimum speed is now 8 kByte per second, instead of the old 800 Byte/s.

Is my understanding correct?

@hashmap (Author, Member) commented May 31, 2019

Not exactly: a read (syscall) timeout occurs when we can't read a single byte in 1 second. In our old read_exact we didn't have a timeout for a single read, but for the entire operation (filling a buffer). With Rust's read_exact we don't have any high-level timeout, but we can control the timeout for a single read.
So it's quite the opposite: the previous implementation was more demanding of connection speed, while the new one will work for any connection faster than 1 byte/sec. A drawback is that a peer's thread can be blocked by read_exact for a while on a slow connection, which is what we can experience during shutdown.
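
A sketch of those semantics with a plain std TcpStream (`read_msg` is a made-up name, not grin code):

```rust
use std::io::Read;
use std::net::TcpStream;
use std::time::Duration;

// The 1s timeout bounds each underlying read(2) syscall, not the whole
// read_exact call: std's read_exact just loops over read() until the buffer
// is full, so it only fails if a single read delivers zero bytes for a
// full second. It does not bound the total time to fill the buffer.
fn read_msg(stream: &mut TcpStream, len: usize) -> std::io::Result<Vec<u8>> {
    stream.set_read_timeout(Some(Duration::from_secs(1)))?;
    let mut buf = vec![0u8; len];
    stream.read_exact(&mut buf)?; // succeeds on any link faster than 1 byte/sec
    Ok(buf)
}
```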

@garyyu (Member) commented May 31, 2019

> read (syscall) timeout occurs when we can't read a single byte in 1 second.

In the case where a peer's sending thread doesn't get CPU resources for 1 second, will this read_exact return a timeout error with part of the buf filled?

@garyyu (Member) commented Jun 6, 2019

@hashmap a reminder of the above question, in case you missed it :-)

@hashmap (Author, Member) commented Jun 6, 2019

Ah, thanks! Well, this is a timeout for a single read operation, which is basically a syscall. I'm not an expert in preemptive scheduling, but I think it's safe to assume that this situation is very unlikely.

```diff
@@ -160,14 +152,14 @@ pub fn read_item<T: Readable>(stream: &mut dyn Read) -> Result<(T, u64), Error>
 /// until we have a result (or timeout).
 pub fn read_body<T: Readable>(h: &MsgHeader, stream: &mut dyn Read) -> Result<T, Error> {
 	let mut body = vec![0u8; h.msg_len as usize];
-	read_exact(stream, &mut body, time::Duration::from_secs(20), true)?;
+	stream.read_exact(&mut body)?;
```
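
For contrast, a whole-operation timeout like the one being removed here has roughly the shape sketched below. This is a hypothetical reconstruction for illustration, not the actual deleted code:

```rust
use std::io::{ErrorKind, Read};
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical shape of a whole-operation-timeout read loop, shown only to
// contrast with std's read_exact above; not the code this PR removed.
fn read_exact_timeout(
    stream: &mut dyn Read,
    buf: &mut [u8],
    timeout: Duration,
) -> std::io::Result<()> {
    let start = Instant::now();
    let mut count = 0;
    while count < buf.len() {
        match stream.read(&mut buf[count..]) {
            Ok(0) => return Err(ErrorKind::UnexpectedEof.into()),
            Ok(n) => count += n,
            // Non-blocking socket: no data yet, so yield and poll again.
            // Polling like this across many peers is what burns CPU.
            Err(ref e) if e.kind() == ErrorKind::WouldBlock => thread::yield_now(),
            Err(e) => return Err(e),
        }
        // The deadline covers the entire fill, so the minimum viable link
        // speed is buf.len() bytes per `timeout`.
        if start.elapsed() > timeout {
            return Err(ErrorKind::TimedOut.into());
        }
    }
    Ok(())
}
```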

@garyyu (Member) commented May 31, 2019

Similar question here: what's the biggest possible length for a body? Perhaps a Block is the biggest; then its maximum possible value is 1.5 MB, for example.

The timeout is 1 second, so is the minimum speed 1.5 MB/s?

@hashmap (Author, Member) commented May 31, 2019

See my previous comment, the minimum speed is still 1 byte/sec
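
To make the arithmetic concrete (illustrative numbers, using garyyu's 1.5 MB example): the old whole-operation timeout required the full body within 20 seconds, i.e. an average of at least 1.5 MB / 20 s = 75 kB/s. The new per-read timeout only requires that no 1-second window passes with zero bytes received, so even a 1 byte/sec link completes, although filling a 1.5 MB buffer at that rate could take up to ~1.5 million seconds.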

@garyyu (Member) commented May 31, 2019

I like the performance improvement 👍

hashmap added some commits Jun 5, 2019

@antiochp (Member) left a comment

Yeah, let's do this.
I was not entirely convinced before, as I think we should be moving toward a more async approach, not a less async one.
But that said, this does simplify things quite a bit and cleans up a bunch of custom code that worked around our "partially async but not really" approach.

👍

I'm still doing some testing locally on this branch, just to try and break it.

@antiochp antiochp changed the title Use blocking IO in P2P to reduce CPU load [2.x.x] Use blocking IO in P2P to reduce CPU load Jul 11, 2019

@antiochp antiochp changed the base branch from master to milestone/2.x.x Jul 11, 2019

@antiochp (Member) commented Jul 11, 2019

I updated this PR to merge onto milestone/2.x.x.
There are a few conflicts around the protocol versioning changes, but nothing too serious.

@hashmap (Author, Member) commented Jul 12, 2019

Synced from scratch with no issues; on my machine it consumes 10% CPU with ~30 peers vs 25% on master.

@hashmap (Author, Member) commented Jul 12, 2019

CPU consumption without TUI:

this PR (1.7%): [screenshot, 2019-07-12 at 21:10]

master (22.9%): [screenshot, 2019-07-12 at 21:13]

It seems to consume a bit more memory, but the difference is not that big, so it sounds like a good tradeoff.

@hashmap hashmap merged commit d3dbafa into mimblewimble:milestone/2.x.x Jul 12, 2019

10 checks passed: mimblewimble.grin Build #20190712.6 succeeded, along with the linux (api/util/store, chain/core/keychain, pool/p2p/src, release, servers), macos (release, test), and windows (release, test) jobs.

@hashmap hashmap deleted the cyclefortytwo:blocking-io-p2p branch Jul 12, 2019
