
Flood publish #15

Draft · wants to merge 11 commits into main

Conversation

@ackintosh (Member) commented on Apr 2, 2023

👷 This PR will be ready for review once the improvement on flood publishing has been merged. 👷

Flood Publish Simulation

This simulation creates a number of nodes with flood publishing enabled
and helps users measure the latency of messages.

In this simulation, pictured below, each node logs the time when it emits a HandlerIn::Message event to the handler, and the time when handle_received_message() is called.

NOTE: Both the event and the function are defined inside rust-libp2p, so this simulation uses a forked rust-libp2p that includes the logging. See here for the diff in the fork.

sequenceDiagram
    participant Node1
    participant Node2
    participant Node3
    
    loop Simulation
        Note over Node1: HandlerIn::Message
        Node1->>Node2: message
        Note over Node2: handle_received_message()
        Note over Node1: HandlerIn::Message
        Node1->>Node3: message
        Note over Node3: handle_received_message()
    end

Using measure_latency.py, we can measure the time between these two log points.

# Filter stdout so only the test's log lines reach measure_latency.py.
testground run single \
  --plan gossipsub-testground/flood_publishing \
  --testcase flood_publishing \
  ...
  ...
  | grep flood_publishing_test \
  | python3 flood_publishing/measure_latency.py
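For illustration, here is a minimal sketch of the pairing and statistics a script like measure_latency.py performs. This is not the actual script: the log format below is hypothetical (the real lines are emitted by the forked rust-libp2p), and the real script also reports the publisher's node_id/peer_id and the node/log counts shown in the results further down.

import re
import statistics
import sys

# Hypothetical log format (the actual format is defined in the forked
# rust-libp2p): "flood_publishing_test <send|recv> <peer_id> <unix_millis>"
PATTERN = re.compile(r"flood_publishing_test (send|recv) (\S+) (\d+)")

send_logs = {}     # peer_id -> millis at HandlerIn::Message
receive_logs = []  # (peer_id, millis) at handle_received_message()

for line in sys.stdin:
    match = PATTERN.search(line)
    if match is None:
        continue
    kind, peer_id, millis = match.group(1), match.group(2), int(match.group(3))
    if kind == "send":
        send_logs[peer_id] = millis
    else:
        receive_logs.append((peer_id, millis))

# Latency per message = receive time minus the matching send time.
latencies = [t - send_logs[p] for p, t in receive_logs if p in send_logs]

print("* Results (in milliseconds) *")
print(f"[mean] {statistics.mean(latencies)}")
print(f"[median] {statistics.median(latencies)}")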

Running the Simulation

The type of flood publishing can be switched via --test-param flood_publish=heartbeat. See flood_publishing/manifest.toml for the available test parameters; a sketch of such a manifest follows.
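For orientation, here is a minimal sketch of what that manifest could look like, following the standard Testground manifest schema. Only the flood_publish parameter name comes from this PR; the instance bounds, default, and comments are assumptions.

name = "flood_publishing"

[builders."docker:generic"]
enabled = true

[runners."local:docker"]
enabled = true

[[testcases]]
name = "flood_publishing"
instances = { min = 2, max = 100, default = 50 }  # assumed bounds

  [testcases.params]
  # "rapid" floods to all peers immediately on publish; "heartbeat" defers
  # non-mesh sends to the next heartbeat tick (a reading based on the
  # discussion below; the default here is an assumption).
  flood_publish = { type = "string", default = "rapid" }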

testground run single \
  --plan gossipsub-testground/flood_publishing \
  --testcase flood_publishing \
  --builder docker:generic \
  --runner local:docker \
  --instances 50 \
  --wait \
  --test-param flood_publish=heartbeat \
  | grep flood_publishing_test \
  | python3 flood_publishing/measure_latency.py

Measurement Results

It appears that latency is reduced by roughly 41% in the mean and 35% in the median when comparing Heartbeat to Rapid (a quick arithmetic check follows the parameter list below).

  • bandwidth: 30MiB
  • instances: 50
  • message size: 50KB
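A quick arithmetic check of that claim against the means and medians reported below:

# Latency reduction of Heartbeat relative to Rapid, using the numbers below.
rapid_mean, heartbeat_mean = 664.05, 391.94827586206895
rapid_median, heartbeat_median = 681.0, 444.0

print(f"mean reduction:   {1 - heartbeat_mean / rapid_mean:.1%}")      # 41.0%
print(f"median reduction: {1 - heartbeat_median / rapid_median:.1%}")  # 34.8%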

Rapid

Command
testground run single \
  --plan gossipsub-testground/flood_publishing \
  --testcase flood_publishing \
  --builder docker:generic \
  --runner local:docker \
  --instances 50 \
  --wait \
  --test-param flood_publish=rapid \
  | grep flood_publishing_test \
  | python3 flood_publishing/measure_latency.py
*** measure_latency.py ***
[publisher] node_id: 339681 , peer_id: 12D3KooWRaHQje9JBkjNsCN2S4bPDoeJTNQvwa7q3XSY4Xk6kBRh
[nodes] 50
[send_logs] 280
[receive_logs] 280

* Results (in milliseconds) *
[mean] 664.05
[median] 681.0

Heartbeat

Command
testground run single \
  --plan gossipsub-testground/flood_publishing \
  --testcase flood_publishing \
  --builder docker:generic \
  --runner local:docker \
  --instances 50 \
  --wait \
  --test-param flood_publish=heartbeat \
  | grep flood_publishing_test \
  | python3 flood_publishing/measure_latency.py
*** measure_latency.py ***
[publisher] node_id: 6f17de , peer_id: 12D3KooWSi9kmfo5ozCjTVGHBe5u26hhBq63bcfBLL9CJBgec8Bb
[nodes] 50
[send_logs] 290
[receive_logs] 290

* Results (in milliseconds) *
[mean] 391.94827586206895
[median] 444.0

@mxinden commented on May 31, 2023

Thank you @ackintosh for providing these numbers. That is very helpful.

I acknowledge that libp2p/rust-libp2p#3666 shows a significant change in sending latency. Though one also needs to keep in mind that nodes outside of the mesh will only receive the message on the next heartbeat, and thus see a significant delay.

I wonder whether the problem should be solved at the Gossipsub level, or whether it is worth investing into the lower transport layer, improving base bandwidth.

Out of curiosity, I wonder how this would play out when using a more powerful transport protocol. Early results from our measurements show that libp2p/rust-libp2p#3454 has a significant bandwidth improvement compared to our existing libp2p-quic transport and libp2p-tcp transport (roughly 10x or more).

[plot: bandwidth comparison across transports]

See libp2p/test-plans#184 for details.

Would you mind running this test with libp2p/rust-libp2p#3454?

@ackintosh (Member, Author) commented

@mxinden I have created another test plan to run this test with the quic implementation. The result shows a ~5% improvement in latency.
ackintosh#3

@diegomrsantos commented
@mxinden and @ackintosh, do either of you have a hypothesis that could explain why a 10x increase in throughput resulted in only a 5% improvement in latency when using QUIC?
