
Erlay meta-issue: mainnet testing #8

Open
naumenkogs opened this issue Aug 19, 2021 · 14 comments


@naumenkogs
Owner

naumenkogs commented Aug 19, 2021

Based on the experimental results from #7, I suggest using the following patch for testing. I currently run 12 Erlay-supporting nodes with that patch.

To test Erlay, you can connect to my nodes with the following bitcoind command-line options: `-maxconnections=0 -addnode=143.198.185.21:8201 -addnode=143.198.185.21:8202 -addnode=143.198.185.21:8203 -addnode=143.198.185.21:8204 -addnode=143.198.185.21:8205 -addnode=143.198.185.21:8206 -addnode=143.198.185.21:8207 -addnode=143.198.185.21:8208`. I might restart those nodes from time to time, so expect that.

My nodes are currently pruned, so please sync from the network first, and restart with the command above only when you're at the tip.

Then you can use a command similar to mine to see your own node's bandwidth: `bitcoin-0.21.1/bin/bitcoin-cli -rpcport=8109 getpeerinfo | grep 'inv\|sketch\|reqrecon\|reqsketchext\|reconcildiff' | awk '/[0-9]+/ {gsub(/[^0-9]/, "", $0); sum+=$0} END {print sum}'`.
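The shell pipeline above adds up every line of `getpeerinfo` output that contains one of those message names, which covers the `bytessent_per_msg` and `bytesrecv_per_msg` counters of every peer. Below is a minimal C++ sketch of the same sum, assuming the counters have already been parsed out of the JSON into maps; `PeerStats`, `SumSelected` and `AnnouncementBytes` are hypothetical names, not Bitcoin Core code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Per-message byte counters of one peer, keyed by message type, as in the
// bytessent_per_msg / bytesrecv_per_msg objects of `getpeerinfo`.
using MsgBytes = std::map<std::string, std::uint64_t>;

// Hypothetical container for one peer's parsed counters (JSON parsing omitted).
struct PeerStats {
    MsgBytes sent;
    MsgBytes recv;
};

// Message types the grep filter is meant to select: inv plus the Erlay messages.
const std::vector<std::string> kAnnouncementMsgs = {
    "inv", "sketch", "reqrecon", "reqsketchext", "reconcildiff"};

// Adds up the counters of the selected message types found in one map.
std::uint64_t SumSelected(const MsgBytes& counters) {
    std::uint64_t sum = 0;
    for (const std::string& msg : kAnnouncementMsgs) {
        auto it = counters.find(msg);
        if (it != counters.end()) sum += it->second;
    }
    return sum;
}

// Total announcement bytes, sent plus received, over all peers.
std::uint64_t AnnouncementBytes(const std::vector<PeerStats>& peers) {
    std::uint64_t total = 0;
    for (const PeerStats& peer : peers) {
        total += SumSelected(peer.sent) + SumSelected(peer.recv);
    }
    return total;
}
```

Matching exact message names, rather than grep substrings, avoids accidentally counting an unrelated output line that happens to contain one of the patterns.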

You can compare this number with that of a regular, pre-Erlay node running the latest release in parallel with the Erlay node.
The expected result is around 30-55% savings on this number, or 15-30% of overall bandwidth saved.
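One way to read the two ranges together (an interpretation, not something stated in the issue) is that the 30-55% applies to the announcement traffic measured by the command above, and the overall figure then depends on what share of total bandwidth that traffic is. The two ranges imply a share of roughly one half (15/30 = 0.5, 30/55 ≈ 0.55). A small C++ sketch of that relation; `OverallSaving` and `ImpliedShare` are illustrative names, and the share is an assumption, not a measured value:

```cpp
#include <cassert>

// Overall saving implied by a saving on announcement traffic alone, assuming
// only that traffic changes and it makes up `share` of total bandwidth.
double OverallSaving(double announcement_saving, double share) {
    assert(share >= 0.0 && share <= 1.0);
    return announcement_saving * share;
}

// Inverse: the announcement share implied by a pair of saving figures.
double ImpliedShare(double announcement_saving, double overall_saving) {
    assert(announcement_saving > 0.0);
    return overall_saving / announcement_saving;
}
```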

@0xB10C

0xB10C commented Aug 19, 2021

I think you meant to link to the 2021-03-erlay branch, right?

@0xB10C

0xB10C commented Aug 22, 2021

I've been observing and comparing an erlay-node (naumenkogs/bitcoin@8e7033d) and a master-node (bitcoin/bitcoin@38975ec) for about 48h now.

Both nodes have 8 manual outgoing connections to 143.198.185.21:8201-8208 via addnode and don't connect to other nodes (connect=0) and do not accept incoming connections. To monitor p2p traffic, I'm hooking into the net tracepoints inbound_message and outbound_message with my bitcoind-observer project.

The erlay-node had slightly less (~3%) inbound traffic (as in received message size, not connection direction) than the master node. The erlay-node had 61.1% less outbound traffic (sent message size, not connection direction) than the master node. Considering both inbound and outbound traffic, there is a ~20% traffic reduction.
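The combined figure is a traffic-weighted average of the two per-direction reductions, so it depends on how much of the baseline traffic is inbound. A small C++ sketch of that relation (function names are illustrative). Plugging in the rounded ~3%, 61.1% and ~20% suggests inbound traffic was roughly 70% of the master node's total; that is an inference from rounded numbers, not a measured value.

```cpp
#include <cassert>

// Combined reduction when inbound and outbound traffic shrink by different
// fractions. `inbound_share` is the fraction of the baseline (master) node's
// total traffic that is inbound; all arguments are fractions in [0, 1].
double CombinedReduction(double inbound_share, double inbound_reduction,
                         double outbound_reduction) {
    assert(inbound_share >= 0.0 && inbound_share <= 1.0);
    return inbound_share * inbound_reduction +
           (1.0 - inbound_share) * outbound_reduction;
}

// The same relation solved for the inbound share, given the combined reduction.
double ImpliedInboundShare(double inbound_reduction, double outbound_reduction,
                           double combined_reduction) {
    assert(outbound_reduction != inbound_reduction);
    return (outbound_reduction - combined_reduction) /
           (outbound_reduction - inbound_reduction);
}
```

The relation holds exactly for hebasto's BASE/TEST totals further down the thread: (12 + 127) / 1004 MB ≈ 13.8% combined reduction, matching 100% - 86.2%.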

[image: inbound and outbound traffic, erlay-node vs master-node]

The erlay-node received (+38%) and sent (+46%) more messages than the master-node. Counting both inbound and outbound messages, that is about 42% more messages.

[image: received and sent message counts, erlay-node vs master-node]

~~Both in- and outbound INV messages are more frequent with erlay compared to master.~~ Inbound INV messages are more frequent with erlay compared to master, but outbound INV messages are *less* frequent (thanks @Rspigler).

[image: inbound and outbound INV message frequency, erlay-node vs master-node]

However, erlay INV messages are on average smaller than master INV messages.
[image: average INV message size, erlay-node vs master-node]

The erlay node receives about 60 sketch messages per minute, i.e. one message per second. With 8 erlay peers I'd have expected 8 sketch messages per second, or 480 per minute? (I could totally be missing something; I'm still reading up on Erlay.)
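A back-of-the-envelope check, assuming the node starts a reconciliation with each of its outbound Erlay peers once per RECON_REQUEST_INTERVAL (naumenkogs describes this as "8s for each peer" later in the thread, and the patches there set it to 8s): the expected rate is peers × 60 / interval per minute, so 8 × 60 / 8 = 60, which matches the ~60 sketch and reqrecon messages per minute observed here. `ExpectedSketchesPerMinute` is an illustrative name:

```cpp
#include <cassert>

// Expected reconciliation rounds (reqrecon sent, sketch received) per minute,
// assuming each of `peers` Erlay peers is reconciled with once per
// `interval_s` seconds (RECON_REQUEST_INTERVAL in the patches in this thread).
double ExpectedSketchesPerMinute(int peers, double interval_s) {
    assert(peers >= 0 && interval_s > 0.0);
    return peers * 60.0 / interval_s;
}
```

Under the same assumption, 12 peers at 8s would give 90 rounds per minute, while 12 peers at 12s brings it back to 60.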
[image: sketch messages per minute]

The erlay node sends 60 reconcildiff and 60 reqrecon messages per minute. reqsketchext messages are infrequent (the 0 msg/min shown below covers only the last minute before the screenshot was taken).

[image: reconcildiff, reqrecon and reqsketchext messages per minute]


The dashboard can be found here: https://bitcoind.observer/d/T7FkHfnnk/erlay-node-vs-master-node

I'd be happy to add more stats and will probably dive deeper in the future. Having a metric for INVs per TX would be good from what I understand.

@Rspigler

> Both in- and outbound INV messages are more frequent with erlay compared to master.

According to your posted graph, inbound INV messages are more frequent, but outbound INV messages are *less* frequent.

@michaelfolkson

@0xB10C: Wow, nice work and cool use of tracepoints. So there appears to be quite some variability in the bandwidth reduction. I wonder why that is. Is it due to the trade-off tweaking? The original paper quoted a 40 percent figure. One next step would be to test increasing the number of connections and see whether bandwidth stays approximately constant, which was the other objective from the paper.

@naumenkogs
Owner Author

naumenkogs commented Aug 26, 2021

@michaelfolkson yeah I think that #7 perfectly describes why we don't see 40% here: we had to make protocol changes since the paper :) The original idea was a bit too optimistic.

W.r.t extra connections, good suggestion to test. To test with 12 connections, one has to make 2 changes:

  1. Recompile Core with the following change in net.h: `static const int MAX_OUTBOUND_FULL_RELAY_CONNECTIONS = 12;` (I don't see an easier way to make 12 outbound manual connections).
  2. Extend the bitcoind start command with `-addnode=143.198.185.21:8210 -addnode=143.198.185.21:8211 -addnode=143.198.185.21:8212 -addnode=143.198.185.21:8213`.
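The full set of 12 `-addnode` options (8201-8208 from the original command plus 8210-8213, with 8209 unused) can be generated rather than typed. A throwaway C++ helper; `AddnodeOptions` is hypothetical and not part of Bitcoin Core:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: builds "-addnode=HOST:PORT" options for every port in
// [first_port, last_port], leaving out the ports listed in `skip`.
std::vector<std::string> AddnodeOptions(const std::string& host, int first_port,
                                        int last_port,
                                        const std::vector<int>& skip = {}) {
    assert(first_port <= last_port);
    std::vector<std::string> options;
    for (int port = first_port; port <= last_port; ++port) {
        if (std::find(skip.begin(), skip.end(), port) != skip.end()) continue;
        options.push_back("-addnode=" + host + ":" + std::to_string(port));
    }
    return options;
}
```

For example, `AddnodeOptions("143.198.185.21", 8201, 8213, {8209})` yields the 12 options; the same values can go into bitcoin.conf as `addnode=` lines, as kcalvinalvin does later in this thread. Note that the patch 0xB10C uses below also raises `MAX_ADDNODE_CONNECTIONS` from 8 to 12 in net.h.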

@0xB10C

0xB10C commented Aug 31, 2021

Some more stats for the last 7 days (we reset the stats on 2021-08-24 21:30 UTC).

This time without much commentary as the observations are similar to #8 (comment)

[5 images: erlay-node vs master-node statistics for the 7 days after the 2021-08-24 reset]


I've reset the stats again and now run both nodes with 12 full-relay outbound connections to @naumenkogs's erlay nodes. I'm using ports 8201 through 8213, excluding 8209, and this patch:

```diff
--- a/src/net.h
+++ b/src/net.h
@@ -59,9 +59,9 @@ static const unsigned int MAX_PROTOCOL_MESSAGE_LENGTH = 4 * 1000 * 1000;
 /** Maximum length of the user agent string in `version` message */
 static const unsigned int MAX_SUBVERSION_LENGTH = 256;
 /** Maximum number of automatic outgoing nodes over which we'll relay everything (blocks, tx, addrs, etc) */
-static const int MAX_OUTBOUND_FULL_RELAY_CONNECTIONS = 8;
+static const int MAX_OUTBOUND_FULL_RELAY_CONNECTIONS = 12;
 /** Maximum number of addnode outgoing nodes */
-static const int MAX_ADDNODE_CONNECTIONS = 8;
+static const int MAX_ADDNODE_CONNECTIONS = 12;
 /** Maximum number of block-relay-only outgoing connections */
 static const int MAX_BLOCK_RELAY_ONLY_CONNECTIONS = 2;
 /** Maximum number of feeler connections */
```

@naumenkogs
Owner Author

naumenkogs commented Sep 1, 2021

Yeah, so it seems like the savings went from 20% with 8 conns to 30% with 12 conns. This is good, but Erlay can actually do better than that.

The thing is, I don't think it makes sense to go to 12 conns while keeping the inbound/outbound flooding percentages and the reconciliation frequency (every 8s for each peer) unchanged. It reduces latency, but we never asked for this.

To keep the same latency as 8-conn Erlay and get better performance, we need to reduce the flooding percentages and lengthen the per-peer reconciliation interval (from 8s to 12s).
This is also reflected in #7 for 12 conns.
So the patch should be expanded:

```diff
root@ubuntu-s-4vcpu-8gb-nyc1-01:~/bitcoin# git diff
diff --git a/src/txreconciliation.cpp b/src/txreconciliation.cpp
index 85c22c0ee..c02852b78 100644
--- a/src/txreconciliation.cpp
+++ b/src/txreconciliation.cpp
@@ -16,8 +16,8 @@ constexpr uint32_t RECON_VERSION = 1;
 /** Static component of the salt used to compute short txids for inclusion in sketches. */
 const std::string RECON_STATIC_SALT = "Tx Relay Salting";
 /** Announce transactions via full wtxid to a limited number of inbound and outbound peers. */
-constexpr double INBOUND_FANOUT_DESTINATIONS_PERCENT = 0.1;
-constexpr double OUTBOUND_FANOUT_DESTINATIONS_PERCENT = 0.25;
+constexpr double INBOUND_FANOUT_DESTINATIONS_PERCENT = 0.02;
+constexpr double OUTBOUND_FANOUT_DESTINATIONS_PERCENT = 0.05;
 /** The size of the field, used to compute sketches to reconcile transactions (see BIP-330). */
 constexpr unsigned int RECON_FIELD_SIZE = 32;
 /**
@@ -53,7 +53,7 @@ constexpr uint16_t Q_PRECISION{(2 << 14) - 1};
  * due to reconciliation metadata (sketch sizes etc.), which would nullify the efficiency.
  * Less frequent reconciliations would introduce high transaction relay latency.
  */
-constexpr std::chrono::microseconds RECON_REQUEST_INTERVAL{8s};
+constexpr std::chrono::microseconds RECON_REQUEST_INTERVAL{12s};
 /**
  * We should keep an interval between responding to reconciliation requests from the same peer,
  * to reduce potential DoS surface.
```

This patch should be applied on all involved nodes. I'm doing so on my 12 nodes; @0xB10C, could you do the same on your side and restart the bandwidth measurements?

Note that doing so on my side makes the whole setting suited for 12 conns, so 8-conn comparison experiments become less fair.
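To get a feel for what the smaller fanout fractions in that patch mean, a rough sketch, assuming the fraction is applied directly to the peer count (how the branch actually rounds or randomizes the choice of flooding peers is not shown in this thread): with the old outbound value of 0.25 and 8 outbound peers, about 2 peers receive each transaction by full wtxid, while 0.05 with 12 peers gives about 0.6. Similarly, if each peer is reconciled with once per RECON_REQUEST_INTERVAL, 12 peers at 12s still start one reconciliation per second on average, as 8 peers at 8s did. `ExpectedFanoutPeers` is an illustrative name:

```cpp
#include <cassert>

// Expected number of peers a transaction is flooded to (announced by full
// wtxid), assuming the fanout fraction is applied literally to `peers`.
double ExpectedFanoutPeers(double fanout_fraction, int peers) {
    assert(fanout_fraction >= 0.0 && fanout_fraction <= 1.0 && peers >= 0);
    return fanout_fraction * peers;
}
```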

@0xB10C

0xB10C commented Sep 1, 2021

For the record: bandwidth and messages after 24h without the patch mentioned in #8 (comment)
[2 images: bandwidth and messages after 24h, erlay-node vs master-node]


Updated my erlay node to use the following patch. The master node still uses the patch from #8 (comment).

```diff
diff --git a/src/net.h b/src/net.h
index 12d282b85..ae66a4426 100644
--- a/src/net.h
+++ b/src/net.h
@@ -59,9 +59,9 @@ static const unsigned int MAX_PROTOCOL_MESSAGE_LENGTH = 4 * 1000 * 1000;
 /** Maximum length of the user agent string in `version` message */
 static const unsigned int MAX_SUBVERSION_LENGTH = 256;
 /** Maximum number of automatic outgoing nodes over which we'll relay everything (blocks, tx, addrs, etc) */
-static const int MAX_OUTBOUND_FULL_RELAY_CONNECTIONS = 8;
+static const int MAX_OUTBOUND_FULL_RELAY_CONNECTIONS = 12;
 /** Maximum number of addnode outgoing nodes */
-static const int MAX_ADDNODE_CONNECTIONS = 8;
+static const int MAX_ADDNODE_CONNECTIONS = 12;
 /** Maximum number of block-relay-only outgoing connections */
 static const int MAX_BLOCK_RELAY_ONLY_CONNECTIONS = 2;
 /** Maximum number of feeler connections */
diff --git a/src/txreconciliation.cpp b/src/txreconciliation.cpp
index 00e220ecf..0937e2bc4 100644
--- a/src/txreconciliation.cpp
+++ b/src/txreconciliation.cpp
@@ -16,8 +16,8 @@ constexpr uint32_t RECON_VERSION = 1;
 /** Static component of the salt used to compute short txids for inclusion in sketches. */
 const std::string RECON_STATIC_SALT = "Tx Relay Salting";
 /** Announce transactions via full wtxid to a limited number of inbound and outbound peers. */
-constexpr double INBOUND_FANOUT_DESTINATIONS_FRACTION = 0.1;
-constexpr double OUTBOUND_FANOUT_DESTINATIONS_FRACTION = 0.1;
+constexpr double INBOUND_FANOUT_DESTINATIONS_FRACTION = 0.02;
+constexpr double OUTBOUND_FANOUT_DESTINATIONS_FRACTION = 0.05;
 /** The size of the field, used to compute sketches to reconcile transactions (see BIP-330). */
 constexpr unsigned int RECON_FIELD_SIZE = 32;
 /**
@@ -53,7 +53,7 @@ constexpr uint16_t Q_PRECISION{(2 << 14) - 1};
  * due to reconciliation metadata (sketch sizes etc.), which would nullify the efficiency.
  * Less frequent reconciliations would introduce high transaction relay latency.
  */
-constexpr std::chrono::microseconds RECON_REQUEST_INTERVAL{8s};
+constexpr std::chrono::microseconds RECON_REQUEST_INTERVAL{12s};
 /**
  * We should keep an interval between responding to reconciliation requests from the same peer,
  * to reduce potential DoS surface.
```

@naumenkogs
Owner Author

According to my estimates in #7, we have Z=5/12=0.41, which yields ~67% overall bandwidth savings for the entire tx relay. This is in line with the observations from @0xB10C.

@0xB10C

0xB10C commented Sep 16, 2021

> According to my estimates in #7, we have Z=5/12=0.41, which yields ~67% overall bandwidth savings for the entire tx relay. This is in line with the observations from @0xB10C.

Measurements between 2021-09-01 and 2021-09-13 as screenshots:

[3 images: measurements from 2021-09-01 to 2021-09-13]

@naumenkogs
Owner Author

> My nodes are currently pruned, so please sync from the network first, and restart with the command above only when you're at the tip.

@kcalvinalvin

I've set up two separate machines at home for testing. One runs the 2021-03-erlay branch and the other runs Bitcoin Core v22.0.0. The bitcoin.conf settings for both are as below, plus RPC settings. I don't think it matters, but one has prune=550 and the other doesn't.

```
maxconnections=0
addnode=143.198.185.21:8201
addnode=143.198.185.21:8202
addnode=143.198.185.21:8203
addnode=143.198.185.21:8204
addnode=143.198.185.21:8205
addnode=143.198.185.21:8206
addnode=143.198.185.21:8207
addnode=143.198.185.21:8208
```

After ~40 hours, I ran the command `bitcoin-0.21.1/bin/bitcoin-cli -rpcport=8109 getpeerinfo | grep 'inv\|sketch\|reqrecon\|reqsketchext\|reconcildiff' | awk '/[0-9]+/ {gsub(/[^0-9]/, "", $0); sum+=$0} END {print sum}'`.

The Erlay node returns 92006977 and the Bitcoin Core v22.0.0 node returns 153035821, so the Bitcoin Core node seems to be using ~66.3% more bandwidth.
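The same measurement can be framed from either node's side: 66.3% more for the v22.0.0 node corresponds to roughly a 39.9% saving for the Erlay node, because the baseline of the percentage changes. A small C++ sketch of both conversions (function names are illustrative):

```cpp
#include <cassert>
#include <cstdint>

// How much more the baseline node used, in percent of the Erlay node's usage.
double PercentMore(std::uint64_t baseline, std::uint64_t erlay) {
    assert(erlay > 0);
    return (static_cast<double>(baseline) / erlay - 1.0) * 100.0;
}

// How much the Erlay node saved, in percent of the baseline node's usage.
double PercentSaved(std::uint64_t baseline, std::uint64_t erlay) {
    assert(baseline > 0);
    return (1.0 - static_cast<double>(erlay) / baseline) * 100.0;
}
```

With the numbers above, `PercentMore(153035821, 92006977)` is about 66.3 and `PercentSaved(153035821, 92006977)` is about 39.9.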

@hebasto

hebasto commented Jan 19, 2022

Tested bitcoin/bitcoin#21515 (i.e. 2021-03-erlay) on commit 5728fac4d3d29f64ea811f5978e80dabdc083d87 in comparison with the master branch on bitcoin/bitcoin@d94dc69.

The same patchset, which collects and presents the data, has been applied to both branches. The branches actually tested are BASE and TEST.

Two nodes, BASE and TEST, were running simultaneously for 89 hours with 8 addnoded connections to Erlay peers only.

"Tx" statistics relate to TX network messages.
"Erlay" statistics relate to the newly introduced Erlay protocol network messages.

Results look stable: the relative numbers do not change significantly as time goes on.

[images: BASE and TEST statistics screenshots, 2022-01-19]

| Data | BASE | TEST | Diff (TEST as % of BASE) |
|---|---|---|---|
| Total Received | 746 MB | 734 MB | 98.4% |
| Total Sent | 258 MB | 131 MB | 50.8% |
| Total R+S | 1004 MB | 865 MB | 86.2% |
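The Diff column is TEST divided by BASE, and the R+S row is the sum of the two rows above it. A small C++ sketch that recomputes the column from the MB values (`Traffic` and `PercentOf` are illustrative names):

```cpp
#include <cassert>

// Received and sent traffic of one node, in MB.
struct Traffic {
    double received_mb;
    double sent_mb;
    double total() const { return received_mb + sent_mb; }
};

// TEST as a percentage of BASE, as in the "Diff" column.
double PercentOf(double test, double base) {
    assert(base > 0.0);
    return test / base * 100.0;
}
```

For example, `PercentOf(131, 258)` is about 50.8, i.e. the TEST node sent roughly half as much as BASE.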

UPDATE: Reasoning about the numbers makes me think that, with the current (tested) Erlay settings, we can increase MAX_OUTBOUND_FULL_RELAY_CONNECTIONS by only one. See the following comment :)

@hebasto

hebasto commented Jan 21, 2022

Another test: BASE (8 addnoded Erlay peers) vs TEST (12 addnoded Erlay peers), ~48 hours.

[images: BASE and TEST statistics screenshots, 2022-01-21]

| Data | BASE | TEST | Diff (TEST as % of BASE) |
|---|---|---|---|
| Total Received | 459 MB | 463 MB | 100.9% |
| Total Sent | 153 MB | 96 MB | 62.7% |
| Total R+S | 612 MB | 559 MB | 91.3% |
| Peers | 8 | 12 | 150.0% |
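Since TEST has 50% more peers than BASE, normalizing by connection count may help with the earlier question of whether bandwidth stays roughly constant as connections grow. From the table: total traffic per peer is 612 / 8 = 76.5 MB for BASE vs 559 / 12 ≈ 46.6 MB for TEST, and sent traffic per peer is 153 / 8 ≈ 19.1 MB vs 96 / 12 = 8 MB over the ~48 hours. A trivial sketch (`PerPeerMb` is an illustrative name):

```cpp
#include <cassert>

// Average traffic per peer connection, in MB.
double PerPeerMb(double total_mb, int peers) {
    assert(peers > 0);
    return total_mb / peers;
}
```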
