Get statistics on transport used for incoming connection by agent version #2205

Closed
MarcoPolo opened this issue Mar 20, 2023 · 12 comments
Labels
P3 Low: Not priority right now

Comments

@MarcoPolo
Collaborator

It would be useful to see the transport breakdown for incoming connections by
agent version so that we could see:

  1. What percent of newer nodes are dialing via QUIC?
  2. Does a change to dialing negatively or positively impact the prevalence of
    QUIC? (smart dialing logic)
  3. Are other implementations increasing QUIC prevalence?

I wonder if this is something the folks at probe-lab could help with since it's
information about the whole libp2p network and not just go-libp2p

@MarcoPolo
Collaborator Author

cc @yiannisbot @dennis-tra for possible probelab help.

@dennis-tra
Contributor

Is the assumption "nodes are dialing via QUIC" == "nodes listening on UDP/QUIC" true? I think you can theoretically listen on, e.g., just TCP but dial via QUIC, but that should be the exception. If we assume the above, I could give you numbers on the distribution for the several DHT networks.

@MarcoPolo
Collaborator Author

Nodes may be on a network that blocks UDP (a UDP black hole).

@dennis-tra
Contributor

dennis-tra commented Mar 21, 2023

sure! I thought, as a first-order approximation, the above assumption would be good enough (we'd have that data readily available). I actually have no idea or intuition about how prevalent networks that block UDP are.


If that's an unsuitable way forward, I think we'd indeed need to track incoming connections (or rather dial attempts). A simple idea would be to deploy libp2p hosts in, e.g., the IPFS (or whatever) network and then keep statistics about incoming dials. Not entirely sure how the go-libp2p behaviour of concurrently dialling up to eight addresses would impact the measurement. If a TCP and a QUIC dial race each other, it could theoretically happen that we (for whatever reason: packets dropped, UDP blocked, etc.) only see the TCP connection attempt. This would mean we would miscount "What percent of newer nodes are dialing via QUIC?".


From an implementation perspective, I know there's a mechanism to get notified about connection lifecycle events. Is there something similar for transports? In the past, I just wrapped the QUIC/TCP/WS transports to get notified if the Dial method was called. However, this is the wrong way around. I would get notified about outgoing dials but want to be notified about incoming dials.


@yiannisbot, something to discuss during our colo on Thursday 👍

@MarcoPolo
Collaborator Author

> sure! I thought, as a first-order approximation, the above assumption would be good enough (we'd have that data readily available). I actually have no idea or intuition about how prevalent networks that block UDP are.

That's partly why I want to get this information. I also want to get the information of how many nodes in the wild are choosing QUIC of their own volition rather than just accepting QUIC. My guess is these numbers are roughly the same, but we'd need to measure to be sure.

> I know there's a mechanism to get notified about connection lifecycle events. Is there something similar for transports?

You can get the transport from the conn object there. You can also get the peer ID and then dedupe things later if needed (e.g. if you get both a TCP and a QUIC conn, the peer should be counted as part of the set of peers that dial via QUIC).
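The dedupe step described above can be sketched roughly as follows. This is only an illustration using the standard library: the `ConnEvent` type, the `ClassifyPeers` function, and the `"quic"`/`"tcp"` labels are hypothetical names, not part of go-libp2p. The one rule taken from the comment is that a peer seen on QUIC at least once counts as a QUIC dialer, even if it also connected over TCP.

```go
package main

import (
	"fmt"
	"sort"
)

// ConnEvent is a hypothetical record of one incoming connection.
type ConnEvent struct {
	PeerID    string // remote peer ID as a string
	Transport string // assumed label, e.g. "quic" or "tcp"
}

// ClassifyPeers dedupes connection events by peer ID. A peer that was seen
// on QUIC at least once is labelled "quic", regardless of event order or of
// any additional non-QUIC connections. All other peers are "no-quic-seen".
func ClassifyPeers(events []ConnEvent) map[string]string {
	class := make(map[string]string)
	for _, ev := range events {
		if ev.Transport == "quic" {
			class[ev.PeerID] = "quic"
			continue
		}
		if _, seen := class[ev.PeerID]; !seen {
			class[ev.PeerID] = "no-quic-seen"
		}
	}
	return class
}

func main() {
	events := []ConnEvent{
		{"peerA", "tcp"},
		{"peerA", "quic"},
		{"peerB", "tcp"},
	}
	class := ClassifyPeers(events)

	// Sort peer IDs so the printed order is stable.
	peers := make([]string, 0, len(class))
	for p := range class {
		peers = append(peers, p)
	}
	sort.Strings(peers)
	for _, p := range peers {
		fmt.Println(p, class[p])
	}
}
```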

Happy to spend an hour to pair on this to kick it off :)

@dennis-tra
Contributor

> That's partly why I want to get this information. I also want to get the information of how many nodes in the wild are choosing QUIC of their own volition rather than just accepting QUIC. My guess is these numbers are roughly the same, but we'd need to measure to be sure.

I see 👍

> Happy to spend an hour to pair on this to kick it off :)

Cool! Would love to hear more context. We didn't manage to chat about this during our colo yesterday. I'll bring it up during our weekly sync today :)

@yiannisbot

Could we use the honeypot, or the infrastructure we want to have set up from several vantage points to count incoming connections over QUIC? Monitoring is not going to be from many vantage points, but over a long period of time, it should be a statistically stable sample.

@dennis-tra
Contributor

Meeting with Marco 2023-04-03:

  • subscribe to connect events; metric labels: transport, agent_version
    • agent version would be stripped to kubo/x.y.z + all others
  • expose prometheus metric in our DHT lookup measurement
    • increment every time a connect event occurs
    • first wait for identify to complete
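A minimal sketch of the labelling idea from these notes, using only the standard library. The names `NormalizeAgent` and `ConnectCounter` are hypothetical, and the in-memory map stands in for whatever Prometheus counter the real measurement would expose with the same two labels. It assumes kubo agent strings start with `kubo/x.y.z`; everything else is collapsed into a single `other` bucket.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// kuboVersion matches agent strings that start with "kubo/x.y.z"
// (assumption: kubo reports its version in this position).
var kuboVersion = regexp.MustCompile(`^kubo/(\d+\.\d+\.\d+)`)

// NormalizeAgent strips an agent version down to "kubo/x.y.z" and maps
// every other agent string (including an empty one) to "other".
func NormalizeAgent(agent string) string {
	if m := kuboVersion.FindStringSubmatch(agent); m != nil {
		return "kubo/" + m[1]
	}
	return "other"
}

// labels is the (transport, agent_version) label pair of one counter series.
type labels struct {
	Transport    string
	AgentVersion string
}

// ConnectCounter is a hypothetical stand-in for a labelled counter metric.
type ConnectCounter struct {
	mu     sync.Mutex
	counts map[labels]int
}

func NewConnectCounter() *ConnectCounter {
	return &ConnectCounter{counts: make(map[labels]int)}
}

// Inc records one connect event. It is meant to be called only after
// identify has completed, so that rawAgent is actually known.
func (c *ConnectCounter) Inc(transport, rawAgent string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[labels{transport, NormalizeAgent(rawAgent)}]++
}

// Get returns the count for an already-normalized agent label.
func (c *ConnectCounter) Get(transport, agentLabel string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[labels{transport, agentLabel}]
}

func main() {
	c := NewConnectCounter()
	c.Inc("quic", "kubo/0.18.1/675f8bd/docker")
	c.Inc("quic", "kubo/0.18.1/")
	c.Inc("tcp", "some-other-agent/1.0")
	fmt.Println(c.Get("quic", "kubo/0.18.1"), c.Get("tcp", "other"))
}
```

Collapsing non-kubo agents into one bucket keeps the number of label combinations small, which is presumably the motivation for stripping the version in the first place.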

alternative approach, using a database:

  • create a connections table. Columns:
    • peer_id
    • transport
    • timestamp
    • agent_version
    • ip_address
    • security transport
    • multiplexer yamux/mplex
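For the database variant, one row of the connections table could look like the struct below. This is a sketch with hypothetical names (`ConnectionRow`, `TransportAndIP`); the string parsing is a simplification for illustration only, and a real implementation would more likely use the go-multiaddr package than split the address by hand.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// ConnectionRow mirrors the proposed columns of the connections table.
type ConnectionRow struct {
	PeerID       string
	Transport    string
	Timestamp    time.Time
	AgentVersion string
	IPAddress    string
	Security     string // security transport, e.g. "noise" or "tls"
	Muxer        string // "yamux" or "mplex"; may be empty for QUIC
}

// TransportAndIP derives a coarse transport label and the IP address from a
// textual multiaddr such as "/ip4/1.2.3.4/udp/4001/quic-v1".
// Simplified on purpose: only ip4/ip6, tcp, ws/wss and quic/quic-v1 are
// recognized; anything else yields "unknown".
func TransportAndIP(maddr string) (transport, ip string) {
	parts := strings.Split(strings.Trim(maddr, "/"), "/")
	for i := 0; i < len(parts); i++ {
		switch parts[i] {
		case "ip4", "ip6":
			if i+1 < len(parts) {
				ip = parts[i+1]
				i++ // skip the address value
			}
		case "tcp":
			if transport == "" {
				transport = "tcp"
			}
		case "ws", "wss":
			transport = "ws"
		case "quic", "quic-v1":
			transport = "quic"
		}
	}
	if transport == "" {
		transport = "unknown"
	}
	return transport, ip
}

func main() {
	transport, ip := TransportAndIP("/ip4/203.0.113.7/udp/4001/quic-v1")
	row := ConnectionRow{
		PeerID:       "examplePeerID", // placeholder, not a real peer ID
		Transport:    transport,
		Timestamp:    time.Now(),
		AgentVersion: "kubo/0.18.1",
		IPAddress:    ip,
		Security:     "tls",
	}
	fmt.Printf("%+v\n", row)
}
```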

@dennis-tra
Contributor

Hi Marco, copying over our discussion from Slack here.

I took all the connections that I’ve tracked and found which peers connected via QUIC (and also TCP) and which peers only connected via TCP. I only considered those peers that connected to us at least 10 times, so that we have a reasonable chance that a QUIC connection has succeeded if that peer dials with QUIC.
Then, I checked which agent versions these peers reported and tallied them up by agent version. This is the result:

[Image "output": peer counts tallied by agent version, QUIC (and TCP) versus TCP only]
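The tally described above (keep only peers with at least 10 connections, split them into "saw QUIC" and "TCP only", then group by agent version) might be sketched like this. `PeerStats`, `Tally`, and `TallyByAgent` are hypothetical names, and the `"unknown"` bucket for peers without a recorded agent version is an assumption for illustration; this is not the code that produced the chart.

```go
package main

import (
	"fmt"
	"sort"
)

// PeerStats is a hypothetical per-peer summary of tracked connections.
type PeerStats struct {
	AgentVersion string // empty if no agent version could be recorded
	Conns        int    // total number of incoming connections seen
	SawQUIC      bool   // true if at least one connection used QUIC
}

// Tally counts peers per agent version, split by observed transport.
type Tally struct {
	QUIC    int // peers that connected via QUIC (and possibly TCP too)
	TCPOnly int // peers that never connected via QUIC
}

// TallyByAgent ignores peers with fewer than minConns connections, so that
// a QUIC-dialing peer has had a reasonable chance of succeeding over QUIC.
func TallyByAgent(peers map[string]PeerStats, minConns int) map[string]Tally {
	out := make(map[string]Tally)
	for _, p := range peers {
		if p.Conns < minConns {
			continue
		}
		agent := p.AgentVersion
		if agent == "" {
			agent = "unknown"
		}
		t := out[agent]
		if p.SawQUIC {
			t.QUIC++
		} else {
			t.TCPOnly++
		}
		out[agent] = t
	}
	return out
}

func main() {
	peers := map[string]PeerStats{
		"peerA": {AgentVersion: "kubo/0.18.1", Conns: 25, SawQUIC: true},
		"peerB": {AgentVersion: "kubo/0.18.1", Conns: 12, SawQUIC: false},
		"peerC": {AgentVersion: "", Conns: 40, SawQUIC: true},
		"peerD": {AgentVersion: "kubo/0.17.0", Conns: 3, SawQUIC: true}, // below threshold
	}
	result := TallyByAgent(peers, 10)

	// Sort agent labels so the printed order is stable.
	agents := make([]string, 0, len(result))
	for a := range result {
		agents = append(agents, a)
	}
	sort.Strings(agents)
	for _, a := range agents {
		fmt.Printf("%s quic=%d tcp-only=%d\n", a, result[a].QUIC, result[a].TCPOnly)
	}
}
```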

For most of the connections I couldn't record an agent version (see the bar on the far right). Maybe I’m doing something wrong? This is the relevant line.

Another angle to look at the data. Here are all connection events and their corresponding transport:

[Image "output2": all connection events by transport]

@p-shahi p-shahi added the kind/discussion Topical discussion; usually not changes to codebase label May 22, 2023
@p-shahi p-shahi added P3 Low: Not priority right now and removed kind/discussion Topical discussion; usually not changes to codebase labels May 30, 2023
@sukunrt
Member

sukunrt commented Jun 3, 2023

@dennis-tra

> For most of the connections I couldn't record an agent version (see the bar on the far right). Maybe I’m doing something wrong? This is the relevant line.

The code that you pointed to uses go-libp2p v0.26.3.
Maybe you're running into this race condition when using IdentifyWait within a network.Notifiee:
#2173

@dennis-tra
Contributor

nice, thanks for the pointer @sukunrt! This IdentifyWait thing was bugging me not only for this - great to see the race condition go away 👍

@MarcoPolo
Collaborator Author

I think we can close this issue now. Thanks @dennis-tra !
