Enabling relay client deteriorates p2p connectivity #9751

Closed
3 tasks done
smrz2001 opened this issue Mar 24, 2023 · 12 comments
Labels
kind/bug A bug in existing code (including security flaws) kind/stale need/author-input Needs input from the original author

Comments

@smrz2001
Contributor

smrz2001 commented Mar 24, 2023

Checklist

Installation method

built from source

Version

For our node:

```
/ # ipfs version --all
Kubo version: 0.18.1-675f8bd-dirty
Repo version: 13
System version: amd64/linux
Golang version: go1.19.1
```

For our partner's node:

```
/ # ipfs version --all
Kubo version: 0.18.1-675f8bd-dirty
Repo version: 13
System version: amd64/linux
Golang version: go1.19.1
```


Config

For our node (output of `ipfs config show`):

```json
{
  "API": {
    "HTTPHeaders": {}
  },
  "Addresses": {
    "API": "/ip4/0.0.0.0/tcp/5011",
    "Announce": [],
    "AppendAnnounce": [],
    "Gateway": "/ip4/0.0.0.0/tcp/9011",
    "NoAnnounce": [],
    "Swarm": [
      "/ip4/0.0.0.0/tcp/4010",
      "/ip4/0.0.0.0/tcp/4011/ws"
    ]
  },
  "AutoNAT": {},
  "Bootstrap": [
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmQCU2EcMqAqQPR2i9bChDtGNJchTbq5TbXJJ16u19uLTa",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmcZf59bWwK5XFi76CZX8cbJ4BhTzzA3gU1ZjYZcYW3dwt",
    "/ip4/104.131.131.82/tcp/4001/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ",
    "/ip4/104.131.131.82/udp/4001/quic/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ"
  ],
  "DNS": {
    "Resolvers": {}
  },
  "Datastore": {
    "BloomFilterSize": 0,
    "GCPeriod": "1h",
    "HashOnRead": false,
    "Spec": {
      "mounts": [
        {
          "child": {
            "accessKey": "...",
            "bucket": "...",
            "keyTransform": "next-to-last/2",
            "region": "...",
            "rootDirectory": "ipfs/blocks",
            "secretKey": "...",
            "type": "s3ds"
          },
          "mountpoint": "/blocks",
          "prefix": "s3.datastore",
          "type": "measure"
        },
        {
          "child": {
            "compression": "none",
            "path": "datastore",
            "type": "levelds"
          },
          "mountpoint": "/",
          "prefix": "leveldb.datastore",
          "type": "measure"
        }
      ],
      "type": "mount"
    },
    "StorageGCWatermark": 90,
    "StorageMax": "10GB"
  },
  "Discovery": {
    "MDNS": {
      "Enabled": true,
      "Interval": 10
    }
  },
  "Experimental": {
    "AcceleratedDHTClient": false,
    "FilestoreEnabled": false,
    "GraphsyncEnabled": false,
    "Libp2pStreamMounting": false,
    "P2pHttpProxy": false,
    "StrategicProviding": false,
    "UrlstoreEnabled": false
  },
  "Gateway": {
    "APICommands": [],
    "HTTPHeaders": {
      "Access-Control-Allow-Headers": [
        "X-Requested-With",
        "Range",
        "User-Agent"
      ],
      "Access-Control-Allow-Methods": [
        "GET"
      ],
      "Access-Control-Allow-Origin": [
        "*"
      ]
    },
    "NoDNSLink": false,
    "NoFetch": false,
    "PathPrefixes": [],
    "PublicGateways": null,
    "RootRedirect": "",
    "Writable": false
  },
  "Identity": {
    "PeerID": "QmUvEKXuorR7YksrVgA7yKGbfjWHuCRisw2cH9iqRVM9P8"
  },
  "Internal": {},
  "Ipns": {
    "RecordLifetime": "",
    "RepublishPeriod": "",
    "ResolveCacheSize": 128
  },
  "Migration": {
    "DownloadSources": [],
    "Keep": ""
  },
  "Mounts": {
    "FuseAllowOther": false,
    "IPFS": "/ipfs",
    "IPNS": "/ipns"
  },
  "Peering": {
    "Peers": [
      {
        "Addrs": [
          "/dns4/go-ipfs-ceramic-private-mainnet-external.3boxlabs.com/tcp/4011/ws/p2p/QmXALVsXZwPWTUbsT8G6VVzzgTJaAWRUD7FWL5f7d5ubAL"
        ],
        "ID": "QmXALVsXZwPWTUbsT8G6VVzzgTJaAWRUD7FWL5f7d5ubAL"
      },
      {
        "Addrs": [
          "/dns4/go-ipfs-ceramic-private-cas-mainnet-external.3boxlabs.com/tcp/4011/ws/p2p/QmUvEKXuorR7YksrVgA7yKGbfjWHuCRisw2cH9iqRVM9P8"
        ],
        "ID": "QmUvEKXuorR7YksrVgA7yKGbfjWHuCRisw2cH9iqRVM9P8"
      },
      {
        "Addrs": [
          "/dns4/go-ipfs-ceramic-elp-1-1-external.3boxlabs.com/tcp/4011/ws/p2p/QmUiF8Au7wjhAF9BYYMNQRW5KhY7o8fq4RUozzkWvHXQrZ"
        ],
        "ID": "QmUiF8Au7wjhAF9BYYMNQRW5KhY7o8fq4RUozzkWvHXQrZ"
      },
      {
        "Addrs": [
          "/dns4/go-ipfs-ceramic-elp-1-2-external.3boxlabs.com/tcp/4011/ws/p2p/QmRNw9ZimjSwujzS3euqSYxDW9EHDU5LB3NbLQ5vJ13hwJ"
        ],
        "ID": "QmRNw9ZimjSwujzS3euqSYxDW9EHDU5LB3NbLQ5vJ13hwJ"
      },
      {
        "Addrs": [
          "/dns4/go-ipfs-ceramic-private-cas-clay-external.3boxlabs.com/tcp/4011/ws/p2p/QmbeBTzSccH8xYottaYeyVX8QsKyox1ExfRx7T1iBqRyCd"
        ],
        "ID": "QmbeBTzSccH8xYottaYeyVX8QsKyox1ExfRx7T1iBqRyCd"
      }
    ]
  },
  "Pinning": {
    "RemoteServices": {}
  },
  "Plugins": {
    "Plugins": null
  },
  "Provider": {
    "Strategy": ""
  },
  "Pubsub": {
    "DisableSigning": false,
    "Enabled": true,
    "Router": "",
    "SeenMessagesTTL": "10m"
  },
  "Reprovider": {},
  "Routing": {},
  "Swarm": {
    "AddrFilters": null,
    "ConnMgr": {},
    "DisableBandwidthMetrics": false,
    "DisableNatPortMap": false,
    "RelayClient": {
      "Enabled": false
    },
    "RelayService": {},
    "Transports": {
      "Multiplexers": {},
      "Network": {},
      "Security": {}
    }
  },
  "algorithm": "rsa"
}
```

For our partner's node (output of `ipfs config show`):

```json
{
  "API": {
    "HTTPHeaders": {}
  },
  "Addresses": {
    "API": "/ip4/0.0.0.0/tcp/5001",
    "Announce": [],
    "AppendAnnounce": [],
    "Gateway": "/ip4/0.0.0.0/tcp/8080",
    "NoAnnounce": [],
    "Swarm": [
      "/ip4/0.0.0.0/tcp/4001",
      "/ip4/0.0.0.0/tcp/8081/ws"
    ]
  },
  "AutoNAT": {},
  "Bootstrap": [
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmQCU2EcMqAqQPR2i9bChDtGNJchTbq5TbXJJ16u19uLTa",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmcZf59bWwK5XFi76CZX8cbJ4BhTzzA3gU1ZjYZcYW3dwt",
    "/ip4/104.131.131.82/tcp/4001/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ",
    "/ip4/104.131.131.82/udp/4001/quic/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ"
  ],
  "DNS": {
    "Resolvers": {}
  },
  "Datastore": {
    "BloomFilterSize": 0,
    "GCPeriod": "1h",
    "HashOnRead": false,
    "Spec": {
      "mounts": [
        {
          "child": {
            "accessKey": "...",
            "bucket": "...",
            "keyTransform": "next-to-last/2",
            "region": "...",
            "rootDirectory": "root",
            "secretKey": "...",
            "type": "s3ds"
          },
          "mountpoint": "/blocks",
          "prefix": "s3.datastore",
          "type": "measure"
        },
        {
          "child": {
            "compression": "none",
            "path": "datastore",
            "type": "levelds"
          },
          "mountpoint": "/",
          "prefix": "leveldb.datastore",
          "type": "measure"
        }
      ],
      "type": "mount"
    },
    "StorageGCWatermark": 90,
    "StorageMax": "10GB"
  },
  "Discovery": {
    "MDNS": {
      "Enabled": true
    }
  },
  "Experimental": {
    "AcceleratedDHTClient": false,
    "FilestoreEnabled": true,
    "GraphsyncEnabled": false,
    "Libp2pStreamMounting": true,
    "P2pHttpProxy": true,
    "StrategicProviding": false,
    "UrlstoreEnabled": true
  },
  "Gateway": {
    "APICommands": [],
    "HTTPHeaders": {
      "Access-Control-Allow-Headers": [
        "X-Requested-With",
        "Range",
        "User-Agent"
      ],
      "Access-Control-Allow-Methods": [
        "GET"
      ],
      "Access-Control-Allow-Origin": [
        "*"
      ]
    },
    "NoDNSLink": false,
    "NoFetch": false,
    "PathPrefixes": [],
    "PublicGateways": null,
    "RootRedirect": "",
    "Writable": false
  },
  "Identity": {
    "PeerID": "QmQb86uUqpB8EsV1nCUvjLZm4FQSe8Bkaw8MXNSWKt8WxG"
  },
  "Internal": {},
  "Ipns": {
    "RecordLifetime": "",
    "RepublishPeriod": "",
    "ResolveCacheSize": 128
  },
  "Migration": {
    "DownloadSources": [],
    "Keep": ""
  },
  "Mounts": {
    "FuseAllowOther": false,
    "IPFS": "/ipfs",
    "IPNS": "/ipns"
  },
  "Peering": {
    "Peers": [
      {
        "Addrs": [
          "/dns4/go-ipfs-ceramic-private-mainnet-external.3boxlabs.com/tcp/4011/ws/p2p/QmXALVsXZwPWTUbsT8G6VVzzgTJaAWRUD7FWL5f7d5ubAL"
        ],
        "ID": "QmXALVsXZwPWTUbsT8G6VVzzgTJaAWRUD7FWL5f7d5ubAL"
      },
      {
        "Addrs": [
          "/dns4/go-ipfs-ceramic-private-cas-mainnet-external.3boxlabs.com/tcp/4011/ws/p2p/QmUvEKXuorR7YksrVgA7yKGbfjWHuCRisw2cH9iqRVM9P8"
        ],
        "ID": "QmUvEKXuorR7YksrVgA7yKGbfjWHuCRisw2cH9iqRVM9P8"
      },
      {
        "Addrs": [
          "/dns4/go-ipfs-ceramic-elp-1-1-external.3boxlabs.com/tcp/4011/ws/p2p/QmUiF8Au7wjhAF9BYYMNQRW5KhY7o8fq4RUozzkWvHXQrZ"
        ],
        "ID": "QmUiF8Au7wjhAF9BYYMNQRW5KhY7o8fq4RUozzkWvHXQrZ"
      },
      {
        "Addrs": [
          "/dns4/go-ipfs-ceramic-elp-1-2-external.3boxlabs.com/tcp/4011/ws/p2p/QmRNw9ZimjSwujzS3euqSYxDW9EHDU5LB3NbLQ5vJ13hwJ"
        ],
        "ID": "QmRNw9ZimjSwujzS3euqSYxDW9EHDU5LB3NbLQ5vJ13hwJ"
      },
      {
        "Addrs": [
          "/dns4/go-ipfs-ceramic-private-cas-clay-external.3boxlabs.com/tcp/4011/ws/p2p/QmbeBTzSccH8xYottaYeyVX8QsKyox1ExfRx7T1iBqRyCd"
        ],
        "ID": "QmbeBTzSccH8xYottaYeyVX8QsKyox1ExfRx7T1iBqRyCd"
      }
    ]
  },
  "Pinning": {
    "RemoteServices": {}
  },
  "Plugins": {
    "Plugins": null
  },
  "Provider": {
    "Strategy": ""
  },
  "Pubsub": {
    "DisableSigning": false,
    "Enabled": true,
    "Router": "",
    "SeenMessagesTTL": "10m0s"
  },
  "Reprovider": {},
  "Routing": {
    "Methods": null,
    "Routers": null
  },
  "Swarm": {
    "AddrFilters": null,
    "ConnMgr": {},
    "DisableBandwidthMetrics": false,
    "DisableNatPortMap": false,
    "RelayClient": {
      "Enabled": false
    },
    "RelayService": {},
    "ResourceMgr": {},
    "Transports": {
      "Multiplexers": {},
      "Network": {},
      "Security": {}
    }
  },
  "algorithm": "rsa"
}
```

Description

I'll preface this by noting that we observed the issue on the latest Kubo release at the time (v0.18.1). I could not find any relevant issues filed between then and the new release, and at this point we will not be able to upgrade our nodes to try to reproduce the issue, especially since pubsub is deprecated in v0.19.0 and we still need pubsub for the time being.

Our investigation ran as follows:

  • The Relay Client was enabled on both nodes, but our node kept timing out when looking up Ceramic stream commit CIDs that were definitely present on our partner's node. Conversely, their node had trouble loading anchor commit CIDs from ours.

Side note: We spent several days upgrading the IPFS HTTP client and related dependencies in our Ceramic code to be able to use Kubo v0.18.1. Enabling the Relay Client caused quic-v1 multiaddrs to be used, which crashed our Ceramic nodes because our IPFS HTTP client could not handle the new multiaddrs. We were unable to apply @lidel's recommended patches because our monorepo packaging tool, lerna, does not support applying patches to dependencies during builds 😭

We assumed that disabling the Relay Client would be a regression, so we forged ahead trying to keep it enabled despite the additional changes this required in the Ceramic code.

  • On both nodes, running `ipfs swarm peers | grep <multiaddr>` showed only p2p-circuit multiaddrs, even though both nodes' advertised multiaddrs are publicly accessible.

  • We even added our node to the Peering section of our partner's node's configuration, and their node to ours, but this helped neither connectivity nor the multiaddrs shown by `ipfs swarm peers`. (Note that our config above no longer lists our partner node's peer ID; we decided to have all partner nodes list our Kubo nodes in their Peering config instead.)

  • We also tried `ipfs swarm disconnect <multiaddr>` on both sides to see whether anything changed when the nodes reconnected, but the behavior remained the same.

  • We tried disabling the Relay Client and repeating the previous step, but again nothing changed.

  • When we then explicitly disconnected from the Circuit Relay bootstrap nodes after disabling the Relay Client, everything started working perfectly! We saw direct swarm connections from both sides, and the lookup timeouts practically disappeared. (I'm a little fuzzy on exactly which nodes we disconnected from, but I know for sure they were related to p2p-circuit, because we just weren't getting direct swarm connections even after disabling the Relay Client.)

  • Our final setup (also codified in our wrapper image for Kubo) now has the Relay Client explicitly disabled in the configuration.
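The connection checks in the steps above can be sketched as a small shell helper. This is a minimal sketch, not something we ran at the time: `classify_conn` is a hypothetical name, and it assumes that each line of `ipfs swarm peers` output is a multiaddr ending in `/p2p/<peer ID>`, with relayed connections containing `/p2p-circuit/`.

```shell
# Hypothetical helper: reads `ipfs swarm peers` output on stdin and prints
# "direct" if at least one connection to the given peer ID avoids
# /p2p-circuit, "relayed" if every connection goes through a relay,
# and "none" if the peer is not connected at all.
classify_conn() {
  local peer_id="$1" lines
  # Keep only the lines whose multiaddr ends in this peer's ID.
  lines=$(grep -E "/p2p/${peer_id}\$" || true)
  if [ -z "$lines" ]; then
    echo "none"
  elif printf '%s\n' "$lines" | grep -qv '/p2p-circuit/'; then
    echo "direct"
  else
    echo "relayed"
  fi
}
```

For example, `ipfs swarm peers | classify_conn QmQb86uUqpB8EsV1nCUvjLZm4FQSe8Bkaw8MXNSWKt8WxG` on our node would report how it is connected to our partner's node (peer ID taken from the config above). Running it before and after each remediation step, such as `ipfs config --json Swarm.RelayClient.Enabled false` followed by a daemon restart, gives a simple record of whether the connection moved from `relayed` to `direct`.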

Please let us know if there is any additional information we can provide, or if someone would like to pair and run some tests on our nodes.

Sorry for not having more data here; we were doing all of this around midnight the night before our ComposeDB Beta launch at EthDenver. We just wanted to get things working and weren't thinking about collecting data for later :(

cc @Jorropo @lidel @BigLep

@smrz2001 smrz2001 added kind/bug A bug in existing code (including security flaws) need/triage Needs initial labeling and prioritization labels Mar 24, 2023
@BigLep BigLep mentioned this issue Mar 25, 2023
@BigLep
Contributor

BigLep commented Mar 25, 2023

Thanks for reporting @smrz2001!

We expect this will be fixed by upgrading to go-libp2p 0.26.4 per https://github.com/libp2p/go-libp2p/releases/tag/v0.26.4 and libp2p/go-libp2p#2208. We will do this in 0.19.1, early in the week of 2023-03-27: #9754
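One way to check whether a given Kubo build already includes the go-libp2p release mentioned above is sketched below. This is a hedged sketch: it assumes that Kubo's `ipfs version deps` subcommand prints one `module@version` line per dependency, the helper names are hypothetical, and the comparison relies on GNU `sort -V`.

```shell
# Hypothetical helper: reads "module@version" lines (assumed format of
# `ipfs version deps`) on stdin and prints the go-libp2p version, if any.
libp2p_version() {
  grep -E '^github\.com/libp2p/go-libp2p@' | head -n 1 | sed -E 's/^[^@]*@([^ ]+).*/\1/'
}

# Hypothetical helper: succeeds if version $1 is at least version $2.
# A leading "v" is stripped; ordering comes from `sort -V`.
version_at_least() {
  local have="${1#v}" want="${2#v}"
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}
```

Usage: `version_at_least "$(ipfs version deps | libp2p_version)" v0.26.4 && echo "go-libp2p is v0.26.4 or newer"`. An empty version (module not found) compares as older, so the check fails closed.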

@BigLep
Contributor

BigLep commented Mar 25, 2023

I assigned to you @Jorropo as I assume you'll do the go-libp2p update.

@Jorropo
Contributor

Jorropo commented Mar 25, 2023

@BigLep this is a multifaceted bug.
This may help, but I think libp2p/go-libp2p#1603 has to be solved before we can consider this complete.

@BigLep
Contributor

BigLep commented Mar 25, 2023

Oh ok, thanks - good to know!

@BigLep
Contributor

BigLep commented Apr 6, 2023

Not resolving, but for visibility: 0.19.1 did ship with the updated go-libp2p version, which "may help".

@aschmahmann aschmahmann added status/blocked Unable to be worked further until needs are met and removed need/triage Needs initial labeling and prioritization labels May 22, 2023
@aschmahmann
Contributor

This seems to be blocked on the linked go-libp2p issues. If this is still causing you problems with the latest go-libp2p (which fixes some but not all of the related issues), please post back.

@lidel
Member

lidel commented Jun 12, 2023

Triage update: there were go-libp2p updates recently; @Jorropo will check if he has time.

@Jorropo
Contributor

Jorropo commented Jun 26, 2023

Things are dodgy while libp2p/go-libp2p#1603 remains unfixed, and that can cause issues like this one. I think it's a better use of time to fix this in go-libp2p before trying to debug further.

@Jorropo Jorropo removed their assignment Jun 26, 2023
@lidel
Member

lidel commented Sep 18, 2023

Triage note: it seems the last PR we are waiting for is libp2p/go-libp2p#2542; after that, we need a go-libp2p release to fix this.

@lidel
Member

lidel commented Nov 20, 2023

Triage notes:

@lidel lidel added need/author-input Needs input from the original author and removed status/blocked Unable to be worked further until needs are met labels Nov 20, 2023

Oops, seems like we needed more information for this issue, please comment with more details or this issue will be closed in 7 days.

@smrz2001
Contributor Author

Triage notes:

Thanks everyone for working on this!!

@lidel, connecting from a local Kubo 0.24.0 node with the Relay Client enabled to one of our IPFS nodes (on Kubo v0.19.1) appears to be working, and the swarm connection is stable. I'm also able to fetch recent CIDs from the infra node via the local node.
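The retest above can be sketched as a small shell function. This is a hedged sketch under stated assumptions: `retest_peer` is a hypothetical name, the 30-second timeout is arbitrary, and it uses only `ipfs swarm connect`, `ipfs swarm peers`, `ipfs block stat`, and Kubo's global `--timeout` option. Unlike the manual test, it also checks that the connection is direct rather than via `/p2p-circuit`, which was the original symptom.

```shell
# Hypothetical helper that repeats the retest described above:
# connect to a peer, confirm a direct (non-/p2p-circuit) connection,
# then try to fetch a CID without hanging indefinitely.
# Usage: retest_peer <multiaddr ending in /p2p/<peer ID>> <CID>
retest_peer() {
  local addr="$1" cid="$2"
  local peer_id="${addr##*/p2p/}"
  ipfs swarm connect "$addr" >/dev/null || { echo "could not connect to ${peer_id}"; return 1; }
  # At least one connection to this peer must avoid the relay.
  if ! ipfs swarm peers | grep -E "/p2p/${peer_id}\$" | grep -qv '/p2p-circuit/'; then
    echo "no direct connection to ${peer_id}"
    return 1
  fi
  # --timeout is Kubo's global option; 30s is an arbitrary choice.
  ipfs --timeout 30s block stat "$cid" >/dev/null || { echo "could not fetch ${cid} within 30s"; return 1; }
  echo "ok"
}
```

For example, `retest_peer /dns4/go-ipfs-ceramic-private-mainnet-external.3boxlabs.com/tcp/4011/ws/p2p/QmXALVsXZwPWTUbsT8G6VVzzgTJaAWRUD7FWL5f7d5ubAL <recent CID>` uses an address from the Peering config above; `<recent CID>` is a placeholder.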

It will be difficult to retest the latest release with the configuration we were running when we saw the issue, unfortunately, but from the above test, it feels reasonable to assume that the issue has, in fact, been fixed.

Thanks again!
