libp2p doesn't always clean up connections #295

Closed
dirkmc opened this issue Dec 10, 2018 · 5 comments
Labels:
exp/expert (Having worked on the specific codebase is important)
kind/bug (A bug in existing code, including security flaws)
P1 (High: Likely tackled by core team if no one steps up)

Comments

@dirkmc
Contributor

dirkmc commented Dec 10, 2018

I've created a minimal repo with a test case demonstrating an issue with libp2p cleaning up connections:
https://github.com/dirkmc/simultaneous-connection-test

It seems like when a connection is made between two peers, each dialing the other at the same time, libp2p loses track of some of the connections. Consequently, when one of the peers is shut down, only one half of the connection is closed. This means that any pull stream that is waiting on the end of the connection will never terminate.

Note: I'm not sure whether this is an issue with libp2p, libp2p-switch, libp2p-mplex or websocket-star, so I created a separate repo in which to test it out.

To try it out, clone the repo above, npm install and then run
DEBUG=conn-test,conn-test:* yarn test:node

Sample output:
[Screenshot of the test output, taken 2018-12-10 at 6:06:43 pm]

@parkan

parkan commented Dec 10, 2018

@daviddias can you help us route this more appropriately? I think @dirkmc is right to not jump to conclusions as to the exact culprit repository, but I'm concerned that this might get overlooked without direct maintainer attention 😄

@jacobheun
Contributor

Thanks for the reproduction case @dirkmc, this is great! This is an issue with libp2p-switch. Currently switch keeps track of connections in a hash map with the peerId as the key, which creates this problem. Dialing out is easy to resolve because we know the peerId in advance and can check whether a connection already exists. When we receive an incoming connection, we don't know the peerId until crypto has completed. Right now, switch isn't properly managing those different connections. For incoming connections, once switch knows the peerId it should check for an existing connection and then switch over to that. There's more we can do in the future to determine the best connection and switch over to it.
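
To illustrate the failure mode described in this comment, here is a minimal sketch of a registry that stores a single connection per peer ID. The names `SingleSlotRegistry` and `makeConn` and the peer ID string are hypothetical and are not taken from libp2p-switch; the sketch only assumes that connections expose a `close()` method.

```javascript
'use strict'

// Hypothetical single-slot registry: one connection per peer ID.
// Illustrative only; this is not libp2p-switch's actual code.
class SingleSlotRegistry {
  constructor () {
    this.conns = new Map() // peerId (string) -> connection
  }

  add (peerId, conn) {
    // A second connection for the same peer silently replaces the first.
    this.conns.set(peerId, conn)
  }

  closeAll () {
    for (const conn of this.conns.values()) {
      conn.close()
    }
    this.conns.clear()
  }
}

// Stand-in for a real connection object (assumption for this sketch).
const makeConn = (label) => ({
  label,
  closed: false,
  close () { this.closed = true }
})

const registry = new SingleSlotRegistry()
const outbound = makeConn('outbound') // we dialed the peer
const inbound = makeConn('inbound') // the peer dialed us at the same time

registry.add('QmPeerB', outbound)
registry.add('QmPeerB', inbound) // overwrites the outbound entry

registry.closeAll()
console.log(outbound.closed, inbound.closed) // false true
```

The outbound connection is never closed because the registry no longer holds a reference to it, which matches the "only one half of the connection is closed" symptom from the report.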

We need to keep track of more than one connection per peerId so we can properly handle this scenario. In the future we can look at improving how we handle multiple connections to a single peer.
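
One way to keep more than one connection per peer ID is to map each peer ID to a set of connections. The sketch below is only an assumption about how that could look: `PeerConnectionTracker`, `fakeConn` and the peer ID string are hypothetical names, and the actual fix in libp2p-switch may be structured differently.

```javascript
'use strict'

// Hypothetical tracker that keeps every connection to a peer.
// Illustrative only; not the libp2p-switch implementation.
class PeerConnectionTracker {
  constructor () {
    this.connsByPeer = new Map() // peerId (string) -> Set of connections
  }

  add (peerId, conn) {
    if (!this.connsByPeer.has(peerId)) {
      this.connsByPeer.set(peerId, new Set())
    }
    this.connsByPeer.get(peerId).add(conn)
  }

  remove (peerId, conn) {
    const conns = this.connsByPeer.get(peerId)
    if (!conns) return
    conns.delete(conn)
    // Drop the entry once the last connection to this peer is gone.
    if (conns.size === 0) this.connsByPeer.delete(peerId)
  }

  count (peerId) {
    const conns = this.connsByPeer.get(peerId)
    return conns ? conns.size : 0
  }

  closeAll () {
    for (const conns of this.connsByPeer.values()) {
      for (const conn of conns) conn.close()
    }
    this.connsByPeer.clear()
  }
}

// Stand-in for a real connection object (assumption for this sketch).
const fakeConn = () => ({ closed: false, close () { this.closed = true } })

const tracker = new PeerConnectionTracker()
const dialed = fakeConn() // connection we opened
const accepted = fakeConn() // connection the peer opened simultaneously
tracker.add('QmPeerB', dialed)
tracker.add('QmPeerB', accepted)

const trackedBeforeClose = tracker.count('QmPeerB')
tracker.closeAll()
console.log(trackedBeforeClose, dialed.closed, accepted.closed) // 2 true true
```

Because both connections stay reachable from the tracker, shutting down closes each of them, so nothing is left waiting on a half-open connection.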

I've created https://github.com/libp2p/js-libp2p-switch/issues/288 to track this there since I can't move the issue.

@jacobheun jacobheun added kind/bug A bug in existing code (including security flaws) exp/expert Having worked on the specific codebase is important status/ready Ready to be worked P1 High: Likely tackled by core team if no one steps up labels Dec 11, 2018
@dirkmc
Contributor Author

dirkmc commented Dec 11, 2018

Great, thanks @jacobheun!

@jacobheun
Contributor

@dirkmc this should be resolved in the latest version of libp2p-switch@0.41.3. I ran it against your test repo with a clean npm install and the tests are passing. Let me know if you see any issues.

@jacobheun jacobheun self-assigned this Dec 14, 2018
@jacobheun jacobheun added status/in-progress In progress and removed status/ready Ready to be worked labels Dec 14, 2018
@dirkmc
Contributor Author

dirkmc commented Dec 14, 2018

@jacobheun yes it works, thanks for the quick work!

@dirkmc dirkmc closed this as completed Dec 14, 2018
@ghost ghost removed the status/in-progress In progress label Dec 14, 2018
jacobheun pushed a commit to jacobheun/js-libp2p that referenced this issue Jul 29, 2019
forEach is 10x slower than a regular for(;;) loop, and it should
be avoided in hot code paths.
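
The kind of rewrite this commit message describes can be sketched as follows. The helper names are hypothetical and this is not code from libp2p-switch; the actual speed difference depends on the JavaScript engine and workload, and the 10x figure is the commit author's own claim.

```javascript
'use strict'

// Hypothetical hot-path helper written with forEach.
function sumWithForEach (values) {
  let total = 0
  values.forEach((v) => { total += v })
  return total
}

// The same helper written with an indexed for loop,
// which avoids one callback invocation per element.
function sumWithFor (values) {
  let total = 0
  for (let i = 0; i < values.length; i++) {
    total += values[i]
  }
  return total
}

const samples = [1, 2, 3, 4]
console.log(sumWithForEach(samples), sumWithFor(samples)) // 10 10
```

Both functions return the same result, so the change is purely about per-element overhead in code that runs very often.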
jacobheun added a commit to jacobheun/js-libp2p that referenced this issue Jul 29, 2019
* chore: update contributors

* chore: release version v0.29.0

* fix: move emitters to last thing in the method (libp2p#218)

* fix: move emitters to last thing in the method

* fix: setImmediate everything

* chore: update contributors

* chore: release version v0.29.1

* fix: move 'pull-stream' from devDependencies to dependencies (libp2p#220)

'pull-stream' package is needed in dependencies because it is used in './src/limit-dialer/queue.js'.

* chore: update deps

* chore: update contributors

* chore: release version v0.29.2

* feat: dial to PeerId and/or Multiaddr in addition to PeerInfo (libp2p#222)

* chore: update deps

* feat: support dial to peerId and/or multiaddr in addition to peerInfo

* chore: update CI

* chore: update contributors

* chore: release version v0.30.0

* chore: no sauce

* chore: update deps

* chore: update contributors

* chore: release version v0.31.0

* fix: use the right callback

* chore: update deps

* chore: update contributors

* chore: release version v0.31.1

* feat: increase maxListeners to Infinity (libp2p#226)

* feat: increase maxListeners to Infinity

ipfs/js-ipfs-bitswap#142 (comment)

* fix linting

* chore: update deps

* chore: update contributors

* chore: release version v0.31.2

* feat: p2p addrs situation (libp2p#229)

* chore: update gitignore and CI

* chore: update deps

* test: update tests to new p2p-webrtc-star multiaddr format

* chore: update contributors

* chore: release version v0.32.0

* chore: update deps

* chore: update contributors

* chore: release version v0.32.1

* fix: remove unused protocol-buffers dep (libp2p#230)

* chore: update contributors

* chore: release version v0.32.2

* chore: update deps

* chore: update contributors

* chore: release version v0.32.3

* chore: update deps

* fix: increase dial timeout

* chore: update contributors

* chore: release version v0.32.4

* feat: Circuit Relay (libp2p#224)

* chore: update deps

* chore: update contributors

* chore: release version v0.33.0

* fix: don't dial on relay if not enabled (libp2p#234)

* chore: update deps

* chore: fix package.json

* chore: update contributors

* chore: release version v0.33.1

* chore: update deps

* fix: don't dial circuit if no transports available (libp2p#236)

* chore: update contributors

* chore: release version v0.33.2

* fix: circuit dialing

* feat: fix circuit dialing

* chore: upgrade deps

* chore: update circle ci config

* chore: adding missing dev dependency

* fix: removing unused dependency

* test: adding tests

* fix: remove unused dep

* chore: updating CI files (libp2p#238)

* chore: update contributors

* chore: release version v0.34.0

* chore: use latest SECIO API

* chore: update deps

* feat: use latest secio API

* chore: update deps

* chore: update contributors

* chore: release version v0.35.0

* chore: update deps

* chore: update contributors

* chore: release version v0.35.1

* docs: update name references and API touches

* chore: update name references

* refactor: update name to switch, make it a class and rename start and stop methods

* test: refactor tcp transport tests to avoid code duplication

* test: reuse same test code for Websockets, remove code duplication

* test: update aegir pre and post hooks

* chore: use pre-push instead

* test: update and deduplicate code on stream muxing tests

* test: restructure test suites

* test: refactor swarm-no-muxing tests

* test: refactor circuit-relay tests

* test: refactor browser tests too

* style: fix linting

* fix: enableCircuitRelay is async and therefore needs a callback

* fix: transports.add does not need to be async at all

* docs: fix badges

* test: Linux does not like that we use multiple sockets with port 0

* test: fix test

* chore: update contributors

* chore: release version v0.36.0

* chore: update deps

* chore: update contributors

* chore: release version v0.36.1

* feat: use mplex, update CI

* docs: typo

* feat: observe traffic and expose statistics (libp2p#243)

* chore: update deps

* chore: update contributors

* chore: release version v0.37.0

* fix: for when handler func is not defined

* fix: for when peerinfo resolves to undefined

* chore: update contributors

* chore: release version v0.37.1

* chore: update deps

* chore: update contributors

* chore: release version v0.37.2

* fix: one more observer edge case

* chore: update deps

* chore: fix linting

* test: fix transport tests before all step by increasing the timeout

* chore: update contributors

* chore: release version v0.37.3

* chore: update deps

Chore which i think fixes this issue also
https://github.com/libp2p/js-libp2p-switch/issues/235

* fix: revert version back to the current release

fix for https://github.com/libp2p/js-libp2p-switch/pull/249/files#r178832198

* chore: update deps

* chore: update deps

* chore: update contributors

* chore: update deps

* test: timeout

* chore: update contributors

* chore: release version v0.39.0

* chore: update deps

* chore: update contributors

* chore: update deps

* chore: update contributors

* chore: release version v0.39.2

* feat: improve circuit err messages (libp2p#250)

* feat: improve circuit err handling

* feat: add test to validate err when circuit not enabled

* refactor: update files and add jsdocs to improve readability

refactor: initial refactor of dial.js

refactor: add more jsdocs to dial and clean up some code

refactor: make get-peer-info more readable

fix: jsdocs in dial

docs: update some jsdocs

refactor: make dial.js a bit easier to consume

fix: fix linting

docs: add more jsdocs and comments

refactor: clean up dial methods and encryption order

* test: add tests for get-peer-info

* docs: remove answered todo comment

answered at libp2p/js-libp2p-switch#252 (comment)

* fix: dont create base conn when muxed exists

* fix: tests and conflicts

* chore: update deps

* chore: update contributors

* chore: release version v0.40.0

* test: fix require of multiplex

* fix: libp2p#189 Prevent self-dial

* test: add selfdial test

* chore: add lead maintainer

* chore: update contributors

* chore: update contributors

* chore: release version v0.40.1

* fix: return on call to nextMuxer

When the call to multistream.Dialer.select is unsuccessful, call nextMuxer to try to select the next one in the list, but do not continue executing the callback afterwards.

License: MIT
Signed-off-by: Alan Shaw <alan@tableflip.io>

* fix: drop connection when stream ends unexpectedly

Pull streams pass true in the error position when the stream ends.
In https://github.com/multiformats/js-multistream-select/blob/5b19358b91850b528b3f93babd60d63ddcf56a99/src/select.js#L18-L21
...we're getting lots of instances of pull-length-prefixed stream
erroring early with `true` and it's passed back up to the dialer
in https://github.com/libp2p/js-libp2p-switch/blob/fef2d11850379a4720bb9c736236a81a067dc901/src/dial.js#L238-L241

The `_createMuxedConnection` contains an assumption that any error
that occurs when trying `_attemptMuxerUpgrade` is ok, and keeps the
relevant baseConnection in the cache. If the pull-stream has ended
unexpectedly then keeping the connection around starts causing
the "already piped" errors when we try to use it later.

This PR adds a guard to avoid putting the connection back into the
cache if the stream has ended.

There is related work in an old PR to add a check for exactly this issue in
pull-length-prefixed dignifiedquire/pull-length-prefixed#8
...but it's still open, so this PR adds a check for true in
the error position at the site where the "already piped" errors
were appearing. Once the PR on pull-length-prefixed is merged this
check can be removed. It's not ideal to have it in this code as it
is far removed from the source, but it fixes the issue for now.

Arguably anywhere that `msDialer.handle` is called should do the
same check, but we're not seeing this error occur anywhere else so
to keep this PR small, I've left it as the minimal changeset to
fix the issue.

Of note, we had to add '/dns4/ws-star.discovery.libp2p.io/tcp/443/wss/p2p-websocket-star'
to the swarm config to trigger the "already piped" errors. There
is a minimal test app here https://github.com/tableflip/js-ipfs-already-piped-error

Manual testing shows ~50 streams fail in the first 2 mins of
running a node, and then things stabilise with ~90 active muxed
connections after that.

Fixes libp2p#235
Fixes ipfs/js-ipfs#1366
See dignifiedquire/pull-length-prefixed#8

License: MIT
Signed-off-by: Oli Evans <oli@tableflip.io>

* fix: add utility methods to prevent already piped error

* chore: update contributors

* chore: release version v0.40.2

* fix: prevent undefined error during a mutual hangup

* chore: update contributors

* chore: release version v0.40.3

* feat: swap quick-lru by hashlru

This removes the only dependency using generators in the ipfs/libp2p ecosystem.
Next version of create-react-app will support ipfs out-of-box with this change.

* chore: update contributors

* chore: release version v0.40.4

* fix: stats - observer expects protocolTag

* fix: re-enable stats tests in node

* chore: Upgrade big.js to 5.1.2

* chore: Change require('big.js') to require('big.js').Big

* chore: update contributors

* chore: release version v0.40.5

* fix: no stats on multistream proto dial

* fix: adjust test values

* fix: handle error in protocol handshake

* chore: update contributors

* chore: release version v0.40.6

* chore: remove travis and circleci

* Add private network support (libp2p#266)

* feat: add support for private networks

fix: update protector.protect usage
chore: fix linting and update deps
test: add secio to pnet tests
docs: add private network info the readme
chore: update pnet package version
test: add skipped test back in and update it

* fix: improve erroring around invalid peers

docs: add some comments
chore: update deps
test: simplify identify test

* chore: update contributors

* chore: release version v0.40.7

* test: add sample network circuit relay tests (libp2p#275)

* test: add sample network circuit relay tests

* test: use ephemeral ports

* chore: update deps

chore: remove test pre-push
chore: update test ports

* chore: update contributors

* chore: release version v0.40.8

* chore: update mplex and stats test numbers

* feat: make switch a state machine (libp2p#278)

* feat: add basic state machine functionality to switch

* feat: make connections state machines

* refactor: clean up logs

* feat: add dialFSM to the switch

* feat: add better support for closing connections

* test: add tests for some uncovered lines

* feat: add warning emitter for muxer upgrade failed

* docs: update readme

* chore: update contributors

* chore: release version v0.41.0

* fix: ignore dial request when one is in progress (libp2p#283)

* chore: update contributors

* chore: release version v0.41.1

* fix: improve connection closing and error handling (libp2p#285)

* fix: improve connection closing and error handling

* test: improve identify test

*  chore: update deps

* fix: only emit from connections if there is a listener

* test: add more connection tests

* chore: update libp2p-mplex

* fix: dont dial an address that we have

* fix: ensure circuit listens last on start

* chore: update npm publish files

* chore: update contributors

* chore: release version v0.41.2

* fix: use retimer to avoid creating so many timers (libp2p#289)

* use retimer to avoid scheduling so many timers

* Fixed linting

* fix: improve connection tracking and closing (libp2p#291)

* chore: update deps

* fix: check we have a proper transport before filtering addresses

* fix: improve connection close on stop

* fix: improve stat stopping

* test: fix stats test

* fix: improve tracking of open connections

* chore: remove log

* fix: stats stop in browser

chore: fix linting and browser tests

* fix: remove unneeded set peer info

* fix: abort the base connection on close

* fix: catch edge cases of dialTimeout calling back twice

* fix: close all connections instead of checking peerbook peers

* test: update dial fsm test waits

* test: make parallel dial tests deterministic

fix: improve logic around disconnecting

fix: remove duplicate event handling logic

* chore: fix lint

* test: improve test reliability

* chore: update contributors

* chore: release version v0.41.3

* refactor: stat use for over forEach (libp2p#295)

forEach is 10x slower than a regular for(;;) loop, and it should
be avoided in hot code paths.

* fix: avoid sync callback in async functions (libp2p#297)

* fix: avoid sync callback in async functions

* test: add error check

* refactor: clean up async usage

* chore: clean up

* refactor: remove async waterfall usage on identify

* chore: fix linting

* chore: update contributors

* chore: release version v0.41.4

* fix: peerBook undefined libp2p#299

* fix: reduce bundle size (libp2p#292)

* fix: reduce bundle size

* fix: use bignumber everywhere

* chore: update deps

* chore: update contributors

* chore: release version v0.41.5

* fix: import async/setImmediate to avoid webpack errors (libp2p#303)

* test: add pull-mplex to test suite (libp2p#305)

* chore: use travis
* chore: update dependencies

* fix: dial in series until we have proper abort support (libp2p#306)

refactor: simplify the circuit dial logic

chore: remove travis windows cache

refactor: clean up dial many error logic

test: explicitly set correct address

test(refactor): update order of echo logic and add after

refactor: cleanup per feedback

* chore: update contributors

* chore: release version v0.41.6

* fix: peer disconnect event and improve logging performance (libp2p#309)

* fix: only emit disconnects from muxed conns

* fix: update disconnect logic

* chore: clean up logging to prevent unneeded string formatting

* chore: fix spelling

* chore: update contributors

* chore: release version v0.41.7

* feat: add basic dial queue to avoid many connections to peer (libp2p#310)

BREAKING CHANGE: This adds a very basic dial queue per peer.
This will prevent multiple, simultaneous dial requests to the same
peer from creating multiple connections. The requests will be queued
per peer, and will leverage the same connection when possible.
The breaking change here is that `.dial`, will no longer return a
connection. js-libp2p, circuit relay, and kad-dht, which use `.dial`
were not using the returned connection. So while this is a breaking change
it should not break the existing libp2p stack. If custom applications
are leveraging the returned connection, they will need to convert to only
using the connection returned via the callback.

* chore: dont log privatized unless it actually happened
* refactor: only get our addresses for filtering once

* feat: update identify to include supported protocols (libp2p#311)

* chore: update contributors

* chore: release version v0.42.0

* fix: ensure dials always use the latest PeerInfo from the PeerBook (libp2p#312)

* fix: ensure dials always use the latest PeerInfo from the PeerBook

This fixes an issue where if dial is called with a new instance
of PeerInfo, if it is the first dial to that peer, the queue was
forever associated with that instance. This is currently the case
when Circuit checks the HOP status of a potential relay. This ensures
that whenever we dial, we are updating the peer book and using the
latest PeerInfo in that dial request.

* test: add test for get peer info

* refactor: just use id with dialer queue

* chore: update contributors

* chore: release version v0.42.1

* fix: identify on dial (libp2p#313)

* chore: update contributors

* chore: release version v0.42.2

* feat: global dial queue (libp2p#314)

* feat: add a general queue to limit all dials

* fix: improve queue count logic and add better abort

* feat: add a basic blacklist

* fix: abort dial queue on error instead of stop

* feat: add a crude priority lane

* test: add test for blacklist error

* fix: make blacklist and max dials configurable

* refactor: blacklist after callback

* test: improve testings around blacklisting

* chore: update contributors

* chore: release version v0.42.3

* fix: improve dial queue and parallel dials (libp2p#315)

* feat: allow dialer queues to do many requests to a peer

* fix: parallel dials and validate cancelled conns

* feat: make dial timeout configurable

* fix: allow already connected peers to dial immediately

* refactor: add dial timeout to consts file

* fix: keep better track of in progress queues

* refactor: make dials race

* chore: update contributors

* chore: release version v0.42.4

* feat: limit the number of cold calls we can do (libp2p#316)

* feat: limit the number of cold calls we can do

* feat: add a backoff to blacklisting

* refactor: make cold calls configurable

* fix: make blacklist duration longer

* fix: improve blacklisting

* test: add some tests for queue

* feat: add jitter to blacklist ttl

* test: validate cold queue is removed

* feat: purge old queues every hour

* test: fix aegir post script node shutdown

* fix: abort the cold call queue on manager abort

* fix: improve queue cleanup and lower interval to 15 mins

* fix: improve connection tracking (libp2p#318)

* fix: centralize connection events and peer connects

* fix: remove unneeded peerBook put

* chore: update contributors

* chore: release version v0.42.5

* fix: dont blacklist good peers (libp2p#319)

* fix: revert to try each (libp2p#320)

* chore: update contributors

* chore: release version v0.42.6

* fix: missing queue (libp2p#323)

* fix: improve stopping logic (libp2p#324)

* chore: update contributors

* chore: release version v0.42.7

* chore: add discourse badge (libp2p#327)

* fix: dial self (libp2p#329)

* feat: support a priority queue for dials (libp2p#325)

* chore: update contributors

* chore: release version v0.42.8

* fix: dont compare empty strings (libp2p#330)

* chore: update contributors

* chore: release version v0.42.9

* fix: resolve transport sort order in browsers (libp2p#333)

* fix: resolve transport sort order in browsers

* fix: update sort logic

* fix: dont use peerinfo distinct (libp2p#334)

* fix: dont use peerinfo distinct

* refactor: remove unneeded code

* refactor: clean up

* refactor: fix feedback

* chore: update contributors

* chore: release version v0.42.10

* fix(stats): prevent 0ms timeDiff breaking movingAverage (libp2p#336)

* stats - stat - prevent 0ms timeDiff breaking movingAverage

* chore: remove commitlint

* chore: update contributors

* chore: release version v0.42.11

* fix: dont blindly add observed addresses to our list (libp2p#337)

Until we can properly validate the observed address our
peer tells us about, we shouldnt blindly add it to our
address list. Until we have better NAT management we cant
reliably validate that we're adding an appropriate address
for ourselves.

* fix: clear blacklist for peer when connection is established (libp2p#340)

* chore: update contributors

* chore: release version v0.42.12

* refactor: move switch into src/switch

* refactor: cleanup switch and move tests into test dir
maschad pushed a commit to maschad/js-libp2p that referenced this issue Jun 21, 2023
…bp2p#300)

Create a private key object from the raw `Ed25519` private key and
export it as a JWK to obtain the public key.

Fixes libp2p#295
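
The approach this commit message describes can be sketched with Node's built-in `crypto` module. This is an assumption-laden sketch, not the js-libp2p-crypto implementation: the function name `derivePublicKey` is hypothetical, the raw key is assumed to be the 32-byte Ed25519 seed, and the sketch wraps it in a PKCS#8 DER envelope before creating the key object, which may differ from how the library builds it. JWK export of key objects requires a reasonably recent Node.js release (15.9 or later).

```javascript
'use strict'

const crypto = require('crypto')

// DER header of a PKCS#8 PrivateKeyInfo that wraps a 32-byte Ed25519 seed.
const PKCS8_ED25519_PREFIX = Buffer.from('302e020100300506032b657004220420', 'hex')

// Hypothetical helper: derive the raw 32-byte public key from a raw seed.
function derivePublicKey (rawPrivateKey) {
  if (rawPrivateKey.length !== 32) {
    throw new Error('expected a 32-byte Ed25519 private key seed')
  }

  // Create a private key object from the raw key material.
  const keyObject = crypto.createPrivateKey({
    key: Buffer.concat([PKCS8_ED25519_PREFIX, rawPrivateKey]),
    format: 'der',
    type: 'pkcs8'
  })

  // The exported JWK carries the public key in its `x` member (base64url).
  const jwk = keyObject.export({ format: 'jwk' })
  return Buffer.from(jwk.x, 'base64url')
}

// Seed from the first Ed25519 test vector in RFC 8032, section 7.1.
const seed = Buffer.from(
  '9d61b19deffd5a60ba844af492ec2cc44449c5697b326919703bac031cae7f60',
  'hex'
)
const publicKey = derivePublicKey(seed)
console.log(publicKey.toString('hex'))
```

Letting the runtime compute the public key avoids shipping a separate Ed25519 implementation just for key derivation.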
maschad pushed a commit to maschad/js-libp2p that referenced this issue Jun 21, 2023
## [1.0.12](libp2p/js-libp2p-crypto@v1.0.11...v1.0.12) (2023-02-08)

### Bug Fixes

* derive ed25519 public key from private key using node crypto ([libp2p#300](libp2p/js-libp2p-crypto#300)) ([874f820](libp2p/js-libp2p-crypto@874f820)), closes [libp2p#295](libp2p/js-libp2p-crypto#295)

### Trivial Changes

* replace err-code with CodeError ([libp2p#293](libp2p/js-libp2p-crypto#293)) ([4398cf6](libp2p/js-libp2p-crypto@4398cf6)), closes [js-libp2p#1269](libp2p#1269)

3 participants