Inconsistency between implementations on connect #327

Closed
vasco-santos opened this issue Feb 19, 2019 · 1 comment
Labels
exp/expert (Having worked on the specific codebase is important), kind/bug (A bug in existing code, including security flaws), P2 (Medium: Good to have, but can wait until someone steps up), status/ready (Ready to be worked)

Comments

@vasco-santos
Member

  • Version: 0.25.0-rc.0
  • Platform: macOS
  • Subsystem: Dial

Type: Inconsistency between implementations

Severity: Medium

Description:

While creating interop tests to guarantee that peers can connect to other nodes regardless of implementation language, I found different behavior when connecting two JS nodes.

Connecting JS -> Go, Go -> JS and Go -> Go all work fine. However, when connecting JS A -> JS B, JS B takes some time to have its known peers updated.

Testing: https://github.com/libp2p/interop/pull/4/files#diff-cfbb48e0364acbf79524761f7b1b125fR53
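Until the delay is tracked down, a test can tolerate it by polling instead of asserting immediately. The helper below is a minimal sketch and is not part of libp2p or the interop suite: the name `waitUntil` and its options are assumptions made for illustration, and the `check` function would wrap whatever call the test uses to read JS B's known peers.

```javascript
// Hypothetical test helper (not a libp2p API): polls `check` until it
// returns true, or rejects once `timeout` milliseconds have elapsed.
function waitUntil (check, { timeout = 5000, interval = 100 } = {}) {
  return new Promise((resolve, reject) => {
    const deadline = Date.now() + timeout

    const poll = () => {
      if (check()) return resolve()
      if (Date.now() >= deadline) {
        return reject(new Error('condition not met within ' + timeout + 'ms'))
      }
      // Not satisfied yet: try again after a short pause.
      setTimeout(poll, interval)
    }

    poll()
  })
}
```

This only hides the symptom in the test. It does not explain why JS B learns about the connection later than the Go nodes do.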

@jacobheun jacobheun added kind/bug A bug in existing code (including security flaws) exp/expert Having worked on the specific codebase is important status/ready Ready to be worked P2 Medium: Good to have, but can wait until someone steps up labels Feb 19, 2019
jacobheun pushed a commit to jacobheun/js-libp2p that referenced this issue Jul 29, 2019
jacobheun added a commit to jacobheun/js-libp2p that referenced this issue Jul 29, 2019
* chore: update contributors

* chore: release version v0.29.0

* fix: move emitters to last thing in the method (libp2p#218)

* fix: move emitters to last thing in the method

* fix: setImmediate everything

* chore: update contributors

* chore: release version v0.29.1

* fix: move 'pull-stream' from devDependencies to dependencies (libp2p#220)

'pull-stream' package is needed in dependencies because it is used in './src/limit-dialer/queue.js'.

* chore: update deps

* chore: update contributors

* chore: release version v0.29.2

* feat: dial to PeerId and/or Multiaddr in addition to PeerInfo (libp2p#222)

* chore: update deps

* feat: support dial to peerId and/or multiaddr in addition to peerInfo

* chore: update CI

* chore: update contributors

* chore: release version v0.30.0

* chore: no sauce

* chore: update deps

* chore: update contributors

* chore: release version v0.31.0

* fix: use the right callback

* chore: update deps

* chore: update contributors

* chore: release version v0.31.1

* feat: increase maxListeners to Infinity (libp2p#226)

* feat: increase maxListeners to Infinity

ipfs/js-ipfs-bitswap#142 (comment)

* fix linting

* chore: update deps

* chore: update contributors

* chore: release version v0.31.2

* feat: p2p addrs situation (libp2p#229)

* chore: update gitignore and CI

* chore: update deps

* test: update tests to new p2p-webrtc-star multiaddr format

* chore: update contributors

* chore: release version v0.32.0

* chore: update deps

* chore: update contributors

* chore: release version v0.32.1

* fix: remove unused protocol-buffers dep (libp2p#230)

* chore: update contributors

* chore: release version v0.32.2

* chore: update deps

* chore: update contributors

* chore: release version v0.32.3

* chore: update deps

* fix: increase dial timeout

* chore: update contributors

* chore: release version v0.32.4

* feat: Circuit Relay (libp2p#224)

* chore: update deps

* chore: update contributors

* chore: release version v0.33.0

* fix: don't dial on relay if not enabled (libp2p#234)

* chore: update deps

* chore: fix package.json

* chore: update contributors

* chore: release version v0.33.1

* chore: update deps

* fix: don't dial circuit if no transports available (libp2p#236)

* chore: update contributors

* chore: release version v0.33.2

* fix: circuit dialing

* feat: fix circuit dialing

* chore: upgrade deps

* chore: update circle ci config

* chore: adding missing dev dependency

* fix: removing unused dependency

* test: adding tests

* fix: remove unused dep

* chore: updating CI files (libp2p#238)

* chore: update contributors

* chore: release version v0.34.0

* chore: use latest SECIO API

* chore: update deps

* feat: use latest secio API

* chore: update deps

* chore: update contributors

* chore: release version v0.35.0

* chore: update deps

* chore: update contributors

* chore: release version v0.35.1

* docs: update name references and API touches

* chore: update name references

* refactor: update name to switch, make it a class and rename start and stop methods

* test: refactor tcp transport tests to avoid code duplication

* test: reuse same test code for Websockets, remove code duplication

* test: update aegir pre and post hooks

* chore: use pre-push instead

* test: update and deduplicate code on stream muxing tests

* test: restructure test suites

* test: refactor swarm-no-muxing tests

* test: refactor circuit-relay tests

* test: refactor browser tests too

* style: fix linting

* fix: enableCircuitRelay is async and therefore needs a callback

* fix: transports.add does not need to be async at all

* docs: fix badges

* test: Linux does not like that we use multiple sockets with port 0

* test: fix test

* chore: update contributors

* chore: release version v0.36.0

* chore: update deps

* chore: update contributors

* chore: release version v0.36.1

* feat: use mplex, update CI

* docs: typo

* feat: observe traffic and expose statistics (libp2p#243)

* chore: update deps

* chore: update contributors

* chore: release version v0.37.0

* fix: for when handler func is not defined

* fix: for when peerinfo resolves to undefined

* chore: update contributors

* chore: release version v0.37.1

* chore: update deps

* chore: update contributors

* chore: release version v0.37.2

* fix: one more observer edge case

* chore: update deps

* chore: fix linting

* test: fix transport tests before all step by increasing the timeout

* chore: update contributors

* chore: release version v0.37.3

* chore: update deps

Chore which I think also fixes this issue:
https://github.com/libp2p/js-libp2p-switch/issues/235

* fix: revert version back to the current release

fix for https://github.com/libp2p/js-libp2p-switch/pull/249/files#r178832198

* chore: update deps

* chore: update deps

* chore: update contributors

* chore: update deps

* test: timeout

* chore: update contributors

* chore: release version v0.39.0

* chore: update deps

* chore: update contributors

* chore: update deps

* chore: update contributors

* chore: release version v0.39.2

* feat: improve circuit err messages (libp2p#250)

* feat: improve circuit err handling

* feat: add test to validate err when circuit not enabled

* refactor: update files and add jsdocs to improve readability

refactor: initial refactor of dial.js

refactor: add more jsdocs to dial and clean up some code

refactor: make get-peer-info more readable

fix: jsdocs in dial

docs: update some jsdocs

refactor: make dial.js a bit easier to consume

fix: fix linting

docs: add more jsdocs and comments

refactor: clean up dial methods and encryption order

* test: add tests for get-peer-info

* docs: remove answered todo comment

answered at libp2p/js-libp2p-switch#252 (comment)

* fix: dont create base conn when muxed exists

* fix: tests and conflicts

* chore: update deps

* chore: update contributors

* chore: release version v0.40.0

* test: fix require of multiplex

* fix: libp2p#189 Prevent self-dial

* test: add selfdial test

* chore: add lead maintainer

* chore: update contributors

* chore: update contributors

* chore: release version v0.40.1

* fix: return on call to nextMuxer

When the call to multistream.Dialer.select is unsuccessful, call nextMuxer to try select the next one in the list but do not continue executing callback afterwards.
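The pattern described above can be sketched as follows. This is an illustration only, not the code from the switch: `selectMuxer` and the caller-supplied `select` function are hypothetical stand-ins, with `select` playing the role of `multistream.Dialer.select`.

```javascript
// Hypothetical stand-in for trying each muxer in turn. `select(muxer, cb)`
// is supplied by the caller and reports success or failure via `cb`.
function selectMuxer (muxers, select, callback) {
  const remaining = muxers.slice()

  function nextMuxer () {
    const muxer = remaining.shift()
    if (!muxer) return callback(new Error('no muxer could be selected'))

    select(muxer, (err, conn) => {
      if (err) {
        // The `return` is the point of the fix: without it, execution would
        // fall through and invoke `callback` for a muxer that failed.
        return nextMuxer()
      }
      callback(null, { muxer, conn })
    })
  }

  nextMuxer()
}
```

With the `return` in place, `callback` fires exactly once: either for the first muxer that is selected successfully, or with an error once the list is exhausted.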

License: MIT
Signed-off-by: Alan Shaw <alan@tableflip.io>

* fix: drop connection when stream ends unexpectedly

Pull streams pass true in the error position when the stream ends.
In https://github.com/multiformats/js-multistream-select/blob/5b19358b91850b528b3f93babd60d63ddcf56a99/src/select.js#L18-L21
...we're getting lots of instances of pull-length-prefixed stream
erroring early with `true` and it's passed back up to the dialer
in https://github.com/libp2p/js-libp2p-switch/blob/fef2d11850379a4720bb9c736236a81a067dc901/src/dial.js#L238-L241

The `_createMuxedConnection` contains an assumption that any error
that occurs when trying `_attemptMuxerUpgrade` is ok, and keeps the
relevant baseConnection in the cache. If the pull-stream has ended
unexpectedly then keeping the connection around starts causing
the "already piped" errors when we try and use it later.

This PR adds a guard to avoid putting the connection back into the
cache if the stream has ended.

There is related work in an old PR to add a check for exactly this issue in
pull-length-prefixed dignifiedquire/pull-length-prefixed#8
...but it's still open, so this PR adds a check for true in
the error position at the site where the "already piped" errors
were appearing. Once the PR on pull-length-prefixed is merged this
check can be removed. It's not ideal to have it in this code as it
is far removed from the source, but it fixes the issue for now.

Arguably anywhere that `msDialer.handle` is called should do the
same check, but we're not seeing this error occur anywhere else so
to keep this PR small, I've left it as the minimal changeset to
fix the issue.
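A rough sketch of the guard, for readers who do not want to open the diff. The function and cache shown here are hypothetical and much simpler than the real `_createMuxedConnection` logic. The only part taken from the description above is the check for `true` in the error position.

```javascript
// Hypothetical helper: decides whether a base connection goes back into the
// cache after a failed muxer upgrade. `cache` is assumed to be a Map.
function handleUpgradeFailure (cache, key, conn, err) {
  if (err === true) {
    // `true` in the error position means the underlying stream has ended.
    // Reusing this connection later is what produced the "already piped"
    // errors, so drop it instead of caching it.
    cache.delete(key)
    return false
  }

  // Any other failure is treated as "muxing unavailable" and the base
  // connection stays cached for reuse.
  cache.set(key, conn)
  return true
}
```

The boolean return value is only there to make the decision visible to the caller and is not something the original change is known to do.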

Of note, we had to add '/dns4/ws-star.discovery.libp2p.io/tcp/443/wss/p2p-websocket-star'
to the swarm config to trigger the "already piped" errors. There
is a minimal test app here https://github.com/tableflip/js-ipfs-already-piped-error

Manual testing shows ~50 streams fail in the first 2 mins of
running a node, and then things stabilise with ~90 active muxed
connections after that.

Fixes libp2p#235
Fixes ipfs/js-ipfs#1366
See dignifiedquire/pull-length-prefixed#8

License: MIT
Signed-off-by: Oli Evans <oli@tableflip.io>

* fix: add utility methods to prevent already piped error

* chore: update contributors

* chore: release version v0.40.2

* fix: prevent undefined error during a mutual hangup

* chore: update contributors

* chore: release version v0.40.3

* feat: swap quick-lru by hashlru

This removes the only dependency using generators in the ipfs/libp2p ecosystem.
Next version of create-react-app will support ipfs out-of-box with this change.

* chore: update contributors

* chore: release version v0.40.4

* fix: stats - observer expects protocolTag

* fix: re-enable stats tests in node

* chore: Upgrade big.js to 5.1.2

* chore: Change require('big.js') to require('big.js').Big

* chore: update contributors

* chore: release version v0.40.5

* fix: no stats on multistream proto dial

* fix: adjust test values

* fix: handle error in protocol handshake

* chore: update contributors

* chore: release version v0.40.6

* chore: remove travis and circleci

* Add private network support (libp2p#266)

* feat: add support for private networks

fix: update protector.protect usage
chore: fix linting and update deps
test: add secio to pnet tests
docs: add private network info to the readme
chore: update pnet package version
test: add skipped test back in and update it

* fix: improve erroring around invalid peers

docs: add some comments
chore: update deps
test: simplify identify test

* chore: update contributors

* chore: release version v0.40.7

* test: add sample network circuit relay tests (libp2p#275)

* test: add sample network circuit relay tests

* test: use ephemeral ports

* chore: update deps

chore: remove test pre-push
chore: update test ports

* chore: update contributors

* chore: release version v0.40.8

* chore: update mplex and stats test numbers

* feat: make switch a state machine (libp2p#278)

* feat: add basic state machine functionality to switch

* feat: make connections state machines

* refactor: clean up logs

* feat: add dialFSM to the switch

* feat: add better support for closing connections

* test: add tests for some uncovered lines

* feat: add warning emitter for muxer upgrade failed

* docs: update readme

* chore: update contributors

* chore: release version v0.41.0

* fix: ignore dial request when one is in progress (libp2p#283)

* chore: update contributors

* chore: release version v0.41.1

* fix: improve connection closing and error handling (libp2p#285)

* fix: improve connection closing and error handling

* test: improve identify test

* chore: update deps

* fix: only emit from connections if there is a listener

* test: add more connection tests

* chore: update libp2p-mplex

* fix: dont dial an address that we have

* fix: ensure circuit listens last on start

* chore: update npm publish files

* chore: update contributors

* chore: release version v0.41.2

* fix: use retimer to avoid creating so many timers (libp2p#289)

* use retimer to avoid scheduling so many timers

* Fixed linting

* fix: improve connection tracking and closing (libp2p#291)

* chore: update deps

* fix: check we have a proper transport before filtering addresses

* fix: improve connection close on stop

* fix: improve stat stopping

* test: fix stats test

* fix: improve tracking of open connections

* chore: remove log

* fix: stats stop in browser

chore: fix linting and browser tests

* fix: remove unneeded set peer info

* fix: abort the base connection on close

* fix: catch edge cases of dialTimeout calling back twice

* fix: close all connections instead of checking peerbook peers

* test: update dial fsm test waits

* test: make parallel dial tests deterministic

fix: improve logic around disconnecting

fix: remove duplicate event handling logic

* chore: fix lint

* test: improve test reliability

* chore: update contributors

* chore: release version v0.41.3

* refactor: stat use for over forEach (libp2p#295)

forEach is 10x slower than a regular for(;;) loop, and it should
be avoided in hot code paths.

* fix: avoid sync callback in async functions (libp2p#297)

* fix: avoid sync callback in async functions

* test: add error check

* refactor: clean up async usage

* chore: clean up

* refactor: remove async waterfall usage on identify

* chore: fix linting

* chore: update contributors

* chore: release version v0.41.4

* fix: peerBook undefined libp2p#299

* fix: reduce bundle size (libp2p#292)

* fix: reduce bundle size

* fix: use bignumber everywhere

* chore: update deps

* chore: update contributors

* chore: release version v0.41.5

* fix: import async/setImmediate to avoid webpack errors (libp2p#303)

* test: add pull-mplex to test suite (libp2p#305)

* chore: use travis
* chore: update dependencies

* fix: dial in series until we have proper abort support (libp2p#306)

refactor: simplify the circuit dial logic

chore: remove travis windows cache

refactor: clean up dial many error logic

test: explicitly set correct address

test(refactor): update order of echo logic and add after

refactor: cleanup per feedback

* chore: update contributors

* chore: release version v0.41.6

* fix: peer disconnect event and improve logging performance (libp2p#309)

* fix: only emit disconnects from muxed conns

* fix: update disconnect logic

* chore: clean up logging to prevent unneeded string formatting

* chore: fix spelling

* chore: update contributors

* chore: release version v0.41.7

* feat: add basic dial queue to avoid many connections to peer (libp2p#310)

BREAKING CHANGE: This adds a very basic dial queue per peer.
This will prevent multiple, simultaneous dial requests to the same
peer from creating multiple connections. The requests will be queued
per peer, and will leverage the same connection when possible.
The breaking change here is that `.dial` will no longer return a
connection. js-libp2p, circuit relay, and kad-dht, which use `.dial`
were not using the returned connection. So while this is a breaking change
it should not break the existing libp2p stack. If custom applications
are leveraging the returned connection, they will need to convert to only
using the connection returned via the callback.
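To make the queueing behaviour concrete, here is a minimal sketch under stated assumptions. `PeerDialQueue` is a hypothetical class, not the switch's implementation, and `connect(peerId, cb)` is a caller-supplied stand-in for the real transport dial. Note that `dial` returns nothing, so the connection is only available via the callback.

```javascript
// Minimal sketch of a per-peer dial queue (illustrative names throughout).
class PeerDialQueue {
  constructor (connect) {
    this.connect = connect
    this.connections = new Map() // peerId -> established connection
    this.pending = new Map() // peerId -> callbacks waiting on a dial
  }

  dial (peerId, callback) {
    const existing = this.connections.get(peerId)
    if (existing) {
      // Already connected: reuse the connection instead of dialing again.
      callback(null, existing)
      return
    }

    const waiting = this.pending.get(peerId)
    if (waiting) {
      // A dial to this peer is in flight: queue up behind it.
      waiting.push(callback)
      return
    }

    const queue = [callback]
    this.pending.set(peerId, queue)
    this.connect(peerId, (err, conn) => {
      this.pending.delete(peerId)
      if (!err) this.connections.set(peerId, conn)
      // Every queued request receives the same result.
      queue.forEach((cb) => cb(err, conn))
    })
  }
}
```

Two simultaneous `dial` calls for the same peer therefore trigger a single `connect`, and both callbacks receive the same connection object.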

* chore: dont log privatized unless it actually happened
* refactor: only get our addresses for filtering once

* feat: update identify to include supported protocols (libp2p#311)

* chore: update contributors

* chore: release version v0.42.0

* fix: ensure dials always use the latest PeerInfo from the PeerBook (libp2p#312)

* fix: ensure dials always use the latest PeerInfo from the PeerBook

This fixes an issue where if dial is called with a new instance
of PeerInfo, if it is the first dial to that peer, the queue was
forever associated with that instance. This is currently the case
when Circuit checks the HOP status of a potential relay. This ensures
that whenever we dial, we are updating the peer book and using the
latest PeerInfo in that dial request.
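The idea can be sketched roughly as below. Everything here is a simplified assumption for illustration: `SimplePeerBook` and `prepareDial` are hypothetical, peers are plain objects with an `id` string and an `addrs` array, and the real PeerBook and PeerInfo classes have a richer API.

```javascript
// Hypothetical, simplified peer book keyed by the peer's id string.
class SimplePeerBook {
  constructor () {
    this.peers = new Map()
  }

  // Merges addresses into the stored entry and returns that stored entry,
  // so callers always work with the latest known information.
  put (peerInfo) {
    const known = this.peers.get(peerInfo.id)
    if (known) {
      peerInfo.addrs.forEach((addr) => known.addrs.add(addr))
      return known
    }
    const stored = { id: peerInfo.id, addrs: new Set(peerInfo.addrs) }
    this.peers.set(stored.id, stored)
    return stored
  }
}

// Queues are looked up by id only, and each dial re-reads the peer book,
// so no queue stays bound to whichever PeerInfo instance arrived first.
function prepareDial (peerBook, queues, peerInfo) {
  const latest = peerBook.put(peerInfo)
  if (!queues.has(latest.id)) queues.set(latest.id, [])
  return { queue: queues.get(latest.id), peerInfo: latest }
}
```

Dialing the same id with two different instances then lands on one queue, and the second dial sees the addresses from both.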

* test: add test for get peer info

* refactor: just use id with dialer queue

* chore: update contributors

* chore: release version v0.42.1

* fix: identify on dial (libp2p#313)

* chore: update contributors

* chore: release version v0.42.2

* feat: global dial queue (libp2p#314)

* feat: add a general queue to limit all dials

* fix: improve queue count logic and add better abort

* feat: add a basic blacklist

* fix: abort dial queue on error instead of stop

* feat: add a crude priority lane

* test: add test for blacklist error

* fix: make blacklist and max dials configurable

* refactor: blacklist after callback

* test: improve testing around blacklisting

* chore: update contributors

* chore: release version v0.42.3

* fix: improve dial queue and parallel dials (libp2p#315)

* feat: allow dialer queues to do many requests to a peer

* fix: parallel dials and validate cancelled conns

* feat: make dial timeout configurable

* fix: allow already connected peers to dial immediately

* refactor: add dial timeout to consts file

* fix: keep better track of in progress queues

* refactor: make dials race

* chore: update contributors

* chore: release version v0.42.4

* feat: limit the number of cold calls we can do (libp2p#316)

* feat: limit the number of cold calls we can do

* feat: add a backoff to blacklisting

* refactor: make cold calls configurable

* fix: make blacklist duration longer

* fix: improve blacklisting

* test: add some tests for queue

* feat: add jitter to blacklist ttl

* test: validate cold queue is removed

* feat: purge old queues every hour

* test: fix aegir post script node shutdown

* fix: abort the cold call queue on manager abort

* fix: improve queue cleanup and lower interval to 15 mins

* fix: improve connection tracking (libp2p#318)

* fix: centralize connection events and peer connects

* fix: remove unneeded peerBook put

* chore: update contributors

* chore: release version v0.42.5

* fix: dont blacklist good peers (libp2p#319)

* fix: revert to try each (libp2p#320)

* chore: update contributors

* chore: release version v0.42.6

* fix: missing queue (libp2p#323)

* fix: improve stopping logic (libp2p#324)

* chore: update contributors

* chore: release version v0.42.7

* chore: add discourse badge (libp2p#327)

* fix: dial self (libp2p#329)

* feat: support a priority queue for dials (libp2p#325)

* chore: update contributors

* chore: release version v0.42.8

* fix: dont compare empty strings (libp2p#330)

* chore: update contributors

* chore: release version v0.42.9

* fix: resolve transport sort order in browsers (libp2p#333)

* fix: resolve transport sort order in browsers

* fix: update sort logic

* fix: dont use peerinfo distinct (libp2p#334)

* fix: dont use peerinfo distinct

* refactor: remove unneeded code

* refactor: clean up

* refactor: fix feedback

* chore: update contributors

* chore: release version v0.42.10

* fix(stats): prevent 0ms timeDiff breaking movingAverage (libp2p#336)

* stats - stat - prevent 0ms timeDiff breaking movingAverage

* chore: remove commitlint

* chore: update contributors

* chore: release version v0.42.11

* fix: dont blindly add observed addresses to our list (libp2p#337)

Until we can properly validate the observed address our
peer tells us about, we shouldnt blindly add it to our
address list. Until we have better NAT management we cant
reliably validate that we're adding an appropriate address
for ourselves.

* fix: clear blacklist for peer when connection is established (libp2p#340)

* chore: update contributors

* chore: release version v0.42.12

* refactor: move switch into src/switch

* refactor: cleanup switch and move tests into test dir
@achingbrain
Member

Closing as complete - please re-open if the problem is still observed


3 participants