Skip to content
This repository has been archived by the owner on Aug 2, 2021. It is now read-only.

pushsync with async pss #1782

Merged
merged 61 commits into from
Sep 25, 2019
Merged

pushsync with async pss #1782

merged 61 commits into from
Sep 25, 2019

Conversation

janos
Copy link
Member

@janos janos commented Sep 19, 2019

This PR represents the consolidated efforts on Push Syncing implementation based on spec https://hackmd.io/9eWxJ_MJS8i04onWg49UBA?both, mostly by @zelig @nonsense and @nolash (for pss).

The implementation package is pushsync and integration was done by updating other packages and making required changes in the codebase, most importantly to tags and pss package. Pss message handling was made async and tests are updated.

A backward incompatible change was made by removing --nosync cli flag and adding --sync-mode which does not enable push syncing by default. Using all or push as the value for the flag will enable push syncing.

This PR i branched from #1768 and updated with master code.

Original PR #1392.

nonsense and others added 28 commits September 19, 2019 12:08
- storer needs to take netstore not localstore to put the chunk
  so that fetchers created earlier could respond
storage/pushsync: update NetStore api from master

storage/pushsync: add sleeps and a bit more tracing, change subscription wait delay

network: try sending receipt only if there is no closer peer

storage/pushsync: use netstore, rather than localstore

storage/netstore: logs with node id

storage/pushsync: opentracing

storage/pushsync: propagate origin on receipts

is push synced - smoke test

network: kademlia closer peer

storage: very high retry timeout

measure send chunk

rlp timer

pss timers and goroutine sendChunk

traces for chunsk in localstore and subscribepush

try minbinsize: 3

increase search timeout ; move tracing before nil check

link netstore and delivery

rename tag New to tag Create

swarm-smoke: fix check max prox hosts for pull/push sync modes

tag roundtrip wip

more logs

kad as part of pusher?

kad to storer; revert pss change

metrics to pss

outbox len metric

emit metrics once every 10sec.

pss: refactor

storage/pushsync: closer than me trace

chunk: adding XOR comparison

disable pushsync tests, as they are not setting up kad properly
    - WaitTillDone does check initially
    - introduce IncN to increment with n
    - api: inspector to use tag.Done
    - unexport context, span, and Context() to be used by http server
    - minor improvements in logging
    - calls on Kademlia directly on Pss struct
    - add IsClosestTo function to pubsub using kademlia.IsClosestTo
    - move package from storage to toplevel
    - only closest peer to address returns a receipt
    - IsClosestTo(addr) is now part of the PubSub interface
    - rename TestPushSyncAndStoreWithLoopback to TestProtocol
    - for TestPusher and , IsClosest is mocked properly
    - remove Origin field from receipt message
    - pushed item remembers first and last sent time
    - retryInterval is dynamically set as 2 * average roundtrip (excluding outliers)
    - early send check is removed from unit test
    - receipts channel is just []byte for address
    - we are using mutli-set for setting synced status on chunks
    - tag increment now follows multiset and not incorrectly when the receipt arrives
      for this remember syncedItems is needed alongside syncedAddrs
    - in pusher sync loop, we use static context
    - correctly unsubscribe DB.SubscribePush
    - if pusher is closest node to a chunk, chunk is not sent using pubsub, but receipt is
      directly sent to self using a shortcut
    - therefore no self send is needed in storer
    - loopBack pubsub is shared in the protocol unit test but wrapped in different testPubSub structs
      to control the behaviour of IsClosestTo function
    - added delayResponse to TestProtocol too
    - testPushSyncIndex SubscribePush now increments tag StateStored correctly only first time
    - fix checkTags to wait for synced status (it caused occasional flakiness)
    - extract simulation test parameters (nodes/chunks/testcases) as command line flags, default is 16/16/16
  - introduce custom logger on pusher/storer
  - dynamic setting of retryinterval extracted
  - timeout changed back
  - fix and use testutil.checkTags
  - use tag context
  - tags.Create and NewTag has context argument
  - fix pusher.Close before localstore
 - changed order of select cases
 - have Tests on top followed by helpers
 - add sent,synced check to testutil.CheckTag
 - add timeout to close wait
 - adapt to testutil.Init()
 - length protection to label helper
 - resolve rebase conflicts
 - simplify simulation using error group
 - make pss message handling asynchronous
 - simulation NewBzzInProc hive disable autoconnect (to check)
 - pss outbox go routine leak
Copy link
Member

@zelig zelig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some important pss changes are lost

pss/pss.go Outdated
@@ -290,6 +292,10 @@ func New(k *network.Kademlia, params *Params) (*Pss, error) {
cp.Set(capabilitiesForward)
}
k.Capabilities.Add(cp)
for i := 0; i < hasherCount; i++ {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will cause flaky tests. this is a bug. see here https://github.com/ethersphere/swarm/pull/1392/files#diff-6109308aa1312b43e9a8d048e2f2d1ffR271
and maybe double check other changes on my branch

@nonsense
Copy link
Contributor

@janos and me went line by line over the PR, and considering the tests we ran over the last few days, and the time pressure we are under, we think it is OK to merge this now, and iterate on improvements on master.

Push Sync is working under the same performance requirements we have for Pull Sync, but obviously there is room for improvement both in terms of code simplicity and testing, as well as in performance.

Copy link
Member

@zelig zelig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks solid to me :)

I meticulously went through the changes.

  • the tag / request context is corrected now, no regression in tests (9->8)
  • in pushsync handleReceipt, pushReceipt does not need to be called in go routine if pss is async, right?

thanks for the great work guys

  • retry logic is reverted to the original, that is ok.

return err
}
p.logger.Trace("handleReceiptMsg", "receipt", hex.EncodeToString(receipt.Addr))
go p.pushReceipt(receipt.Addr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with async pss this is not needed i think

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had worse results without it, even if recent pss changes suggest that it is not needed. We did not get much deeper why is that the case.

@janos janos merged commit d29dfb1 into master Sep 25, 2019
@janos janos deleted the pushsync-pss-3 branch September 25, 2019 14:36
@acud acud restored the pushsync-pss-3 branch October 1, 2019 16:25
@nonsense nonsense deleted the pushsync-pss-3 branch October 7, 2019 09:15
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants