This repository has been archived by the owner on Aug 2, 2021. It is now read-only.

simple fetchers #1492

Merged
acud merged 17 commits into master from simple-fetchers on Jun 18, 2019

Conversation

@nonsense (Contributor) commented Jun 17, 2019

Related discussion: #1344


This PR is a subset of the simplify-fetchers branch, split out for easier review. It mostly addresses the issues described in #1309.

It includes:

  1. A refactor of NetStore, where we replace FetchFunc with two new functions:
	NetStore.RemoteGet = delivery.RequestFromPeers
	NetStore.RemoteFetch

A good place to start reviewing this PR is these two functions and the places where they are used.

  2. swarm/network/fetcher.go has been removed and replaced with singleflight.Group. Every time NetStore.Get looks for a chunk that is not in our LocalStore, it uses the singleflight.Group to suppress duplicate retrievals for the same chunk (see the sketch after this list).

Each in-flight request for a chunk adds a Fetcher value to a map, so that NetStore.Get can receive a signal when the chunk it requested has been delivered. The signal that a chunk has been delivered and stored to the LocalStore is the closing of the Fetcher.Delivered channel.

NetStore.Put is responsible for removing fetchers and closing the delivered channel. If a fetcher item has been added to the fetchers map, then that chunk is expected to arrive, and there is an interested party waiting for the delivered signal.

We need to add a timeout to the fetchers, because the fact that a chunk has been requested and an actor is waiting for it does not guarantee that it will ever be delivered.

  3. handleOfferedHashesMsg uses NetStore.Has to determine whether it needs a chunk (while syncing). If a chunk is needed, a Fetcher is also created, so that we can keep track of it while the current batch is being delivered.

It is possible for a chunk to be both requested through a retrieve request and delivered through syncing independently, but there will always be only one Fetcher and one Delivered channel for it, so that all interested parties are notified when the chunk is delivered and stored.

  4. The part of RequestFromPeers that would "first try to request explicitly from peers that are known to have offered the chunk" has been removed. We no longer control the flow through values in the context.Context.

  5. RequestTimeout has been split into RequestTimeout and FailedPeerSkipDelay, as these have different meanings. All timeouts are now placed in the timeouts package and have documentation (a hypothetical sketch of this package follows the NICE TO HAVE list below).

  6. Added LNetStore (we probably need a better name) - a NetStore wrapper needed by the LazyChunkReader.

  7. Found a bug where we request the same chunk from multiple peers via the OfferedHashes/WantedHashes protocol. In a future PR we will address it the following way: if we have already requested a chunk, we have a fetcher for it, so subsequent OfferedHashes for that chunk won't have an effect.

  8. Fixed a bug where the interval passed to SubscriptionPull was treated as exclusive, meaning we lost one chunk between batches.

  9. Fixed a bug with chunk deliveries described in "Fix tracing for chunk delivery" #1292 - chunks are delivered, but the fetcher continues to make requests for the same chunk.
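
For illustration, here is a minimal, self-contained sketch of the flow described in points 1-3 above. NetStore, Fetcher, Fetcher.Delivered, GetOrCreateFetcher and RemoteGet are names from this PR; the Chunk/Address types, the timeout value and all function bodies are illustrative assumptions, not the actual swarm implementation:

	package netstore

	import (
		"context"
		"errors"
		"sync"
		"time"

		"golang.org/x/sync/singleflight"
	)

	type Address string

	type Chunk struct {
		Addr Address
		Data []byte
	}

	// Fetcher signals the delivery of one chunk to every interested party.
	type Fetcher struct {
		Delivered chan struct{} // closed by Put once the chunk is stored
		once      sync.Once
	}

	type NetStore struct {
		mu       sync.Mutex
		local    map[Address]Chunk    // stand-in for the LocalStore
		fetchers map[Address]*Fetcher // one Fetcher per in-flight chunk
		sf       singleflight.Group   // suppresses duplicate retrievals

		// RemoteGet issues the actual retrieve request,
		// e.g. delivery.RequestFromPeers in this PR.
		RemoteGet func(ctx context.Context, addr Address) error
	}

	func New(remoteGet func(ctx context.Context, addr Address) error) *NetStore {
		return &NetStore{
			local:     make(map[Address]Chunk),
			fetchers:  make(map[Address]*Fetcher),
			RemoteGet: remoteGet,
		}
	}

	// GetOrCreateFetcher returns the Fetcher for addr, creating one if needed.
	func (n *NetStore) GetOrCreateFetcher(addr Address) (fi *Fetcher, created bool) {
		n.mu.Lock()
		defer n.mu.Unlock()
		if fi, ok := n.fetchers[addr]; ok {
			return fi, false
		}
		fi = &Fetcher{Delivered: make(chan struct{})}
		n.fetchers[addr] = fi
		return fi, true
	}

	// Get returns a chunk from the local store, or fetches it from the network.
	// Concurrent Gets for the same address are collapsed by singleflight.
	func (n *NetStore) Get(ctx context.Context, addr Address) (Chunk, error) {
		n.mu.Lock()
		c, ok := n.local[addr]
		n.mu.Unlock()
		if ok {
			return c, nil
		}
		_, err, _ := n.sf.Do(string(addr), func() (interface{}, error) {
			fi, _ := n.GetOrCreateFetcher(addr)
			if err := n.RemoteGet(ctx, addr); err != nil {
				return nil, err
			}
			select {
			case <-fi.Delivered: // closed by Put when the chunk arrives
				return nil, nil
			case <-time.After(30 * time.Second): // fetchers must not wait forever
				return nil, errors.New("fetcher timed out")
			case <-ctx.Done():
				return nil, ctx.Err()
			}
		})
		if err != nil {
			return Chunk{}, err
		}
		n.mu.Lock()
		defer n.mu.Unlock()
		return n.local[addr], nil
	}

	// Put stores a delivered chunk, closes its Fetcher's Delivered channel to
	// notify all waiting parties, and removes the Fetcher from the map.
	func (n *NetStore) Put(c Chunk) {
		n.mu.Lock()
		defer n.mu.Unlock()
		n.local[c.Addr] = c
		if fi, ok := n.fetchers[c.Addr]; ok {
			fi.once.Do(func() { close(fi.Delivered) })
			delete(n.fetchers, c.Addr)
		}
	}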


TODO TESTS:

  • fix broken unit tests, and remove irrelevant unit tests.
  • review and fix streamer_test.go tests
    • review and fix TestStreamerDownstream*
  • TestDeliveryFromNodes 8_200 seems to be flaky on macOS; the test is removed altogether, because we no longer blindly forward requests to our peers (FindPeer was changed).
  • review and fix flaky TestRetrieval
    • make sure it works when retrieval is done after syncing completes
    • make sure it works when retrieval is done before syncing completes
  • review and fix flaky TestFileRetrieval

TODO:

  • fix log levels - currently a chunk not being found results in error-level logs, which is not right.
  • address all new todos and fixme leftovers in the change set
  • add a global timeout for fetchers - if a chunk is never delivered, we don't want to keep a fetcher around forever. For this we need proper lifecycle and ownership handling for fetchers, which we don't have right now.

NICE TO HAVE:

  • RequestFromPeers quit channel - get notified when a peer disconnects, so that we don't have to wait for timeouts.SearchTimeout and can immediately try the next peer.
  • add test for FindPeer - it has at least one new condition - not going out of depth - Add test for FindPeer #1362
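
Point 5 above and the timeouts.SearchTimeout mentioned in the last bullet refer to the new timeouts package. A hypothetical sketch of its shape - only the three timeout names appear in this PR's text, the values and comments are assumptions:

	package timeouts

	import "time"

	// RequestTimeout: how long a retrieve request may stay open overall.
	var RequestTimeout = 10 * time.Second

	// FailedPeerSkipDelay: how long a peer that failed to deliver a chunk is
	// skipped before we request from it again.
	var FailedPeerSkipDelay = 20 * time.Second

	// SearchTimeout: how long we wait on one peer before asking the next one.
	var SearchTimeout = 1 * time.Second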

@nonsense nonsense requested a review from zelig June 17, 2019 08:32
@nonsense nonsense added this to Backlog in Swarm Core - Sprint planning via automation Jun 17, 2019
@nonsense nonsense moved this from Backlog to In review (includes Documentation) in Swarm Core - Sprint planning Jun 17, 2019
@nonsense nonsense added this to the 0.4.4 milestone Jun 17, 2019
Review thread on this hunk:

	-	return wait
	+	return loaded, func(ctx context.Context) error {
	+		select {
Member:
so should we not select on the ctx.Done?

@nonsense (author):

Why does this matter? Tests seem to be passing with the current select?
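
For context, a hedged sketch of what the returned wait function would look like if it also selected on ctx.Done, as the reviewer suggests; the Fetcher type and the timeout are illustrative assumptions:

	package syncer

	import (
		"context"
		"errors"
		"time"
	)

	// Fetcher mirrors the PR's fetcher: Delivered is closed when the chunk is stored.
	type Fetcher struct {
		Delivered chan struct{}
	}

	// wait blocks until the chunk is delivered, the timeout fires, or - the
	// point of the review comment - the caller's context is cancelled.
	func wait(ctx context.Context, fi *Fetcher, timeout time.Duration) error {
		select {
		case <-fi.Delivered:
			return nil
		case <-time.After(timeout):
			return errors.New("timed out waiting for chunk delivery")
		case <-ctx.Done():
			return ctx.Err()
		}
	}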

Review thread on this hunk:

		log.Trace("checking offered hash", "ref", fmt.Sprintf("%x", hash))
	-	if wait := c.NeedData(ctx, hash); wait != nil {
	+	if _, wait := c.NeedData(ctx, hash); wait != nil {
Member:
so we are still asking all offering peers, not just the first. note TODO here

@nonsense (author):

Yes, we are still asking all peers. We decided that we will be addressing this later, in order to reduce the scope of an already big PR.

Review thread on this hunk:

	func (s *SwarmSyncerClient) NeedData(ctx context.Context, key []byte) (loaded bool, wait func(context.Context) error) {
		start := time.Now()

		fi, loaded, ok := s.netStore.GetOrCreateFetcher(ctx, key, "syncer")
Member:
TODO this logic should be part of netstore.Get no?

@nonsense (author):
I don't understand this question. Why do you want this to be part of NetStore.Get? In order to adhere to the interface of the syncer (which this PR is not addressing), we need to know how long to wait for a chunk, hence the returned Fetcher and the select on <-fi.Delivered below.

If this were part of NetStore.Get, we would just be pushing that complexity into NetStore.Get; I don't really see why that would be an improvement.

Member:
Let's leave it as it is for the moment. @zelig I've also discussed this at length with @nonsense; there's no benefit to having it here or there. I vote we leave it as is, because the whole syncer part is going to be heavily simplified anyway, and this might as well go into that effort.
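
A hedged reconstruction of how the quoted NeedData might continue, based on the hunk above and the author's mention of the select on <-fi.Delivered; the exact meaning of the loaded/ok return values and the timeout are assumptions, not the actual code:

	package syncer

	import (
		"context"
		"errors"
		"time"
	)

	type Fetcher struct {
		Delivered chan struct{}
	}

	// netStore is only the part of NetStore that the syncer client needs here.
	type netStore interface {
		GetOrCreateFetcher(ctx context.Context, key []byte, interestedParty string) (fi *Fetcher, loaded bool, ok bool)
	}

	type SwarmSyncerClient struct {
		netStore netStore
	}

	func (s *SwarmSyncerClient) NeedData(ctx context.Context, key []byte) (loaded bool, wait func(context.Context) error) {
		fi, loaded, ok := s.netStore.GetOrCreateFetcher(ctx, key, "syncer")
		if !ok {
			// assumed: no fetcher means there is nothing to wait for
			return loaded, nil
		}
		return loaded, func(ctx context.Context) error {
			select {
			case <-fi.Delivered: // the chunk arrived and was stored in the LocalStore
				return nil
			case <-time.After(10 * time.Second):
				// illustrative timeout; the earlier thread discusses whether
				// ctx.Done should also be selected on here
				return errors.New("timed out waiting for chunk")
			}
		}
	}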

Review thread on this hunk:

	// LNetStore is a wrapper of NetStore, which implements the chunk.Store interface. It is used only by the FileStore,
	// the component used by the Swarm API to store and retrieve content and to split and join chunks.
Member:
TODO we should simplify this. AFAIU this extra object is only needed because Origin is not handled via peers to skip.

@nonsense (author):
How do you propose we simplify this?

This extra struct LNetStore is needed because of the FileStore, which has no notion of network requests or of the NetStore; it just expects a simple Get interface that returns chunks. We want to be able to test the FileStore independently of the NetStore.

Any concrete ideas are welcome.
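
To make the discussion concrete, a minimal sketch of the LNetStore idea: the NetStore API carries network-level detail (here an assumed origin parameter, following the Member's remark about peers to skip), while the FileStore/LazyChunkReader only understands a plain Get. Only NetStore, LNetStore and FileStore are names from this PR; everything else is illustrative:

	package storage

	import "context"

	type Address []byte

	type Chunk struct {
		Addr Address
		Data []byte
	}

	// NetStore's Get is assumed to need the origin of a request, so a chunk is
	// not requested back from the peer that asked for it.
	type NetStore struct{}

	func (n *NetStore) Get(ctx context.Context, origin string, addr Address) (Chunk, error) {
		// real implementation: LocalStore lookup, then network retrieval,
		// skipping the origin peer
		return Chunk{Addr: addr}, nil
	}

	// LNetStore adapts NetStore to the simple interface the FileStore expects,
	// which has no notion of peers or network requests.
	type LNetStore struct {
		net *NetStore
	}

	func (l *LNetStore) Get(ctx context.Context, addr Address) (Chunk, error) {
		return l.net.Get(ctx, "", addr) // no origin: a local, API-initiated request
	}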

@acud acud merged commit d589af1 into master Jun 18, 2019
Swarm Core - Sprint planning automation moved this from In review (includes Documentation) to Done Jun 18, 2019
@nonsense nonsense modified the milestones: 0.4.4, 0.4.2 Jun 18, 2019
@nonsense nonsense deleted the simple-fetchers branch July 8, 2019 08:17