Skip to content
This repository has been archived by the owner on Jun 26, 2023. It is now read-only.

WaitPub may wait more than necessary #38

Closed
schomatis opened this issue Dec 17, 2018 · 8 comments · Fixed by #53
Closed

WaitPub may wait more than necessary #38

schomatis opened this issue Dec 17, 2018 · 8 comments · Fixed by #53
Assignees
Labels
kind/bug A bug in existing code (including security flaws)

Comments

@schomatis
Copy link
Contributor

(I may not be completely following the Republisher logic here.) WaitPub sends a channel to a channel for Run to take it and signal through it that the publish has been issued,

go-mfs/system.go

Lines 258 to 260 in 2c1b835

wait := make(chan struct{})
p.pubnowch <- wait
<-wait

but the channel pubnowch through which the "inner" pubnowresp channel is being sent is processed in parallel with the publish timers,

go-mfs/system.go

Lines 299 to 307 in 2c1b835

case <-quick:
case <-longer:
case pubnowresp = <-np.pubnowch:
}
err := np.publish(np.ctx)
if pubnowresp != nil {
pubnowresp <- struct{}{}
}

with the intention (I'm assuming) that it will trigger an instantaneous publish without waiting for the timeouts (since someone is waiting for it).

Through this signaling mechanism there is the non-negligible chance (I think) that any of the timers will be processed instead of the pubnowch channel which would mean that the publish will indeed happen but the user will keep waiting an indefinite amount of time until another value is sent to the Republisher.

@schomatis schomatis added the kind/bug A bug in existing code (including security flaws) label Dec 17, 2018
@schomatis schomatis added this to the Review the MFS interface milestone Dec 17, 2018
@schomatis schomatis self-assigned this Dec 17, 2018
@schomatis
Copy link
Contributor Author

/cc @Stebalien

@Stebalien
Copy link
Member

Yep, that looks like a bug. Really, the bug starts in WaitPub when we release the lock. Once we release that lock, the value can be published at any time.

@schomatis
Copy link
Contributor Author

I think the easiest solution at the moment would be to remove WaitPub and use instead an exported version of the publish method, as we seem to be using WaitPub just to trigger a publish of any value (since there is a gap between the Update and WaitPub calls and the user just cares that the updated value or any superseding version was published).

This would entail adding a couple of checks in publish to detect if the Republisher has been closed and maybe also protecting the user-supplied PubFunc function (which I'm not sure if it's supposed to be thread-safe) but I think it would help to simplify the entire Republisher logic.

It would also help alleviate the concurrent WaitPub and Update issue where two (or more) users can update a value and wait for the publish simultaneously with the second value overriding the first which would cause only a single publish and it would leave a user locked in WaitPub (needing a new independent Update to unlock it).

@Stebalien WDYT?

@schomatis
Copy link
Contributor Author

(The changes would also take care of #35.)

@Stebalien
Copy link
Member

I think the easiest solution at the moment would be to remove WaitPub and use instead an exported version of the publish method, as we seem to be using WaitPub just to trigger a publish of any value (since there is a gap between the Update and WaitPub calls and the user just cares that the updated value or any superseding version was published).

The republisher is designed for IPNS where publishing is really expensive. For that, WaitPub has the right semantics.

The issue is that we're also using this to store the latest root on disk. In that case, I'd expect a call to Flush/Close (on a file) to flush all the way to disk synchronously (asking the user to then wait on the republisher has is pretty poor UX).

This would entail adding a couple of checks in publish to detect if the Republisher has been closed

The republisher doesn't currently have a concept of "closing" so I'm not sure what this would entail.

@schomatis
Copy link
Contributor Author

For that, WaitPub has the right semantics.

Could you expand on that please? My proposed solution doesn't entail generating more publish calls that we're already doing, the check in WaitPub would be moved to publish,

go-mfs/system.go

Lines 250 to 253 in 2c1b835

func (p *Republisher) WaitPub() {
p.lk.Lock()
consistent := p.lastpub == p.val
p.lk.Unlock()

which now would happen in the same lock acquisition already present (saving some contention and simplifying the code),

go-mfs/system.go

Lines 319 to 322 in 2c1b835

func (np *Republisher) publish(ctx context.Context) error {
np.lk.Lock()
topub := np.val
np.lk.Unlock()

In that case, I'd expect a call to Flush/Close (on a file) to flush all the way to disk synchronously (asking the user to then wait on the republisher has is pretty poor UX).

Agreed, but that seems a responsability of the consumer of MFS who supplied the PubFunc in the first place, that is, if I gave you an IPNS PubFunc I don't call WaitPub/Publish when flushing (I let the timer run) and if I supply a save-to-disk function then I do. But that distinction doesn't seem related to calling WaitPub or Publish (which could be renamed to PublishNow if it makes it more clear).

The republisher doesn't currently have a concept of "closing" so I'm not sure what this would entail.

go-mfs/system.go

Lines 263 to 267 in 2c1b835

func (p *Republisher) Close() error {
err := p.publish(p.ctx)
p.cancel()
return err
}

@Stebalien
Copy link
Member

Could you expand on that please? My proposed solution doesn't entail generating more publish calls that we're already doing, the check in WaitPub would be moved to publish,

Ah, got it. Yeah, that sounds like a good solution (it should also simplify this code quite a bit).

Agreed, but that seems a responsability of the consumer of MFS who supplied the PubFunc in the first place, that is, if I gave you an IPNS PubFunc I don't call WaitPub/Publish when flushing (I let the timer run) and if I supply a save-to-disk function then I do...

That's how this currently works but:

  1. We don't actually call WaitPub everywhere we should at the moment anyways. IMO, this is a strong indication of a footgun.
  2. The current system wont allow us to (easily) both record the MFS root on disk and periodically publish it via IPNS.

A simple alternative would be to:

  1. Have the Root call some OnChange function synchronously for every update.
  2. Allow the user to handle rate limiting themselves. They can pass in a debouncing republisher but they don't have to. For example:
republisher, onChange := NewDebouncedRepublisher(someCallback, shortTimeout, longTimeout)
root := NewRoot(nd, onChange)

(or something like that, onChange could also be a method on the republisher).

The republisher doesn't currently have a concept of "closing" so I'm not sure what this would entail.

Not sure how I missed that... SGTM.

@schomatis
Copy link
Contributor Author

That's how this currently works but:

  1. We don't actually call WaitPub everywhere we should at the moment anyways. IMO, this is a strong indication of a footgun.
  2. The current system wont allow us to (easily) both record the MFS root on disk and periodically publish it via IPNS.

A simple alternative would be to:

  1. Have the Root call some OnChange function synchronously for every update.
  2. Allow the user to handle rate limiting themselves. They can pass in a debouncing republisher but they don't have to. For example:
republisher, onChange := NewDebouncedRepublisher(someCallback, shortTimeout, longTimeout)
root := NewRoot(nd, onChange)

(or something like that, onChange could also be a method on the republisher).

Agreed, this sounds like a good proposal for #33.

Stebalien added a commit that referenced this issue Jan 8, 2019
Before, WaitPub could wait forever if a value was published at the same time as
the call to WaitPub.

This patch also avoids republishing the same value multiple times and allows
setting an initial value without reaching in and modifying internal state.

fixes #38
@ghost ghost assigned Stebalien Jan 8, 2019
Stebalien added a commit that referenced this issue Jan 12, 2019
Before, WaitPub could wait forever if a value was published at the same time as
the call to WaitPub.

This patch also avoids republishing the same value multiple times and allows
setting an initial value without reaching in and modifying internal state.

fixes #38
Stebalien added a commit that referenced this issue Jan 12, 2019
Before, WaitPub could wait forever if a value was published at the same time as
the call to WaitPub.

This patch also avoids republishing the same value multiple times and allows
setting an initial value without reaching in and modifying internal state.

fixes #38
Stebalien added a commit that referenced this issue Jan 12, 2019
Before, WaitPub could wait forever if a value was published at the same time as
the call to WaitPub.

This patch also avoids republishing the same value multiple times and allows
setting an initial value without reaching in and modifying internal state.

fixes #38
@ghost ghost removed the status/in-progress In progress label Jan 14, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
kind/bug A bug in existing code (including security flaws)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants