`WaitPub` may wait more than necessary #38

schomatis · 2018-12-17T03:52:02Z

(I may not be completely following the Republisher logic here.) WaitPub sends a channel to a channel for Run to take it and signal through it that the publish has been issued,

go-mfs/system.go

Lines 258 to 260 in 2c1b835

    
           wait := make(chan struct{}) 
        
           p.pubnowch <- wait 
        
           <-wait

but the channel pubnowch through which the "inner" pubnowresp channel is being sent is processed in parallel with the publish timers,

go-mfs/system.go

Lines 299 to 307 in 2c1b835

    
           case <-quick: 
        
           case <-longer: 
        
           case pubnowresp = <-np.pubnowch: 
        
           } 
        
           err := np.publish(np.ctx) 
        
           if pubnowresp != nil { 
        
           	pubnowresp <- struct{}{} 
        
           }

with the intention (I'm assuming) that it will trigger an instantaneous publish without waiting for the timeouts (since someone is waiting for it).

Through this signaling mechanism there is the non-negligible chance (I think) that any of the timers will be processed instead of the pubnowch channel which would mean that the publish will indeed happen but the user will keep waiting an indefinite amount of time until another value is sent to the Republisher.

The text was updated successfully, but these errors were encountered:

schomatis · 2018-12-17T03:52:10Z

/cc @Stebalien

Stebalien · 2018-12-17T09:21:47Z

Yep, that looks like a bug. Really, the bug starts in WaitPub when we release the lock. Once we release that lock, the value can be published at any time.

schomatis · 2018-12-21T18:00:15Z

I think the easiest solution at the moment would be to remove WaitPub and use instead an exported version of the publish method, as we seem to be using WaitPub just to trigger a publish of any value (since there is a gap between the Update and WaitPub calls and the user just cares that the updated value or any superseding version was published).

This would entail adding a couple of checks in publish to detect if the Republisher has been closed and maybe also protecting the user-supplied PubFunc function (which I'm not sure if it's supposed to be thread-safe) but I think it would help to simplify the entire Republisher logic.

It would also help alleviate the concurrent WaitPub and Update issue where two (or more) users can update a value and wait for the publish simultaneously with the second value overriding the first which would cause only a single publish and it would leave a user locked in WaitPub (needing a new independent Update to unlock it).

@Stebalien WDYT?

schomatis · 2018-12-21T18:01:54Z

(The changes would also take care of #35.)

Stebalien · 2018-12-21T18:30:17Z

I think the easiest solution at the moment would be to remove WaitPub and use instead an exported version of the publish method, as we seem to be using WaitPub just to trigger a publish of any value (since there is a gap between the Update and WaitPub calls and the user just cares that the updated value or any superseding version was published).

The republisher is designed for IPNS where publishing is really expensive. For that, WaitPub has the right semantics.

The issue is that we're also using this to store the latest root on disk. In that case, I'd expect a call to Flush/Close (on a file) to flush all the way to disk synchronously (asking the user to then wait on the republisher has is pretty poor UX).

This would entail adding a couple of checks in publish to detect if the Republisher has been closed

The republisher doesn't currently have a concept of "closing" so I'm not sure what this would entail.

schomatis · 2018-12-21T18:47:30Z

For that, WaitPub has the right semantics.

Could you expand on that please? My proposed solution doesn't entail generating more publish calls that we're already doing, the check in WaitPub would be moved to publish,

go-mfs/system.go

Lines 250 to 253 in 2c1b835

    
           func (p *Republisher) WaitPub() { 
        
           	p.lk.Lock() 
        
           	consistent := p.lastpub == p.val 
        
           	p.lk.Unlock()

which now would happen in the same lock acquisition already present (saving some contention and simplifying the code),

go-mfs/system.go

Lines 319 to 322 in 2c1b835

    
           func (np *Republisher) publish(ctx context.Context) error { 
        
           	np.lk.Lock() 
        
           	topub := np.val 
        
           	np.lk.Unlock()

In that case, I'd expect a call to Flush/Close (on a file) to flush all the way to disk synchronously (asking the user to then wait on the republisher has is pretty poor UX).

Agreed, but that seems a responsability of the consumer of MFS who supplied the PubFunc in the first place, that is, if I gave you an IPNS PubFunc I don't call WaitPub/Publish when flushing (I let the timer run) and if I supply a save-to-disk function then I do. But that distinction doesn't seem related to calling WaitPub or Publish (which could be renamed to PublishNow if it makes it more clear).

The republisher doesn't currently have a concept of "closing" so I'm not sure what this would entail.

go-mfs/system.go

Lines 263 to 267 in 2c1b835

    
           func (p *Republisher) Close() error { 
        
           	err := p.publish(p.ctx) 
        
           	p.cancel() 
        
           	return err 
        
           }

Stebalien · 2018-12-27T20:22:55Z

Could you expand on that please? My proposed solution doesn't entail generating more publish calls that we're already doing, the check in WaitPub would be moved to publish,

Ah, got it. Yeah, that sounds like a good solution (it should also simplify this code quite a bit).

Agreed, but that seems a responsability of the consumer of MFS who supplied the PubFunc in the first place, that is, if I gave you an IPNS PubFunc I don't call WaitPub/Publish when flushing (I let the timer run) and if I supply a save-to-disk function then I do...

That's how this currently works but:

We don't actually call WaitPub everywhere we should at the moment anyways. IMO, this is a strong indication of a footgun.
The current system wont allow us to (easily) both record the MFS root on disk and periodically publish it via IPNS.

A simple alternative would be to:

Have the Root call some OnChange function synchronously for every update.
Allow the user to handle rate limiting themselves. They can pass in a debouncing republisher but they don't have to. For example:

republisher, onChange := NewDebouncedRepublisher(someCallback, shortTimeout, longTimeout)
root := NewRoot(nd, onChange)

(or something like that, onChange could also be a method on the republisher).

The republisher doesn't currently have a concept of "closing" so I'm not sure what this would entail.

Not sure how I missed that... SGTM.

schomatis · 2018-12-30T20:28:16Z

That's how this currently works but:

We don't actually call WaitPub everywhere we should at the moment anyways. IMO, this is a strong indication of a footgun.

The current system wont allow us to (easily) both record the MFS root on disk and periodically publish it via IPNS.

A simple alternative would be to:

Have the Root call some OnChange function synchronously for every update.

Allow the user to handle rate limiting themselves. They can pass in a debouncing republisher but they don't have to. For example:
republisher, onChange := NewDebouncedRepublisher(someCallback, shortTimeout, longTimeout)
root := NewRoot(nd, onChange)
(or something like that, onChange could also be a method on the republisher).

Agreed, this sounds like a good proposal for #33.

Before, WaitPub could wait forever if a value was published at the same time as the call to WaitPub. This patch also avoids republishing the same value multiple times and allows setting an initial value without reaching in and modifying internal state. fixes #38

schomatis added the kind/bug A bug in existing code (including security flaws) label Dec 17, 2018

schomatis added this to the Review the MFS interface milestone Dec 17, 2018

schomatis self-assigned this Dec 17, 2018

schomatis mentioned this issue Dec 20, 2018

Evaluate the possibility of using an event handler based system, not a republisher based system #33

Closed

schomatis mentioned this issue Dec 21, 2018

[WIP] remove waitpub, export publish #47

Closed

ghost added the status/in-progress In progress label Dec 21, 2018

Stebalien mentioned this issue Jan 8, 2019

fix over-wait in WaitPub #53

Merged

ghost assigned Stebalien Jan 8, 2019

schomatis closed this as completed in #53 Jan 14, 2019

ghost removed the status/in-progress In progress label Jan 14, 2019

magik6k mentioned this issue Jan 14, 2019

[WIP] Unixfs: enforce refs on files when using nocopy ipfs/kubo#5925

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`WaitPub` may wait more than necessary #38

`WaitPub` may wait more than necessary #38

schomatis commented Dec 17, 2018

schomatis commented Dec 17, 2018

Stebalien commented Dec 17, 2018

schomatis commented Dec 21, 2018

schomatis commented Dec 21, 2018

Stebalien commented Dec 21, 2018

schomatis commented Dec 21, 2018

Stebalien commented Dec 27, 2018

schomatis commented Dec 30, 2018

WaitPub may wait more than necessary #38

WaitPub may wait more than necessary #38

Comments

schomatis commented Dec 17, 2018

schomatis commented Dec 17, 2018

Stebalien commented Dec 17, 2018

schomatis commented Dec 21, 2018

schomatis commented Dec 21, 2018

Stebalien commented Dec 21, 2018

schomatis commented Dec 21, 2018

Stebalien commented Dec 27, 2018

schomatis commented Dec 30, 2018

`WaitPub` may wait more than necessary #38

`WaitPub` may wait more than necessary #38