lock.Lock() returns without the lock way before LockWaitTime #4003

laurentganne · 2018-03-29T19:17:56Z

Description of the Issue (and unexpected/desired result)

Using the lock api with these LockOptions :

LockTryOnce: true,
LockWaitTime: a duration of 45 seconds,

would expect that a call to lock.Lock(nil) would attempt to acquire the lock for 45 seconds before returning without having succeeded.

But actually, in a multi-threaded environment where many threads attempted to acquire the lock, after 5 seconds many of them could get the lock but several threads saw lock.Lock(nil) return nil,nil.
Although only 5 seconds had passed, and the LockWaitTime specified was 45 seconds.

Looking at the code, in the following lines
at https://github.com/hashicorp/consul/blob/v1.0.6/api/lock.go#L181-L189

181	// Handle the one-shot mode.
182	if l.opts.LockTryOnce && attempts > 0 {
183		elapsed := time.Since(start)
184		if elapsed > qOpts.WaitTime {
185			return nil, nil
186		}
187
188		qOpts.WaitTime -= elapsed
189 }

Line 184, the elapsed time since start is compared to qOpts.WaitTime, which seems correct.
But the issue is that line 188, qOpts.WaitTime is lowered by substracting the elapsed time.
Let's say the elapsed time since start is around 5 seconds, after each attempt which can be quick,
you will substract around 5 seconds to qOpts.WaitTime.
So elapsed time will be > qOpts.WaitTime way sooner than expected.

I think a fix should be to replace line 184 :

184		if elapsed > qOpts.WaitTime {

by :

184		if elapsed > l.opts.LockWaitTime {

to compare the elapsed time with the LockWaitTime provided by the user.

`consul version` for both Client and Server

Using the latest version 1.0.6

The text was updated successfully, but these errors were encountered:

benagricola · 2018-07-05T16:14:58Z

Am running into this as well using consul lock (the semaphore usage with multiple locks), and using much longer wait times (8h for example).

I think your fix works in that it will stop the consul lock process from returning early - however the qOpts are still set to an invalid value (?wait= has a 10 minute maximum, according to the api doco at https://www.consul.io/api/index.html).

I'm not sure what the correct behaviour is here. consul lock will happily wait for 8h with this fix in, and there seems to be no issue with the long polling as I suspect it is just set to 10m if the value is too high - however, this means that towards the end of the timeout (in the last 10 minutes) it's possible for consul lock to wait for more than the timeout, as the 'wait' var will always be 10 (or might default to 5).

Suspect this is why WaitTime was being decremented, albeit incorrectly, so that the long poll will return 'on time' if the remaining time is less than 10 minutes.

banks · 2018-07-10T14:09:20Z

@benagricola the documentation for LockWaitTime (at least in code) points out that it is a tradeoff that affects how soon you might return if lock is not acquired.

Note that the timeout specified is used as the wait param in the blocking query, but Lock also loops around until it's expired. So in your example, each blocking query will only block for 10 mins, but the Lock() method won't return unless it either acquires the lock or the full 8 hours elapses.

I think the wait time decrementing was just a bug with no tests to catch it not an intentional way to mitigate the fact that you can't separate the individual blocking request time (and hence max delay before returning) and the actual time the lock attempt will block for.

The fix in the PR should make this behaviour work correctly though - the last blocking query when there is less than 10 mins of timeout left will subtract elapsed from 8hrs and get back the smaller wait time required to unblock right at the 8hr elapsed mark.

laurentganne mentioned this issue Apr 3, 2018

Bugfix/brbdcf 1172 cannot get lock hosts pool deployment ystia/yorc#15

Merged

hamishforbes mentioned this issue Jul 6, 2018

Fix lock and semaphore timeouts #4346

Merged

banks closed this as completed in #4346 Jul 10, 2018

banks added this to the 1.2.1 milestone Jul 10, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

lock.Lock() returns without the lock way before LockWaitTime #4003

lock.Lock() returns without the lock way before LockWaitTime #4003

laurentganne commented Mar 29, 2018 •

edited

Loading

benagricola commented Jul 5, 2018

banks commented Jul 10, 2018 •

edited

Loading

lock.Lock() returns without the lock way before LockWaitTime #4003

lock.Lock() returns without the lock way before LockWaitTime #4003

Comments

laurentganne commented Mar 29, 2018 • edited Loading

Description of the Issue (and unexpected/desired result)

consul version for both Client and Server

benagricola commented Jul 5, 2018

banks commented Jul 10, 2018 • edited Loading

laurentganne commented Mar 29, 2018 •

edited

Loading

`consul version` for both Client and Server

banks commented Jul 10, 2018 •

edited

Loading