
stateless connections #11

Closed
pda opened this issue Jun 23, 2009 · 6 comments
Labels
Unplanned Issue is not planned for some reason, such as complexity, lack of clarity, or low priority.

Comments

@pda
Contributor

pda commented Jun 23, 2009

Stateful connections cause problems, especially when processing long-running jobs
or connecting over long distances or complicated network topologies. We can solve
these problems by learning from HTTP: prefer (rather, require) stateless, disposable,
short-lived connections.

However, statelessness poses challenges, especially with efficiency. For example,
repeating the tube name for every put command would use more bandwidth than
the current protocol. So this is a tradeoff.
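To make the bandwidth tradeoff concrete, here is a rough sketch. The `use`/`put` command syntax below is the real beanstalkd protocol; `put-in` is a purely hypothetical stateless variant (no such command exists) that names the tube on every put instead of relying on connection state:

```python
def stateful_puts(tube: bytes, body: bytes, n: int) -> bytes:
    """Current protocol: one 'use' sets connection state, then bare 'put's."""
    out = b"use %s\r\n" % tube  # sets the connection's tube once
    for _ in range(n):
        out += b"put 0 0 60 %d\r\n%s\r\n" % (len(body), body)
    return out

def stateless_puts(tube: bytes, body: bytes, n: int) -> bytes:
    """HYPOTHETICAL stateless variant: every put names its tube."""
    out = b""
    for _ in range(n):
        out += b"put-in %s 0 0 60 %d\r\n%s\r\n" % (tube, len(body), body)
    return out
```

For a run of small jobs, the stateless form pays the tube name (plus a longer verb) on every single put, which is exactly the efficiency cost described above.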

The current beanstalkd protocol is highly stateful, and its design pervasively assumes
that state is available; it exploits that state wherever possible for efficiency.

We should reevaluate whether the current statefulness of the protocol is the best tradeoff.

Original post follows.


Currently, when a client disconnects while it has a reserved job, that job is instantly released onto the front of the queue, without regard for its remaining TTR.

This means that:

  • A worker executing long jobs cannot disconnect/reconnect to beanstalkd during execution. Perhaps that would make it too hard to track which client owns which job reservation though...
  • A job that causes a worker to crash constantly hogs the front of the queue, blocking subsequent jobs from coming through.

Perhaps auto-release due to disconnect should at least release onto the end of the queue, at a lower priority, or with a delay?
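A toy model of why the release position matters for a crashing job (illustrative only; this is not beanstalkd's actual data structure):

```python
from collections import deque

queue = deque(["crasher", "job-a", "job-b"])

# Current behavior: on disconnect, the reserved job goes back to the FRONT.
job = queue.popleft()            # worker reserves "crasher", then crashes
queue.appendleft(job)            # auto-release to the front of the queue
assert queue[0] == "crasher"     # the bad job hogs the head; job-a is blocked

# Suggested behavior: release to the END (or with a delay / lower priority).
job = queue.popleft()
queue.append(job)
assert queue[0] == "job-a"       # the rest of the queue makes progress
```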

@Meta-phaze

While I understand the requested behavior, I just wanted to note that I rely on the 'front of queue' behavior. If this issue moves forward, I hope the behavior can be controlled during the reserve step (or exposed in some other way, so that both the new and old behavior remain available).

@SyBernot

I'm currently running into this as a problem, along with issue #109, in my workaround. My initial thought is another operation/state, check-out, where a job is reserved and placed in a checked-out state (a special case of delay) for $timeout seconds (0 = forever); the client can then disconnect and go about its business. If the timeout is reached, the job returns to the ready state. Jobs in the checked-out state can be released or deleted by any connection via their $id.

I've actually tried to implement this as reserve (get the job id/job data) -> release with a really long delay (which also has the benefit of adjusting the priority for when the job returns to ready), but I haven't yet found a way to delete a job from the delayed state (I get a NOT_FOUND). My end goal is something that does not depend on a persistent connection: our jobs can run from seconds to weeks and our workers can be on the other side of the globe, so pretty much anything can happen. But I also want any unfinished jobs to persist on the server until deleted, so they can be re-kicked if the worker dies mid-job.
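The reserve-then-park workaround above can be written down as raw protocol commands. A sketch: the `reserve`/`release`/`delete` syntax comes from the beanstalkd protocol doc, while the helper names and the week-long delay are illustrative assumptions:

```python
WEEK = 7 * 24 * 3600  # illustrative "park" delay, in seconds

def reserve_cmd() -> bytes:
    # Server replies RESERVED <id> <bytes>\r\n<data>\r\n
    return b"reserve\r\n"

def park_cmd(job_id: int, pri: int = 1024, delay: int = WEEK) -> bytes:
    # Release with a very long delay, so the connection can be dropped
    # while the job stays on the server (and its priority is adjusted).
    return b"release %d %d %d\r\n" % (job_id, pri, delay)

def finish_cmd(job_id: int) -> bytes:
    # From any later connection: delete the delayed job by id
    # (possible once the #30 fix landed).
    return b"delete %d\r\n" % job_id
```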

@kr
Member

kr commented Sep 27, 2012

The ability to delete delayed jobs is covered in #30, which
is fixed in recent versions of beanstalkd.

@SyBernot

I just updated to 1.7 and will test deleting delayed jobs again. If it works, that would be awesome, as I haven't found a way to get something out of the delayed state other than just waiting. A working kick-job (id) would do it too; I see it's in the protocol doc, but it's not a valid command in 1.7.

@kr
Member

kr commented Dec 6, 2012

In most popular programming languages, it's not hard to create
workers that effectively never crash, regardless of the contents
of a job. For example, if a job raises an unknown exception,
have the worker catch all exceptions in its main loop. The
strategy described in How to Handle Job Failures then helps
with identifying, triaging, and fixing job failure types.
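A minimal sketch of that main-loop strategy, assuming a generic Python client: the `jobs` iterable, `delete_job`, and `bury_job` are placeholders for whatever beanstalkd client library is in use, not real API names.

```python
import logging

def run_worker(jobs, handle, delete_job, bury_job):
    """jobs yields (job_id, body) pairs, e.g. from repeated reserve calls."""
    for job_id, body in jobs:
        try:
            handle(body)
        except Exception:
            # A bad job is logged and buried for later triage instead of
            # crashing the worker and re-blocking the head of the queue.
            logging.exception("job %s failed", job_id)
            bury_job(job_id)
        else:
            delete_job(job_id)
```

No job body can take down the loop; only external factors (kills, power loss) remain, which is the case the next paragraph covers.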

A more common, and harder to prevent, cause of worker crashes
is external factors, such as resource exhaustion, being killed,
power failures, etc. The current behavior is optimized for this, the
common case.

Having said that, there's still a lot of value in eliminating state
from the connection, and avoiding long-lived connections (or at
least making connections disposable). We're not going to add
configuration flags for this sort of thing, so the behavior will either
stay as it is, or change to be stateless. If we choose to go that
route, we should go all the way to fully stateless connections.

@ysmolski ysmolski added NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. and removed needs-label labels Jun 26, 2019
@ysmolski
Member

The answer by @kr sums it up rather well. I also second this post: https://xph.us/2010/05/02/how-to-handle-job-failures.html. From my experience it is possible to handle errors or crashes gracefully. I do not know why this issue was kept open for so long, since we are not going to solve it in the foreseeable future. I will keep it open, but mark it as unplanned.

@ysmolski ysmolski added Unplanned Issue is not planned for some reason, such as complexity, lack of clarity, or low priority. and removed NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. labels Jun 30, 2019