
Allow for custom job ID #264

Closed
rzajac opened this issue May 20, 2015 · 13 comments

@rzajac

rzajac commented May 20, 2015

```
put <pri> <delay> <ttr> <bytes> <id>\r\n
<data>\r\n
```

This would allow clients to implement failover.
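For reference, a minimal sketch (not part of the original proposal) of how a client issues the existing `put` command over a raw socket, following the current protocol; the proposed `<id>` field appears only in a comment, since the extension was never adopted. Host, port, and payload are placeholders.

```python
import socket

def put_job(host: str, port: int, data: bytes,
            pri: int = 1024, delay: int = 0, ttr: int = 60) -> int:
    """Issue the current protocol's put command; return the server-assigned job ID."""
    with socket.create_connection((host, port)) as sock:
        # Current protocol:  put <pri> <delay> <ttr> <bytes>\r\n<data>\r\n
        # Proposed variant:  put <pri> <delay> <ttr> <bytes> <id>\r\n<data>\r\n
        header = f"put {pri} {delay} {ttr} {len(data)}\r\n".encode()
        sock.sendall(header + data + b"\r\n")
        reply = sock.recv(1024).decode()  # e.g. "INSERTED 42\r\n" (sketch: assumes one recv)
        status, _, rest = reply.partition(" ")
        if status != "INSERTED":
            raise RuntimeError(f"put failed: {reply.strip()}")
        return int(rest)  # int() tolerates the trailing \r\n

job_id = put_job("127.0.0.1", 11300, b"daily-report")
print("server-assigned job ID:", job_id)
```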

@emanuelecasadio

I can't see the reason for specifying a custom job ID within a queue.

@aight8

aight8 commented May 21, 2015

I really do see a reason, and I really hope this gets implemented!
Here is an example:

  • You have a number of repeatable jobs that you want to execute once every day, but you want them handled in the queue. Today you first have to delete the whole tube (by iterating over and deleting the queue items), which is ugly. With this feature it would be a lot easier and more flexible!
  • You can check a specific job's state.
  • and so on...

Please implement this!! 👍

@ifduyue

ifduyue commented May 21, 2015

You have a number of repeatable jobs that you want to execute once every day, but you want them handled in the queue. Today you first have to delete the whole tube (by iterating over and deleting the queue items), which is ugly. With this feature it would be a lot easier and more flexible!

This can be done easily by checking whether a daily job has already been executed, either before putting it into beanstalkd or after reserving it (see the sketch below).

And why not just use /etc/cron.daily/?
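For what it's worth, ifduyue's pre-put check could look something like this minimal sketch, assuming Redis as the shared marker store and the third-party greenstalk client (key name, host, and expiry are illustrative):

```python
import datetime

import greenstalk  # third-party beanstalkd client, assumed here
import redis       # shared marker store, assumed here

r = redis.Redis()
queue = greenstalk.Client(("127.0.0.1", 11300))

def put_daily_job(name: str, body: str) -> None:
    # SET NX with a ~25h expiry: only the first producer of the day wins,
    # so the job is enqueued at most once per day.
    today = datetime.date.today().isoformat()
    if r.set(f"daily:{name}:{today}", 1, nx=True, ex=90000):
        queue.put(body)

put_daily_job("report", "generate-daily-report")
```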

@rzajac
Author

rzajac commented May 22, 2015

Allowing clients to specify custom job IDs would let me implement a sort of HA and failover at the library level:

  • adding/deleting... jobs to two or more beanstalkd servers
  • in case one server fails, workers/producers could connect to other servers from the pool

Unless I'm missing something and the current protocol allows for better solutions.

PS. Adding a job with an ID that already exists should trigger an error.
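Under the current protocol, the pattern @rzajac describes can only be approximated by embedding a client-generated ID in the job body; a hedged sketch, assuming the greenstalk client and an illustrative server pool (the divergence concerns raised later in this thread still apply):

```python
import json
import uuid

import greenstalk  # third-party client, assumed here

SERVERS = [("10.0.0.1", 11300), ("10.0.0.2", 11300)]  # illustrative pool

def put_everywhere(payload: dict) -> str:
    # The client-generated ID travels inside the body, since the protocol
    # has no <id> field; consumers must de-duplicate on it themselves.
    job_id = str(uuid.uuid4())
    body = json.dumps({"id": job_id, "payload": payload})
    for addr in SERVERS:
        try:
            greenstalk.Client(addr).put(body)
        except OSError:
            pass  # server down: the other replica still holds the job
    return job_id
```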

@jabdoa2

jabdoa2 commented May 27, 2015

How do you maintain your job IDs without a central point of failure? In general, in a distributed system you cannot have a job which runs exactly once (http://bravenewgeek.com/you-cannot-have-exactly-once-delivery/). Beanstalkd provides at-most-once delivery and we do not want to change that. You can get at-least-once with more than one beanstalkd plus some distributed locking or similar (some people use memcached or a DB for that purpose).

@rzajac
Author

rzajac commented May 29, 2015

Maintaining job IDs is out of the scope of this ticket. What I need is to be able to specify my own job ID.

@JensRantil
Contributor

This would allow clients to implement failover.

@rzajac Could you elaborate a little on this? I'm not exactly sure about your use-case.

@rzajac
Author

rzajac commented Apr 3, 2016

@JensRantil I explained it a little bit here: #264 (comment)

@sergeyklay
Member

This feature request breaks backward compatibility with protocol v1.x.

@JensRantil
Contributor

@rzajac Ah, sorry. Missed that. Thanks!

I'm going to be the devil's advocate here and shoot down some of the use cases :-)

@aight8 wrote:

You have a number of repeatable jobs that you want to execute once every day, but you want them handled in the queue. Today you first have to delete the whole tube (by iterating over and deleting the queue items), which is ugly. With this feature it would be a lot easier and more flexible!

There are various approaches to regular cronjobs:

  • Simply having an /etc/cron.daily script that puts your daily job on the queue.
  • Having a permanent job that is released back each day with a delay until the next midnight (see the sketch after this list).
  • For multiple jobs executed at the same time every day, you could use either of the two approaches above and simply have your daily job put smaller tasks on the queue. That is, the daily task would split itself into smaller tasks.
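A minimal sketch of the second approach, assuming the third-party greenstalk Python client (any client with reserve/release works): the worker runs the task, then releases the same job back with a delay computed to land at the next midnight, so one permanent job recurs daily.

```python
import datetime

import greenstalk  # third-party client, assumed here

def seconds_until_next_midnight() -> int:
    now = datetime.datetime.now()
    midnight = datetime.datetime.combine(
        now.date() + datetime.timedelta(days=1), datetime.time.min)
    return int((midnight - now).total_seconds())

def run_daily_task(body: str) -> None:
    print("running daily task:", body)  # placeholder for the real work

client = greenstalk.Client(("127.0.0.1", 11300))
while True:
    job = client.reserve()
    run_daily_task(job.body)
    # Put the same job back, delayed until the next midnight.
    client.release(job, delay=seconds_until_next_midnight())
```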

You can check a specific job's state.

Valid point. A workaround is to store the job ID in another datastore.

and so on...

Not an argument. Carry on. ;)

@rzajac wrote:

Allowing clients to specify custom job IDs would let me implement a sort of HA and failover at the library level:

I really don't think this is a good idea. Doing double writes independently to two queues is bound to eventually make them diverge into different states; there are all sorts of race conditions. For example, a job's TTR times out on one queue but not on the other. Another problem is that you currently can't reserve a specific job. You can delete a specific job, but then you can't be sure that no other consumer has already reserved it, etc.

The real solution here would be to use something like ZooKeeper's ZAB or, probably even better, the Raft algorithm: all writes would go through a master, and a majority would need to acknowledge each state change. This would obviously introduce complexity, new failure modes, and additional latency to every operation.

@urjitbhatia

@rzajac @JensRantil I've also run into this.
The way I do it right now is to put the external ID in Redis, as @JensRantil suggested, and save a mapping to the beanstalkd-generated ID. Then I use it later to cancel the job, query it, etc. In a way, having beanstalkd take a custom ID would eliminate the need for an extra piece of infra.
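A sketch of that mapping, assuming redis-py and the greenstalk client (key names and addresses are illustrative): the producer stores external ID → beanstalkd job ID at put time, and later operations resolve through Redis.

```python
import greenstalk  # third-party client, assumed here
import redis       # mapping store, assumed here

r = redis.Redis()
queue = greenstalk.Client(("127.0.0.1", 11300))

def put_with_external_id(external_id: str, body: str) -> int:
    bs_id = queue.put(body)                # beanstalkd assigns the real ID
    r.set(f"jobmap:{external_id}", bs_id)  # remember the mapping
    return bs_id

def cancel(external_id: str) -> None:
    bs_id = r.get(f"jobmap:{external_id}")
    if bs_id is not None:
        queue.delete(int(bs_id))           # delete by the resolved job ID
        r.delete(f"jobmap:{external_id}")
```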

@yellow1912

This would also help with self-throttling jobs on the client side :) Simply checking whether the job is already there lets us avoid sending another one, or just increase the delay time.
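Building on the mapping above, a hedged sketch of that client-side throttle (greenstalk and Redis again assumed): stats-job tells us whether the previous job is still alive before we enqueue another one.

```python
import greenstalk
import redis

r = redis.Redis()
queue = greenstalk.Client(("127.0.0.1", 11300))

def put_unless_pending(external_id: str, body: str) -> None:
    bs_id = r.get(f"jobmap:{external_id}")
    if bs_id is not None:
        try:
            queue.stats_job(int(bs_id))  # still known to beanstalkd?
            return                       # yes: throttle, don't re-send
        except greenstalk.NotFoundError:
            pass                         # gone: safe to enqueue again
    r.set(f"jobmap:{external_id}", queue.put(body))
```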

@JensRantil
Contributor

Okay, I'm going to close this issue as a no-go. Reasons are as follows:

  • Allowing a custom ID breaks the uniqueness of job IDs. There is also potential for concurrency confusion, since a very recently deleted job might seem to "pop up" again when a new job with the same ID is added, confusing both clients and developers.
  • There are lots of ways this can be solved without expanding the scope of beanstalkd:
    • submit multiple identical cronjob tasks and make the task processing idempotent, such as gracefully ignoring a recently processed message (see the sketch after this list).
    • only run a single process that pushes the cronjob task to beanstalkd.
    • take a distributed lock to make sure only a single process pushes the cronjob task to beanstalkd. See for example https://dkron.io.
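A minimal sketch of the first workaround, assuming Redis as the dedup store and the greenstalk client (both illustrative): the job body doubles as the dedup key, and a SET NX guard makes reprocessing a no-op.

```python
import greenstalk  # third-party client, assumed here
import redis       # dedup store, assumed here

r = redis.Redis()
client = greenstalk.Client(("127.0.0.1", 11300))

def process(body: str) -> None:
    print("processing:", body)  # placeholder for the real task

while True:
    job = client.reserve()
    # 24h dedup window is illustrative; duplicates become no-ops.
    if r.set(f"done:{job.body}", 1, nx=True, ex=86400):
        process(job.body)
    client.delete(job)
```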

Please open a new issue describing your use case if you believe it can't be worked around using the above approaches.
