"Extra" args which do not matter for job uniqueness #9

Closed
Gurpartap opened this issue Aug 4, 2016 · 6 comments

Comments

@Gurpartap

Gurpartap commented Aug 4, 2016

Hi @cypriss,

Thanks for the work on work. 😋

Case in point, in my current setup, I'm loading twitter followers with the job args:

{"twitter_id": 123456, "next_cursor_str": "-1", "ids": []}

If I'm rate limited by twitter API, I schedule a new job from within the current job with the args:

{"twitter_id": 123456, "next_cursor_str": "something", "ids": [1,2,3,4,5]}

(I have to make a query for uniqueness at this time)

If I were to use work for background processing, it seems like embedding partial args like above will have to go unless:

a) There's support to include "extra" args, which do not count towards job uniqueness; or
b) I use a reference to external cache. This sounds reasonably good for my case since the ids part can get pretty large.

With (a) it would be guaranteed that there's only one followers checking job for a user with twitter_id 123456. The next_cursor_str and ids can move into the extra/cache args. What do you think? Are there breaking concerns about how work is implemented, or is this simple enough to do?

I'm willing to take a stab at this with some guidance on what and how this would be possible.

@Gurpartap
Author

Or a uniqueKey variable, perhaps.

@cypriss
Member

cypriss commented Aug 4, 2016

Hi!

It seems like there are a couple of options here:

a) The option that modifies gocraft/work to support "extra" args. The best way I can see to do this is to have a magic argument when enqueuing, such as

enqueuer.Enqueue("load_twitter_followers", work.Q{"twitter_id": 123456, ..., "work_unique_key": 123456})

If this unique key is present, we'll use it instead of a full serialization of the args.

b) Persist the ids and next_cursor_str outside of gocraft/work. You could persist them in redis, or perhaps sql (if you use that). May I ask, what do you do with all the ids once you get them? Save them anywhere?

c) Use normal jobs instead of unique jobs, and keep track of the uniqueness yourself (in redis). You could check this unique k/v before enqueuing and clear when done.
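
Option (c) might look roughly like the sketch below. It is only an illustration: `uniqueGuard`, `tryAcquire`, and `release` are hypothetical names, and an in-memory map stands in for Redis so the example has no dependencies. A real version would need an atomic set-if-absent operation on the Redis side, otherwise two enqueuers could both pass the check.

```go
package main

import "sync"

// uniqueGuard is a hypothetical in-memory stand-in for a Redis key/value
// store. It is not part of gocraft/work.
type uniqueGuard struct {
	mu   sync.Mutex
	held map[string]bool
}

func newUniqueGuard() *uniqueGuard {
	return &uniqueGuard{held: make(map[string]bool)}
}

// tryAcquire reports whether the caller may enqueue a job for key.
// It returns false if a marker for key is already present.
func (g *uniqueGuard) tryAcquire(key string) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.held[key] {
		return false
	}
	g.held[key] = true
	return true
}

// release clears the marker once the job (and any continuation it
// scheduled) has finished.
func (g *uniqueGuard) release(key string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	delete(g.held, key)
}
```

The caller would enqueue a normal job only when `tryAcquire` returns true, and call `release` at the end of the final job. A job that reschedules itself after a rate limit would keep the marker rather than release it.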

So, I'm a bit reluctant to just go with (a) because it's a bit magic and not immediately expected/intuitive. I'm also reluctant to go with (a) because I don't know how common this use-case is.

Thoughts?

@Gurpartap
Author

I believe the uniqueKey should be overridable for when you have a relatively large number of args, as the other options are more work (usually premature optimization) than necessary to get the same effect. (a) actually looks good to me.
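
To make the overridable key from option (a) concrete, here is a minimal sketch. It assumes the `work_unique_key` argument name proposed earlier in this thread; `uniquenessKey` is a hypothetical helper, not gocraft/work's actual implementation, and the key format is made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// uniqueKeyArg is the "magic" argument name proposed in this thread.
const uniqueKeyArg = "work_unique_key"

// uniquenessKey is a hypothetical helper. If the args carry an override,
// only the override decides uniqueness; otherwise the full JSON
// serialization of the args does.
func uniquenessKey(jobName string, args map[string]interface{}) (string, error) {
	if override, ok := args[uniqueKeyArg]; ok {
		return fmt.Sprintf("%s:%v", jobName, override), nil
	}
	// encoding/json sorts map keys, so equal args serialize identically.
	b, err := json.Marshal(args)
	if err != nil {
		return "", err
	}
	return jobName + ":" + string(b), nil
}
```

With the override set to the twitter_id, the initial job and its rate-limited continuation map to the same key even though `next_cursor_str` and `ids` differ. Without the override, the two sets of args serialize differently and are treated as distinct jobs.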

I'm storing/overriding ids in a row dedicated per twitter_id (i.e. keeping only the latest snapshot of all followers).
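
Option (b) could build on a per-twitter_id row like this one. The sketch below assumes an in-memory map in place of the real store (Redis or SQL); `followerProgress`, `progressStore`, and `jobArgs` are hypothetical names. The point is that the job args shrink to just `twitter_id`, so the default uniqueness check already does the right thing.

```go
package main

import "sync"

// followerProgress is the partial state that would otherwise travel in
// the job args. Hypothetical type for illustration.
type followerProgress struct {
	NextCursor string
	IDs        []int64
}

// progressStore is an in-memory stand-in for an external store that
// keeps one row per twitter_id.
type progressStore struct {
	mu   sync.Mutex
	rows map[int64]followerProgress
}

func newProgressStore() *progressStore {
	return &progressStore{rows: make(map[int64]followerProgress)}
}

// save overwrites the row for twitterID with the latest snapshot.
func (s *progressStore) save(twitterID int64, p followerProgress) {
	s.mu.Lock()
	defer s.mu.Unlock()
	p.IDs = append([]int64(nil), p.IDs...) // copy to avoid aliasing the caller's slice
	s.rows[twitterID] = p
}

// load returns the saved progress, or a fresh state starting at cursor "-1".
func (s *progressStore) load(twitterID int64) followerProgress {
	s.mu.Lock()
	defer s.mu.Unlock()
	if p, ok := s.rows[twitterID]; ok {
		return p
	}
	return followerProgress{NextCursor: "-1"}
}

// jobArgs builds args that contain only the identity of the work.
func jobArgs(twitterID int64) map[string]interface{} {
	return map[string]interface{}{"twitter_id": twitterID}
}
```

A rate-limited job would call `save` before scheduling its continuation, and the continuation would call `load` to resume from the stored cursor.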

In case of (c), do normal jobs use args for anything? What can go wrong when {"ids": [has a million int64s]} other than running out of RAM?

@cypriss
Member

cypriss commented Aug 4, 2016

All jobs, unique and normal, serialize the args as json and store those bytes in redis. Redis can handle some fairly long values, but obviously you pay linearly for the bytes you use to transfer to redis and to encode/decode.
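
To get a feel for that linear cost, the following sketch measures the JSON size of args shaped like the ones in this issue. `payloadSize` is a hypothetical helper and the ids are placeholder 10-digit values, not real data.

```go
package main

import "encoding/json"

// payloadSize returns the number of bytes in the JSON encoding of job
// args that carry n follower ids. Hypothetical helper for illustration.
func payloadSize(n int) (int, error) {
	ids := make([]int64, n)
	for i := range ids {
		ids[i] = int64(1000000000 + i) // placeholder 10-digit ids
	}
	args := map[string]interface{}{
		"twitter_id":      123456,
		"next_cursor_str": "-1",
		"ids":             ids,
	}
	b, err := json.Marshal(args)
	if err != nil {
		return 0, err
	}
	return len(b), nil
}
```

Each additional 10-digit id adds 11 bytes here (the digits plus a comma), so a million ids would put the payload in the region of 11 MB, all of which has to be encoded, sent to redis, and decoded again.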

@scottillogical

I would like to use the unique jobs feature as well, but I would also like to be able to pass a unique ID to the job that does not affect the uniqueness check.

@shdunning
Collaborator

Fixed by PR #110
