
reduce number of parallel data retrieving tasks #33

Open
brad wants to merge 1 commit into master from less-parallelism
Conversation

@brad (Member) commented Apr 25, 2016

@orcasgit/orcas-developers Please review. This greatly reduces the number of tasks we have running simultaneously, making it far less likely that conflicts will result in bad refresh tokens.

@brad mentioned this pull request Apr 25, 2016
@coveralls commented

Coverage Status: coverage increased (+0.4%) to 93.995% when pulling 1f2c606 on less-parallelism into ae6b524 on master.

  raise Reject(sys.exc_info()[1], requeue=False)

  # Create a lock so we don't try to run the same task multiple times
  sdat = date.strftime('%Y-%m-%d') if date else 'ALL'
- lock_id = '{0}-lock-{1}-{2}-{3}'.format(__name__, fitbit_user, _type, sdat)
+ cats = '-'.join('%s' % i for i in categories)
+ lock_id = '{0}-lock-{1}-{2}-{3}'.format(__name__, fitbit_user, cats, sdat)
  if not cache.add(lock_id, 'true', LOCK_EXPIRE):
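For context, the lock in this hunk relies on the common Django-cache mutex pattern: `cache.add` only succeeds when the key does not already exist, so the first task to add the key wins. A minimal sketch of that pattern with an explicit release, so later tasks don't have to wait for the expiry (the helper name and the `LOCK_EXPIRE` value below are illustrative, not taken from this PR):

```python
from django.core.cache import cache

LOCK_EXPIRE = 60 * 5  # safety net: lock expires even if release never runs


def run_with_lock(lock_id, work):
    """Run work() only if no other worker currently holds lock_id."""
    # cache.add() returns False when the key already exists, which gives a
    # simple mutex on cache backends that are shared between workers.
    if not cache.add(lock_id, 'true', LOCK_EXPIRE):
        return None  # another task holds the lock; skip this run
    try:
        return work()
    finally:
        # Release explicitly so the next task is not blocked until expiry.
        cache.delete(lock_id)
```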
@grokcode (Contributor) commented on this hunk:

@brad I don't think we can use the Django cache for the lock and guarantee that it will work with the various setups people are likely to have. For example, Django's default cache backend is local-memory caching, which is per-process. Depending on the celery setup, this code can be executed by more than one process, each with its own cache and therefore unable to see the locks created by the other workers.
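If the cache-based lock were kept, it would only behave as a cross-process lock with a shared cache backend. A hedged example of the kind of settings change a deployment would need (the backend choice and address are assumptions, not something this PR or the project prescribes):

```python
# settings.py (illustrative): point Django's cache at a shared backend such
# as memcached so every celery worker process sees the same lock keys. The
# default LocMemCache is per-process and cannot provide cross-worker locking.
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
    },
}
```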

@brad (Member, Author) commented:

@grokcode What can I do then? Would it be safe to get rid of this lock and decorate get_fitbit_data with @transaction.atomic()?

@grokcode (Contributor) commented:

@brad I think the easiest solution is to punt on it for now and make a note in the README here that the fitbit tasks shouldn't be run concurrently, and then give an example of a way to set up celery to do that. I think we can use celery's manual routing feature to create a new queue and then when starting celery, make sure there is only one thread working on that queue.
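A sketch of that routing suggestion, assuming Celery 3.x-era setting names and a `fitapp.tasks.get_fitbit_data` task path (the module path and queue name are illustrative; adjust to the project's real names):

```python
# settings.py (sketch): send the fitbit task to its own queue so it can be
# served by a single, non-concurrent worker. CELERY_ROUTES is the Celery
# 3.x-style setting; newer Celery uses app.conf.task_routes instead.
CELERY_ROUTES = {
    'fitapp.tasks.get_fitbit_data': {'queue': 'fitbit'},
}

# Then start exactly one worker process/thread for that queue, e.g.:
#   celery -A yourproject worker -Q fitbit --concurrency=1
```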

It would be much nicer to support concurrent tasks (but trickier too). I think we can do the locking with the db. One idea is to store the lock in the DB, use the @transaction.atomic() decorator like you said, and Django's select_for_update to acquire the lock. I think it would be enough to have one lock per user so that only tasks for one user can execute at a time. That way we shouldn't have multiple processes trying to renew the token at the same time and stepping on each other.
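A rough sketch of that per-user DB lock idea (the model, field, and helper below are hypothetical, would need migrations and error handling, and are not part of this PR):

```python
from django.db import models, transaction


class FitbitUserLock(models.Model):
    """Hypothetical one-row-per-user lock table."""
    fitbit_user = models.CharField(max_length=64, unique=True)


def run_serialized_for_user(fitbit_user, work):
    """Run work() while holding a row lock for this user."""
    with transaction.atomic():
        # Ensure the lock row exists, then lock it. select_for_update()
        # makes other transactions selecting the same row block until this
        # transaction commits, so tasks for one user run one at a time.
        FitbitUserLock.objects.get_or_create(fitbit_user=fitbit_user)
        FitbitUserLock.objects.select_for_update().get(fitbit_user=fitbit_user)
        return work()
```

With a per-user lock like this, token refresh for a given user is serialized even when many worker processes are running.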
