Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assertion errors from promise when using graphene-django. #57

Open
darrint opened this issue May 21, 2018 · 10 comments
Open

Assertion errors from promise when using graphene-django. #57

darrint opened this issue May 21, 2018 · 10 comments

Comments

@darrint
Copy link

darrint commented May 21, 2018

I'm looking for advice as I haven't succeeded in creating a small test case to demonstrate what I'm seeing.

We have a large Django 1.9/Python 2.7 application that we recently added graphene to. Graphene generally works well. However, in development, a front end engineer wrote some code which resulted in 8 separate graphql requests being sent simultaneously to Django's development runserver (threading is on).

Sometimes this works, but sometimes we see one of two problems:

  1. A request is started in graphene/graphql/promise but never returns. We may have observed that terminating Django runserver with Ctrl-C may have caused a dead request or two to finish. That may have been our imagination.

  2. Occasionally we see this exception:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/graphql/execution/executor.py", line 337, in complete_value_catching
  return completed.catch(handle_error)
  File "/usr/local/lib/python2.7/dist-packages/promise/promise.py", line 476, in catch
    return self.then(None, on_rejection)
  File "/usr/local/lib/python2.7/dist-packages/promise/promise.py", line 537, in then
    return self._then(did_fulfill, did_reject)
  File "/usr/local/lib/python2.7/dist-packages/promise/promise.py", line 484, in _then
    target._add_callbacks(did_fulfill, did_reject, promise)
  File "/usr/local/lib/python2.7/dist-packages/promise/promise.py", line 334, in _add_callbacks
    assert not self._rejection_handler0
AssertionError

We never see this if we turn off threading.

How can we proceed with finding out what is going on? Any advice on narrowing this down? So far I have not been able to produce a small test which reproduces this.

This is a cross post from my original here: graphql-python/graphene-django#421

Please feel free to delete this issue if my cross post isn't appropriate.

@melancholy
Copy link

I've noticed that nested resolvers with middleware produce undesirable behavior with respect to threads.

I ended up having to monkey patch the library and replacing async_instance here: https://github.com/syrusakbary/promise/blob/master/promise/promise.py#L21
with a thread local

perhaps that will work for you as well

@melancholy
Copy link

Upon further reflection and investigation, this library appears to not be thread safe when trampoline is enabled. It appends and drains from a single queue within checks of other variables on a global object without locking sections.

it seems either monkey patching as I mentioned above, or using the following will work. I haven't done any investigation into how this affects performance.

from promise import promise;
promise.async_instance.disable_trampoline()

@charettes
Copy link

Thanks for the workaround @melancholy.

We hit this issue when using the mutli-threaded Django development server (the default manage.py runserver) and sometimes hitting one of these blank assertion errors that were resulting in the following GraphQL error responses:

{"errors":[{"message":""}]}

Or the following when batching queries

[{"errors":[{"message":""}], "id": null, "status": 400}]

@prokher
Copy link

prokher commented Jul 21, 2018

Dear @darrint, I've spent a whole day & night digging in the code of graphene-django, graphql-core, and promise, to understand why the multithreaded execution strategy on my GraphQL Websocket server eventually leads to hanging of all the worker threads! Now I am happy to find out I am not the only one who faced such an issue!

My conclusions are that indeed the promise library (at least of version 2.1) is not thread-safe! In particular, I see that something goes wrong in the global instance async_instance (file promise.py ) of the class Async. I am not 100% sure I understand the exact reason correctly, but I see some suspicious code pieces inside. Consider the two threads with the same tracebacks are in the following snippet from the promise.py:

def queue_tick(self):
    # THREAD1 is here with the stack: then -> _then -> invoke -> _async_invoke -> queue_tick
    if not self.is_tick_used:
        self.is_tick_used = True
        self.schedule.call(self.drain_queues)
        # THREAD2 is here with the stack: then -> _then -> invoke -> _async_invoke -> queue_tick

Looks like THREAD1 the will not invoke self.schedule.call(self.drain_queues) in spite of is has added a message to the self.normal_queue in the _async_invoke invocation. So looks like we have a hanging message/event in result. I suppose that check & assignment of self.is_tick_used shall be done atomically to avoid this.

I see there are modifications in the related files in the masterbranch, but my test hands anyway, so I can conclude the issue is still here.

Anyway, the @melancholy workaround works well, thank you very much for it!

@prokher
Copy link

prokher commented Jul 21, 2018

I would also like to bring @syrusakbary attention here. I think the thread-safety issue is a significant one cause it may silently affect many projects which use this promise library. @syrusakbary, what do you think? May be it is better to disable "trampoline" by default until the issue is resolved? Not sure how will this influence the performance, but comparing to hanging threads it does not look that bad.

@leebenson
Copy link

Performance aside, are there any side-effects to calling disable_trampoline() that you've observed?

I'm trying to figure out if Promises resolve differently having it enabled or not.

@prokher
Copy link

prokher commented Jul 30, 2018

@leebenson I did not observe any side effects yet, but I have not tested it well enough.

@mmerickel
Copy link

I've been running with trampoline disabled for a long time as a result of this issue and hitting the assertions. I was recently diving into some performance issues and noticed that with the trampoline disabled the DataLoader.batch_load is always only invoked with one item at a time because promises are resolved synchronously using the immediate scheduler. This is a very significant hit to performance as it defeats one of the primary purposes of the data loader which is to defer queries until all keys are known in order to construct more efficient loads. Note that I have only looked at this issue using the sync executor.

@mmerickel
Copy link

If it helps anyone else, below is the threadlocal monkeypatch I have been using since I discovered the performance issues with disabling trampolining and have not seen any issues since when used in conjunction with threaded wsgi apps.

def _patch_promise():
    # https://github.com/syrusakbary/promise/issues/57#issuecomment-392925205
    from promise import promise
    from promise.async_ import Async
    import threading

    class ThreadLocalAsync(threading.local):
        def __init__(self):
            self.instance = Async()

            # found by grepping the source for async_instance usages
            for method_name in [
                'fatal_error',
                'invoke',
                'settle_promises',
                'wait',
            ]:
                method = getattr(self.instance, method_name)
                setattr(self, method_name, method)

    promise.async_instance = ThreadLocalAsync()

@PepperTeasdale
Copy link

@mmerickel can you give a little more detail about how you're using this monkey patch please? I'm trying this and noticing that we don't get hanging queries anymore, but that running queries might take a little bit longer overall.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants