Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible thread leak #343

Closed
chmoder opened this issue Feb 24, 2020 · 19 comments · Fixed by #362
Closed

Possible thread leak #343

chmoder opened this issue Feb 24, 2020 · 19 comments · Fixed by #362
Assignees
Labels
priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. 🚨 This issue needs some love. status: investigating The issue is under investigation, which is determined to be non-trivial. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@chmoder
Copy link
Contributor

chmoder commented Feb 24, 2020

Hi,

We investigated a bug about a process running out of threads in a long lasting process with thousands of requests being made. After researching we narrowed down the cause to key.get; we also can reproduce with query.fetch.

The below functions are ran in threading.py start when key.get is called. It seems like one or more of them may have a thread that needs to close at the end of the request?

Thank you for the help!

<function _run_channel_spin_thread.<locals>.channel_spin at 0x1119ec048>
<grpc._plugin_wrapping._Plugin object at 0x1119fc9e8>
<function _worker at 0x1119afae8>

Environment details

  1. python-ndb
  2. OS X Catalina
  3. python 3.7.3
  4. google-cloud-ndb==1.0.0

Steps to reproduce

  1. invoke key.get()

Code example

 from google.cloud import ndb


class House(ndb.Model):
    room = ndb.StringProperty(default='kitchen')


class HouseHandler(webapp2.RequestHandler):
    def get(self):
        client = ndb.Client(project='project_id')
        with client.context():
            k = ndb.Key(House, 5508237726318592)
            o = k.get()

app = webapp2.WSGIApplication(
    [webapp2.Route('/v1/houses/1', HouseHandler)],
    debug=True
)

if __name__ == '__main__':
    from paste import httpserver

    httpserver.serve(app, host='127.0.0.1', port='8080')

When making the request, watch the number of threads in the python process. You should see it climb for each request. In the screenshot below it is the highlighted process with 19 threads.

ab -n 1 -H http://127.0.0.1:8080/v1/houses/1

image

Stack Trace

INFO:root:default: "POST /cloud_task_queue HTTP/1.1" 200 4 0.73653s
[2020-02-21 15:09:44 -0600] [3172] [DEBUG] POST /cloud_task_queue
ERROR:grpc._plugin_wrapping:AuthMetadataPluginCallback "<google.auth.transport.grpc.AuthMetadataPlugin object at 0x12d9d0978>" raised exception!
Traceback (most recent call last):
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/grpc/_plugin_wrapping.py", line 79, in __call__
    callback_state, callback))
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/google/auth/transport/grpc.py", line 87, in __call__
    future = self._pool.submit(self._get_authorization_headers, context)
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/concurrent/futures/thread.py", line 160, in submit
    self._adjust_thread_count()
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/concurrent/futures/thread.py", line 181, in _adjust_thread_count
    t.start()
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py", line 847, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new threadΩ
ERROR:root:Unhandled exception: can't start new thread
Traceback (most recent call last):
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/csg/logger/request_logger_clients/base_client.py", line 277, in process_exception
    raise exception
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/webapp2.py", line 604, in dispatch
    return method(*args, **kwargs)
  File "/Users/tcross/development/csgapi/handlers/cloud_task_handler.py", line 10, in post
    CloudTaskQueue.process_cloud_task(request_body)
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/csg/tasks/queues/cloud_task_queue.py", line 51, in process_cloud_task
    execute_task(func, **request_body)
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/csg/tasks/execute.py", line 14, in execute_task
    func(**kwargs)
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/csg/tasks/deferred_pickle/deferred.py", line 117, in run_pickle
    func_, args_ = manager.get(key)
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/csg/tasks/deferred_pickle/db_interface/ndb/ndb_manager.py", line 37, in get
    d_pickle = key.get()
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/google/cloud/ndb/_options.py", line 89, in wrapper
    return wrapped(*pass_args, **kwargs)
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/google/cloud/ndb/utils.py", line 110, in positional_wrapper
    return wrapped(*args, **kwds)
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/google/cloud/ndb/key.py", line 811, in get
    return self.get_async(_options=_options).result()
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/google/cloud/ndb/tasklets.py", line 190, in result
    self.check_success()
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/google/cloud/ndb/tasklets.py", line 137, in check_success
    raise self._exception
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/google/cloud/ndb/tasklets.py", line 309, in _advance_tasklet
    self.generator.throw(type(error), error, traceback)
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/google/cloud/ndb/key.py", line 896, in get
    entity_pb = yield _datastore_api.lookup(self._key, _options)
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/google/cloud/ndb/tasklets.py", line 309, in _advance_tasklet
    self.generator.throw(type(error), error, traceback)
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/google/cloud/ndb/_datastore_api.py", line 169, in lookup
    entity_pb = yield batch.add(key)
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/google/cloud/ndb/tasklets.py", line 309, in _advance_tasklet
    self.generator.throw(type(error), error, traceback)
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/google/cloud/ndb/_retry.py", line 78, in retry_wrapper
    raise error
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/google/cloud/ndb/_retry.py", line 73, in retry_wrapper
    result = yield result
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/google/cloud/ndb/tasklets.py", line 312, in _advance_tasklet
    yielded = self.generator.send(send_value)
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/google/cloud/ndb/_datastore_api.py", line 108, in rpc_call
    call = method.future(request, timeout=timeout)
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/grpc/_channel.py", line 595, in future
    (operations,), event_handler, self._context)
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/grpc/_channel.py", line 863, in create
    _run_channel_spin_thread(state)
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/grpc/_channel.py", line 827, in _run_channel_spin_thread
    channel_spin_thread.start()
  File "src/python/grpcio/grpc/_cython/_cygrpc/fork_posix.pyx.pxi", line 119, in grpc._cython.cygrpc.ForkManagedThread.start
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py", line 847, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
ERROR    2020-02-21 15:09:44 _plugin_wrapping.py:83] Traceback (most recent call last):
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/grpc/_plugin_wrapping.py", line 79, in __call__
    callback_state, callback))
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/google/auth/transport/grpc.py", line 87, in __call__
    future = self._pool.submit(self._get_authorization_headers, context)
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/concurrent/futures/thread.py", line 160, in submit
    self._adjust_thread_count()
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/concurrent/futures/thread.py", line 181, in _adjust_thread_count
    t.start()
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py", line 847, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
ERROR    2020-02-21 15:09:44 base_client.py:281] Traceback (most recent call last):
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/csg/logger/request_logger_clients/base_client.py", line 277, in process_exception
    raise exception
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/webapp2.py", line 604, in dispatch
    return method(*args, **kwargs)
  File "/Users/tcross/development/csgapi/handlers/cloud_task_handler.py", line 10, in post
    CloudTaskQueue.process_cloud_task(request_body)
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/csg/tasks/queues/cloud_task_queue.py", line 51, in process_cloud_task
    execute_task(func, **request_body)
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/csg/tasks/execute.py", line 14, in execute_task
    func(**kwargs)
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/csg/tasks/deferred_pickle/deferred.py", line 117, in run_pickle
    func_, args_ = manager.get(key)
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/csg/tasks/deferred_pickle/db_interface/ndb/ndb_manager.py", line 37, in get
    d_pickle = key.get()
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/google/cloud/ndb/_options.py", line 89, in wrapper
    return wrapped(*pass_args, **kwargs)
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/google/cloud/ndb/utils.py", line 110, in positional_wrapper
    return wrapped(*args, **kwds)
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/google/cloud/ndb/key.py", line 811, in get
    return self.get_async(_options=_options).result()
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/google/cloud/ndb/tasklets.py", line 190, in result
    self.check_success()
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/google/cloud/ndb/tasklets.py", line 137, in check_success
    raise self._exception
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/google/cloud/ndb/tasklets.py", line 309, in _advance_tasklet
    self.generator.throw(type(error), error, traceback)
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/google/cloud/ndb/key.py", line 896, in get
    entity_pb = yield _datastore_api.lookup(self._key, _options)
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/google/cloud/ndb/tasklets.py", line 309, in _advance_tasklet
    self.generator.throw(type(error), error, traceback)
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/google/cloud/ndb/_datastore_api.py", line 169, in lookup
    entity_pb = yield batch.add(key)
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/google/cloud/ndb/tasklets.py", line 309, in _advance_tasklet
    self.generator.throw(type(error), error, traceback)
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/google/cloud/ndb/_retry.py", line 78, in retry_wrapper
    raise error
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/google/cloud/ndb/_retry.py", line 73, in retry_wrapper
    result = yield result
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/google/cloud/ndb/tasklets.py", line 312, in _advance_tasklet
    yielded = self.generator.send(send_value)
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/google/cloud/ndb/_datastore_api.py", line 108, in rpc_call
    call = method.future(request, timeout=timeout)
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/grpc/_channel.py", line 595, in future
    (operations,), event_handler, self._context)
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/grpc/_channel.py", line 863, in create
    _run_channel_spin_thread(state)
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/grpc/_channel.py", line 827, in _run_channel_spin_thread
    channel_spin_thread.start()
  File "src/python/grpcio/grpc/_cython/_cygrpc/fork_posix.pyx.pxi", line 119, in grpc._cython.cygrpc.ForkManagedThread.start
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py", line 847, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
INFO:root:default: "POST /cloud_task_queue HTTP/1.1" 500 90 2.07239s
[2020-02-21 15:09:46 -0600] [3172] [DEBUG] POST /cloud_task_queue
INFO     2020-02-21 15:09:46 base_client.py:234] default: "POST /cloud_task_queue HTTP/1.1" 500 90 2.07239s
Exception in thread Thread-54699:
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/tcross/development/virtual-environments/csgapi/lib/python3.7/site-packages/grpc/_channel.py", line 815, in channel_spin
    event = state.channel.next_call_event()
  File "src/python/grpcio/grpc/_cython/_cygrpc/channel.pyx.pxi", line 464, in grpc._cython.cygrpc.Channel.next_call_event
  File "src/python/grpcio/grpc/_cython/_cygrpc/channel.pyx.pxi", line 150, in grpc._cython.cygrpc._next_call_event
  File "src/python/grpcio/grpc/_cython/_cygrpc/completion_queue.pyx.pxi", line 67, in grpc._cython.cygrpc._latent_event
  File "src/python/grpcio/grpc/_cython/_cygrpc/credentials.pyx.pxi", line 56, in grpc._cython.cygrpc._get_metadata
  File "src/python/grpcio/grpc/_cython/_cygrpc/credentials.pyx.pxi", line 31, in grpc._cython.cygrpc._spawn_callback_async
  File "src/python/grpcio/grpc/_cython/_cygrpc/credentials.pyx.pxi", line 22, in grpc._cython.cygrpc._spawn_callback_in_thread
  File "src/python/grpcio/grpc/_cython/_cygrpc/fork_posix.pyx.pxi", line 119, in grpc._cython.cygrpc.ForkManagedThread.start
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py", line 847, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread

possibly related to #336

@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Feb 24, 2020
@chrisrossi chrisrossi self-assigned this Feb 25, 2020
@chrisrossi chrisrossi added 🚨 This issue needs some love. and removed triage me I really want to be triaged. labels Feb 25, 2020
@chrisrossi
Copy link
Contributor

@chmoder #336 seems to be specifically related to App Engine. Are you running this in App Engine or something newer?

@chmoder
Copy link
Contributor Author

chmoder commented Feb 25, 2020

This happens on my computer locally - I can see the thread count go up in a python process with each call to key.get.

We also use app engine standard37 and flexible python37.

@chrisrossi
Copy link
Contributor

I'm trying to reproduce this and haven't so far.

Here's a sample app I'm using to test:
https://gist.github.com/chrisrossi/352efe8e4133de59e0ddb818c4138ad9

Here is a transcript of a session:

(flask-ndb) chris@spirit:~/proj/google/flask-ndb$ curl http://localhost:5000/threads
<_MainThread(MainThread, started 139848366495552)>
<Thread(Thread-52, started daemon 139848297903872)>
(flask-ndb) chris@spirit:~/proj/google/flask-ndb$ curl http://localhost:5000/make/hay
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>Redirecting...</title>
<h1>Redirecting...</h1>
<p>You should be redirected automatically to target URL: <a href="/show/5676714420076544">/show/5676714420076544</a>.  If not click the link.(flask-ndb) chris@spirit:~/proj/google/flask-ndb$ curl http://localhost:5000/show/5676714420076544
{'key': Key('SomeThing', 5676714420076544), 'properties': {'foo': 'hay'}}
(flask-ndb) chris@spirit:~/proj/google/flask-ndb$ curl http://localhost:5000/threads
<_MainThread(MainThread, started 139848366495552)>
<Thread(Thread-59, started daemon 139848297903872)>
(flask-ndb) chris@spirit:~/proj/google/flask-ndb$ curl http://localhost:5000/show/5676714420076544
{'key': Key('SomeThing', 5676714420076544), 'properties': {'foo': 'hay'}}
(flask-ndb) chris@spirit:~/proj/google/flask-ndb$ curl http://localhost:5000/threads
<_MainThread(MainThread, started 139848366495552)>
<Thread(Thread-61, started daemon 139848297903872)>
(flask-ndb) chris@spirit:~/proj/google/flask-ndb$ 

The threads remain constant, despite the call to key.get().

One difference is my sample app creates the client only once and encapsulates the context in a middleware. I wonder if you still see this issue using the middleware in my sample app rather than instantiating a client in a Flask view?

Also, you can try upgrading to NDB 1.0.1.

@chmoder
Copy link
Contributor Author

chmoder commented Feb 27, 2020

Thanks for the suggestion @chrisrossi! There is not a lot of good examples for how to properly make a middleware for webapp2. I'm trying to fit it into my sprint - I will let you know how it goes when we have time.

@chrisrossi
Copy link
Contributor

Here is your example adapted to include the middleware:
https://gist.github.com/chrisrossi/559413684cebd0fbe6f50bba9101c48f

I've done some testing with this and haven't found a thread leak. I'll do a little more stress testing before putting this away, though.

@chrisrossi
Copy link
Contributor

Ok, I just realized the cache was masking the issue. I have been able to reproduce the thread leak. Since NDB doesn't create threads, it's obviously an issue with something we're using. It's unclear at this time if there will be a solution on the NDB side, or if it will wind up being something for the maintainers of a library we depend on.

@chrisrossi
Copy link
Contributor

I've also confirmed that the leak occurs with webapp2 but not Flask, which makes it more likely to be related to/the same as #336.

@andrewsg
Copy link
Contributor

That's very interesting!! I wonder if this category of problems is an issue with webapp2 specifically and not GAE Py2 in general. If so it would be much easier to fix. Time permitting, let's try to verify that #336 is reproducible on webapp2 but not on flask, and, especially, if both this and #336 are reproducible locally, outside of GAE Py2.

@chmoder
Copy link
Contributor Author

chmoder commented Feb 27, 2020

Thanks for the extra effort! I just verified your results; We still see the thread leak with the middleware as well.

More context: over the years we have seen memory leaks and thread leaks. Like variables not being destroyed after the handler finishes. Which leads to critical OOM errors on GAE.

My hunch is something to do with context managers. Like __exit__ not being called. Please feel free to ask for information, testing, or any kind of help!

@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Mar 2, 2020
@chrisrossi
Copy link
Contributor

I can verify that we're leaking grpc._plugin_wrapping._Plugin on each request. They're getting left behind with refcount of 1, but whatever is referencing it is not tracked by the garbage collector, as the garbage collector has 0 referrers for that object.

Anyway. Still investigating. This is a tricky one.

chrisrossi pushed a commit to chrisrossi/python-ndb that referenced this issue Mar 11, 2020
This brings our practice in line with `google.cloud.datastore`, which
also creates one channel per client. This works around a resource leak issue
by not requiring the channel to clean up after itself properly in normal
usage. The root cause of that issue seems to lie somewhere in
`google.auth`, which is where I will follow up.

Fixes googleapis#343
@chrisrossi
Copy link
Contributor

#362 works around this issue. You will still have to limit yourself to one ndb.Client per process, rather than creating multiple clients. I've just moved the leak to the client instead of the context, so that under normal circumstances you won't notice it. I will follow up with the Google core team about the resource leak, which ultimately is not in NDB.

@chmoder
Copy link
Contributor Author

chmoder commented Mar 11, 2020

Thanks @chrisrossi and team. I verified the resource leak is gone after using python-ndb:master and the middleware as demonstrated here

Nice work! It would be nice to be able to follow the progress in google.auth, but understand if it is internal.

chrisrossi pushed a commit that referenced this issue Mar 11, 2020
This brings our practice in line with `google.cloud.datastore`, which
also creates one channel per client. This works around a resource leak issue
by not requiring the channel to clean up after itself properly in normal
usage. The root cause of that issue seems to lie somewhere in
`google.auth`, which is where I will follow up.

Fixes #343
@waltermoreira
Copy link

I'm in App Engine Standard on Python 2.7, and I'm still seeing the issue after installing python-ndb:master and adding a middleware to create only one ndb.Client as @chmoder suggested in their snippet.

Has someone tried @chrisrossi 's fix on Standard 2.7? And any ideas for a workaround?

@chrisrossi
Copy link
Contributor

Hi @waltermoreira I did some testing just now with Standard 2.7 and I'm not seeing a thread leak. Is it possible you're getting NDB from PyPi and not GitHub? I don't think this particular fix has seen release yet.

@andrewsg
Copy link
Contributor

I'll go ahead and cut a release now so that that isn't an issue going forward.

@waltermoreira
Copy link

Thanks for the response @chrisrossi and @andrewsg . I did make sure to install from github, and I checked I'm actually importing the new code by adding logging to it.

I could be doing the instantiation of the client wrong, though. I know it might be out of scope, @chrisrossi , but would you mind sharing if possible your experiment?

@chrisrossi
Copy link
Contributor

@waltermoreira

Sure, here is a test app:

import flask
import pprint
import threading

from google.cloud import ndb


def ndb_wsgi_middleware(wsgi_app):
    def middleware(environ, start_response):
        global_cache = None   #ndb.RedisCache.from_environment()
        with client.context(global_cache=global_cache):
            return wsgi_app(environ, start_response)

    return middleware


client = ndb.Client()

app = flask.Flask("NDB Example")
app.wsgi_app = ndb_wsgi_middleware(app.wsgi_app)  # Wrap the app in middleware.


class SomeThing(ndb.Model):
    foo = ndb.StringProperty()


@app.route("/")
def hello_world():
    return text("is this thing on?")


@app.route("/make/<foo>")
def make_thing(foo):
    thing = SomeThing(foo=foo)
    key = thing.put()
    return flask.redirect("/show/{}".format(key.id()), code=302)


@app.route("/show/<id>")
def show_thing(id):
    key = ndb.Key("SomeThing", int(id))
    thing = key.get()
    if thing is None:
        flask.abort(404)

    return text(thing_json(thing))


@app.route("/threads")
def threads():
    output = "\n".join(map(str, threading.enumerate()))
    return text(output)


def thing_json(thing):
    return {
        "key": thing.key,
        "properties": thing.to_dict(),
    }


def text(s, status=200):
    if not isinstance(s, str):
        s = pprint.pformat(s)

    return s + "\n", status, {
        "Content-type": "text/plain; charset=utf-8",
    }

Here it is in action:

chris@spirit:~$ curl -s https://impactful-web-725.appspot.com
is this thing on?
chris@spirit:~$ curl -s https://impactful-web-725.appspot.com/make/vaccine
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>Redirecting...</title>
<h1>Redirecting...</h1>
<p>You should be redirected automatically to target URL: <a href="/show/5738904997068800">/show/5738904997068800</a>.  If not click the link.chris@spirit:~$ 
chris@spirit:~$ curl -s https://impactful-web-725.appspot.com/show/5738904997068800
{'key': Key(u'SomeThing', 5738904997068800L),
 'properties': {'foo': u'vaccine'}}
chris@spirit:~$ curl -s https://impactful-web-725.appspot.com/threads
<_MainThread(MainThread, started 68648921100160)>
<_DummyThread(Dummy-2, started daemon 68648809961216)>
<Thread(ThreadPoolExecutor-0_0, started daemon 68648767063808)>
<_DummyThread(Dummy-4, started daemon 68648881264384)>
<_DummyThread(Dummy-7, started daemon 68648777549568)>
chris@spirit:~$ curl -s https://impactful-web-725.appspot.com/threads | grep ThreadPoolExecutor | wc -l
1
chris@spirit:~$ curl -s https://impactful-web-725.appspot.com/show/5738904997068800
{'key': Key(u'SomeThing', 5738904997068800L),
 'properties': {'foo': u'vaccine'}}
chris@spirit:~$ curl -s https://impactful-web-725.appspot.com/threads | grep ThreadPoolExecutor | wc -l
1
chris@spirit:~$ curl -s https://impactful-web-725.appspot.com/show/5738904997068800
{'key': Key(u'SomeThing', 5738904997068800L),
 'properties': {'foo': u'vaccine'}}
chris@spirit:~$ curl -s https://impactful-web-725.appspot.com/threads | grep ThreadPoolExecutor | wc -l
1
chris@spirit:~$ curl -s https://impactful-web-725.appspot.com/show/5738904997068800
{'key': Key(u'SomeThing', 5738904997068800L),
 'properties': {'foo': u'vaccine'}}
chris@spirit:~$ curl -s https://impactful-web-725.appspot.com/threads | grep ThreadPoolExecutor | wc -l
1

@waltermoreira
Copy link

@chrisrossi Thank you so much for the example.

This is weird. There must be something wrong with our environment. I copied and pasted your code and I'm still getting a hanging thread at the end of the request to /make/<foo>. I'm attaching the log (although, of course, I don't expect you to debug it for us, just informational). Notice line 70 where it shows the logs for the thread.

From your experience with appengine, do you have a suggestion to where to concentrate our debugging?

  1 {
  2   "protoPayload": {
  3     "@type": "type.googleapis.com/google.appengine.logging.v1.RequestLog",
  4     "appId": "s~searchbertha-qa1",
  5     "versionId": "walter-cloud-ndb",
  6     "requestId": "5e725c6500ff0483f66f98b9270001737e7365617263686265727468612d716131000177616c7465722d636c6f75642d6e6462000100",
  7     "ip": "24.155.241.56",
  8     "startTime": "2020-03-18T17:37:41.295926Z",
  9     "endTime": "2020-03-18T17:38:41.492576Z",
 10     "latency": "60.196650s",
 11     "method": "GET",
 12     "resource": "/make/foo",
 13     "httpVersion": "HTTP/1.1",
 14     "status": 500,
 15     "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
 16     "urlMapEntry": "search_v2.app2.app",
 17     "host": "walter-cloud-ndb-dot-searchbertha-qa1.appspot.com",
 18     "instanceIndex": -1,
 19     "finished": true,
 20     "instanceId": "00c61b117c68521f8e36cd7dc188b77ba27d3dd48f4e8f574556dde14a56b82c542e8e4be1",
 21     "line": [
 22       {
 23         "time": "2020-03-18T17:37:41.362049Z",
 24         "severity": "DEBUG",
 25         "logMessage": "Commit(mode: NON_TRANSACTIONAL\nmutations {\n  upsert {\n    key {\n      partition_id {\n        project_id: \"searchbertha-qa1\"\n      }\n      path {\n        kind: \"SomeThing\"\n      }\n    }\n    properties {\n      key: \"foo\"\n      value {\n        string_value: \"foo\"\n      }\n    }\n  }\n}\nproject_id: \"searchbertha-qa1\"\n)",
 26         "sourceLocation": {
 27           "file": "/base/data/home/apps/s~searchbertha-qa1/walter-cloud-ndb.425350201042821690/lib/standard/google/cloud/ndb/_datastore_api.py",
 28           "line": "85",
 29           "functionName": "rpc_call"
 30         }
 31       },
 32       {
 33         "time": "2020-03-18T17:37:41.362245Z",
 34         "severity": "DEBUG",
 35         "logMessage": "timeout=None",
 36         "sourceLocation": {
 37           "file": "/base/data/home/apps/s~searchbertha-qa1/walter-cloud-ndb.425350201042821690/lib/standard/google/cloud/ndb/_datastore_api.py",
 38           "line": "86",
 39           "functionName": "rpc_call"
 40         }
 41       },
 42       {
 43         "time": "2020-03-18T17:37:41.836019Z",
 44         "severity": "DEBUG",
 45         "logMessage": "mutation_results {\n  key {\n    partition_id {\n      project_id: \"searchbertha-qa1\"\n    }\n    path {\n      kind: \"SomeThing\"\n      id: 6191186996101120\n    }\n  }\n  version: 1584553061580000\n}\nindex_updates: 3",
 46         "sourceLocation": {
 47           "file": "/base/data/home/apps/s~searchbertha-qa1/walter-cloud-ndb.425350201042821690/lib/standard/google/cloud/ndb/_datastore_api.py",
 48           "line": "786",
 49           "functionName": "_process_commit"
 50         }
 51       },
 52       {
 53         "time": "2020-03-18T17:37:41.846910Z",
 54         "severity": "INFO",
 55         "logMessage": "Saved; key: __appstats__:061300, part: 110 bytes, full: 15683 bytes, overhead: 0.001 + 0.006; link: http://walter-cloud-ndb-dot-searchbertha-qa1.appspot.com/_ah/stats/details?time=1584553061338",
 56         "sourceLocation": {
 57           "file": "/base/alloc/tmpfs/dynamic_runtimes/python27g/38de6b60e0e1d177/python27/python27_lib/versions/1/google/appengine/ext/appstats/recording.py",
 58           "line": "676",
 59           "functionName": "save"
 60         }
 61       },
 62       {
 63         "time": "2020-03-18T17:38:41.491175Z",
 64         "severity": "ERROR",
 65         "logMessage": "Thread running after request. Creation traceback:\n  File \"/base/alloc/tmpfs/dynamic_runtimes/python27g/38de6b60e0e1d177/python27/python27_lib/versions/1/google/appengine/runtime/runtime.py\", line 80, in Run\n    target(*args, **kw)\n  File \"/base/alloc/tmpfs/dynamic_runtimes/python27g/38de6b60e0e1d177/python27/python27_dist/lib/python2.7/threading.py\", line 794, in __bootstrap\n    self.__bootstrap_inner()\n  File \"/base/alloc/tmpfs/dynamic_runtimes/python27g/38de6b60e0e1d177/python27/python27_dist/lib/python2.7/threading.py\", line 821, in __bootstrap_inner\n    self.run()\n  File \"/base/alloc/tmpfs/dynamic_runtimes/python27g/38de6b60e0e1d177/python27/python27_dist/lib/python2.7/threading.py\", line 774, in run\n    self.__target(*self.__args, **self.__kwargs)\n  File \"/base/alloc/tmpfs/dynamic_runtimes/python27g/38de6b60e0e1d177/python27/python27_lib/versions/third_party/grpcio-1.0.0/grpc/_channel.py\", line 678, in channel_spin\n    event = state.completion_queues[key].poll()\n  File \"/base/alloc/tmpfs/dynamic_runtimes/python27g/38de6b60e0e1d177/python27/python27_lib/versions/third_party/grpcio-1.0.0/grpc/_plugin_wrapping.py\", line 107, in __call__\n    wrapped_context, AuthMetadataPluginCallback(wrapped_cygrpc_callback))\n  File \"/base/data/home/apps/s~searchbertha-qa1/walter-cloud-ndb.425350201042821690/lib/standard/google/auth/transport/grpc.py\", line 87, in __call__\n    future = self._pool.submit(self._get_authorization_headers, context)\n  File \"/base/data/home/apps/s~searchbertha-qa1/walter-cloud-ndb.425350201042821690/lib/standard/concurrent/futures/thread.py\", line 137, in submit\n    self._adjust_thread_count()\n  File \"/base/data/home/apps/s~searchbertha-qa1/walter-cloud-ndb.425350201042821690/lib/standard/concurrent/futures/thread.py\", line 159, in _adjust_thread_count\n    t.start()\n  File \"/base/alloc/tmpfs/dynamic_runtimes/python27g/38de6b60e0e1d177/python27/python27_dist/lib/python2.7/threading.py\", line 749, in start\n    _start_new_thread(self.__bootstrap, ())\n  File \"/base/alloc/tmpfs/dynamic_runtimes/python27g/38de6b60e0e1d177/python27/python27_lib/versions/1/google/appengine/runtime/runtime.py\", line 83, in StartNewThread\n    return base_start_new_thread(Run, ())\n"
 66       },
 67       {
 68         "time": "2020-03-18T17:38:41.492550Z",
 69         "severity": "WARNING",
 70         "logMessage": "Threads started by this request continued executing past the hard deadline."
 71       }
 72     ],
 73     "appEngineRelease": "1.9.71",
 74     "traceId": "d2a93d69a4baf0647bd24a6cd98d9f55",
 75     "first": true,
 76     "traceSampled": true
 77   },
 78   "insertId": "5e725ca10007858a874db0f7",
 79   "httpRequest": {
 80     "status": 500
 81   },
 82   "resource": {
 83     "type": "gae_app",
 84     "labels": {
 85       "module_id": "default",
 86       "zone": "us12",
 87       "project_id": "searchbertha-qa1",
 88       "version_id": "walter-cloud-ndb"
 89     }
 90   },
 91   "timestamp": "2020-03-18T17:37:41.295926Z",
 92   "severity": "ERROR",
 93   "labels": {
 94     "clone_id": "00c61b117c68521f8e36cd7dc188b77ba27d3dd48f4e8f574556dde14a56b82c542e8e4be1"
 95   },
 96   "logName": "projects/searchbertha-qa1/logs/appengine.googleapis.com%2Frequest_log",
 97   "operation": {
 98     "id": "5e725c6500ff0483f66f98b9270001737e7365617263686265727468612d716131000177616c7465722d636c6f75642d6e6462000100",
 99     "producer": "appengine.googleapis.com/request_id",
100     "first": true,
101     "last": true
102   },
103   "trace": "projects/searchbertha-qa1/traces/d2a93d69a4baf0647bd24a6cd98d9f55",
104   "receiveTimestamp": "2020-03-18T17:38:41.495223513Z",
105   "traceSampled": true
106 }

@chrisrossi
Copy link
Contributor

Hi @waltermoreira

The issue of threads staying open beyond the end of an App Engine request is actually issue #336. I thought they might be related, but it looks like I need to turn my attention to that one. The reporters suggest a workaround of calling gc.collect(). If you try the workaround, let me know if it works for you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. 🚨 This issue needs some love. status: investigating The issue is under investigation, which is determined to be non-trivial. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants