Stuck / hangs at Initializing Jaeger Tracer with UDP reporter
#60
Comments
can you get a stack trace of where it's hanging? |
Threads:
C Stack Trace:
I wasn't able to get the Python stack trace unfortunately. |
can you share a minimum example to reproduce? |
I'm encountering this same issue when trying to integrate an application running under tornado in a docker container but as a WSGIApplication (so no async allowed). I also wasn't able to get a stack trace, but the issue appears to be in local_agent_net.py::_create_new_thread_loop(). It seems that things lock up as soon as self._thread_loop.is_ready() is called. |
tornado is used by jaeger_client internally. We have it working with WSGI services just fine, but you need to make sure the tracer is initialized post-fork. I have seen the locking issue before due to the way our internal framework is structured, namely that it performs initialization as part of an Do you have a reproducible example? |
It's completely reproducible in the context of our full production system. I tried to distill it down to a trivial example that is ordered in the same way using a tornado WSGIApplication, but in that example the deadlock doesn't happen. As far as I can tell, the structure of my toy example is essentially the same as our overall production code, so I'm not clear why it doesn't have the same issue. How would I ensure that I'm initializing the tracer post-fork? I'm going to try messing around with uwsgi's @postfork decorator. |
Yes, in our internal lib that wraps WSGI we're using |
https://docs.python.org/2/library/threading.html#importing-in-threaded-code
The trouble is in a lot of cases, Configuring jaeger from

    $ cat >foo.py
    from jaeger_client import Config
    config = Config(config={'logging': True}, service_name='postal-main')
    tracer = config.initialize_tracer()
    $ python foo.py
    <jaeger_client.tracer.Tracer object at 0x7f0996b1e490>

But try to import that module, and you get a deadlock:

    $ python
    Python 2.7.6 (default, Oct 26 2016, 20:30:19)
    [GCC 4.8.4] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import foo
    [deadlock] |
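The Python documentation linked in this thread warns that an import should not spawn a thread and then wait on it, because the spawned thread can block if it tries to import anything itself. One way to avoid that, sketched below, is to keep tracer creation out of module scope and create it on first use instead. The names `get_tracer` and `factory` are hypothetical and not part of jaeger_client; the sketch assumes Python 3.

```python
import threading

_tracer = None
_tracer_lock = threading.Lock()


def get_tracer(factory):
    """Return the process-wide tracer, creating it on the first call.

    `factory` is a hypothetical zero-argument callable. Nothing runs at
    import time, so importing this module never starts a thread.
    """
    global _tracer
    if _tracer is None:
        with _tracer_lock:
            # Re-check under the lock so concurrent callers build it once.
            if _tracer is None:
                _tracer = factory()
    return _tracer
```

In a real service the factory might be something like `lambda: Config(...).initialize_tracer()`, called from request-handling code rather than from a module that gets imported.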
@bitglue I added a section to the README, does it help? |
I think there may be two problems here. One, starting jaeger and then forking leads to problems. I'd guess it's because the children will be running the same event loop, on the same file descriptors. The other problem is this deadlock. As my case above illustrates, it's not necessary to fork anything to get this deadlock: all that's required is to import something which tries to initialize jaeger. With django in particular this is problematic: opentracing-contrib/python-django configures middleware (distinct from WSGI middleware) in a settings file: I don't believe there's a way to add middleware post-fork. The settings file is a module that gets imported. The middleware takes a tracer as a parameter. Obtaining a tracer implicitly starts jaeger_client's event loop. Thus it's not really possible to configure the django middleware to use jaeger without deadlocking. |
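For the forking half of the problem, one generic guard is to remember which process built an object and rebuild it when the process ID changes. The sketch below is only an illustration of that idea under stated assumptions: `PerProcess` and `factory` are hypothetical names, nothing here is jaeger_client API, and the class is not thread-safe as written.

```python
import os


class PerProcess:
    """Cache one object per process, rebuilding it after a fork.

    `factory` is a hypothetical zero-argument callable that creates the
    resource (for example, a tracer with its own threads and sockets).
    """

    def __init__(self, factory):
        self._factory = factory
        self._pid = None      # PID of the process that built self._value
        self._value = None

    def get(self):
        pid = os.getpid()
        if self._pid != pid:
            # First use, or we are in a forked child: build a fresh object
            # instead of reusing the parent's threads and file descriptors.
            self._value = self._factory()
            self._pid = pid
        return self._value
```

Constructing a `PerProcess` at import time is harmless because the factory only runs on the first `get()` call, which would normally happen inside a worker process.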
I'm in the same situation as @bitglue describes with Django / gunicorn. I guess because of django works, using gunicorn's |
I was able to work around the issue by extending the opentracing-contrib/python-django to handle the case of not yet having a tracer:

    class LazyOpenTracingMiddleware(OpenTracingMiddleware):
        '''Opentracing middleware which evaluates the tracer lazily.

        In part due to
        https://github.com/uber/jaeger-client-python/issues/60, and further because
        it's just a good idea to avoid issues like unintentionally shared file
        descriptors between processes, we need to initialize the jaeger tracer
        after the gunicorn fork. Since OpenTracingMiddleware is configured in
        MIDDLEWARE_CLASSES in sites/postmates/settings/base.py, which is imported
        pre-fork, this presents a problem.

        This class modifies OpenTracingMiddleware such that it will do nothing
        until a global tracer exists at `opentracing.tracer`.
        '''

        def __init__(self):
            '''Do nothing.

            OpenTracingMiddleware.__init__ only sets self._tracer, which we don't
            want to do.
            '''

        @property
        def _tracer(self):
            '''Get a DjangoTracer, or None if opentracing not initialized.

            Once opentracing is initialized (indicated by the presence of
            `opentracing.tracer`), will return the same instance of DjangoTracer in
            perpetuity.
            '''
            try:
                return self._django_tracer
            except AttributeError:
                pass

            try:
                tracer = opentracing.tracer
            except AttributeError:
                return None

            self._django_tracer = DjangoTracer(tracer)
            return self._tracer

        def process_view(self, *a, **kw):
            if self._tracer is None:
                return
            OpenTracingMiddleware.process_view(self, *a, **kw)

        def process_response(self, request, response):
            if self._tracer is None:
                return response
            return OpenTracingMiddleware.process_response(self, request, response)

And then I deferred the initialization of the tracer to gunicorn's post_fork hook:

    def post_fork(server, worker):
        config = jaeger_client.Config(
            config={
                'sampler': {'type': 'const', 'param': 1},
                'logging': True,
            },
            service_name='postal-main')
        config.initialize_tracer()  # sets opentracing.tracer global

Pretty hairy, but gets the job done. |
@bitglue Awesome, thanks for that! |
@bitglue what is your suggestion now to make it easier for the users? Should this lazy class be made a part of |
If I could get a tracer object but not start the event loop, that would solve it elegantly. Then the event loop can be controlled with For backwards compatibility perhaps I'm not a fan of my |
that's what we ended up doing internally as well. Has anyone taken a look at lightstep python client? why didn't it have a problem with django? I assume it also uses a bg thread for reporting. |
On a very cursory read, it doesn't appear to wait for the thread to become ready after starting it. That would explain why it does not deadlock. Not sure about the consequences of forking after starting the thread. It's possible if it's using blocking IO instead of epoll(), and some care is taken with the file descriptors to not have two processes on the same TCP connection (or there are no TCP connections?), it works out fine. |
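To make the "start the thread but don't wait for it" idea concrete, here is a minimal sketch of a background reporter whose `start()` returns immediately. It is not taken from the lightstep or jaeger clients: `BackgroundReporter`, `report`, and the in-memory `sent` list (standing in for a network send) are all hypothetical, and the code assumes Python 3.

```python
import queue
import threading


class BackgroundReporter:
    """Hypothetical span reporter; not the API of any real tracing client."""

    def __init__(self):
        # Constructing the reporter starts no thread and opens no socket.
        self._queue = queue.Queue()
        self._thread = None
        self.sent = []  # stands in for spans written to the network

    def start(self):
        # Start the worker and return at once. Nothing here blocks until
        # the worker signals readiness; the queue buffers spans meanwhile.
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def report(self, span):
        self._queue.put(span)

    def close(self):
        # A None sentinel tells the worker to drain and exit.
        self._queue.put(None)
        self._thread.join()

    def _run(self):
        while True:
            span = self._queue.get()
            if span is None:
                break
            self.sent.append(span)
```

Because spans are queued rather than handed to the thread directly, callers can report before the worker is running, which is the property that makes a readiness wait unnecessary in this design.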
👍 I'm bumping into a similar deadlock working with flask, and looking into post-forking options. Thanks for writing this issue, it's been a big help. |
#31 should address the deadlocking issue. Meanwhile I'll update the README |
I'm running the latest version of the client. When initialize_tracer() is called, it gets stuck / hangs at Initializing Jaeger Tracer with UDP reporter.

I'm running the agent in another Docker container that is bound to localhost ports, so it should be able to connect (I even changed the reporting host to my internal IP, rather than localhost).

EDIT: So, this is for my web app, it works when running in gunicorn.