Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is there a way to do threadlocal context for only certain binds? #222

Closed
radix opened this issue Sep 3, 2019 · 12 comments · Fixed by #225

Comments

@radix
Copy link
Contributor

@radix radix commented Sep 3, 2019

I looked at the threadlocal module to solve a problem I have, but from what I can tell, it has undesirable behavior in that it changes all my existing logging code. From what I can tell, all binds become thread-global. But what I want is: "bind this request user and request ID to all messages during this request", and have all the rest of my existing logging code work as it would otherwise.

@hynek

This comment has been minimized.

Copy link
Owner

@hynek hynek commented Sep 4, 2019

You could use an explicit wrap_logger when creating loggers for your requests?

@radix

This comment has been minimized.

Copy link
Contributor Author

@radix radix commented Sep 4, 2019

@hynek can you elaborate a little more?

Just in case it helps, here's what I want:

  1. in my request middleware, I want to use some explicit API to set a thread-local context with some specific binds (request ID, user making the request, whatever)
  2. application logic and third-party library code should act as it always has, by using its own structlog.get_logger() calls and doing .bind() which act like normal, local binds.
  3. any calls to logging methods use a combination of the explicitly thread-local binds + the local binds.

AFAICT this should be possible with a custom context class that is more complex than just being a singleton thread-local dict, so I'm going to try that out.

@radix

This comment has been minimized.

Copy link
Contributor Author

@radix radix commented Sep 4, 2019

Here's what I threw together. This is intended to be used with structlog.configure(context_class=ThreadLocalContext) (no wrap_dict call). I'm sure it's still buggy / super inefficient.

edit: to explain, the way this is used is by explicitly modifying THREADLOCAL_CONTEXT.context when you want to bind thread-global values; otherwise regular logging.bind() calls are strictly local, just like they are in regular structlog use.

edit 2 This is horrible, please see my comment below for a much simpler solution

import threading


THREADLOCAL_CONTEXT = threading.local()


def _ensure_threadlocal():
    if not hasattr(THREADLOCAL_CONTEXT, 'context'):
        THREADLOCAL_CONTEXT.context = {}


class _marker(object):
    pass


class ThreadLocalContext(object):
    """
    A dict-like object used as a structlog context which allows SOME
    thread-local keys, but still allows normal keys.
    """

    def __init__(self, *args, **kw):
        self._dict = {}
        if args and isinstance(args[0], type(self)):
            # our state is global, no need to look at args[0] if it's of our
            # class
            self._dict.update(**kw)
        else:
            self._dict.update(*args, **kw)

    def __repr__(self):
        return "<{0}({1!r})>".format(type(self).__name__, self._dict)

    def __eq__(self, other):
        return type(self) == type(other) and self._dict == other._dict

    def __ne__(self, other):
        return not self.__eq__(other)

    # Proxy methods necessary for structlog.
    # Dunder methods don't trigger __getattr__ so we need to proxy by hand.
    def __iter__(self):
        return iter(self.keys())

    def __setitem__(self, key, value):
        self._dict[key] = value

    def __getitem__(self, key):
        if key in self._dict:
            return self._dict[key]
        _ensure_threadlocal()
        return THREADLOCAL_CONTEXT.context[key]

    def __delitem__(self, key):
        self._dict.__delitem__(key)

    def __len__(self):
        _ensure_threadlocal()
        return len(THREADLOCAL_CONTEXT.context) + len(self._dict)

    def __getattr__(self, name):
        method = getattr(self._dict, name)
        return method

    def copy(self):
        return ThreadLocalContext(self._dict)

    def keys(self):
        _ensure_threadlocal()
        return set(self._dict.keys()).union(THREADLOCAL_CONTEXT.context.keys())

    def pop(self, k, d=_marker):
        if k in self._dict:
            return self._dict[k]
        _ensure_threadlocal()
        if d is _marker:
            return THREADLOCAL_CONTEXT.context[k]
        else:
            return THREADLOCAL_CONTEXT.context.get(k, d)
@hongymagic

This comment has been minimized.

Copy link

@hongymagic hongymagic commented Sep 9, 2019

This is exactly what I am trying to achieve using fastapi/starlette at the moment, where CorrelationIdMiddleware creates and "binds" a correlation ID per request. And have all downstreams calls resulting from the request (i.e., route handler, third party etc) add initial correlation id everytime it logs.

Being new to python, fastapi, starlette and structlog - am definately having hard time getting it all correct. Having issues where either:

  1. Initial value (correlation_id) generated via middleware is being lost; or
  2. Initial value carries across to another request (i.e., 2 parallel requests - the last one takes over, and all logs contain the correlation_id generated last)

So far, only way I could get it working is the ugly way of attaching bound logger to request.state and using it.

Edit 1: looks like threadlocal and asyncio doesn't work well
Edit 2: Found this on aiotask context

@radix

This comment has been minimized.

Copy link
Contributor Author

@radix radix commented Sep 9, 2019

Ok, I came up with a much easier solution that doesn't involve a dict-like class, or a custom context class at all:

import threading

THREADLOCAL = threading.local()

def merge_in_threadlocal(logger, method_name, event_dict):
    """A structlog processor that merges in a thread-local context"""
    _ensure_threadlocal()
    context = THREADLOCAL.context.copy()
    context.update(event_dict)
    return context

def clear_threadlocal():
    """Clear the thread-local context."""
    THREADLOCAL.context = {}

def bind_threadlocal(**kwargs):
    """Put keys and values into the thread-local context."""
    _ensure_threadlocal()
    THREADLOCAL.context.update(kwargs)

def _ensure_threadlocal():
    if not hasattr(THREADLOCAL, 'context'):
        THREADLOCAL.context = {}

Configuration:

structlog.configure(processors=[merge_in_threadlocal])

Then, in order to bind thread-local variables, you use clear_threadlocal() at the start of your request, and bind_threadlocal(client_ip=foo, user=bar) to bind them.

@hongymagic

This comment has been minimized.

Copy link

@hongymagic hongymagic commented Sep 10, 2019

We've created a context_class based on aiocontext and is working great so far! You can even use standard python logger with filters to mimic similar event_dict behaviour to structlog. (We are still using processors and formatters from structlog).

@radix

This comment has been minimized.

Copy link
Contributor Author

@radix radix commented Sep 10, 2019

@hongymagic I recommend trying an approach similar to my previous comment; you should be able to avoid using a custom context class entirely, and just merge in your async context as the first step in your processor chain. This will probably be more performant, not to mention simpler.

@radix

This comment has been minimized.

Copy link
Contributor Author

@radix radix commented Sep 10, 2019

@hynek I'm curious if you think this approach is a good idea for inclusion in structlog; to me it's hugely preferred over the "make all binds global" approach to the current structlog.threadlocal. Would you accept (for review) a PR that adds those functions to structlog.threadlocal?

@hynek

This comment has been minimized.

Copy link
Owner

@hynek hynek commented Sep 21, 2019

Looks good in general, since it doesn't seem to need any changes to structlog's core operation whatsoever?

What implementation timeframe do you have in mind? Should I hold 19.2 for it?

@radix

This comment has been minimized.

Copy link
Contributor Author

@radix radix commented Sep 21, 2019

@hynek I'll try to get a PR up today or tomorrow. In general I would not suggest waiting for me for anything :)

@hynek

This comment has been minimized.

Copy link
Owner

@hynek hynek commented Sep 22, 2019

Nah I love it, I want it in. :) Maybe I'll even start using threadlocals myself now. :)

@hynek hynek closed this in #225 Sep 23, 2019
@cetanu

This comment has been minimized.

Copy link

@cetanu cetanu commented Oct 16, 2019

I revisited the referenced fastapi issue today looking to allow a view function to bind more fields to the current logger, reached boiling point while trying to get contextvars to work and then used the snippet above. Works perfectly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.