Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-30996: Implement log4cxx appender which logs to Python logging #55

Merged
merged 13 commits into from
Jul 12, 2021
92 changes: 88 additions & 4 deletions doc/mainpage.dox
Original file line number Diff line number Diff line change
Expand Up @@ -359,7 +359,7 @@ or alternatively using `%%X` format code without additional key to display full
In C++, the following MDC macros are available:
- `LOG_MDC(key, value)` Map the value '''`value`''' to the global key '''`key`''' such that it may be included in subsequent log messages by including the directive `%%X{`'''`key`'''`}` in the formatting string of an associated appender.
- `LOG_MDC_REMOVE(key)` Delete the existing value from the global map that is associated with '''`key`'''.
- `LOG_MDC_SCOPE(key, value)` Adds key/value to MDC and restores previous value when execution leaves the scope. Typically used at the beggining of the function if one needs to define MDC key/value for the whole duration of the function.
- `LOG_MDC_SCOPE(key, value)` Adds key/value to MDC and restores previous value when execution leaves the scope. Typically used at the beginning of the function if one needs to define MDC key/value for the whole duration of the function.

In Python, the `lsst.log` module provides the following MDC functions:
- `MDC(key, value)` Map the value '''`value`''' to the global key '''`key`''' such that it may be included in subsequent log messages by including the directive `%%X{`'''`key`'''`}` in the formatting string of an associated appender. Note that `value` is converted to a string by Python before it is stored in the MDC.
Expand Down Expand Up @@ -393,7 +393,7 @@ This has to be done after fork() and corresponding formatting code needs to be a

\subsection TID Display thread ID

The conversion pattern in log4cxx has special conversion code (`%%t`) to render "thread ID", the rendered value is a pointer to `thread_t` object (on POSIX systems) and is rendered as a long string of hexadecimal digits which makes it difficult to read for non-hardcore programmers. `lsst.log` provides more human-friendly way to print thread-identifying information. Special function `lsst::log::lwpID()` returns light-weight process ID (LWP) on the plaftorms that support this feature and small incremental integer number on other platforms. Using this function one can add LWP to MDC and configure output format to include this special MDC key:
The conversion pattern in log4cxx has special conversion code (`%%t`) to render "thread ID", the rendered value is a pointer to `thread_t` object (on POSIX systems) and is rendered as a long string of hexadecimal digits which makes it difficult to read for non-hardcore programmers. `lsst.log` provides more human-friendly way to print thread-identifying information. Special function `lsst::log::lwpID()` returns light-weight process ID (LWP) on the platftorms that support this feature and small incremental integer number on other platforms. Using this function one can add LWP to MDC and configure output format to include this special MDC key:
andy-slac marked this conversation as resolved.
Show resolved Hide resolved

In C++:

Expand All @@ -412,9 +412,9 @@ In Python:

\subsection MDCthreads MDC and multi-threading

MDC is a great tool to associate some context information with the messages but its use in multi-threaded applications can be complicated. MDC is per-thread instance and it is initialized as empty in every new thread. If application adds somethig to MDC in main thread this information will not exist in other threads unless it is explicitly set in those other threads. This may be an issue for applications which do not have complete control over thread lifetime.
MDC is a great tool to associate some context information with the messages but its use in multi-threaded applications can be complicated. MDC is per-thread instance and it is initialized as empty in every new thread. If application adds something to MDC in main thread this information will not exist in other threads unless it is explicitly set in those other threads. This may be an issue for applications which do not have complete control over thread lifetime.

This package provides a mechanism to initialize MDC in all threads started by a process. Special macro `LOG_MDC_INIT(function)` can be used to register a function to be called in in every new thread (and in current thread) before any logging message is sent to a logging system. This function can do anything but its main pupose is to initialize MDC with useful values.
This package provides a mechanism to initialize MDC in all threads started by a process. Special macro `LOG_MDC_INIT(function)` can be used to register a function to be called in in every new thread (and in current thread) before any logging message is sent to a logging system. This function can do anything but its main purpose is to initialize MDC with useful values.

Here is an example of this:

Expand Down Expand Up @@ -451,6 +451,90 @@ or in Python:
One should note that this initialization function will only be called in the current thread from which `LOG_MDC_INIT()` was called and all new threads, if there are some other threads running at a time of this call the function will not be called for them. Normally the function registered with `LOG_MDC_INIT()` will be called only once per thread but under some circumstances it may be called more than once. Registering the same function multiple times will result is multiple calls to the same function.


\section logToPython Redirecting to Python logging

For many Python applications it is preferable to have everything forwarded to the Python `logging` system.
This package implements a special `log4cxx` appender class that supports this functionality.
To simply forward all log4cxx events to Python one needs to configure `log4cxx` using this special appender (and which is naturally done from Python):

import lsst.log

PROP = """
log4j.rootLogger = INFO, PyLog
log4j.appender.PyLog = org.apache.log4j.PyLogAppender
"""
lsst.log.configure_prop(PROP)

Logging level configuration needs some special care in this case, e.g. if DEBUG level output is needed then DEBUG level needs to be enabled in both `log4cxx` and `logging`.
Alternative is to always configure `log4cxx` with TRACE level and adjust `logging` configuration to a selected level.
Latter option has performance implications as all logging messages will be generated and formatted at C++ level which can slow down things significantly.

One complication with this scheme is support for MDC.
`PyLogAppender` converts MDC to a Python dictionary-like object and adds it as an `MDC` attribute to a `LogRecord` instance to make MDC accessible on Python side.
If `MDC` attribute already exists in `LogRecord` (e.g. it is added by record factory) `PyLogAppender` assumes that it behaves like a dictionary and updates it with MDC contents.
Unfortunately accessing MDC attribute in a portable way is not very easy in Python.
One can render MDC using a regular `"%(MDC)s"` expressions in a `logging` format string, but that requires `MDC` attribute to be present in every instance of `LogRecord` which may not be true for records originated from other sources.
There are two possible workarounds to handle this complication described below.

\subsection logToPythonMDC1 Rendering MDC in `log4cxx`

One option is to render MDC into the log message before it is passed to Python.
This can be done by specifying `MessagePattern` configuration option for the appender, e.g.:

PROP = """
log4j.rootLogger = INFO, PyLog
log4j.appender.PyLog = org.apache.log4j.PyLogAppender
log4j.appender.PyLog.MessagePattern = %m (%X{LABEL})
"""
lsst.log.configure_prop(PROP)

`MessagePattern` value has the same format as `ConversionPattern` option of `PatternLayout` and can potentially include other items.
If `MessagePattern` is missing it has the same effect as specifying `%m`.
In this example if MDC contains "LABEL" key it will be added to a message enclosed in parentheses.
Note that message formatting still happens in Python, above configuration only changes the `message` part of the output and not the whole record format.
andy-slac marked this conversation as resolved.
Show resolved Hide resolved

\subsection logToPythonMDC2 Rendering MDC in `logging`

It is also possible to specify MDC formatting in a `logging` format string but one has to make sure that corresponding attribute exists in every generated record.
The easiest way to guarantee that is to implement a `logging` record factory method following this pattern:

import lsst.log
import lsst.log.utils
import logging

old_factory = logging.getLogRecordFactory()

def record_factory(*args, **kwargs):
record = old_factory(*args, **kwargs)
record.MDC = lsst.log.utils._MDC()
return record

logging.setLogRecordFactory(record_factory)

PROP = """
log4j.rootLogger = DEBUG, PyLog
log4j.appender.PyLog = org.apache.log4j.PyLogAppender
"""

lsst.log.configure_prop(PROP)
andy-slac marked this conversation as resolved.
Show resolved Hide resolved
logging.basicConfig(
level=logging.INFO,
format="%(levelname)s %(asctime)s %(name)s (%(MDC)s) %(filename)s:%(lineno)s - %(message)s")

The record factory adds an `MDC` attribute to every record using special `_MDC` class instance, same type that is used by PyLogAppender when `MDC` attribute does not exist in `LogRecord`.
andy-slac marked this conversation as resolved.
Show resolved Hide resolved
`_MDC` class is a dictionary with a modified formatting which is more suitable for logging output. It also act similarly to `defaultdict` returning empty string for any key that does not exist in a dictionary.

It is possible to render individual MDC items in formatted message using format-style option for format string which supports indexing access in format expressions:

logging.basicConfig(
level=logging.INFO,
style="{",
format="{levelname} {asctime} {name} ({MDC[LABEL]}) {filename}:{lineno} - {message}")

If "LABEL" key is present in MDC then it will appear in the output, if "LABEL" is missing then empty string is rendered thanks to defaultdict-like behavior of `_MDC` objects.
As before this depends on `MDC` present in every LogRecord so it has to be added by a record factory.
andy-slac marked this conversation as resolved.
Show resolved Hide resolved


\section benchmarks Benchmarks

Measuring the performance of lsst.log when actually writing log messages to output targets such as a file or socket provides little to no information due to buffering and the fact that in the absence of buffering these operations are I/O limited. Conversely, timing calls to log functions when the level threshold is not met is quite valuable since an ideal logging system would add no appreciable overhead when deactivated. Basic measurements of the performance of Log have been made with the level threshold such that logging messages are not written. These measurements are made within a single-node instance of Qserv running on lsst-dev03 without significant competition from other system activity. The average time required to submit the following suppressed log message is 26 nanoseconds:
Expand Down
2 changes: 1 addition & 1 deletion lib/SConscript
Original file line number Diff line number Diff line change
@@ -1,3 +1,3 @@
# -*- python -*-
from lsst.sconsUtils import scripts
scripts.BasicSConscript.lib()
scripts.BasicSConscript.lib(libs="self python")
30 changes: 30 additions & 0 deletions python/lsst/log/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -150,3 +150,33 @@ def disable_notebook_logging():
if _redirect is not None:
_redirect.finish()
_redirect = None


class _MDC(dict):
"""Dictionary for MDC data.

This is internal class used for better formatting of MDC in Python logging
output. It behaves like `defaultdict(str)` but overrides ``__str__`` and
``__repr__`` method to produce output better suited for logging records.
It also provides namespace-like access to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"too."? Or "to keys"? Or something else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was obviously distracted by something important (squirrels or 🐱), will expand 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am actually inclined to drop attribute-like access and only keep indexing access. It may be convenient to say MDC.key instead of MDC[key] in format string but it does not work for keys that do not look like identifiers, and it sort of adds additional requirement on other implementations of MDC dictionary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with just doing **MDC and having the formatting find the right element. We can always add the .key later. In python logging I would probably add **MDC directly into the kwargs in the LoggingAdapter and pull them out of the LogRecord directly later on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that exposes you to potential attribute collision.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. If someone sets MDC to lineno or something. People can indeed shoot themselves in the foot like that but why would they? I don't think python lets me use a format string of %(extra['LABEL'])s though so how would I ever include it in a logging format string?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use style='{'.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Hmm. So we are okay with a LogFormatter using f-style but all the Loggers using %-style?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Loggers don't have an option for style (would be nice to have but probably hard to implement). Formatter style is set in the same place as format string so we should be OK if we do it consistently.

"""
def __getitem__(self, name: str):
"""Returns value for a given key or empty string for missing key.
"""
return self.get(name, "")

def __getattr__(self, name: str):
"""Return value as object attribute.
"""
return self.get(name, "")

def __str__(self):
"""Return string representation, strings are interpolated without
quotes.
"""
keys = sorted(self)
items = ("{}={!s}".format(k, self[k]) for k in keys)
andy-slac marked this conversation as resolved.
Show resolved Hide resolved
return "{" + ", ".join(items) + "}"

def __repr__(self):
return str(self)