Skip to content

Commit

Permalink
Merge pull request #55 from lsst/tickets/DM-30996
Browse files Browse the repository at this point in the history
DM-30996: Implement log4cxx appender which logs to Python logging
  • Loading branch information
andy-slac committed Jul 12, 2021
2 parents cc89fd6 + 51d3620 commit d2ef7b4
Show file tree
Hide file tree
Showing 9 changed files with 760 additions and 19 deletions.
15 changes: 7 additions & 8 deletions doc/conf.py
Original file line number Diff line number Diff line change
@@ -1,13 +1,12 @@
"""Sphinx configuration file for an LSST stack package.
This configuration only affects single-package Sphinx documenation builds.
This configuration only affects single-package Sphinx documentation builds.
"""

from documenteer.sphinxconfig.stackconf import build_package_configs
import lsst.log
from documenteer.conf.pipelinespkg import * # noqa: F403, import *


_g = globals()
_g.update(build_package_configs(
project_name='log',
version=lsst.log.version.__version__))
project = "log"
html_theme_options["logotext"] = project # noqa: F405, unknown name
html_title = project
html_short_title = project
doxylink = {}
102 changes: 98 additions & 4 deletions doc/mainpage.dox
Original file line number Diff line number Diff line change
Expand Up @@ -359,7 +359,7 @@ or alternatively using `%%X` format code without additional key to display full
In C++, the following MDC macros are available:
- `LOG_MDC(key, value)` Map the value '''`value`''' to the global key '''`key`''' such that it may be included in subsequent log messages by including the directive `%%X{`'''`key`'''`}` in the formatting string of an associated appender.
- `LOG_MDC_REMOVE(key)` Delete the existing value from the global map that is associated with '''`key`'''.
- `LOG_MDC_SCOPE(key, value)` Adds key/value to MDC and restores previous value when execution leaves the scope. Typically used at the beggining of the function if one needs to define MDC key/value for the whole duration of the function.
- `LOG_MDC_SCOPE(key, value)` Adds key/value to MDC and restores previous value when execution leaves the scope. Typically used at the beginning of the function if one needs to define MDC key/value for the whole duration of the function.

In Python, the `lsst.log` module provides the following MDC functions:
- `MDC(key, value)` Map the value '''`value`''' to the global key '''`key`''' such that it may be included in subsequent log messages by including the directive `%%X{`'''`key`'''`}` in the formatting string of an associated appender. Note that `value` is converted to a string by Python before it is stored in the MDC.
Expand Down Expand Up @@ -393,7 +393,7 @@ This has to be done after fork() and corresponding formatting code needs to be a

\subsection TID Display thread ID

The conversion pattern in log4cxx has special conversion code (`%%t`) to render "thread ID", the rendered value is a pointer to `thread_t` object (on POSIX systems) and is rendered as a long string of hexadecimal digits which makes it difficult to read for non-hardcore programmers. `lsst.log` provides more human-friendly way to print thread-identifying information. Special function `lsst::log::lwpID()` returns light-weight process ID (LWP) on the plaftorms that support this feature and small incremental integer number on other platforms. Using this function one can add LWP to MDC and configure output format to include this special MDC key:
The conversion pattern in log4cxx has special conversion code (`%%t`) to render "thread ID", the rendered value is a pointer to `thread_t` object (on POSIX systems) and is rendered as a long string of hexadecimal digits which makes it difficult to read for non-hardcore programmers. `lsst.log` provides more human-friendly way to print thread-identifying information. Special function `lsst::log::lwpID()` returns light-weight process ID (LWP) on the platforms that support this feature and small incremental integer number on other platforms. Using this function one can add LWP to MDC and configure output format to include this special MDC key:

In C++:

Expand All @@ -412,9 +412,9 @@ In Python:

\subsection MDCthreads MDC and multi-threading

MDC is a great tool to associate some context information with the messages but its use in multi-threaded applications can be complicated. MDC is per-thread instance and it is initialized as empty in every new thread. If application adds somethig to MDC in main thread this information will not exist in other threads unless it is explicitly set in those other threads. This may be an issue for applications which do not have complete control over thread lifetime.
MDC is a great tool to associate some context information with the messages but its use in multi-threaded applications can be complicated. MDC is per-thread instance and it is initialized as empty in every new thread. If application adds something to MDC in main thread this information will not exist in other threads unless it is explicitly set in those other threads. This may be an issue for applications which do not have complete control over thread lifetime.

This package provides a mechanism to initialize MDC in all threads started by a process. Special macro `LOG_MDC_INIT(function)` can be used to register a function to be called in in every new thread (and in current thread) before any logging message is sent to a logging system. This function can do anything but its main pupose is to initialize MDC with useful values.
This package provides a mechanism to initialize MDC in all threads started by a process. Special macro `LOG_MDC_INIT(function)` can be used to register a function to be called in in every new thread (and in current thread) before any logging message is sent to a logging system. This function can do anything but its main purpose is to initialize MDC with useful values.

Here is an example of this:

Expand Down Expand Up @@ -451,6 +451,100 @@ or in Python:
One should note that this initialization function will only be called in the current thread from which `LOG_MDC_INIT()` was called and all new threads, if there are some other threads running at a time of this call the function will not be called for them. Normally the function registered with `LOG_MDC_INIT()` will be called only once per thread but under some circumstances it may be called more than once. Registering the same function multiple times will result is multiple calls to the same function.


\section logToPython Redirecting to Python logging

For many Python applications it is preferable to have everything forwarded to the Python `logging` system.
This package implements a special `log4cxx` appender class that supports this functionality.
To simply forward all log4cxx events to Python one needs to configure `log4cxx` using this special appender (and which is naturally done from Python):

import lsst.log

PROP = """
log4j.rootLogger = INFO, PyLog
log4j.appender.PyLog = PyLogAppender
"""
lsst.log.configure_prop(PROP)

Logging level configuration needs some special care in this case, e.g. if DEBUG level output is needed then DEBUG level needs to be enabled in both `log4cxx` and `logging`.
Alternative is to always configure `log4cxx` with TRACE level and adjust `logging` configuration to a selected level.
Latter option has performance implications as all logging messages will be generated and formatted at C++ level which can slow down things significantly.

One complication with this scheme is support for MDC.
`PyLogAppender` converts MDC to a Python dictionary-like object and adds it as an `MDC` attribute to a `LogRecord` instance to make MDC accessible on Python side.
If `MDC` attribute already exists in `LogRecord` (e.g. it is added by record factory) `PyLogAppender` assumes that it behaves like a dictionary and updates it with MDC contents.
Unfortunately accessing MDC attribute in a portable way is not very easy in Python.
One can render MDC using a regular `"%(MDC)s"` expressions in a `logging` format string, but that requires `MDC` attribute to be present in every instance of `LogRecord` which may not be true for records originated from other sources.
There are two possible workarounds to handle this complication described below.

\subsection logToPythonMDC1 Rendering MDC in log4cxx

One option is to render MDC into the log message before it is passed to Python.
This can be done by specifying `MessagePattern` configuration option for the appender, e.g.:

PROP = """
log4j.rootLogger = INFO, PyLog
log4j.appender.PyLog = PyLogAppender
log4j.appender.PyLog.MessagePattern = %m (%X{LABEL})
"""
lsst.log.configure_prop(PROP)

`MessagePattern` value has the same format as `ConversionPattern` option of `PatternLayout` and can potentially include other items.
If `MessagePattern` is missing it has the same effect as specifying `%m`.
In this example if MDC contains "LABEL" key it will be added to a message enclosed in parentheses.
Note that final message formatting still happens in Python, above configuration only changes the `message` part of the output and not the whole record format.

\subsection logToPythonMDC2 Rendering MDC in logging

It is also possible to specify MDC formatting in a `logging` format string but one has to make sure that corresponding attribute exists in every generated record.
The easiest way to guarantee that is to implement a `logging` record factory method following this pattern:

import lsst.log
import logging

old_factory = logging.getLogRecordFactory()

def record_factory(*args, **kwargs):
record = old_factory(*args, **kwargs)
record.MDC = lsst.log.MDCDict()
return record

logging.setLogRecordFactory(record_factory)

PROP = """
log4j.rootLogger = INFO, PyLog
log4j.appender.PyLog = PyLogAppender
"""

lsst.log.configure_prop(PROP)
logging.basicConfig(
level=logging.INFO,
format="%(levelname)s %(asctime)s %(name)s (%(MDC)s) %(filename)s:%(lineno)s - %(message)s")

The record factory adds an `MDC` attribute to every record using special `MDCDict` class instance, same type that is used by PyLogAppender when `MDC` attribute does not exist in `LogRecord`.
Potentially any Python dictionary can be used in place of `MDCDict()` in the code above, `MDCDict` class is just a sub-class of a `dict` with a modified formatting which is more suitable for logging output.
It also acts similarly to `defaultdict` returning empty string for any key that does not exist in a dictionary, which is useful when format string specifies individual key access as in example below, but key does not exist in MDC.

Above detailed code can be replaces by a call to a convenience method:

import lsst.log
import logging

lsst.log.configure_pylog_MDC("INFO")
logging.basicConfig(
level=logging.INFO,
format="%(levelname)s %(asctime)s %(name)s (%(MDC)s) %(filename)s:%(lineno)s - %(message)s")

It is possible to render individual MDC items in formatted message using format-style option for format string which supports indexing access in format expressions:

logging.basicConfig(
level=logging.INFO,
style="{",
format="{levelname} {asctime} {name} ({MDC[LABEL]}) {filename}:{lineno} - {message}")

If "LABEL" key is present in MDC then it will appear in the output, if "LABEL" is missing then empty string is rendered thanks to defaultdict-like behavior of `MDCDict` objects.
As before this depends on `MDC` being present in every LogRecord so it has to be added by a record factory.


\section benchmarks Benchmarks

Measuring the performance of lsst.log when actually writing log messages to output targets such as a file or socket provides little to no information due to buffering and the fact that in the absence of buffering these operations are I/O limited. Conversely, timing calls to log functions when the level threshold is not met is quite valuable since an ideal logging system would add no appreciable overhead when deactivated. Basic measurements of the performance of Log have been made with the level threshold such that logging messages are not written. These measurements are made within a single-node instance of Qserv running on lsst-dev03 without significant competition from other system activity. The average time required to submit the following suppressed log message is 26 nanoseconds:
Expand Down
3 changes: 2 additions & 1 deletion lib/SConscript
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
# -*- python -*-
from lsst.sconsUtils import scripts
scripts.BasicSConscript.lib()
#scripts.BasicSConscript.lib(libs="self python")
scripts.BasicSConscript.lib(libs="self")
3 changes: 2 additions & 1 deletion python/lsst/log/SConscript
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
# -*- python -*-
from lsst.sconsUtils import scripts
scripts.BasicSConscript.pybind11(['log/log'], addUnderscore=False)
scripts.BasicSConscript.pybind11(
['log/log'], extraSrc={'log/log': ['log/PyLogAppender.cc']}, addUnderscore=False)

0 comments on commit d2ef7b4

Please sign in to comment.