lsst · andy-slac · Jul 12, 2021 · Jul 6, 2021 · Jul 6, 2021 · Jul 6, 2021
diff --git a/doc/mainpage.dox b/doc/mainpage.dox
@@ -359,7 +359,7 @@ or alternatively using `%%X` format code without additional key to display full
 In C++, the following MDC macros are available:
 - `LOG_MDC(key, value)` Map the value '''`value`''' to the global key '''`key`''' such that it may be included in subsequent log messages by including the directive `%%X{`'''`key`'''`}` in the formatting string of an associated appender.
 - `LOG_MDC_REMOVE(key)` Delete the existing value from the global map that is associated with '''`key`'''.
-- `LOG_MDC_SCOPE(key, value)` Adds key/value to MDC and restores previous value when execution leaves the scope. Typically used at the beggining of the function if one needs to define MDC key/value for the whole duration of the function.
+- `LOG_MDC_SCOPE(key, value)` Adds key/value to MDC and restores previous value when execution leaves the scope. Typically used at the beginning of the function if one needs to define MDC key/value for the whole duration of the function.
 
 In Python, the `lsst.log` module provides the following MDC functions:
 - `MDC(key, value)` Map the value '''`value`''' to the global key '''`key`''' such that it may be included in subsequent log messages by including the directive `%%X{`'''`key`'''`}` in the formatting string of an associated appender. Note that `value` is converted to a string by Python before it is stored in the MDC.
@@ -393,7 +393,7 @@ This has to be done after fork() and corresponding formatting code needs to be a
 
 \subsection TID Display thread ID
 
-The conversion pattern in log4cxx has special conversion code (`%%t`) to render "thread ID", the rendered value is a pointer to `thread_t` object (on POSIX systems) and is rendered as a long string of hexadecimal digits which makes it difficult to read for non-hardcore programmers. `lsst.log` provides more human-friendly way to print thread-identifying information. Special function `lsst::log::lwpID()` returns light-weight process ID (LWP) on the plaftorms that support this feature and small incremental integer number on other platforms. Using this function one can add LWP to MDC and configure output format to include this special MDC key:
+The conversion pattern in log4cxx has special conversion code (`%%t`) to render "thread ID", the rendered value is a pointer to `thread_t` object (on POSIX systems) and is rendered as a long string of hexadecimal digits which makes it difficult to read for non-hardcore programmers. `lsst.log` provides more human-friendly way to print thread-identifying information. Special function `lsst::log::lwpID()` returns light-weight process ID (LWP) on the platftorms that support this feature and small incremental integer number on other platforms. Using this function one can add LWP to MDC and configure output format to include this special MDC key:
 
 In C++:
 
@@ -412,9 +412,9 @@ In Python:
 
 \subsection MDCthreads MDC and multi-threading
 
-MDC is a great tool to associate some context information with the messages but its use in multi-threaded applications can be complicated. MDC is per-thread instance and it is initialized as empty in every new thread. If application adds somethig to MDC in main thread this information will not exist in other threads unless it is explicitly set in those other threads. This may be an issue for applications which do not have complete control over thread lifetime.
+MDC is a great tool to associate some context information with the messages but its use in multi-threaded applications can be complicated. MDC is per-thread instance and it is initialized as empty in every new thread. If application adds something to MDC in main thread this information will not exist in other threads unless it is explicitly set in those other threads. This may be an issue for applications which do not have complete control over thread lifetime.
 
-This package provides a mechanism to initialize MDC in all threads started by a process. Special macro `LOG_MDC_INIT(function)` can be used to register a function to be called in in every new thread (and in current thread) before any logging message is sent to a logging system. This function can do anything but its main pupose is to initialize MDC with useful values.
+This package provides a mechanism to initialize MDC in all threads started by a process. Special macro `LOG_MDC_INIT(function)` can be used to register a function to be called in in every new thread (and in current thread) before any logging message is sent to a logging system. This function can do anything but its main purpose is to initialize MDC with useful values.
 
 Here is an example of this:
 
@@ -451,6 +451,90 @@ or in Python:
 One should note that this initialization function will only be called in the current thread from which `LOG_MDC_INIT()` was called and all new threads, if there are some other threads running at a time of this call the function will not be called for them. Normally the function registered with `LOG_MDC_INIT()` will be called only once per thread but under some circumstances it may be called more than once. Registering the same function multiple times will result is multiple calls to the same function.
 
 
+\section logToPython Redirecting to Python logging
+
+For many Python applications it is preferable to have everything forwarded to the Python `logging` system.
+This package implements a special `log4cxx` appender class that supports this functionality.
+To simply forward all log4cxx events to Python one needs to configure `log4cxx` using this special appender (and which is naturally done from Python):
+
+    import lsst.log
+
+    PROP = """
+    log4j.rootLogger = INFO, PyLog
+    log4j.appender.PyLog = org.apache.log4j.PyLogAppender
+    """
+    lsst.log.configure_prop(PROP)
+
+Logging level configuration needs some special care in this case, e.g. if DEBUG level output is needed then DEBUG level needs to be enabled in both `log4cxx` and `logging`.
+Alternative is to always configure `log4cxx` with TRACE level and adjust `logging` configuration to a selected level.
+Latter option has performance implications as all logging messages will be generated and formatted at C++ level which can slow down things significantly.
+
+One complication with this scheme is support for MDC.
+`PyLogAppender` converts MDC to a Python dictionary-like object and adds it as an `MDC` attribute to a `LogRecord` instance to make MDC accessible on Python side.
+If `MDC` attribute already exists in `LogRecord` (e.g. it is added by record factory) `PyLogAppender` assumes that it behaves like a dictionary and updates it with MDC contents.
+Unfortunately accessing MDC attribute in a portable way is not very easy in Python.
+One can render MDC using a regular `"%(MDC)s"` expressions in a `logging` format string, but that requires `MDC` attribute to be present in every instance of `LogRecord` which may not be true for records originated from other sources.
+There are two possible workarounds to handle this complication described below.
+
+\subsection logToPythonMDC1 Rendering MDC in `log4cxx`
+
+One option is to render MDC into the log message before it is passed to Python.
+This can be done by specifying `MessagePattern` configuration option for the appender, e.g.:
+
+    PROP = """
+    log4j.rootLogger = INFO, PyLog
+    log4j.appender.PyLog = org.apache.log4j.PyLogAppender
+    log4j.appender.PyLog.MessagePattern = %m (%X{LABEL})
+    """
+    lsst.log.configure_prop(PROP)
+
+`MessagePattern` value has the same format as `ConversionPattern` option of `PatternLayout` and can potentially include other items.
+If `MessagePattern` is missing it has the same effect as specifying `%m`.
+In this example if MDC contains "LABEL" key it will be added to a message enclosed in parentheses.
+Note that message formatting still happens in Python, above configuration only changes the `message` part of the output and not the whole record format.
+
+\subsection logToPythonMDC2 Rendering MDC in `logging`
+
+It is also possible to specify MDC formatting in a `logging` format string but one has to make sure that corresponding attribute exists in every generated record.
+The easiest way to guarantee that is to implement a `logging` record factory method following this pattern:
+
+    import lsst.log
+    import lsst.log.utils
+    import logging
+
+    old_factory = logging.getLogRecordFactory()
+
+    def record_factory(*args, **kwargs):
+        record = old_factory(*args, **kwargs)
+        record.MDC = lsst.log.utils._MDC()
+        return record
+
+    logging.setLogRecordFactory(record_factory)
+
+    PROP = """
+    log4j.rootLogger = DEBUG, PyLog
+    log4j.appender.PyLog = org.apache.log4j.PyLogAppender
+    """
+
+    lsst.log.configure_prop(PROP)
+    logging.basicConfig(
+        level=logging.INFO,
+        format="%(levelname)s %(asctime)s %(name)s (%(MDC)s) %(filename)s:%(lineno)s - %(message)s")
+
+The record factory adds an `MDC` attribute to every record using special `_MDC` class instance, same type that is used by PyLogAppender when `MDC` attribute does not exist in `LogRecord`.
+`_MDC` class is a dictionary with a modified formatting which is more suitable for logging output. It also act similarly to `defaultdict` returning empty string for any key that does not exist in a dictionary.
+
+It is possible to render individual MDC items in formatted message using format-style option for format string which supports indexing access in format expressions:
+
+    logging.basicConfig(
+        level=logging.INFO,
+        style="{",
+        format="{levelname} {asctime} {name} ({MDC[LABEL]}) {filename}:{lineno} - {message}")
+
+If "LABEL" key is present in MDC then it will appear in the output, if "LABEL" is missing then empty string is rendered thanks to defaultdict-like behavior of `_MDC` objects.
+As before this depends on `MDC` present in every LogRecord so it has to be added by a record factory.
+
+
 \section benchmarks Benchmarks
 
 Measuring the performance of lsst.log when actually writing log messages to output targets such as a file or socket provides little to no information due to buffering and the fact that in the absence of buffering these operations are I/O limited. Conversely, timing calls to log functions when the level threshold is not met is quite valuable since an ideal logging system would add no appreciable overhead when deactivated. Basic measurements of the performance of Log have been made with the level threshold such that logging messages are not written. These measurements are made within a single-node instance of Qserv running on lsst-dev03 without significant competition from other system activity. The average time required to submit the following suppressed log message is 26 nanoseconds:

diff --git a/lib/SConscript b/lib/SConscript
@@ -1,3 +1,3 @@
 # -*- python -*-
 from lsst.sconsUtils import scripts
-scripts.BasicSConscript.lib()
+scripts.BasicSConscript.lib(libs="self python")
diff --git a/python/lsst/log/utils.py b/python/lsst/log/utils.py
@@ -150,3 +150,33 @@ def disable_notebook_logging():
     if _redirect is not None:
         _redirect.finish()
         _redirect = None
+
+
+class _MDC(dict):
+    """Dictionary for MDC data.
+
+    This is internal class used for better formatting of MDC in Python logging
+    output. It behaves like `defaultdict(str)` but overrides ``__str__`` and
+    ``__repr__`` method to produce output better suited for logging records.
+    It also provides namespace-like access to
+    """
+    def __getitem__(self, name: str):
+        """Returns value for a given key or empty string for missing key.
+        """
+        return self.get(name, "")
+
+    def __getattr__(self, name: str):
+        """Return value as object attribute.
+        """
+        return self.get(name, "")
+
+    def __str__(self):
+        """Return string representation, strings are interpolated without
+        quotes.
+        """
+        keys = sorted(self)
+        items = ("{}={!s}".format(k, self[k]) for k in keys)
+        return "{" + ", ".join(items) + "}"
+
+    def __repr__(self):
+        return str(self)