extra fields #47
Closed · dkavraal opened this issue Nov 1, 2015 · 16 comments

@dkavraal commented Nov 1, 2015

I wasn't able to pass extra fields. Have you tested something like the following:

import logging

from fluent import handler
custom_format = {
  'host': '%(hostname)s',
  'where': '%(module)s.%(funcName)s',
  'type': '%(levelname)s',
  'stack_trace': '%(exc_text)s'
}
log = logging.getLogger('fluent.test')
log.setLevel(logging.DEBUG)
h = handler.FluentHandler('app.python', host='127.0.0.1', port=24224)
formatter = handler.FluentRecordFormatter(custom_format)
h.setFormatter(formatter)
log.addHandler(h)

log.debug('python fluent log test', extra={'extraField': 'test'} )

On the last line I am expecting fluentd (td-agent) to pass extraField on to the collector.

Is there something I'm missing?

@repeatedly (Member)

Is the extra option used by other handlers, or should FluentHandler handle extra individually?

@dkavraal (Author)

I have (had) only the FluentHandler and wanted to see the extra fields on the Fluentd server. I guess this is a simple thought and wish.
I went with a workaround:

    #log.debug('python fluent log test', extra={'extraField': 'test'} )
    log.debug('python fluent log workaround. extraField: %s', 'test')

Although it did the job, it moves away from convention, I guess. I had to convert all logging lines in the project to this style.

@EvaSDK (Contributor) commented Jul 26, 2016

If you don't put the data in the format, the handler cannot know that you want it output to Fluentd.

@yoichi (Contributor) commented Apr 24, 2017

@dkavraal, you can refer to keys from extra in custom_format. For example:

custom_format = {
  'extra_field': '%(extraField)s',
  'host': '%(hostname)s',
  'where': '%(module)s.%(funcName)s',
  'type': '%(levelname)s',
  'stack_trace': '%(exc_text)s'
}
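
To make this concrete, a minimal sketch using the handler setup from the original report (note that the %-style formatting turns every value into a string, which is what the next comment is about):

formatter = handler.FluentRecordFormatter(custom_format)
h.setFormatter(formatter)
log.addHandler(h)
# The key referenced as %(extraField)s is supplied through extra=:
log.debug('python fluent log test', extra={'extraField': 'test'})
# emitted record, roughly: {'extra_field': 'test', 'host': '...',
#                           'where': 'module.funcName', 'type': 'DEBUG', ...}

One caveat: if a record is logged without extraField, the %-formatting will fail on the missing key unless the formatter is configured to fill missing keys (see @arcivanov's comment further down).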

@siretart commented Aug 2, 2017

@yoichi The issue with the custom_format approach is that it converts all values to strings. Consider the case where you want to add a numeric identifier:

mylogger.exception("bad error condition while processing job %d", job_id,
                   extra={'job_id': job_id})

This log record would embed the job_id into the "message" field of the record (which necessarily is a string), but the job_id field of the record would also be a string, not an integer. This can be very annoying when you are trying to insert structured data into, for example, Elasticsearch, and end up fighting type errors.

Maybe there should be an option to copy all excess "extra" fields into the resulting dictionary that then gets serialized with msgpack and sent out to fluentd?
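
A rough sketch of that idea (ExtraPreservingFormatter and _RESERVED_ATTRS are names made up here, not part of fluent-logger-python; this assumes Python 3 and that FluentRecordFormatter.format() returns the dict that eventually gets serialized):

import logging

from fluent.handler import FluentRecordFormatter

# Attributes present on every stock LogRecord; anything else on a record was passed via extra=.
_RESERVED_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None))) | {"message", "asctime", "hostname"}


class ExtraPreservingFormatter(FluentRecordFormatter):
    """Hypothetical subclass: copy extra fields into the output dict unchanged."""

    def format(self, record):
        data = super().format(record)             # the usual %-formatted fields
        for key, value in record.__dict__.items():
            if key not in _RESERVED_ATTRS and key not in data:
                data[key] = value                 # ints, dicts, ... keep their type
        return data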

@yoichi (Contributor) commented Aug 2, 2017

@siretart why don't you pass structured data directly as msg (1st argument)?
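
For example (a sketch reusing the job_id case above; the key names are arbitrary, and a dict passed as the message is merged into the emitted record, as @arcivanov's sample output further down shows, so values keep their types):

mylogger.exception({
    'text': 'bad error condition while processing job',
    'job_id': job_id,   # stays an integer rather than being %-formatted into a string
})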

@rtartler-desres

@yoichi because the logger might have other handlers attached, such as a syslog or file handler, for which a dictionary message would mess up existing logs.

@yoichi (Contributor) commented Aug 2, 2017

@siretart extra is merged into record.__dict__ in Logger.makeRecord. How about introducing a new style, as in #91?
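
For reference, a simplified illustration of what Logger.makeRecord does with extra: the keys land directly in the record's namespace rather than under a separate attribute.

import logging

record = logging.LogRecord('app', logging.INFO, __file__, 1, 'job failed', (), None)
record.__dict__.update({'job_id': 42})   # roughly what passing extra={'job_id': 42} does
print(record.job_id)                     # -> 42; there is no record.extra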

@rtartler-desres

@yoichi I guess that could work.

In the meantime, I came across https://github.com/vklochan/python-logstash/blob/master/logstash/formatter.py, which seems significantly simpler at first glance. I'll experiment with it a bit.

@adeadman commented May 8, 2018

@repeatedly In answer to your question from 2016: yes, other log handlers accept the extra parameter. We use it internally to include extra tags when logging to something like Sentry, for example.

@yoichi's suggestion of logging a dict as msg works fine for the fluent formatter, but it does not work with other log handlers that expect a string message plus an extra dict rather than a structured message.

It would be nice to use a single method of logging in the application that can be handled the same way by all handlers on the logger, including the standard Python ones.

@arcivanov (Member)

@adeadman I'm confused. You want to use FluentFormatter without FluentHandler or FluentHandler without FluentFormatter?

I think that FluentHandler can handle other formatters. Setting up alternative logging setups is not that difficult. This works perfectly in production:

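        # (Excerpt from a class method: self.get_fluent_handler(), self.get_fluent_overwriting_filter()
        # and self.get_fluent_formatter() are the poster's own helpers. StreamHandler, Formatter and
        # WARNING come from the logging module, time from the stdlib; the LoggerAdapter here takes a
        # key_map, so it is presumably a custom adapter rather than logging.LoggerAdapter.)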
        if self.fluent_url:
            h = self.get_fluent_handler()
            filter = self.get_fluent_overwriting_filter()
            h.addFilter(filter)
            f = self.get_fluent_formatter()
        else:
            h = StreamHandler()
            f = Formatter("%(asctime)s %(levelname)s "
                          "%(name)s: %(message)s")

        f.converter = time.gmtime
        h.setFormatter(f)
        root_logger.addHandler(h)
        root_logger.setLevel(WARNING)
        self.logger.setLevel(self.loglevel)
        self.logger = LoggerAdapter(self.logger, key_map=DEFAULT_LOG_KEY_MAP)

@adeadman commented May 9, 2018

@arcivanov I am using FluentHandler and the FluentFormatter (although I subclass it to add an extra tracking ID field to the log).

However, the extra parameter is ignored unless the FluentFormatter is instantiated with exclude_attrs=[], which causes it to use the _format_by_exclusion method instead of _format_by_dict; the extra fields are then added to the dict, along with all of the other default properties of a Python LogRecord, including ['args', 'created', 'exc_info', 'exc_text', 'filename', 'funcName', 'levelname', 'levelno', 'lineno', 'message', 'module', 'msecs', 'msg', 'name', 'pathname', 'process', 'processName', 'relativeCreated', 'stack_info'], etc.

If _format_by_dict is used, only the formatter's configured keys are extracted from the LogRecord, then the message is added, and any additional fields are ignored.

At the moment I'm working on adding support for extracting the extra keys from the log record, but unfortunately the Python Logger implementation adds the keys to LogRecord.__dict__, so the only way to know which extra keys were added is to remove the default keys from that set. I will make a PR, but the correct way to solve this problem may in fact be to subclass the Logger implementation and extract the extra dict at that point.

@brunsgaard

Sentry has the same problem as described here.

A solution that extracts the extra data can be found here:
https://github.com/getsentry/raven-python/blob/master/raven/handlers/logging.py#L34-L57

@arcivanov (Member)

Per Python documentation:

While this [extra behavior] might be annoying, this feature is intended for use in specialized circumstances, such as multi-threaded servers where the same code executes in many contexts, and interesting conditions which arise are dependent on this context (such as remote client IP address and authenticated user name, in the above example). In such circumstances, it is likely that specialized Formatters would be used with particular Handlers.

Inserting general-purpose logic into a special-circumstances feature is, generally, ill-advised.

I'll close this as undesirable.

@jgoclawski

@arcivanov
I'll write in the issue instead of the PR, to keep things in one place.

First of all, thanks for the quick and constructive feedback. I'll take the liberty of disagreeing, though, and hope you'll reconsider your decision :)
We've already implemented the feature on the application side (I work together with @Behoston), but I strongly believe it would be a valuable addition to the library.

My main point is that so far no one has presented a clear use case for such a feature of this library. Allow me to do so.

Real life use-case

  • I want one way to log things from my application, independent of the handler used to ship that log (obviously).
  • A commonly used and broadly accepted convention is to use the "extra" keyword (Sentry, Datadog, Logstash, etc. support it). Thus I already have my logs emitted in a standard way:
logger.warning('Something bad happened', extra={'user_id': 123})
  • Moreover, I want to keep native types, without casting to strings. This is the primary reason to use the rich msgpack or JSON logging format in the first place! Otherwise I could simply write the logs to stdout or to a file, tail them, and collect them with a much simpler log handler. I want to be able to issue a log like this:
logger.warning('something bad happened', extra={
    'object_id': 777,
    'user_data': {
        'name': 'Fred', 
        'id': 123
    }
})

This works out of the box with other commonly used logging libraries (like Sentry).

  • I've built my application based on those assumptions, having Sentry and Logstash work with it out of the box. Now I want to replace Logstash with Fluentd and my application no longer sends the "extra" stuff.

Please consider that all major, popular Python logging libraries handle that use case out of the box, whereas fluent-logger-python takes users by surprise by not supporting it and requires them to read through issues for clues about how to make it work. That may seriously hinder adoption.
Examples of existing support (proof of an unwritten convention :)): actually, I found it in every library I thought of.

Referring to your concerns:

  • It places the burden of keeping track of Python logging's reserved attributes on the library, as those attributes may change in the future. While that's a valid point, the reality is that those attributes haven't changed for at least the past 5 years. You can see that for yourself in Sentry's source code: https://github.com/getsentry/raven-python/blame/master/raven/handlers/logging.py#L22-25
  • The library becomes opinionated about where to put those "extra" attributes. Yes it does, but it already makes other assumptions. In my personal opinion, convention over configuration is a good thing, especially for the end user: the library works out of the box with sane defaults. We may of course add a way to configure this behavior if you think it's important.

Personally, I see three options:

  1. Ignore the issue.
  2. Discuss how to support the use case out of the box, based on the submitted PR.
  3. Add a configuration example (no out-of-the-box support, but at least an easy-to-find recipe).

@arcivanov (Member) commented Jan 23, 2019

Thanks for your feedback. I still disagree.

Let's consider two scenarios:

  1. Where things are right now
  2. Where you want things to be.

Right now, people who need the extra field functionality as it's designed to be used in Python, i.e. to augment log entries with data that is mostly not specified directly at the logging call, already have their own methodologies for dealing with the extra fields. Some may fold them directly into the record fields, some may want them in an extra field (like you), some may derive other values from them, etc.

Let's look at the legislative intent, so to speak, of the Python logger in general and the fluentd logger in particular.

In the Python logger, extra fields are flat, folded in as fields of the LogRecord. There is no LogRecord.extra; they fall straight into the LogRecord's __dict__: https://github.com/python/cpython/blob/master/Lib/logging/__init__.py#L1553
They are then picked up as necessary by the formatter/handler.

In the fluentd logger you can log a dictionary, a message, or JSON encoded in a string; you can specify a dictionary of fields to log; you can switch to exclude_attrs mode to log all LogRecord attributes except the ones specified; and you can even fill missing keys with None to prevent a KeyError if you're using specialized message formats that refer to fields that may or may not be there!

In fact, this sample code works perfectly fine, logging extra fields happily and preserving both types and structure:

from logging import (root as root_logger,
                     Handler)
from pprint import pprint

from fluent.handler import FluentRecordFormatter


class TestHandler(Handler):
    def emit(self, record):
        pprint(self.format(record))


def main():
    f = FluentRecordFormatter(exclude_attrs=())
    h = TestHandler()
    h.setFormatter(f)
    root_logger.addHandler(h)

    EXTRA = {"x": 1, "y": 0.1, "z": [1, "array", "of", b"basic", "types"]}
    root_logger.error({"Hello": "Dict World with no extra :("})
    root_logger.error({"Hello": "Dict World with extra! :)"},
                      extra=EXTRA)
    root_logger.error("Hello World with no extra :(")
    root_logger.error("Hello World with extra! :)",
                      extra=EXTRA)

producing the following output (less the snippage):

{'Hello': 'Dict World with no extra :(',
 'args': (),
 'created': 1548218954.407479,
 'exc_info': None,
 'exc_text': None,
 'filename': 'fluent_extra.py',
 'funcName': 'main',
 'hostname': '<snip>',
 'levelname': 'ERROR',
 'levelno': 40,
 'lineno': 20,
 'message': "{'Hello': 'Dict World with no extra :('}",
 'module': 'fluent_extra',
 'msecs': 407.47904777526855,
 'msg': {'Hello': 'Dict World with no extra :('},
 'name': 'root',
 'pathname': '/<snip>/src/unittest/python/fluent_extra.py',
 'process': 24084,
 'processName': 'MainProcess',
 'relativeCreated': 11.337995529174805,
 'stack_info': None,
 'thread': 140421174366336,
 'threadName': 'MainThread'}
{'Hello': 'Dict World with extra! :)',
 'args': (),
 'created': 1548218954.4079602,
 'exc_info': None,
 'exc_text': None,
 'filename': 'fluent_extra.py',
 'funcName': 'main',
 'hostname': '<snip>',
 'levelname': 'ERROR',
 'levelno': 40,
 'lineno': 22,
 'message': "{'Hello': 'Dict World with extra! :)'}",
 'module': 'fluent_extra',
 'msecs': 407.9601764678955,
 'msg': {'Hello': 'Dict World with extra! :)'},
 'name': 'root',
 'pathname': '/<snip>/src/unittest/python/fluent_extra.py',
 'process': 24084,
 'processName': 'MainProcess',
 'relativeCreated': 11.819124221801758,
 'stack_info': None,
 'thread': 140421174366336,
 'threadName': 'MainThread',
 'x': 1,
 'y': 0.1,
 'z': [1, 'array', 'of', b'basic', 'types']}
{'args': (),
 'created': 1548218954.4083252,
 'exc_info': None,
 'exc_text': None,
 'filename': 'fluent_extra.py',
 'funcName': 'main',
 'hostname': '<snip>',
 'levelname': 'ERROR',
 'levelno': 40,
 'lineno': 23,
 'message': 'Hello World with no extra :(',
 'module': 'fluent_extra',
 'msecs': 408.3251953125,
 'msg': 'Hello World with no extra :(',
 'name': 'root',
 'pathname': '/<snip>/src/unittest/python/fluent_extra.py',
 'process': 24084,
 'processName': 'MainProcess',
 'relativeCreated': 12.18414306640625,
 'stack_info': None,
 'thread': 140421174366336,
 'threadName': 'MainThread'}
{'args': (),
 'created': 1548218954.4086447,
 'exc_info': None,
 'exc_text': None,
 'filename': 'fluent_extra.py',
 'funcName': 'main',
 'hostname': '<snip>',
 'levelname': 'ERROR',
 'levelno': 40,
 'lineno': 25,
 'message': 'Hello World with extra! :)',
 'module': 'fluent_extra',
 'msecs': 408.6446762084961,
 'msg': 'Hello World with extra! :)',
 'name': 'root',
 'pathname': '/<snip>/src/unittest/python/fluent_extra.py',
 'process': 24084,
 'processName': 'MainProcess',
 'relativeCreated': 12.503623962402344,
 'stack_info': None,
 'thread': 140421174366336,
 'threadName': 'MainThread',
 'x': 1,
 'y': 0.1,
 'z': [1, 'array', 'of', b'basic', 'types']}

Now, you propose to add tons of specialized handling into fluent-logger that:

  1. Always logs extras if specified.
  2. Always puts them into data["extra"], against the intent and expectation of LogRecord to overlay the extra fields onto the record.
  3. Hardcodes behavior as to the types of extra values etc.

If we were to add this feature, every user who has ever used extra according to their needs would suddenly find an extra dictionary entry on their records out of the blue. This violates POLA to begin with, but it also potentially exposes sensitive data that might otherwise have been safe because it would have been parsed/masked during processing in a Handler/Formatter/LoggerAdapter, and it increases data traffic and storage for every user who uses extra the way Python actually designed the feature to be used predominantly: as a contextual augmentation of records where the logger is not aware of the context.

You have to configure a FluentFormatter with FluentHandler anyway, no matter what. Allowing the current FluentFormatter to log arbitrary extra fields is trivial with existing code, as demonstrated above. Thus modifying the default behavior with this task-specific hardcoded behavior seems arbitrary, if not outright harmful to existing users.
