extra fields #47
Closed · dkavraal opened this issue Nov 1, 2015 · 16 comments

@dkavraal commented Nov 1, 2015

I wasn't able to pass extra fields. Have you tested something like the following:

import logging

from fluent import handler
custom_format = {
  'host': '%(hostname)s',
  'where': '%(module)s.%(funcName)s',
  'type': '%(levelname)s',
  'stack_trace': '%(exc_text)s'
}
log = logging.getLogger('fluent.test')
log.setLevel(logging.DEBUG)
h = handler.FluentHandler('app.python', host='127.0.0.1', port=24224)
formatter = handler.FluentRecordFormatter(custom_format)
h.setFormatter(formatter)
log.addHandler(h)

log.debug('python fluent log test', extra={'extraField': 'test'} )

On the last line I am expecting fluentd (td-agent) to pass extraField on to the collector.

Is there something I'm missing?

@repeatedly (Member)

Is the extra option used by other handlers, or should FluentHandler handle extra individually?

@dkavraal (Author)

I have (had) only the FluentHandler and wanted to see the extra fields on the Fluentd server. I guess this is a simple thought and wish.
I went with a workaround:

    #log.debug('python fluent log test', extra={'extraField': 'test'} )
    log.debug('python fluent log workaround. extraField: %s', 'test')

Although it did the job, it moves away from convention, I guess. I had to convert all logging lines in the project to this style.

@EvaSDK (Contributor) commented Jul 26, 2016

If you don't put the data in the format, the handler cannot know that you want it output to Fluentd.

@yoichi (Contributor) commented Apr 24, 2017

@dkavraal, you can refer to keys from extra in custom_format. For example:

custom_format = {
  'extra_field': '%(extraField)s',
  'host': '%(hostname)s',
  'where': '%(module)s.%(funcName)s',
  'type': '%(levelname)s',
  'stack_trace': '%(exc_text)s'
}
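
To make this concrete, a minimal sketch using the handler setup from the original report (note that the %-style formatting turns every value into a string, which is what the next comment is about):

formatter = handler.FluentRecordFormatter(custom_format)
h.setFormatter(formatter)
log.addHandler(h)
# The key referenced as %(extraField)s is supplied through extra=:
log.debug('python fluent log test', extra={'extraField': 'test'})
# emitted record, roughly: {'extra_field': 'test', 'host': '...',
#                           'where': 'module.funcName', 'type': 'DEBUG', ...}

One caveat: if a record is logged without extraField, the %-formatting will fail on the missing key unless the formatter is configured to fill missing keys (see @arcivanov's comment further down).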

@siretart commented Aug 2, 2017

@yoichi The issue with the custom_format approach is that it converts all values to strings. Consider the case where you want to add a numeric identifier:

mylogger.exception("bad error condition while processing job %d", job_id,
                   extra={'job_id': job_id})

This log record would embed the job_id into the "message" field of the record (which necessarily is a string), but the job_id field of the record would also be a string, not an integer. This can be very annoying when you are trying to insert structured data into, for example, Elasticsearch, and end up fighting type errors.

Maybe there should be an option to copy all excess "extra" fields into the resulting dictionary that then gets serialized with msgpack and sent out to fluentd?
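
A rough sketch of that idea (ExtraPreservingFormatter and _RESERVED_ATTRS are names made up here, not part of fluent-logger-python; this assumes Python 3 and that FluentRecordFormatter.format() returns the dict that eventually gets serialized):

import logging

from fluent.handler import FluentRecordFormatter

# Attributes present on every stock LogRecord; anything else on a record was passed via extra=.
_RESERVED_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None))) | {"message", "asctime", "hostname"}


class ExtraPreservingFormatter(FluentRecordFormatter):
    """Hypothetical subclass: copy extra fields into the output dict unchanged."""

    def format(self, record):
        data = super().format(record)             # the usual %-formatted fields
        for key, value in record.__dict__.items():
            if key not in _RESERVED_ATTRS and key not in data:
                data[key] = value                 # ints, dicts, ... keep their type
        return data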

@yoichi (Contributor) commented Aug 2, 2017

@siretart why don't you pass structured data directly as msg (1st argument)?
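
For example (a sketch reusing the job_id case above; the key names are arbitrary, and a dict passed as the message is merged into the emitted record, as @arcivanov's sample output further down shows, so values keep their types):

mylogger.exception({
    'text': 'bad error condition while processing job',
    'job_id': job_id,   # stays an integer rather than being %-formatted into a string
})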

@rtartler-desres

@yoichi because the logger might have other handlers attached, such as a syslog or file handler, for which a dictionary message would mess up existing logs.

@yoichi (Contributor) commented Aug 2, 2017

@siretart extra is merged into record.__dict__ in Logger.makeRecord. How about introducing a new style, as in #91?
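
For reference, a simplified illustration of what Logger.makeRecord does with extra: the keys land directly in the record's namespace rather than under a separate attribute.

import logging

record = logging.LogRecord('app', logging.INFO, __file__, 1, 'job failed', (), None)
record.__dict__.update({'job_id': 42})   # roughly what passing extra={'job_id': 42} does
print(record.job_id)                     # -> 42; there is no record.extra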

@rtartler-desres

@yoichi I guess that could work.

In the meantime, I came across https://github.com/vklochan/python-logstash/blob/master/logstash/formatter.py, which seems significantly simpler at first glance. I'll experiment with it a bit.

@adeadman commented May 8, 2018

@repeatedly In answer to your question from 2016: yes, other log handlers accept the extra parameter. We use it internally to include extra tags when logging to something like Sentry, for example.

@yoichi's suggestion of logging a dict as msg works fine for the fluent formatter, but it does not work with other log handlers that expect a string message plus an extra dict rather than a structured message.

It would be nice to use a single method of logging in the application that can be handled the same way by all handlers on the logger, including the standard Python ones.

@arcivanov (Member)

@adeadman I'm confused. You want to use FluentFormatter without FluentHandler or FluentHandler without FluentFormatter?

I think that FluentHandler can handle other formatters. Setting up alternative logging setups is not that difficult. This works perfectly in production:

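        # (Excerpt from a class method: self.get_fluent_handler(), self.get_fluent_overwriting_filter()
        # and self.get_fluent_formatter() are the poster's own helpers. StreamHandler, Formatter and
        # WARNING come from the logging module, time from the stdlib; the LoggerAdapter here takes a
        # key_map, so it is presumably a custom adapter rather than logging.LoggerAdapter.)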
        if self.fluent_url:
            h = self.get_fluent_handler()
            filter = self.get_fluent_overwriting_filter()
            h.addFilter(filter)
            f = self.get_fluent_formatter()
        else:
            h = StreamHandler()
            f = Formatter("%(asctime)s %(levelname)s "
                          "%(name)s: %(message)s")

        f.converter = time.gmtime
        h.setFormatter(f)
        root_logger.addHandler(h)
        root_logger.setLevel(WARNING)
        self.logger.setLevel(self.loglevel)
        self.logger = LoggerAdapter(self.logger, key_map=DEFAULT_LOG_KEY_MAP)

@adeadman commented May 9, 2018

@arcivanov I am using FluentHandler and the FluentFormatter (although I subclass it to add an extra tracking ID field to the log).

However, the extra parameter is ignored unless the FluentFormatter is instantiated with exclude_attrs=[], which causes it to use the _format_by_exclusion method instead of _format_by_dict; the extra fields are then added to the dict, along with all of the other default properties of a Python LogRecord, including ['args', 'created', 'exc_info', 'exc_text', 'filename', 'funcName', 'levelname', 'levelno', 'lineno', 'message', 'module', 'msecs', 'msg', 'name', 'pathname', 'process', 'processName', 'relativeCreated', 'stack_info'], etc.

If _format_by_dict is used, only the formatter's configured keys are extracted from the LogRecord, then the message is added, and any additional fields are ignored.

At the moment I'm working on adding support for extracting the extra keys from the log record, but unfortunately the Python Logger implementation adds the keys to LogRecord.__dict__, so the only way to know which extra keys were added is to remove the default keys from that set. I will make a PR, but the correct way to solve this problem may in fact be to subclass the Logger implementation and extract the extra dict at that point.

@brunsgaard

Sentry has the same problem as described here.

A solution that extracts the extra data can be found here:
https://github.com/getsentry/raven-python/blob/master/raven/handlers/logging.py#L34-L57

@arcivanov (Member)

Per Python documentation:

While this [extra behavior] might be annoying, this feature is intended for use in specialized circumstances, such as multi-threaded servers where the same code executes in many contexts, and interesting conditions which arise are dependent on this context (such as remote client IP address and authenticated user name, in the above example). In such circumstances, it is likely that specialized Formatters would be used with particular Handlers.

Inserting general-purpose logic into a special-circumstances feature is, generally, ill-advised.

I'll close this as undesirable.

@jgoclawski

@arcivanov
I'll write in the issue instead of the PR, to keep things in one place.

First of all, thanks for the quick and constructive feedback. I'll take the liberty of disagreeing, though, and hope you'll reconsider your decision :)
We've already implemented the feature on the application side (I work together with @Behoston), but I strongly believe it would be a valuable addition to the library.

My main point is that so far no one has presented a clear use case for such a feature of this library. Allow me to do so.

Real life use-case

  • I want one way to log things from my application, independent of the handler used to ship that log (obviously).
  • A commonly used and broadly accepted convention is to use the "extra" keyword (Sentry, Datadog, Logstash, etc. support it). Thus I already have my logs emitted in a standard way:
logger.warning('Something bad happened', extra={'user_id': 123})
  • Moreover, I want to keep native types, without casting to strings. This is the primary reason to use the rich msgpack or JSON logging format in the first place! Otherwise I could simply write the logs to stdout or to a file, tail them, and collect them with a much simpler log handler. I want to be able to issue a log like this:
logger.warning('something bad happened', extra={
    'object_id': 777,
    'user_data': {
        'name': 'Fred', 
        'id': 123
    }
})

This works out of the box with other commonly used logging libraries (like Sentry).

  • I've built my application based on those assumptions, having Sentry and Logstash work with it out of the box. Now I want to replace Logstash with Fluentd and my application no longer sends the "extra" stuff.

Please consider that all major, popular Python logging libraries handle that use case out of the box, whereas fluent-logger-python takes users by surprise by not supporting it and requires them to read through issues for clues about how to make it work. That may seriously hinder adoption.
Examples of existing support (proof of an unwritten convention :)): actually, I found it in every library I thought of.

Referring to your concerns:

  • It places the burden of keeping track of Python logging's reserved attributes on the library, as those attributes may change in the future. While that's a valid point, the reality is that those attributes haven't changed for at least the past 5 years. You can see that for yourself in Sentry's source code: https://github.com/getsentry/raven-python/blame/master/raven/handlers/logging.py#L22-25
  • The library becomes opinionated about where to put those "extra" attributes. Yes it does, but it already makes other assumptions. In my personal opinion, convention over configuration is a good thing, especially for the end user: the library works out of the box with sane defaults. We may of course add a way to configure this behavior if you think it's important.

Personally, I see three options:

  1. Ignore the issue.
  2. Discuss how to support the use case out of the box, based on the submitted PR.
  3. Add a configuration example (no out-of-the-box support, but at least an easy-to-find recipe).

@arcivanov (Member) commented Jan 23, 2019

Thanks for your feedback. I still disagree.

Let's consider two scenarios:

  1. Where things are right now
  2. Where you want things to be.

Right now, people who need the extra field functionality as it's designed to be used in Python, i.e. to augment log entries with data that is mostly not specified directly at the logging call, already have their own methodologies for dealing with the extra fields. Some may fold them directly into the record fields, some may want them in an extra field (like you), some may derive other values from them, etc.

Let's look at the legislative intent, so to speak, of the Python logger in general and the fluentd logger in particular.

In the Python logger, extra fields are flat, folded in as fields of the LogRecord. There is no LogRecord.extra; they fall straight into the LogRecord's __dict__: https://github.com/python/cpython/blob/master/Lib/logging/__init__.py#L1553
They are then picked up as necessary by the formatter/handler.

In the fluentd logger you can log a dictionary, a message, or JSON encoded in a string; you can specify a dictionary of fields to log; you can switch to exclude_attrs mode to log all LogRecord attributes except the ones specified; and you can even fill missing keys with None to prevent a KeyError if you're using specialized message formats that refer to fields that may or may not be there!

In fact, this sample code works perfectly fine, logging extra fields happily and preserving both types and structure:

from logging import (root as root_logger,
                     Handler)
from pprint import pprint

from fluent.handler import FluentRecordFormatter


class TestHandler(Handler):
    def emit(self, record):
        pprint(self.format(record))


def main():
    f = FluentRecordFormatter(exclude_attrs=())
    h = TestHandler()
    h.setFormatter(f)
    root_logger.addHandler(h)

    EXTRA = {"x": 1, "y": 0.1, "z": [1, "array", "of", b"basic", "types"]}
    root_logger.error({"Hello": "Dict World with no extra :("})
    root_logger.error({"Hello": "Dict World with extra! :)"},
                      extra=EXTRA)
    root_logger.error("Hello World with no extra :(")
    root_logger.error("Hello World with extra! :)",
                      extra=EXTRA)

producing the following output (less the snippage):

{'Hello': 'Dict World with no extra :(',
 'args': (),
 'created': 1548218954.407479,
 'exc_info': None,
 'exc_text': None,
 'filename': 'fluent_extra.py',
 'funcName': 'main',
 'hostname': '<snip>',
 'levelname': 'ERROR',
 'levelno': 40,
 'lineno': 20,
 'message': "{'Hello': 'Dict World with no extra :('}",
 'module': 'fluent_extra',
 'msecs': 407.47904777526855,
 'msg': {'Hello': 'Dict World with no extra :('},
 'name': 'root',
 'pathname': '/<snip>/src/unittest/python/fluent_extra.py',
 'process': 24084,
 'processName': 'MainProcess',
 'relativeCreated': 11.337995529174805,
 'stack_info': None,
 'thread': 140421174366336,
 'threadName': 'MainThread'}
{'Hello': 'Dict World with extra! :)',
 'args': (),
 'created': 1548218954.4079602,
 'exc_info': None,
 'exc_text': None,
 'filename': 'fluent_extra.py',
 'funcName': 'main',
 'hostname': '<snip>',
 'levelname': 'ERROR',
 'levelno': 40,
 'lineno': 22,
 'message': "{'Hello': 'Dict World with extra! :)'}",
 'module': 'fluent_extra',
 'msecs': 407.9601764678955,
 'msg': {'Hello': 'Dict World with extra! :)'},
 'name': 'root',
 'pathname': '/<snip>/src/unittest/python/fluent_extra.py',
 'process': 24084,
 'processName': 'MainProcess',
 'relativeCreated': 11.819124221801758,
 'stack_info': None,
 'thread': 140421174366336,
 'threadName': 'MainThread',
 'x': 1,
 'y': 0.1,
 'z': [1, 'array', 'of', b'basic', 'types']}
{'args': (),
 'created': 1548218954.4083252,
 'exc_info': None,
 'exc_text': None,
 'filename': 'fluent_extra.py',
 'funcName': 'main',
 'hostname': '<snip>',
 'levelname': 'ERROR',
 'levelno': 40,
 'lineno': 23,
 'message': 'Hello World with no extra :(',
 'module': 'fluent_extra',
 'msecs': 408.3251953125,
 'msg': 'Hello World with no extra :(',
 'name': 'root',
 'pathname': '/<snip>/src/unittest/python/fluent_extra.py',
 'process': 24084,
 'processName': 'MainProcess',
 'relativeCreated': 12.18414306640625,
 'stack_info': None,
 'thread': 140421174366336,
 'threadName': 'MainThread'}
{'args': (),
 'created': 1548218954.4086447,
 'exc_info': None,
 'exc_text': None,
 'filename': 'fluent_extra.py',
 'funcName': 'main',
 'hostname': '<snip>',
 'levelname': 'ERROR',
 'levelno': 40,
 'lineno': 25,
 'message': 'Hello World with extra! :)',
 'module': 'fluent_extra',
 'msecs': 408.6446762084961,
 'msg': 'Hello World with extra! :)',
 'name': 'root',
 'pathname': '/<snip>/src/unittest/python/fluent_extra.py',
 'process': 24084,
 'processName': 'MainProcess',
 'relativeCreated': 12.503623962402344,
 'stack_info': None,
 'thread': 140421174366336,
 'threadName': 'MainThread',
 'x': 1,
 'y': 0.1,
 'z': [1, 'array', 'of', b'basic', 'types']}

Now, you propose to add tons of specialized handling into fluent-logger that:

  1. Always logs extras if specified.
  2. Always puts them into data["extra"], against the intent and expectation of LogRecord to overlay the extra fields onto the record.
  3. Hardcodes behavior as to the types of extra values etc.

If we were to add this feature, every user who has ever used extra according to their needs would suddenly find an extra dictionary entry on their records out of the blue. This violates POLA to begin with, but it also potentially exposes sensitive data that might otherwise have been safe because it would have been parsed/masked during processing in a Handler/Formatter/LoggerAdapter, and it increases data traffic and storage for every user who uses extra the way Python actually designed the feature to be used predominantly: as a contextual augmentation of records where the logger is not aware of the context.

You have to configure a FluentFormatter with FluentHandler anyway, no matter what. Allowing the current FluentFormatter to log arbitrary extra fields is trivial with existing code, as demonstrated above. Thus modifying the default behavior with this task-specific hardcoded behavior seems arbitrary, if not outright harmful to existing users.
