Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Structured logging #1041

Merged
merged 18 commits into from Jun 1, 2017
Merged

Conversation

M4rtinK
Copy link
Contributor

@M4rtinK M4rtinK commented Apr 25, 2017

This pull request adds support for structured logging to Anaconda and coverts all modules to use it. You can check the individual commit messages for how it works and and for examples.

Also as this PS is quite substantially changing how the logs look like (due to replacing the always present "anaconda" prefix with the structured logging prefixes), it is currently targeting master only.

@M4rtinK M4rtinK added the master Please, use the `f39` label instead. label Apr 25, 2017
@M4rtinK
Copy link
Contributor Author

M4rtinK commented Apr 25, 2017

This is an example how the resulting log output looks like: https://paste.fedoraproject.org/paste/qQhmZY~BLOMzpl9BWgQCgF5M1UNdIGYhyRLivL9gydE=

@M4rtinK M4rtinK changed the title Master structured logging plus Structured logging Apr 25, 2017
@jkonecny12
Copy link
Member

jenkins, test this please

@M4rtinK M4rtinK force-pushed the master-structured_logging_plus branch from f4031ee to 05feee5 Compare April 26, 2017 10:51
@M4rtinK
Copy link
Contributor Author

M4rtinK commented Apr 26, 2017

jenkins, test this please

Copy link
Member

@jkonecny12 jkonecny12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except that one idea for improvement it looks good to me. Good work!

record.log_prefix = "misc"
else:
# drop "anaconda." from the log prefix
record.log_prefix = re.sub(REMOVE_ANACONDA_PREFIX_RE, "", record.name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend you to change re. solution to startswith. It is better for performance which is good for logging.
I would go for something like:

if record.name.startswith("anaconda."):
    record.log_prefix = record.name[9:]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea - and even though using a constant like this is slightly terrifying, this part of the code is indeed potentially quite performance sensitive & we are not likely to change the number any time soon. :)

@M4rtinK M4rtinK force-pushed the master-structured_logging_plus branch from 05feee5 to 20f2aa3 Compare April 26, 2017 15:36
@jaymzh
Copy link
Contributor

jaymzh commented Apr 27, 2017

:feelsgood: 😀

CC @dcavalca125

Copy link
Contributor

@poncovka poncovka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, just some suggestions :-)

@@ -130,6 +137,25 @@ class AnacondaFileHandler(_AnacondaLogFixer, logging.FileHandler):
class AnacondaStreamHandler(_AnacondaLogFixer, logging.StreamHandler):
pass

class AanacondaPrefixFilter(logging.Filter):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A typo Aa... :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks! :)

stderrLog = logging.getLogger("anaconda.stderr")
storage_log = logging.getLogger("blivet")
stdoutLog = logging.getLogger("anaconda.stdout")
from pyanaconda import anaconda_log

# command specififc loggers
authconfig_log = logging.getLogger("anaconda.kickstart.authconfig")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really necessary to create a new logger for every kickstart command? Isn't there a way, how to add these prefixes, when we log the messages? We could use log.debug("Message about a command xyz", extra={"prefix": "xyz"}) or modify the logger to be able to do something like: log.debug_with_prefix("xyz", "Message about a command xyz")

Example with extra arguments is here: https://docs.python.org/3/library/logging.html#logging.debug

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or create the prefix automatically. Something like use class.__name__ or similar.
Or implement create new "base class" with log_prefix variable which will be inherited by these kickstart classes and they just fill the log_prefix variable, everything else will be done inside of the "base class".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good - and it indeed looks like our kickstart command subclasses look like a good use case for the class instance log (eq. just call self.log.info() and the appropriate prefix will be added automatically based on the class). But as we subclass the kickstart commands in this case, it will probably have to a mixin that adds the logging functionality to the given kickstart command class.

BTW, a potential show-stopper - looks like the instantiated kickstart command classes for some reason get serialized when a storage snapshot is being created and I've seen it die when it tried to serialize a logger instance (eq. self.log = getLogger("kickstart.timezone") was enough to make it crash during serialization). It might not be an issue if self.log is is a proper property, or the serialization could be fixed to skip the kickstart command instances, but it's certainly something to watch for.

For that reason I would like to be careful and merge this method first (statically defined loggers on top of file) before investigating using the class instance loggers as there is less chance of immediate breakage like this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the statically defined loggers on top of a file per module, but I don't like the loggers for kickstart commands, because I think we should treat them the same way as the class loggers (in the future). Then the self.log might be a way to go. Good point about the serialization ;-)

@@ -27,7 +27,7 @@
from pyanaconda.progress import progress_message
from pyanaconda.i18n import N_

log = logging.getLogger("anaconda")
log = logging.getLogger("anaconda.addons")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about just basing the prefix on folder structure (which I think is basically what the module property contains), but then decided against that, for a couple reasons:

  • the prefix would change if we move the file around, making it less stable
  • manually specifying the prefix is simple (a single line per module) and provides greater flexibility
    • related modules can be grouped together with a single prefix without needlessly exposing filesystem details (eq. all non spoke & hub modules can just live in the ui.gui.* namespace while spokes log to ui.gui.spoke.* and hubs log to ui.gui.hub.*)
    • as the prefix is not based on folder structure a more meaningful & better readable prefix can be used
      • ui.gui.spoke.network instead of ui.gui.spokes.network - the log messages come from the network spoke, not one of the network spokes
      • ui.gui.spoke.custom_storage_config instead of ui.gui.spokes.custom - even though anaconda developers, contributors and the QA community probably know that "custom spoke" is actually the "custom storage configuration spoke", not everyone might know that, especially when reading a log message prefix
  • it should be pretty easy to track message origin by grepping for the prefix in the Anaconda source code (the module based approach would have a similar benefit though)
  • there might actually be more than one logger in a file, if the functionality contained in it is disparate enough (as for example in the kickstart module)

In short - I want the log prefix to understandably describe what is being logged and to be reasonably stable over time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have done some research as well and I agree.

@@ -32,6 +32,10 @@
STDOUT_FORMAT = "%(asctime)s %(message)s"
DATE_FORMAT = "%H:%M:%S"

# the Anaconda log uses structured logging
ANACONDA_ENTRY_FORMAT = "%(asctime)s,%(msecs)03d %(levelname)s %(log_prefix)s: %(message)s"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't we have a boot option inst.debug, that would enable more detailed structured messages?
It looks like there are a lot of attributes we could log:
https://docs.python.org/3/library/logging.html#logrecord-attributes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, there is a lot of stuff to log - and especially stuff like path to file where the log message originated from or the line number could certainly be useful for debugging. But at the same time I'm not sure how readable the log file would be in such a case.

So I'm thinking about two things that could be done:

  • we can do a detailed/verbose (anaconda-verbose.log, etc.) log that is created if inst.debug=1 is used, which h could contain path to file, function name and line number
    • like this we still have a readable anaconda.log for general debugging and the new detailed log for detailed debugging
  • if we switched from using the SyslongHandler to actual Journal handler for forwarding of Anaconda log messages to journal, all the extra data fields should end up in Journal as well
    • we might need to change our log grabbing procedures to either grab the full journal or some expert that has the extra fields
    • we might also want to investigate how journalctl can present the extra fields in it's output, as just displaying all of them at once would make the log mostly unreadable
    • I guess that the extra data fields can be useful for kickstart tests and other forms of testing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we should at least investigate the second option, but we can do that in a different issue :-)

# messages going to the generic "anaconda" logger get the log prefix "misc"
if record.name == "anaconda":
record.log_prefix = "misc"
elif record.name.startswith("anaconda."):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have looked at the logs and I miss the anaconda prefix a little :-) But I can get used to it.

@M4rtinK M4rtinK force-pushed the master-structured_logging_plus branch from 20f2aa3 to 0eda6db Compare April 27, 2017 11:23
@M4rtinK M4rtinK force-pushed the master-structured_logging_plus branch 2 times, most recently from fc2fc2e to b2e77d9 Compare May 16, 2017 15:46
@M4rtinK M4rtinK force-pushed the master-structured_logging_plus branch from b2e77d9 to c0f2136 Compare May 26, 2017 14:42
Copy link
Member

@jkonecny12 jkonecny12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still looks good to me ;)

Copy link
Contributor

@poncovka poncovka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks great to me. I like the new names of modules. Just some questions :-)

stdoutLog = logging.getLogger("anaconda.stdout")

from pyanaconda.ilog import get_logger
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the submodule called ilog? I know we had problems with name collisions between anaconda submodules and python modules, but that should be fixed now. Same for ihelp, iutil and isignal. Is there another reason for these names?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's all due to naming collisions as far as I can tell (the i likely means Installer):

  • ilog - we mostly call the logger we use "log", so a module name should not collide with it
  • ihelp - there is built-in Python function called "help" and we should not shadow that
  • isys - there is the standard library "sys" module and isys was probably originally meant as an installer specific extension of the sys module
  • iutil - could be that there used to be a util module somewhere in the path so iutil was used to avoid naming collisions
  • isignal - there is a module called signal in the Python standard library

So it's basically a sort of a custom - add in in front of stuff that would shadow existing modules and built-in functions. But that does not necessarily mean it's correct and could not be improved (using a different name that does not collide, using installer_ prefix instead of just i, etc.).

Still I think that should probably be (with the possible exception of the ilog module) a separate pull request as we already change quite a bit in this one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks for the explanation!

@@ -34,6 +34,7 @@
from uuid import uuid4
import itertools
import glob
import logging
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed in the function setup_ifcfg_log. Is the function fine as it is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be - it sets-up its own separate logger & log file for ficfg related logging, so I don't really see a need to change it.

@@ -20,9 +20,11 @@
from pyanaconda.i18n import _

import logging
log = logging.getLogger("anaconda")
stdout_log = logging.getLogger("anaconda.stdout")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it a little confusing that we combine logging with anaconda.ilog?


import logging

def get_logger(module_name):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that this could be done also in the __init__ method of the custom Logger class, that could be set as default with logging.setLoggerClass. Then we could get loggers as:

import logging
log = logging.getLogger(__name__)

Have you considered that?

@@ -47,7 +47,9 @@
from pykickstart.constants import AUTOPART_TYPE_LVM, AUTOPART_TYPE_LVM_THINP

import logging
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have all the logging logic in one file, so we wouldn't need to import the logging anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be done now - the anaconda_loggers module now has getters for the module loggers and for all the specialized loggers.

@M4rtinK M4rtinK force-pushed the master-structured_logging_plus branch 3 times, most recently from a329d0c to 65349c0 Compare May 31, 2017 10:45
Copy link
Member

@jkonecny12 jkonecny12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The anaconda_logging and anaconda_loggers modules should be changed.

from pyanaconda import anaconda_log
anaconda_log.init()
anaconda_log.logger.setupVirtio()
from pyanaconda import anaconda_logging
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the anaconda_logging doesn't provide us loggers?

I have a bad feeling for typos in anaconda_logging and anaconda_loggers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check the header of the anaconda_loggers module[0] - it's to avoid circular imports. The anaconda_logging module imports the flags module and the flags module also does logging, so it would have to import the get_module_logger() function from the anaconda_logging module, creating an import circle.

It's likely there could be other such circles, especially once we account for all the specialized loggers the anaconda_loggers module now provides getters for.

[0] https://github.com/rhinstaller/anaconda/pull/1041/files#diff-5c8919b8315eecb9a03a32af4f2bc02eR20

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was curious what we log in flags.py and maybe the function can_touch_runtime_system could be moved somewhere else?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds promising - but that's just the firs module where the import cycle blew up. So I'm not sure there are not other places - or how easy it is to create a new cycle by incident.

Still certainly worth investigating in the future though.

Copy link
Contributor

@poncovka poncovka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no more comments. This solution looks very nice. Thank you!

@M4rtinK
Copy link
Contributor Author

M4rtinK commented May 31, 2017

PR for Blivet to account for the anaconda_log module name change:
storaged-project/blivet#598

@@ -721,7 +721,7 @@ def selected_dev_cfg(self):
return model[itr][DEVICES_COLUMN_OBJECT]

def add_dev_cfg(self, dev_cfg):
log.debug("network: GUI, device configuration added: connection %s device %s",
log.debug("GUI, device configuration added: connection %s device %s",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove also "GUI, " here

@@ -734,14 +734,14 @@ def add_device_to_list(self, device):
if device.get_device_type() not in self.supported_device_types:
return
if network.is_libvirt_device(device.get_iface()):
log.debug("network: GUI, not adding %s", device.get_iface())
log.debug("GUI, not adding %s", device.get_iface())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove also "GUI, " here

return
# ignore fcoe vlan devices
# (can be chopped off to IFNAMSIZ kernel limit)
if device.get_iface().endswith(('-fcoe', '-fco', '-fc', '-f', '-')):
return
if network.is_ibft_configured_device(device.get_iface() or ""):
log.debug("network: GUI, not adding connection for device %s configured from iBFT", device.get_iface())
log.debug("GUI, not adding connection for device %s configured from iBFT", device.get_iface())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove also "GUI, " here

@@ -752,11 +752,11 @@ def add_device_to_list(self, device):
if rc:
con_setting = rc.get_setting_connection()
if con_setting and con_setting.get_read_only():
log.debug("network: GUI, not adding read-only connection "
log.debug("GUI, not adding read-only connection "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove also "GUI, " here

"(assuming iBFT) for device %s", device.get_iface())
return
else:
log.debug("network: GUI, can't get remote connection of active connection "
log.debug("GUI, can't get remote connection of active connection "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove also "GUI, " here

@@ -766,9 +766,9 @@ def add_device_to_list(self, device):
cons = device.get_available_connections()
ifcfg_uuid = None
if not cons:
log.debug("network: GUI, no connection when adding device %s", device.get_iface())
log.debug("GUI, no connection when adding device %s", device.get_iface())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove also "GUI, " here

if len(cons) > 1:
log.debug("network: GUI, %s has multiple connections: %s",
log.debug("GUI, %s has multiple connections: %s",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove also "GUI, " here

@@ -875,7 +875,7 @@ def dev_cfg(self, uuid=None, device=None, iface=None):
def remove_device(self, device):
# This should not concern wifi and ethernet devices,
# just virtual devices e.g. vpn probably
log.debug("network: GUI, device removed: %s", device.get_iface())
log.debug("GUI, device removed: %s", device.get_iface())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove also "GUI, " here

Copy link
Contributor

@rvykydal rvykydal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, looking forward to read new logs. I just noticed that for some log message updates in gui/spokes/network.py you can remove also "GUI, " prefix.

M4rtinK added 17 commits June 1, 2017 15:21
Add AnacondaPrefixFilter - basically a pass-through filter (never
discards any log records), that sets the log_prefix attribute for
log records.

The logic is simple: - if the name attribute starts with "anaconda.",
it drops the "anaconda." prefix and sets log_prefix to the remainder.

So for example name "anaconda.geoloc" gets a log prefix "geoloc"
and "anaconda.payload.rpmostree" gets a log prefix "payload.rpmostree".

If the log record name is just "anaconda" the record is considered to
be something targeting Anaconda in general, so the log_prefix
attribute os set to "misc" for miscellaneous.
Make it possible to add custom filter and formatter for the syslog
handler. Also skip syslog message modification if a custom formatter is
in use.
Make it possible to set filters for file handlers added by Anaconda.
Add appropriate formatters and filters to the file and syslog handlers
of the anaconda logger, so that structured logging works as expected:

logger -> log message prefix

anaconda.geoloc -> geoloc: <message>
anaconda.ui.gui.spokes.custom -> ui.gui.spokes.custom: <message>
anaconda.blivet.autopart -> blivet.autopart: <message>

And if the anaconda logger is used directly:

anaconda -> misc: <message>
This makes the module name more consistent with what the module is
(Anaconda specififc extensions to the Python logging module) and
prepares place for the related anaconda_loggers module in the pyanaconda
namespace.
The get_module_logger() method returns "anaconda" sub-loggers based
on the provided module name.

For example the module name:

__name__ == "pyanaconda.payload.tarpayload"

Creates a logger instance named:

"anaconda.payload.tarpayload"

The other get_logger_*() methods provide access various
other special purpose loggers, such as the stdout/stderr log,
the storage log or the packaging log.
Use constants for special purpose logger names in the
anaconda_logging module.
Assign unique 3 letter log level names to improve
readability of the log files.

Previously the log level names had variable length,
making the log files less readable due to log messages
beginning on a variable offset on lines in the log file.

With the fixed with name log messages start all at the same
offset from the start of the line and the lines might also be slightly shorter,
so more text might fit on the screen when viewing the logs during installation,
especially on low resolution screens.
Use the get_logger() function to automatically provided structured
sub-loggers for Anaconda modules based on module hierarchy.

The logger naming aims to both make the logs easy to read and understand
while keeping the log prefixes reasonably stable over time.

Also use structured logging for individual kickstart commands,
so that we know form what kickstart command or kickstart processing
phase a given message comes.

This also makes it possible to easily check for log messages coming
from kickstart command processing by grepping for "kickstart\." in the log.

And finally remove some redundant logging prefixes, which are no longer needed
with structured logging. Structured logging should also be much less error
prone than having to manually specify a prefix for each log message.
Use the get_*_logger() methods provided by the anaconda_loggers
module when acquiring special purpose loggers.
Structured log message prefix is based on module path,
so renaming threads to threading should improve log message readability.
This should make it more evident that log messages come
from an "installation" module, rather then that we are
telling people to "install" something.
Same rationale as for the install.py -> installation.py renaming.
This should better describe what the module is doing.
Rename some GUI spokes to give them better understandable names:

filter -> advanced_storage (probably should have been done long ago)
custom -> custom_storage
source -> installation_source
langsupport -> language_support
password -> root_password
software -> software_selection
Rename some TUI spokes to give them better understandable names:

progress -> installation_progress
source -> installation_source
langsupport -> language_support
password -> root_password
software -> software_selection
No longer needed with structured logging as we now get a similar
prefix automatically.
@M4rtinK M4rtinK force-pushed the master-structured_logging_plus branch from 65349c0 to 86043c9 Compare June 1, 2017 13:23
Needed due to the anaconda_log module rename.
@M4rtinK M4rtinK merged commit 7587684 into rhinstaller:master Jun 1, 2017
@pallotron
Copy link
Contributor

Hey, two questions: can you tune this up using cmdline arguments or kickstart config statements? Are the docs updated?

the pastebin in the comments expired, can you post a permanent one? :)

Thanks!

@M4rtinK
Copy link
Contributor Author

M4rtinK commented Jul 11, 2017

Hey, two questions: can you tune this up using cmdline arguments or kickstart config statements? Are the docs updated?
There are some existing options for configuring logging via boot options:
inst.loglevel - can set log level
inst.syslog - logging to a remote syslog
inst.virtiolog - virtio logging
inst.nosave - can be used to set if logs are saved to the target system

And kickstart:
The logging command can be used to set log level & configure remote logging.

Do you have some other use cases that are not cover yet by these options/commands ?

the pastebin in the comments expired, can you post a permanent one? :)

Thanks!

Will do! :)

@M4rtinK
Copy link
Contributor Author

M4rtinK commented Jul 11, 2017

anaconda.log with structured logging from latest Rawhide:
https://gist.github.com/M4rtinK/1bfa0a5913a404eb42cb78c5c79b57d8

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
master Please, use the `f39` label instead.
6 participants