New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Structured logging #1041
Structured logging #1041
Conversation
This is an example how the resulting log output looks like: https://paste.fedoraproject.org/paste/qQhmZY~BLOMzpl9BWgQCgF5M1UNdIGYhyRLivL9gydE= |
jenkins, test this please |
f4031ee
to
05feee5
Compare
jenkins, test this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except that one idea for improvement it looks good to me. Good work!
pyanaconda/anaconda_log.py
Outdated
record.log_prefix = "misc" | ||
else: | ||
# drop "anaconda." from the log prefix | ||
record.log_prefix = re.sub(REMOVE_ANACONDA_PREFIX_RE, "", record.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend you to change re.
solution to startswith
. It is better for performance which is good for logging.
I would go for something like:
if record.name.startswith("anaconda."):
record.log_prefix = record.name[9:]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea - and even though using a constant like this is slightly terrifying, this part of the code is indeed potentially quite performance sensitive & we are not likely to change the number any time soon. :)
05feee5
to
20f2aa3
Compare
😀 CC @dcavalca125 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, just some suggestions :-)
pyanaconda/anaconda_log.py
Outdated
@@ -130,6 +137,25 @@ class AnacondaFileHandler(_AnacondaLogFixer, logging.FileHandler): | |||
class AnacondaStreamHandler(_AnacondaLogFixer, logging.StreamHandler): | |||
pass | |||
|
|||
class AanacondaPrefixFilter(logging.Filter): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A typo Aa... :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks! :)
pyanaconda/kickstart.py
Outdated
stderrLog = logging.getLogger("anaconda.stderr") | ||
storage_log = logging.getLogger("blivet") | ||
stdoutLog = logging.getLogger("anaconda.stdout") | ||
from pyanaconda import anaconda_log | ||
|
||
# command specififc loggers | ||
authconfig_log = logging.getLogger("anaconda.kickstart.authconfig") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really necessary to create a new logger for every kickstart command? Isn't there a way, how to add these prefixes, when we log the messages? We could use log.debug("Message about a command xyz", extra={"prefix": "xyz"})
or modify the logger to be able to do something like: log.debug_with_prefix("xyz", "Message about a command xyz")
Example with extra arguments is here: https://docs.python.org/3/library/logging.html#logging.debug
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or create the prefix automatically. Something like use class.__name__
or similar.
Or implement create new "base class" with log_prefix
variable which will be inherited by these kickstart classes and they just fill the log_prefix
variable, everything else will be done inside of the "base class".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good - and it indeed looks like our kickstart command subclasses look like a good use case for the class instance log (eq. just call self.log.info() and the appropriate prefix will be added automatically based on the class). But as we subclass the kickstart commands in this case, it will probably have to a mixin that adds the logging functionality to the given kickstart command class.
BTW, a potential show-stopper - looks like the instantiated kickstart command classes for some reason get serialized when a storage snapshot is being created and I've seen it die when it tried to serialize a logger instance (eq. self.log = getLogger("kickstart.timezone") was enough to make it crash during serialization). It might not be an issue if self.log is is a proper property, or the serialization could be fixed to skip the kickstart command instances, but it's certainly something to watch for.
For that reason I would like to be careful and merge this method first (statically defined loggers on top of file) before investigating using the class instance loggers as there is less chance of immediate breakage like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the statically defined loggers on top of a file per module, but I don't like the loggers for kickstart commands, because I think we should treat them the same way as the class loggers (in the future). Then the self.log
might be a way to go. Good point about the serialization ;-)
pyanaconda/addons.py
Outdated
@@ -27,7 +27,7 @@ | |||
from pyanaconda.progress import progress_message | |||
from pyanaconda.i18n import N_ | |||
|
|||
log = logging.getLogger("anaconda") | |||
log = logging.getLogger("anaconda.addons") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you experimented with %(module)s
?
https://docs.python.org/3/library/logging.html#logrecord-attributes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about just basing the prefix on folder structure (which I think is basically what the module property contains), but then decided against that, for a couple reasons:
- the prefix would change if we move the file around, making it less stable
- manually specifying the prefix is simple (a single line per module) and provides greater flexibility
- related modules can be grouped together with a single prefix without needlessly exposing filesystem details (eq. all non spoke & hub modules can just live in the ui.gui.* namespace while spokes log to ui.gui.spoke.* and hubs log to ui.gui.hub.*)
- as the prefix is not based on folder structure a more meaningful & better readable prefix can be used
ui.gui.spoke.network
instead ofui.gui.spokes.network
- the log messages come from the network spoke, not one of the network spokesui.gui.spoke.custom_storage_config
instead ofui.gui.spokes.custom
- even though anaconda developers, contributors and the QA community probably know that "custom spoke" is actually the "custom storage configuration spoke", not everyone might know that, especially when reading a log message prefix
- it should be pretty easy to track message origin by grepping for the prefix in the Anaconda source code (the module based approach would have a similar benefit though)
- there might actually be more than one logger in a file, if the functionality contained in it is disparate enough (as for example in the kickstart module)
In short - I want the log prefix to understandably describe what is being logged and to be reasonably stable over time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have done some research as well and I agree.
@@ -32,6 +32,10 @@ | |||
STDOUT_FORMAT = "%(asctime)s %(message)s" | |||
DATE_FORMAT = "%H:%M:%S" | |||
|
|||
# the Anaconda log uses structured logging | |||
ANACONDA_ENTRY_FORMAT = "%(asctime)s,%(msecs)03d %(levelname)s %(log_prefix)s: %(message)s" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't we have a boot option inst.debug
, that would enable more detailed structured messages?
It looks like there are a lot of attributes we could log:
https://docs.python.org/3/library/logging.html#logrecord-attributes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, there is a lot of stuff to log - and especially stuff like path to file where the log message originated from or the line number could certainly be useful for debugging. But at the same time I'm not sure how readable the log file would be in such a case.
So I'm thinking about two things that could be done:
- we can do a detailed/verbose (
anaconda-verbose.log
, etc.) log that is created if inst.debug=1 is used, which h could contain path to file, function name and line number- like this we still have a readable
anaconda.log
for general debugging and the new detailed log for detailed debugging
- like this we still have a readable
- if we switched from using the SyslongHandler to actual Journal handler for forwarding of Anaconda log messages to journal, all the extra data fields should end up in Journal as well
- we might need to change our log grabbing procedures to either grab the full journal or some expert that has the extra fields
- we might also want to investigate how journalctl can present the extra fields in it's output, as just displaying all of them at once would make the log mostly unreadable
- I guess that the extra data fields can be useful for kickstart tests and other forms of testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we should at least investigate the second option, but we can do that in a different issue :-)
# messages going to the generic "anaconda" logger get the log prefix "misc" | ||
if record.name == "anaconda": | ||
record.log_prefix = "misc" | ||
elif record.name.startswith("anaconda."): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have looked at the logs and I miss the anaconda
prefix a little :-) But I can get used to it.
20f2aa3
to
0eda6db
Compare
fc2fc2e
to
b2e77d9
Compare
b2e77d9
to
c0f2136
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still looks good to me ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks great to me. I like the new names of modules. Just some questions :-)
pyanaconda/anaconda.py
Outdated
stdoutLog = logging.getLogger("anaconda.stdout") | ||
|
||
from pyanaconda.ilog import get_logger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the submodule called ilog
? I know we had problems with name collisions between anaconda submodules and python modules, but that should be fixed now. Same for ihelp
, iutil
and isignal
. Is there another reason for these names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's all due to naming collisions as far as I can tell (the i likely means Installer):
- ilog - we mostly call the logger we use "log", so a module name should not collide with it
- ihelp - there is built-in Python function called "help" and we should not shadow that
- isys - there is the standard library "sys" module and isys was probably originally meant as an installer specific extension of the sys module
- iutil - could be that there used to be a util module somewhere in the path so iutil was used to avoid naming collisions
- isignal - there is a module called signal in the Python standard library
So it's basically a sort of a custom - add in in front of stuff that would shadow existing modules and built-in functions. But that does not necessarily mean it's correct and could not be improved (using a different name that does not collide, using installer_ prefix instead of just i, etc.).
Still I think that should probably be (with the possible exception of the ilog module) a separate pull request as we already change quite a bit in this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks for the explanation!
@@ -34,6 +34,7 @@ | |||
from uuid import uuid4 | |||
import itertools | |||
import glob | |||
import logging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed in the function setup_ifcfg_log
. Is the function fine as it is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be - it sets-up its own separate logger & log file for ficfg related logging, so I don't really see a need to change it.
pyanaconda/startup_utils.py
Outdated
@@ -20,9 +20,11 @@ | |||
from pyanaconda.i18n import _ | |||
|
|||
import logging | |||
log = logging.getLogger("anaconda") | |||
stdout_log = logging.getLogger("anaconda.stdout") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it a little confusing that we combine logging
with anaconda.ilog
?
pyanaconda/ilog.py
Outdated
|
||
import logging | ||
|
||
def get_logger(module_name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that this could be done also in the __init__
method of the custom Logger
class, that could be set as default with logging.setLoggerClass
. Then we could get loggers as:
import logging
log = logging.getLogger(__name__)
Have you considered that?
@@ -47,7 +47,9 @@ | |||
from pykickstart.constants import AUTOPART_TYPE_LVM, AUTOPART_TYPE_LVM_THINP | |||
|
|||
import logging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have all the logging logic in one file, so we wouldn't need to import the logging
anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be done now - the anaconda_loggers module now has getters for the module loggers and for all the specialized loggers.
a329d0c
to
65349c0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The anaconda_logging
and anaconda_loggers
modules should be changed.
from pyanaconda import anaconda_log | ||
anaconda_log.init() | ||
anaconda_log.logger.setupVirtio() | ||
from pyanaconda import anaconda_logging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the anaconda_logging
doesn't provide us loggers?
I have a bad feeling for typos in anaconda_logging
and anaconda_loggers
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check the header of the anaconda_loggers
module[0] - it's to avoid circular imports. The anaconda_logging
module imports the flags
module and the flags
module also does logging, so it would have to import the get_module_logger()
function from the anaconda_logging
module, creating an import circle.
It's likely there could be other such circles, especially once we account for all the specialized loggers the anaconda_loggers
module now provides getters for.
[0] https://github.com/rhinstaller/anaconda/pull/1041/files#diff-5c8919b8315eecb9a03a32af4f2bc02eR20
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was curious what we log in flags.py
and maybe the function can_touch_runtime_system
could be moved somewhere else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds promising - but that's just the firs module where the import cycle blew up. So I'm not sure there are not other places - or how easy it is to create a new cycle by incident.
Still certainly worth investigating in the future though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no more comments. This solution looks very nice. Thank you!
PR for Blivet to account for the anaconda_log module name change: |
pyanaconda/ui/gui/spokes/network.py
Outdated
@@ -721,7 +721,7 @@ def selected_dev_cfg(self): | |||
return model[itr][DEVICES_COLUMN_OBJECT] | |||
|
|||
def add_dev_cfg(self, dev_cfg): | |||
log.debug("network: GUI, device configuration added: connection %s device %s", | |||
log.debug("GUI, device configuration added: connection %s device %s", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can remove also "GUI, " here
pyanaconda/ui/gui/spokes/network.py
Outdated
@@ -734,14 +734,14 @@ def add_device_to_list(self, device): | |||
if device.get_device_type() not in self.supported_device_types: | |||
return | |||
if network.is_libvirt_device(device.get_iface()): | |||
log.debug("network: GUI, not adding %s", device.get_iface()) | |||
log.debug("GUI, not adding %s", device.get_iface()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can remove also "GUI, " here
pyanaconda/ui/gui/spokes/network.py
Outdated
return | ||
# ignore fcoe vlan devices | ||
# (can be chopped off to IFNAMSIZ kernel limit) | ||
if device.get_iface().endswith(('-fcoe', '-fco', '-fc', '-f', '-')): | ||
return | ||
if network.is_ibft_configured_device(device.get_iface() or ""): | ||
log.debug("network: GUI, not adding connection for device %s configured from iBFT", device.get_iface()) | ||
log.debug("GUI, not adding connection for device %s configured from iBFT", device.get_iface()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can remove also "GUI, " here
pyanaconda/ui/gui/spokes/network.py
Outdated
@@ -752,11 +752,11 @@ def add_device_to_list(self, device): | |||
if rc: | |||
con_setting = rc.get_setting_connection() | |||
if con_setting and con_setting.get_read_only(): | |||
log.debug("network: GUI, not adding read-only connection " | |||
log.debug("GUI, not adding read-only connection " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can remove also "GUI, " here
pyanaconda/ui/gui/spokes/network.py
Outdated
"(assuming iBFT) for device %s", device.get_iface()) | ||
return | ||
else: | ||
log.debug("network: GUI, can't get remote connection of active connection " | ||
log.debug("GUI, can't get remote connection of active connection " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can remove also "GUI, " here
pyanaconda/ui/gui/spokes/network.py
Outdated
@@ -766,9 +766,9 @@ def add_device_to_list(self, device): | |||
cons = device.get_available_connections() | |||
ifcfg_uuid = None | |||
if not cons: | |||
log.debug("network: GUI, no connection when adding device %s", device.get_iface()) | |||
log.debug("GUI, no connection when adding device %s", device.get_iface()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can remove also "GUI, " here
pyanaconda/ui/gui/spokes/network.py
Outdated
if len(cons) > 1: | ||
log.debug("network: GUI, %s has multiple connections: %s", | ||
log.debug("GUI, %s has multiple connections: %s", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can remove also "GUI, " here
pyanaconda/ui/gui/spokes/network.py
Outdated
@@ -875,7 +875,7 @@ def dev_cfg(self, uuid=None, device=None, iface=None): | |||
def remove_device(self, device): | |||
# This should not concern wifi and ethernet devices, | |||
# just virtual devices e.g. vpn probably | |||
log.debug("network: GUI, device removed: %s", device.get_iface()) | |||
log.debug("GUI, device removed: %s", device.get_iface()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can remove also "GUI, " here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, looking forward to read new logs. I just noticed that for some log message updates in gui/spokes/network.py you can remove also "GUI, " prefix.
Add AnacondaPrefixFilter - basically a pass-through filter (never discards any log records), that sets the log_prefix attribute for log records. The logic is simple: - if the name attribute starts with "anaconda.", it drops the "anaconda." prefix and sets log_prefix to the remainder. So for example name "anaconda.geoloc" gets a log prefix "geoloc" and "anaconda.payload.rpmostree" gets a log prefix "payload.rpmostree". If the log record name is just "anaconda" the record is considered to be something targeting Anaconda in general, so the log_prefix attribute os set to "misc" for miscellaneous.
Make it possible to add custom filter and formatter for the syslog handler. Also skip syslog message modification if a custom formatter is in use.
Make it possible to set filters for file handlers added by Anaconda.
Add appropriate formatters and filters to the file and syslog handlers of the anaconda logger, so that structured logging works as expected: logger -> log message prefix anaconda.geoloc -> geoloc: <message> anaconda.ui.gui.spokes.custom -> ui.gui.spokes.custom: <message> anaconda.blivet.autopart -> blivet.autopart: <message> And if the anaconda logger is used directly: anaconda -> misc: <message>
This makes the module name more consistent with what the module is (Anaconda specififc extensions to the Python logging module) and prepares place for the related anaconda_loggers module in the pyanaconda namespace.
The get_module_logger() method returns "anaconda" sub-loggers based on the provided module name. For example the module name: __name__ == "pyanaconda.payload.tarpayload" Creates a logger instance named: "anaconda.payload.tarpayload" The other get_logger_*() methods provide access various other special purpose loggers, such as the stdout/stderr log, the storage log or the packaging log.
Use constants for special purpose logger names in the anaconda_logging module.
Assign unique 3 letter log level names to improve readability of the log files. Previously the log level names had variable length, making the log files less readable due to log messages beginning on a variable offset on lines in the log file. With the fixed with name log messages start all at the same offset from the start of the line and the lines might also be slightly shorter, so more text might fit on the screen when viewing the logs during installation, especially on low resolution screens.
Use the get_logger() function to automatically provided structured sub-loggers for Anaconda modules based on module hierarchy. The logger naming aims to both make the logs easy to read and understand while keeping the log prefixes reasonably stable over time. Also use structured logging for individual kickstart commands, so that we know form what kickstart command or kickstart processing phase a given message comes. This also makes it possible to easily check for log messages coming from kickstart command processing by grepping for "kickstart\." in the log. And finally remove some redundant logging prefixes, which are no longer needed with structured logging. Structured logging should also be much less error prone than having to manually specify a prefix for each log message.
Use the get_*_logger() methods provided by the anaconda_loggers module when acquiring special purpose loggers.
Structured log message prefix is based on module path, so renaming threads to threading should improve log message readability.
This should make it more evident that log messages come from an "installation" module, rather then that we are telling people to "install" something.
Same rationale as for the install.py -> installation.py renaming.
This should better describe what the module is doing.
Rename some GUI spokes to give them better understandable names: filter -> advanced_storage (probably should have been done long ago) custom -> custom_storage source -> installation_source langsupport -> language_support password -> root_password software -> software_selection
Rename some TUI spokes to give them better understandable names: progress -> installation_progress source -> installation_source langsupport -> language_support password -> root_password software -> software_selection
No longer needed with structured logging as we now get a similar prefix automatically.
65349c0
to
86043c9
Compare
Needed due to the anaconda_log module rename.
Hey, two questions: can you tune this up using cmdline arguments or kickstart config statements? Are the docs updated? the pastebin in the comments expired, can you post a permanent one? :) Thanks! |
And kickstart: Do you have some other use cases that are not cover yet by these options/commands ?
Will do! :) |
anaconda.log with structured logging from latest Rawhide: |
This pull request adds support for structured logging to Anaconda and coverts all modules to use it. You can check the individual commit messages for how it works and and for examples.
Also as this PS is quite substantially changing how the logs look like (due to replacing the always present "anaconda" prefix with the structured logging prefixes), it is currently targeting master only.