Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to enable logging automatically #3446

Closed
noklam opened this issue Dec 19, 2023 · 16 comments
Closed

How to enable logging automatically #3446

noklam opened this issue Dec 19, 2023 · 16 comments
Labels
Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation

Comments

@noklam
Copy link
Contributor

noklam commented Dec 19, 2023

Introduction

Previous discussion:

In previous discussion we banned the idea of introducing .env to enable logging, we introduce a new environment variable KEDRO_LOGGING_CONFIG that user need to set in their environment (maybe in terminal or settings.py).

With the new kedro new flow, when logging is selected, it is not automatically used until user set KEDRO_LOGGING_CONFIG manually.

Context

(Edited: removed discussion above .env since it's off topic and confused people)

One more problem of KEDRO_LOGGING_CONFIG is that it's not immediately obvious where should user put this environment. There are many options:

  1. ~/.bashrc or it's variant, this way the environment variable is loaded in the beginning
  2. putting it in one of the python file (which one? settings.py is too late)
@astrojuanlu
Copy link
Member

Not sure about this. There's a myriad mechanisms to manage environment variables, for example:

$ cat .rtx.toml
[tools]
python = {version="3.10.11", virtualenv=".venv"}

[env]
GOOGLE_APPLICATION_CREDENTIALS = 'kedro-pypi-stats-dde2342d3c96.json'

I'm not sure what makes .env special. On one hand we could say "this is what we support", but what if folks are using something else? Also, what if there's a clash between Kedro loading .env and then direnv reading .envrc?

I think we should stay away from being too smart about the user environment, and let them manage it.

The core issue is:

With the new kedro new flow, when logging is selected, it is not automatically used until user set KEDRO_LOGGING_CONFIG manually.

which is a real user problem. But maybe we need to explore other solutions.

@noklam noklam changed the title Use of .env and enable logging automatically How to enable logging automatically Jan 16, 2024
@noklam
Copy link
Contributor Author

noklam commented Jan 16, 2024

Updated the title to focus more on the logging problem.

@noklam

This comment was marked as off-topic.

@noklam
Copy link
Contributor Author

noklam commented Jan 29, 2024

Supplement on this, this will be a breaking change - because the config will no longer accept a full path but only the name of the files (or only relative to conf_source)

@DimedS
Copy link
Contributor

DimedS commented Feb 1, 2024

Why is this environment variable necessary? Can we instead observe the project directory for a 'logging.yml' file and use it if found, or default to standard logging options if not?
From what I understand, the environment variable might also introduce a security issue we're currently unable to address, as outlined here: (#3474)

@astrojuanlu
Copy link
Member

astrojuanlu commented Feb 9, 2024

I think we should consider @noklam's #3591 seriously, abstain from fiddling with user-specified logging settings (hence avoid #3474 instead of ignoring it), and if the user wants to provide a file-based logging config, they can always do so in their own code and place it wherever they like. It's clear that logging is an ongoing pain (almost 2 years since #1470 was opened), and we haven't yet fully solved it.

@noklam
Copy link
Contributor Author

noklam commented Feb 19, 2024

@DimedS IIRC it's because of two conflicting features:

  1. ConfigLoader is able to load from conf_source which can be a zip file (default of kedro package)
  2. Logging is decoupled from ConfigLoader

This means that we need to read logging without configloader, and it's very common to put logging.yml in conf folder even it's not used by the config loader. I agree that should be the default and the environment variable should be a workaround, thus I suggest to load this automatically with our best effort.

Cc @merelcht

@noklam noklam added this to the Something about logging milestone Feb 26, 2024
@noklam noklam added the Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation label Feb 26, 2024
@noklam
Copy link
Contributor Author

noklam commented Feb 26, 2024

This issue is partially overlap to #3591

  • Automatic enable logging.yml and kedro run -v is slightly overlap
  • logging.yml affect beyonds kedro run, do we need to have -v for all kedro command?
  • The default of the logging.yml is more independent, as I am challenging the idea of having too many LEVEL default there. This video seems to support that everything should propagate to root unless necessary. Only for rare case you need non-root level logger filtering. (info-handler)

I am particularly interested in your opinion about this @antonymilne.

@astrojuanlu
Copy link
Member

Some extra reading material ahead of a TD session:

@noklam noklam mentioned this issue Feb 27, 2024
7 tasks
@antonymilne
Copy link
Contributor

I'm a bit out of the loop here and will talk to @noklam to share my thoughts on it all in more detail, but some rough points:

  • the current (I think?) environment variable solution was added after a lot of thought and consideration with @idanov. You should definitely pick his brains on the topic too since he will definitely have some useful things to say here
  • the kedro handling of logging is indeed a direct departure from the 12 factor app guidance. I believe this was a conscious decision made in the early days of kedro since users wouldn't know how to e.g. show INFO level messages or direct their logs to a file easily. Hence the logging functionality was built into kedro out of the box for better UX. Possibly expertise of users and logging tools has changed since then, but the handling of logging in kedro has always been a matter of "practicality beats purity". 12FA is useful guidance but not gospel on kedro, e.g. our handling of run environments does exactly what 12FA says not to do
  • personally I don't care much about the file-based logging, but being able to expose INFO level messages by default I think is important. There is an annoying conflict here where the default Python log level of WARNING contradicts with the fact that it's assumed to be good UX to update the user with INFO on progress as their pipeline runs. Of course this assumption could be questioned with some user research since AFAIK we've always just taken it for granted. Remember that kedro pipelines can take a loooong time, and unless there's some other way of demonstrating progress (e.g. a progress bar, which we did also try in a hackathon and I liked) then I feel like kedro should display INFO messages by default. So maybe a better way to handle it would instead be a --quiet option that changes the level from INFO to WARNING?
  • generally agree that kedro should interfere as little as possible with Python's default logging system, e.g. by delegating as much as possible to the root logger. But with the caveat that again this is a question of purity vs. practicality, and Python default's logging level of WARNING doesn't seem suitable on kedro, and the default format of log messages is also pretty gross.

@astrojuanlu
Copy link
Member

Quick 2 cents

  • +1 on setting the default level of the Kedro logger to INFO (if just for backwards compatibility), and then having -q set it to WARNING and -qq to ERROR
    • I don't think the current logging.yml logic is the only way to achieve that though
    • I'm also ambivalent on whether we should change the global logging level to INFO
  • -1 on keeping the current logging.yml logic - whoever wants fine grained control of logs, file logging, rotation etc should be using journald, supervisor, Datadog, or whatever other solution. this is not Kedro's responsibility
  • +1 on directing users to current best-practices for logging, including creating official integrations and docs for Sentry, OpenTelemetry

@antonymilne
Copy link
Contributor

  • +1 on setting the default level of the Kedro logger to INFO (if just for backwards compatibility), and then having -q set it to WARNING and -qq to ERROR

And -v to DEBUG?

  • I don't think the current logging.yml logic is the only way to achieve that though
  • I'm also ambivalent on whether we should change the global logging level to INFO

Not sure whether @noklam filled you in on the chat we had about #3657 (there's a brief summary there but not all the details). But I think we should definitely not set the global logging level to INFO. This is actually what used to be done on kedro and I think it was wrong - see #1732 for more.

Agree that the current logging.yml isn't the only way to achieve setting kedro logger to INFO. Also discussed in #1732, kedro could just add a Python setLevel call manually rather than configuring logging from a file config.

@noklam
Copy link
Contributor Author

noklam commented Mar 18, 2024

I notice the discussion in this thread are mainly related to #3591, though they are related and maybe overlapping.

I have this comment in #3657 #3657 (comment)

Automatic loading logging is fine, but there are edge cases with CONF_SOURCE. I think it's still an improvement and it doesn't make anything worse in the current states.

The idea is almost the same what @DimedS suggested, which we try to locate the logging.yml in the project directory. There are some complication of whether CONF_SOURCE will be available. In any case, having a non-default CONF_SOURCE is not the majority case, having logging.yml not inside the conf folder is even rarer. This is a 90% solution and maybe good enough unless we come up with something better.

@noklam
Copy link
Contributor Author

noklam commented Mar 26, 2024

The solution is basically this: be3d28d in #3577. It should be handles the pattern slightly better, and try to access CONF_SOURCE if possible (may be not).

@sheldontsen-qb
Copy link
Contributor

sheldontsen-qb commented Mar 26, 2024

(Not adding to the solution but more sharing my view...)

As a user for several years (+5 years) of kedro since the early days before it was even called kedro, I was a bit thrown off that logging was removed by default in 0.19. Personally would still prefer it out of the box enabled by default (instead of having to read the docs to find out how to enable it, I would rather read the docs on how to disable if not needed). Also appreciate my view is one of many out there.

To counter Juan's views:
(counter) +1 on keeping the current logging.yml logic - whoever wants fine grained control of logs, file logging, rotation etc should be using journald, supervisor, Datadog, or whatever other solution. this is not Kedro's responsibility.

This is super helpful to just get people setup with something basic.

(for) +1 on directing users to current best-practices for logging, including creating official integrations and docs for Sentry, OpenTelemetry

More docs on how to integrate is always helpful

Both would be good in my case but it was really nice that logging was configured out of the box for a standard project, just like how we can choose spark to be part of the starter.

Last resort: add documentation on how to configure logging.yaml to replicate old behaviour as we know after_context_created hook would have the caveat of not logging anything before context is created. It's a one-time setup per new project which is fine but right now there's no advice on the best place to achieve this without the caveat above.

Thankssss!

@kedro-org kedro-org locked and limited conversation to collaborators Mar 28, 2024
@merelcht merelcht converted this issue into discussion #3755 Mar 28, 2024

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation
Projects
None yet
Development

No branches or pull requests

5 participants