Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug?] Hydra turns on global logging to the info level #1262

Closed
2 tasks done
JackUrb opened this issue Jan 4, 2021 · 17 comments
Closed
2 tasks done

[Bug?] Hydra turns on global logging to the info level #1262

JackUrb opened this issue Jan 4, 2021 · 17 comments
Labels
bug Something isn't working

Comments

@JackUrb
Copy link

JackUrb commented Jan 4, 2021

Description

Extending on #1012, Hydra doesn't simply duplicate logs, but it also turns on logging of all loggers globally to the INFO level:

root.setLevel(logging.INFO)

This leads me to get all kinds of spam from other used packages...

Is this actually a desired behavior? If so downstream packages will have to add:

 {"hydra/job_logging": "disabled"},
 {"hydra/hydra_logging": "disabled"},

to all defaults lists to circumvent Hydra updating the global logger, but I'm not sure if I'm losing important logging info by doing this. Am I missing something?

Checklist

  • I checked on the latest version of Hydra
  • I created a minimal repro (See this for tips).

To reproduce

Clone Mephisto.
Run an example script, like python examples/parlai_chat_task_demo/parlai_test_script.py
See logs like the following:
** Stack trace/error message **

...
[2021-01-04 11:33:53,609][sh.command][INFO] - <Command '/bin/rm -rf /Users/jju/mephisto/data/data/runs/NO_PROJECT/1034/build/router'>: starting process
[2021-01-04 11:33:53,617][sh.command][INFO] - <Command '/bin/rm -rf /Users/jju/mephisto/data/data/runs/NO_PROJECT/1034/build/router', pid 98262>: process started
[2021-01-04 11:33:53,897][sh.command][INFO] - <Command '/bin/rm -rf /Users/jju/mephisto/data/data/runs/NO_PROJECT/1034/build/router', pid 98262>: process completed
...

Expected Behavior

I wouldn't expect using Hydra would cause me to see info logs of other modules like sh.

System information

  • Hydra Version : 1.0.0
  • Python version : 3.8.3
  • Virtual environment type and version : conda 4.8.3
  • Operating system : Mac OS 10.15.7
@JackUrb JackUrb added the bug Something isn't working label Jan 4, 2021
@JackUrb JackUrb changed the title [Bug] [Bug] Hydra turns on global logging to the info level Jan 4, 2021
@JackUrb JackUrb changed the title [Bug] Hydra turns on global logging to the info level [Bug?] Hydra turns on global logging to the info level Jan 4, 2021
@omry
Copy link
Collaborator

omry commented Jan 4, 2021

@JackUrb, can you try to narrow it down?

configure_log() is only doing this if it's called with a None log configuration, which should not be the case for normal operation.
I suspect this is related to the log management in fairseq.

@JackUrb
Copy link
Author

JackUrb commented Jan 4, 2021

We're not using fairseq in Mephisto, so I'm not entirely sure where this comes in. Perhaps Mephisto isn't setting up Hydra use properly? Adding the disabled settings to the defaults list resolves the issue for me.

@omry
Copy link
Collaborator

omry commented Jan 4, 2021

Can you show me a minimal example of the problem without depending on anything except Hydra?

@omry
Copy link
Collaborator

omry commented Jan 4, 2021

(Sorry, I was thinking about parlai, not fairseq).

@JackUrb
Copy link
Author

JackUrb commented Jan 4, 2021

examples/simple_static_task/static_test_script.py also demonstrates this issue, but doesn't use ParlAI.

Here's an even more minimal example:

# easy_repro.py
#!/usr/bin/env python3

import os
import sh
import shlex
import hydra
from omegaconf import DictConfig
from dataclasses import dataclass, field
from hydra.core.config_store import ConfigStoreWithProvider

config = ConfigStoreWithProvider("test")

def register_script_config(name: str, module):
    config.store(name=name, node=module)

@dataclass
class ScriptConfig:
    test: str = "hello"

register_script_config(name="scriptconfig", module=ScriptConfig)

@hydra.main(config_name="scriptconfig")
def main(cfg: DictConfig) -> None:
    os.mkdir("test_dir/")
    with open("test_dir/test_file.txt", 'w+') as testfile:
        testfile.write('hello')

    sh.rm(shlex.split("-rf test_dir"))


if __name__ == "__main__":
    main()

And running it:

> python easy_repro.py
[2021-01-04 12:50:22,194][sh.command][INFO] - <Command '/bin/rm -rf test_dir'>: starting process
[2021-01-04 12:50:22,201][sh.command][INFO] - <Command '/bin/rm -rf test_dir', pid 7741>: process started
[2021-01-04 12:50:22,510][sh.command][INFO] - <Command '/bin/rm -rf test_dir', pid 7741>: process completed

@omry
Copy link
Collaborator

omry commented Jan 4, 2021

Thanks. what at the sh and shlex dependencies? how do I get them?

@omry
Copy link
Collaborator

omry commented Jan 4, 2021

This seems like expected behavior.
Hydra default logging is setting the root logger level to info (not in the code you pointed out but in this config):

https://github.com/facebookresearch/hydra/blob/master/hydra/conf/hydra/job_logging/default.yaml#L19

You can configure the logging by overriding the logging configuration.
This includes changing the default log level for the root logger, or just having a different log level to the sh logger.

@omry
Copy link
Collaborator

omry commented Jan 4, 2021

Found the sh package. Looking at the implications of not changing log level for the root logger.

@omry
Copy link
Collaborator

omry commented Jan 4, 2021

Regardless of the conclusion, you can do it on your side. see this for an example of how to configure the logging.

@omry
Copy link
Collaborator

omry commented Jan 4, 2021

Side note:

Hydra doesn't simply duplicate logs

That's a claim I have yet to see proven with a minimal example.

@JackUrb
Copy link
Author

JackUrb commented Jan 4, 2021

Hydra doesn't simply duplicate logs

That's a claim I have yet to see proven with a minimal example.

So they're related, the code that sets the logging level of the root logger to INFO is also applying a handler to the root logger, meaning that if I later add a handler for my specific needs, I get things logged to the stream twice when the logged event propagates up to the root:

root.addHandler(handler)

So if hydra is used in an application that applies any of its own loggers (rather than using the root logger) they get logs twice.

Here's a minimal example for this behavior:

# easy_repro.py
import os
import hydra
from omegaconf import DictConfig
from dataclasses import dataclass, field
from hydra.core.config_store import ConfigStoreWithProvider
import logging

def get_logger(name: str) -> logging.Logger:
    logger = logging.getLogger(name)

    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler()
    formatter = logging.Formatter(
        "[%(asctime)s] {%(filename)s:%(lineno)d} FROM MY LOGGER: %(levelname)5s - %(message)s",
        "%m-%d %H:%M:%S",
    )

    handler.setFormatter(formatter)
    logger.addHandler(handler)
    return logger

my_logger = get_logger("repro-file-logger")

config = ConfigStoreWithProvider("test")

def register_script_config(name: str, module):
    config.store(name=name, node=module)

@dataclass
class ScriptConfig:
    test: str = "hello"

register_script_config(name="scriptconfig", module=ScriptConfig)

@hydra.main(config_name="scriptconfig")
def main(cfg: DictConfig) -> None:
    my_logger.info("This is a message from my logger")


if __name__ == "__main__":
    main()

And the output:

> python easy_repro.py
[01-04 15:18:46] {easy_repro.py:43} FROM MY LOGGER:  INFO - This is a message from my logger
[2021-01-04 15:18:46,228][repro-file-logger][INFO] - This is a message from my logger

The first log comes from my logger, the second comes from Hydra's

@omry
Copy link
Collaborator

omry commented Jan 4, 2021

First of all:

So they're related, the code that sets the logging level of the root logger to INFO is also applying a handler to the root logger,

That code path only executes if the logging config provided to the function is None. I don't think it's actually running for you.

Technically speaking, it's not Hydra that is adding the second handler, it's the app.

Hydra is configuring the logging for the app. you can fully customize that configuration (including not do it at all, which will be easier in 1.1, #1130).

You need to decide if you want to allow Hydra to configure the logging, or if you want to do it yourself.
If you want to do it yourself, take a look at how to disable Hydra log configuration in #1130.
If you want to be able to configure the logging via Hydra, don't do it in your code.
Things gets dicey where it's not your code that is doing it, but some low level library you are using.
I claim that such low level libraries should not be in the business of configuring the logging but should leave this responsibility to the user.

That sentence can also be interpreted as "Hydra should not be configuring the logging", but since you have full control over what config Hydra is using to configure the logging, I think it's worth supporting it (No logging configuration will be perfect for all use cases).

@omry
Copy link
Collaborator

omry commented Jan 4, 2021

#1263 is changing the default log level to WARN.
Thinking about it a bit more, I am becoming increasingly less confident that it's the right thing to do.

@omry
Copy link
Collaborator

omry commented Jan 4, 2021

I abandoned #1263.
Hydra is configuring the logging to to something that is useful for the majority of the applications by default.
I think INFO level for the root logger is a useful default.

Applications that does like the default behavior has two options:

  1. Customize the logging configuration to match their needs.
  2. Completely disable the logging configuration (see Implemented disabling logging configuration & added config file #1130 to see how, this will be easier in 1.1)

Happy to help you with your specific use case (Jump to the Hydra chat).

@omry omry closed this as completed Jan 4, 2021
@JackUrb
Copy link
Author

JackUrb commented Jan 4, 2021

I think 1.1 will end up being the resolution. I'm fine just disabling the logging (as we already have a fairly extensive logging setup for Mephisto) so the real concern was the simplicity by which we could disable it for all Mephisto-related hydra mains. Recursive defaults lists in 1.1 will resolve the issue for us, as we can disable this logging in a base list and then others can override if they desire.

@omry
Copy link
Collaborator

omry commented Jan 4, 2021

Gotcha, yeah - recursive defaults should cover it.

I do suggest that you see if you can migrate your logging configuration to Hydra instead of doing it yourself in code.
If possible, it will likely be more flexible.

@omry
Copy link
Collaborator

omry commented Jan 16, 2021

To clarify to anyone reading this later:

  1. Hydra sets the root logger to INFO log level by default as documented in the basic tutorial.
  2. You can customize fully customize your logging configuration through Hydra
  3. This includes the log level of the root logger.
$ python examples/tutorials/basic/running_your_hydra_app/4_logging/my_app.py 
[2021-01-15 17:11:09,444][__main__][INFO] - Info level message

You can see the current logging configuration with:

$ python my_app.py --cfg hydra -p hydra.job_logging
# @package hydra.job_logging
version: 1
formatters:
  simple:
    format: '[%(asctime)s][%(name)s][%(levelname)s] - %(message)s'
handlers:
  console:
    class: logging.StreamHandler
    formatter: simple
    stream: ext://sys.stdout
  file:
    class: logging.FileHandler
    formatter: simple
    filename: ${hydra.job.name}.log
root:
  level: INFO
  handlers:
  - console
  - file
disable_existing_loggers: false

You can see that the root level is INFO above.
You can override it normally from your config file or even the command line:

$ python my_app.py hydra.job_logging.root.level=WARN
$ 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants