how to turn log/traceback color off? #180

Closed
stas00 opened this issue May 8, 2024 · 7 comments · Fixed by #185

Comments

@stas00

stas00 commented May 8, 2024

Trying datatrove for the first time, and the program spews a bunch of logs and tracebacks in yellow and cyan, which are completely unreadable on a black-on-white console.

Does the program assume that the user is using a white-on-black (dark) console?

I tried to grep for color to see how it controls the colors but found nothing relevant, so it's probably some 3rd party component that does that.

If the coloring logic doesn't bother to check what the console colors are to keep the output readable, any idea how to turn it off completely? I RTFM'ed - didn't find any docs that address that aspect.

Thanks a lot!

@guipenedo
Collaborator

Hi,
Indeed, that's an oversight on our part; we'll add an option somewhere.
For now, I think setting the env variable with export LOGURU_COLORIZE=NO should disable the colors.
Sorry about that and thank you for taking an interest in datatrove :)
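
The workaround above can also be applied to a single run from Python instead of exporting the variable in the shell. This is a minimal sketch, assuming loguru picks up LOGURU_COLORIZE from the process environment when it is imported; env_without_colors and run_uncolored are hypothetical helper names, not part of loguru or datatrove, and the child command is only a stand-in for a real pipeline script.

```python
import os
import subprocess
import sys


def env_without_colors(base=None):
    """Return a copy of the environment with the loguru colorize override set."""
    env = dict(os.environ if base is None else base)
    env["LOGURU_COLORIZE"] = "NO"
    return env


def run_uncolored(argv):
    """Run a command with LOGURU_COLORIZE=NO set only for that child process."""
    return subprocess.run(argv, env=env_without_colors(), check=False)


if __name__ == "__main__":
    # Stand-in for a real pipeline script: the child just echoes the variable it sees.
    child = [sys.executable, "-c", "import os; print(os.environ['LOGURU_COLORIZE'])"]
    run_uncolored(child)
```

Setting the variable only for the child process keeps the choice out of the script itself, so different users can run the same script with different settings.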

@guipenedo
Collaborator

For reference it's this line here with colorize=True:
https://github.com/huggingface/datatrove/blob/main/src%2Fdatatrove%2Futils%2Flogging.py#L46

@stas00
Author

stas00 commented May 8, 2024

I don't think it's colorize=True, it's the first thing I tried to set to False and it made no difference. I have retested it now, the color is still there.

But LOGURU_COLORIZE=NO did the trick, thank you very much, @guipenedo!

It looks like even the package that caused the problem doesn't document this feature:
https://github.com/search?q=repo%3ADelgan%2Floguru%20LOGURU_COLORIZE&type=code

Would it be possible to document this somewhere in this project's docs? Thank you!

@guipenedo
Collaborator

Could you please check if passing colorize_log_output=False to your executor when running #185 fixes the issue?

@stas00
Author

stas00 commented May 15, 2024

I confirm that it works.

May I suggest that this is a sub-optimal way to implement this: the user now has to pass this flag multiple times, and the main issue is that the flag has nothing to do with the Executor, only with the library itself. This is a global behavior: either the user's setup can support the rainbow colors or it can't. It's a one-time binary decision.

Moreover this won't work well for different users using the same datatrove script since their setup will be different and hardcoding for one user's environment will not make other users' experience optimal.

May I propose 2 different solutions:

  1. Typically HF libraries have a logging.py where logging levels and behavior are controlled. This would allow the user to set the logging behavior once at the beginning of the program and be done with it, e.g. see https://github.com/huggingface/transformers/blob/main/src/transformers/utils/logging.py
    This approach works really well and ideally should belong there.

  2. The other approach is to have a DATATROVE_COLORIZE env var. (this would be the preferable solution since each user can then choose what they want - and the same script could be used by users with different envs).

Thank you.

@guipenedo
Collaborator

Very well, point taken.
I've removed the options from the executor and added the following env variables:

  • DATATROVE_COLORIZE_LOGS: set to "1" to add ANSI colors to console log messages, or to "0" to disable colorization.
  • DATATROVE_COLORIZE_LOG_FILES: set to "1" to add ANSI colors to log messages saved to logs/task_XXXXX.log.
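
For illustration, a "1"/"0" variable like the ones listed above could be read as a three-state flag, where an unset variable means "no explicit choice". This is only a sketch under that assumption: read_colorize_flag is a hypothetical helper, and datatrove's actual parsing may differ.

```python
import os


def read_colorize_flag(name, environ=None):
    """Return True for "1", False for "0", and None when the variable is unset or empty.

    None means the user made no explicit choice, so the caller can fall back
    to whatever default or auto-detection it uses.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(name, "").strip()
    if raw == "":
        return None
    if raw == "1":
        return True
    if raw == "0":
        return False
    # Reject anything else loudly instead of silently guessing.
    raise ValueError(f"{name} must be '1' or '0', got {raw!r}")


if __name__ == "__main__":
    print(read_colorize_flag("DATATROVE_COLORIZE_LOGS"))
```

Keeping "unset" distinct from "0" is what lets each user override the behavior from their own environment while the script stays unchanged.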

Thank you for taking the time to discuss this

@stas00
Author

stas00 commented May 17, 2024

Unless I'm missing something, defaulting to colors chosen for a dark theme will still make datatrove problematic for anybody with a non-dark theme, since you're not detecting whether it's safe to colorize or not. I have no idea what "auto-detected" in

By default, colorization will be auto detected for console messages

means. Could you please point me to where that code is?

But this is your library, and if you don't care for a smooth out of the box experience for all users and force on them a UX feature that has nothing to do with what this library was designed for, that's your choice.

Thank you for adding knobs to allow a user to have the library usable after they discovered those knobs.
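
On the "auto-detected" question above: in logging libraries this usually means checking whether the output stream is an interactive terminal, not inspecting the terminal's color theme. The sketch below illustrates that kind of check under stated assumptions. It is not loguru's or datatrove's code, should_colorize here is a hypothetical function, and honoring NO_COLOR and TERM=dumb is a common convention added for illustration.

```python
import io
import os
import sys


def should_colorize(stream, environ=None):
    """Guess whether ANSI color codes are safe to emit on `stream`."""
    environ = os.environ if environ is None else environ
    # NO_COLOR is an informal cross-tool convention: any non-empty value opts out.
    if environ.get("NO_COLOR"):
        return False
    # "dumb" terminals do not interpret ANSI escape sequences.
    if environ.get("TERM") == "dumb":
        return False
    # Only colorize when writing to an interactive terminal, not a file or pipe.
    isatty = getattr(stream, "isatty", None)
    return bool(isatty and isatty())


if __name__ == "__main__":
    print(should_colorize(sys.stderr))
    print(should_colorize(io.StringIO()))
```

A check like this says nothing about whether the background is light or dark, so yellow or cyan text can still be hard to read on a light console even when detection succeeds.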
