UnicodeEncodeError on Windows when there are Unicode chars in the help message #2121

Open
leouieda opened this issue Nov 1, 2021 · 20 comments

Comments

@leouieda

leouieda commented Nov 1, 2021

I have come across an error when I try to print the help message for my app (--help) on Windows (using bash, cmd, and powershell). My help message has Unicode characters in it (the project name), which seems to be what is causing the problem.

This PR tests running the app with --help and it fails on Windows and Python 3.6 and 3.10: leouieda/nene#12

Here is a minimum example that fails:

# example.py
import click

@click.command(context_settings={"help_option_names": ["-h", "--help"]})
def main():
    """
    App description with Unicode ‣
    """
    pass

if __name__ == '__main__':
    main()
$ python example.py -h
Traceback (most recent call last):
  File "example.py", line 11, in <module>
    main()
  File "C:\hostedtoolcache\windows\Python\3.6.8\x64\lib\site-packages\click\core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "C:\hostedtoolcache\windows\Python\3.6.8\x64\lib\site-packages\click\core.py", line 1052, in main
    with self.make_context(prog_name, args, **extra) as ctx:
  File "C:\hostedtoolcache\windows\Python\3.6.8\x64\lib\site-packages\click\core.py", line 914, in make_context
    self.parse_args(ctx, args)
  File "C:\hostedtoolcache\windows\Python\3.6.8\x64\lib\site-packages\click\core.py", line 1370, in parse_args
    value, args = param.handle_parse_result(ctx, opts, args)
  File "C:\hostedtoolcache\windows\Python\3.6.8\x64\lib\site-packages\click\core.py", line 2347, in handle_parse_result
    value = self.process_value(ctx, value)
  File "C:\hostedtoolcache\windows\Python\3.6.8\x64\lib\site-packages\click\core.py", line 2309, in process_value
    value = self.callback(ctx, self, value)
  File "C:\hostedtoolcache\windows\Python\3.6.8\x64\lib\site-packages\click\core.py", line 1270, in show_help
    echo(ctx.get_help(), color=ctx.color)
  File "C:\hostedtoolcache\windows\Python\3.6.8\x64\lib\site-packages\click\utils.py", line 298, in echo
    file.write(out)  # type: ignore
  File "C:\hostedtoolcache\windows\Python\3.6.8\x64\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2023' in position 62: character maps to <undefined>

I can confirm that it's the Unicode characters in the docstring of the function wrapped with the main @click.command that causes the issue. Removing them fixes the problem (the second CI run on leouieda/nene#12). This issue does not happen on Linux and Mac.
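The same exception can be triggered without Click or a Windows machine. This is a minimal sketch that assumes only what the traceback shows: a text stream that encodes with cp1252 and the character '\u2023'. The in-memory stream and the names `cp1252_stream`, `error`, and `bad` are illustrative stand-ins, not part of Click.

```python
import io

# Stand-in for a stdout that Python opened with the cp1252 code page.
cp1252_stream = io.TextIOWrapper(io.BytesIO(), encoding="cp1252")

error = None
try:
    # Same text as the docstring in example.py; U+2023 has no cp1252 mapping.
    cp1252_stream.write("App description with Unicode \u2023")
except UnicodeEncodeError as exc:
    error = exc
    bad = exc.object[exc.start:exc.end]
    # !a prints an ASCII-only repr so this line cannot itself fail to encode.
    print(f"{exc.encoding} codec cannot encode {bad!a}")
```

If this raises on Linux or macOS too, that suggests the problem depends on the encoding chosen for the output stream rather than on anything in the help text handling.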

For now, I'll remove the Unicode characters so I'm not pushing a broken package, but it would be great to be able to include the proper spelling of the package name in the future.

Environment:

  • Python version: 3.6 and 3.10
  • Click version: 8.0.3
@davidism
Member

davidism commented Nov 1, 2021

Please include a minimal reproducible example in the issue itself. Links to projects can be helpful, but it's much easier for contributors and maintainers to address a bug here instead of there.

@leouieda
Author

leouieda commented Nov 1, 2021

Sorry about that. I'll edit the description with an example. Just trying to run it on CI to see if it really breaks.

@leouieda
Author

leouieda commented Nov 1, 2021

Done.

leouieda added a commit to leouieda/nene that referenced this issue Nov 2, 2021
Running `nene --help` on Windows causes a crash due to the Unicode characters
in the app name. For now, remove the Unicode to avoid crashes and add some CI
tests to make sure it works. This may be a bug on click
(pallets/click#2121).
@NodeJSmith

NodeJSmith commented Nov 13, 2022

I believe this can be closed, as it is not an issue caused by click. I wrote up an explanation on another issue and a gist but the tldr is that this is caused by the Windows agent redirecting command output to a file and the default locale code page not being Unicode compatible. While click may be able to solve for this, it is definitely not caused by click.
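One way to check this explanation on a given machine is to print how Python configured stdout. The sketch below uses only standard-library calls; the function name `describe_stdout` and the dictionary keys are made up for illustration.

```python
import locale
import sys


def describe_stdout():
    """Collect the settings that decide how text written to stdout is encoded."""
    return {
        # Encoding Python will use when writing str to stdout.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        # False when output is redirected to a file or a pipe.
        "stdout_is_tty": sys.stdout.isatty(),
        # Locale encoding, used for redirected streams unless overridden.
        "locale_encoding": locale.getpreferredencoding(False),
        # True when Python runs in UTF-8 mode (Python 3.7+).
        "utf8_mode": bool(sys.flags.utf8_mode),
    }


if __name__ == "__main__":
    # Report on stderr so the output is still visible when stdout is redirected.
    for key, value in describe_stdout().items():
        print(f"{key}: {value}", file=sys.stderr)
```

Running it once in a terminal and once with stdout redirected (for example `python describe.py > out.txt`, where `describe.py` is a hypothetical file name) should show whether the encoding changes when output goes to a file or pipe.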

@davidism
Member

The file path in the traceback, C:\hostedtoolcache\windows\Python\3.6.8\x64\, suggests that this issue is being reported about runs in an Azure Windows agent. It sounds like this is an issue with the behavior of the agent, not Click.

@leouieda
Author

This was reported to me by a user on Windows, and I tested on GitHub Actions since I don't have access to a Windows machine for testing. I'm not sure if they were encountering this on Azure or on their own machine, though.

@rudolfbyker

rudolfbyker commented Feb 3, 2023

Here is my repro of the same error. For me this happens when running a click program inside "git bash". https://github.com/rudolfbyker/click-git-bash-unicode-repro

I can see how this is not caused by click, but maybe we could treat it as a feature request that click should work around this somehow?

@davidism
Member

davidism commented Feb 3, 2023

I'm happy to review a PR that fixes the issue.

@davidism
Member

davidism commented Feb 3, 2023

Also note my original comment:

Please include a minimal reproducible example in the issue itself. Links to projects can be helpful, but it's much easier for contributors and maintainers to address a bug here instead of there.

@rudolfbyker

A few possible workarounds for those searching:

  • Run your script with python -X utf8 …
  • Set the PYTHONIOENCODING environment variable to utf8.
  • Run sys.stdout.reconfigure(encoding="utf-8") and sys.stderr.reconfigure(encoding="utf-8") at the start of your script.

Depending on the situation, one or more of these could convince Python to use UTF-8 rather than CP1252.
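The third workaround could look roughly like the sketch below. The helper name `force_utf8` is hypothetical. Note that `reconfigure()` and UTF-8 mode (`-X utf8`) both require Python 3.7 or later, so on Python 3.6, as in the original report, only the PYTHONIOENCODING variable from this list applies.

```python
import io
import sys


def force_utf8(*streams):
    """Switch text streams to UTF-8 where they support reconfigure()."""
    for stream in streams:
        # reconfigure() exists on io.TextIOWrapper (Python 3.7+); streams that
        # were replaced by other objects (e.g. test capture) may not have it.
        reconfigure = getattr(stream, "reconfigure", None)
        if reconfigure is not None:
            reconfigure(encoding="utf-8")


if __name__ == "__main__":
    # Call this before anything is written, e.g. before invoking the CLI.
    force_utf8(sys.stdout, sys.stderr)
    print("App description with Unicode \u2023")
```

This only changes the bytes Python writes. Whether the characters then display correctly still depends on the terminal or on whatever reads the redirected output.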

@ddelange

fwiw, encountered this error for click.echo('├─') in the CI of ddelange/pipgrip#128.

It's on GitHub Actions windows-latest runners, which will return sys.getfilesystemencoding() == 'utf-8', meaning it's running python in utf8 mode.

Somehow, click still goes into a cp1252 routine in that GHA environment...

logs.txt

@davidism
Member

Happy to review a PR.

@ddelange

ddelange commented Nov 13, 2023

could you point me to the point in code where we could set the output encoding based on sys.getfilesystemencoding(), such that these characters at least get printed on windows with python 3.7+ running in utf8 mode (PYTHONUTF8=1)?

@ddelange

or maybe https://docs.python.org/3/library/sys.html#sys.getdefaultencoding?

or some other way to get click to respect Python UTF-8 Mode?

@ddelange

hmm looks like utf-16?

"utf-16-le",

why does the OP and my traceback go into cp1252.py in the first place? 🤔

@ddelange

Here is my repro of the same error. For me this happens when running a click program inside "git bash". https://github.com/rudolfbyker/click-git-bash-unicode-repro

I can see how this is not caused by click, but maybe we could treat it as a feature request that click should work around this somehow?

as shown in that screenshot, it doesn't happen in every console. would be cool to support GitHub Actions windows-latest, but no idea how to find a possible detection/mitigation technique here

@NodeJSmith

fwiw, encountered this error for click.echo('├─') in the CI of ddelange/pipgrip#128.

It's on GitHub Actions windows-latest runners, which will return sys.getfilesystemencoding() == 'utf-8', meaning it's running python in utf8 mode.

Somehow, click still goes into a cp1252 routine in that GHA environment...

logs.txt

Feel free to review my gist covering this, but just a quick heads up that checking sys.getfilesystemencoding() won't necessarily be accurate. You're better off checking sys.stdout.encoding. If you haven't set PYTHONUTF8 or PYTHONIOENCODING in your pipeline yet I would try that before doing anything else.
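A check based on the stream's own encoding might look like this sketch. The helper `stream_can_encode` is hypothetical and not part of Click. It assumes UTF-8 when a stream reports no encoding, and it ignores the stream's error handler, so a stream opened with errors="replace" would not actually fail on write even when this returns False.

```python
import io
import sys


def stream_can_encode(stream, text):
    """Return True if `text` can be encoded with the stream's own encoding."""
    # Assumption: fall back to UTF-8 when the stream reports no encoding.
    encoding = getattr(stream, "encoding", None) or "utf-8"
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # Report on stderr, using ASCII only, so the check itself cannot crash.
    ok = stream_can_encode(sys.stdout, "\u251c\u2500")
    print(f"stdout ({sys.stdout.encoding}) can encode box characters: {ok}", file=sys.stderr)
```

A program could use such a check to pick an ASCII-only fallback for decorative characters instead of crashing.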

@ddelange

would it make sense to simply catch this error in click.echo and re-raise it with a more verbose message?

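try:
    file.write(out)
except UnicodeEncodeError as exc:
    if sys.flags.utf8_mode:
        # UTF-8 mode is already on, so the hint below would not help.
        raise
    msg = "Failed to echo some Unicode character. Try enabling [UTF-8 mode](https://docs.python.org/3/library/os.html#utf8-mode)."
    # UnicodeEncodeError takes (encoding, object, start, end, reason), not a single message.
    raise UnicodeEncodeError(exc.encoding, exc.object, exc.start, exc.end, msg) from exc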
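To try this idea outside Click, here is a self-contained sketch. The function name `write_with_hint` is hypothetical, and this is not how click.echo is implemented. It builds the new exception with the five arguments that UnicodeEncodeError requires (encoding, object, start, end, reason), since constructing it from a single message string raises a TypeError.

```python
import io
import sys


def write_with_hint(file, out):
    """Write `out` to `file`, adding a UTF-8 mode hint to encoding errors."""
    try:
        file.write(out)
    except UnicodeEncodeError as exc:
        if sys.flags.utf8_mode:
            # UTF-8 mode is already enabled, so the hint would be misleading.
            raise
        reason = (
            f"{exc.reason}; try enabling Python UTF-8 mode "
            "(python -X utf8 or PYTHONUTF8=1)"
        )
        # Reuse the original error's details and extend only the reason text.
        raise UnicodeEncodeError(
            exc.encoding, exc.object, exc.start, exc.end, reason
        ) from exc
```

Called with a cp1252 stream such as `io.TextIOWrapper(io.BytesIO(), encoding="cp1252")` and the text "\u2023", it should raise an error whose reason ends with the hint, unless the interpreter is already in UTF-8 mode.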

@NodeJSmith

@ddelange +1 I think that's a fantastic idea

@davidism
Member

If you think the error needs to be clearer, report that to python. They have been updating many errors in the last few releases.
