poetry develop failing on non-ASCII characters #221

Closed
xsduan opened this issue Jun 15, 2018 · 26 comments
Labels
area/init: Related to 'poetry init'/project creation
kind/bug: Something isn't working as expected

Comments

@xsduan

xsduan commented Jun 15, 2018

authors = [
    "Sébastien Eustace <sebastien@eustace.io>"
]
$ poetry develop -vvv

[AttributeError]
'NoneType' object has no attribute 'group'
authors = [
    "Sebastien Eustace <sebastien@eustace.io>"
]
Installing dependencies from lock file

Nothing to install or update

Installing poetry (0.11.0-alpha.3)

As far as I know, the re library doesn't have any ability to support unicode character classes but regex can handle them properly.

I don't know if this has been brought up before or whether it is a Windows-only thing, considering it happened while running poetry develop on Poetry itself. As far as I checked, nobody has opened an issue about this before.

Windows 10, python 3.6.4, poetry 0.11.0a3.

edit: #66 is similar.

In the meantime, catching errors:

    def _get_author(self):  # type: () -> dict
+       if self._authors:
+           m = AUTHOR_REGEX.match(self._authors[0])
+       else:
+           m = None

-       if not self._authors:
+       if not m:
+           # log.info('Could not find an author') or whatever
            return {"name": None, "email": None}

        m = AUTHOR_REGEX.match(self._authors[0])

        name = m.group("name")
        email = m.group("email")

        return {"name": name, "email": email}
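For readers who want to try the guard outside of Poetry, here is a minimal self-contained sketch of the same idea. The pattern below is a hypothetical stand-in, not necessarily Poetry's actual AUTHOR_REGEX, and get_author is an illustrative free function rather than the real method.

```python
import re

# Hypothetical stand-in for Poetry's AUTHOR_REGEX; the real pattern may differ.
AUTHOR_REGEX = re.compile(r"^(?P<name>[- .,\w'\"()]+) <(?P<email>.+?)>$")


def get_author(authors):
    """Return the first author as a dict, or empty values if it cannot be parsed."""
    if not authors:
        return {"name": None, "email": None}

    m = AUTHOR_REGEX.match(authors[0])
    if m is None:
        # No match: return early instead of calling .group() on None.
        return {"name": None, "email": None}

    return {"name": m.group("name"), "email": m.group("email")}
```

With this shape the regex is evaluated once, and both the empty list and the non-matching string take the same early-return path.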
@xsduan changed the title from "AUTHOR_REGEX returns nothing on non-ASCII characters" to "poetry develop failing on non-ASCII characters" on Jul 15, 2018
@xsduan
Author

xsduan commented Jul 15, 2018

Somehow the é is being encoded as ISO Latin-1, which causes a Unicode decode error when the output is read back as UTF-8.

    Complete output from command python setup.py egg_info:
    b"': 'S\xe9bast" # added print(data[24360:24370])
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "c:\users\droom\appdata\local\programs\python\python36\lib\codecs.py", line 331, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 24365: invalid continuation byte
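The failure mode in this traceback can be reproduced in isolation. This is only a sketch of the mechanism using the standard codecs; it does not involve Poetry or setuptools.

```python
name = "Sébastien"

latin1_bytes = name.encode("latin-1")  # é becomes the single byte 0xE9
utf8_bytes = name.encode("utf-8")      # é becomes the two bytes 0xC3 0xA9

try:
    # Reading Latin-1 bytes as UTF-8 fails: 0xE9 announces a multi-byte
    # sequence, but the next byte ("b") is not a continuation byte.
    latin1_bytes.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc
```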

@cauebs
Contributor

cauebs commented Jul 15, 2018

I remember having this problem, but I couldn't reproduce it now. On what version are you?

@xsduan
Author

xsduan commented Jul 16, 2018

poetry@0.11.2, on Windows 10, ran poetry develop -vvv

@sdispater
Member

I only have this problem when the author name is retrieved from the Git config. If I set it manually by editing the pyproject.toml file directly or explicitly in the init command, I don't have this issue.

@sdispater added the kind/bug and area/init labels on Jul 26, 2018
@RaptDept
Contributor

RaptDept commented Aug 3, 2018

I've done poetry develop on Poetry itself on this Windows box before without any issues, but I'm only getting this problem now. I'm not sure why I didn't run into this before.

https://github.com/sdispater/poetry/blob/a1b97707e8b193c3b3a7ee47394c155f9e1eb0c0/poetry/console/commands/develop.py#L31-L32

Since the open() call that creates setup.py doesn't explicitly specify an encoding, it falls back to CP-1252 encoding on my Windows system.

This conflicts with the # -*- coding: utf-8 -*- encoding declaration in the format string used to create setup.py:

https://github.com/sdispater/poetry/blob/a1b97707e8b193c3b3a7ee47394c155f9e1eb0c0/poetry/masonry/builders/sdist.py#L25-L43

The simplest solution here is to specify encoding="utf-8" in the open() call.
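As a rough illustration of that suggestion (the file name and content here are made up for the example, not taken from Poetry's builder), writing with an explicit encoding keeps the bytes on disk consistent with the coding declaration regardless of the platform default:

```python
import locale
import os
import tempfile

# What open() falls back to when no encoding is given (platform dependent,
# e.g. cp1252 on many Windows installations).
default_encoding = locale.getpreferredencoding(False)

content = '# -*- coding: utf-8 -*-\nauthor = "Sébastien Eustace"\n'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "setup.py")

    # Explicit encoding: the result no longer depends on default_encoding.
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)

    # Read the raw bytes back to see what actually landed on disk.
    with open(path, "rb") as f:
        raw = f.read()
```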

For demonstration, running poetry develop on Poetry itself works in #368. If this looks okay, I'll write a test. There might be other places in the project that need to have explicit encodings, though -- I'm willing to take a look at that.

While looking into this issue, I also found that the same thing happens to the readme parsing:
https://github.com/sdispater/poetry/blob/a1b97707e8b193c3b3a7ee47394c155f9e1eb0c0/poetry/masonry/metadata.py#L48-L50

Poetry's README.md is UTF-8 on my machine but gets decoded as CP-1252 in poetry develop, turning all the é's into Ã©, which in turn gets written out to the long_description field in setup.py. I'm not sure how this is supposed to be handled. Should it always assume that the readme file is UTF-8?
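The corruption described here is easy to reproduce with the standard codecs alone. A small sketch, independent of Poetry's readme handling:

```python
original = "é"

# UTF-8 stores é as two bytes (0xC3 0xA9). Decoding those bytes as CP-1252
# maps each byte to its own character, producing two characters instead of one.
mojibake = original.encode("utf-8").decode("cp1252")

# As long as nothing else touched the text, the damage is reversible.
repaired = mojibake.encode("cp1252").decode("utf-8")
```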

@xsduan
Author

xsduan commented Aug 3, 2018

I think UTF-8 is a reasonable assumption.

To be safe, it could retry after guessing the encoding with something like chardet, but I don't think that is necessary. At the very least it could try CP-1252/ISO-8859 and then fail.
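A fallback chain like that could look roughly as follows. This is only a sketch: decode_with_fallback is a hypothetical helper, the order of encodings is an assumption, and a detector such as chardet is deliberately left out.

```python
def decode_with_fallback(data, encodings=("utf-8", "cp1252", "latin-1")):
    """Try each encoding in order and return (text, encoding_used)."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("could not decode data with any of: " + ", ".join(encodings))
```

Note that latin-1 maps every byte to a character, so with it at the end of the chain the function never raises. Whether silently accepting any input is desirable is a separate question.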

@vlcinsky
Contributor

vlcinsky commented May 8, 2019

This issue is caused by setup.py being written in the default encoding, which fails on systems whose default is not UTF-8. PR #1087 should fix it by being explicit about the encoding when creating source-code-like files.

@vlcinsky
Contributor

@xsduan Can you check whether the latest poetry 0.12.17 fixes the issue?

@xsduan
Author

xsduan commented Jul 13, 2019

poetry master = $ poetry --version
Poetry 0.12.17
# installation...
poetry master = $ cd d:\git\poetry
poetry master = $ poetry install
#...
poetry master = $ poetry run pip show poetry
Name: poetry
Version: 0.12.11
Summary: Python dependency management and packaging made easy.
Home-page: https://poetry.eustace.io/
Author: Sébastien Eustace
Author-email: sebastien@eustace.io
License: UNKNOWN
Location: d:\git\poetry
Requires: cachecontrol, cachy, cleo, html5lib, jsonschema, pkginfo, pyparsing, pyrsistent, requests-toolbelt, requests, shellingham, tomlkit
Required-by:

looks like it

@vlcinsky
Contributor

@xsduan it looks like ... your edit left your comment incomplete.

@laxas

laxas commented Dec 19, 2019

I can reproduce this error in poetry version 1.0.0 (Ubuntu 18.04)

>> poetry init --author "Alex Müller"
...
Package name [test]:  
Version [0.1.0]:  
Description []:  
[UnicodeDecodeError]
'ascii' codec can't decode byte 0xc3 in position 6: ordinal not in range(128)

It seems that "poetry init" can't handle a non-ASCII character in the author default shown in the dialogue. However, it has no problem if the non-ASCII character is typed in at the prompt.
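The position in that error message lines up with the ü in the author name. A quick check, assuming the command-line value reached the failing code as UTF-8 bytes:

```python
author = "Alex Müller"
raw = author.encode("utf-8")  # ü is encoded as the two bytes 0xC3 0xBC

try:
    # Decoding with the ASCII codec fails at the first byte above 0x7F.
    raw.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc
```

Here error.start is the index of the offending byte, which corresponds to the "position 6" in the report.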

The following works fine

>> poetry init --author Alex
...
Package name [test]:  
Version [0.1.0]:  
Description []:  
Author [Alex, n to skip]:  Alex Müller
License []:  

@finswimmer
Member

Thanks @laxas,

with your example I was able to reproduce it with Python 2.7. With Python 3 it works.

The problem seems to be well known, e.g.: https://stackoverflow.com/questions/21129020/how-to-fix-unicodedecodeerror-ascii-codec-cant-decode-byte

A simple fix would be changing this:

name = self.option("name")

into

name = self.option("name")

if isinstance(name, str):
    name = name.decode().encode("UTF-8")

But I guess this is a general problem that should be fixed in another place.
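One way such a general fix might look is a single helper that normalizes option values to text at the boundary. This is a sketch only: to_text is a hypothetical name, and UTF-8 is an assumed encoding for command-line bytes. It is written so that it also runs on Python 3, where option values are normally text already.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it first if it is a byte string."""
    if isinstance(value, bytes):
        # Name the codec explicitly instead of relying on the interpreter
        # default, which is ASCII on Python 2.
        return value.decode(encoding)
    return value
```

Every option read from the command line would then pass through to_text once, rather than each call site handling encodings on its own.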

fin swimmer

@miedzinski

I have the same problem as @laxas. Can't init new project because of non-ascii character in my git user name.

asciicast

@Abuelodelanada

Quoting @miedzinski: "I have the same problem as @laxas. Can't init new project because of non-ascii character in my git user name."

The same here! My last name has an ó

➜  rgh git:(develop) poetry  init -vvv

This command will guide you through creating your pyproject.toml config.

Package name [rgh]:
Version [0.1.0]:
Description []:
'ascii' codec can't encode character u'\xf3' in position 29: ordinal not in range(128)
'ascii' codec can't encode character u'\xf3' in position 29: ordinal not in range(128)
'ascii' codec can't encode character u'\xf3' in position 29: ordinal not in range(128)
...

@jmfederico
Contributor

Passing the author to the cli with non-ascii characters also triggers the error:

❯ poetry init --author="an accent í is non-ascii"

This command will guide you through creating your pyproject.toml config.

Package name [non-ascii-test]:
Version [0.1.0]:
Description []:

[UnicodeDecodeError]
'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)

But if a non-ascii character is used when prompted to confirm the author, it does not fail:

❯ poetry init --author="ascii"

This command will guide you through creating your pyproject.toml config.

Package name [non-ascii-test]:
Version [0.1.0]:
Description []:
Author [ascii, n to skip]:  non-ascii í
License []:
...

@julienmalard

julienmalard commented Apr 23, 2020

I get the same on poetry build from non-ascii package names, for example:

[tool.poetry]
name = "lassi"
version = "0.1.0"
description = ""
authors = ["ਜ਼ੂਲੀਏਂ ਮਲਾਰ (Julien Malard) <julien.malard@mail.mcgill.ca>"]
packages = [
    { include = "ਲੱਸੀ" }
]

This is on MacOS.
Edit: Unicode author name crashes as well.

@jacebrowning
Contributor

Poetry 1.0.10 now displays a slightly clearer error message:

$ poetry build
Building lassi (0.1.0)

[ValueError]
Invalid author string. Must be in the format: John Smith <john@example.com>

using this minimal pyproject.toml:

[tool.poetry]
name = "lassi"
version = "0.1.0"
description = ""
authors = ["ਜ਼ੂਜ਼ੂਜ਼ੂ ਜ਼ੂਜ਼ੂਜ਼ੂ <user@example.com>"]

@julienmalard

@jacebrowning Thank you! If you could point me to the place where the AUTHOR_REGEX is defined (I can't find it!), I would be happy to contribute a pull request to help fix this. Perhaps a good approach would be to validate only what is in between the <> brackets, and allow the name to be anything? (Some languages use apostrophes, colons, combining marks and other characters that re is likely to miss.)
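To make that proposal concrete, here is a minimal sketch of a lenient pattern. LENIENT_AUTHOR_REGEX and parse_author are hypothetical names, and the email check is intentionally loose; this is not what Poetry ships.

```python
import re

# The name may contain anything except angle brackets; only the part inside
# <...> is checked, and only for the rough shape "something@something".
LENIENT_AUTHOR_REGEX = re.compile(
    r"^(?P<name>[^<>]+?)\s*<(?P<email>[^<>\s]+@[^<>\s]+)>$"
)


def parse_author(author):
    """Split 'Name <email>' into (name, email), or raise ValueError."""
    m = LENIENT_AUTHOR_REGEX.match(author)
    if m is None:
        raise ValueError(
            "Invalid author string. Must be in the format: "
            "John Smith <john@example.com>"
        )
    return m.group("name"), m.group("email")
```

Because the name class is defined by exclusion, combining marks and other script-specific characters pass through without the pattern having to enumerate them.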

@jacebrowning
Contributor

jacebrowning commented Aug 24, 2020

@julienmalard It looks like AUTHOR_REGEX is now part of Poetry Core: https://github.com/python-poetry/poetry-core/search?q=AUTHOR_REGEX&unscoped_q=AUTHOR_REGEX

And imported here:

from poetry.core.packages.package import AUTHOR_REGEX

author = author or default
if author in ["n", "no"]:
    return
m = AUTHOR_REGEX.match(author)

@julienmalard

@jacebrowning Thank you! I had not noticed that poetry.core was not part of this repository.

@stanislaw

I have just encountered this problem when I tried installing and running Poetry from my raw Docker container with Ubuntu Bionic.

The quick fix for me was to do:

LC_ALL=C.UTF-8 poetry

A more permanent solution for my Docker container I have found here:

sudo apt-get -y install language-pack-en
The following extra packages will be installed:
  language-pack-en-base
Generating locales...
  en_GB.UTF-8... /usr/sbin/locale-gen: done
Generation complete.

@vlcinsky
Contributor

vlcinsky commented Nov 8, 2020

@stanislaw what version of poetry do you have?

What you describe is a workaround.

Poetry should not depend on the current locale settings; it should explicitly work with UTF-8. If this is still not the case, the fix (in Poetry's code) is to explicitly specify the utf-8 encoding in all file open operations.

@stanislaw

stanislaw commented Nov 8, 2020

@vlcinsky sure I understand. I just needed to get something done really quickly.

root@95eea793181d:/app# poetry --version
Poetry version 1.1.4

The full output:

root@95eea793181d:/app# poetry
Poetry version 1.1.4

USAGE

  UnicodeEncodeError

  'ascii' codec can't encode character '\xa0' in position 30: ordinal not in range(128)

  at ~/.poetry/lib/poetry/_vendor/py3.6/clikit/io/output_stream/stream_output_stream.py:24 in write
Traceback (most recent call last):
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/console_application.py", line 131, in run
    status_code = command.handle(parsed_args, io)
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/api/command/command.py", line 120, in handle
    status_code = self._do_handle(args, io)
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/api/command/command.py", line 171, in _do_handle
    return getattr(handler, handler_method)(args, io, self)
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/handler/help/help_text_handler.py", line 29, in handle
    usage.render(io)
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/ui/help/abstract_help.py", line 31, in render
    layout.render(io, indentation)
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/ui/layout/block_layout.py", line 42, in render
    element.render(io, self._indentations[i] + indentation)
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/ui/components/labeled_paragraph.py", line 70, in render
    + "\n"
  File "/root/.poetry/lib/poetry/_vendor/py3.6/cleo/io/io_mixin.py", line 55, in write
    super(IOMixin, self).write(string, flags)
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/api/io/io.py", line 58, in write
    self._output.write(string, flags=flags)
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/api/io/output.py", line 61, in write
    self._stream.write(to_str(formatted))
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/io/output_stream/stream_output_stream.py", line 24, in write
    self._stream.write(string)
UnicodeEncodeError: 'ascii' codec can't encode character '\xa0' in position 30: ordinal not in range(128)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/.poetry/bin/poetry", line 19, in <module>
    main()
  File "/root/.poetry/lib/poetry/console/__init__.py", line 5, in main
    return Application().run()
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/console_application.py", line 142, in run
    trace.render(io, simple=isinstance(e, CliKitException))
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/ui/components/exception_trace.py", line 232, in render
    return self._render_exception(io, self._exception)
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/ui/components/exception_trace.py", line 269, in _render_exception
    self._render_snippet(io, current_frame)
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/ui/components/exception_trace.py", line 289, in _render_snippet
    self._render_line(io, code_line)
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/ui/components/exception_trace.py", line 402, in _render_line
    io.write_line("{}{}".format(indent * " ", line))
  File "/root/.poetry/lib/poetry/_vendor/py3.6/cleo/io/io_mixin.py", line 65, in write_line
    super(IOMixin, self).write_line(string, flags)
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/api/io/io.py", line 66, in write_line
    self._output.write_line(string, flags=flags)
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/api/io/output.py", line 69, in write_line
    self.write(string, flags=flags, new_line=True)
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/api/io/output.py", line 61, in write
    self._stream.write(to_str(formatted))
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/io/output_stream/stream_output_stream.py", line 24, in write
    self._stream.write(string)
UnicodeEncodeError: 'ascii' codec can't encode character '\u2502' in position 27: ordinal not in range(128)
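The last error above can be reproduced without Poetry by writing the same box-drawing character to a text stream that uses the ASCII codec, which is presumably what stdout ended up with in this container's locale. A minimal sketch, with write_text as a hypothetical helper standing in for the output stream:

```python
import io


def write_text(text, encoding):
    """Write text through a text stream with the given encoding; return the bytes."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding=encoding)
    stream.write(text)
    stream.flush()
    return buffer.getvalue()


box_char = "\u2502"  # the character named in the second traceback

try:
    write_text(box_char, "ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# The same write succeeds once the stream encoding is UTF-8.
utf8_bytes = write_text(box_char, "utf-8")
```

Setting LC_ALL=C.UTF-8 changes the encoding the interpreter picks for stdout, which would explain why the workaround helps.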

I have installed it like this:

root@95eea793181d:/app#  curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python
Retrieving Poetry metadata

# Welcome to Poetry!

This will download and install the latest version of Poetry,
a dependency and package manager for Python.

It will add the `poetry` command to Poetry's bin directory, located at:

$HOME/.poetry/bin

This path will then be added to your `PATH` environment variable by
modifying the profile file located at:

$HOME/.profile

You can uninstall at any time by executing this script with the --uninstall option,
and these changes will be reverted.

Installing version: 1.1.4
  - Downloading poetry-1.1.4-linux.tar.gz (57.03MB)

Poetry (1.1.4) is installed now. Great!

To get started you need Poetry's bin directory ($HOME/.poetry/bin) in your `PATH`
environment variable. Next time you log in this will be done
automatically.

To configure your current shell run `source $HOME/.poetry/env`

@vlcinsky
Contributor

vlcinsky commented Nov 8, 2020

Thanks @stanislaw for the detailed report.

My note was about the robustness of poetry. Workarounds are very practical and often life savers.

I have run into setuptools depending on the current system locale, which is definitely wrong (I think my fix is already in). The solution is to search the project for all open calls and make sure that any call opening a stream in text mode explicitly states the encoding "utf-8".
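Such a search could be automated with the standard ast module. The following is a rough sketch, not an existing Poetry tool: find_text_opens_without_encoding is a hypothetical helper, and it only looks at calls to the built-in name open, not io.open or Path.open.

```python
import ast


def find_text_opens_without_encoding(source):
    """Return line numbers of open() calls in text mode that lack an encoding."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if not (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "open"
        ):
            continue

        keywords = {kw.arg: kw.value for kw in node.keywords if kw.arg}

        # mode is the second positional argument or the mode= keyword.
        # A mode that is not a literal is treated as text mode.
        mode_node = node.args[1] if len(node.args) >= 2 else keywords.get("mode")
        mode = mode_node.value if isinstance(mode_node, ast.Constant) else "r"
        if isinstance(mode, str) and "b" in mode:
            continue  # binary mode takes no encoding

        # encoding is the fourth positional argument or the encoding= keyword.
        if "encoding" in keywords or len(node.args) >= 4:
            continue

        lines.append(node.lineno)
    return sorted(lines)
```

Running it over each .py file in a project gives a list of call sites to review by hand.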

@julienmalard

julienmalard commented Nov 8, 2020

@jacebrowning Thanks for the pointer a few months back regarding AUTHOR_REGEX. After a bit of experimentation, I think that this has to do not with Poetry per se but rather with a bug in the re module (see lark-parser/lark#590).

Replacing re with regex solves everything:

import re
AUTHOR_REGEX = re.compile(r"(?u)^(?P<name>[- .,\w\d'’\"()]+) <(?P<email>.+?)>$")
AUTHOR_REGEX.match("ம. ஆ. ஜூலீஎன் <julien.malard@mail.mcgill.ca>")
# Result: None

# But...
import regex as re
AUTHOR_REGEX = re.compile(r"(?u)^(?P<name>[- .,\w\d'’\"()]+) <(?P<email>.+?)>$")
AUTHOR_REGEX.match("ம. ஆ. ஜூலீஎன் <julien.malard@mail.mcgill.ca>")
# Result: <regex.Match object; span=(0, 44), match='ம. ஆ. ஜூலீஎன் <julien.malard@mail.mcgill.ca>'>

So my question would now be - should I submit a pull request with import regex as re to Poetry? Or would adding a dependency risk breaking things?
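If adding a dependency turns out to be a concern, the difference can be narrowed down with the standard library alone. The sketch below shows which characters of the name the stdlib \w rejects, and one possible dependency-free check. is_valid_name and EXTRA_NAME_CHARS are hypothetical, and accepting all letter, mark and number categories is an assumption about what a name may contain.

```python
import re
import unicodedata

name = "ம. ஆ. ஜூலீஎன்"

# Characters of the name that the stdlib pattern class [\w .] does not match.
unmatched = [ch for ch in name if not re.fullmatch(r"[\w .]", ch)]
# Their Unicode general categories: combining marks (categories starting with M).
categories = {unicodedata.category(ch) for ch in unmatched}

# Punctuation allowed by the character class in the pattern above.
EXTRA_NAME_CHARS = set("- .,'’\"()")


def is_valid_name(text):
    """Accept letters, combining marks, numbers and a few punctuation characters."""
    return bool(text) and all(
        ch in EXTRA_NAME_CHARS or unicodedata.category(ch)[0] in "LMN"
        for ch in text
    )
```

The name part could then be validated with is_valid_name while a much simpler regex handles only the angle-bracketed email.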
Thanks!


github-actions bot commented Mar 1, 2024

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions bot locked this issue as resolved and limited conversation to collaborators on Mar 1, 2024