utils: decode requirement files according to their BOM if present #3485

Merged
xavfernandez merged 5 commits into pypa:develop from xavfernandez:bom_detection on Mar 4, 2016

Contributor

xavfernandez commented Feb 12, 2016

Work in progress; should fix #2865.

Member

pfmoore commented Feb 12, 2016

LGTM. Remarkably close to what I have in my local checkout at the moment. But cleaner :-)

Contributor

xavfernandez commented Feb 13, 2016

Sorry to see we worked on the same thing :-/
But for some reason Python 2.6 does not like my solution...
Plus my encoding with utf8 and decoding with locale.getpreferredencoding(False) does not seem like a good idea.

Contributor

xavfernandez commented Feb 13, 2016

https://docs.python.org/2/library/shlex.html

Prior to Python 2.7.3, this module did not support Unicode input. 😢

Member

pfmoore commented Feb 13, 2016

No problem.

I pray for the day when we can drop 2.6 support :-(
How about, if we're on 2.6 then we encode the requirements file content as ASCII (on the basis that if there's no Unicode support, ASCII is all that is valid) and report any encoding errors as "Unicode requirements files are only supported in Python 2.7 onwards"?

Contributor

xavfernandez commented Feb 14, 2016

Well I tried being more forgiving and encoded to utf8 before shlex.split: tests seem happy.

I'm trying to add some code cleanup (always make get_file_content return text), hoping #1441 won't come back...
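
A minimal sketch of the "encode to utf8 before shlex.split" workaround described in this comment. The helper name split_requirement_line is hypothetical and is not pip's actual function. It assumes that on Python 2 shlex is only safe with byte strings, so the text is encoded to UTF-8 first and the resulting tokens are decoded back to text; on Python 3 the text is passed straight through.

```python
import shlex
import sys


def split_requirement_line(line):
    """Split a text line shell-style and return a list of text tokens.

    Hypothetical helper for illustration only.
    """
    if sys.version_info < (3,):
        # Python 2: older shlex releases do not accept unicode input,
        # so hand it UTF-8 bytes and decode each token afterwards.
        tokens = shlex.split(line.encode("utf-8"))
        return [token.decode("utf-8") for token in tokens]
    # Python 3: shlex works on text directly.
    return shlex.split(line)
```

Encoding to UTF-8 rather than ASCII is the "more forgiving" choice: non-ASCII characters survive the round trip instead of raising an error on older interpreters.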

xavfernandez added this to the 8.1 milestone Feb 24, 2016

Member

dstufft commented Mar 3, 2016

Is this still a WIP? Or ready to go?

Contributor

xavfernandez commented Mar 3, 2016

Ok, from a simple mypkg example containing only:

from setuptools import setup

setup(
    name='mypkg',
    version=0.1,
    install_requires=['httpretty==0.7.1'],
)

I could reproduce the issue of #1441 with pip 1.5.0.
With this PR, mypkg could be installed without any error so no obvious regression on this issue.

xavfernandez added a commit that referenced this pull request Mar 4, 2016

Merge pull request #3485 from xavfernandez/bom_detection
utils: decode requirement files according to their BOM if present

xavfernandez merged commit cab10e4 into pypa:develop Mar 4, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

xavfernandez deleted the xavfernandez:bom_detection branch Mar 4, 2016

Contributor

aodag commented Mar 7, 2016

Hi, this change causes an error when reading a requirements.txt encoded in UTF-8 without a BOM.
On Python 2, locale.getpreferredencoding(False) returns an ASCII encoding.
Why not use UTF-8 as the default encoding?

aodag@debian:~$ uname -a
Linux debian 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u3 (2016-01-17) x86_64 GNU/Linux
aodag@debian:~$ python
Python 2.7.9 (default, Mar  1 2015, 12:57:24)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getpreferredencoding(False)
'ANSI_X3.4-1968'
>>>
aodag@debian:~$ python3
Python 3.4.2 (default, Oct  8 2014, 10:45:20)
[GCC 4.9.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getpreferredencoding(False)
'UTF-8'
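
The failure reported here can be reproduced without pip. This is a small sketch, assuming a requirements file that contains one non-ASCII comment and is saved as UTF-8 without a BOM; the helper decodes_cleanly and the sample content are made up for illustration.

```python
def decodes_cleanly(data, encoding):
    """Return True if the bytes in `data` decode without error in `encoding`."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical file content: a non-ASCII comment, UTF-8 encoded, no BOM.
content = u"# d\u00e9pendances\nhttpretty==0.7.1\n".encode("utf-8")

# The encoding name reported by the Python 2 session in this comment.
print(decodes_cleanly(content, "ANSI_X3.4-1968"))  # False
print(decodes_cleanly(content, "utf-8"))           # True
```

With no BOM to identify the encoding, decoding falls back to the locale's preferred encoding, which in the Python 2 session shown is ASCII and cannot represent the accented character.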

Contributor

xavfernandez commented Mar 7, 2016

"Why not use utf-8 as default encoding?"
Because it might well be encoded in something other than UTF-8 :) Previously it worked because on Python 2 we were operating on str (bytes) objects.

It looks like pip should be calling locale.setlocale(locale.LC_ALL, '') before calling locale.getpreferredencoding(False).

Or maybe we could accept this kind of header: # -*- coding: utf-8 -*- in requirements.txt ?

Any thoughts, @pfmoore @dstufft?
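
A rough sketch of the two ideas in this comment, not the code that ended up in pip. The regular expression is an assumption modelled on the PEP 263 source-encoding comment, and the function names header_encoding and locale_encoding are hypothetical.

```python
import locale
import re

# Assumed pattern, modelled on the PEP 263 "coding" comment.
ENCODING_RE = re.compile(br"coding[:=]\s*([-\w.]+)")


def header_encoding(data):
    """Return the encoding named in a comment on the first two lines, or None."""
    for line in data.split(b"\n")[:2]:
        if line.lstrip().startswith(b"#"):
            match = ENCODING_RE.search(line)
            if match:
                return match.group(1).decode("ascii")
    return None


def locale_encoding():
    """Initialise the locale from the environment, then ask for its encoding."""
    try:
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # Unsupported locale settings: stay on the default "C" locale.
        pass
    return locale.getpreferredencoding(False)
```

A caller could try header_encoding first and fall back to locale_encoding when the file carries no header, which keeps plain files working in the system encoding while letting cross-platform files declare their own.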

Contributor

xavfernandez commented Mar 7, 2016

I've implemented belt and suspenders in #3547

Member

pfmoore commented Mar 7, 2016

For the simple cases, I think pip should treat the requirements file as a text file (i.e. open it in the system default encoding), with BOM detection being a useful convenience for Windows users whose tools have a tendency to use things like UTF-16 with BOM in spite of the default encoding. I'm -0.5 on defaulting to UTF-8, as Windows tools need extra effort to specify UTF-8, so we'll likely just end up with Windows users complaining that pip isn't handling their requirements files properly.

I don't know what incantations are needed with setlocale, so I'll defer to your expertise there.

For cross-platform requirements files, an encoding header seems like a plausible approach.

(And I see that while I've been writing this response, you've created a PR implementing this. Looks like a good solution to me, go for it, although there's a text/bytes problem I've commented on in the PR.)
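
The behaviour described in this comment (BOM detection first, system default encoding otherwise) might look roughly like the following. This is an illustrative sketch rather than the merged implementation; the name decode_requirements, the fallback parameter, and the ordering of the BOM table are assumptions.

```python
import codecs
import locale

# Check the longer BOMs first: the UTF-32 LE BOM starts with the
# same two bytes as the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_requirements(data, fallback=None):
    """Decode requirements-file bytes to text.

    A leading BOM selects the encoding and is stripped from the result;
    without one, use `fallback` or the locale's preferred encoding.
    """
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)
    return data.decode(fallback or locale.getpreferredencoding(False))
```

For example, a file saved by a Windows tool as UTF-16 with a BOM decodes correctly regardless of the locale, while a BOM-less file is still read as ordinary text in the system encoding.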
