Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make autogen/pari Python 3 compatible #21874

Closed
jdemeyer opened this issue Nov 13, 2016 · 29 comments
Closed

Make autogen/pari Python 3 compatible #21874

jdemeyer opened this issue Nov 13, 2016 · 29 comments

Comments

@jdemeyer
Copy link

CC: @defeo @fchapoton

Component: python3

Author: Frédéric Chapoton

Branch/Commit: 07d262a

Reviewer: Jeroen Demeyer

Issue created by migration from https://trac.sagemath.org/ticket/21874

@jdemeyer jdemeyer added this to the sage-7.5 milestone Nov 13, 2016
@fchapoton
Copy link
Contributor

Branch: u/chapoton/21874

@fchapoton
Copy link
Contributor

New commits:

c36156e21874 py3 compatible autogen/pari (bytes for communicate)

@fchapoton
Copy link
Contributor

Author: Frédéric Chapoton

@fchapoton
Copy link
Contributor

Commit: c36156e

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Nov 14, 2016

Branch pushed to git repo; I updated commit sha1. New commits:

6d3d2c7trac 21874 fixing

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Nov 14, 2016

Changed commit from c36156e to 6d3d2c7

@fchapoton
Copy link
Contributor

comment:3

should be good now. Works for python3, not checked for python2.

@jdemeyer
Copy link
Author

comment:4

Doctest failure, see patchbot.

@jdemeyer
Copy link
Author

comment:5

The "native" format for filenames is bytes. So it seems silly to convert bytes to unicode and then let Python convert it back to bytes. Just drop the .decode().

@fchapoton
Copy link
Contributor

comment:6

elsewhere in the code, one takes the sum (+) of datadir and a string, so one needs the decode()

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Nov 16, 2016

Branch pushed to git repo; I updated commit sha1. New commits:

169eb21trac 21874 correct doctest

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Nov 16, 2016

Changed commit from 6d3d2c7 to 169eb21

@jdemeyer
Copy link
Author

comment:9

Replying to @fchapoton:

elsewhere in the code, one takes the sum (+) of datadir and a string, so one needs the decode()

That string should then also be bytes.

@defeo
Copy link
Member

defeo commented Nov 16, 2016

comment:10

The "native" format for filenames is bytes.

Are you sure about this? In Python 3.5

>>> import os
>>> type(os.listdir('.')[0])
str

@jdemeyer
Copy link
Author

comment:11

listdir outputs str when given str and bytes when given bytes:

>>> import os
>>> type(os.listdir(b'.')[0])
<class 'bytes'>
>>> import os
>>> type(os.listdir('.')[0])
<class 'str'>

@jdemeyer
Copy link
Author

comment:12

See https://docs.python.org/3/howto/unicode.html#unicode-filenames

Note the sentence "When opening a file for reading or writing, you can usually just provide the Unicode string as the filename, and it will be automatically converted to the right encoding for you". This means that Python does a conversion of encoding, so it makes sense to use bytes for filenames to avoid conversions.

@defeo
Copy link
Member

defeo commented Nov 16, 2016

comment:13

Replying to @jdemeyer:

See https://docs.python.org/3/howto/unicode.html#unicode-filenames

Note the sentence "When opening a file for reading or writing, you can usually just provide the Unicode string as the filename, and it will be automatically converted to the right encoding for you". This means that Python does a conversion of encoding, so it makes sense to use bytes for filenames to avoid conversions.

But handling the locale/filesystem encoding seems best left to Python. What if datastring is bytes, string is python's utf-8, and the user's locale is ISO-8859? What is safest among open(datastring.decode() + string) and open(datastring + string.encode())? Probably neither.

I do not even dare imagine what would happen with a WIN-1252-encoded fat filesystem mounted inside a ISO-8859 locale (these things can happen with usb keys...).

@jdemeyer
Copy link
Author

comment:14

Replying to @defeo:

But handling the locale/filesystem encoding seems best left to Python. What if datastring is bytes, string is python's utf-8, and the user's locale is ISO-8859? What is safest among open(datastring.decode() + string) and open(datastring + string.encode())? Probably neither.

If string is ASCII, then string.encode() should be pretty safe since most encodings map ASCII to ASCII bytes.

@defeo
Copy link
Member

defeo commented Nov 16, 2016

comment:15

Replying to @jdemeyer:

Replying to @defeo:

But handling the locale/filesystem encoding seems best left to Python. What if datastring is bytes, string is python's utf-8, and the user's locale is ISO-8859? What is safest among open(datastring.decode() + string) and open(datastring + string.encode())? Probably neither.

If string is ASCII, then string.encode() should be pretty safe since most encodings map ASCII to ASCII bytes.

Same goes for datastring.decode(), then. I was obviously thinking of the case where they contain non-ASCII characters.

@jdemeyer
Copy link
Author

comment:16

Replying to @defeo:

Same goes for datastring.decode(), then.

Not if datastring refers to a user-chosen path. Then we shouldn't make too many assumptions.

@defeo
Copy link
Member

defeo commented Nov 16, 2016

comment:17

Replying to @jdemeyer:

Replying to @defeo:

Same goes for datastring.decode(), then.

Not if datastring refers to a user-chosen path. Then we shouldn't make too many assumptions.

Right. I thought it was the other way around. So where are these datadir + string? I see one on line 83 of parser.py, and there string is pari.desc, so I agree with you (assuming that gp correctly handles locale encodings). Any other places?

@fchapoton
Copy link
Contributor

comment:18

I think this is the only place. Could you please handle all this without me ?

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Nov 16, 2016

Changed commit from 169eb21 to 07d262a

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Nov 16, 2016

Branch pushed to git repo; I updated commit sha1. New commits:

07d262atrac 21874 using bytes everywhere

@fchapoton
Copy link
Contributor

comment:20

ok, here is another try, following Jeroen's suggestion

@fchapoton
Copy link
Contributor

comment:21

patchbot now gives a green light

@jdemeyer
Copy link
Author

Reviewer: Jeroen Demeyer

@jdemeyer
Copy link
Author

comment:22

Good for me then.

@vbraun
Copy link
Member

vbraun commented Nov 19, 2016

Changed branch from u/chapoton/21874 to 07d262a

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants