-
-
Notifications
You must be signed in to change notification settings - Fork 31.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
uu package uses old encoding #74289
Comments
Looking in the man pages for the uuencode and uudecode (http://www.manpagez.com/man/5/uuencode/), I see that the encoding used to go from ascii 32 to 95 but that 32 is deprecated and generally newer releases go from 33-96 (with 96 being used in place of 32). This replaces the " " in the encoding with "`". For example, the newest version of busybox only accepts the new encoding. The uu package has no way to specify to use this new encoding making it a pain to integrate. Oddly, the uu.decode function does properly decode files encoded using "`", but encode is unable to create them. |
Looks like perl has already encoded in this way: [~]$ perl -e 'print pack("u","Ca\x00t")'
The decoder source code explicitly states it could resolve backtick since some encoders use '`' instead of space. To maintain backwards compatibility, I think we can add a keyword-only backtick parameter to binascii.b2a_uu and uuencode.
|
Is there any standard? From Wikipedia [1]: """ This obviously makes impossible using "`" as zero instead of space. |
There seems no standard. I also read the wikipedia but for perl and uuencode on my Linux, they now all use backticks to represent zero instead of spaces. [ while Python now:
end Except the link Kyle gives, the manpage of FreeBSD describes the new algorithm: http://www.unix.com/man-page/freebsd/5/uuencode/ I don't propose to change current behaviour to break backwards compatibility. But I think it's reasonable to provide a way to allow users to use backticks.
|
What about other popular languages? Java, PHP, Ruby, Tcl, C#, JavaScript, Swift, Go, Rust? Do any languages provide a way for configuring zero character and what are the names of the options? Are there languages that use "`" instead of a space only for padding, but not for representing an ordinal zero? |
FWIW I am using NXP LPC microcontrollers at the moment, whose bootloader uses the grave/backtick instead of spaces. (NXP application note AN11229.) Although in practice it does seem to accept Python's spaces instead of graves. I wouldn't put too much weight to Wikipedia, especially where it says graves are not used for encoded data (vs length and padding). Earlier versions of Wikipedia did mention graves in regular data. I understand the reason for avoiding spaces is to due to spaces being stripped (e.g. by email, copy and paste, etc). You have to avoid spaces in data, not just padding, because a data space may still appear at the end of a line. |
Uuencode has no official standards and it all depends on the implementation. For other languages, I could only find PHP, java, activetcl? have official implementation. PHP and activetcl defaults to backticks and no options. Java defaults to spaces and no options. |
Thanks Martin and Xiang. Wikipedia is not a reliable source, but it usually is based on reliable sources. In this case seems it is wrong. The next question is about parameter name. The Wikipedia uses the name "grave accent", the manpage of FreeBSD uuencode uses the name "backquote", the proposed patch uses the name "backtick". "Grave accent" is an official Unicode name, "backquote" and "backtick" are commonly used in programming context. We could use also the name containing "space" with the default value True. |
I think "grave accent" is not suitable. Although it's the standard unicode name but it's not commonly used in programming so not direct enough. "backquote" and "backtick" seems could be used interchangeably I don't have any preference. Perl seems to use backtick instead of backquote when ` is a language part. Yeah, space is also a choice. |
Python 2 used the term "backquote" when ` is a language part. |
token defines it as backquote. But in doc there are also several places calling it backticks[1][2]. Do you have any preference Serhiy and Martin? [1] https://docs.python.org/release/3.0.1/whatsnew/3.0.html#removed-syntax |
I think I would prefer b2a_uu(data, grave=True), but am also happy with Xiang’s backtick=True if others prefer that. :) In my mind “grave accent” is the pure ASCII character; it just got abused for other things. Other options: b2a_uu(data, space=False)
b2a_uu(data, avoid_spaces=True)
b2a_uu(data, use_0x60=True) |
I'm +0 for something containing "space" in the option name since the purpose of changing the UU encoding is avoiding stripping spaces. But this is not strong preference. Actually there is no need to add a new option in b2a_uu(), since we can just use b2a_uu(data).replace(b' ', b'`'). It is added just for convenience. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: