uu package uses old encoding #74289

Closed
LawfulEvil mannequin opened this issue Apr 19, 2017 · 14 comments
Labels
3.7 stdlib type-feature

Comments

@LawfulEvil
Mannequin

@LawfulEvil LawfulEvil mannequin commented Apr 19, 2017

BPO 30103
Nosy @vadmium, @serhiy-storchaka, @zhangyangyu
PRs
  • #1326
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2017-05-03.03:18:36.515>
    created_at = <Date 2017-04-19.18:22:20.863>
    labels = ['3.7', 'type-feature', 'library']
    title = 'uu package uses old encoding'
    updated_at = <Date 2017-05-03.03:18:36.514>
    user = 'https://bugs.python.org/LawfulEvil'

    bugs.python.org fields:

    activity = <Date 2017-05-03.03:18:36.514>
    actor = 'xiang.zhang'
    assignee = 'none'
    closed = True
    closed_date = <Date 2017-05-03.03:18:36.515>
    closer = 'xiang.zhang'
    components = ['Library (Lib)']
    creation = <Date 2017-04-19.18:22:20.863>
    creator = 'LawfulEvil'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 30103
    keywords = []
    message_count = 14.0
    messages = ['291893', '292416', '292464', '292465', '292466', '292491', '292513', '292515', '292516', '292517', '292518', '292559', '292579', '292834']
    nosy_count = 4.0
    nosy_names = ['martin.panter', 'serhiy.storchaka', 'xiang.zhang', 'LawfulEvil']
    pr_nums = ['1326']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue30103'
    versions = ['Python 3.7']

    @LawfulEvil
    Mannequin Author

    @LawfulEvil LawfulEvil mannequin commented Apr 19, 2017

    Looking at the man page for uuencode and uudecode (http://www.manpagez.com/man/5/uuencode/), I see that the encoding used to run from ASCII 32 to 95, but that 32 is deprecated and newer releases generally use 33-96 (with 96 used in place of 32). This replaces the " " in the encoding with "`".

    For example, the newest version of busybox only accepts the new encoding.

    The uu package has no way to select this new encoding, which makes it a pain to integrate. Oddly, the uu.decode function does properly decode files encoded using "`", but encode is unable to create them.
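
The two alphabets described in the report differ only in how the 6-bit value zero is written. A minimal sketch of that mapping follows; `uu_char` is a hypothetical helper name used for illustration and is not part of the uu module.

```python
def uu_char(value, backtick=False):
    """Map one 6-bit value (0-63) to its uuencode character.

    Hypothetical helper for illustration only; not a stdlib function.
    """
    if not 0 <= value <= 63:
        raise ValueError("uuencode values are 6 bits wide")
    if value == 0 and backtick:
        return "`"              # ASCII 96 stands in for ASCII 32
    return chr(32 + value)      # 0 -> " " (32), 63 -> "_" (95)


# Build both 64-character alphabets so they can be compared.
old_alphabet = "".join(uu_char(v) for v in range(64))
new_alphabet = "".join(uu_char(v, backtick=True) for v in range(64))
```

Only the first character of the two alphabets differs; values 1 to 63 map to the same characters either way.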

    @LawfulEvil LawfulEvil mannequin added type-bug extension-modules labels Apr 19, 2017
    @bitdancer bitdancer added 3.7 type-feature stdlib and removed type-bug extension-modules labels Apr 19, 2017
    @zhangyangyu
    Member

    @zhangyangyu zhangyangyu commented Apr 27, 2017

    It looks like Perl already encodes this way:

    [~]$ perl -e 'print pack("u","Ca\x00t")'
    $0V$`=```

    > Oddly, the uu.decode function does properly decode files encoded using "`", but encode is unable to create them.

    The decoder source code explicitly states that it accepts a backtick, since some encoders use '`' instead of a space.

    To maintain backwards compatibility, I think we can add a keyword-only backtick parameter to binascii.b2a_uu and uuencode.
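
For reference, a sketch of what the proposed keyword looks like in use. It assumes Python 3.7 or later, where `binascii.b2a_uu` accepts a keyword-only `backtick` argument; on older versions the call raises a TypeError.

```python
import binascii

data = b"Ca\x00t"

# Historical output: the 6-bit value zero is written as a space.
spaced = binascii.b2a_uu(data)

# With the keyword: zero is written as "`", matching the Perl output above.
graved = binascii.b2a_uu(data, backtick=True)

# The decoder accepts either form and returns the same bytes.
decoded_spaced = binascii.a2b_uu(spaced)
decoded_graved = binascii.a2b_uu(graved)
```

Because the default stays `backtick=False`, existing callers keep getting the space-based output.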

    @serhiy-storchaka
    Member

    @serhiy-storchaka serhiy-storchaka commented Apr 27, 2017

    Is there any standard?

    From Wikipedia [1]:

    """
    Note that 96 ("`" grave accent) is a character that is seen in uuencoded files but is typically only used to signify a 0-length line, usually at the end of a file. It will never naturally occur in the actual converted data since it is outside the range of 32 to 95. The sole exception to this is that some uuencoding programs use the grave accent to signify padding bytes instead of a space. However, the character used for the padding byte is not standardized, so either is a possibility.
    """

    This obviously makes it impossible to use "`" as zero instead of a space.

    [1] https://en.wikipedia.org/wiki/Uuencoding#Uuencode_table

    @zhangyangyu
    Member

    @zhangyangyu zhangyangyu commented Apr 27, 2017

    There seems to be no standard. I also read the Wikipedia article, but Perl and uuencode on my Linux machine now both use backticks instead of spaces to represent zero.

    [~]$ perl -e 'print pack("u","Ca\x00t")'
    $0V$`=```
    [~]$ cat /tmp/test
    Ca[~]$ uuencode /tmp/test -
    begin 664 -
    "0V$
    end

    while Python currently produces:

    import uu
    uu.encode('/tmp/test', '-')
    begin 664 test
    "0V$

    end

    Besides the link Kyle gives, the FreeBSD man page also describes the new algorithm: http://www.unix.com/man-page/freebsd/5/uuencode/

    I don't propose changing the current behaviour, since that would break backwards compatibility. But I think it's reasonable to provide a way for users to opt in to backticks.

    @serhiy-storchaka
    Member

    @serhiy-storchaka serhiy-storchaka commented Apr 27, 2017

    What about other popular languages? Java, PHP, Ruby, Tcl, C#, JavaScript, Swift, Go, Rust? Do any languages provide a way to configure the zero character, and what are the names of the options? Are there languages that use "`" instead of a space only for padding, but not for representing an ordinal zero?

    @vadmium
    Member

    @vadmium vadmium commented Apr 28, 2017

    FWIW I am using NXP LPC microcontrollers at the moment, whose bootloader uses the grave/backtick instead of spaces. (NXP application note AN11229.) Although in practice it does seem to accept Python's spaces instead of graves.

    I wouldn't put too much weight on Wikipedia, especially where it says graves are not used for encoded data (vs length and padding). Earlier versions of Wikipedia did mention graves in regular data.

    I understand the reason for avoiding spaces is that spaces get stripped (e.g. by email, copy and paste, etc). You have to avoid spaces in data, not just padding, because a data space may still appear at the end of a line.
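
The stripping problem can be illustrated with a small sketch. It assumes Python 3.7 or later for the `backtick` keyword, and `strip_trailing` is a hypothetical stand-in for whatever mail client or editor removes trailing whitespace.

```python
import binascii


def strip_trailing(line):
    """Simulate a tool that drops trailing whitespace from a line.

    Hypothetical helper for illustration; real tools vary in what they strip.
    """
    return line.rstrip(b" \r\n") + b"\n"


data = b"Ca"  # two bytes, so the last encoded character is a zero

spaced = binascii.b2a_uu(data)                 # ends in a data space
graved = binascii.b2a_uu(data, backtick=True)  # ends in "`" instead

spaced_survives = strip_trailing(spaced) == spaced
graved_survives = strip_trailing(graved) == graved
```

Here the space-based line loses its final character, while the backtick-based line passes through unchanged.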

    @zhangyangyu
    Member

    @zhangyangyu zhangyangyu commented Apr 28, 2017

    Uuencode has no official standard, and it all depends on the implementation. For other languages, I could only find official implementations for PHP, Java, and ActiveTcl(?). PHP and ActiveTcl default to backticks, with no options. Java defaults to spaces, with no options.

    @serhiy-storchaka
    Member

    @serhiy-storchaka serhiy-storchaka commented Apr 28, 2017

    Thanks Martin and Xiang. Wikipedia is not a reliable source, but it is usually based on reliable sources. In this case it seems to be wrong.

    The next question is the parameter name. Wikipedia uses the name "grave accent", the FreeBSD uuencode man page uses "backquote", and the proposed patch uses "backtick". "Grave accent" is the official Unicode name; "backquote" and "backtick" are commonly used in programming contexts. We could also use a name containing "space", with the default value True.

    @zhangyangyu
    Member

    @zhangyangyu zhangyangyu commented Apr 28, 2017

    I think "grave accent" is not suitable. Although it's the standard Unicode name, it's not commonly used in programming, so it's not direct enough. "backquote" and "backtick" seem to be interchangeable, and I don't have any preference. Perl seems to use backtick rather than backquote when ` is part of the language. Yeah, space is also a choice.

    @serhiy-storchaka
    Member

    @serhiy-storchaka serhiy-storchaka commented Apr 28, 2017

    Python 2 used the term "backquote" when ` is a language part.

    @zhangyangyu
    Member

    @zhangyangyu zhangyangyu commented Apr 28, 2017

    token defines it as backquote, but the docs also call it backticks in several places [1][2]. Do you have any preference, Serhiy and Martin?

    [1] https://docs.python.org/release/3.0.1/whatsnew/3.0.html#removed-syntax
    [2] https://docs.python.org/2/library/2to3.html?highlight=backtick#2to3fixer-repr

    @vadmium
    Member

    @vadmium vadmium commented Apr 29, 2017

    I think I would prefer b2a_uu(data, grave=True), but am also happy with Xiang’s backtick=True if others prefer that. :) In my mind “grave accent” is the pure ASCII character; it just got abused for other things. Other options:

    b2a_uu(data, space=False)
    b2a_uu(data, avoid_spaces=True)
    b2a_uu(data, use_0x60=True)

    @serhiy-storchaka
    Member

    @serhiy-storchaka serhiy-storchaka commented Apr 29, 2017

    I'm +0 for something containing "space" in the option name, since the purpose of changing the UU encoding is to avoid stripped spaces. But this is not a strong preference.

    Actually there is no need to add a new option in b2a_uu(), since we can just use b2a_uu(data).replace(b' ', b'`'). It is added just for convenience.
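
The equivalence claimed here can be checked with a short sketch, assuming Python 3.7 or later so that the `backtick` keyword exists. The sample inputs are arbitrary. The replacement is safe because a space in `b2a_uu` output only ever stands for the value zero.

```python
import binascii

# b2a_uu encodes at most 45 bytes per call, so every sample stays within that.
samples = [b"", b"Ca", b"Ca\x00t", bytes(range(45))]

results = []
for data in samples:
    via_replace = binascii.b2a_uu(data).replace(b" ", b"`")
    via_keyword = binascii.b2a_uu(data, backtick=True)
    results.append(via_replace == via_keyword)

all_equal = all(results)
```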

    @zhangyangyu
    Member

    @zhangyangyu zhangyangyu commented May 3, 2017

    New changeset 13f1f42 by Xiang Zhang in branch 'master':
    bpo-30103: Allow Uuencode in Python using backtick as zero instead of space (bpo-1326)
    13f1f42

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022