pprint long non-printable bytes as hexdump #62068

Closed
serhiy-storchaka opened this issue Apr 29, 2013 · 14 comments
Labels: stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)

Comments

@serhiy-storchaka
Member

BPO 17868
Nosy @pitrou, @ezio-melotti, @serhiy-storchaka
Files
  • pprint_bytes_hex.patch
  • pprint_bytes_hex_2.patch: Output a hexdump as a comment
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2015-12-20.11:39:21.735>
    created_at = <Date 2013-04-29.18:19:29.544>
    labels = ['type-bug', 'library']
    title = 'pprint long non-printable bytes as hexdump'
    updated_at = <Date 2015-12-20.11:41:15.017>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2015-12-20.11:41:15.017>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2015-12-20.11:39:21.735>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2013-04-29.18:19:29.544>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = ['30067', '30122']
    hgrepos = []
    issue_num = 17868
    keywords = ['patch']
    message_count = 14.0
    messages = ['188081', '188084', '188086', '188144', '188346', '188348', '188352', '188354', '188355', '188360', '188371', '188372', '189683', '256763']
    nosy_count = 4.0
    nosy_names = ['pitrou', 'techtonik', 'ezio.melotti', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'rejected'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue17868'
    versions = ['Python 3.4']

    @serhiy-storchaka
    Member Author

    Here is a patch that makes pprint format long bytes objects containing non-ASCII or non-printable bytes as a hexdump.

    Inspired by Antoine's wish (http://permalink.gmane.org/gmane.comp.python.ideas/20329).

    @serhiy-storchaka added the stdlib and type-bug labels Apr 29, 2013
    @ezio-melotti
    Member

    A couple of comments:

    1. A separate function might be better. I think this kind of output would be more useful while inspecting individual byte objects, rather than having it for arbitrary byte objects (that might be inside other containers).
    2. I don't know if the output of pprint is supposed to be eval()uable, but I don't much like the base64.b16decode(...).replace(' ', '') in the output (especially if the bytes objects are short). If a separate function is used as suggested in 1), this won't be a problem. Using the hex_codec might be another option.
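
For readers comparing the two routes named in point 2, here is a minimal sketch of decoding space-separated hex text with base64.b16decode() and with the hex_codec. The sample string and the variable names are assumptions for illustration; they are not taken from either patch.

```python
import base64
import codecs

# Space-separated hex text, as a hexdump-style repr might contain (assumed sample).
hex_text = "7F 45 4C 46 01 01 01 00"

# base64.b16decode() rejects spaces, so they have to be stripped first.
compact = hex_text.replace(" ", "")
via_base64 = base64.b16decode(compact)

# The hex_codec is a bytes-to-bytes codec, so it is fed ASCII bytes.
via_codec = codecs.decode(compact.encode("ascii"), "hex_codec")

print(via_base64)
print(via_base64 == via_codec)
```

Both routes need the spaces removed before decoding, which is where the extra .replace(' ', '') in the proposed output comes from.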

    @pitrou
    Member

    pitrou commented Apr 29, 2013

    Yes, I think a separate function would be better. There's another issue for pprint() of bytes with line continuations:

    http://bugs.python.org/issue17530

    @techtonik
    Mannequin

    techtonik mannequin commented Apr 30, 2013

    Some issues:

    1. the hex-conversion logic doesn't belong in the base64 module - there is no chance a person without StackOverflow access can find it
    2. i'd put bpo-17862 first as a dependency for this one, because the proposed itertools.chunks() can be further optimized (chunking endless sequences with constant memory overhead, pypy-specific speedups, etc.)

    I like that this is not over-engineered. In my hexdump module I got too involved with problems of parsing/producing full dumps in a way compatible with Python 2/3. So I have to postpone my own user story until finally I run out of time.

    Probably hexdump.dump() returning string will make it a useful API for the primary user story.
    hexdump.dumpgen() as a line generator with 16 hexadecimal bytes delimited by space should cover all other use cases.
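
A line generator of the kind described here can be sketched in a few lines. This is a hypothetical illustration, not the API of the hexdump module mentioned above: the name dumpgen_sketch and the per_line parameter are assumptions. It reads from any iterable of byte values, so it also handles endless input with constant memory.

```python
from itertools import cycle, islice


def dumpgen_sketch(byte_iter, per_line=16):
    """Lazily yield lines of space-delimited hex pairs (hypothetical helper)."""
    iterator = iter(byte_iter)
    while True:
        # Pull at most per_line byte values without materializing the input.
        chunk = list(islice(iterator, per_line))
        if not chunk:
            return
        yield " ".join("%02X" % value for value in chunk)


# Iterating over a bytes object yields integers, so it can be passed directly.
for line in dumpgen_sketch(bytes(range(18))):
    print(line)

# An endless source still works, because only one line is held at a time.
first_line = next(dumpgen_sketch(cycle(b"\x00\xff"), per_line=4))
```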

    @serhiy-storchaka
    Member Author

    > 1. A separate function might be better. I think this kind of output would be more useful while inspecting individual byte objects, rather than having it for arbitrary byte objects (that might be inside other containers).

    I don't think a general hexdump() function is worth including in the stdlib. It would need too many options (How many bytes to display per line? How to group the hex digits? Which replacement character to use for non-printables? Whether or not to display addresses? Whether or not to display chars? Which delimiters to use between hex digits and chars, and between address and hex digits? What are the line prefix and suffix? How to display the last incomplete line? How to display the first incomplete line?), and this makes it complicated. An application that outputs a hexdump to a richer device (an HTML file or an ANSI-colored terminal) needs advanced options.

    However, simple specialized code can be used for special purposes, i.e. internally in the pprint module. I don't see how it could be reused, and I'm not interested in a general function.

    > 2. I don't know if the output of pprint is supposed to be eval()uable, but I don't like too much the base64.b16decode(...).replace(' ', '') in the output (especially if the byte objects are short). If a separate function is used as suggested in 1) this won't be a problem. Using the hex_codec might be another option.

    An alternative is to first output the bytes object as is (perhaps splitting it across multiple lines as in bpo-17530) and then output a hexdump as a comment.

    [b'\x7fELF\x01\x01\x01\x00\x00\n'
    b'\x00\x00\x00\x00\x00\x00\x02\x00\x03\x00\x01'
    # 7F 45 4C 46 01 01 01 00 00 0A 00 00 00 00 00 00 | .ELF............
    # 02 00 03 00 01 | .....
    ]
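
One property of this layout is that the parser ignores the comment lines, so the display stays evaluable. A minimal check, assuming the output is captured as text exactly as shown above (the variable names are illustrative):

```python
import ast

# The proposed display, copied from the comment above as raw text.
displayed = r"""[b'\x7fELF\x01\x01\x01\x00\x00\n'
b'\x00\x00\x00\x00\x00\x00\x02\x00\x03\x00\x01'
# 7F 45 4C 46 01 01 01 00 00 0A 00 00 00 00 00 00 | .ELF............
# 02 00 03 00 01 | .....
]"""

# Adjacent bytes literals are concatenated; the comment lines are skipped.
value = ast.literal_eval(displayed)
print(len(value), len(value[0]))
```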

    @serhiy-storchaka
    Member Author

    Here is an alternative patch which outputs a bytes literal and a hexdump as a comment.

    @pitrou
    Member

    pitrou commented May 4, 2013

    On Saturday, 4 May 2013 at 09:42 +0000, Serhiy Storchaka wrote:

    > However, simple specialized code can be used for special purposes,
    > i.e. internally in the pprint module. I don't see how it could be reused,
    > and I'm not interested in a general function.

    I don't understand how it would be useful in the pprint module if it
    can't be useful as a general function. The general intent is the same:
    print something in a "nice" way. Just the nice way is different
    depending on the situations: if my bytes object is simply a bunch of
    HTTP headers, I don't want to have a hexdump.

    > An alternative is to first output the bytes object as is (perhaps
    > splitting it across multiple lines as in bpo-17530) and then output a
    > hexdump as a comment.
    >
    > [b'\x7fELF\x01\x01\x01\x00\x00\n'
    > b'\x00\x00\x00\x00\x00\x00\x02\x00\x03\x00\x01'
    > # 7F 45 4C 46 01 01 01 00 00 0A 00 00 00 00 00 00 | .ELF............
    > # 02 00 03 00 01 | .....
    > ]

    This won't work very nicely in smaller display widths. You'll need too
    many lines to represent a bytes object.

    @serhiy-storchaka
    Member Author

    > I don't understand how it would be useful in the pprint module if it
    > can't be useful as a general function.

    How can it be used besides pprint/pformat functions?

    > Just the nice way is different
    > depending on the situations: if my bytes object is simply a bunch of
    > HTTP headers, I don't want to have a hexdump.

    Then perhaps a new parameter for pprint/pformat is needed (hex=True?). I think printing integers in hexadecimal can sometimes be useful too.

    > This won't work very nicely in smaller display widths. You'll need too
    > many lines to represent a bytes object.

    This is the nature of hexdumps. Every byte requires 4+ characters (or 3+ if the hex digits are grouped more tightly).
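
The cost per byte can be made concrete. A back-of-the-envelope sketch, assuming the layout used earlier in this thread (uppercase hex pairs separated by single spaces, a " | " separator, then one character per byte); the function name and the prefix_len parameter are hypothetical:

```python
def bytes_per_line(width, prefix_len=0):
    """Return how many bytes fit on one hexdump line of the given width.

    Assumed layout: "XX XX ... XX | cccc". For n bytes the hex area takes
    3*n - 1 columns, the separator 3, and the text area n, i.e. 4*n + 2.
    """
    usable = width - prefix_len - 2
    return max(usable // 4, 0)


for width in (80, 66, 40, 20):
    print(width, bytes_per_line(width))
```

Under these assumptions the 16-byte lines shown earlier are 66 columns wide, and an 80-column display fits 19 bytes per line.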

    @pitrou
    Member

    pitrou commented May 4, 2013

    > > I don't understand how it would be useful in the pprint module if it
    > > can't be useful as a general function.
    >
    > How can it be used besides pprint/pformat functions?

    I don't understand your question. Do you never print some data at the
    command-line prompt? Or even as part of small test programs?

    > Then perhaps a new parameter for pprint/pformat needed (hex=True?). I
    > think printing integers in hexadecimal can sometimes be useful too.

    Passing type-specific parameters to pprint/pformat sounds like a bad
    idea to me. And I don't think you'd want to print *all* integers as hex.

    > > This won't work very nicely in smaller display widths. You'll need too
    > > many lines to represent a bytes object.
    >
    > This is a nature of hexdumps. Every byte requires 4+ characters (or 3+
    > if group hexdigits tighter).

    Which is why the proposal doesn't fit well with pprint/pformat.

    @ezio-melotti
    Member

    The idea is that the output of pprint should be something like (once python/cpython#61732 is applied):
    >>> pprint.pprint(b'\x7fELF\x01\x01\x01\x00\x00\n\x00\x00\x00\x00\x00\x00\x02\x00\x03\x00\x01')
    (b'\x7fELF\x01\x01\x01\x00\x00\n\x00\x00'
     b'\x00\x00\x00\x00\x02\x00\x03\x00\x01')

    whereas the output of hexdump can be something like:
    >>> pprint.hexdump(b'\x7fELF\x01\x01\x01\x00\x00\n\x00\x00\x00\x00\x00\x00\x02\x00\x03\x00\x01')
    7F 45 4C 46 01 01 01 00 00 0A 00 00 00 00 00 00 | .ELF............
    02 00 03 00 01 | .....

    hexdump() could accept some additional args too if required, but otherwise I don't think the details are so important as long as it produces something readable for a human.
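
Note that pprint.hexdump() is only a proposal in this thread and does not exist in the standard library. A minimal sketch of a function that produces the output shown above, with the name hexdump_sketch and the per_line and placeholder parameters as assumptions:

```python
def hexdump_sketch(data, per_line=16, placeholder="."):
    """Format bytes as 'HEX PAIRS | text' lines (hypothetical stand-in)."""
    lines = []
    for offset in range(0, len(data), per_line):
        chunk = data[offset:offset + per_line]
        hex_part = " ".join("%02X" % value for value in chunk)
        # Show printable ASCII as is; replace everything else.
        text_part = "".join(
            chr(value) if 0x20 <= value < 0x7F else placeholder
            for value in chunk
        )
        lines.append("%s | %s" % (hex_part, text_part))
    return "\n".join(lines)


sample = b'\x7fELF\x01\x01\x01\x00\x00\n' + b'\x00' * 6 + b'\x02\x00\x03\x00\x01'
print(hexdump_sketch(sample))
```

As in the example above, the last incomplete line is not padded, which is one of the layout decisions a general-purpose function would have to expose as an option.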

    @serhiy-storchaka
    Member Author

    > I don't understand your question. Do you never print some data at the
    > command-line prompt? Or even as part of small test programs?

    To be honest, I very rarely even use pprint. I'm too lazy to import it. If I want to quickly get a hexdump, I use something like ' '.join('%02X'%i for i in data). It is shorter than import pprint; pprint.hexdump(data). For a small program most likely the standard hexdump() will not be enough.

    > Passing type-specific parameters to pprint/pformat sounds like a bad
    > idea to me.

    Agree. Of course it would be better to automatically determine a "nice" display (use hexdump only for large non-printable bytes).
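
What such automatic detection might look like is not specified in this thread. The following is a hypothetical heuristic only: the function name, the length cutoff and the ratio threshold are all assumptions chosen for illustration.

```python
def prefers_hexdump(data, min_length=16, max_printable_ratio=0.5):
    """Guess whether a bytes object reads better as a hexdump (hypothetical)."""
    if len(data) < min_length:
        # Short values stay as ordinary bytes literals.
        return False
    printable = sum(
        1 for value in data
        if 0x20 <= value < 0x7F or value in b"\t\r\n"
    )
    return printable / len(data) < max_printable_ratio


elf_sample = b'\x7fELF\x01\x01\x01\x00\x00\n' + b'\x00' * 6 + b'\x02\x00\x03\x00\x01'
http_sample = b"GET / HTTP/1.1\r\nHost: example.org\r\n\r\n"

print(prefers_hexdump(elf_sample))
print(prefers_hexdump(http_sample))
```

Any fixed threshold will misclassify some inputs, which is the difficulty raised elsewhere in this discussion.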

    > And I don't think you'd want to print *all* integers as hex.

    If you want to print bytes in hex, why not ints and floats? ;) In fact I don't want to print data as hex, so shut up.

    > Which is why the proposal doesn't fit well with pprint/pformat.

    Perhaps I misunderstood your wish. I'm not against treating pprint as a black box that does all the good magic inside by default. Using this feature requires nothing from users and imposes no obligations on the maintainers. But I'm not interested in a separate function.

    @pitrou
    Member

    pitrou commented May 4, 2013

    On Saturday, 4 May 2013 at 15:41 +0000, Serhiy Storchaka wrote:

    > > Which is why the proposal doesn't fit well with pprint/pformat.
    >
    > Perhaps I misunderstood your wish. I'm not against treating pprint as
    > a black box that does all the good magic inside by default. Using
    > this feature requires nothing from users and imposes no
    > obligations on the maintainers. But I'm not interested in a
    > separate function.

    The problem is that the "good magic" will depend on the situation. Really, I
    don't want a hexdump of an HTTP message :-)
    Which is why there should be a separate function for those *wishing* a
    hexdump.

    @serhiy-storchaka
    Member Author

    Oh, I forgot about bytes.fromhex(). This of course looks better than base64.b16decode((...).replace(' ', '')).
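
A short illustration of why bytes.fromhex() reads better here: it skips the spaces between hex pairs on its own, so no .replace(' ', '') is needed. The sample string is an assumption for illustration.

```python
# Space-separated hex text (assumed sample).
hex_text = "7F 45 4C 46 01 01 01 00"

# bytes.fromhex() ignores the spaces between byte pairs.
data = bytes.fromhex(hex_text)

# Formatting the bytes back gives the original text.
round_trip = " ".join("%02X" % value for value in data)

print(data)
print(round_trip == hex_text)
```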

    @serhiy-storchaka
    Member Author

    Withdrawn in favor of bpo-17530.

    @ezio-melotti transferred this issue from another repository Apr 10, 2022