pprint long non-printable bytes as hexdump #62068

serhiy-storchaka · 2013-04-29T18:19:30Z

BPO	17868
Nosy	@pitrou, @ezio-melotti, @serhiy-storchaka
Files	pprint_bytes_hex.patch pprint_bytes_hex_2.patch: Output a hexdump as a comment

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2015-12-20.11:39:21.735>
created_at = <Date 2013-04-29.18:19:29.544>
labels = ['type-bug', 'library']
title = 'pprint long non-printable bytes as hexdump'
updated_at = <Date 2015-12-20.11:41:15.017>
user = 'https://github.com/serhiy-storchaka'

bugs.python.org fields:

activity = <Date 2015-12-20.11:41:15.017>
actor = 'serhiy.storchaka'
assignee = 'none'
closed = True
closed_date = <Date 2015-12-20.11:39:21.735>
closer = 'serhiy.storchaka'
components = ['Library (Lib)']
creation = <Date 2013-04-29.18:19:29.544>
creator = 'serhiy.storchaka'
dependencies = []
files = ['30067', '30122']
hgrepos = []
issue_num = 17868
keywords = ['patch']
message_count = 14.0
messages = ['188081', '188084', '188086', '188144', '188346', '188348', '188352', '188354', '188355', '188360', '188371', '188372', '189683', '256763']
nosy_count = 4.0
nosy_names = ['pitrou', 'techtonik', 'ezio.melotti', 'serhiy.storchaka']
pr_nums = []
priority = 'normal'
resolution = 'rejected'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue17868'
versions = ['Python 3.4']

serhiy-storchaka · 2013-04-29T18:19:29Z

Here is a patch with which pprint formats long bytes objects which contain non-ascii or non-printable bytes as a hexdump.

Inspired by Antoine's wish (http://permalink.gmane.org/gmane.comp.python.ideas/20329).

ezio-melotti · 2013-04-29T18:34:55Z

A couple of comments:

1. A separate function might be better. I think this kind of output would be more useful while inspecting individual byte objects, rather than having it for arbitrary byte objects (that might be inside other containers).
2. I don't know if the output of pprint is supposed to be eval()uable, but I don't like too much the base64.b16decode(...).replace(' ', '') in the output (especially if the byte objects are short). If a separate function is used as suggested in 1) this won't be a problem. Using the hex_codec might be another option.

pitrou · 2013-04-29T18:55:49Z

Yes, I think a separate function would be better. There's another issue for pprint() of bytes with line continuations:

http://bugs.python.org/issue17530

techtonik · 2013-04-30T08:51:11Z

Some issues:

The hex conversion logic doesn't belong in the base64 module; there is no chance a person without StackOverflow access can find it there.
I'd put bpo-17862 first as a dependency for this one, because the proposed itertools.chunks() can be further optimized (chunking endless sequences with constant memory overhead, PyPy-specific speedups, etc.).

I like that this is not over-engineered. In my hexdump module I got too involved with problems of parsing/producing full dumps in a way compatible with Python 2/3. So I have to postpone my own user story until finally I run out of time.

Probably hexdump.dump() returning string will make it a useful API for the primary user story.
hexdump.dumpgen() as a line generator with 16 hexadecimal bytes delimited by space should cover all other use cases.

serhiy-storchaka · 2013-05-04T09:42:58Z

> A separate function might be better. I think this kind of output would be more useful while inspecting individual byte objects, rather than having it for arbitrary byte objects (that might be inside other containers).

I don't think a general hexdump() function is worth including in the stdlib. It would need too many options (How many bytes to display per line? How to group hex digits? What replacement character to use for non-printables? Whether or not to display addresses? Whether or not to display chars? What delimiters to use between hex digits and chars, and between address and hex digits? What line prefix and suffix to use? How to display the last incomplete line? How to display the first incomplete line?), and this makes it complicated. An application which outputs a hexdump to a richer device (an HTML file or an ANSI-colored terminal) needs advanced options.

However, simple specialized code can be used for special purposes, i.e. internally in the pprint module. I don't see how it can be reused and am not interested in a general function.

> I don't know if the output of pprint is supposed to be eval()uable, but I don't like too much the base64.b16decode(...).replace(' ', '') in the output (especially if the byte objects are short). If a separate function is used as suggested in 1) this won't be a problem. Using the hex_codec might be another option.

An alternative option is to first output the bytes object as is (perhaps splitting it across multiple lines as in bpo-17530) and then output a hexdump as a comment.

[b'\x7fELF\x01\x01\x01\x00\x00\n'
b'\x00\x00\x00\x00\x00\x00\x02\x00\x03\x00\x01'
# 7F 45 4C 46 01 01 01 00 00 0A 00 00 00 00 00 00 | .ELF............
# 02 00 03 00 01 | .....
]

serhiy-storchaka · 2013-05-04T10:27:44Z

Here is an alternative patch which outputs a bytes literal and a hexdump as a comment.

pitrou · 2013-05-04T11:17:24Z

On Saturday 04 May 2013 at 09:42 +0000, Serhiy Storchaka wrote:

> However, simple specialized code can be used for special purposes,
> i.e. internally in the pprint module. I don't see how it can be reused
> and am not interested in a general function.

I don't understand how it would be useful in the pprint module if it
can't be useful as a general function. The general intent is the same:
print something in a "nice" way. Just the nice way is different
depending on the situations: if my bytes object is simply a bunch of
HTTP headers, I don't want to have a hexdump.

> An alternative option is to first output the bytes object as is (perhaps
> splitting it across multiple lines as in bpo-17530) and then output a
> hexdump as a comment.
>
> [b'\x7fELF\x01\x01\x01\x00\x00\n'
> b'\x00\x00\x00\x00\x00\x00\x02\x00\x03\x00\x01'
> # 7F 45 4C 46 01 01 01 00 00 0A 00 00 00 00 00 00 | .ELF............
> # 02 00 03 00 01 | .....
> ]

This won't work very nicely in smaller display widths. You'll need too
many lines to represent a bytes object.

serhiy-storchaka · 2013-05-04T12:05:21Z

> I don't understand how it would be useful in the pprint module if it
> can't be useful as a general function.

How can it be used besides pprint/pformat functions?

> Just the nice way is different
> depending on the situations: if my bytes object is simply a bunch of
> HTTP headers, I don't want to have a hexdump.

Then perhaps a new parameter for pprint/pformat is needed (hex=True?). I think printing integers in hexadecimal can sometimes be useful too.

> This won't work very nicely in smaller display widths. You'll need too
> many lines to represent a bytes object.

This is the nature of hexdumps. Every byte requires 4+ characters (or 3+ if the hex digits are grouped more tightly).

pitrou · 2013-05-04T12:14:43Z

> > I don't understand how it would be useful in the pprint module if it
> > can't be useful as a general function.

> How can it be used besides pprint/pformat functions?

I don't understand your question. Do you never print some data at the
command-line prompt? Or even as part of small test programs?

> Then perhaps a new parameter for pprint/pformat is needed (hex=True?). I
> think printing integers in hexadecimal can sometimes be useful too.

Passing type-specific parameters to pprint/pformat sounds like a bad
idea to me. And I don't think you'd want to print *all* integers as hex.

> > This won't work very nicely in smaller display widths. You'll need too
> > many lines to represent a bytes object.

> This is the nature of hexdumps. Every byte requires 4+ characters (or 3+
> if the hex digits are grouped more tightly).

Which is why the proposal doesn't fit well with pprint/pformat.

ezio-melotti · 2013-05-04T12:46:44Z

The idea is that the output of pprint should be something like (once python/cpython#61732 is applied):
>>> pprint.pprint(b'\x7fELF\x01\x01\x01\x00\x00\n\x00\x00\x00\x00\x00\x00\x02\x00\x03\x00\x01')
(b'\x7fELF\x01\x01\x01\x00\x00\n\x00\x00'
 b'\x00\x00\x00\x00\x02\x00\x03\x00\x01')

whereas the output of hexdump can be something like:
pprint.hexdump(b'\x7fELF\x01\x01\x01\x00\x00\n\x00\x00\x00\x00\x00\x00\x02\x00\x03\x00\x01')
7F 45 4C 46 01 01 01 00 00 0A 00 00 00 00 00 00 | .ELF............
02 00 03 00 01 | .....

hexdump() could accept some additional args too if required, but otherwise I don't think the details are so important as long as it produces something readable for a human.
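
For illustration, the hexdump layout shown above can be produced with a few lines of Python. This is only a minimal sketch, not the code from either attached patch: the name `hexdump_lines`, the `width` parameter, and the choice of `.` as the placeholder for non-printable bytes are assumptions made for this example, and `pprint.hexdump()` does not exist in the standard library.

```python
def hexdump_lines(data, width=16):
    """Yield one 'HEX | chars' line per `width` bytes (hypothetical helper)."""
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Two uppercase hex digits per byte, separated by spaces.
        hex_part = ' '.join('%02X' % byte for byte in chunk)
        # Printable ASCII (0x20..0x7E) is shown as-is, everything else as '.'.
        char_part = ''.join(
            chr(byte) if 0x20 <= byte < 0x7F else '.' for byte in chunk
        )
        yield '%s | %s' % (hex_part, char_part)


data = (b'\x7fELF\x01\x01\x01\x00\x00\n\x00\x00\x00\x00\x00\x00'
        b'\x02\x00\x03\x00\x01')
for line in hexdump_lines(data):
    print(line)
```

Even this small version already fixes several of the choices listed earlier in the thread (bytes per line, grouping, replacement character, no address column, no padding of the last line), which is the kind of decision a general-purpose function would have to expose as options.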

serhiy-storchaka · 2013-05-04T15:41:30Z

> I don't understand your question. Do you never print some data at the
> command-line prompt? Or even as part of small test programs?

To be honest, I very rarely even use pprint. I'm too lazy to import it. If I want to quickly get a hexdump, I use something like ' '.join('%02X'%i for i in data). It is shorter than import pprint; pprint.hexdump(data). For a small program most likely the standard hexdump() will not be enough.

> Passing type-specific parameters to pprint/pformat sounds like a bad
> idea to me.

Agree. Of course it would be better to automatically determine a "nice" display (use hexdump only for large non-printable bytes).

> And I don't think you'd want to print *all* integers as hex.

If you want to print bytes in hex, why not ints and floats? ;) In fact I don't want to print data as hex, so shut up.

> Which is why the proposal doesn't fit well with pprint/pformat.

Perhaps I misunderstood your wish. I'm not against considering pprint as a black box, which does all the good magic inside by default. The use of this feature does not require anything from the users and does not impose obligations on the maintainers. But I'm not interested in a separate function.

pitrou · 2013-05-04T15:46:40Z

On Saturday 04 May 2013 at 15:41 +0000, Serhiy Storchaka wrote:

> > Which is why the proposal doesn't fit well with pprint/pformat.

> Perhaps I misunderstood your wish. I'm not against considering pprint as
> a black box, which does all the good magic inside by default. The use of
> this feature does not require anything from the users and does not
> impose obligations on the maintainers. But I'm not interested in a
> separate function.

The problem is that the "good magic" will depend on the situation. Really, I
don't want a hexdump of an HTTP message :-)
Which is why there should be a separate function for those *wishing* a
hexdump.

serhiy-storchaka · 2013-05-20T18:20:04Z

Oh, I forgot about bytes.fromhex(). This of course looks better than base64.b16decode((...).replace(' ', '')).
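
To make the comparison concrete, here is a small sketch of the three ways mentioned in this thread to turn a space-separated hex string back into a bytes object. The sample string is an assumption for the example; the calls themselves are from the standard library.

```python
import base64
import codecs

# Sample input (assumed for this example): space-separated uppercase hex.
hex_text = '7F 45 4C 46 01 01 01 00'

# bytes.fromhex() skips the spaces between byte pairs on its own.
via_fromhex = bytes.fromhex(hex_text)

# base64.b16decode() rejects spaces, so they have to be stripped first.
via_b16 = base64.b16decode(hex_text.replace(' ', ''))

# The hex_codec works on bytes and also needs the spaces removed.
via_codec = codecs.decode(
    hex_text.replace(' ', '').encode('ascii'), 'hex_codec'
)

print(via_fromhex == via_b16 == via_codec)
```

Of the three, only `bytes.fromhex()` accepts the spaced form directly, which is why it gives the shortest eval()uable expression for a hexdump embedded in pprint output.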

serhiy-storchaka · 2015-12-20T11:39:22Z

Withdrawn in favor of bpo-17530.

serhiy-storchaka added the stdlib (Python modules in the Lib dir) and type-bug (An unexpected behavior, bug, or error) labels on Apr 29, 2013

serhiy-storchaka closed this as completed Dec 20, 2015

ezio-melotti transferred this issue from another repository Apr 10, 2022
