introduce bytes.hex method (also for bytearray and memoryview) #54160

Closed
wiggin15 mannequin opened this issue Sep 25, 2010 · 38 comments
@wiggin15
Mannequin

wiggin15 mannequin commented Sep 25, 2010

BPO 9951
Nosy @malemburg, @warsaw, @birkenfeld, @rhettinger, @terryjreedy, @gpshead, @mdickinson, @ncoghlan, @pitrou, @ericvsmith, @tiran, @merwok, @ethanfurman, @wiggin15, @vadmium, @serhiy-storchaka
Files
  • bytes.hex.diff
  • bytes.hex-1.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/gpshead'
    closed_at = <Date 2015-04-26.05:07:16.157>
    created_at = <Date 2010-09-25.23:38:47.004>
    labels = ['interpreter-core', 'type-feature']
    title = 'introduce bytes.hex method (also for bytearray and memoryview)'
    updated_at = <Date 2015-11-10.17:19:10.649>
    user = 'https://github.com/wiggin15'

    bugs.python.org fields:

    activity = <Date 2015-11-10.17:19:10.649>
    actor = 'python-dev'
    assignee = 'gregory.p.smith'
    closed = True
    closed_date = <Date 2015-04-26.05:07:16.157>
    closer = 'gregory.p.smith'
    components = ['Interpreter Core']
    creation = <Date 2010-09-25.23:38:47.004>
    creator = 'wiggin15'
    dependencies = []
    files = ['38961', '39204']
    hgrepos = []
    issue_num = 9951
    keywords = ['patch']
    message_count = 38.0
    messages = ['117397', '118272', '132911', '190112', '193034', '193041', '197571', '199629', '199631', '199634', '199665', '205744', '226692', '226703', '226730', '226731', '226732', '226734', '226735', '226737', '226738', '226745', '226750', '227335', '240607', '240719', '242011', '242027', '242030', '242031', '242033', '242034', '242035', '242036', '242039', '242041', '242043', '254456']
    nosy_count = 21.0
    nosy_names = ['lemburg', 'barry', 'georg.brandl', 'rhettinger', 'terry.reedy', 'gregory.p.smith', 'mark.dickinson', 'ncoghlan', 'pitrou', 'eric.smith', 'gotgenes', 'christian.heimes', 'eric.araujo', 'Arfrever', 'BreamoreBoy', 'ethan.furman', 'wiggin15', 'python-dev', 'martin.panter', 'serhiy.storchaka', 'hct']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue9951'
    versions = ['Python 3.5']

    @wiggin15
    Mannequin Author

    wiggin15 mannequin commented Sep 25, 2010

    Following up on these discussions:
    http://psf.upfronthosting.co.za/roundup/tracker/issue3532
    http://www.gossamer-threads.com/lists/python/dev/863892

    I'm submitting a patch to add a bytes.hex method in accordance with PEP-358.
    The code was taken from binascii so it should be "tested".
    Also added bytearray.hex and fixed the documentation and testing.

    There are additional things to discuss, for example:

    • multiple and different implementations of tohex/fromhex - in binascii, sha1module, bytes, bytearray...
    • binascii's functions which perform the same thing, but those functions and the rest of binascii's functions receive and return wrong types. I would fix this but it breaks compatibility.
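For readers skimming the thread, here is a minimal sketch of the behaviour being requested. It assumes Python 3.5 or later, the version this issue is tagged with; the variable names are illustrative only.

```python
# Round trip between raw bytes and their hex text form.
data = bytes([92, 83, 80, 255])      # sample values borrowed from PEP 358

text = data.hex()                    # bytes -> str of hex digit pairs
restored = bytes.fromhex(text)       # str -> bytes, the pre-existing inverse

print(text)                          # 5c5350ff
print(restored == data)              # True

# bytearray exposes the same method.
print(bytearray(data).hex())         # 5c5350ff
```

The result is a `str`, unlike `binascii.hexlify`, which returns `bytes`.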

    @wiggin15 wiggin15 mannequin added interpreter-core Interpreter core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Sep 25, 2010
    @wiggin15
    Mannequin Author

    wiggin15 mannequin commented Oct 9, 2010

    fixed to Py_UNICODE

    @rhettinger
    Contributor

    rhettinger commented Apr 4, 2011

    See also: bpo-11756

    @terryjreedy
    Member

    terryjreedy commented May 26, 2013

    Also bpo-3532

    @wiggin15
    Mannequin Author

    wiggin15 mannequin commented Jul 14, 2013

    Hi, is there any chance to get this merged? This ticket has been open for almost 3 years...

    @serhiy-storchaka
    Member

    serhiy-storchaka commented Jul 14, 2013

    There are several ways to do this: base64.b16encode, binascii.b2a_hex, hex(int.from_bytes(...)), etc. Why do you need yet another one?
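A short sketch of how those existing routes differ in practice may help. Note that in `binascii` the bytes-to-hex direction is `b2a_hex` (an alias of `hexlify`), while `a2b_hex` goes the other way. The sample data is arbitrary.

```python
import base64
import binascii

data = b"\xde\xad\xbe\xef"

via_base64 = base64.b16encode(data)          # bytes, uppercase digits
via_binascii = binascii.b2a_hex(data)        # bytes, lowercase digits
via_int = hex(int.from_bytes(data, "big"))   # str with a '0x' prefix

print(via_base64)    # b'DEADBEEF'
print(via_binascii)  # b'deadbeef'
print(via_int)       # 0xdeadbeef

# The int route also drops leading zero bytes, so it is not a faithful dump.
lossy = hex(int.from_bytes(b"\x00\x01", "big"))
print(lossy)         # 0x1
```

Each route needs an import, a decode step, or prefix stripping to yield a plain lowercase `str`.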

    @wiggin15
    Mannequin Author

    wiggin15 mannequin commented Sep 13, 2013

    You can follow the discussion I linked in the ticket description for an answer:
    http://psf.upfronthosting.co.za/roundup/tracker/issue3532
    Mainly the answer is: to conform to PEP-358 and to provide the opposite of bytes.fromhex.
    I agree that you can use binascii, but apparently it was decided that this functionality is good to have in the builtins (what used to be encode/decode('hex') in Python 2.x, and what is now bytes.fromhex, with the missing bytes.hex). In addition, binascii works a little differently - already discussed in the given link...

    @tiran
    Member

    tiran commented Oct 12, 2013

    I'd like to see the feature in 3.4, too.

    @pitrou
    Member

    pitrou commented Oct 12, 2013

    If it's the reverse of fromhex(), perhaps we should call it tohex()?

    @tiran
    Member

    tiran commented Oct 12, 2013

    Funny thing. I was searching for "tohex" when I found this ticket.

    @birkenfeld
    Member

    birkenfeld commented Oct 13, 2013

    Blasphemous question: why not give bytes a __hex__ method? Then you could use hex() to convert them :)

    The patch is outdated; it should not use PyUnicode_AS_UNICODE, but PyUnicode_New(..., 127) and then PyUnicode_1BYTE_DATA to get the char array.

    @hct
    Mannequin

    hct mannequin commented Dec 9, 2013

    It would be good if we could specify an optional flag to get all-caps hex. Currently, I have to do hexlify( some_bytes ).decode( 'UTF-8' ).upper(). It would be good to be able to do some_bytes.hex( upper=1 )
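As a sketch of the uppercase case, assuming Python 3.5+ where `bytes.hex()` exists: the method has no `upper` flag, so the usual approach is to chain `str.upper()` onto the result. The sample bytes are illustrative.

```python
from binascii import hexlify

some_bytes = b"\xc0\xff\xee"

# The workaround quoted above: hexlify, decode, then upper-case.
old_way = hexlify(some_bytes).decode("UTF-8").upper()

# With bytes.hex() the result is already a str, so only upper() is needed.
new_way = some_bytes.hex().upper()

print(old_way)   # C0FFEE
print(new_way)   # C0FFEE
```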

    @vstinner
    Member

    vstinner commented Sep 10, 2014

    New features cannot be added to Python 2 anymore, only to the current development version which is now Python 3.5.

    If new methods are added to bytes, they should be added to bytearray too. Maybe we should also consider adding them to memoryview? memoryview already has a .tobytes() method and can be cast to type "B" (array of integers in range 0..255).

    The float type has .hex() and .fromhex() methods. We should keep these names to stay consistent. Which kind of output do you prefer? "0xHH 0xHH ...", "HH HH HH ..." or "HHHHHH..."? Do you want to add parameters to choose the format?

    Current binascii format:

    >>> binascii.hexlify('abc')
    '616263'

    @terryjreedy
    Member

    terryjreedy commented Sep 10, 2014

    To answer Serhiy, the goal is to have a bytes method that represents bytes as bytes rather than as a mixture of bytes and encoded ascii characters. This would aid people who work with bytes that are not encoded ascii and that do not embed encoded ascii. It should not be necessary to import anything.

    >>> hex(int.from_bytes(b'abc', 'big'))
    '0x616263'
    is a bit baroque and produces a hex representation of an int, not of multiple bytes.
    
    I think following the float precedent is a good idea.
    >>> float.fromhex(1.5.hex())
    1.5
    >>> float.fromhex('0x1.8000000000000p+0').hex()
    '0x1.8000000000000p+0'

    The output of bytes.hex should be one that is accepted by bytes.fromhex, which is to say, hex 'digit' pairs. Spaces are optionally allowed between pairs. I would add a 'spaces' parameter, defaulting to False, to output spaces when set to True. (Or possibly reverse the default -- what do potential users think?)

    A possible alternative for the parameter could be form='' (default), form=' ' (add spaces), and form='x' to add '\x' prefixes. I don't know that adding '\x' would be useful. The prefixes are not accepted by .fromhex.
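The round-trip constraint described here can be checked directly. The sketch below uses only `bytes.fromhex`; the `rejects` helper is a hypothetical name introduced for illustration.

```python
# fromhex accepts digit pairs with or without spaces between them.
spaced = bytes.fromhex("61 62 63")
packed = bytes.fromhex("616263")
print(spaced == packed == b"abc")   # True


def rejects(text):
    """Return True if bytes.fromhex refuses the given text."""
    try:
        bytes.fromhex(text)
    except ValueError:
        return True
    return False


# Prefixed forms are not valid input, so emitting them would break the round trip.
print(rejects("0x616263"))      # True
print(rejects("\\x61\\x62"))    # True
```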

    @hct
    Mannequin

    hct mannequin commented Sep 10, 2014

    @victor

    binascii.hexlify('abc') doesn't work in 3.4. I assume this is a new thing for 3.5

    >>> import binascii
    >>> binascii.hexlify('abc')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'str' does not support the buffer interface
    >>>
    >>> binascii.hexlify(b'abc')
    b'616263'

    @terry
    I think that spacing shouldn't be done by the hex function. If you allow a space between each hex pair, then what do you do if the bytes are actually from an array of 64-bit ints? Supporting a separating space every X bytes is probably outside the scope of this.

    I propose the hex functions for bytes/memoryview/bytearray should be as follows. I prefer to not have the '0x' prefix at all, but I understand all other hex functions add it. It would be good to have the option to not have the prefix.

    bytes.hex( byte_order = sys.byteorder ) returns a hex string in lowercase letters. ex. c0ffee

    bytes.HEX( byte_order = sys.byteorder ) returns a hex string in capital letters. ex. DEADBEEF

    bytes.from_hex( hex_str, byte_order = sys.byteorder ) returns a bytes object. ex. b'\xFE\xFF'

    another more flexible way is to have hex function accept a format similar to how sscanf works, but this will probably bring lots of trouble for all kinds of variants to support and the required error checks.
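The `byte_order` parameter proposed above is not part of the method that was eventually committed. One possible reading of it, sketched with a hypothetical helper named `hex_with_order`, is to reverse the whole buffer before dumping it when the order is little-endian:

```python
def hex_with_order(data, byte_order="big"):
    """Hypothetical helper: hex-dump data, reversing it first for 'little'."""
    if byte_order not in ("big", "little"):
        raise ValueError("byte_order must be 'big' or 'little'")
    raw = bytes(data)
    if byte_order == "little":
        raw = raw[::-1]          # reverse the byte sequence
    return raw.hex()


print(hex_with_order(b"\xc0\xff\xee"))            # c0ffee
print(hex_with_order(b"\xc0\xff\xee", "little"))  # eeffc0
```

This treats the buffer as a single integer, which matches the `int.from_bytes(..., "little")` behaviour but would be wrong for an array of multi-byte items.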

    @ncoghlan
    Contributor

    ncoghlan commented Sep 10, 2014

    Just as a recap of at least some of the *current* ways to do a bytes -> hex conversion:

    >>> import codecs
    >>> codecs.encode(b"abc", "hex")
    b'616263'
    >>> import binascii
    >>> binascii.hexlify(b"abc")
    b'616263'
    >>> import base64
    >>> base64.b16encode(b"abc")
    b'616263'
    >>> hex(int.from_bytes(b"abc", "big"))
    '0x616263'
    >>> hex(int.from_bytes(b"abc", "little"))
    '0x636261'

    Thus, the underlying purpose of this proposal is to provide a single "more obvious way to do it". As per the recent discussion on python-ideas, the point where that is most useful is in debugging output.

    However, rather than a new method on bytes/bytearray/memoryview for this, I instead suggest it would be appropriate to extend the default handling of the "x" and "X" format characters to accept arbitrary bytes-like objects. The processing of these characters would be as follows:

    "x": display a-f as lowercase digits
    "X": display A-F as uppercase digits
    "#": includes 0x prefix
    ".precision": chunks output, placing a space after every <precision> bytes
    ",": uses a comma as the separator, rather than a space

    Output order would match binascii.hexlify()

    Examples:

    format(b"xyz", "x") -> '78797a'
    format(b"xyz", "X") -> '78797A'
    format(b"xyz", "#x") -> '0x78797a'

    format(b"xyz", ".1x") -> '78 79 7a'
    format(b"abcdwxyz", ".4x") -> '61626364 7778797a'
    format(b"abcdwxyz", "#.4x") -> '0x61626364 0x7778797a'

    format(b"xyz", ",.1x") -> '78,79,7a'
    format(b"abcdwxyz", ",.4x") -> '61626364,7778797a'
    format(b"abcdwxyz", "#,.4x") -> '0x61626364,0x7778797a'

    This approach makes it easy to inspect binary data, with the ability to inject regular spaces or commas to improve readability. Those are the basic features needed to support debugging.

    Anything more complicated than that, and we're starting to want something more like the struct module.
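The format-spec behaviour proposed above was split out to bpo-22385 and is not something `format()` does for bytes. Purely as an illustration of the intended output, a hypothetical helper named `hex_chunks` can emulate it with keyword arguments standing in for the spec characters:

```python
def hex_chunks(data, chunk=None, sep=" ", prefix=False, upper=False):
    """Hypothetical emulation of the proposed 'x' formatting for bytes.

    chunk  -> the proposed '.precision' (bytes per group; None means one group)
    sep    -> ' ' by default, ',' for the proposed ',' flag
    prefix -> the proposed '#' flag (adds '0x' to every group)
    upper  -> the proposed 'X' type instead of 'x'
    """
    raw = bytes(data)
    size = chunk if chunk else max(len(raw), 1)
    pieces = [raw[i:i + size].hex() for i in range(0, len(raw), size)]
    if upper:
        pieces = [p.upper() for p in pieces]
    if prefix:
        pieces = ["0x" + p for p in pieces]
    return sep.join(pieces)


print(hex_chunks(b"xyz"))                               # 78797a
print(hex_chunks(b"xyz", chunk=1))                      # 78 79 7a
print(hex_chunks(b"abcdwxyz", chunk=4, prefix=True))    # 0x61626364 0x7778797a
print(hex_chunks(b"abcdwxyz", chunk=4, sep=",", prefix=True))
```

The last call should produce the same text as the `"#,.4x"` example in the proposal.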

    @terryjreedy
    Member

    terryjreedy commented Sep 10, 2014

    The proposal is to add a .hex method (similar to binascii.hexlify) that is the inverse of .fromhex (similar to binascii.unhexlify), as originally specified in PEP-358.
    http://legacy.python.org/dev/peps/pep-0358/
    "The object has a .hex() method that does the reverse [of .fromhex]
    >>> bytes([92, 83, 80, 255]).hex()
    '5c5350ff'
    "
    If we add .hex, I think we should stick with this: no 0x or \x prefix.

    To aid debugging, I would change spaces to be None or a positive int n to insert a space every n bytes. So .hex(8) for an array of 64 bit ints.

    @hct
    Mannequin

    hct mannequin commented Sep 10, 2014

    @terry

    Natural bytes do not have spaces between them. I would think adding spaces is for typesetting situations, which should be handled by the user's post-processing.

    I agree to not have any prefix to make .hex and from_hex uniform. the \x is the str representation of bytes when you print a bytes object directly in Python. the actual bytes object doesn't have that \x prefix.

    @ncoghlan
    Contributor

    ncoghlan commented Sep 10, 2014

    Good point Terry - I split the proposal to support bytes-like objects for 'x' and 'X' in string formatting out to bpo-22385.

    For bytes.hex, I'm inclined to stick with the dirt simple option described in PEP-358: the exact behaviour of the current binascii.hexlify().

    @ncoghlan
    Contributor

    ncoghlan commented Sep 11, 2014

    Open question: the current patch adds bytes.hex() and bytearray.hex(). Should we also add memoryview.hex(), or split that suggestion out to a separate proposal?

    @BreamoreBoy
    Mannequin

    BreamoreBoy mannequin commented Sep 11, 2014

    I'd say add memoryview.hex() here as everything seems related. Victor has also mentioned memoryview in msg226692.

    @gotgenes
    Mannequin

    gotgenes mannequin commented Sep 11, 2014

    int has int.from_bytes and int.to_bytes.

    Currently, bytes has bytes.fromhex. Would the core developers please consider naming the method "bytes.tohex" instead of "bytes.hex", so there's at least a modicum of consistency in the method names of Python's builtin types?

    @malemburg
    Member

    malemburg commented Sep 11, 2014

    On 11.09.2014 01:04, Nick Coghlan wrote:
    > 
    > Nick Coghlan added the comment:
    > 
    > Just as a recap of at least some of the *current* ways to do a bytes -> hex conversion:
    > 
    >>>> import codecs
    >>>> codecs.encode(b"abc", "hex")
    > b'616263'
    >>>> import binascii
    >>>> binascii.hexlify(b"abc")
    > b'616263'
    >>>> import base64
    >>>> base64.b16encode(b"abc")
    > b'616263'
    >>>> hex(int.from_bytes(b"abc", "big"))
    > '0x616263'
    >>>> hex(int.from_bytes(b"abc", "little"))
    > '0x636261'
    > 
    > Thus, the underlying purpose of this proposal is to provide a single "more obvious way to do it". As per the recent discussion on python-ideas, the point where that is most useful is in debugging output.
    > 
    > However, rather than a new method on bytes/bytearray/memoryview for this, I instead suggest it would be appropriate to extend the default handling of the "x" and "X" format characters to accept arbitrary bytes-like objects. The processing of these characters would be as follows:
    > 
    > "x": display a-f as lowercase digits
    > "X": display A-F as uppercase digits
    > "#": includes 0x prefix
    > ".precision": chunks output, placing a space after every <precision> bytes
    > ",": uses a comma as the separator, rather than a space

    Hmm, but those would then work for str.format() as well, right?

    Since "x" and "X" are already used to convert numbers to hex
    representation, opening these up for bytes sounds like it could
    easily mask TypeErrors for cases where you really want an integer
    to be formatted as hex and not bytes.
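The concern can be seen in how `format()` behaves without the proposed extension. In this sketch, assuming a current Python 3, passing bytes with an "x" spec fails loudly, which is the error that would be masked:

```python
caught = False

print(format(255, "x"))        # ff

try:
    # bytes does not understand the 'x' format spec, so this raises.
    format(b"\xff", "x")
except TypeError as exc:
    caught = True
    print("TypeError:", exc)

print(caught)                  # True
```

If bytes accepted "x", code that meant to format an integer but was handed bytes by mistake would silently produce output instead.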

    @ncoghlan
    Contributor

    ncoghlan commented Sep 23, 2014

    Updated issue title to indicate proposal also covers bytearray and memoryview.

    @ncoghlan ncoghlan changed the title introduce bytes.hex method introduce bytes.hex method (also for bytearray and memoryview) Sep 23, 2014
    @ncoghlan
    Contributor

    ncoghlan commented Apr 13, 2015

    Arnon is here at the PyCon 2015 sprints, so bringing the current status up to date:

    = Why *.hex()? =

    • That's the name in PEP-358
    • That's the name of the comparable float method

    = Why add it to the builtin types? =

    • To provide One Obvious Way To Do It, rather than the current 5 (or so) non-obvious ways listed above
    • That's what PEP-358 proposed

    = Why postpone configurability and str.format() integration? =

    • Because these are more complex questions that can be left out of the "minimum useful feature" of new methods on the builtins and hence have been moved out to bpo-22385 (which depends on this issue, and would likely require a PEP to resolve all the technical details)

    @wiggin15
    Mannequin Author

    wiggin15 mannequin commented Apr 13, 2015

    I added the implementation for memoryview, updated to use PyUnicode_New etc., and moved the common implementation to its own file for code reuse.
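A small sketch of the memoryview variant, assuming Python 3.5+; the buffer contents and names are made up for illustration. A slice of a memoryview can be dumped without first copying it into a new bytes object in user code:

```python
payload = bytearray(b"headerBODY")
view = memoryview(payload)

body = view[6:]                # slicing a memoryview does not copy the data
body_hex = body.hex()

print(body_hex)                # 424f4459
print(bytes(body))             # b'BODY'
```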

    @ncoghlan ncoghlan self-assigned this Apr 25, 2015
    @wiggin15
    Mannequin Author

    wiggin15 mannequin commented Apr 25, 2015

    minor updates to stdtypes.rst. I also want to add a line to whatsnew/3.5 but don't know how to put it into words - maybe it's better if someone with better English adds it.

    @gpshead
    Member

    gpshead commented Apr 25, 2015

    bytes.hex-1.diff looks good, i'll take care of committing this and adding a what's new entry. thanks!

    @gpshead gpshead assigned gpshead and unassigned ncoghlan Apr 25, 2015
    @python-dev
    Mannequin

    python-dev mannequin commented Apr 25, 2015

    New changeset c9f1630cf2b1 by Gregory P. Smith in branch 'default':
    Implements issue bpo-9951: Adds a hex() method to bytes, bytearray, & memoryview.
    https://hg.python.org/cpython/rev/c9f1630cf2b1

    @python-dev
    Mannequin

    python-dev mannequin commented Apr 25, 2015

    New changeset 955a479b31a8 by Gregory P. Smith in branch 'default':
    bpo-9951: update _hashopenssl and md5module to use _Py_strhex().
    https://hg.python.org/cpython/rev/955a479b31a8
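Since the hash modules were moved onto the shared helper, a hash object's `hexdigest()` and the `.hex()` of its raw digest should agree. A minimal check, using SHA-256 as an arbitrary choice of algorithm:

```python
import hashlib

h = hashlib.sha256(b"abc")

from_hexdigest = h.hexdigest()       # str produced by the hash module
from_bytes_hex = h.digest().hex()    # raw digest dumped via bytes.hex()

print(from_hexdigest == from_bytes_hex)   # True
```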

    @gpshead gpshead closed this as completed Apr 25, 2015
    @gpshead
    Member

    gpshead commented Apr 26, 2015

    not quite fixed, looks like some of the buildbots are having fun not compiling with this change:

    http://buildbot.python.org/all/builders/x86%20Tiger%203.x/builds/9569/steps/compile/logs/stdio

    investigating...

    @gpshead gpshead reopened this Apr 26, 2015
    @gpshead
    Member

    gpshead commented Apr 26, 2015

    i missed the hg adds :)

    @python-dev
    Mannequin

    python-dev mannequin commented Apr 26, 2015

    New changeset a7737204c221 by Gregory P. Smith in branch 'default':
    Add the files missing from c9f1630cf2b1 for bpo-9951.
    https://hg.python.org/cpython/rev/a7737204c221

    @python-dev
    Mannequin

    python-dev mannequin commented Apr 26, 2015

    New changeset 7f0811452d0f by Gregory P. Smith in branch 'default':
    Switch binascii over to using the common _Py_strhex implementation for its hex
    https://hg.python.org/cpython/rev/7f0811452d0f

    @ncoghlan
    Contributor

    ncoghlan commented Apr 26, 2015

    Thank you Arnon, and thank you Greg!

    @gpshead
    Member

    gpshead commented Apr 26, 2015

    I see some _Py_strhex related link errors on the Windows buildbots:

    http://buildbot.python.org/all/builders/x86%20Windows7%203.x/builds/9642/steps/compile/logs/stdio

    @python-dev
    Mannequin

    python-dev mannequin commented Apr 26, 2015

    New changeset b46308353ed9 by Gregory P. Smith in branch 'default':
    Add missing PyAPI_FUNC macro's to the public functions as other .c files do
    https://hg.python.org/cpython/rev/b46308353ed9

    @gpshead gpshead closed this as completed Apr 26, 2015
    @python-dev
    Mannequin

    python-dev mannequin commented Nov 10, 2015

    New changeset f3d8bb3ffa98 by Stefan Krah in branch '3.5':
    Iaaue bpo-25598: Fix memory_hex from bpo-9951 for non-contiguous buffers.
    https://hg.python.org/cpython/rev/f3d8bb3ffa98
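The non-contiguous case that this follow-up fix addresses can be reproduced with a strided slice. The sketch assumes a Python version that includes the fix; the data is arbitrary.

```python
view = memoryview(b"abcdef")[::2]    # every second byte: a, c, e

strided_hex = view.hex()
copied_hex = view.tobytes().hex()    # same data, copied to contiguous bytes first

print(view.contiguous)               # False
print(strided_hex)                   # 616365
print(strided_hex == copied_hex)     # True
```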

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022