Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

py/objstr.c: Add bytes.hex / bytes.fromhex #7539

Merged
merged 2 commits into from Aug 12, 2022

Conversation

jimmo
Copy link
Member

@jimmo jimmo commented Jul 13, 2021

We closed #4016 today, but the functionality is nice and more concise to write (compared to using binascii).

This implements the suggestion of re-using the ubinascii implementation of (un)hexlify to provide bytes.hex, bytearray.hex, memoryview.hex, and bytes.fromhex. To do so I've moved these methods into the core, but only available if ubinascii is enabled, and then referenced them from extmod/modubinascii.c.

For code size reasons, we get an extra array.hex (which CPython doesn't have). We also don't provide bytearray.fromhex.

Note that (un)hexlify is not exactly equivalent to (from)hex because of the handling of input/output types. hexlify is <buffer> -> bytes, whereas <type>.hex is <type> -> str. Also unhexlify is <buffer> -> bytes and bytes.fromhex is str -> bytes. (This might be a good argument for using them, the behavior of (from)hex feels more correct in Python 3 and will be less likely to require additional conversion by the user.

It is +112 bytes on PYBV11... frustratingly nearly half of that is dealing with the bytes/str difference between (un)hexlify and (from)hex.

@jimmo jimmo force-pushed the bytes-hex-fromhex branch 2 times, most recently from d50daaa to 0d76c56 Compare July 13, 2021 14:26
@codecov-commenter
Copy link

Codecov Report

Merging #7539 (0d76c56) into master (e3291e1) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #7539      +/-   ##
==========================================
- Coverage   98.28%   98.27%   -0.01%     
==========================================
  Files         154      154              
  Lines       19988    19995       +7     
==========================================
+ Hits        19646    19651       +5     
- Misses        342      344       +2     
Impacted Files Coverage Δ
extmod/modubinascii.c 100.00% <ø> (ø)
py/objarray.c 100.00% <100.00%> (ø)
py/objstr.c 100.00% <100.00%> (ø)
py/bc.c 88.65% <0.00%> (-1.04%) ⬇️
py/runtime.c 99.23% <0.00%> (-0.16%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e3291e1...0d76c56. Read the comment docs.

@dpgeorge
Copy link
Member

Thanks for this, it's quite neat. And I doubt it can be made smaller in code size.

Only remaining decision: is it worth the cost in code size?

@prusnak
Copy link
Contributor

prusnak commented Jul 14, 2021

Only remaining decision: is it worth the cost in code size?

I think so. It will make Python code much more readable which is a good thing for education.

@dpgeorge
Copy link
Member

@jimmo I needed the appveyor commit for #7472, thanks!

@keelung-yang
Copy link

@jimmo Seems there is no white whitespace support?
Since 31 32 33 34 41 42 43 44 61 62 63 64 is more readable then 313233344142434461626364, can you please add it as in Python 3.7 what't new: bytes.fromhex() and bytearray.fromhex() now ignore all ASCII whitespace, not only spaces? Thanks!

@romilly
Copy link

romilly commented Aug 8, 2022

The readability is attractive. It's a fine balance as to whether it;s worth the space.
Until it's implemented and released the omission should be listed in the Differences documentation.

I'm happy to create a PR for the docs if that would help.

py/objstr.c Outdated Show resolved Hide resolved
@jimmo
Copy link
Member Author

jimmo commented Aug 11, 2022

Updated. This will need to be updated for #9002 (which will slightly reduce the code size for this PR).

py/mpconfig.h Outdated
// Add bytes.hex and bytes.fromhex
#ifndef MICROPY_PY_BUILTINS_STR_HEX
#define MICROPY_PY_BUILTINS_STR_HEX (MICROPY_CONFIG_ROM_LEVEL_AT_LEAST_EXTRA_FEATURES)
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this up to just after MICROPY_PY_BUILTINS_STR_COUNT to keep the str config options grouped together

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've renamed it to MICROPY_PY_BUILTINS_BYTES_HEX because it no longer applies to str.

py/objarray.c Outdated Show resolved Hide resolved
py/objstr.h Outdated Show resolved Hide resolved
@jimmo
Copy link
Member Author

jimmo commented Aug 11, 2022

Rebased.

With the merged locals dict from #9002, this is now +88 bytes. If we remove support for memoryview.hex, then it's +64 instead. I also tried doing the memoryview implementation via attr, but it comes out the same. For 24 bytes, I think we should keep it on memoryview.

The only issue now is that we have bytearray.fromhex but it incorrectly returns a bytes, not bytearray. One option here would be to move fromhex to the end of the locals table. This would remove it from bytearray, but add it to str. If we then wanted to add it back to bytearray, we could add a bytearray-specific implementation and include that at the start of the table. But that's a lot more code size.

py/objarray.c Outdated
@@ -596,6 +602,13 @@ const mp_obj_type_t mp_type_bytearray = {
#endif

#if MICROPY_PY_BUILTINS_MEMORYVIEW
#if MICROPY_PY_BUILTINS_BYTES_HEX
STATIC const mp_rom_map_elem_t memoryview_locals_table[] = {
{ MP_ROM_QSTR(MP_QSTR_hex), MP_ROM_PTR(&mp_obj_bytes_hex_as_str_obj) },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couldn't this use a 1-element slice from array_bytearray_str_bytes_locals_table instead, to save this locals table?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could. Saves 8 bytes!

py/objstr.c Outdated
// For binascii.unhexlify.
MP_DEFINE_CONST_FUN_OBJ_1(mp_obj_bytes_fromhex_obj, bytes_fromhex);
// For bytes.fromhex.
STATIC MP_DEFINE_CONST_STATICMETHOD_OBJ(bytes_fromhex_staticmethod_obj, MP_ROM_PTR(&mp_obj_bytes_fromhex_obj));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to do away with this staticmethod wrapper and for it still to work, because it's accessing the function via the class (not an instance)? ie bytes.fromhex vs b''.fromhex?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To answer my own question, yes it is. So bytes.fromhex would work but b''.fromhex would not.

But actually in CPython this is a classmethod, and using classmethod means that the first argument will be the type, bytes vs bytearray. Then you could make bytearray.fromhex work!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome. Slightly complicated by keeping the version that works for binascii, but overall this worked nicely. In the end it's about 24 extra bytes for the wrapper functions etc, so that brings the PR to +104 !

Now that it's a bit clearer which wrappers and the function objects are for binascii vs str/bytes/bytearray/memoryview, I've moved the fun obj macros into binascii, and so it's just the underlying functions bytes_hex and bytes_fromhex that are "public". Should they be renamed to mp_obj_bytes_hex/fromhex?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please. If a function is not STATIC then it should be prefixed with mp_ in some way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@jimmo jimmo force-pushed the bytes-hex-fromhex branch 4 times, most recently from 89188be to 28f9b7c Compare August 12, 2022 02:37
These were added in Python 3.5.

Enabled via MICROPY_PY_BUILTINS_BYTES_HEX, and enabled by default for all
ports that currently have ubinascii.

Rework ubinascii to use the implementation of these methods.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
Also make the sep test not micropython-specific.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
@dpgeorge dpgeorge merged commit f694058 into micropython:master Aug 12, 2022
@dpgeorge
Copy link
Member

I like this feature, will definitely be using instead of binascii from now on.

Thank you!

@robert-hh
Copy link
Contributor

That is definitely more convenient than binascii.

@romilly
Copy link

romilly commented Aug 12, 2022 via email

RetiredWizard pushed a commit to RetiredWizard/micropython that referenced this pull request Feb 9, 2023
Correct assignment of RMT channels on ESP32C3
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants