py/obj: Remove mp_generic_unary_op(). #10348

dlech · 2022-12-27T22:34:31Z

Since converting to variable-sized slots in mp_obj_type_t, we can now reduce the code size a bit by removing mp_generic_unary_op() and the corresponding slots where it is used. Instead, we just implement the generic __hash__ operation in the runtime.

github-actions · 2022-12-27T22:37:08Z

Code size report:

   bare-arm:   -28 -0.050% 
minimal x86:  -424 -0.228% [incl -192(data)]
   unix x64:  -416 -0.052% standard[incl -192(data)]
      stm32:   -32 -0.008% PYBV10
        rp2:   -32 -0.010% PICO

dpgeorge · 2023-01-12T04:55:18Z

This looks like a good change (to reduce code size, and simplify things).

But, does it now allow hashing of objects that were not previously hashable, and which should not be hashable?

dlech · 2023-01-12T16:46:28Z

> But, does it now allow hashing of objects that were not previously hashable, and which should not be hashable?

No. We know this is the case because all of the tests are still passing. This works because the fallback only takes effect when the unary_op slot is empty. Any object that should not be hashable already implements unary_op so doesn't hit the fallback.
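
The dispatch rule described here can be modelled in a few lines of Python. This is only a toy sketch under stated assumptions: `ToyType`, `ToyObj`, `toy_hash` and `reject_hash` are hypothetical names invented for illustration, and the real logic lives in MicroPython's C runtime, which is not reproduced here.

```python
# Toy model of the slot fallback; all names here are hypothetical.
class ToyType:
    def __init__(self, name, unary_op=None):
        self.name = name
        self.unary_op = unary_op  # None models an empty unary_op slot


class ToyObj:
    def __init__(self, type_):
        self.type = type_


def reject_hash(op, obj):
    # A filled slot that refuses to hash, as an unhashable type would.
    raise TypeError("unhashable type: '%s'" % obj.type.name)


def toy_hash(obj):
    slot = obj.type.unary_op
    if slot is not None:
        # The type decides for itself whether hashing is allowed.
        return slot("hash", obj)
    # Empty slot: generic fallback based on object identity.
    return id(obj)


pin = ToyObj(ToyType("Pin"))                   # empty slot, uses the fallback
blocked = ToyObj(ToyType("slice", reject_hash))  # filled slot, stays unhashable
print(isinstance(toy_hash(pin), int))
```

In this model, a type becomes hashable simply by leaving the slot empty, which is why types without a unary_op handler are the ones worth auditing.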

dpgeorge · 2023-01-13T00:08:14Z

Here are some examples of things that now become hashable by the change in this PR, that were not previously hashable:

hash(list.pop)
hash([].pop)

# closure
def f(x):
    def g():
        return x
    return g
hash(f(1))

# slice
class A:
    def __getitem__(self, i):
        return i
hash(A()[1:2])

The first 3 are hashable in CPython, so that's an improvement. The last one (slice instance) is not hashable in CPython.
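
One way to compare interpreters on these cases is a small probe script. The sketch below is an assumption-laden illustration, not a test from this PR: `is_hashable`, `make_closure` and `SliceGrabber` are hypothetical helper names. The slice row is printed without a stated expectation because its result depends on the interpreter and version being run.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


def make_closure(x):
    def g():
        return x
    return g


class SliceGrabber:
    # Indexing with [a:b] hands the slice object back to the caller.
    def __getitem__(self, i):
        return i


cases = [
    ("list.pop", list.pop),
    ("[].pop", [].pop),
    ("closure", make_closure(1)),
    ("slice", SliceGrabber()[1:2]),
]
for label, obj in cases:
    print(label, is_hashable(obj))
```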

dlech · 2023-01-13T01:54:40Z

> The last one (slice instance) is not hashable in CPython.

Good one. I found the reason for this and added a fix and a test.

codecov-commenter · 2023-01-13T02:30:30Z

Codecov Report

Merging #10348 (21dfa07) into master (67097d8) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master   #10348   +/-   ##
=======================================
  Coverage   98.50%   98.50%           
=======================================
  Files         155      155           
  Lines       20537    20541    +4     
=======================================
+ Hits        20229    20233    +4     
  Misses        308      308

Impacted Files	Coverage Δ
py/obj.c	`97.70% <ø> (-0.05%)`	⬇️
py/obj.h	`100.00% <ø> (ø)`
py/objfun.c	`100.00% <ø> (ø)`
py/objgenerator.c	`100.00% <ø> (ø)`
py/objnone.c	`100.00% <ø> (ø)`
py/objsingleton.c	`100.00% <ø> (ø)`
py/objtype.c	`100.00% <ø> (ø)`
py/objdict.c	`100.00% <100.00%> (ø)`
py/objslice.c	`100.00% <100.00%> (ø)`
py/runtime.c	`98.89% <100.00%> (+<0.01%)`	⬆️


dpgeorge · 2023-01-18T00:33:35Z

The remaining comment/concern I have about this change is that now most of the non-core types are hashable, eg:

machine.Pin
regex match instances
Vfs objects
files

And maybe other things I didn't think of.

Actually, it looks like files are hashable in CPython (but not in uPy without this PR), so that is again an improvement.

I think we need to go through all types and see which become hashable and decide whether that's acceptable or not.

dlech · 2023-01-18T18:10:45Z

This should be a nearly complete list.

mp_types_unary_op_status.csv

type	has unary_op
mp_irq_type	no
mp_type_set	yes
mp_type_frozenset	yes
mp_type_NoneType	yes
mp_type_singleton	yes
mp_type_slice	no
mp_type_zip	no
mp_type_array	yes
mp_type_bytearray	yes
mp_type_memoryview	yes
mp_type_array_it	no
mp_type_str	yes
mp_type_checked_fun	no
mp_type_settrace_codeobj	yes
mp_type_frame	yes
mp_type_range_it	no
mp_type_range	yes
mp_type_reversed	no
mp_type_type	yes
mp_type_super	no
mp_type_staticmethod	no
mp_type_classmethod	no
mp_type_polymorph_iter	no
mp_type_polymorph_iter_with_finaliser	no
mp_type_object	no
mp_type_stringio	no
mp_type_bytesio	no
mp_type_list	yes
mp_type_str	yes
mp_type_bytes	no
mp_type_module	no
mp_type_property	no
mp_type_map	no
stdio_obj_type	no
stdio_buffer_obj_type	no
mp_type_int	yes
mp_type_float	yes
mp_type_BaseException	no
mp_type_enumerate	no
mp_type_filter	no
mp_type_deque	yes
mp_type_dict_view_it	no
mp_type_dict_view	no
mp_type_dict	yes
mp_type_ordereddict	yes
mp_type_complex	yes
mp_type_closure	no
mp_type_bound_meth	no
mp_type_attrtuple	yes
mp_type_gen_wrap	yes
mp_type_native_gen_wrap	yes
mp_type_gen_instance	yes
mp_type_fun_builtin_0	yes
mp_type_fun_builtin_1	yes
mp_type_fun_builtin_2	yes
mp_type_fun_builtin_3	yes
mp_type_fun_builtin_var	yes
mp_type_fun_bc	yes
mp_type_fun_native	yes
mp_type_fun_asm	yes
mp_type_iobase	no
mp_type_bufwriter	no
mp_type_thread_lock	no
mp_type_code	no
machine_i2c_type	no
machine_rtc_type	no
machine_spi_type	no
machine_uart_type	no
samd_flash_type	no
machine_wdt_type	no
machine_pin_type	no
machine_dac_type	no
machine_adc_type	no
pyb_switch_type	no
pyb_timer_type	no
pyb_timer_channel_type	no
pyb_flash_type	no
rp2_pio_type	no
rp2_state_machine_type	no
rp2_flash_type	no
machine_uart_type	no
machine_spi_type	no
machine_pin_type	no
pin_cpu_pins_obj_type	no
pin_board_pins_obj_type	no
machine_i2c_type	no
machine_adc_type	no
machine_rtc_type	no
ra_led_type	no
extint_type	no
machine_wdt_type	no
machine_uart_type	no
machine_timer_type	no
machine_rtc_type	no
machine_spi_type	no
machine_pin_af_type	no
pin_cpu_pins_obj_type	no
pin_board_pins_obj_type	no
machine_pin_type	no
machine_i2s_type	no
mp_type_tuple	yes
machine_i2c_type	no
machine_adc_type	no
mp_type_it	no
machine_i2c_type	no
machine_uart_type	no
sensor_type	no
zephyr_disk_access_type	no
zephyr_flash_area_type	no
socket_type	no
machine_spi_type	no
machine_pin_type	no
mp_type_stest_fileio	no
mp_type_stest_textio2	no
mp_type_socket	no
pyb_timer_type	no
pyb_timer_channel_type	no
machine_spi_type	no
pin_type	no
pin_af_type	no
pyb_spi_type	no
pyb_flash_type	no
pyb_switch_type	no
pyb_wdt_type	no
pyb_can_type	no
pyb_usb_vcp_type	no
pyb_usb_hid_type	no
pyb_timer_type	no
pyb_timer_channel_type	no
pyb_servo_type	no
pyb_sdcard_type	no
pyb_mmcard_type	no
pyb_i2c_type	no
pin_cpu_pins_obj_type	no
pin_board_pins_obj_type	no
pyb_rtc_type	no
network_lan_type	no
machine_i2s_type	no
machine_adc_type	no
pyb_uart_type	no
pyb_led_type	no
machine_i2c_type	no
pyb_lcd_type	no
extint_type	no
pyb_dac_type	no
pyb_uart_type	no
pyb_adc_type	no
pyb_adc_all_type	no
pyb_accel_type	no
mp_type_poll	no
pyb_led_type	no
jclass_type	no
jobject_type	yes
jmethod_type	no
ffimod_type	no
ffifunc_type	no
fficallback_type	no
ffivar_type	no
opaque_type	no
pyb_switch_type	no
machine_timer_type	no
lan_if_type	no
esp32_partition_type	no
machine_i2c_type	no
machine_adcblock_type	no
wlan_if_type	no
esp32_rmt_type	no
pin_cpu_pins_obj_type	no
pin_board_pins_obj_type	no
ppp_if_type	no
network_lan_type	no
esp_wdt_type	no
socket_type	no
wlan_if_type	no
esp_timer_type	no
pyb_pin_type	no
pin_irq_type	no
mimxrt_flash_type	no
machine_hspi_type	no
machine_adc_type	no
pyb_uart_type	no
machine_wdt_type	no
pyb_led_type	no
machine_sdcard_type	no
machine_wdt_type	no
machine_rtc_type	no
machine_sdcard_type	no
machine_spi_type	no
machine_i2s_type	no
machine_spi_type	no
machine_uart_type	no
machine_pwm_type	no
pin_type	no
pin_af_type	no
machine_rtcounter_type	no
machine_i2c_type	no
machine_adc_type	no
machine_uart_type	no
machine_temp_type	no
machine_uart_type	no
machine_timer_type	no
ubluepy_scan_entry_type	no
ubluepy_delegate_type	no
nrf_flashbdev_type	no
ubluepy_peripheral_type	no
ubluepy_characteristic_type	no
uos_mbfs_textio_type	no
uos_mbfs_fileio_type	no
board_led_type	no
machine_rtc_type	no
machine_pin_cpu_pins_obj_type	no
machine_pin_board_pins_obj_type	no
machine_pin_type	no
machine_pin_af_type	no
machine_led_type	no
esp32_nvs_type	no
machine_spi_type	no
machine_i2s_type	no
machine_i2c_type	no
machine_touchpad_type	no
machine_dac_type	no
machine_adc_type	no
ubluepy_constants_ad_types_type	no
ubluepy_constants_type	no
ubluepy_descriptor_type	no
ubluepy_service_type	no
ubluepy_scanner_type	no
ubluepy_uuid_type	no
pyb_rtc_type	no
esp32_ulp_type	no
microbit_repeat_iterator_type	no
microbit_display_type	no
microbit_image_type	no
microbit_scrolling_string_type	no
microbit_scrolling_string_iterator_type	no
string_image_facade_type	yes
microbit_facade_iterator_type	no
machine_pin_type	no
machine_pin_irq_type	no
machine_adc_type	no
mp_irq_type	no
network_server_type	no
ssl_socket_type	no
pyb_rtc_type	no
pin_type	no
pin_board_pins_obj_type	no
pyb_timer_type	no
pyb_timer_channel_type	no
pyb_spi_type	no
pyb_uart_type	no
pyb_wdt_type	no
pyb_sleep_type	no
pyb_flash_type	no
pyb_sd_type	no
mod_network_nic_type_wlan	no
pyb_i2c_type	no
sha1_type	no
sha256_type	no
socket_type	no
pyb_adc_type	no
pyb_adc_channel_type	no
ussl_socket_type	no
mp_type_poll	no
utimeq_type	yes
ussl_socket_type	no
mp_fat_vfs_type	no
mp_type_vfs_fat_fileio	no
mp_type_vfs_fat_textio	no
mp_type_vfs_posix_fileio	no
mp_type_vfs_posix_textio	no
mp_type_vfs_posix	no
mod_network_nic_type_wiznet5k	no
MP_TYPE_VFS_LFSx_(_fileio)	no
MP_TYPE_VFS_LFSx_(_textio)	no
MP_TYPE_VFS_LFSx	no
machine_pwm_type	no
task_queue_type	no
task_type	no
mod_network_nic_type_nina	no
mp_network_cyw43_type	no
socket_type	no
machine_timer_type	no
mp_type_bluetooth_uuid	yes
mp_type_bluetooth_ble	no
webrepl_type	no
uhashlib_sha256_type	no
uhashlib_sha1_type	no
uhashlib_md5_type	no
uctypes_struct_type	yes
ucryptolib_aes_type	no
lwip_slip_type	no
lwip_socket_type	no
decompio_type	no
websocket_type	no
example_type_Timer	no
mp_type_framebuf	no
mp_machine_soft_spi_type	no
btree_type	no
machine_signal_type	no
machine_mem_type	no
machine_pinbase_type	no
match_type	no
re_type	no
mp_machine_soft_i2c_type	no

dlech · 2023-01-18T18:26:44Z

dlech · 2023-01-18T19:00:13Z

The following are items that clearly use the default hash in CPython so I've checked these off the list:

Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> hash(object())
8744021866632
>>> hash(reversed([]))
8744021803801
>>> hash(iter(range(0)))
8744021803791
>>> hash(staticmethod(print))
8744021788216
>>> hash(classmethod(print))
8744021803792
>>> hash(super(object))
8744020607112
>>> hash(property(print))
8744021019997
>>> hash(sys)
8744021864997
>>> hash(iter(bytearray()))
8744021787799
>>> hash(io.BytesIO())
8744020961070
>>> hash(sys.stdin)
8744021874887
>>> hash(sys.stdin.buffer)
8744021758980
>>> hash(BaseException())
8744021020422
>>> hash(enumerate([]))
8744021738716
>>> hash(filter(None, []))
8744020380579
>>> hash(map(None, []))
8758215359260
>>> hash(zip([]))
8758214576724
>>> hash(iter({}.items()))
8758214575878
>>> hash(iter(""))
8758215358645

bytes is a special case, so I checked it off the list.

Items that need to be fixed in MicroPython:

>>> hash({}.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict_items'
>>> hash({}.keys())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict_keys'

Bound methods are discussed in #5233

This just leaves a few internals like closure and code that I haven't considered yet.

stinos · 2023-01-18T19:12:45Z

Broad question: this will affect custom types defined by users as well, if I understand it correctly. What is the worst thing that could happen if no action is taken to deal with this?

dlech · 2023-01-18T19:31:58Z

Yes, it will affect custom types implemented in C that don't assign anything to the unary_op slot (or inherit from a type that does).

The worst that could happen (software-wise) is that someone depends on their type not being hashable, e.g. by trying to add an object to a set and catching the TypeError. Their code would break and they would have to fix it, as we did with slice in this PR.

I can't really think of any practical cases where someone would be depending on this though.
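
For concreteness, the kind of code that could be affected might look like the sketch below. It is a hypothetical example (the function name `split_by_hashability` is invented here), using a list as a stand-in for a C-defined type that is currently unhashable.

```python
def split_by_hashability(items):
    """Collect hashable items in a set; keep the rest in a list."""
    seen = set()
    unhashable = []
    for item in items:
        try:
            seen.add(item)
        except TypeError:
            # Code like this relies on the TypeError being raised.
            unhashable.append(item)
    return seen, unhashable


seen, unhashable = split_by_hashability([1, 1, [2], "a"])
print(seen, unhashable)
```

If a type that used to raise TypeError starts hashing by identity, its instances silently move from the second result to the first, which is the behaviour change being discussed.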

dlech · 2023-01-18T20:03:48Z

I added a commit for mp_type_dict_view and checked it off the list. It was broken in different ways before and after the change to the default implementation, so this commit could be applied in any case.
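
The CPython behaviour that this commit targets can be checked with a short probe. This is a sketch only: `hash_or_error` is a hypothetical helper, and it covers just the keys and items views shown in the traceback above.

```python
def hash_or_error(obj):
    """Return hash(obj), or the TypeError instance if hashing is refused."""
    try:
        return hash(obj)
    except TypeError as exc:
        return exc


d = {"a": 1}
for view in (d.keys(), d.items()):
    print(type(view).__name__, "->", repr(hash_or_error(view)))
```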

dlech · 2023-01-18T21:05:45Z

I also searched the MicroPython code base for cases where we have a binary_op that defines the __eq__ operation without unary_op and did not find any.
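
The reason this combination matters is that equal objects are expected to have equal hashes, and an identity-based default hash cannot guarantee that once equality is customised. The sketch below shows how CPython handles the same situation for Python-level classes; the class names are hypothetical and nothing here is taken from the MicroPython sources.

```python
class EqOnly:
    """Defines __eq__ but not __hash__."""
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, EqOnly) and self.value == other.value


class EqAndHash(EqOnly):
    """Restores hashing in a way that is consistent with __eq__."""
    def __hash__(self):
        return hash(self.value)


# In CPython, defining __eq__ alone sets __hash__ to None on the class.
print(EqOnly.__hash__ is None)
print(hash(EqAndHash(3)) == hash(EqAndHash(3)))
```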

dpgeorge · 2023-05-19T02:04:11Z

I checked the following types:

closure
bound method
io.BufferedWriter
thread lock (via _thread.allocate_lock)
code object (via compile(...))

In all 5 cases CPython allows hashing the resulting object/instance. MicroPython currently does not, but with this PR it will, because those types don't have an explicit unary_op handler and so fall back to the default, which supports hashing. So that is a good improvement for those 5 types.

I have marked those 5 off the list above. I'll see about adding tests for them.
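
A check for these five cases could look roughly like the following. It is a sketch written against CPython, not the test that was added to tests/basics; `make_closure`, `Holder` and the sample labels are hypothetical names.

```python
import io
import _thread


def make_closure(x):
    def inner():
        return x
    return inner


class Holder:
    def method(self):
        return None


samples = {
    "closure": make_closure(1),
    "bound method": Holder().method,
    "io.BufferedWriter": io.BufferedWriter(io.BytesIO()),
    "thread lock": _thread.allocate_lock(),
    "code object": compile("1 + 1", "<sample>", "eval"),
}

# hash() raises TypeError for unhashable objects, so reaching the print
# means the object was accepted.
for name, obj in samples.items():
    print(name, isinstance(hash(obj), int))
```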

dpgeorge · 2023-05-19T02:50:21Z

Happily, this PR is still a net decrease in code size.

dpgeorge · 2023-05-19T03:06:27Z

Merged. Thanks @dlech for your efforts on this.

dpgeorge added the py-core label Jan 12, 2023

dlech force-pushed the drop-generic-unary-op branch from db42e09 to 6d174ff Compare January 13, 2023 01:54

dlech force-pushed the drop-generic-unary-op branch from 6d174ff to b8c6bb4 Compare January 13, 2023 02:24

dlech force-pushed the drop-generic-unary-op branch 2 times, most recently from 10a2147 to 0c93940 Compare January 13, 2023 17:03

dlech force-pushed the drop-generic-unary-op branch from 0c93940 to 21dfa07 Compare January 18, 2023 20:02

dlech and others added 3 commits May 19, 2023 12:04

py/objslice: Ensure slice is not hashable.

8491eb1

As per https://bugs.python.org/issue408326, the slice object should not be hashable. Since MicroPython has an implicit fallback when the unary_op slot is empty, we need to fill this slot.

Signed-off-by: David Lechner <david@pybricks.com>

py/objdict: Fix __hash__ for dict_view types.

2fe6d4e

This adds a unary_op implementation for the dict_view type that makes the implementation of `hash()` for these types compatible with CPython.

Signed-off-by: David Lechner <david@pybricks.com>

dpgeorge force-pushed the drop-generic-unary-op branch from 21dfa07 to 32d853e Compare May 19, 2023 02:24

tests/basics: Add more tests for hashing of various types.

9accb7d

Signed-off-by: Damien George <damien@micropython.org>

dpgeorge force-pushed the drop-generic-unary-op branch from 32d853e to 9accb7d Compare May 19, 2023 02:35

dpgeorge merged commit 9accb7d into micropython:master May 19, 2023
39 checks passed

dlech deleted the drop-generic-unary-op branch May 19, 2023 15:20
