py/obj: Remove mp_generic_unary_op(). #10348

Merged
merged 4 commits into from May 19, 2023

Conversation

dlech
Sponsor Contributor

@dlech dlech commented Dec 27, 2022

Since converting to variable sized slots in mp_obj_type_t, we can now reduce the code size a bit by removing mp_generic_unary_op() and the corresponding slots where it is used. Instead we just implement the generic __hash__ operation in the runtime.

@github-actions

github-actions bot commented Dec 27, 2022

Code size report:

   bare-arm:   -28 -0.050% 
minimal x86:  -424 -0.228% [incl -192(data)]
   unix x64:  -416 -0.052% standard[incl -192(data)]
      stm32:   -32 -0.008% PYBV10
        rp2:   -32 -0.010% PICO

@dpgeorge
Member

This looks like a good change (to reduce code size, and simplify things).

But, does it now allow hashing of objects that were not previously hashable, and which should not be hashable?

@dlech
Sponsor Contributor Author

dlech commented Jan 12, 2023

> But, does it now allow hashing of objects that were not previously hashable, and which should not be hashable?

No. We know this is the case because all of the tests are still passing. This works because the fallback only takes effect when the unary_op slot is empty. Any object that should not be hashable already implements unary_op so doesn't hit the fallback.
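The fallback described above can be modeled in a few lines of Python. This is only an illustrative sketch, not the C code from this PR: `TypeSlots`, `runtime_hash`, and `unhashable` are hypothetical names, and `id()` stands in for whatever identity-based value the runtime actually returns.

```python
# Toy model of the dispatch described above; all names are hypothetical.

class TypeSlots:
    """Stand-in for a type record with an optional unary_op slot."""

    def __init__(self, name, unary_op=None):
        self.name = name
        self.unary_op = unary_op


def unhashable(obj):
    # A type that must stay unhashable fills its slot with a handler like this.
    raise TypeError("unhashable type")


def runtime_hash(slots, obj):
    if slots.unary_op is not None:
        # The type handles hashing (or rejects it) itself.
        return slots.unary_op(obj)
    # Empty slot: fall back to an identity-based hash (assumed here to be id()).
    return id(obj)


pin_type = TypeSlots("Pin")                         # slot left empty
list_type = TypeSlots("list", unary_op=unhashable)  # slot filled, rejects hash

pin = object()
pin_hash = runtime_hash(pin_type, pin)

try:
    runtime_hash(list_type, [])
except TypeError:
    list_rejected = True
else:
    list_rejected = False
```

In this model, every type that leaves the slot empty becomes hashable by identity, while a type that fills the slot keeps full control over `hash()`.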

@dpgeorge
Member

Here are some examples of things that now become hashable by the change in this PR, that were not previously hashable:

hash(list.pop)
hash([].pop)

# closure
def f(x):
    def g():
        return x
    return g
hash(f(1))

# slice
class A:
    def __getitem__(self, i):
        return i
hash(A()[1:2])

The first 3 are hashable in CPython, so that's an improvement. The last one (slice instance) is not hashable in CPython.
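For anyone who wants to repeat this comparison on their own interpreter, a small probe along the following lines may help. It is a sketch: `is_hashable` and `make_closure` are helper names introduced here, and the slice result is only reported, not assumed, since it may differ between interpreter versions.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


def make_closure(x):
    def g():
        return x
    return g


class A:
    def __getitem__(self, i):
        return i


results = {
    "list.pop": is_hashable(list.pop),
    "[].pop": is_hashable([].pop),
    "closure": is_hashable(make_closure(1)),
    # Version-dependent, so this entry is reported but not relied upon.
    "slice": is_hashable(A()[1:2]),
}

for name, ok in results.items():
    print(name, "hashable" if ok else "unhashable")
```

Running the same script under MicroPython before and after the change gives a quick side-by-side view of which cases moved.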

@dlech
Sponsor Contributor Author

dlech commented Jan 13, 2023

> The last one (slice instance) is not hashable in CPython.

Good one. I found the reason for this and added a fix and a test.

@codecov-commenter

codecov-commenter commented Jan 13, 2023

Codecov Report

Merging #10348 (21dfa07) into master (67097d8) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master   #10348   +/-   ##
=======================================
  Coverage   98.50%   98.50%           
=======================================
  Files         155      155           
  Lines       20537    20541    +4     
=======================================
+ Hits        20229    20233    +4     
  Misses        308      308           
Impacted Files       Coverage Δ
py/obj.c             97.70% <ø> (-0.05%) ⬇️
py/obj.h             100.00% <ø> (ø)
py/objfun.c          100.00% <ø> (ø)
py/objgenerator.c    100.00% <ø> (ø)
py/objnone.c         100.00% <ø> (ø)
py/objsingleton.c    100.00% <ø> (ø)
py/objtype.c         100.00% <ø> (ø)
py/objdict.c         100.00% <100.00%> (ø)
py/objslice.c        100.00% <100.00%> (ø)
py/runtime.c         98.89% <100.00%> (+<0.01%) ⬆️


@dlech dlech force-pushed the drop-generic-unary-op branch 2 times, most recently from 10a2147 to 0c93940 Compare January 13, 2023 17:03
@dpgeorge
Member

The remaining comment/concern I have about this change is that now most of the non-core types are hashable, eg:

  • machine.Pin
  • regex match instances
  • Vfs objects
  • files

And maybe other things I didn't think of.

Actually, it looks like files are hashable in CPython (but not in uPy without this PR), so that is again an improvement.

I think we need to go through all types and see which become hashable and decide whether that's acceptable or not.

@dlech
Sponsor Contributor Author

dlech commented Jan 18, 2023

This should be a nearly complete list.

mp_types_unary_op_status.csv

type has unary_op
mp_irq_type no
mp_type_set yes
mp_type_frozenset yes
mp_type_NoneType yes
mp_type_singleton yes
mp_type_slice no
mp_type_zip no
mp_type_array yes
mp_type_bytearray yes
mp_type_memoryview yes
mp_type_array_it no
mp_type_str yes
mp_type_checked_fun no
mp_type_settrace_codeobj yes
mp_type_frame yes
mp_type_range_it no
mp_type_range yes
mp_type_reversed no
mp_type_type yes
mp_type_super no
mp_type_staticmethod no
mp_type_classmethod no
mp_type_polymorph_iter no
mp_type_polymorph_iter_with_finaliser no
mp_type_object no
mp_type_stringio no
mp_type_bytesio no
mp_type_list yes
mp_type_str yes
mp_type_bytes no
mp_type_module no
mp_type_property no
mp_type_map no
stdio_obj_type no
stdio_buffer_obj_type no
mp_type_int yes
mp_type_float yes
mp_type_BaseException no
mp_type_enumerate no
mp_type_filter no
mp_type_deque yes
mp_type_dict_view_it no
mp_type_dict_view no
mp_type_dict yes
mp_type_ordereddict yes
mp_type_complex yes
mp_type_closure no
mp_type_bound_meth no
mp_type_attrtuple yes
mp_type_gen_wrap yes
mp_type_native_gen_wrap yes
mp_type_gen_instance yes
mp_type_fun_builtin_0 yes
mp_type_fun_builtin_1 yes
mp_type_fun_builtin_2 yes
mp_type_fun_builtin_3 yes
mp_type_fun_builtin_var yes
mp_type_fun_bc yes
mp_type_fun_native yes
mp_type_fun_asm yes
mp_type_iobase no
mp_type_bufwriter no
mp_type_thread_lock no
mp_type_code no
machine_i2c_type no
machine_rtc_type no
machine_spi_type no
machine_uart_type no
samd_flash_type no
machine_wdt_type no
machine_pin_type no
machine_dac_type no
machine_adc_type no
pyb_switch_type no
pyb_timer_type no
pyb_timer_channel_type no
pyb_flash_type no
rp2_pio_type no
rp2_state_machine_type no
rp2_flash_type no
machine_uart_type no
machine_spi_type no
machine_pin_type no
pin_cpu_pins_obj_type no
pin_board_pins_obj_type no
machine_i2c_type no
machine_adc_type no
machine_rtc_type no
ra_led_type no
extint_type no
machine_wdt_type no
machine_uart_type no
machine_timer_type no
machine_rtc_type no
machine_spi_type no
machine_pin_af_type no
pin_cpu_pins_obj_type no
pin_board_pins_obj_type no
machine_pin_type no
machine_i2s_type no
mp_type_tuple yes
machine_i2c_type no
machine_adc_type no
mp_type_it no
machine_i2c_type no
machine_uart_type no
sensor_type no
zephyr_disk_access_type no
zephyr_flash_area_type no
socket_type no
machine_spi_type no
machine_pin_type no
mp_type_stest_fileio no
mp_type_stest_textio2 no
mp_type_socket no
pyb_timer_type no
pyb_timer_channel_type no
machine_spi_type no
pin_type no
pin_af_type no
pyb_spi_type no
pyb_flash_type no
pyb_switch_type no
pyb_wdt_type no
pyb_can_type no
pyb_usb_vcp_type no
pyb_usb_hid_type no
pyb_timer_type no
pyb_timer_channel_type no
pyb_servo_type no
pyb_sdcard_type no
pyb_mmcard_type no
pyb_i2c_type no
pin_cpu_pins_obj_type no
pin_board_pins_obj_type no
pyb_rtc_type no
network_lan_type no
machine_i2s_type no
machine_adc_type no
pyb_uart_type no
pyb_led_type no
machine_i2c_type no
pyb_lcd_type no
extint_type no
pyb_dac_type no
pyb_uart_type no
pyb_adc_type no
pyb_adc_all_type no
pyb_accel_type no
mp_type_poll no
pyb_led_type no
jclass_type no
jobject_type yes
jmethod_type no
ffimod_type no
ffifunc_type no
fficallback_type no
ffivar_type no
opaque_type no
pyb_switch_type no
machine_timer_type no
lan_if_type no
esp32_partition_type no
machine_i2c_type no
machine_adcblock_type no
wlan_if_type no
esp32_rmt_type no
pin_cpu_pins_obj_type no
pin_board_pins_obj_type no
ppp_if_type no
network_lan_type no
esp_wdt_type no
socket_type no
wlan_if_type no
esp_timer_type no
pyb_pin_type no
pin_irq_type no
mimxrt_flash_type no
machine_hspi_type no
machine_adc_type no
pyb_uart_type no
machine_wdt_type no
pyb_led_type no
machine_sdcard_type no
machine_wdt_type no
machine_rtc_type no
machine_sdcard_type no
machine_spi_type no
machine_i2s_type no
machine_spi_type no
machine_uart_type no
machine_pwm_type no
pin_type no
pin_af_type no
machine_rtcounter_type no
machine_i2c_type no
machine_adc_type no
machine_uart_type no
machine_temp_type no
machine_uart_type no
machine_timer_type no
ubluepy_scan_entry_type no
ubluepy_delegate_type no
nrf_flashbdev_type no
ubluepy_peripheral_type no
ubluepy_characteristic_type no
uos_mbfs_textio_type no
uos_mbfs_fileio_type no
board_led_type no
machine_rtc_type no
machine_pin_cpu_pins_obj_type no
machine_pin_board_pins_obj_type no
machine_pin_type no
machine_pin_af_type no
machine_led_type no
esp32_nvs_type no
machine_spi_type no
machine_i2s_type no
machine_i2c_type no
machine_touchpad_type no
machine_dac_type no
machine_adc_type no
ubluepy_constants_ad_types_type no
ubluepy_constants_type no
ubluepy_descriptor_type no
ubluepy_service_type no
ubluepy_scanner_type no
ubluepy_uuid_type no
pyb_rtc_type no
esp32_ulp_type no
microbit_repeat_iterator_type no
microbit_display_type no
microbit_image_type no
microbit_scrolling_string_type no
microbit_scrolling_string_iterator_type no
string_image_facade_type yes
microbit_facade_iterator_type no
machine_pin_type no
machine_pin_irq_type no
machine_adc_type no
mp_irq_type no
network_server_type no
ssl_socket_type no
pyb_rtc_type no
pin_type no
pin_board_pins_obj_type no
pyb_timer_type no
pyb_timer_channel_type no
pyb_spi_type no
pyb_uart_type no
pyb_wdt_type no
pyb_sleep_type no
pyb_flash_type no
pyb_sd_type no
mod_network_nic_type_wlan no
pyb_i2c_type no
sha1_type no
sha256_type no
socket_type no
pyb_adc_type no
pyb_adc_channel_type no
ussl_socket_type no
mp_type_poll no
utimeq_type yes
ussl_socket_type no
mp_fat_vfs_type no
mp_type_vfs_fat_fileio no
mp_type_vfs_fat_textio no
mp_type_vfs_posix_fileio no
mp_type_vfs_posix_textio no
mp_type_vfs_posix no
mod_network_nic_type_wiznet5k no
MP_TYPE_VFS_LFSx_(_fileio) no
MP_TYPE_VFS_LFSx_(_textio) no
MP_TYPE_VFS_LFSx no
machine_pwm_type no
task_queue_type no
task_type no
mod_network_nic_type_nina no
mp_network_cyw43_type no
socket_type no
machine_timer_type no
mp_type_bluetooth_uuid yes
mp_type_bluetooth_ble no
webrepl_type no
uhashlib_sha256_type no
uhashlib_sha1_type no
uhashlib_md5_type no
uctypes_struct_type yes
ucryptolib_aes_type no
lwip_slip_type no
lwip_socket_type no
decompio_type no
websocket_type no
example_type_Timer no
mp_type_framebuf no
mp_machine_soft_spi_type no
btree_type no
machine_signal_type no
machine_mem_type no
machine_pinbase_type no
match_type no
re_type no
mp_machine_soft_i2c_type no

@dlech
Sponsor Contributor Author

dlech commented Jan 18, 2023

In CPython, any type written in Python inherits the default __hash__ from object unless it is overridden, so I think it makes sense for this to be the default for all MicroPython-specific types. I think a useful criterion for "should this be hashable?" is "does it make sense to use it as a dict key or in a set?" (This is actually what triggered looking into this change in the first place: I wanted to use an arbitrary object as a dict key to associate metadata with it, as you can in CPython, but couldn't in MicroPython.)
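The use case mentioned above looks roughly like this in CPython. It is a minimal sketch: `io.BytesIO` is used only as a convenient example of a native object with the default identity-based hash, and the `metadata` dict and its labels are made up for illustration.

```python
import io

# Two distinct native objects with identical contents.
stream_a = io.BytesIO(b"abc")
stream_b = io.BytesIO(b"abc")

# Attach metadata to each object by using the object itself as the key.
metadata = {}
metadata[stream_a] = {"label": "first"}
metadata[stream_b] = {"label": "second"}

# Keys are distinguished by identity, so the two streams do not collide.
print(len(metadata))
print(metadata[stream_a]["label"])
```

Because the default hash is identity-based, the lookup finds the entry for the exact object that was stored, regardless of what the object contains.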

So unless there are any MicroPython-specific types (extmod) that are "odd" and should not be able to be used as a dict key, I think we can lump all of those in the "the change is actually an improvement" pile.

So that just leaves core types where we should make sure they match the CPython behavior if applicable. Anything in the table above that says "yes" for "has unary_op" is not going to change behavior with the PR. And mp_type_slice has already been addressed.

So that leaves us with the following that could warrant further inspection:

  • mp_type_zip
  • mp_type_array_it
  • mp_type_checked_fun
  • mp_type_range_it
  • mp_type_reversed
  • mp_type_super
  • mp_type_staticmethod
  • mp_type_classmethod
  • mp_type_polymorph_iter
  • mp_type_polymorph_iter_with_finaliser
  • mp_type_object
  • mp_type_stringio
  • mp_type_bytesio
  • mp_type_bytes
  • mp_type_module
  • mp_type_property
  • mp_type_map
  • stdio_obj_type
  • stdio_buffer_obj_type
  • mp_type_BaseException
  • mp_type_enumerate
  • mp_type_filter
  • mp_type_dict_view_it
  • mp_type_dict_view
  • mp_type_closure
  • mp_type_bound_meth
  • mp_type_iobase
  • mp_type_bufwriter
  • mp_type_thread_lock
  • mp_type_code
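A shortlist like the one above can be derived mechanically from the table. The sketch below assumes the two-column "type has unary_op" layout shown earlier, uses only a handful of rows copied from it, and treats the `mp_type_` prefix as a rough stand-in for "core type" (an assumption made here; some listed types, such as stdio_obj_type, do not follow it).

```python
# A few rows copied from the table above, in its "name flag" layout.
ROWS = """\
mp_type_set yes
mp_type_slice no
mp_type_zip no
mp_type_list yes
mp_type_closure no
machine_pin_type no
mp_type_zip no
"""


def types_without_unary_op(text, prefix=""):
    """Return unique type names flagged 'no', keeping first-seen order."""
    names = []
    for line in text.splitlines():
        name, flag = line.split()
        if flag == "no" and name.startswith(prefix) and name not in names:
            names.append(name)
    return names


core_candidates = types_without_unary_op(ROWS, prefix="mp_type_")
print(core_candidates)
```

Types that were already handled separately, such as mp_type_slice, would still need to be struck from the result by hand.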

@dlech
Sponsor Contributor Author

dlech commented Jan 18, 2023

The following are items that clearly use the default hash in CPython so I've checked these off the list:

Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> hash(object())
8744021866632
>>> hash(reversed([]))
8744021803801
>>> hash(iter(range(0)))
8744021803791
>>> hash(staticmethod(print))
8744021788216
>>> hash(classmethod(print))
8744021803792
>>> hash(super(object))
8744020607112
>>> hash(property(print))
8744021019997
>>> hash(sys)
8744021864997
>>> hash(iter(bytearray()))
8744021787799
>>> hash(io.BytesIO())
8744020961070
>>> hash(sys.stdin)
8744021874887
>>> hash(sys.stdin.buffer)
8744021758980
>>> hash(BaseException())
8744021020422
>>> hash(enumerate([]))
8744021738716
>>> hash(filter(None, []))
8744020380579
>>> hash(map(None, []))
8758215359260
>>> hash(zip([]))
8758214576724
>>> hash(iter({}.items()))
8758214575878
>>> hash(iter(""))
8758215358645

bytes is a special case, so I checked it off the list.

Items that need to be fixed in MicroPython:

>>> hash({}.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict_items'
>>> hash({}.keys())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict_keys'
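The same check can be scripted so it can be run on both CPython and MicroPython and the results compared. This is a sketch; `hash_error` is a helper name introduced here, and the exact error text may differ between implementations.

```python
def hash_error(obj):
    """Return the TypeError message from hash(obj), or None if it succeeds."""
    try:
        hash(obj)
    except TypeError as exc:
        return str(exc)
    return None


d = {"a": 1}
items_error = hash_error(d.items())
keys_error = hash_error(d.keys())

print("items:", items_error)
print("keys:", keys_error)
```

A `None` result for either view would indicate that the interpreter under test accepts a hash that CPython rejects.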

Bound methods are discussed in #5233

This just leaves a few internals like closure and code that I haven't considered yet.

@stinos
Contributor

stinos commented Jan 18, 2023

Broad question: this will affect custom types defined by users as well, if I understand it correctly; what is the worst thing that could happen if no action is taken to deal with this?

@dlech
Sponsor Contributor Author

dlech commented Jan 18, 2023

Yes it will affect custom types implemented in C that don't assign anything to the unary_op slot (or inherit from a type that does).

The worst that could happen (software-wise) is that someone could be depending on their type not being hashable, e.g. trying to add an object to a set and catching the TypeError. Their code would break and they would have to fix it, like we did with slice in this PR.
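The kind of code that could be affected looks something like the following sketch. `add_or_skip` is a hypothetical helper, and a list stands in for a custom type that is currently unhashable.

```python
def add_or_skip(container, obj):
    """Add obj to the set; return False if it turns out to be unhashable."""
    try:
        container.add(obj)
    except TypeError:
        # This branch is the behaviour the caller relies on.
        return False
    return True


seen = set()
added_tuple = add_or_skip(seen, (1, 2))
added_list = add_or_skip(seen, [1, 2])

print(added_tuple, added_list, seen)
```

If the object passed in belonged to a type that silently became hashable, the `except` branch would no longer run and the object would end up in the set instead.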

I can't really think of any practical cases where someone would be depending on this though.

@dlech
Sponsor Contributor Author

dlech commented Jan 18, 2023

I added a commit for mp_type_dict_view and checked it off the list. It was broken in different ways before and after the change to the default implementation, so this commit could be applied in any case.

@dlech
Sponsor Contributor Author

dlech commented Jan 18, 2023

I also searched the MicroPython code base for cases where we have a binary_op that defines the __eq__ operation without unary_op and did not find any.
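This check matters because objects that compare equal are expected to hash equal, and an identity-based fallback would break that for any type with custom equality. CPython guards against this for classes written in Python, as the sketch below shows; `Point` and `HashablePoint` are illustrative names only.

```python
class Point:
    """Defines __eq__ only, so CPython sets __hash__ to None."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)


class HashablePoint(Point):
    """Restores hashing in a way that is consistent with __eq__."""

    def __hash__(self):
        return hash((self.x, self.y))


try:
    hash(Point(1, 2))
except TypeError:
    point_unhashable = True
else:
    point_unhashable = False

same_hash = hash(HashablePoint(1, 2)) == hash(HashablePoint(1, 2))
print(point_unhashable, same_hash)
```

A native type with an `__eq__` handler but an empty unary_op slot would be the C-level equivalent of `Point` picking up an identity hash, which is the situation the search above rules out.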

@dpgeorge
Member

I checked the following types:

  • closure
  • bound method
  • io.BufferedWriter
  • thread lock (via _thread.allocate_lock)
  • code object (via compile(...))

In all 5 cases CPython allows hashing the resulting object/instance. And MicroPython currently does not, but with this PR it will (because those types don't have an explicit unary_op handler so will fall back to the default and support hashing). So that is a good improvement for those 5 types.
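A script along these lines could serve as a starting point for such a check. It is a sketch, not the test that was added to the repository: the `samples` dict and `make_closure` helper are introduced here, and the objects are built the way the list above describes.

```python
import io
import _thread


def make_closure(x):
    def g():
        return x
    return g


samples = {
    "closure": make_closure(1),
    "bound method": [].pop,
    "io.BufferedWriter": io.BufferedWriter(io.BytesIO()),
    "thread lock": _thread.allocate_lock(),
    "code object": compile("1 + 1", "<string>", "eval"),
}

hashable = {}
for name, obj in samples.items():
    try:
        hash(obj)
        hashable[name] = True
    except TypeError:
        hashable[name] = False
    print(name, hashable[name])
```

On a MicroPython build, some of these modules or constructors may be unavailable depending on the port configuration, so each sample may need to be guarded individually.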

I have marked those 5 off the list above. I'll see about adding tests for them.

dlech and others added 3 commits May 19, 2023 12:04
Since converting to variable sized slots in mp_obj_type_t, we can now
reduce the code size a bit by removing mp_generic_unary_op() and the
corresponding slots where it is used. Instead we just implement the
generic `__hash__` operation in the runtime.

Signed-off-by: David Lechner <david@pybricks.com>

As per https://bugs.python.org/issue408326, the slice object should not be
hashable. Since MicroPython has an implicit fallback when the unary_op
slot is empty, we need to fill this slot.

Signed-off-by: David Lechner <david@pybricks.com>

This adds a unary_op implementation for the dict_view type that makes
the implementation of `hash()` for these types compatible with CPython.

Signed-off-by: David Lechner <david@pybricks.com>
Signed-off-by: Damien George <damien@micropython.org>
@dpgeorge
Member

Happily, this PR is still a net decrease in code size.

@dpgeorge dpgeorge merged commit 9accb7d into micropython:master May 19, 2023
39 checks passed
@dpgeorge
Member

Merged. Thanks @dlech for your efforts on this.

@dlech dlech deleted the drop-generic-unary-op branch May 19, 2023 15:20