GROWTH_RATE prevents dict shrinking #77386
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
assignee = None closed_at = None created_at = <Date 2018-04-02.11:55:50.970> labels = ['interpreter-core', '3.7', '3.8', 'performance'] title = 'GROWTH_RATE prevents dict shrinking' updated_at = <Date 2022-01-26.18:08:32.273> user = 'https://github.com/methane'
activity = <Date 2022-01-26.18:08:32.273> actor = 'brandtbucher' assignee = 'none' closed = False closed_date = None closer = None components = ['Interpreter Core'] creation = <Date 2018-04-02.11:55:50.970> creator = 'methane' dependencies =  files = ['47512'] hgrepos =  issue_num = 33205 keywords = ['patch'] message_count = 8.0 messages = ['314806', '314807', '314977', '315351', '315380', '315409', '411534', '411716'] nosy_count = 8.0 nosy_names = ['rhettinger', 'methane', 'Yury.Selivanov', 'Mark.Shannon', 'eric.snow', 'serhiy.storchaka', 'miss-islington', 'brandtbucher'] pr_nums = ['6350', '6503'] priority = 'normal' resolution = 'fixed' stage = 'resolved' status = 'open' superseder = None type = 'resource usage' url = 'https://bugs.python.org/issue33205' versions = ['Python 3.7', 'Python 3.8']
The text was updated successfully, but these errors were encountered:
GROWTH_RATE is changed from (used*2) to (used*2 + dk_size/2) in bpo-17563, at Python 3.4.
From Python 3.6, there are no DUMMY entry. While there are dummy keys, we resize (repack) when entries are full.
(Of course, there are still benefit from slow repacking rate for new dict.
This GROWTH_RATE prevents dict is shrinked in insertion_resize().
For example, consider this dict:
>>> d = dict.fromkeys(range(10900)) >>> len(d) 10900 >>> sys.getsizeof(d) 295008 >>> for i in range(10900): ... del d[i] ... >>> len(d) 0 >>> sys.getsizeof(d) 295008
Current dk_size is 16384 and entries length is dk_size * 2 // 3 = 10922.
New dk_size is round_up_to_power_of_two(922 + 16384/2) = 16384.
>>> for i in range(923): ... d[i] = i ... >>> sys.getsizeof(d) 295008
round_up_to_power_of_two(used + dk_size/2) means dict is shrinked only when used == 0.
I propose changing GROWTH_RATE again to
In case of dict growing without deletion, dk_size is doubled for each resize as current behavior.
>>> import sys >>> d = dict.fromkeys(range(10900)) >>> sys.getsizeof(d) 295008 >>> for i in range(10900): ... del d[i] ... >>> len(d) 0 >>> sys.getsizeof(d) 295008 >>> for i in range(923): ... d[i] = i ... >>> sys.getsizeof(d) 36968
I want to backport this change to Python 3.7. While it's beta3, "dict can't be shrinked unless empty" behavior looks resource usage bug to me.
# cap2.json is master (used*2 + dk_size/2)
Benchmark hidden because not significant (5): rand_access(size=2), rand_access(size=5), rand_access(size=100), rand_access(size=200), rand_access(size=500)
The capacity of the dict is 2/3 of its hashtable size: dk_usable < 2/3 * dk_size.
Currently the dict grows if dk_usable > 1/4 * dk_size, and preserves the size if dk_usable < 1/4 * dk_size. Note that it it can grow twice if dk_usable > 1/2 * dk_size.
With the proposed change the dict will grow only if dk_usable > 1/3 * dk_size, preserve the size if 1/6 * dk_size < dk_usable < 1/3 * dk_size, and shrink if dk_usable < 1/6 * dk_size. After growing once it will no need to grow again until the number of item be increased.
We do not have filled since Python 3.6.
For example, when current dk_size == 16 and USABLE_FRACTION(dk_size) == 10, new dk_size is:
As you can see, dict is more sparse when there is dummy than when there is no dummy, except used=5/dummy=5 case.
There may be a small room for improvement, especially for