bpo-32494: Use gdbm_count for dbm_length if possible #19814

Merged: 6 commits, May 1, 2020

@corona10 (Member) commented Apr 30, 2020

With this PR, we can use gdbm_count without exporting a new public API.

I ran benchmarks and they show a noticeable performance improvement.
Because the length is cached, the difference only becomes measurable when the cached value is invalidated.
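
To make the caching remark concrete, here is a minimal pure-Python sketch of a length cache that is invalidated on write. It is only an illustration: the class name `CachedLengthStore`, the `recounts` counter, and the dict backing are hypothetical, and the use of a negative size as the "invalid" marker is an assumption modeled on the `dp->di_size < 0` check shown in Benchmark 2, not a copy of the C code.

```python
class CachedLengthStore:
    """Toy key-value store that caches its length (hypothetical, for illustration)."""

    def __init__(self):
        self._data = {}
        self._size = -1     # negative means "cache invalid", like di_size < 0
        self.recounts = 0   # how many times the slow counting path ran

    def __setitem__(self, key, value):
        self._data[key] = value
        self._size = -1     # any write invalidates the cached length

    def _count(self):
        # Stand-in for the expensive count (a full key walk, or gdbm_count).
        self.recounts += 1
        return sum(1 for _ in self._data)

    def __len__(self):
        if self._size < 0:
            self._size = self._count()
        return self._size


kv = CachedLengthStore()
for i in range(1000):
    kv[f"key-{i}"] = f"value-{i}"

first = len(kv)              # slow path: the cache was invalid
second = len(kv)             # fast path: the cached value is reused
kv["key-new"] = "value-new"  # the write invalidates the cache
third = len(kv)              # slow path again
```

In this sketch only the first and third `len()` calls pay for counting, which is why a benchmark has to write between calls (Benchmark 1) or bypass the cache (Benchmark 2) to see the cost of the count itself.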

Benchmark 1

Run len(kv), then put a new value so that the cache is invalidated before the next call.

+-----------+------------------+------------------------------+
| Benchmark | bpo-32494-master | bpo-32494-proposed           |
+===========+==================+==============================+
| bpo-32494 | 262 us           | 42.2 us: 6.20x faster (-84%) |
+-----------+------------------+------------------------------+

import pyperf

runner = pyperf.Runner()
runner.timeit(
    name="bpo-32494",
    # Each iteration reads the length, then writes a new key,
    # so the next len(kv) cannot use the cached value.
    stmt="""
ret = len(kv)
kv[f'key-{ret}'] = f'value-{ret}'
""",
    setup="""
import dbm.gnu as gdbm
from test.support import TESTFN
kv = gdbm.open(TESTFN, 'c')
for i in range(1000):
    kv[f'key-{i}'] = f'value-{i}'
""",
)

Benchmark 2

Remove the caching code path so that len(kv) can be measured without putting a new key/value.

-    if (dp->di_size < 0) {
+    if (1) {

+-----------+--------------------+-------------------------------+
| Benchmark | bpo-32494-master-1 | bpo-32494-proposed-1          |
+===========+====================+===============================+
| bpo-32494 | 109 us             | 590 ns: 185.32x faster (-99%) |
+-----------+--------------------+-------------------------------+

import pyperf

runner = pyperf.Runner()
runner.timeit(
    name="bpo-32494",
    # Only the length is read; with the cache check patched out,
    # every call recomputes the count.
    stmt="""
ret = len(kv)
""",
    setup="""
import dbm.gnu as gdbm
from test.support import TESTFN
kv = gdbm.open(TESTFN, 'c')
for i in range(1000):
    kv[f'key-{i}'] = f'value-{i}'
""",
)
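
For orientation, the gap in Benchmark 2 is what one would expect if the path without gdbm_count has to visit every key. The sketch below approximates such a key walk at the Python level, using the `firstkey()` / `nextkey()` traversal that `dbm.gnu` objects expose. That the C fallback counts this way is an assumption here, and `FakeGdbm` is a hypothetical in-memory stand-in so the example runs without a gdbm build.

```python
class FakeGdbm:
    """Hypothetical in-memory stand-in with dbm.gnu-style firstkey()/nextkey()."""

    def __init__(self, keys):
        self._keys = list(keys)
        self._index = {k: i for i, k in enumerate(self._keys)}

    def firstkey(self):
        # Return the first key of the traversal, or None when empty.
        return self._keys[0] if self._keys else None

    def nextkey(self, key):
        # Return the key following `key`, or None at the end of the traversal.
        i = self._index[key] + 1
        return self._keys[i] if i < len(self._keys) else None


def count_by_traversal(db):
    """Count entries by visiting every key once: O(n) in the number of keys."""
    count = 0
    key = db.firstkey()
    while key is not None:
        count += 1
        key = db.nextkey(key)
    return count


db = FakeGdbm(f"key-{i}".encode() for i in range(1000))
total = count_by_traversal(db)
empty_total = count_by_traversal(FakeGdbm([]))
```

The same `count_by_traversal` function should also accept a real object returned by `dbm.gnu.open`, since it relies only on `firstkey()` and `nextkey()`.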

https://bugs.python.org/issue32494

@corona10 (Member Author) commented May 1, 2020

@pitrou Can you please take a look?

@pitrou (Member) left a comment

Thank you for doing this. This looks mostly good, see below.

@bedevere-bot

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

@corona10 (Member Author) commented May 1, 2020

Thank you for the review :)

I have made the requested changes; please review again

@bedevere-bot

Thanks for making the requested changes!

@pitrou: please review the changes made to this pull request.

@bedevere-bot bedevere-bot requested a review from pitrou May 1, 2020 11:46

@pitrou (Member) left a comment

+1. Thanks for the Py_ssize_t fix!

@pitrou pitrou merged commit 8727664 into python:master May 1, 2020
@corona10 corona10 deleted the bpo-32494 branch May 1, 2020 13:41
@sam-s sam-s mannequin mentioned this pull request Apr 10, 2022