
shelve: Shelf.clear() has very poor performance #107089

Closed
jtcave opened this issue Jul 23, 2023 · 2 comments
Labels
performance Performance or resource usage

jtcave (Contributor) commented Jul 23, 2023

Bug report

Calling the clear method on a shelve.Shelf object takes a very long time on databases that have thousands of entries.

It can be seen in this script, which creates a database with 10,000 entries and immediately clears it.

import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tempdir:
    filename = os.path.join(tempdir, "test-shelf")
    with shelve.open(filename) as db:
        # Populate the shelf with 10,000 small entries...
        items = {str(x): x for x in range(10000)}
        db.update(items)
        # ...then remove them all; this call dominates the runtime.
        db.clear()
print("ok")

On my M2 Mac Mini, this script takes about 2.1 seconds.

james@iris ~ % time python shelve-clear-test.py
ok
python shelve-clear-test.py  1.18s user 0.92s system 99% cpu 2.100 total

On a Debian VPS:

james@asteroid:~$ time python3 shelve-clear-test.py 
ok

real	0m43.665s
user	0m34.330s
sys	0m9.107s

Your environment

  • CPython versions tested on
    • 3.11.2
    • 3.11.4
    • cpython/main branch
  • Operating system and architecture
    • macOS/arm64 13.4.1
    • Debian/x86_64 12.1


jtcave added the type-bug (An unexpected behavior, bug, or error) label Jul 23, 2023
jtcave added a commit to jtcave/cpython that referenced this issue Jul 23, 2023
The clear method used to be implemented by inheriting a mix-in from the
MutableMapping ABC. It was a poor fit for shelves, and a better
implementation is now in place
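Why the inherited mix-in is a poor fit: as far as I can tell from CPython's `collections.abc`, `MutableMapping.clear()` calls `popitem()` until it raises `KeyError`, and each `popitem()` fetches a single key with `next(iter(self))`. `shelve.Shelf.__iter__` in turn walks `self.dict.keys()`, which for dbm objects builds a full list of keys. The toy sketch below makes that cost visible by counting how many keys get copied while clearing 1,000 entries. `CountingBackend` and `ToyShelf` are hypothetical stand-ins written for illustration, not shelve or dbm code.

```python
from collections.abc import MutableMapping


class CountingBackend:
    """Hypothetical stand-in for a dbm object: keys() builds a full list."""

    def __init__(self):
        self._data = {}
        self.keys_calls = 0          # how many times keys() was called
        self.keys_materialized = 0   # total keys copied into those lists

    def keys(self):
        self.keys_calls += 1
        snapshot = list(self._data)
        self.keys_materialized += len(snapshot)
        return snapshot

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __len__(self):
        return len(self._data)


class ToyShelf(MutableMapping):
    """Hypothetical mapping that iterates the way shelve.Shelf does."""

    def __init__(self, backend):
        self.dict = backend

    def __iter__(self):
        # Every fresh iterator asks the backend for a complete key list.
        for k in self.dict.keys():
            yield k

    def __len__(self):
        return len(self.dict)

    def __getitem__(self, key):
        return self.dict[key]

    def __setitem__(self, key, value):
        self.dict[key] = value

    def __delitem__(self, key):
        del self.dict[key]


backend = CountingBackend()
shelf = ToyShelf(backend)
shelf.update({str(x): x for x in range(1000)})
shelf.clear()  # inherited MutableMapping.clear(): popitem() until KeyError
print(backend.keys_calls)         # 1001
print(backend.keys_materialized)  # 500500
```

For n entries this comes to n + 1 `keys()` calls and n(n + 1)/2 copied keys, so doubling the shelf size roughly quadruples the work. Each `popitem()` also reads the value it is about to discard, which in a real `Shelf` means unpickling it.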
pochmann (Contributor) commented:

How much of the 2.1 seconds is the clear method?

jtcave (Contributor, Author) commented Jul 23, 2023

> How much of the 2.1 seconds is the clear method?

Virtually all of it. With my branch, the same script finishes in well under a tenth of a second:

james@iris cpython-build % time ./python.exe ~/shelve-clear-test.py
ok
./python.exe ~/shelve-clear-test.py  0.05s user 0.02s system 97% cpu 0.069 total
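Rather than timing the whole process with the shell's `time`, the populate and clear phases can be measured separately with `time.perf_counter()`. This is a minimal sketch; `time_phases` is a hypothetical helper name, and absolute numbers will depend on which dbm backend `shelve.open` picks on a given platform.

```python
import os
import shelve
import tempfile
import time


def time_phases(n=10000):
    """Hypothetical helper: return (update seconds, clear seconds) for n entries."""
    with tempfile.TemporaryDirectory() as tempdir:
        filename = os.path.join(tempdir, "test-shelf")
        with shelve.open(filename) as db:
            items = {str(x): x for x in range(n)}

            # Phase 1: populate the shelf.
            start = time.perf_counter()
            db.update(items)
            update_seconds = time.perf_counter() - start

            # Phase 2: clear it; this is the call under suspicion.
            start = time.perf_counter()
            db.clear()
            clear_seconds = time.perf_counter() - start

            assert len(db) == 0
    return update_seconds, clear_seconds
```

On an affected build, `time_phases()` (10,000 entries, matching the original script) should attribute nearly all of the runtime to the clear phase; passing a smaller `n` keeps the run short on slow backends.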

jtcave added a commit to jtcave/cpython that referenced this issue Jul 23, 2023
The prior performance fix could have disrupted non-dbm subclasses of
Shelf. The code has been refactored to put the clear logic in the
DbfilenameShelf class, which can assume the backing object is a dbm
object. The code still attempts to call the clear method on the backing
object (see pythongh-107122).
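A rough sketch of the approach this commit describes: override `clear()` so that it calls the backing object's own `clear()` when one exists, and otherwise snapshot the keys once instead of once per deleted item. `FastClearShelf` is a hypothetical name and this is not the code that was merged. It also assumes that emptying `self.cache` (the write-back cache `shelve.Shelf` keeps when `writeback=True`) first is the right way to keep stale cached entries from being written back later.

```python
import shelve


class FastClearShelf(shelve.Shelf):
    """Hypothetical Shelf subclass whose clear() avoids per-key popitem()."""

    def clear(self):
        # Forget cached objects so a later sync() cannot write them back.
        self.cache.clear()
        backend_clear = getattr(self.dict, "clear", None)
        if callable(backend_clear):
            # One call on the backing object instead of one popitem() per key.
            backend_clear()
        else:
            # Fallback for backends without clear(): list the keys only once.
            for key in list(self.dict.keys()):
                del self.dict[key]
```

Because `shelve.Shelf` accepts any mapping with bytes keys, a plain `dict` is enough for experimenting, e.g. `FastClearShelf({})`.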
jtcave added a commit to jtcave/cpython that referenced this issue Jul 24, 2023
Because pythongh-107089 is peculiar to implementation details of dbm objects,
it would be less disruptive to implement it in the DbfilenameShelf
class, which is used for calls to shelve.open. Since it is known that
the backing object is specifically one of the dbm objects, its clear
method (see pythongh-107122) can be used with no fallback code.
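The commit relies on two facts: `shelve.open()` returns a `DbfilenameShelf`, and that shelf's `dict` attribute is a dbm object. The sketch below checks both on a given Python build, and whether the dbm object exposes a `clear()` method at all. `describe_backend` is a hypothetical helper name, and the backend type it reports varies by platform.

```python
import dbm
import os
import shelve
import tempfile


def describe_backend():
    """Hypothetical helper: report the shelf class and dbm backend in use."""
    with tempfile.TemporaryDirectory() as tempdir:
        filename = os.path.join(tempdir, "test-shelf")
        with shelve.open(filename) as db:
            shelf_class = type(db).__name__
            backend_type = type(db.dict)
            # True if the dbm object exposes clear(); for dbm.dumb this may be
            # the generic MutableMapping mix-in rather than a native method.
            backend_has_clear = callable(getattr(db.dict, "clear", None))
        # whichdb() inspects the files on disk, so call it after closing.
        backend_module = dbm.whichdb(filename)
    return shelf_class, backend_type, backend_has_clear, backend_module
```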
jtcave re-added commits with the same two messages as above ("The clear method used to be implemented by inheriting a mix-in..." and "Because pythongh-107089 is peculiar to implementation details of dbm objects...") to jtcave/cpython twice on Jul 27, 2023 and once on Jul 28, 2023.
corona10 added the performance (Performance or resource usage) label and removed the type-bug (An unexpected behavior, bug, or error) label Jul 29, 2023