dbm.dumb should be consistent when the database is closed #63584
This problem occurred in bpo-19282. Basically, dbm.dumb is not consistent with dbm.gnu and dbm.ndbm when the database is closed, as seen in the following:

>>> import dbm.dumb as d
>>> db = d.open('test.dat', 'c')
>>> db.close()
>>> db.keys()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/tank/libs/cpython/Lib/dbm/dumb.py", line 212, in keys
return list(self._index.keys())
AttributeError: 'NoneType' object has no attribute 'keys'
>>>

vs

>>> import dbm.gnu as g
>>> db = g.open('test.dat', 'c')
>>> db.close()
>>> db.keys()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
_gdbm.error: GDBM object has already been closed
>>>

Attaching a patch.
Patch looks good to me, I'll apply it when I apply bpo-19282.
Additional checks can slow down the code. That doesn't matter for some operations, but it does for __len__, __contains__, etc. The EAFP approach is to catch AttributeError and raise the appropriate dbm.dumb.error exception.
One more nitpick: I think the closing check should come after the argument type check and the key encoding.
Here's the new version, which addresses your last comment. Regarding the first issue, I don't believe that the result will be as readable (though I agree with you that it will be better). For instance, one method could use try: ... — but it will look totally different for other methods such as __len__: try: ... We could catch TypeError only for dunder methods, though, and for the rest of the methods check the value of _index before access.
Serhiy, what's the status for this? Should this be targeted for 3.5?
Yes, perhaps the performance of the dbm.dumb module is not worth this cost. Do you have any benchmarking results? The patch LGTM.
I think yes.
Here's a benchmark:

# ./python -S -m timeit -n 10000 -s 'import dbm.dumb as dbm; d=dbm.open("x.dat", "c");d.close()' 'try:' ' len(d)' 'except OSError:' ' pass'

# ./python -S -m timeit -n 10000 -s 'import dbm.dumb as dbm; d=dbm.open("x.dat", "c");d.close()' 'try:' ' len(d)' 'except OSError:' ' pass'

Now this seems odd; maybe catching and reraising an exception has a greater overhead than a function call plus a check whether an attribute is None. Anyway, I agree that dbm.dumb is not performance critical.
IMHO it is a bug fix, not a new feature, and could be applied in 3.3 and 3.4.
Yes, of course, catching and reraising an exception is costly. But when an exception is not raised, this is cheap. Usually len() is called on an open database.
Hmm. If we consider dbm.dumb a separate module, this is perhaps a new feature. But if we consider it an implementation of the dbm protocol, it should conform to the other dbm modules, and this is a bugfix. What would you say, Nick?
I think it's definitely still OK to fix this in 3.4, but I think it's borderline enough to avoid including in the last ever 3.3 release. Changing exception types always introduces a little backwards compatibility risk, so it's something I lean towards only doing in new feature releases, even in cases like this where the old exception was clearly not the best choice.
msg204533 suggests that there will be >=2 bug fix releases in 3.3 branch (3.3.4 soon and 3.3.5 around release of 3.4.0). |
That's slightly more acceptable, but it's still hard to make the case |
If we agree that this should be fixed in 3.4, can somebody commit it before the second beta?

Updated the patch to catch dbm.dumb.error in tests (reverting a change added in c2f1bb56760d).

Can this patch be committed, now that 3.5 is active?

_check_closed sounds like you expect it to be closed, and might even assert that it is closed, except that you want the check run even in release mode and/or it might fail. Since being closed is actually the unexpectedly broken state, I would prefer that you rename it to something else, like _verify_open.

gzip uses the same name, _check_closed, but your suggestion sounds good. I'll update the patch.
I think the requested timing regression test was for the non-broken case: when the database has NOT been closed, and keys() still works, will it be way slower than before? Note that I am not asking you to do that test (though the eventual committer might); the implementation of whichdb leaves me believing that almost anyone who is likely to care will have already followed the documentation's recommendation to install another *dbm ahead of dumb.
On my machine I get the following results for the unclosed-database case.

With patch:
# ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c');len(d)"

Without patch:
# ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c');len(d)"
New changeset e8343cb98cc3 by Benjamin Peterson in branch '3.4':
New changeset dbceba88b96e by Benjamin Peterson in branch 'default':
Claudiu, your benchmark is broken: it measures a no-op. Correct benchmark:

$ ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c')" "len(d)"

3.4: 100000 loops, best of 3: 2.56 usec per loop

There is a 60% regression.
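The same measurement can be reproduced from Python with the `timeit` module, keeping the `open()` call in setup so that only `len(d)` is timed. The file name and iteration count below are arbitrary choices for this sketch.

```python
# Python-level equivalent of the corrected shell benchmark: only len(db)
# is inside the timed statement.
import dbm.dumb
import os
import timeit

base = "bench_db"                  # scratch file name chosen for this sketch
db = dbm.dumb.open(base, "c")
n = 10_000
t = timeit.timeit(lambda: len(db), number=n)
print(f"{t / n * 1e6:.2f} usec per len() call")
db.close()

# dbm.dumb stores its data in three files named <base>.dir/.dat/.bak;
# remove whichever of them were created.
for suffix in (".dir", ".dat", ".bak"):
    try:
        os.remove(base + suffix)
    except FileNotFoundError:
        pass
```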
Did you try 3.5 without the patch, in case the issue is not with his code?
Right, my benchmark was indeed flawed. Here are the new results on my machine:

Without the patch

With the patch

Even having an empty _verify_open in the __len__ method leads to this:

# ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c')" "len(d)"
Here's a new patch which uses the EAFP approach for dunder methods (__len__, __contains__, etc.) and the _verify_open method for the other methods (keys, iterkeys, etc.). Now the results are similar to the benchmark without the patch.
There is no need to speed up methods which do I/O (__getitem__, __setitem__, __delitem__). However, methods which work only with the index (keys, iterkeys, __contains__, __len__) can be optimized. In the __contains__ method an exception can be raised not only by the non-existent __contains__ of None, but also from the __hash__ or __eq__ methods of a key, so we should distinguish these cases.
Looks good to me.

New changeset 95207bcd8298 by Serhiy Storchaka in branch '3.4':
New changeset 2e59e0b579e5 by Serhiy Storchaka in branch 'default':

Thanks Claudiu.