mailbox.Maildir re-reads directory too often #44299
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
assignee = 'https://github.com/akuchling' closed_at = <Date 2009-05-03.02:53:35.412> created_at = <Date 2006-12-03.17:28:11.000> labels = ['library', 'performance'] title = 'mailbox.Maildir re-reads directory too often' updated_at = <Date 2009-05-03.02:53:35.398> user = 'https://github.com/doko42'
activity = <Date 2009-05-03.02:53:35.398> actor = 'akuchling' assignee = 'akuchling' closed = True closed_date = <Date 2009-05-03.02:53:35.412> closer = 'akuchling' components = ['Library (Lib)'] creation = <Date 2006-12-03.17:28:11.000> creator = 'doko' dependencies =  files = ['2246', '13839'] hgrepos =  issue_num = 1607951 keywords = ['patch'] message_count = 6.0 messages = ['30739', '30740', '30741', '30742', '86969', '86996'] nosy_count = 3.0 nosy_names = ['akuchling', 'doko', 'joshtriplett'] pr_nums =  priority = 'normal' resolution = 'fixed' stage = 'test needed' status = 'closed' superseder = None type = 'resource usage' url = 'https://bugs.python.org/issue1607951' versions = ['Python 2.7']
[forwarded from http://bugs.debian.org/401395]
Various functions in mailbox.Maildir call self._refresh, which always re-reads the cur and new directories with os.listdir. _refresh should stat each of the two directories first and skip the re-read if neither has changed. For a series of lookups, this cuts processing time by a factor of roughly the number of messages in the folder, which can be large.
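For illustration, here is a minimal sketch of the stat-before-listdir idea. It is not the code in mailbox.py: the class name CachedDirListing and the rereads counter are hypothetical, and it assumes the directory's st_mtime changes whenever an entry is added or removed. On filesystems with coarse timestamps, a change made in the same timestamp tick as the last read would go unnoticed.

```python
import os


class CachedDirListing:
    """Hypothetical helper: re-run os.listdir() only when the directory mtime changes."""

    def __init__(self, path):
        self.path = path
        self._mtime = None   # mtime observed at the last re-read
        self._entries = []
        self.rereads = 0     # for illustration only: counts os.listdir() calls

    def entries(self):
        # One stat() call is cheap compared with listing a large directory.
        mtime = os.stat(self.path).st_mtime
        if mtime != self._mtime:
            self._entries = sorted(os.listdir(self.path))
            self._mtime = mtime
            self.rereads += 1
        return self._entries
```

A Maildir-style cache would keep one such object for each of cur and new, and rebuild its table of contents only when either listing was re-read.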
By stat()'ing the directories, do you mean checking the mtime? I think this isn't safe because of the limited resolution of mtime on filesystems; ext3 seems to have a 1-second resolution for mtime, for example. This means that _refresh() might read a directory, and if some other process adds or deletes a message in the same second, _refresh() couldn't detect the change. Is there some other property of directories that could be used for a more reliable check?
The attached patch implements checking of mtime, but I don't recommend applying it; it causes the test suite in test_mailbox.py to break all over the place, because the process modifies mailboxes so quickly that the mtime check doesn't notice the process's own changes.
I'll wait a bit for any alternative suggestion, and then close this bug as "won't fix".
File Added: mailbox-mtime.patch
Stray thought: would it help if the patch stored the (mtime - 1sec) instead of the mtime? Successive calls in the same second would then always re-read the directories, but once the clock ticks to the next second, re-reads would only occur if the directories have actually changed. The check would be 'if new_mtime > self._new_mtime' instead of '=='.
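One way to get the behaviour described here (always re-read while the directory timestamp is still "fresh", then trust the cache once it has settled) can be sketched as follows. This is an interpretation, not the attached patch: it records when the last read happened and trusts the cache only if that read took place more than one timestamp tick after the directory's mtime. The names SlackCachedDirListing and RESOLUTION are hypothetical, the 1-second value is an assumption taken from the ext3 example, and the comparison assumes time.time() and the filesystem's timestamps come from the same clock, which may not hold on remote filesystems.

```python
import os
import time

RESOLUTION = 1.0  # assumed worst-case mtime granularity in seconds


class SlackCachedDirListing:
    """Hypothetical helper: mtime-checked listing that tolerates coarse timestamps."""

    def __init__(self, path):
        self.path = path
        self._mtime = None    # directory mtime seen at the last re-read
        self._read_at = 0.0   # wall-clock time of the last re-read
        self._entries = []
        self.rereads = 0      # for illustration only

    def entries(self):
        mtime = os.stat(self.path).st_mtime
        unchanged = mtime == self._mtime
        # The cache is trustworthy only if the last read happened more than
        # one tick after the mtime; otherwise a later change could have been
        # stamped with the same mtime value and would be invisible.
        settled = self._read_at - mtime > RESOLUTION
        if not (unchanged and settled):
            read_at = time.time()
            self._entries = sorted(os.listdir(self.path))
            self._mtime = mtime
            self._read_at = read_at
            self.rereads += 1
        return self._entries
```

With this check, a process that modifies the mailbox and immediately looks it up again still sees its own changes, at the cost of re-reading on every call until the directory has been quiet for longer than RESOLUTION.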
Is this sort of mtime-based checking reliable on remote filesystems such as NFS?
Regardless of the resolution on any particular filesystem, stat and stat64
True. mailbox.Maildir's behavior of always representing the current contents
The two solutions below (inotify and your suggested mtime-1 approach) would
On Linux, you could use inotify to get a notice when anything changes in the
Please don't. The performance hit of repeatedly re-reading the directory
Good idea. That would work fine as well, and I believe it would have
Compared to inotify, this would have somewhat more overhead (but still far
This particular sort of checking, yes, I think so. The times do not
Thank you for looking at this problem,