Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mhlib fails on Btrfs filesystem (test_mhlib failure) #52007

Closed
nascheme opened this issue Jan 22, 2010 · 20 comments
Closed

mhlib fails on Btrfs filesystem (test_mhlib failure) #52007

nascheme opened this issue Jan 22, 2010 · 20 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@nascheme
Copy link
Member

BPO 7759
Nosy @akuchling, @nascheme, @pitrou, @bitdancer, @serhiy-storchaka
Files
  • mhlib_nlinks.txt
  • mhlib_nlinks_2.patch
  • mhlib_nlinks_3.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2015-11-16.20:45:18.390>
    created_at = <Date 2010-01-22.22:22:12.697>
    labels = ['type-bug', 'library']
    title = 'mhlib fails on Btrfs filesystem (test_mhlib failure)'
    updated_at = <Date 2015-11-16.20:45:18.388>
    user = 'https://github.com/nascheme'

    bugs.python.org fields:

    activity = <Date 2015-11-16.20:45:18.388>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2015-11-16.20:45:18.390>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2010-01-22.22:22:12.697>
    creator = 'nascheme'
    dependencies = []
    files = ['15975', '41010', '41011']
    hgrepos = []
    issue_num = 7759
    keywords = ['patch']
    message_count = 20.0
    messages = ['98169', '98181', '98190', '98201', '98204', '98231', '98232', '111825', '119091', '254479', '254480', '254487', '254489', '254493', '254584', '254595', '254626', '254630', '254749', '254755']
    nosy_count = 8.0
    nosy_names = ['akuchling', 'nascheme', 'pitrou', 'r.david.murray', 'BreamoreBoy', 'python-dev', 'serhiy.storchaka', 'David.Edelsohn']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'commit review'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue7759'
    versions = ['Python 2.7']

    @nascheme
    Copy link
    Member Author

    Btrfs does not maintain a link count for directories (MacOS does the same I think). That confuses mhlib.py because it uses os.stat().st_nlinks as an optimization.

    The attached patch removes the optimization and make test_mhlib pass on Btrfs (and probably HFS+) filesystems.

    @nascheme nascheme added the type-bug An unexpected behavior, bug, or error label Jan 22, 2010
    @cjw296
    Copy link
    Contributor

    cjw296 commented Jan 23, 2010

    Please can you write a test for your patch?

    @pitrou
    Copy link
    Member

    pitrou commented Jan 23, 2010

    The documentation mentions that mhlib is deprecated and mailbox should be used instead. Is there any point in trying to fix it?

    @nascheme
    Copy link
    Member Author

    On Sat, Jan 23, 2010 at 06:09:33PM +0000, Antoine Pitrou wrote:

    The documentation mentions that mhlib is deprecated and mailbox
    should be used instead. Is there any point in trying to fix it?

    It looks like Btrfs will eventually conform to traditional st_nlink
    behavior. However, that still leaves HFS+. Perhaps the easiest fix
    would be to have the unit test check for weird st_nlink behavior by
    creating a directory with a subdirectory. If something is weird,
    skip testing mhlib. The downside to that solution is that someone
    might use mhlib on a HFS+ filesystem and encounter buggy behavior.

    I can imagine that removing the optimization can make mhlib much
    slower for large mail boxes. Maybe that would be better than
    risking lost mail though. On modern machines maybe it doesn't
    matter much.

    Neil

    @pitrou
    Copy link
    Member

    pitrou commented Jan 24, 2010

    > The documentation mentions that mhlib is deprecated and mailbox
    > should be used instead. Is there any point in trying to fix it?

    It looks like Btrfs will eventually conform to traditional st_nlink
    behavior. However, that still leaves HFS+.

    That wasn't really my question. What I ask is: since mhlib is
    deprecated, why do we need to fix it while people are encouraged to use
    mailbox instead?
    And, besides, does mailbox show the same problem?

    @nascheme
    Copy link
    Member Author

    On Sun, Jan 24, 2010 at 01:25:18AM +0000, Antoine Pitrou wrote:

    That wasn't really my question. What I ask is: since mhlib is
    deprecated, why do we need to fix it while people are encouraged to use
    mailbox instead?

    Sorry, I don't understand what you are proposing. Do you mean we
    should just let the test fail for people who develop on HFS+ and
    Btrfs filesystems? That seems not so good.

    And, besides, does mailbox show the same problem?

    No, it doesn't have that optimization.

    Neil

    @pitrou
    Copy link
    Member

    pitrou commented Jan 24, 2010

    Sorry, I don't understand what you are proposing. Do you mean we
    should just let the test fail for people who develop on HFS+ and
    Btrfs filesystems? That seems not so good.

    Hmm, you are right. From a quick glance, the patch looks ok. I assume you've checked it doesn't break anything else :)

    @BreamoreBoy
    Copy link
    Mannequin

    BreamoreBoy mannequin commented Jul 28, 2010

    Since mhlib has gone from py3k is there any interest in applying this to 2.6 or 2.7, given that there's been no response to msg98232?

    @BreamoreBoy BreamoreBoy mannequin added the stdlib Python modules in the Lib dir label Jul 28, 2010
    @nascheme
    Copy link
    Member Author

    Closing this bug. I don't think it makes sense to change the mhlib module in bugfix release. My patch is fairly simple but not simple enough to make me feel comfortable.

    @serhiy-storchaka
    Copy link
    Member

    The new buildbot edelsohn-sles-z is red from its setting at 19 Aug 2015. test_mhlib is the only failed test.

    http://buildbot.python.org/all/builders/s390x%20SLES%202.7/builds/114/steps/test/logs/stdio
    ======================================================================
    FAIL: test_listfolders (test.test_mhlib.MhlibTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/dje/cpython-buildarea/2.7.edelsohn-sles-z/build/Lib/test/test_mhlib.py", line 185, in test_listfolders
        eq(folders, tfolders)
    AssertionError: Lists differ: [] != ['deep', 'deep/f1', 'deep/f2',...

    Second list contains 6 additional elements.
    First extra element 0:
    deep

    • []
      + ['deep', 'deep/f1', 'deep/f2', 'deep/f2/f3', 'inbox', 'wide']

    ----------------------------------------------------------------------

    I think we should fix this issue. Proposed patch adds a test that we can use nlinks for count a number of subdirectories.

    @serhiy-storchaka
    Copy link
    Member

    Here is even simpler and more reliable patch. It works even if the subfolder is a symlink to the directory on the filesystem that doesn't support links counting for directories.

    @bitdancer
    Copy link
    Member

    Sure, why not. Having buildbots be green is good, and I doubt anyone is using mhlib any more even in python2. And if they are the chances this will break something seems extremely small.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Nov 11, 2015

    New changeset 37431d9abbcd by Serhiy Storchaka in branch '2.7':
    Issue bpo-7759: Fixed the mhlib module on filesystems that doesn't support
    https://hg.python.org/cpython/rev/37431d9abbcd

    @serhiy-storchaka
    Copy link
    Member

    The test now is passed.

    @nascheme
    Copy link
    Member Author

    I don't see how that patch can be correct. The logic is now if the directory has two links inside it then skip it. The filesystems that don't count '.' and '..' will have zero links when empty and will have two links when two real files exist in them.

    I think my original patch is safer.

    @serhiy-storchaka
    Copy link
    Member

    AFAIK the filesystem either counts directory references, in this case st_nlink >= 2, and st_nlink == 2 only for empty directory. The exception is a root directory, but it is not relevant. Or it doesn't count directory references, in this case st_nlink == 1.

    @nascheme
    Copy link
    Member Author

    So what happens for the filesystems that doesn't count '.' and '..'? It looks to me like if there are exactly two messages in a folder then the revised code will return [] (i.e. it will think the folder is empty). Probably we should revise the unit test to make a folder with two messages.

    @serhiy-storchaka
    Copy link
    Member

    st_nlink is not related to the number of messages in a folder. It is a number of hard links.

    If the filesystem supports hard links counting for directories, every directory (except /) has at least two links: one from its parent directory, and one from itself (via "."). Every subdirectory adds yet one hard link via "..". Non-directory files don't create hard links.

    Typical mail folder can contain thousands of messages and none or only a few subfolders. Subfolders (if there are any) usually are created before messages and hence encountered first in directory listing. Hereby the optimization can have significant effect.

    If there is a real case when st_nlink != 1 and is less then a number of subdirectories + 2, we should consider removing the optimization.

    @nascheme
    Copy link
    Member Author

    Okay, feel free to close this bug. I had heard that HFS+ counts files but I don't have a way to verify that.

    @serhiy-storchaka
    Copy link
    Member

    Yes, what I read, HFS+ counts files. st_ntlink is a number of files and directories + 2 (perhaps for fake "." and ".."). This value never less than 2 + number_of_subdirectories, hence the code should work. Just an optimization has no any effect (as well as on FS where st_ntlink == 1).

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants