Patch for #1586414 to avoid fragmentation on Windows #44180

Closed
enochjul mannequin opened this issue Oct 31, 2006 · 8 comments
Labels
stdlib Python modules in the Lib dir

Comments

@enochjul
Mannequin

enochjul mannequin commented Oct 31, 2006

BPO 1587674
Nosy @josiahcarlson, @gustaebel
Files
  • tarfile_set_length.patch
  Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/gustaebel'
    closed_at = <Date 2007-01-22.19:08:04.000>
    created_at = <Date 2006-10-31.05:05:25.000>
    labels = ['library']
    title = 'Patch for python/cpython#44173 to avoid fragmentation on Windows'
    updated_at = <Date 2007-01-22.19:08:04.000>
    user = 'https://bugs.python.org/enochjul'

    bugs.python.org fields:

    activity = <Date 2007-01-22.19:08:04.000>
    actor = 'lars.gustaebel'
    assignee = 'lars.gustaebel'
    closed = True
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2006-10-31.05:05:25.000>
    creator = 'enochjul'
    dependencies = []
    files = ['7598']
    hgrepos = []
    issue_num = 1587674
    keywords = ['patch']
    message_count = 8.0
    messages = ['51299', '51300', '51301', '51302', '51303', '51304', '51305', '51306']
    nosy_count = 3.0
    nosy_names = ['josiahcarlson', 'lars.gustaebel', 'enochjul']
    pr_nums = []
    priority = 'normal'
    resolution = 'rejected'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue1587674'
    versions = ['Python 2.6']

    @enochjul
    Mannequin Author

    enochjul mannequin commented Oct 31, 2006

    In TarFile.makefile(), add a call to file.truncate() so that
    Windows knows the final size of the target file before the data
    is written. This helps guide NTFS cluster allocation and avoids
    fragmentation.
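A minimal sketch of the proposed idea, written against the Python 3 `tarfile` API rather than taken from the attached tarfile_set_length.patch. The function name `extract_member_presized` is hypothetical. It opens the target, extends it to the member's final size with `truncate()`, and then copies the member data in starting at position 0.

```python
import shutil
import tarfile


def extract_member_presized(tar: tarfile.TarFile, member: tarfile.TarInfo,
                            targetpath: str) -> None:
    """Extract one regular-file member, announcing its final size first.

    Hypothetical helper illustrating the size-hint idea; not part of tarfile.
    """
    source = tar.extractfile(member)
    if source is None:
        raise ValueError(f"{member.name!r} is not a regular file")
    with source, open(targetpath, "wb") as target:
        # Extend the new, empty file to its final length before writing data,
        # giving the filesystem a chance to allocate the full size at once
        # (whether it actually does so depends on the OS and filesystem).
        # truncate() does not move the stream position, which stays at 0.
        target.truncate(member.size)
        # Overwrite the extended region with the member's real contents.
        shutil.copyfileobj(source, target)
```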

    @enochjul enochjul mannequin closed this as completed Oct 31, 2006
    @enochjul enochjul mannequin assigned gustaebel Oct 31, 2006
    @enochjul enochjul mannequin added the stdlib Python modules in the Lib dir label Oct 31, 2006
    @gustaebel
    Mannequin

    gustaebel mannequin commented Nov 1, 2006


    Is this merely an NTFS problem or is it the same with FAT fs?
    How do you detect file fragmentation?
    Doesn't this problem apply to all other modules or scripts
    that write to file objects as well?
    Shouldn't a decent filesystem be able to handle growing
    files in a correct manner?

    @enochjul
    Mannequin Author

    enochjul mannequin commented Nov 6, 2006


    I have not really tested FAT/FAT32 yet as I don't use these
    filesystems now.

    The Disk Defragmenter tool in Windows 2000/XP shows the number of
    files/directories fragmented in its report.

    NTFS does handle growing files, but the operating system can only do
    so much without knowing the size of the file. Extracting from
    archives consisting of only several files does not cause
    fragmentation. However, if the archive has many files, it is much
    more likely that the default algorithm will fail to allocate
    contiguous clusters for some files. It may also depend on the amount
    of free space fragmentation on a particular partition and whether
    other processes are writing to other files in the same partition.

    Some details of the cluster allocation algorithm used in Windows can
    be found at http://support.microsoft.com/kb/841551.

    @gustaebel
    Mannequin

    gustaebel mannequin commented Nov 6, 2006


    Personally, I think disk defragmenters are evil ;-) They
    create the very need they are supposed to satisfy. On Linux
    we have no defragmenters, so we don't bother about it.

    I think your proposal is some kind of a performance hack for
    a particular filesystem. In principle, this problem exists
    for all filesystems on all platforms. Fragmentation is IMO a
    filesystem's problem, and it is not so much a state as a
    process. Filesystems fragment over time and you can't do
    anything about it. For those people who care, disk
    defragmenters were invented. It is not tarfile.py's job to
    care about a fragmented filesystem; that's simply too low level.

    I admit that it is a small patch, but I'm -1 on having this
    applied.

    @josiahcarlson
    Mannequin

    josiahcarlson mannequin commented Nov 8, 2006


    I disagree with user gustaebel. We should be adding
    automatic truncate calls for all possible supported
    platforms, in all places where it could make sense. Be it
    in tarfile, zipfile, or wherever else we can. It would make sense
    to write a function that can be called by all of those
    modules so that there is only one place to update if/when
    changes occur. If the function were not part of the public
    Python API, then it wouldn't need to wait until 2.6, unless
    it were considered a feature addition rather than bugfix.
    One would have to wait on a response from Martin or Anthony
    to know which it was, though I couldn't say for sure if
    operations that are generally performance enhancing are
    bugfixes or feature additions.
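The shared-helper idea could look roughly like the sketch below. `_copy_with_size_hint`, `extract_tar_member` and `extract_zip_member` are hypothetical names, not existing stdlib functions. The point is only that tarfile (using `TarInfo.size`) and zipfile (using `ZipInfo.file_size`) would both route through a single place.

```python
import shutil
import tarfile
import zipfile
from typing import BinaryIO


def _copy_with_size_hint(source: BinaryIO, targetpath: str, size: int) -> None:
    """Hypothetical shared helper: pre-size the target, then copy into it."""
    with open(targetpath, "wb") as target:
        # The single place to change if the size-hint policy ever changes.
        target.truncate(size)
        shutil.copyfileobj(source, target)


def extract_tar_member(tar: tarfile.TarFile, name: str, targetpath: str) -> None:
    member = tar.getmember(name)
    source = tar.extractfile(member)
    if source is None:
        raise ValueError(f"{name!r} is not a regular file")
    with source:
        _copy_with_size_hint(source, targetpath, member.size)


def extract_zip_member(zf: zipfile.ZipFile, name: str, targetpath: str) -> None:
    info = zf.getinfo(name)
    with zf.open(info) as source:
        # ZipInfo.file_size is the uncompressed size, i.e. the final file length.
        _copy_with_size_hint(source, targetpath, info.file_size)
```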

    @gustaebel
    Mannequin

    gustaebel mannequin commented Nov 8, 2006


    You both still fail to convince me and I still don't see the
    need for action. The only case ATM where this addition makes
    sense (in your opinion) is the Windows OS when using the
    NTFS filesystem and certain conditions are met. NTFS has a
    preallocation algorithm to deal with this. We don't know if
    there is any advantage on FAT filesystems.

    On Linux for example there is a plethora of supported
    filesystems. Some of them may take advantage, others may
    not. Who knows? We can't even detect which filesystem type
    we are currently writing to. Apart from that, the behaviour
    of truncate(arg) with arg > filesize seems to be
    system-dependent.
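For reference, the current Python 3 `io` documentation states that `truncate(size)` may extend a file and leaves the stream position unchanged, and that the contents of the extended area depend on the platform (zero-filled on most systems). The short check below uses a hypothetical `preallocate` helper and relies only on the documented parts: the resulting file size and the unchanged position.

```python
import os
import tempfile


def preallocate(path: str, size: int) -> int:
    """Create `path`, extend it to `size` bytes, and return the stream position."""
    with open(path, "wb") as f:
        f.truncate(size)  # extends the empty file; new contents are platform-dependent
        return f.tell()   # truncate() does not move the position, so this is 0


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "presized.bin")
    position = preallocate(target, 1 << 20)
    print(position, os.path.getsize(target))  # 0 1048576
```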

    So, IMO this is a very special optimization targeted at a
    single platform. The TarFile class is easily subclassable,
    just override the makefile() method and add the two lines of
    code. I think that's what ActiveState's Python Cookbook is for.
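Along the lines of the subclassing suggestion, here is a sketch for Python 3. It assumes CPython's internal `TarFile.makefile(self, tarinfo, targetpath)` hook, which is undocumented and could change between releases; the class name `PresizingTarFile` is hypothetical. Sparse members are handed back to the stock implementation, which has its own seek-based logic for them.

```python
import shutil
import tarfile


class PresizingTarFile(tarfile.TarFile):
    """TarFile that sizes each extracted regular file before copying data.

    Relies on the undocumented TarFile.makefile(tarinfo, targetpath) hook.
    """

    def makefile(self, tarinfo, targetpath):
        if tarinfo.sparse is not None:
            # Sparse members keep the stock extraction logic.
            return super().makefile(tarinfo, targetpath)
        source = self.extractfile(tarinfo)
        with source, open(targetpath, "wb") as target:
            target.truncate(tarinfo.size)  # the size hint: full length up front
            shutil.copyfileobj(source, target)
```

Because `TarFile.open()` is a classmethod, `PresizingTarFile.open("archive.tar")` returns an instance of the subclass, so `extractall()` picks up the overridden method without further changes.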

    BTW, I like my files to grow bit by bit. In case of an
    error, I can detect if a file was not extracted completely
    by comparing the file sizes. Furthermore, a file that grows
    is more common and more what a programmer who uses this
    module might expect.
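The size comparison described above could be automated roughly as follows, using a hypothetical `incomplete_members` helper and the Python 3 `tarfile` API. It only works if files grow as they are written: a file that was pre-sized with `truncate()` already has its final length, so an interrupted extraction would not show up as a size mismatch.

```python
import os
import tarfile


def incomplete_members(tar: tarfile.TarFile, dest: str) -> list:
    """Return names of regular members whose on-disk size differs from the archive."""
    mismatched = []
    for member in tar.getmembers():
        if not member.isreg():
            continue  # directories, links and devices have no payload to compare
        path = os.path.join(dest, member.name)
        if not os.path.isfile(path) or os.path.getsize(path) != member.size:
            mismatched.append(member.name)
    return mismatched
```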

    @gustaebel
    Mannequin

    gustaebel mannequin commented Dec 23, 2006

    Any progress on this one?

    @gustaebel
    Mannequin

    gustaebel mannequin commented Jan 22, 2007

    Closed due to lack of interest.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022