Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BZ2File leaking fd and memory #37709

Closed
nnorwitz mannequin opened this issue Jan 3, 2003 · 15 comments
Closed

BZ2File leaking fd and memory #37709

nnorwitz mannequin opened this issue Jan 3, 2003 · 15 comments
Labels
extension-modules C modules in the Modules dir

Comments

@nnorwitz
Copy link
Mannequin

nnorwitz mannequin commented Jan 3, 2003

BPO 661796
Nosy @gvanrossum, @loewis
Files
  • bz.diff: patch 2, this should be correct now
  • bz2test.diff: update test for the fd leak
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2003-02-11.18:55:19.000>
    created_at = <Date 2003-01-03.19:22:48.000>
    labels = ['extension-modules']
    title = 'BZ2File leaking fd and memory'
    updated_at = <Date 2003-02-11.18:55:19.000>
    user = 'https://bugs.python.org/nnorwitz'

    bugs.python.org fields:

    activity = <Date 2003-02-11.18:55:19.000>
    actor = 'niemeyer'
    assignee = 'niemeyer'
    closed = True
    closed_date = None
    closer = None
    components = ['Extension Modules']
    creation = <Date 2003-01-03.19:22:48.000>
    creator = 'nnorwitz'
    dependencies = []
    files = ['4862', '4863']
    hgrepos = []
    issue_num = 661796
    keywords = ['patch']
    message_count = 15.0
    messages = ['42252', '42253', '42254', '42255', '42256', '42257', '42258', '42259', '42260', '42261', '42262', '42263', '42264', '42265', '42266']
    nosy_count = 4.0
    nosy_names = ['gvanrossum', 'loewis', 'nnorwitz', 'niemeyer']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue661796'
    versions = ['Python 2.3']

    @nnorwitz
    Copy link
    Mannequin Author

    nnorwitz mannequin commented Jan 3, 2003

    The attached patch fixes BZ2File() leaking the fd and
    PyFileObject* info (name, read ahead buffer) when an
    object is deleted.

    I'm not sure the patch fixes these problems in the best
    way. It exposes most of the implementation from
    fileobject.c::file_dealloc() through a private API
    _PyFile_Dealloc().
    BZ2File derives from PyFileObject.

    Make sure the file: 'empty' exists and do:

      from bz2 import *

    A simple test which demonstrates the problem is:

      for i in range(100000): o = BZ2File('empty') ; del o

    Without the patch, you quickly get:

    IOError: [Errno 24] Too many open files: 'empty'

    You can modify this to:

      for i in range(100000): o = BZ2File('empty') ;
    o.close() ; del o

    Now you can see the memory leaks.

    @nnorwitz nnorwitz mannequin closed this as completed Jan 3, 2003
    @nnorwitz nnorwitz mannequin assigned niemeyer Jan 3, 2003
    @nnorwitz nnorwitz mannequin added the extension-modules C modules in the Modules dir label Jan 3, 2003
    @nnorwitz nnorwitz mannequin closed this as completed Jan 3, 2003
    @nnorwitz nnorwitz mannequin assigned niemeyer Jan 3, 2003
    @nnorwitz nnorwitz mannequin added the extension-modules C modules in the Modules dir label Jan 3, 2003
    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Jan 3, 2003

    Logged In: YES
    user_id=21627

    Assigning to the author of the module for review. Gustavo,
    if you can't find the time for a review, feel free to
    unassign it.

    @nnorwitz
    Copy link
    Mannequin Author

    nnorwitz mannequin commented Jan 5, 2003

    Logged In: YES
    user_id=33168

    I've found a better solution: change the last line of
    BZ2File_dealloc() from:
    ((PyObject*)self)->ob_type->tp_free((PyObject *)self);

    to
      PyFile_Type.tp_dealloc((PyObject *)self);
    Updated patch attached.

    @gvanrossum
    Copy link
    Member

    Logged In: YES
    user_id=6380

    Um, I don't understand why the BZ2File class inherits from
    FileObject. It doesn't seem to be inheriting any code, and
    there's no reason to inherit from FileObject just to make a
    file-like object. Maybe it was a misunderstanding?

    @niemeyer
    Copy link
    Mannequin

    niemeyer mannequin commented Jan 20, 2003

    Logged In: YES
    user_id=7887

    Sorry about the delay. I was on a not-so-short vacation.

    nnorwitz:

    Thanks for the patch. I'll check the problem (for my own
    understanding) and most certainly apply it.

    gvanrossum:

    BZ2File used to share a lot of code with FileObject.
    Unfortunately (for the BZ2File side), FileObject changed
    considerably post 2.2, and some of the code had to be moved
    to BZ2File itself. It still
    uses some code from FileObject, though (open, close, seek,
    part of the softspace handling, access to attributes and
    some common methods).

    @gvanrossum
    Copy link
    Member

    Logged In: YES
    user_id=6380

    FileObject sharing: I see. But I still think it's a bad
    idea, and you're better off using stdio methods directly (as
    the evolution of the FileObject implementation has already
    shown).

    @niemeyer
    Copy link
    Mannequin

    niemeyer mannequin commented Jan 22, 2003

    Logged In: YES
    user_id=7887

    Ok. Do you want me to work on this for 2.3?

    @gvanrossum
    Copy link
    Member

    Logged In: YES
    user_id=6380

    If you could, that would be great.

    @niemeyer
    Copy link
    Mannequin

    niemeyer mannequin commented Feb 2, 2003

    Logged In: YES
    user_id=7887

    Ok, I've started to "unparent" the bz2 code. OTOH, I've just
    realised what a lot of code, full of #ifdef's, I'll have to
    copy, mostly unchanged. The following functions

    file_new(), file_init(), open_the_file(),
    fill_file_fields(), dircheck(), close_file(),
    get_newlines(), _portable_fseek(), file_seek(), file_repr(),

    and the following arrays

    file_memberlist[], file_getsetlist[]

    My question is, do you think it's better to maintain all
    this code duplicated than maintaining bz2 up to date?

    I'll do the job, if you still think this is great.

    @gvanrossum
    Copy link
    Member

    Logged In: YES
    user_id=6380

    I don't understand. Why would you want to copy all the
    implementation details of those functions? Have a look at
    how the zlib module emulates a file.

    @niemeyer
    Copy link
    Mannequin

    niemeyer mannequin commented Feb 3, 2003

    Logged In: YES
    user_id=7887

    zlib module doesn't emulate a file at C level. It also does
    not implement as many features as the BZ2File does
    (universal newlines, buffered IO, etc).

    The strange thing is, I thought it was a good thing to
    develop inheriting FileObject that way, since it would save
    code and maintenance, while being fast. I also thought it
    was a good model to make BZ2File behave as a subclass (I
    even mentioned that in the documentation). I'm surprised for
    being wrong.

    @gvanrossum
    Copy link
    Member

    Logged In: YES
    user_id=6380

    IMO, being a subclass of a concrete class typically only
    makes sense if you can maintain the meaning of the instance
    variables defined by the base class. In this case I don't
    believe that's the case. If looks like you are using
    subclassing from the file class where you should really be
    using an instance variable that points to a regular file
    object. For example, operations on the file descriptor
    returned by fileno() have a very different effect for a
    BZ2file than for a regular file (since they manipulate the
    compressed file rather than the uncompressed byte stream
    represented by the BZ2file).

    You may also confuse other parts of the Python runtime that
    assume that when they see a file instance they can extract
    the underlying stdio FILE* from it and manipulate that.

    @niemeyer
    Copy link
    Mannequin

    niemeyer mannequin commented Feb 3, 2003

    Logged In: YES
    user_id=7887

    Ok.. I can't find a good way to workaround the presented
    reasons. I'll work to unparent BZ2File.

    @gvanrossum
    Copy link
    Member

    Logged In: YES
    user_id=6380

    Thanks! I expect you'll be surprised at the gained clarity
    of the resulting code.

    @niemeyer
    Copy link
    Mannequin

    niemeyer mannequin commented Feb 11, 2003

    Logged In: YES
    user_id=7887

    Fixed in the following revisions:

    Modules/bz2module.c: 1.15
    Doc/lib/libbz2.tex: 1.6

    Thanks to everybody.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    extension-modules C modules in the Modules dir
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant