
memory leak in random number generation #47313

Closed
gtang mannequin opened this issue Jun 8, 2008 · 11 comments
Labels
performance Performance or resource usage

Comments

@gtang
Mannequin

gtang mannequin commented Jun 8, 2008

BPO 3063
Nosy @tim-one, @facundobatista, @amauryfa
Files
  • unnamed
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2008-06-08.16:59:25.297>
    created_at = <Date 2008-06-08.15:46:06.196>
    labels = ['invalid', 'performance']
    title = 'memory leak in random number generation'
    updated_at = <Date 2010-11-25.00:25:22.764>
    user = 'https://bugs.python.org/gtang'

    bugs.python.org fields:

    activity = <Date 2010-11-25.00:25:22.764>
    actor = 'amaury.forgeotdarc'
    assignee = 'none'
    closed = True
    closed_date = <Date 2008-06-08.16:59:25.297>
    closer = 'facundobatista'
    components = ['None']
    creation = <Date 2008-06-08.15:46:06.196>
    creator = 'gtang'
    dependencies = []
    files = ['10552']
    hgrepos = []
    issue_num = 3063
    keywords = []
    message_count = 11.0
    messages = ['67833', '67834', '67835', '67836', '67837', '67838', '67839', '67841', '67843', '67844', '122321']
    nosy_count = 4.0
    nosy_names = ['tim.peters', 'facundobatista', 'amaury.forgeotdarc', 'gtang']
    pr_nums = []
    priority = 'normal'
    resolution = 'not a bug'
    stage = None
    status = 'closed'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue3063'
    versions = ['Python 2.6']

    @gtang
    Mannequin Author

    gtang mannequin commented Jun 8, 2008

    # the following code consumes about 800M of memory, which is normal
    n = 100000000
    data = [0.0 for i in xrange(n)]

    # however, if I assign random numbers to the data list, it consumes
    # an extra 2.5G of memory
    from random import random
    for i in xrange(n):
        data[i] = random()

    # even if I delete data, only 800M of memory is released
    del data

    # calling gc.collect() does not help; the extra 2.5G is not released
    import gc
    gc.collect()

    Only when I quit Python is the memory released. The effect is the same if
    I use the random number generator from numpy, and the same even if I just
    do data[i] = atof("1.26").
    I tried it in both Python 2.4 and 2.5, on Linux 64-bit and 32-bit.

    @gtang gtang mannequin added the performance Performance or resource usage label Jun 8, 2008
    @facundobatista
    Member

    Confirmed the issue in the trunk right now:

    (the numbers between square brackets refer to the 'top' output below)

    facundo@pomcat:~/devel/reps/python/trunk$ ./python 
    Python 2.6a3+ (trunk:64009, Jun  7 2008, 09:51:56) 
    [GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    [1]
    >>> data = [0.0 for i in xrange(100000000)]
    [2]
    >>> from random import random
    >>> for i in xrange(100000000):
    ...     data[i] = random()
    ... 
    >>> 
    [3]

    The memory consumption:

             PID USER    PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
        [1] 4054 facundo 20  0  5032 3264 1796 S  0.0  0.2 0:00.02 python
        [2] 4054 facundo 20  0  414m 384m 1888 S  0.0 19.1 0:17.72 python
        [3] 4054 facundo 20  0 1953m 1.4g 1952 S  0.0 70.7 1:01.40 python

    @tim-one
    Member

    tim-one commented Jun 8, 2008

    Strongly doubt this has anything to do with random number generation.
    Python maintains a freelist for float objects, which is both unbounded
    and immortal. Instead of doing "data[i] = random()", do, e.g., "data[i]
    = float(s)", and I bet you'll see the same behavior. That is, whenever
    you create a number of distinct float objects simultaneously alive, the
    space they occupy is never released (although it is available to be
    reused for other float objects). The use of random() here simply
    creates a large number of distinct float objects simultaneously alive.
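
    Tim's point about per-object float overhead can be checked directly; here
    is a small sketch for a modern 64-bit CPython (exact sizes vary by build
    and version):

    ```python
    import sys

    # A Python float is a full object: reference count + type pointer +
    # the C double itself, so it costs well over the 8 bytes of raw data.
    print(sys.getsizeof(1.5))  # typically 24 on a 64-bit CPython

    # A list of n distinct floats pays that overhead n times, plus
    # 8 bytes per list slot for the pointer to each object.
    n = 1000
    data = [float(i) for i in range(n)]
    per_float = sys.getsizeof(1.5)
    estimate = n * (per_float + 8)
    print(estimate)  # rough lower bound on the memory the list pins down
    ```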

    @gtang
    Mannequin Author

    gtang mannequin commented Jun 8, 2008

    I agree with Tim's comment. The question is why these floats stay alive
    even after the random() call returns. Does this then become a garbage
    collection issue?

    @tim-one
    Member

    tim-one commented Jun 8, 2008

    They stayed alive simultaneously because you stored 100 million of them
    simultaneously in a list (data[]). If instead you did, e.g.,

    for i in xrange(100000000):
        x = random()

    the problem would go away -- then only two float objects are
    simultaneously alive at any given time (the "old" float in x stays
    alive until the "new" float created by random() replaces it).
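
    The difference Tim describes can be measured; this is a sketch using
    tracemalloc (Python 3) with a smaller count, comparing a loop that keeps
    every float against one that discards them:

    ```python
    import tracemalloc
    from random import random

    def peak_kib(keep):
        """Peak traced memory, in KiB, while generating 100,000 floats."""
        tracemalloc.start()
        data = []
        for _ in range(100_000):
            x = random()
            if keep:
                data.append(x)  # every float stays reachable via the list
            # otherwise the old float in x dies when x is rebound
        peak = tracemalloc.get_traced_memory()[1]
        tracemalloc.stop()
        return peak // 1024

    discarded = peak_kib(False)
    kept = peak_kib(True)
    print(discarded, kept)  # kept is far larger: 100,000 floats alive at once
    ```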

    @gtang
    Mannequin Author

    gtang mannequin commented Jun 8, 2008

    Here I am confused. 100 million floats in a list take about 800M bytes of
    memory. That is acceptable.

    for i in xrange(100000000):
        data[i] = random()

    So it should be 800M plus one float returned by random(). But the problem
    is that after this loop, besides the 800M-byte list, another >2G of memory
    is occupied, and deleting the data list and calling gc.collect() does not
    release it. I think you mean there are lots of floats used inside the
    random() call; those should be released after random() returns.

    @facundobatista
    Member

    So, 0.0 would be cached, and the 414m+384m would be from the list
    itself, right? I tried,

    >>> data = [(1.0/i) for i in xrange(1,100000000)]

    And the memory consumption was the big one.

    Grant, the 800 MB is taken by ONE 0.0, and a list of zillion positions.

    Furthermore, I did:

    >>> for x in xrange(100000000):
    ...     i = random()

    And the memory didn't increase.

    Grant, take note that there's no gc issue, the numbers stay alive
    because the list itself is pointing to them.

    Closing this as invalid.

    @gtang
    Mannequin Author

    gtang mannequin commented Jun 8, 2008

    Facundo:

    I understand now. You mean every unique float number used becomes an
    object in memory, and is never released until Python quits. Is there any
    way to reclaim this memory? We need 3G of memory to create a list of
    100 million random numbers.

    Thank you very much,
    Grant



    @facundobatista
    Member

    Grant,

    A float takes 64 bits. 100 million floats take 800 MB for *just* the
    floats. You're also building a list with 100 million slots.

    Maybe you shouldn't be building this structure in memory?

    In any case, you should raise this on comp.lang.python to get advice.

    Regards,
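
    One compact alternative along these lines (not from the thread; a sketch
    using the stdlib array module) stores raw C doubles with no per-object
    headers and no per-slot pointers:

    ```python
    from array import array
    from random import random

    n = 1_000_000
    # array('d') stores raw C doubles: 8 bytes each, unlike a list of
    # full Python float objects.
    data = array('d', (random() for _ in range(n)))

    print(data.itemsize)      # 8 bytes per double
    print(data.itemsize * n)  # ~8 MB of payload for a million values
    # Indexing still materializes a temporary Python float, but the bulk
    # storage stays at 8 bytes per value.
    ```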

    @tim-one
    Member

    tim-one commented Jun 8, 2008

    Float objects also require, as do all Python objects, space to hold a
    type pointer and a reference count. So each float object requires at
    least 16 bytes (on most 32-bit boxes, 4 bytes for the type pointer, 4
    bytes for the refcount, + 8 bytes for the float). So 100 million float
    objects require at least 1.6 billion bytes.
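
    Working Tim's arithmetic through for both builds (a sketch; the
    reporter's "extra 2.5G" was observed on a 64-bit box, where the header
    fields are 8 bytes each):

    ```python
    n = 100_000_000

    # 32-bit build, per Tim: 4 (type ptr) + 4 (refcount) + 8 (double)
    print(n * (4 + 4 + 8))   # 1_600_000_000 -> ~1.6 GB of float objects

    # 64-bit build: 8 (type ptr) + 8 (refcount) + 8 (double)
    print(n * (8 + 8 + 8))   # 2_400_000_000 -> ~2.4 GB, close to the
                             # "extra 2.5G" the reporter observed

    # plus one pointer per list slot (8 bytes on 64-bit):
    print(n * 8)             # 800_000_000 -> the ~800 MB list itself
    ```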

    It is a gc issue in the sense that the float-object free-list is both
    unbounded and immortal. For that matter, so is the int-object
    free-list. This has been discussed many times over the years on
    python-dev, but nobody yet has a thoroughly attractive alternative.

    @amauryfa
    Member

    For the record, this was finally fixed with bpo-2862: gc.collect() now clears the free-lists during the collection of the highest generation.
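
    The fix can be observed in a modern CPython; this is a sketch
    (sys.getallocatedblocks and the exact counts are CPython-specific, and
    today's float free list is also bounded, so the memory comes back after
    del plus a full collection):

    ```python
    import gc
    import sys

    gc.collect()
    baseline = sys.getallocatedblocks()

    # Keep a million distinct float objects alive at once.
    data = [float(i) for i in range(1_000_000)]
    during = sys.getallocatedblocks()

    del data
    gc.collect()  # since bpo-2862, a full collection also clears free lists
    after = sys.getallocatedblocks()

    print(during - baseline)  # on the order of a million extra blocks
    print(after - baseline)   # back near the baseline
    ```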

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022