Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reducing tee() memory footprint #60598

Closed
cami mannequin opened this issue Nov 3, 2012 · 8 comments
Closed

Reducing tee() memory footprint #60598

cami mannequin opened this issue Nov 3, 2012 · 8 comments
Assignees
Labels
docs Documentation in the Doc dir type-feature A feature request or enhancement

Comments

@cami
Copy link
Mannequin

cami mannequin commented Nov 3, 2012

BPO 16394
Nosy @rhettinger, @phmc
Files
  • tee.py: Sample code for global-queue tee()
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2016-04-26.07:11:17.756>
    created_at = <Date 2012-11-03.15:15:19.373>
    labels = ['type-feature', 'docs']
    title = 'Reducing tee() memory footprint'
    updated_at = <Date 2016-04-26.07:11:17.755>
    user = 'https://bugs.python.org/cami'

    bugs.python.org fields:

    activity = <Date 2016-04-26.07:11:17.755>
    actor = 'rhettinger'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2016-04-26.07:11:17.756>
    closer = 'rhettinger'
    components = ['Documentation']
    creation = <Date 2012-11-03.15:15:19.373>
    creator = 'cami'
    dependencies = []
    files = ['27856']
    hgrepos = []
    issue_num = 16394
    keywords = []
    message_count = 8.0
    messages = ['174632', '174653', '174659', '174669', '174674', '254552', '254694', '264222']
    nosy_count = 5.0
    nosy_names = ['rhettinger', 'docs@python', 'python-dev', 'cami', 'pconnell']
    pr_nums = []
    priority = 'low'
    resolution = 'fixed'
    stage = 'needs patch'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue16394'
    versions = ['Python 2.7', 'Python 3.2', 'Python 3.3', 'Python 3.4']

    @cami
    Copy link
    Mannequin Author

    cami mannequin commented Nov 3, 2012

    The memory footprint of itertools.tee can be reduced substantially by using a shared buffer for the child iterators (see sample code). If local queues are desired for efficient threading support, they can be combined with a global queue, allowing to constrain the size of local queues.

    @cami cami mannequin added stdlib Python modules in the Lib dir performance Performance or resource usage labels Nov 3, 2012
    @serhiy-storchaka
    Copy link
    Member

    Why do you think that your implementation uses less memory? You have some measurements confirming it?

    @cami
    Copy link
    Mannequin Author

    cami mannequin commented Nov 3, 2012

    No. The sample code is a demonstration how to do it, it's by no means a full-fledged patch.

    The drawback of the current implementation is that if you tee n-fold, and then advance one of the iterators m times, it fills n queues with m references each, for a total of (n-1)*m references. The documentation explicitly mentions this is unfortunate.

    I only demonstrate that it is perfectly unnecessary to fill n separate queues, as you can use a single queue and index into it. Instead of storing duplicate references, you can store a counter with each cached item reference. Replacing duplicates by refcounts, it turns (n-1)*m references into 2*m references (half of which being the counters).

    Not in the demo code: you can improve this further by storing items in iterator-local queues when that iterator is the only one that still needs to return it, and in a shared queue with refcount when there are more of them. That way, you eleminate the overhead of storing (item, 1) instead of item for a fix-cost per-iterator.

    @serhiy-storchaka
    Copy link
    Member

    The documentation explicitly mentions this is unfortunate.

    Ah, here's the thing. The code in the documentation is provided only for the explanation. In fact tee() is implemented in the same way that you suggest (but uses a more efficient deque implementation). I propose to close this issue as invalid. Thanks for the suggestion.

    @cami
    Copy link
    Mannequin Author

    cami mannequin commented Nov 3, 2012

    Oh great! Then I can use it as-is. How about reassigning the issue to documentation (for clarifying the inefficiency warning)?

    @cami cami mannequin added docs Documentation in the Doc dir and removed stdlib Python modules in the Lib dir labels Nov 3, 2012
    @cami cami mannequin assigned docspython Nov 3, 2012
    @serhiy-storchaka serhiy-storchaka added type-feature A feature request or enhancement and removed performance Performance or resource usage labels Nov 3, 2012
    @RamchandraApte RamchandraApte mannequin changed the title Improving tee() memory footprint Reducing tee() memory footprint Nov 4, 2012
    @rhettinger rhettinger assigned rhettinger and unassigned docspython Apr 19, 2013
    @serhiy-storchaka
    Copy link
    Member

    Raymond, do you want to add a clarification to the documentation, or prefer just to close this issue?

    @rhettinger
    Copy link
    Contributor

    Unlike the other "equivalents" in the docs, this one is further away from the actual implementation. So, I think I should add a clarification that the example code is a "rough logical equivalent" but that actual implementation may have shared queues (Jython, PyPy, and IronPython are free to have their own implementation choices).

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 26, 2016

    New changeset f09306f9fa6f by Raymond Hettinger in branch 'default':
    Issue bpo-16394: Note the tee() pure python equivalent is only a rough approximation.
    https://hg.python.org/cpython/rev/f09306f9fa6f

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    docs Documentation in the Doc dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants