New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WaitIterators: only return a given object once #1290

Closed
wants to merge 6 commits into
base: master
from

Conversation

Projects
None yet
2 participants
@hashbrowncipher
Contributor

hashbrowncipher commented Oct 16, 2018

Depending on feedback, this PR may supersede #1288.

There are detailed commit messages describing what is going on. As a summary, this PR:

  • documents a pitfall of gevent.wait

Other than documentation, the alternative would be to break all calls to gevent.wait that include Events or Semaphores for count > 1, but this a) seems very backward-incompatible, and b) would require classifying all AbstractLinkables into resettable and non-resettable

  • adds a test that WaitIterators can be iterated using just next() and without iter(), and makes it pass.
  • fixes a memory leak alluded to by a comment in the WaitIterator code

I did this by decythonizing the WaitIterator class. The pro is that we can now use the Python __del__, which cython objects apparently cannot do. What are the cons?

One alternative I just thought of would be to split WaitIterator into a cython WaitIterator (no timeout functionality and therefore no possibility of leakage) and a Python subclass WaitIteratorWithTimeout.

  • makes WaitIterator only return unique objects

Right now a WaitIterator can return the same object many times, which I think is very unexpected and probably pathological. I'm curious for feedback on backward compatibility, but my thought is that the current behavior is so broken (pardon me) that nobody would be relying on it.

PS: I think it might be worthwhile to expose WaitIterator (or something like it) as a documented interface, with add and remove methods so that users can dynamically choose greenlets to watch.

PPS: I had a lot of fun with the test suite while composing these changes. I especially liked the part where it kept track of reference counts and uncollectible garbage for me. Once I got the tests passing (and not a moment before), I was delighted to have such a detailed test suite.

@hashbrowncipher

This comment has been minimized.

Contributor

hashbrowncipher commented Oct 16, 2018

Hmm the travis-ci build on my fork did not have those two failures.

@jamadden

This comment has been minimized.

Member

jamadden commented Oct 16, 2018

Thanks for the PR! I appreciate the effort put into this.

However, gevent does not use finalizers/__del__, for many of the same reasons that relying on open files to close themselves issues ResourceWarning: unclosed file, including:

  • They provide non-deterministic behaviour, running code at some arbitrary point in the future
  • Such behaviour differs wildly on different implementations of Python (reference counting vs full GC)
  • On some versions of CPython, objects with a __del__ are not eligible for cyclic garbage collection at all

hashbrowncipher added some commits Oct 14, 2018

Make WaitIterator always return unique objects.
Instead of keeping count of the number of objects to return, this has
WaitIterator use a set to keep track of the objects it watches.
Document a danger of `gevent.wait` with resettable objects.
When `gevent.wait()` is used on Event and Semaphore objects and `count` is not
1, those objects may be acquired (for Semaphore) or cleared (for Event) before
they are returned by `wait()`. Since `iwait()` is an iterable, it is not subject
to the same problem.
Add test that objects returned by gevent.iwait are iterators
Right now, this code hangs:

    next(gevent.iwait(greenlets))

This is because the WaitIterator initialization happens when __iter__ is called.
Before initialization, nothing is actually being waited-upon

    ERROR: test_wait_noiter (__main__.TestWaiting)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/vagrant/gevent/src/greentest/greentest/leakcheck.py", line 200, in wrapper
        return _RefCountChecker(self, method)(args, kwargs)
      File "/home/vagrant/gevent/src/greentest/greentest/leakcheck.py", line 184, in __call__
        self._run_test(args, kwargs)
      File "/home/vagrant/gevent/src/greentest/greentest/leakcheck.py", line 109, in _run_test
        self.function(self.testcase, *args, **kwargs)
      File "/home/vagrant/gevent/src/greentest/greentest/errorhandler.py", line 48, in wrapper
        return method(self, *args, **kwargs)
      File "/home/vagrant/gevent/src/greentest/greentest/errorhandler.py", line 35, in wrapper
        return method(self, *args, **kwargs)
      File "/home/vagrant/gevent/src/greentest/greentest/testcase.py", line 63, in wrapper
        return method(self, *args, **kwargs)
      File "src/greentest/test__wait.py", line 12, in test_wait_noiter
        ready = next(gevent.iwait((sem1, sem2)))
      File "src/gevent/_hub_primitives.py", line 149, in gevent.__hub_primitives._WaitIterator.__next__
        raise
      File "src/gevent/_hub_primitives.py", line 140, in gevent.__hub_primitives._WaitIterator.__next__
        item = self._waiter.get()
      File "src/gevent/_waiter.py", line 192, in gevent.__waiter.MultipleWaiter.get
        Waiter.get(self)
      File "src/gevent/_waiter.py", line 151, in gevent.__waiter.Waiter.get
        return self.hub.switch()
      File "src/gevent/_greenlet_primitives.py", line 59, in gevent.__greenlet_primitives.SwitchOutGreenletWithLoop.switch
        def switch(self):
      File "src/gevent/_greenlet_primitives.py", line 59, in gevent.__greenlet_primitives.SwitchOutGreenletWithLoop.switch
        def switch(self):
      File "src/gevent/_greenlet_primitives.py", line 63, in gevent.__greenlet_primitives.SwitchOutGreenletWithLoop.switch
        return _greenlet_switch(self) # pylint:disable=undefined-variable
      File "src/gevent/__greenlet_primitives.pxd", line 35, in gevent.__greenlet_primitives._greenlet_switch
        return PyGreenlet_Switch(self, NULL, NULL)
    LoopExit: This operation would block forever
        Hub: <QuietHub '' at 0x7fc8fce60520 epoll default pending=0 ref=0 fileno=3 thread_ident=0x7fc8fffbe700>
        Handles:
    []
Add a test showing that WaitIterator will return the same object many…
… times

If someone calls `gevent.iwait([event1,...79 other objects])`, the iterator
will yield 80 objects, and (based on how frequently each of the objects becomes
ready), could potentially yield `event1` every time. I imagine this would catch
most users off-guard, especially because the number of other objects waited upon
(79, in this case) determines the number of times iwait will yield.

    FAIL: test_wait_unique (__main__.TestWaiting)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/vagrant/gevent/src/greentest/greentest/leakcheck.py", line 200, in wrapper
        return _RefCountChecker(self, method)(args, kwargs)
      File "/home/vagrant/gevent/src/greentest/greentest/leakcheck.py", line 184, in __call__
        self._run_test(args, kwargs)
      File "/home/vagrant/gevent/src/greentest/greentest/leakcheck.py", line 109, in _run_test
        self.function(self.testcase, *args, **kwargs)
      File "/home/vagrant/gevent/src/greentest/greentest/errorhandler.py", line 48, in wrapper
        return method(self, *args, **kwargs)
      File "/home/vagrant/gevent/src/greentest/greentest/errorhandler.py", line 35, in wrapper
        return method(self, *args, **kwargs)
      File "/home/vagrant/gevent/src/greentest/greentest/testcase.py", line 63, in wrapper
        return method(self, *args, **kwargs)
      File "src/greentest/test__wait.py", line 19, in test_wait_unique
        self.assertEqual(waited_objs, set([sem1, sem2]))
    AssertionError: Items in the second set but not the first:
    <gevent.__semaphore.Semaphore object at 0x7fa576932c50>

@hashbrowncipher hashbrowncipher force-pushed the hashbrowncipher:waititerator_unique branch from 784ff00 to 2d219ef Oct 16, 2018

@hashbrowncipher

This comment has been minimized.

Contributor

hashbrowncipher commented Oct 16, 2018

Ok, I've rebased the __del__ commits out of the branch.

# XXX: If iteration doesn't actually happen, we
# could leave these links around!
if not self._begun:
self._begun = True
self._objects = set(objects)

This comment has been minimized.

@jamadden

jamadden Oct 16, 2018

Member

I don't think we can count on objects being hashable. Overall the simpler PR in #1288 is probably better.

@hashbrowncipher hashbrowncipher changed the title from Improvements to WaitIterators to WaitIterators: only return a given object once Oct 16, 2018

@hashbrowncipher

This comment has been minimized.

Contributor

hashbrowncipher commented Oct 17, 2018

@jamadden now that #1292 and #1288 are merged, I'm curious to hear more about why we should not enforce uniqueness of objects returned by iwait().

In #1292, you mentioned "Unlinking objects as they become ready breaks the use case that deliberately resets them and lets them become ready, up to count number of times (think of queue processors)." It sounds like you're thinking of a use-case like:

    def test_iwait_multiple(self):
        sem1 = Semaphore(0)
        sem2 = Semaphore(0)

        def release():
            while True:
                sem1.release()
                gevent.idle()

        let = gevent.spawn(release)
        count = 0
        with gevent.iwait((sem1, sem2), count=20):
            sem1.acquire()
            count += 1

        self.assertEqual(count, 20)

But to make this work would require both a) the user knows the number of iterations they expect, in advance of calling iwait() and b) the number of iterations is smaller than the number of objects (because WaitIterator automatically reduces any count greater than len(objects) to len(objects). So I'm having trouble seeing how a use like a queue processor could benefit from this behavior, and the above test-case fails against the code now in master.

You've also mentioned that we shouldn't require that objects passed to iwait be hashable. All of the objects which I can think people might use in this context are hashable, including Popen, Semaphore, Event, greenlet and Greenlet. And anyone who makes their own class implementing rawlink() will automatically get hashability based on the object's id(), right?

I'm thinking back to my postscript in the PR description. If it isn't possible to modify iwait() to only return unique objects, would it be worthwhile to introduce a new interface, where users can add and remove unique objects to be waited upon simultaneously?

@hashbrowncipher

This comment has been minimized.

Contributor

hashbrowncipher commented Oct 17, 2018

Oops, the test case should be:

    def test_iwait_multiple(self):
        sem1 = Semaphore(0)
        sem2 = Semaphore(0)

        def release():
            while True:
                sem1.release()
                gevent.idle()

        let = gevent.spawn(release)
        count = 0
        with gevent.iwait((sem1, sem2), count=20) as iterator:
            for i in iterator:
                i.acquire()
                count += 1

        self.assertEqual(count, 20)
        let.kill()

And it fails with: AssertionError: 2 != 20

@jamadden

This comment has been minimized.

Member

jamadden commented Oct 17, 2018

I'm curious to hear more about why we should not enforce uniqueness of objects returned by iwait().
You've also mentioned that we shouldn't require that objects passed to iwait be hashable.

In a nutshell, those new restrictions are potentially breaking changes, and I don't really see much benefit to them that can't be accomplished today. E.g., if users want to pass in a set of objects, they can already do that. Additionally, we'd be adding overhead for the common case of a static list of objects that the user knows by inspection are already unique.

One thing I've learned repeatedly maintaining gevent and other projects is that if something is allowed, somebody out there is probably taking advantage of it. Whether or not it's explicitly documented. So making changes that tighten contracts, implicitly or explicitly, is a careful balancing act between a lot of factors. (I freely admit I don't always get that balance right.) (One quick semi-related example: people sometimes pass what are supposed to be positional parameters as keyword arguments, even to internal functions; changing a positional parameter name can wind up breaking code. There's a whole part of the test suite that makes sure that positional parameter names match the standard lib because of that.)

And anyone who makes their own class implementing rawlink() will automatically get hashability based on the object's id(), right

Not if they implement __eq__ on Python 3; they must explicitly also implement __hash__. Otherwise you get a TypeError

If it isn't possible to modify iwait() to only return unique objects, would it be worthwhile to introduce a new interface, where users can add and remove unique objects to be waited upon simultaneously?

I think the primitive building blocks for that are already public and that could be done today if it was really needed. To consider extending gevent's interface in that direction I'd need to see real use-cases, ideally in different codebases, to suggest that it's a common enough need that can't be easily met otherwise.

I can imagine that there might be a use-case for optionally unlinking objects as they are yielded, but that seems like it's rare, and if that's a problem it might be that the design is too complicated (i.e., i-waiting on and setting the same event from multiple greenlets at the same time is some sort of complicated multi-fan-in-fan-out thing that can probably be simplified). But I haven't thought too much about it.

@hashbrowncipher

This comment has been minimized.

Contributor

hashbrowncipher commented Oct 17, 2018

Thanks for the detailed explanation. I'd like to clarify two points and suggest one idea.

Additionally, we'd be adding overhead for the common case of a static list of objects that the user knows by inspection are already unique.

The problem raised in this PR is not related to the input of iwait, but rather its output. The user may know plenty about their input, but the goal of this PR was to ensure that iwait produces sane output by not allowing one (e.g.) Semaphore to be emitted 20 times, while the other 19 items are emitted not at all.

Not if they implement eq on Python 3; they must explicitly also implement hash. Otherwise you get a TypeError

Ooh, I didn't know that. Thanks for informing me. As a workaround (and assuming we wanted to go ahead with any version of this PR), we could potentially use a dict() indexed by id(obj)

Thanks again for your responses. If any of this is news for you, I'd be happy to continue iterating on this PR. If otherwise, it's probably time to close discussion.

@jamadden

This comment has been minimized.

Member

jamadden commented Oct 17, 2018

Thanks for a great discussion! It's not clear to me right now that these changes are something gevent should move forward with, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment