Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-38267: Add thread timeout parameter to loop.shutdown_default_executor() #16360

Closed

Conversation

aeros
Copy link
Contributor

@aeros aeros commented Sep 24, 2019

@aeros
Copy link
Contributor Author

aeros commented Sep 24, 2019

Adding documentation changes and Misc/NEWS next. Before doing so, I wanted to make sure the implementation and API were correct.

Lib/asyncio/base_events.py Outdated Show resolved Hide resolved
@@ -66,6 +66,8 @@
# Maximum timeout passed to select to avoid OS limitations
MAXIMUM_SELECT_TIMEOUT = 24 * 3600

# Default timeout for joining each thread in `shutdown_default_executor()`
THREAD_JOIN_TIMEOUT = 300
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if this name is suitable, or if anyone has a better suggestion.

@1st1 1st1 added needs backport to 3.8 only security fixes and removed needs backport to 3.8 only security fixes labels Sep 24, 2019
Lib/asyncio/base_events.py Outdated Show resolved Hide resolved
@@ -21,6 +25,10 @@ def run(main, *, debug=False):
It should be used as a main entry point for asyncio programs, and should
ideally only be called once.

Each thread within the threadpool is given a timeout duration of 5
minutes. If the thread hasn't finishing joining within that duration,
Copy link
Contributor Author

@aeros aeros Sep 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@1st1 While writing the documentation, I was thinking that @asvetlov's suggestion to add timeout as parameter for asyncio.run() might be beneficial. Although it would add an additional knob for users to adjust, forcing them to strictly use a 5 minute timeout duration seems a bit restrictive.

Do you think it might be useful to give users the option to adjust the timeout duration by adding a timeout parameter to asynio.run()? Since you regularly interact with daily users of asyncio, you might have a better idea of whether or not that would be useful.

Copy link
Member

@1st1 1st1 Sep 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that adding knobs is only justified when there's request for that from users. In this particular case it's a bit tricky.

The way I think about it is that it's easy to implement whatever timeout is necessary with the following pattern:

asyncio.run(main())

# versus

async def runner():
  try:
    return await main()
  finally:
    loop = asyncio.get_running_loop()
    await loop.shutdown_default_executor(3600)

asyncio.run(runner())

The argument for this pattern is that in more or less advanced asyncio applications there's always a "runner()" like in the above. It usually sets up signal handlers, manages global state, etc. asyncio.run() in those cases is simply providing a set of reasonable defaults on how you run your event loop and how you shut it down. And 5 minutes is a reasonable default in this case, I guess.

Long story short, I'd go with this design and learn from our users if we need to make it configurable.

This is just my opinion on this and I'd love to hear what @asvetlov thinks about it.

Copy link
Contributor Author

@aeros aeros Sep 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The argument for this pattern is that in more or less advanced asyncio applications there's always a "runner()" like in the above.

Ah, I actually hadn't thought as much about how relatively simple it would be for the users to create their own runner. I wonder if it would be worthwhile to document this pattern somewhere in asyncio-task.rst, for those who desire further customization.

And 5 minutes is a reasonable default in this case, I guess.

Yeah 5 minutes seems reasonable for most purposes. If it's an issue, we can easily adjust it by modifying the constant.

Long story short, I'd go with this design and learn from our users if we need to make it configurable.

Good point. It might be a better idea to leave out the parameter initially and then add it in later if the users request it. It would be easy to add the new parameter to the API for asyncio.run() if it's desired, but it's incredibly difficult to remove parameters in the future.

This is just my opinion on this and I'd love to hear what @asvetlov thinks about it.

Personally I have no issue with either of the options, I think both of them would be viable. After reading your latest response, I'm leaning more in favor of keeping the current iteration though.

The more I've been involved with API design, I've increasingly come to realize that there's frequently some degree of disconnect between what the users actually want and what the developers think the users want. It can definitely be worthwhile to start with a minimalist approach and add features based on user feedback, rather than making assumptions. Making too many assumptions can easily lead to feature bloat.

@aeros aeros requested a review from 1st1 September 25, 2019 00:44
@aeros aeros added the type-feature A feature request or enhancement label Sep 25, 2019
@aeros
Copy link
Contributor Author

aeros commented Sep 25, 2019

I can work on resolving the conflicts later tonight or tomorrow after someone gets the chance to look over #16403. The conflicts were probably created from the recently merged #16337. They should be fairly straightforward to resolve though.

@1st1
Copy link
Member

1st1 commented Oct 3, 2019

Do we really have it configured to wait for 600 seconds? Wasn't it 5 seconds?

I think it might be more straightforward from a design standpoint to have a specific timeout per-thread.

How? It's a threadpool. From the standpoint of the user all threads in the pool are equal.

I also think you're confused here. When you join a threadpool with an N seconds timeout it will simultaneously join all of its threads, giving all of them N seconds. The only difference is if we join all threads simultaneously or join them one by one. Doing the latter isn't a commonly accepted practice, and in reality is just wasteful.

Each thread within the threadpool is given a timeout duration of 5
minutes. If the thread hasn't finishing joining within that duration,
the lock is released and the thread is terminated.

@asvetlov will Python actually terminate a thread if we try to join it and it hits a timeout? IIRC the thread will simply continue to run, no?

@aeros
Copy link
Contributor Author

aeros commented Oct 3, 2019

@1st1

Do we really have it configured to wait for 600 seconds? Wasn't it 5 seconds?

Oops, 600 was a typo, I mean 300 seconds/5 minutes (which is what it currently is). In the original comment, you had suggested 5 minutes:

We can have a global asyncio "reasonable" timeout for shutting down threads -- say 5 minutes

will Python actually terminate a thread if we try to join it and it hits a timeout?

No, not from thread.join(timeout), I was just looking over the implementation yesterday in threading.py. When timeout is not None for thread.join(), the join operation stops blocking once the duration has reached, but the thread remains alive. The way to check if the timeout was reached is with thread.is_alive(). I had misunderstood this part when I initially opened the PR.

Instead, we would have to do something like this, to shutdown the ThreadPoolExecutor when the thread did not join within the timeout duration. I think a warning of some form should be emitted, but that's optional:

    async def shutdown_default_executor(self, timeout=None):
        self._executor_shutdown_called = True
        if self._default_executor is None:
            return
        future = self.create_future()
        thread = threading.Thread(target=self._do_shutdown, args=(future,))
        thread.start()
        try:
            await future
        finally:
            thread.join(timeout)
        if (thread.is_alive()):
            # Timeout was reached before threads finished joining
            # emit warning?
            self._default_executor.shutdown(wait=False)

    def _do_shutdown(self, future):
        try:
            self._default_executor.shutdown(wait=True)
            self.call_soon_threadsafe(future.set_result, None)
        except Exception as ex:
            self.call_soon_threadsafe(future.set_exception, ex)

@aeros
Copy link
Contributor Author

aeros commented Oct 3, 2019

@1st1:

I also think you're confused here. When you join a threadpool with an N seconds timeout it will simultaneously join all of its threads, giving all of them N seconds.

I was under the impression that the threads were being joined concurrently rather than in parallel. Meaning that there's cooperative multitasking being done with overlapping time periods from the threads joining, but each thread is not being joined simultaneously with true parallelism. Am I misunderstanding something?

@1st1
Copy link
Member

1st1 commented Oct 3, 2019

I was under the impression that the threads were being joined concurrently rather than in parallel.

Oh well, in CPython it's grey area :) Most of the time all threads work cooperatively, i.e. that's how the interpreter executes them. The GIL does serialize how opcodes are executed forcing threaded code to cooperate. But thinking about threads in Python in that way isn't productive. Just like it's not really productive to think about parallel vs concurrent threading in languages without the GIL -- ideally your code should be written in such a way that it doesn't care (i.e. access to shared data is properly guarded and serialized).

Trying to give all threads an equal opportunity to finalize & close isn't the job of asyncio code though. The GIL is a limitation that imposes certain restrictions on how threads work in CPython, but we shouldn't care. We should assume that threads in Python work in parallel otherwise it would be insanely hard to reason about them. Moreover, what if you write code that assumes that threads in CPython run cooperatively and then we ship CPython without GIL? :) [that won't happen anytime soon]

Therefore I see no need for a loop like [th.join(timeout=N) for th in executor.threads]. Simply waiting on executor.join_all_threads(timeout=N) (the code snippets are figurative) is enough.

I hope I made myself clear here, feel free to ask more questions.

@aeros
Copy link
Contributor Author

aeros commented Oct 3, 2019

@1st1

But thinking about threads in Python in that way isn't productive.

Ah, I might have had the difference between concurrency vs parallelism a bit excessively drilled into my head then over the last year. It's been a strong area of focus for me recently and I've tried to ensure that I don't mix up the two areas, but perhaps it's been to a fault.

We should assume that threads in Python work in parallel otherwise it would be insanely hard to reason about them.

I can definitely understand this point from the perspective of structuring the code, but from the real-time perspective of the actual amount of time it will take to finish joining the threads, doesn't the amount of time scale with each additional thread that needs to be joined?

I.E. if you have a thread X that takes N amount of time to finishing joining, increasing the count of X to join in the thread pool will increase N. From my understanding, it's not exactly linear, but the total amount of time does increase with each additional thread to join in the current implementation of the GIL. But perhaps this is an unneeded layer of complexity that most users wouldn't consider when specifying a timeout, so from an API perspective it makes more sense for the timeout to be for the thread pool?

I think we should think about it somewhat for creating a reasonable default timeout constant though. I'm not sure if the currently 300 seconds is enough, but if you think it will fit most general use cases for asyncio.run() and we emit some form of warning when it times out it should be fine. This would make it easier for easier to realize that they may have to use their own custom implementation for edge cases.

Moreover, what if you write code that assumes that threads in CPython run cooperatively and then we ship CPython without GIL? :) [that won't happen anytime soon]

I certainly hope that will happen some day! It's actually been a goal of mine to be able to contribute towards that effort, believe it or not. I still have a lot to learn about the GIL and the underlying interpreter implementation before I could even consider trying to help make the GIL unneeded, but I hope that I'm knowledgeable enough to do so at some point in time. (:

I hope I made myself clear here, feel free to ask more questions.

Thanks, this is a fairly complex area and there's definitely a lot that I have to learn. I think that I understand it from an abstract perspective, but sometimes it can be difficult to separate details from the categories of:

  1. Productive - what developers/users of asyncio are/should be thinking about

  2. Underlying implementation - how it actually works, such as GIL limitations

  3. Theoretical - abstract ideas, such as concurrency vs parallelism

Optimally while developing asyncio, I should have an understanding of 2 and 3 while providing an API that focuses on 1. But it's certainly easier said than done sometimes. Do you typically have a strategy for not getting lost in the details and focusing on the productive user experience?

@1st1
Copy link
Member

1st1 commented Oct 3, 2019

Do you typically have a strategy for not getting lost in the details and focusing on the productive user experience?

No strategy, just experience of dealing with code and people. But that's highly subjective and thus I can be wrong. That's why you always want to gather opinions from multiple engineers. In the case of asyncio those multiple engineers is @asvetlov :)

Lib/asyncio/base_events.py Outdated Show resolved Hide resolved
if (thread.is_alive()):
warnings.warn("The ThreadPoolExecutor did not finishing joining"
f"its threads within {timeout} seconds.",
RuntimeWarning)
Copy link
Contributor Author

@aeros aeros Oct 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@1st1 Couple of questions regarding the warning:

  1. Should it be a RuntimeWarning or ResourceWarning?

  2. Should I add a test to test_asyncio/test_base_events.py to make sure the the warning is triggered in an appropriate scenario?

Edit: Regarding the test, I don't think there's a way to test for this without creating a dangling thread due to the nature of the warning. Personally, I don't think it's worth adding an intentional resource leak to the asyncio regression tests just to test the functionality of a single warning. If there's another way to test for the warning without creating a dangling thread, let me know.

@aeros aeros requested a review from 1st1 October 5, 2019 22:13
Set the stacklevel to 2 and fix a spacing issue in the message
@aeros
Copy link
Contributor Author

aeros commented Jan 20, 2020

@1st1 Is there anything that needs to be changed in this PR?

Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for two style nits. I'll try to merge my suggestions.

Lib/asyncio/base_events.py Outdated Show resolved Hide resolved
Doc/library/asyncio-task.rst Outdated Show resolved Hide resolved
@gvanrossum
Copy link
Member

@kumaraditya303 Do you have time to fix the merge conflicts?

@kumaraditya303
Copy link
Contributor

Do you have time to fix the merge conflicts?

I don't have permission to push to this PR so I created #97561.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting merge topic-asyncio type-feature A feature request or enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants