
Fixing cancelled async futures #2666

Merged: 18 commits into redis:master on Mar 29, 2023

Conversation

@chayim (Collaborator) commented Mar 26, 2023

The following pull request contains changes to async cancellation handling. Specifically, all cases where a command is sent to Redis via execute_command (and friends) are now wrapped with a shield. This builds on the helpful proxy provided in #2665.

This change covers the following usage patterns.

  • Async
  • Async pipeline
  • Async pubsub
  • Async sentinel
  • Async cluster
  • Async cluster pipelines

Reproducible tests currently exist for async, async pipeline, and async cluster. Note, however, that the async cluster pipeline behaviour seems at best incorrect.
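For illustration, a minimal self-contained sketch of the shield pattern in question (send_and_parse and execute_command here are simplified stand-ins, not the actual redis-py internals): cancelling the caller raises CancelledError at the await, while the shielded inner task still runs to completion.

    import asyncio

    async def send_and_parse():
        # Stand-in for sending a command to Redis and parsing the reply.
        await asyncio.sleep(0.1)
        return "OK"

    async def execute_command():
        # Run the send/parse step as its own task and shield it, so an
        # outer cancellation cannot interrupt the command mid-protocol.
        return await asyncio.shield(asyncio.ensure_future(send_and_parse()))

    async def main():
        task = asyncio.create_task(execute_command())
        await asyncio.sleep(0)  # let the command start
        task.cancel()           # cancel the caller...
        try:
            await task
        except asyncio.CancelledError:
            pass
        await asyncio.sleep(0.2)  # ...the shielded inner task still completes

    asyncio.run(main())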

Feedback and changes are very much welcome. Bonus points for assistance with testing.

chayim and others added 3 commits March 27, 2023 08:47
@codecov-commenter commented Mar 27, 2023

Codecov Report

Patch coverage: 94.55% and project coverage change: +0.04% 🎉

Comparison is base (326bb1c) 92.25% compared to head (bb65a25) 92.29%.


Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2666      +/-   ##
==========================================
+ Coverage   92.25%   92.29%   +0.04%     
==========================================
  Files         115      116       +1     
  Lines       29788    29893     +105     
==========================================
+ Hits        27481    27590     +109     
+ Misses       2307     2303       -4     
Impacted Files Coverage Δ
tests/test_asyncio/test_cluster.py 97.48% <ø> (+0.10%) ⬆️
tests/test_asyncio/test_connection.py 98.06% <ø> (+1.00%) ⬆️
redis/asyncio/cluster.py 91.82% <81.81%> (+0.40%) ⬆️
redis/asyncio/client.py 92.34% <92.10%> (-0.64%) ⬇️
tests/test_asyncio/test_cwe_404.py 96.90% <96.90%> (ø)
tests/test_asyncio/test_pubsub.py 99.06% <100.00%> (-0.16%) ⬇️

... and 3 files with indirect coverage changes


@sileht (Contributor) left a comment

Even if, in theory, this change should fix the issue, asyncio.CancelledError may never reach this redis-py code due to this bug: aio-libs/async-timeout#229.
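To make that failure mode concrete, a small hedged sketch (read_response, the reader argument, and the 5-second timeout are illustrative, not redis-py's actual code):

    import asyncio
    import async_timeout

    async def read_response(reader: asyncio.StreamReader):
        # On affected async-timeout versions, cancelling the task while
        # it is inside this block can be mistaken for the block's own
        # timeout, raising asyncio.TimeoutError instead of letting
        # CancelledError propagate to the caller's cancellation handling.
        async with async_timeout.timeout(5):
            return await reader.read(1024)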

@chayim (Collaborator, Author) commented Mar 28, 2023

@fps7806 @sileht I saw that when doing some research on this a couple of days back. That's what led me down some of these paths.

IMHO the right thing to do on the security side is this series of changes. Yes, this can lead to the side effect of a true connection issue if the connection has been cancelled. Looking at that, I think we can safely say that the user of the client should handle that, as opposed to the client itself.

We're going to have a look at some try/except work here. If the try/except within the shield triggers properly, I think we're good for release.

Targeting tomorrow.

@chayim (Collaborator, Author) commented Mar 28, 2023

Relevant: python/cpython#28149

@chayim (Collaborator, Author) commented Mar 28, 2023

Taking a final look at things, it appears the best way to solve this is to properly break the pipeline when this timing issue occurs. When the pipeline is broken in only this case, a RuntimeError is raised.

This exception is ultimately raised from lower down, at least in the asyncio streams code: /usr/lib/python3.10/asyncio/streams.py:616.

As a result, the shield needs to catch the RuntimeError exception so that we can trigger the pipeline reset. This is necessary because the CancelledError, as alluded to previously, doesn't trigger, though it should.

Our choices are either to:

  1. Raise the RuntimeError and leave it up to the user. This strikes me as inelegant.

  2. Catch it (as done here). In this case the pipeline will reset properly and return None.

This final change has that in it.
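A rough sketch of option 2, assuming hypothetical names (execute_shielded and its pipe/coro parameters are illustrative, not the exact helpers in this diff):

    import asyncio

    async def execute_shielded(pipe, coro):
        # Run the pipeline's send/parse step under a shield. If the
        # stream layer raises RuntimeError because the connection was
        # broken mid-operation (asyncio/streams.py:616), reset the
        # pipeline and return None instead of surfacing the error.
        try:
            return await asyncio.shield(coro)
        except RuntimeError:
            await pipe.reset()
            return None

Catching the error here keeps the connection state consistent, at the cost of the caller seeing None rather than an exception.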

@chayim (Collaborator, Author) commented Mar 29, 2023

The failures are the result of #2670 and are unrelated.

@chayim merged commit 5acbde3 into redis:master on Mar 29, 2023
46 checks passed
dvora-h added a commit that referenced this pull request Mar 29, 2023
Co-authored-by: James R T <jamestiotio@gmail.com>
Co-authored-by: dvora-h <dvora.heller@redis.com>
dvora-h added a commit that referenced this pull request Mar 29, 2023
Co-authored-by: James R T <jamestiotio@gmail.com>
Co-authored-by: dvora-h <dvora.heller@redis.com>
dvora-h added a commit that referenced this pull request Mar 29, 2023
* Fixing cancelled async futures (#2666)

Co-authored-by: James R T <jamestiotio@gmail.com>
Co-authored-by: dvora-h <dvora.heller@redis.com>

* Version 4.4.4

* fix  tests

* linters

* fixing the test for the 4.4 state

* remove superfluous try-except

---------

Co-authored-by: Chayim <chayim@users.noreply.github.com>
Co-authored-by: James R T <jamestiotio@gmail.com>
Co-authored-by: Chayim I. Kirshen <c@kirshen.com>
@risicle commented Mar 31, 2023

❤️ for the backports, folks.

(Oh, I see 4.3.7 has been tagged but not released yet)

@kristjanvalur (Contributor) commented

This change makes it impossible to cancel a command and retry, because the cancellation never happens and the single connection lock is held indefinitely. So, the unit tests I devised for PR #2506 no longer work.

Is this really intentional? Not to cancel the operation at all? Is there a TL;DR somewhere on why this is desirable?

@kristjanvalur (Contributor) commented

Anyone reading the code will be clueless. "Wait, it is creating a new Task for each operation? That adds latency. And then shielding it? Whatever for?" Even the unit tests have no comments on what specifically they are trying to fix.

@kristjanvalur (Contributor) commented Apr 4, 2023

This testing code now breaks:

    import asyncio
    import pytest
    # assumes r is an async Redis client (e.g. redis.asyncio.Redis) and
    # that this fragment runs inside an async test function

    ready = asyncio.Event()

    async def helper():
        with pytest.raises(asyncio.CancelledError):
            # blocking pop
            ready.set()
            await r.brpop(["nonexist"])
        # If all is well, we can continue.  The following should not hang.
        # DEADLOCK here:
        await r.set("status", "down")

    task = asyncio.create_task(helper())
    await ready.wait()
    await asyncio.sleep(0.01)
    # the task is now sleeping, let's send it an exception
    task.cancel()

The task was interrupted with a CancelledError, but the inner (shielded) task was not interrupted and continues to wait, holding on to the _single_connection_lock, so the second attempt at r.set() is now deadlocked.

Basically, task interruption has been nerfed... Is there a workaround?

@agronholm (Contributor) commented

For what reason does this shield the sub-task running _try_send_command_parse_response()? If I cancel a task that is performing a blocking command on Redis, the shielded subtask crashes with redis.exceptions.ConnectionError: Connection closed by server when the client is closed.

In other words: Always manage your tasks! Having unmanaged tasks lying around is bad practice.

@kristjanvalur (Contributor) commented Apr 20, 2023

@agronholm I believe this PR was a mistake. It is an attempt to fix an issue caused by a regression that I caused last year, but without understanding the problem. I have had a fix for that regression lying around for four months without it being accepted.

Please see PR #2695, which undoes this change and applies the correct fix to the original regression.
