Add StreamManager #6199

verult · 2023-07-15T02:25:48Z

The full StreamManager implementation prototyped in #6145.

Some tasks are deferred:

Add the ability to close the request iterator in order to close the stream. Being able to stop as soon as receiving the signal rather than waiting for the next request is non-trivial. Since it's not core to streaming functionality, this will be a follow-up soon after MVP.
Other TODOs are less critical.

This PR depends on the latest version of duet which has not been released. It contains fixes for AwaitableFuture cancellations.

I'm planning to add a README with design and sequence diagrams. Please let me know if you'd like to hold off on reviewing this PR until that's ready.

Only the last commit needs to be reviewed. All other commits are from #6190.

@maffoo @wcourtney

verult · 2023-07-15T02:26:29Z

Btw, if there's a way to set a PR as diffbase, please let me know!

verult · 2023-07-15T02:29:02Z

My recommendation is to review methods with long docstrings first - they are the most important pieces.

verult · 2023-07-17T23:14:30Z

Fixed a race involving StreamManager.stop(): it cancels the stream coroutine, but cancellation is handled in the asyncio thread, and the duet thread does not wait for the cancellation to finish. I fixed it by handling cancellation as a duet future callback.

The lingering cancellation failure is because the latest fixes in duet, which contains cancellation fixes, is not yet released.

wcourtney

First pass % tests. Really great overall, mostly nits and edge cases to consider. Thanks for all the attention to detail!

cirq-google/cirq_google/engine/stream_manager.py

wcourtney · 2023-07-18T00:21:48Z

cirq-google/cirq_google/engine/stream_manager.py

+
+    def __init__(self, grpc_client: quantum.QuantumEngineServiceAsyncClient):
+        self._grpc_client = grpc_client
+        self._request_queue: asyncio.Queue = asyncio.Queue()


It looks like lines 95-100 duplicate the contents of _reset(). Could we just call that instead?

I used that approach at first, but I got an error somewhere. I think it was a mypy error about self._response_demux being uninitialized before being read. Seems like assignments in __init__() is regarded as special here.

cirq-google/cirq_google/engine/stream_manager.py

wcourtney · 2023-07-18T01:14:32Z

cirq-google/cirq_google/engine/stream_manager.py

+                    # TODO(#5996) add exponential backoff
+                raise e
+
+            # Either when this request is canceled or the _manage_stream() loop is canceled.


either when this request is canceled or...

What's the client interface to cancel a request?

Clarified in the class and submit() docstrings.

wcourtney · 2023-07-18T01:18:20Z

cirq-google/cirq_google/engine/stream_manager.py

+
+    If it is, returns the next stream request to send upon retry. Otherwise, raises an error.
+    """
+    if error.code == Code.PROGRAM_DOES_NOT_EXIST:


Do we have Python 3.10 for match/case statements? https://docs.python.org/3/whatsnew/3.10.html#pep-634-structural-pattern-matching

Good to know this feature is in Python now! Unfortunately Cirq still supports 3.9.

wcourtney · 2023-07-18T01:21:07Z

cirq-google/cirq_google/engine/stream_manager.py

+                parent=project_name, quantum_program=program, quantum_job=job
+            ),
+        )
+        create_job_request = quantum.QuantumRunStreamRequest(


Does this get used as a retry request type somewhere? I didn't find it ctrl+f'ing around.

It's used in _decide_retry_request_or_raise() when there is a StreamError.

wcourtney · 2023-07-18T01:28:34Z

cirq-google/cirq_google/engine/stream_manager.py

+                current_request.create_quantum_program_and_job.quantum_program.name
+            )
+    elif error.code == Code.JOB_DOES_NOT_EXIST:
+        if 'get_quantum_result' in current_request:


Can you talk through this error mechanism? The best cause I can imagine is that the job was deleted, but then we shouldn't create a new one.

If that's true, then we wouldn't call create_job_request anywhere, which also makes thePROGRAM_DOES_NOT_EXIST error handling obsolete.

Good point. What about the case where CreateProgramAndJobRequest isn't sent properly before the stream breaks? A GetResultRequest will be issued but the job doesn't exist. Unfortunately this is indistinguishable from the case where the job was deleted. I would say it's OK for the job to be recreated because if a user truly wants to stop a job execution, they could cancel the future returned by submit(). WDYT?

maffoo · 2023-07-26T12:14:07Z

@verult, can you rebase on or merge with master to resolve the conflicts? `ResponseDemux is still showing up in the diff but that was already merged in #6190.

verult · 2023-07-26T18:28:08Z

Addressed Will's comments and rebased with master.

wcourtney

LGTM % some smallish suggestions. Thanks!!! 🙌

wcourtney · 2023-07-27T00:08:07Z

cirq-google/cirq_google/engine/stream_manager.py

+            google.api_core.exceptions.GoogleAPICallError if the stream breaks with a non-retryable
+                error.
+        """
+        if self._manage_stream_loop_future is None:


It doesn't look like this is updated?

cirq-google/cirq_google/engine/stream_manager.py

…loop_future

maffoo

Some minor comments, but mostly looks good. I still have some concerns about both the duet and asyncio threads mutating the StreamManager without synchronization, but I think we can address those issues in future PRs.

cirq-google/cirq_google/engine/stream_manager.py

maffoo · 2023-08-02T12:55:05Z

cirq-google/cirq_google/engine/stream_manager.py

+
+        There is at most a single instance of this coroutine running.
+        """
+        self._request_queue = asyncio.Queue()


This coroutine runs in the AsyncioExecutor in a separate thread, so I think it should not touch self, since that is also accessed separately in the duet thread. Or at the very least, any access to these fields should be protected with a lock so we don't have multiple threads accessing and modifying self without synchronization.

Thanks a lot for the catch. I also noticed that response_demux is written to in both the duet and asyncio threads. With that and your comment here, I'm getting a better understanding why having local scope is preferred, especially in a multithreaded environment. Thanks!

maffoo · 2023-08-02T12:59:49Z

cirq-google/cirq_google/engine/stream_manager.py

+        while self._request_queue is None:
+            # Wait for the stream coroutine to start.
+            # Ignoring coverage since this is rarely triggered.
+            await asyncio.sleep(1)  # coverage: ignore


This feels awkward. I think a better interface would be not to set _request_stream to a value or None and then check for None here, but rather to use a duet.AwaitableFuture[asyncio.Queue[...]]. Then can just await this future here. This will avoid extra time from an arbitrary pause here, and it also helps with the problem of _manage_stream mutating self, because it can just complete a future instead, which is safe to do from a different thread.

Using duet.AwaitableFuture failed with "task got bad yield", probably because it's written and read in the asyncio thread. I could try resolving this issue again after moving the queue local to the asyncio scope in a follow-up PR.

cirq-google/cirq_google/engine/stream_manager.py

maffoo · 2023-08-02T13:11:12Z

cirq-google/cirq_google/engine/stream_manager.py

+                # TODO(#5996) Consider moving the request future cancellation logic into a future
+                # done callback, so that the the cancellation caller can wait for it to complete.
+                # TODO(#5996) Check the condition that response_future is not done before
+                # cancelling, once request cancellation is moved to a callback.


I don't think we want to use callbacks just to get the semantics that we can wait for server-side cancellation. If we want such semantics, I think it would be better to subclass AwaitableFuture so that the future completion can be delayed compared to the cancel call. One has to be careful though when the actual cancellation involves sending a request to the server, since that is always a fallible operation and we can't wait arbitrarily long.

I don't think we want to use callbacks just to get the semantics that we can wait for server-side cancellation. If we want such semantics, I think it would be better to subclass AwaitableFuture so that the future completion can be delayed compared to the cancel call.

Just for my understanding: why do you think the latter approach is better? Is it because cancel() is a blocking call? I can also see that executing the cancellation routine in the same event loop as the main routine that fulfills the future would be very convenient.

maffoo · 2023-08-02T13:18:08Z

cirq-google/cirq_google/engine/stream_manager.py

+def _to_create_job_request(
+    create_program_and_job_request: quantum.QuantumRunStreamRequest,
+) -> quantum.QuantumRunStreamRequest:
+    """Converted the QuantumRunStreamRequest from a CreateQuantumProgramAndJobRequest to a
+    CreateQuantumJobRequest.
+    """
+    program = create_program_and_job_request.create_quantum_program_and_job.quantum_program
+    job = create_program_and_job_request.create_quantum_program_and_job.quantum_job
+    return quantum.QuantumRunStreamRequest(
+        parent=create_program_and_job_request.parent,
+        create_quantum_job=quantum.CreateQuantumJobRequest(parent=program.name, quantum_job=job),
+    )


nit: call the arg just request to ease readability (the rest of the name duplicates info from the type)

Suggested change

def _to_create_job_request(

create_program_and_job_request: quantum.QuantumRunStreamRequest,

) -> quantum.QuantumRunStreamRequest:

"""Converted the QuantumRunStreamRequest from a CreateQuantumProgramAndJobRequest to a

CreateQuantumJobRequest.

"""

program = create_program_and_job_request.create_quantum_program_and_job.quantum_program

job = create_program_and_job_request.create_quantum_program_and_job.quantum_job

return quantum.QuantumRunStreamRequest(

parent=create_program_and_job_request.parent,

create_quantum_job=quantum.CreateQuantumJobRequest(parent=program.name, quantum_job=job),

)

def _to_create_job_request(

request: quantum.QuantumRunStreamRequest,

) -> quantum.QuantumRunStreamRequest:

"""Converts the QuantumRunStreamRequest from a CreateQuantumProgramAndJobRequest to a

CreateQuantumJobRequest.

"""

program = request.create_quantum_program_and_job.quantum_program

job = request.create_quantum_program_and_job.quantum_job

return quantum.QuantumRunStreamRequest(

parent=request.parent,

create_quantum_job=quantum.CreateQuantumJobRequest(parent=program.name, quantum_job=job),

)

I had this name here because the type is actually QuantumRunStreamRequest.

maffoo · 2023-08-02T13:19:02Z

cirq-google/cirq_google/engine/stream_manager.py

+def _to_get_result_request(
+    create_program_and_job_request: quantum.QuantumRunStreamRequest,
+) -> quantum.QuantumRunStreamRequest:
+    """Converted the QuantumRunStreamRequest from a CreateQuantumProgramAndJobRequest to a
+    GetQuantumResultRequest.
+    """
+    job = create_program_and_job_request.create_quantum_program_and_job.quantum_job
+    return quantum.QuantumRunStreamRequest(
+        parent=create_program_and_job_request.parent,
+        get_quantum_result=quantum.GetQuantumResultRequest(parent=job.name),
+    )


Here too, I'd suggest calling the arg just request

Suggested change

def _to_get_result_request(

create_program_and_job_request: quantum.QuantumRunStreamRequest,

) -> quantum.QuantumRunStreamRequest:

"""Converted the QuantumRunStreamRequest from a CreateQuantumProgramAndJobRequest to a

GetQuantumResultRequest.

"""

job = create_program_and_job_request.create_quantum_program_and_job.quantum_job

return quantum.QuantumRunStreamRequest(

parent=create_program_and_job_request.parent,

get_quantum_result=quantum.GetQuantumResultRequest(parent=job.name),

)

def _to_get_result_request(

request: quantum.QuantumRunStreamRequest,

) -> quantum.QuantumRunStreamRequest:

"""Converts the QuantumRunStreamRequest from a CreateQuantumProgramAndJobRequest to a

GetQuantumResultRequest.

"""

job = request.create_quantum_program_and_job.quantum_job

return quantum.QuantumRunStreamRequest(

parent=request.parent,

get_quantum_result=quantum.GetQuantumResultRequest(parent=job.name),

)

…t/stream-manager-asyncio-heavy

codecov · 2023-08-04T00:18:34Z

Codecov Report

Patch coverage: 99.41% and no project coverage change.

Comparison is base (701538c) 97.37% compared to head (dec6705) 97.37%.
Report is 1 commits behind head on master.

❗ Current head dec6705 differs from pull request most recent head ef1b623. Consider uploading reports for the commit ef1b623 to get more accurate results

Additional details and impacted files

@@           Coverage Diff            @@
##           master    #6199    +/-   ##
========================================
  Coverage   97.37%   97.37%            
========================================
  Files        1116     1116            
  Lines       95708    96042   +334     
========================================
+ Hits        93191    93520   +329     
- Misses       2517     2522     +5

Files Changed	Coverage Δ
cirq-google/cirq_google/engine/stream_manager.py	`98.36% <98.05%> (-1.64%)`	⬇️
...q-google/cirq_google/engine/stream_manager_test.py	`100.00% <100.00%> (ø)`

... and 2 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

…t/stream-manager-asyncio-heavy

verult requested a review from maffoo July 15, 2023 02:25

verult requested review from wcourtney, a team, vtomole and cduck as code owners July 15, 2023 02:25

CirqBot added the size: XL lines changed >1000 label Jul 15, 2023

verult force-pushed the stream-client/stream-manager-asyncio-heavy branch from 3dacd5a to 7635c9c Compare July 15, 2023 02:33

verult mentioned this pull request Jul 17, 2023

StreamManager #6144

Closed

wcourtney requested changes Jul 18, 2023

View reviewed changes

verult force-pushed the stream-client/stream-manager-asyncio-heavy branch from b42424c to 861752e Compare July 26, 2023 18:16

verult added 6 commits July 26, 2023 18:27

Move AsyncioExecutor to separate file.

12c9f5b

Add StreamManager

acc5234

Fix test failures

72e9b60

Fixed cancellation races and asyncio.Queue loop issue

fe252a0

Remove the redundant unsubscribe

93f1979

Addressed wcourtney's comments

13d5cef

verult force-pushed the stream-client/stream-manager-asyncio-heavy branch from 861752e to 13d5cef Compare July 26, 2023 18:27

wcourtney approved these changes Jul 27, 2023

View reviewed changes

Fix linter and coverage failures; added done check to _manage_stream_…

38a89ba

…loop_future

verult mentioned this pull request Jul 31, 2023

Integrate StreamManager with run_sweep() verult/Cirq#17

Closed

maffoo reviewed Aug 2, 2023

View reviewed changes

maffoo approved these changes Aug 2, 2023

View reviewed changes

verult added 2 commits August 3, 2023 23:52

Address maffoo's comments

9a62f66

Merge branch 'master' of github.com:quantumlib/Cirq into stream-clien…

dec6705

…t/stream-manager-asyncio-heavy

Merge branch 'master' of github.com:quantumlib/Cirq into stream-clien…

ef1b623

…t/stream-manager-asyncio-heavy

verult mentioned this pull request Aug 4, 2023

Quantum Engine Stream Client #5996

Open

8 tasks

verult merged commit 14baaa5 into quantumlib:master Aug 4, 2023
33 checks passed

verult mentioned this pull request Aug 4, 2023

Integrate StreamManager with run_sweep() #6233

Merged

This was referenced Oct 20, 2023

Stream client prototype #2: asyncio-heavy #6145

Closed

Thread-safe implementation of AsyncCollector #6132

Closed

Streaming client prototype: EngineClient interface and stream handler #5982

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add StreamManager #6199

Add StreamManager #6199

verult commented Jul 15, 2023

verult commented Jul 15, 2023

verult commented Jul 15, 2023

verult commented Jul 17, 2023 •

edited

wcourtney left a comment

wcourtney Jul 18, 2023

verult Jul 25, 2023

wcourtney Jul 18, 2023

verult Jul 26, 2023

wcourtney Jul 18, 2023

verult Jul 26, 2023

wcourtney Jul 18, 2023

verult Jul 26, 2023

wcourtney Jul 18, 2023

verult Jul 26, 2023

maffoo commented Jul 26, 2023

verult commented Jul 26, 2023

wcourtney left a comment

wcourtney Jul 27, 2023

maffoo left a comment

maffoo Aug 2, 2023

verult Aug 3, 2023

maffoo Aug 2, 2023

verult Aug 3, 2023

maffoo Aug 2, 2023

verult Aug 3, 2023

maffoo Aug 2, 2023

verult Aug 3, 2023

maffoo Aug 2, 2023

verult Aug 3, 2023

codecov bot commented Aug 4, 2023 •

edited

Add StreamManager #6199

Add StreamManager #6199

Conversation

verult commented Jul 15, 2023

verult commented Jul 15, 2023

verult commented Jul 15, 2023

verult commented Jul 17, 2023 • edited

wcourtney left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

maffoo commented Jul 26, 2023

verult commented Jul 26, 2023

wcourtney left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

maffoo left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov bot commented Aug 4, 2023 • edited

Codecov Report

verult commented Jul 17, 2023 •

edited

codecov bot commented Aug 4, 2023 •

edited