compile: allow local notify streams for shuffle dispatch #24159
Merged
mergify[bot] merged 3 commits on Apr 20, 2026
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Fixes a compile/remoterun shuffle-dispatch coordination bug where `sendNotifyMessage` could fail or hang when a dispatch operator is scheduled on the same CN as the coordinator. The fix allows the pipeline notification RPC stream to connect to the local CN address.
Changes:
- Remove the self-connection rejection in `pipelineClient.NewStream`, allowing loopback/local backend streams.
- Add a focused unit test ensuring local backend addresses are forwarded to the underlying `morpc` client.
- Add a root-cause write-up documenting the failure mode and why the fix is safe.
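The shape of the change and its regression test can be sketched as follows. This is a minimal illustration, not the code in `pkg/cnservice/cnclient`: the `streamOpener` interface, the `recordingClient` fake, the `rejectLocal` switch, and the address used are all hypothetical names introduced here. Only the error text `remote run pipeline in local` and the `lock=true` argument come from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// streamOpener is a hypothetical stand-in for the underlying RPC client.
type streamOpener interface {
	NewStream(backend string, lock bool) (string, error)
}

// recordingClient is a test fake that records the arguments it receives.
type recordingClient struct {
	backend string
	lock    bool
	calls   int
}

func (c *recordingClient) NewStream(backend string, lock bool) (string, error) {
	c.backend, c.lock = backend, lock
	c.calls++
	return "stream->" + backend, nil
}

// errLocalBackend mirrors the error text quoted in the PR description.
var errLocalBackend = errors.New("remote run pipeline in local")

// pipelineClient is an illustrative wrapper, not the real type.
// rejectLocal=true reproduces the pre-fix self-connection guard.
type pipelineClient struct {
	localAddr   string
	client      streamOpener
	rejectLocal bool
}

// NewStream forwards the backend to the underlying client with lock=true.
// With the guard disabled, a local backend is forwarded like any other.
func (p *pipelineClient) NewStream(backend string) (string, error) {
	if p.rejectLocal && backend == p.localAddr {
		return "", errLocalBackend
	}
	return p.client.NewStream(backend, true)
}

func main() {
	const local = "127.0.0.1:6001" // hypothetical local CN address

	rec := &recordingClient{}
	fixed := &pipelineClient{localAddr: local, client: rec}
	_, err := fixed.NewStream(local)
	fmt.Println("fixed:", err, "backend:", rec.backend, "lock:", rec.lock)

	old := &pipelineClient{localAddr: local, client: &recordingClient{}, rejectLocal: true}
	_, err = old.NewStream(local)
	fmt.Println("old:", err)
}
```

A test in this style pins the behavior at the wrapper boundary: it asserts on what the fake received rather than on any network activity, so it stays fast and deterministic.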
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| pkg/cnservice/cnclient/client.go | Removes the local-backend guard so notification streams can connect to the local CN address. |
| pkg/cnservice/cnclient/client_test.go | Adds a regression test verifying NewStream permits local backend connections and passes lock=true. |
| doc/issue_24158_send_notify_self_connection.md | Documents root cause, failure mechanism, and rationale for the narrow fix. |
Merge Queue Status
This pull request spent 16 seconds in the queue, including 1 second running CI.
fengttt pushed a commit to fengttt/matrixone that referenced this pull request on May 7, 2026
…n#24159)

This fixes the self-connection bug in the compile/remoterun notification path.

`Scope.RemoteRun` already handles the local-scope case with `ipAddrMatch`, but `Scope.sendNotifyMessage` still creates a pipeline RPC stream for every `RemoteReceivRegInfo.FromAddr`. When one dispatch operator is scheduled on the same CN as the coordinator, `fromAddr` equals the local service address. The old `pipelineClient.NewStream` rejected that loopback connection with `remote run pipeline in local`, so the prepare-done notification path could not be established for the local dispatch operator.

That can break remote dispatch/shuffle coordination for queries such as secondary-index UPDATE plans using shuffle join with `serial_full(...)` and surface as failure or hang.

This PR keeps the existing `RemoteRun` local-execution guard intact, but allows the notification stream itself to connect to the local CN address. That is the smallest fix that restores the existing dispatch notification protocol without refactoring local/remote receiver handling.

Also included:
- a unit test that pins the real regression point in `pipelineClient.NewStream`

Approved by: @XuPeng-SH
LeftHandCold added a commit to LeftHandCold/matrixone that referenced this pull request on Jun 4, 2026
…n#24159)
mergify Bot pushed a commit that referenced this pull request on Jun 7, 2026
…4854)
What type of PR is this?
Which issue(s) this PR fixes:
fixes #24158
What this PR does / why we need it:
This fixes the self-connection bug in the compile/remoterun notification path.
`Scope.RemoteRun` already handles the local-scope case with `ipAddrMatch`, but `Scope.sendNotifyMessage` still creates a pipeline RPC stream for every `RemoteReceivRegInfo.FromAddr`. When one dispatch operator is scheduled on the same CN as the coordinator, `fromAddr` equals the local service address. The old `pipelineClient.NewStream` rejected that loopback connection with `remote run pipeline in local`, so the prepare-done notification path could not be established for the local dispatch operator.

That can break remote dispatch/shuffle coordination for queries such as secondary-index UPDATE plans using shuffle join with `serial_full(...)` and surface as failure or hang.

This PR keeps the existing `RemoteRun` local-execution guard intact, but allows the notification stream itself to connect to the local CN address. That is the smallest fix that restores the existing dispatch notification protocol without refactoring local/remote receiver handling.

Also included:
- a unit test that pins the real regression point in `pipelineClient.NewStream`

Validation
go test ./pkg/cnservice/cnclient ./pkg/sql/compile -count=1
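
The failure mode described above can be illustrated with a small model of the notification fan-out. This is a sketch under stated assumptions rather than the code in `pkg/sql/compile`: `receiver`, `notifyAll`, `guardedOpen`, `permissiveOpen`, and the `cn-*` addresses are hypothetical names, and the real `sendNotifyMessage` does considerably more than open streams. The point is only that one stream is opened per `FromAddr`, so a guard that rejects the local address stops the fan-out when it reaches the local dispatch operator.

```go
package main

import (
	"errors"
	"fmt"
)

// receiver is a hypothetical analogue of one registered remote receiver;
// only the sender address matters for this sketch.
type receiver struct {
	FromAddr string
}

// openFunc models opening a notification stream to an address.
type openFunc func(addr string) error

// guardedOpen models the pre-fix behavior: the local address is rejected.
func guardedOpen(localAddr string) openFunc {
	return func(addr string) error {
		if addr == localAddr {
			return errors.New("remote run pipeline in local")
		}
		return nil
	}
}

// permissiveOpen models the post-fix behavior: every address is accepted.
func permissiveOpen() openFunc {
	return func(addr string) error { return nil }
}

// notifyAll opens one stream per receiver and stops at the first error.
// It returns the addresses reached before any failure.
func notifyAll(receivers []receiver, open openFunc) ([]string, error) {
	notified := make([]string, 0, len(receivers))
	for _, r := range receivers {
		if err := open(r.FromAddr); err != nil {
			return notified, fmt.Errorf("notify %s: %w", r.FromAddr, err)
		}
		notified = append(notified, r.FromAddr)
	}
	return notified, nil
}

func main() {
	const local = "cn-local:6001" // hypothetical coordinator address
	receivers := []receiver{
		{FromAddr: "cn-a:6001"},
		{FromAddr: local}, // dispatch operator scheduled on the coordinator's CN
		{FromAddr: "cn-b:6001"},
	}

	got, err := notifyAll(receivers, guardedOpen(local))
	fmt.Println("guarded:", got, err)

	got, err = notifyAll(receivers, permissiveOpen())
	fmt.Println("permissive:", got, err)
}
```

In this model the guarded variant never reaches the receivers that come after the local one, which is one way a missing prepare-done notification could surface as a failure or a hang further down the pipeline.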