-
Notifications
You must be signed in to change notification settings - Fork 2
Panic fix for sqs task while pushing messages on output channel #27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes an intermittent panic ("send on closed channel") in the SQS task implementation by addressing a race condition in the message polling logic.
Key changes:
- Made
getMessagessynchronous instead of running it as a goroutine - Changed return value from always
nilto the actual error fromgetMessages - Ensured deterministic execution order:
getMessagescompletes beforeRunreturns
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
hladush
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the Run func for sqs was finishing first (but not getMessages) due to non deterministic nature of go routines,
How is it possible?
Run function finishes only if all processReceipts function finishes. processReceipts can finish only if receipts channel is closed. Ant the channel is closed only when getMessages finished.
I'm fine with this change, but this description is not correct
Reason: We were spawning getMessages(func to pull sqs messages) in a separate go routine. If the Run func for sqs was finishing first (but not getMessages) due to non deterministic nature of go routines,(please explain how it's possible) it will still try to push messages in output channel, even though output channel has closed after Run completion.
@hladush Whereas on the other hand previously in getMessages, s.SendData (trying to send on output channel) was experiencing channel closure, causing panic. It is evident from the below error message that getMessage is still trying to send on task's output channel, even though it is closed. And there can only be one reason of output channel closure : Return of the SQS's Run function. The above issue can be reproduced in below conditions:
But thinking of this, I think another condition can still arise where the processReceipts have exited, but getMessage is yet to complete pushing to receipts. In this condition, few messages will still lie in receipts channel, as processReceipts have already exited. Although I haven't yet encountered this condition, just speculating. Do you think we should remove common context from processReceipts, and modify the func to exit only if we dont receive anything from channel? This way we can clear remaining messages even after messagePull has stopped. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@prasadlohakpure few comments.
In particular this usecase I would just add getMessages to the wg because it's easier for understand but what you did it's correct. |
Description
Panic fix for sqs while pushing messages on output channel
Existing behaviour:
Pipeline involving sqs may sometimes give intermittent error
panic: send on closed channel.Expected behaviour:
No intermittent panics for pipelines involving sqs task types.
Reason:
getMessages as well as threads(processReceipts) share the same context. So a certain conditions arises intermittently where processReceipts are exiting due to context cancellation, hence the wg.wait() condition on them gets satisfied, causing the Run func to return, and eventually closing the output channel for task
Whereas on the other hand previously in getMessages, s.SendData (trying to send on output channel) was experiencing channel closure, causing panic.
It is evident from the below error message that getMessage is still trying to send on task's output channel, even though it is closed. And there can only be one reason of output channel closure : Return of the SQS's Run function.
The above issue can be reproduced in below conditions:
Fix:
Make sqs polling sync in nature, and wait for completion of processReceipts(func to push message to output).
Testing:
Tested using
test/pipelines/sqs_with_context_concurrency.yamlTypes of changes
Checklist