Reuse catalyst stream pull node if pull is locked #2168
LGTM
Such a race condition. I'd imagine that the second /pull request would simply be redirected to the node currently pulling it. Does this only happen when the second /pull arrives after the stream started but before the load balancer gets its state to redirect to the right node?
Yeah, from our perspective it happens when the load balancer redirects to a different node AND the stream is not available (so the Mist load balancer does not see it as active). I believe that for Trovo it happens when a streamer starts streaming, stops streaming, and then suddenly starts again (maybe some retry from OBS). So, Trovo does not send
LGTM either way
Co-authored-by: Victor Elias <victor@livepeer.org>
How do we test this to ensure no regressions?
I think the same way as with other changes: test the pull stream in staging (with the stream stopped and started), and then monitor prod. Any other thoughts on how we could test it?
This is a fix for the following scenario that happens from time to time on prod:
1. We receive a /pull request, which is successful.
2. (... /terminate) we receive another /pull request for the same stream.

The workaround for this is to check if the given stream is currently locked (read: it's being pulled atm) and, if yes, to trigger pulling from the same catalyst node. If we redirect to the same node, then Mist handles this.
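The workaround can be illustrated with a minimal sketch. It is not the code from this PR: `PullRegistry`, `resolve_node`, `release`, and the lease timeout are hypothetical names and assumptions, and the lock is kept in memory purely for illustration. The idea is that a second /pull for a stream that is still locked is sent to the node already pulling it, instead of asking the load balancer for a fresh node.

```python
import threading
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class PullLock:
    node: str          # node currently pulling the stream
    expires_at: float  # assumption: locks expire after a lease period


@dataclass
class PullRegistry:
    """Hypothetical in-memory record of which node pulls each stream."""

    lease_seconds: float = 30.0                    # assumed lease length
    clock: Callable[[], float] = time.monotonic    # injectable for tests
    _locks: Dict[str, PullLock] = field(default_factory=dict)
    _mutex: threading.Lock = field(default_factory=threading.Lock)

    def resolve_node(self, stream_id: str, pick_node: Callable[[], str]) -> str:
        """Return the node that should handle a /pull for stream_id.

        If the stream is locked (already being pulled), reuse that node.
        Otherwise ask the load balancer (pick_node) and record the lock.
        """
        with self._mutex:
            now = self.clock()
            lock = self._locks.get(stream_id)
            if lock is not None and lock.expires_at > now:
                return lock.node  # reuse the node that holds the pull
            node = pick_node()
            self._locks[stream_id] = PullLock(node, now + self.lease_seconds)
            return node

    def release(self, stream_id: str) -> None:
        """Drop the lock, e.g. once the pull has ended."""
        with self._mutex:
            self._locks.pop(stream_id, None)
```

In this sketch, two back-to-back calls to `resolve_node` for the same stream return the same node even if the load balancer would have picked a different one the second time; a new node is chosen only after `release` is called or the assumed lease expires.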
Sample logs: https://eu-metrics-monitoring.livepeer.live/grafana/goto/wDCuqDYSg?orgId=1
fix https://linear.app/livepeer/issue/PS-518/more-pull-failuress