
senderNonce replay prevention on O that allows for out-of-order senderNonce values in tickets #2661

Closed
yondonfu opened this issue Nov 21, 2022 · 1 comment

yondonfu commented Nov 21, 2022

Is your feature request related to a problem? Please describe.

This issue occurs whenever multiple segments are pushed to B in parallel, or in quick succession with minimal time delay, via the /live endpoint (i.e. HTTP push ingest). For a livestream, this can happen if the client pushing to B (i.e. MistProcLivepeer in Catalyst) pushes very short segments very quickly, or pushes in parallel by retrying segment N while simultaneously pushing segment N + 1. For a VOD file, this can happen if the client is configured to push multiple segments to B in parallel (i.e. catalyst-api in Catalyst).

The issue can be repro'd by checking out the following branch and running a parallel segment push e2e test:

git checkout yf/fix-sender-nonce
go test -count=1 -v -run TestHTTPPushBroadcaster ./test/e2e

The test should fail with output that looks like this:

E1121 14:43:59.238132   29209 orchestrator.go:179] manifestID=559e2858f20f7ec9c6f7 seqNo=5 orchSessionID=c37fb7db sender=0xFd2f241e6b8Ca65be71f3A23d2badB4ED8BaA5a9 clientIP=127.0.0.1 Error receiving ticket sessionID=c37fb7db recipientRandHash=0c4c1e4ba5bb8d49e4354313636ac7d598810ceb20f15521309c3886bbb2aa49 senderNonce=4: invalid ticket senderNonce sender=0xFd2f241e6b8Ca65be71f3A23d2badB4ED8BaA5a9 nonce=4 highest=5
E1121 14:43:59.238138   29209 segment_rpc.go:101] manifestID=559e2858f20f7ec9c6f7 seqNo=5 orchSessionID=c37fb7db sender=0xFd2f241e6b8Ca65be71f3A23d2badB4ED8BaA5a9 clientIP=127.0.0.1 error processing payment: invalid ticket senderNonce sender=0xFd2f241e6b8Ca65be71f3A23d2badB4ED8BaA5a9 nonce=4 highest=5
E1121 14:43:59.238209   29209 segment_rpc.go:536] manifestID=559e2858f20f7ec9c6f7 nonce=11777106646266507276 seqNo=5 orchSessionID=c37fb7db orchestrator=https://127.0.0.1:8935 clientIP=127.0.0.1 Error submitting segment code=400 orch=https://127.0.0.1:8935 err="invalid ticket senderNonce sender=0xFd2f241e6b8Ca65be71f3A23d2badB4ED8BaA5a9 nonce=4 highest=5\n"
E1121 14:43:59.238232   29209 mediaserver.go:1030] manifestID=559e2858f20f7ec9c6f7 nonce=11777106646266507276 seqNo=5 orchSessionID=c37fb7db clientIP=127.0.0.1 orchestrator=https://127.0.0.1:8935 No sessions available name=5.ts url=http://127.0.0.1:8936/live/559e2858f20f7ec9c6f7/5.ts
    e2e.go:303:
                Error Trace:    /Users/yondonfu/Development/livepeer/go-livepeer/test/e2e/e2e.go:303
                                                        /Users/yondonfu/Development/livepeer/go-livepeer/test/e2e/asm_arm64.s:1165
                Error:          Expected nil, but got: &errors.errorString{s:"bad status code for push statusCode=503"}
                Test:           TestHTTPPushBroadcaster

Notice the error logs before the test error trace with the invalid ticket senderNonce error.

The invalid ticket senderNonce error appears under these circumstances because O currently expects the senderNonce field of tickets it receives with a unique recipientRandHash value to be monotonically increasing. O enforces this by keeping track of the highest senderNonce value it has seen so far and rejecting any ticket whose senderNonce does not exceed that value. This restriction works fine for a typical livestream: there is a time delay before each segment appears and can be submitted to B, so B creates tickets with ascending senderNonce values and O receives those tickets in order along with the segments. The restriction causes issues for a) livestreams where segments are retried or where very short segments appear with very short delays between them, and b) VOD files where all segments are available up front and may be pushed in parallel. In these cases, even if B sets the senderNonce values in ascending order for all segments, O can receive the tickets in a different order.
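To illustrate, here is a minimal Go sketch of the kind of check described above. It is not the actual pm package code; the senderNonceTracker type, ValidateNonce method, and uint32 nonce type are assumptions for illustration only.

// Sketch of the current behavior: O keeps the highest senderNonce seen per
// recipientRandHash and rejects any ticket whose nonce does not exceed it.
// Names here are illustrative, not the real go-livepeer API.
package main

import (
	"fmt"
	"sync"
)

type senderNonceTracker struct {
	mu      sync.Mutex
	highest map[string]uint32 // keyed by recipientRandHash (hex string here)
}

func (t *senderNonceTracker) ValidateNonce(recipientRandHash string, nonce uint32) error {
	t.mu.Lock()
	defer t.mu.Unlock()

	if nonce <= t.highest[recipientRandHash] {
		// An out-of-order delivery of an otherwise valid ticket hits this branch.
		return fmt.Errorf("invalid ticket senderNonce nonce=%d highest=%d", nonce, t.highest[recipientRandHash])
	}
	t.highest[recipientRandHash] = nonce
	return nil
}

func main() {
	t := &senderNonceTracker{highest: make(map[string]uint32)}
	// Tickets arrive out of order: nonce 5 before nonce 4.
	fmt.Println(t.ValidateNonce("0c4c1e4b", 5)) // <nil>
	fmt.Println(t.ValidateNonce("0c4c1e4b", 4)) // invalid ticket senderNonce nonce=4 highest=5
}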


Describe the solution you'd like

Instead of tracking a single highest seen senderNonce in pm.Recipient, we could track a slice with a fixed size of MAX_SENDER_NONCE [1]. If index N in the slice is set to 1, senderNonce N has been seen before; if it is set to 0, it has not. So even if O receives a ticket with senderNonce N + 1 before it receives a ticket with senderNonce N, the second ticket will not be rejected. A rough sketch of this idea is included after the footnote below.

We cap the size of the slice at MAX_SENDER_NONCE to control the memory consumed. Each set of ticket params advertised by O with a unique recipientRandHash will expire and trigger a cleanup of any memory used to track senderNonce values. O also returns a fresh set of ticket params, with a new unique recipientRandHash value, in the response for each segment, so in practice B should only generate senderNonce values for a given recipientRandHash until it receives the response for the segment it sent to O, which limits the # of senderNonce updates to the slice/map on O. One way to think about MAX_SENDER_NONCE is that it describes the max # of tickets that B can send to O up front before B receives a response for a segment.

What would a reasonable value be for MAX_SENDER_NONCE? Maybe something like 20-50.

[1] An alternative could be to use a map with an int counter to track the # of elements in the map.
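For reference, here is a minimal Go sketch of the proposed approach, with assumed names, an assumed maxSenderNonce value, and a map of fixed-size arrays standing in for the slice described above; it is a sketch, not a definitive implementation.

// Sketch of the proposed replay check: track which senderNonce values have been
// seen per recipientRandHash instead of a single highest value, so out-of-order
// tickets are accepted while replays and nonces beyond MAX_SENDER_NONCE are rejected.
// maxSenderNonce, the type names, and the Cleanup hook are illustrative assumptions.
package main

import (
	"fmt"
	"sync"
)

const maxSenderNonce = 32 // e.g. something in the 20-50 range discussed above

type senderNonceTracker struct {
	mu   sync.Mutex
	seen map[string][maxSenderNonce]bool // keyed by recipientRandHash
}

func (t *senderNonceTracker) ValidateNonce(recipientRandHash string, nonce uint32) error {
	t.mu.Lock()
	defer t.mu.Unlock()

	if nonce >= maxSenderNonce {
		return fmt.Errorf("invalid ticket senderNonce nonce=%d max=%d", nonce, maxSenderNonce)
	}
	seen := t.seen[recipientRandHash]
	if seen[nonce] {
		// Replay of a nonce that was already accepted for this recipientRandHash.
		return fmt.Errorf("invalid ticket senderNonce nonce=%d already seen", nonce)
	}
	seen[nonce] = true
	t.seen[recipientRandHash] = seen
	return nil
}

// Cleanup would be triggered when the ticket params for a recipientRandHash expire.
func (t *senderNonceTracker) Cleanup(recipientRandHash string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.seen, recipientRandHash)
}

func main() {
	t := &senderNonceTracker{seen: make(map[string][maxSenderNonce]bool)}
	fmt.Println(t.ValidateNonce("0c4c1e4b", 5)) // <nil>
	fmt.Println(t.ValidateNonce("0c4c1e4b", 4)) // <nil> (out-of-order is now accepted)
	fmt.Println(t.ValidateNonce("0c4c1e4b", 4)) // error: nonce=4 already seen
}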

The spec should be updated if this solution is implemented.


@cyberj0g

Update: the fix is merged, next step is to make sure nonce errors are gone on production.

cyberj0g closed this as completed Feb 1, 2023