
senderNonce replay prevention on O that allows for out-of-order senderNonce values in tickets #2661

Closed
yondonfu opened this issue Nov 21, 2022 · 1 comment

yondonfu commented Nov 21, 2022

Is your feature request related to a problem? Please describe.

This issue occurs whenever multiple segments are pushed to B in parallel, or in quick succession with minimal time delay, via the /live endpoint (i.e. HTTP push ingest). For a livestream, this can happen if the client pushing to B (i.e. MistProcLivepeer in Catalyst) pushes very short segments very quickly, or pushes in parallel by retrying segment N while simultaneously pushing segment N + 1. For a VOD file, this can happen if the client is configured to push multiple segments to B in parallel (i.e. catalyst-api in Catalyst).

The issue can be repro'd by checking out the following branch and running a parallel segment push e2e test:

git checkout yf/fix-sender-nonce
go test -count=1 -v -run TestHTTPPushBroadcaster ./test/e2e

The test should fail with output that looks like this:

E1121 14:43:59.238132   29209 orchestrator.go:179] manifestID=559e2858f20f7ec9c6f7 seqNo=5 orchSessionID=c37fb7db sender=0xFd2f241e6b8Ca65be71f3A23d2badB4ED8BaA5a9 clientIP=127.0.0.1 Error receiving ticket sessionID=c37fb7db recipientRandHash=0c4c1e4ba5bb8d49e4354313636ac7d598810ceb20f15521309c3886bbb2aa49 senderNonce=4: invalid ticket senderNonce sender=0xFd2f241e6b8Ca65be71f3A23d2badB4ED8BaA5a9 nonce=4 highest=5
E1121 14:43:59.238138   29209 segment_rpc.go:101] manifestID=559e2858f20f7ec9c6f7 seqNo=5 orchSessionID=c37fb7db sender=0xFd2f241e6b8Ca65be71f3A23d2badB4ED8BaA5a9 clientIP=127.0.0.1 error processing payment: invalid ticket senderNonce sender=0xFd2f241e6b8Ca65be71f3A23d2badB4ED8BaA5a9 nonce=4 highest=5
E1121 14:43:59.238209   29209 segment_rpc.go:536] manifestID=559e2858f20f7ec9c6f7 nonce=11777106646266507276 seqNo=5 orchSessionID=c37fb7db orchestrator=https://127.0.0.1:8935 clientIP=127.0.0.1 Error submitting segment code=400 orch=https://127.0.0.1:8935 err="invalid ticket senderNonce sender=0xFd2f241e6b8Ca65be71f3A23d2badB4ED8BaA5a9 nonce=4 highest=5\n"
E1121 14:43:59.238232   29209 mediaserver.go:1030] manifestID=559e2858f20f7ec9c6f7 nonce=11777106646266507276 seqNo=5 orchSessionID=c37fb7db clientIP=127.0.0.1 orchestrator=https://127.0.0.1:8935 No sessions available name=5.ts url=http://127.0.0.1:8936/live/559e2858f20f7ec9c6f7/5.ts
    e2e.go:303:
                Error Trace:    /Users/yondonfu/Development/livepeer/go-livepeer/test/e2e/e2e.go:303
                                                        /Users/yondonfu/Development/livepeer/go-livepeer/test/e2e/asm_arm64.s:1165
                Error:          Expected nil, but got: &errors.errorString{s:"bad status code for push statusCode=503"}
                Test:           TestHTTPPushBroadcaster

Notice the error logs before the test error trace with the invalid ticket senderNonce error.

The invalid ticket senderNonce error appears under these circumstances because O currently expects the senderNonce field of tickets it receives with a unique recipientRandHash value to be monotonically increasing. O enforces this by keeping track of the highest senderNonce value it has seen so far and rejecting any ticket whose senderNonce does not exceed that value. This restriction works fine for a typical livestream: there is a time delay before each segment appears and can be submitted to B, so B creates tickets with ascending senderNonce values and O receives those tickets in order along with the segments. The restriction causes issues for a) livestreams where segments are retried or where very short segments appear with very short delays between them, and b) VOD files where all segments are available up front and may be pushed in parallel. In these cases, even if B sets the senderNonce values in ascending order for all segments, O can receive the tickets in a different order.
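To illustrate, here is a minimal Go sketch of the kind of check described above. It is not the actual pm package code; the senderNonceTracker type, ValidateNonce method, and uint32 nonce type are assumptions for illustration only.

// Sketch of the current behavior: O keeps the highest senderNonce seen per
// recipientRandHash and rejects any ticket whose nonce does not exceed it.
// Names here are illustrative, not the real go-livepeer API.
package main

import (
	"fmt"
	"sync"
)

type senderNonceTracker struct {
	mu      sync.Mutex
	highest map[string]uint32 // keyed by recipientRandHash (hex string here)
}

func (t *senderNonceTracker) ValidateNonce(recipientRandHash string, nonce uint32) error {
	t.mu.Lock()
	defer t.mu.Unlock()

	if nonce <= t.highest[recipientRandHash] {
		// An out-of-order delivery of an otherwise valid ticket hits this branch.
		return fmt.Errorf("invalid ticket senderNonce nonce=%d highest=%d", nonce, t.highest[recipientRandHash])
	}
	t.highest[recipientRandHash] = nonce
	return nil
}

func main() {
	t := &senderNonceTracker{highest: make(map[string]uint32)}
	// Tickets arrive out of order: nonce 5 before nonce 4.
	fmt.Println(t.ValidateNonce("0c4c1e4b", 5)) // <nil>
	fmt.Println(t.ValidateNonce("0c4c1e4b", 4)) // invalid ticket senderNonce nonce=4 highest=5
}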


Describe the solution you'd like

Instead of tracking a single highest seen senderNonce in pm.Recipient, we could track a slice with a fixed size of MAX_SENDER_NONCE [1]. If index N in the slice is set to 1, senderNonce N has been seen before; if it is set to 0, it has not. So even if O receives a ticket with senderNonce N + 1 before it receives a ticket with senderNonce N, the second ticket will not be rejected. A rough sketch of this idea is included after the footnote below.

We cap the size of the slice at MAX_SENDER_NONCE to control the memory consumed. Each set of ticket params advertised by O with a unique recipientRandHash will expire and trigger a cleanup of any memory used to track senderNonce values. O also returns a fresh set of ticket params, with a new unique recipientRandHash value, in the response for each segment, so in practice B should only generate senderNonce values for a given recipientRandHash until it receives the response for the segment it sent to O, which limits the # of senderNonce updates to the slice/map on O. One way to think about MAX_SENDER_NONCE is that it describes the max # of tickets that B can send to O up front before B receives a response for a segment.

What would a reasonable value be for MAX_SENDER_NONCE? Maybe something like 20-50.

[1] An alternative could be to use a map with an int counter to track the # of elements in the map.
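For reference, here is a minimal Go sketch of the proposed approach, with assumed names, an assumed maxSenderNonce value, and a map of fixed-size arrays standing in for the slice described above; it is a sketch, not a definitive implementation.

// Sketch of the proposed replay check: track which senderNonce values have been
// seen per recipientRandHash instead of a single highest value, so out-of-order
// tickets are accepted while replays and nonces beyond MAX_SENDER_NONCE are rejected.
// maxSenderNonce, the type names, and the Cleanup hook are illustrative assumptions.
package main

import (
	"fmt"
	"sync"
)

const maxSenderNonce = 32 // e.g. something in the 20-50 range discussed above

type senderNonceTracker struct {
	mu   sync.Mutex
	seen map[string][maxSenderNonce]bool // keyed by recipientRandHash
}

func (t *senderNonceTracker) ValidateNonce(recipientRandHash string, nonce uint32) error {
	t.mu.Lock()
	defer t.mu.Unlock()

	if nonce >= maxSenderNonce {
		return fmt.Errorf("invalid ticket senderNonce nonce=%d max=%d", nonce, maxSenderNonce)
	}
	seen := t.seen[recipientRandHash]
	if seen[nonce] {
		// Replay of a nonce that was already accepted for this recipientRandHash.
		return fmt.Errorf("invalid ticket senderNonce nonce=%d already seen", nonce)
	}
	seen[nonce] = true
	t.seen[recipientRandHash] = seen
	return nil
}

// Cleanup would be triggered when the ticket params for a recipientRandHash expire.
func (t *senderNonceTracker) Cleanup(recipientRandHash string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.seen, recipientRandHash)
}

func main() {
	t := &senderNonceTracker{seen: make(map[string][maxSenderNonce]bool)}
	fmt.Println(t.ValidateNonce("0c4c1e4b", 5)) // <nil>
	fmt.Println(t.ValidateNonce("0c4c1e4b", 4)) // <nil> (out-of-order is now accepted)
	fmt.Println(t.ValidateNonce("0c4c1e4b", 4)) // error: nonce=4 already seen
}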

The spec should be updated if this solution is implemented.


@cyberj0g

Update: the fix is merged, next step is to make sure nonce errors are gone on production.

cyberj0g closed this as completed Feb 1, 2023