Capacity Cap for B #706

eladmallel · 2019-01-23T14:18:14Z

Is your feature request related to a problem? Please describe.
To improve customer experience, B should respond with a clear error message whenever it's at its max capacity of concurrent streams and a new RTMP stream comes in.

Describe the solution you'd like
The user/customer should receive clear feedback from B that it is now at max capacity and therefore cannot handle new streams.

We probably want to add yet another CLI param, such as maxConcurrentStreams of type int. B should reject any incoming stream once its concurrent stream count reaches maxConcurrentStreams.
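As a rough illustration of the parameter idea, here is a minimal sketch using Go's standard flag package. The helper names (parseMaxConcurrentStreams, shouldReject) and the convention that 0 means "no cap" are assumptions made for this example, not existing node behavior.

```go
package main

import (
	"flag"
	"fmt"
)

// parseMaxConcurrentStreams reads a hypothetical -maxConcurrentStreams flag.
// Treating 0 as "no cap" is an assumption made for this sketch.
func parseMaxConcurrentStreams(args []string) (int, error) {
	fs := flag.NewFlagSet("broadcaster", flag.ContinueOnError)
	limit := fs.Int("maxConcurrentStreams", 0, "maximum number of concurrent RTMP streams (0 = unlimited)")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	if *limit < 0 {
		return 0, fmt.Errorf("maxConcurrentStreams must be >= 0, got %d", *limit)
	}
	return *limit, nil
}

// shouldReject reports whether a new stream must be turned away because the
// current stream count has already reached the configured limit.
func shouldReject(current, limit int) bool {
	return limit > 0 && current >= limit
}

func main() {
	limit, err := parseMaxConcurrentStreams([]string{"-maxConcurrentStreams=2"})
	if err != nil {
		panic(err)
	}
	// With a limit of 2, a node holding 1 stream accepts and one holding 2 rejects.
	fmt.Println(shouldReject(1, limit), shouldReject(2, limit))
}
```

The comparison is `>=` rather than `>` so that the stream which would push the count past the cap is the one rejected.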

Not sure yet how best to achieve this goal:

Does RTMP support this kind of response?
Should we rely on webhooks for this kind of response?
Any other approaches?

Describe alternatives you've considered
Discussed above.

Additional context
None.


eladmallel · 2019-01-23T14:18:51Z

@j0sh @ericxtang @darkdarkdragon can you please share your thoughts on how we should approach this?

j0sh · 2019-01-23T18:46:12Z

Does RTMP support this kind of response?

Not really, the best we can do here is disconnect the stream. We might be able to add a special code to the connect response prior to disconnect, but that requires changes to LPMS and probably joy4 (the golang RTMP implementation), in addition to any specialized handling on the client (customer) side. Most RTMP application hooks do not expose this type of detail, so it's probably more trouble than it's worth.

Should we rely on webhooks for this kind of response?

We should certainly be able to stop the stream based on the webhook response, but I'm inclined to also add capacity caps as a separate node-level feature. Enforcing capacity via webhook means the endpoint(s) need to have a synchronized view of the broadcaster's internal state, which is error-prone.

Any other approaches?

This is pretty easy to implement. However, for M2, I'd say O-level caps are more important since this will allow the transcoding network to self-balance.

From experience with other RTMP providers, most customers will probably take a non-explicit approach to load balancing their B nodes:

Over-provision RTMP servers slightly above anticipated demand
Randomly assign customers to servers (whether assigning them a fixed node, or dynamically via ingest restreaming)

Of course, a capacity cap is still useful here, if only to preserve existing QoS at the upper limit and/or alert some monitoring system.

eladmallel · 2019-01-24T09:16:30Z

I'd say O-level caps are more important

I agree that's important, and we have a separate issue for that as part of M2.

This is pretty easy to implement.

I'd love to understand more specifically how you see this implemented? What I had in mind was to introduce a new webhook message that lets the user know why the stream was rejected. In a bit more detail:

We could name this message something like StreamRejectedDueToMaxCapacity
It should contain the host that rejected the stream, and the user-defined ManifestID, so the user has sufficient information to decide how to react
B would invoke this webhook immediately after rejecting an RTMP stream

Does this sound good, or did you have a different approach in mind?

cc @darkdarkdragon

darkdarkdragon · 2019-01-24T14:24:30Z

@j0sh

Enforcing capacity via webhook means the endpoint(s) need to have a synchronized view of the broadcaster's internal state

Can you elaborate on that?

darkdarkdragon · 2019-01-24T14:25:40Z

@eladmallel

We could name this message something like StreamRejectedDueToMaxCapacity

It should contain the host that rejected the stream, and the user-defined ManifestID, so the user can have sufficient information to decide how to react

B would invoke this webhook immediately after rejecting an RTMP stream

Sounds good to me

j0sh · 2019-01-24T20:52:08Z

@eladmallel I may have misinterpreted this part in the writeup: "Should we rely on webhooks for this kind of response?"

Specifically, this sounded like we would be outsourcing the logic for tracking capacity and rejecting streams beyond the max to the endpoint receiving the stream auth webhook (running outside the node). This would be problematic.

If we're just talking about an event notification that the capacity cap has been reached -- then all this sounds fine. I see that it's also mentioned within the issue for system-wide instrumentation, #671, so perhaps the specific approach to notifications (webhooks, existing metrics, collectd, something else) could be decided as part of addressing the instrumentation feature.

It should contain the host that rejected the stream

Note this will probably require changes deeper down, since I don't think connection-level information is exposed to the goclient. But having such info available would generally be useful, so I'm in favor of making this change eventually.

@darkdarkdragon Happy to elaborate on the problematic aspects if it's still relevant, but didn't want to get sidetracked over my own misunderstanding.

I'd love to understand more specifically how you see this implemented?

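// Check this within createRTMPStreamIDHandler and/or gotRTMPStreamIDHandler.
// Reports whether the node already holds MaxBroadcastStreams RTMP connections.
func (s *LivepeerServer) isAtCapacity(u *url.URL) bool {
    s.connectionLock.RLock()
    defer s.connectionLock.RUnlock()
    // Use >= so the stream that would exceed the cap is the one rejected.
    if len(s.rtmpConnections) >= MaxBroadcastStreams {
        go sendAtCapacityNotif(u) // notify asynchronously so the handler is not blocked
        return true
    }
    return false
}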
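For anyone who wants to experiment with the idea outside the node, the check above can be fleshed out into a self-contained sketch. Everything here (capacityTracker, tryAdd, remove, the onFull callback) is a hypothetical stand-in rather than actual LivepeerServer code, and it assumes stream IDs are plain strings.

```go
package main

import (
	"fmt"
	"sync"
)

// capacityTracker is a hypothetical stand-in for the server's connection map.
type capacityTracker struct {
	mu         sync.Mutex
	conns      map[string]struct{}
	maxStreams int
	onFull     func(id string) // called when a stream is rejected; may be nil
}

func newCapacityTracker(maxStreams int, onFull func(string)) *capacityTracker {
	return &capacityTracker{
		conns:      make(map[string]struct{}),
		maxStreams: maxStreams,
		onFull:     onFull,
	}
}

// tryAdd registers id unless the cap has been reached. The check and the
// insert happen under the same lock, so two concurrent callers cannot both
// pass the check and overshoot the cap.
func (c *capacityTracker) tryAdd(id string) bool {
	c.mu.Lock()
	if len(c.conns) >= c.maxStreams {
		c.mu.Unlock()
		// Notify outside the lock so a slow callback cannot stall other streams.
		if c.onFull != nil {
			c.onFull(id)
		}
		return false
	}
	c.conns[id] = struct{}{}
	c.mu.Unlock()
	return true
}

// remove frees the slot held by id, for example when a stream disconnects.
func (c *capacityTracker) remove(id string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.conns, id)
}

func main() {
	var rejected []string
	t := newCapacityTracker(2, func(id string) { rejected = append(rejected, id) })
	fmt.Println(t.tryAdd("a"), t.tryAdd("b"), t.tryAdd("c")) // the third call hits the cap
	t.remove("a")
	fmt.Println(t.tryAdd("c"), rejected) // a slot is free again
}
```

Folding the check and the insert into one locked operation is a deliberate difference from a read-locked check followed by a separate insert, which would leave a window for two simultaneous connects to both be admitted. The callback runs synchronously here; a caller could wrap it in a goroutine if the notification might be slow.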

eladmallel · 2019-01-31T15:26:36Z

@j0sh I think you now understand my intention well. My purpose in proposing webhooks was to help the user/customer of B be aware that:

B rejected a stream just now for a good and specific reason
You should know exactly which B so you can potentially direct traffic to other Bs

I think that your pseudocode is in that same line of thinking.

Are we then good with relying on webhooks to provide users of B with this useful information? @j0sh @darkdarkdragon

j0sh · 2019-01-31T18:05:33Z

Are we then good with relying on webhooks to provide users of B with this useful information?

This feels quite close to #704 -- the "B at capacity" notification really is just another event, so let's take the discussion of the actual mechanism there. Also related is #671

eladmallel added Transcoding API labels Jan 23, 2019

eladmallel mentioned this issue Jan 23, 2019

O should respond with an InsufficientCapacity when B attempts a new job/segment negotiation and O is at max capacity #603

Closed

darkdarkdragon self-assigned this Feb 26, 2019

darkdarkdragon mentioned this issue Feb 26, 2019

Capacity Cap for B #753

Merged

3 tasks

darkdarkdragon closed this as completed Feb 27, 2019
