
Operational query on running stunner in headless mode #31

Closed
imcom opened this issue Aug 30, 2022 · 14 comments
Labels
type: question Further information is requested

Comments

imcom commented Aug 30, 2022

Currently I am a bit confused about how to scale stunner in a one-to-one call scenario. This is the setup: initially I have one stunner with a LoadBalancer service (public-facing IP) plus a cluster service IP for the WebRTC client within k8s. This works fine as long as there is only one stunner pod. But once I scale the stunner pods to 3 instances, I would assume the WebRTC connection would not be established, because there is no control over which stunner the LB and cluster service will actually land the BIND requests on. Is that correct?

So in this case, what should be done to scale? A naive approach I can think of is to assign a new LB public address for each stunner and use a headless service within k8s. But this adds extra complexity: how do I ensure that both clients use the same stunner?

Thanks in advance

imcom commented Aug 30, 2022

Oh sorry, they can each use an arbitrary stunner and still talk to each other.

imcom closed this as completed Aug 30, 2022

imcom commented Aug 30, 2022

Well ... I guess the confusion is still here ... Let's say I have stunner-A and stunner-B, a browser as a video player, and an application in k8s which streams video frames. Now, when the browser connects to stunner-A and the application connects to stunner-B, will this work out? The browser would have a relay addr which is the pod IP + a random port of stunner-A, and the application would most likely use its pod IP + a random port as a host candidate. So how does stunner-A forward to the application on its host candidate?

imcom reopened this Aug 30, 2022

imcom commented Aug 30, 2022

IIUC, if the application talks to stunner-B, then by default the application would not be able to whitelist the peer addr in stunner-A, so the browser should not be able to reach the application via stunner-A. Is that correct?

rg0now commented Aug 31, 2022

I'll try to answer your questions below, let me know if I am being too fast (or too slow).

Let's say I have stunner-A and stunner-B. A browser as a video player and an application in k8s which streams video frames. So now when the browser connects to stunner-A and the application connects to stunner-B, will this work out ?

The idea here is that the user, located outside the cluster behind potentially several layers of NAT, will connect via STUNner-A (or STUNner-B) by creating a transport relay connection on STUNner. The user-facing side of the transport relay connection runs TURN, so that it is immune to NATs, while the cluster-facing side uses plain UDP. Now the client will ask STUNner to return the IP address and port corresponding to the cluster-facing side of the transport relay connection; this is the so-called "relay candidate". Note that the IP address here is STUNner-A's pod IP, while the port is semi-random (STUNner allows you to customize the upstream port range).

The application on the other hand, being located inside the cluster, will not need STUNner at all: it will create a humble host candidate over its pod IP. (Note that this is contingent on your configuring the application with no external STUN or TURN servers at all; otherwise the client and the application may establish a media connection circumventing STUNner, which is most probably not what you want. Curiously, TURN magic also allows us to successfully connect the client and the application when you mistakenly set STUNner as a TURN server for the application, but this is more of a funny side-effect than a useful feature here.)
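To make this concrete, here is a minimal Go/Pion sketch of this asymmetric setup; the Gateway address stunner.example.com:3478 and the static username/password are placeholders made up for illustration, and in your setup the client is a browser, which would put the same TURN URI into the iceServers field of its RTCPeerConnection config instead:

```go
package main

import (
	"log"

	"github.com/pion/webrtc/v3"
)

func main() {
	// Client outside the cluster: allocate a relay candidate via STUNner (TURN).
	// The TURN URI and credentials below are placeholders.
	clientCfg := webrtc.Configuration{
		ICEServers: []webrtc.ICEServer{{
			URLs:       []string{"turn:stunner.example.com:3478?transport=udp"},
			Username:   "my-user",
			Credential: "my-password",
		}},
	}

	// Application inside the cluster: no STUN/TURN servers at all, so it only
	// gathers host candidates over its pod IP.
	appCfg := webrtc.Configuration{}

	clientPC, err := webrtc.NewPeerConnection(clientCfg)
	if err != nil {
		log.Fatal(err)
	}
	defer clientPC.Close()

	appPC, err := webrtc.NewPeerConnection(appCfg)
	if err != nil {
		log.Fatal(err)
	}
	defer appPC.Close()

	log.Println("client and application peer connections created")
}
```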

So browser would have a relay addr which is the pod IP + random port of stunner-A and the application most likely can uses its pod IP + random port and it is host candidate. So how does stunner-A forwards to application on its host candidate?

Exactly! Now the client passes the relay candidate over to the application and the application returns its own host candidate to the client over the signaling channel, and then they try to connect via this candidate pair. The client sends a packet to the application's host candidate via STUNner-A, which will conveniently receive it over the external TURN connection and forward it over its pod IP:port (the relay candidate) to the application's pod IP:port (the host candidate). Now, since pod-to-pod communication in Kubernetes must never involve a NAT, this connection setup attempt will be successful: the application will find that it receives the packet from the IP:port it was told in the relay candidate, and the same applies the other way around. ICE goes to the connected state and voila, your WebRTC media is ingested into the cluster even though it commonly traverses 2-3 NATs while traveling from the client to the application via the Kubernetes container network.
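Here is a rough, self-contained Pion sketch of this candidate exchange from the in-cluster application's point of view; sendToClient and candidatesFromClient are hypothetical stand-ins for the signaling channel (a WebSocket in practice), and the offer/answer exchange is omitted for brevity:

```go
package main

import (
	"log"

	"github.com/pion/webrtc/v3"
)

// sendToClient stands in for the real signaling channel that pushes the
// application's candidate to the browser; hypothetical helper.
func sendToClient(c webrtc.ICECandidateInit) {
	log.Println("local candidate:", c.Candidate)
}

// candidatesFromClient stands in for the signaling channel delivering the
// browser's candidates (the STUNner relay candidate among them); stubbed out.
func candidatesFromClient() <-chan webrtc.ICECandidateInit {
	ch := make(chan webrtc.ICECandidateInit)
	close(ch) // no candidates in this stub
	return ch
}

func main() {
	// In-cluster application: no ICE servers, so it gathers host candidates only.
	pc, err := webrtc.NewPeerConnection(webrtc.Configuration{})
	if err != nil {
		log.Fatal(err)
	}
	defer pc.Close()

	pc.OnICECandidate(func(c *webrtc.ICECandidate) {
		if c == nil {
			return // gathering finished
		}
		sendToClient(c.ToJSON()) // a host candidate on the pod IP
	})

	// The client's relay candidate (STUNner pod IP + semi-random port) arrives
	// over signaling; in a real app this happens after the offer/answer exchange.
	for init := range candidatesFromClient() {
		if err := pc.AddICECandidate(init); err != nil {
			log.Println("failed to add remote candidate:", err)
		}
	}
}
```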

Does this answer your question?

rg0now added the type: question (Further information is requested) label and removed the question and good first issue (Good for newcomers) labels Aug 31, 2022
imcom commented Sep 1, 2022

@rg0now Excellent explanation! Huge Thanks!

(Note that this is contingent on your configuring the application with no external STUN or TURN servers at all; otherwise the client and the application may establish a media connection circumventing STUNner, which is most probably not what you want. Curiously, TURN magic also allows us to successfully connect the client and the application when you mistakenly set STUNner as a TURN server for the application, but this is more of a funny side-effect than a useful feature here.)

I noticed this too, but I was wondering: in the one-to-one call example, when we loop the connection back to stunner itself, both sides end up using a stunner-provided relay addr for the media transport. However, I find the most intuitive way is to configure the UDPRoute with the media service as backend, so the pod-to-pod communication is right there.

imcom commented Sep 1, 2022

However, there is a further question regarding the ICE procedure. Strictly speaking, the application inside the cluster should NOT configure stunner as its ICE server. But what if it does, and the browser gets a relay from stunner-A while the application gets another relay from stunner-B? I would assume this can also work, given that stunner-A would actually forward the browser's BINDs to the relay addr of stunner-B:

Browser --> stunner-A (3478) [src: relay-A] --> [dst: relay-B] stunner-B --> the application

Is this correct? So it really does not matter which peer connects to which stunner as the ICE server.

Thanks in advance!

rg0now commented Sep 1, 2022

I noticed this too, but I was wondering: in the one-to-one call example, when we loop the connection back to stunner itself, both sides end up using a stunner-provided relay addr for the media transport.

I'm not sure we're using the term "looping back" in the same sense. We use it to mean that in the headless model, when there is no media server at all, both clients open a relay candidate on STUNner and then these magically get looped back to one another during the ICE conversation, connecting the two clients. This works because both relay candidates contain pod IPs, so this is a fairly standard TURN server use case; it is just that the relay candidates are opened "inside" the cluster.

The one2one call demo, on the other hand, uses the "media-plane setup", in which case the flow of packets is client_a -> stunner_a -> kurento -> stunner_b -> client_b and vice versa. In some sense this is also "looping back" media to STUNner; I guess this is what you meant.

Is this correct? So it really does not matter which peer connects to which stunner as the ICE server.

Almost correct: I would make this more precise in the following way: "a client external to the cluster MUST use STUNner to open a transport relay connection, and the application located inside the cluster MAY use STUNner to open a transport relay candidate" (where MUST and MAY mean what they usually do in IETF parlance).

Observe the difference: the case when the client generates only a host candidate (or a server-reflexive candidate using a public STUN server) but does not use STUNner, while the application opens a relay candidate via STUNner (and potentially further host candidates), will not work. This is because in this case the client's ICE candidates all contain public IPs while the application's candidate (obtained from STUNner) contains a private IP (a pod IP), and the former has no way to route packets to the latter. (The other way around should work, though I don't know ICE well enough to say whether it handles half-duplex connections, plus there is a NAT in the middle, so I guess this should not work either.)

The other case (both the client and the application use STUNner) should work, though, not that I have ever tested it. Feel free to try and let me know what you find.
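If you want to give it a go, a minimal Pion sketch of this symmetric case could look like the following; the Gateway address and credentials are again placeholders, and ICETransportPolicyRelay simply forces relay-only candidates so that both peers talk exclusively via STUNner (in a browser the equivalent is iceTransportPolicy: "relay"):

```go
package main

import (
	"log"

	"github.com/pion/webrtc/v3"
)

func main() {
	// Both the external client and the in-cluster application would use this
	// kind of configuration in the symmetric case; the TURN URI and the
	// credentials are placeholders.
	cfg := webrtc.Configuration{
		ICEServers: []webrtc.ICEServer{{
			URLs:       []string{"turn:stunner.example.com:3478?transport=udp"},
			Username:   "my-user",
			Credential: "my-password",
		}},
		// Force relay-only candidates: all media then flows via STUNner.
		ICETransportPolicy: webrtc.ICETransportPolicyRelay,
	}

	pc, err := webrtc.NewPeerConnection(cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer pc.Close()

	log.Println("relay-only peer connection created")
}
```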

imcom commented Sep 1, 2022

@rg0now Awesome! I believe I've learned all I need. I am deploying STUNner into our production environment for our new product, where we also use pion to stream h264 to the client. The client (browser) and pion use STUNner to establish WebRTC. I've tried using relay-only on both ends, which worked. But now I've changed the setup so that the client uses a relay addr and pion uses its pod IP as a host candidate.

Cheers

imcom commented Sep 1, 2022

To that end, I also believe that once we have a large number of pion streamers in our cluster, we need a relatively small media server group in front of the pion streamers, so that STUNner would not need to loop over thousands or tens of thousands of Route targets.

In my imagination, for instance, I could have a media server group of 5 servers, each of them acting as an SFU. The current streamers would then just send the frames to the media servers, and our clients would fetch the corresponding media tracks from those media servers. This is how I imagined scaling our service.

Any suggestions or corrections would be greatly appreciated!

rg0now commented Sep 1, 2022

To that end, I also believe that once we have a large number of pion streamers in our cluster, we need a relatively small media server group in front of the pion streamers, so that STUNner would not need to loop over thousands or tens of thousands of Route targets.

Can you give more detail? We are extremely interested in production deployments of STUNner: we want to learn what works and what doesn't in your setup. Let us know if we could maybe organize a call or something.

That being said, STUNner does not really constrain the size of the backend pool, as long as all backends belong to the same Kubernetes service. (Well, this is not entirely true: the operator actually injects all the pod IPs as endpoint IP addresses into the STUNner config, so when you have thousands of pods in a single backend service this may take a while, but if you have thousands of media servers in your cluster then I guess this is the least of your worries.) So if it is an option for you to put all your pion streamers into a single Kubernetes service, then all should be fine.

Quick note: STUNner in fact does not load-balance across the backend pool (yet). By the time STUNner receives a TURN allocation request, the client already knows which media-server/application/backend it wants to talk to (the one corresponding to the ICE candidate received from the application), so STUNner merely checks whether that pod IP belongs to the Kubernetes service you specified as the UDPRoute backend; if it does, it lets the request through, otherwise it blocks it.
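Just to illustrate the idea (this is only a sketch of the logic, not STUNner's actual code), the check amounts to something like the following, where the endpoint set would be kept in sync with the Endpoints of the Service referenced by the UDPRoute backend and the IPs are made-up examples:

```go
package main

import (
	"fmt"
	"net"
)

// backendEndpoints mimics the endpoint IPs of the UDPRoute's backend Service;
// in the real system this set is derived from the Kubernetes Endpoints.
var backendEndpoints = map[string]bool{
	"10.244.1.15": true,
	"10.244.2.7":  true,
}

// allowPeer lets a request through only if the requested peer IP is an
// endpoint of the backend Service.
func allowPeer(peer net.IP) bool {
	return backendEndpoints[peer.String()]
}

func main() {
	fmt.Println(allowPeer(net.ParseIP("10.244.1.15"))) // true: request is let through
	fmt.Println(allowPeer(net.ParseIP("10.0.0.99")))   // false: request is blocked
}
```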

Does this make any sense?

imcom commented Sep 2, 2022

Totally!

Sure, I will try to put down more details of our setup. Besides, I believe we could probably find a time by mid-September to have a conversation via concall. Stay in touch!

The deployment is as follows.

Our application in the cloud (k8s) is more or less like online gaming, where users are assigned to a dedicated server. The server renders user input into h264 frames and streams those frames to our pion-based proxy sidecar. The user talks WebSocket signaling with this pion proxy sidecar directly, and they use STUNner as the broker. In the latest configuration, the user gets a relay addr from STUNner and the pion proxy just uses its pod IP for the RTC communication. We do not have any media servers, I would say; it's more like a one-to-one, video-only (one-way) call. That being said, if we want to serve 1K users simultaneously, we will scale to 1K server apps + proxies, so STUNner will have 1K route targets in this case. What I was saying previously is that perhaps when we have 10K users, we should have a pool of media servers and STUNner should only take care of the media servers. Then our application server's proxy would somehow stream the videos to those media servers.

rg0now commented Sep 2, 2022

Thanks @imcom, this was super insightful. Two minor notes:

  • As per STUNner's scaling per backend pod: The dataplane (stunnerd) should be able to handle tens of thousands of simultaneous connections into the cluster, and if not then that's a bug. Another blocker may be raw packet speed: in standalone benchmarks STUNner can handle 50-100k packets per second with 1-2 cores, and our GKE benchmarks show somewhere around 30-50 kpps per 1.5 cores, but this is with only 10-50 simultaneous connections.
  • As per the control plane: at present the Kubernetes operator watches all the Kubernetes Endpoint resources in the cluster and it will re-generate the config for each change, so it is definitely not designed to handle 10k endpoints per service and it is absolutely not designed for massive-scale endpoint churn. It should work, but it might be slow. (It may need a manual tweak to increase the throttling period, requiring a recompile.)

That being said, I don't see a reason why the current STUNner version would not work with 10k endpoints, provided that the dataplane is scaled out to 10-20 pods and you give sufficient CPU to the operator. And if not, then your idea to insert media servers in between can be a good solution.

Backend scalability has not been a focus yet, but I guess your input should be a good motivation for us to work on this: we know how to handle it, we just haven't had the time/resources to work on it yet.

imcom commented Sep 2, 2022

Lovely! We have not had a chance to do the benchmark on our side; would you mind elaborating a bit on the tooling or methodology you used for it? Anyhow, your benchmark results look sufficient and reasonable for our use case. Glad to hear that!

rg0now commented Nov 18, 2022

Just wanted to drop a line here that we have documented the ideas that came up during our discussion:

  • in the "Asymmetric ICE mode" the client uses STUNner to generate a TURN relay candidate via STUNner and the server generates only a host candidate, while
  • in the "Symmetric ICE mode" both the client and the server generate TURN a relay candidate and they connect via STUNner.

I'm closing this issue for now, feel free to reopen if there's any related issue you'd like to discuss.
