Pade behind a proxy #408
You're probably using the reverse proxy because of a NATed environment. But in the screenshots in the thread, in the section "IP Address Mapping", the fields for the local and public IP (the internal and the external one) are empty. These settings feed the generation of the ICE harvester configuration of the JVB.
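For reference, the NAT address mapping of the JVB's ICE harvester usually ends up as ice4j properties like the following (a sketch only; the addresses are placeholders, and in Pàdé the values normally come from the "IP Address Mapping" fields in the Admin UI):

```
# generated JVB properties file -- example values, not real addresses
org.ice4j.ice.harvest.NAT_HARVESTER_LOCAL_ADDRESS=10.0.0.5
org.ice4j.ice.harvest.NAT_HARVESTER_PUBLIC_ADDRESS=203.0.113.10
```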
Hi Guido, |
Sorry, but I can't tell you anything about an Openfire cluster at this time; I haven't tested it so far. Do you know the sketch at https://github.com/igniterealtime/openfire-pade-plugin/wiki/OFmeet-Network-Scheme ? Dele recently started work to enable Pade for an Openfire cluster, but I don't know if this is already usable. In addition to Openfire itself, the Jitsi components also must be set up to work as a cluster: I would think that all JVBs in use must register with each JiCoFo and be interconnected (the "Octo" feature). And the ICE harvester (as part of the JVB) on each node must at least announce its own address. Dele added some new code about this: openfire-pade-plugin/src/java/org/jivesoftware/openfire/plugin/ofmeet/JitsiJvbWrapper.java Lines 319 to 346 in 2ba5bcc
IMHO, this code will override the ICE addresses with the values of network.interface and network.interface_public, but only if a value is already provided (L320, L337). Because of that, you have to enter values (maybe intentionally dummy ones) in the Admin UI. After startup, please check the generated config files.
Dele changed the default log level of the Jitsi Component Wrappers, you might add
to the logging configuration file |
Hi Guido. |
@slmc-tech can you confirm whether your network has NAT or not? If both your nodes have static IP addresses with a public FQDN for your domain, then you do not need to specify public and private IP addresses and don't need to modify openfire.xml for clustering. If you bypass the load balancer, can you get a 3-way conference with participants connected directly to either node01 or node02, with audio and video working ok? If that is the case, then the load balancer could be the cause of the issue. That is where you will need the expertise of @gjaekel
@deleolajide yes, this is a NATed environment. The nodes sit inside a class C network and have only private IP addresses assigned. The load balancer is a reverse proxy and load balancer (nginx) where we translate incoming TCP 443 to the private IP of the nodes on port 7443. The load balancer also passes UDP 10000 to the nodes. It is basically set up like @gjaekel has shared in the diagram. We have DNS servers that translate FQDN addresses to the nodes for users that are logged in to the internal network. When you go to the nodes directly, everything works just fine. You have multiuser conferences with audio, video and all that jazz, as expected. Regardless of which node you connect to, you can join conferences and create new ones.
@deleolajide @gjaekel server { |
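The nginx snippet above is truncated in this thread. A minimal `stream` configuration matching the setup described (TCP 443 forwarded to 7443 on the nodes, UDP 10000 load-balanced) might look like the following sketch; the upstream names and private addresses are assumptions, not the actual config:

```nginx
# /etc/nginx/nginx.conf fragment (sketch, assumed addresses)
stream {
    upstream openfire_https {
        server 10.0.0.11:7443;   # node01, private IP (assumed)
        server 10.0.0.12:7443;   # node02
    }
    upstream jvb_media {
        server 10.0.0.11:10000;  # JVB media port on node01
        server 10.0.0.12:10000;  # JVB media port on node02
    }
    server {
        listen 443;              # external TCP 443 -> nodes:7443
        proxy_pass openfire_https;
    }
    server {
        listen 10000 udp;        # external UDP 10000 -> nodes:10000
        proxy_pass jvb_media;
    }
}
```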
Thank you for responding with so much detail 👍
The issue could be me using network.interface, which causes Openfire to bind to that network adapter only. Let me make a change to use a different parameter value and see if it makes a difference.
I have made the change; see this commit. Use this snapshot build of pade.jar to test it. I tested on my dev server and it works ok with 2 nodes. However, I don't have a load balancer in front of Openfire. Now use the following new property names instead of the old names
Updated wiki page - https://github.com/igniterealtime/openfire-pade-plugin/wiki/Clustering-multiple-Jitsi-Videobridges-using-Hazelcast-plugin |
@deleolajide Thank you very much for spending time on this!! |
@deleolajide so we have done some testing here and it kind of seems like we are 50% there :-) |
I believe you have to remove the old plugin on both nodes and make sure both nodes have pade version 1.6.3-SNAPSHOT
Yes, but I created two instances of Openfire bound to different IP addresses on my DEV PC. I noticed that I had to wait for a few minutes before both nodes saw each other.
That sounds like a regression. You were able to connect to either node directly before.
(As I was very busy yesterday, I wasn't able to contribute)
Let me ask you whether you want your setup to act in active/active mode (to spread the load) or in active/passive mode (for redundancy). You have to answer this for both the "Openfire" and the "Jitsi" domain; they don't have to use the same mode, but the configuration will have to respect the choice, of course. Note that the (secure) WebSocket proxy offered by Pàdé isn't mandatory from Jitsi's point of view. It just eases the network setup, because it allows running the JVB without an external IP. But this external IP is announced by the ICE component and is the important one for the A/V streams; the ICE handshake (will try to) choose the right target IP to be used as a route between the clients and the serving JVB (for UDP, with an optional fallback to TCP).
@deleolajide you were right. We removed the plugin and cleared all parameters stored in the DB and then reinstalled the snapshot. This is what we are seeing now:
Note that the XMPP connections and the JVB connections are two different things. You "see" other members in rooms via the XMPP part, but the A/V connections are built up by the JVB. The JVB is commanded by Openfire to "open the communication channels".
If P2P is enabled, 2-participant meetings will connect to each other directly, bypassing the JVB. I wonder if and how your two JVBs are exposed to the internet: are both visible on separate IPs, or are they hidden behind and served through the load balancer? In the latter case: how do you manage the UDP traffic?
@gjaekel I see. Thank you for clarifying this. I can see that p2p is indeed enabled. As discussed if we stop one of the servers everything works just fine so maybe this is a clustering issue then.... |
... so we probably "just" have left an issue with the A/V streams. |
:-) |
@gjaekel I missed one of your points. None of the nodes are exposed to the internet. They sit within a class C internal network and are accessible from the internet through a reverse proxy where we port-forward 443 to 7443 of the nodes. UDP streams are something that I am still trying to figure out how to load-balance effectively. As of now we are just forwarding UDP to the nodes like so: server {
But here -- maybe in contrast to DNS requests -- we have a highly stateful use case: the traffic must be directed to the bridge that holds the session state, because IMHO, from the "Jitsi point of view", the participants of a session and all other corresponding data are held there. In the case of a JVB cluster, there's an additional inter-bridge communication TCP channel. The "Octo" feature seems to allow moving traffic between bridges. Having written this, I want to point out that I haven't any practical experience with this so far at all. I've also never used nginx yet and wasn't aware that it offers UDP proxying and even load balancing.
As a PoC, I would recommend using just one JVB for the start, i.e. just let nginx proxy to one destination.
Yes, nginx can indeed load-balance UDP; that's why we have been using it for this particular service. By using just one JVB, do you mean routing all UDP to just one node while in fact both nodes are used for TCP load balancing?
No, this is what will not work for sure, IMHO. For the first sprint, expose one bridge using one external IP to the clients for TCP and UDP. Next, you may expose the 2nd one on a different IP. Next, you may check whether the inter-JVB communication works as well, as this will allow failing over to another bridge in the case of a graceful(!) shutdown of one, or when the JVB load-balancing mechanism tries to move traffic between bridges. Note that (to my knowledge) one current session can't be split between bridges.
I see. We can confirm that everything is working just fine when there is just one node, so we don't really need to test this any further; this works fine with and without the intermediate load balancer. But if you fire up the second node, it does not work. Take the load balancer out of the equation completely: it does not work. It does not work if you direct all traffic to one node, and it does not work if you split up clients between nodes; the Openfire nodes just do not do any load balancing. It is as if they are not in a cluster, and any kind of A/V does not work either.
BTW: evaluating clustering is on my to-do list, and our network layout seems comparable. Therefore, getting this running is a win-win for me. 😉
@deleolajide I hope this is going to help you approach the issue when that time comes.
Basically nothing worked. Keep in mind that I am talking about direct connections to the nodes here, with no load balancer interfering.
I did, but it is not working any more. I had to modify openfire.xml to bind both nodes to different IP addresses on the same PC and I also modified hazelcast-local-config.xml. It looks like my test was incorrectly done or I have since messed up my cluster configuration. Sorry for misleading you. I have to find a free weekend to spend some quality time on this and set up a proper multi-node cluster with Docker or multiple PCs. If possible, can you confirm that normal group-chat works ok on your cluster with Spark, Inverse or any other XMPP client using clients connected to both nodes. Please confirm that the MUC room can be created ok from any node. Thanks |
Indeed it has. Thanks for the support to get this working. I think we are now almost there; with these latest changes I got it working with 6 users, 3 on each node 👍 You can confirm the distribution of users on the nodes. The key difference is that I have given each JVB user a unique name across the cluster. That is achieved by adding an extra Openfire XML property, octo_id.
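Based on the comment above, each node would carry a cluster-unique octo_id value in its conf/openfire.xml. A sketch only; the element nesting shown here is an assumption, not confirmed by the thread:

```xml
<!-- node01's conf/openfire.xml (sketch; placement is assumed) -->
<jive>
  <octo_id>1</octo_id>  <!-- node02 would use a different value, e.g. 2 -->
</jive>
```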
I am going to test properly with two PCs over the weekend. If you can't wait for that, then try the latest snapshot at https://igniterealtime.org/projects/openfire/plugins/1.6.3-SNAPSHOT/pade.jar?snapshot=20220324.143502-29 |
With clustered JVBs, a graceful shutdown might become meaningful. This is requested via the REST API (which must be enabled); the bridge then enters a state where it continues to host existing conferences but does not accept new ones.
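For illustration, the graceful shutdown of a Jitsi Videobridge is triggered through its private REST interface; a sketch of the request (host and port are the usual defaults, assumed here):

```
POST /colibri/shutdown HTTP/1.1
Host: localhost:8080
Content-Type: application/json

{ "graceful-shutdown": "true" }
```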
@deleolajide Apologies for the very late response. |
Quick update: I have set up the clustering the same as you, Dele, and there is no change. In the browser I am seeing 404 for the https://www.gravatar.com/avatar/ domain and 405 for the https://rv-xmpp-02.domain.com:7443/ws/?room=test room. I will also disable the Websockets Data Channel from the network settings and retest.
This is caused by a periodic connection test of the WebSocket connection initiated by the Jitsi web client. There should be no need for this, but I found that it is meant to keep firewalls happy with the long-running WebSocket connection. We may implement a GET method to avoid this, because, as we see, it looks like an error to newbies.
I am pretty sure I was not seeing a 405 in this before; I believe this was a 200. Anyway, I just cannot make this work, it seems.
Probably because I don't have any firewalls or any security restrictions in place. Assuming the situation is still the same: it works when all clients connect to the same node and have the same region value in their config.js generated by ofmeet. If clients connect to both nodes, then only clients on the same node see and hear each other. This may be obvious, but are you absolutely sure you have opened TCP port 4096 between both nodes for Jitsi Octo communication, as this is what enables the multiple JVBs to share the same meeting?
I assume this is because there is no internet access to your network, otherwise that is very strange.
Great idea 👍 💯 |
I shall double-check the firewall and indeed turn off the firewall on the servers for testing altogether. Also note that the gravatar.com 404s are seen on clients that have internet access just fine. The browser gets instructed by the application to go to this domain and gets a 404 back. This is noticed when there is an issue with the application; i.e., if all clients go to one node, we do not see this message. I presume it is a Jitsi thing then...
Just wanted to point out that the JVB seems to be looking for UDP 4096, not TCP. Still testing this, and it is still not working. I will post more once I have a conclusive idea of what the application is trying to do. Looking at server captures now.
@slmc-tech You're right with UDP. You may check connectivity with According to the very long Jitsi Community discussion about bridging you should see something like
in the log. Here another article about OCTO configuration. |
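A quick loopback sketch of such a UDP reachability check, using only the Python standard library. On the real cluster the receiver would run on one node bound to UDP 4096 (the Octo port) while the sender runs on the other node; here both ends live in one process, and the port is picked automatically so the demo does not collide with a running service:

```python
import socket

def udp_echo_check(host="127.0.0.1", port=0, payload=b"octo-probe"):
    """Send one UDP datagram to a local receiver and verify it arrives.

    On real hosts, bind the receiver to port 4096 on node02 and call
    sendto() from node01 to test the inter-bridge (Octo) path.
    """
    recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    recv.bind((host, port))          # port=0 lets the OS pick a free port
    recv.settimeout(2.0)             # fail fast if nothing arrives
    target = recv.getsockname()      # (host, actual_port)

    send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    send.sendto(payload, target)

    data, _addr = recv.recvfrom(1024)
    recv.close()
    send.close()
    return data == payload

print(udp_echo_check())  # True when UDP datagrams get through
```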
Ok, I think I have found the issues. I have managed to make this work now. I believe I will manage to make this work through the load balancer too, now that I understand the traffic flows a bit better. I will be updating with my comments shortly. I really think that this should be properly documented in a wiki, by the way... @deleolajide let me know how I can help once I have verified a working, production-ready setup.
Dele, I am going through packet captures; that's how I found the various misconfigurations. I will update as soon as possible.
@deleolajide
I have deployed latest code and now https://pade.chat:5443/pade/keepalive/ works 👍
That is music to my ears. You are most welcome to help us improve documentation. I can give you permission to edit the wiki files and you can submit PRs on the main code. |
Oh, that's a completely different approach: I thought that Jitsi Webclient will call The committed code should work; I simulate it by adding |
Yes. I have no intention of making a PR on Openfire. It is much easier to use config.websocketKeepAliveUrl as meet.jit.si/config.js does, or to set config.websocketKeepAlive to 0.
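For reference, those two options go into the Jitsi Meet config.js; a fragment sketch, reusing the keep-alive URL mentioned earlier in the thread:

```javascript
// config.js fragment (sketch)
config.websocketKeepAliveUrl = 'https://pade.chat:5443/pade/keepalive/';
// ...or disable the periodic probe entirely:
config.websocketKeepAlive = 0;
```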
It's much easier, indeed. I agree; let's see the effects of this in the wild first.
Ahh! 😄 |
May we offer to switch between |
This is very useful indeed. Ok, guys, so I have managed to make this work in our environment, with load balancers and all, but I must say that this is beyond complicated from a networking perspective. Keep in mind that I am working with enterprise-grade equipment, and the whole setup is quite complicated to start with.
I use Google Docs for the existing single-node network diagram as the source. May I create a new one that we may co-edit?
Sure yes we can do that Guido. |
I am a simple developer :-) Network configuration sends my head into a spin. Please feel free to create as many wiki pages as you like, with as much information for other DevOps and network administrators who will need this stuff 👍 💯
It is actually the other way round. Tell me what you need to simplify this and make it easy to use and I will do my best to get into the code 👍 |
Ok I will try to explain this in as much detail as I can with all protocols involved. Should I start a new wiki page then or work on an existing one? I would like to manage your expectations on this in terms of time. I want to do this as soon as possible before I forget how this is set up but I have a ton of things that I have put on hold to work on this so it will take a couple of days. Sounds fair? |
Indeed it does. Whatever you contribute will be fully appreciated, considering how busy you are and the sacrifices you have made to get this far and achieve the breakthrough. We can always go back to merge and edit the wiki pages. Capturing as much info as possible while it is still fresh in memory and we have access to screenshots and config data should definitely be the priority.
I have started this wiki page to discuss the proxy setup. We should also put the network diagrams in there, I think.
We are trying to make Pade work in a clustered environment where nodes sit behind a reverse proxy (nginx).
When we try to configure network parameters from the openfire.xml file, we see that the video bridges do not come up, and neither does the focus service.
When no network settings are hardcoded, the bridges and focus come up as expected.
Clients that connect to the nodes directly can create and join conferences just fine. However, clients who go through the reverse proxy do not get any audio or video.
We are forwarding 443 from the proxy to 7443 on the Openfire nodes and are also load-balancing UDP 10000. However, we do not see any UDP traffic leaving the clients, so WebRTC/WebSockets do not get initiated at all.
Our setup was working fine when there was just one node (no clustering enabled).
For screenshots of our network settings please refer to this thread: https://discourse.igniterealtime.org/t/pade-1-6-2-clustering/91483/4.
Any advice on how we could make this work would be greatly appreciated.
Hi @gjaekel Guido; @deleolajide Dele thought that maybe you could offer your insight on this.
Thanks,