WEBRTC for communication between agents and browsers #14874

Merged Apr 20, 2023 (50 commits)

ktsaou
Member

@ktsaou ktsaou commented Apr 7, 2023

This is an experiment.

We are exploring the possibility of using WEBRTC (the technology that supports audio/video/chat conferencing calls) to improve communication between agents and web browsers.

In general, WEBRTC should work like this:

  1. A web browser sends an offer to an agent, via a signalling server. The offer is an SDP message plus a list of candidate endpoints to connect to.
  2. The agent receives this offer and responds with an answer, again via the signalling server. The answer has the same form as the offer: an SDP message plus a list of candidate endpoints to connect to.
  3. Then, the browser and the agent try to connect to each other, using any of the candidates exchanged in the above messages.
  4. If a direct connection is not possible, a TURN server is needed to relay the connection. The endpoints of the TURN server(s) need to be included in the offer and the answer exchanged.

In this PR, to create an easy playground for this technology, we bypassed the signalling server. So:

  • The web browser makes an HTTP POST request to /api/v2/rtc_offer at the agent, sending its SDP and candidates. This is implemented (we had to improve the agent web server to support POST requests).
  • The agent responds with its own SDP and candidates in JSON format.
  • The agent attempts to connect to the candidates given in the browser's request.
  • The web browser attempts to connect to the candidates given in the agent's response (that web page is still missing).
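
The exact JSON exchanged in this playground is not shown in this description. As a rough sketch only, assuming a hypothetical payload with `sdp`, `type` and `candidates` fields (the field and function names are illustrative, not the agent's actual format), the answer could be serialized as below. The practical point it illustrates is that SDP text uses CR/LF line endings, which must be escaped before being embedded in a JSON string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Append src to dst as a quoted JSON string, escaping the characters that
 * SDP and candidate lines may contain (other control characters are not
 * handled in this sketch). Returns the new write offset, or 0 if dst is
 * too small. */
static size_t json_append_string(char *dst, size_t size, size_t pos, const char *src) {
    assert(dst && src);
    if (pos + 1 >= size) return 0;
    dst[pos++] = '"';
    for (; *src; src++) {
        const char *esc = NULL;
        switch (*src) {
            case '"':  esc = "\\\""; break;
            case '\\': esc = "\\\\"; break;
            case '\r': esc = "\\r";  break;
            case '\n': esc = "\\n";  break;
            case '\t': esc = "\\t";  break;
            default: break;
        }
        if (esc) {
            if (pos + 2 >= size) return 0;
            dst[pos++] = esc[0];
            dst[pos++] = esc[1];
        } else {
            if (pos + 1 >= size) return 0;
            dst[pos++] = *src;
        }
    }
    if (pos + 1 >= size) return 0;
    dst[pos++] = '"';
    dst[pos] = '\0';
    return pos;
}

/* Build {"sdp":"...","type":"answer","candidates":["...", ...]}.
 * The field names are hypothetical. Returns bytes written, or 0 on overflow. */
static size_t rtc_answer_to_json(char *dst, size_t size, const char *sdp,
                                 const char **candidates, size_t count) {
    int n = snprintf(dst, size, "{\"sdp\":");
    if (n < 0 || (size_t)n >= size) return 0;
    size_t pos = (size_t)n;

    if (!(pos = json_append_string(dst, size, pos, sdp))) return 0;

    n = snprintf(dst + pos, size - pos, ",\"type\":\"answer\",\"candidates\":[");
    if (n < 0 || (size_t)n >= size - pos) return 0;
    pos += (size_t)n;

    for (size_t i = 0; i < count; i++) {
        if (i) {
            if (pos + 1 >= size) return 0;
            dst[pos++] = ',';
            dst[pos] = '\0';
        }
        if (!(pos = json_append_string(dst, size, pos, candidates[i]))) return 0;
    }

    n = snprintf(dst + pos, size - pos, "]}");
    if (n < 0 || (size_t)n >= size - pos) return 0;
    return pos + (size_t)n;
}
```

A caller would pass the local description and the gathered candidates, then send the resulting buffer as the HTTP response body.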

So, what we have done so far:

  • changed configure.ac to detect the availability of libdatachannel. When the system lacks this library, webrtc will be disabled at the agent.
  • upgraded configure.ac to use C++17. This was required by libdatachannel.
  • improved the agent web server to support POST requests.
  • created the API endpoint /api/v2/rtc_offer to grab the SDP offer from the browser and send back to the browser the agent's answer.
  • created web/rtc directory with some basic code to set up WEBRTC communication.
  • added the code to establish a WEBRTC communication.

What remains to be done:

  • vendor libdatachannel. Their license is compatible with Netdata. Check this: License compatibility with GPL v3+ paullouisageneau/libdatachannel#833
  • make Netdata code independent of C++17, by using the C-API of libdatachannel
  • provide basic configuration for webrtc in netdata.conf.
  • use Netdata Cloud for exchanging the signaling messages:
    request with SDP offer: browser --(https)--> cloud --(mqtt)--> agent
    response with SDP answer: agent --(mqtt)--> cloud --(https)--> browser
  • remove /api/v2/rtc_offer endpoint from the agent.
  • webrtc connections should be fire-and-forget for the agent, they should self-cleanup.
  • webrtc error handling.
  • messages sent by the web browser to WebRTC data channels should be treated as URLs to run /api/v2 endpoints.
  • webrtc responses need to be compressed manually.
  • webrtc responses have size constraints. The exact per-message size limits are exchanged in the SDP messages. They seem to be 250KiB. If an agent response is bigger than that, the response needs to be split into multiple messages and reassembled at the web browser.
  • build a TURN server able to query and merge agent requests
  • install many TURN servers across the globe, to lower the latency of agent responses
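
The size constraint in the list above implies some framing scheme, so that the receiver can reassemble a split response (the limit itself is advertised through the SDP `a=max-message-size` attribute). A minimal sketch follows, assuming a hypothetical 8-byte header (message id, chunk index, chunk count), a caller-supplied callback for sending, and an ordered, reliable data channel (the WebRTC default). None of these names or the header layout come from this PR; the receiver is written in C only to keep the sketch in one language, while in practice it would live in the browser.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_HEADER_SIZE 8  /* hypothetical layout: u32 id, u16 index, u16 total */

typedef int (*chunk_emit_fn)(const uint8_t *msg, size_t len, void *ctx);

/* Split payload into messages of at most max_message bytes (header included)
 * and hand each one to emit(). scratch must hold max_message bytes.
 * Returns the number of chunks sent, or -1 on error. */
static int send_chunked(uint32_t id, const uint8_t *payload, size_t len,
                        size_t max_message, uint8_t *scratch,
                        chunk_emit_fn emit, void *ctx) {
    assert(payload && scratch && emit);
    if (max_message <= CHUNK_HEADER_SIZE) return -1;

    size_t room = max_message - CHUNK_HEADER_SIZE;     /* payload bytes per chunk */
    size_t total = len ? (len + room - 1) / room : 1;  /* empty response: one empty chunk */
    if (total > UINT16_MAX) return -1;

    for (size_t i = 0; i < total; i++) {
        size_t off = i * room;
        size_t part = (len - off < room) ? (len - off) : room;

        /* header fields are written big-endian */
        scratch[0] = (uint8_t)(id >> 24);
        scratch[1] = (uint8_t)(id >> 16);
        scratch[2] = (uint8_t)(id >> 8);
        scratch[3] = (uint8_t)id;
        scratch[4] = (uint8_t)(i >> 8);
        scratch[5] = (uint8_t)i;
        scratch[6] = (uint8_t)(total >> 8);
        scratch[7] = (uint8_t)total;
        memcpy(scratch + CHUNK_HEADER_SIZE, payload + off, part);

        if (emit(scratch, CHUNK_HEADER_SIZE + part, ctx) != 0) return -1;
    }
    return (int)total;
}

/* Receiver state for one response (fixed-size buffer for the sketch). */
struct reassembly {
    uint8_t data[1024];
    size_t len;
    uint16_t next_index;
    int complete;
};

/* Receiver side: append one chunk, rejecting out-of-order or oversized input.
 * It has the chunk_emit_fn signature so it can be used for a loopback test. */
static int reassemble_chunk(const uint8_t *msg, size_t len, void *ctx) {
    struct reassembly *r = ctx;
    if (len < CHUNK_HEADER_SIZE) return -1;

    uint16_t index = (uint16_t)((msg[4] << 8) | msg[5]);
    uint16_t total = (uint16_t)((msg[6] << 8) | msg[7]);
    size_t part = len - CHUNK_HEADER_SIZE;

    if (index != r->next_index || index >= total) return -1;
    if (part > sizeof(r->data) - r->len) return -1;

    memcpy(r->data + r->len, msg + CHUNK_HEADER_SIZE, part);
    r->len += part;
    r->next_index++;
    r->complete = (r->next_index == total);
    return 0;
}
```

If responses are also compressed, compressing before chunking keeps the framing independent of the compression format.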

@github-actions github-actions bot added the area/build and area/web labels Apr 7, 2023
@Ferroin
Member

Ferroin commented Apr 7, 2023

Regarding the C++17 requirement, this functionally mandates GCC 7 or Clang 5 as a minimum compiler version. The only platform we officially support that cannot meet this requirement is CentOS 7, so the CentOS 7 CI jobs on this PR should be expected to fail even once we have libdatachannel vendored.

CentOS 7 goes EOL upstream on 2024-06-30, and I would be more than happy to have an excuse to drop it early (dropping CentOS 7 support would let us do a lot of cleanup in the packaging code), but AIUI it’s still one of our more actively used platforms.

That said, there is no reason we can’t just choose to not support this for native builds on CentOS 7. It needs to be optional at both compile time and runtime anyway for a number of reasons, so having a specific exception for one supported platform is not that onerous.


A couple of other quick thoughts:

  • As mentioned above, this needs to be optional, partly because of the dependence on Netdata Cloud, and partly because it adds a lot of complexity (and therefore we need to be able to turn it off for debugging purposes, at least initially).
  • We should provide a standalone signaling server option, ideally with integrated TURN support, so that users on isolated networks can benefit from this as well.
  • While we need to bundle libdatachannel as it’s not yet widely available in distro repositories, we really need to support using a system copy (for all the same reasons as needing to support system copies of almost everything else we vendor).

@ktsaou
Member Author

ktsaou commented Apr 10, 2023

Fortunately, libdatachannel has a C API. So, I rewrote the WEBRTC code in C, and the C++17 dependency is now removed from Netdata. libdatachannel itself still needs C++17, but if it can be compiled somehow on a target system, netdata will use it while happily working with C++11 for its own code.

@Ferroin
Member

Ferroin commented Apr 10, 2023

Regarding the CI failures:

  • The clang-format warnings are valid (this PR includes the fixes we did to make the check work properly), but can be ignored.
  • The Alpine 3.14 and 3.15 failures are legitimate, and appear to be issues with compatibility with older versions of musl libc. We can drop support for these platforms without much issue though, so probably not much of a blocker here.

@ktsaou
Member Author

ktsaou commented Apr 10, 2023

No build is failing now.

@ktsaou
Member Author

ktsaou commented Apr 14, 2023

Guys, I would appreciate some testing on this PR.

I have tried to simplify and unify query parsing and handling for our http APIs. I also significantly reduced the memory required per http request, from 220KiB to 18KiB (16KiB of which is the compression buffer). The same interface (struct web_client) is now used by 3 consumers: web server, ACLK and WebRTC. So, I have made many changes to the whole logic to unify and simplify them.

There is no point in testing WebRTC at this stage. I am mainly interested in possible issues on the web server and ACLK sides.

Once this review is done, we can proceed and merge this (WebRTC is disabled by default - it is enabled only when compiled with internal checks).

buffer_flush(w->url_path_decoded);
buffer_fast_strcat(w->url_path_decoded, "/", 1);
buffer_strcat(w->url_path_decoded, url);
return func(host, w, url);
Collaborator

This part will overwrite url. Because of this, the local agent dashboard cannot access children.

This results in a 404 in access.log, e.g.:

1591786 '[localhost]:49128' 'DATA' (sent/all = 126/173 bytes -27%, prep/sent/total = 0.09/0.03/0.12 ms) 404 '/host/rasp/api/v1/registry?action=hello'
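
One plausible way for this kind of overwrite to happen is aliasing: if `url` points into the same storage that `w->url_path_decoded` uses, flushing and refilling the buffer destroys the string before it is copied. Whether that is the exact mechanism here is an assumption. The sketch below uses a plain `char` array and a hypothetical helper name instead of Netdata's `BUFFER` API, and shows an overlap-safe way to rewrite a buffer to `"/" + url`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Rewrite buf (capacity size) to "/" followed by url, even when url points
 * inside buf. Hypothetical helper, not part of Netdata.
 * Returns 0 on success, -1 if the result does not fit. */
static int rebuild_path(char *buf, size_t size, const char *url) {
    assert(buf && url);
    size_t len = strlen(url);
    if (len + 2 > size) return -1;   /* '/' + url + terminating NUL */

    /* memmove tolerates overlapping regions, so the tail is moved into
     * place first and the leading slash is written afterwards */
    memmove(buf + 1, url, len + 1);
    buf[0] = '/';
    return 0;
}
```

The ordering matters: clearing or writing the start of the buffer before the copy would clobber an aliased `url`.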

webrtc_base.iceServersCount = i;
internal_error(true, "WEBRTC: there are %d default ice servers: '%s'", webrtc_base.iceServersCount, buffer_tostring(wb));

char *servers = config_get(CONFIG_SECTION_WEBRTC, "ice servers", buffer_tostring(wb));
Contributor

should this be in cloud.conf or under aclk so it is clear it is cloud related?

Member Author

No, this is not aclk related. It is only webrtc related. The ICE servers are used for 2 purposes:

  1. STUN servers - users should be able to set their own if they want.
  2. TURN servers - users should be able to set up their own TURN servers if they want.

The default list should eventually be coming from the cloud though...
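
To illustrate how a user-supplied list could feed the WebRTC configuration, here is a minimal sketch that splits a configuration value into individual ICE server URLs. It assumes the `ice servers` value is a whitespace- or comma-separated list of `stun:` / `turn:` / `turns:` URLs; the separators, the limit of 8 entries and the function name are assumptions, not taken from this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ICE_SERVERS 8  /* hypothetical limit for the sketch */

/* Split list in place on spaces, tabs, newlines and commas, keeping only
 * entries that start with a known ICE URL scheme. The returned pointers
 * refer into list, so list must outlive servers[].
 * Returns the number of servers stored. */
static int parse_ice_servers(char *list, const char *servers[], int max) {
    assert(list && servers && max > 0);
    static const char sep[] = " ,\t\n";
    int count = 0;
    char *p = list;

    while (*p && count < max) {
        p += strspn(p, sep);               /* skip leading separators */
        if (!*p) break;

        char *end = p + strcspn(p, sep);   /* end of the current token */
        int last = (*end == '\0');
        *end = '\0';

        if (strncmp(p, "stun:", 5) == 0 ||
            strncmp(p, "turn:", 5) == 0 ||
            strncmp(p, "turns:", 6) == 0)
            servers[count++] = p;          /* entries without a scheme are skipped */

        if (last) break;
        p = end + 1;
    }
    return count;
}
```

The resulting array and count have the same shape as the `iceServers` list and `iceServersCount` seen in the snippet above, so they could be assigned directly once parsed.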

@ktsaou ktsaou merged commit c3d70ff into netdata:master Apr 20, 2023
@ktsaou ktsaou mentioned this pull request Jun 30, 2023
4 tasks