High Availability of switchio itself? #60

robwilkes · 2018-03-05T01:25:57Z

Hi Guys,

Just discovered this project and really like the look of it, and the documentation is excellent.

I will more than likely move over to it, as the language/syntax looks really nice. However, before (or whilst) I do, I'm hoping you could answer a question for me.

Whilst switchio can be used to control a FreeSWITCH cluster, I'm wondering how you can make switchio itself highly available, or clustered?

I have a need to build a highly available solution in FreeSWITCH, with programmable call control, and what I have currently built is:

- A FreeSWITCH cluster (PostgreSQL in the core) with Keepalived, which can fail over calls with 1-4 seconds of lost audio, depending on the type of outage.
- An outbound ESL connection in the dialplan to my Python server, which runs socketserver with native ESL (using SWIG) and can manage the calls as desired.

I am able to fail over FreeSWITCH back and forth repeatedly with no issue; however, the socket breaks on failover, and with it my ability to manage the call.

Do you have a solution, or idea, how this would work with switchio (or switchy)?
I presume I cannot have multiple switchio instances with the same config pointing to the same FreeSWITCH servers, as they will both want to manage the call simultaneously?

Another option might be to use VMware Fault Tolerance (limited to a single vCPU, if I remember correctly, so it won't scale well).

Also, any idea how switchio will behave if I 'sofia recover' the calls to another FreeSWITCH server? I will eventually test this last one myself and report back, if I haven't heard anything before then.

I know they're not easy questions and I appreciate you taking the time to read it.

goodboy · 2018-03-05T06:01:29Z

Hi @robwilkes thanks for considering the project :)

Couple notes:

- The docs are a bit out of date and don't go into the details of the new asyncio API.
- Currently there is nothing in switchio that attempts to handle the problem of HA.

For HA I've thought about trying to introduce the Raft protocol but have never had a reason to toy with it. This would definitely be an interesting problem to solve, though I have little experience with it. Maybe @moises-silva can comment.

> Do you have a solution, or idea, how this would work with switchio (or switchy)?
> I presume I cannot have multiple switchio instances with the same config pointing to the same FreeSWITCH servers, as they will both want to manage the call simultaneously?

You are correct; currently this is the default, but it could be changed easily, though testing would take a bit of work.

> any idea how switchio will behave if I 'sofia recover' the calls to another FreeSWITCH server?

No, unfortunately, but we'd gladly accept a PR for a test.
Currently a cluster can be orchestrated using multiple docker containers as in the tests/CI.

moises-silva · 2018-03-10T21:28:49Z

Although Raft is nice and it'd be interesting, there are probably simpler solutions (easier to achieve in the short term) that get good enough HA.

For many years I have wanted FreeSWITCH ESL to be able to make outbound connection(s) when the module is loaded (and take care of reconnects). This is different from the existing outbound socket mode in that it is meant to be a global control connection, not per session. This would mean FreeSWITCH would initiate the connection to a switchio server (or pool) on startup. You could then let FreeSWITCH connect via haproxy to your switchio pool. When FreeSWITCH recovers after a crash and loads mod_esl, it would reconnect to an available switchio server (and haproxy takes care of deciding who is available, load balancing, or whatever). This could be implemented with relative ease in mod_event_socket, or, if that becomes hard to push upstream for the version of FreeSWITCH you're using (e.g. the maintainers of FreeSWITCH may not want to add it to v1.6/v1.8), it can be a separate switchio proxy component, in Python or something else, that always runs side by side with your FreeSWITCH instance.

That certainly doesn't solve the problem of existing calls state, but it allows you to serve new calls immediately with switchio upon FS recovery.
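To make the side-by-side proxy idea a little more concrete, here is a minimal sketch of the reconnect loop such a component might start from. Nothing here is part of switchio or FreeSWITCH: `backoff_delays`, `connect_with_retry`, and the `switchio-pool.example:5000` address (standing in for an haproxy front end) are hypothetical names chosen for illustration, and jitter is left out to keep the example short.

```python
import asyncio


def backoff_delays(base=0.5, cap=30.0):
    """Yield retry delays in seconds: base, 2*base, 4*base, ... capped at `cap`."""
    delay = base
    while True:
        yield delay
        delay = min(cap, delay * 2)


async def connect_with_retry(connect, delays=None, sleep=asyncio.sleep):
    """Await `connect()` until it succeeds, sleeping between failed attempts.

    `connect` is any zero-argument coroutine function that raises OSError
    (e.g. ConnectionRefusedError) while the remote side is unavailable.
    """
    if delays is None:
        delays = backoff_delays()
    for delay in delays:
        try:
            return await connect()
        except OSError:
            # The pool (or the proxy in front of it) is not reachable yet.
            await sleep(delay)
    # Only reached when a finite `delays` iterable was supplied and exhausted.
    raise ConnectionError("gave up reconnecting")


async def main():
    # Hypothetical address of the haproxy front end for the switchio pool.
    def connect():
        return asyncio.open_connection("switchio-pool.example", 5000)

    reader, writer = await connect_with_retry(connect)
    # ... relay ESL traffic between FreeSWITCH and the pool here ...
    writer.close()
    await writer.wait_closed()
```

Passing the connect and sleep functions in as arguments keeps the retry policy separate from the transport, so the same loop could be reused whether the proxy dials out over plain TCP or something else.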

Now, for step two: recovering the state of ongoing calls. This is where Raft could help, but you could also follow FreeSWITCH's approach and save the relevant state in a database. FreeSWITCH basically does this to delegate the call state to the database, and if you already have a database cluster for FreeSWITCH you can reuse the same setup for switchio. This means switchio would store the state that needs to be preserved after a crash in a database using an ODBC driver (e.g. https://github.com/aio-libs/aioodbc).

Finally, you monitor/control the cluster using whatever else you're already using for FreeSWITCH cluster resources (e.g. corosync/pacemaker).

Those are just some thoughts off the top of my head.
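As a rough illustration of the database approach, the sketch below persists per-call state keyed by call UUID and reloads it after a restart. It is only a sketch under stated assumptions: the standard-library `sqlite3` module stands in for the ODBC-backed database cluster, and `CallStateStore`, the `call_state` table, and the shape of the state dict are all hypothetical, not anything switchio provides.

```python
import json
import sqlite3


class CallStateStore:
    """Hypothetical store for call state that must survive a crash."""

    def __init__(self, conn):
        self.conn = conn
        with self.conn:
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS call_state ("
                "call_uuid TEXT PRIMARY KEY, state TEXT NOT NULL)"
            )

    def save(self, call_uuid, state):
        # INSERT OR REPLACE is SQLite syntax; another database would need
        # its own upsert statement.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO call_state (call_uuid, state) "
                "VALUES (?, ?)",
                (call_uuid, json.dumps(state)),
            )

    def forget(self, call_uuid):
        # Drop the row once the call has ended.
        with self.conn:
            self.conn.execute(
                "DELETE FROM call_state WHERE call_uuid = ?", (call_uuid,)
            )

    def load_all(self):
        # Called on startup to pick up calls that were in progress.
        rows = self.conn.execute("SELECT call_uuid, state FROM call_state")
        return {call_uuid: json.loads(state) for call_uuid, state in rows}


conn = sqlite3.connect(":memory:")
store = CallStateStore(conn)
store.save("call-1", {"app": "bridge", "step": "answered"})
store.save("call-2", {"app": "ivr", "step": "menu"})
store.forget("call-2")

# A replacement instance pointed at the same database sees the surviving call.
recovered = CallStateStore(conn).load_all()
print(recovered)
```

Storing the state as a JSON blob keeps the schema trivial; a real deployment would likely want explicit columns for anything it needs to query on.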

goodboy added enhancement help wanted question labels Nov 9, 2018
