Introduction

mpermar edited this page Jun 19, 2013 · 14 revisions

Rayo is a message-oriented XML protocol for controlling phone calls, audio mixers and a variety of advanced media resources such as speech recognizers, speech synthesizers and audio recorders. These capabilities can be combined to create a wide variety of applications such as menu-based phone systems, in-game conferencing and anonymous dating services.

So why another protocol? While call and media control protocols are nothing new, existing options are either too low-level and difficult to use (e.g. SIP, MRCP, MGCP, etc.) or too restrictive for highly interactive applications (e.g. VoiceXML, TwiML). Rayo bridges the gap by providing a unified call+media control protocol that’s both simple to use and flexible enough to adapt to a wide array of network scenarios.

Key Features

  • Call Control: Incoming calls are "offered" to clients, at which point they can be answered, rejected, redirected to another destination, etc. Every attempt is made to shield the Rayo client from the low-level telephony protocol (e.g. SIP, Jingle, PSTN, etc.).
  • Audio File Playback: A compatible Rayo server will fetch a file from a specified URL and play the audio it contains to the caller.
  • Speech Synthesis / TTS: In cases where dynamic data must be spoken, a Speech Synthesis engine may be used to play computer generated speech to the caller.
  • DTMF / Touch-tone Events: Rayo surfaces real-time events when the caller presses keys on their touch-tone keypad.
  • Speech Recognition: Enables the phone application to take spoken cues, allowing for sophisticated voice-driven menus and directory services.
  • Call Recording: Can be used to capture the caller’s voice (e.g. Voicemail) or both sides of the call for auditing and compliance purposes.
  • Mixing: Typically referred to as an audio “conference”; calls can be joined together so that the participants can hear each other in real-time.

Clients and Servers

Rayo is a client/server protocol. The server interfaces with low level telephony components while the client controls calls by sending XML messages to the server. The communication between client and server is bidirectional with the client sending commands to the server and the server sending events to the client.

Server to Client

Server informing client of a new incoming call (Offer Event)

Client to Server

Client telling server to answer the call (Answer Command)

For a complete list of Rayo message types check out the chapter titled “Rayo Specification”.

As previously mentioned, Rayo is a message-oriented XML protocol. In other words, commands and events are communicated between the client and server using XML documents. This basic design allows Rayo messages to be delivered using virtually any wire protocol. That said, few transports offer such an advanced level of message routing, security and domain federation as the Extensible Messaging and Presence Protocol (XMPP).

The following chapter introduces basic XMPP concepts and describes best practices for tunneling Rayo over XMPP.

Rayo over XMPP

XMPP provides a secure channel for routing small XML messages between networks in close to real-time. The key points here are routing and federation.

Rayo over XMPP

This diagram shows a call (‘c24a49’) being controlled by an XMPP client logged in as hello@acme.com. Don’t worry about the details here; the key concept is that hello@acme.com does not need to explicitly trust the rayo.org domain, nor does it need to establish a direct connection to rayo.org. Instead, the client domain (i.e. acme.com) creates a trust boundary on behalf of the client. Messages flow quickly and efficiently between these trust boundaries via persistent TCP sockets.

For a great introduction to XMPP, I highly recommend Chapter 2 of “XMPP: The Definitive Guide” (http://oreilly.com/catalog/9780596521264). Peter Saint-Andre, one of its authors, is the executive director of the XMPP Standards Foundation, very passionate about XMPP and related technologies, and a fantastic writer to boot.

Presence and Load Balancing

Presence is the ability of one XMPP user to know the availability of another in real time. Presence comes in handy for person-to-person communication. Think of a typical instant messaging network like Yahoo! or Skype. We rely on presence to know whether our contacts are available and, if not, to get some indication of when they’ll return (“Offline”, “Busy”, “BRB”, “Away”, etc.). Without presence, IM would be very chaotic.

What if we could apply presence to inter-system communication? Using presence for non-human communication is not only possible but encouraged by the XMPP community. Even though XMPP was initially designed for person to person communication, there are a growing number of protocols that leverage presence for smart message routing.

Rayo uses presence information to automatically distribute calls between clients. As new clients come online, their presence information flows to the Rayo domain and they are automatically put into routing.
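The routing idea can be sketched in a few lines of Python. This is only an illustration: the PresenceRouter class and its method names are hypothetical, and round-robin is an assumed policy, since the text does not say which distribution algorithm a Rayo server uses.

```python
from collections import deque


class PresenceRouter:
    """Hypothetical sketch: tracks online client JIDs and rotates offers among them."""

    def __init__(self):
        self._available = deque()

    def on_presence(self, jid, available):
        """Record a presence change for a client JID."""
        if available and jid not in self._available:
            self._available.append(jid)
        elif not available and jid in self._available:
            self._available.remove(jid)

    def next_client(self):
        """Return the client that should receive the next offer, or None."""
        if not self._available:
            return None
        jid = self._available[0]
        self._available.rotate(-1)  # move the chosen client to the back
        return jid
```

A client that goes offline is dropped from the rotation, so it stops receiving offers until it announces itself again.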

To demonstrate further, let’s use a fictitious cloud telephony service named Rayo.net. Presumably, Rayo.net has a web portal where developers can sign up, buy phone numbers, and map those phone numbers to an application.

First, the developer creates a new application by providing a name and an area code.

Provisioning Step 1

Next, the developer binds an XMPP address with the application.

Provisioning Step 2

When the developer selects “Finish”, the Rayo.net domain will initiate a presence subscription (http://xmpp.org/rfcs/rfc3921.html#sub) with the specified JID; in this case partycalls@acme.com.

<presence from="rayo.net" to="partycalls@acme.com" type="subscribe" />

If partycalls@acme.com is online it will receive the subscription request and complete the subscription by replying with "subscribed".

<presence from="partycalls@acme.com/1" to="rayo.net" type="subscribed" />

At this point the presence subscription is complete. The Rayo.net domain will be notified whenever a client for the “Party Line” application comes online, and it will distribute incoming call offers evenly across all available resources.
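As a rough sketch, the client side of this handshake could be written with Python's standard xml.etree.ElementTree module. The helper names presence and approve_subscription are hypothetical, and a real client would normally rely on an XMPP library to send and receive stanzas.

```python
import xml.etree.ElementTree as ET


def presence(from_jid, to_jid, presence_type):
    """Build a <presence/> stanza with the given type."""
    return ET.Element("presence", {"from": from_jid, "to": to_jid, "type": presence_type})


def approve_subscription(request_xml, full_jid):
    """Answer a 'subscribe' request with 'subscribed', swapping the addresses."""
    request = ET.fromstring(request_xml)
    if request.get("type") != "subscribe":
        raise ValueError("not a subscription request")
    return presence(full_jid, request.get("from"), "subscribed")


request = ET.tostring(presence("rayo.net", "partycalls@acme.com", "subscribe"),
                      encoding="unicode")
reply = approve_subscription(request, "partycalls@acme.com/1")
print(ET.tostring(reply, encoding="unicode"))
```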

Calls

The Rayo protocol primarily deals with calls. Inbound calls originate from the PSTN or via SIP and are offered to Rayo clients via XMPP using a Jabber Identifier (JID). Each call is in turn represented by its own unique JID, allowing a two-way conversation between the Rayo client and the server that's handling the call signaling and media.

JID Format

The JID follows a specific format. In XMPP the JID is constructed as

  <node>@<domain>/<resource>

For Rayo, the <node> portion of the JID always represents the call ID. The <resource>, when present, represents the affected command ID.
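A minimal sketch of splitting such a JID into its parts, assuming plain string handling is good enough for illustration (the RayoJid type and parse_rayo_jid function are hypothetical names, and no XMPP address normalization is attempted):

```python
from typing import NamedTuple, Optional


class RayoJid(NamedTuple):
    call_id: str
    domain: str
    resource: Optional[str]


def parse_rayo_jid(jid: str) -> RayoJid:
    """Split <node>@<domain>/<resource> into its parts; the resource is optional."""
    bare, _, resource = jid.partition("/")
    node, at, domain = bare.partition("@")
    if not at or not node or not domain:
        raise ValueError(f"not a call JID: {jid!r}")
    return RayoJid(node, domain, resource or None)
```

For example, parse_rayo_jid("9f00061@call.rayo.net/1") yields the call ID 9f00061, the domain call.rayo.net and the resource 1.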

Incoming Calls

  <!-- Message comes from the Call's JID -->
  <presence to='16577@app.rayo.net/1' from='9f00061@call.rayo.net/1'>
    <offer xmlns='urn:xmpp:rayo:1'
        to='tel:+18003211212'
        from='tel:+13058881212'>
      <!-- Signaling (e.g. SIP) Headers -->
      <header name='Via' value='192.168.0.1' />
      <header name='Contact' value='192.168.0.1' />
    </offer>
  </presence>
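A client has to pull a few facts out of an offer before deciding what to do with the call. The following Python sketch shows one way to do that with xml.etree.ElementTree; parse_offer is a hypothetical helper, and the stanza is parsed without the jabber:client namespace that a live XMPP stream would add.

```python
import xml.etree.ElementTree as ET

RAYO_NS = "urn:xmpp:rayo:1"

OFFER = """
<presence to='16577@app.rayo.net/1' from='9f00061@call.rayo.net/1'>
  <offer xmlns='urn:xmpp:rayo:1' to='tel:+18003211212' from='tel:+13058881212'>
    <header name='Via' value='192.168.0.1' />
    <header name='Contact' value='192.168.0.1' />
  </offer>
</presence>
"""


def parse_offer(stanza_xml):
    """Return the call JID, addresses and headers of an offer, or None."""
    stanza = ET.fromstring(stanza_xml)
    offer = stanza.find(f"{{{RAYO_NS}}}offer")
    if offer is None:
        return None
    headers = {h.get("name"): h.get("value")
               for h in offer.findall(f"{{{RAYO_NS}}}header")}
    return {
        "call_jid": stanza.get("from"),  # commands for this call go here
        "to": offer.get("to"),
        "from": offer.get("from"),
        "headers": headers,
    }


print(parse_offer(OFFER)["call_jid"])
```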

The Rayo client can now control the call by using one of the following commands.

  <!-- Accept (e.g. SIP 180/Ringing). Only applies to incoming calls. -->
  <iq type='set' to='9f00061@call.rayo.net/1' from='16577@app.rayo.net/1'>
    <accept xmlns='urn:xmpp:rayo:1'>
      <!-- Sample Headers (optional) -->
      <header name="x-skill" value="agent" />
      <header name="x-customer-id" value="8877" />
    </accept>
  </iq>

  <!-- Accept with early media (e.g. SIP 183). Only applies to incoming calls. -->
  <iq type='set' to='9f00061@call.rayo.net/1' from='16577@app.rayo.net/1'>
    <accept xmlns='urn:xmpp:rayo:1' earlyMedia='true'>
      <!-- Sample Headers (optional) -->
      <header name="x-skill" value="agent" />
      <header name="x-customer-id" value="8877" />
    </accept>
  </iq>
  
  <!-- Answer (e.g. SIP 200/OK). Only applies to incoming calls. -->
  <iq type='set' to='9f00061@call.rayo.net/1' from='16577@app.rayo.net/1'>
    <answer xmlns='urn:xmpp:rayo:1'>    
      <!-- Sample Headers (optional) -->
      <header name="x-skill" value="agent" />
      <header name="x-customer-id" value="8877" />
    </answer>
  </iq>

  <!-- Redirect (e.g. SIP 302/Moved Temporarily). Only applies to incoming calls. -->
  <iq type='set' to='9f00061@call.rayo.net/1' from='16577@app.rayo.net/1'>
    <redirect to='tel:+14152226789' xmlns='urn:xmpp:rayo:1'>    
      <!-- Sample Headers (optional) -->
      <header name="x-skill" value="agent" />
      <header name="x-customer-id" value="8877" />
    </redirect>
  </iq>
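The four commands above share one shape: an IQ of type "set" addressed to the call's JID, wrapping a single command element with optional headers. A hedged Python sketch of a builder for that shape follows; call_command is a hypothetical helper, not part of any Rayo library.

```python
import xml.etree.ElementTree as ET

RAYO_NS = "urn:xmpp:rayo:1"


def call_command(name, call_jid, client_jid, attrs=None, headers=None):
    """Wrap a Rayo command element in an <iq type='set'/> addressed to the call."""
    iq = ET.Element("iq", {"type": "set", "to": call_jid, "from": client_jid})
    command = ET.SubElement(iq, f"{{{RAYO_NS}}}{name}", attrs or {})
    for header_name, value in (headers or {}).items():
        ET.SubElement(command, f"{{{RAYO_NS}}}header",
                      {"name": header_name, "value": value})
    return iq


answer = call_command("answer", "9f00061@call.rayo.net/1", "16577@app.rayo.net/1",
                      headers={"x-skill": "agent"})
redirect = call_command("redirect", "9f00061@call.rayo.net/1", "16577@app.rayo.net/1",
                        attrs={"to": "tel:+14152226789"})
# ElementTree serializes the namespace with a generated prefix,
# which is equivalent XML to the default-namespace form shown above.
print(ET.tostring(answer, encoding="unicode"))
```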

A call can also be rejected. Rejections can include an optional rejection reason. Rejection reasons are one of <busy/>, <decline/> or <error/>. If not specified, <decline/> is used as the default reason.

  <!-- Decline (e.g. SIP 603/Decline). Only applies to incoming calls. -->
  <iq type='set' to='9f00061@call.rayo.net/1' from='16577@app.rayo.net/1'>
    <reject xmlns='urn:xmpp:rayo:1'>
      <decline />
      <!-- Sample Headers (optional) -->
      <header name="x-reason-internal" value="bad-skill" />
    </reject>
  </iq>

  <!-- Busy (e.g. SIP 486/Busy). Only applies to incoming calls. -->
  <iq type='set' to='9f00061@call.rayo.net/1' from='16577@app.rayo.net/1'>
    <reject xmlns='urn:xmpp:rayo:1'>
      <busy />
      <!-- Sample Headers (optional) -->
      <header name="x-busy-detail" value="out of licenses" />
    </reject>
  </iq>

  <!-- Error (e.g. SIP 500/Internal Server Error). Only applies to incoming calls. -->
  <iq type='set' to='9f00061@call.rayo.net/1' from='16577@app.rayo.net/1'>
    <reject xmlns='urn:xmpp:rayo:1'>
      <error />
      <!-- Sample Headers (optional) -->
      <header name="x-error-detail" value="some descriptive error message" />
    </reject>
  </iq>
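The rejection rules can be captured in a small Python sketch. The reject function is a hypothetical helper; it always writes the reason element explicitly, using "decline" when the caller gives none, which mirrors the documented default.

```python
import xml.etree.ElementTree as ET

RAYO_NS = "urn:xmpp:rayo:1"
REJECT_REASONS = ("busy", "decline", "error")


def reject(call_jid, client_jid, reason="decline"):
    """Build a <reject/> command carrying one of the three documented reasons."""
    if reason not in REJECT_REASONS:
        raise ValueError(f"unknown rejection reason: {reason}")
    iq = ET.Element("iq", {"type": "set", "to": call_jid, "from": client_jid})
    command = ET.SubElement(iq, f"{{{RAYO_NS}}}reject")
    ET.SubElement(command, f"{{{RAYO_NS}}}{reason}")
    return iq
```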

Outbound Calls

Rayo clients can initiate outbound calls using the <dial /> command.

  <!-- Handled by the domain controller which picks a random Rayo Server -->
  <iq type='set' to='call.rayo.net' from='16577@app.rayo.net/1'>
     <dial to='tel:+13055195825' from='tel:+14152226789' xmlns='urn:xmpp:rayo:1'>
        <header name="x-skill" value="agent" />
        <header name="x-customer-id" value="8877" />
     </dial>
  </iq>
  
  <iq type='result' to='16577@app.rayo.net/1' from='call.rayo.net'>
     <!-- The Call's ID -->
     <ref id='9f00061' xmlns='urn:xmpp:rayo:1' />
  </iq>
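Once the result arrives, the client needs an address for the new call. The Python sketch below assumes that the call's bare JID is the returned ID combined with the domain that answered the dial request; call_jid_from_dial_result is a hypothetical helper, and it accepts a ref element with or without a namespace.

```python
import xml.etree.ElementTree as ET


def call_jid_from_dial_result(result_xml):
    """Combine the <ref/> id with the responding domain to address the new call."""
    result = ET.fromstring(result_xml)
    if result.get("type") != "result":
        raise ValueError("dial did not succeed")
    # Compare local names only, so a namespaced <ref/> matches as well.
    ref = next((child for child in result
                if child.tag.split("}")[-1] == "ref"), None)
    if ref is None:
        raise ValueError("result carries no <ref/>")
    return f"{ref.get('id')}@{result.get('from')}"
```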

The client will then begin to receive progress events as the call makes its way through the network.

  <!-- Far end has accepted the call and is ringing (e.g. 180/Ringing) -->
  <presence to='16577@app.rayo.net/1' from='9f00061@call.rayo.net/1'>
    <ringing xmlns='urn:xmpp:rayo:1' />
  </presence>
  
  <!-- The outgoing call has been answered (e.g. 200/OK) -->
  <presence to='16577@app.rayo.net/1' from='9f00061@call.rayo.net/1'>
    <answered xmlns='urn:xmpp:rayo:1' />
  </presence>

If for some reason the call is not accepted by the far end, the Rayo client will receive an <end/> event indicating the reason for the failure.

  <!-- Dial destination did not answer within the timeout period -->
  <presence to='16577@app.rayo.net/1' from='9f00061@call.rayo.net/1'>
    <end xmlns='urn:xmpp:rayo:1'>    
      <timeout />
    </end>
  </presence>
  
  <!-- Dial destination is busy and cannot answer the call -->
  <presence to='16577@app.rayo.net/1' from='9f00061@call.rayo.net/1'>
    <end xmlns='urn:xmpp:rayo:1'>    
      <busy />
    </end>
  </presence>

  <!-- Dial destination rejected the call -->
  <presence to='16577@app.rayo.net/1' from='9f00061@call.rayo.net/1'>
    <end xmlns='urn:xmpp:rayo:1'>    
      <reject />
    </end>
  </presence>

  <!-- Rayo encountered a system error while dialing -->
  <presence to='16577@app.rayo.net/1' from='9f00061@call.rayo.net/1'>
    <end xmlns='urn:xmpp:rayo:1'>    
      <error>Lucy, you got some 'splainin to do</error>
    </end>
  </presence>

Note: A Rayo <end/> indicates that the call has been disconnected and that no more events are possible for this call. Therefore, the <end/> event is a perfect point for clients to clean up resources related to controlling the call.
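A client typically wants to know why a call ended before cleaning up. The Python sketch below reads the first child of the end element as the reason; end_reason is a hypothetical helper, and it returns any text inside the reason (such as an error message) as a second value.

```python
import xml.etree.ElementTree as ET

RAYO_NS = "urn:xmpp:rayo:1"


def end_reason(stanza_xml):
    """Return (reason, detail) for an <end/> event, or None for other stanzas."""
    stanza = ET.fromstring(stanza_xml)
    end = stanza.find(f"{{{RAYO_NS}}}end")
    if end is None:
        return None
    if len(end) == 0:
        return (None, None)  # an <end/> without a reason element
    reason = end[0]
    name = reason.tag.split("}")[-1]  # strip the namespace part
    detail = (reason.text or "").strip() or None
    return (name, detail)
```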

Handling caller hangup

If the caller hangs up, Rayo will produce an <end/> event with a <hangup/> reason like so:

<presence to='16577@app.rayo.net/1' from='9f00061@call.rayo.net/1'>
  <end xmlns='urn:xmpp:rayo:1'>    
    <hangup/>
  </end>
</presence>

Forcing a call to end

A Rayo client can force a call to end by sending a <hangup/> command to the call's JID.

<iq type='set' to='9f00061@call.rayo.net/1' from='16577@app.rayo.net/1'>
  <hangup xmlns='urn:xmpp:rayo:1'>    
    <!-- Sample Headers (optional) -->
    <header name="x-reason-internal" value="bad-skill" />
  </hangup>
</iq>

NOTE: The client will still receive an <end/> event indicating that the call has been disconnected and that no further events or commands are possible.

Components

Components extend the Rayo protocol by providing additional media and call control functionality.

Components are started by sending a specialized command to the Rayo server. This example shows the use of the <say xmlns='urn:xmpp:tropo:say:1'/> component. Don't worry about the specifics of the <say/> element for now. We'll discuss each component in detail in the following chapters. The key point here is that a component request is being sent to the call's JID.

NOTE: You can easily spot a component request because the namespace will be in the format urn:xmpp:rayo:NAMESPACE:COMPONENT_NAME:1

  <iq type='set' to='9f00061@call.rayo.net/1' from='16577@app.rayo.net/1'>
    <say xmlns='urn:xmpp:tropo:say:1' 
      voice='allison'>
      <audio url='http://acme.com/greeting.mp3'>
          Thanks for calling ACME company
      </audio>
      <audio url='http://acme.com/package-shipped.mp3'>
          Your package was shipped on
      </audio>
      <say-as interpret-as='date'>12/01/2011</say-as>
    </say>
  </iq>

The Rayo server will validate the component request and attach a new instance of the component to the call. In the normal case, the client will immediately receive an IQ result containing the newly created component's ID. The component's ID is combined with the call's JID to control the component (e.g. pause, resume, stop, etc.) and to correlate events coming from the component.

A component's JID is calculated by combining the call's JID with the newly created component's ID like so: <call-id>@<rayo-domain>/<component-id>

  <!-- Server responds with a unique ID -->
  <iq type='result' to='16577@app.rayo.net/1' from='9f00061@call.rayo.net/1'>
     <ref id='fgh4590' xmlns='urn:xmpp:rayo:1' />
  </iq>
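The JID calculation can be sketched in Python as follows. The component_jid function is a hypothetical helper; it assumes the ref element is in the urn:xmpp:rayo:1 namespace, as in the sample result, and that any resource on the call's JID is replaced by the component ID.

```python
import xml.etree.ElementTree as ET


def component_jid(call_jid, result_xml):
    """Replace the call JID's resource with the component id from <ref/>."""
    result = ET.fromstring(result_xml)
    ref = result.find("{urn:xmpp:rayo:1}ref")
    if ref is None:
        raise ValueError("result carries no <ref/>")
    bare_call_jid = call_jid.split("/", 1)[0]  # <call-id>@<rayo-domain>
    return f"{bare_call_jid}/{ref.get('id')}"
```

With the sample result, the call JID 9f00061@call.rayo.net/1 and the component ID fgh4590 combine into 9f00061@call.rayo.net/fgh4590.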

NOTE: Remember that Rayo executes components asynchronously and in many cases more than one component can run at the same time. For example, you can have the <record xmlns='' /> component running throughout the entire call's lifetime while you interact with the user using the "say" and "ask" components, resulting in the entire call being recorded.

Component Commands

Components are controlled by sending command messages to their unique JID. The only command required by all components is the <stop/> command.

  <iq type='set' to='9f00061@call.rayo.net/fgh4590' from='16577@app.rayo.net/1'>
    <stop xmlns='urn:xmpp:rayo:1' />
  </iq>
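For completeness, here is a hedged Python sketch of building that stop command for a known component JID. The stop_component function is a hypothetical helper, and sending the stanza is left to whatever XMPP library the client uses.

```python
import xml.etree.ElementTree as ET


def stop_component(component_jid, client_jid):
    """Build the <stop/> command for a running component."""
    iq = ET.Element("iq", {"type": "set", "to": component_jid, "from": client_jid})
    ET.SubElement(iq, "{urn:xmpp:rayo:1}stop")
    return iq


stop = stop_component("9f00061@call.rayo.net/fgh4590", "16577@app.rayo.net/1")
print(ET.tostring(stop, encoding="unicode"))
```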

As you'll see in the following chapters, component developers can get very creative with the commands they support, allowing for some really interesting capabilities. Examples include pausing and resuming audio playback, and muting and unmuting the caller's microphone while in a conference.

Component Events

Events are specialized lifecycle messages that flow from a component instance to the Rayo client that's controlling the call. As you'll see in the following chapters, component events are very powerful and can provide great insight into a running application.

The only event required by all components is <complete xmlns='urn:xmpp:rayo:ext:1' />. This is an example complete event produced by the <say xmlns='urn:xmpp:tropo:say:1'/> component when audio playback has completed successfully.

  <presence to='16577@app.rayo.net/1' from='9f00061@call.rayo.net/fgh4590'>
   <complete xmlns='urn:xmpp:rayo:ext:1'>
     <success xmlns='urn:xmpp:tropo:say:complete:1' />
   </complete>
  </presence>
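To correlate a complete event with the component that produced it, a client can read the sender's JID and the reason element. The Python sketch below makes two assumptions: the event is sent from the component's JID to the client, and the complete element uses the urn:xmpp:rayo:ext:1 namespace shown in the sample stanza. The completion_reason function is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

COMPLETE_TAG = "{urn:xmpp:rayo:ext:1}complete"


def completion_reason(stanza_xml):
    """Return (component_jid, reason, reason_namespace), or None if not complete."""
    stanza = ET.fromstring(stanza_xml)
    complete = stanza.find(COMPLETE_TAG)
    if complete is None or len(complete) == 0:
        return None
    tag = complete[0].tag
    # ElementTree spells namespaced tags as "{namespace}name".
    namespace, name = tag[1:].split("}", 1) if tag.startswith("{") else ("", tag)
    return (stanza.get("from"), name, namespace)
```

The reason's namespace identifies the component type, so a client can dispatch on it when several kinds of components are running on the same call.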