VoiceMemo Application using MarkLogic Server

Author: Hari Krishna Dara
Date: 15-Dec-2010

Objective

Create a service that lets us create memos by calling a phone number and dictating the memo. The service should transcribe the memo and store it with enough metadata to make it searchable and retrievable through a web interface. It should be built on top of MarkLogic Server, using Twilio as the voice gateway.

Implementation

There are two main parts to the system:

  1. A Note Taker webservice, which has two components:

    • The Voice-memo webservice, which receives actions and callbacks from Twilio
    • The Twilio voice gateway frontend, which is in turn driven by TwiML generated by the Voice-memo webservice
  2. A MarkLogic application that allows us to browse, search and manage the stored memos.

Note taker webservice

The webservice provides several REST API calls for Twilio's server to use. When a user calls a pre-assigned phone number, Twilio answers the call and, with the help of TwiML returned by callbacks to our webservice, guides the user through the recording process. Twilio calls our webservice first when the recording is available, and again later when the transcription is ready. The webservice uses MarkLogic Server to create and update the memo as Twilio makes that information available. Here is a visualization of the entire process:

http://img189.imageshack.us/img189/2971/scheme.png

1. User initiates call

User first calls a predetermined phone number. The call is answered by Twilio and triggers the rest of the process.

2. Twilio sends SID to Note Taker

After the process is initiated, one of the first things Twilio does is to send a request to the "Voice URL", which is set to a predetermined URL of the Note Taker. The URL looks like this: http://host:port/voicememo/startmemo.

3. Instruct user to start recording

The response back to Twilio is a TwiML document that guides the user through the recording process. Here is what the XML looks like:

<Response>
    <Say>Hello. Please start recording your memo after the beep. Press # key when done.</Say>
    <Record transcribe="true" transcribeCallback="transcribedmemo"
        action="voicememo" maxLength="30" finishOnKey="#"/>
</Response>
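
For a service written outside XQuery, the same TwiML can be generated programmatically. The following is a minimal Java sketch, not the repository's implementation: the class StartMemoTwiml and its methods are hypothetical names, and the attribute values simply mirror the document above.

```java
/** Hypothetical helper (not from the repository) that renders the start-memo TwiML. */
class StartMemoTwiml {

    /** Escapes the five XML special characters so text is safe in element content and attributes. */
    static String escapeXml(String text) {
        return text.replace("&", "&amp;")   // must run first so later entities are not double-escaped
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;")
                   .replace("'", "&apos;");
    }

    /** Builds a Response document with a spoken prompt followed by a Record verb. */
    static String build(String prompt, String action, String transcribeCallback, int maxLengthSeconds) {
        return "<Response>\n"
             + "    <Say>" + escapeXml(prompt) + "</Say>\n"
             + "    <Record transcribe=\"true\""
             + " transcribeCallback=\"" + escapeXml(transcribeCallback) + "\""
             + " action=\"" + escapeXml(action) + "\""
             + " maxLength=\"" + maxLengthSeconds + "\""
             + " finishOnKey=\"#\"/>\n"
             + "</Response>";
    }

    public static void main(String[] args) {
        System.out.println(build(
            "Hello. Please start recording your memo after the beep. Press # key when done.",
            "voicememo", "transcribedmemo", 30));
    }
}
```

Escaping the prompt matters as soon as any caller-supplied value is spoken back, because unescaped "&" or "<" characters would make the TwiML document malformed.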

4. Instruct user to speak

Twilio plays the message in the Say element. Recording ends when the user presses the "#" key or when the 30-second limit is reached.

5. Recording ends

When recording ends, Twilio sends a request to the http://host:port/voicememo/recordedmemo URL that includes the RecordingUrl and RecordingDuration parameters. The RecordingUrl can be used to download the recorded audio.
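
As an illustration of how a handler might read these parameters, here is a minimal Java sketch. It assumes Twilio delivers them as a form-encoded body (the default for its webhook POST requests); the class CallbackParams is a hypothetical name and is not part of the repository.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: parses a form-encoded callback body into a parameter map. */
class CallbackParams {

    /** Splits "a=1&b=2" into decoded key/value pairs; a repeated key keeps its last value. */
    static Map<String, String> parse(String body) {
        Map<String, String> params = new LinkedHashMap<>();
        if (body == null || body.isEmpty()) {
            return params;
        }
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate stray separators such as "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(key), decode(value));
        }
        return params;
    }

    /** Percent-decodes one token; "+" is treated as a space, as in HTML form encoding. */
    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // The example.com URL is a placeholder, not a real Twilio recording URL.
        Map<String, String> p = parse(
            "CallSid=CA123&RecordingUrl=http%3A%2F%2Fexample.com%2Frec&RecordingDuration=12");
        System.out.println(p.get("RecordingUrl") + " (" + p.get("RecordingDuration") + " s)");
    }
}
```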

6. Create memo

At this point, the Note Taker webservice has the following information to add a new memo:

  • Recording of the memo
  • Duration of the recording
  • Timestamp (approximate)

The voice recording is retrieved from the RecordingUrl and included into the document as Base64 encoded text. While inserting the memo as a new XML document in MarkLogic, the following information is used:

  • The URI of the document is set to: /voicememo/<CallSid>.xml
  • The Category of the document is set to: /voicememo/<From>

7. Say goodbye to user

After the new memo has been inserted into the system, a TwiML document is returned that terminates the call gracefully. Here is what the XML looks like:

<Response>
    <Say>Your memo has been recorded.</Say>
    <Say>Goodbye.</Say>
</Response>

8. Call ends

Twilio plays the message that the memo has been recorded and disconnects the call after saying goodbye.

9. Transcription is available

When Twilio completes the transcription of the voice message, an asynchronous request is made to the URL http://host:port/voicememo/transcribedmemo with the parameters TranscriptionStatus and TranscriptionText.

10. Update Memo with transcription information

If TranscriptionStatus is "completed", the TranscriptionText is added to the memo that was created earlier. The unique CallSid identifies the XML document that needs to be updated. The TranscriptionStatus is added to the document in either case, with an empty TranscriptionText when a message fails to transcribe, because it is useful to know why a transcription is missing.

MarkLogic Application

Here are some useful queries to look up information in the voice memo database and help build an application:

Given a memo URI, retrieve its voice recording

fn:doc(fn:doc("/voicememo/<CallSid>.xml")//recordedVoiceDocURI/text())/node()

Retrieving the transcribed memos

Let us first write a function that outputs memos formatted as HTML:

declare function local:get_memos($xpathexpr as item()*) as item()
{
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head><title>List of Memos</title></head>
    <body>
    <table>
    <tr><th>Recorded At</th><th>Memo</th></tr>
    {
    for $memo in $xpathexpr
    return <tr><td>{fn:format-dateTime(xdmp:parse-dateTime("[Y0001]-[M01]-[D01]T[h01]:[m01]:[s01].[f1][Z]", $memo/*:recordedAt/text()),
                     "[Y01]/[M01]/[D01] [H01]:[m01]:[s01]:[f01]")}</td><td>{$memo/*:transcriptionText/text()}</td></tr>
    }
    </table>
    </body>
    </html>
}
;

We can now pass different XPath expressions to it. To return all memos that have a transcription available, try this:

let $xpathexpr := //*:voicememo[*:transcriptionStatus = "completed"]
return local:get_memos($xpathexpr)

To find all memos containing a specific word:

let $xpathexpr := //*:voicememo[contains(*:transcriptionText, "macy")]
return local:get_memos($xpathexpr)

VoiceMemo structure

The structure of an XML document representing a voice memo is as follows:

<voicememo>
    <recordedAt>2010-12-16T21:33:54.6173-05:00</recordedAt>
    <recordedVoiceDocURI>/voicememo/recording/CA21bc69b2af50e38b40d0bb93d43a8e04.xml</recordedVoiceDocURI>
    <recordedDuration>seconds</recordedDuration>
    <transcriptionStatus>status</transcriptionStatus>
    <transcriptionText>text</transcriptionText>
</voicememo>
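
To make the structure concrete, here is a minimal Java sketch that builds the initial form of such a document with the JDK's DOM API. It is an illustration only: the class InitialVoiceMemo is a hypothetical name, and the sketch assumes the memo is first created without a transcriptionText element and with a transcriptionStatus of "unavailable".

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical helper: builds the voicememo document as it looks before transcription arrives. */
class InitialVoiceMemo {

    static String build(String recordedAt, String recordedVoiceDocUri, int durationSeconds) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element memo = doc.createElement("voicememo");
            doc.appendChild(memo);
            addChild(memo, "recordedAt", recordedAt);
            addChild(memo, "recordedVoiceDocURI", recordedVoiceDocUri);
            addChild(memo, "recordedDuration", Integer.toString(durationSeconds));
            // Placeholder status until the transcription callback arrives.
            addChild(memo, "transcriptionStatus", "unavailable");

            // Serialize without the <?xml ...?> declaration.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build the memo document", e);
        }
    }

    /** Appends a child element with the given text; the DOM serializer handles XML escaping. */
    private static void addChild(Element parent, String name, String text) {
        Element child = parent.getOwnerDocument().createElement(name);
        child.setTextContent(text);
        parent.appendChild(child);
    }

    public static void main(String[] args) {
        System.out.println(build("2010-12-16T21:33:54.6173-05:00",
                "/voicememo/recording/CA21bc69b2af50e38b40d0bb93d43a8e04.xml", 17));
    }
}
```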

Adding VoiceMemos to MarkLogic Server

There are two distinct operations to perform when adding voice memos.

  • First, the voice memo is inserted with partial information using the xdmp:document-insert() function. The document includes a transcriptionStatus with a value of "unavailable", which is replaced once the transcription is available.

  • Second, the same memo is updated when more information (namely, the transcription) becomes available. The callback from Twilio with this information typically arrives a few seconds to a few minutes after the call completes. This step involves reconstructing the document URI and retrieving the document in order to:

    • Replace the transcriptionStatus node with the value of the "TranscriptionStatus" request parameter, using xdmp:node-replace().
    • Insert the transcriptionText node with the value of the "TranscriptionText" request parameter, using xdmp:node-insert-child().
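
The update step can be sketched outside MarkLogic as well. The Java example below applies the same two changes to an in-memory copy of the memo using the JDK's DOM API; it is an analogy to xdmp:node-replace() and xdmp:node-insert-child(), not the repository's code, and the class TranscriptionUpdate is a hypothetical name. It assumes the text is stored only when the status is "completed" and left empty otherwise.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: applies a transcription callback to a voicememo document. */
class TranscriptionUpdate {

    static String apply(String memoXml, String status, String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(memoXml)));
            Element memo = doc.getDocumentElement();

            // Step 1: replace the existing transcriptionStatus node (or add one if missing).
            Element newStatus = doc.createElement("transcriptionStatus");
            newStatus.setTextContent(status);
            NodeList existing = memo.getElementsByTagName("transcriptionStatus");
            if (existing.getLength() > 0) {
                Node old = existing.item(0);
                old.getParentNode().replaceChild(newStatus, old);
            } else {
                memo.appendChild(newStatus);
            }

            // Step 2: insert transcriptionText as a new child; empty unless transcription completed.
            boolean completed = "completed".equalsIgnoreCase(status);
            Element newText = doc.createElement("transcriptionText");
            newText.setTextContent(completed && text != null ? text : "");
            memo.appendChild(newText);

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not update the memo document", e);
        }
    }

    public static void main(String[] args) {
        String before = "<voicememo><transcriptionStatus>unavailable</transcriptionStatus></voicememo>";
        System.out.println(apply(before, "completed", "Buy milk"));
    }
}
```

Unlike the in-memory sketch, the MarkLogic update functions modify the stored document inside a transaction, so no separate save step is needed there.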

Building REST services

The Note Taker is a REST-based webservice that provides APIs for Twilio to call for actions and callbacks. There are several approaches to building such a service; for the current purpose, we use MarkLogic's ability to act as an application server and build the application from XQuery files. However, the application is NOT a web application: it does not serve HTML pages, but rather XML, specifically TwiML.

There are 3 URIs exposed by the webservice:

  • /voicememo/startmemo.xqy - Gets the conversation started. This is also the "Voice URL" for Twilio.
  • /voicememo/recordedmemo.xqy - Creates the memo with voice recording information.
  • /voicememo/transcribedmemo.xqy - Updates the memo with transcribed message.

Each of these resources is served by a distinct XQuery file.

Code snippets

Say Hello with Twilio

Here is a simple XQuery file that serves TwiML. Save it as an .xqy file and set its URL as the "Voice URL" for Twilio. Make sure that security is disabled [2] so that Twilio can access the URL without requiring any credentials:

xquery version "1.0-ml";

let $callerCity := xdmp:get-request-field("CallerCity", "Unknown City")

return <Response><Say>Hello caller, from {$callerCity}. We wish you a Merry Christmas. Goodbye.</Say></Response>

Retrieve binary data from URL and insert as document

This code snippet shows how to retrieve binary data from a URL (such as the Twilio voice recording) and insert it as a binary document. It can be executed as is in CQ [1]:

xquery version "1.0-ml";
declare namespace foo = "xdmp:http";

(: A magnificent ant macro picture by gbohne from: http://www.flickr.com/photos/gbohne/5052878709/ :)
let $response := xdmp:http-get("http://farm5.static.flickr.com/4152/5052878709_44b4bc6430_o_d.jpg")
return xdmp:document-insert('/image/image1.xml', $response[2]/node(), (), '/image')

Retrieve a binary document from MarkLogic server

This code snippet shows how to retrieve the document inserted above. It can be executed as is in CQ [1]. When it runs, the browser prompts you to save the file; give it an appropriate name (say, image.jpg) and verify it by opening the file:

xquery version "1.0-ml";

doc('/image/image1.xml')/node()

Assumptions

  • A basic assumption that simplifies the logic is that, when recording ends, Twilio's action request always arrives before the transcription callback. This may well be Twilio's documented behavior; even if it is not, the assumption is probably safe.

Setting up runtime

The following instructions show how to set up a runtime environment for MarkLogic Server and deploy the XQuery files.

Setup an Amazon EC2 instance

Follow most of the guide Installing MarkLogic on an EC2 Micro Instance [3]. Use the AMI created by the blog author, Mike Brevoort, instead of the original from RightScale, to shortcut the steps needed to set up MarkLogic Server. Here is a summary of the steps:

  • If you don't already have an Amazon Web Services (AWS) account, first visit http://aws.amazon.com and set up a free account.

  • Sign up for the AWS free tier at http://aws.amazon.com/free/

    • You need a valid US phone number in order to sign up. The sign-up process verifies your identity by calling you back and asking you to enter a PIN generated by the signup page.
    • Subscription activation takes anywhere from a few seconds to several hours.
  • Log in to the AWS console at https://console.aws.amazon.com and click on the EC2 tab.

    • Click on Security Groups and add a new group called MarkLogic.

    • Make sure the MarkLogic group is selected.

      • Under allowed connections, select HTTP/TCP and enter a From Port of 8000 and a To Port of 8020 (MarkLogic runs on ports 8000 and 8001 by default; the extra ports leave room for additional MarkLogic servers). Leave the default Source so that anyone can connect. Click Save.
      • Also add SSH so that you can SSH into the box.
  • Click on Key Pairs and create a new key pair. Give a sensible name and click Create. Save the private key file that will start downloading after a moment.

  • Go back to EC2 Dashboard and click on Launch Instance to select an Amazon Machine Image (AMI).

    • Click Community AMIs and search for ami-4682752f and click Select. This is a 32-bit CentOS image.
    • Choose Micro for Instance Type
    • Click Continue and click Continue again with the defaults.
    • Enter a sensible name for the instance and click Continue.
    • Choose the Key Pair that was created earlier and click "Continue".
    • Choose the security group that was created before and click "Continue"
    • Click Launch on the review page. Click Close to close the wizard.
  • Go to "Instances" page

    • Click on the instance that we just created and note the public DNS value for the instance.
  • SSH into the instance.

    • Switch to a terminal as root user and use ssh with the -i option to specify the private key that we downloaded before using the command ssh -i <pem file> root@<instance public DNS name>.

Setup MarkLogic Server

If you followed the process described in the section Setup an Amazon EC2 instance, MarkLogic Server is already installed and running as a daemon. You can verify this by:

  • Visiting http://<ec2 public DNS name>:8001 from a browser; you should see the MarkLogic License Key Entry page.
  • Running the command /etc/rc.d/init.d/MarkLogic lsof, which lists the open files and sockets for the MarkLogic server.

The MarkLogic Server needs to be activated before it can be used. There is an option to request a license when you first connect to the Administration server. Follow these steps:

  • Click on Free under Get a license key and enter your details to get a free community license.
  • Click OK for the license and Accept license terms.
  • When you see the prompt This server must now self-install the initial databases and application servers. Click OK to continue., click OK.
  • Enter the details for the security setup: give a name for the admin user and select a realm.
  • You will now be prompted for the admin user's credentials.

Create a new Database

Log in to the Administration interface at port 8001 and perform the following steps:

Create a Forest

  • Click on the Forests node under Configure on the left-side of the page.
  • Select the Create tab on the right-side of the page.
  • Enter a name for the forest, such as MLProject, and click OK.

Create a Database

  • Click on the Databases node under Configure on the left-side of the page.
  • Select the Create tab on the right-side of the page.
  • Enter a name for the database, such as MLProject and click OK.
  • You will see a message that says This database has no forests, select Database->Forests to attach a forest. Click on the Database->Forests link.
  • Check the forest that you created earlier and click OK.

Check out the code from GitHub

  • Use this command to check out a read-only copy of the code: git clone git://github.com/haridsv/MLProject.git
  • A new directory named MLProject is created; its xquery subdirectory has the relevant code.

Create an HTTP Server

  • Click on the Groups node under Configure on the left-side of the page.
  • Click on the Default node and then on the App Servers node.
  • Select the Create HTTP tab on the right-side of the page.
  • Enter a name for the server
  • Enter the absolute path to the xquery directory that was checked out of github.
  • Enter 8020 for the port.
  • Select the MLProject from the database dropdown.
  • Select application-level from the authentication dropdown.
  • Select the admin user from the default user dropdown.
  • Click OK.

Verify Setup

To ensure that the XQuery files are accessible, visit http://<ec2 public DNS>:8020/voicememo/startmemo.xqy from a browser running on your local computer. You should see the response below (your browser might try to render the XML as HTML, so view the page source if in doubt):

<Response>
    <Say>Hello. Please start recording your memo after the beep. Press # key when done.</Say>
    <Record transcribe="true" transcribeCallback="transcribedmemo"
        action="voicememo" maxLength="30" finishOnKey="#"/>
</Response>

Install CQ

Download CQ [1] and expand the archive under the MLProject/xquery directory.

Setup Twilio

  • Visit http://www.twilio.com and click on Get Started.

  • Enter your first and last name, email address and a password to create a developer account.

  • Log into your Twilio developer account with the email address and password, and under Sandbox:

    • Note the phone number and PIN; you will need them to test the service.
    • Enter the http://<ec2 public DNS>:8020/voicememo/startmemo.xqy address into the Voice URL field and click Save.

Test the service

  • Call the phone number noted above and when prompted, enter the PIN.
  • Dictate a voice message. Speak slowly and clearly, in English. Twilio's voice transcription quality is not very good, so expect to see many mistakes.
  • Verify the memo in the database by using the queries under MarkLogic Application. Keep in mind that Twilio takes anywhere from a few seconds to a few minutes to make the transcription available, so the memo might not have that information initially.

References

[1] MarkLogic CQ is a web-based XQuery tool, available from http://developer.marklogic.com/code/cq
[2] To disable security, see the information posted here: http://markmail.org/thread/6ntgnwrjlrusq2ot
[3] Installing MarkLogic on an EC2 Micro Instance: http://blogs.avalonconsult.com/blog/generic/installing-marklogic-on-an-ec2-micro-instance-free-for-1-year/