An SDK for iOS mobile applications enabling use of the Bluemix Watson Speech To Text and Text To Speech APIs from Watson Developer Cloud.
The SDK includes support for recording and streaming audio and receiving a transcript of the audio in response.
Using the framework
- Download the watsonsdk.framework.zip and unzip it somewhere convenient
- Once unzipped, drag the watsonsdk.framework folder into your Xcode project view under the Frameworks folder.
Some additional iOS standard frameworks must be added.
- Select your project in the Xcode file explorer and open the "Build Phases" tab. Expand the "Link Binary With Libraries" section and click the + icon
- Add the following frameworks:
- AudioToolbox.framework
- AVFoundation.framework
- CFNetwork.framework
- CoreAudio.framework
- Foundation.framework
- libicucore.tbd (or libicucore.dylib on older versions)
- QuartzCore.framework
- Security.framework
Import the SDK headers in your code.
in Objective-C
#import <watsonsdk/SpeechToText.h>
#import <watsonsdk/STTConfiguration.h>
#import <watsonsdk/TextToSpeech.h>
#import <watsonsdk/TTSConfiguration.h>
in Swift
Add the headers above for Objective-C into a bridging header file.
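For example, a Swift project's bridging header (the file name is project-specific; MyApp-Bridging-Header.h below is only illustrative) would contain the same imports:
// MyApp-Bridging-Header.h (name depends on your project settings)
#import <watsonsdk/SpeechToText.h>
#import <watsonsdk/STTConfiguration.h>
#import <watsonsdk/TextToSpeech.h>
#import <watsonsdk/TTSConfiguration.h>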
This repository contains a sample application demonstrating the SDK functionality.
To run the application, clone this repository and then navigate in Finder to the folder containing the SDK files.
Double-click watsonsdk.xcodeproj to launch Xcode.
To run the sample application, change the compile target to 'watsonsdktest' and run it on the iPhone simulator.
Note that this is sample code and no security review has been performed on the code.
By default the Configuration will use the IBM Bluemix service API endpoint. A custom endpoint can be set using setApiURL (a short sketch follows the examples below); in most cases this is not required.
in Objective-C
STTConfiguration *conf = [[STTConfiguration alloc] init];
in Swift
let conf:STTConfiguration = STTConfiguration()
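If a custom endpoint is needed, it can be set on the configuration. This is a minimal sketch: the URL is only a placeholder, and setApiURL is assumed to accept the endpoint as a string.
[conf setApiURL:@"https://your-speech-to-text-host/speech-to-text/api"];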
There are currently two authentication options.
Basic Authentication, using the credentials provided by the Bluemix Service instance.
in Objective-C
[conf setBasicAuthUsername:@"<userid>"];
[conf setBasicAuthPassword:@"<password>"];
in Swift
conf.basicAuthUsername = "<userid>"
conf.basicAuthPassword = "<password>"
Token authentication can be used instead, for example when a token authentication provider is running at https://my-token-factory/token:
in Objective-C
[conf setTokenGenerator:^(void (^tokenHandler)(NSString *token)){
NSURL *url = [[NSURL alloc] initWithString:@"https://my-token-factory/token"];
NSMutableURLRequest *request = [[NSMutableURLRequest alloc] init];
[request setHTTPMethod:@"GET"];
[request setURL:url];
NSError *error = nil;
NSHTTPURLResponse *responseCode = nil;
NSData *oResponseData = [NSURLConnection sendSynchronousRequest:request returningResponse:&responseCode error:&error];
if ([responseCode statusCode] != 200) {
NSLog(@"Error getting %@, HTTP status code %i", url, [responseCode statusCode]);
return;
}
tokenHandler([[NSString alloc] initWithData:oResponseData encoding:NSUTF8StringEncoding]);
} ];
Create a SpeechToText instance using the configuration.
in Objective-C
@property (strong, nonatomic) SpeechToText *stt;
...
self.stt = [SpeechToText initWithConfig:conf];
in Swift
var stt: SpeechToText?
...
self.stt = SpeechToText.init(config: conf)
Get a list of the speech models supported by the service.
in Objective-C
[stt listModels:^(NSDictionary* jsonDict, NSError* err){
if(err == nil)
... read values from NSDictionary ...
}];
in Swift
stt!.listModels({
(jsonDict, err) in
if err == nil {
println(jsonDict)
}
})
Details of a specific speech recognition model can be obtained by passing its name to the listModel function.
[stt listModel:^(NSDictionary* jsonDict, NSError* err){
if(err == nil)
... read values from NSDictionary ...
} withName:@"WatsonModel"];
The speech recognition model can be changed in the configuration.
[conf setModelName:@"ja-JP_BroadbandModel"];
By default, audio sent to the server is uncompressed PCM-encoded data. Compressed audio using the Opus codec can be enabled instead.
[conf setAudioCodec:WATSONSDK_AUDIO_CODEC_TYPE_OPUS];
Start audio capture and transcription with the recognize function; results are delivered to the callback.
[stt recognize:^(NSDictionary* res, NSError* err){
if(err == nil)
result.text = [stt getTranscript:res];
else
result.text = [err localizedDescription];
}];
The app must indicate to the SDK when transcription should be ended.
NSError* error= [stt endRecognize];
if(error != nil)
NSLog(@"error is %@",error.localizedDescription);
The Speech to Text service's end-of-sentence detection can be used to detect that the user has stopped speaking; this is indicated in the transcription result. The app can use this to end the recognize operation automatically, as in the following code.
in Objective-C
// start recognize
[stt recognize:^(NSDictionary* res, NSError* err){
if(err == nil) {
if([self.stt isFinalTranscript:res]) {
NSLog(@"this is the final transcript");
[stt endRecognize];
}
result.text = [stt getTranscript:res];
} else {
result.text = [err localizedDescription];
}
}];
in Swift
self.stt.recognize({ (res: [NSObject:AnyObject]!, err: NSError!) -> Void in
if err == nil {
if self.stt.isFinalTranscript(res) {
NSLog("this is the final transcript");
self.stt.endRecognize()
}
result.text = self.stt.getTranscript(res);
} else {
result.text = err.localizedDescription;
}
});
A confidence score is available for any final transcript (whole sentence). It can be obtained by passing the result dictionary to getConfidenceScore.
[stt getConfidenceScore:res]
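A minimal sketch of reading the confidence inside the recognize callback; getConfidenceScore is assumed to return an NSNumber for final results:
[stt recognize:^(NSDictionary* res, NSError* err){
    if(err == nil && [stt isFinalTranscript:res]) {
        // assumed: getConfidenceScore returns an NSNumber for final transcripts
        NSNumber *confidence = [stt getConfidenceScore:res];
        NSLog(@"final transcript: %@ (confidence %@)", [stt getTranscript:res], confidence);
    }
}];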
The microphone input's power level can be obtained while recognition is running, for example to drive a simple level indicator in the UI.
[stt getPowerLevel:^(float power){
// use the power level to make a simple UIView graphic indicator
CGRect frm = self.soundbar.frame;
frm.size.width = 3*(70 + power);
self.soundbar.frame = frm;
self.soundbar.center = CGPointMake(self.view.frame.size.width / 2, self.soundbar.center.y);
}];
By default the Configuration will use the IBM Bluemix service API endpoint. A custom endpoint can be set using setApiURL (a short sketch follows the credentials example below); in most cases this is not required.
TTSConfiguration *conf = [[TTSConfiguration alloc] init];
[conf setBasicAuthUsername:@"<userid>"];
[conf setBasicAuthPassword:@"<password>"];
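As with Speech to Text, a custom endpoint can be set on the configuration. This is a sketch with a placeholder URL, assuming setApiURL accepts the endpoint as a string:
[conf setApiURL:@"https://your-text-to-speech-host/text-to-speech/api"];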
You can change the voice model used for TTS by setting it in the configuration.
in Objective-C
[conf setVoiceName:@"en-US_MichaelVoice"];
in Swift
conf.voiceName = "en-US_MichaelVoice"
If you use tokens (from your own server) to get access to the service, provide a token generator to the Configuration. The userid and password will not be used if a token generator is provided.
in Objective-C
[conf setTokenGenerator:^(void (^tokenHandler)(NSString *token)){
// get a token from your server in secure way
NSString *token = ...
// provide the token to the tokenHandler
tokenHandler(token);
}];
Create a TextToSpeech instance using the configuration.
self.tts = [TextToSpeech initWithConfig:conf];
Get a list of the voices supported by the service.
in Objective-C
[tts listVoices:^(NSDictionary* jsonDict, NSError* err){
if(err == nil)
... read values from NSDictionary ...
}];
in Swift
tts!.listVoices({
(jsonDict, err) in
if err == nil {
println(jsonDict)
}
})
Generate audio for some text and play it.
in Objective-C
[self.tts synthesize:^(NSData *data, NSError *err) {
// play audio and log when playing has finished
[self.tts playAudio:^(NSError *err) {
if(!err)
NSLog(@"audio finished playing");
else
NSLog(@"error playing audio %@",err.localizedDescription);
} withData:data];
} theText:@"Hello World"];
in Swift
tts!.synthesize({
(data, err) in
tts!.playAudio({
(err) in
... do something after the audio has played ...
}, withData: data)
}, theText: "Hello World")
If you get an error such as...
Undefined symbols for architecture x86_64
Check that all the required frameworks have been added to your project.
Find more open source projects on the IBM GitHub page.
Copyright 2015 IBM Corporation under the Apache 2.0 license.