An SDK for iOS mobile applications enabling use of the Bluemix Watson Speech To Text and Text To Speech APIs from Watson Developer Cloud.
The SDK includes support for recording and streaming audio and receiving a transcript of the audio in response.
Using the framework
- Download the watsonsdk.framework.zip and unzip it somewhere convenient
- Once unzipped, drag the watsonsdk.framework folder into your Xcode project view under the Frameworks folder.
Some additional iOS standard frameworks must be added.
- Select your project in the Xcode file explorer and open the "Build Phases" tab. Expand the "Link Binary With Libraries" section and click the + icon
- Add the following frameworks:
- AudioToolbox.framework
- AVFoundation.framework
- CFNetwork.framework
- CoreAudio.framework
- Foundation.framework
- libicucore.tbd (or libicucore.dylib on older versions)
- QuartzCore.framework
- Security.framework
Import the SDK headers in your code.
in Objective-C
#import <watsonsdk/SpeechToText.h>
#import <watsonsdk/STTConfiguration.h>
#import <watsonsdk/TextToSpeech.h>
#import <watsonsdk/TTSConfiguration.h>
in Swift
Add the headers above for Objective-C into a bridging header file.
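For example, a Swift project's bridging header (the file name is project-specific; MyApp-Bridging-Header.h below is only illustrative) would contain the same imports:
// MyApp-Bridging-Header.h (name depends on your project settings)
#import <watsonsdk/SpeechToText.h>
#import <watsonsdk/STTConfiguration.h>
#import <watsonsdk/TextToSpeech.h>
#import <watsonsdk/TTSConfiguration.h>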
This repository contains a sample application demonstrating the SDK functionality.
To run the application, clone this repository and then navigate in Finder to the folder containing the SDK files.
Double-click watsonsdk.xcodeproj to launch Xcode.
To run the sample application, change the compile target to 'watsonsdktest' and run it on the iPhone simulator.
Note that this is sample code and no security review has been performed on the code.
By default the Configuration will use the IBM Bluemix service API endpoint. A custom endpoint can be set using setApiURL (a short sketch follows the examples below); in most cases this is not required.
in Objective-C
STTConfiguration *conf = [[STTConfiguration alloc] init];
in Swift
let conf:STTConfiguration = STTConfiguration()
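If a custom endpoint is needed, it can be set on the configuration. This is a minimal sketch: the URL is only a placeholder, and setApiURL is assumed to accept the endpoint as a string.
[conf setApiURL:@"https://your-speech-to-text-host/speech-to-text/api"];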
There are currently two authentication options.
Basic Authentication, using the credentials provided by the Bluemix Service instance.
in Objective-C
[conf setBasicAuthUsername:@"<userid>"];
[conf setBasicAuthPassword:@"<password>"];
in Swift
conf.basicAuthUsername = "<userid>"
conf.basicAuthPassword = "<password>"
Token authentication can be used instead, for example when a token authentication provider is running at https://my-token-factory/token:
in Objective-C
[conf setTokenGenerator:^(void (^tokenHandler)(NSString *token)){
NSURL *url = [[NSURL alloc] initWithString:@"https://my-token-factory/token"];
NSMutableURLRequest *request = [[NSMutableURLRequest alloc] init];
[request setHTTPMethod:@"GET"];
[request setURL:url];
NSError *error = nil;
NSHTTPURLResponse *responseCode = nil;
NSData *oResponseData = [NSURLConnection sendSynchronousRequest:request returningResponse:&responseCode error:&error];
if ([responseCode statusCode] != 200) {
NSLog(@"Error getting %@, HTTP status code %i", url, [responseCode statusCode]);
return;
}
tokenHandler([[NSString alloc] initWithData:oResponseData encoding:NSUTF8StringEncoding]);
} ];
Create a SpeechToText instance using the configuration.
in Objective-C
@property (strong, nonatomic) SpeechToText *stt;
...
self.stt = [SpeechToText initWithConfig:conf];
in Swift
var stt: SpeechToText?
...
self.stt = SpeechToText.init(config: conf)
Get a list of the speech models supported by the service.
in Objective-C
[stt listModels:^(NSDictionary* jsonDict, NSError* err){
if(err == nil)
... read values from NSDictionary ...
}];
in Swift
stt!.listModels({
(jsonDict, err) in
if err == nil {
println(jsonDict)
}
})
Details of a specific speech recognition model can be obtained by passing its name to the listModel function.
[stt listModel:^(NSDictionary* jsonDict, NSError* err){
if(err == nil)
... read values from NSDictionary ...
} withName:@"WatsonModel"];
The speech recognition model can be changed in the configuration.
[conf setModelName:@"ja-JP_BroadbandModel"];
By default, audio sent to the server is uncompressed PCM-encoded data. Compressed audio using the Opus codec can be enabled instead.
[conf setAudioCodec:WATSONSDK_AUDIO_CODEC_TYPE_OPUS];
Start audio capture and transcription with the recognize function; results are delivered to the callback.
[stt recognize:^(NSDictionary* res, NSError* err){
if(err == nil)
result.text = [stt getTranscript:res];
else
result.text = [err localizedDescription];
}];
The app must indicate to the SDK when transcription should be ended.
NSError* error= [stt endRecognize];
if(error != nil)
NSLog(@"error is %@",error.localizedDescription);
The Speech to Text service's end-of-sentence detection can be used to detect that the user has stopped speaking; this is indicated in the transcription result. The app can use this to end the recognize operation automatically, as in the following code.
in Objective-C
// start recognize
[stt recognize:^(NSDictionary* res, NSError* err){
if(err == nil) {
if([self.stt isFinalTranscript:res]) {
NSLog(@"this is the final transcript");
[stt endRecognize];
}
result.text = [stt getTranscript:res];
} else {
result.text = [err localizedDescription];
}
}];
in Swift
self.stt.recognize({ (res: [NSObject:AnyObject]!, err: NSError!) -> Void in
if err == nil {
if self.stt.isFinalTranscript(res) {
NSLog("this is the final transcript");
self.stt.endRecognize()
}
result.text = self.stt.getTranscript(res);
} else {
result.text = err.localizedDescription;
}
});
A confidence score is available for any final transcript (whole sentence). It can be obtained by passing the result dictionary to getConfidenceScore.
[stt getConfidenceScore:res]
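A minimal sketch of reading the confidence inside the recognize callback; getConfidenceScore is assumed to return an NSNumber for final results:
[stt recognize:^(NSDictionary* res, NSError* err){
    if(err == nil && [stt isFinalTranscript:res]) {
        // assumed: getConfidenceScore returns an NSNumber for final transcripts
        NSNumber *confidence = [stt getConfidenceScore:res];
        NSLog(@"final transcript: %@ (confidence %@)", [stt getTranscript:res], confidence);
    }
}];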
The microphone input's power level can be obtained while recognition is running, for example to drive a simple level indicator in the UI.
[stt getPowerLevel:^(float power){
// use the power level to make a simple UIView graphic indicator
CGRect frm = self.soundbar.frame;
frm.size.width = 3*(70 + power);
self.soundbar.frame = frm;
self.soundbar.center = CGPointMake(self.view.frame.size.width / 2, self.soundbar.center.y);
}];
By default the Configuration will use the IBM Bluemix service API endpoint. A custom endpoint can be set using setApiURL (a short sketch follows the credentials example below); in most cases this is not required.
TTSConfiguration *conf = [[TTSConfiguration alloc] init];
[conf setBasicAuthUsername:@"<userid>"];
[conf setBasicAuthPassword:@"<password>"];
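As with Speech to Text, a custom endpoint can be set on the configuration. This is a sketch with a placeholder URL, assuming setApiURL accepts the endpoint as a string:
[conf setApiURL:@"https://your-text-to-speech-host/text-to-speech/api"];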
You can change the voice model used for TTS by setting it in the configuration.
in Objective-C
[conf setVoiceName:@"en-US_MichaelVoice"];
in Swift
conf.voiceName = "en-US_MichaelVoice"
If you use tokens (from your own server) to get access to the service, provide a token generator to the Configuration. The userid and password will not be used if a token generator is provided.
in Objective-C
[conf setTokenGenerator:^(void (^tokenHandler)(NSString *token)){
// get a token from your server in secure way
NSString *token = ...
// provide the token to the tokenHandler
tokenHandler(token);
}];
Create a TextToSpeech instance using the configuration.
self.tts = [TextToSpeech initWithConfig:conf];
Get a list of the voices supported by the service.
in Objective-C
[tts listVoices:^(NSDictionary* jsonDict, NSError* err){
if(err == nil)
... read values from NSDictionary ...
}];
in Swift
tts!.listVoices({
(jsonDict, err) in
if err == nil {
println(jsonDict)
}
})
Generate audio for some text and play it.
in Objective-C
[self.tts synthesize:^(NSData *data, NSError *err) {
// play audio and log when playing has finished
[self.tts playAudio:^(NSError *err) {
if(!err)
NSLog(@"audio finished playing");
else
NSLog(@"error playing audio %@",err.localizedDescription);
} withData:data];
} theText:@"Hello World"];
in Swift
tts!.synthesize({
(data, err) in
tts!.playAudio({
(err) in
... do something after the audio has played ...
}, withData: data)
}, theText: "Hello World")
If you get an error such as...
Undefined symbols for architecture x86_64
Check that all the required frameworks have been added to your project.
Find more open source projects on the IBM GitHub page.
Copyright 2015 IBM Corporation under the Apache 2.0 license.