API Wrapper for the Microsoft Azure Speech Services Speech-to-text REST API 3.1 (Cognitive Services).

azure_stt

Installation

Add this line to your application's Gemfile:

gem 'azure_stt'

And then execute:

bundle

Or install it yourself as:

gem install azure_stt

Azure Speech-to-text Subscription key

To use the gem, you need a subscription key, which you can generate from your Azure account.

  • If you don't have an Azure account, you can create one for free on this page.
  • Once logged in to the Azure portal, subscribe to Speech in Microsoft Cognitive Services.
  • Two subscription keys are available under 'RESOURCE MANAGEMENT > Keys' ('KEY 1' and 'KEY 2').

Usage

Configuration

Two environment variables are used:

  • 'REGION': the region of your subscription

  • 'SUBSCRIPTION_KEY': the API key generated from your Azure account.

See the file env.sample for example values and replace them with your own. If you prefer not to use environment variables, you can set the values in code:

AzureSTT.configure do |config|
  config.region = 'your_region'
  config.subscription_key = 'your_key'
end

Finally, AzureSTT::Session uses the configured values by default, but you can initialize a session with custom values:

session = AzureSTT::Session.new(region: 'your_region', subscription_key: 'your_key')
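If you rely on the environment variables, a missing value may only surface once the first request fails. The sketch below is a hypothetical helper, not part of the gem: it reads REGION and SUBSCRIPTION_KEY, raises early if either is missing or empty, and returns a hash whose keys match the keyword arguments of AzureSTT::Session.new shown above.

```ruby
# Hypothetical helper (not part of the azure_stt gem): collect the two
# settings this README describes and fail fast if either one is missing.
# `env` defaults to ENV but accepts any hash, which keeps it easy to test.
def azure_stt_settings(env = ENV)
  required = %w[REGION SUBSCRIPTION_KEY]
  missing = required.reject { |name| env[name] && !env[name].empty? }
  unless missing.empty?
    raise ArgumentError, "Missing environment variables: #{missing.join(', ')}"
  end

  # Keys mirror AzureSTT::Session.new(region:, subscription_key:)
  { region: env['REGION'], subscription_key: env['SUBSCRIPTION_KEY'] }
end
```

You could then write `session = AzureSTT::Session.new(**azure_stt_settings)`, which passes the two values as the keyword arguments used in the example above.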

Start a transcription

require 'azure_stt'

properties = {
  "diarizationEnabled" => false,
  "wordLevelTimestampsEnabled" => false,
  "punctuationMode" => "DictatedAndAutomatic",
  "profanityFilterMode" => "Masked"
}

content_urls = [ 'https://path.com/audio.ogg', 'https://path.com/audio1.ogg']

session = AzureSTT::Session.new

transcription = session.create_transcription(
  content_urls: content_urls,
  properties: properties,
  locale: 'en-US',
  display_name: 'The name of the transcription')

# You can then retrieve the results of your transcription with its id
puts transcription.id
# Outputs 'your_transcription_id'
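The service fetches the audio itself from the URLs passed as content_urls, so a malformed entry may only show up later as a failed transcription. As a hypothetical pre-flight check (not part of the gem, and no substitute for making sure the files are actually reachable), the sketch below uses Ruby's standard URI library to list the entries that are not absolute http(s) URLs.

```ruby
require 'uri'

# Hypothetical pre-flight check (not part of the azure_stt gem): return the
# entries that do not parse as absolute http:// or https:// URLs with a host.
def invalid_content_urls(urls)
  urls.reject do |url|
    uri = URI.parse(url)
    # URI::HTTPS is a subclass of URI::HTTP, so both schemes pass here
    uri.is_a?(URI::HTTP) && !uri.host.nil? && !uri.host.empty?
  rescue URI::InvalidURIError
    false # unparseable strings are kept in the "invalid" list
  end
end
```

You might call it right before session.create_transcription and abort if the returned array is not empty.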

Get a transcription

require 'azure_stt'

session = AzureSTT::Session.new

transcription = session.get_transcription('your_transcription_id')

# Returns
# #<AzureSTT::Transcription id="d35a802d-70ae-4358-a35d-b5faa0c75457"
#   model="" properties={"diarizationEnabled"=>false,
#   "wordLevelTimestampsEnabled"=>false, "channels"=>[0, 1],
#   "punctuationMode"=>"DictatedAndAutomatic", "profanityFilterMode"=>"Masked",
#   "duration"=>"PT5M18S"}
#   links={"files"=>"https://uscentral.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions/d35a802d-70ae-4358-a35d-b5faa0c75457/files"}
#   last_action_date_time=#<Date: 2020-05-31 ((2459366j,0s,0n),+0s,2299161j)> created_date_time=#<Date: 2020-05-31 ((2459366j,0s,0n),+0s,2299161j)>
#   status="Succeeded" locale="en-US" display_name="Transcription name" files=[]>

if transcription.succeeded?
  # You can then access the text, for instance:
  result = transcription.results.first
  puts result.text
end
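The properties in the output above include "duration"=>"PT5M18S", an ISO 8601 duration string. If you need it as a number, a minimal sketch such as the following converts it to seconds. It is a hypothetical helper, not part of the gem; it only handles the hours/minutes/seconds form seen here, and it assumes the hash is reachable as transcription.properties, as the inspect output suggests.

```ruby
# Hypothetical helper (not part of the azure_stt gem): convert a simple
# ISO 8601 time duration such as "PT5M18S" into seconds (as a Float).
# Only the H, M and (possibly fractional) S components are supported.
DURATION_PATTERN = /\APT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?\z/

def duration_in_seconds(duration)
  match = DURATION_PATTERN.match(duration)
  if match.nil? || duration == 'PT'
    raise ArgumentError, "Unsupported duration: #{duration.inspect}"
  end

  hours, minutes, seconds = match.captures
  # Missing components are nil; nil.to_i is 0 and nil.to_f is 0.0
  hours.to_i * 3600 + minutes.to_i * 60 + seconds.to_f
end

# Assumed usage: duration_in_seconds(transcription.properties['duration'])
```

Day or longer components (for example "P1DT2H") are rejected with an ArgumentError rather than silently misread.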

Delete a transcription

require 'azure_stt'

session = AzureSTT::Session.new

deleted = session.delete_transcription('your_transcription_id')

The API does not appear to return a 404 error when the id is unknown; it always responds with 204 No Content. As a result, Session#delete_transcription returns true even when the transcription did not exist.
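Because of this, the return value of Session#delete_transcription cannot tell you whether anything was actually deleted. If you need that information, one option is to look the transcription up first. The sketch below is a hypothetical helper (not part of the gem) and rests on an assumption this README does not confirm: that session.get_transcription raises an error for an unknown id. The error class is therefore a parameter, defaulting to StandardError.

```ruby
# Hypothetical helper (not part of the azure_stt gem). ASSUMPTION: the
# session's get_transcription raises `not_found_error` for an unknown id.
# Returns false if the lookup fails, otherwise the result of the delete call.
def delete_transcription_if_exists(session, id, not_found_error: StandardError)
  session.get_transcription(id)
rescue not_found_error
  false
else
  session.delete_transcription(id)
end
```

Rescuing StandardError also swallows network or authentication failures and reports them as "not found", so narrow not_found_error to the gem's actual error class once you know it. The lookup also costs an extra API call and is not atomic with the delete.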

Starting a transcription, fetching the results and deleting the transcription

require 'azure_stt'

session = AzureSTT::Session.new

properties = {
  "diarizationEnabled" => false,
  "wordLevelTimestampsEnabled" => false,
  "punctuationMode" => "DictatedAndAutomatic",
  "profanityFilterMode" => "Masked"
}

content_urls = [ 'https://path.com/audio.ogg' ]

transcription = session.create_transcription(
  content_urls: content_urls,
  properties: properties,
  locale: 'en-US',
  display_name: 'The name of the transcription')

id = transcription.id

until transcription.finished?
  sleep(30)
  transcription = session.get_transcription(id)
end

if transcription.succeeded?
  puts transcription.results.first.text
end

session.delete_transcription(id)
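The loop above polls forever if the transcription never reaches a finished state, and the final delete_transcription call is skipped if an exception is raised while waiting. A minimal sketch of a bounded alternative follows: poll_until is a hypothetical helper (not part of the gem) that retries a block until it returns a truthy value or a timeout expires, measured with a monotonic clock so wall-clock adjustments do not affect it.

```ruby
# Hypothetical helper (not part of the azure_stt gem).
class PollTimeout < StandardError; end

# Calls the block every `interval` seconds until it returns a truthy value,
# which is then returned; raises PollTimeout once `timeout` seconds elapse.
def poll_until(timeout:, interval:)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result

    remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    raise PollTimeout, "gave up after #{timeout} seconds" if remaining <= 0

    sleep([interval, remaining].min) # never sleep past the deadline
  end
end

# Assumed usage with the session and id from the example above:
#   transcription = poll_until(timeout: 3600, interval: 30) do
#     current = session.get_transcription(id)
#     current if current.finished?
#   end
```

Wrapping the polling in begin ... ensure, with session.delete_transcription(id) in the ensure clause, would make the cleanup run even when PollTimeout is raised. The timeout and interval values in the comment are placeholders, not recommendations from the Azure documentation.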

Development

After checking out the repo, run bin/setup to install dependencies. You can also run bin/console for an interactive prompt that will allow you to experiment.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/PerfectMemory/azure_stt. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

Code of Conduct

Everyone interacting in the AzureStt project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.
