# The Speech Service

One of the more challenging tasks for deep-learning models is processing human speech. Azure Cognitive Services includes a [Speech](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/overview) service that converts text to speech, speech to text, and more. It’s even capable of [captioning recorded videos and live video streams](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/captioning-concepts?pivots=programming-language-python) and filtering out profanity as it does so. A Python SDK simplifies the code you write and makes it remarkably easy to incorporate speech into your apps.

To demonstrate, install the package named [`azure-cognitiveservices-speech`](https://pypi.org/project/azure-cognitiveservices-speech/) containing the Speech SDK. Use the Azure Portal to create a Cognitive Services Speech resource and make note of the subscription key and service region. Then run the following code after replacing `KEY` with the subscription key and `REGION` with the region you selected:

In [1]:
from azure.cognitiveservices import speech

# KEY and REGION come from the Speech resource you created in the Azure portal
speech_config = speech.SpeechConfig(subscription=KEY, region=REGION)
speech_config.speech_recognition_language = 'en-US'
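
If you would rather not paste the subscription key into a notebook, one option is to read it from environment variables. The following is a minimal sketch rather than part of the SDK: the helper `read_speech_settings` is hypothetical, and the variable names `SPEECH_KEY` and `SPEECH_REGION` are assumptions you can change to suit your setup.

```python
import os

def read_speech_settings(env=os.environ):
    """Return (key, region) from a mapping, failing loudly if either is missing."""
    # SPEECH_KEY and SPEECH_REGION are assumed names chosen for this sketch.
    names = ('SPEECH_KEY', 'SPEECH_REGION')
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError('Missing environment variables: ' + ', '.join(missing))
    return env['SPEECH_KEY'], env['SPEECH_REGION']

# Usage in the notebook (the variables must be set before Jupyter starts):
# KEY, REGION = read_speech_settings()
# speech_config = speech.SpeechConfig(subscription=KEY, region=REGION)
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to try without touching your real environment.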

Now run the following statements and when prompted, speak into your microphone. This example creates a [`SpeechRecognizer`](https://docs.microsoft.com/en-us/python/api/azure-cognitiveservices-speech/azure.cognitiveservices.speech.speechrecognizer?view=azure-python) object and uses its [`recognize_once_async`](https://docs.microsoft.com/en-us/python/api/azure-cognitiveservices-speech/azure.cognitiveservices.speech.speechrecognizer?view=azure-python#azure-cognitiveservices-speech-speechrecognizer-recognize-once-async) method to convert up to 30 seconds of live audio from your PC’s default microphone into text. Observe that the text doesn’t appear until you’ve finished speaking:

In [2]:
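# With no audio configuration supplied, input comes from the default microphone
recognizer = speech.SpeechRecognizer(speech_config=speech_config)

print('Speak into your microphone')
# get() blocks until a single utterance has been recognized
result = recognizer.recognize_once_async().get()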

if result.reason == speech.ResultReason.RecognizedSpeech:
    print(result.text)

Speak into your microphone
When will your new book be published?
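
The cell above prints nothing if recognition fails, which can be confusing when the microphone is muted or the key is wrong. Here is a hedged sketch of how the other outcomes might be reported. The helper `describe_recognition` is hypothetical, and it assumes that `result.reason` is a standard Python `Enum` whose members include `RecognizedSpeech`, `NoMatch`, and `Canceled`, and that canceled results expose `cancellation_details`.

```python
def describe_recognition(result):
    """Return a printable summary of a speech-recognition result."""
    # Comparing by member name assumes ResultReason is a standard Python Enum.
    reason = result.reason.name
    if reason == 'RecognizedSpeech':
        return result.text
    if reason == 'NoMatch':
        return 'No speech could be recognized'
    if reason == 'Canceled':
        details = result.cancellation_details
        return f'Canceled: {details.reason} {details.error_details}'
    return f'Unexpected result: {reason}'

# Usage with the recognizer created above:
# print(describe_recognition(recognizer.recognize_once_async().get()))
```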


How about converting text to speech? Here’s an example that uses the SDK’s `SpeechSynthesizer` class to vocalize a sentence. The synthesized voice belongs to an English speaker named Jenny (`en-US-JennyNeural`), one of more than [300 neural voices](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support?tabs=speechtotext#prebuilt-neural-voices) you can choose from:

In [3]:
speech_config.speech_synthesis_voice_name = 'en-US-JennyNeural'
synthesizer = speech.SpeechSynthesizer(speech_config)
synthesizer.speak_text_async('When will your new book be published?').get()

<azure.cognitiveservices.speech.SpeechSynthesisResult at 0x1787c9b42b0>
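
The object echoed above is the result of the synthesis call, and inspecting it tells you whether audio was actually produced. The sketch below shows one way to do that. The helper `summarize_synthesis` is hypothetical, and it assumes the result exposes `reason` (an `Enum` with `SynthesizingAudioCompleted` and `Canceled` members), `audio_data` (the synthesized bytes), and `cancellation_details`.

```python
def summarize_synthesis(result):
    """Return a one-line summary of a speech-synthesis result."""
    # Comparing by member name assumes ResultReason is a standard Python Enum.
    reason = result.reason.name
    if reason == 'SynthesizingAudioCompleted':
        return f'Synthesized {len(result.audio_data)} bytes of audio'
    if reason == 'Canceled':
        details = result.cancellation_details
        return f'Synthesis canceled: {details.reason} {details.error_details}'
    return f'Unexpected result: {reason}'

# Usage with the synthesizer created above:
# result = synthesizer.speak_text_async('When will your new book be published?').get()
# print(summarize_synthesis(result))
```

Assigning the return value to a variable also keeps the notebook from echoing the raw result object as cell output.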

All of the “speakers” are multilingual. If you ask a French speaker (for example, `fr-FR-CelesteNeural`) to synthesize an English sentence, the vocalization will feature a French accent.

You can combine a [`TranslationRecognizer`](https://docs.microsoft.com/en-us/python/api/azure-cognitiveservices-speech/azure.cognitiveservices.speech.translation.translationrecognizer?view=azure-python) object with a `SpeechSynthesizer` object to translate speech in real time. The following example takes spoken English as input and plays it back in French using the voice of a native French speaker:

In [4]:
speech_config.speech_synthesis_voice_name = 'fr-FR-YvetteNeural'
synthesizer = speech.SpeechSynthesizer(speech_config)

# The translation config takes the same key and region as the speech config
translation_config = speech.translation.SpeechTranslationConfig(subscription=KEY, region=REGION)
translation_config.speech_recognition_language = 'en-US'
translation_config.add_target_language('fr')

recognizer = speech.translation.TranslationRecognizer(translation_config)

print('Speak into your microphone')
result = recognizer.recognize_once_async().get()

if result.reason == speech.ResultReason.TranslatedSpeech:
    text = result.translations['fr']
    synthesizer.speak_text_async(text).get()

Speak into your microphone
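
Because `result.translations` is keyed by language code, the same approach can be extended to several target languages. The following sketch assumes that `add_target_language` can be called once per language. The `VOICES` table and the `plan_playback` helper are hypothetical names introduced for illustration, and the voice names should be checked against the current voice list.

```python
# Hypothetical mapping from target-language code to a neural voice name.
VOICES = {
    'fr': 'fr-FR-YvetteNeural',
    'de': 'de-DE-KatjaNeural',
    'es': 'es-ES-ElviraNeural',
}

def plan_playback(translations, voices=VOICES):
    """Pair each translated sentence with a voice, skipping unmapped languages."""
    return [(voices[lang], text)
            for lang, text in translations.items()
            if lang in voices]

# Usage with the SDK objects from the cell above (not executed here):
# for lang in VOICES:
#     translation_config.add_target_language(lang)
# recognizer = speech.translation.TranslationRecognizer(translation_config)
# result = recognizer.recognize_once_async().get()
# for voice, text in plan_playback(result.translations):
#     speech_config.speech_synthesis_voice_name = voice
#     speech.SpeechSynthesizer(speech_config).speak_text_async(text).get()
```

Keeping the voice lookup separate from the SDK calls makes it possible to check the mapping without a microphone or a subscription key.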


These samples use your PC’s default microphone for voice input and default speakers for output. You can specify other sources of input by passing an [`AudioConfig`](https://docs.microsoft.com/en-us/python/api/azure-cognitiveservices-speech/azure.cognitiveservices.speech.audio.audioconfig?view=azure-python) object to the `SpeechRecognizer` or `TranslationRecognizer` constructor, and other destinations for output by passing an `AudioOutputConfig` object to the `SpeechSynthesizer` constructor. Among the options this enables is using a file or stream rather than a microphone as the source of input.
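
As an illustration of file input, here is a minimal sketch that transcribes a single utterance from a WAV file. The function name `recognize_wav` is hypothetical, and the code assumes that `AudioConfig` accepts a `filename` argument and that `SpeechRecognizer` accepts an `audio_config` argument. The SDK is imported inside the function so that the file check runs even on a machine where the package isn't installed.

```python
import os

def recognize_wav(path, key, region):
    """Transcribe one utterance from a WAV file instead of the microphone."""
    if not os.path.isfile(path):
        raise FileNotFoundError(path)

    # Imported lazily so the check above works without the SDK installed.
    from azure.cognitiveservices import speech

    speech_config = speech.SpeechConfig(subscription=key, region=region)
    speech_config.speech_recognition_language = 'en-US'

    # Read audio from the file rather than the default microphone.
    audio_config = speech.audio.AudioConfig(filename=path)
    recognizer = speech.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config)

    result = recognizer.recognize_once_async().get()
    if result.reason == speech.ResultReason.RecognizedSpeech:
        return result.text
    return None

# Usage: print(recognize_wav('question.wav', KEY, REGION))
```

Output can be redirected along the same lines. For example, constructing the synthesizer with `audio_config=speech.audio.AudioOutputConfig(filename='answer.wav')` should write the synthesized speech to a file instead of playing it, though that call is likewise an assumption to verify against the SDK reference.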