# Speech

Increasingly, we expect to be able to communicate with artificial intelligence (AI) systems by talking to them, often with the expectation of a spoken response.

![A robot speaking](./images/speech.jpg)

*Speech recognition* (an AI system interpreting spoken language) and *speech synthesis* (an AI system generating a spoken response) are the key components of a speech-enabled AI solution.

## Create a Cognitive Services resource

To build software that can interpret audible speech and respond verbally, you can use the **Speech** cognitive service, which provides a simple way to transcribe spoken language into text and vice versa.

If you don't already have one, use the following steps to create a **Cognitive Services** resource in your Azure subscription:

> **Note**: If you already have a Cognitive Services resource, just open its **Quick start** page in the Azure portal and copy its key and location into the code cell below. Otherwise, follow the steps below to create one.

1. In another browser tab, open the Azure portal at https://portal.azure.com and sign in with the lab credentials.
2. Click the **&#65291;Create a resource** button, search for *Cognitive Services*, and create a **Cognitive Services** resource with the following settings:
    - **Subscription**: *Select the existing subscription where you are performing the lab*.
    - **Resource group**: *Select the existing resource group*.
    - **Region**: *Choose any available region or the region where the resource group is deployed*.
    - **Name**: *speech-uniqueID*. You can find the uniqueID value on the **Environment details** tab of the lab environment.
    - **Pricing tier**: S0
    - **I confirm I have read and understood the notices**: Selected.

3. Click **Review + Create**. After the template passes validation, click **Create** to create the Cognitive Services resource.
4. Wait for the deployment to complete. Then go to your Cognitive Services resource and, on the **Overview** page, click the link to manage the keys for the service. You will need the key and location to connect to your Cognitive Services resource from client applications.

### Get the Key and Endpoint for your Cognitive Services resource

To use your cognitive services resource, client applications need its endpoint and authentication key:

1. In the Azure portal, on the **Keys and Endpoint** page for your cognitive services resource, copy **Key1** and paste it into the code below, replacing **YOUR_COG_KEY**.
2. Note the **Endpoint** for your resource. The code below connects with the key and location only, so it contains no **YOUR_COG_ENDPOINT** placeholder to replace.
3. Copy the **Location** for your resource and paste it into the code below, replacing **YOUR_COG_LOCATION**.
4. Run the code below by clicking the **Run cell** (&#9655;) button to the left of the cell.

In [None]:
#Replace YOUR_COG_KEY and YOUR_COG_LOCATION with the cognitive service key and location values.

cog_key = 'YOUR_COG_KEY'
cog_location = 'YOUR_COG_LOCATION'

print('Ready to use cognitive services in {} using key {}'.format(cog_location, cog_key))
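
Pasting a key directly into a notebook makes it easy to share the key by accident, and the cell above also prints it in full. As a minimal sketch of an alternative, the code below reads the two values from environment variables and masks the key before printing. The variable names `COG_KEY` and `COG_LOCATION` and the helpers `load_speech_settings` and `mask_key` are hypothetical names chosen for this illustration; they are not part of the Speech SDK.

```python
import os


def load_speech_settings(default_key="YOUR_COG_KEY", default_location="YOUR_COG_LOCATION"):
    """Return (key, location), preferring environment variables over the placeholders."""
    # COG_KEY and COG_LOCATION are hypothetical variable names chosen for this sketch
    key = os.environ.get("COG_KEY", default_key)
    location = os.environ.get("COG_LOCATION", default_location)
    return key, location


def mask_key(key, visible=4):
    """Hide all but the last few characters of a key so it can be printed safely."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


cog_key, cog_location = load_speech_settings()
print("Ready to use cognitive services in {} using key {}".format(cog_location, mask_key(cog_key)))
```

If the environment variables are not set, the function falls back to the placeholder strings, so the rest of the notebook behaves as before once you replace them.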

## Speech recognition

Suppose you want to build a home automation system that accepts spoken instructions, such as "turn the light on" or "turn the light off". Your application needs to be able to take the audio-based input (your spoken instruction), and interpret it by transcribing it to text that it can then parse and analyze.

Now you're ready to transcribe some speech. The input can be a microphone or an audio file. 

### Speech Recognition with a microphone

Run the cell below and **immediately** say out loud **"turn the light on"**. The speech-to-text capabilities of the Speech service will transcribe the audio. The output should be your speech in text.


In [None]:
from azure.cognitiveservices.speech import SpeechConfig, SpeechRecognizer

# Configure the connection to the Speech service with your key and location
speech_config = SpeechConfig(cog_key, cog_location)

# Create a recognizer; with no audio configuration it listens to the default microphone
speech_recognizer = SpeechRecognizer(speech_config)

# Use a one-time, synchronous call to transcribe the speech (say "turn the light on" now)
speech = speech_recognizer.recognize_once()

# Show the transcribed text
print(speech.text)


### (!) Check In

Were you able to run the cell and transcribe your speech to text? If the above cell does not give a text output (example output: _Turn the light on._), try running the cell again and **immediately** say out loud "turn the light on".
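
When the cell prints nothing, it helps to know why. The following is a minimal sketch of a helper that explains the outcome of a recognition call. It assumes the result's `reason` is a member of the Speech SDK's `ResultReason` enum (with members named `RecognizedSpeech`, `NoMatch`, and `Canceled`) and that a canceled result exposes `cancellation_details` with `reason` and `error_details`. The function name `describe_result` is hypothetical, and a stand-in object is used here so the sketch runs without a microphone or a service connection.

```python
from types import SimpleNamespace


def describe_result(result):
    """Return a short message explaining the outcome of a recognize_once() call."""
    # Compare by enum member name so the function also works with stand-in objects
    reason = result.reason.name
    if reason == "RecognizedSpeech":
        return "Recognized: {}".format(result.text)
    if reason == "NoMatch":
        return "No speech could be recognized; try speaking sooner or closer to the microphone."
    if reason == "Canceled":
        details = result.cancellation_details
        return "Recognition canceled: {} {}".format(details.reason, details.error_details)
    return "Unexpected result: {}".format(reason)


# Stand-in result for illustration; in the notebook you would pass the object
# returned by speech_recognizer.recognize_once() instead
fake = SimpleNamespace(reason=SimpleNamespace(name="NoMatch"), text="")
print(describe_result(fake))
```

In the notebook, calling `describe_result(speech)` after the recognition cell would distinguish silence from a canceled request, for example one caused by an incorrect key or location.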

### Speech Recognition with an audio file

If the cell above does not give a text output, your microphone may not be set up to accept input. In that case, run the cell below to see the Speech Recognition service in action with an audio file instead of microphone input.


In [None]:
import os
from playsound import playsound
from azure.cognitiveservices.speech import SpeechConfig, SpeechRecognizer, AudioConfig

# Get spoken command from audio file
file_name = 'light-on.wav'
audio_file = os.path.join('data', 'speech', file_name)

# Configure speech recognizer
speech_config = SpeechConfig(cog_key, cog_location)
audio_config = AudioConfig(filename=audio_file) # Use file instead of default (microphone)
speech_recognizer = SpeechRecognizer(speech_config, audio_config)

# Use a one-time, synchronous call to transcribe the speech
speech = speech_recognizer.recognize_once()

# Play audio and show transcribed text
playsound(audio_file)
print(speech.text)
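
Once the service returns a transcript, the home automation application still has to parse it into an action. Here is a minimal sketch of that step, assuming a simple keyword match is sufficient for the two light commands used in this notebook. The function name `parse_command` and the `(device, state)` return format are hypothetical choices for this illustration.

```python
import string


def parse_command(text):
    """Turn a transcript such as 'Turn the light on.' into a (device, state) pair.

    Returns None when the text does not look like a light command.
    """
    # Lower-case the text and strip punctuation added by the transcription
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    words = cleaned.split()
    if "light" not in words:
        return None
    if "on" in words:
        return ("light", "on")
    if "off" in words:
        return ("light", "off")
    return None


print(parse_command("Turn the light on."))
```

Keyword matching like this breaks down quickly for more varied phrasing, which is where a language understanding service would normally take over.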

## Speech synthesis

Now you've seen how the Speech service can transcribe speech into text. But what about the opposite: how can you convert text into speech?

Let's assume your home automation system has interpreted a command to turn the light on. An appropriate response might be to acknowledge the command verbally (as well as actually performing the commanded task!).

In [None]:
import os
import matplotlib.pyplot as plt
from PIL import Image
from azure.cognitiveservices.speech import SpeechConfig, SpeechSynthesizer, AudioConfig
%matplotlib inline

# Get text to be spoken
response_text = 'Turning the light on.'

# Configure speech synthesis
speech_config = SpeechConfig(cog_key, cog_location)
speech_synthesizer = SpeechSynthesizer(speech_config)

# Synthesize the text as speech and play it on the default speaker
result = speech_synthesizer.speak_text(response_text)

# Display an appropriate image
# response_text ends with a period, so appending "jpg" gives a complete file name
file_name = response_text + "jpg"
img = Image.open(os.path.join("data", "speech", file_name))
plt.axis('off')
plt.imshow(img)

Try changing the **response_text** variable to *Turning the light off.* (including the period at the end) and run the cell again to hear the result.
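
The period matters because the cell above builds the image file name by appending `jpg` to the response text. As a rough sketch of how an application might derive both values from the light state instead of editing the string by hand, consider the helper below. The function name `build_response` is hypothetical, and the sketch assumes the image files follow the same naming pattern as in the cell above.

```python
import os


def build_response(state):
    """Return the spoken response and matching image path for a light state."""
    if state not in ("on", "off"):
        raise ValueError("state must be 'on' or 'off'")
    response_text = "Turning the light {}.".format(state)
    # The text ends with a period, so appending "jpg" gives a complete file name
    image_path = os.path.join("data", "speech", response_text + "jpg")
    return response_text, image_path


text, image = build_response("off")
print(text)
print(os.path.basename(image))
```

The returned text can be passed to `speech_synthesizer.speak_text` and the path to `Image.open`, exactly as in the cell above.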

## Learn more

You've seen a very simple example of using the Speech cognitive service in this notebook. You can learn more about [speech-to-text](https://docs.microsoft.com/azure/cognitive-services/speech-service/index-speech-to-text) and [text-to-speech](https://docs.microsoft.com/azure/cognitive-services/speech-service/index-text-to-speech) in the Speech service documentation.