# Azure AI Foundry

<center><img src="../../../images/Azure-AI-Foundry_1600x900.jpg" alt="Azure AI Foundry" width="600"></center>

## Lab 2

In this lab, we explore the AI services available in Azure AI Foundry. The lab covers the following services:
- Speech
- Language + Translator
- Vision + Document
- Content Safety

Understanding these services allows us to add more capabilities to our applications.

### Exercise 1 - Speech

The Speech service provides speech-to-text and text-to-speech conversion capabilities with a Speech resource. You can transcribe speech to text with high accuracy, produce natural text-to-speech voices, translate spoken audio, and use speaker recognition during conversations. Create custom voices, add specific words to your base vocabulary, or build your own models. Run Speech anywhere, in the cloud or at the edge in containers. It's easy to enable speech in your applications, tools, and devices with the Speech CLI, Speech SDK, and REST APIs.

Common scenarios for speech usage:

**Caption Generation:** Learn how to synchronize captions with your input audio, apply profanity filters, get partial results, apply customizations, and identify spoken languages for multilingual scenarios.

**Audio Content Creation:** You can use neural voices to make interactions with chatbots and voice assistants more natural and engaging, convert digital texts like e-books into audiobooks, and enhance automotive navigation systems.

**Call Center:** Transcribe calls in real-time or process a batch of calls, remove personally identifiable information, and extract insights like sentiment analysis to assist with your call center use case.

**Language Learning:** Provide pronunciation assessment feedback for language learners, offer real-time transcription support for remote learning conversations, and read educational materials aloud using neural voices.

**Voice Assistants:** Create natural, human-like conversational interfaces for your applications and experiences. The voice assistant feature offers fast and reliable interaction between a device and an assistant implementation.
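As a rough illustration of the caption synchronization mentioned under Caption Generation: each recognition result in the Speech SDK carries an `offset` and a `duration` expressed in ticks (one tick is 100 nanoseconds). The sketch below converts such values into SubRip (SRT) caption entries. The helper names `ticks_to_srt_time` and `to_srt` are hypothetical, the sample segments are made-up values rather than real service output, and SRT is only one possible caption format.

```python
# Hypothetical helpers: turn (offset, duration, text) segments into SRT captions.
# Assumption: offsets and durations are in ticks of 100 ns, as reported by the Speech SDK.

TICKS_PER_MILLISECOND = 10_000  # 1 tick = 100 ns, so 10,000 ticks = 1 ms


def ticks_to_srt_time(ticks):
    """Format a tick count as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = ticks // TICKS_PER_MILLISECOND
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, ms = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{ms:03d}"


def to_srt(segments):
    """Build SRT text from an iterable of (offset_ticks, duration_ticks, text)."""
    blocks = []
    for index, (offset, duration, text) in enumerate(segments, start=1):
        start = ticks_to_srt_time(offset)
        end = ticks_to_srt_time(offset + duration)
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)


# Illustrative segments only (not real recognition output)
sample_segments = [
    (0, 15_000_000, "Hello"),
    (20_000_000, 10_000_000, "World"),
]
print(to_srt(sample_segments))
```

In a recognition callback, the same tuples could be collected from `evt.result.offset`, `evt.result.duration`, and `evt.result.text` instead of the hard-coded sample.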

To perform this exercise, verify that your `.env` file defines the following variables, which the code in this exercise reads:
- SPEECH_KEY
- SPEECH_REGION
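One way to run this verification from the notebook is sketched below. The helper `missing_env_vars` is a hypothetical name introduced here, not part of any SDK. It only inspects a mapping (by default `os.environ`), so it assumes the `.env` file has already been loaded, for example with `load_dotenv()`. The names checked in the usage line are the ones the recognition cell in this exercise reads with `os.getenv`.

```python
import os


def missing_env_vars(names, env=None):
    """Return the names that are unset or blank in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not (env.get(name) or "").strip()]


# Usage: check the variables that the recognition cell reads
missing = missing_env_vars(["SPEECH_KEY", "SPEECH_REGION"])
if missing:
    print("Missing or empty variables:", ", ".join(missing))
else:
    print("All required variables are set.")
```

Passing an explicit dictionary as `env` makes the check easy to try out without touching the real environment.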

After verification, let's start by loading the necessary libraries, initializing the client, and making a call to convert audio to text:

In [None]:
%pip install azure-cognitiveservices-speech

In [None]:
# Continuous speech recognition to process all audio, even with initial silence
import os
import azure.cognitiveservices.speech as speechsdk
import time
from dotenv import load_dotenv

load_dotenv()

speech_key = os.getenv('SPEECH_KEY')
speech_region = os.getenv('SPEECH_REGION')
audio_file = '../../../samples/audio001.wav'
if speech_key and speech_region:
    try:
        speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=speech_region)
        speech_config.speech_recognition_language = "en-US"
        audio_config = speechsdk.audio.AudioConfig(filename=audio_file)
        speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

        recognized_texts = []
        def recognized_cb(evt):
            if evt.result.text:
                print('Recognized:', evt.result.text)
                recognized_texts.append(evt.result.text)

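        # Track completion: the SDK fires session_stopped or canceled once the file has been processed (or on error)
        state = {'done': False}
        def stop_cb(evt):
            state['done'] = True

        speech_recognizer.recognized.connect(recognized_cb)
        speech_recognizer.session_stopped.connect(stop_cb)
        speech_recognizer.canceled.connect(stop_cb)

        print("Starting continuous recognition...")
        speech_recognizer.start_continuous_recognition()

        # Poll until the SDK signals completion instead of sleeping for a fixed time
        while not state['done']:
            time.sleep(0.5)
        speech_recognizer.stop_continuous_recognition()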

        print("Recognition completed. Full text:")
        print(' '.join(recognized_texts))
    except Exception as e:
        print(f"Error in continuous recognition: {e}")
else:
    print("Please configure the SPEECH_KEY and SPEECH_REGION environment variables.")