## 📚 Prerequisites

Before running this notebook, make sure you have provisioned the Azure AI services it uses and set the required configuration parameters. See the [README](README.md) for setup instructions.
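
As a quick sanity check before running the cells, a minimal sketch like the one below can report which configuration values are missing. The variable names `SPEECH_KEY` and `SPEECH_REGION` are assumptions for illustration only; substitute the names that the README setup actually defines.

```python
import os

# Hypothetical variable names; replace with the ones your setup uses.
REQUIRED_VARS = ("SPEECH_KEY", "SPEECH_REGION")


def missing_settings(required=REQUIRED_VARS, env=None):
    """Return the names of required settings that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_settings()
if missing:
    print(f"Missing configuration: {', '.join(missing)}")
else:
    print("All required configuration values are set.")
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.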

## 📋 Table of Contents

This notebook guides you through the following sections:

1. [**Speech to Text from Local Files**](#speech-to-text-from-local-files): Convert speech to text from audio files stored locally on your machine.

2. [**Speech to Text from Blob Files**](#speech-to-text-from-blob-files): Convert speech to text from audio files stored in Azure Blob Storage.

3. [**Speech to Text from Streams**](#speech-to-text-from-streams-preview): Convert speech to text from audio streams, using push streams for real-time processing.

For more details, refer to the following resources:
- [Quickstart: Azure Cognitive Services Speech SDK](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master)

### 🛠️ Notebook Setup (Optional)

#### Setting Up Conda Environment and Configuring VSCode for Jupyter Notebooks

Follow these steps to create a Conda environment and set up your VSCode for running Jupyter Notebooks:

##### Create Conda Environment from the Repository

> Instructions for Windows users: 

1. **Create the Conda Environment**:
   - In your terminal or command line, navigate to the repository directory.
   - Execute the following command to create the Conda environment using the `environment.yml` file:
     ```bash
     conda env create -f environment.yml
     ```
   - This command creates a Conda environment as defined in `environment.yml`.

2. **Activate the Environment**:
   - After creation, activate the new Conda environment:
     ```bash
     conda activate speech-ai-azure-services
     ```

> Instructions for Linux users (or Windows users with WSL or another Linux setup):

1. **Use `make` to Create the Conda Environment**:
   - In your terminal or command line, navigate to the repository directory and look at the Makefile.
   - Execute the `make` command specified below to create the Conda environment using the `environment.yml` file:
     ```bash
     make create_conda_env
     ```

2. **Activate the Environment**:
   - After creation, activate the new Conda environment:
     ```bash
     conda activate speech-ai-azure-services
     ```

##### Configure VSCode for Jupyter Notebooks

1. **Install Required Extensions**:
   - Download and install the `Python` and `Jupyter` extensions for VSCode. These extensions provide support for running and editing Jupyter Notebooks within VSCode.

2. **Attach Kernel to VSCode**:
   - After creating the Conda environment, it should be available in the kernel selection dropdown. This dropdown is located in the top-right corner of the VSCode interface.
   - Select your newly created environment (`speech-ai-azure-services`) from the dropdown. This sets it as the kernel for running your Jupyter Notebooks.

3. **Run the Notebook**:
   - Once the kernel is attached, you can run the notebook by clicking on the "Run All" button in the top menu, or by running each cell individually.

These steps give you a dedicated Conda environment for the project, with all the dependencies listed in `environment.yml`, and a VSCode setup that can run Jupyter Notebooks against it. To add packages or change versions, run `pip install` in a notebook cell or in a terminal with the environment activated, then restart the kernel so the changes take effect.
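
To confirm that the notebook kernel is really running inside the new environment, a short check along these lines can help. It is a sketch, not part of the repository: `CONDA_DEFAULT_ENV` is set by `conda activate` but may be absent when the kernel is launched by VSCode, and `azure` is only an assumed top-level package name, so adjust the list to match `environment.yml`.

```python
import importlib.util
import os
import sys


def is_installed(module_name):
    """Return True if the top-level module can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


print(f"Interpreter: {sys.executable}")
print(f"Active Conda environment: {os.environ.get('CONDA_DEFAULT_ENV', 'not set')}")

# "azure" is an assumed top-level package name; adjust to match environment.yml.
for name in ("json", "azure"):
    print(f"{name}: {'found' if is_installed(name) else 'missing'}")
```

If the interpreter path does not point into the `speech-ai-azure-services` environment, reselect the kernel from the dropdown.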

In [1]:
import os

# Define the target directory (change this to the path of your local copy of the repository)
target_directory = (
    r"C:\Users\pablosal\Desktop\sharepoint-indexing-azure-cognitive-search"
)

# Check if the directory exists
if os.path.exists(target_directory):
    # Change the current working directory
    os.chdir(target_directory)
    print(f"Directory changed to {os.getcwd()}")
else:
    print(f"Directory {target_directory} does not exist.")

Directory changed to C:\Users\pablosal\Desktop\sharepoint-indexing-azure-cognitive-search


## Speech to Text from Local Files

In [None]:
# Import the SpeechTranscriber class from the speech_to_text module in the src.speech package
from src.speech.speech_to_text import SpeechTranscriber

# Create an instance of the SpeechTranscriber class
transcriber_client = SpeechTranscriber()

In [14]:
AUDIO_FILE_PCM_STEREO = "C://Users//pablosal//Desktop//gbbai-azure-ai-speech-services//utils//audio_data//d6a35a5e-be01-40cd-b9ef-d61fcda699fa.pcm"

In [15]:
# Call the transcribe_speech_from_file_continuous method of the transcriber_client object,
# passing AUDIO_FILE_PCM_STEREO as the file_name argument.
# The method also accepts these optional arguments, each defaulting to None:
#   language (str): The language to use for speech recognition.
#   source_language_config (SourceLanguageConfig): The source language configuration.
#   auto_detect_source_language_config (AutoDetectSourceLanguageConfig): The auto-detect source language configuration.

transcriber_client.transcribe_speech_from_file_continuous(file_name=AUDIO_FILE_PCM_STEREO)

2023-12-26 17:28:29,278 - micro - MainProcess - INFO     SESSION STARTED: SessionEventArgs(session_id=78cf3c53053c40b6ba579aea9b57b8ca) (speech_to_text.py:<lambda>:129)
2023-12-26 17:28:30,464 - micro - MainProcess - INFO     RECOGNIZING: SpeechRecognitionEventArgs(session_id=78cf3c53053c40b6ba579aea9b57b8ca, result=SpeechRecognitionResult(result_id=578782ea3b8e4e7a86b3843cb5a1855c, text="what is the date", reason=ResultReason.RecognizingSpeech)) (speech_to_text.py:<lambda>:125)
2023-12-26 17:28:31,050 - micro - MainProcess - INFO     RECOGNIZING: SpeechRecognitionEventArgs(session_id=78cf3c53053c40b6ba579aea9b57b8ca, result=SpeechRecognitionResult(result_id=063c72a5c1cf423a98f23155fc33ba40, text="may 15th 1980", reason=ResultReason.RecognizingSpeech)) (speech_to_text.py:<lambda>:125)
2023-12-26 17:28:31,903 - micro - MainProcess - INFO     RECOGNIZING: SpeechRecognitionEventArgs(session_id=78cf3c53053c40b6ba579aea9b57b8ca, result=SpeechRecognitionResult(result_id=162b2d82632a486084d7c

'What is the date? May 15th, 1980. Thursday, May 15th, 19180. What is the date? Saturday, July 6th, 2024.'

## Speech to Text from Blob Files

In [21]:
AUDIO_FILE_PCM_STEREO_BLOB = "https://testeastusdev001.blob.core.windows.net/speechapp/d6a35a5e-be01-40cd-b9ef-d61fcda699fa.pcm"

In [23]:
transcriber_client.transcribe_speech_from_blob_continuous(blob_url=AUDIO_FILE_PCM_STEREO_BLOB)

2023-12-26 17:33:44,365 - micro - MainProcess - INFO     SESSION STARTED: SessionEventArgs(session_id=fb2a4a5af38b4cd181900a054b8e502b) (speech_to_text.py:<lambda>:186)
2023-12-26 17:33:45,621 - micro - MainProcess - INFO     RECOGNIZING: SpeechRecognitionEventArgs(session_id=fb2a4a5af38b4cd181900a054b8e502b, result=SpeechRecognitionResult(result_id=ddd3e26e327b42278483f93b8bcf1ba1, text="what is the date", reason=ResultReason.RecognizingSpeech)) (speech_to_text.py:<lambda>:182)
2023-12-26 17:33:46,151 - micro - MainProcess - INFO     RECOGNIZING: SpeechRecognitionEventArgs(session_id=fb2a4a5af38b4cd181900a054b8e502b, result=SpeechRecognitionResult(result_id=35725cb234484242b069438844aeec54, text="may 15th 1980", reason=ResultReason.RecognizingSpeech)) (speech_to_text.py:<lambda>:182)
2023-12-26 17:33:46,787 - micro - MainProcess - INFO     RECOGNIZING: SpeechRecognitionEventArgs(session_id=fb2a4a5af38b4cd181900a054b8e502b, result=SpeechRecognitionResult(result_id=6a320224089f4b5ca8b45

'What is the date? May 15th, 1980. Thursday, May 15th, 19180. What is the date? Saturday, July 6th, 2024.'
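
When working with blob URLs like the one above, it can be useful to split them into their parts, for example to log which container a file came from. The helper below is a hypothetical illustration using only the standard library; it is not how `SpeechTranscriber` handles the URL internally, which this notebook does not show.

```python
from urllib.parse import unquote, urlparse


def split_blob_url(blob_url):
    """Split an Azure Blob Storage URL into (account, container, blob_name)."""
    parsed = urlparse(blob_url)
    # The storage account is the first label of the host name.
    account = parsed.netloc.split(".")[0]
    # The first path segment is the container; the rest is the blob name.
    container, _, blob_name = parsed.path.lstrip("/").partition("/")
    if not (account and container and blob_name):
        raise ValueError(f"Not a blob URL: {blob_url}")
    return account, container, unquote(blob_name)


url = "https://testeastusdev001.blob.core.windows.net/speechapp/d6a35a5e-be01-40cd-b9ef-d61fcda699fa.pcm"
print(split_blob_url(url))
```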

## Speech to Text from Streams (Preview)

In [24]:
from src.speech.utils_audio import check_audio_file

In [25]:
AUDIO_FILE_PCM_MONO = "C://Users//pablosal//Desktop//gbbai-azure-ai-speech-services//utils//audio_data//aboutSpeechSdk.wav"

In [26]:
check_audio_file(AUDIO_FILE_PCM_MONO)

2023-12-26 17:39:28,901 - micro - MainProcess - INFO     PCM Format (int-16): True (utils_audio.py:check_audio_file:38)
2023-12-26 17:39:28,902 - micro - MainProcess - INFO     One Channel (Mono): True (utils_audio.py:check_audio_file:42)
2023-12-26 17:39:28,903 - micro - MainProcess - INFO     Valid Sample Rate (8000 or 16000 Hz): True (utils_audio.py:check_audio_file:46)
2023-12-26 17:39:28,904 - micro - MainProcess - INFO     Bytes Per Second (16000 or 32000): 32000 (utils_audio.py:check_audio_file:50)
2023-12-26 17:39:28,904 - micro - MainProcess - INFO     Two-block Aligned: True (utils_audio.py:check_audio_file:54)


True
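
The properties named in the log output can be approximated with Python's standard `wave` module. The sketch below is an assumption about what such a check might look like, not the implementation of `check_audio_file`; in particular, it interprets "Two-block Aligned" as a block alignment of 2 bytes (one 16-bit mono sample per frame). The function name `describe_wav` is hypothetical.

```python
import io
import wave


def describe_wav(source):
    """Report the properties named in the log output for a PCM WAV file.

    `source` is a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        sample_rate = wav.getframerate()
    return {
        "pcm_int16": sample_width == 2,
        "mono": channels == 1,
        "valid_sample_rate": sample_rate in (8000, 16000),
        "bytes_per_second": sample_rate * channels * sample_width,
        "two_byte_block_align": channels * sample_width == 2,
    }


# Build one second of 16 kHz, 16-bit mono silence in memory to exercise the function.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 16000)
buffer.seek(0)

report = describe_wav(buffer)
print(report)
```

Note that `wave` only reads files with a WAV header, so this sketch does not apply to headerless raw `.pcm` files.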

In [27]:
transcriber_client.speech_recognition_with_push_stream(audio_file=AUDIO_FILE_PCM_MONO)

2023-12-26 17:39:44,963 - micro - MainProcess - INFO     SESSION STARTED: SessionEventArgs(session_id=5734ba1770c04e2ca12c12f5dab4496c) (speech_to_text.py:<lambda>:241)
2023-12-26 17:39:44,970 - micro - MainProcess - INFO     Mono data shape: (1600,) (speech_to_text.py:speech_recognition_with_push_stream:280)
2023-12-26 17:39:45,073 - micro - MainProcess - INFO     Mono data shape: (1600,) (speech_to_text.py:speech_recognition_with_push_stream:280)
2023-12-26 17:39:45,181 - micro - MainProcess - INFO     Mono data shape: (1600,) (speech_to_text.py:speech_recognition_with_push_stream:280)
2023-12-26 17:39:45,290 - micro - MainProcess - INFO     Mono data shape: (1600,) (speech_to_text.py:speech_recognition_with_push_stream:280)
2023-12-26 17:39:45,400 - micro - MainProcess - INFO     Mono data shape: (1600,) (speech_to_text.py:speech_recognition_with_push_stream:280)
2023-12-26 17:39:45,509 - micro - MainProcess - INFO     Mono data shape: (1600,) (speech_to_text.py:speech_recognition_w

'The Speech SDK exposes many features from the Speech Service, but not all of them. The capabilities of the Speech SDK are often associated with scenarios. The Speech SDK is ideal for both real time and non real time scenarios using local devices, files, Azure BLOB storage and even input and output streams. When a scenario is not achievable with a Speech SDK, look for a REST API alternative. Speech to text, also known as speech recognition, transcribes audio streams to text that your applications, tools, or devices can consume or display. Use speech to text with language understanding. Louis to derive user intents from transcribed speech and act on voice commands. Use speech translation to translate speech input to a different language with a single call. For more information, see Speech to Text Basics.'
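
The repeated `Mono data shape: (1600,)` lines in the log suggest that the audio is fed to the push stream in chunks of 1600 samples, which is 100 ms of audio at 16 kHz and matches the roughly 100 ms spacing of the log timestamps. The chunking step alone can be sketched with the standard library as follows. The function name `iter_pcm_chunks` is hypothetical, and the step that writes each chunk to the Speech SDK push stream is omitted because it requires service credentials.

```python
import io
import wave


def iter_pcm_chunks(source, samples_per_chunk=1600):
    """Yield raw PCM byte chunks of at most `samples_per_chunk` frames each."""
    with wave.open(source, "rb") as wav:
        while True:
            frames = wav.readframes(samples_per_chunk)
            if not frames:
                break
            yield frames


# Two and a half chunks of 16-bit mono silence at 16 kHz, built in memory.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 4000)
buffer.seek(0)

# Each 16-bit mono sample occupies 2 bytes.
chunk_sizes = [len(chunk) // 2 for chunk in iter_pcm_chunks(buffer)]
print(chunk_sizes)  # samples per chunk
```

For the 4000-sample buffer above, the chunks contain 1600, 1600, and 800 samples; the last chunk is shorter because the audio length is not a multiple of the chunk size.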