## Business AI Meeting Companion STT

### Introduction
Imagine you're attending a business meeting where every conversation is captured by an AI application. The application not only transcribes the discussion accurately but also produces a concise summary of the meeting, highlighting the key points and decisions made.

In this project, we'll use OpenAI's Whisper to convert speech to text, then use IBM Watson's AI to summarize the transcript and extract key points. The user interface will be built with Hugging Face Gradio.

### Learning Objectives
After finishing this lab, you will be able to:

* Create a Python script to generate text using a model from the Hugging Face Hub, identify some key parameters that influence the model's output, and have a basic understanding of how to switch between different LLM models.
* Use OpenAI's Whisper to accurately convert lecture recordings into text.
* Implement IBM Watson's AI to effectively summarize the transcribed lectures and extract key points.
* Create an intuitive and user-friendly interface using Hugging Face Gradio, ensuring ease of use for students and educators.

### Preparing the environment
Let's start with setting up the environment by creating a Python virtual environment and installing the required libraries, using the following commands in the terminal:

```
pip3 install virtualenv 
virtualenv my_env # create a virtual environment my_env
source my_env/bin/activate # activate my_env
```

Then, install the required libraries in the environment (this will take time ☕️☕️):
```
# installing required libraries in my_env
pip install transformers==4.35.2 torch==2.1.1 gradio==4.17.0 langchain==0.0.343 ibm_watson_machine_learning==1.0.335 huggingface-hub==0.19.4
```
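
To confirm that the pinned versions actually landed in `my_env`, a small check along these lines can help. This is a minimal sketch using only the standard library; the helper names `installed_version` and `check_pins` are hypothetical, and the pins are copied from the command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the pip command above
PINNED = {
    "transformers": "4.35.2",
    "torch": "2.1.1",
    "gradio": "4.17.0",
    "langchain": "0.0.343",
    "ibm_watson_machine_learning": "1.0.335",
    "huggingface-hub": "0.19.4",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pins(pins):
    """Return {package: (expected, found)} for each package that does not match its pin."""
    mismatches = {}
    for package, expected in pins.items():
        found = installed_version(package)
        # Some builds report a local suffix such as "2.1.1+cu121"; compare the public part only
        if found is None or found.split("+")[0] != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in check_pins(PINNED).items():
        print(f"{package}: expected {expected}, found {found}")
```

If the script prints nothing, every pinned package was found at the expected version.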

We need to install ffmpeg to work with audio files in Python:

`sudo apt update`

Then run: `sudo apt install ffmpeg -y`
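
Before moving on, it can be worth checking from Python that `ffmpeg` is visible on the `PATH`, since the audio steps that follow depend on it. The sketch below is one way to do that with the standard library; the function names are hypothetical.

```python
import shutil
import subprocess


def ffmpeg_path():
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


def ffmpeg_version_line():
    """Return the first line printed by `ffmpeg -version`, or None if ffmpeg is missing."""
    path = ffmpeg_path()
    if path is None:
        return None
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print(ffmpeg_version_line() or "ffmpeg not found on PATH")
```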

#### Step 1: Speech-to-Text
First, create a simple speech-to-text Python script using OpenAI Whisper.

You can test it with this sample audio file: [download](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX04C6EN/Testing%20speech%20to%20text.mp3).

Create and open a Python file and call it `simple_speech2text.py`.

Let's download the file first (alternatively, download it manually and drag it into the file explorer of your environment).

```
import requests

# URL of the audio file to be downloaded
url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX04C6EN/Testing%20speech%20to%20text.mp3"

# Send a GET request to the URL to download the file
response = requests.get(url)

# Define the local file path where the audio file will be saved
audio_file_path = "downloaded_audio.mp3"

# Check if the request was successful (status code 200)
if response.status_code == 200:
    # If successful, write the content to the specified local file path
    with open(audio_file_path, "wb") as file:
        file.write(response.content)
    print("File downloaded successfully")
else:
    # If the request failed, print an error message
    print("Failed to download the file")
```
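
The script above holds the whole response in memory before writing it. For larger recordings, a streaming variant may be preferable. The following is a minimal sketch that swaps `requests` for the standard library's `urllib`; the `download_file` helper and the 30-second default timeout are assumptions for illustration, not part of the lab.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url, dest_path, timeout=30):
    """Stream the resource at `url` to `dest_path` and return the path as a string."""
    dest = Path(dest_path)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    # so no manual status-code check is needed here
    with urllib.request.urlopen(url, timeout=timeout) as response, open(dest, "wb") as out_file:
        # copyfileobj copies in fixed-size chunks instead of loading the whole body at once
        shutil.copyfileobj(response, out_file)
    return str(dest)
```

Calling `download_file(url, "downloaded_audio.mp3")` with the `url` from the script above should produce the same file as the `requests` version.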

Next, use OpenAI Whisper to transcribe speech to text.

You can overwrite the previous code in the Python file.

```
import torch
from transformers import pipeline

# Initialize the speech-to-text pipeline from Hugging Face Transformers
# This uses the "openai/whisper-tiny.en" model for automatic speech recognition (ASR)
# The `chunk_length_s` parameter specifies the chunk length in seconds for processing
pipe = pipeline(
  "automatic-speech-recognition",
  model="openai/whisper-tiny.en",
  chunk_length_s=30,
)

# Define the path to the audio file that needs to be transcribed
sample = 'downloaded_audio.mp3'

# Perform speech recognition on the audio file
# The `batch_size=8` parameter indicates how many chunks are processed at a time
# The pipeline returns a dict; its "text" entry (the transcription) is stored in `prediction`
prediction = pipe(sample, batch_size=8)["text"]

# Print the transcribed text to the console
print(prediction)
```
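
To build intuition for `chunk_length_s` and `batch_size`, the sketch below works through the arithmetic for a hypothetical 95-second recording. It is a simplified illustration only: the real pipeline also overlaps neighboring chunks and merges the results, which this sketch ignores, and the helper names are made up for the example.

```python
import math


def chunk_windows(duration_s, chunk_length_s=30):
    """Split [0, duration_s] into consecutive windows of at most chunk_length_s seconds."""
    duration_s = float(duration_s)
    windows = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_length_s, duration_s)
        windows.append((start, end))
        start = end
    return windows


def batches_needed(num_chunks, batch_size=8):
    """Number of model calls needed when up to batch_size chunks are processed together."""
    return math.ceil(num_chunks / batch_size)


windows = chunk_windows(95, chunk_length_s=30)
print(windows)                                      # four windows, the last one 5 seconds long
print(batches_needed(len(windows), batch_size=8))   # 1
```

With these numbers, all four chunks fit into a single batch of 8. A ten-minute recording would yield 20 chunks and therefore three batches.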

Run the Python file to print the transcription:

`python3 simple_speech2text.py`

In the next step, we will use `Gradio` to create an interface for our app.

## Gradio interface
### Creating a simple demo
Throughout this project, we will build LLM applications with Gradio interfaces. Let's get familiar with Gradio by creating a simple app.

Still in the `project` directory, create a Python file and name it `hello.py`.

Open `hello.py`, paste the following Python code, and save the file.

```
import gradio as gr

def greet(name):
    return "Hello " + name + "!"

demo = gr.Interface(fn=greet, inputs="text", outputs="text")

demo.launch(server_name="0.0.0.0", server_port=7860)
```

The above code creates a **gradio.Interface** called `demo`. It wraps the `greet` function in a simple text-to-text user interface that you can interact with.

The **gradio.Interface** class is initialized with three required parameters:

* `fn`: the function to wrap a UI around
* `inputs`: which component(s) to use for the input (e.g. `"text"`, `"image"` or `"audio"`)
* `outputs`: which component(s) to use for the output (e.g. `"text"`, `"image"` or `"label"`)

The last line, `demo.launch(...)`, starts a server that serves our `demo`. Here it listens on all network interfaces (`0.0.0.0`) on port 7860.
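
To see how `inputs` maps onto the wrapped function's arguments, here is a small variation with two input components. It is a sketch rather than part of the lab: the `excitement` argument and the `build_demo` helper are hypothetical, and it relies on Gradio's `"slider"` shortcut string for the second input.

```python
def greet(name, excitement):
    """Return a greeting followed by one exclamation mark per unit of excitement."""
    return "Hello " + name + "!" * int(excitement)


def build_demo():
    """Build the interface; each entry in `inputs` feeds one argument of `greet`, in order."""
    # Imported here so that `greet` can be tried out without Gradio installed
    import gradio as gr

    return gr.Interface(fn=greet, inputs=["text", "slider"], outputs="text")
```

Calling `build_demo().launch(server_name="0.0.0.0", server_port=7860)` serves it the same way as `hello.py`. The text box supplies `name` and the slider supplies `excitement`.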

### Launching the demo app
Now go back to the terminal and make sure that the `my_env` virtual environment name is displayed at the start of the prompt. Run the following command to execute the Python script.

`python3 hello.py`

The app is served from the local host on port 7860. Open that port in your browser (for example, `http://localhost:7860` when running on your own machine) to see the simple application we just created. Feel free to play around with the input and output of the web app!

#### Step 2: Creating audio transcription app
Create a new Python file named `speech2text_app.py`.

**Exercise: Complete the `transcript_audio` function.**

Using the code from Step 1, fill in the missing parts of the `transcript_audio` function.

```
import torch
from transformers import pipeline
import gradio as gr

# Function to transcribe audio using the OpenAI Whisper model
def transcript_audio(audio_file):
    # Initialize the speech recognition pipeline
    pipe = #-----> Fill here <----
    
    # Transcribe the audio file and return the result
    result = #-----> Fill here <----
    return result

# Set up Gradio interface
audio_input = gr.Audio(sources="upload", type="filepath")  # Audio input
output_text = gr.Textbox()  # Text output

# Create the Gradio interface with the function, inputs, and outputs
iface = gr.Interface(fn=transcript_audio, 
                     inputs=audio_input, outputs=output_text, 
                     title="Audio Transcription App",
                     description="Upload the audio file")

# Launch the Gradio app
iface.launch(server_name="0.0.0.0", server_port=7860)
```

Then, run your app:

`python3 speech2text_app.py`