# **Integrating Hugging Face Models with Gradio for Audio Transcription and Summarization**

## **Overview**

This project demonstrates how to build a web-based interface using Gradio to process audio files. The interface allows users to upload an audio file, transcribe the audio to text using the Whisper model, and summarize the transcribed text using the PEGASUS model from Hugging Face. Users can also adjust the length of the summary using sliders.

In [None]:
# Install required libraries using pip
!pip install gradio transformers pandas openai-whisper




# **Libraries and Models**

**ʘ** We import the whisper library for audio transcription and transformers for text summarization.

**ʘ** We load the Whisper model using whisper.load_model("base") for converting audio to text.

**ʘ** The PEGASUS model from Hugging Face is initialized using pipeline("summarization", model="google/pegasus-large") to summarize the transcribed text.

**ʘ** We use pandas to store the results in a CSV file for future reference.
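
Summarization models accept only a limited amount of input text, so the transcript of a long recording may not fit in a single call. One possible workaround, sketched below, is to split the transcript into word-based chunks and summarize each chunk separately. The helper names (split_into_chunks, summarize_long_text, first_five_words) and the 400-word chunk size are assumptions made for illustration; they are not part of this notebook or of any library, and a word count only approximates the model's token limit.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int = 400) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def summarize_long_text(
    text: str,
    summarize: Callable[[str], str],
    max_words: int = 400,
) -> str:
    """Summarize each chunk separately and join the partial summaries."""
    chunks = split_into_chunks(text, max_words)
    return " ".join(summarize(chunk) for chunk in chunks)


# Stand-in summarizer (hypothetical): keeps the first five words of a chunk.
# With the real pipeline this could be something like
#   lambda chunk: summarization(chunk)[0]["summary_text"]
def first_five_words(chunk: str) -> str:
    return " ".join(chunk.split()[:5])


demo_text = " ".join(f"word{i}" for i in range(1000))
print(len(split_into_chunks(demo_text)))  # 3 chunks: 400 + 400 + 200 words
```

Passing the summarizer in as a callable keeps the chunking logic testable without downloading a model.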



In [None]:
import gradio as gr
import whisper    # Library for speech recognition
from transformers import pipeline
import pandas as pd


# Load the Whisper model for speech recognition
whisper_model = whisper.load_model("base")


# Load the summarization model from Hugging Face
summarization = pipeline("summarization", model="google/pegasus-large")



The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



Some weights of PegasusForConditionalGeneration were not initialized from the model checkpoint at google/pegasus-large and are newly initialized: ['model.decoder.embed_positions.weight', 'model.encoder.embed_positions.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.





# **Code Explanation**

**Define Function for Processing Audio**

process_audio(audio_file, min_length, max_length):

  1. Takes the path to an audio file and the minimum and maximum summary lengths as input.

  2. Checks if the audio file is provided and contains valid content.

  3. Transcribes the audio using Whisper's transcribe function.

  4. Summarizes the transcribed text using the PEGASUS model.

  5. Saves the audio file path, transcript, and summary into a DataFrame and stores it in a CSV file named results.csv.

  6. Returns the transcript and summary for display in the Gradio interface.

If any step raises an exception, the function catches it and returns the error message in both output boxes.
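
Note that df_results.to_csv("results.csv", index=False) writes in the default overwrite mode, so each call replaces the previous file. If the goal is to keep a running log across uploads, appending is one option. The sketch below uses Python's built-in csv module instead of pandas to stay dependency-free; append_result is a hypothetical helper name, not something the notebook defines.

```python
import csv
import os


def append_result(csv_path, audio_file, transcript, summary):
    """Append one row to csv_path, writing the header only for a new file."""
    is_new_file = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if is_new_file:
            # Same column names as the DataFrame built in process_audio
            writer.writerow(["Audio File", "Transcript", "Summary"])
        writer.writerow([audio_file, transcript, summary])
```

A pandas equivalent would be to pass mode="a" and header=not os.path.exists("results.csv") to to_csv.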

**Create and Launch Gradio Interface**

**gr.Interface**: Gradio's high-level class that wraps a Python function in a web interface built from the given input and output components.

**Inputs:**

  1. **gr.Audio:** Allows users to upload an audio file.

  2. **gr.Slider:** Sliders for setting the minimum and maximum summary lengths.

**Outputs:**

  1. **gr.Textbox:** Displays the transcribed text.

  2. **gr.Textbox:** Displays the summarized text.

The interface is launched using iface.launch(), allowing users to interact with the application directly.
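
Depending on the Gradio version and slider settings, slider values may reach the function as floats, while the length arguments of the summarizer are expected to be integers. A small guard like the one below could be applied at the top of process_audio. This is a minimal sketch: normalize_lengths is a hypothetical helper, and swapping the values when they arrive in the wrong order is an assumed policy rather than the notebook's behavior.

```python
def normalize_lengths(min_length, max_length):
    """Return (min_length, max_length) as integers with min <= max."""
    low = int(round(min_length))
    high = int(round(max_length))
    if low > high:
        # Assumed policy: swap rather than raise, so the UI keeps working
        low, high = high, low
    return low, high


print(normalize_lengths(30.0, 100.0))  # (30, 100)
```

With the slider ranges used in this notebook (10 to 50 and 50 to 600) the minimum can never exceed the maximum, so the swap only matters if the ranges are changed later.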

In [None]:
def process_audio(audio_file, min_length, max_length):
    try:
        # Ensure audio_file is not None and has valid content
        if audio_file is None:
            raise ValueError("No audio file provided.")

        # Use the Whisper model to transcribe the audio file into text
        result = whisper_model.transcribe(audio_file)
        text = result['text'].strip()  # Drop surrounding whitespace so a blank transcript is caught below

        # Check if transcription was successful
        if not text:
            raise ValueError("Failed to transcribe the audio. The transcription result is empty.")

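        # Use the summarization pipeline to summarize the transcribed text.
        # Slider values are cast to int for the length arguments, and
        # truncation=True keeps long transcripts within the model's input limit.
        summary_result = summarization(
            text,
            min_length=int(min_length),
            max_length=int(max_length),
            truncation=True,
        )
        summary = summary_result[0]['summary_text']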

        # Check if summarization was successful
        if not summary:
            raise ValueError("Failed to summarize the transcript. The summary result is empty.")

        # Create a DataFrame to store the audio file, transcript, and summary
        df_results = pd.DataFrame({
            "Audio File": [audio_file],  # Store the path to the audio file
            "Transcript": [text],       # Store the transcribed text
            "Summary": [summary]        # Store the generated summary
        })

        # Save the results to "results.csv" (each call overwrites the previous file)
        df_results.to_csv("results.csv", index=False)

        # Return the transcript and summary to be displayed in the Gradio interface
        return text, summary

    except Exception as e:
        # General error handling
        error_message = f"An error occurred: {str(e)}"
        return error_message, error_message

# Create a Gradio interface
iface = gr.Interface(
    fn=process_audio,  # The function to be called when processing the input
    inputs=[
        gr.Audio(sources="upload", type="filepath", label="Upload your audio file"),  # Audio input field for file upload
        gr.Slider(minimum=10, maximum=50, value=30, step=1, label="Minimum Summary Length"),   # Slider for setting minimum summary length (whole numbers)
        gr.Slider(minimum=50, maximum=600, value=100, step=1, label="Maximum Summary Length")  # Slider for setting maximum summary length (whole numbers)
    ],
    outputs=[
        gr.Textbox(label="Transcript"),  # Textbox for displaying the transcript
        gr.Textbox(label="Summary")      # Textbox for displaying the summary
    ],
    title="Audio to Summarized Transcript",  # Title of the app
    description="Upload an audio file and adjust summary length to get both the transcript and summary."  # Description of the app
)

# Launch the app
iface.launch()


Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://56e5854ad5edecf82a.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


