<a href="https://colab.research.google.com/github/prthameshhh/Audio-transcription-and-summarization/blob/main/Audio_Transcription_and_summarization_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Audio Transcription and Text Summarization with Pre-trained Language Models

This Colab notebook demonstrates audio transcription and text summarization with pre-trained models. It uses the OpenAI Whisper model for transcription, the Mistral 7B model for text generation, and a Gradio interface for user interaction.

## Overview

The script includes the following components:

1. **Model Loading:** Loads the OpenAI Whisper (`base`) model for audio transcription and the Mistral 7B pre-trained language model for text generation, and sets up the 4-bit quantization configuration.

2. **Gradio Interface:** The script sets up a Gradio interface to enable users to interactively transcribe audio and generate text summaries.

3. **Text Summarization:** Constants, prompt templates, and helper functions are defined for summarizing text. A map-reduce approach handles transcripts that are too long to fit into a single prompt.

4. **Text Generation Pipeline:** The script configures the text generation pipeline with sampling parameters (temperature, top-p, top-k, repetition penalty) on top of the 4-bit quantized model, which reduces GPU memory use.

5. **Audio Transcription:** A function is defined to transcribe audio files using the OpenAI Whisper model. This function is exposed through the Gradio interface.
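
The map-reduce idea from item 3 can be sketched without loading any model. This is a minimal illustration, not the notebook's implementation: `split_with_overlap`, `map_reduce_summarize`, and the `summarize` callable are hypothetical names, and the fixed-width character windows are a simplification of LangChain's `RecursiveCharacterTextSplitter`, which prefers to split on separators such as paragraph breaks.

```python
from typing import Callable, List


def split_with_overlap(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Cut text into character windows; consecutive windows share `overlap` characters."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def map_reduce_summarize(
    text: str,
    summarize: Callable[[str], str],
    chunk_size: int,
    overlap: int,
) -> str:
    """Summarize each chunk (map), then summarize the joined partial results (reduce)."""
    chunks = split_with_overlap(text, chunk_size, overlap)
    partial_summaries = [summarize(chunk) for chunk in chunks]
    return summarize("\n".join(partial_summaries))
```

In the notebook the role of `summarize` is played by calls to the language model; here any function from string to string can be passed in to see how the chunks flow through the two stages.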

## Usage

1. **Load Models:** Load the necessary pre-trained language models and configure the text generation pipeline.

2. **Define Prompt Templates:** Define constants, prompt templates, and helper functions for text summarization.

3. **Setup Gradio Interface:** Set up the Gradio interface for user interaction, allowing users to transcribe audio and receive text summaries.

4. **Transcribe Audio and Generate Summaries:** Use the Gradio interface to transcribe audio files and generate concise text summaries interactively.

## Conclusion

This Colab notebook shows how to combine the OpenAI Whisper model, the Mistral 7B model, and a Gradio interface to transcribe audio and generate concise summaries of the transcript.



In [None]:
!pip install gradio
!pip install cohere -q
!pip install openai -q
!pip install git+https://github.com/openai/whisper.git -q
!sudo apt update && sudo apt install -y ffmpeg -q
!pip install langchain torch accelerate bitsandbytes transformers
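
Whisper decodes audio by calling the `ffmpeg` executable, so it helps to confirm that the install step above actually put it on the `PATH` before loading any models. The check below is a small sketch using only the standard library; `ffmpeg_available` is a hypothetical helper name, not part of the notebook.

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True when an `ffmpeg` executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None


if not ffmpeg_available():
    print("ffmpeg was not found on PATH; Whisper will not be able to decode audio files.")
```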

In [None]:
from huggingface_hub import login
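# Prompts for a Hugging Face access token at run time.
# Never hardcode a token in a shared notebook; revoke any token that has been committed.
login()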
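
An access token should not be written into the notebook itself, since anyone who can read the file can then use it. One possible alternative, sketched here under the assumption that the token has been stored in the `HF_TOKEN` environment variable beforehand, is to read it at run time. `read_hf_token` is a hypothetical helper introduced for illustration.

```python
import os
from typing import Mapping, Optional


def read_hf_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the token stored under HF_TOKEN, or None when it is missing or blank."""
    token = env.get("HF_TOKEN", "").strip()
    return token or None
```

The result can be passed as `login(token=read_hf_token())`; when the value is `None`, `login` falls back to prompting for the token interactively.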

In [None]:
# Imports
import whisper
import gradio as gr
import langchain
from langchain.document_loaders import DirectoryLoader
from langchain.llms import HuggingFacePipeline
from langchain.cache import InMemoryCache
import torch
from transformers import BitsAndBytesConfig, AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain import PromptTemplate, LLMChain
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter

LOADING THE MODELS

In [None]:
# Load models and tokenizers
# Loads the Whisper model for transcription, loads the Mistral model and tokenizer for text generation,
# and sets up the 4-bit quantization configuration.

transcriber = whisper.load_model("base")

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

model_id = "mistralai/Mistral-7B-v0.1"
model_4bit = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", quantization_config=quantization_config
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
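
The reason for loading the model in 4-bit can be seen with a rough back-of-the-envelope calculation. The sketch below estimates the memory taken by the weights alone; the parameter count of about 7.24 billion is an approximation assumed here, `weight_memory_gib` is a hypothetical helper, and the estimate ignores quantization constants, activations, and the key-value cache, so real usage will be higher.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


APPROX_PARAMS = 7.24e9  # assumed, approximate parameter count for Mistral 7B

fp16_gib = weight_memory_gib(APPROX_PARAMS, 16)
four_bit_gib = weight_memory_gib(APPROX_PARAMS, 4)
print(f"16-bit weights: {fp16_gib:.1f} GiB, 4-bit weights: {four_bit_gib:.1f} GiB")
```

Under these assumptions the 4-bit weights need a quarter of the memory of 16-bit weights, which is what makes the model practical on a single Colab GPU.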

BUILDING THE SUMMARIZATION PIPELINE AND INTERFACE

In [None]:
# Constants
# Defines various constants used in the script such as style, prompt trigger, output language, model context window, etc.
STYLE = "generate a concise article with title"
PROMPT_TRIGGER = "ARTICLE GENERATION"
OUTPUT_LANGUAGE = "English"
VERBOSE = False
MODEL_CONTEXT_WINDOW = 13000
MAX_ANSWER_TOKENS = 1500
CHUNK_SIZE = 10000
CHUNK_OVERLAP = 500

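# Set up pipelines
# Configures the text generation pipeline with parameters such as max new tokens, temperature, top p, top k, etc.
# The result is bound to a new name so that the imported `pipeline` factory is not shadowed if the cell is re-run.
text_generation_pipeline = pipeline(
    "text-generation",
    model=model_4bit,
    tokenizer=tokenizer,
    use_cache=True,
    device_map="auto",
    # Only max_new_tokens is set: it bounds the generated text, whereas max_length would also count the prompt.
    max_new_tokens=1000,
    do_sample=True,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,
    temperature=0.2,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1,
)

llm = HuggingFacePipeline(pipeline=text_generation_pipeline)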

# Prompt templates
# Defines templates for prompts used in summarization.
combine_prompt_template = """
Write an article using the following text as context and
{style}

```{content}```

{trigger} in {language}:
"""

map_prompt_template = """
Write a detailed article using the following text:

{text}

detailed article in {language}:
"""

# Helper functions
def summarize_base(llm, content):
    """Summarize whole content at once. The content needs to fit into model's context window."""
    prompt = PromptTemplate.from_template(combine_prompt_template).partial(
        style=STYLE, trigger=PROMPT_TRIGGER, language=OUTPUT_LANGUAGE
    )
    chain = LLMChain(llm=llm, prompt=prompt, verbose=VERBOSE)
    output = chain.run(content)
    return output

def summarize_map_reduce(llm, content):
    """Summarize content potentially larger that model's context window using map-reduce approach."""
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP
    )
    split_docs = text_splitter.create_documents([content])
    print(
        f"Map-Reduce content splits ({len(split_docs)} splits): {[len(sd.page_content) for sd in split_docs]}"
    )

    map_prompt = PromptTemplate.from_template(map_prompt_template).partial(
        language=OUTPUT_LANGUAGE
    )

    combine_prompt = PromptTemplate.from_template(combine_prompt_template).partial(
        style=STYLE, trigger=PROMPT_TRIGGER, language=OUTPUT_LANGUAGE
    )

    chain = load_summarize_chain(
        llm=llm,
        chain_type="map_reduce",
        map_prompt=map_prompt,
        combine_prompt=combine_prompt,
        combine_document_variable_name="content",
        verbose=VERBOSE,
    )

    output = chain.run(split_docs)
    return output

def process_and_generate_audio(audio_file):
    """Transcribes audio, generates a summary, and creates TTS audio for the summary."""
    # Transcribe audio
    text = transcriber.transcribe(audio_file)["text"]
    length = len(text.split(" "))

    # Summarize based on content tokens
    content_tokens = llm.get_num_tokens(text)
    base_threshold = 0.75 * MODEL_CONTEXT_WINDOW

    if content_tokens < base_threshold:
        summary = summarize_base(llm, text)
    else:
        summary = summarize_map_reduce(llm, text)
    # Word count of the summary (split on any whitespace)
    len_of_sum = len(summary.split())

    return (
        f"Transcription:\n{text}\n\n length_of_transcription:\n {length}",
        f" {summary}\n\n length_of_summary:\n {len_of_sum}",
    )

# Gradio interface
# Sets up the Gradio interface for audio transcription and summary generation.
iface = gr.Interface(
    fn=process_and_generate_audio,
    inputs=gr.Audio(type="filepath"),
    outputs=[
        gr.Textbox(label="Transcription"),
        gr.Textbox(label="Summary"),
    ],
    live=True,
)
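
The choice between the two summarization helpers above comes down to a single comparison against a fraction of the context window. The standalone sketch below mirrors that rule with the notebook's values (a context window of 13000 tokens and a factor of 0.75) as defaults; `choose_strategy` and its return labels are hypothetical names used only for illustration.

```python
def choose_strategy(
    num_tokens: int,
    context_window: int = 13000,
    fraction: float = 0.75,
) -> str:
    """Pick single-pass summarization for short inputs and map-reduce otherwise."""
    threshold = fraction * context_window
    return "single_pass" if num_tokens < threshold else "map_reduce"
```

Keeping the threshold below the full context window leaves room for the prompt template and for the generated answer.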


In [None]:
iface.launch()