# Step Back Prompting RAG chain with Groq's Whisper API and Langchain LCEL

## Introduction

This tutorial demonstrates how to create a powerful question-answering system for audio and video content by combining Groq's Whisper API for transcription and Step Back Prompting RAG (Retrieval-Augmented Generation) for enhanced comprehension and response generation. Read the paper on Step Back Prompting [here](https://arxiv.org/abs/2310.06117).

Groq's Whisper API provides state-of-the-art speech recognition capabilities, allowing us to accurately and quickly transcribe audio and video files.

Step Back Prompting is a prompting technique, applied here to RAG, that improves the quality of responses by encouraging the language model to "step back" and consider the broader context before answering specific questions. In a RAG pipeline, the broader question is used to retrieve additional context, which helps in generating more comprehensive and accurate responses, especially for complex queries.

By integrating these technologies, we'll create a system that can:
1. Transcribe audio/video content
2. Index the transcriptions for efficient retrieval
3. Use Step Back Prompting RAG to generate insightful responses to user queries about the content

This notebook will guide you through the process of:

1. Setting up the environment and dependencies.
2. Downloading and transcribing audio/video files using Groq's Whisper API.
3. Splitting and indexing the transcriptions into a vectorstore.
4. Setting up the Step Back Prompting RAG pipeline using Langchain's LCEL (LangChain Expression Language).
5. Chatting with the media.

By the end of this tutorial, you'll have a powerful tool for extracting insights from audio and video content, combining the strengths of accurate transcription and advanced language understanding.

You can create a developer account for free at https://console.groq.com/ and generate a free API key to follow this tutorial!

## 1. Setting up the environment and dependencies

In this section, we install and import the libraries required for our media RAG task.

In [2]:
# Installation
%pip install langchain -q
%pip install langchain_community -q
%pip install langchain_groq -q
%pip install yt_dlp -q
%pip install pydub -q
%pip install librosa -q
%pip install openai -q
%pip install langchain_chroma -q
%pip install langchain_huggingface -q
%pip install sentence-transformers -q

Note: you may need to restart the kernel to use updated packages.


### Set up environment variables
- You can create your free Groq API key [here](https://console.groq.com/keys).
- `SAVE_DIR` is the directory where all downloaded media files are saved. In this notebook it is set to a `media` subfolder.

In [7]:
import os

os.environ["GROQ_API_KEY"] = "gsk_..."
os.environ["SAVE_DIR"] = "./media/" # Directory you want to save the media files to
os.environ["TOKENIZERS_PARALLELISM"] = "false" # To suppress huggingface warnings
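
Hardcoding a key in a notebook makes it easy to leak when the file is shared. As one possible alternative, the sketch below prompts for a value only when the environment variable is not already set. The helper name `ensure_env` and its `prompt_fn` parameter are hypothetical and not part of any library used in this tutorial.

```python
import os
from getpass import getpass


def ensure_env(name: str, prompt_fn=getpass) -> str:
    """Return os.environ[name], asking the user for it only if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        # getpass hides the typed value so the key does not end up in cell output
        value = prompt_fn(f"Enter {name}: ")
        os.environ[name] = value
    return value


# Usage (uncomment in a notebook; prompts only if GROQ_API_KEY is missing):
# ensure_env("GROQ_API_KEY")
```

With this approach the `os.environ["GROQ_API_KEY"] = "gsk_..."` line can be dropped from the saved notebook.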

## 2. Downloading and transcribing audio/video files using Groq's Whisper API

### Create media downloading functions

Let's create some functions to help us download YouTube videos and media files from the internet.

The `load_yt` function takes a YouTube video URL and downloads the audio of the video as an mp3 to the `SAVE_DIR` directory.

The `load_media` function takes a remote media URL and downloads the file to the `SAVE_DIR` directory.

In [8]:
import yt_dlp
from typing import Optional, Dict, List
import requests
import os
from urllib.parse import urlparse
import uuid
import tqdm

def load_yt(
    url: str, 
    save_dir: Optional[str] = os.getenv("SAVE_DIR"),
    ydl_opts: Optional[Dict] = None
) -> str:
    """Download a youtube video to `save_dir`"""
    if not save_dir:
        raise ValueError("save_dir is not set and SAVE_DIR environment variable is not defined")

    if not ydl_opts:
        ydl_opts = {
            "format": "worstaudio/worst", # Select low quality audio to speed up download
            "postprocessors": [{
                "key": "FFmpegExtractAudio",
                "preferredcodec": "mp3",
                "preferredquality": "32",
            }],
            "outtmpl": os.path.join(save_dir, "%(title)s.%(ext)s"),
            "restrictfilenames": True,  # This option replaces spaces and other problematic characters
        }

    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        info = ydl.extract_info(url, download=True)
        filename = ydl.prepare_filename(info)
        filename = os.path.splitext(filename)[0] + '.mp3'
    return filename

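def load_media(
    url: str,
    save_dir: Optional[str] = os.getenv("SAVE_DIR")
) -> str:
    """Download an audio/video file to `save_dir`"""
    if not save_dir:
        raise ValueError("save_dir is not set and SAVE_DIR environment variable is not defined")
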
    response = requests.get(url)
    response.raise_for_status()

    parsed_url = urlparse(url)
    filename = os.path.basename(parsed_url.path)

    if not filename:
        filename = f"downloaded_file_{uuid.uuid4().hex[:8]}"

    os.makedirs(save_dir, exist_ok=True)
    file_path = os.path.join(save_dir, filename)

    with open(file_path, 'wb') as file:
        file.write(response.content)

    return file_path
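
To make the file-naming behaviour of `load_media` easier to see, the sketch below isolates that step: the last path segment of the URL becomes the filename, and a random name is used when the URL has no path segment. The helper name `filename_from_url` is hypothetical; it only mirrors the logic already shown above.

```python
import os
import uuid
from urllib.parse import urlparse


def filename_from_url(url: str) -> str:
    """Use the last path segment of the URL, or a random fallback name if there is none."""
    # urlparse(...).path excludes the query string, so "?token=..." never ends up in the name
    name = os.path.basename(urlparse(url).path)
    if not name:
        name = f"downloaded_file_{uuid.uuid4().hex[:8]}"
    return name


print(filename_from_url("https://static.deepgram.com/examples/interview_speech-analytics.wav"))
print(filename_from_url("https://example.com/"))  # no path segment, so a random name is generated
```

Note that the fallback name has no file extension, so a transcription service that infers the format from the filename may need the extension added by hand.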

### Download and Transcribe

Now we can loop over a list of URLs, download each one with `load_yt` or `load_media`, and transcribe the resulting file with Groq's Whisper API.

The transcribed text is then stored in the `transcribed_documents` list as a `Document`, which is a Langchain primitive that will allow us to index these transcripts easily later on.

In [9]:
from groq import Groq
from langchain_core.documents import Document

client = Groq(
    api_key=os.getenv("GROQ_API_KEY")
)

save_dir = os.getenv("SAVE_DIR")
urls = [
    "https://www.youtube.com/watch?v=nvJ74ZSpDQ4",
    "https://www.youtube.com/watch?v=0epti7O0Yis",
    "https://static.deepgram.com/examples/interview_speech-analytics.wav"
]
transcribed_documents = []

for url in tqdm.tqdm(urls):
    if "youtube.com" in url or "youtu.be" in url:
        load_func = load_yt
    else:
        load_func = load_media

    try:
        # Download file
        filepath = load_func(url)
        print("filepath:", filepath)

        # Transcribe file
        with open(filepath, "rb") as file:
            transcript = client.audio.transcriptions.create(
                file=(filepath, file.read()),
                model="whisper-large-v3",
                prompt="GROQ, Groq, LPU, Mixtral, Mistral, Llama, Meta", # Optional: Add keywords that can help the model transcribe proper nouns
                response_format="text",
                temperature=0.0  # Optional
            )

        transcribed_documents.append(Document(
            page_content=transcript,
            metadata={
                'source': url,
                'filepath': filepath
            }
        ))
    except Exception as e:
        # Covers both download and transcription failures
        print(f"Error processing {url}: {e}")
        continue

  0%|          | 0/3 [00:00<?, ?it/s]

[youtube] Extracting URL: https://www.youtube.com/watch?v=nvJ74ZSpDQ4
[youtube] nvJ74ZSpDQ4: Downloading webpage
[youtube] nvJ74ZSpDQ4: Downloading ios player API JSON
[youtube] nvJ74ZSpDQ4: Downloading m3u8 information
[info] nvJ74ZSpDQ4: Downloading 1 format(s): 233
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 303
[download] Destination: ./media/AMA_-_Function_Calling_in_GroqCloud.mp4
[download] 100% of   10.70MiB in 00:00:24 at 452.72KiB/s                  
[ExtractAudio] Destination: ./media/AMA_-_Function_Calling_in_GroqCloud.mp3
Deleting original file ./media/AMA_-_Function_Calling_in_GroqCloud.mp4 (pass -k to keep)
filepath: ./media/AMA_-_Function_Calling_in_GroqCloud.mp3


 33%|███▎      | 1/3 [00:46<01:33, 46.92s/it]

[youtube] Extracting URL: https://www.youtube.com/watch?v=0epti7O0Yis
[youtube] 0epti7O0Yis: Downloading webpage
[youtube] 0epti7O0Yis: Downloading ios player API JSON
[youtube] 0epti7O0Yis: Downloading m3u8 information
[info] 0epti7O0Yis: Downloading 1 format(s): 233
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 72
[download] Destination: ./media/Untold_story_of_AI_s_fastest_chip.mp4
[download] 100% of    2.49MiB in 00:00:14 at 179.68KiB/s                
[ExtractAudio] Destination: ./media/Untold_story_of_AI_s_fastest_chip.mp3
Deleting original file ./media/Untold_story_of_AI_s_fastest_chip.mp4 (pass -k to keep)
filepath: ./media/Untold_story_of_AI_s_fastest_chip.mp3


 67%|██████▋   | 2/3 [01:08<00:32, 32.03s/it]

filepath: ./media/interview_speech-analytics.wav


100%|██████████| 3/3 [01:12<00:00, 24.26s/it]
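
The loop above picks a loader with a substring check (`"youtube.com" in url`), which would also match a non-YouTube URL that merely mentions `youtube.com` in its path or query string. A stricter check, sketched below, compares the parsed hostname instead. The names `YOUTUBE_HOSTS` and `is_youtube_url` are hypothetical, and the host list is an assumption you may need to extend.

```python
from urllib.parse import urlparse

# Assumption: these two domains (and their subdomains) cover the URLs we care about
YOUTUBE_HOSTS = {"youtube.com", "youtu.be"}


def is_youtube_url(url: str) -> bool:
    """Return True when the URL's hostname is a YouTube domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in YOUTUBE_HOSTS)


for candidate in [
    "https://www.youtube.com/watch?v=nvJ74ZSpDQ4",
    "https://static.deepgram.com/examples/interview_speech-analytics.wav",
]:
    print(candidate, "->", "load_yt" if is_youtube_url(candidate) else "load_media")
```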


## 3. Splitting and indexing the transcriptions into a vectorstore

In [10]:
from langchain_chroma import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings

embed_model = HuggingFaceEmbeddings()
text_splitter = RecursiveCharacterTextSplitter(
    separators=[
        "\n\n", 
        "\n", 
        " ",
        "",
        # For multilingual/non-english text
        # "\u200b",  # Zero-width space
        # "\uff0c",  # Fullwidth comma
        # "\u3001",  # Ideographic comma
        # "\uff0e",  # Fullwidth full stop
        # "\u3002",  # Ideographic full stop
    ],
    chunk_size=2500,
    chunk_overlap=0,
    length_function=len,
    is_separator_regex=False,
)
documents = text_splitter.split_documents(transcribed_documents)
vectorstore = Chroma.from_documents(documents, embedding=embed_model, collection_name="groq_media_rag")
retriever = vectorstore.as_retriever(
    search_kwargs={
        'k': 2 # Limit to 2 documents per retrieval
    }
)
print(f"Documents indexed: {len(documents)}")
print(f"Total number of documents in vectorstore: {len(vectorstore.get()['ids'])}")

Documents indexed: 15
Total number of documents in vectorstore: 15
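
The retriever above returns the `k` chunks whose embeddings are closest to the embedding of the query. As a conceptual illustration only, the sketch below ranks a few hand-made vectors by cosine similarity in plain Python. Chroma's actual index and distance metric may differ, and the vectors and chunk ids here are made up for the example.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], doc_vecs: List[Tuple[str, Sequence[float]]], k: int = 2) -> List[str]:
    """Return the ids of the k documents most similar to the query vector."""
    ranked = sorted(doc_vecs, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


# Toy 3-dimensional "embeddings" (real embedding vectors have hundreds of dimensions)
docs = [
    ("chunk-a", [1.0, 0.0, 0.0]),
    ("chunk-b", [0.9, 0.1, 0.0]),
    ("chunk-c", [0.0, 1.0, 0.0]),
]
print(top_k([1.0, 0.0, 0.0], docs, k=2))  # ['chunk-a', 'chunk-b']
```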


## 4. Setting up the Step Back Prompting RAG pipeline using Langchain's LCEL (LangChain Expression Language)

### Create Step Back Prompting Chain

Step Back Prompting enhances our RAG system by rephrasing specific questions into more general forms, leading to better context retrieval and more comprehensive answers. Here's how we implement it:

1. Define a `StepBackQuestions` class for structured output.
2. Initialize the `ChatGroq` model with "llama3-8b-8192".
3. Create a system prompt for question rephrasing.
4. Set up a `ChatPromptTemplate` combining the system prompt and user question.
5. Construct the `STEP_BACK_CHAIN` using Langchain's LCEL to:
   - Take the user's question
   - Generate a more generic question
   - Retrieve relevant documents
   - Combine retrieved documents into a single string
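
Before wiring this up with LCEL, the data flow of step 5 can be sketched in plain Python. In the sketch below the LLM and the vector store are replaced by stand-in functions (`fake_generalize` and `fake_retrieve`, both hypothetical), so it shows only the shape of the chain and why a rephrased question can retrieve context that the original wording misses.

```python
from typing import Callable, List


def step_back_context(
    question: str,
    generalize: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
) -> str:
    """Rephrase the question, retrieve with the rephrased form, and join the hits."""
    generic_question = generalize(question)
    chunks = retrieve(generic_question)
    return "\n".join(chunks)


CORPUS = [
    "The LPU keeps memory on the chip.",
    "KPIs are key performance indicators.",
]


def fake_generalize(question: str) -> str:
    # Stand-in for the LLM call that produces the step-back question
    return "What are the key features of Groq's LPU design?"


def fake_retrieve(query: str) -> List[str]:
    # Toy retriever: return chunks that share the term "lpu" with the query
    return [c for c in CORPUS if "lpu" in query.lower() and "lpu" in c.lower()]


print(fake_retrieve("Why is Groq so fast?"))  # [] because the original wording never mentions the LPU
print(step_back_context("Why is Groq so fast?", fake_generalize, fake_retrieve))
```

A real embedding-based retriever is far less brittle than this keyword match, but the same effect applies: the step-back question can land closer to chunks that describe the general topic.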

In [11]:
# Adapted from: https://github.com/langchain-ai/langchain/blob/master/cookbook/stepback-qa.ipynb
from pydantic import BaseModel, Field
from operator import itemgetter

from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough

class StepBackQuestions(BaseModel):
    question: str = Field(
        description="a question"
    )

STEP_BACK_LLM = ChatGroq(model="llama3-8b-8192").with_structured_output(StepBackQuestions, method='function_calling')

STEP_BACK_SYSTEM_PROMPT = """\
You are an expert analyser. Your task is to step back and paraphrase a question to a more generic step-back question, which is easier to answer. Here are a few examples:
Question: "What is the primary advantage of Groq's LPU architecture?"
Answer: "What are the key features of Groq's LPU design?"

Question: "How does Groq's LPU compare to traditional GPUs in AI workloads?"
Answer: "How does Groq's LPU performance compare to other AI chips?"\
"""

STEP_BACK_PROMPT = ChatPromptTemplate.from_messages([
    ("system", STEP_BACK_SYSTEM_PROMPT),
    ("human", "{question}")
])

STEP_BACK_CHAIN = (
    {
        "question": RunnablePassthrough()
    }
    |
    STEP_BACK_PROMPT
    |
    STEP_BACK_LLM
    |
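    # The structured output is normally a StepBackQuestions instance, which is not subscriptable
    RunnableLambda(lambda out: out.question if isinstance(out, StepBackQuestions) else out["question"])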
    |
    retriever
    | 
    RunnableLambda(lambda docs: "\n".join(doc.page_content for doc in (docs)))
)
STEP_BACK_CHAIN.invoke("Why is Groq so fast?")

"inbound from companies featured in the S&P 500. To do this, they're undergoing a production ramp-up. There's another interesting part of this story. The entirety of the chip is made in the U.S., which is very unique from its chip-making counterparts that rely heavily on Taiwan's TSMC foundry. There are some critical questions that come from Groq's value proposition. Is speed going to matter so much that the company can take meaningful market share? Does a shorter time to render answers make a difference to the users of large language models? Also, when companies order chips from Groq, they have to order a ton of them. What Groq can handle with 578 LPU chips, NVIDIA can handle with two of their H100 GPU chips. Does this make scaling unfeasible? While it's unclear whether Groq will be able to turn its moment of virality into the backbone for fast AI compute, this is a moment that was a long time in the making for Ross and crew. And if we want real innovation, the kind that drives down c

### Create the Final RAG Pipeline with Step Back Prompting Chain

This section combines the Step Back Prompting chain with our main RAG pipeline to create a comprehensive question-answering system. The pipeline:

1. Initializes a Groq LLM for the final response generation.
2. Defines a system prompt that instructs the model to use both regular and step-back contexts.
3. Creates a chat prompt template combining the system prompt and user input.
4. Sets up a function to format retrieved documents.
5. Constructs the final RAG chain using LCEL, which:
   - Retrieves and formats regular context
   - Obtains step-back context from the previous chain
   - Passes the user's input
   - Generates a response using the LLM
   - Parses the output to a string

This integrated pipeline leverages both direct and generalized contexts to produce more informed and comprehensive answers.
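
The first step of the final chain is a dictionary whose values each receive the same user question. LCEL turns such a dictionary into a parallel step whose output is a dictionary with the same keys. The plain-Python sketch below mimics only that data shape. It runs the branches one after another and is not LangChain's implementation; `fan_out` and the lambda branches are hypothetical stand-ins.

```python
from typing import Any, Callable, Dict


def fan_out(branches: Dict[str, Callable[[Any], Any]]) -> Callable[[Any], Dict[str, Any]]:
    """Return a function that feeds one input to every branch and collects the results by key."""
    def run(value: Any) -> Dict[str, Any]:
        return {key: branch(value) for key, branch in branches.items()}
    return run


prepare_inputs = fan_out({
    "context": lambda q: f"[docs for: {q}]",                       # stands in for retriever + formatting
    "step_back_context": lambda q: f"[docs for generic form of: {q}]",  # stands in for STEP_BACK_CHAIN
    "input": lambda q: q,                                          # plays the role of RunnablePassthrough
})
print(prepare_inputs("Why is Groq so fast?"))
```

The resulting keys match the placeholders in the prompt template, which is why the dictionary can be piped straight into the prompt.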

The code below implements this final RAG pipeline:

In [12]:
from langchain_core.documents import Document
from langchain.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

rag_llm = ChatGroq(model='llama3-70b-8192', temperature=0.2)

RAG_SYSTEM_PROMPT = """\
You are an assistant for question-answering tasks. \
Use the following pieces of retrieved context given within delimiters to answer the human's questions.
```
{context}
{step_back_context}
```"""

RAG_PROMPT = ChatPromptTemplate.from_messages([
    ("system", RAG_SYSTEM_PROMPT),
    ("human", "{input}")
])

def format_docs(docs: List[Document]):
    """Format the retrieved documents"""
    return "\n".join(doc.page_content for doc in docs)

rag_chain = (
    {
        "context": retriever | RunnableLambda(format_docs), # Retrieve docs for the original question and join them into one string
        "step_back_context": STEP_BACK_CHAIN, # Retrieve docs for the step-back question
        "input": RunnablePassthrough() # Propagate the user's question unchanged to the next step
    }
    | RAG_PROMPT # Format the prompt with the 'context', 'step_back_context' and 'input' variables
    | rag_llm # Get a response from the LLM using the formatted prompt
    | StrOutputParser() # Extract the string content from the LLM response
)

## 5. Chat with the media

We can call the `.invoke()` method to run the whole Step Back RAG pipeline and query our media files.

In [13]:
print(rag_chain.invoke("Why is Groq so fast?"))
print(rag_chain.invoke("Why is function calling on Groq powerful?"))
print(rag_chain.invoke("What are KPIs?"))

Groq is fast due to its LPU (Language Processing Unit) design, which is specifically tailored for sequential language processing tasks. This design allows it to process language tasks quickly and efficiently. Additionally, the LPU has memory on the chip, which is a rarity, and is optimized for inference, making it well-suited for tasks like chat interfaces. This unique design enables Groq to achieve speeds that are significantly faster than traditional graphics processing units (GPUs).
Function calling on Groq is powerful because it allows for a sequential process of calling various tools, using the answer, and then going back to prepare a final answer. This process is similar to a single core processor that can fetch in memory, reason over that memory, use another tool, and then prepare a final answer. Additionally, Groq has an "insane amount of budget" when it comes to tokens per second, which enables this sequential process to happen efficiently. This capability has the potential to
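
When asking several questions in a row, a single failed call (for example a rate limit or network error) would stop the whole cell. The sketch below is one way to guard against that: it works with any object that exposes an `.invoke()` method, such as `rag_chain`. The helper name `ask_all` is hypothetical.

```python
from typing import Any, Dict, Iterable


def ask_all(chain: Any, questions: Iterable[str]) -> Dict[str, str]:
    """Invoke `chain` once per question and record errors instead of stopping the batch."""
    answers: Dict[str, str] = {}
    for question in questions:
        try:
            answers[question] = chain.invoke(question)
        except Exception as exc:  # e.g. rate limits or network errors
            answers[question] = f"[error: {exc}]"
    return answers


# Usage with the chain built above (requires a valid GROQ_API_KEY):
# for question, answer in ask_all(rag_chain, ["Why is Groq so fast?", "What are KPIs?"]).items():
#     print(f"Q: {question}\nA: {answer}\n")
```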

## Conclusion

This notebook has demonstrated how to build a powerful question-answering system that combines Groq's Whisper API for media transcription with Step Back Prompting RAG for enhanced comprehension and response generation.

This approach can improve the quality of responses by retrieving broader context, through a more general step-back question, alongside the context for the specific question. It demonstrates the value of combining fast, accurate transcription with advanced language understanding techniques.

By following this tutorial, you've created a versatile tool that can extract insights from audio and video content, opening up new possibilities for content analysis, research, and information retrieval.

We hope this tutorial has been informative and inspires you to explore further applications of these powerful technologies!