# Retrieval-Augmented Generation with Groq API and BM25



Objective: Implement a simple RAG pipeline by combining a retrieval model (BM25) with a language generation model (here, a Llama 3 model served through the Groq API) to answer queries based on a document set.

1. Set up a small document corpus.
2. Use BM25 to retrieve relevant documents based on a user query.
3. Pass the retrieved documents to a language generation model to formulate an answer.
4. Evaluate the quality of the generated responses.
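
The steps above can be sketched as a small pipeline before any libraries are installed. This is only an illustrative outline: the helper names (`retrieve`, `build_prompt`, `answer`, `echo_first_document`) are hypothetical, the retriever is a plain word-overlap placeholder standing in for BM25, and the generator is a stub rather than a real language model.

```python
from typing import Callable, List


def retrieve(query: str, corpus: List[str], top_n: int = 2) -> List[str]:
    """Placeholder retriever: rank documents by shared lowercase words (not BM25)."""
    query_terms = set(query.lower().replace("?", "").split())

    def overlap(doc: str) -> int:
        return len(query_terms & set(doc.lower().replace(".", "").split()))

    return sorted(corpus, key=overlap, reverse=True)[:top_n]


def build_prompt(docs: List[str], query: str) -> str:
    """Keep the retrieved documents and the question in separate, labelled parts."""
    doc_block = "\n".join(f"- {d}" for d in docs)
    return f"Documents:\n{doc_block}\n\nQuestion: {query}"


def answer(query: str, corpus: List[str], generate: Callable[[str], str]) -> str:
    """Retrieve, build the prompt, and hand it to any text generator."""
    return generate(build_prompt(retrieve(query, corpus), query))


def echo_first_document(prompt: str) -> str:
    """Stub generator (assumption): returns the first listed document verbatim."""
    return prompt.splitlines()[1][2:]


corpus = ["The sky is blue.", "The sun is bright."]
result = answer("What color is the sky?", corpus, echo_first_document)
print(result)
```

Passing the generator in as a callable keeps retrieval and prompt construction testable without an API key; a real model call can be swapped in later.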

In [11]:
!pip install groq
!pip install rank_bm25
!pip install nltk

Collecting rank_bm25
  Downloading rank_bm25-0.2.2-py3-none-any.whl.metadata (3.2 kB)
Downloading rank_bm25-0.2.2-py3-none-any.whl (8.6 kB)
Installing collected packages: rank_bm25
Successfully installed rank_bm25-0.2.2


In [7]:
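# Replace the placeholder with your own Groq API key (avoid sharing a notebook that contains a real key)
api_key = "put_Your_API_key_here"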

In [12]:
from groq import Groq
from rank_bm25 import BM25Okapi
from nltk.tokenize import word_tokenize

In [15]:
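import nltk
# word_tokenize needs the Punkt tokenizer data; newer NLTK releases may ask for 'punkt_tab' instead
nltk.download('punkt')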

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [16]:
corpus = [
    "The sky is blue.",
    "The sun is bright.",
    "The sun in the sky is bright.",
    "We can see the shining sun, the bright sun."
]

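# Lowercase and tokenize each document, then build the BM25 index
tokenized_corpus = [word_tokenize(doc.lower()) for doc in corpus]
bm25 = BM25Okapi(tokenized_corpus)

# Tokenize the query the same way as the corpus
query = "What color is the sky?"
tokenized_query = word_tokenize(query.lower())

# Score every document and keep the three highest-scoring ones, best first
doc_scores = bm25.get_scores(tokenized_query)
top_n_docs = [corpus[i] for i in doc_scores.argsort()[-3:][::-1]]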
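
To make the ranking step less of a black box, here is a minimal from-scratch sketch of Okapi BM25 scoring using only the standard library. It assumes a textbook form of the formula with common default parameters (`k1=1.5`, `b=0.75`) and a `+ 1` inside the IDF logarithm; `BM25Okapi` from `rank_bm25` may differ in details such as how it smooths IDF, so the numbers are not expected to match the library exactly. The function name `bm25_scores` is hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, tokenized_docs, k1=1.5, b=0.75):
    """Score each document against the query with a textbook Okapi BM25 formula."""
    n_docs = len(tokenized_docs)
    avg_len = sum(len(doc) for doc in tokenized_docs) / n_docs
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in tokenized_docs for term in set(doc))

    scores = []
    for doc in tokenized_docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in term_freq:
                continue  # terms absent from the document contribute nothing
            # Assumption: smoothed IDF; the "+ 1" keeps the value non-negative.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            tf = term_freq[term]
            # Length normalisation: longer-than-average documents are penalised.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "the sky is blue".split(),
    "the sun is bright".split(),
    "the sun in the sky is bright".split(),
]
scores = bm25_scores("what color is the sky".split(), docs)
# Document indices ordered from most to least relevant.
ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranked)
```

Rare query terms such as "sky" carry more weight than terms that appear in every document, which is why the documents mentioning the sky rank above the one that does not.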

In [19]:
client = Groq(api_key=api_key)


# Build the context: the retrieved documents followed by the user's query
context = " ".join(top_n_docs) + " " + query

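# Send the request to a Llama 3 model hosted on Groq.
# Model IDs, especially preview ones, can be retired; if this one is rejected, pick a current ID from Groq's model list.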
completion = client.chat.completions.create(
    model="llama3-groq-70b-8192-tool-use-preview",
    messages=[
        {
            "role": "system",
            "content": "You are a document assistant who retrieves information from documents and answers queries based on them."
        },
        {
            "role": "user",
            "content": f"Based on the following documents: {context}, answer the user's query."
        }
    ],
    temperature=0.5,
    max_tokens=1024,
    top_p=0.65,
    stream=True,
    stop=None,
)

# Stream the response and print it piece by piece; delta.content can be None, hence the `or ""`
for chunk in completion:
    print(chunk.choices[0].delta.content or "", end="")

The sky is blue.
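
The cells above stop at printing the answer, so the evaluation step from the objective is left open. One simple option, sketched here under stated assumptions, is a token-level F1 score between the generated answer and a reference answer. The reference string below is hypothetical (the notebook provides none), and the helper names `tokens` and `token_f1` are illustrative.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercase the text and keep only alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall between two strings."""
    pred, ref = tokens(prediction), tokens(reference)
    # Multiset intersection counts each shared token at most as often as it appears in both.
    common = sum((Counter(pred) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision = common / len(pred)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)


generated = "The sky is blue."  # answer printed by the model above
reference = "Blue"              # hypothetical gold answer, not from the notebook
score = token_f1(generated, reference)
print(f"token F1: {score:.2f}")
```

A short reference penalises a correct but wordier answer through low precision, so this metric is best read alongside a manual check of whether the answer is supported by the retrieved documents.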