# 🧠 Retrieval-Augmented Generation (RAG) with LangChain, Groq & OpenAI
This notebook walks through a minimal RAG pipeline built with the LangChain framework.
A PDF is loaded, split into chunks, embedded, and indexed in a vector store; the chunks most similar to a question are then passed to an LLM as context for its answer.

Technologies used:
- `LangChain`
- `Groq` (DeepSeek LLM)
- `OpenAI` Embeddings
- `FAISS` Vector Store
- `.env` configuration

In [23]:
from langchain_groq import ChatGroq
from dotenv import load_dotenv
import os

In [24]:
load_dotenv()

True
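
A `True` return value from `load_dotenv()` does not by itself confirm that the keys this notebook relies on are set. The sketch below is one way to check; the variable names `GROQ_API_KEY` and `OPENAI_API_KEY` are the conventional ones for these integrations, and the helper `missing_keys` is a hypothetical name introduced here for illustration.

```python
import os

# Assumed names: the environment variables conventionally read by the
# Groq and OpenAI integrations. Adjust if your setup differs.
REQUIRED_KEYS = ("GROQ_API_KEY", "OPENAI_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_keys()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Running this right after `load_dotenv()` surfaces a misconfigured `.env` file before the first API call fails.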

In [25]:
llm=ChatGroq(model="deepseek-r1-distill-llama-70b")

In [26]:
print(llm.invoke("What is the capital of France?").content)

<think>

</think>

The capital of France is Paris.


In [31]:
from langchain_openai import OpenAIEmbeddings  # requires the langchain-openai package

api_key = os.getenv("OPENAI_API_KEY")


embedding_model = OpenAIEmbeddings(
    model="text-embedding-3-small",
    openai_api_key=api_key
)


In [33]:
doc_vector=embedding_model.embed_query("What is the capital of France?")

In [34]:
doc_vector

[0.04169800796545377,
 0.01580059497647811,
 0.02816049141105829,
 0.024351143736323445,
 -0.023142803010658835,
 -0.0027392478127470483,
 -0.014223607911944254,
 0.014336249512510045,
 0.010834108827590925,
 -0.010199217858909303,
 0.0069428410894621624,
 -0.02404393954047539,
 -0.061645875418364096,
 -0.015083783013080014,
 -0.014233847803453173,
 0.023163282793676677,
 -0.006625395605121351,
 0.019446098799134896,
 0.07241853930896616,
 -0.02439210516500424,
 0.0030029322368515547,
 -0.010091695738436693,
 -0.04100167299110584,
 0.011970768450047981,
 0.062096441820627256,
 0.0070964436530474686,
 -0.045548315206224294,
 -0.007347327979935186,
 0.0036403834112410456,
 0.03942468685789454,
 0.04214857436771693,
 -0.025149877625760573,
 -0.0019558739934039406,
 0.04309067046356917,
 -0.02453546737141935,
 -0.03995717611751929,
 -0.037642895443924386,
 -0.03934276400053295,
 0.021320052030653,
 0.029676036332570956,
 -0.0031360545517577425,
 -0.013025507077788564,
 0.006845559326159751
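
An embedding such as the one above is only useful in comparison with other embeddings. A minimal sketch of cosine similarity, a common way to compare two vectors, follows; the three short vectors are made-up toy values, not real model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for 1536-dimensional embeddings.
query_vec = [0.1, 0.3, 0.5]
close_vec = [0.2, 0.6, 1.0]   # same direction as query_vec, twice the length
far_vec = [0.5, -0.3, 0.0]    # points in a different direction

print(cosine_similarity(query_vec, close_vec))
print(cosine_similarity(query_vec, far_vec))
```

The first score is close to 1 because the vectors point the same way, regardless of their length; the second is negative.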

## Data Ingestion

In [35]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [36]:
os.getcwd()

'c:\\Users\\user\\Desktop\\LLMops\\document_portal\\notebook'

In [37]:
file_path=os.path.join(os.getcwd(), "data", "sample.pdf")

In [39]:
loader=PyPDFLoader(file_path)

In [42]:
documents=loader.load()

In [43]:
documents

[Document(metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2024-04-10T21:11:43+00:00', 'author': '', 'keywords': '', 'moddate': '2024-04-10T21:11:43+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'c:\\Users\\user\\Desktop\\LLMops\\document_portal\\notebook\\data\\sample.pdf', 'total_pages': 15, 'page': 0, 'page_label': '1'}, page_content='Provided proper attribution is provided, Google hereby grants permission to\nreproduce the tables and figures in this paper solely for use in journalistic or\nscholarly works.\nAttention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.com\nNoam Shazeer∗\nGoogle Brain\nnoam@google.com\nNiki Parmar∗\nGoogle Research\nnikip@google.com\nJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.com\nAidan N. Gomez∗ †\nUniversity of Toro

In [44]:
len(documents)

15

In [45]:
text_splitter=RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=150,
    length_function=len
)
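
To build intuition for `chunk_size` and `chunk_overlap`, here is a simplified sliding-window splitter. It is a sketch only: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and newlines, so its real chunks will differ, and `sliding_window_chunks` is a hypothetical helper rather than part of LangChain.

```python
def sliding_window_chunks(text, chunk_size=500, chunk_overlap=150):
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(1200))  # 1200 characters of toy text
chunks = sliding_window_chunks(sample)
print([len(c) for c in chunks])            # [500, 500, 500]
print(chunks[0][-150:] == chunks[1][:150])  # True: neighbours share 150 characters
```

The overlap means a sentence that straddles a chunk boundary is likely to appear whole in at least one chunk, at the cost of storing some text twice.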

In [46]:
docs=text_splitter.split_documents(documents)

In [47]:
len(docs)

115

In [48]:
docs[0]

Document(metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2024-04-10T21:11:43+00:00', 'author': '', 'keywords': '', 'moddate': '2024-04-10T21:11:43+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'c:\\Users\\user\\Desktop\\LLMops\\document_portal\\notebook\\data\\sample.pdf', 'total_pages': 15, 'page': 0, 'page_label': '1'}, page_content='Provided proper attribution is provided, Google hereby grants permission to\nreproduce the tables and figures in this paper solely for use in journalistic or\nscholarly works.\nAttention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.com\nNoam Shazeer∗\nGoogle Brain\nnoam@google.com\nNiki Parmar∗\nGoogle Research\nnikip@google.com\nJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.com\nAidan N. Gomez∗ †\nUniversity of Toron

In [49]:
docs[0].metadata

{'producer': 'pdfTeX-1.40.25',
 'creator': 'LaTeX with hyperref',
 'creationdate': '2024-04-10T21:11:43+00:00',
 'author': '',
 'keywords': '',
 'moddate': '2024-04-10T21:11:43+00:00',
 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5',
 'subject': '',
 'title': '',
 'trapped': '/False',
 'source': 'c:\\Users\\user\\Desktop\\LLMops\\document_portal\\notebook\\data\\sample.pdf',
 'total_pages': 15,
 'page': 0,
 'page_label': '1'}

In [50]:
docs[1].metadata

{'producer': 'pdfTeX-1.40.25',
 'creator': 'LaTeX with hyperref',
 'creationdate': '2024-04-10T21:11:43+00:00',
 'author': '',
 'keywords': '',
 'moddate': '2024-04-10T21:11:43+00:00',
 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5',
 'subject': '',
 'title': '',
 'trapped': '/False',
 'source': 'c:\\Users\\user\\Desktop\\LLMops\\document_portal\\notebook\\data\\sample.pdf',
 'total_pages': 15,
 'page': 0,
 'page_label': '1'}

In [52]:
docs[100].metadata

{'producer': 'pdfTeX-1.40.25',
 'creator': 'LaTeX with hyperref',
 'creationdate': '2024-04-10T21:11:43+00:00',
 'author': '',
 'keywords': '',
 'moddate': '2024-04-10T21:11:43+00:00',
 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5',
 'subject': '',
 'title': '',
 'trapped': '/False',
 'source': 'c:\\Users\\user\\Desktop\\LLMops\\document_portal\\notebook\\data\\sample.pdf',
 'total_pages': 15,
 'page': 11,
 'page_label': '12'}
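
Every chunk keeps the metadata of the page it came from; note that `page` is zero-based while `page_label` holds the printed page number. A small sketch of how that metadata could be used to count chunks per page follows. The toy dictionaries are illustrative stand-ins for real chunk metadata, and `chunks_per_page` is a hypothetical helper.

```python
from collections import Counter


def chunks_per_page(metadatas):
    """Count how many chunks originate from each zero-based page index."""
    return Counter(meta["page"] for meta in metadatas)


# Toy metadata shaped like the dictionaries shown above.
toy_metadata = [{"page": 0}, {"page": 0}, {"page": 11}]
counts = chunks_per_page(toy_metadata)
print(sorted(counts.items()))  # [(0, 2), (11, 1)]
```

With the real chunks, the call would be `chunks_per_page(d.metadata for d in docs)`.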

In [54]:
docs[11].page_content

'significant improvements in computational efficiency through factorization tricks [21] and conditional\ncomputation [32], while also improving model performance in case of the latter. The fundamental\nconstraint of sequential computation, however, remains.\nAttention mechanisms have become an integral part of compelling sequence modeling and transduc-\ntion models in various tasks, allowing modeling of dependencies without regard to their distance in'

In [55]:
from langchain_community.vectorstores import FAISS

In [56]:
docs[0].page_content

'Provided proper attribution is provided, Google hereby grants permission to\nreproduce the tables and figures in this paper solely for use in journalistic or\nscholarly works.\nAttention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.com\nNoam Shazeer∗\nGoogle Brain\nnoam@google.com\nNiki Parmar∗\nGoogle Research\nnikip@google.com\nJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.com\nAidan N. Gomez∗ †\nUniversity of Toronto\naidan@cs.toronto.edu'

In [57]:
# embed_documents expects a list of strings; a bare string would be iterated character by character
doc_vector=embedding_model.embed_documents([docs[0].page_content])

In [58]:
len(doc_vector[0])

1536

In [59]:
vectorstore=FAISS.from_documents(docs, embedding_model)
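
Conceptually, the vector store holds one embedding per chunk and, at query time, returns the chunks whose embeddings lie closest to the query embedding. The brute-force sketch below illustrates that idea with squared Euclidean distance on toy 2-dimensional vectors. It assumes a flat L2 index, which I believe is the default here, and it is not how FAISS is implemented internally; `nearest_chunks` is a hypothetical name.

```python
def l2_squared(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_chunks(query_vec, chunk_vecs, k=2):
    """Return indices of the k chunk vectors closest to query_vec."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: l2_squared(query_vec, chunk_vecs[i]),
    )
    return ranked[:k]


# Toy vectors standing in for chunk embeddings.
toy_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
print(nearest_chunks([1.0, 0.0], toy_vecs, k=2))  # [0, 2]
```

A dedicated index library does the same ranking far more efficiently on large collections.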

## Retrieval-Augmented Generation (RAG)

In [62]:
relevant_doc=vectorstore.similarity_search("where does tansfromeres come from?")

In [63]:
relevant_doc

[Document(id='690f4ea2-2699-4a00-82cc-9bdd13cdd225', metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2024-04-10T21:11:43+00:00', 'author': '', 'keywords': '', 'moddate': '2024-04-10T21:11:43+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'c:\\Users\\user\\Desktop\\LLMops\\document_portal\\notebook\\data\\sample.pdf', 'total_pages': 15, 'page': 8, 'page_label': '9'}, page_content='results to the base model.\n6.3 English Constituency Parsing\nTo evaluate if the Transformer can generalize to other tasks we performed experiments on English\nconstituency parsing. This task presents specific challenges: the output is subject to strong structural\nconstraints and is significantly longer than the input. Furthermore, RNN sequence-to-sequence\nmodels have not been able to attain state-of-the-art results in small-data regimes [

In [64]:
relevant_doc[0].page_content

'results to the base model.\n6.3 English Constituency Parsing\nTo evaluate if the Transformer can generalize to other tasks we performed experiments on English\nconstituency parsing. This task presents specific challenges: the output is subject to strong structural\nconstraints and is significantly longer than the input. Furthermore, RNN sequence-to-sequence\nmodels have not been able to attain state-of-the-art results in small-data regimes [37].'

In [65]:
relevant_doc=vectorstore.similarity_search("llama2 finetuning benchmark experiments.",k=10)

In [66]:
relevant_doc

[Document(id='cbf1c277-6dbb-47f8-ab26-b1ca9b72f5dc', metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2024-04-10T21:11:43+00:00', 'author': '', 'keywords': '', 'moddate': '2024-04-10T21:11:43+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'c:\\Users\\user\\Desktop\\LLMops\\document_portal\\notebook\\data\\sample.pdf', 'total_pages': 15, 'page': 7, 'page_label': '8'}, page_content='in different ways, measuring the change in performance on English-to-German translation on the\n5We used values of 2.8, 3.7, 6.0 and 9.5 TFLOPS for K80, K40, M40 and P100, respectively.\n8'),
 Document(id='43d0515c-085b-4f83-b83a-19242255d946', metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2024-04-10T21:11:43+00:00', 'author': '', 'keywords': '', 'moddate': '2024-04-10T21:11:43+00:00', 'ptex.fullb

In [67]:
retriever = vectorstore.as_retriever(search_kwargs={"k": 10})

In [68]:
retriever.invoke("llama2 finetuning benchmark experiments.")

[Document(id='cbf1c277-6dbb-47f8-ab26-b1ca9b72f5dc', metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2024-04-10T21:11:43+00:00', 'author': '', 'keywords': '', 'moddate': '2024-04-10T21:11:43+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'c:\\Users\\user\\Desktop\\LLMops\\document_portal\\notebook\\data\\sample.pdf', 'total_pages': 15, 'page': 7, 'page_label': '8'}, page_content='in different ways, measuring the change in performance on English-to-German translation on the\n5We used values of 2.8, 3.7, 6.0 and 9.5 TFLOPS for K80, K40, M40 and P100, respectively.\n8'),
 Document(id='43d0515c-085b-4f83-b83a-19242255d946', metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2024-04-10T21:11:43+00:00', 'author': '', 'keywords': '', 'moddate': '2024-04-10T21:11:43+00:00', 'ptex.fullb

### Prompt inputs
- **Question:** the user's question.
- **Context:** the top-k relevant chunks retrieved from the vector store using semantic search.


In [69]:
prompt_template = """
        Answer the question based on the context provided below. 
        If the context does not contain sufficient information, respond with: 
        "I do not have enough information about this."

        Context: {context}

        Question: {question}

        Answer:"""

In [70]:
from langchain_core.prompts import PromptTemplate

In [71]:
prompt=PromptTemplate(
    template=prompt_template,
    input_variables=["context", "question"]
)

In [72]:
prompt

PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template='\n        Answer the question based on the context provided below. \n        If the context does not contain sufficient information, respond with: \n        "I do not have enough information about this."\n\n        Context: {context}\n\n        Question: {question}\n\n        Answer:')
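
The repr above shows that the triple-quoted template carries its source indentation into every prompt line. One possible cleanup is sketched below with `textwrap.dedent`, followed by a plain `str.format` call to show roughly how the `{context}` and `{question}` placeholders get filled. The sample context and question are illustrative values only.

```python
import textwrap

raw_template = """
        Answer the question based on the context provided below.
        If the context does not contain sufficient information, respond with:
        "I do not have enough information about this."

        Context: {context}

        Question: {question}

        Answer:"""

# Remove the common leading indentation and the blank first line.
clean_template = textwrap.dedent(raw_template).strip()

# Plain string formatting as a stand-in for what the prompt object does.
filled = clean_template.format(
    context="Paris is the capital of France.",
    question="What is the capital of France?",
)
print(filled)
```

The cleaned string could be passed as `template=` in place of the indented one; whether the extra whitespace affects answer quality is model-dependent.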

In [73]:
from langchain_core.output_parsers import StrOutputParser

In [74]:
parser=StrOutputParser()

In [75]:
def format_docs(docs):
    return "\n\n".join([doc.page_content for doc in docs])
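
A quick check of what `format_docs` produces, using a stand-in class so the example runs without a vector store. `FakeDoc` is hypothetical and only mimics the `page_content` attribute of a real document object.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Minimal stand-in exposing only the attribute format_docs reads."""
    page_content: str


def format_docs(docs):
    # Join chunk texts with a blank line so the model sees clear boundaries.
    return "\n\n".join(doc.page_content for doc in docs)


joined = format_docs([FakeDoc("first chunk"), FakeDoc("second chunk")])
print(repr(joined))  # 'first chunk\n\nsecond chunk'
```

The metadata is dropped at this step, so the model sees only the chunk text, not the page it came from.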

In [76]:
from langchain_core.runnables import RunnablePassthrough

In [77]:
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | parser
)
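
The pipe syntax can be read as ordinary function composition: the question goes to the retriever and is also passed through unchanged, both results fill the prompt, the prompt goes to the model, and the parser extracts the text. The sketch below spells that flow out with plain Python functions. Every function here is a hypothetical stub (no retrieval and no model call happens), so it shows the data flow only.

```python
def fake_retriever(question):
    # Stub: a real retriever would return the top-k chunks for the question.
    return ["chunk about attention", "chunk about training"]


def format_chunks(chunks):
    return "\n\n".join(chunks)


def build_prompt(inputs):
    return (
        f"Context: {inputs['context']}\n\n"
        f"Question: {inputs['question']}\n\n"
        "Answer:"
    )


def fake_llm(prompt_text):
    # Stub: a real chat model would return a message object with the answer.
    return {"content": "stub answer"}


def parse_output(message):
    return message["content"]


def run_chain(question):
    # Mirrors {"context": retriever | format_docs, "question": passthrough}.
    inputs = {
        "context": format_chunks(fake_retriever(question)),
        "question": question,
    }
    return parse_output(fake_llm(build_prompt(inputs)))


print(run_chain("What is attention?"))  # stub answer
```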

In [79]:
rag_chain.invoke("tell  me about the finetuning benchmark experiments?")

'<think>\nOkay, I need to figure out how to answer the question: "tell me about the finetuning benchmark experiments?" based on the provided context. \n\nFirst, I\'ll read through the context to find any mentions of fine-tuning or benchmark experiments. The context discusses various experiments with the Transformer model, specifically on English-to-German translation. It talks about varying attention heads, key dimensions, positional embeddings, and other hyperparameters.\n\nI see sections about model variations, training details, and results in tables. However, I don\'t notice any specific section or mention of "finetuning" or "benchmark experiments" explicitly. The experiments described seem to be about architecture variations rather than fine-tuning on specific datasets or tasks.\n\nSince the context doesn\'t provide information about fine-tuning benchmarks, I should respond that there\'s not enough information available.\n</think>\n\nI do not have enough information about this.'
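
The returned string contains the model's `<think>...</think>` reasoning followed by the actual answer. If only the answer is wanted, one option is to split the two with a regular expression, as sketched below. `split_reasoning` is a hypothetical helper, and the sample string is a shortened stand-in for the output above.

```python
import re

# Non-greedy match across newlines for the first <think>...</think> block.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Return (reasoning, answer); reasoning is empty if no think block exists."""
    match = THINK_BLOCK.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


raw = (
    "<think>\nThe context has no fine-tuning section.\n</think>\n\n"
    "I do not have enough information about this."
)
reasoning, answer = split_reasoning(raw)
print(answer)  # I do not have enough information about this.
```

A function like this could be added as a final step of the chain, for example wrapped in `RunnableLambda` from `langchain_core.runnables`, so that callers receive only the answer text.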