### **Install and Import Libraries**

In [1]:
! pip install -U langchain
! pip install -qU langchain[groq]
! pip install langchain_community
! pip install -qU langchain-mistralai
! pip install pypdf
! pip install chromadb
! pip install faiss-cpu



In [2]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

### **Set Environment Variables**

In [None]:
import os

os.environ['LANGSMITH_TRACING'] = "true"   # enable LangSmith tracing of LLM calls
os.environ['LANGSMITH_ENDPOINT'] = "https://api.smith.langchain.com"  # LangSmith endpoint that receives the traces
os.environ['LANGSMITH_API_KEY'] = "ENTER_YOUR_LANGCHAIN_API_KEY" # API key for LangSmith
os.environ['LANGSMITH_PROJECT'] ="PROJECT_NAME" # project name on LangSmith
os.environ['GROQ_API_KEY'] = "ENTER_YOUR_GROQ_API_KEY" # API key for the Groq-hosted LLM
os.environ['MISTRAL_API_KEY'] = "ENTER_YOUR_MISTRAL_API_KEY"
os.environ['HF_TOKEN'] = 'ENTER_YOUR_HUGGINGFACE_TOKEN'
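
Hard-coding keys in a notebook makes them easy to leak when the notebook is shared. As an optional alternative, the sketch below prompts for a key only when it is not already set. The helper name `ensure_env` is hypothetical and not part of LangChain; only `os.environ` and `getpass` from the standard library are used.

```python
import os
from getpass import getpass


def ensure_env(name, ask=getpass):
    """Set os.environ[name] from a hidden prompt unless it is already set.

    `ask` is injectable so the helper can be exercised without a terminal.
    """
    if not os.environ.get(name):
        os.environ[name] = ask(f"Enter {name}: ")
    return os.environ[name]


# Hypothetical usage inside the notebook (uncomment to be prompted):
# ensure_env("GROQ_API_KEY")
# ensure_env("MISTRAL_API_KEY")
```

Because already-set variables are left alone, the same cell can be re-run without being prompted again.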

### **Load LLaMA3 chat model**

In [52]:
from langchain.chat_models import init_chat_model

llm = init_chat_model('llama3-8b-8192', model_provider="groq")

### **load the data from Text file**

In [53]:
from langchain_community.document_loaders import TextLoader, WebBaseLoader, PyPDFLoader
import bs4

In [54]:
# load synthetic data using TextLoader
loader = TextLoader('business_data.txt')
text_docs = loader.load()
print(text_docs)

[Document(metadata={'source': 'business_data.txt'}, page_content='Acme Corporation\n1234 Business Park Drive\nInnovation City, CA 90210\nPhone: (555) 123-4567\nEmail: support@acmecorp.com\nWebsite: www.acmecorp.com\n\n--------------------------------------------------------------------------------\nCompany Overview\n--------------------------------------------------------------------------------\nAcme Corporation is a leading provider of innovative business solutions, specializing in state-of-the-art technology and customer support services. Founded in 2005, Acme Corporation has grown into a multinational organization with operations in over 20 countries. Our mission is to empower businesses with tools that drive efficiency, streamline operations, and enhance customer satisfaction.\n\nOur core business areas include:\n- Enterprise Software Solutions\n- Cloud-Based Customer Support Platforms\n- Data Analytics and Business Intelligence Tools\n- Digital Transformation Consulting\n\nAt Acm

In [55]:
# print the content inside the document
print(text_docs[0].page_content[:600])

Acme Corporation
1234 Business Park Drive
Innovation City, CA 90210
Phone: (555) 123-4567
Email: support@acmecorp.com
Website: www.acmecorp.com

--------------------------------------------------------------------------------
Company Overview
--------------------------------------------------------------------------------
Acme Corporation is a leading provider of innovative business solutions, specializing in state-of-the-art technology and customer support services. Founded in 2005, Acme Corporation has grown into a multinational organization with operations in over 20 countries. Our mission 


In [56]:
# print the metadata of the loaded docs
print(text_docs[0].metadata)

{'source': 'business_data.txt'}


### **load the data from Web URL**

In [57]:
# loading pytorch tutorial content here
web_loader = WebBaseLoader(web_path='https://pytorch.org/tutorials/beginner/basics/intro.html', bs_kwargs=dict(parse_only=bs4.SoupStrainer(class_=('section'))))
web_docs = web_loader.load()
print(web_docs)

[Document(metadata={'source': 'https://pytorch.org/tutorials/beginner/basics/intro.html'}, page_content='\nLearn the Basics¶Created On: Feb 09, 2021 | Last Updated: Nov 04, 2024 | Last Verified: Nov 05, 2024\nAuthors:\nSuraj Subramanian,\nSeth Juarez,\nCassie Breviu,\nDmitry Soshnikov,\nAri Bornstein\nMost machine learning workflows involve working with data, creating models, optimizing model\nparameters, and saving the trained models. This tutorial introduces you to a complete ML workflow\nimplemented in PyTorch, with links to learn more about each of these concepts.\nWe’ll use the FashionMNIST dataset to train a neural network that predicts if an input image belongs\nto one of the following classes: T-shirt/top, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker,\nBag, or Ankle boot.\nThis tutorial assumes a basic familiarity with Python and Deep Learning concepts.\n\nRunning the Tutorial Code¶\nYou can run this tutorial in a couple of ways:\n\nIn the cloud: This is the easiest w

In [58]:
# print the content inside the web document
print(web_docs[0].page_content[200:755])

ne learning workflows involve working with data, creating models, optimizing model
parameters, and saving the trained models. This tutorial introduces you to a complete ML workflow
implemented in PyTorch, with links to learn more about each of these concepts.
We’ll use the FashionMNIST dataset to train a neural network that predicts if an input image belongs
to one of the following classes: T-shirt/top, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker,
Bag, or Ankle boot.
This tutorial assumes a basic familiarity with Python and Deep Learning 


In [59]:
# print the metadata of the loaded docs
print(web_docs[0].metadata)

{'source': 'https://pytorch.org/tutorials/beginner/basics/intro.html'}


### **load the data from PDF Document**

In [None]:
pdf_loader = PyPDFLoader('deepseek_r1_paper.pdf')
pdf_data = pdf_loader.load()
print(pdf_data)

[Document(metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2025-01-23T01:45:31+00:00', 'author': '', 'keywords': '', 'moddate': '2025-01-23T01:45:31+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': '/content/deepseek_r1_paper.pdf', 'total_pages': 22, 'page': 0, 'page_label': '1'}, page_content='DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via\nReinforcement Learning\nDeepSeek-AI\nresearch@deepseek.com\nAbstract\nWe introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1.\nDeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without super-\nvised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities.\nThrough RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing\nreasoning behaviors. However, it encounte

In [202]:
# print the metadata of the loaded docs
print(pdf_data[0].metadata)

{'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2025-01-23T01:45:31+00:00', 'author': '', 'keywords': '', 'moddate': '2025-01-23T01:45:31+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': '/content/deepseek_r1_paper.pdf', 'total_pages': 22, 'page': 0, 'page_label': '1'}


In [203]:
# print the content inside the pdf document
print(pdf_data[0].page_content[400:750])

Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing
reasoning behaviors. However, it encounters challenges such as poor readability, and language
mixing. To address these issues and further enhance reasoning performance, we introduce
DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL.


### **Split the Documents into Chunks**

In [245]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
pdf_docs = splitter.split_documents(pdf_data)

In [248]:
# print the number of chunks produced from the pdf
print('total documents after splitting text into chunks :', len(pdf_docs))

total documents after splitting text into chunks : 70
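
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is a simplification and not what `RecursiveCharacterTextSplitter` does internally: that splitter first tries to break on separators such as paragraphs and line breaks, so its chunks rarely have exactly these boundaries. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters, so a sentence cut
    at a boundary still appears partly in the next chunk.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# → ['abcd', 'defg', 'ghij']
```

Each chunk starts with the last character of the previous one, which is the one-character overlap requested here.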


In [249]:
# print three of the split docs (indices 10 to 12)
print(pdf_docs[10])
print('++++++++++++++++++++++++++++++++++++')
print(pdf_docs[11])
print('++++++++++++++++++++++++++++++++++++')
print(pdf_docs[12])

page_content='and 57.2% on LiveCodeBench. These results significantly outperform previous open-
source models and are comparable to o1-mini. We open-source distilled 1.5B, 7B, 8B, 14B,
32B, and 70B checkpoints based on Qwen2.5 and Llama3 series to the community.
1.2. Summary of Evaluation Results
• Reasoning tasks: (1) DeepSeek-R1 achieves a score of 79.8% Pass@1 on AIME 2024, slightly
surpassing OpenAI-o1-1217. On MATH-500, it attains an impressive score of 97.3%,
performing on par with OpenAI-o1-1217 and significantly outperforming other models. (2)
On coding-related tasks, DeepSeek-R1 demonstrates expert level in code competition tasks,
as it achieves 2,029 Elo rating on Codeforces outperforming 96.3% human participants in
the competition. For engineering-related tasks, DeepSeek-R1 performs slightly better than
DeepSeek-V3, which could help developers in real world tasks.
• Knowledge: On benchmarks such as MMLU, MMLU-Pro, and GPQA Diamond, DeepSeek-' metadata={'producer': 'pdfTeX-1.

### **create the Vector Embeddings and Store in VectorDB**

In [250]:
from langchain_mistralai import MistralAIEmbeddings

# use the `mistral-embed` embedding model served through the Mistral API
embedding_model = MistralAIEmbeddings(model='mistral-embed')

In [251]:
from langchain_community.vectorstores import Chroma

# store the first 20 chunks in the vectorstore
db = Chroma.from_documents(pdf_docs[:20], embedding_model)
print(db)

<langchain_community.vectorstores.chroma.Chroma object at 0x7be579f7d810>


### **search the VectorDB for docs similar to the user Query**

In [252]:
query = '''Using the reasoning data generated by DeepSeek-R1, we fine-tuned several dense models
that are widely used in the research community.'''
results = db.similarity_search_with_relevance_scores(query, k=4)
print(len(results))

4


In [255]:
# print the similarity score and content of the top 3 results
print('---------------------------------')
for doc, score in results[:3]:
    print('similarity score :', score)
    print('---------------------------------')
    print('content >>> ', doc.page_content)
    print()
    print('---------------------------------')
    print()

---------------------------------
similarity score : 0.907244501276104
---------------------------------
content >>>  • Using the reasoning data generated by DeepSeek-R1, we fine-tuned several dense models
that are widely used in the research community. The evaluation results demonstrate that
the distilled smaller dense models perform exceptionally well on benchmarks. DeepSeek-
R1-Distill-Qwen-7B achieves 55.5% on AIME 2024, surpassing QwQ-32B-Preview. Addi-
tionally, DeepSeek-R1-Distill-Qwen-32B scores 72.6% on AIME 2024, 94.3% on MATH-500,

---------------------------------

similarity score : 0.9047781836050799
---------------------------------
content >>>  through RL on small models. The open source DeepSeek-R1, as well as its API, will benefit
the research community to distill better smaller models in the future.
• Using the reasoning data generated by DeepSeek-R1, we fine-tuned several dense models
that are widely used in the research community. The evaluation results demonstrate

### **create prompt template for LLM calling**

In [256]:
from langchain_core.prompts import ChatPromptTemplate


prompt = ChatPromptTemplate.from_template("""
        Answer the following question based on the provide context.
        Thnink step by step before providing the answer.

        <context>
        {context}
        </context>

        Question: {input}""")

### **create Document Chains**

In [257]:
from langchain.chains.combine_documents import create_stuff_documents_chain

# create a chain that passes a list of documents to the llm using the provided prompt format
# the prompt must contain a "context" variable
document_chain = create_stuff_documents_chain(llm, prompt)
document_chain

RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableLambda(format_docs)
}), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
| ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, template='\n        Answer the following question based on the provide context.\n        Thnink step by step before providing the answer.\n  \n        <context>\n        {context}\n        </context>\n        \n        Question: {input}'), additional_kwargs={})])
| ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x7be578f61ad0>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x7be578fc3850>, model_name='llama3-8b-8192', model_kwargs={}, groq_api_key=SecretStr('**********'))
| StrOutputParser(), kwargs={}, config={'run_name': 'stuff_
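
The `format_docs` step in the output above is where the "stuffing" happens: the page contents of all documents are joined into one string and substituted for `{context}`. A rough plain-Python sketch of that idea follows. The dictionaries standing in for documents, the shortened template, and the `"\n\n"` separator are assumptions made for illustration, not the library's implementation.

```python
# Shortened, hypothetical template with the same two variables as the notebook's prompt
TEMPLATE = (
    "Answer the question based on the context.\n\n"
    "<context>\n{context}\n</context>\n\n"
    "Question: {input}"
)


def format_docs(docs, separator="\n\n"):
    """Join the text of every document into a single context string."""
    return separator.join(doc["page_content"] for doc in docs)


def build_prompt(docs, question):
    """Fill the template with the stuffed context and the user question."""
    return TEMPLATE.format(context=format_docs(docs), input=question)


toy_docs = [
    {"page_content": "GRPO estimates the baseline from group scores."},
    {"page_content": "GRPO foregoes the critic model."},
]
print(build_prompt(toy_docs, "What is GRPO?"))
```

Since every document is placed in the prompt, this approach only works while the combined text fits in the model's context window.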

### **create Retriever**

In [258]:
# create a retriever directly from the vectorstore
retriever = db.as_retriever()
retriever

VectorStoreRetriever(tags=['Chroma', 'MistralAIEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x7be579f7d810>, search_kwargs={})

### **create Retriever Chain**

In [259]:
from langchain.chains import create_retrieval_chain

# create_retrieval_chain takes a retriever and a document chain: it retrieves docs for the input and passes them to the document chain as context
retriever_chain = create_retrieval_chain(retriever, document_chain)
retriever_chain

RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableBinding(bound=RunnableLambda(lambda x: x['input'])
           | VectorStoreRetriever(tags=['Chroma', 'MistralAIEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x7be579f7d810>, search_kwargs={}), kwargs={}, config={'run_name': 'retrieve_documents'}, config_factories=[])
})
| RunnableAssign(mapper={
    answer: RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
              context: RunnableLambda(format_docs)
            }), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
            | ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, template='\n        Answer the following question based on the provide context.\n        Thnink step by step before providing the answer.\n

### **Ask Your Query to the RAG System (LLM + Context from VectorDB)**

In [260]:
response = retriever_chain.invoke({'input': 'what is Group Relative Policy Optimization ?'})
response.keys()

dict_keys(['input', 'context', 'answer'])
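
The three keys reflect the data flow of the retrieval chain: the `input` is sent to the retriever, the retrieved documents are stored under `context`, and the document chain's output is stored under `answer`. The sketch below imitates that flow with stand-in functions. The keyword retriever, the fake answer function, and `make_retrieval_chain` are all hypothetical and only show the shape of the result, not how LangChain implements it.

```python
def make_retrieval_chain(retrieve, answer_from_docs):
    """Compose a retriever and an answering step into one callable."""
    def invoke(inputs):
        context = retrieve(inputs["input"])
        answer = answer_from_docs({"input": inputs["input"], "context": context})
        # keep the original input and add the two new keys
        return {**inputs, "context": context, "answer": answer}
    return invoke


corpus = [
    "GRPO estimates the baseline from group scores.",
    "Distilled models perform well on benchmarks.",
]


def keyword_retrieve(query):
    """Toy retriever: return passages sharing at least one word with the query."""
    words = set(query.lower().split())
    return [text for text in corpus if words & set(text.lower().split())]


def fake_answer(inputs):
    """Stand-in for the LLM call: report how many passages were retrieved."""
    return f"{len(inputs['context'])} passage(s) retrieved for: {inputs['input']}"


chain = make_retrieval_chain(keyword_retrieve, fake_answer)
result = chain({"input": "what is GRPO ?"})
print(list(result))
```

Keeping `context` in the result is what makes it possible to inspect which passages the answer was based on.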

In [261]:
# show the first retrieved context doc
response['context'][0]

Document(metadata={'author': '', 'creationdate': '2025-01-23T01:45:31+00:00', 'creator': 'LaTeX with hyperref', 'keywords': '', 'moddate': '2025-01-23T01:45:31+00:00', 'page': 4, 'page_label': '5', 'producer': 'pdfTeX-1.40.25', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'source': '/content/deepseek_r1_paper.pdf', 'subject': '', 'title': '', 'total_pages': 22, 'trapped': '/False'}, page_content='brief overview of our RL algorithm, followed by the presentation of some exciting results, and\nhope this provides the community with valuable insights.\n2.2.1. Reinforcement Learning Algorithm\nGroup Relative Policy OptimizationIn order to save the training costs of RL, we adopt Group\nRelative Policy Optimization (GRPO) (Shao et al., 2024), which foregoes the critic model that is\ntypically the same size as the policy model, and estimates the baseline from group scores instead.')

In [265]:
# print the content of the first context doc
print(response['context'][0].page_content)

brief overview of our RL algorithm, followed by the presentation of some exciting results, and
hope this provides the community with valuable insights.
2.2.1. Reinforcement Learning Algorithm
Group Relative Policy OptimizationIn order to save the training costs of RL, we adopt Group
Relative Policy Optimization (GRPO) (Shao et al., 2024), which foregoes the critic model that is
typically the same size as the policy model, and estimates the baseline from group scores instead.


In [266]:
# print the answer
print(response['answer'])

According to the provided context, Group Relative Policy Optimization (GRPO) is a reinforcement learning algorithm that "foregoes the critic model that is typically the same size as the policy model, and estimates the baseline from group scores instead."

In other words, GRPO is a policy optimization algorithm that does not use a separate critic model to estimate the value function, but instead uses group scores to estimate the baseline. This approach aims to save training costs in reinforcement learning.


### **Ask Another Query**

In [267]:
response = retriever_chain.invoke({'input': '''One of the most remarkable aspects of this self-evolution is the emergence of sophisticated
behaviors as the test-time computation increases. why is that ?'''})

In [268]:
# print the first 2 context docs
print(response['context'][0].page_content)
print('--------------------------------')
print(response['context'][1].page_content)
print('--------------------------------')

ability to solve increasingly complex reasoning tasks by leveraging extended test-time compu-
tation. This computation ranges from generating hundreds to thousands of reasoning tokens,
allowing the model to explore and refine its thought processes in greater depth.
One of the most remarkable aspects of this self-evolution is the emergence of sophisticated
behaviors as the test-time computation increases. Behaviors such as reflection—where the model
--------------------------------
tation. This computation ranges from generating hundreds to thousands of reasoning tokens,
allowing the model to explore and refine its thought processes in greater depth.
One of the most remarkable aspects of this self-evolution is the emergence of sophisticated
behaviors as the test-time computation increases. Behaviors such as reflection—where the model
revisits and reevaluates its previous steps—and the exploration of alternative approaches to
--------------------------------


In [269]:
# print the answer
print(response['answer'])

Based on the provided context, one of the most remarkable aspects of this self-evolution is the emergence of sophisticated behaviors as the test-time computation increases because it allows the model to:

* Generate hundreds to thousands of reasoning tokens, allowing the model to explore and refine its thought processes in greater depth.
* Revisit and reevaluate its previous steps (reflection), which enables the model to refine its reasoning processes.
* Explore alternative approaches to problem-solving, which enhances the model's reasoning capabilities.

These behaviors are not explicitly programmed, but instead, emerge as a result of the model's interaction with the reinforcement learning environment.
