Building a RAG model with LangChain and Hugging Face embeddings

Retrieval-Augmented Generation (RAG) is a technique that combines the capabilities of large language models with external knowledge retrieval. The main components are:
* LangChain: a framework for developing applications powered by language models
* Chroma DB: an open-source vector database for storing and retrieving embeddings
* OpenAI, Hugging Face, or Groq: providers for the embeddings and the language model

In [4]:
import os
from dotenv import load_dotenv
load_dotenv()

False
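The `False` above is the return value of `load_dotenv()`, which usually means no `.env` file was found or it set no variables. Later cells need an API key, so a small guard like the sketch below can fail early with a readable message. `require_env` and the variable name `EXAMPLE_API_KEY` are hypothetical names used only for illustration.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; "
            "add it to your .env file or export it in the shell."
        )
    return value


# Demo with a placeholder value (EXAMPLE_API_KEY is a made-up name).
os.environ["EXAMPLE_API_KEY"] = "dummy-value"
print(require_env("EXAMPLE_API_KEY"))
```

Reading keys this way keeps them out of the notebook itself, which matters once the notebook is shared or committed.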

In [18]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_core.documents import Document

## vector stores
from langchain_community.vectorstores import Chroma
import numpy as np
from typing import List

### RAG Architecture
1. Document loading: load documents from various sources
2. Document splitting: split documents into chunks
3. Embedding generation: convert chunks into vectors using an embedding model
4. Vector storage: store the embeddings in a vector store (Chroma DB, FAISS, or another vector database)
5. Query processing: convert the user query into an embedding
6. Similarity search: search the vector store for the chunks most similar to the query
7. Context augmentation: combine the retrieved chunks with the query
8. Response generation: the LLM generates an answer using the context
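The steps above can be sketched end to end with a toy example that needs no external libraries. This is only an illustration of the flow: the word-count "embedding" stands in for a real embedding model, the list stands in for a vector store, and all function names are hypothetical. The final LLM call (step 8) is left out; the sketch stops at the augmented prompt.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Steps 5-6: embed the query and return the k most similar chunks."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, context_chunks: list[str]) -> str:
    """Step 7: combine the retrieved chunks with the question."""
    context = "\n\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Steps 1-4: "loaded", "split", and "stored" chunks (hard-coded here).
chunks = [
    "NLP enables computers to understand human language.",
    "Chroma stores embeddings in a vector database.",
    "Deep learning uses neural networks with many layers.",
]

top = retrieve("what is nlp", chunks, k=1)
print(build_prompt("what is nlp", top))
```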
#### Benefits of using RAG
1. Reduces hallucinations
2. Provides up-to-date information
3. Allows citing sources
4. Works with domain-specific knowledge

#### Document Loading

In [19]:
from langchain_community.document_loaders import DirectoryLoader

dirload=DirectoryLoader(
    'data',
    glob='*.txt',
    loader_cls=TextLoader,
    loader_kwargs={'encoding':'utf-8'}
)

In [20]:
loaded_txtfiles=dirload.load()
loaded_txtfiles

[Document(metadata={'source': 'data\\doc1.txt'}, page_content="Natural Language Processing (NLP) is a subfield of artificial intelligence that enables computers to understand, interpret, and generate human language in a meaningful way.\nDefinition and Overview\nNatural Language Processing (NLP) combines computer science, artificial intelligence, and linguistics to facilitate interactions between computers and humans using natural language. It encompasses a range of techniques that allow machines to process and analyze large amounts of natural language data, enabling them to perform tasks such as translation, sentiment analysis, and text summarization. \nGeeksForGeeks\n+1\nKey Applications\nNLP is widely used in various applications, including:\nChatbots and Virtual Assistants: Tools like Amazon's Alexa and Apple's Siri utilize NLP to understand user queries and provide relevant responses. \n2\nText Translation: NLP powers translation services that convert text from one language to anot

### Splitting files 

In [21]:
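splitter=RecursiveCharacterTextSplitter(
    chunk_size=500,        # maximum number of characters per chunk
    chunk_overlap=100,     # up to 100 characters shared between neighbouring chunks
    length_function=len,   # measure chunk length in characters
    separators=[" "]       # split on spaces only, so words are never cut in half
)
chunks=splitter.split_documents(loaded_txtfiles)
print(chunks)
print(f"Number of chunks: {len(chunks)}")
print(f"First chunk: {chunks[0]}")
print(f"Second chunk: {chunks[1]}")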


[Document(metadata={'source': 'data\\doc1.txt'}, page_content='Natural Language Processing (NLP) is a subfield of artificial intelligence that enables computers to understand, interpret, and generate human language in a meaningful way.\nDefinition and Overview\nNatural Language Processing (NLP) combines computer science, artificial intelligence, and linguistics to facilitate interactions between computers and humans using natural language. It encompasses a range of techniques that allow machines to process and analyze large amounts of natural language data,'), Document(metadata={'source': 'data\\doc1.txt'}, page_content="of techniques that allow machines to process and analyze large amounts of natural language data, enabling them to perform tasks such as translation, sentiment analysis, and text summarization. \nGeeksForGeeks\n+1\nKey Applications\nNLP is widely used in various applications, including:\nChatbots and Virtual Assistants: Tools like Amazon's Alexa and Apple's Siri utilize

In [22]:
chunks

[Document(metadata={'source': 'data\\doc1.txt'}, page_content='Natural Language Processing (NLP) is a subfield of artificial intelligence that enables computers to understand, interpret, and generate human language in a meaningful way.\nDefinition and Overview\nNatural Language Processing (NLP) combines computer science, artificial intelligence, and linguistics to facilitate interactions between computers and humans using natural language. It encompasses a range of techniques that allow machines to process and analyze large amounts of natural language data,'),
 Document(metadata={'source': 'data\\doc1.txt'}, page_content="of techniques that allow machines to process and analyze large amounts of natural language data, enabling them to perform tasks such as translation, sentiment analysis, and text summarization. \nGeeksForGeeks\n+1\nKey Applications\nNLP is widely used in various applications, including:\nChatbots and Virtual Assistants: Tools like Amazon's Alexa and Apple's Siri utiliz
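In the output above, the second chunk begins with text that also ends the first chunk ("of techniques that allow machines to process..."); that repetition is the `chunk_overlap`. The sketch below shows the overlap idea with a plain character window. It is a simplification, not how `RecursiveCharacterTextSplitter` works internally (the real splitter breaks on separators and merges pieces), and `sliding_chunks` is a hypothetical helper.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows where neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


demo = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(demo)
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from either side, at the cost of storing some text twice.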

### Embedding Model

In [23]:
from langchain_huggingface.embeddings import HuggingFaceEmbeddings

embeddings=HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

In [24]:
vectors=embeddings.embed_query(chunks[0].page_content)

In [25]:
vectors

[-0.009500443935394287,
 -0.0006951941759325564,
 0.06901475787162781,
 -0.027056686580181122,
 0.04192333668470383,
 -0.02050250954926014,
 0.040943603962659836,
 0.017849715426564217,
 -0.008673853240907192,
 0.025633497163653374,
 -0.02867620438337326,
 -0.051815662533044815,
 -0.01270467508584261,
 0.0034350890200585127,
 0.044689711183309555,
 0.09272845089435577,
 -0.004294833168387413,
 -0.04311579093337059,
 -0.0594249963760376,
 -0.03851087763905525,
 0.03274223208427429,
 0.10735950618982315,
 -0.08523678034543991,
 -0.029469335451722145,
 0.02170873060822487,
 0.09952089935541153,
 -5.500652696355246e-05,
 -0.06273727118968964,
 0.06564243882894516,
 0.0396789088845253,
 0.03626876324415207,
 0.005678847897797823,
 0.04797719791531563,
 0.12220264971256256,
 -0.031764645129442215,
 0.062304284423589706,
 -0.03304588794708252,
 0.01355755515396595,
 -0.01349285151809454,
 -0.020101826637983322,
 -0.06572545319795609,
 -0.05165945366024971,
 -0.06078336015343666,
 0.0182442143

### Initialize the Chroma DB vector store and store the chunks as vector representations

In [26]:
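persistent_dir="./chroma_db"

# Embed the chunks and persist them in a Chroma collection.
# Note: HuggingFaceEmbeddings() with no arguments loads the library's default model,
# not the all-MiniLM-L6-v2 instance created above; pass embedding=embeddings to reuse that one.
# Re-running this cell adds the same chunks to the persisted collection again,
# which can explain duplicate documents in the search results.
vector_store=Chroma.from_documents(
    documents=chunks,
    embedding=HuggingFaceEmbeddings(),
    persist_directory=persistent_dir,
    collection_name='RAG_Collection'
)
vector_store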

<langchain_community.vectorstores.chroma.Chroma at 0x2081fc9a210>

In [27]:
query="What is NLP?" 
similar_docs=vector_store.similarity_search(query,k=3)

In [28]:
similar_docs

[Document(metadata={'source': 'data\\doc1.txt'}, page_content='Natural Language Processing (NLP) is a subfield of artificial intelligence that enables computers to understand, interpret, and generate human language in a meaningful way.\nDefinition and Overview\nNatural Language Processing (NLP) combines computer science, artificial intelligence, and linguistics to facilitate interactions between computers and humans using natural language. It encompasses a range of techniques that allow machines to process and analyze large amounts of natural language data,'),
 Document(metadata={'source': 'data\\doc1.txt'}, page_content='Natural Language Processing (NLP) is a subfield of artificial intelligence that enables computers to understand, interpret, and generate human language in a meaningful way.\nDefinition and Overview\nNatural Language Processing (NLP) combines computer science, artificial intelligence, and linguistics to facilitate interactions between computers and humans using natural

In [30]:
similarity_scores=vector_store.similarity_search_with_score(query,k=8)
similarity_scores

[(Document(metadata={'source': 'data\\doc1.txt'}, page_content='Natural Language Processing (NLP) is a subfield of artificial intelligence that enables computers to understand, interpret, and generate human language in a meaningful way.\nDefinition and Overview\nNatural Language Processing (NLP) combines computer science, artificial intelligence, and linguistics to facilitate interactions between computers and humans using natural language. It encompasses a range of techniques that allow machines to process and analyze large amounts of natural language data,'),
  0.4971999228000641),
 (Document(metadata={'source': 'data\\doc1.txt'}, page_content='Natural Language Processing (NLP) is a subfield of artificial intelligence that enables computers to understand, interpret, and generate human language in a meaningful way.\nDefinition and Overview\nNatural Language Processing (NLP) combines computer science, artificial intelligence, and linguistics to facilitate interactions between computers

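* Chroma DB uses (squared) Euclidean (L2) distance by default, so a score closer to 0 means a closer match and 0 means identical vectors. For unit-length embeddings the score usually falls between 0 and 2, and it can be higher when vectors point in opposite directions or are not normalized.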
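To make the score range concrete, the sketch below compares squared L2 distance with cosine similarity on unit-length vectors, where the two are related by `distance = 2 - 2 * cosine`. It assumes the collection uses Chroma's default `l2` space and that the embeddings are normalized; the vectors and helper names are made up for illustration.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity for vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


query = normalize([3.0, 4.0])
same = normalize([6.0, 8.0])          # same direction as the query
orthogonal = normalize([-4.0, 3.0])   # unrelated direction
opposite = normalize([-3.0, -4.0])    # opposite direction

for name, vec in [("same", same), ("orthogonal", orthogonal), ("opposite", opposite)]:
    d = squared_l2(query, vec)
    print(f"{name}: distance={d:.2f}, 2 - 2*cos={2 - 2 * cosine(query, vec):.2f}")
```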

### Initialize LLM, RAG Chain, Prompt Template, Query the RAG System

In [190]:
from langchain_huggingface import HuggingFaceEndpoint

class SafeHuggingFaceEndpoint(HuggingFaceEndpoint):
    def _process_response(self, result):
        """Normalize a Hugging Face response to plain text."""
        if isinstance(result, list):
            if len(result) > 0 and isinstance(result[0], dict) and "generated_text" in result[0]:
                return result[0]["generated_text"]
            return str(result)
        elif isinstance(result, dict):
            return result.get("generated_text", str(result))
        return str(result)

In [191]:
from langchain_huggingface.chat_models import ChatHuggingFace
from langchain_huggingface import HuggingFaceEndpoint

model=SafeHuggingFaceEndpoint(
    repo_id="openai-community/gpt2",
    task="text-generation",
    max_new_tokens=512,
    do_sample=False,
    repetition_penalty=1.03,
)
chat=ChatHuggingFace(llm=model,verbose=True)
chat

ChatHuggingFace(llm=SafeHuggingFaceEndpoint(repo_id='openai-community/gpt2', repetition_penalty=1.03, stop_sequences=[], server_kwargs={}, model_kwargs={}, model='openai-community/gpt2', client=<InferenceClient(model='openai-community/gpt2', timeout=120)>, async_client=<InferenceClient(model='openai-community/gpt2', timeout=120)>, task='text-generation'), model_id='openai-community/gpt2', model_kwargs={})

In [192]:
model

SafeHuggingFaceEndpoint(repo_id='openai-community/gpt2', repetition_penalty=1.03, stop_sequences=[], server_kwargs={}, model_kwargs={}, model='openai-community/gpt2', client=<InferenceClient(model='openai-community/gpt2', timeout=120)>, async_client=<InferenceClient(model='openai-community/gpt2', timeout=120)>, task='text-generation')



In [1]:
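import os
from langchain_openai import ChatOpenAI

# Read the key from the environment instead of hardcoding it in the notebook.
# EURI_API_KEY is an assumed variable name; use whatever name your .env file defines.
model=ChatOpenAI(
    model='gpt-3.5-turbo',
    api_key=os.getenv("EURI_API_KEY")
)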



In [7]:
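import os
import requests

# The API key comes from the environment (EURI_API_KEY is an assumed variable name).
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.getenv('EURI_API_KEY')}"
}
# This is a GET request; the response below is a 404 "Route not found".
# The wrapper class later in the notebook calls the same URL with POST.
resp=requests.get("https://api.euron.one/api/v1/euri/chat/completions",headers=headers)
resp.json()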

{'errors': [{'message': 'Route not found'}],
 'statusCode': 404,
 'message': 'route: /api/v1/euri/chat/completions, errorMsg: Route not found, rayId: ZUmT8WE2nLFCG8HoED',
 'success': False}

In [57]:
import os
import requests
from langchain_core.runnables import Runnable
from langchain_core.prompt_values import ChatPromptValue, StringPromptValue


class EuronChatModel(Runnable):
    """Custom Runnable wrapper for the Euron Chat API (OpenAI-style)."""

    def __init__(self, api_key: str, model_name: str = "gpt-4.1-nano"):
        self.api_url = "https://api.euron.one/api/v1/euri/chat/completions"
        self.api_key = api_key
        self.model_name = model_name

    def invoke(self, input_text, config=None, **kwargs) -> str:
        """Invoked by LangChain chains (handles PromptValue objects too)."""

        # Convert a LangChain PromptValue to a string if necessary
        if isinstance(input_text, (StringPromptValue, ChatPromptValue)):
            input_text = input_text.to_string()
        elif not isinstance(input_text, str):
            input_text = str(input_text)

        headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {self.api_key}",
        }

        # OpenAI-style chat payload with a single user message
        payload = {
            "model": self.model_name,
            "messages": [{"role": "user", "content": input_text}],
            "max_tokens": 1000,
            "temperature": 0.7,
        }

        # A timeout keeps the notebook from hanging if the API does not respond
        response = requests.post(self.api_url, headers=headers, json=payload, timeout=60)
        response.raise_for_status()
        data = response.json()

        return data.get("choices", [{}])[0].get("message", {}).get("content", "")

# Initialize your API model
# The key is read from the environment (EURI_API_KEY is an assumed variable name).
llm = EuronChatModel(api_key=os.getenv("EURI_API_KEY"))
print(llm.invoke("Explain what deep learning is."))


Deep learning is a specialized subset of machine learning that focuses on building and training neural networks with multiple layers, often called deep neural networks, to automatically learn and extract complex patterns and representations from data.

In essence, deep learning models mimic the way the human brain processes information, allowing computers to recognize images, understand speech, translate languages, and perform many other tasks that require understanding high-level features. These models consist of interconnected layers of nodes (neurons), where each layer transforms the input data into increasingly abstract and useful representations. Through large amounts of data and computational power, deep learning models can achieve remarkable accuracy in tasks such as image recognition, natural language processing, and more.

Key features of deep learning include:
- **Multiple layers**: Deep neural networks have many hidden layers that enable learning of intricate patterns.
- 
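The wrapper reads the reply from `choices[0].message.content`. One edge case of that lookup is a response whose `choices` list is present but empty, where indexing `[0]` raises an `IndexError`. A slightly more defensive version might look like the sketch below; `extract_content` is a hypothetical helper, and the response shape is assumed to be OpenAI-style as the class docstring states.

```python
def extract_content(data: dict) -> str:
    """Return the first choice's message text, or "" if the shape is unexpected."""
    choices = data.get("choices") or []
    if not choices:
        return ""
    message = choices[0].get("message") or {}
    return message.get("content") or ""


# Sample payloads shaped like an OpenAI-style chat completion (made-up data).
ok = {"choices": [{"message": {"role": "assistant", "content": "Hello"}}]}
empty = {"choices": []}

print(extract_content(ok))
print(repr(extract_content(empty)))
```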

In [143]:
# from langchain.chat_models import init_chat_model

# huggingface=init_chat_model(model="deepseek-ai/deepseek-llm-7b-chat",model_provider="huggingface:deepseek-ai/deepseek-llm-7b-chat",config={
#     "temperature":0.8,
#     "max_new_tokens":256
# })
# huggingface.invoke('What is llm')

In [56]:
# from langchain_huggingface import HuggingFacePipeline
# from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# model_id = "mistralai/Mistral-7B-Instruct-v0.2"

# tokenizer = AutoTokenizer.from_pretrained(model_id)
# model = AutoModelForCausalLM.from_pretrained(model_id)

# pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
# llm = HuggingFacePipeline(pipeline=pipe)

# print(llm.invoke("Explain LLM in simple words"))


### Modern RAG

In [32]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_classic.chains.combine_documents import create_stuff_documents_chain
from langchain_classic.chains.retrieval import create_retrieval_chain

In [33]:
#convert vector store to retriever
retriever=vector_store.as_retriever(
    search_kwargs={"k":3}
)
retriever

VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x000002081FC9A210>, search_kwargs={'k': 3})

In [34]:
##Create a Prompt Template
system_prompt="""
You are an assistant for question answering tasks.
Use following pieces of retrieved context to answer the question. 
If you don't know the answer please say don't know the answer .
keep answer concise
context: {context}"""

In [48]:
prompt=ChatPromptTemplate.from_messages(
    [("system",system_prompt),
     ("human","{input}")]
)
# dummy_docs = [Document(page_content="Deep learning uses neural networks for pattern recognition.")]

# prompt.invoke({"context":dummy_docs,"input":"what is nlp"})
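To see what the `{context}` placeholder ends up holding, the sketch below mimics the "stuff" strategy in plain Python: the retrieved documents' text is joined with a separator and substituted into the system message. It is an illustration rather than LangChain's implementation; `Doc`, `stuff_prompt`, and the shortened template are stand-ins.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (hypothetical helper)."""
    page_content: str


# Shortened stand-in for the notebook's system prompt.
SYSTEM_TEMPLATE = "Use the retrieved context to answer the question.\ncontext: {context}"


def stuff_prompt(docs: list[Doc], question: str, separator: str = "\n\n") -> list[tuple[str, str]]:
    """Join all document texts into one context string and fill the template."""
    context = separator.join(doc.page_content for doc in docs)
    return [
        ("system", SYSTEM_TEMPLATE.format(context=context)),
        ("human", question),
    ]


docs = [Doc("NLP is a subfield of AI."), Doc("LLMs are built on deep neural networks.")]
for role, text in stuff_prompt(docs, "What is NLP?"):
    print(f"[{role}] {text}")
```

Because every retrieved chunk is pasted into one prompt, `k` and `chunk_size` together determine how much of the model's context window the retrieval step consumes.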

In [49]:
## Create a document chain
document_chain=create_stuff_documents_chain(
    llm=llm,prompt=prompt
)
document_chain

RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableLambda(format_docs)
}), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
| ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="\nYou are an assistant for question answering tasks.\nUse following pieces of retrieved context to answer the question. \nIf you don't know the answer please say don't know the answer .\nkeep answer concise\ncontext: {context}"), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], input_types={}, partial_variables={}, template='{input}'), additional_kwargs={})])
| <__main__.EuronChatModel object at 0x00000208335187D0>
| StrOutputParser(), kwargs={}, config={'run_name': 'stuff_documents_chain'}, config_factories=[])

In [50]:
## Final rag chain
rag_chain=create_retrieval_chain(
    retriever=retriever,
    combine_docs_chain=document_chain
)
rag_chain

RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableBinding(bound=RunnableLambda(lambda x: x['input'])
           | VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x000002081FC9A210>, search_kwargs={'k': 3}), kwargs={}, config={'run_name': 'retrieve_documents'}, config_factories=[])
})
| RunnableAssign(mapper={
    answer: RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
              context: RunnableLambda(format_docs)
            }), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
            | ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="\nYou are an assistant for question answering tasks.\nUse following pieces of retrieved context to answer the question. \nIf 

In [61]:
from langchain_core.prompts import PromptTemplate
from langchain_classic.chains.combine_documents import create_stuff_documents_chain
from langchain_classic.chains.retrieval import create_retrieval_chain

# Initialize your API model
# The key is read from the environment (EURI_API_KEY is an assumed variable name).
llm = EuronChatModel(api_key=os.getenv("EURI_API_KEY"))

# Prompt Template (simple instruction format)
prompt = PromptTemplate.from_template("""
Use the context below to answer the question concisely.
If unsure, say you don't know.

Context:
{context}

Question:
{input}

Answer:
""")

# Combine document chain
document_chain = create_stuff_documents_chain(llm=llm, prompt=prompt)

# Retrieval chain (Chroma retriever)
rag_chain = create_retrieval_chain(
    retriever=retriever, 
    combine_docs_chain=document_chain
)

# Run
response = rag_chain.invoke({"input": "What is NLP? give me more info "})
print("🧠 Final Answer:", response.get("answer", response))


🧠 Final Answer: NLP, or Natural Language Processing, is a subfield of artificial intelligence that focuses on enabling computers to understand, interpret, and generate human language in a meaningful way. It combines techniques from computer science, AI, and linguistics to facilitate interactions between humans and machines using natural language. NLP involves processing and analyzing large amounts of natural language data to achieve tasks such as language translation, sentiment analysis, speech recognition, and more.


In [54]:
response

{'input': 'What is deep learning?',
 'context': [Document(metadata={'source': 'data\\doc2.txt'}, page_content='Large Language Models (LLMs) are advanced AI systems built on deep neural networks designed to process, understand and generate human-like text. By using massive datasets and billions of parameters, LLMs have transformed the way humans interact with technology. It learns patterns, grammar and context from text and can answer questions, write content, translate languages and many more. Mordern LLMs include ChatGPT (OpenAI), Google Gemini, Anthropic Claude,'),
  Document(metadata={'source': 'data\\doc3.txt'}, page_content='Large Language Models (LLMs) are advanced AI systems built on deep neural networks designed to process, understand and generate human-like text. By using massive datasets and billions of parameters, LLMs have transformed the way humans interact with technology. It learns patterns, grammar and context from text and can answer questions, write content, translate
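The response dictionary keeps the retrieved documents under `context`, and each one carries a `source` in its metadata, which is what makes citing sources possible. The sketch below collects the distinct sources from such a response. `Doc` and `unique_sources` are hypothetical stand-ins, and the sample data is made up to mirror the shape shown above.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (hypothetical helper)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def unique_sources(response: dict) -> list[str]:
    """Return the distinct 'source' values of the retrieved documents, in order."""
    sources = []
    for doc in response.get("context", []):
        source = doc.metadata.get("source", "unknown")
        if source not in sources:
            sources.append(source)
    return sources


sample_response = {
    "input": "What is deep learning?",
    "context": [
        Doc("LLMs are built on deep neural networks.", {"source": "data/doc2.txt"}),
        Doc("LLMs are built on deep neural networks.", {"source": "data/doc3.txt"}),
        Doc("LLMs learn patterns from text.", {"source": "data/doc2.txt"}),
    ],
    "answer": "(model answer here)",
}

print(unique_sources(sample_response))
```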

In [55]:
from fastapi import FastAPI

app=FastAPI()

@app.get("/home")
def add(a:int,b:int):
    return a+b
print(add(3,4))

7
