## Conversational Recipe Bot with RAG (Local Qdrant Vector DB and LangChain Vector Memory)

All steps in a nutshell:

Read files | Talk to the database | Load PDFs | Split text | Handle memory | Chat with the model | Create a web UI using Gradio

    Set up the environment

#### Install all necessary libraries for the RAG bot, including LangChain, Qdrant, Ollama, and embedding tools.

* LangChain: A framework to build apps using LLMs.

* Qdrant: A vector database that stores text as numbers (vectors) for quick search.

* Ollama: A local LLM interface (we’re using it to talk to the model).

* Sentence-transformers: Converts text into vectors (embeddings).

* tiktoken: Token counter for OpenAI models, used to manage costs and limits (it was mentioned in the LangChain Weaviate docs). Here it also measures chunk length in the text splitter.

In [1]:
%pip install langchain langchain_community scikit-learn langchain-ollama sentence-transformers tiktoken qdrant-client langchain-qdrant gradio

Collecting qdrant-client
  Downloading qdrant_client-1.12.1-py3-none-any.whl.metadata (10 kB)
Collecting langchain-qdrant
  Downloading langchain_qdrant-0.1.4-py3-none-any.whl.metadata (1.7 kB)
Collecting portalocker<3.0.0,>=2.7.0 (from qdrant-client)
  Downloading portalocker-2.10.1-py3-none-any.whl.metadata (8.5 kB)
Collecting h2<5,>=3 (from httpx[http2]>=0.20.0->qdrant-client)
  Downloading h2-4.1.0-py3-none-any.whl.metadata (3.6 kB)
Collecting hyperframe<7,>=6.0 (from h2<5,>=3->httpx[http2]>=0.20.0->qdrant-client)
  Downloading hyperframe-6.0.1-py3-none-any.whl.metadata (2.7 kB)
Collecting hpack<5,>=4.0 (from h2<5,>=3->httpx[http2]>=0.20.0->qdrant-client)
  Downloading hpack-4.0.0-py3-none-any.whl.metadata (2.5 kB)
Downloading qdrant_client-1.12.1-py3-none-any.whl (267 kB)
Downloading langchain_qdrant-0.1.4-py3-none-any.whl (23 kB)
Downloading portalocker-2.10.1-py3-none-any.whl (18 kB)
Downloading h2-4.1.0-py3-none-any.whl (57 kB)
Downloading hpack-4.0.0-py3-none-any.whl (32 kB)

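Note from the earlier Weaviate setup: the Weaviate client had to be version 3.26.0 or above, otherwise I got a connection error. The installed qdrant-client version is checked below.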

In [2]:
!pip show qdrant-client

Name: qdrant-client
Version: 1.12.1
Summary: Client library for the Qdrant vector search engine
Home-page: https://github.com/qdrant/qdrant-client
Author: Andrey Vasnetsov
Author-email: andrey@qdrant.tech
License: Apache-2.0
Location: /home/sakhaglobal/.pyenv/versions/3.8.20/lib/python3.8/site-packages
Requires: grpcio, grpcio-tools, httpx, numpy, portalocker, pydantic, urllib3
Required-by: langchain-qdrant


In [22]:
import gradio as gr
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_ollama import ChatOllama
from langchain.memory import VectorStoreRetrieverMemory
from langchain.chains import ConversationalRetrievalChain
from typing import Tuple

from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams # For collection creation (have to learn about this)
from langchain_qdrant import Qdrant

What we are importing:

*  gradio for the web UI,

*  PyPDFLoader for document loading,

*  RecursiveCharacterTextSplitter for text splitting,

*  ChatOllama for conversational AI,

*  VectorStoreRetrieverMemory for storing conversation history, and ConversationalRetrievalChain for conversational question answering,

*  QdrantClient, Distance, and VectorParams for the Qdrant connection and collection setup, and Qdrant (from langchain_qdrant) as the vector store.

The embedding model (sentence-transformers) and the prompt templates are imported in later cells.

    Load and prepare documents
Documents can be anything: we can load a PDF or use web pages as the source.

In [4]:
# List of PDF file paths to load documents from (the below mentioned book is 102 pages)
pdf_paths = [
    "/home/sakhaglobal/Documents/Personal_GitHub/ai-ml-practice/rag-using-llm/recipe-sample.pdf"
]

     Split documents

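When chunking, it is worth experimenting to find the chunk size that best fits your needs. For me, a chunk size of 1000 with an overlap of 200 worked well. Because the splitter below is built with from_tiktoken_encoder, both values are counted in tokens, not characters.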
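To make the role of chunk size and overlap concrete, here is a minimal sketch of overlapping windows. It is not how RecursiveCharacterTextSplitter works internally (that splitter also respects separators and, in this notebook, counts tokens); the function name `chunk_with_overlap` is hypothetical and it counts plain characters for simplicity.

```python
from typing import List


def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size windows (hypothetical helper).

    Each window starts chunk_size - chunk_overlap characters after the
    previous one, so neighbouring chunks share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_with_overlap(sample, chunk_size=10, chunk_overlap=4)
print(chunks)
```

The last 4 characters of each chunk reappear at the start of the next one, which is what keeps a sentence that straddles a boundary retrievable from at least one chunk.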

In [5]:
docs = [PyPDFLoader(pdf_path).load() for pdf_path in pdf_paths]
docs_list = [item for sublist in docs for item in sublist]

# chunk size set to 1000 for better context understanding, overlap set to 200 to avoid missing context
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", r"(?<=\. )", " ", ""],  # raw string avoids an invalid-escape warning
    is_separator_regex=True,  # needed so the lookbehind pattern is treated as a regex, not literal text
    add_start_index=True  # Helps track document position
)
doc_splits = text_splitter.split_documents(docs_list)



I am using my own embedding model because a local Weaviate instance, unlike the cloud version, has no built-in embedding model. The same E5 model is kept for the local Qdrant setup.

In [7]:
import os
os.environ["SENTENCE_TRANSFORMERS_HOME"] = "/home/sakhaglobal/Documents/Personal_GitHub/chefly/ai/cache"
from sentence_transformers import SentenceTransformer

embed_model = SentenceTransformer("intfloat/multilingual-e5-large")



In [8]:
# In case of a CUDA error: free GPU memory here if killing the offending PID doesn't work (it doesn't work every time for me)

import gc
import torch

# Free up memory
gc.collect()
torch.cuda.empty_cache()

In [9]:
# This cell must run after the cell that defines embed_model

# Initialize Qdrant client
# Assumes Qdrant server is running on localhost:6333
# For a purely in-memory (non-persistent, no server needed) setup, you could use:
# qdrant_service_client = QdrantClient(":memory:")
qdrant_service_client = QdrantClient(host="localhost", port=6333)

# Get embedding dimension from your E5 model
vector_size = embed_model.get_sentence_embedding_dimension() # For intfloat/multilingual-e5-large, this is 1024

# Define collection names
recipe_collection_name = "RecipeBot"
memory_collection_name = "ConversationMemory"

# Recreate both collections (ensures a clean state for reruns).
# recreate_collection is deprecated in recent qdrant-client releases, so delete and create explicitly.
# To keep existing data, skip the delete and only call create_collection when the collection is missing.
for name in (recipe_collection_name, memory_collection_name):
    if qdrant_service_client.collection_exists(name):
        qdrant_service_client.delete_collection(name)
    qdrant_service_client.create_collection(
        collection_name=name,
        vectors_config=VectorParams(size=vector_size, distance=Distance.COSINE),
    )
    print(f"Collection '{name}' created/recreated in Qdrant.")



Collection 'RecipeBot' created/recreated in Qdrant.




Collection 'ConversationMemory' created/recreated in Qdrant.


In [10]:
def e5_embed(texts, is_query=False):
    prefix = "query: " if is_query else "passage: "
    formatted_texts = [prefix + text.lower().strip() for text in texts]
    return embed_model.encode(formatted_texts, normalize_embeddings=True)
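Two details of `e5_embed` are easy to miss: E5 models expect a "query: " or "passage: " prefix on every input, and `normalize_embeddings=True` returns unit-length vectors, for which the dot product equals the cosine similarity used by the Qdrant collections. The sketch below illustrates both without loading the model; the helper names (`add_e5_prefix`, `l2_normalize`, `dot`, `cosine`) and the 2-dimensional vectors are made up for illustration.

```python
import math
from typing import List


def add_e5_prefix(texts: List[str], is_query: bool = False) -> List[str]:
    """Format texts the same way e5_embed does before encoding."""
    prefix = "query: " if is_query else "passage: "
    return [prefix + text.lower().strip() for text in texts]


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: List[float], b: List[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


formatted_query = add_e5_prefix(["  Vegetarian RICE "], is_query=True)
print(formatted_query)

# Toy vectors standing in for real embeddings
a, b = [3.0, 4.0], [4.0, 3.0]
cos_ab = cosine(a, b)
dot_normalized = dot(l2_normalize(a), l2_normalize(b))
print(cos_ab, dot_normalized)
```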

Load model

In [11]:
# llm = ChatOllama(model="deepseek-r1:14b")
llm = ChatOllama(model="mistral-small3.1")

Prompt for the model

In [12]:
from langchain.prompts import ChatPromptTemplate, SystemMessagePromptTemplate

system_prompt = """You are Recipe Bot, an expert cooking assistant specializing in student-friendly recipes.
Follow these guidelines strictly:

1. Source Knowledge:
- Answer ONLY using the recipe book context
- Never invent recipes or ingredients
- For measurements, be precise (e.g., "200g mushrooms")

2. Conversation Flow:
- Maintain natural, friendly tone
- Reference previous answers when appropriate
- Acknowledge preferences from chat history
- If context is missing, say: "This isn't covered in my recipe book"

3. Special Cases:
- For substitution questions, suggest closest alternatives
- For timing questions, specify preparation vs cooking time

Examples:
Q: Can I substitute X with Y?
A: "Yes, Y works well. Use 25% less as it's more potent."

Q: How long does this take?
A: "Preparation: 15 mins, Cooking: 30 mins (total 45 mins)"

Q: I don’t like beef. Are there vegetarian options in the book?
A: Yes, the recipe collection includes vegetarian rice and several egg-based dishes like omelette and egg fried rice.

Q: Can I make Thai Green Curry easily?
A: Yes. Thai green curry is made by cooking curry paste with chicken, onion, and aubergine, then adding coconut milk and simmering until cooked. It’s a simple and delicious recipe ideal for students.

Current Context: {context}
Chat History: {chat_history}"""

prompt = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(system_prompt),
    ("human", "{question}"),
])

In [14]:
from langchain.schema import Document
from langchain.embeddings.base import Embeddings
from typing import List

# 1. Create an Embeddings wrapper class around the e5_embed function
class E5EmbeddingsWrapper(Embeddings):
    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """For documents/passages"""
        return e5_embed(texts, is_query=False).tolist()

    def embed_query(self, text: str) -> List[float]:
        """For queries"""
        return e5_embed([text], is_query=True).tolist()[0]

# 2. Initialize the embeddings wrapper
e5_embeddings = E5EmbeddingsWrapper()

# 3. Initialize the LangChain Qdrant vector store for recipes
# qdrant_service_client and recipe_collection_name come from the Qdrant setup cell;
# e5_embeddings is defined just above.
vectorstore = Qdrant(
    client=qdrant_service_client,
    collection_name=recipe_collection_name,
    embeddings=e5_embeddings,
    # The LangChain Qdrant integration automatically stores Document.page_content
    # and the full Document.metadata in the Qdrant payload.
)

# Add the chunks to the Qdrant collection.
# doc_splits (List[Document]) comes from the splitting cell above.
# add_documents embeds the chunks internally through e5_embeddings.
vectorstore.add_documents(doc_splits)
print(f"Added {len(doc_splits)} documents to Qdrant collection '{recipe_collection_name}'.")


e5_retriever = vectorstore.as_retriever(search_kwargs={'k': 3}) # Retrieve top 3 relevant documents



Added 19 documents to Qdrant collection 'RecipeBot'.


In [15]:
# The Qdrant setup cell already created the "ConversationMemory" collection.

# memory_collection_name, qdrant_service_client, and e5_embeddings must already be defined.
memory_vectorstore = Qdrant(
    client=qdrant_service_client,
    collection_name=memory_collection_name,
    embeddings=e5_embeddings,
    # The LangChain Qdrant integration handles the content and metadata keys.
    # VectorStoreRetrieverMemory stores LangChain Document objects,
    # and Qdrant stores their page_content and metadata.
)

memory_retriever = memory_vectorstore.as_retriever(
     search_kwargs={"k": 3}
)

    Vector memory 
* better for longer conversations
* stores the meaning and context necessary for a continuous conversation
* takes one extra parameter, the retriever
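The idea behind vector memory can be sketched in a few lines: each exchange is embedded and stored, and only the exchanges most similar to the new question are recalled. This is a toy illustration, not the VectorStoreRetrieverMemory implementation; `ToyVectorMemory` and `toy_embed` are hypothetical names, and word counts stand in for the real E5 embeddings.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def toy_embed(text: str) -> Counter:
    """Stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorMemory:
    """Stores past exchanges and recalls the k most similar ones (hypothetical class)."""

    def __init__(self, k: int = 1):
        self.k = k
        self.entries: List[Tuple[Counter, str]] = []

    def save(self, question: str, answer: str) -> None:
        text = f"question: {question}\nanswer: {answer}"
        self.entries.append((toy_embed(text), text))

    def load(self, query: str) -> List[str]:
        query_vec = toy_embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(query_vec, e[0]), reverse=True)
        return [text for _, text in ranked[: self.k]]


memory = ToyVectorMemory(k=1)
memory.save("I don't like beef", "Noted, I will suggest vegetarian dishes.")
memory.save("How long does rice take to cook?", "About 15 minutes.")
recalled = memory.load("Which dishes avoid beef?")
print(recalled)
```

Unlike a buffer memory that replays the whole conversation, only the relevant exchange (the beef preference) is returned here, which is why this approach scales better to long conversations.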

In [16]:
memory = VectorStoreRetrieverMemory(
    retriever=memory_retriever,
    memory_key="chat_history",
    input_key="question",
    output_key="answer",
    return_messages=True,
    return_docs=True
)

In [17]:
# Create the conversational chain. It needs conversational memory, so ConversationalRetrievalChain replaces RetrievalQA.
qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=e5_retriever,
    memory=memory,
    chain_type="stuff",
    combine_docs_chain_kwargs={"prompt": prompt},    # same prompt from above
    # verbose=True,  # debugging here
    rephrase_question=True,  # Helps with follow-up questions
    get_chat_history=lambda h: h
)

Initialize RAG class

In [18]:
from langchain.schema import HumanMessage, AIMessage

class RAGApplication:
    def __init__(self, qa_chain):
        self.qa_chain = qa_chain
        self.chat_history = []  # This will store (question, answer) tuples

    def run(self, question: str) -> str:
        # Convert your chat history to LangChain's expected format
        lc_history = []
        for q, a in self.chat_history:
            lc_history.append(HumanMessage(content=q))
            lc_history.append(AIMessage(content=a))

        # Call the chain with the properly formatted history
        # (invoke replaces the deprecated direct call qa_chain({...}))
        result = self.qa_chain.invoke({
            "question": question,
            "chat_history": lc_history  # Now in correct format
        })

        # Store the new interaction
        self.chat_history.append((question, result["answer"]))
        return result["answer"]

In [19]:
# Initialize the RAG application with the chain created above
rag_app = RAGApplication(qa_chain)

    Simple Gradio Chat template for UI

In [23]:
def chat(message: str, history: List[Tuple[str, str]]) -> Tuple[str, List[Tuple[str, str]]]:
    """Handle chat messages"""
    response = rag_app.run(message)
    history.append((message, response))
    return "", history

with gr.Blocks(title="Recipe Bot") as demo:
    gr.Markdown("# 🍳 Recipe Bot")
    gr.Markdown("Ask me anything about recipes from docs!")

    chatbot = gr.Chatbot(height=500)
    msg = gr.Textbox(label="Your question", placeholder="Type your question here...")
    clear = gr.Button("Clear Chat")

    msg.submit(
        chat,
        inputs=[msg, chatbot],
        outputs=[msg, chatbot]
    )
    clear.click(lambda: None, None, chatbot, queue=False)

demo.launch(share=True)

Running on local URL:  http://127.0.0.1:7860


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Running on public URL: https://f532c58c16ff041861.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)




