## Conversational Recipe Bot with RAG (Local Weaviate Vector DB and LangChain Vector Store Memory)

All steps in a nutshell:

Read files | Talk to the database | Load PDFs | Split text | Handle memory | Chat with the model | Create a web UI using Gradio

### Set up the environment

#### Install all necessary libraries for the RAG bot, including LangChain, the Weaviate client, Ollama, and embedding tools.

* LangChain: A framework for building applications on top of LLMs.

* Weaviate: A vector database that stores text as embeddings (vectors) for fast similarity search.

* Ollama: A tool for running LLMs locally; we use it to serve and query the chat model.

* Sentence-transformers: Converts text into vectors (embeddings).

* tiktoken: OpenAI's tokenizer. The text splitter below uses it (via `from_tiktoken_encoder`) to measure chunk sizes in tokens; it is also mentioned in the LangChain Weaviate docs.

In [7]:
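# The Ollama server is installed separately; pull the model once with `ollama pull mistral-small3.1`
%pip install langchain langchain_community scikit-learn langchain-ollama sentence-transformers tiktoken "weaviate-client>=3.26.0,<4" python-dotenv pypdf gradio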

Note: you may need to restart the kernel to use updated packages.


The Weaviate client must be version 3.26.0 or above; older versions gave a connection error in this setup.
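A minimal sketch of checking this requirement from Python instead of reading the `pip show` output. It assumes a plain dotted version string; `parse_version`, `meets_minimum`, and `installed_version` are hypothetical helpers, and pre-release suffixes such as `rc1` are simply ignored.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

MIN_WEAVIATE_CLIENT = (3, 26, 0)  # minimum version noted above


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse '3.26.0' into (3, 26, 0), keeping only the leading digits of each part."""
    numbers = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric part
        numbers.append(int(match.group()))
    return tuple(numbers)


def meets_minimum(version: str, minimum: Tuple[int, ...] = MIN_WEAVIATE_CLIENT) -> bool:
    """True if `version` >= `minimum`; missing parts count as 0."""
    parsed = parse_version(version)
    width = max(len(parsed), len(minimum))
    padded = parsed + (0,) * (width - len(parsed))
    padded_min = minimum + (0,) * (width - len(minimum))
    return padded >= padded_min


def installed_version(package: str = "weaviate-client") -> Optional[str]:
    """Installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("weaviate-client is not installed")
elif meets_minimum(found):
    print(f"weaviate-client {found} is new enough")
else:
    print(f"weaviate-client {found} is too old; upgrade to >= 3.26.0")
```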

In [8]:
!pip show weaviate-client

Name: weaviate-client
Version: 3.26.0
Summary: A python native Weaviate client
Home-page: https://github.com/weaviate/weaviate-python-client
Author: Weaviate
Author-email: hello@weaviate.io,
License: BSD 3-clause
Location: /home/sakhaglobal/.pyenv/versions/3.8.20/lib/python3.8/site-packages
Requires: authlib, requests, validators
Required-by: 


In [9]:
import weaviate
import json
from dotenv import load_dotenv
from langchain_community.vectorstores import Weaviate
import gradio as gr
from typing import Tuple
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_ollama import ChatOllama
from langchain.memory import VectorStoreRetrieverMemory
from langchain.chains import ConversationalRetrievalChain

What we are importing:

*  `weaviate` (the Python client) and LangChain's `Weaviate` vector store wrapper,

*  PyPDFLoader for document loading,

*  RecursiveCharacterTextSplitter for text splitting,

*  ChatOllama for conversational AI (embeddings come from sentence-transformers, loaded in a later cell),

*  VectorStoreRetrieverMemory for storing conversation history, and ConversationalRetrievalChain for conversational question answering,

*  Gradio for the chat UI and `load_dotenv` for reading the `.env` file.

### Load and prepare documents
Documents can come from many sources: here we load a PDF, but web pages would work as well.

In [10]:
# List of PDF file paths to load documents from (the book below is 102 pages long)
pdf_paths = [
    "/home/sakhaglobal/Documents/Personal_GitHub/ai-ml-practice/rag-using-llm/recipe-sample.pdf"
]

### Split documents

In [11]:
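# Load every PDF (one Document per page) and flatten into a single list
docs = [PyPDFLoader(pdf_path).load() for pdf_path in pdf_paths]
docs_list = [item for sublist in docs for item in sublist]

# chunk_size and chunk_overlap are counted in tiktoken tokens, not characters.
# 1000 tokens keeps enough context per chunk; a 200-token overlap avoids losing context at chunk borders.
# Note: multilingual-e5-large truncates inputs at 512 tokens, so the end of a long chunk may not affect its embedding.
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", r"(?<=\. )", " ", ""],  # raw string avoids the invalid-escape warning
    is_separator_regex=True,  # treat the sentence-boundary lookbehind as a regex, not a literal
    add_start_index=True  # Helps track document position
)
doc_splits = text_splitter.split_documents(docs_list)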

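To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch that slides a fixed-size window over a list of tokens. It is not LangChain's recursive algorithm, which first tries to break on the separators listed above and counts tiktoken tokens; `window_chunks` is a hypothetical helper that only shows how the overlap repeats the tail of one chunk at the head of the next.

```python
from typing import List


def window_chunks(tokens: List[str], chunk_size: int, chunk_overlap: int) -> List[List[str]]:
    """Split tokens into windows of `chunk_size`, each sharing `chunk_overlap` tokens with the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end
    return chunks


words = "one two three four five six seven eight nine ten".split()
for chunk in window_chunks(words, chunk_size=4, chunk_overlap=2):
    print(chunk)
```

With the notebook's settings (1000/200) each window in this simplified model advances by 800 tokens, so roughly the last 200 tokens of one chunk reappear at the start of the next.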


In [14]:
load_dotenv(dotenv_path="/home/sakhaglobal/Documents/Personal_GitHub/chefly/.env")

# Instantiate the Weaviate client for local instance
client = weaviate.Client(
    url="http://localhost:8080",  # Local Weaviate instance
)

index_name = "RecipeBot"

# Create a class/schema if it doesn't already exist
if not client.schema.exists(index_name):
    client.schema.create_class({
        "class": index_name,
        "vectorizer": "none",  # We're using our own embeddings
        "properties": [
            {
                "name": "text",
                "dataType": ["text"],
            },
            {
                "name": "metadata",
                "dataType": ["text"],  # Full metadata stored as a JSON string
            },
            # Add more properties as needed from your metadata
            {
                "name": "source",
                "dataType": ["text"],
            },
            {
                "name": "page",
                "dataType": ["number"],  # Numeric type so pages can be filtered and sorted
            }
        ],
    })

# Print what's being inserted for verification
# (run after the embedding cell below, which defines `texts` and `metadatas`)
# print(f"Inserting {len(texts)} documents into Weaviate:")
# print("-" * 50)
# for i, (text, metadata) in enumerate(zip(texts, metadatas)):
#     print(f"Document {i+1}:")
#     print(f"Text snippet: {text[:200]}...")  # Show first 200 chars
#     print(f"Metadata: {json.dumps(metadata)[:100]}...")  # Truncate long metadata
#     print(f"Source: {metadata.get('source', 'unknown')}")
#     print(f"Page: {metadata.get('page', 0)}")
#     print("-" * 50)


In [15]:
import os
os.environ["SENTENCE_TRANSFORMERS_HOME"] = "/home/sakhaglobal/Documents/Personal_GitHub/chefly/ai/cache"
from sentence_transformers import SentenceTransformer

embed_model = SentenceTransformer("intfloat/multilingual-e5-large")

In [16]:
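def e5_embed(texts, is_query=False):
    """Embed texts with E5: queries get a "query: " prefix, documents a "passage: " prefix."""
    prefix = "query: " if is_query else "passage: "
    formatted_texts = [prefix + text.lower().strip() for text in texts]
    # Unit-length vectors, so a dot product equals cosine similarity
    return embed_model.encode(formatted_texts, normalize_embeddings=True)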
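Two details of `e5_embed` matter downstream: E5 models expect a `query: ` or `passage: ` prefix, and `normalize_embeddings=True` scales every vector to unit length, so a plain dot product equals cosine similarity. A stdlib-only sketch of both ideas, with toy 3-dimensional vectors standing in for the real 1024-dimensional embeddings; `format_e5`, `l2_normalize`, and `dot` are hypothetical helpers.

```python
import math
from typing import List, Sequence


def format_e5(texts: Sequence[str], is_query: bool = False) -> List[str]:
    """Mirror e5_embed's preprocessing: add the E5 prefix, lowercase, strip whitespace."""
    prefix = "query: " if is_query else "passage: "
    return [prefix + text.lower().strip() for text in texts]


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (what normalize_embeddings=True does)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


print(format_e5(["  Egg Fried Rice "], is_query=True))  # ['query: egg fried rice']

a = l2_normalize([3.0, 4.0, 0.0])
b = l2_normalize([4.0, 3.0, 0.0])
print(round(dot(a, b), 2))  # 0.96, the cosine of the angle between (3, 4) and (4, 3)
```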

In [17]:
texts = [doc.page_content for doc in doc_splits]
metadatas = [doc.metadata for doc in doc_splits]
embeddings = e5_embed(texts, is_query=False)

In [18]:
# If you hit a CUDA memory error and killing the offending process (PID) doesn't help (it doesn't always work for me), free memory from within the notebook:

import gc
import torch

# Free up memory
gc.collect()
torch.cuda.empty_cache()

In [19]:
client.batch.configure(batch_size=50)
for i, emb in enumerate(embeddings):
    data_obj = {
        "text": texts[i],
        "metadata": json.dumps(metadatas[i]),
        "source": metadatas[i].get("source", ""),
        "page": metadatas[i].get("page", 0)
    }
    client.batch.add_data_object(
        data_object=data_obj,
        class_name=index_name,
        vector=emb.tolist()
        # No UUID specified - Weaviate will generate one
    )
client.batch.flush()
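The ingestion loop maps each chunk to a property dict, and because no UUID is passed, re-running the cell inserts duplicates. Below is a stdlib-only sketch of that mapping plus one possible fix, assuming deterministic IDs derived from `source`, `page`, and `start_index` (added by `add_start_index=True`). `to_weaviate_object`, `chunk_uuid`, and `batched` are hypothetical helpers, and the UUID namespace choice is arbitrary.

```python
import json
import uuid
from typing import Any, Dict, Iterable, Iterator, List


def to_weaviate_object(text: str, metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Build the same property dict as the loop above."""
    return {
        "text": text,
        "metadata": json.dumps(metadata),
        "source": metadata.get("source", ""),
        "page": metadata.get("page", 0),
    }


def chunk_uuid(metadata: Dict[str, Any]) -> str:
    """Deterministic ID: the same chunk position always maps to the same UUID."""
    key = "{}|{}|{}".format(
        metadata.get("source", ""), metadata.get("page", 0), metadata.get("start_index", 0)
    )
    return str(uuid.uuid5(uuid.NAMESPACE_URL, key))


def batched(items: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive groups of at most `size` items (like batch_size=50)."""
    group: List[Any] = []
    for item in items:
        group.append(item)
        if len(group) == size:
            yield group
            group = []
    if group:
        yield group


meta = {"source": "recipe-sample.pdf", "page": 3, "start_index": 1200}
obj = to_weaviate_object("Thai green curry ...", meta)
print(obj["page"], chunk_uuid(meta) == chunk_uuid(dict(meta)))  # 3 True
print([len(g) for g in batched(range(120), 50)])  # [50, 50, 20]
```

In the v3 client the resulting string could be passed as `uuid=chunk_uuid(metadatas[i])` to `client.batch.add_data_object`; a batch import with an existing UUID should then replace that object instead of adding a duplicate (worth verifying against your Weaviate version).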

In [21]:
# llm = ChatOllama(model="deepseek-r1:14b")
llm = ChatOllama(model="mistral-small3.1")

In [22]:
from langchain.prompts import ChatPromptTemplate, SystemMessagePromptTemplate

system_prompt = """You are Recipe Bot, an expert cooking assistant specializing in student-friendly recipes.
Follow these guidelines strictly:

1. Source Knowledge:
- Answer ONLY using the recipe book context
- Never invent recipes or ingredients
- For measurements, be precise (e.g., "200g mushrooms")

2. Conversation Flow:
- Maintain natural, friendly tone
- Reference previous answers when appropriate
- Acknowledge preferences from chat history
- If context is missing, say: "This isn't covered in my recipe book"

3. Special Cases:
- For substitution questions, suggest closest alternatives
- For timing questions, specify preparation vs cooking time

Examples:
Q: Can I substitute X with Y?
A: "Yes, Y works well. Use 25% less as it's more potent."

Q: How long does this take?
A: "Preparation: 15 mins, Cooking: 30 mins (total 45 mins)"

Q: I don’t like beef. Are there vegetarian options in the book?
A: Yes, the recipe collection includes vegetarian rice and several egg-based dishes like omelette and egg fried rice.

Q: Can I make Thai Green Curry easily?
A: Yes. Thai green curry is made by cooking curry paste with chicken, onion, and aubergine, then adding coconut milk and simmering until cooked. It’s a simple and delicious recipe ideal for students.

Current Context: {context}
Chat History: {chat_history}"""

prompt = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(system_prompt),
    ("human", "{question}"),
])
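`ConversationalRetrievalChain` fills `{context}` with the retrieved chunks, `{chat_history}` with the memory contents, and `{question}` with the (rephrased) question; the "stuff" documents chain looks for a `context` variable in this prompt, so a misspelled placeholder is an easy way to break it. A quick stdlib check of which placeholders a template string exposes, using `string.Formatter`; `template_fields` is a hypothetical helper, and LangChain does its own parsing.

```python
import string
from typing import Set


def template_fields(template: str) -> Set[str]:
    """Return the {placeholder} names that f-string-style templating would fill."""
    return {
        field_name
        for _, field_name, _, _ in string.Formatter().parse(template)
        if field_name  # None for plain literal text
    }


system_prompt_sample = (
    "You are Recipe Bot...\n"
    "Current Context: {context}\n"
    "Chat History: {chat_history}"
)
human_template = "{question}"

fields = template_fields(system_prompt_sample) | template_fields(human_template)
print(sorted(fields))  # ['chat_history', 'context', 'question']
```

If you ever add literal braces to the system prompt (for example a JSON snippet), double them as `{{` and `}}` so they are not parsed as placeholders.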

In [23]:
from typing import List
from langchain_core.embeddings import Embeddings

# 1. First create an Embeddings wrapper class for your E5 function
class E5EmbeddingsWrapper(Embeddings):
    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """For documents/passages"""
        return e5_embed(texts, is_query=False).tolist()

    def embed_query(self, text: str) -> List[float]:
        """For queries"""
        return e5_embed([text], is_query=True).tolist()[0]

# 2. Initialize the embeddings wrapper
e5_embeddings = E5EmbeddingsWrapper()

# 3. Initialize LangChain Weaviate vector store and retriever
# Ensure 'client' (Weaviate client) and 'index_name' ("RecipeBot") are defined from previous cells.
# Ensure 'e5_embeddings' (E5EmbeddingsWrapper instance) is defined above.
vectorstore = Weaviate(
    client=client,
    index_name=index_name,      # Should be "RecipeBot"
    text_key="text",            # Key for the main text content in Weaviate
    embedding=e5_embeddings,    # Pass the wrapper instance here
    attributes=["source", "page", "metadata"],  # List of metadata fields to retrieve
    by_text=False
)

e5_retriever = vectorstore.as_retriever(search_kwargs={'k': 3}) # Retrieve top 3 relevant documents
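`as_retriever(search_kwargs={'k': 3})` asks Weaviate for the 3 stored vectors nearest to the embedded query. Weaviate answers this with an approximate (HNSW) index; the sketch below is an exact brute-force version over in-memory unit vectors, only to show what "top 3" means. `top_k` and the toy vectors are hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple


def top_k(query: Sequence[float], docs: Sequence[Sequence[float]], k: int = 3) -> List[Tuple[int, float]]:
    """Return (index, score) for the k docs with the highest dot product.

    With unit-length vectors (normalize_embeddings=True), dot product == cosine similarity.
    """
    scores = [(i, sum(q * d for q, d in zip(query, doc))) for i, doc in enumerate(docs)]
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])


doc_vectors = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.6, 0.8],
    [0.8, 0.6],
]
print(top_k([1.0, 0.0], doc_vectors, k=3))  # [(0, 1.0), (3, 0.8), (2, 0.6)]
```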

In [30]:
# Create a class for conversation memory if it doesn't already exist
if not client.schema.exists("ConversationMemory"):
    client.schema.create_class({
        "class": "ConversationMemory",
        "vectorizer": "none",  # Using custom E5 embeddings
        "properties": [
            {
                "name": "text",
                "dataType": ["text"],
            },
            {
                "name": "metadata",
                "dataType": ["text"], # Storing metadata as a JSON string
            }
        ],
    })

# 4. Memory-specific vector store (separate from main index)
memory_vectorstore = Weaviate(
    client=client,
    index_name="ConversationMemory",
    text_key="text",
    attributes=["metadata"],
    embedding=e5_embeddings,
    by_text=False
)

memory_retriever = memory_vectorstore.as_retriever(
    search_kwargs={"k": 3}
)

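### Vector memory
* scales better to long conversations: only the past exchanges most relevant to the current question are retrieved (k=3 here), not the whole transcript
* stores past exchanges as embeddings, so recall is based on meaning rather than recency
* needs one extra parameter compared with buffer memory: a `retriever`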
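Conceptually, `VectorStoreRetrieverMemory` saves each exchange as one text document (roughly one `key: value` line per field, e.g. `question: ...` and `answer: ...`) and, on the next turn, retrieves the k past exchanges most similar to the new question. A toy, stdlib-only sketch of that idea; `ToyVectorMemory` is hypothetical, and word overlap (Jaccard similarity) stands in for E5 embeddings and Weaviate search.

```python
import re
from typing import Dict, List, Set


def words(text: str) -> Set[str]:
    """Lowercase word set used as a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z]+", text.lower()))


class ToyVectorMemory:
    """Store past exchanges; recall the k most similar to a new question."""

    def __init__(self, k: int = 3) -> None:
        self.k = k
        self.documents: List[str] = []

    def save_context(self, inputs: Dict[str, str], outputs: Dict[str, str]) -> None:
        # One document per exchange, one "key: value" line per field
        lines = [f"{key}: {value}" for key, value in {**inputs, **outputs}.items()]
        self.documents.append("\n".join(lines))

    def load(self, question: str) -> str:
        query = words(question)

        def similarity(doc: str) -> float:
            doc_words = words(doc)
            union = query | doc_words
            return len(query & doc_words) / len(union) if union else 0.0

        ranked = sorted(self.documents, key=similarity, reverse=True)
        return "\n".join(ranked[: self.k])


toy_memory = ToyVectorMemory(k=1)
toy_memory.save_context({"question": "I don't like beef"}, {"answer": "Try the vegetarian rice."})
toy_memory.save_context({"question": "How long does curry take?"}, {"answer": "About 45 minutes."})
print(toy_memory.load("Any beef-free dinner ideas?"))  # recalls the beef exchange, not the curry one
```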

In [31]:
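memory = VectorStoreRetrieverMemory(
    retriever=memory_retriever,
    memory_key="chat_history",  # the chain reads its history from the "chat_history" input
    input_key="question",       # the question is used as the search query over past exchanges
    return_docs=False           # return past exchanges as one text block for the prompt
)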

In [32]:
# Create the conversational chain. ConversationalRetrievalChain replaces RetrievalQA here because it supports conversational memory.
qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=e5_retriever,
    memory=memory,
    chain_type="stuff",
    combine_docs_chain_kwargs={"prompt": prompt},    # same prompt from above
    # verbose=True,  # debugging here
    rephrase_question=True,  # Pass the condensed standalone question to the answering prompt (the default)
    get_chat_history=lambda h: h  # Pass memory output through unchanged; the default formatter only accepts messages or tuples
)
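Roughly, each call to `ConversationalRetrievalChain` does three things: if there is chat history, it asks the LLM to condense the follow-up into a standalone question; it retrieves documents for that question; and with `chain_type="stuff"` it joins ("stuffs") all retrieved chunks into the `{context}` slot of a single prompt. A simplified sketch of that flow with plain callables standing in for the LLM and retriever; `answer_turn` and the lambdas are hypothetical, and the real chain adds callbacks, memory handling, and prompt templating.

```python
from typing import Callable, List


def answer_turn(
    question: str,
    chat_history: str,
    condense: Callable[[str, str], str],
    retrieve: Callable[[str], List[str]],
    answer: Callable[[str, str, str], str],
) -> str:
    # 1. Rephrase follow-ups into a standalone question (skipped when history is empty)
    standalone = condense(question, chat_history) if chat_history else question
    # 2. Retrieve the top-k chunks for the standalone question
    chunks = retrieve(standalone)
    # 3. "stuff": concatenate every chunk into one context string (blank-line separated here)
    context = "\n\n".join(chunks)
    return answer(context, chat_history, standalone)


reply = answer_turn(
    question="How long does it take?",
    chat_history="question: Can I make Thai Green Curry?\nanswer: Yes.",
    condense=lambda q, h: "How long does Thai Green Curry take?",
    retrieve=lambda q: ["Thai green curry: prep 10 mins", "Cook 20 mins"],
    answer=lambda ctx, h, q: f"Answering '{q}' with context:\n{ctx}",
)
print(reply)
```

The "stuff" strategy is the simplest option: it works well while the k retrieved chunks fit in the model's context window, which is why `k` stays small here.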

In [33]:
class RAGApplication:
    def __init__(self, qa_chain):
        self.qa_chain = qa_chain
        self.chat_history = []  # Local record of (question, answer) tuples

    def run(self, question: str) -> str:
        # The chain's memory loads "chat_history" itself and would override any
        # history passed in here, so only the question is sent
        result = self.qa_chain.invoke({"question": question})

        # Store the new interaction
        self.chat_history.append((question, result["answer"]))
        return result["answer"]

In [34]:
# Initialize the RAG application with the chain built above
rag_app = RAGApplication(qa_chain)

### Simple Gradio chat UI

In [35]:
def chat(message: str, history: List[Tuple[str, str]]) -> Tuple[str, List[Tuple[str, str]]]:
    """Handle chat messages"""
    response = rag_app.run(message)
    history.append((message, response))
    return "", history

with gr.Blocks(title="Recipe Bot") as demo:
    gr.Markdown("# 🍳 Recipe Bot")
    gr.Markdown("Ask me anything about the recipes in the book!")

    chatbot = gr.Chatbot(height=500)
    msg = gr.Textbox(label="Your question", placeholder="Type your question here...")
    clear = gr.Button("Clear Chat")

    msg.submit(
        chat,
        inputs=[msg, chatbot],
        outputs=[msg, chatbot]
    )
    clear.click(lambda: None, None, chatbot, queue=False)  # Clears the chat window only; past exchanges stay in ConversationMemory

demo.launch(share=True)



Running on local URL:  http://127.0.0.1:7861


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Running on public URL: https://2296a9eb71bd15d888.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
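The tokenizers warning above can be silenced, as the message itself suggests, by setting `TOKENIZERS_PARALLELISM` explicitly before any Hugging Face tokenizer is used, for example next to the `SENTENCE_TRANSFORMERS_HOME` setting. A minimal sketch, assuming you are fine with `"false"`, which matches what the library falls back to after the fork anyway:

```python
import os

# Must run before sentence_transformers / tokenizers are first used in this process
os.environ["TOKENIZERS_PARALLELISM"] = "false"
print(os.environ["TOKENIZERS_PARALLELISM"])
```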




