## LangChain Vector Store and RAG with HeatWave GenAI

This notebook demonstrates a simple Retrieval-Augmented Generation (RAG) workflow powered by [HeatWave GenAI](https://www.oracle.com/heatwave/genai/) on MySQL HeatWave. You will ingest content from a public web page, chunk and index it into the HeatWave Vector Store, and then use a HeatWave-hosted LLM to answer questions grounded in the retrieved context, all within the familiar [LangChain](https://github.com/langchain-ai/langchain) framework. This notebook follows the LangChain guide to [creating vector stores and using them for RAG](https://python.langchain.com/v0.2/docs/tutorials/rag/). 

### What you’ll build

- Data ingestion: Load a LangChain blog post via WebBaseLoader.
- Chunking: Split the text into overlapping chunks for better recall.
- Indexing: Create embeddings with MyEmbeddings and store them in MyVectorStore on HeatWave.
- Retrieval: Convert the vector store into a retriever to fetch relevant chunks at query time.
- Generation: Use a reusable RAG prompt from the LangChain Hub and MyLLM to produce grounded answers.

### Workflow overview

- Load the source page and produce Document objects.
- Split documents into chunks (size 1000, overlap 200).
- Embed and index chunks in HeatWave Vector Store.
- Retrieve top matches for a user question.
- Construct the final prompt (context + question) and generate an answer with HeatWave GenAI.
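
To make the chunk size and overlap concrete, here is a simplified fixed-window chunker in plain Python. It is not LangChain's algorithm, because RecursiveCharacterTextSplitter first looks for separators such as paragraph breaks. window_chunks is a hypothetical helper that only shows the arithmetic: with a size of 1000 and an overlap of 200 characters, each window starts 800 characters after the previous one and repeats its last 200 characters.

```python
def window_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Cut text into fixed-size character windows that overlap by chunk_overlap.

    Simplified illustration only: unlike RecursiveCharacterTextSplitter, it
    ignores paragraph and word boundaries.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # 800 characters with the defaults
    # A window starting at len(text) - chunk_overlap or later would add no new
    # characters; max(..., 1) keeps one window for texts shorter than the overlap.
    starts = range(0, max(len(text) - chunk_overlap, 1), step)
    return [text[start:start + chunk_size] for start in starts]


sample = "".join(str(i % 10) for i in range(2500))
for chunk_index, chunk in enumerate(window_chunks(sample)):
    print(chunk_index, len(chunk))
```

For the 2,500-character sample, this yields three windows of 1000, 1000, and 900 characters starting at offsets 0, 800, and 1600.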

All vector operations and inference calls are routed through HeatWave GenAI components (MyEmbeddings, MyVectorStore, MyLLM) for low-latency, in-database AI.

**This requires mysql-connector-python>=9.5.0**

### Connect to the HeatWave instance
We create a connection to an active HeatWave instance using MySQL Connector/Python. Modify the variables below to point to your HeatWave instance. On AWS, set USE_BASTION to False so the notebook connects directly to DBSYSTEM_IP and DBSYSTEM_PORT. On OCI, set USE_BASTION to True and create an SSH tunnel through the bastion host by running the following command on your machine, substituting each variable with its value:

```shell
ssh -o ServerAliveInterval=60 -i BASTION_PKEY -L LOCAL_PORT:DBSYSTEM_IP:DBSYSTEM_PORT BASTION_USER@BASTION_IP
```
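
If you would rather start the tunnel from Python than from a terminal, a minimal sketch can assemble the same command as an argument list for subprocess. build_tunnel_command and the example values are hypothetical. The -N flag, which is not in the command above, tells ssh to forward the port without running a remote command.

```python
import shlex


def build_tunnel_command(bastion_pkey, local_port, dbsystem_ip, dbsystem_port,
                         bastion_user, bastion_ip):
    """Build the SSH tunnel command as an argument list (hypothetical helper).

    Uses the same options as the command above, plus -N so that ssh only
    forwards the port and does not run a remote command.
    """
    return [
        "ssh",
        "-N",
        "-o", "ServerAliveInterval=60",
        "-i", bastion_pkey,
        "-L", f"{local_port}:{dbsystem_ip}:{dbsystem_port}",
        f"{bastion_user}@{bastion_ip}",
    ]


# Example values only; substitute your own.
cmd = build_tunnel_command("bastion_key.pem", 31231, "10.0.1.5", 3306, "opc", "203.0.113.10")
print(shlex.join(cmd))

# To start the tunnel from Python instead of a terminal (not run here):
# import subprocess
# tunnel = subprocess.Popen(cmd)  # later: tunnel.terminate()
```

Passing a list rather than a single string to subprocess avoids shell quoting issues with key paths that contain spaces.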

In [None]:
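import mysql.connector

# Replace the placeholder values below with the details of your deployment.
BASTION_IP = "ip_address"          # IP address of the bastion host (OCI only)
BASTION_USER = "opc"               # SSH user on the bastion host
BASTION_PKEY = "private_key_file"  # path to the SSH private key for the bastion
DBSYSTEM_IP = "127.0.0.1"          # IP address of the HeatWave DB system
DBSYSTEM_PORT = 3306
DBSYSTEM_USER = "root"
DBSYSTEM_PASSWORD = ""
DBSYSTEM_SCHEMA = "ml_benchmark"
LOCAL_PORT = 31231                 # local end of the SSH tunnel (OCI only)
USE_BASTION = False                # True on OCI (SSH tunnel), False on AWS

if USE_BASTION:
    # Connect through the local end of the SSH tunnel.
    host, port = "127.0.0.1", LOCAL_PORT
else:
    # Connect directly to the DB system.
    host, port = DBSYSTEM_IP, DBSYSTEM_PORT

mydb = mysql.connector.connect(
    host=host,
    port=port,
    user=DBSYSTEM_USER,
    password=DBSYSTEM_PASSWORD,
    database=DBSYSTEM_SCHEMA,
    allow_local_infile=True,
    use_pure=True,  # use the pure-Python implementation of the connector
    autocommit=True,  # commit each statement immediately
)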
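
The rest of the notebook only needs the connection object, but a small helper that runs ad hoc SQL and returns the result as a pandas DataFrame is handy when inspecting your schema. This is a minimal sketch, assuming pandas is installed. run_query is a hypothetical helper name, not part of MySQL Connector/Python. It relies only on the DB-API cursor interface (execute, description, fetchall), so the self-contained check below uses an in-memory SQLite database; in the notebook you would pass mydb instead.

```python
import sqlite3

import pandas as pd


def run_query(conn, query, params=None):
    """Run one SQL statement on a DB-API connection and return a DataFrame.

    Statements without a result set (CREATE, INSERT, ...) return an empty DataFrame.
    """
    cursor = conn.cursor()
    try:
        if params is None:
            cursor.execute(query)
        else:
            cursor.execute(query, params)
        if cursor.description is None:
            return pd.DataFrame()
        columns = [column[0] for column in cursor.description]
        return pd.DataFrame(cursor.fetchall(), columns=columns)
    finally:
        cursor.close()


# Self-contained check with SQLite; in the notebook, pass mydb instead of conn.
# Note that SQLite uses "?" placeholders while MySQL Connector/Python uses "%s".
conn = sqlite3.connect(":memory:")
run_query(conn, "CREATE TABLE docs (id INTEGER, title TEXT)")
run_query(conn, "INSERT INTO docs VALUES (?, ?)", (1, "deep agents"))
print(run_query(conn, "SELECT id, title FROM docs"))
```

With the notebook's connection, a call such as run_query(mydb, "SHOW TABLES") would list the tables in DBSYSTEM_SCHEMA.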

### Retrieve and parse a LangChain blog post using WebBaseLoader

In [2]:
import os

# langchain_community reads USER_AGENT when the WebBaseLoader module is imported
# and warns if it is unset, so set it before the imports. Example value only.
os.environ.setdefault("USER_AGENT", "heatwave-genai-rag-notebook")

from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_text_splitters import RecursiveCharacterTextSplitter
from mysql.ai.genai import MyEmbeddings, MyVectorStore, MyLLM

# Load the contents of the blog post as a list of LangChain Document objects.
loader = WebBaseLoader(web_paths=("https://blog.langchain.com/deep-agents/",))
docs = loader.load()


### Split the retrieved page into smaller chunks to create meaningful embeddings
This step partitions the source text into smaller segments while preserving continuity between them. RecursiveCharacterTextSplitter targets chunks of at most 1,000 characters, and consecutive chunks share up to 200 characters of overlap, so context that falls on a chunk boundary is not lost; this improves downstream representation and recall. The recursive strategy tries coarse separators first (by default paragraph breaks, then line breaks, then spaces) and splits between individual characters only as a last resort, producing chunks that are structurally coherent and well suited for embedding, indexing, and retrieval in knowledge-grounded generation.

In [None]:
splits = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200
).split_documents(docs)
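
The recursive strategy can be sketched in a few lines of plain Python. This is a simplified illustration, not LangChain's implementation: recursive_split is a hypothetical function that greedily packs pieces split on the coarsest separator into chunks and recurses with finer separators only for pieces that are still too long. It leaves out chunk overlap and other details of RecursiveCharacterTextSplitter.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters.

    Tries the coarsest separator first, greedily packs the pieces into chunks,
    and recurses with the next separator only for pieces that are too long.
    No overlap is added between chunks.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard-cut the text into fixed-size character windows.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # the piece still fits in the current chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: split it with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, rest))
            current = ""
    if current:
        chunks.append(current)
    return chunks


sample = "Deep agents plan.\n\nThey use sub-agents and a file system."
for chunk in recursive_split(sample, 20):
    print(repr(chunk))
```

On the sample, the paragraph break is used first; the second paragraph is still longer than 20 characters, so it is split again on spaces.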

### Embed and store the chunks in the MySQL HeatWave Vector Store
Initialize a vector store backed by HeatWave GenAI and populate it with the prepared chunks. MyEmbeddings is attached to the store, so each chunk is converted into a dense vector embedding when it is added. Storing every chunk alongside its embedding makes the corpus searchable by vector similarity, so passages semantically related to a question can later be retrieved to ground generation and question answering.

In [4]:
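# Create a HeatWave-backed vector store that embeds text with MyEmbeddings.
vectorstore = MyVectorStore(mydb, MyEmbeddings(mydb))
# Embed the chunks and insert them into the store; the return value is discarded.
_ = vectorstore.add_documents(documents=splits)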
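
Conceptually, adding documents embeds each chunk and stores the text with its vector, and a similarity search ranks the stored chunks against the embedded query. The sketch below shows that idea with a toy in-memory store and a hypothetical bag-of-words embedding. It is not how MyVectorStore or MyEmbeddings work internally; in this notebook, HeatWave computes the embeddings and keeps the vectors in the database.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: L2-normalized word counts (a hypothetical stand-in for MyEmbeddings)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(count * count for count in counts.values()))
    return {word: count / norm for word, count in counts.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two L2-normalized sparse vectors, i.e. their dot product."""
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


class ToyVectorStore:
    """In-memory sketch of a vector store: embed on insert, rank by similarity on search."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.rows = []  # (text, vector) pairs

    def add_texts(self, texts):
        for text in texts:
            self.rows.append((text, self.embed_fn(text)))

    def similarity_search(self, query, k=4):
        query_vector = self.embed_fn(query)
        ranked = sorted(self.rows, key=lambda row: cosine(query_vector, row[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore(embed)
store.add_texts([
    "Deep agents use a planning tool.",
    "Sub-agents handle focused subtasks.",
    "HeatWave stores data in MySQL.",
])
print(store.similarity_search("What planning tool do deep agents use?", k=1))
```

Here the first sentence shares five words with the question and ranks first; k controls how many chunks are returned.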

### Create a LangChain retriever and RAG prompt

In [5]:
# Expose the vector store as a retriever that returns the chunks most similar to a query.
retriever = vectorstore.as_retriever()
# Pull a ready-made RAG question-answering prompt from the LangChain Hub.
prompt = hub.pull("rlm/rag-prompt")



### Create a RAG chain 
Assemble the retrieval-augmented generation flow. The input question is sent down two branches: the retriever fetches relevant chunks and format_docs joins their text, separated by blank lines, into a single context string, while RunnablePassthrough forwards the question unchanged. The pipeline then fills the prompt with the context and question, invokes the HeatWave GenAI model (MyLLM) to generate an answer grounded in that context, and finally uses StrOutputParser to return the answer as a plain string.

In [None]:
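def format_docs(docs):
    # Join the text of the retrieved chunks, separated by blank lines.
    return "\n\n".join(doc.page_content for doc in docs)


# HeatWave-hosted LLM used for generation.
llm = MyLLM(mydb)
rag_chain = (
    # The question feeds the retriever (for context) and also passes through unchanged.
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt  # fill the RAG prompt template
    | llm  # generate the answer with HeatWave GenAI
    | StrOutputParser()  # return the answer as a plain string
)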
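
To see what the chain does at each step, the sketch below traces the same data flow in plain Python with stand-ins. run_rag_chain, join_texts, TEMPLATE, and the fake retriever and LLM are hypothetical and exist only for illustration; join_texts works on plain strings rather than Document objects, and TEMPLATE only mimics the two variables the hub prompt receives. In LangChain, the dict at the head of the chain is coerced into a RunnableParallel, which may run its branches concurrently; the sketch runs them one after the other.

```python
TEMPLATE = "Question: {question}\nContext: {context}\nAnswer:"  # hypothetical template


def join_texts(texts):
    """Join retrieved chunk texts with blank lines, as format_docs does for Documents."""
    return "\n\n".join(texts)


def run_rag_chain(question, retrieve, template, generate):
    """Plain-Python trace of the rag_chain data flow (no LangChain involved)."""
    # Head of the chain: the same question feeds two branches.
    inputs = {
        "context": join_texts(retrieve(question)),  # retriever | format_docs
        "question": question,  # RunnablePassthrough() forwards the input unchanged
    }
    prompt_text = template.format(**inputs)  # | prompt
    # | llm: the stand-in already returns a str, so StrOutputParser would be a no-op.
    return generate(prompt_text)


def fake_retrieve(question):
    # Stand-in for the retriever: always returns the same two chunks.
    return ["Deep agents plan over long horizons.", "They can delegate work to sub-agents."]


def fake_llm(prompt_text):
    # Stand-in for MyLLM: reports the size of the prompt it received.
    return f"(answer generated from a {len(prompt_text)}-character prompt)"


print(run_rag_chain("What are deep agents?", fake_retrieve, TEMPLATE, fake_llm))
```

Swapping fake_retrieve and fake_llm for real components is exactly what the LCEL pipe syntax does in the cell above.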

### Invoke the RAG chain with a user query

In [7]:
rag_chain.invoke("What are deep agents?")

'Deep agents are advanced AI models that can plan over longer time horizons and execute complex tasks, allowing them to dive deeper into topics and achieve more sophisticated results. They use the same core algorithm as LLMs but with additional features such as detailed system prompts, planning tools, sub-agents, and a virtual file system. These agents are capable of executing more complex tasks and can be customized with custom prompts, tools, and sub-agents to suit specific needs.'

We invite you to try [HeatWave AutoML and GenAI](https://www.oracle.com/heatwave/free/). If you’re new to Oracle Cloud Infrastructure, try Oracle Cloud Free Trial, a free 30-day trial with US$300 in credits.