# Building a RAG Application with DuckDB
In this project, we will build a RAG application with LlamaIndex, using DuckDB as the vector database and retriever.

In [1]:
# %%capture
# %pip install duckdb
# %pip install llama-index
# %pip install llama-index-vector-stores-duckdb

In [None]:
import os
import duckdb

from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.vector_stores.duckdb import DuckDBVectorStore
from llama_index.core import StorageContext
from llama_index.core import Settings

from IPython.display import Markdown, display

We will create the LLM client using the OpenAI GPT-4o model and the embedding model client using the OpenAI text-embedding-3-small model.

In [None]:
llm = OpenAI(model="gpt-4o", api_key=os.environ["OPENAI_API_KEY"])
embed_model = OpenAIEmbedding(model="text-embedding-3-small")
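
Reading the key with os.environ["OPENAI_API_KEY"] raises a bare KeyError when the variable is missing. As a minimal sketch, a small helper can fail with a clearer message instead. The helper name require_env is hypothetical and not part of LlamaIndex or the OpenAI client.

```python
import os


def require_env(name):
    """Return the value of an environment variable or fail with a readable error.

    Hypothetical helper for this notebook; not a LlamaIndex or OpenAI function.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Set the {name} environment variable before creating the clients."
        )
    return value
```

With this in place, the key could be passed as api_key=require_env("OPENAI_API_KEY") when creating the LLM client.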

We will set the OpenAI LLM and embedding models as the global defaults, so every LlamaIndex function uses them unless a different model is passed explicitly.

In [None]:
Settings.llm = llm
Settings.embed_model = embed_model

Next, load the files in the “Data” folder as LlamaIndex documents.

In [4]:
documents = SimpleDirectoryReader("Data").load_data()

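Create a vector store called “blog” in an existing database called “datacamp.duckdb”.

After that, convert the PDF data into embeddings and store them in the vector store. The embed_dim=1536 argument matches the 1536-dimensional vectors that text-embedding-3-small returns by default.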

In [None]:
vector_store = DuckDBVectorStore(
    database_name="datacamp.duckdb",
    table_name="blog",
    persist_dir="./",
    embed_dim=1536,
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

To check that the vector store was created, we will connect to the database using the DuckDB Python API and run a SQL query that lists all the tables in the database.

In [None]:
con = duckdb.connect("datacamp.duckdb")
con.execute("SHOW ALL TABLES").fetchdf()

We have two tables: a “bank” promotional table and a “blog” table, which is a vector store. 

The “blog” table has an “embedding” column where all the embeddings are stored.

## Simple RAG application
Convert the index into a query engine. For each question, the engine first searches the vector database for similar documents and then uses them as additional context to generate the response.

In [None]:
query_engine = index.as_query_engine()
response = query_engine.query("Who wrote 'GitHub Actions and MakeFile: A Hands-on Introduction'?")
display(Markdown(f"<b>{response}</b>"))
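
The retrieve-then-generate flow described above can be sketched in plain Python. This is only an illustration under stated assumptions: the three-dimensional vectors and document titles are made up, cosine similarity is assumed as the ranking metric, and the functions cosine_similarity and top_k are hypothetical helpers, not LlamaIndex or DuckDB APIs.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, rows, k=2):
    """Return the k (text, score) pairs whose vectors are most similar to query_vec."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "embedding" rows; real text-embedding-3-small vectors have 1536 dimensions.
toy_rows = [
    ("GitHub Actions tutorial", [0.9, 0.1, 0.0]),
    ("Fine-tuning Llama 3", [0.1, 0.9, 0.2]),
    ("DuckDB basics", [0.0, 0.2, 0.9]),
]

# Step 1: retrieve the most similar rows for a (made-up) query vector.
hits = top_k([1.0, 0.0, 0.0], toy_rows, k=2)

# Step 2: paste the retrieved text into the prompt that would be sent to the LLM.
context = "\n".join(text for text, _ in hits)
prompt = f"Context:\n{context}\n\nQuestion: Who wrote the GitHub Actions tutorial?"
```

The query engine performs both steps in a single call, with real embeddings for the question and the stored documents.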

## RAG chatbot with memory

Let’s create an advanced RAG application that uses the conversation history to generate the response. 

For that, we create a chat memory buffer, capped here at 3,900 tokens, and then a chat engine that combines the memory, the LLM, and the vector store retriever.

In [None]:
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.chat_engine import CondensePlusContextChatEngine

In [None]:
memory = ChatMemoryBuffer.from_defaults(token_limit=3900)

chat_engine = CondensePlusContextChatEngine.from_defaults(
    index.as_retriever(),
    memory=memory,
    llm=llm
)
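
To show what a token limit on chat memory means in practice, here is a minimal sketch of a buffer that keeps only the most recent messages that fit a budget. It assumes a crude whitespace word count as the token count, and the class ToyChatMemory is hypothetical. It is not how ChatMemoryBuffer is implemented.

```python
class ToyChatMemory:
    """Hypothetical token-limited chat history, for illustration only."""

    def __init__(self, token_limit=3900):
        self.token_limit = token_limit
        self.messages = []

    @staticmethod
    def count_tokens(text):
        # Assumption: one whitespace-separated word counts as one token.
        return len(text.split())

    def put(self, role, content):
        """Append a message to the full history."""
        self.messages.append((role, content))

    def get(self):
        """Return the newest messages whose combined size fits token_limit."""
        kept = []
        used = 0
        # Walk backwards from the newest message and stop at the first overflow.
        for role, content in reversed(self.messages):
            cost = self.count_tokens(content)
            if used + cost > self.token_limit:
                break
            kept.append((role, content))
            used += cost
        return list(reversed(kept))


memory_sketch = ToyChatMemory(token_limit=6)
memory_sketch.put("user", "How do I fine-tune Llama 3?")
memory_sketch.put("assistant", "Use a notebook.")
memory_sketch.put("user", "More details please")
recent = memory_sketch.get()  # the oldest message no longer fits the 6-token budget
```

A larger limit keeps more of the conversation available to the model, at the cost of longer prompts.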

In [None]:
response = chat_engine.chat(
    "What is the easiest way of finetuning the Llama 3 model? Please provide step-by-step instructions."
)
display(Markdown(response.response))

In [None]:
response = chat_engine.chat(
    "Could you please provide more details about the Post Fine-Tuning Steps?"
)
display(Markdown(response.response))

The chat engine remembered the previous conversation and responded accordingly.
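
A follow-up such as the second question only makes sense together with the earlier turns. A condense-style chat engine first asks the LLM to rewrite the follow-up as a standalone question and then retrieves context for that rewritten question. The sketch below only builds such a rewrite request. The function build_condense_prompt and the prompt wording are assumptions for illustration, not the prompt LlamaIndex actually uses.

```python
def build_condense_prompt(history, latest_message):
    """Format chat history and the latest message into a rewrite request.

    Hypothetical helper; the wording is illustrative, not LlamaIndex's prompt.
    """
    transcript = "\n".join(f"{role}: {content}" for role, content in history)
    return (
        "Given the conversation below, rewrite the follow-up message "
        "as a standalone question.\n\n"
        f"Conversation:\n{transcript}\n\n"
        f"Follow-up message: {latest_message}\n"
        "Standalone question:"
    )


condense_prompt = build_condense_prompt(
    [("user", "What is the easiest way of finetuning the Llama 3 model?")],
    "Could you please provide more details about the Post Fine-Tuning Steps?",
)
```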