# Using DuckDB as the persisted Vector Store

I'll probably need to test how different vector stores perform. I've wanted to use DuckDB for a while now, so I'll use this opportunity to try it out.
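For the performance testing, a rough starting point is to time the same call against each store and compare medians. The sketch below is only that: `median_latency` is a hypothetical helper name, it uses nothing beyond the standard library, and it measures wall-clock time only, so it says nothing about memory or index build cost.

```python
import statistics
import time
from typing import Callable


def median_latency(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock time, in seconds, of calling fn() `repeats` times."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()  # the operation under test, e.g. a retrieval call
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```

Once an index exists, something like `median_latency(lambda: index.as_retriever().retrieve("What is the forecast"))` would give one number per store to compare.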

In [5]:
from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader
from llama_index.vector_stores.duckdb import DuckDBVectorStore
from llama_index.core import StorageContext
from llama_index.embeddings.huggingface import HuggingFaceEmbedding


import pathlib

In [12]:
# Load the documents
path_to_docs = pathlib.PurePosixPath(r"/Users/lukasalemu/Documents/00. Bank of England/00. Degree/Dissertation/structured-rag/data/01_raw")
documents = SimpleDirectoryReader(path_to_docs).load_data()

# Instantiate the vector store
path_to_db = pathlib.PurePosixPath(r"/Users/lukasalemu/Documents/00. Bank of England/00. Degree/Dissertation/structured-rag/data/02_processed")
vector_store = DuckDBVectorStore("pg.duckdb", persist_dir=str(path_to_db))

# Instantiate the embedding model
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

In [28]:
# Set the global embedding model and chunk size (in tokens), then point storage at DuckDB
Settings.embed_model = embed_model
Settings.chunk_size = 512

storage_settings = StorageContext.from_defaults(
    vector_store=vector_store,
)

In [29]:
# Construct the index
index = VectorStoreIndex.from_documents(documents, storage_context=storage_settings)

In [31]:
# Reload the persisted vector store from disk
vector_store = DuckDBVectorStore.from_local(str(path_to_db/"pg.duckdb"))
index = VectorStoreIndex.from_vector_store(vector_store)

In [33]:
# Now we can retrieve the chunks most similar to a given query
query_text = "What is the forecast"

results = index.as_retriever().retrieve(query_text)

print(len(results))
print(results[0].text)
print(results[0].metadata)

2
forecast period. Business investment is projected to be flat over the first half of the forecast period,
but to pick up slightly thereafter (Section 3). Weakness in near-term private final domestic demand
growth is expected to be offset by strong real government consumption growth.
In the GDP projection conditioned on the alternative assumption of constant interest rates at 5.25%
over the forecast period, growth is significantly weaker over the forecast period compared with the
MPC’s projection conditioned on the declining path of market-implied rates.
There are risks in both directions around the central projections for domestic spending and GDP ,
including those related to the transmission of monetary policy. In particular, there is uncertainty
around the collateral and precautionary savings channels through which house prices af fect
consumer spending, and around the extent to which the full effects of interest rates on businessChart 1.2: GDP growth projection based on market inte