🔹 What is LanceDB?

LanceDB is a lightweight, high-performance vector database for fast similarity search. It stores data in the Lance columnar format, which is designed to interoperate with Apache Arrow.

✅ Why Use LanceDB?

Fast & Efficient: Stores data in the columnar Lance format and exchanges it through Apache Arrow for quick reads & writes.

Local & Cloud Support: Works with local storage and cloud platforms (S3, GCS, Azure).

Metadata Filtering: Combines vector similarity search with SQL-style filters on metadata columns.

ANN Indexing: Supports approximate nearest neighbor indexes (IVF-PQ, plus HNSW-based variants) for scalable vector search.
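
To make the filtering item concrete, here is a minimal pure-Python sketch of the idea: apply a predicate to the metadata first, then rank the surviving rows by distance to the query. It is a conceptual illustration only and does not show how LanceDB implements filtering; the `rows` data and the `filtered_search` helper are hypothetical. In LanceDB itself the filter is passed as an SQL-like string through `.where(...)` on the search query.

```python
# Conceptual sketch: metadata pre-filter followed by nearest-neighbour ranking.
# `rows` and `filtered_search` are hypothetical names used for illustration.

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def filtered_search(rows, query, predicate, k):
    """Keep rows whose metadata passes `predicate`, then return the k closest."""
    candidates = [row for row in rows if predicate(row)]
    candidates.sort(key=lambda row: squared_l2(row["vector"], query))
    return candidates[:k]

rows = [
    {"id": 0, "category": "news", "vector": [0.0, 0.0]},
    {"id": 1, "category": "blog", "vector": [0.1, 0.1]},
    {"id": 2, "category": "news", "vector": [1.0, 1.0]},
]

# The closest row overall is id 1, but the filter restricts results to "news".
hits = filtered_search(
    rows,
    query=[0.2, 0.2],
    predicate=lambda row: row["category"] == "news",
    k=1,
)
print([row["id"] for row in hits])
```

The filter changes which row wins: without it the nearest row would be the "blog" entry, with it the nearest "news" entry is returned instead.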

In [2]:
import lancedb 
import pandas as pd 
import numpy as np



In [3]:
# Connect to a local database (the directory is created if it does not exist)
db = lancedb.connect("./my_lancedb")


In [5]:
# Create sample vector embeddings (100 vectors of dimension 128)

data = pd.DataFrame({
    "id": range(100),
    "vector": [np.random.rand(128).tolist() for _ in range(100)],
    "text": [f"Document {i}" for i in range(100)],
})

In [6]:
data

,id,vector,text
0,0,"[0.6556539185489961, 0.14354633177002707, 0.10...",Document 0
1,1,"[0.33022824733496425, 0.09385907244657055, 0.8...",Document 1
2,2,"[0.686507699282665, 0.2267776718039155, 0.6019...",Document 2
3,3,"[0.6596477018193583, 0.3913487887649575, 0.259...",Document 3
4,4,"[0.13748603576643081, 0.5588817867592942, 0.18...",Document 4
...,...,...,...
95,95,"[0.4028959235908124, 0.34738282941454457, 0.73...",Document 95
96,96,"[0.8617890065790046, 0.09130835586531005, 0.87...",Document 96
97,97,"[0.6651351804320706, 0.7636460254080883, 0.039...",Document 97
98,98,"[0.1162755960476407, 0.006091737655898433, 0.1...",Document 98


In [7]:
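# Create a table and insert the data; mode="overwrite" lets the cell be re-run
# without failing because "my_table" already exists
table = db.create_table("my_table", data, mode="overwrite")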
print("Database Created")

Database Created


In [11]:
table

LanceTable(name='my_table', version=1, _conn=LanceDBConnection(uri='c:\\Users\\thaku\\Langchain\\Rag\\my_lancedb'))

In [12]:
query_vector=np.random.rand(128).tolist()

#Perform similarity search 
results=table.search(query_vector).limit(5).to_pandas()
print(results)

   id                                             vector         text  \
0  34  [0.0050523463, 0.17449492, 0.25296238, 0.34711...  Document 34   
1  85  [0.6118548, 0.7615207, 0.8241931, 0.73986316, ...  Document 85   
2  39  [0.031372502, 0.6180652, 0.23412517, 0.1235939...  Document 39   
3  11  [0.6356514, 0.3318761, 0.32781395, 0.42966428,...  Document 11   
4  45  [0.2028191, 0.48757368, 0.033598285, 0.6999477...  Document 45   

   _distance  
0  14.993999  
1  17.283613  
2  17.616222  
3  17.795942  
4  17.807203  
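
The `_distance` column can be sanity-checked by brute force. The sketch below assumes the metric is squared Euclidean (L2) distance, which is consistent with the magnitudes above for 128-dimensional vectors in [0, 1) but is an assumption here, not something the output states. `brute_force_top_k` is a hypothetical helper, not part of LanceDB, and the vectors are freshly generated rather than the ones stored in the table.

```python
import random

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def brute_force_top_k(vectors, query, k):
    """Return (index, distance) pairs for the k vectors closest to `query`."""
    scored = [(i, squared_l2(v, query)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]

random.seed(0)  # fixed seed so the sketch is reproducible
dim = 128
vectors = [[random.random() for _ in range(dim)] for _ in range(100)]
query = [random.random() for _ in range(dim)]

# Print the five nearest vectors with their squared distances, closest first.
for idx, dist in brute_force_top_k(vectors, query, k=5):
    print(idx, round(dist, 3))
```

For two independent vectors drawn uniformly from [0, 1), the expected squared distance is dim / 6, about 21.3 for 128 dimensions, so nearest-neighbour distances in the mid-to-high teens are plausible for random data like this.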


Using LanceDB with LangChain for RAG

In [14]:
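import lancedb
import pandas as pd
# The langchain.document_loaders and langchain.embeddings import paths are
# deprecated; these classes now live in langchain_community
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import SentenceTransformerEmbeddings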

In [None]:
#Initialize lancedb
db=lancedb.connect("./rag_lancedb")

In [17]:
#load sample text documents

docs=[
    "LanceDB is a powerful vector database for RAG.",
    "Langchain simplifies working with LLMs.",
    "Retrieval-Augmented Generation improves chatbot responses."
]

# Embed all documents in one batch and store them next to their text
embed_model = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
data = pd.DataFrame({
    "id": range(len(docs)),
    "text": docs,
    "vector": embed_model.embed_documents(docs),
})



In [18]:
#create LanceDB table 
table=db.create_table("rag_table", data)
print("Documents added to LanceDB!")

Documents added to LanceDB!


Query Using LanceDB + LangChain

In [19]:
from langchain_community.vectorstores import LanceDB
from langchain.chains import RetrievalQA
from langchain_community.llms import Ollama
from langchain_core.documents import Document

In [20]:
#wrap lancedb for langchain
vectorstore=LanceDB(connection=db,table_name="rag_table",embedding=embed_model)

#define llm
llm=Ollama(model="gemma3")



In [21]:
#create rag retriever 
retriever=vectorstore.as_retriever()

In [22]:
#setup QA chain 
qa_chain=RetrievalQA.from_chain_type(llm,retriever=retriever)

In [23]:
# Ask a question
query = "What is LanceDB?"
# Chain.run is deprecated; invoke returns a dict whose "result" key holds the answer
response = qa_chain.invoke({"query": query})["result"]


In [24]:
print("Answer:", response)

Answer: LanceDB is a powerful vector database for RAG.
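
It may help to see roughly what the chain does with the retrieved documents. By default `RetrievalQA.from_chain_type` uses the "stuff" chain type, which places the retrieved text into a single prompt for the LLM. The sketch below imitates that step in plain Python; the prompt wording and the `build_stuff_prompt` helper are hypothetical and are not LangChain's actual template.

```python
# Conceptual sketch of "stuffing" retrieved passages into one prompt.
# `build_stuff_prompt` and the prompt text are hypothetical, for illustration.

def build_stuff_prompt(question, retrieved_docs):
    """Concatenate retrieved passages and the question into one prompt string."""
    context = "\n\n".join(retrieved_docs)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Passages a retriever might return for the question (taken from `docs` above).
retrieved = [
    "LanceDB is a powerful vector database for RAG.",
    "Langchain simplifies working with LLMs.",
]
prompt = build_stuff_prompt("What is LanceDB?", retrieved)
print(prompt)
```

Because the stored sentence reaches the model verbatim as context, an answer that closely echoes it is what one would expect from such a small corpus.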
