# LanceDB

>[LanceDB](https://lancedb.com/) is an open-source vector-search database built on persistent storage, which greatly simplifies retrieval, filtering, and management of embeddings.

This notebook shows how to use functionality related to the `LanceDB` vector database based on the Lance data format.

In [1]:
! pip install -U langchain-openai langchain-community



In [1]:
! pip install lancedb



We will use `OpenAIEmbeddings`, so we need an OpenAI API key.

In [4]:
import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
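If the key may already be set in the environment, a small guard avoids prompting again on every run. This is a minimal sketch; `ensure_env` is a hypothetical helper introduced here for illustration and is not part of LangChain or the OpenAI client.

```python
import getpass
import os


def ensure_env(name, prompt=getpass.getpass):
    """Return os.environ[name], asking for it via `prompt` only if it is unset or empty.

    `ensure_env` is a hypothetical helper, not a LangChain or OpenAI API.
    """
    if not os.environ.get(name):
        os.environ[name] = prompt(f"{name}: ")
    return os.environ[name]


# Usage in the notebook (prompts only when the variable is missing):
# ensure_env("OPENAI_API_KEY")
```

Passing the prompt function as a parameter keeps the helper easy to exercise without an interactive terminal.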

In [5]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import LanceDB
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("../../how_to/state_of_the_union.txt")
documents = loader.load()

documents = CharacterTextSplitter().split_documents(documents)
embeddings = OpenAIEmbeddings()
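To give a feel for what the splitting step does, here is a simplified sketch of separator-based chunking: split on a separator, then greedily pack the pieces into chunks up to a size limit. It is an illustration only, not `CharacterTextSplitter`'s implementation; `split_text` is a hypothetical function, and it ignores chunk overlap entirely.

```python
def split_text(text, separator="\n\n", chunk_size=4000):
    """Split on `separator`, then pack pieces into chunks of at most
    `chunk_size` characters. Simplified: no overlap between chunks, and a
    single piece longer than `chunk_size` is kept whole."""
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # Either the piece still fits, or the chunk is empty and must take it.
            current = candidate
        else:
            # Adding the piece would exceed the limit: close the chunk, start a new one.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


print(split_text("aaa\n\nbbb\n\nccc", chunk_size=8))
```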

##### For LanceDB Cloud, you can invoke the vector store as follows:


```python
db_url = "db://lang_test"  # URL of the database you created
api_key = "xxxxx"  # your API key
region = "us-east-1-dev"  # your selected region

vector_store = LanceDB(
    uri=db_url,
    api_key=api_key,
    region=region,
    embedding=embeddings,
    table_name="langchain_test",
)
```


In [6]:
docsearch = LanceDB.from_documents(documents, embeddings, distance="cosine")
query = "What did the president say about Ketanji Brown Jackson"
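The `distance="cosine"` argument selects cosine distance as the metric. As a rough illustration in pure Python (not LanceDB's own implementation), cosine distance is commonly defined as one minus the cosine similarity of two vectors, so identical directions give 0 and opposite directions give 2. The function name `cosine_distance` is chosen here for the example.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance([1.0, 0.0], [1.0, 0.0]))   # same direction
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))   # orthogonal
print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
```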


In [7]:
docs = docsearch.similarity_search_with_relevance_scores('txt')

Querying with embedding..


In [8]:
docs[0]

(Document(page_content='My administration is providing assistance with job training and housing, and now helping lower-income veterans get VA care debt-free.  \n\nOur troops in Iraq and Afghanistan faced many dangers. \n\nOne was stationed at bases and breathing in toxic smoke from “burn pits” that incinerated wastes of war—medical and hazard material, jet fuel, and more. \n\nWhen they came home, many of the world’s fittest and best trained warriors were never the same. \n\nHeadaches. Numbness. Dizziness. \n\nA cancer that would put them in a flag-draped coffin. \n\nI know. \n\nOne of those soldiers was my son Major Beau Biden. \n\nWe don’t know for sure if a burn pit was the cause of his brain cancer, or the diseases of so many of our troops. \n\nBut I’m committed to finding out everything we can. \n\nCommitted to military families like Danielle Robinson from Ohio. \n\nThe widow of Sergeant First Class Heath Robinson.  \n\nHe was born a soldier. Army National Guard. Combat medic in Ko

In [4]:
docs = docsearch.similarity_search_with_score('txt')

Querying with embedding..


In [7]:
type(docs[0][0].metadata)

dict
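As the indexing `docs[0][0]` above suggests, each result is a `(document, score)` pair. The sketch below shows one way to post-process such pairs. It assumes the score is a distance (lower means closer), which you should verify for your metric; `Doc` is a hypothetical stand-in for LangChain's `Document`, and `closest_hits` is a helper invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def closest_hits(results, max_distance):
    """Keep (doc, score) pairs with score <= max_distance, closest first.

    Assumes `score` is a distance, so smaller values mean better matches.
    """
    kept = [(doc, score) for doc, score in results if score <= max_distance]
    return sorted(kept, key=lambda pair: pair[1])


# Made-up results purely for illustration.
results = [
    (Doc("far", {"source": "a.txt"}), 0.9),
    (Doc("near", {"source": "b.txt"}), 0.1),
    (Doc("mid", {"source": "c.txt"}), 0.4),
]
for doc, score in closest_hits(results, max_distance=0.5):
    print(round(score, 2), doc.page_content, doc.metadata["source"])
```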

In [None]:
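# NOTE: `ids` and `new_metadata` are placeholders that are not defined in this
# notebook; supply the ids of the rows to change and the metadata value to store.
docsearch.get_table().update(
    where=f"id = '{ids[0]}'",
    values={"metadata": new_metadata},
)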

In [7]:
import numpy as np 
from langchain_community.vectorstores.utils import maximal_marginal_relevance
from langchain_community.utils.math import cosine_similarity
import pyarrow as pa

# tbl = docsearch.get_table()
# # tbl.create_fts_index("text")
# res=tbl.search(np.arange(1536)).limit(2).to_arrow()
# print(type(res))
# print(isinstance(res,pa.lib.Table))
# print("vec", res['_distance'].to_pylist())

# print(np.expand_dims(res['vector'].to_numpy(),axis=0).shape)
# print(np.arange(10).ndim)
# print(np.expand_dims(np.arange(10), axis=0).shape)

# print(maximal_marginal_relevance(
#     np.arange(10),
#     res['vector'].to_pylist() , 0.5) )

# print(cosine_similarity(np.expand_dims(np.arange(10),axis=0),
#                         np.expand_dims(res['vector'].to_numpy()[0],axis=0)))

docsearch.max_marginal_relevance_search(query='text')

[(Document(page_content='My administration is providing assistance with job training and housing, and now helping lower-income veterans get VA care debt-free.  \n\nOur troops in Iraq and Afghanistan faced many dangers. \n\nOne was stationed at bases and breathing in toxic smoke from “burn pits” that incinerated wastes of war—medical and hazard material, jet fuel, and more. \n\nWhen they came home, many of the world’s fittest and best trained warriors were never the same. \n\nHeadaches. Numbness. Dizziness. \n\nA cancer that would put them in a flag-draped coffin. \n\nI know. \n\nOne of those soldiers was my son Major Beau Biden. \n\nWe don’t know for sure if a burn pit was the cause of his brain cancer, or the diseases of so many of our troops. \n\nBut I’m committed to finding out everything we can. \n\nCommitted to military families like Danielle Robinson from Ohio. \n\nThe widow of Sergeant First Class Heath Robinson.  \n\nHe was born a soldier. Army National Guard. Combat medic in K
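Maximal marginal relevance (MMR) trades off relevance to the query against redundancy among the results already selected. The following is a minimal pure-Python sketch of that idea, assuming cosine similarity and a weighting parameter `lambda_mult`; it is not the code behind `max_marginal_relevance_search`, and the names `mmr` and `_cos` are chosen for this example.

```python
import math


def _cos(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query, candidates, k=2, lambda_mult=0.5):
    """Return indices of up to `k` candidates picked greedily by MMR.

    lambda_mult=1.0 ranks purely by relevance; lower values favor diversity.
    """
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def score(i):
            relevance = _cos(query, candidates[i])
            # Similarity to the closest already-selected item (0 if none yet).
            redundancy = max(
                (_cos(candidates[i], candidates[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query_vec = [1.0, 0.0]
vectors = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]
print(mmr(query_vec, vectors, k=2, lambda_mult=1.0))  # relevance only
print(mmr(query_vec, vectors, k=2, lambda_mult=0.3))  # diversity favored
```

With a low `lambda_mult`, the near-duplicate of the first pick is penalized, so a less similar vector can be chosen second.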

Additionally, to explore the table, you can load it into a pandas DataFrame or save it to a CSV file:
```python
tbl = docsearch.get_table()
print("tbl:", tbl)
pd_df = tbl.to_pandas()
# pd_df.to_csv("docsearch.csv", index=False)

# you can also create a new vector store object using an older connection object:
vector_store = LanceDB(connection=tbl, embedding=embeddings)
```

In [10]:
print(docs[0].page_content)

They were responding to a 9-1-1 call when a man shot and killed them with a stolen gun. 

Officer Mora was 27 years old. 

Officer Rivera was 22. 

Both Dominican Americans who’d grown up on the same streets they later chose to patrol as police officers. 

I spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves. 

I’ve worked on these issues a long time. 

I know what works: Investing in crime prevention and community police officers who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and safety. 

So let’s not abandon our streets. Or choose between safety and equal justice. 

Let’s come together to protect our communities, restore trust, and hold law enforcement accountable. 

That’s why the Justice Department required body cameras, banned chokeholds, and restricted no-knock warrants for its officers. 

That’s why the American Rescue 

In [8]:
print(docs[0].metadata)

{'vector': [-0.00584288639947772, -0.0019810933154076338, -0.004651553463190794, -0.0027457550168037415, -0.0010064422385767102, 0.019717231392860413, 0.01705346442759037, 0.008580275811254978, -0.019208572804927826, -0.01677236333489418, 0.010875933803617954, 0.011009791865944862, -0.0004063834494445473, -0.007542878855019808, -0.0035070704761892557, -0.0005990130011923611, 0.03338409960269928, -0.009992473758757114, 0.007569650188088417, -0.030706945806741714, -0.013800723478198051, 0.003657660214230418, 0.01465741265565157, 0.006756464950740337, 0.0008361919899471104, 0.01052121166139841, 0.031751036643981934, -0.009811765514314175, 0.011009791865944862, -0.018566057085990906, -0.01725425198674202, 0.0060470192693173885, -0.03675731271505356, -0.018686527386307716, -0.02778884768486023, -0.006167491432279348, 0.00015174019790720195, -0.011163728311657906, 0.018793614581227303, -0.013854267075657845, 0.03584707900881767, -0.02578098326921463, 0.0022555014584213495, -0.018699914216995