<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/docs/examples/vector_stores/LanceDBIndexDemo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# LanceDB Vector Store
This notebook shows how to use [LanceDB](https://www.lancedb.com) to perform vector searches in LlamaIndex.

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

In [None]:
%pip install llama-index-vector-stores-lancedb

In [None]:
! pip install llama-index

In [None]:
import logging
import sys

# Uncomment to see debug logs
# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index.core import SimpleDirectoryReader, Document, StorageContext
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.lancedb import LanceDBVectorStore
import textwrap

### Setup OpenAI
The first step is to configure the OpenAI API key. It is used to create embeddings for the documents loaded into the index.

In [None]:
import openai

openai.api_key = ""
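
Rather than pasting the key into the notebook, it can be read from the environment. The following is a minimal sketch, assuming the key is exported as `OPENAI_API_KEY`; the helper name `resolve_openai_key` is hypothetical and not part of any library.

```python
import os


def resolve_openai_key(env=None, name="OPENAI_API_KEY"):
    """Return the key stored under `name`, or raise if it is missing or blank."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {name} environment variable before running this notebook"
        )
    return key


# Possible usage in the cell above (assumes the variable is set):
# openai.api_key = resolve_openai_key()
```

Failing early with a clear message avoids a less obvious authentication error later, when the index is built.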

### Download Data

In [None]:
! mkdir -p 'data/paul_graham/'
! wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

### Loading documents
Load the documents stored in `data/paul_graham/` using `SimpleDirectoryReader`.

In [None]:
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
print("Document ID:", documents[0].doc_id, "Document Hash:", documents[0].hash)

Document ID: 7f3087db-8ede-4b85-9338-c31279298442 Document Hash: 47138a501f5338126a55bb277a783e54fa896d55b50dafacad280f0fb5e13421


### Create the index
Here we create an index backed by LanceDB from the documents loaded previously. `LanceDBVectorStore` takes a few arguments:
- `uri` (str, required): Location where LanceDB will store its files.
- `table_name` (str, optional): Name of the table where the embeddings will be stored. Defaults to "vectors".
- `nprobes` (int, optional): The number of probes used. A higher number makes search more accurate but also slower. Defaults to 20.
- `refine_factor` (int, optional): Refine the results by reading extra elements and re-ranking them in memory. Defaults to None.

More details can be found in the [LanceDB docs](https://lancedb.github.io/lancedb/ann_indexes).
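
The accuracy/speed trade-off behind `nprobes` can be illustrated with a toy partitioned search in plain Python. This is only a sketch of the general idea behind partition-based approximate search, not LanceDB's implementation. The helper names (`build_partitions`, `search`), the random 2-D data, and the choice of 8 centroids are assumptions made for illustration.

```python
import math
import random


def build_partitions(vectors, centroids):
    """Assign each vector index to the partition of its nearest centroid."""
    parts = {c: [] for c in range(len(centroids))}
    for idx, vec in enumerate(vectors):
        nearest = min(
            range(len(centroids)), key=lambda c: math.dist(vec, centroids[c])
        )
        parts[nearest].append(idx)
    return parts


def search(query, vectors, centroids, parts, k=5, nprobes=1):
    """Scan only the `nprobes` partitions whose centroids are closest to the query."""
    order = sorted(
        range(len(centroids)), key=lambda c: math.dist(query, centroids[c])
    )
    candidates = [i for c in order[:nprobes] for i in parts[c]]
    return sorted(candidates, key=lambda i: math.dist(query, vectors[i]))[:k]


random.seed(0)
vectors = [(random.random(), random.random()) for _ in range(200)]
centroids = random.sample(vectors, 8)  # toy stand-in for trained centroids
parts = build_partitions(vectors, centroids)
query = (0.5, 0.5)

# Probing every partition is equivalent to an exhaustive scan
exact = search(query, vectors, centroids, parts, k=5, nprobes=len(centroids))
for nprobes in (1, 2, 8):
    approx = search(query, vectors, centroids, parts, k=5, nprobes=nprobes)
    recall = len(set(approx) & set(exact)) / len(exact)
    print(f"nprobes={nprobes}: scanned fewer partitions, recall={recall:.2f}")
```

Probing more partitions can only add candidates, so recall never decreases as `nprobes` grows, while the number of distance computations goes up.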

#### To use LanceDB Cloud:

Configure the following parameters:

- `db_url = "db://your_db_name"`
- `api_key = "sk_your_api_key"`
- `region = "region_you_set"`
- `table_name = "your_table_name"`

Pass these parameters to the `LanceDBVectorStore` class and use all existing functions in the same way as with the local version:

```python
vector_store = LanceDBVectorStore(uri=db_url, api_key=api_key, region=region, table_name=table_name)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
```

In [None]:
vector_store = LanceDBVectorStore(uri="/tmp/lancedb")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

### Query the index
We can now ask questions using our index.

In [None]:
query_engine = index.as_query_engine()
response = query_engine.query("How much did Viaweb charge per month?")
print(textwrap.fill(str(response), 100))

Viaweb charged $100 a month for a small store and $300 a month for a big one.


In [None]:
response = query_engine.query("What did the author do growing up?")
print(textwrap.fill(str(response), 100))

The author worked on writing short stories and programming, particularly on an IBM 1401 computer in
9th grade using an early version of Fortran. Later, the author transitioned to working on
microcomputers, starting with a TRS-80 in 1980, where they wrote simple games, programs, and a word
processor.
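
To see which chunks an answer was based on, the retrieved nodes can be inspected. The sketch below assumes the response object exposes `source_nodes`, each with a `score` and a `node` providing `get_content()`, as LlamaIndex responses do. The helper `summarize_sources` and the stand-in response are hypothetical, included only so the example runs without an index.

```python
from types import SimpleNamespace


def summarize_sources(response, width=60):
    """Return (score, snippet) pairs for the chunks behind a response."""
    rows = []
    for hit in getattr(response, "source_nodes", []):
        # Collapse newlines and repeated spaces so each snippet fits on one line
        text = " ".join(hit.node.get_content().split())
        rows.append((hit.score, text[:width]))
    return rows


# Hypothetical stand-in for a real query response (placeholder text and score)
fake_response = SimpleNamespace(
    source_nodes=[
        SimpleNamespace(
            score=0.83,
            node=SimpleNamespace(get_content=lambda: "placeholder chunk\ntext"),
        )
    ]
)

for score, snippet in summarize_sources(fake_response):
    print(f"{score:.2f}  {snippet}")
```

With a real index, `summarize_sources(response)` could be called on the value returned by `query_engine.query(...)`.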


### Appending data
You can also add data to an existing index.

In [None]:
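del index

# Back the new index with its own LanceDB store; `uri` must be passed to
# LanceDBVectorStore, since VectorStoreIndex.from_documents does not use it
vector_store = LanceDBVectorStore(uri="/tmp/new_dataset")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    [Document(text="The sky is purple in Portland, Maine")],
    storage_context=storage_context,
)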

In [None]:
query_engine = index.as_query_engine()
response = query_engine.query("Where is the sky purple?")
print(textwrap.fill(str(response), 100))

The sky is purple in Portland, Maine.


In [None]:
# Add the essay to the existing index instead of building a new one
for doc in documents:
    index.insert(doc)

In [None]:
query_engine = index.as_query_engine()
response = query_engine.query("What companies did the author start?")
print(textwrap.fill(str(response), 100))

The author started two companies: Viaweb and Y Combinator.
