# Vector search with LanceDB Cloud and LlamaIndex 


### Credentials

Copy the project name and the API key from your project page.
These will be used later to [connect to LanceDB Cloud](#scroll-to=5q8m6GMD7sGu).

In [1]:
project_slug = "your-project-slug"  # @param {type:"string"}

In [2]:
api_key = "sk_..."  # @param {type:"string"}

You can also set `LANCEDB_API_KEY` as an environment variable. More details can be found <a href="https://github.com/lancedb/vectordb-recipes/tree/main/examples/RAG_Reranking/lancedb_cloud/README.md">**here**</a>.
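As a minimal sketch of the environment-variable route, the helper below prefers a key passed explicitly and otherwise falls back to `LANCEDB_API_KEY`. The function name `resolve_api_key` and the value `sk_example` are hypothetical, introduced only for illustration; they are not part of LanceDB or LlamaIndex.

```python
import os


def resolve_api_key(explicit_key=None, env_var="LANCEDB_API_KEY"):
    """Return the explicit key if given, else the value of the environment variable."""
    key = explicit_key or os.environ.get(env_var)
    if not key:
        raise ValueError(f"No API key provided and {env_var} is not set")
    return key


# Simulate an exported variable (placeholder value, not a real key),
# then resolve the key without passing one explicitly.
os.environ["LANCEDB_API_KEY"] = "sk_example"
print(resolve_api_key())
```

In the notebook you would normally export the variable in your shell or runtime settings rather than assign it in code, so the key stays out of the saved cells.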

Since we will be using the OpenAI API, let us set the OpenAI API key as well.

In [None]:
openai_api_key = "sk-..."  # @param {type:"string"}

### Installing dependencies

In [None]:
! pip install llama-index-vector-stores-lancedb llama-index-readers-file llama-index-embeddings-openai llama-index-llms-openai

### Importing libraries

In [None]:
import openai
import logging
import sys

# Uncomment to see debug logs
# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index.core import SimpleDirectoryReader, Document, StorageContext
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.lancedb import LanceDBVectorStore
import textwrap

openai.api_key = openai_api_key
assert openai.models.list() is not None

### Download the data


In [None]:
! mkdir -p 'data/paul_graham/'
! wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
! ls 'data/paul_graham/'

Load the documents stored in `data/paul_graham/` using `SimpleDirectoryReader`:

In [None]:
documents = SimpleDirectoryReader("data/paul_graham/").load_data()
print("Document ID:", documents[0].doc_id, "Document Hash:", documents[0].hash)

### Store data in LanceDB Cloud

Let's connect to LanceDB Cloud so we can store our documents. It requires no setup!

In [None]:
uri = "db://" + project_slug
table_name = "llamaindex_vectorstore"  # optional; the default table name is "vectors"

vector_store = LanceDBVectorStore(
    uri=uri,  # your remote DB URI
    api_key=api_key,  # LanceDB Cloud API key set in the Credentials section
    region="your-region",  # the region you configured
    table_name=table_name,  # use the table name defined above
    # ... (further options omitted)
)

### Create an index

In [None]:
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

And that's it! We're all set up. The next step is to run some queries. Let's try a few:

### Query the index
We can now ask questions using the created index. Results can be filtered either with `MetadataFilters` or with LanceDB's native `where` clause.

In [None]:
from datetime import datetime
from llama_index.core.vector_stores import (
    MetadataFilters,
    FilterOperator,
    FilterCondition,
    MetadataFilter,
)

date = datetime.today().strftime("%Y-%m-%d")
query_filters = MetadataFilters(
    filters=[
        MetadataFilter(
            key="creation_date",
            operator=FilterOperator.EQ,
            value=date,  # today's date, since the file was downloaded just now
        ),
        MetadataFilter(key="file_size", value=75040, operator=FilterOperator.GT),
    ],
    condition=FilterCondition.AND,
)
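To make the semantics of the two filters concrete, here is a plain-Python sketch of what "`creation_date` equals a date AND `file_size` is greater than a threshold" means for a single metadata record. This only illustrates the logic; it is not how LlamaIndex or LanceDB evaluate filters internally, and the function name `matches_filters` and the sample records are made up for the example.

```python
def matches_filters(metadata, date, min_size):
    """Mimic EQ on creation_date AND GT on file_size for one metadata dict."""
    return (
        metadata.get("creation_date") == date
        and metadata.get("file_size", 0) > min_size
    )


# Hypothetical metadata records; the values are invented for illustration.
records = [
    {"file_name": "a.txt", "creation_date": "2024-01-01", "file_size": 80000},
    {"file_name": "b.txt", "creation_date": "2024-01-01", "file_size": 75040},
    {"file_name": "c.txt", "creation_date": "2023-12-31", "file_size": 90000},
]

kept = [r["file_name"] for r in records if matches_filters(r, "2024-01-01", 75040)]
print(kept)  # ['a.txt']
```

Note that `GT` is strict: a file of exactly 75040 bytes does not pass, so the threshold has to sit below the size of the file you want to keep.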

In [None]:
query_engine = index.as_query_engine(
    filters=query_filters,
)

response = query_engine.query("How much did Viaweb charge per month?")
print(response)
print("metadata -", response.metadata)

Viaweb charged $100 a month for a small store and $300 a month for a big one.
metadata - ...

Let's use LanceDB's SQL-like filters directly via the `where` clause:

In [None]:
lance_filter = "metadata.file_name = 'paul_graham_essay.txt' "
retriever = index.as_retriever(vector_store_kwargs={"where": lance_filter})
response = retriever.retrieve("What did the author do growing up?")
print(response[0].get_content())
print("metadata -", response[0].metadata)
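When the filter value comes from a variable rather than a literal, it helps to build the `where` string in one place. The sketch below uses a hypothetical helper, `eq_clause`, that doubles single quotes inside the value, which is the standard SQL way to escape them. That LanceDB's filter dialect accepts this escaping is an assumption here, so check it against your LanceDB version.

```python
def eq_clause(field, value):
    """Build `field = 'value'`, doubling any single quotes inside the value."""
    escaped = str(value).replace("'", "''")
    return f"{field} = '{escaped}'"


lance_filter = eq_clause("metadata.file_name", "paul_graham_essay.txt")
print(lance_filter)  # metadata.file_name = 'paul_graham_essay.txt'
```

The resulting string can be passed as the `where` value in `vector_store_kwargs`, exactly like the hand-written filter above.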

### Append data to the index
You can also add data to an existing index.

In [None]:
# Insert a new document into the existing index; it is embedded and
# written to the same LanceDB table through the attached vector store.
index.insert(Document(text="The sky is purple in Portland, Maine"))

In [None]:
query_engine = index.as_query_engine()
response = query_engine.query("Where is the sky purple?")
print(textwrap.fill(str(response), 100))

Portland, Maine
