# LanceDB Index Demo
In this notebook we show how to use [LanceDB](https://www.lancedb.com) to perform vector searches in LlamaIndex.

In [10]:
import logging
import sys

# Uncomment to see debug logs
# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
from gpt_index import SimpleDirectoryReader, Document
from gpt_index.indices.vector_store.vector_indices import GPTLanceDBIndex
import textwrap

### Setup OpenAI
The first step is to configure the OpenAI API key. It is used to create embeddings for the documents loaded into the index.

In [11]:
import os
os.environ['OPENAI_API_KEY'] = ""
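
Hardcoding the key in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch, the key could instead be read from the environment and requested interactively only when it is missing. The helper name `resolve_api_key` and its parameters are hypothetical and not part of LlamaIndex or the OpenAI client.

```python
import os
from getpass import getpass


def resolve_api_key(env=os.environ, prompt=getpass):
    """Return the OpenAI key from `env`, asking via `prompt` only if it is missing.

    `resolve_api_key` is a hypothetical helper written for this notebook.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        # getpass hides the typed value, so the key does not end up in cell output.
        key = prompt("OpenAI API key: ").strip()
    if not key:
        raise ValueError("OPENAI_API_KEY is empty")
    return key


# Usage in a notebook cell (left commented out because it may prompt for input):
# os.environ["OPENAI_API_KEY"] = resolve_api_key()
```

Passing `env` and `prompt` as parameters keeps the helper easy to exercise without touching the real environment.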

### Loading documents
Load the documents stored in `paul_graham_essay/data` using `SimpleDirectoryReader`.

In [12]:
documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
print('Document ID:', documents[0].doc_id, 'Document Hash:', documents[0].doc_hash)

Document ID: df5cfb31-d014-4780-9bc4-34a541544e35 Document Hash: 77ae91ab542f3abb308c4d7c77c9bc4c9ad0ccd63144802b7cbe7e1bb3a4094e


### Create the index
Here we create an index backed by LanceDB using the documents loaded previously. `GPTLanceDBIndex` takes a few arguments:
- uri (str, required): Location where LanceDB will store its files.
- table_name (str, optional): The name of the table where the embeddings will be stored. Defaults to "vectors".
- nprobes (int, optional): The number of index partitions probed during a search. A higher number makes search more accurate but also slower. Defaults to 20.
- refine_factor (int, optional): Refine the results by reading extra elements and re-ranking them in memory. Defaults to None.

More details can be found in the [LanceDB docs](https://lancedb.github.io/lancedb/ann_indexes).
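
To make the trade-off behind `nprobes` and `refine_factor` concrete, here is a toy, pure-Python illustration of a partitioned search. It is not how LanceDB implements its index: the functions, the rounding used as a stand-in for compressed vectors, and the random data are all assumptions made for this sketch.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_partitions(vectors, centroids):
    """Assign every vector index to the partition of its nearest centroid."""
    partitions = [[] for _ in centroids]
    for idx, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))
        partitions[nearest].append(idx)
    return partitions


def search(query, vectors, coarse, centroids, partitions, k, nprobes=20, refine_factor=None):
    """Toy approximate search: probe a few partitions, optionally re-rank."""
    # Only the `nprobes` partitions closest to the query are scanned.
    probe_order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    candidates = [idx for c in probe_order[:nprobes] for idx in partitions[c]]
    # First pass ranks candidates with the cheap, lossy vectors.
    candidates.sort(key=lambda idx: sq_dist(query, coarse[idx]))
    if refine_factor is None:
        return candidates[:k]
    # Refinement reads k * refine_factor candidates and re-ranks them exactly.
    shortlist = candidates[: k * refine_factor]
    shortlist.sort(key=lambda idx: sq_dist(query, vectors[idx]))
    return shortlist[:k]


def exact_search(query, vectors, k):
    """Brute-force baseline that scans every vector."""
    return sorted(range(len(vectors)), key=lambda idx: sq_dist(query, vectors[idx]))[:k]


rng = random.Random(0)
vectors = [[rng.random() for _ in range(4)] for _ in range(200)]
coarse = [[round(x, 1) for x in vec] for vec in vectors]  # stand-in for compression
centroids = vectors[:8]  # arbitrary centroids, enough for a toy example
partitions = build_partitions(vectors, centroids)
query = [0.5, 0.5, 0.5, 0.5]

fast = search(query, vectors, coarse, centroids, partitions, k=5, nprobes=1)
thorough = search(query, vectors, coarse, centroids, partitions, k=5,
                  nprobes=len(centroids), refine_factor=40)
print("nprobes=1:", fast)
print("all partitions + refine:", thorough)
print("exact:", exact_search(query, vectors, 5))
```

Probing every partition and refining over the full candidate set reproduces the brute-force result, while `nprobes=1` scans far fewer vectors and may miss some true neighbours.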

In [13]:
index = GPTLanceDBIndex.from_documents(documents, uri="/tmp/lancedb")

INFO:gpt_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:gpt_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 17617 tokens


### Query the index
We can now ask questions using our index.

In [14]:
response = index.query("Who is the author?")

INFO:gpt_index.token_counter.token_counter:> [query] Total LLM token usage: 3720 tokens
INFO:gpt_index.token_counter.token_counter:> [query] Total embedding token usage: 5 tokens


In [15]:
print(textwrap.fill(str(response), 100))

  The author of the text is Paul Graham, co-founder of Y Combinator.


In [16]:
response = index.query("What did the author do growing up?")

INFO:gpt_index.token_counter.token_counter:> [query] Total LLM token usage: 4082 tokens
INFO:gpt_index.token_counter.token_counter:> [query] Total embedding token usage: 8 tokens


In [17]:
print(textwrap.fill(str(response), 100))

  The author grew up writing short stories, programming on an IBM 1401, and building a computer kit
with a friend. They also wrote programs for a TRS-80 computer, such as games, a program to predict
model rocket flight, and a word processor. In college, they studied philosophy and AI, and wrote a
book about Lisp hacking. They also took art classes and applied to art schools, and while a student
at the Accademia, they started painting still lives in their bedroom at night. These paintings were
tiny, because the room was, and because they painted them on leftover scraps of canvas, which was
all they could afford at the time. They also arrived at an arrangement with the faculty whereby the
students wouldn't require the faculty to teach anything, and in return the faculty wouldn't require
the students to learn anything. They even had a little stove, fed with kindling, that you see in
19th century studio paintings, and a nude model sitting as close to it as possible without getting
burned. 

### Saving / Loading the Index
You can save the index configuration for later use.

In [19]:
saved_index = index.save_to_dict()

You can load the index from the saved information.

In [20]:
del index

index = GPTLanceDBIndex.load_from_dict(saved_index)
print(index.query("Who is the author?"))

INFO:gpt_index.token_counter.token_counter:> [query] Total LLM token usage: 3720 tokens
INFO:gpt_index.token_counter.token_counter:> [query] Total embedding token usage: 5 tokens




The author of the text is Paul Graham, co-founder of Y Combinator.
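
The saved dictionary only lives in memory, so it is lost when the kernel restarts. A minimal sketch of writing it to disk and reading it back with the standard `json` module follows, assuming the dictionary returned by `save_to_dict()` is JSON-serializable. The helper names and the stand-in dictionary are hypothetical; the real keys are not shown in this notebook.

```python
import json
import tempfile
from pathlib import Path


def write_index_dict(index_dict, path):
    """Serialize the saved index dictionary to a JSON file."""
    Path(path).write_text(json.dumps(index_dict), encoding="utf-8")


def read_index_dict(path):
    """Read a previously written index dictionary back from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Stand-in for `index.save_to_dict()`; the keys below are made up for illustration.
example = {"table_name": "vectors", "uri": "/tmp/lancedb", "nprobes": 20}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "lancedb_index.json"
    write_index_dict(example, path)
    restored = read_index_dict(path)

print(restored == example)
```

In the notebook, the restored dictionary would then be passed to `GPTLanceDBIndex.load_from_dict` in place of `saved_index`.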


### Appending data
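You can also add data to an existing index. In the cells below, calling `from_documents` again with the same `uri` adds the new documents to the existing dataset.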

In [25]:
del index

index = GPTLanceDBIndex.from_documents([Document("The sky is blue")], uri="/tmp/new_dataset")

INFO:gpt_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:gpt_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 4 tokens


In [26]:
response = index.query("Who is the author?")
print(textwrap.fill(str(response), 100))

INFO:gpt_index.token_counter.token_counter:> [query] Total LLM token usage: 43 tokens
INFO:gpt_index.token_counter.token_counter:> [query] Total embedding token usage: 5 tokens


 The author is unknown.


In [27]:
index = GPTLanceDBIndex.from_documents(documents, uri="/tmp/new_dataset")

INFO:gpt_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:gpt_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 17617 tokens


In [28]:
response = index.query("Who is the author?")
print(textwrap.fill(str(response), 100))

INFO:gpt_index.token_counter.token_counter:> [query] Total LLM token usage: 3720 tokens
INFO:gpt_index.token_counter.token_counter:> [query] Total embedding token usage: 5 tokens


  The author of the text is Paul Graham, co-founder of Y Combinator.
