# Getting Started with LlamaIndex

**Disclaimer**: _LlamaIndex is under active development, so expect frequent changes. Always refer to the code for the version of LlamaIndex you are using for up-to-date information. For example, at the time of writing, `GPTVectorStoreIndex` is still available in `v0.6.19`, but work to [remove the GPT prefix](https://github.com/jerryjliu/llama_index/pull/5822) is underway._

LlamaIndex provides an interface between LLMs and your data for a range of tasks, including `question-answering`, `summarization` and `structured queries`.

Let's take a look at a flow diagram for LlamaIndex's query interface.

![query_interface](query_interface.png)

- **Documents**: Data is represented as a `Document`. LlamaIndex provides a range of data connectors to load data from various sources. You will most likely find the connector you need on [Llama Hub](https://llamahub.ai/); if not, feel free to contribute one.
- **Nodes**: Nodes are effectively chunks of source documents. They can be chunks of text, images, and more.
- **Index**: The index is at the core of the LlamaIndex library. It abstracts away the underlying storage, maintains the state of the processed documents, and exposes a lightweight view of the data.
- **Retriever**: A retriever retrieves the most relevant nodes from an index given a query.
- **Response Synthesizer**: A synthesizer can filter and augment the retrieved nodes to further optimise the relevancy of the result and reduce token cost.
- **Query Engine**: A query engine takes in a user query and returns a response.
- **Storage Context**: The storage context provides an abstraction layer over the data storage layer.
- **Storage**: This can be a document store (documents), an index store (metadata) or a vector store (embeddings).

To start the tutorial, first install the `llama-index` library.

In [None]:
!pip install llama-index

Make sure your OpenAI API key is set.

In [101]:
import os
os.environ['OPENAI_API_KEY'] = "YOUR OPENAI API KEY"

Alternatively, if your OpenAI API key is set up in a `.env` file, load it with `python-dotenv`.

In [None]:
import dotenv
dotenv.load_dotenv()
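Whichever way you set the key, it can help to confirm that it is actually visible to the process before making any LLM calls. This is a minimal sketch using only the standard library; the helper name `require_api_key` is hypothetical and is not part of LlamaIndex or `python-dotenv`.

```python
import os


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the value of the environment variable, or raise if it is missing or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it or add it to your .env file")
    return value
```

Calling `require_api_key()` right after the setup cell fails fast with a readable message instead of an authentication error later in the tutorial.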

## Load in Documents

The first step is to load the documents.

In [102]:
from llama_index import SimpleDirectoryReader

documents = SimpleDirectoryReader('data').load_data()
print(documents[0])

Document(text='The Tragedy of Hamlet, Prince of\nDenmark\nASCII text placed in the public domain by Moby Lexical Tools, 1992. SGML markup by Jon Bosak,\n1992-1994. XML version by Jon Bosak, 1996-1999. Simplified XML version by Max Froumentin, 2001. The\nXML markup in this version is Copyright © 1999 Jon Bosak. This work may freely be distributed on condition\nthat it not be modified or altered in any way.', doc_id='36ce4a60-be0f-42e3-8f10-62051a88d3ac', embedding=None, doc_hash='dc9a9296c1508709c152f8fe8fd787359df4934ddd203486243327a44b6f92bb', extra_info={'page_label': '1', 'file_name': 'hamlet.pdf'})


**Customise a Document**

You can also manually create a `Document`.

In [31]:
from llama_index.readers import Document

customised_document = Document(
    # Pass the text of the loaded document, not the Document object itself.
    text = documents[0].text,
    extra_info = {
        'filename': 'hamlet.pdf',
        'genre': 'drama'
    }
)

## Parse the Documents into Nodes

Then turn the loaded documents into a list of `nodes`, effectively `chunks` of the documents.

In [103]:
from llama_index.node_parser import SimpleNodeParser

parser = SimpleNodeParser()
nodes = parser.get_nodes_from_documents(documents)

print(nodes[0])

Node(text='The Tragedy of Hamlet, Prince of\nDenmark\nASCII text placed in the public domain by Moby Lexical Tools, 1992. SGML markup by Jon Bosak,\n1992-1994. XML version by Jon Bosak, 1996-1999. Simplified XML version by Max Froumentin, 2001. The\nXML markup in this version is Copyright © 1999 Jon Bosak. This work may freely be distributed on condition\nthat it not be modified or altered in any way.', doc_id='3b55546b-b1b7-4f46-ba52-61fa7153abcb', embedding=None, doc_hash='dc9a9296c1508709c152f8fe8fd787359df4934ddd203486243327a44b6f92bb', extra_info={'page_label': '1', 'file_name': 'hamlet.pdf'}, node_info={'start': 0, 'end': 388, '_node_type': <NodeType.TEXT: '1'>}, relationships={<DocumentRelationship.SOURCE: '1'>: '36ce4a60-be0f-42e3-8f10-62051a88d3ac'})
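The `node_info` in the output above records the `start` and `end` offsets of the chunk within its source document. To make the idea of chunking concrete, here is a simplified sketch that splits a string into overlapping character windows and records the same kind of offsets. It is only an illustration: the function name `chunk_text` and its parameters are hypothetical, and LlamaIndex's own parser works with a token-based text splitter, so its chunk boundaries will differ.

```python
from typing import Dict, List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> List[Dict]:
    """Split text into overlapping character windows, recording start/end offsets."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append({"text": text[start:end], "start": start, "end": end})
        if end == len(text):
            break
        # Step back by `overlap` so neighbouring chunks share some context.
        start = end - overlap
    return chunks
```

The overlap means a sentence that straddles a boundary still appears intact in at least one chunk, at the cost of storing some text twice.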


## Construct Index

An `Index` is the interface that combines the data (from nodes or documents) with an LLM and makes the data ready for querying.

There are many different types of indexes, such as `List Index`, `Vector Store Index`, `Tree Index`, `Keyword Table Index`, `Structured Store Index` and `Knowledge Graph Index`, each created for a particular use case. Here, I will focus only on the most commonly used indexes, and elaborate on the other index types another time.

This is the simplest way to create an index from documents.

In [104]:
from llama_index import GPTVectorStoreIndex

index = GPTVectorStoreIndex.from_documents(documents)

**Reuse Nodes**

You can also create an `index` from the nodes created. Also note that once the nodes are created and persisted, they can be reused as well.

In [25]:
from llama_index import StorageContext, GPTListIndex, GPTVectorStoreIndex

storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)

index1 = GPTVectorStoreIndex(nodes, storage_context = storage_context)
index2 = GPTListIndex(nodes, storage_context = storage_context)

**Insert Nodes or Documents**

To enrich an index with more data, you can insert more nodes or documents into it.

In [29]:
from llama_index import GPTVectorStoreIndex

index = GPTVectorStoreIndex([])
index.insert_nodes(nodes)

In [28]:
from llama_index import GPTVectorStoreIndex

index = GPTVectorStoreIndex([])
for document in documents:
    index.insert(document)

**Customise LLMs**

You can also customise the LLMs used in the index. The default model is `text-davinci-003`.

In [34]:
from langchain import OpenAI
from llama_index import LLMPredictor, GPTVectorStoreIndex, ServiceContext

llm = OpenAI(model_name = 'text-davinci-003')
llm_predictor = LLMPredictor(llm = llm)
service_context = ServiceContext.from_defaults(llm_predictor = llm_predictor)

index = GPTVectorStoreIndex.from_documents(documents, service_context = service_context)

**Persist Index**

An index can also be persisted via its `Storage Context` so that it can be reused.

In [36]:
from llama_index import StorageContext, load_index_from_storage

index.storage_context.persist(persist_dir = 'storage')

storage_context = StorageContext.from_defaults(persist_dir = 'storage')
index = load_index_from_storage(storage_context)
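In practice you usually want to build the index once and load it on later runs, since rebuilding means paying for the embedding calls again. The sketch below wraps the calls shown in this tutorial in a load-or-build helper. The helper names `has_persisted_index` and `load_or_build_index` are hypothetical, and the LlamaIndex calls assume the `v0.6.x` API used throughout this tutorial.

```python
import os


def has_persisted_index(persist_dir: str) -> bool:
    """Return True if persist_dir exists and contains at least one entry."""
    return os.path.isdir(persist_dir) and len(os.listdir(persist_dir)) > 0


def load_or_build_index(persist_dir: str = "storage", data_dir: str = "data"):
    """Load the index from persist_dir if present, otherwise build and persist it."""
    # Imported lazily so the helpers can be defined without llama-index installed.
    from llama_index import (
        GPTVectorStoreIndex,
        SimpleDirectoryReader,
        StorageContext,
        load_index_from_storage,
    )

    if has_persisted_index(persist_dir):
        storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
        return load_index_from_storage(storage_context)

    # Nothing persisted yet: build from the source documents and save for next time.
    documents = SimpleDirectoryReader(data_dir).load_data()
    index = GPTVectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir=persist_dir)
    return index
```

Note that the check only looks for a non-empty directory; it does not verify that the persisted files match the current contents of `data`.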

## Query Index

In order to query the index, first turn an `index` into a `query engine`.

In [38]:
query_engine = index.as_query_engine()
response = query_engine.query("What, according to Hamlet, keeps a person from escaping the troubles of this life?")

print(response)


According to Hamlet, what keeps a person from escaping the troubles of this life is the fear of the unknown after death. He states that the "dread of something after death, the undiscover'd country from whose bourn no traveller returns, puzzles the will and makes us rather bear those ills we have than fly to others that we know not of."


**Custom Query Engine**

However, query engine creation can be more sophisticated. You can specify a `retriever` and `response synthesizer`.

In [94]:
from llama_index import ResponseSynthesizer
from llama_index.retrievers import VectorIndexRetriever
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.indices.postprocessor import SimilarityPostprocessor

retriever = VectorIndexRetriever(index=index, similarity_top_k=5)

response_synthesizer = ResponseSynthesizer.from_args(
    node_postprocessors=[
        SimilarityPostprocessor(similarity_cutoff=0.7)
    ]
)

query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)

response = query_engine.query("What prompts Hamlet to say, 'My thoughts be bloody or nothing worth!'?")
print(response)


Hamlet is prompted to say, "My thoughts be bloody or nothing worth!" after he reflects on the imminent death of twenty thousand men who are going to their graves for a plot of land that is not worth fighting for. He is overwhelmed by the senselessness of the situation and is determined to take revenge for the injustice.
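The two numbers in the cell above do different jobs: `similarity_top_k=5` limits how many nodes the retriever returns, while `similarity_cutoff=0.7` drops any of those nodes whose similarity score falls below the threshold. The sketch below illustrates that two-step selection with cosine similarity over toy vectors. It is not LlamaIndex's implementation; the function names and the use of plain Python lists are assumptions made for the illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query_vec: Sequence[float],
    node_vecs: Sequence[Sequence[float]],
    top_k: int = 5,
    cutoff: float = 0.0,
) -> List[Tuple[int, float]]:
    """Return (node index, score) pairs: the top_k most similar, minus those below cutoff."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(node_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(i, score) for i, score in scored[:top_k] if score >= cutoff]
```

A high cutoff can leave fewer than `top_k` nodes, or none at all, so it is worth checking the scores of the retrieved nodes before settling on a threshold.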


**Configure Retriever**

There are a couple of ways to define a retriever. The first, shown above, is to construct one directly. The second, shown here, is to convert the index into a retriever by supplying the `mode` of the retriever. For a `ListIndex`, the available modes are:
- **default**: This creates a simple retriever, `ListIndexRetriever`, for `ListIndex` that returns all nodes.
- **embedding**: This creates an embedding-based retriever, `ListIndexEmbeddingRetriever`, for `ListIndex`.
- **llm**: This creates an LLM-based retriever, `ListIndexLLMRetriever`, for `ListIndex`.

In [42]:
retriever = index.as_retriever(retriever_mode='default')

query_engine = RetrieverQueryEngine(retriever)
response = query_engine.query("What, according to Hamlet, keeps a person from escaping the troubles of this life?")
print(response)


According to Hamlet, the fear of something after death, the unknown of the "undiscovered country from whose bourn no traveller returns," keeps a person from escaping the troubles of this life.


**Configure Response Synthesis**

Furthermore, you can also customise the `response_synthesizer` by choosing its response `mode`. There are currently eight different modes.
- **refine**: This uses the context in the first node, along with the query, to generate an initial answer. It then passes this answer, the query, and the context of the second node into a "refine prompt" to generate a refined answer, and repeats this for each remaining node.
- **compact**: This is the default mode. It first combines text chunks into larger consolidated chunks that make fuller use of the available context window, then refines answers across them. This mode is faster than `refine` since it makes fewer calls to the LLM.
- **simple_summarize**: This merges all text chunks into one and makes a single LLM call. It will fail if the merged text chunk exceeds the context window size.
- **tree_summarize**: This approach builds a tree index over the set of candidate nodes, with a summary prompt seeded with the query.
- **generation**: This ignores the context and just uses the LLM to generate a response.
- **no_text**: This returns the retrieved context nodes without synthesising a final response.
- **accumulate**: This synthesises a response for each text chunk and then returns the concatenation.
- **compact_accumulate**: This mode combines text chunks into larger consolidated chunks that make fuller use of the available context window, then accumulates answers for each of them and finally returns the concatenation. This mode is faster than `accumulate` since it makes fewer calls to the LLM.
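To see why `compact` needs fewer LLM calls than `refine`, here is a rough sketch of both strategies with the LLM replaced by any callable that maps a prompt string to an answer string. Everything in it is an assumption made for illustration: the function names, the prompt wording, and the use of a character budget (`max_chars`) where the library works with token counts.

```python
from typing import Callable, List

LLM = Callable[[str], str]


def refine(query: str, chunks: List[str], llm: LLM) -> str:
    """Answer from the first chunk, then refine that answer once per later chunk."""
    answer = ""
    for i, chunk in enumerate(chunks):
        if i == 0:
            prompt = f"Context: {chunk}\nQuestion: {query}\nAnswer:"
        else:
            prompt = (
                f"Question: {query}\nExisting answer: {answer}\n"
                f"New context: {chunk}\nRefined answer:"
            )
        answer = llm(prompt)  # one LLM call per chunk
    return answer


def pack(chunks: List[str], max_chars: int) -> List[str]:
    """Greedily merge consecutive chunks while the merged text fits in max_chars.

    A single chunk longer than max_chars is kept whole rather than split.
    """
    packed: List[str] = []
    current = ""
    for chunk in chunks:
        candidate = f"{current}\n{chunk}" if current else chunk
        if current and len(candidate) > max_chars:
            packed.append(current)
            current = chunk
        else:
            current = candidate
    if current:
        packed.append(current)
    return packed


def compact(query: str, chunks: List[str], llm: LLM, max_chars: int = 1000) -> str:
    """Pack chunks into fewer, larger ones, then run the same refine loop over them."""
    return refine(query, pack(chunks, max_chars), llm)
```

With four small chunks that pack into two, `refine` makes four calls and `compact` makes two, which is where the speed difference comes from.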

I will list a few representative modes for your reference below. 

In [105]:
from llama_index.query_engine import RetrieverQueryEngine

retriever = index.as_retriever()

query_engine = RetrieverQueryEngine.from_args(retriever, response_mode='compact')
response = query_engine.query("What prompts Hamlet to say, 'My thoughts be bloody or nothing worth!'?")
print(f"response_mode='compact': {response}\n")

query_engine = RetrieverQueryEngine.from_args(retriever, response_mode='tree_summarize')
response = query_engine.query("What prompts Hamlet to say, 'My thoughts be bloody or nothing worth!'?")
print(f"response_mode='tree_summarize': {response}\n")

query_engine = RetrieverQueryEngine.from_args(retriever, response_mode='generation')
response = query_engine.query("What prompts Hamlet to say, 'My thoughts be bloody or nothing worth!'?")
print(f"response_mode='generation': {response}\n")

query_engine = RetrieverQueryEngine.from_args(retriever, response_mode='compact_accumulate')
response = query_engine.query("What prompts Hamlet to say, 'My thoughts be bloody or nothing worth!'?")
print(f"response_mode='compact_accumulate': {response}\n")

response_mode='compact': 
Hamlet is prompted to say, "My thoughts be bloody or nothing worth!" after reflecting on the imminent death of twenty thousand men who are fighting for a plot of land that is not worth the cost of their lives. He is disgusted by the idea of so many people dying for something so trivial and decides that his thoughts must be bloody in order to be of any worth.

response_mode='tree_summarize': 
Hamlet is prompted to say, "My thoughts be bloody or nothing worth!" after reflecting on the imminent death of twenty thousand men who are fighting for a plot of land that is not worth the cost of their lives. He is disgusted by the idea of so many people dying for something so trivial and decides that his thoughts must be bloody in order to be of any worth.

response_mode='generation': 

Hamlet says this line in Act IV, Scene IV, after he has decided to take revenge on his uncle Claudius for murdering his father. He is reflecting on his decision and the consequences of it

**Configure Node Post-Processors**

The `post-processors` work by `filtering` and `augmenting` the retrieved nodes. This is a more advanced feature that can improve the relevancy of the retrieved nodes and reduce the time and number of calls to LLMs. To demonstrate its usage, let's compare the results for this question:

**Question**: What is the content of Hamlet's letter to Horatio?

**Answer**: It explains that he escaped from Rosencrantz and Guildenstern onto a pirate ship. The pirates are treating him well because they want favours from him. He wants Horatio to deliver letters to the King and Queen and then come to see him.

In [79]:
query_engine = index.as_query_engine()
response = query_engine.query("What is the content of Hamlet's letter?")

print(response)


Hamlet's letter is to his friend Horatio, informing him that Rosencrantz and Guildenstern are on their way to England and that he has something important to tell him. He also expresses his grief over his mother's remarriage to his uncle so soon after his father's death. He laments the frailty of women and the unfairness of the world.


In [83]:
from llama_index.indices.postprocessor.node import KeywordNodePostprocessor

node_postprocessors = [
    KeywordNodePostprocessor(
        required_keywords=["Rosencrantz", "Guildenstern"]
    )
]
query_engine = index.as_query_engine(node_postprocessors=node_postprocessors)
response = query_engine.query("What is the content of Hamlet's letter?")

print(response)


Hamlet's letter is to a friend, instructing them to bring Rosencrantz and Guildenstern to him with as much speed as possible. He has words to tell them that will make them dumb, and he promises to tell them more when they arrive. He also asks his friend to deliver the letters he has sent to the King.
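Conceptually, a keyword post-processor sits between retrieval and synthesis and discards nodes that do not mention the required terms. The sketch below shows that filtering step on plain strings. It is a simplification rather than the library's implementation: the function name `filter_by_keywords` is hypothetical, and it assumes that a node is kept only when it contains every required keyword, using case-sensitive substring matching.

```python
from typing import Iterable, List, Sequence


def filter_by_keywords(texts: Iterable[str], required_keywords: Sequence[str]) -> List[str]:
    """Keep only the texts that contain every one of the required keywords."""
    return [
        text
        for text in texts
        if all(keyword in text for keyword in required_keywords)
    ]
```

Filtering like this changes which context the LLM sees, not how it answers, so the answer only improves if the surviving nodes actually contain the relevant passage.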


## Parse the Response

Before the result is presented to users, you can format the `Response` object as needed.

In [87]:
query_engine = index.as_query_engine()
response = query_engine.query("What, according to Hamlet, keeps a person from escaping the troubles of this life?")

In [90]:
from IPython.display import Markdown, display

display(Markdown(f"{response}"))


According to Hamlet, the fear of something after death, the unknown of the "undiscovered country from whose bourn no traveller returns," keeps a person from escaping the troubles of this life.

In [91]:
print(response.get_formatted_sources())

> Source (Doc id: d9c09b96-8dd9-4721-841b-82c05aee7dad): page_label: 63
file_name: hamlet.pdf

Your loneliness. We are oft to blame in this,--
'Tis too mu...

> Source (Doc id: 65ddf2d1-501b-4a70-8c80-608ee84c9e1f): page_label: 49
file_name: hamlet.pdf

HAMLET
Then is doomsday near: but your news is not true.
Le...


In [92]:
print(response.source_nodes)

[NodeWithScore(node=Node(text="Your loneliness. We are oft to blame in this,--\n'Tis too much proved--that with devotion's visage\nAnd pious action we do sugar o'er\nThe devil himself.\nKING CLAUDIUS\nAside\nO, 'tis too true!\nHow smart a lash that speech doth give my conscience!\nThe harlot's cheek, beautied with plastering art,\nIs not more ugly to the thing that helps it\nThan is my deed to my most painted word:\nO heavy burthen!\nLORD POLONIUS\nI hear him coming: let's withdraw, my lord.\nExeunt KING CLAUDIUS and POLONIUS\nEnter HAMLET\nHAMLET\nTo be, or not to be: that is the question:\nWhether 'tis nobler in the mind to suffer\nThe slings and arrows of outrageous fortune,\nOr to take arms against a sea of troubles,\nAnd by opposing end them? To die: to sleep;\nNo more; and by a sleep to say we end\nThe heart-ache and the thousand natural shocks\nThat flesh is heir to, 'tis a consummation\nDevoutly to be wish'd. To die, to sleep;\nTo sleep: perchance to dream: ay, there's the rub;