# Getting Started with LlamaIndex

**Disclaimer**: LlamaIndex is under active development, so expect frequent changes. Always refer to the code for the version of LlamaIndex you are using for up-to-date information. For example, at the time of writing, `GPTVectorStoreIndex` is still available in `v0.6.19`, but work to [remove the GPT prefix](https://github.com/jerryjliu/llama_index/pull/5822) is underway.

In [None]:
!pip install llama-index

In [None]:
import os
import dotenv

dotenv.load_dotenv()

## Load in Documents

In [7]:
from llama_index import SimpleDirectoryReader

documents = SimpleDirectoryReader('data').load_data()
print(documents[0])

Document(text='The Tragedy of Hamlet, Prince of\nDenmark\nASCII text placed in the public domain by Moby Lexical Tools, 1992. SGML markup by Jon Bosak,\n1992-1994. XML version by Jon Bosak, 1996-1999. Simplified XML version by Max Froumentin, 2001. The\nXML markup in this version is Copyright © 1999 Jon Bosak. This work may freely be distributed on condition\nthat it not be modified or altered in any way.', doc_id='fe86ebaf-87d1-478c-97c9-8a4d922e9aed', embedding=None, doc_hash='dc9a9296c1508709c152f8fe8fd787359df4934ddd203486243327a44b6f92bb', extra_info={'page_label': '1', 'file_name': 'hamlet.pdf'})


**Customise a Document**

In [31]:
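from llama_index.readers import Document

# Build a new Document from the original's text; passing the Document object
# itself as the first argument would store the object rather than its text.
customised_document = Document(
    text=documents[0].text,
    extra_info={
        'filename': 'hamlet.pdf',
        'genre': 'drama'
    }
)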

## Parse the Documents into Nodes

In [6]:
from llama_index.node_parser import SimpleNodeParser

parser = SimpleNodeParser()
nodes = parser.get_nodes_from_documents(documents)

print(nodes[0])

Node(text='The Tragedy of Hamlet, Prince of\nDenmark\nASCII text placed in the public domain by Moby Lexical Tools, 1992. SGML markup by Jon Bosak,\n1992-1994. XML version by Jon Bosak, 1996-1999. Simplified XML version by Max Froumentin, 2001. The\nXML markup in this version is Copyright © 1999 Jon Bosak. This work may freely be distributed on condition\nthat it not be modified or altered in any way.', doc_id='3768ae18-11fb-4964-93ea-7a80ad9e95ba', embedding=None, doc_hash='dc9a9296c1508709c152f8fe8fd787359df4934ddd203486243327a44b6f92bb', extra_info={'page_label': '1', 'file_name': 'hamlet.pdf'}, node_info={'start': 0, 'end': 388, '_node_type': <NodeType.TEXT: '1'>}, relationships={<DocumentRelationship.SOURCE: '1'>: 'e878f013-2f3f-49d9-9ef0-34cc125a60b0'})
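To make the `node_info` offsets in the output above more concrete, here is a minimal, library-free sketch of cutting a text into chunks that remember their `start` and `end` character positions. It is only an illustration: `ToyNode` and `split_into_nodes` are hypothetical names, and the fixed-width character split is an assumption chosen for simplicity, not how `SimpleNodeParser` (which delegates to a text splitter) does it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyNode:
    """Hypothetical stand-in for a node: a text chunk plus its source offsets."""
    text: str
    start: int
    end: int


def split_into_nodes(text: str, chunk_size: int) -> List[ToyNode]:
    """Cut `text` into consecutive chunks of at most `chunk_size` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        ToyNode(
            text=text[i:i + chunk_size],
            start=i,
            end=min(i + chunk_size, len(text)),
        )
        for i in range(0, len(text), chunk_size)
    ]


sample_text = "To be, or not to be: that is the question"
toy_nodes = split_into_nodes(sample_text, chunk_size=16)
for node in toy_nodes:
    print(node.start, node.end, repr(node.text))
```

Joining the chunks back together in order reproduces the original text, which is a handy sanity check for any splitter that does not use overlap.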


## Construct Index (from Nodes or Documents)

In [17]:
from llama_index import GPTVectorStoreIndex

index = GPTVectorStoreIndex.from_documents(documents)

**Reuse Nodes**

In [25]:
from llama_index import StorageContext, GPTListIndex, GPTVectorStoreIndex

storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)

index1 = GPTVectorStoreIndex(nodes, storage_context = storage_context)
index2 = GPTListIndex(nodes, storage_context = storage_context)

**Insert Nodes or Documents**

In [29]:
from llama_index import GPTVectorStoreIndex

index = GPTVectorStoreIndex([])
index.insert_nodes(nodes)

In [28]:
from llama_index import GPTVectorStoreIndex

index = GPTVectorStoreIndex([])
for document in documents:
    index.insert(document)

**Customise LLMs**

The default model is `text-davinci-003`, but you can specify a different one.

In [34]:
from langchain import OpenAI
from llama_index import LLMPredictor, GPTVectorStoreIndex, ServiceContext

llm = OpenAI(model_name = 'text-davinci-003')
llm_predictor = LLMPredictor(llm = llm)
service_context = ServiceContext.from_defaults(llm_predictor = llm_predictor)

index = GPTVectorStoreIndex.from_documents(documents, service_context = service_context)

## Persist Index

In [36]:
from llama_index import StorageContext, load_index_from_storage

index.storage_context.persist(persist_dir = 'storage')

storage_context = StorageContext.from_defaults(persist_dir = 'storage')
index = load_index_from_storage(storage_context)

## Query Index

In [38]:
query_engine = index.as_query_engine()
response = query_engine.query("What, according to Hamlet, keeps a person from escaping the troubles of this life?")

print(response)


According to Hamlet, what keeps a person from escaping the troubles of this life is the fear of the unknown after death. He states that the "dread of something after death, the undiscover'd country from whose bourn no traveller returns, puzzles the will and makes us rather bear those ills we have than fly to others that we know not of."


In [94]:
from llama_index import ResponseSynthesizer
from llama_index.retrievers import VectorIndexRetriever
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.indices.postprocessor import SimilarityPostprocessor

retriever = VectorIndexRetriever(index=index, similarity_top_k=5)

response_synthesizer = ResponseSynthesizer.from_args(
    node_postprocessors=[
        SimilarityPostprocessor(similarity_cutoff=0.7)
    ]
)

query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)

response = query_engine.query("What prompts Hamlet to say, 'My thoughts be bloody or nothing worth!'?")
print(response)


Hamlet is prompted to say, "My thoughts be bloody or nothing worth!" after he reflects on the imminent death of twenty thousand men who are going to their graves for a plot of land that is not worth fighting for. He is overwhelmed by the senselessness of the situation and is determined to take revenge for the injustice.
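The query engine above combines two steps: the retriever keeps the `similarity_top_k` best-matching nodes, and the post-processor then drops any whose score falls below `similarity_cutoff`. The sketch below mimics that pipeline with plain Python and cosine similarity. The two-dimensional vectors, the node ids and the helper names are all made up for illustration, and keeping scores equal to the cutoff is an assumption rather than a documented guarantee.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_vector: List[float],
    node_vectors: Dict[str, List[float]],
    top_k: int,
    cutoff: float,
) -> List[Tuple[str, float]]:
    """Rank nodes by similarity, keep the best `top_k`, then drop scores below `cutoff`."""
    scored = [
        (node_id, cosine_similarity(query_vector, vector))
        for node_id, vector in node_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(node_id, score) for node_id, score in scored[:top_k] if score >= cutoff]


# Made-up two-dimensional "embeddings"; real embeddings have far more dimensions.
toy_vectors = {
    "node-a": [1.0, 0.0],
    "node-b": [0.8, 0.6],
    "node-c": [0.0, 1.0],
}

results = retrieve([1.0, 0.0], toy_vectors, top_k=2, cutoff=0.7)
print(results)
```

Raising the cutoff trades recall for precision: fewer nodes reach the LLM, so prompts are shorter, but a relevant passage with a middling score may be lost.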


**Configure Retriever**

For a list index, the default retriever is `ListIndexRetriever`.

In [42]:
retriever = index.as_retriever(retriever_mode='default') # ListIndexRetriever

query_engine = RetrieverQueryEngine(retriever)
response = query_engine.query("What, according to Hamlet, keeps a person from escaping the troubles of this life?")
print(response)


According to Hamlet, the fear of something after death, the unknown of the "undiscovered country from whose bourn no traveller returns," keeps a person from escaping the troubles of this life.


In [44]:
retriever = index.as_retriever(retriever_mode='embedding') # ListIndexEmbeddingRetriever

query_engine = RetrieverQueryEngine(retriever)
response = query_engine.query("What, according to Hamlet, keeps a person from escaping the troubles of this life?")
print(response)


According to Hamlet, the fear of something after death, the unknown of the "undiscovered country from whose bourn no traveller returns," keeps a person from escaping the troubles of this life.


**Configure Response Synthesis**

In [52]:
from llama_index.query_engine import RetrieverQueryEngine

retriever = index.as_retriever()
question = "What prompts Hamlet to say, 'My thoughts be bloody or nothing worth!'?"

# Run the same question through every response mode to compare the answers.
response_modes = [
    'refine',
    'compact',
    'simple_summarize',
    'tree_summarize',
    'generation',
    'no_text',
    'accumulate',
    'compact_accumulate',
]

for response_mode in response_modes:
    query_engine = RetrieverQueryEngine.from_args(retriever, response_mode=response_mode)
    response = query_engine.query(question)
    print(f"response_mode='{response_mode}': {response}\n")

response_mode='refine': 

Hamlet is prompted to say, 'My thoughts be bloody or nothing worth!' after he reflects on the fact that twenty thousand men are going to their graves for a fantasy and trick of fame, and that their deaths are not even enough to hide the slain. He is overwhelmed by the tragedy of the situation and decides that his thoughts must be bloody in order to be of any worth. He is further prompted by the realization that he has the cause, will, strength, and means to take revenge for his father's death, yet he is still hesitating. He is disgusted by his own cowardice and decides that if he is to be great, he must find the courage to act, even if it is for something as insignificant as a straw.

response_mode='compact': 
Hamlet is prompted to say, "My thoughts be bloody or nothing worth!" after reflecting on the imminent death of twenty thousand men who are fighting for a cause that cannot be tried by numbers. He is overwhelmed by the senselessness of the situation and i
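The response modes differ mainly in how retrieved chunks are fed to the LLM. As a rough sketch, with a fake LLM standing in for the real one: `refine` makes one call per chunk and carries the previous answer forward, `compact` first packs chunks into as few prompts as will fit and then refines over those, and `accumulate` answers each chunk separately and joins the answers. The function names, the prompt strings and the character-based `max_chars` budget are assumptions made for illustration; the library's own prompts and token accounting are more involved.

```python
from typing import Callable, List


def refine(question: str, chunks: List[str], llm: Callable[[str], str]) -> str:
    """Visit chunks one at a time, feeding the previous answer back into the prompt."""
    answer = ""
    for chunk in chunks:
        answer = llm(f"Q: {question}\nContext: {chunk}\nPrevious answer: {answer}")
    return answer


def compact(question: str, chunks: List[str], llm: Callable[[str], str], max_chars: int) -> str:
    """Pack chunks into contexts of at most `max_chars`, then refine over the packed contexts."""
    packed: List[str] = []
    for chunk in chunks:
        if packed and len(packed[-1]) + 1 + len(chunk) <= max_chars:
            packed[-1] += "\n" + chunk
        else:
            packed.append(chunk)
    return refine(question, packed, llm)


def accumulate(question: str, chunks: List[str], llm: Callable[[str], str]) -> str:
    """Answer each chunk independently and join the individual answers."""
    return "\n---\n".join(llm(f"Q: {question}\nContext: {chunk}") for chunk in chunks)


calls: List[str] = []


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM: records the prompt and returns a numbered answer."""
    calls.append(prompt)
    return f"answer #{len(calls)}"


toy_chunks = ["aaa", "bbb", "ccc"]

refine("q", toy_chunks, fake_llm)
refine_calls = len(calls)  # one call per chunk

calls.clear()
compact("q", toy_chunks, fake_llm, max_chars=7)
compact_calls = len(calls)  # fewer calls, because two chunks share one prompt

calls.clear()
accumulated = accumulate("q", toy_chunks, fake_llm)

print(refine_calls, compact_calls)
print(accumulated)
```

Counting calls to the fake LLM shows why `compact` is usually cheaper than `refine` on the same set of chunks.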

**Configure Node Post-processors**

This is a powerful feature: it improves the relevance of the retrieved nodes and reduces both the time spent on, and the number of, LLM calls. Post-processors work by filtering and augmenting the retrieved nodes. To demonstrate their usage, let's take a look at this example:

**Question**: What is the content of Hamlet's letter to Horatio?

**Answer**: It explains that he escaped from Rosencrantz and Guildenstern onto a pirate ship. The pirates are treating him well because they want favors from him. He wants Horatio to deliver letters to the King and Queen and then come to see him.

In [79]:
query_engine = index.as_query_engine()
response = query_engine.query("What is the content of Hamlet's letter?")

print(response)


Hamlet's letter is to his friend Horatio, informing him that Rosencrantz and Guildenstern are on their way to England and that he has something important to tell him. He also expresses his grief over his mother's remarriage to his uncle so soon after his father's death. He laments the frailty of women and the unfairness of the world.


In [83]:
from llama_index.indices.postprocessor.node import KeywordNodePostprocessor

node_postprocessors = [
    KeywordNodePostprocessor(
        required_keywords=["Rosencrantz", "Guildenstern"]
    )
]
query_engine = index.as_query_engine(node_postprocessors=node_postprocessors)
response = query_engine.query("What is the content of Hamlet's letter?")

print(response)


Hamlet's letter is to a friend, instructing them to bring Rosencrantz and Guildenstern to him with as much speed as possible. He has words to tell them that will make them dumb, and he promises to tell them more when they arrive. He also asks his friend to deliver the letters he has sent to the King.
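The filtering step can be pictured as a simple predicate over the retrieved texts. The sketch below keeps a text only if it contains every required keyword and none of the excluded ones. `keyword_filter` is a hypothetical helper, and plain case-sensitive substring matching is an assumption; the matching rules of `KeywordNodePostprocessor` may differ between versions.

```python
from typing import List, Sequence


def keyword_filter(
    texts: Sequence[str],
    required_keywords: Sequence[str] = (),
    exclude_keywords: Sequence[str] = (),
) -> List[str]:
    """Keep texts that contain all required keywords and no excluded keyword."""
    kept = []
    for text in texts:
        if not all(keyword in text for keyword in required_keywords):
            continue  # a required keyword is missing
        if any(keyword in text for keyword in exclude_keywords):
            continue  # an excluded keyword is present
        kept.append(text)
    return kept


# Invented snippets standing in for the text of retrieved nodes.
toy_texts = [
    "Rosencrantz and Guildenstern hold their course for England.",
    "Frailty, thy name is woman!",
    "Rosencrantz alone was sent for.",
]

filtered = keyword_filter(toy_texts, required_keywords=["Rosencrantz", "Guildenstern"])
print(filtered)
```

Because only the first snippet mentions both names, it is the only one that would be passed on for response synthesis, which is how the filter steers the answer towards the letter to Horatio.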


## Parse the Response

In [87]:
query_engine = index.as_query_engine()
response = query_engine.query("What, according to Hamlet, keeps a person from escaping the troubles of this life?")

In [90]:
from IPython.display import Markdown, display

display(Markdown(f"{response}"))


According to Hamlet, the fear of something after death, the unknown of the "undiscovered country from whose bourn no traveller returns," keeps a person from escaping the troubles of this life.

In [91]:
print(response.get_formatted_sources())

> Source (Doc id: d9c09b96-8dd9-4721-841b-82c05aee7dad): page_label: 63
file_name: hamlet.pdf

Your loneliness. We are oft to blame in this,--
'Tis too mu...

> Source (Doc id: 65ddf2d1-501b-4a70-8c80-608ee84c9e1f): page_label: 49
file_name: hamlet.pdf

HAMLET
Then is doomsday near: but your news is not true.
Le...


In [92]:
print(response.source_nodes)

[NodeWithScore(node=Node(text="Your loneliness. We are oft to blame in this,--\n'Tis too much proved--that with devotion's visage\nAnd pious action we do sugar o'er\nThe devil himself.\nKING CLAUDIUS\nAside\nO, 'tis too true!\nHow smart a lash that speech doth give my conscience!\nThe harlot's cheek, beautied with plastering art,\nIs not more ugly to the thing that helps it\nThan is my deed to my most painted word:\nO heavy burthen!\nLORD POLONIUS\nI hear him coming: let's withdraw, my lord.\nExeunt KING CLAUDIUS and POLONIUS\nEnter HAMLET\nHAMLET\nTo be, or not to be: that is the question:\nWhether 'tis nobler in the mind to suffer\nThe slings and arrows of outrageous fortune,\nOr to take arms against a sea of troubles,\nAnd by opposing end them? To die: to sleep;\nNo more; and by a sleep to say we end\nThe heart-ache and the thousand natural shocks\nThat flesh is heir to, 'tis a consummation\nDevoutly to be wish'd. To die, to sleep;\nTo sleep: perchance to dream: ay, there's the rub;