In [20]:
from llama_index import download_loader, ServiceContext, VectorStoreIndex
from dotenv import load_dotenv, find_dotenv
from llama_index.llms import OpenAI
import openai
from llama_index.embeddings import OpenAIEmbedding
from llama_index.embeddings.openai import OpenAIEmbeddingModelType
import os

_ = load_dotenv(find_dotenv()) # read local .env file
openai.api_key = os.environ['OPENAI_API_KEY']

WikipediaReader = download_loader("WikipediaReader")

# Load three Wikipedia pages, using the titles exactly as given
# (no search auto-suggestion, no redirects)
loader = WikipediaReader()
pages = ['Nicolas_Cage', 'The_Best_of_Times_(1981_film)', 'Leonardo DiCaprio']
documents = loader.load_data(pages=pages, auto_suggest=False, redirect=False)

# Two deterministic LLMs: a chat model and a legacy completion model
llm = OpenAI(temperature=0, model="gpt-3.5-turbo")
gpt3 = OpenAI(temperature=0, model="text-davinci-003")

embed_model = OpenAIEmbedding(model=OpenAIEmbeddingModelType.TEXT_EMBED_ADA_002)

# Small chunks (256 tokens) with no overlap between neighbouring chunks
service_context_gpt3 = ServiceContext.from_defaults(
    llm=gpt3, chunk_size=256, chunk_overlap=0, embed_model=embed_model
)

index = VectorStoreIndex.from_documents(documents, service_context=service_context_gpt3)

# Return the 3 chunks most similar to the question
retriever = index.as_retriever(similarity_top_k=3)
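The `chunk_size=256, chunk_overlap=0` settings decide which sentences end up together in one retrievable unit. Below is a minimal sketch of fixed-size chunking, assuming whitespace-separated words stand in for the tokens LlamaIndex actually counts; `chunk_words` is a hypothetical helper, not part of the library. With no overlap, a fact that straddles a boundary is split across two chunks and neither chunk contains it whole.

```python
def chunk_words(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Split text into windows of `chunk_size` words; each window shares
    `chunk_overlap` words with the previous one (words stand in for tokens)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


sentence = "Cage made his acting debut in the 1981 television pilot The Best of Times"
print(chunk_words(sentence, chunk_size=8))
# ['Cage made his acting debut in the 1981', 'television pilot The Best of Times']
print(chunk_words(sentence, chunk_size=8, chunk_overlap=3))
```

With an overlap, consecutive chunks repeat a few words, so a sentence cut at a boundary still appears intact in at least one chunk more often, at the cost of indexing more text.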

In [2]:
# Baseline: a hand-written QA prompt that is filled with the retrieved chunks
from llama_index.prompts import PromptTemplate

template = (
    "We have provided context information below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given this information, please answer the question: {query_str}\n"
    "Don't give an answer unless it is supported by the context above.\n"
)

qa_template = PromptTemplate(template)
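Filling the template is plain string substitution: the retrieved chunks are joined with blank lines into `{context_str}` and the question goes into `{query_str}`. A minimal sketch that uses Python's `str.format` in place of `PromptTemplate.format` (an assumption that both substitute the variables the same way); `build_prompt` is a hypothetical helper.

```python
# Same template text as the cell above.
template_text = (
    "We have provided context information below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given this information, please answer the question: {query_str}\n"
    "Don't give an answer unless it is supported by the context above.\n"
)


def build_prompt(chunks: list[str], question: str) -> str:
    """Join retrieved chunks with blank lines and fill both template slots."""
    return template_text.format(context_str="\n\n".join(chunks), query_str=question)


prompt = build_prompt(["Chunk one.", "Chunk two."], "What is in chunk two?")
print(prompt)
```

The last instruction line is what lets the model answer "the context does not provide information" instead of guessing, as Query 1 shows.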

### Query 1

In [3]:
question = "Who directed the pilot that marked the acting debut of Nicolas Cage?"

contexts = retriever.retrieve(question)

# Build a single text prompt from the retrieved chunks (sent via llm.complete)
context_list = [n.get_content() for n in contexts]

prompt = qa_template.format(context_str="\n\n".join(context_list), query_str=question)

response = llm.complete(prompt)
print(str(response))

The context does not provide information about who directed the pilot that marked the acting debut of Nicolas Cage.


In [4]:
context_list

['It blew my mind. I was like, \'That\'s what I want to do\'."At age 15, he tried to convince his uncle, Francis Ford Coppola, to give him a screen test, telling him "I\'ll show you acting." His outburst was met with "silence in the car". By this stage of his career, Coppola had already directed Marlon Brando, Al Pacino, Gene Hackman and Robert De Niro. Although early in his career Cage appeared in some of his uncle\'s films, he changed his name to Nicolas Cage to avoid the appearance of nepotism as Coppola\'s nephew. His choice of name was inspired by the Marvel Comics superhero Luke Cage and composer John Cage.\n\n\n== Career ==\n\n\n=== 1981–1988: Early work and breakthrough ===\nCage made his acting debut in the 1981 television pilot The Best of Times, which was never picked up by ABC. His film debut followed in 1982, with a minor role as an unnamed co-worker of Judge Reinhold\'s character in the coming-of-age film Fast Times at Ridgemont High, having originally auditioned for Rein
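The retrieved chunk above mentions The Best of Times but not its director, so the refusal in the previous cell is correct given the context the model saw. Which chunks reach the prompt is decided purely by similarity ranking. A toy sketch of top-k cosine retrieval follows; the 3-dimensional vectors and chunk ids are invented for illustration and are not ada-002 embeddings. It shows how several chunks that merely mention Cage can crowd a more specific chunk out of the top 3.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the ids of the k chunks most similar to the query vector."""
    ranked = sorted(chunk_vecs, key=lambda cid: cosine(query_vec, chunk_vecs[cid]), reverse=True)
    return ranked[:k]


# Hypothetical "embeddings" for five chunks.
chunk_vecs = {
    "cage_name_origin": [0.85, 0.15, 0.05],
    "cage_early_career": [0.9, 0.1, 0.0],
    "cage_family": [0.7, 0.3, 0.1],
    "best_of_times_film": [0.4, 0.8, 0.1],
    "dicaprio_education": [0.0, 0.1, 0.9],
}
query_vec = [0.8, 0.2, 0.0]
print(top_k(query_vec, chunk_vecs, k=3))
# ['cage_name_origin', 'cage_early_career', 'cage_family']
```

Raising `similarity_top_k` widens the net but also adds more loosely related text to the prompt.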

### Query 2

In [21]:
question = "Compare the education received by Nicolas Cage and Leonardo DiCaprio."

contexts = retriever.retrieve(question)

# Build a single text prompt from the retrieved chunks (sent via llm.complete)
context_list = [n.get_content() for n in contexts]

prompt = qa_template.format(context_str="\n\n".join(context_list), query_str=question)

response = llm.complete(prompt)
print(str(response))

Based on the provided context, both Nicolas Cage and Leonardo DiCaprio had different educational experiences. DiCaprio attended the Los Angeles Center for Enriched Studies for four years and later the Seeds Elementary School before enrolling at John Marshall High School. However, he dropped out of high school and eventually earned a general equivalency diploma. On the other hand, there is no specific information provided about Nicolas Cage's education in the given context.


In [22]:
context_list

['He has described his parents as "bohemian in every sense of the word" and as "the people I trust the most in the world". DiCaprio has stated that he grew up poor in a neighborhood plagued with prostitution, crime and violence. Attending the Los Angeles Center for Enriched Studies for four years and later the Seeds Elementary School, he later enrolled at the John Marshall High School. DiCaprio disliked public school and wanted to audition for acting jobs instead. He dropped out of high school later, eventually earning a general equivalency diploma.As a child, DiCaprio wanted to become either a marine biologist or an actor. He eventually favored the latter; he liked impersonating characters and imitating people, and enjoyed seeing their reactions to his acting. According to DiCaprio, his interest in performing began at the age of two when he went onto the stage at a performance festival and danced spontaneously to a positive response from the crowd. He was also motivated to learn actin

# Multi-Step Query

In [23]:
import logging

# Set logging level to WARNING to suppress INFO and DEBUG messages
logging.basicConfig(level=logging.WARNING)
logging.getLogger("httpx").setLevel(logging.WARNING)

from llama_index.indices.query.query_transform.base import (
    StepDecomposeQueryTransform,
)
from llama_index.query_engine.multistep_query_engine import (
    MultiStepQueryEngine,
)

# Query-decomposition transform driven by text-davinci-003 (gpt3)
step_decompose_transform_gpt3 = StepDecomposeQueryTransform(
    llm=gpt3, verbose=True
)
index_summary = "Used to answer questions"  # short description of the index, passed to the decomposition prompt

query_engine = index.as_query_engine(service_context=service_context_gpt3)

query_engine = MultiStepQueryEngine(
    query_engine=query_engine,
    query_transform=step_decompose_transform_gpt3,
    index_summary=index_summary,
    num_steps=2
)
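`MultiStepQueryEngine` alternates between rewriting the query (here with `StepDecomposeQueryTransform`, which sees the earlier sub-questions and answers) and answering the rewritten query, for at most `num_steps` rounds, before producing a final response. The sketch below models that loop with plain callables; it is a simplified stand-in, not the library's source, and `multi_step_query`, `rewrite`, `answer` and `combine` are hypothetical names. Stopping early when the rewriter returns `None` is an assumption about how the real engine decides it is done.

```python
from typing import Callable, List, Optional, Tuple

QAPairs = List[Tuple[str, str]]


def multi_step_query(
    query: str,
    rewrite: Callable[[str, QAPairs], Optional[str]],
    answer: Callable[[str], str],
    combine: Callable[[str, QAPairs], str],
    num_steps: int = 2,
) -> Tuple[str, QAPairs]:
    """Alternate rewrite -> answer for at most num_steps, then combine.

    rewrite() sees the original query plus every earlier (sub_q, sub_a)
    pair and returns the next sub-question, or None to stop early.
    """
    sub_qa: QAPairs = []
    for _ in range(num_steps):
        sub_q = rewrite(query, sub_qa)
        if sub_q is None:
            break
        sub_qa.append((sub_q, answer(sub_q)))
    return combine(query, sub_qa), sub_qa


# Hypothetical stubs standing in for the LLM-backed components.
PLAN = [
    "Which pilot marked the acting debut of Nicolas Cage?",
    "Who directed that pilot?",
]


def rewrite(query: str, history: QAPairs) -> Optional[str]:
    return PLAN[len(history)] if len(history) < len(PLAN) else None


def answer(sub_q: str) -> str:
    return f"<answer to: {sub_q}>"


def combine(query: str, history: QAPairs) -> str:
    return history[-1][1] if history else "No answer."


final, steps = multi_step_query(
    "Who directed the pilot that marked the acting debut of Nicolas Cage?",
    rewrite, answer, combine, num_steps=2,
)
print(final)  # <answer to: Who directed that pilot?>
for sub_q, sub_a in steps:
    print(sub_q, "->", sub_a)
```

Each `(sub_q, sub_a)` pair plays the role of one entry in `response.metadata["sub_qa"]`, which the real engine stores as `(query_str, Response)` tuples.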

### Query 1

In [25]:
response_gpt3 = query_engine.query(
    "Who directed the pilot that marked the acting debut of Nicolas Cage?",
)
print(str(response_gpt3))
sub_qa_q1 = response_gpt3.metadata["sub_qa"]
tuples = [(t[0], t[1].response) for t in sub_qa_q1]
print(tuples)

> Current query: Who directed the pilot that marked the acting debut of Nicolas Cage?
> New query:  Who directed the pilot that marked the acting debut of Nicolas Cage?
> Current query: Who directed the pilot that marked the acting debut of Nicolas Cage?
> New query:  Who directed the Best of Times pilot that marked the acting debut of Nicolas Cage?
Don Mischer directed the pilot that marked the acting debut of Nicolas Cage.
[(' Who directed the pilot that marked the acting debut of Nicolas Cage?', ' The Best of Times pilot that marked the acting debut of Nicolas Cage was not directed by anyone in the Coppola family. It was directed by Rod Amateau.'), (' Who directed the Best of Times pilot that marked the acting debut of Nicolas Cage?', ' Don Mischer directed the Best of Times pilot that marked the acting debut of Nicolas Cage.')]


In [26]:
sub_qa_q1

[(' Who directed the pilot that marked the acting debut of Nicolas Cage?',
  Response(response=' The Best of Times pilot that marked the acting debut of Nicolas Cage was not directed by anyone in the Coppola family. It was directed by Rod Amateau.', source_nodes=[NodeWithScore(node=TextNode(id_='669fa13f-f82f-45d2-a216-f34704b4faf5', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='90329e81-b998-4852-9749-1e0578a1c0f9', node_type=<ObjectType.DOCUMENT: '4'>, metadata={}, hash='bd8d403a56276363f7c740bd2bf0063238d599ff367d16c35a896ace089a6588'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='5f20a4a6-f50d-41fb-9926-cee921f4c96f', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='7ff2dfa6446c234ed67fc726db050efbb3306e952c8920e12d9bbf6d8836ed79'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='c45eea83-3dce-467f-ba41-0eee51210e0a', node_type=<ObjectType.

### Query 2

In [10]:
response_gpt3 = query_engine.query(
    "Compare the education received by Nicolas Cage and Leonardo DiCaprio.",
)
print(str(response_gpt3))
sub_qa = response_gpt3.metadata["sub_qa"]
tuples = [(t[0], t[1].response) for t in sub_qa]
print(tuples)

> Current query: Compare the education received by Nicolas Cage and Leonardo DiCaprio.
> New query:  What educational institutions did Nicolas Cage and Leonardo DiCaprio attend?
> Current query: Compare the education received by Nicolas Cage and Leonardo DiCaprio.
> New query:  What educational institutions did Nicolas Cage attend?
> Current query: Compare the education received by Nicolas Cage and Leonardo DiCaprio.
> New query:  What type of education did Nicolas Cage and Leonardo DiCaprio receive?
Nicolas Cage received education in the field of theater, film, and television at UCLA School of Theater, Film and Television. On the other hand, Leonardo DiCaprio attended the Los Angeles Center for Enriched Studies, Seeds Elementary School, and John Marshall High School. However, DiCaprio dropped out of high school and later earned a general equivalency diploma.
[(' What educational institution

In [11]:
sub_qa[1]

(' What educational institutions did Nicolas Cage attend?',
 Response(response=' Nicolas Cage attended UCLA School of Theater, Film and Television.', source_nodes=[NodeWithScore(node=TextNode(id_='347ca8de-0eb4-4215-9f05-32671276ce99', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='a9d91101-72a6-4a53-bb67-80fbcf63cd6f', node_type=<ObjectType.DOCUMENT: '4'>, metadata={}, hash='bd8d403a56276363f7c740bd2bf0063238d599ff367d16c35a896ace089a6588'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='f4a7a9b7-cff1-4090-befb-77415c87c448', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='0144e0f063f964c00b07bfe140e2d95582c5a462b2e50e3bc6dc6ebd51fc45c2')}, hash='3cf5fc4e01c79e88435e9b9c837fda9d4b1a4b50bc8f3a29423b1c4a2d8c5e7a', text='Nicolas Kim Coppola (born January 7, 1964), known by his stage name Nicolas Cage, is an American actor and film producer. He is the recipien

# Sub Question Query Engine

In [12]:
from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index.tools import QueryEngineTool, ToolMetadata
from llama_index.query_engine import SubQuestionQueryEngine
from llama_index.callbacks import CallbackManager, LlamaDebugHandler
from llama_index import ServiceContext

llama_debug = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([llama_debug])
service_context = ServiceContext.from_defaults(
    callback_manager=callback_manager, chunk_size=256, chunk_overlap=0
)
# build index and query engine
vector_query_engine = VectorStoreIndex.from_documents(
    documents, use_async=False, service_context=service_context
).as_query_engine(similarity_top_k=5)
# setup base query engine as tool
query_engine_tools = [
    QueryEngineTool(
        query_engine=vector_query_engine,
        metadata=ToolMetadata(
            name="leo and nic",
            description="Questions about actors",
        ),
    ),
]

query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    service_context=service_context,
    use_async=False
)

**********
Trace: index_construction
    |_node_parsing ->  0.103631 seconds
      |_chunking ->  0.045342 seconds
      |_chunking ->  0.000245 seconds
      |_chunking ->  0.044692 seconds
    |_embedding ->  2.139989 seconds
    |_embedding ->  1.646099 seconds
    |_embedding ->  1.841294 seconds
    |_embedding ->  0.761552 seconds
**********
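Unlike the multi-step engine, `SubQuestionQueryEngine` generates all of its sub-questions up front, routes each one to the query-engine tool it names, answers them independently and then synthesizes a final answer (the Query 2 log reports "Generated 2 sub questions."). A minimal sketch with stub components: `SubQuestion` here is a hypothetical dataclass mirroring the fields of llama_index's own `SubQuestion` (`sub_question`, `tool_name`), and `sub_question_query`, `generate` and `synthesize` are hypothetical stand-ins for the LLM-backed steps.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class SubQuestion:
    """Stand-in for a generated sub-question routed to a named tool."""
    sub_question: str
    tool_name: str


def sub_question_query(
    query: str,
    generate: Callable[[str, List[str]], List[SubQuestion]],
    tools: Dict[str, Callable[[str], str]],
    synthesize: Callable[[str, List[Tuple[str, str]]], str],
) -> Tuple[str, List[Tuple[str, str]]]:
    # 1. Decompose the query up front, given the available tool names.
    sub_qs = generate(query, list(tools))
    # 2. Answer each sub-question independently with the tool it names.
    qa_pairs = [
        (sq.sub_question, tools[sq.tool_name](sq.sub_question))
        for sq in sub_qs
        if sq.tool_name in tools  # ignore sub-questions routed to unknown tools
    ]
    # 3. Combine the sub-answers into one response.
    return synthesize(query, qa_pairs), qa_pairs


# Hypothetical stubs; the two sub-questions are the ones logged for Query 2.
def generate(query: str, tool_names: List[str]) -> List[SubQuestion]:
    return [
        SubQuestion("What is the education of Nicolas Cage?", "leo and nic"),
        SubQuestion("What is the education of Leonardo DiCaprio?", "leo and nic"),
    ]


tools = {"leo and nic": lambda q: f"<answer to: {q}>"}


def synthesize(query: str, qa_pairs: List[Tuple[str, str]]) -> str:
    return " ".join(a for _, a in qa_pairs)


response, pairs = sub_question_query(
    "Compare the education received by Nicolas Cage and Leonardo DiCaprio.",
    generate, tools, synthesize,
)
print(len(pairs))  # 2
print(response)
```

Because the sub-questions do not see each other's answers, this design suits comparisons of independent facts (as in Query 2) better than chained lookups (as in Query 1).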


### Query 1

In [13]:
response = query_engine.query(
    "Who directed the pilot that marked the acting debut of Nicolas Cage?"
)

print(response)

# iterate through sub_question items captured in SUB_QUESTION event
from llama_index.callbacks.schema import CBEventType, EventPayload

for i, (start_event, end_event) in enumerate(
    llama_debug.get_event_pairs(CBEventType.SUB_QUESTION)
):
    qa_pair = end_event.payload[EventPayload.SUB_QUESTION]
    print("Sub Question " + str(i) + ": " + qa_pair.sub_q.sub_question.strip())
    print("Answer: " + qa_pair.answer.strip())
    print("====================================")

Generated 1 sub questions.
[leo and nic] Q: Who directed the pilot that marked the acting debut of Nicolas Cage?
[leo and nic] A: The pilot that marked the acting debut of Nicolas Cage was directed by an unknown director.
**********
Trace: query
    |_query ->  3.560915 seconds
      |_llm ->  1.48143 seconds
      |_sub_question ->  1.186522 seconds
        |_query ->  1.186015 seconds
          |_retrieve ->  0.275159 seconds
            |_embedding ->  0.241721 seconds
          |_synthesize ->  0.910596 seconds
            |_templating ->  2e-05 seconds
            |_llm ->  0.903577 seconds
      |_synthesize ->  0.891443 seconds
        |_templating ->  2.7e-05 seconds
        |_llm ->  0.887538 seconds
**********
The director of the pilot that marked the acting debut of Nicolas Cage is unknown.
Sub Question 0: Who directed the pilot that marked the acting debut of Nicolas Cage?
Answer: The pilot that marked the acting debut of 

### Query 2

In [14]:
# Clear Query 1's events from the handler that is registered with service_context;
# a newly created handler would never receive this engine's events
llama_debug.flush_event_logs()

response = query_engine.query(
    "Compare the education received by Nicolas Cage and Leonardo DiCaprio."
)

print(response)

# iterate through sub_question items captured in SUB_QUESTION event
from llama_index.callbacks.schema import CBEventType, EventPayload

for i, (start_event, end_event) in enumerate(
    llama_debug.get_event_pairs(CBEventType.SUB_QUESTION)
):
    qa_pair = end_event.payload[EventPayload.SUB_QUESTION]
    print("Sub Question " + str(i) + ": " + qa_pair.sub_q.sub_question.strip())
    print("Answer: " + qa_pair.answer.strip())
    print("====================================")

Generated 2 sub questions.
[leo and nic] Q: What is the education of Nicolas Cage?
[leo and nic] A: Nicolas Cage attended UCLA School of Theater, Film and Television.
[leo and nic] Q: What is the education of Leonardo DiCaprio?
[leo and nic] A: Leonardo DiCaprio attended the Los Angeles Center for Enriched Studies for four years and later the Seeds Elementary School. He later enrolled at the John Marshall High School but dropped out to pursue acting. He eventually earned a general equivalency diploma.
**********
Trace: query
    |_query ->  6.600185 seconds
      |_llm ->  1.713314 seconds
      |_sub_question ->  1.408598 seconds
        |_query ->  1.408146 seconds
          |_retrieve ->  0.414056 seconds
            |_embedding ->  0.382935 seconds
          |_synthesize ->  0.993901 seconds
            |_templating ->  1.7e-05 seconds
            |_llm ->  0.987334 seconds
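Callback handlers only receive events from components built with the `CallbackManager` that holds them. Re-creating `llama_debug` and `callback_manager` without rebuilding the service context and query engine therefore leaves the engine reporting to the original handler, and the new handler's `get_event_pairs` returns nothing; clearing the registered handler between queries (for example with `llama_debug.flush_event_logs()`) keeps events from consecutive queries apart. The toy classes below (`ToyDebugHandler`, `ToyCallbackManager`, both hypothetical) model only that routing, not LlamaIndex's real start/end event pairing.

```python
from collections import defaultdict
from typing import Dict, List


class ToyDebugHandler:
    """Records payloads by event type; stands in for LlamaDebugHandler."""

    def __init__(self) -> None:
        self.events: Dict[str, List[dict]] = defaultdict(list)

    def on_event(self, event_type: str, payload: dict) -> None:
        self.events[event_type].append(payload)

    def get_events(self, event_type: str) -> List[dict]:
        return list(self.events[event_type])

    def flush(self) -> None:
        self.events.clear()


class ToyCallbackManager:
    """Forwards every event to the handlers it was constructed with."""

    def __init__(self, handlers: List[ToyDebugHandler]) -> None:
        self.handlers = handlers

    def emit(self, event_type: str, payload: dict) -> None:
        for handler in self.handlers:
            handler.on_event(event_type, payload)


# The engine was built with this manager, so only `registered` hears it.
registered = ToyDebugHandler()
manager = ToyCallbackManager([registered])

manager.emit("sub_question", {"q": "query 1"})
registered.flush()  # drop the previous query's events
manager.emit("sub_question", {"q": "query 2, part a"})
manager.emit("sub_question", {"q": "query 2, part b"})

fresh = ToyDebugHandler()  # never attached to `manager`
print(len(registered.get_events("sub_question")))  # 2
print(len(fresh.get_events("sub_question")))  # 0
```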
     

# HyDE Query Transform

In [15]:
from llama_index.indices.query.query_transform import HyDEQueryTransform
from llama_index.query_engine.transform_query_engine import (
    TransformQueryEngine,
)

index = VectorStoreIndex.from_documents(documents, service_context=service_context_gpt3)
query_engine = index.as_query_engine(similarity_top_k=5)

hyde = HyDEQueryTransform(include_original=True)
hyde_query_engine = TransformQueryEngine(query_engine, hyde)

### Query 1

In [16]:
response = hyde_query_engine.query("Who directed the pilot that marked the acting debut of Nicolas Cage?")
print(response)

 The Best of Times, the television pilot that marked the acting debut of Nicolas Cage, was not picked up by ABC.


In this example, HyDE significantly improves the output, even though it works by hallucinating: the answer now identifies the pilot as The Best of Times, although it still does not name its director.

The Hypothetical Document Embeddings (HyDE) query transform uses an LLM to generate one or more hypothetical answers to a given query and uses the resulting documents as the embedding strings for retrieval, as described in [Precise Zero-Shot Dense Retrieval without Relevance Labels](https://arxiv.org/abs/2212.10496).
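HyDE can help retrieval even when the hypothetical document is wrong, because only its wording is embedded, not its facts: the hallucinated document shown in the next cell credits Francis Ford Coppola, yet its vocabulary still resembles passages about the pilot. With `include_original=True`, the original query is kept as a second embedding string; as we understand the legacy vector retriever, it embeds every string and averages the vectors into one query embedding. The sketch below models that with a hypothetical five-word bag-of-words `embed` function in place of ada-002, and a fixed `fake_generate_doc` in place of the LLM.

```python
import re
from typing import Callable, List

VOCAB = ["directed", "pilot", "cage", "television", "1981"]  # hypothetical toy vocabulary


def embed(text: str) -> List[float]:
    """Toy bag-of-words 'embedding' over VOCAB (stand-in for ada-002)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [float(words.count(v)) for v in VOCAB]


def mean_vector(vectors: List[List[float]]) -> List[float]:
    """Element-wise average of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def hyde_query_vector(
    query: str,
    generate_doc: Callable[[str], str],
    embed_fn: Callable[[str], List[float]],
    include_original: bool = True,
) -> List[float]:
    embedding_strs = [generate_doc(query)]  # hypothetical document first
    if include_original:
        embedding_strs.append(query)  # keep the user's query as well
    return mean_vector([embed_fn(s) for s in embedding_strs])


def fake_generate_doc(query: str) -> str:
    # Hypothetical LLM output; only its wording matters for retrieval.
    return "The television pilot was directed in 1981."


q = "Who directed the pilot that marked the acting debut of Nicolas Cage?"
print(embed(q))  # [1.0, 1.0, 1.0, 0.0, 0.0]
print(hyde_query_vector(q, fake_generate_doc, embed))  # [1.0, 1.0, 0.5, 0.5, 0.5]
```

The averaged vector now also points toward passages that mention the television pilot and 1981, terms the original question never used.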


In [17]:
query_bundle = hyde("Who directed the pilot that marked the acting debut of Nicolas Cage?")
hyde_doc = query_bundle.embedding_strs[0]
hyde_doc

'The pilot that marked the acting debut of Nicolas Cage was directed by his uncle, Francis Ford Coppola. This significant moment in Cage\'s career occurred in 1981 when he was cast in the television movie "Best of Times." Coppola, a renowned filmmaker himself, took on the role of director for this project, showcasing his support and belief in his nephew\'s talent. The pilot served as a stepping stone for Cage, propelling him into the world of acting and setting the stage for his future success in the industry. With Coppola\'s guidance, Cage was able to make a memorable first impression, laying the foundation for his illustrious career as one of Hollywood\'s most versatile and acclaimed actors.'

### Query 2

In [18]:
response = hyde_query_engine.query("Compare the education received by Nicolas Cage and Leonardo DiCaprio.")
print(response)


Nicolas Cage attended UCLA School of Theater, Film and Television and had his first non-cinematic acting experience in a school production of Golden Boy. Leonardo DiCaprio attended the Los Angeles Center for Enriched Studies for four years and later the Seeds Elementary School, and later enrolled at the John Marshall High School. DiCaprio disliked public school and wanted to audition for acting jobs instead. He dropped out of high school later, eventually earning a general equivalency diploma.


In [19]:
query_bundle = hyde("Compare the education received by Nicolas Cage and Leonardo DiCaprio.")
hyde_doc = query_bundle.embedding_strs[0]
hyde_doc

"Nicolas Cage and Leonardo DiCaprio, two renowned actors in Hollywood, have both achieved great success in their careers. However, when it comes to their education, they have taken different paths. \n\nNicolas Cage, born Nicolas Kim Coppola, comes from a family deeply rooted in the entertainment industry. Despite his family's background, Cage decided to pursue a formal education in acting. He attended the prestigious Beverly Hills High School, known for its strong performing arts program. During his time there, Cage honed his acting skills and participated in various school productions. After graduating, he continued his education at the American Conservatory Theater in San Francisco, where he further refined his craft. Cage's dedication to his education undoubtedly played a significant role in shaping his acting abilities and contributed to his successful career.\n\nOn the other hand, Leonardo DiCaprio's educational journey took a different route. Born in Los Angeles, DiCaprio grew up