# HyDE Query Transform Demo

#### Load documents, build the `GPTSimpleVectorIndex`

In [10]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from gpt_index import GPTSimpleVectorIndex, SimpleDirectoryReader
from gpt_index.indices.query.query_transform import HyDEQueryTransform
from IPython.display import Markdown, display

In [2]:
# load documents
documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()

In [3]:
index = GPTSimpleVectorIndex(documents)

INFO:root:> [build_index_from_documents] Total LLM token usage: 0 tokens
> [build_index_from_documents] Total LLM token usage: 0 tokens
INFO:root:> [build_index_from_documents] Total embedding token usage: 17598 tokens
> [build_index_from_documents] Total embedding token usage: 17598 tokens


In [4]:
# save index to disk
index.save_to_disk('index_simple.json')

In [5]:
# load index from disk
index = GPTSimpleVectorIndex.load_from_disk('index_simple.json')

## Example 1: HyDE improves specific temporal query

In [33]:
query_str = "what did paul graham do after going to RISD"

#### First, we query *without* transformation: the same query string is used for both embedding lookup and summarization.

In [34]:
response = index.query(query_str)
display(Markdown(f"<b>{response}</b>"))

INFO:root:> [query] Total LLM token usage: 3922 tokens
> [query] Total LLM token usage: 3922 tokens
INFO:root:> [query] Total embedding token usage: 12 tokens
> [query] Total embedding token usage: 12 tokens


> After going to RISD, Paul Graham continued to pursue his passion for painting and art. He took classes in the painting department at the Accademia di Belli Arti in Florence, and he also took the entrance exam for the school. He also continued to work on his book On Lisp, and he took on consulting work to make money. At the school, Paul Graham and the other students had an arrangement where the faculty wouldn't require the students to learn anything, and in return the students wouldn't require the faculty to teach anything. Paul Graham was one of the few students who actually painted the nude model that was provided, while the rest of the students spent their time chatting or occasionally trying to imitate things they'd seen in American art magazines. The model turned out to live just down the street from Paul Graham, and she made a living from a combination of modelling and making fakes for a local antique dealer.
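To make the untransformed path concrete, here is a minimal sketch that assumes a toy bag-of-words "embedding" in place of the real embedding model. The same `query` string is embedded for the lookup and then placed, unchanged, in the summarization prompt. The helpers `embed`, `cosine`, `retrieve` and `build_prompt`, along with the chunks, are hypothetical illustrations and not part of the `gpt_index` API.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy bag-of-words vector; a stand-in for a real embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 1) -> list[str]:
    # Embedding lookup: rank chunks by similarity to the raw query string.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]


def build_prompt(query: str, context: list[str]) -> str:
    # Summarization: the same raw query string goes into the LLM prompt.
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"


# Hypothetical chunks, made up for illustration.
chunks = [
    "After RISD I studied painting at the Accademia in Florence.",
    "We started Viaweb, software for building online stores, and Yahoo bought it.",
    "Y Combinator funded startups in batches.",
]
query = "what did paul graham do after going to RISD"
context = retrieve(query, chunks)
print(build_prompt(query, context))
```

On these made-up chunks, the short question matches only on "after" and "RISD", so the painting chunk wins. This loosely mirrors how the untransformed query above surfaced the painting material.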

#### Now, we use `HyDEQueryTransform` to generate a hypothetical document and use it, together with the original query (`include_original=True`), for embedding lookup.

In [36]:
hyde = HyDEQueryTransform(include_original=True)
response = index.query(query_str, query_transform=hyde)
display(Markdown(f"<b>{response}</b>"))

INFO:root:> [query] Total LLM token usage: 4394 tokens
> [query] Total LLM token usage: 4394 tokens
INFO:root:> [query] Total embedding token usage: 127 tokens
> [query] Total embedding token usage: 127 tokens


> After going to RISD, Paul Graham worked as a consultant for Interleaf and then co-founded Viaweb with Robert Morris. They created a software that allowed users to build websites via the web and received $10,000 in seed funding from Idelle's husband Julian. They gave Julian 10% of the company in return for the initial legal work and business advice. Paul Graham had a negative net worth due to taxes he owed, so the seed funding was necessary for him to live on. They opened for business in January 1996 with 6 stores.

> Paul Graham then left Yahoo after his options vested and went back to New York. He resumed his old life, but now he was rich. He tried to paint, but he didn't have much energy or ambition. He eventually moved back to Cambridge and started working on a web app for making web apps. He recruited Dan Giffin and two undergrads to help him, but he eventually realized he didn't want to run a company and decided to build a subset of the project as an open source project. He and Dan worked on a new dialect of Lisp, which he called Arc, in a house he bought in Cambridge. The subset he built as an open source project was the new Lisp, whose

#### In this example, `HyDE` improves output quality significantly: it hallucinates a plausible account of what Paul Graham did after RISD (Viaweb, Yahoo, Y Combinator; see below), which improves the embedding lookup and, in turn, the final output.

In [37]:
query_bundle = hyde(query_str)
hyde_doc = query_bundle.embedding_strs[0]

In [38]:
hyde_doc

'\nAfter graduating from the Rhode Island School of Design (RISD) in 1985, Paul Graham went on to pursue a career in computer programming. He worked as a software developer for several companies, including Viaweb, which he co-founded in 1995. Viaweb was eventually acquired by Yahoo in 1998, and Graham used the proceeds to become a venture capitalist. He founded Y Combinator in 2005, a startup accelerator that has helped launch over 2,000 companies, including Dropbox, Airbnb, and Reddit. Graham has also written several books on programming and startups, and he continues to be an active investor in the tech industry.'

> After graduating from the Rhode Island School of Design (RISD) in 1985, Paul Graham went on to pursue a career in computer programming. He worked as a software developer for several companies, including Viaweb, which he co-founded in 1995. Viaweb was eventually acquired by Yahoo in 1998, and Graham used the proceeds to become a venture capitalist. He founded Y Combinator in 2005, a startup accelerator that has helped launch over 2,000 companies, including Dropbox, Airbnb, and Reddit. Graham has also written several books on programming and startups, and he continues to be an active investor in the tech industry.
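The HyDE path can be sketched the same way, again using a toy bag-of-words embedding. The sketch rests on several assumptions. `generate_hypothetical_doc` is a hypothetical stand-in for the LLM call and returns a fixed string. The hypothetical document, plus the original query when `include_original=True`, are embedded and averaged into a single lookup vector; this aggregation is assumed and may not match what `gpt_index` does internally. The original query is still what would go into the summarization prompt. None of these helpers are `gpt_index` APIs.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy bag-of-words vector; a stand-in for a real embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def generate_hypothetical_doc(query: str) -> str:
    # Hypothetical stand-in for the LLM call; a real one would write a passage from `query`.
    return ("After RISD, Paul Graham co-founded Viaweb, software for building "
            "online stores, which Yahoo bought.")


def hyde_embedding_strs(query: str, include_original: bool = True) -> list[str]:
    # The hypothetical document is always embedded; the raw query only if requested.
    strs = [generate_hypothetical_doc(query)]
    if include_original:
        strs.append(query)
    return strs


def mean_embedding(texts: list[str]) -> Counter:
    # Assumed aggregation: average of the unit-normalized vectors of each string.
    total: Counter = Counter()
    for text in texts:
        vec = embed(text)
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        for term, weight in vec.items():
            total[term] += weight / norm / len(texts)
    return total


def rank(query_vec: Counter, chunks: list[str], top_k: int = 1) -> list[str]:
    return sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)[:top_k]


def retrieve_hyde(query: str, chunks: list[str], include_original: bool = True) -> list[str]:
    # Lookup uses the hypothetical document (and optionally the query);
    # the summarization prompt would still use the original `query`.
    return rank(mean_embedding(hyde_embedding_strs(query, include_original)), chunks)


# Hypothetical chunks, made up for illustration.
chunks = [
    "After RISD I studied painting at the Accademia in Florence.",
    "We started Viaweb, software for building online stores, and Yahoo bought it.",
    "Y Combinator funded startups in batches.",
]
query = "what did paul graham do after going to RISD"
baseline = rank(embed(query), chunks)    # raw query: overlaps only on "after", "risd"
hyde_result = retrieve_hyde(query, chunks)  # hypothetical doc: overlaps on "viaweb", "yahoo", ...
print(baseline, hyde_result)
```

On these made-up chunks, the answer-shaped hypothetical document shares far more terms with the Viaweb chunk than the short question does. Retrieval therefore moves to that chunk whether or not the original query is kept in the lookup.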

## Example 2: open-ended query

In [70]:
query_str = "Is art more important or engineering?"

#### Querying without transformation yields a reasonable answer

In [71]:
response = index.query(query_str)
display(Markdown(f"<b>{response}</b>"))

INFO:root:> [query] Total LLM token usage: 3780 tokens
> [query] Total LLM token usage: 3780 tokens
INFO:root:> [query] Total embedding token usage: 9 tokens
> [query] Total embedding token usage: 9 tokens


> It is impossible to answer this question without prior knowledge. Art and engineering are both important in different ways, and the importance of each depends on the context. For example, in the context of a startup, engineering may be more important in order to develop the product and secure funding, while art may be more important in order to create a unique and attractive brand. Ultimately, the importance of each depends on the individual situation.

> The author would say that it is important to find a balance between work and life. He experienced this firsthand when he was working on Y Combinator and his mother was ill. He realized that he needed to take time away from work to focus on his family and other aspects of his life. He also experienced this when he was working on Bel, a Lisp interpreter written in itself. He had to ban himself from writing essays during most of this time, or he would never have finished. He found that it was important to have a precisely defined goal in order to stay motivated and keep working on a project for a long time. He also found that it was important to take breaks and focus on other aspects of life in order to stay productive. He also learned that customs can continue to constrain us even after the restrictions that caused them have disappeared. This is something he experienced when he was publishing essays online, as the customs of the print world still applied even though the constraints of the print world had disappeared. He found that it was important to be aware of these customs and to be mindful of how they can affect our work-life balance.

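#### Now, we query with `HyDEQueryTransform`, using only the hypothetical document (`include_original=False`) for embedding lookup.

In [72]: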
hyde = HyDEQueryTransform(include_original=False)
response = index.query(query_str, query_transform=hyde)
display(Markdown(f"<b>{response}</b>"))

INFO:root:> [query] Total LLM token usage: 3780 tokens
> [query] Total LLM token usage: 3780 tokens
INFO:root:> [query] Total embedding token usage: 179 tokens
> [query] Total embedding token usage: 179 tokens


> It is impossible to answer this question without prior knowledge. Art and engineering are both important in different ways, and the importance of each depends on the context. For example, in the context of a startup, engineering may be more important in order to develop the product and secure funding, while art may be more important in order to create a unique and attractive brand. Ultimately, the importance of each depends on the individual situation.

> The author would likely say that work life balance is important, but can be difficult to achieve. He experienced this himself when he was working at Interleaf, where he was expected to show up during certain working hours and found it difficult to balance his work with his other interests. He also experienced it when he was working on his book and freelance projects while living in New York, where he had to live frugally in order to save enough money to go back to RISD. He also learned from his experience at Interleaf that it is important to be the "entry level" option, as it can be dangerous to pursue prestige. He also learned that it is important to find a job that allows for flexibility and creativity, as this can help to achieve a better work life balance. Additionally, he experienced the difficulty of achieving work life balance when he was starting his own company, Viaweb, and had to rely on seed funding to live on. This experience taught him the importance of having a supportive network of people who can provide financial and emotional support when needed.

In [73]:
query_bundle = hyde(query_str)
hyde_doc = query_bundle.embedding_strs[0]

In [74]:
hyde_doc

'The question of whether art or engineering is more important is a difficult one to answer. Both are essential components of our society and have their own unique benefits. Art is important because it allows us to express ourselves, to explore our creativity, and to connect with others. It can also be used to tell stories, to create beauty, and to inspire. Engineering, on the other hand, is important because it allows us to build and create things that can improve our lives. It can be used to develop new technologies, to solve problems, and to create new products.\n\nUltimately, it is difficult to say which is more important because both art and engineering are essential to our society. Art provides us with a way to express ourselves and to connect with others, while engineering provides us with the tools to create and build things that can improve our lives. Both are important and should be valued equally.'