# HyDE Query Transform Demo

#### Load documents, build the GPTSimpleVectorIndex

In [10]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from gpt_index import GPTSimpleVectorIndex, SimpleDirectoryReader
from gpt_index.indices.query.query_transform import HyDEQueryTransform
from IPython.display import Markdown, display

In [2]:
# load documents
documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()

In [3]:
index = GPTSimpleVectorIndex(documents)

INFO:root:> [build_index_from_documents] Total LLM token usage: 0 tokens
INFO:root:> [build_index_from_documents] Total embedding token usage: 17598 tokens


In [4]:
# save index to disk
index.save_to_disk('index_simple.json')

In [5]:
# load index from disk
index = GPTSimpleVectorIndex.load_from_disk('index_simple.json')

## Example 1: temporal query

Query: what did Paul Graham do after going to RISD?

In [17]:
query_str = "what did Paul Graham do after going to RISD?"

#### First, we query *without* transformation: the same query string is used for both the embedding lookup and the summarization.

In [18]:
response = index.query(query_str)
display(Markdown(f"<b>{response}</b>"))

INFO:root:> [query] Total LLM token usage: 3988 tokens
INFO:root:> [query] Total embedding token usage: 11 tokens


The output should read:

> After going to RISD, Paul Graham moved back to Providence and got a job at a software company called Interleaf. He did freelance Lisp hacking work and wrote a book on Lisp. He then moved to New York City and became a New York artist. He worked as a de facto studio assistant for Idelle Weber and started a company called Viaweb with Robert Morris to build online stores. He got $10,000 in seed funding from Julian Weber and used it to live on while working on Viaweb. They originally hoped to launch in September, but they got more ambitious and the seed funding allowed Paul to live while working on the project. Ten years later, this deal became the model for Y Combinator's.
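In the untransformed query, one string plays two roles: it is embedded for the lookup and it is also the question handed to the LLM. The following is a minimal sketch of that separation. `ToyQueryBundle` and its field names are hypothetical and exist only for illustration; this is not the library's own class.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyQueryBundle:
    """Hypothetical stand-in: one string for the LLM, a list of strings for lookup."""

    query_str: str
    custom_embedding_strs: Optional[List[str]] = None

    @property
    def embedding_strs(self) -> List[str]:
        # Without a transform, the lookup text falls back to the query itself.
        if self.custom_embedding_strs is None:
            return [self.query_str]
        return self.custom_embedding_strs


plain = ToyQueryBundle("what did Paul Graham do after going to RISD?")
print(plain.embedding_strs)
```

A query transform, in this picture, only has to supply different lookup strings while leaving `query_str` alone.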

#### Now, we use `HyDEQueryTransform` to generate a hypothetical document and use that document for the embedding lookup.

In [19]:
hyde = HyDEQueryTransform(include_original=True)
response = index.query(query_str, query_transform=hyde)
display(Markdown(f"<b>{response}</b>"))

INFO:root:> [query] Total LLM token usage: 4192 tokens
INFO:root:> [query] Total embedding token usage: 139 tokens


The output should read:
> After going to RISD, Paul Graham worked as a consultant for Interleaf and then co-founded Viaweb with Robert Morris. They developed software that allowed users to create online stores and got $10,000 in seed funding from Idelle's husband Julian. They then recruited Trevor Blackwell to help with the software and opened for business in January 1996. Paul Graham then worked on the software and learned a lot about retail. When Yahoo bought Viaweb, Paul Graham became rich and decided to quit to paint. He returned to New York and resumed his old life, but with more money. He then had an idea for a web app for making web apps and moved to Cambridge to start the company. However, he decided against running the company and instead worked on a new dialect of Lisp called Arc. He was invited to give a talk at a Lisp conference, so he gave one about how they'd used Lisp at Viaweb. Afterward, he put a postscript file of this talk online, on paulgraham.com, which he'd created years earlier.

#### In this example, `HyDE` significantly improves output quality: it hallucinates what Paul Graham did after RISD, which improves the embedding lookup and, in turn, the final output.
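To make the mechanism concrete, here is a toy sketch of why looking up with a hypothetical answer can beat looking up with the raw question. Everything in it is an assumption for illustration: `embed` is a bag-of-words stand-in for a real embedding model, `fake_llm` returns a canned passage instead of calling an LLM, and the two chunks are made up rather than taken from the essay index.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fake_llm(query):
    """Hypothetical stub: a canned 'hallucinated' answer instead of a real LLM call."""
    return ("after risd paul graham worked at a software company "
            "then started a startup building online stores")


def top_chunk(lookup_text, chunks):
    """Return the chunk most similar to the lookup text."""
    vec = embed(lookup_text)
    return max(chunks, key=lambda chunk: cosine(vec, embed(chunk)))


chunks = [
    "he worked at interleaf then started viaweb building online stores",
    "what students do at risd after class",
]
query = "what did paul graham do after risd"

plain_hit = top_chunk(query, chunks)           # lookup with the raw question
hyde_hit = top_chunk(fake_llm(query), chunks)  # lookup with the hypothetical answer

print("plain:", plain_hit)
print("hyde: ", hyde_hit)
```

With these made-up chunks, the raw question shares words only with the distractor, while the hypothetical answer shares more words with the passage that actually contains the answer. The hallucinated details do not need to be correct; they only need to resemble the wording of relevant text.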

In [22]:
query_bundle = hyde(query_str)
hyde_doc = query_bundle.embedding_strs[0]

In [23]:
hyde_doc

The hypothetical document should read:
> After graduating from the Rhode Island School of Design (RISD) in 1985, Paul Graham pursued a career in the visual arts. He began by working as a freelance illustrator and graphic designer, creating artwork for magazines, books, and advertising campaigns. He also worked as a freelance photographer, capturing images of people, places, and events. In the early 1990s, Graham began to focus more on painting and sculpture, and he had his first solo exhibition in 1995. He has since had numerous solo and group exhibitions in galleries and museums around the world. In addition to his visual art, Graham has also written several books on art and design, and he has taught at RISD and other institutions. He continues to work as an artist and educator, and his work has been featured in publications such as The New York Times, The Guardian, and The Wall Street Journal.
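The transform above was created with `include_original=True`, and the hypothetical document is read from index 0 of `embedding_strs`, which suggests the bundle can carry more than one lookup string. How those strings become a single query embedding is a library detail not shown in this notebook. The sketch below assumes an element-wise mean, with made-up vectors and a hypothetical helper name `mean_embedding`.

```python
def mean_embedding(vectors):
    """Element-wise mean of equal-length embedding vectors (assumed aggregation)."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


hyde_vec = [1.0, 0.0, 2.0]   # made-up embedding of the hypothetical document
query_vec = [0.0, 1.0, 2.0]  # made-up embedding of the original query

combined = mean_embedding([hyde_vec, query_vec])
print(combined)
```

Under this assumption, keeping the original query in the mix pulls the lookup vector back toward the question when the hypothetical document drifts off topic.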