# HyDE
For a given query, the HyDE retrieval pipeline consists of four components:
1. Promptor: builds the prompt for the generator based on the specific task.
2. Generator: generates hypothesis documents using a large language model.
3. Encoder: encodes the hypothesis documents into a HyDE vector.
4. Searcher: searches for the nearest neighbours of the HyDE vector (dense retrieval).

### Initialize HyDE components
We use [pyserini](https://github.com/castorini/pyserini) as the search interface.

In [1]:
import json
# from pyserini.search.faiss import FaissSearcher
# from pyserini.search.lucene import LuceneSearcher
# from pyserini.encode import AutoQueryEncoder

from hyde import Promptor, VLLMGenerator, HyDE

In [2]:
import gc
import torch

gc.collect()
torch.cuda.empty_cache()

In [3]:
KEY = ''  # replace with your API key; it can be an OpenAI or a Cohere API key
model_path = '/datasets/ai/ibm-granite/hub/models--ibm-granite--granite-3.0-2b-instruct/snapshots/69e41fe735f54cec1792de2ac4f124b6cc84638f'
promptor = Promptor('web search')

query = "Write a Python function to check if a number is even."

# encoder = AutoQueryEncoder(encoder_dir='facebook/contriever', pooling='mean')
# searcher = FaissSearcher('contriever_msmarco_index/', encoder)
# corpus = LuceneSearcher.from_prebuilt_index('msmarco-v1-passage')

In [4]:
print(torch.cuda.memory_summary())


|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |      0 B   |      0 B   |      0 B   |      0 B   |
|       from large pool |      0 B   |      0 B   |      0 B   |      0 B   |
|       from small pool |      0 B   |      0 B   |      0 B   |      0 B   |
|---------------------------------------------------------------------------|
| Active memory         |      0 B   |      0 B   |      0 B   |      0 B   |
|       from large pool |      0 B   |      0 B   |      0 B   |      0 B   |
|       from small pool |      0 B   |      0 B   |      0 B   |      0 B   |
|---------------------------------------------------------------

### Build a HyDE pipeline

In [5]:
generator = VLLMGenerator('granite', model_path, api_key='')
# Keyword arguments avoid depending on the positional order of the constructor.
hyde = HyDE(promptor=promptor, generator=generator)

Loading vLLM model from /datasets/ai/ibm-granite/hub/models--ibm-granite--granite-3.0-2b-instruct/snapshots/69e41fe735f54cec1792de2ac4f124b6cc84638f
INFO 04-06 18:41:23 llm_engine.py:223] Initializing an LLM engine (v0.6.1.post1) with config: model='/datasets/ai/ibm-granite/hub/models--ibm-granite--granite-3.0-2b-instruct/snapshots/69e41fe735f54cec1792de2ac4f124b6cc84638f', speculative_config=None, tokenizer='/datasets/ai/ibm-granite/hub/models--ibm-granite--granite-3.0-2b-instruct/snapshots/69e41fe735f54cec1792de2ac4f124b6cc84638f', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=4096, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, de

  @torch.library.impl_abstract("xformers_flash::flash_fwd")
  @torch.library.impl_abstract("xformers_flash::flash_bwd")


INFO 04-06 18:41:24 model_runner.py:997] Starting to load model /datasets/ai/ibm-granite/hub/models--ibm-granite--granite-3.0-2b-instruct/snapshots/69e41fe735f54cec1792de2ac4f124b6cc84638f...
INFO 04-06 18:41:24 selector.py:259] Cannot use FlashAttention-2 backend because the vllm_flash_attn package is not found. `pip install vllm-flash-attn` for better performance.
INFO 04-06 18:41:24 selector.py:116] Using XFormers backend.


Loading safetensors checkpoint shards:   0% Completed | 0/2 [00:00<?, ?it/s]


INFO 04-06 18:41:28 model_runner.py:1008] Loading model weights took 4.7198 GB
INFO 04-06 18:41:28 gpu_executor.py:122] # GPU blocks: 28294, # CPU blocks: 3276
INFO 04-06 18:41:30 model_runner.py:1309] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 04-06 18:41:30 model_runner.py:1313] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
INFO 04-06 18:41:41 model_runner.py:1428] Graph capturing finished in 11 secs.


### Load example Query

In [37]:
query = 'how long does it take to remove wisdom tooth'

### Build Zero-shot Prompt

In [6]:
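class HyDE:
    """Reduced HyDE pipeline: prompt construction and hypothesis generation only."""

    # def __init__(self, promptor, generator, encoder, searcher):
    def __init__(self, promptor, generator):
        self.promptor = promptor
        self.generator = generator
        # The encoder and searcher are disabled in this notebook.
        # self.encoder = encoder
        # self.searcher = searcher

    def prompt(self, query):
        """Build the task-specific prompt for the query."""
        return self.promptor.build_prompt(query)

    def generate(self, query):
        """Generate hypothesis documents for the query with the LLM."""
        prompt = self.promptor.build_prompt(query)
        hypothesis_documents = self.generator.generate(prompt)
        return hypothesis_documents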

In [7]:
hyde = HyDE(promptor, generator)

In [8]:
prompt = hyde.prompt(query)
print(prompt)

Please write a passage to answer the question.
Question: Write a Python function to check if a number is even.
Passage:
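
The prompt printed above follows a fixed template for the `'web search'` task. A minimal sketch of how such a promptor could be written is shown below. `SimplePromptor` is a hypothetical stand-in, and the real `Promptor` in the `hyde` package may be organized differently; only the template text is taken from the output above.

```python
# Hypothetical stand-in for hyde.Promptor.
# Only the template text is taken from the notebook output.
WEB_SEARCH_TEMPLATE = (
    "Please write a passage to answer the question.\n"
    "Question: {}\n"
    "Passage:"
)


class SimplePromptor:
    def __init__(self, task):
        # Only the 'web search' task is sketched here.
        if task != "web search":
            raise ValueError(f"Task {task!r} is not supported in this sketch")
        self.task = task

    def build_prompt(self, query):
        # Insert the raw query into the zero-shot template.
        return WEB_SEARCH_TEMPLATE.format(query)


demo_prompt = SimplePromptor("web search").build_prompt(
    "Write a Python function to check if a number is even."
)
print(demo_prompt)
```

Keeping the template as a module-level constant makes it easy to add further task-specific templates later without touching `build_prompt`.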


### Generate Hypothesis Documents

In [9]:
hypothesis_documents = hyde.generate(query)
for i, doc in enumerate(hypothesis_documents):
    print(f'HyDE Generated Document: {i}')
    print(doc.strip())

Processed prompts: 100%|██████████| 1/1 [00:01<00:00,  1.21s/it, est. speed input: 23.07 toks/s, output: 106.27 toks/s]

HyDE Generated Document: 0
Here is a simple Python function that checks if a number is even:

```python
def is_even(n):
    return n % 2 == 0
```

This function uses the modulo operator (`%`) to find the remainder of the division of `n` by 2. If the remainder is 0, then the number is even. Otherwise, it is odd.

You can use this function like this:

```python
print(is_even(4))  # Output: True
print(is_even(7))  # Output: False
```





### Encode HyDE vector

In [40]:
hyde_vector = hyde.encode(query, hypothesis_documents)
print(hyde_vector.shape)

(1, 768)
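
The `(1, 768)` shape above is a single vector that summarizes the query and its hypothesis documents. A common way to build it is to average their embeddings, and the sketch below assumes that approach. `toy_encode` and `encode_hyde` are hypothetical names, and the hash-based encoder only stands in for a real dense encoder (it uses 8 dimensions instead of 768).

```python
import hashlib

DIM = 8  # toy dimensionality; the notebook's encoder produces 768


def toy_encode(text, dim=DIM):
    """Hypothetical deterministic encoder: hashes tokens into a bag-of-words vector."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    return vec


def encode_hyde(query, hypothesis_documents, encode=toy_encode):
    """Average the embeddings of the query and all hypothesis documents.

    Returns a nested list of shape (1, dim), mirroring the (1, 768) shape above.
    Averaging is an assumption of this sketch.
    """
    embeddings = [encode(text) for text in [query] + list(hypothesis_documents)]
    dim = len(embeddings[0])
    mean = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    return [mean]


demo_vector = encode_hyde(
    "check if a number is even",
    ["def is_even(n): return n % 2 == 0", "A number is even when n % 2 is zero."],
)
print(len(demo_vector), len(demo_vector[0]))  # 1 8
```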


### Search Relevant Documents

In [41]:
hits = hyde.search(hyde_vector, k=10)
for i, hit in enumerate(hits):
    print(f'HyDE Retrieved Document: {i}')
    print(hit.docid)
    print(json.loads(corpus.doc(hit.docid).raw())['contents'])

HyDE Retrieved Document: 0
4174313
The time it takes to remove the tooth will vary. Some procedures only take a few minutes, whereas others can take 20 minutes or longer. After your wisdom teeth have been removed, you may experience swelling and discomfort, both on the inside and outside of your mouth.This is usually worse for the first three days, but it can last for up to two weeks. Read more about how a wisdom tooth is removed and recovering from wisdom tooth removal.he time it takes to remove the tooth will vary. Some procedures only take a few minutes, whereas others can take 20 minutes or longer. After your wisdom teeth have been removed, you may experience swelling and discomfort, both on the inside and outside of your mouth.
HyDE Retrieved Document: 1
18103
Before having your wisdom teeth removed, you'll be given an injection of local anaesthetic to numb the tooth and surrounding area. If you're particularly anxious about the procedure, your dentist or surgeon may give you a se
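
Dense retrieval returns the `k` corpus vectors closest to the HyDE vector. In a full setup this would be delegated to an index such as the commented-out `FaissSearcher`. The sketch below only illustrates the idea with a brute-force inner-product search over a tiny in-memory corpus; `brute_force_search`, the `Hit` tuple, and the example vectors are all hypothetical.

```python
from collections import namedtuple

# Hypothetical result type with docid and score fields.
Hit = namedtuple("Hit", ["docid", "score"])


def brute_force_search(hyde_vector, corpus_vectors, k=10):
    """Return the top-k documents by inner product with the (1, dim) HyDE vector."""
    query = hyde_vector[0]
    hits = [
        Hit(docid, sum(q * d for q, d in zip(query, doc_vec)))
        for docid, doc_vec in corpus_vectors.items()
    ]
    # Highest score first.
    hits.sort(key=lambda hit: hit.score, reverse=True)
    return hits[:k]


toy_corpus = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.0, 1.0, 0.0],
    "doc-c": [0.7, 0.7, 0.0],
}
demo_hits = brute_force_search([[0.9, 0.1, 0.0]], toy_corpus, k=2)
for i, hit in enumerate(demo_hits):
    print(f"Retrieved Document: {i}", hit.docid, round(hit.score, 2))
```

A linear scan like this costs one dot product per document, which is why real corpora rely on a prebuilt vector index instead.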

### End to End Search

End-to-end (e2e) search runs all of the steps described above in a single call.

In [42]:
hits = hyde.e2e_search(query, k=10)
for i, hit in enumerate(hits):
    print(f'HyDE Retrieved Document: {i}')
    print(hit.docid)
    print(json.loads(corpus.doc(hit.docid).raw())['contents'])

HyDE Retrieved Document: 0
4174313
The time it takes to remove the tooth will vary. Some procedures only take a few minutes, whereas others can take 20 minutes or longer. After your wisdom teeth have been removed, you may experience swelling and discomfort, both on the inside and outside of your mouth.This is usually worse for the first three days, but it can last for up to two weeks. Read more about how a wisdom tooth is removed and recovering from wisdom tooth removal.he time it takes to remove the tooth will vary. Some procedures only take a few minutes, whereas others can take 20 minutes or longer. After your wisdom teeth have been removed, you may experience swelling and discomfort, both on the inside and outside of your mouth.
HyDE Retrieved Document: 1
91493
The time it takes to remove the tooth will vary. Some procedures only take a few minutes, whereas others can take 20 minutes or longer. After your wisdom teeth have been removed, you may experience swelling and discomfort, b
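
A minimal sketch of how an end-to-end call can chain the four stages. Every component here is a hypothetical callable passed in as an argument, not a class from the `hyde` package, and the toy stand-ins (including the canned generator text) exist only to make the example runnable without an LLM, an encoder, or an index.

```python
def e2e_search(query, build_prompt, generate, encode, search, k=10):
    """Run prompt -> generate -> encode -> search and return the top-k results."""
    prompt = build_prompt(query)
    hypothesis_documents = generate(prompt)
    # Assumption of this sketch: average the query and hypothesis embeddings.
    embeddings = [encode(text) for text in [query] + hypothesis_documents]
    hyde_vector = [sum(column) / len(embeddings) for column in zip(*embeddings)]
    return search(hyde_vector, k)


def toy_prompt(query):
    return f"Please write a passage to answer the question.\nQuestion: {query}\nPassage:"


def toy_generate(prompt):
    # Canned text instead of a real LLM call.
    return ["Removing a wisdom tooth can take from a few minutes to 20 minutes or longer."]


def toy_encode(text):
    # Two hand-made features: mentions "tooth", mentions "python".
    lowered = text.lower()
    return [float("tooth" in lowered), float("python" in lowered)]


TOY_INDEX = {"dental-doc": [1.0, 0.0], "coding-doc": [0.0, 1.0]}


def toy_search(vector, k):
    # Score every indexed document by inner product and keep the best k.
    scored = sorted(
        ((sum(v * d for v, d in zip(vector, doc)), docid) for docid, doc in TOY_INDEX.items()),
        reverse=True,
    )
    return [docid for _, docid in scored[:k]]


demo_results = e2e_search(
    "how long does it take to remove wisdom tooth",
    toy_prompt, toy_generate, toy_encode, toy_search, k=1,
)
print(demo_results)  # ['dental-doc']
```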