# 🧠 Cache-Augmented Generation (CAG) - Manual Colab Flow

This notebook lets you:
- Generate a knowledge cache from a file
- Save the `past_key_values`
- Answer queries using the cache

**Works with quantized Mistral 7B via Transformers + BitsAndBytes**
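
The three steps listed above can be illustrated without a model. The toy below is only a sketch under loud assumptions: `step`, `encode_prefix`, `answer_with_cache`, and the running-hash "state" are hypothetical stand-ins for the forward pass and its `past_key_values`, not anything from Transformers. It shows the property CAG relies on: processing the knowledge once and resuming from the saved state gives the same result as reprocessing knowledge plus query, as long as each query works on a copy of the cache.

```python
import copy

def step(state, token):
    """Toy 'forward pass': fold one token into the running state."""
    return (state * 31 + sum(map(ord, token))) % 1_000_003

def encode_prefix(tokens):
    """Process the knowledge tokens once and keep every intermediate state."""
    cache, state = [], 0
    for tok in tokens:
        state = step(state, tok)
        cache.append(state)
    return cache

def answer_with_cache(cache, query_tokens):
    """Resume from the cached prefix; only the query tokens are processed."""
    local = copy.copy(cache)          # never mutate the stored cache
    state = local[-1] if local else 0
    for tok in query_tokens:
        state = step(state, tok)
        local.append(state)
    return state

knowledge = "the aorta carries oxygenated blood".split()
query = "what does the aorta carry".split()

cache = encode_prefix(knowledge)                      # step 1: build the cache once
cached_result = answer_with_cache(cache, query)       # step 3: reuse it per query
full_result = encode_prefix(knowledge + query)[-1]    # baseline: reprocess everything
print(cached_result == full_result, len(cache) == len(knowledge))
```

Because `answer_with_cache` copies the cache before extending it, the stored prefix keeps its original length and can serve any number of queries.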

In [None]:
# ✅ Setup
!pip install -q transformers bitsandbytes accelerate
import torch
from transformers import BitsAndBytesConfig, AutoTokenizer, AutoModelForCausalLM
from transformers.cache_utils import DynamicCache
import os
torch.serialization.add_safe_globals([DynamicCache])
torch.serialization.add_safe_globals([set])
device = 'cuda' if torch.cuda.is_available() else 'cpu'


In [None]:
# 🔁 Load quantized Mistral model
model_id = 'mistralai/Mistral-7B-Instruct-v0.2'
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map='auto', quantization_config=quant_config)
model.eval();


In [None]:
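# 📥 Load the knowledge file
with open("knowledge.txt", "r", encoding="utf-8") as f:
    knowledge_text = f.read().strip()

# The prompt text is Italian. The system line reads "You are an assistant that
# gives concise and accurate answers." and "Domanda:" means "Question:".
# The prompt ends right after "Domanda:" so the question can be appended later.
system_prompt = f"""
<|system|>
Sei un assistente che fornisce risposte concise e accurate.
<|user|>
Context:
{knowledge_text}
Domanda:
""".strip()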

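Mistral-7B-Instruct was trained with `[INST] ... [/INST]` turn markers rather than `<|system|>` / `<|user|>` tags, so one possible variant of the prompt above uses that layout. The sketch below is an illustration under stated assumptions, not the notebook's method: `build_prefix` and `build_suffix` are hypothetical helpers, the instruction wording is made up, and the BOS token is left to the tokenizer. It shows that the cached part has to be an exact prefix of the full prompt.

```python
def build_prefix(knowledge_text: str) -> str:
    """Cacheable part: instruction plus knowledge, ending where the question starts."""
    return (
        "[INST] Answer concisely and accurately using the context.\n"
        f"Context:\n{knowledge_text}\n"
        "Question: "
    )

def build_suffix(question: str) -> str:
    """Per-query part: the question and the closing instruction marker."""
    return f"{question.strip()} [/INST]"

prefix = build_prefix("The aorta carries oxygenated blood.")
suffix = build_suffix("What does the aorta carry?")
full_prompt = prefix + suffix
print(full_prompt)
```

Tokenizing the prefix and the suffix separately may not produce exactly the same token ids as tokenizing the joined string, so the split point is best placed at a line break or other clean boundary. The tokenizer's own `apply_chat_template` remains the authoritative source for the template.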
In [None]:
# 💾 Build the KV cache and save it
input_ids = tokenizer(system_prompt, return_tensors="pt").input_ids.to(device)
cache = DynamicCache()
with torch.no_grad():
    # One forward pass over the knowledge prompt fills the cache in place.
    _ = model(input_ids=input_ids, past_key_values=cache, use_cache=True)
torch.save(cache, "kv_cache.pt")
print("✅ KV cache saved to 'kv_cache.pt'")

✅ KV cache saved to 'kv_cache.pt'


In [None]:
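# 🤖 Query using the cache
def query_with_cache(question):
    # Reload the saved cache on every call: DynamicCache grows in place during
    # decoding, so each query must start again from the knowledge prefix only.
    kv_loaded = torch.load("kv_cache.pt", weights_only=False)
    # Trim every layer back to the prefix length. On a freshly loaded cache this
    # is a no-op; it matters only if one cache object is reused across queries.
    origin_len = kv_loaded.key_cache[0].shape[-2]
    for i in range(len(kv_loaded.key_cache)):
        kv_loaded.key_cache[i] = kv_loaded.key_cache[i][:, :, :origin_len, :]
        kv_loaded.value_cache[i] = kv_loaded.value_cache[i][:, :, :origin_len, :]

    # The cached prefix already starts with BOS, so do not add another one here.
    input_ids = tokenizer(question + "\n", return_tensors="pt", add_special_tokens=False).input_ids.to(device)
    output_ids = input_ids.clone()  # the decoded result therefore echoes the question
    next_token = input_ids

    with torch.no_grad():
        for _ in range(100):  # generate at most 100 new tokens
            out = model(input_ids=next_token, past_key_values=kv_loaded, use_cache=True)
            logits = out.logits[:, -1, :]
            token = torch.argmax(logits, dim=-1, keepdim=True)  # greedy decoding
            output_ids = torch.cat([output_ids, token], dim=-1)
            kv_loaded = out.past_key_values
            next_token = token.to(device)
            if token.item() == tokenizer.eos_token_id:
                break
    return tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()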
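
`query_with_cache` seeds `output_ids` with the question tokens, which is why the decoded answer starts by echoing the question. A minimal sketch of the same greedy loop that keeps prompt and generated tokens apart follows; `next_token_fn`, `EOS`, and the scripted toy model are hypothetical stand-ins for the model call and `tokenizer.eos_token_id`.

```python
from typing import Callable, List

EOS = 0  # hypothetical end-of-sequence id

def greedy_decode(prompt_ids: List[int],
                  next_token_fn: Callable[[List[int]], int],
                  max_new_tokens: int = 100) -> List[int]:
    """Return only the newly generated ids, stopping at EOS or the token limit."""
    context = list(prompt_ids)   # everything the model has seen so far
    generated: List[int] = []    # new tokens only, so the prompt is not echoed
    for _ in range(max_new_tokens):
        token = next_token_fn(context)
        if token == EOS:
            break
        generated.append(token)
        context.append(token)
    return generated

def scripted_model(context: List[int]) -> int:
    """Toy model: emits 7, 8, 9 after a 3-token prompt, then EOS."""
    script = [7, 8, 9, EOS]
    return script[min(len(context) - 3, len(script) - 1)]

result = greedy_decode([1, 2, 3], scripted_model)
print(result)
```

In the real loop the equivalent change would be to start `output_ids` empty and decode only the tokens appended during generation.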

In [None]:
# 📌 Example
# The question is in Italian: "What kind of blood does the aorta carry?"
response = query_with_cache("Qual è il sangue che trasporta l’aorta?")
print("🧠 Answer:", response)

🧠 Answer: Qual è il sangue che trasporta l’aorta?
<|system|>
La aorta trasporta il sangue ossigenato.


# 🔧 HyperGraphRAG + CAG Integration

This notebook integrates the **Cache-Augmented Generation (CAG)** approach into the HyperGraphRAG system already patched for Colab.

In [None]:
!pip install -q -U transformers bitsandbytes accelerate tiktoken networkx graspologic nano-vectordb
!pip install -q aioboto3 aiohttp ollama oracledb pymongo pymysql pymilvus numpy
!pip install -q --upgrade numpy scipy


In [None]:
!unzip -o HyperGraphRag.zip -d HyperGraphRag
%cd HyperGraphRag/HyperGraphRag

Archive:  HyperGraphRag.zip
   creating: HyperGraphRag/HyperGraphRag/
  inflating: HyperGraphRag/HyperGraphRag/.gitignore  
  inflating: HyperGraphRag/HyperGraphRag/example_contexts.json  
   creating: HyperGraphRag/HyperGraphRag/hypergraphrag/
  inflating: HyperGraphRag/HyperGraphRag/hypergraphrag/base.py  
  inflating: HyperGraphRag/HyperGraphRag/hypergraphrag/hypercag.py  
  inflating: HyperGraphRag/HyperGraphRag/hypergraphrag/hypergraphrag.py  
  inflating: HyperGraphRag/HyperGraphRag/hypergraphrag/hypergraphrag_cag.py  
   creating: HyperGraphRag/HyperGraphRag/hypergraphrag/kg/
  inflating: HyperGraphRag/HyperGraphRag/hypergraphrag/kg/chroma_impl.py  
  inflating: HyperGraphRag/HyperGraphRag/hypergraphrag/kg/milvus_impl.py  
  inflating: HyperGraphRag/HyperGraphRag/hypergraphrag/kg/mongo_impl.py  
  inflating: HyperGraphRag/HyperGraphRag/hypergraphrag/kg/neo4j_impl.py  
  inflating: HyperGraphRag/HyperGraphRag/hypergraphrag/kg/oracle_impl.py  
  inflating: HyperGraphRag/HyperGraph

In [None]:
%cd /content/HyperGraphRag/HyperGraphRag

/content/HyperGraphRag/HyperGraphRag


In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, AutoModel
from hypergraphrag.utils import EmbeddingFunc
import torch, numpy as np

def build_quantized_model_fn(model_name: str):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map="auto",
        torch_dtype=torch.float16,
        quantization_config=BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.float16,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_quant_type="nf4"
        )
    )

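    async def hf_quantized_llm(prompt: str, system_prompt=None, history_messages=None, **kwargs):
        # history_messages and any extra kwargs are accepted but not used here.
        full_prompt = prompt
        if system_prompt:
            full_prompt = f"<system>{system_prompt}</system>\n{prompt}"
        inputs = tokenizer(full_prompt, return_tensors="pt", truncation=True, max_length=32000).to(model.device)
        # Greedy decoding: with do_sample=False the temperature has no effect, so it is omitted.
        outputs = model.generate(**inputs, max_new_tokens=90, do_sample=False)
        # Decode only the newly generated tokens instead of string-replacing the prompt,
        # which fails when decoding does not reproduce the prompt text exactly.
        new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
        return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()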

    return hf_quantized_llm

def build_hf_embedding_func(model_name="intfloat/e5-small-v2"):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).to("cuda")

    async def embed_fn(texts: list[str]) -> np.ndarray:
        inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt").to("cuda")
        with torch.no_grad():
            model_output = model(**inputs)
        embeddings = model_output.last_hidden_state.mean(dim=1).detach().cpu().numpy()
        return embeddings

    return EmbeddingFunc(model.config.hidden_size, 512, embed_fn)

quantized_llm_func = build_quantized_model_fn("mistralai/Mistral-7B-Instruct-v0.2")
embedding_func = build_hf_embedding_func()


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]
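
`embed_fn` averages `last_hidden_state` over every position, so with `padding=True` the pad positions of shorter texts are averaged in as well. A common alternative is to weight the average by the attention mask. The pure-Python sketch below illustrates that idea on nested lists; `masked_mean_pool` is a hypothetical helper, and swapping such pooling in would change the vectors, so any index built with the original pooling would have to be rebuilt.

```python
from typing import List

def masked_mean_pool(hidden: List[List[List[float]]],
                     mask: List[List[int]]) -> List[List[float]]:
    """Average token vectors per sequence, counting only positions where mask == 1."""
    pooled = []
    for token_vectors, token_mask in zip(hidden, mask):
        dim = len(token_vectors[0])
        kept = max(sum(token_mask), 1)  # avoid division by zero on fully masked rows
        sums = [0.0] * dim
        for vector, keep in zip(token_vectors, token_mask):
            if keep:
                for j, value in enumerate(vector):
                    sums[j] += value
        pooled.append([s / kept for s in sums])
    return pooled

# Two sequences of length 3; the last position of the second one is padding.
hidden = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[1.0, 1.0], [3.0, 3.0], [100.0, 100.0]],
]
mask = [[1, 1, 1], [1, 1, 0]]
pooled = masked_mean_pool(hidden, mask)
print(pooled)
```

With tensors, the same computation is usually written by multiplying the hidden states by the mask, summing over the sequence axis, and dividing by the mask sum.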

In [None]:
#upload hypergraph (skip this step if you don't have the hypergraph)
!unzip -o /content/example.zip

Archive:  /content/example.zip
   creating: expr/example/
  inflating: expr/example/vdb_hyperedges.json  
  inflating: expr/example/kv_store_full_docs.json  
  inflating: expr/example/vdb_chunks.json  
  inflating: expr/example/vdb_entities.json  
 extracting: expr/example/kv_store_llm_response_cache.json  
  inflating: expr/example/graph_chunk_entity_relation.graphml  
  inflating: expr/example/kv_store_text_chunks.json  


In [None]:
from hypergraphrag.hypergraphrag import HyperGraphRAG
from hypergraphrag.hypergraphrag_cag import HyperGraphCAGRAG

# 1. First build the base instance to populate the graph
rag_base = HyperGraphRAG(
    working_dir="expr/example",
    llm_model_func=quantized_llm_func,
    embedding_func=embedding_func,
    llm_model_name="mistralai/Mistral-7B-Instruct-v0.2"
)

# 2. Then build the CAG extension on top of it
rag = HyperGraphCAGRAG(
    graph_store=rag_base.chunk_entity_relation_graph,
    entity_vdb=rag_base.entities_vdb,
    hyperedge_vdb=rag_base.hyperedges_vdb,
    text_store=rag_base.text_chunks,
    hashing_kv=rag_base.llm_response_cache,
    model_name="mistralai/Mistral-7B-Instruct-v0.2",
    kv_cache_dir="/content/HyperGraphRag/HyperGraphRag/cag_cache",
    hf_token=None
)

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

In [None]:
# Limit the PDF input to 10k tokens (for testing)
# 📥 Upload PDF + 10k-token limit + insertion timing
!pip install -q pymupdf tiktoken

import fitz  # PyMuPDF
import tiktoken
from google.colab import files
import time

# 1. Upload the file
uploaded = files.upload()
pdf_path = list(uploaded.keys())[0]

# 2. Extract the text page by page
def extract_text_from_pdf(path):
    doc = fitz.open(path)
    return [page.get_text() for page in doc]

pages = extract_text_from_pdf(pdf_path)

# 3. Keep whole pages while the running total stays within the token budget.
# Counts come from the gpt-3.5-turbo encoding, not the Mistral tokenizer,
# so they are only an approximation of what the model will see.
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def limit_texts_to_max_tokens(texts, max_tokens=24000):
    selected = []
    total = 0
    for txt in texts:
        tokens = len(enc.encode(txt))
        if total + tokens <= max_tokens:
            selected.append(txt)
            total += tokens
        else:
            # Stop at the first page that would exceed the budget.
            break
    return selected, total

limited_pages, total_tokens = limit_texts_to_max_tokens(pages, max_tokens=10000)
print(f"✅ PDF: {pdf_path}")
print(f"✅ Selected {len(limited_pages)} pages, {total_tokens} tokens in total")

# 4. Insert into the system, with timing
start_time = time.time()
await rag_base.ainsert(limited_pages)
end_time = time.time()

print(f"🕒 Insertion time: {end_time - start_time:.2f} seconds")

Saving Linee guida ESC 2024_Pressione elevata e ipertensione (e1-107).pdf to Linee guida ESC 2024_Pressione elevata e ipertensione (e1-107).pdf
✅ PDF: Linee guida ESC 2024_Pressione elevata e ipertensione (e1-107).pdf
✅ Selected 3 pages, 8022 tokens in total


Chunking documents: 100%|██████████| 3/3 [00:00<00:00,  7.73doc/s]
Generating embeddings: 100%|██████████| 1/1 [00:00<00:00,  1.59batch/s]
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.

⠇ Processed 8 chunks, 6 entities(duplicated), 7 relations(duplicated)

Extracting entities from chunks: 100%|██████████| 8/8 [03:21<00:00, 25.23s/chunk]
Inserting hyperedges: 100%|██████████| 6/6 [00:00<00:00, 5947.96entity/s]
Inserting entities: 100%|██████████| 5/5 [00:00<00:00, 3890.10entity/s]
Inserting relationships: 100%|██████████| 5/5 [00:00<00:00, 8422.30relationship/s]
Generating embeddings: 100%|██████████| 1/1 [00:00<00:00, 45.90batch/s]
Generating embeddings: 100%|██████████| 1/1 [00:00<00:00, 87.51batch/s]

🕒 Insertion time: 203.02 seconds





In [None]:
import time
import json

start = time.time()
with open("contexts.json") as f:
    contexts = json.load(f)
await rag_base.ainsert([doc["context"] for doc in contexts])

end = time.time()
print(f"🕒 Caching time: {end - start:.2f} seconds")

Chunking documents: 100%|██████████| 1/1 [00:03<00:00,  3.87s/doc]
Generating embeddings: 100%|██████████| 1/1 [00:01<00:00,  1.13s/batch]
Extracting entities from chunks:   0%|          | 0/22 [00:00<?, ?chunk/s]The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


⠙ Processed 1 chunks, 2 entities(duplicated), 1 relations(duplicated)

Extracting entities from chunks:   5%|▍         | 1/22 [00:13<04:50, 13.83s/chunk]The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


⠹ Processed 2 chunks, 4 entities(duplicated), 2 relations(duplicated)

Extracting entities from chunks:   9%|▉         | 2/22 [00:27<04:32, 13.64s/chunk]The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


⠸ Processed 3 chunks, 5 entities(duplicated), 3 relations(duplicated)

Extracting entities from chunks:  14%|█▎        | 3/22 [00:41<04:22, 13.79s/chunk]The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


⠼ Processed 4 chunks, 61 entities(duplicated), 28 relations(duplicated)

Extracting entities from chunks:  18%|█▊        | 4/22 [00:55<04:11, 13.98s/chunk]The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


⠴ Processed 5 chunks, 116 entities(duplicated), 52 relations(duplicated)

Extracting entities from chunks:  23%|██▎       | 5/22 [01:09<03:57, 14.00s/chunk]The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


⠦ Processed 6 chunks, 171 entities(duplicated), 76 relations(duplicated)

Extracting entities from chunks:  27%|██▋       | 6/22 [01:23<03:43, 13.95s/chunk]The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


⠧ Processed 7 chunks, 226 entities(duplicated), 100 relations(duplicated)

Extracting entities from chunks:  32%|███▏      | 7/22 [01:37<03:28, 13.88s/chunk]The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


⠇ Processed 8 chunks, 228 entities(duplicated), 101 relations(duplicated)

Extracting entities from chunks:  36%|███▋      | 8/22 [01:50<03:13, 13.86s/chunk]The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


⠏ Processed 9 chunks, 230 entities(duplicated), 102 relations(duplicated)

Extracting entities from chunks:  41%|████      | 9/22 [02:05<03:02, 14.04s/chunk]The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


⠋ Processed 10 chunks, 232 entities(duplicated), 103 relations(duplicated)

Extracting entities from chunks:  45%|████▌     | 10/22 [02:19<02:48, 14.04s/chunk]The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


⠙ Processed 11 chunks, 287 entities(duplicated), 127 relations(duplicated)

Extracting entities from chunks:  50%|█████     | 11/22 [02:33<02:34, 14.04s/chunk]The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


⠹ Processed 12 chunks, 342 entities(duplicated), 151 relations(duplicated)

Extracting entities from chunks:  55%|█████▍    | 12/22 [02:47<02:20, 14.00s/chunk]The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


⠸ Processed 13 chunks, 344 entities(duplicated), 152 relations(duplicated)

Extracting entities from chunks:  59%|█████▉    | 13/22 [03:01<02:05, 13.98s/chunk]The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


⠼ Processed 14 chunks, 346 entities(duplicated), 153 relations(duplicated)

Extracting entities from chunks:  64%|██████▎   | 14/22 [03:15<01:51, 13.96s/chunk]The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


⠴ Processed 15 chunks, 402 entities(duplicated), 178 relations(duplicated)

Extracting entities from chunks:  68%|██████▊   | 15/22 [03:29<01:37, 13.97s/chunk]The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


⠦ Processed 16 chunks, 457 entities(duplicated), 202 relations(duplicated)

Extracting entities from chunks:  73%|███████▎  | 16/22 [03:43<01:23, 13.98s/chunk]The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


⠧ Processed 17 chunks, 514 entities(duplicated), 227 relations(duplicated)

Extracting entities from chunks:  77%|███████▋  | 17/22 [03:57<01:09, 13.99s/chunk]The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


⠇ Processed 18 chunks, 571 entities(duplicated), 252 relations(duplicated)

Extracting entities from chunks:  82%|████████▏ | 18/22 [04:11<00:55, 14.00s/chunk]The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


⠏ Processed 19 chunks, 573 entities(duplicated), 253 relations(duplicated)

Extracting entities from chunks:  86%|████████▋ | 19/22 [04:25<00:41, 13.99s/chunk]


⠋ Processed 20 chunks, 629 entities(duplicated), 278 relations(duplicated)

Extracting entities from chunks:  91%|█████████ | 20/22 [04:39<00:27, 14.00s/chunk]


⠙ Processed 21 chunks, 684 entities(duplicated), 302 relations(duplicated)

Extracting entities from chunks:  95%|█████████▌| 21/22 [04:53<00:14, 14.01s/chunk]


⠹ Processed 22 chunks, 686 entities(duplicated), 303 relations(duplicated)

Extracting entities from chunks: 100%|██████████| 22/22 [05:06<00:00, 13.94s/chunk]
Inserting hyperedges: 100%|██████████| 32/32 [00:00<00:00, 12660.86entity/s]
Inserting entities: 100%|██████████| 67/67 [00:00<00:00, 8949.63entity/s]
Inserting relationships: 100%|██████████| 67/67 [00:00<00:00, 8737.59relationship/s]
Generating embeddings: 100%|██████████| 1/1 [00:00<00:00, 26.83batch/s]
Generating embeddings: 100%|██████████| 3/3 [00:00<00:00, 28.29batch/s]


🕒 Caching time: 311.95 seconds


In [None]:
# Download the hypergraph after extraction
!zip -r example.zip expr/example
from google.colab import files
files.download('example.zip')

In [None]:
# Cache all hyperedges and time the run
import time

start = time.time()
await rag.cache_all_hyperedges()
end = time.time()

print(f"🕒 Caching time: {end - start:.2f} seconds")
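The same start/end timing pattern recurs in most of the cells below. A minimal sketch of a reusable helper follows; the name `timed` and the `sink` dictionary are illustrative assumptions, not part of the notebook or of hypergraphrag. It uses `time.perf_counter()`, which is monotonic and therefore better suited to measuring durations than `time.time()`.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, sink=None):
    """Print the wall-clock time spent inside the `with` block.

    `sink` is an optional dict (hypothetical helper argument) that
    receives the elapsed seconds under `label`.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if sink is not None:
            sink[label] = elapsed
        print(f"🕒 {label}: {elapsed:.2f} seconds")


# Demo with a short sleep standing in for the real workload.
timings = {}
with timed("Demo sleep", sink=timings):
    time.sleep(0.05)
```

In a notebook cell, the awaited call (for example `await rag.cache_all_hyperedges()`) could be placed inside the `with` block in the same way.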

In [None]:
await rag.cache_all_entities()

🚀 Starting cache generation for all entities...
✅ Cache saved for entity: "US" → cag-entity-7a20c4af648fdfb4a3e5fce3d812aab5.pt
✅ Cache saved for entity: "SERENE" → cag-entity-30ad0437128525e720eccb377b23340a.pt
✅ Cache saved for entity: "BRIM" → cag-entity-8c26d3ec6d2e6e5e98c238e34705184b.pt
✅ Cache saved for entity: "SPOTTED CAMELEOPARD" → cag-entity-afed7ce0c751efdd83c808ce47ce26cf.pt
✅ Cache saved for entity: "MARRIAGE" → cag-entity-ebfbf749249573e125d9c8c50957dba6.pt
✅ Cache saved for entity: "TAPESTRY" → cag-entity-d85efba9383405cbd4d1510e43b7dc3b.pt
✅ Cache saved for entity: "MARGIN TRIM" → cag-entity-6618f620d585af3fb94eb1d0f94be44e.pt
✅ Cache saved for entity: "WISE AND FEARLESS ELEPHANT" → cag-entity-add4ba6c14ae0ed747fcbe7046fdf74a.pt
✅ Cache saved for entity: "HAPPINESS" → cag-entity-4e8897dc3bed836e2f94634724341761.pt
✅ Cache saved for entity: "FLEECE-LIKE MIST" → cag-entity-4b6cef9c9349608ebd992620f80bd8b6.pt
✅ Cache saved for entity: "MORTAL BOAT" 
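The file names in the log above are a prefix followed by 32 hexadecimal characters, which is consistent with an MD5 digest of the entity name. The sketch below shows how such an ID could be derived. It is an assumption about what `compute_mdhash_id` does, written as a hypothetical stand-in called `make_cache_id`, and it is not guaranteed to reproduce the exact hashes shown above.

```python
from hashlib import md5


def make_cache_id(content: str, prefix: str = "") -> str:
    """Hypothetical stand-in for compute_mdhash_id: prefix + MD5 hex digest."""
    return prefix + md5(content.encode("utf-8")).hexdigest()


# The same content always maps to the same file name, so a cache written
# during indexing can be located again at query time without an index file.
cache_id = make_cache_id('"US"', prefix="cag-entity-")
filename = f"{cache_id}.pt"
print(filename)
```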

### aquery

In [None]:
from hypergraphrag.base import QueryParam

response = await rag.aquery(
    "What is the main objection Mary has to the poem \"The Witch of Atlas\"?",
    QueryParam()
)

print("📤 CAG response:", response)


# Final query setting (no DEBUG)

In [None]:
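import gc
from pathlib import Path

import torch


def load_and_fuse_kvcaches(cache_ids: list[str], kv_cache_dir: str):
    """Load the saved KV caches and concatenate them layer by layer."""
    # Fall back to CPU so the function also runs without a GPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    kv_list = []

    for cache_id in cache_ids:
        path = Path(kv_cache_dir) / f"{cache_id}.pt"
        if not path.exists():
            continue  # no cache file was saved for this ID
        # Load on CPU first; tensors are moved to the device one layer at a time.
        data = torch.load(path, map_location="cpu")
        kv = data.get("kv_cache")
        if kv is not None:
            kv_list.append(kv)

    if not kv_list:
        raise ValueError("❌ No valid KV cache found.")

    # Free memory before allocating the fused tensors.
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

    fused_kv = []
    for layer_i in range(len(kv_list[0])):
        # Each layer is a (key, value) pair. dim=2 is the sequence axis,
        # assuming the usual (batch, heads, seq_len, head_dim) layout.
        k_list = [kv[layer_i][0].to(device) for kv in kv_list]
        v_list = [kv[layer_i][1].to(device) for kv in kv_list]
        k_cat = torch.cat(k_list, dim=2)
        v_cat = torch.cat(v_list, dim=2)
        fused_kv.append((k_cat, v_cat))

    return fused_kv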
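Concatenating along `dim=2` only works when every other dimension matches across caches. The sketch below is a pure-Python shape check that could be run before the `torch.cat` calls. The helper name `fused_kv_shape` and the example shapes are illustrative assumptions; the layout `(batch, kv_heads, seq_len, head_dim)` is the one the function above appears to rely on.

```python
def fused_kv_shape(shapes, seq_dim=2):
    """Return the shape obtained by concatenating tensors along `seq_dim`.

    Raises ValueError if the shapes disagree on any other dimension,
    which is the condition under which torch.cat would also fail.
    """
    if not shapes:
        raise ValueError("no shapes given")
    first = tuple(shapes[0])
    total_seq = 0
    for shape in shapes:
        shape = tuple(shape)
        if len(shape) != len(first):
            raise ValueError(f"rank mismatch: {shape} vs {first}")
        for axis, (a, b) in enumerate(zip(first, shape)):
            if axis != seq_dim and a != b:
                raise ValueError(f"axis {axis} mismatch: {shape} vs {first}")
        total_seq += shape[seq_dim]
    fused = list(first)
    fused[seq_dim] = total_seq
    return tuple(fused)


# Three caches with different sequence lengths (illustrative values).
shapes = [(1, 8, 120, 128), (1, 8, 75, 128), (1, 8, 42, 128)]
fused = fused_kv_shape(shapes)
print(fused)  # (1, 8, 237, 128)
```

The fused sequence length is the sum of the individual lengths, which is also the quantity that has to fit in the model's context window together with the query.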

In [None]:
async def query_with_cache_and_kv_fused(query: str, rag, max_tokens: int = 4096, top_k: int = 4):
    from hypergraphrag.utils import compute_mdhash_id
    from pathlib import Path
    import torch, gc

    print(f"🔍 Analyzing query: {query}")

    # Step 1: Retrieve top-k similar entities
    entity_results = await rag.entity_vdb.query(query, top_k=top_k)
    entity_names = [res.get("entity_name") for res in entity_results if "entity_name" in res]
    print(f"🔗 Top-{top_k} similar entities: {entity_names}")

    # Step 2: Get connected hyperedges
    all_hyperedges = set()
    for ent in entity_names:
        if not await rag.graph.has_node(ent):
            continue
        neighbors = list(rag.graph._graph.neighbors(ent))
        all_hyperedges.update(n for n in neighbors if rag.graph._graph.nodes[n].get("role") == "hyperedge")
    all_hyperedges = list(all_hyperedges)
    print(f"🧩 Retrieved hyperedges: {all_hyperedges}")

    # Step 3: Select caches fitting in max token limit
    # Rough heuristic: about 0.75 words per token, so tokens ≈ words / 0.75.
    def estimate_tokens(text):
        return int(len(text.split()) / 0.75)

    def load_kv_cache(path):
        return torch.load(path, map_location="cpu")

    selected = []
    total_tokens = 0
    for he in all_hyperedges:
        cache_id = compute_mdhash_id(he, prefix="cag-")
        path = Path(rag.cag_engine.kv_cache_dir) / f"{cache_id}.pt"
        if not path.exists():
            continue
        prompt = load_kv_cache(path).get("prompt", "")
        n_tokens = estimate_tokens(prompt)
        if total_tokens + n_tokens > max_tokens:
            break
        selected.append(cache_id)
        total_tokens += n_tokens

    if not selected:
        print("⚠️ No usable cache found. Answering based only on the query.")
        return rag.cag_engine.generate_from_query_only(query)

    print(f"📦 Selected cache IDs (fit within {max_tokens} tokens): {selected}")

    # Step 4: Load and fuse KV caches
    fused_kv = load_and_fuse_kvcaches(selected, rag.cag_engine.kv_cache_dir)

    # Step 5: Generate answer with fused KV cache
    guided_query = f"Answer in a short and focused paragraph (max 2-3 sentences) based only on the available knowledge: {query.strip()}"
    response = rag.cag_engine.generate_from_fused_cache(guided_query, fused_kv)

    # Cleanup
    del fused_kv
    torch.cuda.empty_cache()
    gc.collect()

    return response
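Step 3 above stops at the first hyperedge that would exceed the token budget, so smaller caches later in the list are never considered. The sketch below isolates that selection logic and contrasts it with a variant that skips the oversized item and keeps going. The function `select_within_budget`, its `stop_at_first_overflow` flag and the sample prompts are illustrative assumptions, not part of the notebook.

```python
def estimate_tokens(text: str) -> int:
    """Same heuristic as in the notebook: tokens ≈ words / 0.75."""
    return int(len(text.split()) / 0.75)


def select_within_budget(prompts, max_tokens, stop_at_first_overflow=True):
    """Greedily pick cache IDs whose estimated token counts fit the budget.

    `prompts` maps cache ID -> prompt text, in retrieval order.
    """
    selected, total = [], 0
    for cache_id, prompt in prompts.items():
        n_tokens = estimate_tokens(prompt)
        if total + n_tokens > max_tokens:
            if stop_at_first_overflow:
                break      # behaviour of the notebook's loop
            continue       # alternative: skip and try the next cache
        selected.append(cache_id)
        total += n_tokens
    return selected, total


prompts = {
    "cag-a": "one two three",     # 3 words  -> 4 estimated tokens
    "cag-b": "word " * 30,        # 30 words -> 40 estimated tokens
    "cag-c": "four five six",     # 3 words  -> 4 estimated tokens
}

stop_result = select_within_budget(prompts, max_tokens=10)
skip_result = select_within_budget(prompts, max_tokens=10,
                                   stop_at_first_overflow=False)
print(stop_result)  # (['cag-a'], 4)
print(skip_result)  # (['cag-a', 'cag-c'], 8)
```

Which variant is preferable depends on whether the retrieval order is meant to be a strict priority ranking.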


# Evaluation: Agriculture

In [None]:
import time

start = time.time()
query = "What are the nutritional needs of honey bees, and how do they meet them?"
response = await query_with_cache_and_kv_fused(query, rag)
end = time.time()

print(response)
# Response time, for testing
print(f"🕒 Response time: {end - start:.2f} seconds")

🔍 Analyzing query: What are the nutritional needs of honey bees, and how do they meet them?
🔗 Top-4 similar entities: ['"HONEY"', '"BEES"', '"BEE"', '"WINTER"']
🧩 Retrieved hyperedges: ['<hyperedge>"bees or with a nucleus hive (a nuc) rather than with an established colony."', '<hyperedge>"The most important aspect of understanding the activities and behavior of bees is to recognize that every bee action is attributable to some kind of situation or stimulus."', '<hyperedge>"However, pollen is found naturally in honey, so the bees consume some amount of it throughout their lives."', '<hyperedge>"it is not necessary for us to understand the bees\' language, it is interesting and at times helpful to be able to do so."', '<hyperedge>"Text: winter. However, bees should never be totally ignored, even in winter. Life goes on in the colony year-round."', '<hyperedge>"After a winter with no stings, the tolerance may wear off and have to be reestablished. A few stings in the early season will ta

In [None]:
import time

start = time.time()
query = "What is the role of smoke in beekeeping, and how should it be used effectively?"
response = await query_with_cache_and_kv_fused(query, rag)
end = time.time()

print(response)
# Response time, for testing
print(f"🕒 Response time: {end - start:.2f} seconds")

🔍 Analyzing query: What is the role of smoke in beekeeping, and how should it be used effectively?
🔗 Top-5 similar entities: ['"BEEKEEPER"', '"HONEY"', '"WINTER"', '"DEMONSTRATIONS"', '"YELLOW JACKET"']
🧩 Retrieved hyperedges: ['<hyperedge>"bees or with a nucleus hive (a nuc) rather than with an established colony."', '<hyperedge>"is this defensive behavior that allows a colony of bees to store away large quantities of honey with minimum likelihood that it will be taken from them by predators."', '<hyperedge>"the yellow jacket, which somewhat resembles the honey bee, hangs around at summer outings, and is known for its repeated stings."', '<hyperedge>"However, pollen is found naturally in honey, so the bees consume some amount of it throughout their lives."', '<hyperedge>"Text: winter. However, bees should never be totally ignored, even in winter. Life goes on in the colony year-round."', '<hyperedge>"After a winter with no stings, the tolerance may wear off and have to be reestablishe

In [None]:
import time

start = time.time()
query = "What is the role of communication in a bee colony, particularly the 'dance language'?"
response = await query_with_cache_and_kv_fused(query, rag)
end = time.time()

print(response)
# Response time, for testing
print(f"🕒 Response time: {end - start:.2f} seconds")


🔍 Analyzing query: What is the role of communication in a bee colony, particularly the 'dance language'?
🔗 Top-5 similar entities: ['"LANGUAGE"', '"BEES"', '"BEEKEEPER"', '"BEE"', '"HONEY"']
🧩 Retrieved hyperedges: ['<hyperedge>"bees or with a nucleus hive (a nuc) rather than with an established colony."', '<hyperedge>"The most important aspect of understanding the activities and behavior of bees is to recognize that every bee action is attributable to some kind of situation or stimulus."', '<hyperedge>"However, pollen is found naturally in honey, so the bees consume some amount of it throughout their lives."', '<hyperedge>"it is not necessary for us to understand the bees\' language, it is interesting and at times helpful to be able to do so."', '<hyperedge>"is this defensive behavior that allows a colony of bees to store away large quantities of honey with minimum likelihood that it will be taken from them by predators."']
📦 Selected cache IDs (fit within 4096 tokens): ['cag-33fff6ec

# Evaluation: Medicine

In [None]:
import time

start = time.time()
query = "What is the recommended first-line treatment for hypertension?"
response = await query_with_cache_and_kv_fused_hybrid(query, rag)
end = time.time()

print(response)
# Response time, for testing
print(f"🕒 Response time: {end - start:.2f} seconds")

🔍 Analyzing query: What is the recommended first-line treatment for hypertension?
🔗 Retrieved entities: ['"LINEE GUIDA ESC PA ELEVATA E IPERTENSIONE"', '"TEXT"', '"CONTROLLO PRESSORIO"', '"DOCUMENT"', '"IPERTENSIONE"', '"HBPM"', '"GESTIONE DELL’IPERTENSIONE"', '"ESC/EUROPEAN SOCIETY OF HYPERTENSION (ESH)"', '"RECOMMENDATIONS"', '"ABPM"']
🧩 Retrieved hyperedges: ['<hyperedge>"This document represents an update of the ESC/European Society of Hypertension (ESH) 2018 guidelines for the diagnosis and treatment of hypertension1 and, while maintaining the previous guidelines, includes important new recommendations based on the currently available evidence."', '<hyperedge>"recommendations – Recommendations for the non-pharmacological treatment of hypertension and for the reduction of cardiovascular risk (Tables 22-26 of evidence)"', '<hyperedge>"B. Lo screening per iperaldosteronismo primario mediante determinazione delle concentrazioni plasmatiche di renina e aldosterone dovrebbe essere preso

In [None]:
import time

start = time.time()
query = "What is the role of beta-blockers in the treatment of hypertension according to the 2024 ESC guidelines?"
response = await query_with_cache_and_kv_fused_hybrid(query, rag)
end = time.time()

print(response)
# Response time, for testing
print(f"🕒 Response time: {end - start:.2f} seconds")

🔍 Analyzing query: What is the role of beta-blockers in the treatment of hypertension according to the 2024 ESC guidelines?
🔗 Retrieved entities: ['"DOCUMENT"', '"LINEE GUIDA ESC PA ELEVATA E IPERTENSIONE"', '"IPERTENSIONE"', '"TEXT"', '"ESC/EUROPEAN SOCIETY OF HYPERTENSION (ESH)"', '"SOCIETÀ EUROPEA DI CARDIOLOGIA (ESC)"', '"CONTROLLO PRESSORIO"', '"TASK FORCE"', '"PA"', '"SRA INHIBITORS"', '"GESTIONE DELL’IPERTENSIONE"', '"LINEE GUIDA"', '"EUROPEAN HEART JOURNAL"', '"LINEE GUIDE"', '"HBPM"', '"PSEUDO-RESISTENZA"', '"ABPM"', '"ESC PA"', '"RECOMMENDATIONS"', '"IFICATORI"', '"EUROPEAN SOCIETY OF ENDOCRINOLOGY (ESE)"', '"DOCUMENTO"', '"G ITAL CARDIOL"', '"ESC"', '"DETERMINAZIONE"', '"TRATTAMENTO ANTIPERTENSIVO"', '"RESISTENZA"', '"AUTOMISURAZIONE DELLA PA"', '"PA CLINICA"', '"DIAGNOSI"', '"CARDIOVASCULAR RISK"', '"HYPERTENSION"', '"CORRETA TECNICA DI MISURAZIONE STANDARDIZZATA"', '"MALATTIA CARDIOVASCOLARE NON TRADIZIONALI"', '"VOL 25"', '"APPARECCHI OSCILLOMETRICI AUTOMATICI"', '"INERZI

In [None]:
import time

start = time.time()
query = "What is ambulatory blood pressure monitoring (ABPM) used for in hypertension management?"
response = await query_with_cache_and_kv_fused_hybrid(query, rag)
end = time.time()

print(response)
# Response time, for testing
print(f"🕒 Response time: {end - start:.2f} seconds")

🔍 Analyzing query: What is ambulatory blood pressure monitoring (ABPM) used for in hypertension management?
🔗 Retrieved entities: ['"ABPM"', '"HBPM"', '"IPERTENSIONE"', '"DOCUMENT"', '"ESC/EUROPEAN SOCIETY OF HYPERTENSION (ESH)"', '"TEXT"', '"CONTROLLO PRESSORIO"', '"TASK FORCE"', '"DIAGNOSI"', '"LINEE GUIDA ESC PA ELEVATA E IPERTENSIONE"', '"IFICATORI"', '"GESTIONE DELL’IPERTENSIONE"', '"SOCIETÀ EUROPEA DI CARDIOLOGIA (ESC)"', '"PA"', '"AUTOMISURAZIONE DELLA PA"', '"EUROPEAN HEART JOURNAL"', '"G ITAL CARDIOL"', '"DETERMINAZIONE"', '"PA CLINICA"', '"PSEUDO-RESISTENZA"', '"DOCUMENTO"', '"LINEE GUIDA"', '"EUROPEAN SOCIETY OF ENDOCRINOLOGY (ESE)"', '"RECOMMENDATIONS"', '"ESC PA"', '"LINEE GUIDE"', '"TRATTAMENTO ANTIPERTENSIVO"', '"ESC"', '"HYPERTENSION"', '"MALATTIA CARDIOVASCOLARE NON TRADIZIONALI"', '"RESISTENZA"', '"CARDIOVASCULAR RISK"', '"VOL 25"', '"INERZIA CLINICA"', '"APPARECCHI OSCILLOMETRICI AUTOMATICI"', '"SRA INHIBITORS"', '"SOCIETÀ EUROPEA DI CARDIOLOGIA"', '"CORRETA TECNICA 

In [None]:
import time

start = time.time()
query = "What are the recommended blood pressure targets for patients with chronic kidney disease and diabetes?"
response = await query_with_cache_and_kv_fused_hybrid(query, rag)
end = time.time()

print(response)
# Response time, for testing
print(f"🕒 Response time: {end - start:.2f} seconds")

🔍 Analyzing query: What are the recommended blood pressure targets for patients with chronic kidney disease and diabetes?
🔗 Retrieved entities: ['"LINEE GUIDA ESC PA ELEVATA E IPERTENSIONE"', '"IPERTENSIONE"', '"TEXT"', '"DOCUMENT"', '"CONTROLLO PRESSORIO"', '"DETERMINAZIONE"', '"HBPM"', '"GESTIONE DELL’IPERTENSIONE"', '"PA"', '"DIAGNOSI"']
🧩 Retrieved hyperedges: ['<hyperedge>"This document represents an update of the ESC/European Society of Hypertension (ESH) 2018 guidelines for the diagnosis and treatment of hypertension1 and, while maintaining the previous guidelines, includes important new recommendations based on the currently available evidence."', '<hyperedge>"recommendations – Recommendations for the non-pharmacological treatment of hypertension and for the reduction of cardiovascular risk (Tables 22-26 of evidence)"', '<hyperedge>"ificatori del rischio di malattia cardiovascolare non tradizionali sesso-specifici"', '<hyperedge>"B. Lo screening per iperaldosteronismo primario 

In [None]:
import time

start = time.time()
query = "What factors should be considered when selecting antihypertensive therapy in elderly frail patients?"
response = await query_with_cache_and_kv_fused_hybrid(query, rag)
end = time.time()

print(response)
# Response time, for testing
print(f"🕒 Response time: {end - start:.2f} seconds")

🔍 Analyzing query: What factors should be considered when selecting antihypertensive therapy in elderly frail patients?
🔗 Retrieved entities: ['"TEXT"', '"LINEE GUIDA ESC PA ELEVATA E IPERTENSIONE"', '"IPERTENSIONE"', '"DOCUMENT"', '"CONTROLLO PRESSORIO"', '"GESTIONE DELL’IPERTENSIONE"', '"PA"', '"DETERMINAZIONE"', '"LINEE GUIDA"', '"SRA INHIBITORS"']
🧩 Retrieved hyperedges: ['<hyperedge>"B. Lo screening per iperaldosteronismo primario mediante determinazione delle concentrazioni plasmatiche di renina e aldosterone dovrebbe essere preso in considerazione in tutti i soggetti con ipertensione accertata (PA ≥140/90 mmHg)."', '<hyperedge>"This document represents an update of the ESC/European Society of Hypertension (ESH) 2018 guidelines for the diagnosis and treatment of hypertension1 and, while maintaining the previous guidelines, includes important new recommendations based on the currently available evidence."', '<hyperedge>"Le linee guida hanno l’obiettivo di riassumere e valutare le 

In [None]:
import time

start = time.time()
query = "How does ethnicity affect the acceptance of blood pressure monitoring methods?"
response = await query_with_cache_and_kv_fused_hybrid(query, rag)
end = time.time()

print(response)
# Response time, for testing
print(f"🕒 Response time: {end - start:.2f} seconds")

🔍 Analyzing query: How does ethnicity affect the acceptance of blood pressure monitoring methods?
🔗 Retrieved entities: ['"DOCUMENT"', '"IPERTENSIONE"', '"LINEE GUIDA ESC PA ELEVATA E IPERTENSIONE"', '"TEXT"', '"ESC/EUROPEAN SOCIETY OF HYPERTENSION (ESH)"', '"CONTROLLO PRESSORIO"', '"IFICATORI"', '"HBPM"', '"DOCUMENTO"', '"DIAGNOSI"']
🧩 Retrieved hyperedges: ['<hyperedge>"This document represents an update of the ESC/European Society of Hypertension (ESH) 2018 guidelines for the diagnosis and treatment of hypertension1 and, while maintaining the previous guidelines, includes important new recommendations based on the currently available evidence."', '<hyperedge>"B. Lo screening per iperaldosteronismo primario mediante determinazione delle concentrazioni plasmatiche di renina e aldosterone dovrebbe essere preso in considerazione in tutti i soggetti con ipertensione accertata (PA ≥140/90 mmHg)."', '<hyperedge>"possono causare pseudo-resistenza or resistenza al trattamento antipertensivo"

In [None]:
import time

start = time.time()
query = "What are the key findings regarding the accuracy of home blood pressure monitors owned by patients?"
response = await query_with_cache_and_kv_fused_hybrid(query, rag)
end = time.time()

print(response)
# Response time, for testing
print(f"🕒 Response time: {end - start:.2f} seconds")

🔍 Analyzing query: What are the key findings regarding the accuracy of home blood pressure monitors owned by patients?
🔗 Retrieved entities: ['"AUTOMISURAZIONE DELLA PA"', '"DOCUMENT"', '"HBPM"', '"CONTROLLO PRESSORIO"', '"IPERTENSIONE"', '"DOCUMENTO"', '"G ITAL CARDIOL"', '"DIAGNOSI"', '"ESC/EUROPEAN SOCIETY OF HYPERTENSION (ESH)"', '"EUROPEAN HEART JOURNAL"', '"TEXT"', '"LINEE GUIDA ESC PA ELEVATA E IPERTENSIONE"', '"ABPM"', '"TASK FORCE"', '"IFICATORI"', '"PA"', '"GESTIONE DELL’IPERTENSIONE"', '"SOCIETÀ EUROPEA DI CARDIOLOGIA (ESC)"', '"LINEE GUIDA"', '"PA CLINICA"', '"LINEE GUIDE"', '"TRATTAMENTO ANTIPERTENSIVO"', '"PSEUDO-RESISTENZA"', '"MALATTIA CARDIOVASCOLARE NON TRADIZIONALI"', '"APPARECCHI OSCILLOMETRICI AUTOMATICI"', '"CARDIOVASCULAR RISK"', '"EVIDENCE"', '"DETERMINAZIONE"', '"RECOMMENDATIONS"', '"INERZIA CLINICA"', '"ESC PA"', '"VOL 25"', '"CORRETA TECNICA DI MISURAZIONE STANDARDIZZATA"', '"SRA INHIBITORS"', '"HYPERTENSION"', '"RESISTENZA"', '"EUROPEAN SOCIETY OF ENDOCRINOL

# Evaluation: Legal (part 2)

In [None]:
import time

start = time.time()
query = "What role does the Restructuring Term Sheet play in the agreement?"
response = await query_with_cache_and_kv_fused_hybrid(query, rag)
end = time.time()

print(response)
# Response time, for testing
print(f"🕒 Response time: {end - start:.2f} seconds")

🔍 Analyzing query: What role does the Restructuring Term Sheet play in the agreement?
🔗 Retrieved entities: ['"PARTIES"', '"SUPPORTING LENDERS"', '"RESTRUCTURING TERM SHEET"', '"PLAN"', '"BUSINESS"', '"RESTRUCTURING"', '"COMPANY"', '"AGREEMENT"', '"TERMS"', '"DEBTORS"']
🧩 Retrieved hyperedges: ['<hyperedge>"Agreement, the Restructuring Term Sheet or the Definitive Documents or otherwise inconsistent with, or reasonably expected to prevent, interfere with or impede the implementation or consummation of, the Restructuring."', '<hyperedge>"The Parties have agreed to support the Restructuring subject to and in accordance with the terms of this Agreement and to use commercially reasonable efforts to complete the negotiation of the terms of the documents and completion of the actions specified to effect the Restructuring in accordance with the Restructuring Term Sheet."', '<hyperedge>"From the date of this Agreement, the Supporting Lenders commit to supporting the Restructuring."', '<hypered

In [None]:
import time

start = time.time()
query = "Which documents must be finalized or approved before the restructuring can proceed?"
response = await query_with_cache_and_kv_fused_hybrid(query, rag)
end = time.time()

print(response)
# Response time, for testing
print(f"🕒 Response time: {end - start:.2f} seconds")

🔍 Analyzing query: Which documents must be finalized or approved before the restructuring can proceed?
🔗 Retrieved entities: ['"PLAN"', '"PARTIES"', '"SUPPORTING LENDERS"', '"BUSINESS"', '"RESTRUCTURING"', '"DOCUMENTS"', '"RESTRUCTURING TERM SHEET"', '"AGREEMENT"', '"DEFINITIVE DOCUMENTS"', '"SUBSIDIARIES"']
🧩 Retrieved hyperedges: ['<hyperedge>"The Parties have agreed to support the Restructuring subject to and in accordance with the terms of this Agreement and to use commercially reasonable efforts to complete the negotiation of the terms of the documents and completion of the actions specified to effect the Restructuring in accordance with the Restructuring Term Sheet."', '<hyperedge>"Agreement, the Restructuring Term Sheet or the Definitive Documents or otherwise inconsistent with, or reasonably expected to prevent, interfere with or impede the implementation or consummation of, the Restructuring."', '<hyperedge>"The execution and delivery of this Agreement, the consummation of the

In [None]:
import time

start = time.time()
query = "What milestones must the Debtors meet to comply with the RSA?"
response = await query_with_cache_and_kv_fused_hybrid(query, rag)
end = time.time()

print(response)
# Response time, for testing
print(f"🕒 Response time: {end - start:.2f} seconds")

🔍 Analyzing query: What milestones must the Debtors meet to comply with the RSA?
🔗 Retrieved entities: ['"DEBTORS"', '"SUPPORTING LENDERS"', '"BUSINESS"', '"COMPANY"', '"PARTIES"', '"PLAN"', '"AGREEMENT"', '"RESTRUCTURING"', '"NON-CONFORMING MATERIAL REPORTS"', '"ALL OBLIGATIONS"']
🧩 Retrieved hyperedges: ['<hyperedge>"The Debtors are engaged in the business of, directly or indirectly, researching, developing,"', '<hyperedge>"failure of the Business to meet any budgets, plans, projections or forecasts (internal or otherwise) or any decline in the trading price or trading volume of the Company’s common stock or any change in the ratings or ratings outlook for the Company as a result of the commencement of the Chapter 11 Cases (each of clauses (i) through (viii), an “Excluded Matter”)"', '<hyperedge>"From the date of this Agreement and as long as this Agreement has not been terminated pursuant to its terms (such period, the ‘Effective Period’), subject to the terms of this Agreement, eac

In [None]:
import time

start = time.time()
query = "Which parties are responsible for voting in favor of the restructuring plan?"
response = await query_with_cache_and_kv_fused_hybrid(query, rag)
end = time.time()

print(response)
# Response time, for testing
print(f"🕒 Response time: {end - start:.2f} seconds")

🔍 Analyzing query: Which parties are responsible for voting in favor of the restructuring plan?
🔗 Retrieved entities: ['"PARTIES"', '"SUPPORTING LENDERS"', '"PLAN"', '"RESTRUCTURING"', '"RESTRUCTURING TERM SHEET"', '"BUSINESS"', '"DEBTORS"', '"CHAPTER 11 PLAN"', '"COMPANY"', '"CAUSES OF ACTION"']
🧩 Retrieved hyperedges: ['<hyperedge>"From the date of this Agreement, the Supporting Lenders commit to supporting the Restructuring."', '<hyperedge>"The Parties have agreed to support the Restructuring subject to and in accordance with the terms of this Agreement and to use commercially reasonable efforts to complete the negotiation of the terms of the documents and completion of the actions specified to effect the Restructuring in accordance with the Restructuring Term Sheet."', '<hyperedge>"Section 3.1 Support of Restructuring."', '<hyperedge>"WHEREAS, the Parties have agreed to support the Restructuring subject to and in accordance with the terms of this Agreement and to use commercially r

# Naive RAG

In [None]:
# ✅ Model setup (only if not already done)
!pip install -q transformers bitsandbytes accelerate

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

model_name = "mistralai/Mistral-7B-Instruct-v0.2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto"
)

# ✅ Naive function (no retrieved knowledge in the prompt)
def naive_generate_mistral(question):
    prompt = f"""---Role---

You are a helpful assistant responding to questions based on given knowledge.

---Knowledge---

(none)

---Goal---

Answer the given question.
You must first conduct reasoning inside <think>...</think>.
When you have the final answer, you can output the answer inside <answer>...</answer>.

---Question---

{question}
"""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=512)
    return tokenizer.decode(output[0], skip_special_tokens=True)

# ✅ Enter the query here
query = "What was the reason behind the change in voice actors for Meg Griffin in Family Guy?"
naive_response = naive_generate_mistral(query)

# ✅ Print the response
print("🔍 Query:", query)
print("\n🧠 Naive Response:\n")
print(naive_response)
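Because `naive_generate_mistral` decodes the whole sequence, the returned string contains the prompt as well as the generated text, and the prompt itself mentions `<answer>...</answer>`. A minimal sketch of a post-processing helper follows. The name `extract_answer` and its `prompt` argument are illustrative assumptions; the helper takes the last `<answer>` block and tolerates a missing closing tag, which can happen when generation stops at `max_new_tokens`.

```python
def extract_answer(decoded: str, prompt: str = "") -> str:
    """Return the text of the last <answer> block in the generated part.

    If `prompt` is given and `decoded` starts with it, the prompt is cut
    off first so that the placeholder tags in the instructions are ignored.
    """
    generated = decoded
    if prompt and decoded.startswith(prompt):
        generated = decoded[len(prompt):]

    start = generated.rfind("<answer>")
    if start == -1:
        return generated.strip()          # no tags: return everything generated

    body = generated[start + len("<answer>"):]
    end = body.find("</answer>")
    if end != -1:
        body = body[:end]                 # closing tag present: cut there
    return body.strip()


# Illustrative strings only, not real model output.
demo_prompt = "Output the answer inside <answer>...</answer>.\n"
complete = demo_prompt + "<think>reasoning</think>\n<answer>\nShort answer.\n</answer>"
truncated = demo_prompt + "<answer>\nAn answer cut off by the token lim"

print(extract_answer(complete, demo_prompt))   # Short answer.
print(extract_answer(truncated, demo_prompt))  # An answer cut off by the token lim
```

Passing the prompt is optional, but the decoded text may not reproduce the prompt character for character after tokenization, so the `rfind` fallback still matters.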



Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


🔍 Query: What was the reason behind the change in voice actors for Meg Griffin in Family Guy?

🧠 Naive Response:

---Role---

You are a helpful assistant responding to questions based on given knowledge.

---Knowledge---

(none)

---Goal---

Answer the given question.
You must first conduct reasoning inside <think>...</think>.
When you have the final answer, you can output the answer inside <answer>...</answer>.

---Question---

What was the reason behind the change in voice actors for Meg Griffin in Family Guy?

---Answer---

<think>
The change in voice actors for Meg Griffin in Family Guy occurred due to contract negotiations between the creators of the show and Mila Kunis, the original voice actress. Kunis's salary demands were not met, leading to the casting of new voice actresses, including Alex Borstein and Zoie Palmer, to voice Meg in later seasons.
</think>

<answer>
The change in voice actors for Meg Griffin in Family Guy was due to contract negotiations between the creators o

In [None]:
# Test 2
# ✅ Enter the query here
query = "What is the role of communication in a bee colony, particularly the 'dance language'?"
naive_response = naive_generate_mistral(query)

# ✅ Print the response
print("🔍 Query:", query)
print("\n🧠 Naive Response:\n")
print(naive_response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


🔍 Query: What is the role of communication in a bee colony, particularly the 'dance language'?

🧠 Naive Response:

---Role---

You are a helpful assistant responding to questions based on given knowledge.

---Knowledge---

(none)

---Goal---

Answer the given question.
You must first conduct reasoning inside <think>...</think>.
When you have the final answer, you can output the answer inside <answer>...</answer>.

---Question---

What is the role of communication in a bee colony, particularly the 'dance language'?

---Answer---

<think>
Communication plays a crucial role in a bee colony, enabling bees to coordinate their activities and ensure the survival of the colony. One of the most fascinating aspects of bee communication is the 'dance language'. Bees use this dance to communicate the location of food sources, specifically nectar and water, to other bees in the colony.

The dance language consists of various movements and vibrations that convey different types of information. For i