# Hierarchical RAG System

This notebook builds a RAG (Retrieval-Augmented Generation) system on top of hierarchical chunking, using the book "The Children of the New Forest" from Project Gutenberg.

**Project Details:**
- **Book:** The Children of the New Forest by Frederick Marryat
- **Release:** May 21, 2007
- **Dataset:** NarrativeQA
- **Chunking:** LlamaIndex HierarchicalNodeParser
- **Vector DB:** Milvus Lite
- **Embedding Model:** BAAI/bge-large-en-v1.5 (1024 dim)
- **Retrieval:** AutoMergingRetriever (parent-child hierarchy)
- **LLM:** google/gemma-3-1b-it
- **Metrics:** BLEU, ROUGE-1, ROUGE-2, ROUGE-L

---
## 1. Setup and Preparation

### 1.1 Install Libraries

In [None]:
# Clone the repo and install the requirements
import os
import sys

# Google Colab
if 'google.colab' in sys.modules:
    # Start from /content so re-running the cell does not clone into a nested directory
    %cd /content

    # Clone the repo, or update it if it already exists
    if not os.path.exists('V-RAG-Final'):
        !git clone https://github.com/sendayildirim/V-RAG-Final
    else:
        %cd V-RAG-Final
        !git pull
        %cd ..

    %cd V-RAG-Final

    # Install all requirements
    !pip install -q -r requirements.txt

    sys.path.append('/content/V-RAG-Final/src')
else:
    # Local environment
    sys.path.append('/Users/senda.yildirim/Desktop/V-RAG-Final/src')
    print("Local environment: install the requirements manually with pip install -r requirements.txt")

print("Libraries installed!")

Cloning into 'V-RAG-Final'...
remote: Enumerating objects: 112, done.
remote: Counting objects: 100% (112/112), done.
remote: Compressing objects: 100% (112/112), done.
remote: Total 112 (delta 54), reused 0 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (112/112), 690.60 KiB | 6.06 MiB/s, done.
Resolving deltas: 100% (54/54), done.
/content/V-RAG-Final

### 1.2 Import Required Modules

In [None]:
from data_loader import DataLoader
from chunker_v2 import HierarchicalChunker
from vector_store_v2 import VectorStore
from rag_pipeline_v2 import RAGPipeline
from baseline_model_v2 import BaselineModel
from metrics import MetricsEvaluator
from experiment_runner_v2 import ExperimentRunner

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import json
import torch


print("Modules loaded!")
print(f"GPU available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

### 1.3 Hugging Face Login (for Gemma-3)

In [None]:
from huggingface_hub import login
login(new_session=False)

---
## 2. Data Preparation

### 2.1 Download the Book and Questions

In [None]:
loader = DataLoader(data_dir="data")
data_paths = loader.load_all_data()

print("\nDownloaded files:")
print(f"  Book: {data_paths['book']}")
print(f"  Test: {data_paths['test']}")

### 2.2 Load Test Data

In [None]:
test_df = pd.read_csv(data_paths['test'])

print(f"Total test questions: {len(test_df)}")
print("\nFirst 3 questions:")
test_df[['question', 'answer1', 'answer2']].head(3)

---
## 3. Hierarchical Chunking (LlamaIndex)

### 3.1 Chunking with HierarchicalNodeParser

In [None]:
with open(data_paths['book'], 'r', encoding='utf-8') as f:
    book_text = f.read()

print(f"Book length: {len(book_text)} characters")

chunker = HierarchicalChunker(
    parent_size=2048,
    child_size=512,
    chunk_overlap=100
)

nodes, node_mapping = chunker.chunk_text(book_text)

stats = chunker.get_chunk_stats(nodes)
print("\nNode statistics:")
for key, value in stats.items():
    # Floats get one decimal place; ints and dicts are printed as-is
    if isinstance(value, float):
        print(f"  {key}: {value:.1f}")
    else:
        print(f"  {key}: {value}")

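`HierarchicalChunker` wraps LlamaIndex's `HierarchicalNodeParser`, and its internals are not shown here. The sketch below illustrates the two-level idea in plain Python under simplifying assumptions: sizes are counted in characters (LlamaIndex counts tokens and splits on sentence boundaries), the same overlap is applied at both levels, and the names `Chunk`, `sliding_windows` and `hierarchical_chunks` are hypothetical. Each child keeps a `parent_id`, which is what later lets retrieval climb from a matched child to its parent.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Chunk:
    """Hypothetical node: an id, its text, an optional parent id and its child ids."""
    node_id: str
    text: str
    parent_id: Optional[str] = None
    child_ids: list = field(default_factory=list)


def sliding_windows(text: str, size: int, overlap: int) -> list:
    """Split text into character windows of length `size` that overlap by `overlap`."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    windows = []
    start = 0
    while start < len(text):
        windows.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return windows


def hierarchical_chunks(text: str, parent_size: int, child_size: int, overlap: int):
    """Two-level chunking: parents cover the whole text, children cover each parent."""
    nodes = {}
    for p_idx, parent_text in enumerate(sliding_windows(text, parent_size, overlap)):
        parent = Chunk(node_id=f"p{p_idx}", text=parent_text)
        nodes[parent.node_id] = parent
        for c_idx, child_text in enumerate(sliding_windows(parent_text, child_size, overlap)):
            child = Chunk(node_id=f"p{p_idx}-c{c_idx}", text=child_text, parent_id=parent.node_id)
            parent.child_ids.append(child.node_id)
            nodes[child.node_id] = child
    # Leaves are the nodes that have a parent; only these would be embedded and searched
    leaves = [n for n in nodes.values() if n.parent_id is not None]
    return nodes, leaves
```

Calling `hierarchical_chunks(book_text, 2048, 512, 100)` would loosely mirror the configuration above, apart from the characters-versus-tokens difference.
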
### 3.2 Inspect the Node Structure

In [None]:
from llama_index.core.node_parser import get_leaf_nodes

leaf_nodes = get_leaf_nodes(nodes)

sample_child = leaf_nodes[0]
print("Sample child node:")
print(f"  Node ID: {sample_child.node_id}")
print(f"  Chapter: {sample_child.metadata.get('chapter')}")
print(f"  Chapter Title: {sample_child.metadata.get('chapter_title')}")
print(f"  Text length: {len(sample_child.text)} characters")
print(f"  Text (first 200 characters): {sample_child.text[:200]}...")

if hasattr(sample_child, 'parent_node') and sample_child.parent_node:
    parent_id = sample_child.parent_node.node_id
    print(f"\n  Parent Node ID: {parent_id}")

    if parent_id in node_mapping:
        parent_node = node_mapping[parent_id]
        print(f"  Parent text length: {len(parent_node.text)} characters")
        print(f"  Parent text (first 100 characters): {parent_node.text[:100]}...")
    else:
        print("  Parent node not found in node_mapping")

---
## 4. Vector Store (Milvus + bge-large)

### 4.1 Create the Milvus Vector Store

In [None]:
vector_store = VectorStore(
    db_path="./milvus_rag.db",
    model_name="BAAI/bge-large-en-v1.5"
)

print(f"Embedding dimension: {vector_store.embedding_dim}")

### 4.2 Index the Nodes

In [None]:
import time

start = time.time()
vector_store.create_index(nodes, node_mapping)
indexing_time = time.time() - start

print(f"\nTotal indexing time: {indexing_time:.2f}s")

stats = vector_store.get_stats()
print("\nVector Store Stats:")
for key, value in stats.items():
    print(f"  {key}: {value}")
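Retrieval starts with a similarity search over the child embeddings. Milvus does this with an index; the brute-force sketch below only shows the scoring rule, assuming cosine similarity (bge embeddings are usually L2-normalised, so cosine and inner product agree, but we have not checked which metric `VectorStore` configures). `normalize` and `top_k_cosine` are hypothetical helpers, not part of the project's code.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else list(vec)


def top_k_cosine(query, corpus, k=3):
    """Return the k (node_id, score) pairs most similar to `query`.

    `corpus` maps node ids to embedding vectors. Once both sides are
    normalised, cosine similarity is just the dot product.
    """
    q = normalize(query)
    scored = (
        (node_id, sum(a * b for a, b in zip(q, normalize(vec))))
        for node_id, vec in corpus.items()
    )
    return heapq.nlargest(k, scored, key=lambda item: item[1])
```

A real index avoids scoring every vector, but the ranking it approximates is the same.
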

### 4.3 AutoMergingRetriever Test

In [None]:
test_query = "What is the title of this story?"

_, results = vector_store.hybrid_search(
    query=test_query,
    top_parents=3
)

print(f"Test query: {test_query}")
print(f"\nAutoMerging results ({len(results)} nodes):")
for i, result in enumerate(results, 1):
    node_type = "PARENT" if result.get('is_parent', False) else "CHILD"
    print(f"\n{i}. {node_type} (Chapter {result['chapter']}, Score: {result['score']:.4f})")
    print(f"   Text: {result['text'][:150]}...")
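`hybrid_search` returns a mix of `PARENT` and `CHILD` nodes. The sketch below shows the auto-merging rule roughly as LlamaIndex's `AutoMergingRetriever` applies it: when more than a threshold fraction of a parent's children are retrieved (the retriever's `simple_ratio_thresh`, 0.5 by default as far as we know), those children are replaced by the parent. Scoring the merged parent by the mean of its children's scores is an assumption here, and `auto_merge` is a hypothetical helper, not the project's implementation.

```python
from collections import defaultdict


def auto_merge(retrieved_children, child_to_parent, parent_children, ratio_threshold=0.5):
    """Replace retrieved children by their parent when enough siblings were hit.

    retrieved_children: list of (child_id, score) from the leaf-level search.
    child_to_parent: maps each child id to its parent id.
    parent_children: maps each parent id to the list of all its child ids.
    Returns (node_id, score, is_parent) tuples sorted by score, highest first.
    """
    hits_by_parent = defaultdict(list)
    for child_id, score in retrieved_children:
        hits_by_parent[child_to_parent[child_id]].append((child_id, score))

    merged = []
    for parent_id, hits in hits_by_parent.items():
        ratio = len(hits) / len(parent_children[parent_id])
        if ratio > ratio_threshold:
            # Assumption: the merged parent inherits the mean score of its retrieved children
            mean_score = sum(s for _, s in hits) / len(hits)
            merged.append((parent_id, mean_score, True))
        else:
            merged.extend((cid, s, False) for cid, s in hits)
    return sorted(merged, key=lambda item: item[1], reverse=True)
```

For example, if 2 of a parent's 3 children are retrieved (ratio 0.67), the parent is returned instead; 1 of 2 (ratio 0.5) does not exceed the threshold, so that child stays as-is.
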

---
## 5. Baseline Model (without RAG)

### 5.1 Create the Baseline Model

In [None]:
baseline = BaselineModel(model_name="google/gemma-3-1b-it")

### 5.2 Answer the Test Questions with the Baseline

In [None]:
from performance_monitor import PeakMemoryMonitor
import time

questions = test_df['question'].tolist()

baseline_monitor = PeakMemoryMonitor()
baseline_monitor.record()

print("Answering questions with the baseline model...")

start_time = time.time()
baseline_results = baseline.batch_answer_questions(
    questions,
    max_new_tokens=100,
    memory_monitor=baseline_monitor
)
baseline_inference_time = time.time() - start_time

baseline_memory_snapshot = baseline_monitor.record()
baseline_memory_summary = baseline_monitor.summary()
baseline_memory_used = baseline_memory_summary['memory_used_mb']
baseline_initial_memory = baseline_memory_summary['initial_memory_mb']
baseline_peak_memory = baseline_memory_summary['peak_memory_mb']
baseline_end_memory = baseline_memory_snapshot.current_mb

print(f"{len(baseline_results)} questions answered!")

# Metrics
print("\n" + "="*60)
print("BASELINE MODEL PERFORMANCE METRICS")
print("="*60)
print(f"Total inference time: {baseline_inference_time:.2f} seconds")
print(f"Average time per question: {baseline_inference_time/len(questions):.2f} seconds")
print(f"Memory used (peak - initial): {baseline_memory_used:.2f} MB")
print(f"Initial memory: {baseline_initial_memory:.2f} MB")
print(f"Peak memory: {baseline_peak_memory:.2f} MB")
print(f"End memory: {baseline_end_memory:.2f} MB")
print("="*60)

print("Sample baseline answers:")
for i, result in enumerate(baseline_results[:3], 1):
    print(f"{i}. Question: {result['question']}")
    print(f"   Answer: {result['answer']}")

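`PeakMemoryMonitor` comes from the project's `performance_monitor` module, whose source is not shown. Judging only from how it is used above (`record()` returns a snapshot with `current_mb`, `summary()` returns initial, peak and used memory), a monitor of this kind can be sketched as below. The class and helper names are hypothetical, and the default reader uses `tracemalloc`, which sees only Python-heap allocations and not model weights on the GPU; the real monitor presumably reads process or GPU memory instead. Because peaks are caught only at sample points, passing the monitor into `batch_answer_questions` (presumably so it samples during the run) gives a tighter peak than sampling only before and after.

```python
import tracemalloc
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class MemorySnapshot:
    current_mb: float


def _python_heap_mb() -> float:
    """Current Python-heap usage in MB as seen by tracemalloc (tracing starts on first call)."""
    if not tracemalloc.is_tracing():
        tracemalloc.start()
    current_bytes, _peak_bytes = tracemalloc.get_traced_memory()
    return current_bytes / (1024 * 1024)


class SketchPeakMemoryMonitor:
    """Hypothetical stand-in for PeakMemoryMonitor: stores samples and reports the peak."""

    def __init__(self, reader: Optional[Callable[[], float]] = None):
        self._read = reader or _python_heap_mb
        self.samples: List[float] = []

    def record(self) -> MemorySnapshot:
        value = self._read()
        self.samples.append(value)
        return MemorySnapshot(current_mb=value)

    def summary(self) -> dict:
        if not self.samples:
            raise RuntimeError("call record() at least once before summary()")
        initial = self.samples[0]
        peak = max(self.samples)
        return {
            "initial_memory_mb": initial,
            "peak_memory_mb": peak,
            "memory_used_mb": peak - initial,
        }
```

Injecting the `reader` keeps the sampling logic independent of what is measured (Python heap, process RSS or GPU memory).
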

In [None]:
baseline_df = pd.DataFrame(baseline_results)
os.makedirs("results", exist_ok=True)
baseline_df.to_csv("results/baseline_QA.csv", index=False)
print("Baseline results saved: results/baseline_QA.csv")

---
## 6. RAG Pipeline (AutoMergingRetriever)

### 6.1 Create the RAG Pipeline

In [None]:
rag_pipeline = RAGPipeline(
    vector_store=vector_store,
    model_name="google/gemma-3-1b-it",
    temperature=0.5
)
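`RAGPipeline` retrieves context and passes it to Gemma together with the question; its prompt template is not shown in this notebook. A minimal sketch of the assembly step, assuming a numbered-passage template and a character budget (both hypothetical, as is the `build_rag_prompt` name):

```python
def build_rag_prompt(question: str, contexts: list, max_context_chars: int = 6000) -> str:
    """Assemble a grounded QA prompt from retrieved passages (hypothetical template).

    Passages are added in retrieval order until the character budget is hit;
    the budget counts passage characters only, not the separators between them.
    """
    selected, used = [], 0
    for i, passage in enumerate(contexts, 1):
        block = f"[{i}] {passage.strip()}"
        if used + len(block) > max_context_chars:
            break
        selected.append(block)
        used += len(block)
    context_text = "\n\n".join(selected)
    return (
        "Answer the question using only the context below. "
        "Reply with a short answer.\n\n"
        f"Context:\n{context_text}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

In practice the resulting string would usually be wrapped in the model's chat format (for example with the tokenizer's `apply_chat_template` in `transformers`) before generation.
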

### 6.2 Answer the Test Questions with RAG

In [None]:
from performance_monitor import PeakMemoryMonitor
import time

rag_monitor = PeakMemoryMonitor()
rag_monitor.record()

print("Answering questions with the RAG pipeline...")

start_time = time.time()
rag_results = rag_pipeline.batch_answer_questions(
    questions,
    top_k=3,
    max_new_tokens=100,
    memory_monitor=rag_monitor
)
rag_inference_time = time.time() - start_time

rag_memory_snapshot = rag_monitor.record()
rag_memory_summary = rag_monitor.summary()
rag_memory_used = rag_memory_summary['memory_used_mb']
rag_initial_memory = rag_memory_summary['initial_memory_mb']
rag_peak_memory = rag_memory_summary['peak_memory_mb']
rag_end_memory = rag_memory_snapshot.current_mb

print(f"{len(rag_results)} questions answered!")

# Metrics
print("\n" + "="*60)
print("RAG PIPELINE PERFORMANCE METRICS")
print("="*60)
print(f"Total inference time: {rag_inference_time:.2f} seconds")
print(f"Average time per question: {rag_inference_time/len(questions):.2f} seconds")
print(f"Memory used (peak - initial): {rag_memory_used:.2f} MB")
print(f"Initial memory: {rag_initial_memory:.2f} MB")
print(f"Peak memory: {rag_peak_memory:.2f} MB")
print(f"End memory: {rag_end_memory:.2f} MB")
print("="*60)

print("Sample RAG answers:")
for i, result in enumerate(rag_results[:3], 1):
    print(f"{i}. Question: {result['question']}")
    print(f"   Answer: {result['answer']}")
    print(f"   Context (first 100 characters): {result['context'][:100]}...")


In [None]:
rag_results_df = pd.DataFrame(rag_results)
rag_results_df.to_csv("results/RAG_QA.csv", index=False)
rag_results_df  # only the last expression in a cell is displayed


### 6.3 Performance Metrics Comparison

In [None]:
try:
    baseline_inference = globals().get('baseline_inference_time', 'N/A')
    baseline_mem = globals().get('baseline_memory_used', 'N/A')
    rag_inference = globals().get('rag_inference_time', 'N/A')
    rag_mem = globals().get('rag_memory_used', 'N/A')

    perf_comparison = pd.DataFrame({
        'Model': ['RAG Pipeline', 'Baseline'],
        'Total Inference Time (s)': [
            f"{rag_inference:.2f}" if isinstance(rag_inference, (int, float)) else rag_inference,
            f"{baseline_inference:.2f}" if isinstance(baseline_inference, (int, float)) else baseline_inference
        ],
        'Avg Time per Question (s)': [
            f"{rag_inference/len(questions):.2f}" if isinstance(rag_inference, (int, float)) else 'N/A',
            f"{baseline_inference/len(questions):.2f}" if isinstance(baseline_inference, (int, float)) else 'N/A'
        ],
        'Memory Usage (MB)': [
            f"{rag_mem:.2f}" if isinstance(rag_mem, (int, float)) else rag_mem,
            f"{baseline_mem:.2f}" if isinstance(baseline_mem, (int, float)) else baseline_mem
        ]
    })

    print("\n" + "="*80)
    print("PERFORMANCE METRICS COMPARISON (RAG vs BASELINE)")
    print("="*80)
    print(perf_comparison.to_string(index=False))
    print("="*80)

    fig, axes = plt.subplots(1, 2, figsize=(14, 5))

    models = ['RAG Pipeline', 'Baseline']

    inference_times = [
        rag_inference if isinstance(rag_inference, (int, float)) else 0,
        baseline_inference if isinstance(baseline_inference, (int, float)) else 0
    ]
    memory_usage = [
        rag_mem if isinstance(rag_mem, (int, float)) else 0,
        baseline_mem if isinstance(baseline_mem, (int, float)) else 0
    ]

    # Inference time
    axes[0].bar(models, inference_times, color=['green', 'blue'], alpha=0.7)
    axes[0].set_ylabel('Total time (seconds)', fontsize=11)
    axes[0].set_title('Inference Time Comparison', fontsize=12, fontweight='bold')
    axes[0].grid(axis='y', alpha=0.3)

    # Loop variable is `t`, not `time`, so it does not shadow the time module used in later cells
    for i, t in enumerate(inference_times):
        axes[0].text(i, t + max(inference_times)*0.02, f"{t:.1f}s", ha='center', fontsize=10)

    # Memory usage
    axes[1].bar(models, memory_usage, color=['green', 'blue'], alpha=0.7)
    axes[1].set_ylabel('Memory usage (MB)', fontsize=11)
    axes[1].set_title('Memory Usage Comparison', fontsize=12, fontweight='bold')
    axes[1].grid(axis='y', alpha=0.3)

    for i, mem in enumerate(memory_usage):
        axes[1].text(i, mem + abs(max(memory_usage, key=abs))*0.02, f"{mem:.1f}MB", ha='center', fontsize=10)

    plt.tight_layout()
    plt.savefig('results/performance_comparison.png', dpi=300, bbox_inches='tight')
    plt.show()

    print("\nPlot saved: results/performance_comparison.png")

except NameError as e:
    print("WARNING: some variables are not defined. Run sections 5.2 and 6.2 first.")
    print(f"Error: {e}")

---
## 7. Performance Evaluation

### 7.1 BLEU and ROUGE Metrics

In [None]:
evaluator = MetricsEvaluator()

comparison = evaluator.compare_models(
    rag_results=rag_results,
    baseline_results=baseline_results,
    ground_truth=test_df
)

evaluator.print_comparison(comparison)

evaluator.save_results(comparison, "results/rag_vs_baseline.json")
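`MetricsEvaluator` computes BLEU and ROUGE, but its implementation is not shown. To make the ROUGE numbers concrete, the sketch below computes ROUGE-1 (unigram overlap) and ROUGE-L (longest common subsequence) F1 from scratch. Assumptions: lowercased whitespace tokenisation with no stemming (libraries such as `rouge-score` tokenise differently, so scores will not match exactly), and keeping the better score over NarrativeQA's two reference answers (`answer1`, `answer2`), which we assume but have not confirmed the evaluator does. All function names are hypothetical.

```python
from collections import Counter


def _tokens(text: str) -> list:
    return text.lower().split()


def rouge1_f1(candidate: str, reference: str) -> float:
    """F1 of clipped unigram overlap between candidate and reference."""
    cand, ref = Counter(_tokens(candidate)), Counter(_tokens(reference))
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def lcs_length(a: list, b: list) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rougeL_f1(candidate: str, reference: str) -> float:
    """F1 based on the LCS of the candidate and reference token sequences."""
    cand, ref = _tokens(candidate), _tokens(reference)
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def best_over_references(metric, candidate: str, references: list) -> float:
    """Score against every reference answer and keep the best."""
    return max(metric(candidate, ref) for ref in references)
```

For example, "the cat sat" against "the cat" has precision 2/3 and recall 1, so ROUGE-1 F1 is 0.8. ROUGE-L rewards in-order matches even when they are not contiguous.
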

### 7.2 Visualize the Results

In [None]:
metrics = ['bleu', 'rouge1', 'rouge2', 'rougeL']
rag_scores = [comparison['rag'][m] for m in metrics]
baseline_scores = [comparison['baseline'][m] for m in metrics]

x = range(len(metrics))
width = 0.35

fig, ax = plt.subplots(figsize=(10, 6))
ax.bar([i - width/2 for i in x], rag_scores, width, label='RAG', color='green')
ax.bar([i + width/2 for i in x], baseline_scores, width, label='Baseline', color='blue')

ax.set_xlabel('Metrics', fontsize=12)
ax.set_ylabel('Score', fontsize=12)
ax.set_title('RAG vs Baseline Performance Comparison', fontsize=14, fontweight='bold')
ax.set_xticks(x)
ax.set_xticklabels([m.upper() for m in metrics])
ax.legend()
ax.grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.savefig('results/rag_vs_baseline.png', dpi=300, bbox_inches='tight')
plt.show()

print("Plot saved: results/rag_vs_baseline.png")

---
## 8. Hyperparameter Optimization

### 8.1 Grid Search Parameters

In [None]:
# NOTE: this section runs the experiments in parallel with 2 workers. On a T4 machine it struggled and failed, so I restarted the runtime and continued on an A100.

PARENT_SIZES = [2048, 4096]
CHILD_SIZES = [512, 1024]
TEMPERATURES = [0.1, 0.3]
CHUNK_OVERLAPS = [0, 100, 200]

runner = ExperimentRunner(
    book_path=data_paths['book'],
    test_questions_path=data_paths['test'],
    results_dir="results/experiments_v2"
)

print("Starting grid search...")
print(f"Total index builds: {len(PARENT_SIZES) * len(CHILD_SIZES) * len(CHUNK_OVERLAPS)}")
print(f"Total experiments: {len(PARENT_SIZES) * len(CHILD_SIZES) * len(TEMPERATURES) * len(CHUNK_OVERLAPS)}")
print("\nNOTE: each index is built once and then evaluated at every temperature")

all_results = runner.run_grid_search(
    parent_sizes=PARENT_SIZES,
    child_sizes=CHILD_SIZES,
    temperatures=TEMPERATURES,
    chunk_overlaps=CHUNK_OVERLAPS
)

runner.save_summary(all_results, summary_filename="experiment_summary_v2")
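The counts printed above follow from grouping the grid by index configuration: chunking and embedding depend only on (`parent_size`, `child_size`, `chunk_overlap`), while `temperature` only affects generation, so each index can be reused for every temperature. A sketch of that grouping, with a hypothetical `plan_grid` helper (the actual loop order inside `ExperimentRunner` is not shown):

```python
from itertools import product


def plan_grid(parent_sizes, child_sizes, overlaps, temperatures):
    """Map each index configuration to the temperatures evaluated on it."""
    plan = {}
    for parent, child, overlap in product(parent_sizes, child_sizes, overlaps):
        # Build this index once, then run generation once per temperature
        plan[(parent, child, overlap)] = list(temperatures)
    return plan


plan = plan_grid([2048, 4096], [512, 1024], [0, 100, 200], [0.1, 0.3])
print(len(plan))                           # 12 index builds
print(sum(len(t) for t in plan.values()))  # 24 experiments
```

Since index construction (embedding the whole book) dominates the cost, reusing each index across temperatures halves the number of builds compared with a flat 24-way loop.
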

### 8.2 Find the Best Parameters

In [None]:
exp_df = pd.read_csv('results/experiments_v2/experiment_summary_v2.csv')

# Append the baseline as an extra row. A one-row DataFrame (instead of
# Series.to_frame().T) and numeric values (instead of formatted strings)
# keep the metric and timing columns numeric after the concat.
baseline_row = pd.DataFrame([{
    'parent_size': 'Baseline',
    'child_size': 'Baseline',
    'temperature': 0.5,
    'chunk_overlap': 0,
    'bleu': comparison['baseline']['bleu'],
    'rouge1': comparison['baseline']['rouge1'],
    'rouge2': comparison['baseline']['rouge2'],
    'rougeL': comparison['baseline']['rougeL'],
    'avg_question_time': round(baseline_inference_time / len(questions), 2),
    'inference_time': round(baseline_inference_time, 2),
    'memory_used_mb': round(baseline_memory_used, 2),
    'total_time': float('nan')
}])

exp_df_final = pd.concat([exp_df, baseline_row], ignore_index=True)



In [None]:
exp_df_final.to_csv("grid_search_results.csv", index=False)
exp_df_final
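The cells above assemble the comparison table; picking the winning configuration from it can be sketched with pandas as below: for each metric, take the row with the highest score. `best_configs` is a hypothetical helper, and it assumes the summary CSV has the `parent_size`, `child_size`, `chunk_overlap` and `temperature` columns that the baseline row above also uses.

```python
import pandas as pd


def best_configs(df: pd.DataFrame, metrics=("bleu", "rouge1", "rouge2", "rougeL")) -> pd.DataFrame:
    """For each metric, return the configuration with the highest score (ties keep the first row)."""
    # Coerce to numbers in case a column ended up with object dtype
    scores = df[list(metrics)].apply(pd.to_numeric, errors="coerce")
    rows = []
    for metric in metrics:
        idx = scores[metric].idxmax()
        row = df.loc[idx, ["parent_size", "child_size", "chunk_overlap", "temperature"]].to_dict()
        row.update({"metric": metric, "score": scores.loc[idx, metric]})
        rows.append(row)
    return pd.DataFrame(rows)
```

Running `best_configs(exp_df_final)` keeps the baseline row in the candidate set, so the output also shows whether any RAG configuration actually beats the baseline on a given metric.
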

In [None]:
exp_df_final = exp_df_final.copy()
exp_df_final['experiment_name'] = [f"exp_{i+1}" for i in range(len(exp_df_final))]

metrics = ['bleu', 'rougeL']

plt.figure(figsize=(15, 8))

for metric in metrics:
    # Coerce to float in case the concat left the column with object dtype
    scores = pd.to_numeric(exp_df_final[metric], errors='coerce')
    plt.plot(
        exp_df_final['experiment_name'],
        scores,
        marker='o',
        linewidth=2,
        label=metric.upper()
    )

    # Annotate each point with its score
    for x, y in zip(exp_df_final['experiment_name'], scores):
        plt.text(
            x, y + 0.15,
            f"{y:.2f}",
            fontsize=7,
            rotation=45,
            ha='center'
        )

plt.xlabel("Experiment")
plt.ylabel("Score")
plt.title("BLEU / ROUGE-L Comparison Across Experiments")
plt.grid(True, alpha=0.3)
plt.legend()
plt.xticks(rotation=45)
plt.tight_layout()
# Save before plt.show(): in a notebook, show() closes the figure, so a later savefig writes a blank image
plt.savefig('results/rag_vs_baseline_param_opt.png', dpi=300, bbox_inches='tight')
plt.show()

