# Building the Legal Retrieval Pipeline

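This notebook builds and tests the retrieval pipeline end to end: it loads a retriever from configuration, indexes sample documents with FAISS, runs test queries, and saves and reloads the index.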

In [1]:
import sys
sys.path.append('..')

import json
from src.retrieval.retriever import load_retriever_from_config

## 1. Load Configuration and Create Retriever

In [2]:
# Load retriever with configuration
retriever = load_retriever_from_config('../configs/retrieval_config.yaml')

print("Retriever loaded successfully!")

Loading embedding model: sentence-transformers/all-mpnet-base-v2
Model loaded on mps
Embedding dimension: 768
Retriever loaded successfully!


## 2. Index Sample Documents

In [3]:
# Load documents
with open('../data/samples/sample_documents.json', 'r', encoding='utf-8') as f:
    documents = json.load(f)

print(f"Loaded {len(documents)} documents")

# Index documents
print("\nIndexing documents...")
retriever.index_documents(documents, chunk_documents=True)

print(f"\n✅ Indexed {retriever.get_num_documents()} document chunks")

Loaded 10 documents

Indexing documents...
Chunking 10 documents...
Created 10 chunks
Generating embeddings for 10 text segments...


Using CPU index
Created IndexFlatIP index with dimension 768
Added 10 embeddings to index
Total documents in index: 10
Indexing complete. Total documents in index: 10

✅ Indexed 10 document chunks
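
The log above reports an `IndexFlatIP` index, which is FAISS's exact (brute-force) inner-product index. As a rough illustration of what such an index computes, here is a minimal pure-Python sketch. It assumes the embeddings are L2-normalized, in which case the inner product equals cosine similarity; the notebook does not confirm that the retriever normalizes. The function names (`normalize`, `flat_ip_search`) and the toy 3-dimensional vectors are hypothetical and not part of the retriever's code.

```python
import math

def normalize(vec):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def flat_ip_search(index_vectors, query, top_k):
    """Score every stored vector by inner product and return the best top_k.

    Returns a list of (score, position) pairs sorted by descending score.
    """
    scored = [
        (sum(q * v for q, v in zip(query, vec)), pos)
        for pos, vec in enumerate(index_vectors)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Toy stand-ins for the 768-dimensional embeddings (hypothetical data).
stored = [normalize(v) for v in ([1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0])]
query = normalize([1.0, 0.1, 0.0])

hits = flat_ip_search(stored, query, top_k=2)
for score, pos in hits:
    print(f"position={pos} score={score:.4f}")
```

A flat index compares the query against every stored vector, so results are exact but search time grows linearly with the number of chunks. For 10 chunks that cost is negligible.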


## 3. Test Retrieval with Queries

In [4]:
# Test queries
test_queries = [
    "What are the elements of negligence?",
    "How is causation established?",
    "What is res ipsa loquitur?"
]

for query in test_queries:
    print(f"\n{'='*80}")
    print(f"Query: {query}")
    print(f"{'='*80}\n")
    
    results = retriever.retrieve(query, top_k=3)
    
    for i, doc in enumerate(results, 1):
        print(f"{i}. Score: {doc['score']:.4f}")
        print(f"   Title: {doc.get('title', 'N/A')}")
        print(f"   {doc['text'][:150]}...\n")


Query: What are the elements of negligence?

1. Score: 0.6484
   Title: Elements of Negligence
   To establish negligence, a plaintiff must prove four essential elements: (1) duty of care, (2) breach of that duty, (3) causation, and (4) damages. Ea...

2. Score: 0.5998
   Title: Causation Requirements
   Causation in negligence requires both actual cause (cause-in-fact) and proximate cause (legal cause). Actual cause is often determined using the but-f...

3. Score: 0.5985
   Title: Professional Negligence Standards
   Professional negligence, or malpractice, occurs when a professional fails to exercise the degree of skill and care ordinarily expected of members of t...


Query: How is causation established?

1. Score: 0.5564
   Title: Causation Requirements
   Causation in negligence requires both actual cause (cause-in-fact) and proximate cause (legal cause). Actual cause is often determined using the but-f...

2. Score: 0.4499
   Title: Elements of Negligence
   To establish neglig

## 4. Save Index for Later Use

In [5]:
# Save index
retriever.save_index('../data/embeddings')
print("\n✅ Index saved to ../data/embeddings")

Index saved to ../data/embeddings/faiss_index.faiss
Documents saved to ../data/embeddings/documents.pkl

✅ Index saved to ../data/embeddings
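
As the log shows, the save step writes two files: the FAISS index (`faiss_index.faiss`) and the pickled document records (`documents.pkl`). A reload presumably needs both, so a small pre-flight check can give a clearer error than a failed load. The sketch below is one way to do that, with the file names taken from the log output; the helper name `missing_index_files` is hypothetical and not part of the retriever.

```python
from pathlib import Path
import tempfile

# File names taken from the save log above.
EXPECTED_FILES = ("faiss_index.faiss", "documents.pkl")

def missing_index_files(index_dir):
    """Return the expected index files that are absent from index_dir."""
    directory = Path(index_dir)
    return [name for name in EXPECTED_FILES if not (directory / name).is_file()]

# Demonstration in a throwaway directory, using empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    before = missing_index_files(tmp)
    (Path(tmp) / "faiss_index.faiss").touch()
    after_one = missing_index_files(tmp)
    (Path(tmp) / "documents.pkl").touch()
    after_both = missing_index_files(tmp)

print(before)
print(after_one)
print(after_both)
```

Because `documents.pkl` is a pickle file, it should only be loaded from a trusted source: unpickling can execute arbitrary code.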


## 5. Load Index and Test

In [6]:
# Create new retriever and load saved index
new_retriever = load_retriever_from_config('../configs/retrieval_config.yaml')
new_retriever.load_index('../data/embeddings')

print(f"Loaded index with {new_retriever.get_num_documents()} documents")

# Run a query against the reloaded index to confirm it works
query = "What is negligence per se?"
results = new_retriever.retrieve(query, top_k=2)

print(f"\nTest query: {query}")
for i, doc in enumerate(results, 1):
    print(f"\n{i}. Score: {doc['score']:.4f}")
    print(f"   {doc['text'][:100]}...")

Loading embedding model: sentence-transformers/all-mpnet-base-v2
Model loaded on mps
Embedding dimension: 768
Using CPU index
Created IndexFlatIP index with dimension 768
Index loaded from ../data/embeddings/faiss_index.faiss
Total documents in index: 10
Documents loaded from ../data/embeddings/documents.pkl
Loaded index with 10 documents

Test query: What is negligence per se?

1. Score: 0.8270
   Negligence per se occurs when a defendant violates a statute or regulation designed to protect again...

2. Score: 0.6219
   Professional negligence, or malpractice, occurs when a professional fails to exercise the degree of ...
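
A reload test is most convincing when the reloaded retriever reproduces the original retriever's results. The following is a minimal sketch of such a comparison, assuming each result is a dict with `text` and `score` keys as in the cells above. The helper name `same_results` and the tolerance value are hypothetical, and the sample dicts are made-up placeholders rather than real retriever output.

```python
import math

def same_results(first, second, tol=1e-4):
    """Check that two ranked result lists match in order, text, and score."""
    if len(first) != len(second):
        return False
    return all(
        a["text"] == b["text"] and math.isclose(a["score"], b["score"], abs_tol=tol)
        for a, b in zip(first, second)
    )

# Placeholder results standing in for retriever output (not real data).
original = [{"text": "chunk A", "score": 0.80}, {"text": "chunk B", "score": 0.60}]
reloaded = [{"text": "chunk A", "score": 0.80001}, {"text": "chunk B", "score": 0.60}]
reordered = list(reversed(original))

print(same_results(original, reloaded))
print(same_results(original, reordered))
```

In this notebook, the check could be run as `same_results(retriever.retrieve(query, top_k=2), new_retriever.retrieve(query, top_k=2))`, since both `retriever` and `new_retriever` are still in scope.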


## Summary

Retrieval pipeline complete!
- ✅ Built retrieval system
- ✅ Indexed legal documents
- ✅ Tested queries
- ✅ Saved the index and reloaded it for reuse

**Next:** Proceed to `03_self_rag_training.ipynb` to train the Self-RAG models.