# Legal Case-to-Statute Retrieval System
by: Michael Hecher, Oskar Tillmann Winter and Lukas Tesch

The goal of the project is to develop an LLM-based system that finds the German statutes most relevant to a given legal case. We treat the body of German statutes (BGB, GG, VwGO, etc.) as a knowledge base and the legal case text as a query. The system returns the statutes from the knowledge base that are semantically most similar to the query.

It is important to note that the model will not predict case outcomes or provide legal conclusions. Such tasks would require specialized legal expertise that we do not have and cannot train the model for. Our focus is strictly on information retrieval, not legal interpretation.

## 1. Set-Up
Here we import the libraries we use, define the training configuration, and list the base models.

We are using three models:
- all-mpnet-base-v2
    - General purpose. An effective, robust, and balanced model from the Sentence Transformers family.
- nlpaueb/legal-bert-base-uncased
    - Legal domain focus. A standard BERT model pre-trained on a corpus of legal texts. It is not a Sentence Transformer, so we add a mean-pooling layer when loading it.
- Stern5497/sbert-legal-xlm-roberta-base
    - Legal and multilingual focus. Based on XLM-RoBERTa combined with legal-domain pre-training. It is also a Sentence Transformer.
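
Because Legal-BERT ships without a pooling layer, its per-token vectors have to be collapsed into a single sentence vector. The following is a minimal sketch of mean pooling on plain Python lists. The helper `mean_pool` and the toy vectors are illustrative assumptions and not part of the notebook, which uses the `Pooling` module of sentence-transformers for this step.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is ignored)."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        # No real tokens: return a zero vector instead of dividing by zero
        return [0.0] * dim
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


# Toy input: two real tokens and one padding token (hypothetical values)
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
sentence_vector = mean_pool(tokens, mask)
print(sentence_vector)  # [2.0, 3.0]
```

The padding token does not influence the result, which is the reason the attention mask is part of the computation.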

We also define the URLs that we later use to scrape the law texts.
 

In [1]:
# 1. INSTALL NECESSARY LIBRARIES
# !pip install pandas faiss-cpu sentence-transformers requests beautifulsoup4 numpy torch tabulate

# 2. IMPORT of libraries and packages
import pandas as pd
import ast
import faiss
import requests
import re
import json
import pickle
from pathlib import Path
from bs4 import BeautifulSoup
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader
from typing import List, Dict
import numpy as np
import torch

# 3. CONFIGURATION

CONFIG = {
    "cache_dir": Path("./cache"),
    "output_dir": Path("./output"),
    "base_model": "sentence-transformers/all-mpnet-base-v2", 
    "unsup_epochs": 1,
    "sup_epochs": 1,
    "batch_size": 8,
    "warmup_steps": 100,
    "sample_cases": 1000,
    "sample_citations": 66,
}

# Available base models for experimentation
BASE_MODELS = {
    "mpnet": "sentence-transformers/all-mpnet-base-v2",
    "legal-bert": "nlpaueb/legal-bert-base-uncased",
    "legal-sbert": "Stern5497/sbert-legal-xlm-roberta-base",
}

# Create necessary directories
CONFIG["cache_dir"].mkdir(exist_ok=True)
CONFIG["output_dir"].mkdir(exist_ok=True)

# German Statute Pages (Source of Law Texts)
STATUTE_PAGES = {
    "BGB": "https://www.gesetze-im-internet.de/bgb/BJNR001950896.html",
    "GG": "https://www.gesetze-im-internet.de/gg/BJNR000010949.html",
    "VwGO": "https://www.gesetze-im-internet.de/vwgo/BJNR000170960.html",
    "BauGB": "https://www.gesetze-im-internet.de/bbaug/BJNR003410960.html",
    "AsylG": "https://www.gesetze-im-internet.de/asylvfg_1992/BJNR111260992.html",
    "StGB": "https://www.gesetze-im-internet.de/stgb/BJNR001270871.html",
    "ZPO": "https://www.gesetze-im-internet.de/zpo/BJNR005330950.html",
    "AufenthG": "https://www.gesetze-im-internet.de/aufenthg_2004/BJNR195010004.html",
}

print("Setup complete. Configuration loaded and directories created.")


Setup complete. Configuration loaded and directories created.


# 2. Data Preparation

- Here, we load the two datasets we created: one containing raw case texts for unsupervised training and another containing cases with statutory citations for supervised training and testing.
- To support efficient experimentation, we added dataset sampling as well as error handling to detect missing files or malformed inputs.
- We also implemented a robust citation-parsing mechanism that can process JSON strings, Python list literals, delimiter-separated fields, and missing values. This ensures that all citations are normalized into a consistent list-of-strings format for downstream training and evaluation.

In [2]:
# 1. Data Loading with Error Handling

def load_datasets():
    """Load and validate datasets"""
    try:
        cases_df = pd.read_csv("training_cases.csv", sep=";", encoding='utf-8')
        cited_cases_df = pd.read_csv("test_cases.csv", sep=";", encoding='utf-8')
        
        # Validate required columns
        assert "text" in cases_df.columns, "Missing 'text' column in training_cases.csv"
        assert "text" in cited_cases_df.columns, "Missing 'text' column in test_cases.csv"
        assert "citations" in cited_cases_df.columns, "Missing 'citations' column"
        
        # Sample data
        cases_sample = cases_df.sample(
            min(CONFIG["sample_cases"], len(cases_df)), 
            random_state=42
        )["text"].tolist()
        
        cited_cases_sample = cited_cases_df.sample(
            min(CONFIG["sample_citations"], len(cited_cases_df)),
            random_state=42
        )
        
        print(f"Loaded {len(cases_sample)} training cases (unsupervised sample)")
        print(f"Loaded {len(cited_cases_sample)} citation cases (supervised/test sample)")
        
        return cases_sample, cited_cases_sample
        
    except Exception as e:
        print(f"Error loading datasets: {e}")
        raise

# 2. Robust Citation Parsing

def parse_citations(citations_raw) -> List[str]:
    """Safely parse citations from various formats"""
    if pd.isna(citations_raw):
        return []
    
    if isinstance(citations_raw, list):
        return citations_raw
    
    if isinstance(citations_raw, str):
        # Try JSON first
        try:
            parsed = json.loads(citations_raw)
            return parsed if isinstance(parsed, list) else [parsed]
        except json.JSONDecodeError:
            pass
        
        # Try ast.literal_eval
        try:
            parsed = ast.literal_eval(citations_raw)
            return parsed if isinstance(parsed, list) else [parsed]
        except (ValueError, SyntaxError):
            pass
        
        # Fallback: split on common separators (';' first, because the CSV itself is ';'-delimited)
        if ";" in citations_raw:
            return [c.strip() for c in citations_raw.split(";") if c.strip()]
        elif "," in citations_raw:
            return [c.strip() for c in citations_raw.split(",") if c.strip()]
        else:
            return [citations_raw.strip()]
    
    return []

# Load data immediately
try:
    cases_sample, cited_cases_sample = load_datasets()
except Exception:
    print("Data loading failed. Please ensure 'training_cases.csv' and 'test_cases.csv' are present.")

Loaded 1000 training cases (unsupervised sample)
Loaded 66 citation cases (supervised/test sample)
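
To make the parsing fallbacks concrete, here is a compact, self-contained sketch of the same idea with a few sample inputs. `normalize_citations` is a hypothetical stand-in written for this illustration (it avoids the pandas dependency) and is not the notebook's `parse_citations`; the sample strings are made up.

```python
import ast
import json


def normalize_citations(raw):
    """Turn a raw citation field into a list (hypothetical helper)."""
    if raw is None or raw != raw:  # None, or NaN (NaN is never equal to itself)
        return []
    if isinstance(raw, list):
        return raw
    # Try structured formats first: JSON, then Python literals
    for loader in (json.loads, ast.literal_eval):
        try:
            parsed = loader(raw)
            return parsed if isinstance(parsed, list) else [parsed]
        except (ValueError, SyntaxError):
            continue
    # Last resort: delimiter-separated text
    separator = ";" if ";" in raw else ","
    return [part.strip() for part in raw.split(separator) if part.strip()]


samples = [
    '["§ 433 BGB", "§ 80 VwGO"]',   # JSON list
    "['§ 433 BGB', 'Art. 5 GG']",   # Python list literal
    "§ 433 BGB; § 80 VwGO",         # delimiter-separated
    float("nan"),                   # missing value
]
for sample in samples:
    print(repr(sample), "->", normalize_citations(sample))
```

Every input ends up as a list, so the training and evaluation code never has to branch on the storage format.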


# 3. Model Training

Our system uses a Sentence Transformer, an encoder-only Large Language Model (LLM), and trains it in two stages:
- #### Stage 1: Unsupervised Pre-training
    - We load a transformer-based sentence embedding model; the loader supports several architectures.
    - We run unsupervised contrastive pre-training in which each case text is paired with itself to form a positive example.
    - Method: Multiple Negatives Ranking Loss (MNRL), which treats the other texts in the same batch as negatives.
    - The model learns domain-specific semantic representations of legal case language.
- #### Stage 2: Supervised Fine-tuning
    - Method: We take the ground-truth case-citation pairs from test_cases.csv and continue training with Multiple Negatives Ranking Loss.
    - This strengthens the model's ability to discriminate between relevant and irrelevant legal provisions.
    - Together, the two stages let the model capture both the general linguistic characteristics of case texts and the specific relationships between cases and their statutes.
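
The loss used in both stages can be sketched in a few lines. The pure-Python version below is an illustration under stated assumptions: `mnrl_loss` and `cosine` are hypothetical helpers, the toy vectors stand in for real embeddings, and the scale factor of 20 is an assumed value. For each anchor, the matching positive should score higher than every other positive in the batch, which is a cross-entropy over the row of scaled cosine similarities.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mnrl_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]."""
    per_anchor = []
    for i, anchor in enumerate(anchors):
        scores = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over all candidates in the batch
        top = max(scores)
        log_denominator = top + math.log(sum(math.exp(s - top) for s in scores))
        per_anchor.append(log_denominator - scores[i])
    return sum(per_anchor) / len(per_anchor)


anchors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shuffled = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]

print(mnrl_loss(anchors, aligned))   # very close to 0: each anchor prefers its own positive
print(mnrl_loss(anchors, shuffled))  # roughly 20: each anchor prefers a wrong positive
```

In Stage 1 the anchor and the positive are the same text, so the two encodings presumably differ only through dropout noise during training. In Stage 2 the positive is the cited provision.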

In [3]:
# 1. Improved Unsupervised Pre-training

def load_base_model(model_name: str = None):
    """Load base model with support for multiple architectures"""
    if model_name is None:
        model_name = CONFIG["base_model"]
    
    print(f"Loading base model: {model_name}")
    
    # Handle Legal-BERT which needs custom pooling
    if "legal-bert" in model_name and "sbert" not in model_name:
        from sentence_transformers import models as st_models
        word_embedding_model = st_models.Transformer(model_name)
        pooling_model = st_models.Pooling(
            word_embedding_model.get_word_embedding_dimension(),
            pooling_mode_mean_tokens=True
        )
        model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
    else:
        # Standard sentence-transformers models
        model = SentenceTransformer(model_name)
    
    return model

def pretrain_model(cases_sample: List[str]):
    """Unsupervised contrastive pre-training"""
    print("\n=== Stage 1: Unsupervised Pre-training ===")
    
    model_path = CONFIG["output_dir"] / "mpnet-unsupervised"
    
    # Load base model
    model = load_base_model()
    
    # Create contrastive examples (text paired with itself)
    unsup_examples = [InputExample(texts=[text, text]) for text in cases_sample]
    unsup_dataloader = DataLoader(
        unsup_examples, 
        shuffle=True, 
        batch_size=CONFIG["batch_size"]
    )
    
    unsup_loss = losses.MultipleNegativesRankingLoss(model)
    
    # Train
    model.fit(
        train_objectives=[(unsup_dataloader, unsup_loss)],
        epochs=CONFIG["unsup_epochs"],
        warmup_steps=CONFIG["warmup_steps"],
        show_progress_bar=True,
        output_path=str(model_path)
    )
    
    print(f"Saved unsupervised model to {model_path}")
    return model


# 2. Supervised Fine-tuning with In-Batch Negatives

def finetune_model(model, cited_cases_sample):
    """Supervised fine-tuning with case-citation pairs"""
    print("\n=== Stage 2: Supervised Fine-tuning ===")
    
    train_examples = []
    skipped = 0
    
    for _, row in cited_cases_sample.iterrows():
        case_text = row["text"]
        citations = parse_citations(row["citations"])
        
        if not citations:
            skipped += 1
            continue
        
        for citation in citations:
            if isinstance(citation, str) and len(citation) > 3:
                # Every pair is a positive (relevant) pair; MultipleNegativesRankingLoss does not use the label value
                train_examples.append(
                    InputExample(texts=[case_text, citation], label=1.0)
                )
    
    print(f"Created {len(train_examples)} training pairs (skipped {skipped} cases)")
    
    if len(train_examples) == 0:
        print("WARNING: No training examples created!")
        return model
    
    train_dataloader = DataLoader(
        train_examples, 
        shuffle=True, 
        batch_size=CONFIG["batch_size"]
    )
    # MultipleNegativesRankingLoss treats the other positives in the batch as in-batch negatives (random, not mined hard negatives)
    train_loss = losses.MultipleNegativesRankingLoss(model)
    
    model.fit(
        train_objectives=[(train_dataloader, train_loss)],
        epochs=CONFIG["sup_epochs"],
        warmup_steps=CONFIG["warmup_steps"],
        show_progress_bar=True
    )
    
    model_path = CONFIG["output_dir"] / "fine_tuned_case_to_law"
    model.save(str(model_path))
    print(f"Saved fine-tuned model to {model_path}")
    
    return model

# 4. HTML Scraping of Law Texts

- We retrieve German statutory texts (e.g., BGB, GG, StGB) directly from gesetze-im-internet.de using requests and BeautifulSoup.
- Individual legal provisions are identified by extracting section markers such as “§ 433 BGB” or “Art. 5 GG” from the HTML structure, using element IDs, heading tags (h1–h4, strong, b, span) and regular-expression matching.
    - h1–h4 are heading levels: h1 is a main title, h4 a smaller subsection.
    - strong and b make text bold.
    - span is a general inline container often used to highlight or structure text.
- The associated paragraph content is then collected, cleaned, and normalized to form coherent statute sections.
- When the structured extraction yields fewer than 10 sections, a fallback strategy based on splitting the raw page text by “§” markers is used to recover additional provisions.
- All extracted statutes are stored in a standardized format with citation, text, and law abbreviation. A minimal fallback corpus keeps the system usable when scraping fails.
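
The two extraction ideas described above can be tried on plain strings without any network access. This is a minimal sketch: the helper names `heading_to_citation` and `split_by_markers` and the sample strings are assumptions made for illustration, while the regular expressions follow the ones used in the scraping cell.

```python
import re

SECTION_PATTERN = re.compile(r'(§|Art\.|Artikel)\s*(\d+[a-z]*)')


def heading_to_citation(heading_text, abbr):
    """Build a citation such as '§ 433 BGB' from a heading, or None if no marker is found."""
    match = SECTION_PATTERN.search(heading_text)
    if not match:
        return None
    return f"{match.group(1)} {match.group(2)} {abbr}"


def split_by_markers(raw_text, abbr):
    """Fallback: split raw text at '§ <number>' markers and pair each marker with its text."""
    # A capturing group makes re.split keep the markers: [prefix, marker, text, marker, text, ...]
    parts = re.split(r'(§\s*\d+[a-z]*)', raw_text)
    sections = []
    for i in range(1, len(parts) - 1, 2):
        marker = re.sub(r'§\s*(\d)', r'§ \1', parts[i].strip())
        text = re.sub(r'\s+', ' ', parts[i + 1]).strip()
        sections.append({"citation": f"{marker} {abbr}", "text": text, "law": abbr})
    return sections


print(heading_to_citation("§ 433 Vertragstypische Pflichten beim Kaufvertrag", "BGB"))  # § 433 BGB
print(heading_to_citation("Art. 5 Meinungsfreiheit", "GG"))                              # Art. 5 GG
print(heading_to_citation("Art 5", "GG"))  # None: this pattern needs the dot after "Art"
print(split_by_markers("Vorwort § 1 Erster Text. §2 Zweiter Text.", "BGB"))
```

Note that the heading pattern only accepts "Art." with a trailing dot, so headings written as "Art 5" would be skipped unless the pattern is relaxed.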

In [4]:
# HTML Scraping

def fetch_statute_sections(abbr: str, url: str, use_cache=True) -> List[Dict]:
    """Fetch statute sections with caching and improved parsing"""
    cache_file = CONFIG["cache_dir"] / f"{abbr}_sections.pkl"
    
    # Try cache first
    if use_cache and cache_file.exists():
        print(f"Loading {abbr} from cache...")
        with open(cache_file, "rb") as f:
            return pickle.load(f)
    
    try:
        print(f"Fetching {abbr} from web...")
        resp = requests.get(url, timeout=30)
        
        if resp.status_code != 200:
            print(f"Could not fetch {abbr} (status {resp.status_code})")
            return []
        
        soup = BeautifulSoup(resp.text, "html.parser")
        sections = []
        
        # Strategy 1: Look for section numbers in headings or IDs
        for para in soup.find_all(['div', 'article', 'section']):
            title = None
            if para.get('id'):
                section_id = para.get('id', '')
                match = re.search(r'§\s*(\d+[a-z]*)', section_id, re.IGNORECASE)
                if match:
                    title = f"§ {match.group(1)} {abbr}"
            
            if not title:
                for heading_tag in ['h1', 'h2', 'h3', 'h4', 'strong', 'b', 'span']:
                    heading = para.find(heading_tag)
                    if heading:
                        heading_text = heading.get_text(strip=True)
                        match = re.search(r'(§|Art\.|Artikel)\s*(\d+[a-z]*)', heading_text)
                        if match:
                            section_marker = match.group(1)
                            section_num = match.group(2)
                            title = f"{section_marker} {section_num}"
                            if abbr not in title:
                                title = f"{title} {abbr}"
                            break
            
            # Extract text content
            if title:
                texts = []
                for p in para.find_all(['p', 'div'], recursive=True):
                    text = p.get_text(" ", strip=True)
                    if len(text) > 20:
                        texts.append(text)
                
                body = " ".join(texts).strip()
                body = re.sub(r'\s+', ' ', body)
                
                if body.startswith(title):
                    body = body[len(title):].strip()
                
                if len(body) > 50:
                    sections.append({
                        "citation": title,
                        "text": body,
                        "law": abbr
                    })
        
        # Strategy 2: Fallback - split by § markers in raw text
        if len(sections) < 10:
            print(f"  Primary extraction found only {len(sections)} sections, trying fallback...")
            raw_text = soup.get_text()
            parts = re.split(r'(§\s*\d+[a-z]*)', raw_text)
            
            for i in range(1, len(parts), 2):
                if i+1 < len(parts):
                    section_marker = parts[i].strip()
                    section_text = parts[i+1].strip()
                    section_marker = re.sub(r'§(\d)', r'§ \1', section_marker)
                    section_text = re.sub(r'\s+', ' ', section_text)
                    section_text = section_text[:1000]
                    
                    if len(section_text) > 100:
                        sections.append({
                            "citation": f"{section_marker} {abbr}",
                            "text": section_text,
                            "law": abbr
                        })
        
        print(f"Fetched {len(sections)} sections from {abbr}")
        
        # Cache results
        if sections:
            with open(cache_file, "wb") as f:
                pickle.dump(sections, f)
        
        return sections
        
    except Exception as e:
        print(f"Error fetching {abbr}: {e}")
        return []

def load_all_statutes(use_cache=True) -> List[Dict]:
    """Load all statute sections with fallback"""
    all_sections = []
    
    for law, url in STATUTE_PAGES.items():
        sections = fetch_statute_sections(law, url, use_cache)
        all_sections.extend(sections)
    
    # Fallback if scraping fails
    if not all_sections:
        print("\n⚠️ No laws fetched, using fallback corpus...")
        all_sections = [
            {"citation": "Art. 1 GG", "text": "Die Würde des Menschen ist unantastbar. Sie zu achten und zu schützen ist Verpflichtung aller staatlichen Gewalt.", "law": "GG"},
            {"citation": "§ 433 BGB", "text": "Durch den Kaufvertrag wird der Verkäufer verpflichtet, dem Käufer die Sache zu übergeben und das Eigentum zu verschaffen.", "law": "BGB"},
            {"citation": "§ 80 VwGO", "text": "Widerspruch und Anfechtungsklage haben aufschiebende Wirkung.", "law": "VwGO"},
        ]
    
    print(f"\nTotal sections loaded: {len(all_sections)}")
    return all_sections


# Building the FAISS-Based Retrieval Index
- All extracted law sections are encoded into dense embeddings using the trained Sentence Transformer.
- The embeddings are L2-normalized and stored in a FAISS index (`IndexFlatIP`) that ranks by inner product; on normalized vectors this equals cosine similarity.
- The resulting index contains one vector per statute section and answers nearest-neighbor queries during retrieval.
- This step matters because every query has to be compared against hundreds or thousands of statute sections.
- `IndexFlatIP` is an exact index: it still scores the query against every stored vector, but it does so in one optimized pass instead of a Python-level loop over all sections.
- This keeps retrieval fast for a corpus of this size. For much larger corpora, FAISS also offers approximate index types, which this notebook does not use.
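
What a flat inner-product search computes can be written out by hand. The sketch below is a pure-Python illustration and not the FAISS API: `l2_normalize` and `top_k_inner_product` are hypothetical helpers, and the two-dimensional vectors are toy data. It shows that after L2 normalization the inner product is the cosine similarity and that the top-k result is simply the k highest scores.

```python
import math


def l2_normalize(vector):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def top_k_inner_product(query, corpus, k):
    """Exhaustive search: score every corpus vector and keep the k best (index, score) pairs."""
    q = l2_normalize(query)
    scored = []
    for idx, doc in enumerate(corpus):
        d = l2_normalize(doc)
        scored.append((idx, sum(a * b for a, b in zip(q, d))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy "statute" embeddings
query = [2.0, 0.1]                              # toy "case" embedding
for idx, score in top_k_inner_product(query, corpus, k=2):
    print(idx, round(score, 4))
```

The vector pointing in nearly the same direction as the query ranks first, regardless of the vectors' original lengths.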

In [5]:
# FAISS

def build_index(model, all_sections: List[Dict]):
    """Build FAISS index for retrieval with optimizations"""
    print("\n=== Building FAISS Index ===")
    
    laws = [s["text"] for s in all_sections]
    print(f"Encoding {len(laws)} statute sections...")
    
    import time
    start_time = time.time()
    
    # Use GPU if available
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    
    # Encode with batching
    law_emb = model.encode(
        laws,
        batch_size=CONFIG["batch_size"] * 2,
        convert_to_numpy=True,
        normalize_embeddings=True,
        show_progress_bar=True,
        device=device
    )
    
    encoding_time = time.time() - start_time
    print(f"Encoding took {encoding_time:.2f} seconds")
    
    # Create index (IndexFlatIP uses Inner Product for speed/simplicity)
    index = faiss.IndexFlatIP(law_emb.shape[1])
    index.add(law_emb)
    
    print(f"Index built with {index.ntotal} vectors")
    return index, law_emb

# Retrieval and Evaluation
- This cell contains the full retrieval and evaluation pipeline.
- The retriever encodes an input case text into an embedding and queries the FAISS index for the top-k most similar statutory provisions.
- For each retrieved result, the system returns:
    - citation
    - law abbreviation
    - similarity score
    - text snippet
- The evaluation module assesses retrieval quality with several ranking-based metrics: Recall@k, Precision@k, F1@k, Hit Rate@k, MRR, and MAP.
- Retrieved citations are compared against the ground-truth citations from the test dataset using a fuzzy citation-matching function that tolerates subsection differences (e.g., matching “§ 60 AufenthG” with “§ 60 Abs. 5 AufenthG”).
- The system also logs errors and warnings: a ground-truth citation that is missing from the top-k results counts as an error, and a correctly retrieved citation that is ranked third or worse counts as a warning.
- The log provides per-case summaries and highlights systematic weaknesses.

This part enables a deeper qualitative analysis beyond numerical metrics and helps identify failure patterns in the model's retrieval behavior. It also helps us evaluate errors so that we can improve the model in the future.
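
The per-query metrics can be illustrated on a single toy ranking. This is a sketch under assumptions: `ranking_metrics` is a hypothetical helper, it uses exact string matching instead of the fuzzy matcher, and the citations are made-up sample data. The average precision here follows the notebook's variant, which averages precision over the hits found in the top k rather than over all relevant citations.

```python
def ranking_metrics(retrieved, relevant, k):
    """Recall@k, Precision@k, reciprocal rank and average precision for one query."""
    top = retrieved[:k]
    hit_ranks = [rank for rank, item in enumerate(top, start=1) if item in relevant]
    recall = len(hit_ranks) / len(relevant) if relevant else 0.0
    precision = len(hit_ranks) / k
    reciprocal_rank = 1.0 / hit_ranks[0] if hit_ranks else 0.0
    # Precision at each hit position, averaged over the hits that were found
    if hit_ranks:
        average_precision = sum((i + 1) / rank for i, rank in enumerate(hit_ranks)) / len(hit_ranks)
    else:
        average_precision = 0.0
    return {
        "recall": recall,
        "precision": precision,
        "reciprocal_rank": reciprocal_rank,
        "average_precision": average_precision,
    }


retrieved = ["§ 60 AufenthG", "§ 3 AsylG", "§ 80 VwGO", "§ 4 AsylG", "§ 1 BGB"]
relevant = {"§ 3 AsylG", "§ 4 AsylG", "§ 113 VwGO"}
print(ranking_metrics(retrieved, relevant, k=5))
```

Here two of three relevant citations are found at ranks 2 and 4, giving a recall of 2/3, a precision of 0.4, a reciprocal rank of 0.5 and an average precision of 0.5. MRR and MAP are the means of the last two values over all queries.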

In [6]:
class LegalRetriever:
    def __init__(self, model, index, sections):
        self.model = model
        self.index = index
        self.sections = sections
    
    def retrieve(self, query: str, k: int = 5) -> List[Dict]:
        """Retrieve top-k relevant statutes"""
        qvec = self.model.encode(
            [query],
            convert_to_numpy=True,
            normalize_embeddings=True
        )
        
        # Search FAISS index
        D, I = self.index.search(qvec, k=k)
        
        results = []
        for j, idx in enumerate(I[0]):
            if idx < 0:
                # FAISS pads the result with -1 when the index holds fewer than k vectors
                continue
            section = self.sections[idx]
            results.append({
                "citation": section["citation"],
                "law": section["law"],
                "text": section["text"],
                "text_snippet": section["text"][:300] + "...",
                "score": float(D[0][j]),
                "rank": j + 1
            })
        
        return results
    
    def evaluate(self, test_cases_df, k=5):
        """
        Comprehensive evaluation with multiple metrics and error/warning tracking.
        """
        print(f"\n=== Evaluating on {len(test_cases_df)} test cases ===")
        
        recall_hits = 0
        total_relevant = 0
        hit_rate_count = 0
        valid_queries = 0
        reciprocal_ranks = []
        average_precisions = []
        
        # Tracking for errors/warnings
        evaluation_log = []
        
        for idx, row in test_cases_df.iterrows():
            case_text = row["text"]
            true_citations = parse_citations(row["citations"])
            
            if not true_citations:
                continue
            
            valid_queries += 1
            results = self.retrieve(case_text, k=k)
            retrieved_citations = [r["citation"] for r in results]
            
            found_positions = []
            case_errors = []
            case_warnings = []
            
            # Check retrieval for each true citation
            for true_cit in true_citations:
                total_relevant += 1
                hit = False
                
                for rank, ret_cit in enumerate(retrieved_citations, 1):
                    if self._is_citation_match(true_cit, ret_cit):
                        recall_hits += 1
                        hit = True
                        if rank not in found_positions:
                            found_positions.append(rank)
                        
                        # Warning: Correct citation found, but ranked low (e.g., Rank 3 or higher)
                        if rank > 2:
                             case_warnings.append(f"Low Rank ({rank}) for {true_cit}")
                        break
                
                # Error: Citation was not found in the top k results
                if not hit:
                    case_errors.append(f"MISSING: {true_cit}")
            
            # Log the case results if errors or warnings exist
            if case_errors or case_warnings:
                evaluation_log.append({
                    'case_id': idx,
                    'case_snippet': case_text[:100] + '...',
                    'true_citations': true_citations,
                    'retrieved_citations': retrieved_citations,
                    'errors': case_errors,
                    'warnings': case_warnings,
                    'first_hit_rank': min(found_positions) if found_positions else 0
                })
            
            # Hit Rate
            if found_positions:
                hit_rate_count += 1
            
            # MRR (Mean Reciprocal Rank)
            if found_positions:
                reciprocal_ranks.append(1.0 / min(found_positions))
            else:
                reciprocal_ranks.append(0.0)
            
            # MAP (Mean Average Precision)
            if found_positions:
                precisions_at_relevant = []
                for pos in sorted(found_positions):
                    relevant_in_top_pos = sum(1 for p in found_positions if p <= pos)
                    precision_at_pos = relevant_in_top_pos / pos
                    precisions_at_relevant.append(precision_at_pos)
                average_precisions.append(np.mean(precisions_at_relevant))
            else:
                average_precisions.append(0.0)

        # Calculate final metrics
        precision_hits = recall_hits
        total_retrieved = valid_queries * k
        
        recall_at_k = recall_hits / total_relevant if total_relevant > 0 else 0
        precision_at_k = precision_hits / total_retrieved if total_retrieved > 0 else 0
        f1_at_k = 2 * (precision_at_k * recall_at_k) / (precision_at_k + recall_at_k) if (precision_at_k + recall_at_k) > 0 else 0
        hit_rate = hit_rate_count / valid_queries if valid_queries > 0 else 0
        mrr = np.mean(reciprocal_ranks) if reciprocal_ranks else 0
        map_score = np.mean(average_precisions) if average_precisions else 0
        
        # --- Print Results and Error Summary ---
        
        # Print main results table
        print(f"\n{'Metric':<25} {'Value':<10} {'Description'}")
        print(f"{'-'*70}")
        print(f"{'Recall@' + str(k):<25} {recall_at_k:.4f}        {recall_hits}/{total_relevant} relevant items found")
        print(f"{'Precision@' + str(k):<25} {precision_at_k:.4f}        {precision_hits}/{total_retrieved} retrieved items relevant")
        print(f"{'F1@' + str(k):<25} {f1_at_k:.4f}        Harmonic mean of P and R")
        print(f"{'Hit Rate@' + str(k):<25} {hit_rate:.4f}        {hit_rate_count}/{valid_queries} queries with >=1 hit")
        print(f"{'MRR (Mean Recip. Rank)':<25} {mrr:.4f}        Average 1/rank of first hit")
        print(f"{'MAP (Mean Avg. Prec.)':<25} {map_score:.4f}        Precision averaged over ranks")

        # Error Summary (with limit set to 5 cases)
        total_errors = sum(len(log['errors']) for log in evaluation_log)
        total_warnings = sum(len(log['warnings']) for log in evaluation_log)
        
        if total_errors > 0 or total_warnings > 0:
            print("\n" + "="*70)
            print("**ERROR AND WARNING SUMMARY**")
            print("="*70)
            print(f"Total Cases Evaluated: {valid_queries}")
            print(f"Cases with Errors (Missed Citations): {len([log for log in evaluation_log if log['errors']])}")
            print(f"Total MISSING Citations (Errors): {total_errors}")
            print(f"Total Low-Rank Citations (Warnings): {total_warnings}")
            
            if total_errors > 0:
                 print("\n**Sample Cases with MISSING Citations (Top 5):**")
                 for i, log in enumerate([log for log in evaluation_log if log['errors']]):
                      if i >= 5: break # Limit output to 5 cases
                      print(f"  - Case ID {log['case_id']}: {log['errors']}")
                      print(f"    - True Citations: {log['true_citations']} | Retrieved Citations: {log['retrieved_citations']}")
        
        return {
            'recall_at_k': recall_at_k,
            'precision_at_k': precision_at_k,
            'f1_at_k': f1_at_k,
            'hit_rate': hit_rate,
            'mrr': mrr,
            'map': map_score
        }
    
    def _is_citation_match(self, true_cit: str, retrieved_cit: str) -> bool:
        """Check if citations match (fuzzy matching with subsection tolerance)"""
        # NOTE: STATUTE_PAGES must be globally accessible or passed here
        law_variations = {k.lower(): k.lower() for k in STATUTE_PAGES.keys()}
        
        true_lower = true_cit.lower().strip()
        ret_lower = retrieved_cit.lower().strip()
        
        for variation, canonical in law_variations.items():
            true_lower = true_lower.replace(variation, canonical)
            ret_lower = ret_lower.replace(variation, canonical)
        
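        # Extract section/article numbers (accepts "§", "§§", "Art.", "Art" and "Artikel",
        # so that GG citations such as "Art. 5 GG" can match as well)
        marker_pattern = r'(?:§+|\bartikel|\bart\.?)\s*(\d+[a-z]?)'
        true_sections = re.findall(marker_pattern, true_lower)
        ret_sections = re.findall(marker_pattern, ret_lower)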
        
        # Extract law book
        true_law = next((law for law in law_variations.values() if law in true_lower), None)
        ret_law = next((law for law in law_variations.values() if law in ret_lower), None)
        
        # Match if SAME section number AND SAME law book (allowing for subsection differences)
        if true_sections and ret_sections and true_law and ret_law:
            for ts in true_sections:
                for rs in ret_sections:
                    if ts == rs and true_law == ret_law:
                        return True
        
        return False
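
The subsection-tolerant matching can be tried in isolation. The sketch below is a simplified, hypothetical variant: `provision_key`, `same_provision`, and `KNOWN_LAWS` are names introduced for this illustration, and the marker pattern is assumed to accept "Art." and "Artikel" in addition to "§". Two citations count as the same provision when their first section number and their law abbreviation agree.

```python
import re

# Lowercased abbreviations of the statutes scraped in this notebook
KNOWN_LAWS = {"bgb", "gg", "vwgo", "baugb", "asylg", "stgb", "zpo", "aufenthg"}
MARKER = re.compile(r'(?:§+|\bartikel|\bart\.?)\s*(\d+[a-z]?)')


def provision_key(citation):
    """Reduce a citation to (section number, law), dropping Abs./Satz/Nr. details."""
    text = citation.lower()
    match = MARKER.search(text)
    law = next((tok for tok in re.findall(r'[a-zäöüß]+', text) if tok in KNOWN_LAWS), None)
    if match is None or law is None:
        return None
    return (match.group(1), law)


def same_provision(citation_a, citation_b):
    """True when both citations resolve to the same (section number, law) key."""
    key_a, key_b = provision_key(citation_a), provision_key(citation_b)
    return key_a is not None and key_a == key_b


print(same_provision("§ 60 Abs. 5 AufenthG", "§ 60 AufenthG"))  # True
print(same_provision("§ 60 AufenthG", "§ 60 AsylG"))            # False
print(same_provision("Art. 5 Abs. 1 GG", "Art. 5 GG"))          # True
```

Matching the law as a whole token rather than as a substring avoids accidental matches between abbreviations that contain one another.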

# Model Comparison
Here we implemented a multi-model evaluation to determine which base model achieves the strongest performance in our task.

1. Baseline (Zero-Shot Evaluation):

   Each model is first evaluated without any additional training to establish its initial performance on the test set.
2. Training (Unsupervised + Supervised):

   The model is then trained using our two-stage pipeline: unsupervised contrastive pre-training on raw case texts, followed by supervised fine-tuning on case-citation pairs.
3. Fine-Tuned Evaluation:

   After training, the updated model is re-evaluated using the same metrics.

For each model, a comparison table is generated showing baseline performance, fine-tuned performance, and the percentage improvement for every metric. Our system tracks which model achieves the highest F1@5 score and selects this model as the final retrieval system. The chosen model is re-indexed using FAISS and returned as the best-performing model.
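
The two calculations behind the comparison table, relative improvement and best-model selection, can be sketched as follows. `improvement_pct`, `pick_best`, and the numbers in `example_results` are hypothetical placeholders for illustration and are not measured results from this project.

```python
def improvement_pct(baseline, fine_tuned):
    """Relative change in percent; None when the baseline is 0 and the ratio is undefined."""
    if baseline == 0:
        return None
    return (fine_tuned - baseline) / baseline * 100.0


def pick_best(results, metric="f1_at_k"):
    """Return the name of the model with the highest value for `metric`."""
    return max(results, key=lambda name: results[name][metric])


# Placeholder numbers for illustration only, not measured results
example_results = {
    "model_a": {"f1_at_k": 0.10},
    "model_b": {"f1_at_k": 0.25},
}

print(improvement_pct(0.20, 0.30))  # about 50 percent
print(improvement_pct(0.0, 0.10))   # None: no meaningful percentage for a zero baseline
print(pick_best(example_results))   # model_b
```

Handling the zero baseline explicitly matters here, because a model that retrieves nothing relevant before training would otherwise produce an infinite improvement value.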

In [7]:
def compare_models(cases_sample, cited_cases_sample):
    """
    Iterates through all models in BASE_MODELS, runs baseline and fine-tuned 
    evaluation, displays the performance report for each, and returns the 
    LegalRetriever for the model with the best overall F1@5 score.
    """
    
    best_f1 = -1
    best_retriever = None
    best_model_name = ""
    
    all_sections = load_all_statutes(use_cache=True)
    
    for name, model_path in BASE_MODELS.items():
        print("\n" + "#" * 70)
        print(f"STARTING FULL COMPARISON FOR: {name} ({model_path})")
        print("#" * 70)

        # 1. --- BASELINE STAGE (ZERO-SHOT) ---
        baseline_model = load_base_model(model_path)
        baseline_metrics = run_evaluation_stage(
            baseline_model, 
            all_sections, 
            cited_cases_sample, 
            f"Baseline ({name})"
        )
        
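        # 2. --- TRAINING STAGES ---
        # pretrain_model() loads CONFIG["base_model"], so point it at the model under
        # comparison; otherwise every iteration would train the default model
        previous_base = CONFIG["base_model"]
        CONFIG["base_model"] = model_path
        try:
            trained_model = pretrain_model(cases_sample)
        finally:
            CONFIG["base_model"] = previous_base
        trained_model = finetune_model(trained_model, cited_cases_sample)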
        
        # 3. --- FINE-TUNED STAGE (FINAL) ---
        finetuned_metrics = run_evaluation_stage(
            trained_model, 
            all_sections, 
            cited_cases_sample, 
            f"Fine-Tuned ({name})"
        )
        
        # 4. --- DETAILED COMPARISON TABLE ---
        print("\n" + "="*70)
        print(f"**PERFORMANCE IMPROVEMENT SUMMARY: {name}**")
        print("="*70)
        
        df_metrics = pd.DataFrame({
            'Metric': ['Recall@5', 'Precision@5', 'F1@5', 'MRR', 'MAP'],
            'Baseline': [
                baseline_metrics['recall_at_k'], 
                baseline_metrics['precision_at_k'], 
                baseline_metrics['f1_at_k'], 
                baseline_metrics['mrr'], 
                baseline_metrics['map']
            ],
            'Fine-Tuned': [
                finetuned_metrics['recall_at_k'], 
                finetuned_metrics['precision_at_k'], 
                finetuned_metrics['f1_at_k'], 
                finetuned_metrics['mrr'], 
                finetuned_metrics['map']
            ],
        })
        
        # Replace a zero baseline with NaN so the division cannot produce inf
        df_metrics['Improvement (%)'] = (
            (df_metrics['Fine-Tuned'] - df_metrics['Baseline'])
            / df_metrics['Baseline'].replace(0, np.nan)
        ) * 100
        df_metrics = df_metrics.set_index('Metric').round(4)
        print(df_metrics.to_markdown(floatfmt=".4f"))
        
        # 5. --- CHECK FOR BEST OVERALL MODEL ---
        current_f1 = finetuned_metrics['f1_at_k']
        
        if current_f1 > best_f1:
            best_f1 = current_f1
            # Re-index one last time for the final retriever object
            final_index, _ = build_index(trained_model, all_sections)
            best_retriever = LegalRetriever(trained_model, final_index, all_sections)
            best_model_name = name
            
    print("\n" + "=" * 70)
    print(f"**BEST OVERALL MODEL SELECTED: {best_model_name} (F1@5: {best_f1:.4f})**")
    print("=" * 70)

    return best_retriever

# Final pipeline
This section is the core of the project and its final implementation. It evaluates and compares how different models perform at retrieving the statutes relevant to a case, measures the effect of fine-tuning, and selects the best-performing model based on F1@5.

## Single-Model Evaluation:
- Compares the default base model in its baseline (zero-shot) form against its fully fine-tuned version.
- This demonstrates whether the two-stage training procedure improves retrieval performance.
## Multi-Model Comparison:
- Iterates over all models defined in BASE_MODELS (e.g., MPNet, Legal-BERT, Legal-SBERT), evaluating each both before and after fine-tuning.

For every model, the system:
1. Builds a fresh FAISS index
2. Runs an evaluation stage
3. Computes Recall@5, Precision@5, F1@5, MRR, and MAP
4. Calculates percentage improvements
5. Produces a formatted performance comparison table
6. Logs cited statutes that were not retrieved (reported as MISSING citations)

After selecting the best model, the pipeline re-builds a final FAISS index and demonstrates example retrieval outputs.
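
The metrics listed above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the notebook's `LegalRetriever.evaluate` implementation: the function name `evaluate_rankings` is hypothetical, precision and recall are micro-averaged over all queries (which matches the "hits/total" counts printed in the logs), and average precision is normalised by `min(number of relevant items, k)`, which is only one of several common conventions.

```python
from typing import Dict, List, Sequence, Set


def evaluate_rankings(
    rankings: Sequence[List[str]],
    relevant: Sequence[Set[str]],
    k: int = 5,
) -> Dict[str, float]:
    """Micro-averaged P/R/F1 plus hit rate, MRR and MAP at cutoff k."""
    hits_total = retrieved_total = relevant_total = queries_with_hit = 0
    rr_sum = ap_sum = 0.0

    for ranked, gold in zip(rankings, relevant):
        top_k = ranked[:k]
        # 1-based ranks at which a relevant citation was retrieved
        hit_ranks = [rank for rank, doc in enumerate(top_k, start=1) if doc in gold]

        hits_total += len(hit_ranks)
        retrieved_total += len(top_k)
        relevant_total += len(gold)

        if hit_ranks:
            queries_with_hit += 1
            rr_sum += 1.0 / hit_ranks[0]  # reciprocal rank of the first hit
            # Precision at each hit position, averaged (assumed normalisation)
            precisions = [(i + 1) / rank for i, rank in enumerate(hit_ranks)]
            ap_sum += sum(precisions) / min(len(gold), k)

    num_queries = len(rankings)
    precision = hits_total / retrieved_total if retrieved_total else 0.0
    recall = hits_total / relevant_total if relevant_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "precision_at_k": precision,
        "recall_at_k": recall,
        "f1_at_k": f1,
        "hit_rate": queries_with_hit / num_queries if num_queries else 0.0,
        "mrr": rr_sum / num_queries if num_queries else 0.0,
        "map": ap_sum / num_queries if num_queries else 0.0,
    }


# Toy example with made-up document IDs (not real evaluation data)
metrics = evaluate_rankings(
    rankings=[["a", "b", "c"], ["x", "y", "z"]],
    relevant=[{"b", "q"}, {"m"}],
    k=3,
)
print(metrics)
```

The same F1 formula reproduces the logged mpnet baseline: with 22 hits out of 330 retrieved and 2927 relevant items, the harmonic mean of precision and recall is about 0.0135.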

In [8]:
# NOTE: The following required functions/classes are assumed to be defined in preceding cells:
# - CONFIG (dictionary with base_model)
# - cases_sample, cited_cases_sample (loaded dataframes)
# - load_all_statutes(), load_base_model(), pretrain_model(), finetune_model()
# - build_index()
# - LegalRetriever (class with evaluate and retrieve methods)


def run_evaluation_stage(model, all_sections, cited_cases_sample, stage_name: str):
    """Encodes laws and runs evaluation for a given model stage"""
    print(f"\n--- Evaluation Stage: **{stage_name}** ---")
    
    # 1. Build Index with the current model state
    # NOTE: Need to re-build the index every time the model changes!
    # Assuming build_index returns (index, corpus_embeddings)
    index, _ = build_index(model, all_sections)
    
    # 2. Create Retriever
    retriever = LegalRetriever(model, index, all_sections)
    
    # 3. Evaluate
    # Assuming LegalRetriever.evaluate returns a dictionary of metrics
    metrics = retriever.evaluate(cited_cases_sample, k=5)
    
    return metrics

def compare_baseline_to_finetuned(base_model_name: str, cases_sample, cited_cases_sample):
    """Compares the performance of the model before and after fine-tuning."""
    
    print("\n" + "="*70)
    print(f" **STARTING COMPARISON FOR: {base_model_name}**")
    print("="*70)
    
    # 1. Load Statutes (assumes function is defined in a previous cell)
    all_sections = load_all_statutes(use_cache=True)
    
    # --- BASELINE STAGE (ZERO-SHOT) ---
    # Load the model directly from the HuggingFace Hub (no training yet)
    baseline_model = load_base_model(base_model_name)
    baseline_metrics = run_evaluation_stage(
        baseline_model, 
        all_sections, 
        cited_cases_sample, 
        "Baseline (Zero-Shot)"
    )
    
    # --- TRAINING STAGES ---
    
    # 2. Stage 1: Unsupervised pre-training
    trained_model = pretrain_model(cases_sample)
    
    # 3. Stage 2: Supervised fine-tuning (The full training process)
    # The final model after all training steps
    trained_model = finetune_model(trained_model, cited_cases_sample)
    
    # --- FINE-TUNED STAGE (FINAL) ---
    finetuned_metrics = run_evaluation_stage(
        trained_model, 
        all_sections, 
        cited_cases_sample, 
        "Fine-Tuned (Final)"
    )

    # 4. FINAL COMPARISON TABLE
    print("\n" + "="*70)
    print("**PERFORMANCE IMPROVEMENT SUMMARY**")
    print("="*70)
    
    df_metrics = pd.DataFrame({
        'Metric': ['Recall@5', 'Precision@5', 'F1@5', 'MRR', 'MAP'],
        'Baseline': [
            baseline_metrics['recall_at_k'], 
            baseline_metrics['precision_at_k'], 
            baseline_metrics['f1_at_k'], 
            baseline_metrics['mrr'], 
            baseline_metrics['map']
        ],
        'Fine-Tuned': [
            finetuned_metrics['recall_at_k'], 
            finetuned_metrics['precision_at_k'], 
            finetuned_metrics['f1_at_k'], 
            finetuned_metrics['mrr'], 
            finetuned_metrics['map']
        ],
    })
    
    # Calculate percentage change; a zero baseline becomes NaN instead of dividing by zero
    df_metrics['Improvement (%)'] = (
        (df_metrics['Fine-Tuned'] - df_metrics['Baseline'])
        / df_metrics['Baseline'].replace(0, np.nan)
    ) * 100
    df_metrics = df_metrics.set_index('Metric').round(4)

    # Print a nicely formatted comparison table
    print(df_metrics.to_markdown(floatfmt=".4f"))

    # Return the final retriever for the demo (needs to re-index with the final model weights)
    final_index, _ = build_index(trained_model, all_sections)
    final_retriever = LegalRetriever(trained_model, final_index, all_sections)
    return final_retriever



def main_comparison_pipeline():
    print("=== Legal Case-to-Statute Retrieval System: Full Comparison ===")
    print("=" * 70)
    
    # Ensure data is loaded
    if 'cases_sample' not in globals() or 'cited_cases_sample' not in globals():
        print("\nPipeline stopped: Data samples were not loaded successfully in a previous cell.")
        return None

    # --- Interactive Choice ---
    # Check if the multi-model comparison function is available
    multi_model_available = 'compare_models' in globals()
    
    if multi_model_available:
        mode = input("Compare ALL base models (y/n)? (Default is 'n', runs Baseline vs. Fine-Tuned for default model): ").strip().lower()
    else:
        print("Note: Multi-model comparison ('y' option) is unavailable. Defaulting to single-model comparison.")
        mode = 'n'
        
    retriever = None

    if mode == 'y' and multi_model_available:
        # Option 1: Compare all base models
        print("\nRunning Multi-Model Comparison...")
        retriever = compare_models(cases_sample, cited_cases_sample)
        if retriever is not None:
            print("\nUsing the best model found during the full comparison for the final demo.")
        
    if mode == 'n' or retriever is None:
        # Option 2: Use single model (default) and run Baseline vs. Fine-Tuned
        default_model = CONFIG["base_model"]
        print(f"\nRunning Baseline vs. Fine-Tuned comparison for default model: **{default_model}**")
        retriever = compare_baseline_to_finetuned(
            default_model, 
            cases_sample, 
            cited_cases_sample
        )
    
    if retriever is None:
        print("\nPipeline stopped: No retriever was successfully generated.")
        return None
         
    # --- Final Demo ---
    print("\n=== Example Retrieval using **Fine-Tuned Model** ===")
    sample_case = cited_cases_sample.iloc[0]["text"]
    true_citations = cited_cases_sample.iloc[0]["citations"]
    print(f"**True Citations:** {true_citations}")
    print(f"**Query (Case Text):** {sample_case[:200]}...\n")
    
    results = retriever.retrieve(sample_case, k=3)
    for r in results:
        print(f"[{r['rank']}] {r['citation']} (score={r['score']:.3f})")
        print(f"    Snippet: {r['text_snippet']}\n")
    
    return retriever


if __name__ == "__main__":
    # Execute the new main function for comparison
    retriever = main_comparison_pipeline()

=== Legal Case-to-Statute Retrieval System: Full Comparison ===


Compare ALL base models (y/n)? (Default is 'n', runs Baseline vs. Fine-Tuned for default model):  y



Running Multi-Model Comparison...
Loading BGB from cache...
Loading GG from cache...
Loading VwGO from cache...
Loading BauGB from cache...
Loading AsylG from cache...
Loading StGB from cache...
Loading ZPO from cache...
Loading AufenthG from cache...

Total sections loaded: 4975

######################################################################
STARTING FULL COMPARISON FOR: mpnet (sentence-transformers/all-mpnet-base-v2)
######################################################################
Loading base model: sentence-transformers/all-mpnet-base-v2

--- Evaluation Stage: **Baseline (mpnet)** ---

=== Building FAISS Index ===
Encoding 4975 statute sections...


Batches: 100%|███████████████████████████████████████████████████████████████████████| 311/311 [22:28<00:00,  4.34s/it]


Encoding took 1349.50 seconds
Index built with 4975 vectors

=== Evaluating on 66 test cases ===

Metric                    Value      Description
----------------------------------------------------------------------
Recall@5                  0.0075        22/2927 relevant items found
Precision@5               0.0667        22/330 retrieved items relevant
F1@5                      0.0135        Harmonic mean of P and R
Hit Rate@5                0.1515        10/66 queries with ≥1 hit
MRR (Mean Recip. Rank)    0.0687        Average 1/rank of first hit
MAP (Mean Avg. Prec.)     0.0687        Precision averaged over ranks

Total Cases Evaluated: 66
Cases with Errors (Missed Citations): 66
Total MISSING Citations (Errors): 2905

**Sample Cases with MISSING Citations (Top 5):**
  - Case ID 95: ['MISSING: § 144 Abs. 1 Nr. 1 BauGB', 'MISSING: § 80 Abs. 5 Satz 1 VwGO', 'MISSING: § 80 Abs. 2 Satz 1 Nr. 4 VwGO', 'MISSING: § 80 Abs. 3 Satz 1 VwGO', 'MISSING: § 80 Abs. 5 Satz 1 VwGO', 'MISSI





Saved unsupervised model to output\mpnet-unsupervised

=== Stage 2: Supervised Fine-tuning ===
Created 2927 training pairs (skipped 0 cases)




Saved fine-tuned model to output\fine_tuned_case_to_law

--- Evaluation Stage: **Fine-Tuned (mpnet)** ---

=== Building FAISS Index ===
Encoding 4975 statute sections...


Batches: 100%|███████████████████████████████████████████████████████████████████████| 311/311 [23:06<00:00,  4.46s/it]


Encoding took 1387.53 seconds
Index built with 4975 vectors

=== Evaluating on 66 test cases ===

Metric                    Value      Description
----------------------------------------------------------------------
Recall@5                  0.0307        90/2927 relevant items found
Precision@5               0.2727        90/330 retrieved items relevant
F1@5                      0.0553        Harmonic mean of P and R
Hit Rate@5                0.3182        21/66 queries with ≥1 hit
MRR (Mean Recip. Rank)    0.2652        Average 1/rank of first hit
MAP (Mean Avg. Prec.)     0.2467        Precision averaged over ranks

Total Cases Evaluated: 66
Cases with Errors (Missed Citations): 66
Total MISSING Citations (Errors): 2837

**Sample Cases with MISSING Citations (Top 5):**
  - Case ID 95: ['MISSING: § 144 Abs. 1 Nr. 1 BauGB', 'MISSING: § 80 Abs. 5 Satz 1 VwGO', 'MISSING: § 80 Abs. 2 Satz 1 Nr. 4 VwGO', 'MISSING: § 80 Abs. 3 Satz 1 VwGO', 'MISSING: § 80 Abs. 5 Satz 1 VwGO', 'MISSI

Batches: 100%|███████████████████████████████████████████████████████████████████████| 311/311 [23:05<00:00,  4.46s/it]


Encoding took 1386.85 seconds
Index built with 4975 vectors

######################################################################
STARTING FULL COMPARISON FOR: legal-bert (nlpaueb/legal-bert-base-uncased)
######################################################################
Loading base model: nlpaueb/legal-bert-base-uncased

--- Evaluation Stage: **Baseline (legal-bert)** ---

=== Building FAISS Index ===
Encoding 4975 statute sections...


Batches: 100%|███████████████████████████████████████████████████████████████████████| 311/311 [27:21<00:00,  5.28s/it]


Encoding took 1642.23 seconds
Index built with 4975 vectors

=== Evaluating on 66 test cases ===

Metric                    Value      Description
----------------------------------------------------------------------
Recall@5                  0.0120        35/2927 relevant items found
Precision@5               0.1061        35/330 retrieved items relevant
F1@5                      0.0215        Harmonic mean of P and R
Hit Rate@5                0.1515        10/66 queries with ≥1 hit
MRR (Mean Recip. Rank)    0.0876        Average 1/rank of first hit
MAP (Mean Avg. Prec.)     0.0876        Precision averaged over ranks

Total Cases Evaluated: 66
Cases with Errors (Missed Citations): 66
Total MISSING Citations (Errors): 2892

**Sample Cases with MISSING Citations (Top 5):**
  - Case ID 95: ['MISSING: § 144 Abs. 1 Nr. 1 BauGB', 'MISSING: § 80 Abs. 5 Satz 1 VwGO', 'MISSING: § 80 Abs. 2 Satz 1 Nr. 4 VwGO', 'MISSING: § 80 Abs. 3 Satz 1 VwGO', 'MISSING: § 80 Abs. 5 Satz 1 VwGO', 'MISSI





Saved unsupervised model to output\mpnet-unsupervised

=== Stage 2: Supervised Fine-tuning ===
Created 2927 training pairs (skipped 0 cases)




Saved fine-tuned model to output\fine_tuned_case_to_law

--- Evaluation Stage: **Fine-Tuned (legal-bert)** ---

=== Building FAISS Index ===
Encoding 4975 statute sections...


Batches: 100%|███████████████████████████████████████████████████████████████████████| 311/311 [23:09<00:00,  4.47s/it]


Encoding took 1390.53 seconds
Index built with 4975 vectors

=== Evaluating on 66 test cases ===

Metric                    Value      Description
----------------------------------------------------------------------
Recall@5                  0.0212        62/2927 relevant items found
Precision@5               0.1879        62/330 retrieved items relevant
F1@5                      0.0381        Harmonic mean of P and R
Hit Rate@5                0.2727        18/66 queries with ≥1 hit
MRR (Mean Recip. Rank)    0.1371        Average 1/rank of first hit
MAP (Mean Avg. Prec.)     0.1376        Precision averaged over ranks

Total Cases Evaluated: 66
Cases with Errors (Missed Citations): 66
Total MISSING Citations (Errors): 2865

**Sample Cases with MISSING Citations (Top 5):**
  - Case ID 95: ['MISSING: § 144 Abs. 1 Nr. 1 BauGB', 'MISSING: § 80 Abs. 5 Satz 1 VwGO', 'MISSING: § 80 Abs. 2 Satz 1 Nr. 4 VwGO', 'MISSING: § 80 Abs. 3 Satz 1 VwGO', 'MISSING: § 80 Abs. 5 Satz 1 VwGO', 'MISSI

Batches: 100%|███████████████████████████████████████████████████████████████████████| 311/311 [24:21<00:00,  4.70s/it]


Encoding took 1462.09 seconds
Index built with 4975 vectors

=== Evaluating on 66 test cases ===

Metric                    Value      Description
----------------------------------------------------------------------
Recall@5                  0.0102        30/2927 relevant items found
Precision@5               0.0909        30/330 retrieved items relevant
F1@5                      0.0184        Harmonic mean of P and R
Hit Rate@5                0.1667        11/66 queries with ≥1 hit
MRR (Mean Recip. Rank)    0.1096        Average 1/rank of first hit
MAP (Mean Avg. Prec.)     0.1071        Precision averaged over ranks

Total Cases Evaluated: 66
Cases with Errors (Missed Citations): 66
Total MISSING Citations (Errors): 2897

**Sample Cases with MISSING Citations (Top 5):**
  - Case ID 95: ['MISSING: § 144 Abs. 1 Nr. 1 BauGB', 'MISSING: § 80 Abs. 5 Satz 1 VwGO', 'MISSING: § 80 Abs. 2 Satz 1 Nr. 4 VwGO', 'MISSING: § 80 Abs. 3 Satz 1 VwGO', 'MISSING: § 80 Abs. 5 Satz 1 VwGO', 'MISSI





Saved unsupervised model to output\mpnet-unsupervised

=== Stage 2: Supervised Fine-tuning ===
Created 2927 training pairs (skipped 0 cases)




Saved fine-tuned model to output\fine_tuned_case_to_law

--- Evaluation Stage: **Fine-Tuned (legal-sbert)** ---

=== Building FAISS Index ===
Encoding 4975 statute sections...


Batches: 100%|███████████████████████████████████████████████| 311/311 [25:03<00:00,  4.84s/it]


Encoding took 1505.08 seconds
Index built with 4975 vectors

=== Evaluating on 66 test cases ===

Metric                    Value      Description
----------------------------------------------------------------------
Recall@5                  0.0212        62/2927 relevant items found
Precision@5               0.1879        62/330 retrieved items relevant
F1@5                      0.0381        Harmonic mean of P and R
Hit Rate@5                0.2727        18/66 queries with ≥1 hit
MRR (Mean Recip. Rank)    0.1371        Average 1/rank of first hit
MAP (Mean Avg. Prec.)     0.1376        Precision averaged over ranks

Total Cases Evaluated: 66
Cases with Errors (Missed Citations): 66
Total MISSING Citations (Errors): 2865

**Sample Cases with MISSING Citations (Top 5):**
  - Case ID 95: ['MISSING: § 144 Abs. 1 Nr. 1 BauGB', 'MISSING: § 80 Abs. 5 Satz 1 VwGO', 'MISSING: § 80 Abs. 2 Satz 1 Nr. 4 VwGO', 'MISSING: § 80 Abs. 3 Satz 1 VwGO', 'MISSING: § 80 Abs. 5 Satz 1 VwGO', 'MISSI

# Simple Input Prompt for Interactive Retrieval
To approximate a real-world application scenario, we implemented an interactive user interface that allows users to input arbitrary legal case descriptions and receive the most relevant statutory provisions in response. This component enables direct, practical interaction with the fine-tuned model, demonstrating how the retrieval system could function as a tool for legal professionals by providing on-demand identification of potentially applicable laws.

1. The user enters a natural-language case description, such as “Tenant stopped paying rent, landlord wants to terminate the contract.”
2. The user optionally chooses how many results (Top-K) should be returned.
3. If the input is empty or invalid, it defaults to K = 5.
4. The system encodes the query, retrieves the k most similar statutory provisions using the trained FAISS index, and prints: the citation (e.g., “§ 543 BGB”), the law book (e.g., BGB), the similarity score, and a 300-character text snippet for easy inspection.
5. Results are displayed cleanly, giving the user an immediate sense of which statutes are potentially relevant to the case.
6. The cell handles one query per run; re-run it to submit another query.
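
The steps above can be sketched without any model or FAISS dependency. The snippet below is a simplified stand-in, not the notebook's retriever: `parse_top_k`, `normalize`, and `top_k` are hypothetical helpers, the vectors are made-up toy values rather than real embeddings, and the brute-force cosine similarity only approximates what a FAISS index lookup would return, assuming the index ranks by cosine or inner-product similarity. Treating zero or negative K as invalid is also an assumption.

```python
import math
from typing import Dict, List, Sequence


def parse_top_k(raw: str, default: int = 5) -> int:
    """Return a positive K; fall back to `default` on empty or invalid input."""
    try:
        value = int(raw.strip())
    except ValueError:
        return default
    return value if value > 0 else default


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k(
    query_vec: Sequence[float],
    corpus_vecs: Sequence[Sequence[float]],
    citations: Sequence[str],
    k: int,
) -> List[Dict]:
    """Rank corpus entries by cosine similarity to the query."""
    q = normalize(query_vec)
    scored = [
        (sum(a * b for a, b in zip(q, normalize(vec))), citation)
        for citation, vec in zip(citations, corpus_vecs)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [
        {"rank": rank, "citation": citation, "score": score}
        for rank, (score, citation) in enumerate(scored[:k], start=1)
    ]


# Toy data: placeholder labels and 2-dimensional vectors for illustration only
citations = ["statute A", "statute B", "statute C"]
corpus_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

k_val = parse_top_k("2")
results = top_k([0.9, 0.1], corpus_vecs, citations, k_val)
for r in results:
    print(f"[{r['rank']}] {r['citation']} (score={r['score']:.4f})")
```

In the notebook itself, the query vector would come from the fine-tuned model and the ranking from the FAISS index held by the `retriever` object; only the input handling and the ranking principle carry over from this sketch.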

In [11]:
if 'retriever' not in globals() or retriever is None:
    print("The final 'retriever' object is not available. Please ensure all cells ran successfully.")
else:
    print("==============================================")
    print("Ready for Legal Retrieval Query")
    print("==============================================")
    
    # 1. Get the case text from the user
    prompt = input("Please enter your case inquiry (e.g., 'Tenant didn't pay rent, eviction'):\n> ")
    
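    # 2. Get the desired number of results
    try:
        # Empty input falls back to the default value of 5
        k_input = input("How many results (Top K) should be displayed? (Default: 5): ")
        k_val = int(k_input) if k_input.strip() else 5
        # Zero or negative values are not meaningful for Top-K retrieval
        if k_val < 1:
            k_val = 5
    except ValueError:
        # Non-numeric input also falls back to the default
        k_val = 5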
        
    print("-" * 50)
    
    if not prompt.strip():
        print("Query is empty. Please enter a case text.")
    else:
        print(f"**Searching for:** {prompt[:100]}...")
        print(f"**Expected Results:** Top {k_val}\n")
        
        # Call the retrieval method from the trained retriever object
        results = retriever.retrieve(prompt, k=k_val)
        
        if not results:
            print("No results found.")
        else:
            print("--- 🔎 Found Statute Sections ---")
            for r in results:
                # Output the results in a clear format
                print(f"[{r['rank']}] **{r['citation']}** ({r['law']})")
                print(f"    Similarity Score: {r['score']:.4f}")
                print(f"    Snippet: {r['text_snippet']}\n")
    
    print("==============================================")

Ready for Legal Retrieval Query


Please enter your case inquiry (e.g., 'Tenant didn't pay rent, eviction'):
>  Mein Fahrrad wurde gestohlen.
How many results (Top K) should be displayed? (Default: 5):  5


--------------------------------------------------
**Searching for:** Mein Fahrrad wurde gestohlen....
**Expected Results:** Top 5

--- 🔎 Found Statute Sections ---
[1] **§ 1775 BGB** (BGB)
    Similarity Score: 0.3698
    Snippet: Nichtamtliches Inhaltsverzeichnis § 1775 Mehrere Vormünder (1) Ehegatten können gemeinschaftlich zu Vormündern bestellt werden. (2) Für Geschwister soll nur ein Vormund bestellt werden, es sei denn, es liegen besondere Gründe vor, jeweils einen Vormund für einzelne Geschwister zu bestellen. (1) Eheg...

[2] **§ 420 BGB** (BGB)
    Similarity Score: 0.3563
    Snippet: Nichtamtliches Inhaltsverzeichnis § 420 Teilbare Leistung Schulden mehrere eine teilbare Leistung oder haben mehrere eine teilbare Leistung zu fordern, so ist im Zweifel jeder Schuldner nur zu einem gleichen Anteil verpflichtet, jeder Gläubiger nur zu einem gleichen Anteil berechtigt. Schulden mehre...

[3] **§ 1956 BGB** (BGB)
    Similarity Score: 0.3559
    Snippet: Nichtamtliches Inhaltsver