# Project 2 - Methodology 2: Hallucination Vector Routing

**Lead:** Ayesha Imran (ayesha_imr, ayesha.ml2002@gmail.com)

**Research Objective:** Reduce the hallucination rate of the base Llama-3.1-8B model by ≥15% (relative) at ≤10% additional average latency. The approach has two parts: (i) predict hallucination risk from the projection of the prompt's hidden state onto a hallucination vector, and (ii) route risky prompts through progressively stronger (but still cheap) mitigations.
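Written out, the routing idea in the objective looks roughly like the equations below. This is a sketch built on assumptions that the plan does not fix:

- $h_\ell(x)$ is the layer-$\ell$ hidden state of prompt $x$ (for example, at its last token).
- $\hat v$ is the unit-normalized hallucination vector.
- $\tau_1 < \tau_2$ are hypothetical routing thresholds.
- $f$ is the fraction of prompts sent to a tier, and $c$ is that tier's relative extra cost.

$$
z(x) = \langle h_\ell(x), \hat v \rangle, \qquad p(x) = \sigma\big(\beta_0 + \beta_1 z(x)\big)
$$

$$
\text{tier}(x) =
\begin{cases}
\text{fast} & p(x) < \tau_1 \\
\text{medium (vector steering)} & \tau_1 \le p(x) < \tau_2 \\
\text{safe (rewriting, optional retrieval)} & p(x) \ge \tau_2
\end{cases}
\qquad
\Delta_{\text{latency}} \approx f_{\text{med}}\, c_{\text{med}} + f_{\text{safe}}\, c_{\text{safe}} \le 0.10
$$

The latency line treats the fast path as adding negligible cost. That assumes the projection $z(x)$ is read from the forward pass the model already runs over the prompt.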

**Target Performance:**
- ≥15% relative reduction in hallucination metrics
- ≤10% average latency increase
- AUROC of prompt-risk predictor ≥0.75
- Deployable on a single RTX 4090

**NOTEBOOK STRUCTURE:**
- Section 1: Environment Setup & Model Loading
- Section 2: Literature Review & Persona Vector Theory
- Section 3: Hallucination Vector Extraction
- Section 4: Risk Scoring with Logistic Regression
- Section 5: Three-Tier Guard-Rail Implementation
- Section 6: Evaluation Framework
- Section 7: Benchmark Testing (TruthfulQA, HalluLens, FActScore)
- Section 8: Performance Analysis & Optimization
- Section 9: Package Integration
- Section 10: Results & Documentation

```python
#===============================================================================
# SECTION 1: ENVIRONMENT SETUP & MODEL LOADING
# Lead: Ayesha Imran | Contributors: Suparnojit Sarkar, Soham Chatterjee
#===============================================================================
```

```python
# Cell 1.1: Core Dependencies Installation
"""
TODO: Install and configure the packages required for hallucination vector routing
- Set up the Llama-3.1-8B model and tokenizer
- Install evaluation datasets (TruthfulQA, HalluLens, FActScore, SimpleQA)
- Configure GPU memory optimization
- Set up logging and monitoring systems
"""
```
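Since the project is judged against a latency budget, per-stage timing is worth logging from the start. The following is a minimal sketch using only the standard library. The logger name, the `latency_log` store, and the helpers `timed` and `mean_latency` are hypothetical choices, not part of the plan.

```python
import logging
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import DefaultDict, Iterator, List

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("hallucination_routing")  # hypothetical logger name

# Wall-clock latencies (seconds) per named stage, e.g. "baseline" or "fast" (hypothetical labels).
latency_log: DefaultDict[str, List[float]] = defaultdict(list)


@contextmanager
def timed(stage: str, log: DefaultDict[str, List[float]] = latency_log) -> Iterator[None]:
    """Record the wall-clock duration of the enclosed block under `stage` and log it."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        log[stage].append(elapsed)
        logger.info("stage=%s latency_ms=%.2f", stage, elapsed * 1000.0)


def mean_latency(stage: str, log: DefaultDict[str, List[float]] = latency_log) -> float:
    """Average recorded latency (seconds) for `stage`; 0.0 if nothing was recorded."""
    samples = log.get(stage, [])
    return sum(samples) / len(samples) if samples else 0.0
```

Wrap each generation call in `with timed("baseline"):` or a tier-specific label. On a GPU, CUDA kernels run asynchronously, so call `torch.cuda.synchronize()` before the block exits for more faithful timings.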

```python
"""
TODO: Load and configure Llama-3.1-8B for hallucination vector extraction
- Load the model with appropriate memory optimization
- Set up the tokenizer with proper padding and special tokens
- Register forward hooks for hidden-state extraction
- Test basic model functionality
"""
```
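The loading step raises two practical questions: whether the weights fit on one GPU, and how to capture per-layer hidden states. This sketch addresses both under stated assumptions:

- It assumes Hugging Face `transformers` with the standard Llama module layout (`model.model.layers`).
- It assumes a parameter count of about 8.03B.
- `BYTES_PER_PARAM`, `estimate_weight_gib`, and `make_capture_hook` are hypothetical helpers.

The model-loading lines are left as comments so that the block runs without downloading weights.

```python
from typing import Any, Callable, Dict, List

# Approximate storage per parameter (weights only; activations, the KV cache,
# and framework overhead come on top of this).
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def estimate_weight_gib(n_params: float, dtype: str = "bf16") -> float:
    """Rough GiB needed just to hold the model weights in the given format."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


def make_capture_hook(store: Dict[int, List[Any]], layer_idx: int) -> Callable:
    """Build a forward hook that appends a layer's hidden states to store[layer_idx]."""
    def hook(module, inputs, output):
        # Depending on the transformers version, a decoder layer returns either a
        # tuple whose first element is the hidden states, or the tensor itself.
        hidden = output[0] if isinstance(output, tuple) else output
        if hasattr(hidden, "detach"):  # torch tensors: drop the autograd graph
            hidden = hidden.detach()
        store.setdefault(layer_idx, []).append(hidden)
        # Returning None leaves the layer's output unchanged.
    return hook


# Typical usage with Hugging Face transformers (not executed here):
# import torch
# from transformers import AutoModelForCausalLM, AutoTokenizer
# model_id = "meta-llama/Llama-3.1-8B"
# tokenizer = AutoTokenizer.from_pretrained(model_id)
# tokenizer.pad_token = tokenizer.eos_token  # the Llama 3 tokenizer defines no pad token
# model = AutoModelForCausalLM.from_pretrained(
#     model_id, torch_dtype=torch.bfloat16, device_map="auto")
# store: Dict[int, List[Any]] = {}
# handles = [layer.register_forward_hook(make_capture_hook(store, i))
#            for i, layer in enumerate(model.model.layers)]
# inputs = tokenizer("Who wrote Hamlet?", return_tensors="pt").to(model.device)
# with torch.no_grad():
#     model(**inputs)
# for h in handles:
#     h.remove()
```

At about 8.03B parameters, bf16 weights come to roughly 15 GiB. That leaves some headroom on a 24 GB RTX 4090 for activations and the KV cache, and 8-bit or 4-bit quantization would free more. Passing `output_hidden_states=True` to the forward call is an alternative to hooks when every layer is needed at once.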

```python
#===============================================================================
# SECTION 2: LITERATURE REVIEW & PERSONA VECTOR THEORY
# Primary: Suparnojit Sarkar | Supporting: Soham Chatterjee
#===============================================================================
```

```python
"""
TODO: Implement the core persona vector method for hallucination detection
- Review and implement the persona vector extraction methodology
- Create trait-eliciting and trait-suppressing prompt templates
- Implement vector arithmetic for trait direction discovery
- Validate the approach on known personality traits
"""
```
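One common form of the vector arithmetic in persona-vector work is a difference of means. It subtracts the mean activation on trait-suppressing examples from the mean activation on trait-eliciting examples at a given layer. The sketch below assumes activations are already pooled to one vector per example. The plan leaves open whether pooling covers prompt tokens, response tokens, or only the last token. `TRAIT_TEMPLATES`, `build_prompt_pair`, and `trait_direction` are hypothetical, and so is the template wording.

```python
from typing import Tuple

import numpy as np

# Hypothetical contrastive instructions; the real templates are still a TODO in the plan.
TRAIT_TEMPLATES = {
    "elicit": "Answer confidently even if you are unsure, and supply specific details.",
    "suppress": "Answer only with facts you are certain of; otherwise say you don't know.",
}


def build_prompt_pair(question: str) -> Tuple[str, str]:
    """Pair one question with a trait-eliciting and a trait-suppressing instruction."""
    return (
        f"{TRAIT_TEMPLATES['elicit']}\n\n{question}",
        f"{TRAIT_TEMPLATES['suppress']}\n\n{question}",
    )


def trait_direction(pos_acts: np.ndarray, neg_acts: np.ndarray, normalize: bool = True) -> np.ndarray:
    """Difference of mean activations: mean(trait expressed) - mean(trait suppressed).

    pos_acts, neg_acts: arrays of shape (n_samples, hidden_dim) from a single layer.
    """
    v = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    if normalize:
        norm = np.linalg.norm(v)
        if norm == 0.0:
            raise ValueError("the two activation sets have identical means")
        v = v / norm
    return v
```

One way to validate this on a known trait: plant a synthetic offset along one coordinate and check that `trait_direction` recovers that coordinate before running it on real activations.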

```python
#===============================================================================
# SECTION 3: HALLUCINATION VECTOR EXTRACTION
# Primary: Ayesha Imran | Supporting: Suparnojit Sarkar
#===============================================================================
```

```python
"""
TODO: Extract the optimal hallucination vector from Llama-3.1-8B
- Generate responses using trait-eliciting and trait-suppressing prompts
- Test vectors from multiple layers to find the most effective one
- Validate the vector by injection testing
- Select the best layer and vector for risk prediction
"""
```
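Layer selection can be framed as follows: build a difference-of-means vector per layer on training data, then measure how well held-out projections onto it separate hallucinating from faithful examples. This is a sketch under those assumptions. The separation score is a swapped-in choice: Cohen's d with an equal-weight pooled standard deviation, used here as a cheap proxy. Held-out AUROC would be an equally reasonable criterion. `unit`, `cohens_d`, and `select_best_layer` are hypothetical helpers.

```python
from typing import Dict, Tuple

import numpy as np


def unit(v: np.ndarray) -> np.ndarray:
    """Scale v to unit length."""
    norm = np.linalg.norm(v)
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return v / norm


def cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    """Standardized mean difference of two 1-D samples (equal-weight pooled std)."""
    pooled = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2.0)
    return float((a.mean() - b.mean()) / pooled)


def select_best_layer(
    train: Dict[int, Tuple[np.ndarray, np.ndarray]],
    heldout: Dict[int, Tuple[np.ndarray, np.ndarray]],
) -> Tuple[int, np.ndarray, Dict[int, float]]:
    """Pick the layer whose difference-of-means vector best separates held-out data.

    Both dicts map layer index -> (hallucinating_acts, faithful_acts), each of
    shape (n, hidden_dim). The score is Cohen's d of the held-out projections.
    """
    scores: Dict[int, float] = {}
    vectors: Dict[int, np.ndarray] = {}
    for layer, (pos, neg) in train.items():
        v = unit(pos.mean(axis=0) - neg.mean(axis=0))
        h_pos, h_neg = heldout[layer]
        scores[layer] = cohens_d(h_pos @ v, h_neg @ v)
        vectors[layer] = v
    best = max(scores, key=scores.get)
    return best, vectors[best], scores
```

Scoring on held-out data rather than the training split guards against picking a layer whose vector merely overfits the contrastive prompts.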

```python
#===============================================================================
# SECTION 4: RISK SCORING WITH LOGISTIC REGRESSION
# Primary: Ayesha Imran | Supporting: Soham Chatterjee
#===============================================================================
```

```python
"""
TODO: Implement a logistic regression model for hallucination risk prediction
- Collect labeled data from TruthfulQA, HalluLens, and FActScore
- Project each prompt's hidden state onto the hallucination vector to get the feature z
- Train a one-feature logistic regression: p = σ(β₀ + β₁ · z)
- Verify AUROC ≥ 0.75 on held-out data
"""
```
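The one-feature model p = σ(β₀ + β₁ · z) and its AUROC check fit in a few lines of NumPy. In this sketch:

- The fit uses plain batch gradient descent on the mean log-loss over a standardized z, then maps the coefficients back to the raw scale.
- AUROC is computed as the probability that a random positive outscores a random negative, with ties counted as 1/2.
- `project`, `fit_logistic_1d`, `predict_proba`, and `auroc` are hypothetical helper names.
- The labels are assumed to be 1 for hallucinated and 0 for faithful.

```python
from typing import Tuple

import numpy as np


def project(hidden: np.ndarray, v: np.ndarray) -> np.ndarray:
    """z = <h, v_hat> for each row of `hidden` (n, d) against direction v (d,)."""
    return hidden @ (v / np.linalg.norm(v))


def fit_logistic_1d(z: np.ndarray, y: np.ndarray, lr: float = 0.5,
                    steps: int = 2000) -> Tuple[float, float]:
    """Fit p = sigmoid(b0 + b1 * z) by batch gradient descent on the mean log-loss."""
    mu, sd = z.mean(), z.std()
    zs = (z - mu) / sd  # standardize for stable, fast convergence
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(b0 + b1 * zs)))
        err = p - y  # derivative of the log-loss with respect to the logit
        b0 -= lr * err.mean()
        b1 -= lr * (err * zs).mean()
    # b0 + b1 * (z - mu) / sd  ==  (b0 - b1 * mu / sd) + (b1 / sd) * z
    return float(b0 - b1 * mu / sd), float(b1 / sd)


def predict_proba(z: np.ndarray, b0: float, b1: float) -> np.ndarray:
    """Predicted hallucination risk for each projection z."""
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * z)))


def auroc(y: np.ndarray, scores: np.ndarray) -> float:
    """P(score of a random positive > score of a random negative), ties counted as 1/2."""
    pos, neg = scores[y == 1], scores[y == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)
```

Because σ is monotone, the predictor's AUROC equals the AUROC of the raw projection z whenever β₁ > 0. The logistic fit therefore matters for calibrating the routing thresholds, not for ranking. In practice, scikit-learn's `LogisticRegression` and `roc_auc_score` would do the same job. Note that `LogisticRegression` applies L2 regularization by default.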

```python
#===============================================================================
# SECTION 5: THREE-TIER GUARD-RAIL IMPLEMENTATION
# Primary: Ayesha Imran | Supporting: Suparnojit Sarkar
#===============================================================================
```

```python
"""
TODO: Implement the three-tier routing system for hallucination mitigation
- Fast path: low-risk prompts are answered normally
- Medium path: medium-risk prompts are answered with vector steering
- Safe path: high-risk prompts are rewritten, with optional retrieval
- Tune the routing thresholds for the performance vs. safety trade-off
"""
```
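The router, the steering used on the medium path, and the latency check on the thresholds could look roughly like the sketch below. Its assumptions:

- There are two thresholds, `t_low` ≤ `t_high`.
- Steering subtracts a scaled unit hallucination vector from one layer's output through a PyTorch forward hook. Returning a value from a forward hook replaces the module's output.
- Per-tier latency costs are fractional slowdowns that would have to be measured.

`route`, `make_steering_hook`, and `expected_overhead` are hypothetical names, and so are the tier labels.

```python
from typing import Any, Callable, Sequence


def route(p: float, t_low: float, t_high: float) -> str:
    """Map a predicted hallucination risk p to a tier using two thresholds."""
    if not 0.0 <= t_low <= t_high <= 1.0:
        raise ValueError("expected 0 <= t_low <= t_high <= 1")
    if p < t_low:
        return "fast"    # answer normally
    if p < t_high:
        return "medium"  # answer with vector steering
    return "safe"        # rewrite the prompt, optionally with retrieval


def make_steering_hook(direction: Any, alpha: float) -> Callable:
    """Forward hook that subtracts alpha * unit(direction) from a layer's hidden states.

    alpha > 0 pushes activations away from the hallucination direction; alpha < 0
    adds it, as an injection test would. With a real model, pass `direction` as a
    tensor on the model's device and dtype so the output dtype is preserved.
    """
    norm = float((direction * direction).sum()) ** 0.5
    shift = alpha * (direction / norm)

    def hook(module, inputs, output):
        if isinstance(output, tuple):
            return (output[0] - shift,) + tuple(output[1:])
        return output - shift

    return hook


def expected_overhead(probs: Sequence[float], t_low: float, t_high: float,
                      cost_medium: float, cost_safe: float) -> float:
    """Mean extra latency relative to the fast path, from assumed per-tier costs.

    cost_medium / cost_safe are fractional slowdowns (0.05 = +5%); fast counts as free.
    """
    if len(probs) == 0:
        raise ValueError("need at least one probability")
    extra = {"fast": 0.0, "medium": cost_medium, "safe": cost_safe}
    return sum(extra[route(p, t_low, t_high)] for p in probs) / len(probs)
```

Threshold tuning could then search over `(t_low, t_high)` pairs on a validation set. It would keep only pairs whose expected overhead stays within 0.10, and among those pick the pair with the lowest hallucination rate.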

```python
#===============================================================================
# SECTION 6: EVALUATION FRAMEWORK
# Primary: Soham Chatterjee | Supporting: Suparnojit Sarkar
#===============================================================================
```

```python
"""
TODO: Implement the evaluation framework for hallucination mitigation
- Set up the TruthfulQA, HalluLens, FActScore, and SimpleQA benchmarks
- Implement metrics: hallucination rate, latency, AUROC
- Create a baseline comparison framework
- Add statistical significance testing
"""
```
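The headline targets are relative: a ≥15% relative reduction and a ≤10% latency increase. Both systems answer the same prompts, so a paired test fits the significance check. The sketch below uses the exact McNemar test, one standard option for paired binary outcomes; a paired bootstrap would be another. `relative_reduction`, `relative_latency_increase`, and `mcnemar_exact` are hypothetical helpers, and the inputs are assumed to be per-prompt hallucination flags.

```python
from math import comb
from typing import Sequence


def relative_reduction(baseline_rate: float, system_rate: float) -> float:
    """Relative drop in hallucination rate; e.g. 0.30 -> 0.24 is a 20% reduction."""
    return (baseline_rate - system_rate) / baseline_rate


def relative_latency_increase(baseline_s: Sequence[float], system_s: Sequence[float]) -> float:
    """(mean system latency / mean baseline latency) - 1."""
    return (sum(system_s) / len(system_s)) / (sum(baseline_s) / len(baseline_s)) - 1.0


def mcnemar_exact(baseline_halluc: Sequence[bool], system_halluc: Sequence[bool]) -> float:
    """Two-sided exact McNemar p-value for paired binary outcomes on the same prompts.

    b = prompts the system fixed, c = prompts the system broke. Under H0 the
    discordant pairs split 50/50, so b ~ Binomial(b + c, 0.5).
    """
    if len(baseline_halluc) != len(system_halluc):
        raise ValueError("outcomes must be paired prompt-by-prompt")
    pairs = list(zip(baseline_halluc, system_halluc))
    b = sum(1 for base, sys_ in pairs if base and not sys_)
    c = sum(1 for base, sys_ in pairs if sys_ and not base)
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2.0 * tail)
```

Only discordant prompts enter the test. Prompts that both systems get right, or both get wrong, carry no information about which system is better.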