CELL 1 - Install Dependencies

In [1]:
!pip install -U pytesseract pillow gradio langchain faiss-cpu tiktoken langchain-openai langchain-community matplotlib numpy -q





## Prompt Engineering for Regulatory-Aligned Loan Evaluation

FDIC RMS Manual – Section 3.2 (Loans)



## Problem Statement

Banks operate under strict regulatory oversight when originating, administering, and reviewing loans.  
Examiners evaluate not only individual credit files, but also governance, underwriting discipline, documentation quality, risk identification, and allowance adequacy.

Large Language Models (LLMs) can assist with regulatory interpretation, but they pose risks if they:
- Hallucinate unsupported regulatory rules
- Make or imply loan approval or denial decisions
- Apply business judgment instead of examiner reasoning
- Infer conclusions from incomplete data

**This project addresses the challenge of constraining an LLM to operate strictly as a regulatory reasoning assistant**, aligned exclusively with **FDIC RMS Manual Section 3.2**, without making credit decisions or introducing external knowledge.

## Proposed Method
- Use **prompt engineering only** (no model training or fine-tuning).  
- Encode regulatory guidance into a **structured constraints object**.  
- Enforce behavior using a **strict system prompt**.  
- Separate **data extraction** from **regulatory reasoning**.  
- Explicitly refuse prohibited requests (e.g., loan approval decisions).



## Model Used & Context
- **Model:** gpt-4.1-nano  
- **Context Window:** ~1M tokens (OpenAI lists 1,047,576 tokens for the GPT-4.1 model family)  
- All prompts, constraints, and retrieved inputs fit well **within the context limit**, so no regulatory text or applicant data is truncated.



## Workflow Overview
1. **Dependencies Installation:**  
   Install required packages like `pytesseract`, `gradio`, `langchain`, `FAISS`, etc.  

2. **Data and API Setup:**  
   - Load FDIC RMS Section 3.2 JSON as the **authoritative regulatory source**.  
   - Configure OpenAI API key and endpoint.  

3. **Text Normalization & Chunking:**  
   - Flatten JSON into text lines.  
   - Split into chunks using `RecursiveCharacterTextSplitter`.  
   - Generate embeddings and store in FAISS for **retrieval-augmented generation (RAG)**.  

4. **LLM Initialization:**  
   - Use `ChatOpenAI` (gpt-4.1-nano) with **temperature 0** to keep responses as consistent as possible (temperature 0 reduces, but does not fully eliminate, run-to-run variation).  

5. **OCR Extraction:**  
   - Extract raw text from uploaded images with `pytesseract`, then have the LLM list only the explicitly stated loan and credit facts.  
   - Generate a structured applicant profile.  

6. **Regulatory Reasoning Engine:**  
   - Retrieve relevant regulatory chunks from FAISS.  
   - Use **system prompt** to constrain LLM behavior.  
   - Respond with **examiner-aligned observations**, never making credit decisions.  

7. **UI / Interaction:**  
   - Gradio interface allows:  
     - Uploading loan and credit documents.  
     - Viewing extracted applicant facts.  
     - Chat-based Q&A with regulatory reasoning.  
     - FAQ exploration based on Section 3.2 guidance.  

8. **Response Governance:**  
   - Follow strict formatting rules: bullet points, explicit references, neutral tone.  
   - Guard against hallucinations and prohibited terms through the system-prompt rules.  
   - Clearly indicate when information is **not stated** in the source documents.
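In this notebook the prohibited-language rule is enforced only through the system prompt. A cheap post-generation check can flag answers that still contain the listed words before they reach the user. The helper below is a hypothetical sketch (`find_prohibited_terms` and `PROHIBITED_TERMS` are not part of the original pipeline); it matches only the five exact words named in the system prompt, so any other "variations" the prompt mentions would need additional patterns.

```python
import re
from typing import List

# Hypothetical post-check: the exact five words listed under PROHIBITED LANGUAGE.
# Word boundaries keep "approvals" or "approval authority" from matching "approved".
PROHIBITED_TERMS = ["eligible", "eligibility", "approved", "rejected", "denied"]

_PROHIBITED_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, PROHIBITED_TERMS)) + r")\b",
    flags=re.IGNORECASE,
)

def find_prohibited_terms(answer: str) -> List[str]:
    """Return prohibited terms found in a model answer (lower-cased, first-seen order, no duplicates)."""
    found: List[str] = []
    for match in _PROHIBITED_RE.finditer(answer):
        term = match.group(1).lower()
        if term not in found:
            found.append(term)
    return found

sample_answer = "Eligibility cannot be determined; nothing here is approved or denied."
print(find_prohibited_terms(sample_answer))
```

A non-empty result could trigger a retry or a neutral fallback message; the sketch only detects the terms and leaves that policy open.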

CELL 2 - Imports libraries for OCR, UI, image handling, typing, and LangChain components for regulatory loan extraction.


In [2]:
import os
import gradio as gr
import pytesseract
from PIL import Image
from typing import Any, List

from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter


from openai import OpenAI
import json
import numpy as np
import matplotlib.pyplot as plt



CELL 3 - API Configuration

In [3]:
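from getpass import getpass
# Load the key from the environment or prompt for it; never hard-code secrets in a notebook
os.environ["OPENAI_API_KEY"] = os.environ.get("OPENAI_API_KEY") or getpass("API key: ")
BASE_URL = "https://apidev.navigatelabsai.com/v1"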


CELL 4 - FDIC RMS Manual Section 3.2

In [4]:
GUIDELINES_JSON = {
  "document": {
    "title": "FDIC RMS Manual Section 3.2: Loans",
    "last_updated": "05/2023",
    "introduction": {
      "legal_basis": "Section 39 of the Federal Deposit Insurance Act",
      "overview_points": [
        "Loans are the largest and riskiest asset category for banks",
        "Loan portfolio quality directly impacts institutional safety and the FDIC insurance fund",
        "Examinations emphasize lending policies, credit administration, and allowance adequacy",
        "Review scope extends beyond individual loans to systems, policies, and risk management"
      ]
    },
    "loan_administration": {
      "lending_policies": {
        "board_responsibility": True,
        "policy_characteristics": [
          "Written",
          "Up-to-date",
          "Responsive to economic and institutional changes"
        ],
        "components": [
          "Lending authorities",
          "Loan types and diversification goals",
          "Collateral and appraisal requirements",
          "Credit file maintenance",
          "Collection procedures",
          "Loan volume limits",
          "Loan review and grading systems",
          "ALLL / ACL review process",
          "Environmental liability safeguards"
        ]
      },
      "loan_review_systems": {
        "objectives": [
          "Identify credit weaknesses early",
          "Support ALLL/ACL determinations",
          "Monitor credit trends",
          "Ensure policy compliance",
          "Evaluate lending staff performance"
        ],
        "independence_required": True
      },
      "credit_risk_grading": {
        "initial_assignment": "Loan Officers",
        "independent_review": True,
        "features": [
          "Formal risk ratings",
          "Regulatory alignment",
          "Problem loan identification",
          "Loss experience documentation"
        ]
      },
      "allowance_for_credit_losses": {
        "methodologies": {
          "CECL": {
            "standard": "ASC Topic 326",
            "approach": "Forward-looking expected credit loss estimation"
          },
          "ALLL": {
            "approach": "Incurred loss model"
          }
        },
        "evaluation_frequency": "Quarterly",
        "components": [
          "Individually evaluated loans",
          "Collectively evaluated loans",
          "Cross-border transfer risks"
        ],
        "key_factors": [
          "Underwriting changes",
          "Economic conditions",
          "Loan volume",
          "Management quality",
          "Credit concentrations"
        ]
      }
    },
    "portfolio_composition": {
      "commercial_loans": {
        "types": [
          "Working capital",
          "Term loans",
          "Business loans"
        ],
        "key_controls": [
          "Financial statements",
          "Collateral verification"
        ],
        "accounts_receivable_financing": [
          "Blanket assignment",
          "Ledgering"
        ]
      },
      "leveraged_lending": {
        "characteristics": [
          "High leverage ratios",
          "Buyouts",
          "Acquisitions"
        ],
        "risk_management": [
          "Strong underwriting",
          "Stress testing",
          "Independent credit review",
          "Conflict of interest policies"
        ],
        "valuation_methods": [
          "Asset approach",
          "Income approach",
          "Market approach"
        ]
      },
      "oil_and_gas_lending": {
        "collateral_basis": "Proved reserves",
        "reserve_types": [
          "PDP",
          "PDNP",
          "PUD"
        ],
        "advance_rates": {
          "PDP": "50-65%",
          "PDNP": "Lower than PDP",
          "PUD": "Lowest"
        },
        "risk_controls": [
          "Price decks",
          "Discount rates",
          "Hedging",
          "Amortization alignment"
        ]
      },
      "real_estate_loans": {
        "regulatory_basis": [
          "Section 18(o) of FDI Act",
          "FDIC Part 365"
        ],
        "ltv_guidelines": {
          "raw_land": "65%",
          "improved_property": "85%"
        },
        "special_focus": [
          "Construction loans",
          "ADC loans",
          "Home equity loans",
          "Subprime real estate loans"
        ]
      },
      "agricultural_loans": {
        "types": [
          "Production",
          "Livestock",
          "Machinery",
          "Real estate"
        ],
        "key_risks": [
          "Collateral liquidity",
          "Borrower cash flow",
          "Carryover loans"
        ]
      },
      "installment_loans": {
        "features": [
          "Small loan size",
          "Consumer focus"
        ],
        "policy_focus": [
          "Credit checks",
          "Renewals",
          "Delinquencies",
          "Charge-offs"
        ]
      }
    },
    "loan_problems": {
      "common_causes": [
        "Poor risk selection",
        "Overlending",
        "Incomplete credit information",
        "Self-dealing",
        "Weak supervision",
        "Economic changes",
        "Competitive pressure"
      ],
      "loan_sampling": {
        "selection_basis": "Risk-based judgment",
        "categories": [
          "In Scope",
          "Discuss Only",
          "Group"
        ]
      },
      "loan_classifications": {
        "substandard": "Well-defined weaknesses",
        "doubtful": "Collection highly questionable",
        "loss": "Uncollectible",
        "special_mention": "Potential weaknesses"
      }
    },
    "impaired_loans_and_tdrs": {
      "impairment_standard": "ASC 310-10",
      "measurement_methods": [
        "Discounted cash flows",
        "Collateral fair value"
      ],
      "tdr_definition": "Concession granted due to borrower financial difficulty"
    },
    "concentrations_of_credit": {
      "risk_types": [
        "Borrower",
        "Industry",
        "Geography",
        "Foreign exposure"
      ],
      "monitoring_tools": [
        "NAICS codes"
      ]
    },
    "legal_framework": {
      "ucc_article_9": {
        "concepts": [
          "Attachment",
          "Perfection",
          "Priority",
          "Default rights"
        ]
      },
      "bankruptcy": {
        "chapters": [
          "Chapter 7",
          "Chapter 11",
          "Chapter 13"
        ],
        "key_concepts": [
          "Automatic stay",
          "Property of the estate",
          "Discharge",
          "Preferences",
          "Setoff"
        ]
      }
    },
    "syndicated_lending": {
      "phases": [
        "Pre-launch",
        "Launch",
        "Post-launch",
        "Post-closing"
      ],
      "risk_controls": [
        "Independent credit analysis",
        "Covenant monitoring",
        "Agent bank oversight"
      ],
      "programs": [
        "Shared National Credit (SNC)"
      ]
    },
    "credit_scoring": {
      "benefits": [
        "Speed",
        "Consistency",
        "Expanded lending"
      ],
      "controls": [
        "Human overrides",
        "Model validation",
        "Periodic updates"
      ]
    },
    "subprime_lending": {
      "risk_level": "High",
      "capital_requirements": "1.5x to 3x prime loans",
      "controls": [
        "Risk-based pricing",
        "Stress testing",
        "Early collections",
        "Compliance oversight"
      ],
      "concentration_threshold": "25% of capital plus ALLL"
    },
    "appraisals_and_environmental_risk": {
      "appraisal_methods": [
        "Cost",
        "Market",
        "Income"
      ],
      "regulatory_basis": "FIRREA Title XI",
      "environmental_controls": [
        "All Appropriate Inquiry",
        "Environmental covenants",
        "Ongoing monitoring"
      ]
    }
  },
  "regulatory_identity": {
    "assistant_role": "Loan Examination Reasoning Assistant",
    "regulatory_basis": "FDIC RMS Manual Section 3.2 – Loans",
    "operating_principle": "Evaluate safety and soundness, not credit decisions",
    "primary_objective": "Assess loan governance, risk identification, and documentation adequacy",
    "secondary_objective": "Explain examiner expectations clearly and consistently"
  },

  "scope_and_boundary_controls": {
    "permitted_functions": [
      "Interpret regulatory loan guidance",
      "Evaluate loan file completeness",
      "Identify credit risk indicators",
      "Highlight policy deviations",
      "Explain regulatory concerns"
    ],
    "explicitly_prohibited_functions": [
      "Loan approval or denial",
      "Interest rate recommendation",
      "Borrower suitability judgment",
      "Credit score interpretation",
      "Profitability optimization"
    ],
    "boundary_enforcement_rules": [
      "If a request implies a credit decision, respond with regulatory context only",
      "If data is missing, state insufficiency clearly",
      "If conclusions exceed evidence, stop and flag limitation"
    ]
  },

  "examiner_reasoning_model": {
    "mandatory_reasoning_order": [
      "Institutional policy alignment",
      "Borrower repayment capacity",
      "Collateral and structure",
      "Documentation and controls",
      "Risk grading accuracy",
      "Portfolio and concentration impact",
      "Allowance and loss implications"
    ],
    "reasoning_quality_requirements": [
      "Evidence-based",
      "Documentation-supported",
      "Current-condition focused",
      "Conservative when uncertain"
    ]
  },

  "loan_policy_governance": {
    "policy_expectations": {
      "form": "Written",
      "board_approved": True,
      "periodically_reviewed": True
    },
    "customization": "Aligned to institution size, complexity, and risk profile",
    "enforcement": "Consistently applied across loan portfolio",
    "minimum_policy_components": {
      "loan_types": "Clearly defined permissible loan products",
      "approval_authority": "Defined limits by officer, committee, or board",
      "underwriting_criteria": "Repayment, collateral, guarantor standards",
      "pricing_and_terms": "Risk-based, policy-consistent",
      "exception_handling": "Documented justification and approval",
      "concentration_limits": "Quantitative and qualitative thresholds",
      "problem_loan_process": "Identification, escalation, and resolution",
      "loan_review_system": "Independent, periodic, and documented"
    },
    "policy_failure_indicators": [
      "Frequent undocumented exceptions",
      "Inconsistent application",
      "Outdated or generic policies",
      "Board unfamiliarity with loan risks"
    ]
  },

  "borrower_credit_analysis_framework": {
    "repayment_source_hierarchy": [
      "Primary: Operating cash flow",
      "Secondary: Guarantor support",
      "Tertiary: Collateral liquidation"
    ],
    "required_analysis_elements": {
      "financial_statements": [
        "Income statement",
        "Balance sheet",
        "Cash flow statement"
      ],
      "analysis_expectations": [
        "Trend analysis",
        "Reasonableness of assumptions",
        "Stress sensitivity"
      ],
      "repayment_validation": [
        "Debt service coverage",
        "Cash flow stability",
        "Reliance on non-recurring income"
      ]
    },
    "heightened_risk_conditions": [
      "Start-up or early-stage borrower",
      "Highly leveraged capital structure",
      "Cyclical or declining industry",
      "Significant related-party exposure"
    ]
  },

  "collateral_evaluation_and_controls": {
    "collateral_principles": [
      "Collateral is a secondary source of repayment",
      "Valuations must be reasonable and supportable",
      "Lien position must be legally perfected"
    ],
    "valuation_requirements": {
      "independence": "No conflict of interest",
      "timeliness": "Reflect current market conditions",
      "methodology": "Appropriate to collateral type"
    },
    "collateral_monitoring": [
      "Periodic revaluation",
      "Insurance verification",
      "Environmental or legal risk awareness"
    ],
    "collateral_risk_flags": [
      "Outdated appraisals",
      "Unsupported valuation assumptions",
      "Incomplete lien documentation",
      "Market volatility exposure"
    ]
  },

  "loan_structure_and_terms": {
    "structural_elements": [
      "Maturity and amortization",
      "Covenants",
      "Guarantor support",
      "Repayment schedule"
    ],
    "examiner_expectations": [
      "Structure aligns with repayment capacity",
      "Covenants are measurable and enforceable",
      "Guarantors have documented capacity"
    ],
    "weak_structuring_indicators": [
      "Balloon payments without take-out analysis",
      "Covenants not monitored",
      "Dependence on refinancing"
    ]
  },

  "documentation_standards": {
    "required_documents": [
      "Credit approval memorandum",
      "Loan agreement and promissory note",
      "Collateral documentation",
      "Guaranty agreements",
      "Financial analysis support"
    ],
    "documentation_quality_metrics": [
      "Completeness",
      "Internal consistency",
      "Timeliness",
      "Traceability to policy"
    ],
    "documentation_deficiencies": [
      "Missing approvals",
      "Unsigned agreements",
      "Unsupported risk ratings",
      "Inconsistent borrower data"
    ]
  },

  "loan_review_and_independent_assessment": {
    "system_objectives": [
      "Early identification of deterioration",
      "Validation of risk ratings",
      "Support allowance estimation"
    ],
    "independence_criteria": [
      "Reviewer not involved in loan origination",
      "Authority to recommend corrective action",
      "Direct reporting to senior management or board"
    ],
    "review_failures": [
      "Delayed downgrades",
      "Infrequent reviews",
      "Reviewer override without justification"
    ]
  },

  "credit_risk_grading_framework": {
    "grading_philosophy": "Reflect current risk, not hoped-for outcomes",
    "grade_definitions": {
      "pass": "Acceptable risk with adequate repayment capacity",
      "special_mention": "Potential weaknesses requiring monitoring",
      "substandard": "Well-defined weaknesses jeopardizing repayment",
      "doubtful": "Collection improbable without liquidation",
      "loss": "Uncollectible"
    },
    "grading_errors": [
      "Grade inflation",
      "Delayed recognition of weakness",
      "Collateral-driven grading"
    ]
  },

  "problem_loan_management_protocol": {
    "identification_triggers": [
      "Payment delinquency",
      "Covenant violations",
      "Adverse borrower trends",
      "Collateral impairment"
    ],
    "required_responses": [
      "Risk grade reassessment",
      "Accrual status review",
      "Workout planning",
      "Charge-off consideration"
    ],
    "regulatory_expectation": "Prompt, realistic, and well-documented action"
  },

  "allowance_for_credit_losses_framework": {
    "governance_expectations": [
      "Management responsibility",
      "Board oversight",
      "Quarterly evaluation minimum"
    ],
    "methodology_requirements": [
      "Consistent application",
      "Reasonable assumptions",
      "Supportable forecasts"
    ],
    "qualitative_adjustment_controls": [
      "Documented rationale",
      "Avoid earnings management",
      "Reflect portfolio-specific risks"
    ]
  },

  "portfolio_and_concentration_risk": {
    "concentration_identification": [
      "Industry",
      "Geography",
      "Collateral type",
      "Borrower relationship"
    ],
    "risk_management_expectations": [
      "Board awareness",
      "Monitoring thresholds",
      "Stress considerations"
    ]
  },

  "examiner_assessment_outputs": {
    "acceptable_outcomes": [
      "Adequate",
      "Needs improvement",
      "Deficient"
    ],
    "assessment_basis": [
      "Policy adherence",
      "Risk identification timeliness",
      "Documentation strength",
      "Management responsiveness"
    ]
  },

  "model_response_governance": {
    "required_response_structure": [
      "Observation",
      "Regulatory context",
      "Risk implication",
      "Data gap (if applicable)"
    ],
    "tone_requirements": [
      "Neutral",
      "Professional",
      "Non-judgmental"
    ],
    "hallucination_prevention_rules": [
      "Do not infer missing data",
      "Cite regulatory basis implicitly",
      "Allow uncertainty responses"
    ]
  },

  "training_and_evaluation_use": {
    "rag_alignment": [
      "Chunk-to-risk mapping",
      "Policy-to-question retrieval",
      "Evidence-grounded responses"
    ],
    "evaluation_tasks": [
      "Missing document detection",
      "Risk justification explanation",
      "Policy deviation identification"
    ],
    "success_metrics": [
      "Reduced hallucination",
      "Consistent regulatory tone",
      "Accurate risk framing"
    ]
  }
}


CELL 5 - Normalize JSON

In [5]:
def normalize_json(data: Any, prefix: str = "") -> List[str]:
    texts = []
    if isinstance(data, dict):
        for k, v in data.items():
            texts.extend(normalize_json(v, f"{prefix} {k}".strip()))
    elif isinstance(data, list):
        for item in data:
            texts.append(f"{prefix}: {item}")
    else:
        texts.append(f"{prefix}: {data}")
    return texts
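To see exactly what gets embedded, it helps to run the flattening on a small excerpt of `GUIDELINES_JSON`. The block below repeats `normalize_json` unchanged so it runs on its own; `excerpt` is a hand-picked subset, not the full guideline object.

```python
from typing import Any, List

def normalize_json(data: Any, prefix: str = "") -> List[str]:
    # Same recursion as CELL 5: dict keys extend the prefix;
    # list items and scalars become "prefix: value" lines
    texts = []
    if isinstance(data, dict):
        for k, v in data.items():
            texts.extend(normalize_json(v, f"{prefix} {k}".strip()))
    elif isinstance(data, list):
        for item in data:
            texts.append(f"{prefix}: {item}")
    else:
        texts.append(f"{prefix}: {data}")
    return texts

excerpt = {
    "document": {
        "title": "FDIC RMS Manual Section 3.2: Loans",
        "loan_review_systems": {
            "objectives": ["Identify credit weaknesses early", "Monitor credit trends"],
            "independence_required": True,
        },
    }
}

for line in normalize_json(excerpt):
    print(line)
```

Each line carries its full key path (for example `document loan_review_systems objectives: Monitor credit trends`), which gives the embedding model context for short list items. One limitation is visible in the code: a dict nested inside a list would be stringified as a Python repr rather than flattened; `GUIDELINES_JSON` contains no such case.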


CELL 6 - Chunking + Vector Store

In [6]:
lines = normalize_json(GUIDELINES_JSON)

splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=50
)

chunks = []
for line in lines:
    chunks.extend(splitter.split_text(line))

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",
    openai_api_key=os.environ["OPENAI_API_KEY"],
    openai_api_base=BASE_URL
)

vectorstore = FAISS.from_texts(chunks, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 8})

print("✅ FDIC RMS Manual indexed")


✅ FDIC RMS Manual indexed
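`FAISS.from_texts` stores one embedding per chunk, and LangChain's FAISS wrapper ranks by Euclidean (L2) distance by default; `as_retriever(search_kwargs={"k": 8})` then returns the 8 nearest chunks. The toy sketch below reproduces that ranking with numpy; the 2-D vectors and the `top_k_by_l2` helper are hypothetical stand-ins for real `text-embedding-3-small` outputs, and the `toy_` names avoid overwriting the notebook's `chunks`. Because OpenAI embeddings are normalized to length 1, ranking by L2 distance gives the same order as ranking by cosine similarity.

```python
import numpy as np

def top_k_by_l2(query_vec: np.ndarray, chunk_vecs: np.ndarray, chunks: list, k: int) -> list:
    """Return the k chunks whose vectors are nearest to the query (Euclidean distance)."""
    distances = np.linalg.norm(chunk_vecs - query_vec, axis=1)  # one distance per chunk
    nearest = np.argsort(distances, kind="stable")[:k]          # smallest distance first
    return [chunks[i] for i in nearest]

# Hypothetical unit-length 2-D vectors standing in for real embeddings
toy_chunks = ["loan review objectives", "collateral monitoring", "ALLL evaluation frequency"]
toy_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
toy_query = np.array([0.8, 0.6])

print(top_k_by_l2(toy_query, toy_vecs, toy_chunks, k=2))

# With unit-length vectors, sorting by cosine similarity (descending) yields the same order
cosine = toy_vecs @ toy_query
print([toy_chunks[i] for i in np.argsort(-cosine, kind="stable")[:2]])
```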


CELL 7 - LLM Initialization

In [7]:
llm = ChatOpenAI(
    model="gpt-4.1-nano",
    temperature=0,
    openai_api_key=os.environ["OPENAI_API_KEY"],
    openai_api_base=BASE_URL
)


CELL 8 - System Prompt (Behavior Control)

Enforces strict regulatory behavior:

* Grounded only in FDIC Section 3.2 and the uploaded loan documents
* Neutral, examiner-appropriate tone
* Answer only the user’s explicit question
* Refuse any loan approval, denial, or eligibility requests
* List explicitly stated objectives, factors, and components item by item, and state when the documents do not specify the requested information



In [8]:
SYSTEM_PROMPT = """
ROLE
You are a regulatory documentation assistant operating in a controlled
banking examination and compliance context.

Your sole function is to restate, list, or describe information that is
explicitly stated in the supplied authoritative regulatory documents.
You do not interpret intent, infer unstated purposes, or apply professional
judgment.

AUTHORITATIVE SOURCES (SOLE SOURCE OF TRUTH)
• FDIC RMS Manual (including Section 3.2 – Loans)
• Company loan guidelines (if provided)
• Loan application document (if provided)
• Credit score document (if provided)

No other documents, regulatory frameworks, industry practices, examples,
or general banking knowledge may be used.

OPERATING CONSTRAINTS
• You do NOT approve, reject, recommend, or decide any loan outcome
• You do NOT determine borrower status, qualification, or suitability
• You do NOT infer rationale, benefits, or consequences unless they are
  explicitly stated in the authoritative documents
• You do NOT calculate thresholds, apply formulas, or reconcile ambiguities
• You do NOT introduce standard practices unless explicitly documented

PROHIBITED LANGUAGE
The following words and their variations must never be used:
• eligible
• eligibility
• approved
• rejected
• denied

RESPONSE RULES
• Respond only to the user’s explicit question
• Use only language that is directly stated or unambiguously described
  in the authoritative documents
• Do not replace documented regulatory items with generalized summaries
• Do not add interpretive explanations or reasoning

OBJECTIVE AND LISTING RULE
If a question asks for objectives, purposes, factors, components, criteria,
or classifications:
• List each explicitly stated item separately
• Use bullet points where appropriate
• Do not combine, compress, or generalize listed items

ABSENCE HANDLING
• If the authoritative documents do not explicitly state the requested
  information, respond with:
  “The document does not specify this information.”
• Do not speculate or extend beyond documented content

CONTENT HANDLING
• Regulatory responsibilities may be described only as written
• Structural or procedural requirements may be listed verbatim or summarized
  without interpretation
• Any omission or lack of explanation in the documents must be reported
  factually and without analysis

TONE AND STYLE
• Neutral
• Descriptive
• Examiner-appropriate
• Non-conclusive
• Non-interpretive

FORMAT
• Short paragraphs or bullet points as appropriate
• No emojis
• No conclusions, recommendations, or evaluative statements
"""

CELL 9 - OCR Function

In [9]:
def extract_text_from_image(image_path):
    return pytesseract.image_to_string(Image.open(image_path)).strip()


CELL 10 - OCR Fact Extraction

In [10]:
def extract_applicant_facts(raw_text: str) -> str:
    messages = [
        {
            "role": "system",
            "content": """
Extract ONLY explicitly stated facts.
DO NOT infer, judge, summarize, or calculate.
If data is missing, state 'Not stated'.
Return bullet points only.
"""
        },
        {"role": "user", "content": raw_text}
    ]
    return llm.invoke(messages).content
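The extraction prompt asks for bullet points and "Not stated" for missing data, but the resulting profile is kept as free text. If data gaps need to be surfaced on their own (for the "Data gap" element of the required response structure), a small parser can collect them. This is a hypothetical sketch that assumes the model returns `Label: value` bullets; LLM output is not guaranteed to follow that shape, so lines without a colon are simply skipped.

```python
from typing import Dict, List

def parse_fact_bullets(facts: str) -> Dict[str, str]:
    """Parse '- Label: value' or '• Label: value' lines into a dict (assumed output shape)."""
    parsed: Dict[str, str] = {}
    for raw_line in facts.splitlines():
        line = raw_line.strip().lstrip("-•*").strip()  # drop bullet markers
        if ":" not in line:
            continue  # not a label/value line
        label, value = line.split(":", 1)
        parsed[label.strip(" *")] = value.strip(" *")  # also strips markdown bold markers
    return parsed

def missing_fields(parsed: Dict[str, str]) -> List[str]:
    """Return the labels whose value the extractor reported as 'Not stated'."""
    return [label for label, value in parsed.items() if value.lower().startswith("not stated")]

sample_facts = "- Applicant name: Jane Doe\n- Annual income: Not stated\n- **Loan amount**: $25,000\nFree text"
print(missing_fields(parse_fact_bullets(sample_facts)))
```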


CELL 11 - Build Applicant Profile (From Uploaded Images)

In [11]:
def build_user_profile(loan_image, credit_image):
    loan_text = extract_text_from_image(loan_image)
    credit_text = extract_text_from_image(credit_image)

    loan_facts = extract_applicant_facts(loan_text)
    credit_facts = extract_applicant_facts(credit_text)

    return f"""
Loan Application – Extracted Facts:
{loan_facts}

Credit Report – Extracted Facts:
{credit_facts}
"""


CELL 12 - Answer Engine (RAG + System Prompt)

In [12]:
def answer_user_question(question: str, profile: str) -> str:
    # Expand retrieval for "objective" or "factor" questions
    k = 8 if any(word in question.lower() for word in ["objective", "purpose", "factor", "component"]) else 3

    # Create a new retriever instance with the dynamic k value
    current_retriever = vectorstore.as_retriever(search_kwargs={"k": k})

    docs = current_retriever.invoke(question)
    context = "\n".join([d.page_content for d in docs])

    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {
            "role": "user",
            "content": f"""
FDIC RMS Manual Section 3.2:
{context}

Applicant Information (OCR Extracted):
{profile}

Question:
{question}

IMPORTANT:
- If the question asks for objectives, factors, components, purposes, or system goals:
  • List each item exactly as stated in the regulatory documents
  ‚Ä¢ Do not summarize, paraphrase, or combine items
  ‚Ä¢ Use bullet points
- If the documents do not contain the requested information, respond:
  "The document does not specify this information."
"""
        }
    ]
    return llm.invoke(messages).content
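Retrieval depth and prompt assembly are pure string logic, so they can be pulled out and checked without calling the API. The sketch below mirrors the k-selection and message layout of `answer_user_question`; the names `choose_k`, `build_user_message`, and `LIST_KEYWORDS` are hypothetical, the trailing IMPORTANT block is omitted for brevity, and the "Not provided" placeholder is an assumption (the original interpolates `None` when no profile exists). It also makes one behavior visible: "How are problem loans classified?" contains none of the four trigger words, so it takes the k=3 path even though the system prompt's listing rule covers classifications.

```python
from typing import List, Optional, Sequence

# Same trigger words as answer_user_question
LIST_KEYWORDS = ("objective", "purpose", "factor", "component")

def choose_k(question: str, keywords: Sequence[str] = LIST_KEYWORDS) -> int:
    """Retrieve 8 chunks for list-style questions, otherwise 3 (substring match, case-insensitive)."""
    q = question.lower()
    return 8 if any(word in q for word in keywords) else 3

def build_user_message(context_chunks: List[str], profile: Optional[str], question: str) -> str:
    """Assemble the user turn: retrieved regulation text, OCR profile, then the question."""
    context = "\n".join(context_chunks)
    profile_text = profile if profile else "Not provided"  # assumed placeholder when no images were processed
    return (
        f"FDIC RMS Manual Section 3.2:\n{context}\n\n"
        f"Applicant Information (OCR Extracted):\n{profile_text}\n\n"
        f"Question:\n{question}"
    )

print(choose_k("How are problem loans classified?"))                            # original keywords
print(choose_k("How are problem loans classified?", LIST_KEYWORDS + ("classif",)))  # widened keywords
```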

CELL 13 - FAQ Questions

In [13]:
FAQ_QUESTIONS = [
    "What regulatory factors are reviewed for a loan?",
    "What documents must be maintained in a credit file?",
    "How are problem loans classified?",
    "What credit administration controls are required?",
    "Can eligibility be determined from this manual?"
]


CELL 14 - Gradio UI

In [None]:
with gr.Blocks(title="FDIC Loan Advisory System") as demo:

    profile_state = gr.State()

    # ---------------- UPLOAD PAGE ----------------
    with gr.Column() as upload_page:
        gr.Markdown("## 📄 Upload Loan Documents")
        loan_img = gr.Image(type="filepath", label="Loan Application")
        credit_img = gr.Image(type="filepath", label="Credit Score")
        next_btn = gr.Button("Generate Applicant Profile")
        status = gr.Markdown()

    # ---------------- MENU PAGE ----------------
    with gr.Column(visible=False) as menu_page:
        live_btn = gr.Button("Live Chat")
        faq_btn = gr.Button("FAQs")

    # ---------------- CHAT PAGE ----------------
    with gr.Column(visible=False) as chat_page:
        chatbot = gr.Chatbot(label="FDIC Loan Advisory Chat")
        user_input = gr.Textbox(placeholder="Ask a regulatory question...")
        send = gr.Button("Send")
        back1 = gr.Button("Back")

    # ---------------- FAQ PAGE ----------------
    with gr.Column(visible=False) as faq_page:
        faq_q = gr.Radio(FAQ_QUESTIONS)
        faq_ans = gr.Textbox(lines=6)
        back2 = gr.Button("Back")

    # ---------------- FUNCTIONS (DEFINE FIRST) ----------------

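    def process_images(loan, credit):
        # Both images are required; stay on the upload page if either is missing
        if not loan or not credit:
            return (
                None,
                gr.update(visible=True),
                gr.update(visible=False),
                "Please upload both the loan application and the credit score image."
            )
        profile = build_user_profile(loan, credit)
        return (
            profile,
            gr.update(visible=False),
            gr.update(visible=True),
            "✅ Applicant facts extracted"
        )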

    def chat_fn(message, history, profile):
        answer = answer_user_question(message, profile)

        if history is None:
            history = []

        history.append({
            "role": "user",
            "content": message
        })
        history.append({
            "role": "assistant",
            "content": answer
        })

        return history

    def faq_fn(question, profile):
        return answer_user_question(question, profile)

    # ---------------- BUTTON WIRING ----------------

    next_btn.click(
        process_images,
        inputs=[loan_img, credit_img],
        outputs=[profile_state, upload_page, menu_page, status]
    )

    live_btn.click(
        lambda: (gr.update(visible=False), gr.update(visible=True)),
        outputs=[menu_page, chat_page]
    )

    faq_btn.click(
        lambda: (gr.update(visible=False), gr.update(visible=True)),
        outputs=[menu_page, faq_page]
    )

    back1.click(
        lambda: (gr.update(visible=True), gr.update(visible=False)),
        outputs=[menu_page, chat_page]
    )

    back2.click(
        lambda: (gr.update(visible=True), gr.update(visible=False)),
        outputs=[menu_page, faq_page]
    )

    send.click(
        chat_fn,
        inputs=[user_input, chatbot, profile_state],
        outputs=chatbot
    )

    faq_q.change(
        faq_fn,
        inputs=[faq_q, profile_state],
        outputs=faq_ans
    )

demo.launch(debug=True)


It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://9fe0588b457a202a17.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


## Evaluation

### API Setup

In [None]:
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],  # loaded in CELL 3; never hard-code secrets
    base_url=BASE_URL  # same OpenAI-compatible endpoint (with /v1) as the RAG pipeline
)

### Judge Models

In [None]:
JUDGE_MODELS = [
    "nova-micro",
    "gpt-4.1-nano",
]
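The judge models above are meant to score a pipeline answer against a reference answer (the `GROUND_TRUTH` defined next), but the judging call itself is not shown here. As a hedged sketch of one way to structure it, the helpers below build a judge prompt with a hypothetical 1-5 rubric and parse a `Score: n` line from the reply; the rubric wording, score scale, and the names `JUDGE_RUBRIC`, `build_judge_messages`, and `parse_score` are assumptions, not the notebook's method.

```python
import re
from typing import Dict, List, Optional

# Hypothetical rubric: the 1-5 scale and wording are assumptions
JUDGE_RUBRIC = (
    "You are grading an answer about FDIC RMS Manual Section 3.2. "
    "Compare the RESPONSE with the GROUND TRUTH and score it from 1 (unsupported or wrong) "
    "to 5 (complete and faithful, no added claims). "
    "Reply with 'Score: <n>' on the first line, then one sentence of justification."
)

def build_judge_messages(question: str, ground_truth: str, response: str) -> List[Dict[str, str]]:
    """Build chat messages asking a judge model to score one response."""
    user_content = (
        f"QUESTION:\n{question.strip()}\n\n"
        f"GROUND TRUTH:\n{ground_truth.strip()}\n\n"
        f"RESPONSE:\n{response.strip()}"
    )
    return [
        {"role": "system", "content": JUDGE_RUBRIC},
        {"role": "user", "content": user_content},
    ]

def parse_score(judge_reply: str) -> Optional[int]:
    """Extract the integer after 'Score:'; return None if the judge ignored the format."""
    match = re.search(r"Score:\s*([1-5])\b", judge_reply)
    return int(match.group(1)) if match else None

# Possible usage with the client, JUDGE_MODELS, QUESTION and GROUND_TRUTH from this notebook:
# answer = answer_user_question(QUESTION, profile)
# for judge in JUDGE_MODELS:
#     reply = client.chat.completions.create(
#         model=judge, messages=build_judge_messages(QUESTION, GROUND_TRUTH, answer), temperature=0
#     )
#     print(judge, parse_score(reply.choices[0].message.content))
```

Keeping the rubric fixed and the temperature at 0 for both judges makes their scores easier to compare. Note that gpt-4.1-nano also generates the answers, so its scores may be biased toward its own outputs.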

### Evaluation Question & Ground Truth

In [None]:
QUESTION = """
According to FDIC Section 3.2, what are the primary objectives of an effective loan review system, and how do these objectives support credit risk management?"""

GROUND_TRUTH = """
An effective loan review system is intended to identify loans with well-defined credit weaknesses in a timely manner, enable prompt corrective action to minimize credit losses, and provide management with accurate information regarding the quality of the loan portfolio. The system also supports credit risk management by identifying trends affecting collectibility, assessing adherence to lending policies, and providing reliable input for determining the Allowance for Loan and Lease Losses (ALLL) or Allowance for Credit Losses (ACL).
"""

## Prompt and Response Input

In [None]:
STUDENT_SYSTEM_PROMPT = """
ROLE
You are a regulatory documentation assistant operating in a controlled
banking examination and compliance context.

Your sole function is to restate, list, or describe information that is
explicitly stated in the supplied authoritative regulatory documents.
You do not interpret intent, infer unstated purposes, or apply professional
judgment.

AUTHORITATIVE SOURCES (SOLE SOURCE OF TRUTH)
• FDIC RMS Manual (including Section 3.2 – Loans)
• Company loan guidelines (if provided)
• Loan application document (if provided)
• Credit score document (if provided)

No other documents, regulatory frameworks, industry practices, examples,
or general banking knowledge may be used.

OPERATING CONSTRAINTS
• You do NOT approve, reject, recommend, or decide any loan outcome
• You do NOT determine borrower status, qualification, or suitability
• You do NOT infer rationale, benefits, or consequences unless they are
  explicitly stated in the authoritative documents
• You do NOT calculate thresholds, apply formulas, or reconcile ambiguities
• You do NOT introduce standard practices unless explicitly documented

PROHIBITED LANGUAGE
The following words and their variations must never be used:
• eligible
• eligibility
• approved
• rejected
• denied

RESPONSE RULES
• Respond only to the user’s explicit question
• Use only language that is directly stated or unambiguously described
  in the authoritative documents
• Do not replace documented regulatory items with generalized summaries
• Do not add interpretive explanations or reasoning

OBJECTIVE AND LISTING RULE
If a question asks for objectives, purposes, factors, components, criteria,
or classifications:
• List each explicitly stated item separately
• Use bullet points where appropriate
• Do not combine, compress, or generalize listed items

ABSENCE HANDLING
• If the authoritative documents do not explicitly state the requested
  information, respond with:
  “The document does not specify this information.”
• Do not speculate or extend beyond documented content

CONTENT HANDLING
• Regulatory responsibilities may be described only as written
• Structural or procedural requirements may be listed verbatim or summarized
  without interpretation
• Any omission or lack of explanation in the documents must be reported
  factually and without analysis

TONE AND STYLE
• Neutral
• Descriptive
• Examiner-appropriate
• Non-conclusive
• Non-interpretive

FORMAT
• Short paragraphs or bullet points as appropriate
• No emojis
• No conclusions, recommendations, or evaluative statements
"""


STUDENT_RESPONSE = """
- Monitor credit trends
- Evaluate lending staff performance
- Assess loan governance, risk identification, and documentation adequacy
- Ensure policy compliance
- Identify credit weaknesses early
"""
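The PROHIBITED LANGUAGE block in `STUDENT_SYSTEM_PROMPT` depends entirely on the model complying. A deterministic post-check can flag violations before a response is displayed or scored. This is a minimal sketch under stated assumptions: `PROHIBITED_RE` and `find_prohibited_terms` are hypothetical names, and the "variations" it matches (for example "approval", "denial", "ineligible") are a guess at what the prompt intends, not a rule taken from the FDIC text.

```python
import re

# Hypothetical pattern: covers the five prohibited words from the system prompt
# plus assumed inflections ("approval", "denial", "ineligible", ...).
PROHIBITED_RE = re.compile(
    r"\b(?:"
    r"(?:in)?eligib(?:le|ility)"
    r"|approv(?:e|ed|es|ing|al|als)"
    r"|reject(?:ed|s|ing|ion|ions)?"
    r"|den(?:y|ied|ies|ying|ial|ials)"
    r")\b",
    re.IGNORECASE,
)


def find_prohibited_terms(text: str) -> list[str]:
    """Return every prohibited term found in `text`, in order of appearance."""
    return [m.group(0) for m in PROHIBITED_RE.finditer(text)]
```

For example, `find_prohibited_terms(STUDENT_RESPONSE)` can be run before the response is sent to the judges. A regex only catches the listed word forms, so it complements the prompt constraint rather than replacing it.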

## System Prompt Evaluation Prompt

In [None]:
SYSTEM_PROMPT_EVAL_PROMPT = """
You are acting as an independent evaluator reviewing a system prompt
designed for a regulatory loan evaluation assistant in a banking context.

Your task is to assess the quality of the system prompt as an engineering
artifact, not the quality of any generated answers.

EVALUATION SCOPE
Evaluate whether the system prompt clearly defines:
• The role of the assistant
• The regulatory context in which it operates
• The expected behavior and limitations
• The professional standards required for use in a banking environment

Do not use external knowledge. Evaluate only what is explicitly stated
or clearly implied within the system prompt.

EVALUATION CRITERIA
Assign a score from 0 to 5 for each of the following dimensions:

• Task Clarity
  Does the prompt clearly explain what the assistant is and what it is
  expected to do?

• Context Definition
  Does the prompt clearly describe the real-world banking or regulatory
  scenario in which the assistant is used?

• Constraint Enforcement
  Does the prompt explicitly restrict hallucination, external knowledge,
  approval or rejection decisions, and unsupported assumptions?

• Document Grounding
  Does the prompt clearly establish the regulatory document as the single
  source of truth?

• Professional and Regulatory Tone
  Is the prompt written in a neutral, professional, examiner-appropriate
  manner suitable for a regulated financial environment?

SCORING GUIDANCE
• 5 – Excellent: Clear, complete, well-structured, and enforceable
• 3 – Adequate: Generally correct but missing clarity or specificity
• 0 – Poor: Vague, incomplete, or lacks enforceable constraints

OUTPUT REQUIREMENT
Return ONLY a valid JSON object in the following format.
Do not include explanations or additional text.

Return JSON only.

{
  "task_clarity": number,
  "context_definition": number,
  "constraint_enforcement": number,
  "document_grounding": number,
  "professional_tone": number
}
"""
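`judge` trusts whatever JSON the model returns, and `average_scores` later indexes every judge's dict by the first judge's keys, so a missing key or an out-of-range value only surfaces as a `KeyError` or a skewed average. A small validator can fail fast instead. This is a sketch: `SYSTEM_PROMPT_KEYS` and `validate_scores` are hypothetical names, with the key set copied from the JSON template above and the 0 to 5 range taken from its scoring guidance.

```python
# Hypothetical key set, copied from the JSON template in SYSTEM_PROMPT_EVAL_PROMPT.
SYSTEM_PROMPT_KEYS = {
    "task_clarity",
    "context_definition",
    "constraint_enforcement",
    "document_grounding",
    "professional_tone",
}


def validate_scores(scores: dict, expected_keys: set) -> dict:
    """Raise ValueError unless `scores` has exactly the expected keys, each a number in [0, 5]."""
    keys = set(scores)
    missing = expected_keys - keys
    extra = keys - expected_keys
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for key, value in scores.items():
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{key} is not a number: {value!r}")
        if not 0 <= value <= 5:
            raise ValueError(f"{key} is out of range: {value}")
    return scores
```

It could wrap each judge call, e.g. `validate_scores(judge(model, SYSTEM_PROMPT_EVAL_PROMPT, STUDENT_SYSTEM_PROMPT), SYSTEM_PROMPT_KEYS)`; the answer rubric would need its own key set taken from `ANSWER_EVAL_PROMPT`.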

## Answer Evaluation Prompt

In [None]:
ANSWER_EVAL_PROMPT = """
You are acting as an independent evaluator reviewing a response generated
by a regulatory loan evaluation assistant.

Your task is to assess the response strictly against the provided
regulatory ground truth.

EVALUATION SCOPE
Evaluate whether the response accurately reflects the regulatory
requirements and limitations described in the ground truth.
Do not use external banking knowledge or assumptions.
Evaluate only what is explicitly stated or clearly supported.

EVALUATION CRITERIA
Assign a score from 0 to 5 for each of the following criteria:

• Accuracy
  Does the response correctly reflect the regulatory ground truth?

• Faithfulness
  Is the response grounded in the regulation without adding unsupported
  interpretations or external information?

• Hallucination Control
  Does the response avoid fabricating rules, thresholds, or conclusions
  not present in the regulation?

• Regulatory Judgment
  Does the response appropriately acknowledge when the regulation does
  not provide sufficient guidance or requires additional review?

• Professional Tone
  Is the response written in a neutral, objective, examiner-appropriate
  manner suitable for a regulated banking environment?

SCORING GUIDANCE
• 5 – Fully accurate, faithful, and professional
• 3 – Partially correct with minor omissions or ambiguity
• 0 – Incorrect, misleading, or not grounded in regulation

OUTPUT REQUIREMENT
Return ONLY a valid JSON object in the following format.
Do not include explanations or additional text.

{
  "accuracy": number,
  "faithfulness": number,
  "hallucination_control": number,
  "regulatory_judgment": number,
  "professional_tone": number
}
"""

## Judge Call Function

In [None]:
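import json

def judge(model, system_prompt, user_input):
    """Send one evaluation request to a judge model and parse its JSON scores."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_input}
        ],
        temperature=0
    )
    raw = response.choices[0].message.content
    # Judges sometimes wrap the JSON in markdown code fences or add stray text,
    # so parse only the outermost {...} object.
    start, end = raw.find("{"), raw.rfind("}")
    return json.loads(raw[start:end + 1])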

## Evaluate Across Both Judge Models

In [None]:
def evaluate_all(prompt, content):
    """Score `content` with every judge model; returns one score dict per judge."""
    results = []
    for model in JUDGE_MODELS:
        score = judge(model, prompt, content)
        results.append(score)
    return results

## Aggregate Rubric Scores

In [None]:
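import numpy as np

def average_scores(scores):
    """Average each rubric criterion across judges (keys taken from the first judge)."""
    return {
        k: round(float(np.mean([s[k] for s in scores])), 2)  # float() so the dict prints plain numbers
        for k in scores[0]
    }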
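The aggregation is an unweighted per-criterion mean across judges, rounded to two decimals. The stdlib-only sketch below mirrors that arithmetic with two hypothetical judge outputs (made-up numbers for illustration, not results from this notebook): accuracy is (4 + 3) / 2 = 3.5 and faithfulness is (5 + 4) / 2 = 4.5. `average_scores_stdlib` and `example_scores` are illustrative names, not part of the original code.

```python
from statistics import mean

# Hypothetical judge outputs, for illustration only.
example_scores = [
    {"accuracy": 4, "faithfulness": 5},
    {"accuracy": 3, "faithfulness": 4},
]


def average_scores_stdlib(scores: list[dict]) -> dict:
    """Per-criterion mean across judges, rounded to 2 decimals (same logic as average_scores)."""
    return {k: round(mean(s[k] for s in scores), 2) for k in scores[0]}


print(average_scores_stdlib(example_scores))
# {'accuracy': 3.5, 'faithfulness': 4.5}
```

Because the keys come from the first judge only, any extra key returned by a later judge is silently ignored, and a missing one raises `KeyError`.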

## Evaluate System Prompt

In [None]:
system_prompt_scores = evaluate_all(
    SYSTEM_PROMPT_EVAL_PROMPT,
    STUDENT_SYSTEM_PROMPT
)

avg_system_prompt_scores = average_scores(system_prompt_scores)
print("System Prompt Scores:", avg_system_prompt_scores)

## Evaluate Answer

In [None]:
answer_scores = evaluate_all(
    ANSWER_EVAL_PROMPT,
    f"""
    Question:
    {QUESTION}

    Ground Truth:
    {GROUND_TRUTH}

    Student Response:
    {STUDENT_RESPONSE}
    """
)

avg_answer_scores = average_scores(answer_scores)
print("Answer Scores:", avg_answer_scores)

## Visualize System Prompt Rubrics

In [None]:
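import matplotlib.pyplot as plt

plt.figure()
# Pass plain lists rather than dict views.
plt.bar(list(avg_system_prompt_scores.keys()), list(avg_system_prompt_scores.values()))
plt.title("System Prompt Evaluation Scores")
plt.ylim(0, 5)
plt.ylabel("Score")
plt.xticks(rotation=30, ha="right")
plt.tight_layout()  # keep the rotated criterion labels inside the figure
plt.show()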

## Visualize Answer Rubrics

In [None]:
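plt.figure()
# Pass plain lists rather than dict views.
plt.bar(list(avg_answer_scores.keys()), list(avg_answer_scores.values()))
plt.title("Answer Evaluation Scores")
plt.ylim(0, 5)
plt.ylabel("Score")
plt.xticks(rotation=30, ha="right")
plt.tight_layout()  # keep the rotated criterion labels inside the figure
plt.show()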

## Final Score

In [None]:
final_score = round(
    (np.mean(list(avg_system_prompt_scores.values())) +
     np.mean(list(avg_answer_scores.values())))/2 ,
    2
)

print("FINAL SCORE:", final_score)
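Reading the cells above, the final score is an equal-weight blend of the two rubric means. Writing $s_{c,m}$ for judge $m$'s score on criterion $c$, $J$ for the number of judges (`len(JUDGE_MODELS)`, here 2), and $\mathcal{C}_{\text{sys}}$, $\mathcal{C}_{\text{ans}}$ for the five criteria of each rubric, the computation can be summarized as:

$$
\bar{s}_c = \frac{1}{J}\sum_{m=1}^{J} s_{c,m},
\qquad
S_{\text{final}} = \frac{1}{2}\left(\frac{1}{5}\sum_{c \in \mathcal{C}_{\text{sys}}} \bar{s}_c + \frac{1}{5}\sum_{c \in \mathcal{C}_{\text{ans}}} \bar{s}_c\right)
$$

$S_{\text{final}}$ stays on the 0 to 5 scale, and the system prompt and the answer carry equal weight. In the code, each $\bar{s}_c$ is already rounded to two decimals by `average_scores` before the outer means are taken, and the result is rounded to two decimals again.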