# LAB 1.1: BASELINE ASSESSMENT - CREDIT APPLICATION ANALYSIS

**Course:** Advanced Prompt Engineering Training  
**Session:** Session 1 - Prompt Engineering Fundamentals Review  
**Duration:** 45 minutes  
**Model:** GPT-4o (temperature=0)

---

## Overview

This lab presents credit application analysis challenges to demonstrate:
- Basic prompt construction with output format control
- Multi-step reasoning with chain-of-thought
- Strict format enforcement for compliance
- Edge case handling without hallucination
- Comparative analysis and ranking

## Setup

In [None]:
# Import required libraries
import os
import json
from openai import OpenAI
from datetime import datetime
import pandas as pd
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Check if API key exists
if not os.environ.get("OPENAI_API_KEY"):
    raise ValueError("OPENAI_API_KEY not found. Please set it in .env file")

print("✓ Libraries imported successfully")

In [None]:
# Initialize OpenAI client
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# Configuration
MODEL = os.getenv("MODEL_NAME")
TEMPERATURE = 0  # Minimizes randomness for BFSI applications (outputs can still vary slightly between runs)

if not MODEL:
    raise ValueError("MODEL_NAME not found. Please set it in .env file")

print(f"✓ Model: {MODEL}")
print(f"✓ Temperature: {TEMPERATURE}")

In [None]:
# Helper function for chat completion API calls
def call_gpt4(prompt, system_prompt="You are a helpful AI assistant.", temperature=TEMPERATURE):
    """
    Wrapper for chat completion calls to the configured MODEL.
    
    Args:
        prompt (str): User prompt
        system_prompt (str): System prompt to set behavior
        temperature (float): Controls randomness (0-2); defaults to TEMPERATURE
    
    Returns:
        str: Response text from the model, or an "Error: ..." string if the call fails
    """
    try:
        response = client.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": prompt}
            ],
            temperature=temperature
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"Error: {str(e)}"

print("✓ Helper function created")

In [None]:
# Test OpenAI connection
print("Testing connection...")
test_response = call_gpt4("Say 'Connection successful' if you receive this.")
print(f"Response: {test_response}")

if "successful" in test_response.lower():
    print("\n✓ Connection verified")
else:
    print("\n✗ Check your API key")

## Load Dataset

In [None]:
# Load credit applications from data folder
with open('../data/credit_applications.json', 'r') as f:
    credit_applications = json.load(f)

# Display dataset
df = pd.DataFrame(credit_applications)
print("Credit Application Dataset:")
print("=" * 100)
print(df.to_string(index=False))
print("=" * 100)

## Challenge 1: Basic Information Extraction

**Objective:** Extract specific fields and return clean JSON with no additional text.

**Requirements:**
- Extract: `application_id`, `applicant_name`, `requested_amount`, `credit_score`
- Output must be valid JSON (parseable)
- No preamble or markdown code blocks
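
The three requirements above can also be checked programmatically. The following is a minimal sketch, not part of the original lab: `check_bare_json` is a hypothetical helper name, and it assumes that any key outside the four required ones should be reported as a problem.

```python
import json

# The four fields named in the Challenge 1 requirements
REQUIRED_KEYS = ["application_id", "applicant_name", "requested_amount", "credit_score"]


def check_bare_json(text, required_keys=REQUIRED_KEYS):
    """Return (parsed, problems) for a response that should be a bare JSON object.

    Hypothetical helper: `parsed` is None when the text cannot be used as-is.
    """
    problems = []
    stripped = text.strip()

    # Requirement: no markdown code blocks and no preamble
    if stripped.startswith("`" * 3):
        problems.append("response is wrapped in a markdown code block")
    elif not (stripped.startswith("{") and stripped.endswith("}")):
        problems.append("response does not start with { and end with }")

    # Requirement: output must be parseable JSON
    try:
        parsed = json.loads(stripped)
    except json.JSONDecodeError as exc:
        problems.append(f"not parseable as JSON: {exc.msg}")
        return None, problems

    if not isinstance(parsed, dict):
        problems.append("top-level JSON value is not an object")
        return None, problems

    # Requirement: exactly the requested fields (reporting extras is an assumption)
    missing = [k for k in required_keys if k not in parsed]
    extra = [k for k in parsed if k not in required_keys]
    if missing:
        problems.append(f"missing keys: {missing}")
    if extra:
        problems.append(f"unexpected keys: {extra}")
    return parsed, problems
```

An empty `problems` list means the response met all three requirements. Anything else can be printed as feedback when iterating on the prompt.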

In [None]:
# Challenge 1 Solution

app = credit_applications[0]  # TechStart Solutions

prompt = f"""
Given this credit application data:

{json.dumps(app, indent=2)}

Extract exactly these four fields and return as JSON:

OUTPUT REQUIREMENTS:
1. Return ONLY a JSON object
2. No markdown, no code blocks, no explanations
3. Use exact keys: "application_id", "applicant_name", "requested_amount", "credit_score"
4. Preserve data types: strings for text, numbers for numeric values
5. Start response with {{ and end with }}
"""

system_prompt = "You are a JSON extraction tool. Output valid JSON only."

response = call_gpt4(prompt, system_prompt)
print("Response:")
print(response)

# Validate
try:
    parsed = json.loads(response)
    print("\n✓ JSON is valid")
    print(f"✓ Contains {len(parsed)} fields")
    
    required_keys = ["application_id", "applicant_name", "requested_amount", "credit_score"]
    missing = [k for k in required_keys if k not in parsed]
    
    if not missing:
        print("✓ All required fields present")
    else:
        print(f"✗ Missing fields: {missing}")
        
except json.JSONDecodeError as e:
    print(f"\n✗ Invalid JSON: {e}")

## Challenge 2: Multi-Step Reasoning

**Objective:** Analyze a credit application step by step using chain-of-thought reasoning.

**Lending Criteria:**
1. Credit score ≥ 680
2. Debt Service Coverage Ratio (DSCR) ≥ 1.25
3. Years in business ≥ 2
4. Collateral value ≥ requested amount
5. Industry risk rating not "High" for loans > $300,000
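
Because the five criteria are simple threshold rules, they can be evaluated deterministically and compared with the model's PASS/FAIL assessments. The sketch below is illustrative only. The field names `dscr` and `industry_risk` are assumptions, since the dataset schema is not shown in this lab, and treating "all criteria pass" as APPROVE is likewise an assumption.

```python
def check_lending_criteria(app):
    """Return a dict mapping each criterion to True (pass) or False (fail).

    Assumes hypothetical field names `dscr` and `industry_risk`; adjust them
    to match the keys in credit_applications.json.
    """
    amount = app["requested_amount"]
    return {
        "credit_score": app["credit_score"] >= 680,
        "dscr": app["dscr"] >= 1.25,
        "years_in_business": app["years_in_business"] >= 2,
        "collateral": app["collateral_value"] >= amount,
        # Criterion 5 only restricts loans above $300,000
        "industry_risk": not (amount > 300_000 and app["industry_risk"] == "High"),
    }


def overall_decision(results):
    """Assumed decision rule: approve only when every criterion passes."""
    return "APPROVE" if all(results.values()) else "DENY"
```

Comparing `check_lending_criteria(app)` with the model's per-criterion output is one way to spot reasoning errors in the chain-of-thought response.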

In [None]:
# Challenge 2 Solution

app = credit_applications[3]  # FastTrack Logistics - marginal case

prompt = f"""
You are a senior credit analyst evaluating a commercial loan application. 
Analyze the following application step-by-step against our lending criteria.

APPLICATION DATA:
{json.dumps(app, indent=2)}

LENDING CRITERIA:
1. Credit score must be ≥ 680
2. Debt Service Coverage Ratio (DSCR) must be ≥ 1.25
3. Years in business must be ≥ 2
4. Collateral value must be ≥ requested loan amount
5. Industry risk rating must not be "High" for loans exceeding $300,000

ANALYSIS INSTRUCTIONS:
For each criterion, use this format:

Criterion [number]: [description]
- Applicant value: [actual value]
- Required value: [threshold]
- Assessment: [PASS or FAIL]
- Reasoning: [brief explanation]

After analyzing all criteria, provide:

OVERALL DECISION: [APPROVE or DENY]
SUMMARY: [2-3 sentence explanation]

Begin your analysis now.
"""

system_prompt = """You are a senior credit analyst with 15 years of experience. 
You always provide thorough, step-by-step analysis with clear reasoning. 
You never make recommendations without examining all criteria."""

print("CHAIN-OF-THOUGHT ANALYSIS:")
print("=" * 80)
response = call_gpt4(prompt, system_prompt)
print(response)
print("=" * 80)

## Challenge 3: Output Format Control

**Objective:** Generate a standardized credit decision letter with exact formatting.

**Format Requirements:**
1. First line: exactly "CREDIT DECISION NOTICE"
2. Second line: "Application ID: [id] | Date: [YYYY-MM-DD]"
3. Decision: APPROVED, DENIED, or CONDITIONAL APPROVAL
4. If DENIED: list exactly 3 specific reasons (numbered)
5. Final line: exactly "This decision is final and binding."
6. Total length: 150-200 words
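
Requirement 2 can be checked more strictly than a substring match. The following sketch uses a regular expression and `datetime.strptime` to confirm both the layout and that the date is a real calendar date. `check_second_line` is a hypothetical helper, and it assumes application IDs contain no whitespace.

```python
import re
from datetime import datetime

# Layout from requirement 2; assumes the ID contains no whitespace
SECOND_LINE = re.compile(
    r"^Application ID: (?P<app_id>\S+) \| Date: (?P<date>\d{4}-\d{2}-\d{2})$"
)


def check_second_line(line, expected_id=None):
    """Return True if the line matches the required layout with a valid date."""
    match = SECOND_LINE.match(line.strip())
    if not match:
        return False
    try:
        # Rejects impossible dates such as 2024-13-45
        datetime.strptime(match.group("date"), "%Y-%m-%d")
    except ValueError:
        return False
    # Optionally confirm the ID matches the application being processed
    return expected_id is None or match.group("app_id") == expected_id
```

Passing `expected_id=app["application_id"]` additionally guards against the model copying the wrong ID into the letter.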

In [None]:
# Challenge 3 Solution

app = credit_applications[4]  # Sunrise Restaurant - likely denial

prompt = f"""
Generate a credit decision letter for this loan application following EXACT format requirements.

APPLICATION DATA:
{json.dumps(app, indent=2)}

MANDATORY FORMAT REQUIREMENTS:
1. First line: exactly "CREDIT DECISION NOTICE"
2. Second line: "Application ID: [id] | Date: {datetime.now():%Y-%m-%d}" (use this exact date, in YYYY-MM-DD format)
3. Blank line
4. "Decision: " followed by one of: APPROVED, DENIED, CONDITIONAL APPROVAL
5. If DENIED, list exactly 3 specific reasons (numbered 1, 2, 3)
6. Each reason must reference specific application data
7. Final line: exactly "This decision is final and binding."
8. Total length: 150-200 words (excluding header/footer)

LENDING CRITERIA:
- Credit score minimum: 680
- DSCR minimum: 1.25
- Years in business minimum: 2
- Collateral must cover loan amount
- High-risk industries require DSCR ≥ 1.5 for loans > $100,000

Generate the letter now. Follow the format precisely.
"""

system_prompt = """You are an automated credit decision system. 
You generate standardized decision letters following exact formatting requirements.
You never deviate from specified formats."""

print("FORMATTED DECISION LETTER:")
print("=" * 80)
response = call_gpt4(prompt, system_prompt)
print(response)
print("\n" + "=" * 80)

In [None]:
# Validate format compliance

lines = response.strip().split('\n')

print("\nFORMAT VALIDATION:")
print("-" * 80)

# Check 1: Header
if lines[0].strip() == "CREDIT DECISION NOTICE":
    print("✓ Header correct")
else:
    print(f"✗ Header incorrect: '{lines[0]}'")

# Check 2: Application ID line
if len(lines) > 1 and "Application ID:" in lines[1] and "Date:" in lines[1]:
    print("✓ Application ID and date present")
else:
    print("✗ Second line format incorrect")

# Check 3: Decision keyword
decision_lines = [l for l in lines if "Decision:" in l]
if decision_lines:
    decision = decision_lines[0].split("Decision:")[1].strip()
    if decision in ["APPROVED", "DENIED", "CONDITIONAL APPROVAL"]:
        print(f"✓ Valid decision: {decision}")
    else:
        print(f"✗ Invalid decision: {decision}")
else:
    print("✗ No 'Decision:' line found")

# Check 4: Footer
if lines[-1].strip() == "This decision is final and binding.":
    print("✓ Footer correct")
else:
    print(f"✗ Footer incorrect: '{lines[-1]}'")

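# Check 5: Word count
# This count includes the header and footer lines, which the prompt excludes
# from the 150-200 word target, hence the looser upper bound of 220.
word_count = len(response.split())
if 150 <= word_count <= 220:
    print(f"✓ Word count OK: {word_count}")
else:
    print(f"⚠ Word count: {word_count} (target: 150-200 excluding header/footer)")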

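# Check 6: Numbered reasons for denial
# Match numbers at the start of a line so values such as "1.25" are not
# mistaken for list items.
import re

if "DENIED" in response:
    reason_numbers = re.findall(r"^\s*(\d+)\.\s", response, flags=re.MULTILINE)
    if reason_numbers == ["1", "2", "3"]:
        print("✓ Three numbered reasons provided")
    else:
        print("✗ Must include exactly 3 numbered reasons")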

print("=" * 80)

## Challenge 4: Edge Case Handling

**Objective:** Handle incomplete/invalid data without hallucinating.

**Requirements:**
- Identify ALL data quality issues
- Do NOT crash or hallucinate missing data
- Provide a clear report of problems
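
A deterministic pre-check can complement the LLM analysis, since rule-based validation cannot hallucinate. The sketch below encodes the data standards listed in this challenge's solution prompt. `find_data_issues` and `is_number` are hypothetical helpers, and treating an absent key the same as a null value is an assumption.

```python
from numbers import Real


def is_number(value):
    """True for int/float values; bool is excluded even though it subclasses int."""
    return isinstance(value, Real) and not isinstance(value, bool)


def find_data_issues(app):
    """Return a list of (field, problem) tuples; an empty list means no issues found."""
    issues = []

    app_id = app.get("application_id")
    if not (isinstance(app_id, str) and app_id.startswith("CA-")):
        issues.append(("application_id", "must be a string starting with 'CA-'"))

    name = app.get("applicant_name")
    if not (isinstance(name, str) and name.strip()):
        issues.append(("applicant_name", "must be a non-empty string"))

    # field -> (description of the rule, check applied once the value is numeric)
    numeric_rules = {
        "years_in_business": ("non-negative integer", lambda v: isinstance(v, int) and v >= 0),
        "annual_revenue": ("non-negative number", lambda v: v >= 0),
        "requested_amount": ("positive number", lambda v: v > 0),
        "existing_debt": ("non-negative number", lambda v: v >= 0),
        "collateral_value": ("non-negative number", lambda v: v >= 0),
    }
    for field, (rule, is_valid) in numeric_rules.items():
        value = app.get(field)  # assumption: an absent key counts as missing
        if value is None:
            issues.append((field, "missing"))
        elif not is_number(value):
            issues.append((field, f"expected a {rule}, got {type(value).__name__}"))
        elif not is_valid(value):
            issues.append((field, f"expected a {rule}, got {value}"))

    # The standards allow credit_score to be null
    score = app.get("credit_score")
    if score is not None:
        if not (is_number(score) and isinstance(score, int) and 300 <= score <= 850):
            issues.append(("credit_score", "must be an integer between 300 and 850, or null"))

    return issues
```

Once `edge_case_apps` is defined in the next cell, calling `find_data_issues(edge_case_apps[0])` gives a baseline list to compare against the issues the model reports. Note that these rules do not flag an `annual_revenue` of 0, which satisfies the stated standard even though the dataset marks it as suspicious.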

In [None]:
# Edge case applications with data quality issues

edge_case_apps = [
    {
        "application_id": "CA-2024-101",
        "applicant_name": "Mystery Corp",
        "business_type": "Unknown",
        "years_in_business": None,  # Missing
        "annual_revenue": 0,  # Suspicious
        "requested_amount": 1000000,  # Large request
        "credit_score": None,  # Missing
        "existing_debt": "Not disclosed",  # Wrong type
        "collateral_value": -50000,  # Invalid negative
    },
    {
        "application_id": "CA-2024-102",
        "applicant_name": "",  # Empty
        "business_type": "Retail",
        "requested_amount": "five hundred thousand",  # Text not number
        "credit_score": 1200,  # Impossible (max is 850)
    }
]

print("Edge Case Applications:")
for i, app in enumerate(edge_case_apps, 1):
    print(f"\nEdge Case {i}:")
    print(json.dumps(app, indent=2))

In [None]:
# Challenge 4 Solution

def analyze_data_quality(app_data):
    """Analyzes credit application for data quality issues"""
    
    prompt = f"""
You are a data quality analyst for a credit processing system. 
Identify data quality issues in this loan application BEFORE it reaches underwriting.

CRITICAL INSTRUCTIONS:
1. Do NOT make assumptions about missing data
2. Do NOT fill in missing values with estimates
3. Do NOT proceed with credit analysis if data is incomplete
4. Flag ALL data quality issues

APPLICATION DATA:
{json.dumps(app_data, indent=2)}

EXPECTED DATA STANDARDS:
- application_id: Non-empty string starting with "CA-"
- applicant_name: Non-empty string
- years_in_business: Non-negative integer
- annual_revenue: Non-negative number
- requested_amount: Positive number
- credit_score: Integer between 300-850, or null
- existing_debt: Non-negative number
- collateral_value: Non-negative number

ANALYSIS FORMAT:
For each field:
- Field name
- Current value
- Issue (if any)
- Severity: CRITICAL (blocks processing) or WARNING (review needed)

Then provide:

DATA QUALITY SUMMARY:
- Total issues: [count]
- Critical issues: [count]
- Warnings: [count]

RECOMMENDATION:
Can this application proceed? Yes/No and why.

Begin analysis.
"""
    
    system_prompt = """You are a data quality validation system. 
You never make assumptions about missing data. 
You never fill in values not provided.
You flag all data quality issues objectively."""
    
    return call_gpt4(prompt, system_prompt, temperature=0)

# Test Edge Case 1
print("EDGE CASE 1: Missing and Invalid Data")
print("=" * 80)
response1 = analyze_data_quality(edge_case_apps[0])
print(response1)
print("\n" + "=" * 80 + "\n")

# Test Edge Case 2
print("EDGE CASE 2: Invalid Formats and Impossible Values")
print("=" * 80)
response2 = analyze_data_quality(edge_case_apps[1])
print(response2)
print("=" * 80)

## Challenge 5: Comparative Analysis

**Objective:** Compare all five applications and rank by approval likelihood.

**Requirements:**
- Rank all 5 applications from most to least likely to be approved
- Provide data-driven justification for each
- Identify the strongest and weakest factor for each
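
A simple weighted score offers a reference ranking to compare against the model's output. The sketch below uses the weights listed in the solution prompt for this challenge (25/30/15/20/10) and awards a criterion's full weight when it passes. That pass/fail scoring, the field names `dscr` and `industry_risk`, and counting any rating other than "High" as a pass are all assumptions rather than part of the lab.

```python
# Criterion weights in percent, taken from the Challenge 5 prompt
WEIGHTS = {
    "credit_score": 25,
    "dscr": 30,
    "years_in_business": 15,
    "collateral": 20,
    "industry_risk": 10,
}


def weighted_score(app):
    """Return a 0-100 score: the sum of weights for every criterion that passes.

    Assumes hypothetical field names `dscr` and `industry_risk`.
    """
    passed = {
        "credit_score": app["credit_score"] >= 680,
        "dscr": app["dscr"] >= 1.25,
        "years_in_business": app["years_in_business"] >= 2,
        "collateral": app["collateral_value"] >= app["requested_amount"],
        # Assumption: any rating other than "High" counts as a pass
        "industry_risk": app["industry_risk"] != "High",
    }
    return sum(WEIGHTS[name] for name, ok in passed.items() if ok)


def rank_applications(apps):
    """Return applications sorted from highest to lowest weighted score."""
    return sorted(apps, key=weighted_score, reverse=True)
```

A large disagreement between `rank_applications(credit_applications)` and the model's ranking is a signal to inspect the model's justification more closely. Ties keep their original order because Python's sort is stable.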

In [None]:
# Challenge 5 Solution

all_apps = credit_applications

prompt = f"""
You are a senior credit committee member reviewing multiple loan applications. 
Rank these by approval likelihood with detailed justification.

APPLICATIONS:
{json.dumps(all_apps, indent=2)}

LENDING CRITERIA (for reference):
1. Credit score ≥ 680 (weight: 25%)
2. DSCR ≥ 1.25 (weight: 30%)
3. Years in business ≥ 2 (weight: 15%)
4. Collateral coverage ≥ 100% (weight: 20%)
5. Industry risk rating (weight: 10%)

ANALYSIS INSTRUCTIONS:

Step 1: For each application, note:
- Strongest factor
- Weakest factor

Step 2: Rank 1 (most likely) to 5 (least likely)

Step 3: For each rank, provide:
- Rank number
- Application ID and name
- Assessment: STRONG / MODERATE / WEAK / HIGH RISK
- Justification (2-3 sentences with specific metrics)

Step 4: Summary:
- Clear winner and why
- Clear reject and why
- Borderline cases needing review

Use data-driven reasoning. Reference specific metrics.
"""

system_prompt = """You are a senior credit analyst with expertise in comparative analysis.
You provide objective, data-driven rankings based on established criteria.
You always support conclusions with specific metrics."""

print("COMPARATIVE CREDIT ANALYSIS:")
print("=" * 80)
response = call_gpt4(prompt, system_prompt, temperature=0)
print(response)
print("=" * 80)

## Lab Summary

### Key Principles Learned

1. **Specificity beats vagueness** - The more explicit your prompt, the better the output
2. **Structure enables consistency** - Templates and formats improve reliability
3. **Validation is mandatory** - Always programmatically verify outputs
4. **Chain-of-thought for BFSI** - Explainability is a regulatory requirement
5. **Edge cases matter** - Production systems need robust error handling

