# 🤖 Natural Language Student Loan Risk Assessment

This notebook demonstrates an AI-powered interface for student loan risk assessment using borrower IDs. The system will:

1. **Look up** borrower information by ID from synthetic data
2. **Call** the Cloudera ML model for risk prediction  
3. **Generate** a natural language risk assessment report using AWS Bedrock

## Prerequisites
- AWS Bedrock access with Claude model permissions
- Cloudera ML model deployed and accessible
- AWS credentials configured
- Synthetic data files in `data/synthetic/`
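
Before running the notebook, it can help to confirm that the data files are in place. The sketch below is a minimal check, assuming the two CSV file names used later in the data-loading cell; `find_missing_files` is a hypothetical helper, not part of the notebook.

```python
from pathlib import Path

# File names taken from the data-loading cell of this notebook.
REQUIRED_FILES = [
    "student_loan_master_dataset.csv",
    "student_loan_borrowers.csv",
]

def find_missing_files(data_dir="data/synthetic", required=REQUIRED_FILES):
    """Return the required file names that are not present in data_dir."""
    base = Path(data_dir)
    return [name for name in required if not (base / name).is_file()]

missing = find_missing_files()
if missing:
    print(f"Missing data files: {missing}")
else:
    print("All synthetic data files found")
```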

## Usage
Provide a borrower ID (e.g., `BOR_000001`) to get a full risk assessment report.
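
Since lookups are by exact ID, a small normalization step can catch typos early. This is a sketch only: the pattern (`BOR_` plus six digits) is an assumption inferred from the example IDs, and `normalize_borrower_id` is a hypothetical helper.

```python
import re

# Assumed format: "BOR_" followed by six digits, as in BOR_000001.
BORROWER_ID_PATTERN = re.compile(r"BOR_\d{6}")

def normalize_borrower_id(raw_id):
    """Return a cleaned borrower ID, or None if it does not match the assumed format."""
    candidate = str(raw_id).strip().upper()
    return candidate if BORROWER_ID_PATTERN.fullmatch(candidate) else None

print(normalize_borrower_id(" bor_000001 "))  # BOR_000001
print(normalize_borrower_id("BOR_18"))        # None
```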


In [1]:
# Import required libraries
import json
import os
import requests
import boto3
import pandas as pd
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# For CML model access
try:
    import cmlapi
    CML_AVAILABLE = True
except ImportError:
    print("⚠️ cmlapi not available - using mock responses for demo")
    CML_AVAILABLE = False

print("📦 Libraries imported successfully")


📦 Libraries imported successfully


## 🔑 AWS Credentials Setup

**⚠️ IMPORTANT: Set your AWS credentials before running the next cell!**

You have three options for configuring AWS access:

### Option 1: Direct (for demo/testing)
Edit the next cell and add your AWS credentials:
```python
AWS_ACCESS_KEY_ID = "your_access_key_here"
AWS_SECRET_ACCESS_KEY = "your_secret_key_here"
```

### Option 2: Environment Variables (recommended)
```bash
export AWS_ACCESS_KEY_ID="your_access_key_here"
export AWS_SECRET_ACCESS_KEY="your_secret_key_here"
```
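
On the Python side, the same variables can be read without pasting keys into the notebook. A minimal sketch, assuming a hypothetical helper `credentials_from_env`; the region falls back to this notebook's default of `us-east-1`.

```python
import os

def credentials_from_env(env=None):
    """Read AWS settings from a mapping (defaults to os.environ).

    Returns None when either key is missing or empty.
    """
    env = os.environ if env is None else env
    access_key = env.get("AWS_ACCESS_KEY_ID")
    secret_key = env.get("AWS_SECRET_ACCESS_KEY")
    if not access_key or not secret_key:
        return None
    return {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
        # AWS_DEFAULT_REGION is the variable boto3 and the AWS CLI read.
        "region_name": env.get("AWS_DEFAULT_REGION", "us-east-1"),
    }

creds = credentials_from_env()
print("Credentials found in environment" if creds else "Credentials not set in environment")
```

The returned dictionary uses the keyword names that `boto3.client` accepts, so it could be passed as `boto3.client('bedrock-runtime', **creds)`.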

### Option 3: AWS CLI
```bash
aws configure
```

**🎯 AWS Bedrock Requirements:**
- Bedrock service access in your AWS region
- Claude 3 Haiku model access enabled
- IAM permissions for `bedrock:InvokeModel` and `bedrock:ListFoundationModels` (the access test in the next cell lists the available models)
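
For reference, an IAM policy covering the two Bedrock actions this notebook calls might look like the sketch below. The wildcard resource is an assumption made to keep the example short; a real policy would normally be scoped to specific model ARNs.

```python
import json

# Actions used in this notebook: InvokeModel for text generation,
# ListFoundationModels for the access check.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:ListFoundationModels",
            ],
            # Assumption: "*" keeps the sketch short; narrow this in practice.
            "Resource": "*",
        }
    ],
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)
```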


In [2]:
# 🔑 AWS CREDENTIALS CONFIGURATION
# ⚠️ ENTER YOUR AWS CREDENTIALS HERE:
AWS_ACCESS_KEY_ID = "your_access_key_here"      # Enter your AWS access key (starts with AKIA...)
AWS_SECRET_ACCESS_KEY = "your_secret_key_here"  # Enter your AWS secret key 
AWS_REGION = "us-east-1"    # Change to your preferred region (us-east-1, us-west-2, etc.)

# Alternative: Use environment variables (uncomment if using this method)
# AWS_ACCESS_KEY_ID = os.getenv('AWS_ACCESS_KEY_ID')
# AWS_SECRET_ACCESS_KEY = os.getenv('AWS_SECRET_ACCESS_KEY')

# AWS Bedrock Configuration
# Updated to use inference profile for on-demand throughput
BEDROCK_MODEL_ID = "us.anthropic.claude-3-haiku-20240307-v1:0"  # Inference profile for Claude 3 Haiku

# Alternative models that work with on-demand (uncomment to try different models):
# BEDROCK_MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"     # More capable but slower
# BEDROCK_MODEL_ID = "anthropic.claude-instant-v1"                 # Legacy but reliable
# BEDROCK_MODEL_ID = "amazon.titan-text-express-v1"                # Amazon's model

def test_aws_credentials():
    """Test AWS credentials and permissions"""
    try:
        # Test basic AWS access with STS
        sts_client = boto3.client(
            'sts',
            region_name=AWS_REGION,
            aws_access_key_id=AWS_ACCESS_KEY_ID,
            aws_secret_access_key=AWS_SECRET_ACCESS_KEY
        )
        
        # Get caller identity to test credentials
        identity = sts_client.get_caller_identity()
        print(f"✅ AWS Credentials Valid - Account: {identity.get('Account', 'Unknown')}")
        print(f"   User/Role: {identity.get('Arn', 'Unknown')}")
        return True
        
    except Exception as e:
        print(f"❌ AWS Credentials Test Failed: {str(e)}")
        return False

def test_bedrock_access():
    """Test Bedrock service access and model availability"""
    try:
        bedrock_client = boto3.client(
            'bedrock',
            region_name=AWS_REGION,
            aws_access_key_id=AWS_ACCESS_KEY_ID,
            aws_secret_access_key=AWS_SECRET_ACCESS_KEY
        )
        
        # List available models to test Bedrock access
        models = bedrock_client.list_foundation_models()
        claude_models = [m for m in models['modelSummaries'] if 'claude' in m['modelId'].lower()]
        print(f"✅ Bedrock Access Valid - Found {len(claude_models)} Claude models")
        
        # Test the specific model we want to use
        bedrock_runtime = boto3.client(
            'bedrock-runtime',
            region_name=AWS_REGION,
            aws_access_key_id=AWS_ACCESS_KEY_ID,
            aws_secret_access_key=AWS_SECRET_ACCESS_KEY
        )
        
        # Try a simple test call to validate the model works
        test_body = json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 10,
            "messages": [{"role": "user", "content": "Hello"}]
        })
        
        response = bedrock_runtime.invoke_model(
            body=test_body,
            modelId=BEDROCK_MODEL_ID,
            accept="application/json",
            contentType="application/json"
        )
        
        print(f"✅ Model Test Successful - {BEDROCK_MODEL_ID} is working")
        return True
        
    except Exception as e:
        print(f"❌ Bedrock Test Failed: {str(e)}")
        if "UnrecognizedClientException" in str(e):
            print("💡 This usually means invalid AWS credentials")
        elif "AccessDeniedException" in str(e):
            print("💡 Credentials valid but missing Bedrock permissions")
        elif "ValidationException" in str(e) and "inference profile" in str(e):
            print("💡 Model ID needs inference profile - trying alternative models...")
            return test_alternative_models()
        elif "ValidationException" in str(e):
            print("💡 Model validation failed - checking alternative models...")
            return test_alternative_models()
        return False

def test_alternative_models():
    """Try alternative Bedrock models that might be available"""
    alternative_models = [
        "anthropic.claude-instant-v1",
        "anthropic.claude-v2",
        "amazon.titan-text-express-v1",
        "ai21.j2-mid-v1",
        "cohere.command-text-v14"
    ]
    
    global BEDROCK_MODEL_ID
    
    for model_id in alternative_models:
        try:
            print(f"🔄 Testing {model_id}...")
            bedrock_runtime = boto3.client(
                'bedrock-runtime',
                region_name=AWS_REGION,
                aws_access_key_id=AWS_ACCESS_KEY_ID,
                aws_secret_access_key=AWS_SECRET_ACCESS_KEY
            )
            
            # Different models have different request formats
            if "claude" in model_id:
                if "instant" in model_id or "v2" in model_id:
                    # Legacy Claude format
                    test_body = json.dumps({
                        "prompt": "\n\nHuman: Hello\n\nAssistant:",
                        "max_tokens_to_sample": 10
                    })
                else:
                    # New Claude format
                    test_body = json.dumps({
                        "anthropic_version": "bedrock-2023-05-31",
                        "max_tokens": 10,
                        "messages": [{"role": "user", "content": "Hello"}]
                    })
            elif "titan" in model_id:
                test_body = json.dumps({
                    "inputText": "Hello",
                    "textGenerationConfig": {"maxTokenCount": 10}
                })
            elif "ai21" in model_id:
                test_body = json.dumps({
                    "prompt": "Hello",
                    "maxTokens": 10
                })
            elif "cohere" in model_id:
                test_body = json.dumps({
                    "prompt": "Hello",
                    "max_tokens": 10
                })
            
            response = bedrock_runtime.invoke_model(
                body=test_body,
                modelId=model_id,
                accept="application/json",
                contentType="application/json"
            )
            
            print(f"✅ Found working model: {model_id}")
            BEDROCK_MODEL_ID = model_id
            return True
            
        except Exception as e:
            print(f"   ❌ {model_id} failed: {str(e)[:100]}...")
            continue
    
    print("❌ No working Bedrock models found")
    return False

# Initialize Bedrock client
print("🔍 Testing AWS Configuration...")
print(f"📍 Region: {AWS_REGION}")
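# Mask the key for display; guard against None so the env-var option does not crash here
masked_key = f"{AWS_ACCESS_KEY_ID[:8]}...{AWS_ACCESS_KEY_ID[-4:]}" if AWS_ACCESS_KEY_ID else "NOT SET"
print(f"🔑 Access Key: {masked_key}")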

if not AWS_ACCESS_KEY_ID or not AWS_SECRET_ACCESS_KEY:
    print("❌ AWS credentials not configured!")
    print("💡 Please set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY above")
    BEDROCK_AVAILABLE = False
else:
    # Test credentials step by step
    if test_aws_credentials():
        if test_bedrock_access():
            try:
                bedrock_runtime = boto3.client(
                    service_name='bedrock-runtime',
                    region_name=AWS_REGION,
                    aws_access_key_id=AWS_ACCESS_KEY_ID,
                    aws_secret_access_key=AWS_SECRET_ACCESS_KEY
                )
                print("✅ AWS Bedrock runtime client initialized successfully")
                BEDROCK_AVAILABLE = True
            except Exception as e:
                print(f"❌ Error creating Bedrock runtime client: {str(e)}")
                BEDROCK_AVAILABLE = False
        else:
            BEDROCK_AVAILABLE = False
    else:
        BEDROCK_AVAILABLE = False

if not BEDROCK_AVAILABLE:
    print("\n🔧 TROUBLESHOOTING GUIDE:")
    print("1. 🔑 Get AWS credentials from AWS Console > IAM > Users > Security Credentials")
    print("2. 🌍 Ensure Bedrock is available in your region (us-east-1, us-west-2 recommended)")
    print("3. 🛡️  Request Bedrock model access: AWS Console > Bedrock > Model Access")
    print("4. 📋 Ensure IAM permissions include: bedrock:InvokeModel, bedrock:ListFoundationModels")
    print("5. ⏰ Check if credentials are expired or deactivated")
    print("\n🎯 For now, the system will use MOCK responses for demonstration")


🔍 Testing AWS Configuration...
📍 Region: us-east-1
🔑 Access Key: AKIA6I6S...2NV4
✅ AWS Credentials Valid - Account: 981304421142
   User/Role: arn:aws:iam::981304421142:user/ktalbert
✅ Bedrock Access Valid - Found 25 Claude models
✅ Model Test Successful - us.anthropic.claude-3-haiku-20240307-v1:0 is working
✅ AWS Bedrock runtime client initialized successfully
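
The Bedrock test above picks a troubleshooting hint by searching the exception text. One alternative is to key the hints on the AWS error code itself. The sketch below assumes a hypothetical helper `hint_for_error_code`; the hint wording follows the messages printed in the cell above.

```python
# Hints keyed by AWS error code.
ERROR_HINTS = {
    "UnrecognizedClientException": "This usually means invalid AWS credentials",
    "AccessDeniedException": "Credentials valid but missing Bedrock permissions",
    "ValidationException": "Model validation failed; check the model ID or inference profile",
}

def hint_for_error_code(code):
    """Return a troubleshooting hint for an AWS error code, or a generic fallback."""
    return ERROR_HINTS.get(code, "Unexpected error; see the full message")

# With boto3, the code of a botocore ClientError `e` is e.response["Error"]["Code"].
print(hint_for_error_code("AccessDeniedException"))
```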


In [3]:
# CML Model Configuration
if CML_AVAILABLE:
    try:
        # Dynamic model endpoint discovery
        client = cmlapi.default_client(
            url=os.getenv("CDSW_API_URL").replace("/api/v1", ""), 
            cml_api_key=os.getenv("CDSW_APIV2_KEY")
        )
        model = client.list_models(project_id=os.getenv("CDSW_PROJECT_ID"))
        selected_model = model.models[0]  # Use the first available model
        
        MODEL_ACCESS_KEY = selected_model.access_key
        MODEL_ENDPOINT = os.getenv("CDSW_API_URL").replace(
            "https://", "https://modelservice."
        ).replace("/api/v1", "/model?accessKey=") + MODEL_ACCESS_KEY
        print(MODEL_ENDPOINT)
        print(f"✅ CML Model endpoint discovered: {MODEL_ENDPOINT}")
    except Exception as e:
        print(f"❌ Error setting up CML model: {str(e)}")
        CML_AVAILABLE = False
else:
    print("⚠️ CML not available - will use mock model responses")


https://modelservice.ml-dbfc64d1-783.go01-dem.ylcu-atmi.cloudera.site/model?accessKey=mmvgj0ohn56keah5ypeemnd91mb2tf49
✅ CML Model endpoint discovered: https://modelservice.ml-dbfc64d1-783.go01-dem.ylcu-atmi.cloudera.site/model?accessKey=mmvgj0ohn56keah5ypeemnd91mb2tf49
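
The endpoint is derived from `CDSW_API_URL` with two string replacements. The sketch below isolates that step so it can be checked without a CML workspace; `build_model_endpoint`, the workspace URL, and the access key are all hypothetical.

```python
def build_model_endpoint(api_url, access_key):
    """Derive the model service URL from a CML API URL, mirroring the cell above."""
    # Only the scheme prefix is rewritten, hence the count of 1.
    base = api_url.replace("https://", "https://modelservice.", 1)
    return base.replace("/api/v1", "/model?accessKey=") + access_key

# Hypothetical workspace URL and key, for illustration only.
endpoint = build_model_endpoint("https://ml-example.cloudera.site/api/v1", "abc123")
print(endpoint)

# The model call later posts to the URL without the query string.
print(endpoint.split("?")[0])
```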


## 🧠 Natural Language Processing Functions


In [4]:
def call_bedrock_claude(prompt, max_tokens=1000):
    """
    Call AWS Bedrock Claude model with a prompt.
    """
    if not BEDROCK_AVAILABLE:
        # Enhanced mock response that looks professional
        if "risk assessment report" in prompt.lower():
            return """**Executive Summary**
This borrower presents a moderate risk profile based on the available financial and demographic data. While there are some areas of concern, the overall assessment suggests manageable risk with appropriate monitoring.

**Key Risk Factors**
• Credit score indicates potential payment challenges
• Debt-to-income ratio may strain monthly budget
• Employment status affects income stability

**Positive Indicators**
• Educational background demonstrates commitment to advancement
• Age profile suggests career growth potential
• Housing stability provides foundation for financial planning

**Recommendations for StudentCare**
• Implement proactive outreach program with monthly check-ins
• Provide financial literacy resources and budgeting assistance
• Consider flexible payment options during economic transitions
• Monitor for early warning signs of payment difficulties

**Next Steps**
• Schedule initial consultation within 30 days
• Establish automated payment monitoring alerts
• Review account status quarterly for risk reassessment
• Document all interventions for compliance tracking

*Note: This assessment was generated using mock data for demonstration purposes.*"""
        else:
            return "Mock response: This is a demonstration response since AWS Bedrock is not configured. Please set up your AWS credentials to get real AI-generated responses."
    
    try:
        # Handle different model formats
        if "claude" in BEDROCK_MODEL_ID:
            if "instant" in BEDROCK_MODEL_ID or "v2" in BEDROCK_MODEL_ID:
                # Legacy Claude format
                body = json.dumps({
                    "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
                    "max_tokens_to_sample": max_tokens,
                    "temperature": 0.1,
                    "top_p": 0.9
                })
            else:
                # New Claude format (Claude 3)
                body = json.dumps({
                    "anthropic_version": "bedrock-2023-05-31",
                    "max_tokens": max_tokens,
                    "messages": [
                        {
                            "role": "user",
                            "content": prompt
                        }
                    ],
                    "temperature": 0.1,
                    "top_p": 0.9
                })
        elif "titan" in BEDROCK_MODEL_ID:
            # Amazon Titan format
            body = json.dumps({
                "inputText": prompt,
                "textGenerationConfig": {
                    "maxTokenCount": max_tokens,
                    "temperature": 0.1,
                    "topP": 0.9
                }
            })
        elif "ai21" in BEDROCK_MODEL_ID:
            # AI21 format
            body = json.dumps({
                "prompt": prompt,
                "maxTokens": max_tokens,
                "temperature": 0.1,
                "topP": 0.9
            })
        elif "cohere" in BEDROCK_MODEL_ID:
            # Cohere format
            body = json.dumps({
                "prompt": prompt,
                "max_tokens": max_tokens,
                "temperature": 0.1,
                "p": 0.9
            })
        else:
            # Default to Claude 3 format
            body = json.dumps({
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": max_tokens,
                "messages": [
                    {
                        "role": "user",
                        "content": prompt
                    }
                ]
            })
        
        response = bedrock_runtime.invoke_model(
            body=body,
            modelId=BEDROCK_MODEL_ID,
            accept="application/json",
            contentType="application/json"
        )
        
        response_body = json.loads(response.get('body').read())
        
        # Extract text based on model type
        if "claude" in BEDROCK_MODEL_ID:
            if "instant" in BEDROCK_MODEL_ID or "v2" in BEDROCK_MODEL_ID:
                return response_body['completion']
            else:
                return response_body['content'][0]['text']
        elif "titan" in BEDROCK_MODEL_ID:
            return response_body['results'][0]['outputText']
        elif "ai21" in BEDROCK_MODEL_ID:
            return response_body['completions'][0]['data']['text']
        elif "cohere" in BEDROCK_MODEL_ID:
            return response_body['generations'][0]['text']
        else:
            # Try to extract text from common response formats
            if 'content' in response_body and response_body['content']:
                return response_body['content'][0]['text']
            elif 'completion' in response_body:
                return response_body['completion']
            elif 'text' in response_body:
                return response_body['text']
            else:
                return str(response_body)
    
    except Exception as e:
        return f"Error calling Bedrock: {str(e)}"

def load_borrower_data():
    """
    Load all borrower data from CSV files for lookup.
    """
    try:
        # Load the master dataset which has all features needed for ML model
        master_df = pd.read_csv('data/synthetic/student_loan_master_dataset.csv')
        print(f"✅ Loaded {len(master_df)} borrowers from master dataset")
        print(f"📊 Available columns: {len(master_df.columns)}")
        
        # Also load borrowers file to get risk segments for examples
        try:
            borrowers_df = pd.read_csv('data/synthetic/student_loan_borrowers.csv')
            # Merge risk segment into master dataset
            master_df = master_df.merge(
                borrowers_df[['borrower_id', '_risk_segment']], 
                on='borrower_id', 
                how='left'
            )
            print(f"✅ Added risk segments from borrowers file")
        except Exception as merge_error:
            print(f"⚠️  Could not load risk segments: {str(merge_error)}")
        
        return master_df
    except Exception as e:
        print(f"❌ Error loading borrower data: {str(e)}")
        return None

def get_borrower_by_id(borrower_id, master_df):
    """
    Get borrower information by ID.
    """
    borrower = master_df[master_df['borrower_id'] == borrower_id]
    
    if borrower.empty:
        print(f"❌ Borrower {borrower_id} not found")
        return None
    
    # Convert to dictionary for model input
    borrower_data = borrower.iloc[0].to_dict()
    
    # Replace NaN with None and convert NumPy scalars to native Python types
    # so the record can be serialized as JSON for the model call
    for key, value in borrower_data.items():
        if pd.isna(value):
            borrower_data[key] = None
        elif hasattr(value, 'item'):
            borrower_data[key] = value.item()
    
    return borrower_data

print("🧠 Data loading functions defined")


🧠 Data loading functions defined
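
The borrower record is later sent to the model as JSON, so missing values and NumPy scalars need to become plain Python values first (NaN is not valid JSON). The sketch below shows the idea using only the standard library; `to_json_safe` and the sample row are hypothetical.

```python
import json
import math

def to_json_safe(record):
    """Replace NaN with None and unwrap NumPy-style scalars into native Python values."""
    cleaned = {}
    for key, value in record.items():
        if hasattr(value, "item"):
            # NumPy scalars expose .item(), which returns the native Python value.
            value = value.item()
        if isinstance(value, float) and math.isnan(value):
            value = None
        cleaned[key] = value
    return cleaned

# Hypothetical row with one missing value.
row = {"borrower_id": "BOR_000001", "gpa": float("nan"), "age": 27}
print(json.dumps(to_json_safe(row)))
```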


In [5]:
def call_cml_model(borrower_data):
    """
    Call the Cloudera ML model with borrower features (copied from model_demo.ipynb).
    """
    if not CML_AVAILABLE:
        # Mock response for demo purposes
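        # Fall back to defaults when a field is missing or None
        credit_score = borrower_data.get('credit_score_at_origination')
        credit_score = 650 if credit_score is None else credit_score
        debt_ratio = borrower_data.get('debt_to_income_ratio')
        debt_ratio = 0.5 if debt_ratio is None else debt_ratio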
        
        # Simple mock risk calculation
        risk_score = max(0.1, min(0.9, (850 - credit_score) / 550 + debt_ratio * 0.3))
        risk_category = 'High' if risk_score > 0.7 else 'Medium' if risk_score > 0.3 else 'Low'
        
        return {
            'success': True,
            'risk_assessment': {
                'overall_risk_probability': risk_score,
                'risk_probability': risk_score,  # key read by the report and display code
                'risk_category': risk_category,
                'risk_score': risk_score * 100
            }
        }
    
    try:
        # Construct the request payload for CML model service (same as model_demo.ipynb)
        payload = {
            "accessKey": MODEL_ACCESS_KEY,
            "request": borrower_data
        }
        
        headers = {
            'Content-Type': 'application/json'
        }
        
        # Call the model endpoint
        endpoint_url = MODEL_ENDPOINT.split('?')[0]
        response = requests.post(
            endpoint_url, 
            json=payload, 
            headers=headers, 
            timeout=30
        )
        
        if response.status_code == 200:
            result = response.json()
            print(f"🔍 Raw model response: {result}")  # Debug output
            
            # Extract risk assessment from the actual CML model response structure
            risk_assessment = {}
            
            # Based on the actual response structure:
            # {'response': {'prediction': {'risk_assessment': {...}}}}
            if 'response' in result and 'prediction' in result['response']:
                prediction = result['response']['prediction']
                
                if 'risk_assessment' in prediction:
                    # Use the main risk_assessment from the selected model
                    risk_assessment = prediction['risk_assessment'].copy()
                    
                    # Also store information about all model predictions for transparency
                    if 'all_model_predictions' in prediction:
                        risk_assessment['all_models'] = prediction['all_model_predictions']
                    
                    # Add metadata
                    risk_assessment['model_used'] = prediction.get('model_used', 'unknown')
                    risk_assessment['borrower_id'] = prediction.get('borrower_id', 'unknown')
                    risk_assessment['prediction_timestamp'] = prediction.get('prediction_timestamp', 'unknown')
                    
                elif 'all_model_predictions' in prediction:
                    # If no main risk_assessment, use the first available model
                    all_preds = prediction['all_model_predictions']
                    if all_preds:
                        # Use the first model's prediction as primary
                        first_model = next(iter(all_preds.values()))
                        risk_assessment = first_model.copy()
                        risk_assessment['all_models'] = all_preds
                        risk_assessment['model_used'] = next(iter(all_preds.keys()))
            
            # Fallback to old structure if new parsing didn't work
            if not risk_assessment:
                if 'risk_assessment' in result:
                    risk_assessment = result['risk_assessment']
                elif 'response' in result and isinstance(result['response'], dict):
                    if 'risk_assessment' in result['response']:
                        risk_assessment = result['response']['risk_assessment']
                    else:
                        risk_assessment = result['response']
                else:
                    risk_assessment = result
            
            # Ensure we have the required fields with fallback values.
            # The deployed model returns 'risk_probability'; 'overall_risk_probability' is checked as a fallback key.
            prob = risk_assessment.get('risk_probability', risk_assessment.get('overall_risk_probability', 0))
            prob_is_number = isinstance(prob, (int, float))
            
            if not risk_assessment.get('risk_category') and prob_is_number:
                # Derive the risk category from the probability
                if prob > 0.7:
                    risk_assessment['risk_category'] = 'High'
                elif prob > 0.3:
                    risk_assessment['risk_category'] = 'Medium'
                else:
                    risk_assessment['risk_category'] = 'Low'
            
            if not risk_assessment.get('risk_score'):
                risk_assessment['risk_score'] = prob * 100 if prob_is_number else 0
            
            return {
                'success': True,
                'risk_assessment': risk_assessment,
                'full_response': result
            }
        else:
            return {
                'success': False,
                'error': f"HTTP {response.status_code}: {response.text}"
            }
    
    except Exception as e:
        return {
            'success': False,
            'error': str(e)
        }

print("🎯 ML model function defined")


🎯 ML model function defined
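
The category thresholds and the mock formula in `call_cml_model` are easier to verify in isolation. The sketch below restates them as two small functions; the function names are hypothetical, while the thresholds (0.7 and 0.3) and the formula come from the cell above.

```python
def categorize_risk(probability):
    """Map a probability to the notebook's categories (thresholds 0.7 and 0.3)."""
    if probability > 0.7:
        return "High"
    if probability > 0.3:
        return "Medium"
    return "Low"

def mock_risk_probability(credit_score, debt_ratio):
    """Mock formula from call_cml_model, clamped to the range [0.1, 0.9]."""
    raw = (850 - credit_score) / 550 + debt_ratio * 0.3
    return max(0.1, min(0.9, raw))

# Default inputs from the mock branch: credit score 650, debt-to-income 0.5.
p = mock_risk_probability(650, 0.5)
print(f"{p:.3f} -> {categorize_risk(p)}")  # 0.514 -> Medium
```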


## 🚀 Main Risk Assessment Functions


In [6]:
def generate_risk_report(borrower_data, model_results):
    """
    Generate a natural language risk assessment report using AWS Bedrock.
    """
    
    if not model_results['success']:
        return f"❌ Error generating risk assessment: {model_results.get('error', 'Unknown error')}"
    
    # Extract risk information
    risk_data = model_results.get('risk_assessment', {})
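    # Accept either key: the deployed model returns 'risk_probability', the mock returns 'overall_risk_probability'
    risk_probability = risk_data.get('risk_probability', risk_data.get('overall_risk_probability', 0.5))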
    risk_category = risk_data.get('risk_category', 'Medium')
    risk_score = risk_data.get('risk_score', 50)
    model_used = risk_data.get('model_used', 'Machine Learning Model')
    
    # Get all model predictions for comprehensive analysis
    all_models = risk_data.get('all_models', {})
    
    # Format borrower information for the report
    borrower_summary = f"""
Borrower Profile:
- ID: {borrower_data.get('borrower_id', 'Unknown')}
- Age: {borrower_data.get('age', 'Unknown')}
- Credit Score: {borrower_data.get('credit_score_at_origination', 'Unknown')}
- Annual Income: ${(borrower_data.get('annual_income') or 0):,.0f}
- Employment: {borrower_data.get('employment_status', 'Unknown')}
- Total Loan Amount: ${(borrower_data.get('total_loan_amount') or 0):,.0f}
- Debt-to-Income Ratio: {(borrower_data.get('debt_to_income_ratio') or 0):.1%}
- Education: {borrower_data.get('degree_type', 'Unknown')} degree, graduated {borrower_data.get('graduation_year', 'Unknown')}
- GPA: {borrower_data.get('gpa', 'Unknown')}
- Location: {borrower_data.get('state', 'Unknown')}
- Housing: {borrower_data.get('housing_status', 'Unknown')}
"""
    
    # Create model summary for the report
    model_summary = f"""
ML Model Results:
- Primary Model: {model_used}
- Risk Category: {risk_category}
- Risk Score: {risk_score:.1f}/100
- Risk Probability: {risk_probability:.1%}"""
    
    # Add ensemble analysis if multiple models were used
    if all_models:
        model_summary += f"\n\nModel Consensus Analysis:"
        for model_name, prediction in all_models.items():
            model_summary += f"\n- {model_name}: {prediction.get('risk_category', 'Unknown')} ({prediction.get('risk_score', 0):.1f}/100)"
        
        # Calculate consensus
        categories = [pred.get('risk_category') for pred in all_models.values()]
        if categories:
            high_count = categories.count('High')
            medium_count = categories.count('Medium')
            low_count = categories.count('Low')
            model_summary += f"\n- Consensus: {high_count} High, {medium_count} Medium, {low_count} Low risk predictions"

    # Create prompt for natural language generation
    report_prompt = f"""
You are a senior loan analyst at StudentCare Solutions writing a professional risk assessment report. 

{borrower_summary}

{model_summary}

Please provide a comprehensive but concise risk assessment report with:

1. **Executive Summary** (2-3 sentences about overall risk level and model consensus)
2. **Key Risk Factors** (specific concerns based on the data)
3. **Positive Indicators** (strengths that reduce risk)
4. **Model Analysis** (brief comment on the ML model predictions and any consensus/disagreement)
5. **Recommendations for StudentCare** (specific actions to take)
6. **Next Steps** (timeline and follow-up actions)

Keep the tone professional but empathetic. Focus on helping the borrower succeed rather than just identifying problems. Be specific about the data points that drive the assessment.
"""
    
    return call_bedrock_claude(report_prompt, max_tokens=800)

def assess_borrower_risk_by_id(borrower_id, master_df):
    """
    Complete pipeline: Borrower ID → Data Lookup → ML Model → Natural Language Report
    """
    print(f"🔄 Processing borrower: {borrower_id}")
    print("=" * 50)
    
    # Step 1: Look up borrower data
    print("1️⃣ Looking up borrower information...")
    borrower_data = get_borrower_by_id(borrower_id, master_df)
    
    if borrower_data is None:
        return {"error": f"Borrower {borrower_id} not found"}
    
    # Display key borrower info
    print(f"✅ Found borrower {borrower_id}")
    print(f"   Age: {borrower_data.get('age')}, Credit Score: {borrower_data.get('credit_score_at_origination')}")
    # Values may be None after NaN cleanup, so fall back to 0 before numeric formatting
    print(f"   Income: ${(borrower_data.get('annual_income') or 0):,.0f}, Employment: {borrower_data.get('employment_status')}")
    print(f"   Total Loans: ${(borrower_data.get('total_loan_amount') or 0):,.0f}")
    print()
    
    # Step 2: Call ML model
    print("2️⃣ Calling Cloudera ML model...")
    model_results = call_cml_model(borrower_data)
    
    if model_results['success']:
        risk_data = model_results.get('risk_assessment', {})
        
        # Main risk assessment from primary model
        print(f"🎯 Risk Category: {risk_data.get('risk_category', 'Unknown')}")
        print(f"📈 Risk Score: {risk_data.get('risk_score', 0):.1f}/100")
        print(f"⚠️  Risk Probability: {risk_data.get('risk_probability', risk_data.get('overall_risk_probability', 0)):.1%}")
        print(f"🤖 Model Used: {risk_data.get('model_used', 'Unknown')}")
        
        # Show all model predictions if available
        if 'all_models' in risk_data:
            print(f"\n📊 All Model Predictions:")
            for model_name, prediction in risk_data['all_models'].items():
                category = prediction.get('risk_category', 'Unknown')
                score = prediction.get('risk_score', 0)
                prob = prediction.get('risk_probability', 0)
                
                # Add emoji based on risk level
                if category == 'High':
                    emoji = "🔴"
                elif category == 'Medium':
                    emoji = "🟡"
                else:
                    emoji = "🟢"
                
                print(f"   {emoji} {model_name}: {category} ({score:.1f}/100, {prob:.1%})")
        
    else:
        print(f"❌ Model call failed: {model_results.get('error', 'Unknown error')}")
    print()
    
    # Step 3: Generate natural language report
    print("3️⃣ Generating risk assessment report...")
    risk_report = generate_risk_report(borrower_data, model_results)
    
    print("📋 RISK ASSESSMENT REPORT")
    print("=" * 50)
    print(risk_report)
    print("=" * 50)
    
    return {
        'borrower_data': borrower_data,
        'model_results': model_results,
        'risk_report': risk_report
    }

print("🚀 Main assessment functions ready")


🚀 Main assessment functions ready
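
The consensus line in the report counts how many models landed in each category. A compact way to express that is `collections.Counter`, sketched below with a hypothetical helper `summarize_consensus`. The sample predictions are the per-model values that appear in the raw response for `BOR_000018` later in this notebook.

```python
from collections import Counter

def summarize_consensus(all_models):
    """Count risk categories across models and return (counts, majority_category)."""
    counts = Counter(pred.get("risk_category", "Unknown") for pred in all_models.values())
    majority = counts.most_common(1)[0][0] if counts else None
    return counts, majority

# Per-model predictions from the raw response for BOR_000018.
predictions = {
    "gradient_boosting": {"risk_category": "High", "risk_probability": 0.9026},
    "logistic_regression": {"risk_category": "High", "risk_probability": 0.9626},
    "random_forest": {"risk_category": "High", "risk_probability": 0.9208},
    "xgboost": {"risk_category": "High", "risk_probability": 0.8962},
}
counts, majority = summarize_consensus(predictions)
print(f"Consensus: {counts['High']} High, {counts['Medium']} Medium, {counts['Low']} Low -> {majority}")
```

A `Counter` returns 0 for categories that never occur, so the summary line needs no special handling for missing categories.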


## 📊 Initialize Data and Examples


In [7]:
# Load borrower data
print("📂 Loading borrower data...")
master_df = load_borrower_data()

if master_df is not None:
    print(f"\n📋 Available borrower IDs (first 10):")
    print(master_df['borrower_id'].head(10).tolist())
    
    print(f"\n📊 Sample risk segments:")
    risk_counts = master_df['_risk_segment'].value_counts() if '_risk_segment' in master_df.columns else "N/A"
    print(risk_counts)
else:
    print("❌ Failed to load data - check file paths")


📂 Loading borrower data...
✅ Loaded 10000 borrowers from master dataset
📊 Available columns: 36
✅ Added risk segments from borrowers file

📋 Available borrower IDs (first 10):
['BOR_000001', 'BOR_000002', 'BOR_000003', 'BOR_000004', 'BOR_000005', 'BOR_000006', 'BOR_000007', 'BOR_000008', 'BOR_000009', 'BOR_000010']

📊 Sample risk segments:
_risk_segment
low       4000
medium    3500
high      2500
Name: count, dtype: int64


## 🎯 Risk Assessment Examples

Try these examples with different risk profiles:


In [8]:
# Example 1: Assess a high-risk borrower
if master_df is not None:
    print("🔴 HIGH-RISK BORROWER EXAMPLE")
    print("=" * 40)
    
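    # Find a high-risk borrower if available (the segment column is optional, so guard the lookup)
    if '_risk_segment' in master_df.columns:
        high_risk_borrowers = master_df.loc[master_df['_risk_segment'] == 'high', 'borrower_id'].head(1)
    else:
        high_risk_borrowers = master_df['borrower_id'].iloc[0:0]  # empty selection triggers the fallback below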
    if not high_risk_borrowers.empty:
        borrower_id = high_risk_borrowers.iloc[0]
        result1 = assess_borrower_risk_by_id(borrower_id, master_df)
    else:
        # Use first borrower as example
        borrower_id = master_df['borrower_id'].iloc[0]
        print(f"Using first available borrower: {borrower_id}")
        result1 = assess_borrower_risk_by_id(borrower_id, master_df)
else:
    print("❌ Cannot run example - data not loaded")


🔴 HIGH-RISK BORROWER EXAMPLE
🔄 Processing borrower: BOR_000018
1️⃣ Looking up borrower information...
✅ Found borrower BOR_000018
   Age: 27, Credit Score: 537
   Income: $20,000, Employment: Student
   Total Loans: $98,071

2️⃣ Calling Cloudera ML model...
🔍 Raw model response: {'success': True, 'response': {'model_deployment_crn': 'crn:cdp:ml:us-west-1:8a1e15cd-04c2-48aa-8f35-b4a8c11997d3:workspace:cd911947-f3e1-4596-adb4-8d87bd1060e9/fefd24f0-cc92-4331-8c3a-119374534d4c', 'prediction': {'all_model_predictions': {'gradient_boosting': {'risk_category': 'High', 'risk_probability': 0.9026, 'risk_score': 90.26}, 'logistic_regression': {'risk_category': 'High', 'risk_probability': 0.9626, 'risk_score': 96.26}, 'random_forest': {'risk_category': 'High', 'risk_probability': 0.9208, 'risk_score': 92.08}, 'xgboost': {'risk_category': 'High', 'risk_probability': 0.8962, 'risk_score': 89.62}}, 'borrower_id': 'BOR_000018', 'model_metadata': {'initialization_time_s': 1.7630066871643066, 'models_a

In [9]:
# Example 2: Assess a low-risk borrower
if master_df is not None:
    print("\n\n🟢 LOW-RISK BORROWER EXAMPLE")
    print("=" * 40)
    
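    # Find a low-risk borrower if available (the segment column is optional, so guard the lookup)
    if '_risk_segment' in master_df.columns:
        low_risk_borrowers = master_df.loc[master_df['_risk_segment'] == 'low', 'borrower_id'].head(1)
    else:
        low_risk_borrowers = master_df['borrower_id'].iloc[0:0]  # empty selection triggers the fallback below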
    if not low_risk_borrowers.empty:
        borrower_id = low_risk_borrowers.iloc[0]
        result2 = assess_borrower_risk_by_id(borrower_id, master_df)
    else:
        # Use a different borrower as example
        borrower_id = master_df['borrower_id'].iloc[1] if len(master_df) > 1 else master_df['borrower_id'].iloc[0]
        print(f"Using available borrower: {borrower_id}")
        result2 = assess_borrower_risk_by_id(borrower_id, master_df)
else:
    print("❌ Cannot run example - data not loaded")




🟢 LOW-RISK BORROWER EXAMPLE
🔄 Processing borrower: BOR_000003
1️⃣ Looking up borrower information...
✅ Found borrower BOR_000003
   Age: 30, Credit Score: 650
   Income: $54,599, Employment: Employed
   Total Loans: $58,176

2️⃣ Calling Cloudera ML model...
🔍 Raw model response: {'success': True, 'response': {'model_deployment_crn': 'crn:cdp:ml:us-west-1:8a1e15cd-04c2-48aa-8f35-b4a8c11997d3:workspace:cd911947-f3e1-4596-adb4-8d87bd1060e9/fefd24f0-cc92-4331-8c3a-119374534d4c', 'prediction': {'all_model_predictions': {'gradient_boosting': {'risk_category': 'Low', 'risk_probability': 0.0538, 'risk_score': 5.38}, 'logistic_regression': {'risk_category': 'Low', 'risk_probability': 0.1066, 'risk_score': 10.66}, 'random_forest': {'risk_category': 'Low', 'risk_probability': 0.0478, 'risk_score': 4.78}, 'xgboost': {'risk_category': 'Low', 'risk_probability': 0.0471, 'risk_score': 4.71}}, 'borrower_id': 'BOR_000003', 'model_metadata': {'initialization_time_s': 1.7630066871643066, 'models_availa
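
Examples 1 and 2 repeat the same lookup-with-fallback pattern. One way to factor it out is sketched below; `pick_borrower_id` and the `demo_df` rows are hypothetical names and data introduced for illustration, not part of the notebook.

```python
import pandas as pd

def pick_borrower_id(df, segment, fallback_position=0):
    """Return the first borrower_id in `segment`, or a positional fallback.

    Assumes `df` is non-empty and has a 'borrower_id' column.
    """
    if "_risk_segment" in df.columns:
        matches = df.loc[df["_risk_segment"] == segment, "borrower_id"]
        if not matches.empty:
            return matches.iloc[0]
    # No segment column or no match: clamp the fallback to an existing row
    position = min(fallback_position, len(df) - 1)
    return df["borrower_id"].iloc[position]

# Tiny stand-in frame; these IDs and segments are illustrative only
demo_df = pd.DataFrame({
    "borrower_id": ["BOR_A", "BOR_B", "BOR_C"],
    "_risk_segment": ["medium", "high", "low"],
})

print(pick_borrower_id(demo_df, "high"))
print(pick_borrower_id(demo_df, "unknown", fallback_position=1))
```

In the notebook, the result could then be passed straight to `assess_borrower_risk_by_id(borrower_id, master_df)`.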

## 🎯 Interactive Assessment

Use this cell to assess any borrower by ID:


In [10]:
# Interactive assessment - modify the borrower_id below
borrower_id_to_assess = "BOR_000001"  # Change this to any borrower ID

if master_df is not None:
    print(f"🔍 ASSESSING BORROWER: {borrower_id_to_assess}")
    print("=" * 50)
    result = assess_borrower_risk_by_id(borrower_id_to_assess, master_df)
    
    if 'error' not in result:
        print(f"\n✅ Assessment completed for {borrower_id_to_assess}")
    else:
        print(f"\n❌ {result['error']}")
        print(f"\n💡 Available borrower IDs:")
        print(master_df['borrower_id'].head(20).tolist())
else:
    print("❌ Cannot run assessment - data not loaded")


🔍 ASSESSING BORROWER: BOR_000001
🔄 Processing borrower: BOR_000001
1️⃣ Looking up borrower information...
✅ Found borrower BOR_000001
   Age: 40, Credit Score: 746
   Income: $67,038, Employment: Employed
   Total Loans: $78,767

2️⃣ Calling Cloudera ML model...
🔍 Raw model response: {'success': True, 'response': {'model_deployment_crn': 'crn:cdp:ml:us-west-1:8a1e15cd-04c2-48aa-8f35-b4a8c11997d3:workspace:cd911947-f3e1-4596-adb4-8d87bd1060e9/fefd24f0-cc92-4331-8c3a-119374534d4c', 'prediction': {'all_model_predictions': {'gradient_boosting': {'risk_category': 'Low', 'risk_probability': 0.0564, 'risk_score': 5.64}, 'logistic_regression': {'risk_category': 'Low', 'risk_probability': 0.0805, 'risk_score': 8.05}, 'random_forest': {'risk_category': 'Low', 'risk_probability': 0.0467, 'risk_score': 4.67}, 'xgboost': {'risk_category': 'Low', 'risk_probability': 0.0502, 'risk_score': 5.02}}, 'borrower_id': 'BOR_000001', 'model_metadata': {'initialization_time_s': 1.7630066871643066, 'models_avai
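
The raw response nests one prediction per model under `all_model_predictions`. The sketch below shows one possible way to condense them into a single summary. The averaging and majority vote are assumptions made for illustration, not necessarily how the deployed model or the report generator combines them. `summarize_predictions` is a hypothetical helper, and the input values are copied from the response printed above.

```python
from collections import Counter
from statistics import mean

def summarize_predictions(all_model_predictions):
    """Collapse per-model outputs into a mean probability and a majority category."""
    probabilities = [p["risk_probability"] for p in all_model_predictions.values()]
    categories = [p["risk_category"] for p in all_model_predictions.values()]
    majority_category, votes = Counter(categories).most_common(1)[0]
    return {
        "mean_probability": mean(probabilities),
        "min_probability": min(probabilities),
        "max_probability": max(probabilities),
        "majority_category": majority_category,
        # Fraction of models that voted for the majority category
        "agreement": votes / len(categories),
    }

# Values copied from the raw response printed above for BOR_000001
predictions = {
    "gradient_boosting": {"risk_category": "Low", "risk_probability": 0.0564, "risk_score": 5.64},
    "logistic_regression": {"risk_category": "Low", "risk_probability": 0.0805, "risk_score": 8.05},
    "random_forest": {"risk_category": "Low", "risk_probability": 0.0467, "risk_score": 4.67},
    "xgboost": {"risk_category": "Low", "risk_probability": 0.0502, "risk_score": 5.02},
}

summary = summarize_predictions(predictions)
print(f"{summary['majority_category']} risk, "
      f"mean probability {summary['mean_probability']:.2%}, "
      f"agreement {summary['agreement']:.0%}")
```

A low `agreement` value would be a reasonable signal to review the individual model outputs rather than rely on the averaged number.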

## 🎯 Summary

This notebook provides a complete natural language interface for student loan risk assessment:

### 🔄 **Workflow:**
1. **Data Lookup**: Find borrower by ID from synthetic data
2. **ML Prediction**: Call Cloudera ML model for risk scoring  
3. **Natural Language**: Generate professional reports using AWS Bedrock

### 📊 **Key Features:**
- **Data Integration**: Reads synthetic borrower data from CSV files in `data/synthetic/`
- **ML Model Integration**: Direct connection to deployed Cloudera ML model
- **Professional Reports**: AI-generated risk assessment narratives
- **Easy to Use**: Just provide a borrower ID (e.g., BOR_000001)

### 🎯 **Use Cases:**
- **Loan Officers**: Quick risk assessments with detailed explanations
- **Customer Service**: Understanding borrower risk profiles
- **Management Reporting**: Professional risk assessment summaries
- **Compliance**: Documented risk evaluation processes

### 🚀 **Next Steps:**
- **Integration**: Connect to CRM systems for automated processing
- **Customization**: Adjust report templates for different audiences  
- **Scaling**: Process multiple borrowers in batch operations
- **Monitoring**: Track risk assessments over time
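
The scaling and monitoring items above can be sketched as a small batch loop that records a timestamp per assessment. This is a minimal sketch under stated assumptions: `assess_batch` and `fake_assess` are hypothetical names, and the only notebook convention it relies on is that a failed assessment returns a dict containing an `'error'` key, as checked in the interactive cell.

```python
from datetime import datetime, timezone

import pandas as pd

def assess_batch(borrower_ids, assess_fn):
    """Run assess_fn on each ID and collect one summary row per borrower."""
    rows = []
    for borrower_id in borrower_ids:
        row = {
            "borrower_id": borrower_id,
            # UTC timestamp so runs can be compared over time
            "assessed_at": datetime.now(timezone.utc).isoformat(),
        }
        try:
            result = assess_fn(borrower_id)
            row["status"] = "error" if "error" in result else "ok"
            row["error"] = result.get("error")
        except Exception as exc:  # keep the batch going if one borrower fails
            row["status"] = "error"
            row["error"] = str(exc)
        rows.append(row)
    return pd.DataFrame(rows)

# Hypothetical stand-in for assess_borrower_risk_by_id, so the sketch runs on its own
def fake_assess(borrower_id):
    if borrower_id == "BOR_MISSING":
        return {"error": f"Borrower {borrower_id} not found"}
    return {"risk_report": f"Report for {borrower_id}"}

batch_df = assess_batch(["BOR_000001", "BOR_000003", "BOR_MISSING"], fake_assess)
print(batch_df[["borrower_id", "status", "error"]])
```

In the notebook, `assess_fn` could be `lambda bid: assess_borrower_risk_by_id(bid, master_df)`. Because that function prints its progress and calls both the model and Bedrock for every borrower, a real batch run would likely need rate limiting and quieter logging.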

### 💡 **Tips:**
- Use any borrower ID from the loaded dataset (IDs start at `BOR_000001`; the run above loaded 10,000 borrowers)
- Check available IDs in the data loading section above
- Modify the `borrower_id_to_assess` variable to try different borrowers
- Both mock and real Bedrock/CML responses are supported for testing
