# Layer 3: LLM Deep Analysis

## What We're Building
- Use Claude for nuanced analysis
- Get reasoning, not just labels
- Identify violation TYPE and SEVERITY

In [None]:
from snowflake.snowpark import Session

session = Session.builder.getOrCreate()
session.use_warehouse('COMPLIANCE_DEMO_WH')
session.use_database('COMPLIANCE_DEMO')
session.use_schema('ML')

print("Layer 3: Adding LLM intelligence...")

## Step 1: Review ML Classification Distribution

The ML model outputs three decisions:
- **HIGH_RISK**: Auto-escalate (high confidence violations)
- **NEEDS_REVIEW**: Send to LLM (uncertain cases)
- **LOW_RISK**: Auto-clear (high confidence clean)

In [None]:
stats = session.sql("""
    SELECT 
        ML_DECISION,
        COUNT(*) as cnt,
        SUM(CASE WHEN VIOLATION_LABEL = 1 THEN 1 ELSE 0 END) as violations
    FROM MODEL_PREDICTIONS_V1 
    GROUP BY 1
    ORDER BY 2 DESC
""").collect()

print("ML Decision Distribution:")
for row in stats:
    print(f"  {row['ML_DECISION']}: {row['CNT']:,} emails ({row['VIOLATIONS']:,} actual violations)")

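# Look up the NEEDS_REVIEW bucket without raising IndexError if it is empty
needs_review = next((r for r in stats if r['ML_DECISION'] == 'NEEDS_REVIEW'), None)
if needs_review is None:
    print("\n→ No NEEDS_REVIEW emails found; nothing for the LLM to analyze")
else:
    print(f"\n→ LLM will analyze {needs_review['CNT']:,} NEEDS_REVIEW emails")
    print(f"→ These contain {needs_review['VIOLATIONS']:,} violations the ML was uncertain about")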

## Step 2: Run Claude Analysis on NEEDS_REVIEW Emails

The LLM runs only on the uncertain bucket: emails whose ML violation probability fell between 0.3 and 0.7. This is where the LLM adds the most value, catching subtle violations that the ML model could not classify with confidence.
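
The routing rule can be sketched in plain Python. This is a minimal illustration, assuming the 0.3 and 0.7 cutoffs are inclusive bounds of the review band. The function name `route_email` is hypothetical; the actual bucketing happens upstream, where `ML_DECISION` is written to `MODEL_PREDICTIONS_V1`.

```python
def route_email(violation_probability: float,
                low: float = 0.3,
                high: float = 0.7) -> str:
    """Map an ML violation probability to one of the three decisions.

    Assumption: the review band includes both cutoffs (0.3 and 0.7).
    """
    if not 0.0 <= violation_probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if violation_probability > high:
        return "HIGH_RISK"      # auto-escalate
    if violation_probability < low:
        return "LOW_RISK"       # auto-clear
    return "NEEDS_REVIEW"       # send to the LLM


for p in (0.05, 0.30, 0.55, 0.70, 0.92):
    print(f"{p:.2f} -> {route_email(p)}")
```

Only the middle band reaches Claude, so LLM cost scales with the number of uncertain emails rather than with total email volume.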

In [None]:
session.sql("""
CREATE OR REPLACE TABLE COMPLIANCE_DEMO.ML.LLM_ANALYSIS AS
WITH uncertain_emails AS (
    SELECT 
        p.EMAIL_ID,
        p.COMPLIANCE_LABEL as ACTUAL_LABEL,
        p.ML_DECISION,
        p.VIOLATION_PROBABILITY,
        e.SUBJECT,
        e.BODY,
        e.SENDER_DEPT,
        e.RECIPIENT_DEPT
    FROM MODEL_PREDICTIONS_V1 p
    JOIN COMPLIANCE_DEMO.EMAIL_SURVEILLANCE.EMAILS e ON p.EMAIL_ID = e.EMAIL_ID
    WHERE p.ML_DECISION = 'NEEDS_REVIEW'
)
SELECT 
    EMAIL_ID,
    ACTUAL_LABEL,
    ML_DECISION,
    VIOLATION_PROBABILITY,
    SENDER_DEPT,
    RECIPIENT_DEPT,
    SUBJECT,
    BODY,
    
    AI_COMPLETE(
        model => 'claude-3-5-sonnet',
        prompt => CONCAT(
            'You are a hedge fund compliance expert. The ML model is UNCERTAIN about this email (probability: ', 
            ROUND(VIOLATION_PROBABILITY, 2)::VARCHAR, 
            '). Analyze carefully for subtle violations.\\n\\n',
            'VIOLATION TYPES:\\n',
            '- INSIDER_TRADING: Sharing MNPI, trading tips before announcements\\n',
            '- CONFIDENTIALITY_BREACH: Leaking client data, proprietary info\\n',
            '- PERSONAL_TRADING: Undisclosed personal trades, front-running\\n',
            '- INFO_BARRIER_VIOLATION: Research/Trading wall breaches\\n',
            '- CLEAN: Normal business communication\\n\\n',
            'Email from: ', SENDER_DEPT, ' to: ', RECIPIENT_DEPT, '\\n',
            'Subject: ', SUBJECT, '\\n',
            'Body: ', LEFT(BODY, 1500), '\\n\\n',
            'Respond in JSON format.'
        ),
        response_format => {
            'type': 'json',
            'schema': {
                'type': 'object',
                'properties': {
                    'is_violation': {
                        'type': 'boolean',
                        'description': 'Whether this email contains a compliance violation'
                    },
                    'confidence': {
                        'type': 'number',
                        'description': 'Confidence score between 0.0 and 1.0'
                    },
                    'category': {
                        'type': 'string',
                        'description': 'Violation category: INSIDER_TRADING, CONFIDENTIALITY_BREACH, PERSONAL_TRADING, INFO_BARRIER_VIOLATION, or CLEAN'
                    },
                    'reasoning': {
                        'type': 'string',
                        'description': 'One sentence explanation of the analysis'
                    }
                },
                'required': ['is_violation', 'confidence', 'category', 'reasoning']
            }
        }
    ) AS CLAUDE_ANALYSIS
FROM uncertain_emails
""").collect()

print("Claude analyzed all NEEDS_REVIEW emails!")
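
To preview what Claude receives for a single email, the `CONCAT` expression can be approximated locally. The sketch below is a hypothetical mirror for inspection only: `build_prompt` and the sample values are made up for illustration, and Snowflake's `ROUND(...)::VARCHAR` may format the probability slightly differently from Python's `:.2f`.

```python
VIOLATION_TYPES = (
    "- INSIDER_TRADING: Sharing MNPI, trading tips before announcements",
    "- CONFIDENTIALITY_BREACH: Leaking client data, proprietary info",
    "- PERSONAL_TRADING: Undisclosed personal trades, front-running",
    "- INFO_BARRIER_VIOLATION: Research/Trading wall breaches",
    "- CLEAN: Normal business communication",
)


def build_prompt(probability, sender_dept, recipient_dept, subject, body,
                 max_body_chars=1500):
    """Hypothetical local mirror of the SQL CONCAT prompt, for inspection only."""
    fields = {"sender_dept": sender_dept, "recipient_dept": recipient_dept,
              "subject": subject, "body": body}
    missing = [name for name, value in fields.items() if value is None]
    if missing:
        # Snowflake's CONCAT yields NULL when any argument is NULL,
        # so a missing field would silently produce a NULL prompt.
        raise ValueError(f"missing fields: {missing}")
    return (
        "You are a hedge fund compliance expert. The ML model is UNCERTAIN "
        f"about this email (probability: {probability:.2f}). "
        "Analyze carefully for subtle violations.\n\n"
        "VIOLATION TYPES:\n"
        + "\n".join(VIOLATION_TYPES) + "\n\n"
        + f"Email from: {sender_dept} to: {recipient_dept}\n"
        + f"Subject: {subject}\n"
        + f"Body: {body[:max_body_chars]}\n\n"  # same cap as LEFT(BODY, 1500)
        + "Respond in JSON format."
    )


# Made-up sample values, not rows from the EMAILS table
print(build_prompt(0.456, "Research", "Trading", "Quick question",
                   "Can we talk before the call tomorrow?"))
```

Raising on a missing field makes the NULL-propagation behavior visible; in the SQL itself, rows with a NULL `SUBJECT` or department would need `COALESCE` to get a usable prompt.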

## Step 3: View Claude's Reasoning

In [None]:
results = session.sql("""
    SELECT 
        EMAIL_ID,
        ACTUAL_LABEL,
        SENDER_DEPT || ' -> ' || RECIPIENT_DEPT as COMMUNICATION,
        SUBJECT,
        BODY,
        CLAUDE_ANALYSIS
    FROM LLM_ANALYSIS
""").to_pandas()

print("\n" + "="*80)
print("CLAUDE'S COMPLIANCE ANALYSIS")
print("="*80)

for _, row in results.iterrows():
    print(f"\n--- Email: {row['EMAIL_ID'][:8]}... ---")
    print(f"Route: {row['COMMUNICATION']}")
    print(f"Subject: {row['SUBJECT']}")
    print(f"Body: {row['BODY'][:300]}...")
    print(f"Actual Label: {row['ACTUAL_LABEL']}")
    print(f"\nClaude says: {row['CLAUDE_ANALYSIS'][:400]}...")

## The Power: Reasoning + Categories

Claude provides:
1. **Classification** - Is it a violation?
2. **Confidence** - How sure is it?
3. **Category** - WHAT TYPE of violation?
4. **Reasoning** - WHY is it a violation?

Keyword rules return only a match. The ML model returns only a score. The LLM adds a violation category and a written rationale that a reviewer can check.
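
The four fields can be pulled out of a `CLAUDE_ANALYSIS` value and sanity-checked before anyone relies on them. The sketch below assumes the column arrives as a JSON string matching the schema passed to `AI_COMPLETE`. The helper `parse_analysis` is hypothetical, and the sample string is hand-written for illustration, not actual model output.

```python
import json

ALLOWED_CATEGORIES = {
    "INSIDER_TRADING",
    "CONFIDENTIALITY_BREACH",
    "PERSONAL_TRADING",
    "INFO_BARRIER_VIOLATION",
    "CLEAN",
}
REQUIRED_FIELDS = ("is_violation", "confidence", "category", "reasoning")


def parse_analysis(raw: str) -> dict:
    """Parse one analysis string and check it against the expected schema."""
    data = json.loads(raw)
    missing = [f for f in REQUIRED_FIELDS if f not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if not isinstance(data["is_violation"], bool):
        raise ValueError("is_violation must be a boolean")
    if not 0.0 <= float(data["confidence"]) <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if data["category"] not in ALLOWED_CATEGORIES:
        raise ValueError(f"unexpected category: {data['category']}")
    return data


# Hand-written sample for illustration only
sample = (
    '{"is_violation": true, "confidence": 0.82, '
    '"category": "INSIDER_TRADING", '
    '"reasoning": "Discusses trading ahead of an unannounced deal."}'
)
parsed = parse_analysis(sample)
print(parsed["category"], parsed["confidence"])
print(parsed["reasoning"])
```

Checking the category against the allowed set matters because the schema declares `category` as a free-form string, so nothing stops the model from returning a label outside the five listed.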

## Layer 3 Complete

**What we built:**
- LLM analysis with Claude
- Structured JSON output
- Human-readable reasoning

**Next:** Combine ML + LLM into a tiered architecture →