# 🎓 Tutorial: Evaluating LLM-Generated Financial Reports

Welcome! This tutorial walks through generating an LLM-written investment analysis report and evaluating how well it answers the questions an expert investor would ask.

## What You'll Learn

1. **Load Financial Documents** - Import and manage financial statements
2. **Configure Investment Personas** - Define expert investor profiles
3. **Generate Investment Reports** - Create AI-powered analysis reports
4. **Generate Expert Questions** - Create persona-specific evaluation questions
5. **Evaluate Report Quality** - Assess coverage and alignment with investment philosophy
6. **Run Complete Pipeline** - Execute end-to-end analysis workflow

---

## 📦 Step 1: Setup and Imports

First, let's import the classes and standard-library helpers used throughout this tutorial.

In [None]:
# Import the main module
from evaluating_llm_fin_reports import (
    FinancialDocumentLoader,
    InvestmentReportGenerator,
    ReportEvaluator,
    InvestmentAnalysisPipeline,
    QuestionGenerator,
    CoverageEvaluator,
    PersonaAlignmentChecker
)

from pathlib import Path
import json
from pprint import pprint

print("✅ All modules imported successfully!")

## 📂 Step 2: Load Financial Documents

The `FinancialDocumentLoader` class helps you load and manage financial documents in markdown format.

In [None]:
# Initialize the document loader
MARKDOWN_DIR = Path("fin_docs")
doc_loader = FinancialDocumentLoader(MARKDOWN_DIR)

# Load all documents
documents = doc_loader.load_all_documents()

print(f"\n📊 Loaded {len(documents)} financial documents")
print("\n📋 Sample documents:")
for i, (filename, _) in enumerate(list(documents.items())[:5]):
    print(f"  {i+1}. {filename}")
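
The tutorial does not show what `load_all_documents()` does internally. As a rough mental model only, here is a minimal sketch of a loader that reads every `.md` file in a directory into a `{filename: text}` dictionary, which is the shape the cell above iterates over. The function name `load_markdown_documents` and the demo files are hypothetical and are not part of `evaluating_llm_fin_reports`.

```python
import tempfile
from pathlib import Path


def load_markdown_documents(directory: Path) -> dict[str, str]:
    """Return {filename: text} for every .md file directly inside `directory`."""
    if not directory.is_dir():
        raise FileNotFoundError(f"Document directory not found: {directory}")
    documents = {}
    # sorted() keeps the order stable between runs
    for path in sorted(directory.glob("*.md")):
        documents[path.name] = path.read_text(encoding="utf-8")
    return documents


# Demo on a throwaway directory so the sketch runs without real filings
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    (tmp_dir / "acme_2023.md").write_text("# Acme 2023", encoding="utf-8")
    (tmp_dir / "notes.txt").write_text("ignored", encoding="utf-8")
    demo_documents = load_markdown_documents(tmp_dir)

print(demo_documents)
```

Failing early on a missing directory is a design choice in this sketch; it makes a wrong `MARKDOWN_DIR` obvious instead of silently producing zero documents.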

### 🔍 Explore Document Metadata

The loader can parse metadata from filenames to identify company and year.

In [None]:
# Parse metadata from a sample document
sample_filename = list(documents.keys())[0]
metadata = doc_loader.parse_document_metadata(sample_filename)

print(f"\n📄 Sample Document: {sample_filename}")
print(f"\n  Company: {metadata['company']}")
print(f"  Year: {metadata['year']}")
print(f"  Filename: {metadata['filename']}")
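
How the loader maps a filename to a company and year depends on your naming scheme, which this tutorial does not specify. The sketch below assumes a hypothetical `<company>_<year>.md` convention and returns the same three keys printed above. The `"unknown"` fallback for the company mirrors the check used later in this notebook; using it for the year as well is an assumption.

```python
import re
from pathlib import Path

# Hypothetical naming convention: "<company>_<4-digit year>.md"
FILENAME_PATTERN = re.compile(r"^(?P<company>.+)_(?P<year>(?:19|20)\d{2})$")


def parse_filename_metadata(filename: str) -> dict[str, str]:
    """Split a filename into company and year, or fall back to 'unknown'."""
    stem = Path(filename).stem  # drop the ".md" extension
    match = FILENAME_PATTERN.match(stem)
    if match is None:
        return {"company": "unknown", "year": "unknown", "filename": filename}
    return {
        "company": match["company"],
        "year": match["year"],
        "filename": filename,
    }


print(parse_filename_metadata("acme_holdings_2023.md"))
print(parse_filename_metadata("notes.md"))
```

Because the company group is greedy, underscores inside the company name are kept and only the trailing `_YYYY` is treated as the year.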

### 🏢 Get Documents for a Specific Company

`get_company_documents` returns every document for one company, keyed by year.

In [None]:
# Find all unique companies
companies = set()
for filename in documents.keys():
    meta = doc_loader.parse_document_metadata(filename)
    if meta['company'] != 'unknown':
        companies.add(meta['company'])

print(f"\n🏢 Found {len(companies)} unique companies:")
for i, company in enumerate(sorted(companies)[:10]):
    print(f"  {i+1}. {company}")

# Get documents for a specific company (using first company as example)
if companies:
    example_company = sorted(list(companies))[0]
    company_docs = doc_loader.get_company_documents(example_company)
    print(f"\n📊 Documents for '{example_company}':")
    for year, content in company_docs.items():
        print(f"  - Year {year}: {len(content)} characters")
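
The cell above treats the result of `get_company_documents` as a `{year: content}` mapping. If you want that view for every company at once, one way to build it is sketched below. The function `group_documents_by_company` and the sample records are hypothetical; only the nested shape is taken from the loop above.

```python
from collections import defaultdict


def group_documents_by_company(records):
    """Group (company, year, content) tuples into {company: {year: content}}."""
    grouped = defaultdict(dict)
    for company, year, content in records:
        if company == "unknown":
            continue  # skip files whose name could not be parsed
        grouped[company][year] = content
    return dict(grouped)


# Illustrative records, not real filings
records = [
    ("acme", "2022", "# Acme 2022 annual report"),
    ("acme", "2023", "# Acme 2023 annual report"),
    ("unknown", "unknown", "stray notes"),
    ("zenith", "2023", "# Zenith 2023 annual report"),
]
by_company = group_documents_by_company(records)

for company in sorted(by_company):
    print(company, sorted(by_company[company]))
```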

## 👤 Step 3: Define Investment Personas

Investment personas represent different investing philosophies (e.g., Warren Buffett, Benjamin Graham).

In [None]:
# Define investment personas with their philosophies
persona_definitions = {
    "Warren Buffett": """
        Focus on high-quality businesses with durable competitive advantages (economic moats), 
        excellent management, consistent earnings growth, and reasonable valuations. 
        Look for companies that can compound wealth over decades through reinvestment and organic growth.
    """,
    
    "Benjamin Graham": """
        Deep value investing with emphasis on margin of safety, asset-based valuation, 
        and contrarian opportunities. Focus on companies trading below intrinsic value 
        with strong balance sheets and conservative debt levels.
    """,
    
    "Peter Lynch": """
        Growth at a reasonable price (GARP) approach. Look for companies with strong 
        earnings growth, expanding markets, competent management, and reasonable P/E 
        ratios relative to growth rates.
    """,
    
    "Cathie Wood": """
        Disruptive innovation focus. Invest in companies leveraging breakthrough technologies 
        with potential for exponential growth. Prioritize market potential over current 
        profitability, with long-term (5+ year) horizon.
    """
}

print("✅ Investment personas defined:")
for persona in persona_definitions.keys():
    print(f"  - {persona}")
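
The triple-quoted strings above carry their leading indentation and line breaks into whatever prompt is built from them. Whether the module normalizes that whitespace is not stated here, so the following is an optional sketch for cleaning the definitions yourself before passing them on. The helper `normalize_personas` is hypothetical.

```python
def normalize_personas(definitions: dict[str, str]) -> dict[str, str]:
    """Collapse whitespace in each philosophy and reject empty entries."""
    cleaned = {}
    for name, philosophy in definitions.items():
        # split() with no arguments drops newlines, indentation, and repeated spaces
        text = " ".join(philosophy.split())
        if not text:
            raise ValueError(f"Persona {name!r} has an empty philosophy")
        cleaned[name] = text
    return cleaned


example = {
    "Benjamin Graham": """
        Deep value investing with emphasis on margin of safety,
        asset-based valuation, and contrarian opportunities.
    """,
}
print(normalize_personas(example)["Benjamin Graham"])
```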

## 📝 Step 4: Generate Investment Reports

Now let's generate an AI-powered investment analysis report using one of our personas.

In [None]:
# Initialize the report generator
report_generator = InvestmentReportGenerator(persona_definitions)

print("✅ Report generator initialized!")
print("\nℹ️  The generator will:")
print("  1. Generate report outline based on persona")
print("  2. Analyze financial documents")
print("  3. Generate investment recommendation")
print("  4. Write detailed report sections")
print("  5. Assemble complete markdown report")

In [None]:
# Generate a sample report (this will take a minute)
# Select a company and document
if companies:
    sample_company = sorted(list(companies))[0]
    company_docs = doc_loader.get_company_documents(sample_company)
    
    if company_docs:
        sample_year = list(company_docs.keys())[0]
        sample_doc = company_docs[sample_year]
        
        print(f"\n🤖 Generating report for:")
        print(f"  Company: {sample_company}")
        print(f"  Year: {sample_year}")
        print(f"  Persona: Warren Buffett")
        print(f"\n⏳ This may take 30-60 seconds...\n")
        
        # Generate the report
        report = report_generator(
            persona_name="Warren Buffett",
            company_name=sample_company,
            financial_documents=sample_doc,
            news_data=None
        )
        
        print("\n✅ Report generated successfully!")
        print(f"\n📊 Report length: {len(report)} characters")
        print(f"\n📄 Report preview (first 500 chars):\n")
        print("=" * 80)
        print(report[:500])
        print("...")
        print("=" * 80)
        
        # Save the report
        output_dir = Path("test_output")
        output_dir.mkdir(exist_ok=True)
        report_path = output_dir / f"sample_report_{sample_company}_{sample_year}.md"
        
        with open(report_path, 'w', encoding='utf-8') as f:
            f.write(report)
        
        print(f"\n💾 Full report saved to: {report_path}")
    else:
        print("⚠️  No documents found for selected company")
else:
    print("⚠️  No companies found in loaded documents")
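
The report path above interpolates the company name directly into a filename. If your company names can contain spaces, slashes, or other characters that are awkward in paths, a small sanitizer helps. This is a sketch under that assumption; `safe_filename_part` is a hypothetical helper, not something the module provides.

```python
import re


def safe_filename_part(text: str) -> str:
    """Make `text` safe to embed in a filename (ASCII letters, digits, '.', '-', '_')."""
    # Replace each run of any other character with a single underscore
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", text.strip())
    # Trim leading/trailing dots and underscores; fall back if nothing is left
    return cleaned.strip("._") or "unnamed"


print(safe_filename_part("Acme Holdings/Ltd."))
print(safe_filename_part("../etc"))
```

Note that this sketch also replaces non-ASCII letters with underscores, which may be too aggressive for some company names.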

## ❓ Step 5: Generate Expert Questions

The evaluation system generates questions that an expert investor would ask about a company.

In [None]:
# Initialize question generator
question_generator = QuestionGenerator()

# Company summary for context
company_summary = f"""
{sample_company} is a Jamaican company listed on the Jamaica Stock Exchange.
Analysis focuses on financial performance, competitive position, and growth prospects.
"""

print("\n🤖 Generating expert questions...\n")

# Generate questions for Warren Buffett persona
questions = question_generator(
    persona_name="Warren Buffett",
    persona_philosophy=persona_definitions["Warren Buffett"],
    company_summary=company_summary,
    n_questions=10
)

print(f"✅ Generated {len(questions.questions)} expert questions\n")
print("📋 Sample Questions:\n")
for i, q in enumerate(questions.questions[:5], 1):
    print(f"{i}. [{q.category.upper()}] {q.question}")
    print(f"   Importance: {q.importance:.2f} | Reasoning: {q.reasoning[:100]}...\n")
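
The loop above reads four attributes from each question: `category`, `question`, `importance`, and `reasoning`. To make that shape concrete, here is a stand-in data class with those fields and a helper that ranks questions by importance. The class name `ExpertQuestion`, the helper `top_questions`, and the assumption that `importance` lies between 0 and 1 are illustrative; the real objects returned by `QuestionGenerator` may differ.

```python
from dataclasses import dataclass


@dataclass
class ExpertQuestion:
    question: str
    category: str
    importance: float  # assumed to lie in [0, 1]
    reasoning: str


def top_questions(questions, k):
    """Return the k most important questions, highest importance first."""
    return sorted(questions, key=lambda q: q.importance, reverse=True)[:k]


# Illustrative questions, not generated output
sample_questions = [
    ExpertQuestion("How durable is the moat?", "moat", 0.95, "Core to the thesis"),
    ExpertQuestion("What is the dividend policy?", "capital", 0.40, "Secondary"),
    ExpertQuestion("How leveraged is the balance sheet?", "risk", 0.80, "Downside"),
]

for q in top_questions(sample_questions, 2):
    print(f"[{q.category.upper()}] {q.question} ({q.importance:.2f})")
```

Ranking by importance is handy when you only have the budget to evaluate a few questions, as in the next step.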

## 📊 Step 6: Evaluate Report Coverage

Now let's evaluate how well the generated report answers each question.

In [None]:
# Initialize coverage evaluator
coverage_evaluator = CoverageEvaluator()

print("\n📊 Evaluating report coverage...\n")

# Evaluate first few questions (evaluating all takes time)
evaluations = []
for i, question in enumerate(questions.questions[:3], 1):
    print(f"  Evaluating question {i}/3...")
    
    evaluation = coverage_evaluator(
        question=question,
        report=report,
        persona_name="Warren Buffett"
    )
    evaluations.append(evaluation)

print("\n✅ Evaluation complete!\n")
print("📋 Evaluation Results:\n")

for i, (q, eval_result) in enumerate(zip(questions.questions[:3], evaluations), 1):
    print(f"{i}. Question: {q.question}")
    print(f"   Answerable: {eval_result.answerable.upper()}")
    print(f"   Quality Score: {eval_result.quality_rating}/10")
    print(f"   Answer: {eval_result.answer[:150]}...")
    if eval_result.missing_information:
        print(f"   Missing: {', '.join(eval_result.missing_information[:2])}")
    print()

## 🎯 Step 7: Complete Report Evaluation

Use the `ReportEvaluator` to run a complete evaluation pipeline.

In [None]:
# Initialize complete evaluator
report_evaluator = ReportEvaluator(persona_definitions)

print("\n🚀 Running complete evaluation pipeline...\n")
print("This will:")
print("  1. Generate expert questions")
print("  2. Evaluate report coverage")
print("  3. Compute aggregate metrics")
print("  4. Assess persona alignment\n")
print("⏳ This may take 2-3 minutes...\n")

# Run complete evaluation
evaluation_results = report_evaluator(
    persona_name="Warren Buffett",
    company_summary=company_summary,
    report=report,
    n_questions=10  # Lower this for faster execution
)

print("\n✅ Evaluation complete!\n")

### 📈 View Aggregate Metrics

In [None]:
# Extract results
metrics = evaluation_results['metrics']
alignment = evaluation_results['alignment']

print("="*80)
print("📊 EVALUATION METRICS")
print("="*80)
print(f"\n📈 Coverage Metrics:")
print(f"  Overall Coverage Rate: {metrics.coverage_rate:.1%}")
print(f"  Average Quality Score: {metrics.quality_score:.1f}/10")
print(f"\n📋 Question Breakdown:")
print(f"  Fully Answerable: {metrics.answerable_fully}")
print(f"  Partially Answerable: {metrics.answerable_partial}")
print(f"  Not Answerable: {metrics.not_answerable}")

if metrics.critical_gaps:
    print(f"\n⚠️  Critical Information Gaps:")
    for i, gap in enumerate(metrics.critical_gaps, 1):
        print(f"  {i}. {gap}")

print(f"\n🎯 Persona Alignment:")
print(f"  Alignment Score: {alignment.alignment_score:.1f}/10")
print(f"  Reasoning: {alignment.alignment_reasoning[:200]}...")

if alignment.philosophy_evidence:
    print(f"\n✅ Philosophy Evidence:")
    for i, evidence in enumerate(alignment.philosophy_evidence[:3], 1):
        print(f"  {i}. {evidence[:100]}...")

if alignment.philosophy_gaps:
    print(f"\n⚠️  Philosophy Gaps:")
    for i, gap in enumerate(alignment.philosophy_gaps[:3], 1):
        print(f"  {i}. {gap}")

print("\n" + "="*80)
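
The exact formulas behind `coverage_rate` and `quality_score` are not shown in this tutorial. To give a feel for how such aggregates can be derived from per-question results, here is a sketch under explicit assumptions: the `answerable` labels are `"yes"`, `"partial"`, and `"no"`, a partial answer counts as half coverage, and the quality score is a plain mean on the 0-10 scale. All names in the block are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QuestionResult:
    answerable: str        # assumed labels: "yes", "partial", "no"
    quality_rating: float  # 0-10 scale


def aggregate(results):
    """Compute simple coverage and quality aggregates over per-question results."""
    total = len(results)
    if total == 0:
        raise ValueError("No results to aggregate")
    fully = sum(r.answerable == "yes" for r in results)
    partial = sum(r.answerable == "partial" for r in results)
    # Any other label is counted as not answerable
    not_answerable = total - fully - partial
    return {
        "coverage_rate": (fully + 0.5 * partial) / total,  # assumed weighting
        "quality_score": sum(r.quality_rating for r in results) / total,
        "answerable_fully": fully,
        "answerable_partial": partial,
        "not_answerable": not_answerable,
    }


# Illustrative results, not real evaluation output
sample_results = [
    QuestionResult("yes", 8),
    QuestionResult("partial", 6),
    QuestionResult("no", 1),
    QuestionResult("yes", 9),
]
summary_metrics = aggregate(sample_results)
print(f"Coverage: {summary_metrics['coverage_rate']:.1%}")
print(f"Quality:  {summary_metrics['quality_score']:.1f}/10")
```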

### 📝 Export Detailed Results

In [None]:
# Export results to JSON
output_file = output_dir / "evaluation_results.json"

export_data = {
    "company": sample_company,
    "year": sample_year,
    "persona": "Warren Buffett",
    "metrics": metrics.model_dump(),
    "alignment": {
        "score": alignment.alignment_score,
        "reasoning": alignment.alignment_reasoning,
        "evidence": alignment.philosophy_evidence,
        "gaps": alignment.philosophy_gaps
    },
    "questions_evaluated": len(evaluation_results['questions'])
}

with open(output_file, 'w', encoding='utf-8') as f:
    json.dump(export_data, f, indent=2, ensure_ascii=False)

print(f"💾 Results exported to: {output_file}")
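
Once the JSON file exists, you can read it back in a later session without rerunning the evaluation. The sketch below relies only on the keys written by the export cell above (`company`, `year`, `persona`, `alignment.score`, `questions_evaluated`). The helper `summarize_results` is hypothetical, and the sample values are made up so the block runs on its own.

```python
import json
import tempfile
from pathlib import Path


def summarize_results(path: Path) -> str:
    """Build a one-line summary from an exported evaluation file."""
    data = json.loads(path.read_text(encoding="utf-8"))
    return (
        f"{data['company']} ({data['year']}), {data['persona']}: "
        f"alignment {data['alignment']['score']:.1f}/10 "
        f"over {data['questions_evaluated']} questions"
    )


# Illustrative values only, not real evaluation output
sample_export = {
    "company": "acme",
    "year": "2023",
    "persona": "Warren Buffett",
    "metrics": {},
    "alignment": {"score": 7.5, "reasoning": "", "evidence": [], "gaps": []},
    "questions_evaluated": 10,
}

with tempfile.TemporaryDirectory() as tmp:
    results_path = Path(tmp) / "evaluation_results.json"
    results_path.write_text(json.dumps(sample_export, indent=2), encoding="utf-8")
    summary = summarize_results(results_path)

print(summary)
```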

## 🔄 Step 8: Complete Pipeline (Advanced)

For production use, the `InvestmentAnalysisPipeline` automates the entire process across multiple companies and personas.

In [None]:
# Company summaries for the pipeline
company_summaries = {
    sample_company: company_summary
}

# Initialize complete pipeline
pipeline = InvestmentAnalysisPipeline(
    doc_loader=doc_loader,
    report_generator=report_generator,
    report_evaluator=report_evaluator,
    output_dir=Path("output")
)

print("✅ Complete pipeline initialized!")
print("\nℹ️  The pipeline can:")
print("  - Generate reports for multiple companies")
print("  - Test multiple LLM models")
print("  - Evaluate across different personas")
print("  - Generate comprehensive comparison reports")
print("\n⚠️  Note: Full pipeline execution can take significant time!")

### 🚀 Run Small-Scale Pipeline Test

In [None]:
# Run pipeline for a small test
# Uncomment to execute (warning: this takes time!)

# pipeline.run_complete_analysis(
#     models=["gemini"],  # Just one model
#     personas=["Warren Buffett"],  # Just one persona
#     companies=[sample_company],  # Just one company
#     company_summaries=company_summaries,
#     years=[sample_year]  # Just one year
# )

print("ℹ️  Uncomment the code above to run the complete pipeline.")
print("   Expected runtime: 3-5 minutes for this single test case.")
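
Before scaling beyond the single test case, it helps to know how many runs you are asking for. The sketch below assumes the pipeline processes every combination of model, persona, company, and year one after another, and reuses the 3-5 minute per-case estimate quoted above. Both the helpers and the sequential, all-combinations behaviour are assumptions, not documented properties of `run_complete_analysis`.

```python
from itertools import product


def plan_runs(models, personas, companies, years):
    """List every (model, persona, company, year) combination as a dict."""
    return [
        {"model": m, "persona": p, "company": c, "year": y}
        for m, p, c, y in product(models, personas, companies, years)
    ]


def estimate_minutes(n_runs, per_run=(3, 5)):
    """Return a (low, high) runtime estimate in minutes for sequential runs."""
    low, high = per_run
    return n_runs * low, n_runs * high


# Hypothetical inputs for illustration
runs = plan_runs(
    models=["gemini", "openai"],
    personas=["Warren Buffett", "Benjamin Graham"],
    companies=["acme"],
    years=["2022", "2023"],
)
low, high = estimate_minutes(len(runs))
print(f"{len(runs)} runs, roughly {low}-{high} minutes")
```

The count grows multiplicatively, so adding one more model or persona can double the total runtime.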

## 📚 Step 9: Summary and Next Steps

### What We Covered:

1. ✅ **Loaded financial documents** from markdown files
2. ✅ **Defined investment personas** with distinct philosophies
3. ✅ **Generated AI investment reports** aligned with persona styles
4. ✅ **Created expert evaluation questions** based on investment philosophy
5. ✅ **Evaluated report coverage** and quality
6. ✅ **Computed aggregate metrics** including persona alignment
7. ✅ **Exported results** for further analysis

### Next Steps:

- 🔧 **Customize personas** - Add your own investment philosophies
- 📊 **Batch processing** - Evaluate multiple reports at once
- 🎯 **Fine-tune evaluation** - Adjust question generation parameters
- 📈 **Comparative analysis** - Compare different LLM models
- 🔬 **Optimize prompts** - Use DSPy optimization features

### Key Files Generated:

- `test_output/sample_report_*.md` - Generated investment report
- `test_output/evaluation_results.json` - Detailed evaluation metrics

---

**Happy evaluating! 🎉**

## 🔧 Appendix: Configuration Tips

### Environment Variables

Create a `.env` file with your API keys:

```bash
GEMINI_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here
ANTHROPIC_API_KEY=your_key_here
```

### Model Configuration

Edit these variables in `evaluating_llm_fin_reports.py`:

```python
PROVIDER = "gemini"  # Options: "openai", "gemini", "anthropic"
TASK_MODEL_TEMP = 0.0  # Low for deterministic extraction
REFLECTION_MODEL_TEMP = 1.0  # High for diverse reflections
```
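
A missing API key usually surfaces as a confusing error deep inside the first model call. As a sketch, you could check up front that the key matching your chosen provider is present in the environment. The variable names come from the `.env` example above; the helper `require_api_key`, and the assumption that each provider reads the same-named variable, are hypothetical. Keep in mind that Python does not read `.env` files by itself, so the values must already be in the process environment by the time this runs.

```python
import os

# Provider names from the configuration above, paired with the .env variable names
API_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def require_api_key(provider, env=None):
    """Return the env var name for `provider`, raising if the key is missing."""
    env = os.environ if env is None else env
    if provider not in API_KEY_VARS:
        raise ValueError(f"Unknown provider: {provider!r}")
    var_name = API_KEY_VARS[provider]
    if not env.get(var_name, "").strip():
        raise RuntimeError(f"{var_name} is not set; add it to your .env file")
    return var_name


# Passing a plain dict keeps the demo independent of your real environment
print(require_api_key("gemini", {"GEMINI_API_KEY": "your_key_here"}))
```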

### Directory Structure

```
evaluating_llm_fin_docs/
├── fin_docs/              # Input: Financial markdown files
├── output/                # Output: Generated reports
├── test_output/           # Output: Test results
└── evaluating_llm_fin_reports.py  # Main module
```