Reflecting on what stood out, then creating a SWOT for the agent.

## What stands out

1. Modular design worked well
   - Each analyzer is self-contained, easy to add frameworks, and follows a clear pattern that speeds iteration.

2. LLM prompts matter (a lot)
   - Initial SWOT produced only 2 of 4 categories.
   - Fixes: persona, JSON formatting requirements, explicit “all 4 required,” and examples of each category.
   - Lesson: be explicit and provide structure.

3. Real-world debugging vs. theory
   - JSON parsing had to handle code blocks and plain text.
   - Category name mismatches (e.g., pluralization) broke formatting.
   - Fallbacks were essential when analysis failed.

4. Incremental testing validated each step
   - SWOT → Porter → PESTEL → Ansoff → BCG → GE McKinsey.
   - All six ran end-to-end.

5. State management is straightforward with TypedDict
   - Central schema keeps data flow simple and type-safe.
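Here is a sketch of what such a central schema might look like. The fields `company_name`, `goal_type`, `geographic_region`, `competitors`, `raw_data`, and `current_template` appear in code elsewhere in this notebook. The `GoalType` members and the `insights` and `final_report` keys are hypothetical. `TypedDict` is enforced by static type checkers such as mypy, not at runtime.

```python
from enum import Enum
from typing import Any, Dict, List, Optional, TypedDict


class GoalType(Enum):
    # Hypothetical goal values; the real agent's goal list is not shown here
    COMPETITIVE_POSITION = "competitive_position"
    MARKET_ENTRY = "market_entry"


class BusinessAnalysisState(TypedDict, total=False):
    """Central state passed between graph nodes (field set is a sketch)."""
    company_name: str
    goal_type: GoalType
    geographic_region: Optional[str]
    competitors: Optional[List[str]]
    current_template: str            # e.g. "swot", "porter_five_forces"
    raw_data: List[Dict[str, Any]]   # collected search results
    insights: List[Dict[str, Any]]   # analyzer output (hypothetical key)
    final_report: str                # rendered report (hypothetical key)
```

With `total=False`, every key is optional, so nodes can return partial updates without the type checker complaining.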

Running a SWOT on the Business Analysis Agent.


## Meta-SWOT takeaway

Metacognitive exercise: analyzing the analysis tool. It highlights patterns that tend to emerge in system design.

### What stands out most

1. Modular beats monolithic
   - Adding six frameworks felt like plug-and-play. Solid architecture reduces complexity.

2. Explicit instructions improve LLM behavior
   - Prompts need to say “all 4 required,” “min 2 per category,” and include examples.

3. MVP constraints encouraged focus
   - Limited data sources (Tavily only) forced improvements to prompts and retry logic.

4. Incremental build-and-test exposed issues early
   - Iterating across six frameworks surfaced most bugs quickly.

5. Fallbacks are necessary
   - LLMs can fail quietly; graceful degradation matters.
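The JSON parsing and category-name fixes noted earlier can be combined with a simple fallback in a small helper. This is a minimal sketch, not the agent's actual code. `parse_llm_json`, `to_insights`, and the alias table are hypothetical names. An empty dict stands in for whatever fallback the real agent used.

```python
import json
import re
from typing import Any, Dict, List

# Hypothetical alias table: maps the plural keys an LLM tends to emit
# to the singular category names the report code expects
CATEGORY_ALIASES = {
    "strengths": "Strength",
    "weaknesses": "Weakness",
    "opportunities": "Opportunity",
    "threats": "Threat",
}

TICKS = "`" * 3  # markdown code-fence delimiter
# Matches a fenced block, with or without a "json" language tag
FENCE_RE = re.compile(TICKS + r"(?:json)?\s*(.*?)\s*" + TICKS, re.DOTALL)


def parse_llm_json(text: str) -> Dict[str, Any]:
    """Parse JSON from an LLM reply, whether or not it is wrapped in a code fence."""
    match = FENCE_RE.search(text)
    payload = match.group(1) if match else text.strip()
    try:
        return json.loads(payload)
    except json.JSONDecodeError:
        # Fallback: try the outermost {...} span (e.g. JSON surrounded by prose)
        start, end = payload.find("{"), payload.rfind("}")
        if start != -1 and end > start:
            try:
                return json.loads(payload[start:end + 1])
            except json.JSONDecodeError:
                pass
    return {}  # empty dict signals "analysis failed" to the caller


def to_insights(parsed: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten {"strengths": [...], ...} into insights with singular category names."""
    insights = []
    for key, items in parsed.items():
        category = CATEGORY_ALIASES.get(key.strip().lower(), key)
        for item in items or []:
            insights.append({**item, "category": category})
    return insights
```

Normalizing category names in one place means the report code only ever sees the singular form, whatever the LLM emitted.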

### Most surprising insight

The agent analyzes its own capabilities. The SWOT identifies:
- Its own limitations
- Where it may mislead
- What it needs to improve

This kind of self-assessment is a useful capability for AI systems.

Takeaway: Start simple, then iterate. Six frameworks in one session is a product of that design.

If starting over, I’d keep these core choices and change a few things:

Keep:
- Modular analyzers and step-by-step addition of frameworks
- LangGraph for state and execution
- Explicit JSON schemas and typed state

Change:
- Reduce the graph from 8+ nodes to 4 core nodes. Combine goal and template selection, and fold the separate review step into analysis:
  - `goal_and_template` (combine goal + template selection)
  - `collect_data`
  - `analyze` (with built-in quality check)
  - `generate_report`
- Drop retry/escalation logic for the MVP


## Summary

Top two changes if starting over:

### 1. Simplify the graph

We built 8+ nodes with retry/escalation/routing logic we didn’t use. Reduce to 4 nodes:
- setup → collect → analyze → report

The “review” node mostly checked category counts—better as inline checks.

### 2. Centralize prompts

Each analyzer duplicated prompts. Use a base class with framework-specific overrides so shared instructions (e.g., “return only valid JSON”) live in one place.

Keep as-is:
- Modular analyzers
- Incremental addition of frameworks
- Type-safe state
- LangGraph

## Principle

> Build the simplest thing that works, then add complexity only when needed.

The retry logic and escalation paths were speculative complexity we didn’t need. The analyzers did the real work; we should have optimized them sooner.

# SWOT Analysis: Business Analysis Agent

**Date:** October 28, 2025  
**Context:** Evaluation of the AI Business Analysis Agent we built

---

## Strengths

### 1. **Comprehensive Framework Coverage**
- All 6 major strategic analysis frameworks implemented
- Each framework properly templated with specific requirements
- Different frameworks serve different analytical purposes
- **Evidence:** Successfully tested SWOT, Porter's, PESTEL, Ansoff, BCG, and GE McKinsey

### 2. **Modular, Extensible Architecture**
- Clean separation of concerns: collectors, analyzers, nodes
- Each analyzer is self-contained and follows consistent patterns
- Easy to add new frameworks without touching existing code
- **Evidence:** Added 5 new frameworks without breaking previous ones

### 3. **Robust Prompt Engineering**
- Detailed LLM personas for each framework
- Explicit JSON output requirements
- Fallback logic when analysis fails
- **Evidence:** After fixing prompt issues, all 4 SWOT categories appeared consistently

### 4. **Quality Assurance Built-In**
- Review node validates completeness for each framework
- Confidence scoring system (0.0-1.0)
- Retry logic with iteration limits
- **Evidence:** The system caught missing categories and flagged low-confidence insights

### 5. **Human-Readable Output**
- Executive summaries tailored to each framework
- Framework-specific recommendations
- Professional markdown formatting
- **Evidence:** Reports are immediately usable by business professionals

### 6. **Data Collection Integration**
- Tavily search for external data
- Multiple targeted searches for comprehensive coverage
- Data summarization for LLM context
- **Evidence:** 15-21 data points collected per analysis

---

## Weaknesses

### 1. **Limited Data Sources (MVP Stage)**
- Currently only using Tavily for data collection
- No integration with financial APIs (Alpha Vantage planned but not implemented)
- No access to proprietary databases or industry reports
- **Impact:** Confidence scores may be lower due to limited data depth

### 2. **No Caching System**
- Every analysis triggers new API calls
- No learning from previous analyses
- Repeated analyses of same company incur same costs
- **Impact:** Higher API costs and slower performance for repeated queries

### 3. **Single-Template Execution**
- Agent runs one framework at a time
- No multi-framework analysis or framework chaining
- Can't combine insights across frameworks automatically
- **Impact:** Users must manually correlate findings from different frameworks

### 4. **Basic Quality Scoring**
- Confidence scores come from LLM self-assessment
- No objective data quality metrics
- No source credibility scoring beyond simple checks
- **Impact:** May not catch subtle data quality issues

### 5. **Limited Goal-Framework Variety**
- Some goals use the same framework (3 use SWOT, 3 use PESTEL)
- Not all possible goal-framework combinations exposed
- **Impact:** Less flexibility for users who want specific analysis types

### 6. **No Industry-Specific Specialization**
- Generic templates work across industries but lack deep vertical expertise
- No industry-specific KPIs or benchmarks
- No regulatory or compliance checking
- **Impact:** Analyses may miss industry-specific nuances

---

## Opportunities

### 1. **Additional Data Source Integration**
- Integrate Alpha Vantage for financial data
- Add Wikipedia for company background
- Add Bing News for recent developments
- Add industry-specific data providers
- **Impact:** Higher confidence scores, more actionable insights

### 2. **Advanced Caching and Memory**
- Cache API responses to reduce costs
- Learn from previous analyses to improve prompts
- Build a knowledge base of company facts
- **Impact:** Faster, cheaper, smarter analyses

### 3. **Multi-Framework Analysis**
- Allow chaining multiple frameworks in sequence
- Enable comparative analysis across frameworks
- Auto-select complementary frameworks
- **Impact:** More comprehensive strategic insights

### 4. **Visual Output Generation**
- Generate actual BCG Matrix charts
- Create Porter's Five Forces diagrams
- Build interactive dashboards
- **Impact:** Enhanced visual communication of insights

### 5. **Iterative Refinement Loop**
- Incorporate human feedback to improve analysis
- Ask clarifying questions when data is ambiguous
- Refine insights based on additional information
- **Impact:** More personalized and accurate analyses

### 6. **Industry-Specific Templates**
- Healthcare compliance checks
- Tech industry innovation metrics
- Retail market positioning analysis
- **Impact:** Domain-specific expertise and higher value

### 7. **Export and Integration**
- Export to PowerPoint presentations
- Integrate with BI tools (Tableau, Power BI)
- API for programmatic access
- **Impact:** Easier integration into existing workflows

### 8. **Explainability Features**
- Show source citations for each insight
- Trace evidence chains
- Explain reasoning behind recommendations
- **Impact:** Increased trust and verification capability

---

## Threats

### 1. **LLM Hallucination and Accuracy**
- LLMs may generate plausible but incorrect information
- Confidence scores may not reflect actual accuracy
- No human verification step built in
- **Impact:** Risk of incorrect strategic recommendations

### 2. **API Costs and Rate Limits**
- Tavily and OpenAI costs scale with usage
- Rate limits may affect performance under load
- No cost optimization beyond basic requests
- **Impact:** Operational cost concerns and potential service interruptions

### 3. **Data Recency Issues**
- Tavily search may not return most recent information
- No timestamp validation on collected data
- Stale information could lead to outdated insights
- **Impact:** Recommendations based on outdated market conditions
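The missing timestamp check could start as small as the filter below. It assumes each collected item carries an ISO-8601 `published_date` string; that field name is hypothetical. Items without a parseable date are kept but flagged, so the analyzer can lower its confidence instead of trusting them silently.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional


def filter_recent(
    items: List[Dict[str, Any]],
    max_age_days: int = 365,
    now: Optional[datetime] = None,
) -> List[Dict[str, Any]]:
    """Drop items older than max_age_days; flag items whose date can't be read."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    kept = []
    for item in items:
        raw = item.get("published_date")  # hypothetical field name
        try:
            published = datetime.fromisoformat(raw)
        except (TypeError, ValueError):  # missing or unparseable date
            kept.append({**item, "date_unverified": True})
            continue
        if published.tzinfo is None:
            published = published.replace(tzinfo=timezone.utc)  # assume UTC
        if published >= cutoff:
            kept.append(item)
    return kept
```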

### 4. **Limited Validation of LLM Outputs**
- No fact-checking against known databases
- No cross-referencing multiple LLM responses
- No verification of statistical claims
- **Impact:** Potential for false or misleading insights

### 5. **Competition from Established Tools**
- Consulting firms have proprietary frameworks
- Existing tools (e.g., PitchBook, CB Insights) have deep data
- Must differentiate on speed, cost, or ease of use
- **Impact:** Market positioning challenges

### 6. **Regulatory and Compliance Risks**
- Financial analysis may require certifications
- Data privacy concerns with external API usage
- Liability for incorrect business recommendations
- **Impact:** Legal and regulatory constraints

### 7. **User Expectations vs. Reality**
- Users may expect consulting-level insights
- May not understand limitations of automated analysis
- Over-reliance on AI without human oversight
- **Impact:** Potential misuse or disappointed users

### 8. **Technology Obsolescence**
- LLM capabilities evolving rapidly
- Need to stay current with latest models
- Framework may become outdated
- **Impact:** Continuous maintenance required

---

## Key Recommendations

### Immediate Actions (Next Sprint)
1. **Add Wikipedia integration** for basic company facts
2. **Implement basic caching** for repeated company analyses
3. **Add source citations** to improve explainability
4. **Expand goal-framework mapping** for more variety

### Medium-Term (Next Quarter)
1. **Integrate Alpha Vantage** for financial data
2. **Build multi-framework analysis** capability
3. **Create visual output generators** (charts, diagrams)
4. **Add iterative refinement** with human feedback

### Long-Term (Roadmap)
1. **Industry-specific specializations** for key verticals
2. **Advanced validation systems** for fact-checking
3. **Cost optimization** through smart caching and model selection
4. **Enterprise features** (API, integrations, compliance)

---

## Conclusion

The Business Analysis Agent is a **strong MVP** with excellent foundational architecture. Its modular design enables rapid expansion, and built-in quality checks catch incomplete outputs. The main gaps are in data richness and advanced features. With the recommended improvements, this could become a competitive business intelligence tool.

**Overall Assessment:** Ready for beta testing with clear path to enhanced capabilities.

**Confidence Score:** 0.85 (High confidence in current implementation, moderate confidence in ability to scale to enterprise needs without additional resources)



# If I Had to Start Over: Lessons Learned

## Approach Assessment

### ✅ What I Would Keep

1. **Modular Analyzer Architecture** - This was perfect. Each analyzer as a self-contained class made adding frameworks trivial.

2. **Explicit Template Selection** - Goal-to-framework mapping worked great and is easily configurable.

3. **Type-Safe State Management** - TypedDict was the right choice for clarity and type checking.

4. **Incremental Development** - Building one framework at a time and testing each was smart.

5. **LangGraph** - Overall good choice for structured workflows, though could be simpler.

---

### 🔄 What I Would Do Differently

#### 1. **Simplify the Graph Structure** ⭐ MOST IMPORTANT

**Current:** 8 nodes with conditional routing, retry logic, escalation paths
```
# Too complex for MVP
set_goal → select_templates → collect_data → analyze → review →
  (conditional routing) → retry_increment/escalate/generate_report
```

**Better:** 4 simple nodes, linear flow
```
# Simpler and cleaner
combine_goal_and_template → collect_data → analyze → generate_report
```

**Why:**
- We never actually used the retry logic in practice
- Escalation to human isn't needed for MVP
- More nodes = more places for bugs
- The "review" node just checked if we had X categories - could be inline
- KISS principle - we added complexity we didn't use

**Evidence:** The successful runs didn't use any of the "advanced" routing. Analysis either worked on first try or we fixed the code.

---

#### 2. **Centralize Prompt Templates** ⭐ SECOND MOST IMPORTANT

**Current:** Each analyzer has its own prompt with 90% duplicate code
```python
# Lots of repetition across 6 analyzers
self.swot_prompt = ChatPromptTemplate.from_messages([...])
self.porter_prompt = ChatPromptTemplate.from_messages([...])
# etc. - lots of duplicated structure
```

**Better:** Base prompt class with framework-specific overrides
```python
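from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI


class BaseAnalyzer:
    """Shared prompt scaffolding; subclasses override only what differs."""

    def __init__(self):
        self.llm = ChatOpenAI(...)  # model settings elided
        self.base_prompt = self._create_base_prompt()

    def _create_base_prompt(self) -> ChatPromptTemplate:
        # Common structure for all frameworks
        return ChatPromptTemplate.from_messages([
            ("system", self._get_persona()),
            ("user", self._get_prompt_template()),
        ])

    def _get_persona(self) -> str:
        # Framework-specific subclasses override this if needed
        return "You are an expert business analyst..."

    def _get_prompt_template(self) -> str:
        return """Company: {company}..."""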
```

**Why:**
- Less code duplication
- Easier to update common elements (e.g., JSON format requirements)
- Change persona once, not in 6 places
- Framework-specific analyzers just override what's different

**Evidence:** I had to add "CRITICAL: Return ONLY valid JSON" to every single prompt. Should be in one place.

---

#### 3. **Use Environment Variables from the Start**

**Current:** Had to update `API_KEYS.env` and load it manually

**Better:** Use `python-dotenv` from the start
```python
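import os
from dotenv import load_dotenv

load_dotenv()  # finds ".env"; for another name: load_dotenv("API_KEYS.env")

# Then just use os.getenv() everywhere
openai_key = os.getenv("OPENAI_API_KEY")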
```

**Why:**
- Industry standard
- Easier for deployment
- Less code in config loading

---

#### 4. **Start with More Data Sources** (Controversial)

**Current:** Tavily only, others "for later"

**Better:** Include at least Wikipedia from day 1

**Why:**
- Wikipedia gives free, reliable company basics
- Helps when Tavily returns irrelevant results
- Very simple integration
- Provides fallback data

**Evidence:** Some analyses had lower confidence scores due to data limitations.

---

#### 5. **Pre-define Data Collection Queries**

**Current:** Generic searches like `"{company} {region} {goal}"`

**Better:** Framework-specific search strategies
```python
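# Plain strings with a {company} placeholder, not f-strings: an f-string is
# evaluated once at definition time, so `company` would have to exist already
SEARCH_STRATEGIES = {
    "swot": [
        "{company} strengths competitive advantages",
        "{company} weaknesses challenges problems",
        "{company} opportunities market growth",
        "{company} threats risks competition",
    ],
    "porter_five_forces": [
        "{company} industry competition analysis",
        "{company} barriers to entry market",
        # etc.
    ],
}


def build_queries(framework: str, company: str) -> list[str]:
    """Fill in the company name for a framework's search strategy."""
    return [q.format(company=company) for q in SEARCH_STRATEGIES[framework]]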
```

**Why:**
- More targeted data collection
- Higher quality sources
- Less noise to filter through

---

#### 6. **Validation as Utilities, Not a Node**

**Current:** Separate "review" node that checks completeness

**Better:** Inline validation in the analyze node
```python
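# Required categories per framework (only SWOT shown)
REQUIRED_CATEGORIES = {
    "swot": {"Strength", "Weakness", "Opportunity", "Threat"},
}

def analyze_and_synthesize(state):
    template = state["current_template"]  # assumed state key
    insights = ANALYZERS[template].analyze(state)  # ANALYZERS: hypothetical {id: analyzer}

    # Inline validation instead of a separate review node
    required = REQUIRED_CATEGORIES.get(template, set())
    found = {i.get("category") for i in insights}
    missing = required - found
    if missing:
        print(f"⚠️ Missing categories: {missing}")
        # Fix it here (e.g., re-prompt once); don't route to another node

    # LangGraph nodes return a partial state update
    return {"insights": insights}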
```

**Why:**
- Linear flow is easier to debug
- One less node to maintain
- One fewer graph step per run (a minor saving, since LLM calls dominate runtime)

---

#### 7. **Template-Based Report Generation**

**Current:** Massive if/elif chains in `report_node.py` handling all 6 frameworks

**Better:** Use Jinja2 templates
```jinja
{# templates/report.md.j2 #}
{# format_insight is a custom filter that must be registered on the Environment #}
# {{ framework_name }} Analysis Report

{% for category in categories %}
### {{ category }}
{% for insight in insights_by_category[category] %}
{{ insight | format_insight(framework=current_template) }}
{% endfor %}
{% endfor %}
```

**Why:**
- Cleaner code
- Easier to modify output format
- Reusable across frameworks
- Separates logic from presentation
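Here is a minimal rendering sketch using Jinja2's `Environment`. The template is inlined as a string; a file would normally be loaded with a `FileSystemLoader`. The `format_insight` filter body and the `render_report` helper are hypothetical. Custom filters must be added to `env.filters` before a template can use them. `trim_blocks` and `lstrip_blocks` stop the `{% %}` tags from leaving stray blank lines.

```python
from jinja2 import Environment

REPORT_TEMPLATE = """# {{ framework_name }} Analysis Report
{% for category in categories %}
### {{ category }}
{% for insight in insights_by_category[category] %}
{{ insight | format_insight(framework=current_template) }}
{% endfor %}
{% endfor %}"""


def format_insight(insight: dict, framework: str) -> str:
    """Custom filter: one markdown bullet per insight (format is illustrative)."""
    line = f"- {insight['insight']} (confidence {insight['confidence']:.2f})"
    if framework == "porter_five_forces" and "rating" in insight:
        line += f" [rating: {insight['rating']}]"
    return line


def render_report(framework_name: str, template_id: str, insights: list) -> str:
    env = Environment(trim_blocks=True, lstrip_blocks=True)
    env.filters["format_insight"] = format_insight  # register the custom filter
    categories = list(dict.fromkeys(i["category"] for i in insights))  # keep order
    by_category = {c: [i for i in insights if i["category"] == c] for c in categories}
    return env.from_string(REPORT_TEMPLATE).render(
        framework_name=framework_name,
        categories=categories,
        insights_by_category=by_category,
        current_template=template_id,
    )
```

Framework differences live in the filter (or in per-framework templates), and the report node itself shrinks to a single `render_report` call.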

---

### 🤔 What I'm Unsure About

#### Single-Analyzer vs. Multi-Analyzer Classes

**Option A (Current):** One class per framework
- Pros: Clear separation, easy to reason about
- Cons: 6 files with similar code

**Option B (Alternative):** Single analyzer with framework parameter
```python
class MultiFrameworkAnalyzer:
    def analyze(self, framework: str, state: BusinessAnalysisState):
        prompt = self._get_prompt(framework)
        schema = self._get_schema(framework)
        # etc.
```
- Pros: Less code duplication
- Cons: More complex class, harder to customize per framework

**Verdict:** I think keeping them separate is probably better for long-term maintainability.

---

## The Ideal Simplified Architecture

```python
# agents/simple_agent.py
from langgraph.graph import END, START, StateGraph

# BusinessAnalysisState and the four node functions are defined elsewhere


def create_simple_agent():
    workflow = StateGraph(BusinessAnalysisState)

    # Just 4 nodes
    workflow.add_node("setup", combine_goal_and_template)
    workflow.add_node("collect", collect_analysis_data)
    workflow.add_node("analyze", analyze_with_inline_validation)
    workflow.add_node("report", generate_final_report)

    # Linear flow (the graph needs an entry edge from START to compile)
    workflow.add_edge(START, "setup")
    workflow.add_edge("setup", "collect")
    workflow.add_edge("collect", "analyze")
    workflow.add_edge("analyze", "report")
    workflow.add_edge("report", END)

    return workflow.compile()
```

## Key Insight

**The MVP didn't need retry logic, escalation paths, or complex routing.** We built for problems we didn't have yet. The successful implementation was much simpler than the initial design suggested we needed.

The principle: **Build the simplest thing that works, then add complexity only when you need it.**

---

## Retrospective Rating

| Aspect | Current Approach | Ideal Approach | Priority |
|--------|-----------------|----------------|----------|
| Modular analyzers | ✅ Excellent | Same | ⭐ Keep |
| Graph complexity | ⚠️ Over-engineered | ✅ Simpler | 🔴 High |
| Prompt management | ⚠️ Duplicated | ✅ Centralized | 🟡 Medium |
| Data sources | ⚠️ Too few | ✅ More from start | 🟡 Medium |
| Report generation | ⚠️ Hard-coded | ✅ Templated | 🟢 Low |
| Validation approach | ⚠️ Separate node | ✅ Inline | 🟡 Medium |

## Conclusion

The biggest change I would make: **Simplify the graph from 8+ nodes to 4 core nodes** and **remove all the retry/escalation logic** that never got used. The modular analyzer architecture was spot-on and enabled rapid iteration.

The second biggest: **Centralize prompt templates** to reduce duplication and make it easier to update common requirements (like JSON formatting).

Everything else was pretty solid for a first iteration!



# Centralized Prompt Pattern - Concrete Example

## Problem We Have Now

**Every analyzer repeats the same structure:**

```python
# swot_analyzer.py
self.swot_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are an expert strategic business analyst with 15+ years of experience...
     [300 lines of persona and instructions]
     
     CRITICAL: Return ONLY valid JSON. No markdown...
     Format your response as JSON with this EXACT structure...
     """),
    ("user", """Company: {company}
Goal: {goal}
...
""")
])

# porter_analyzer.py  
self.porter_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are an expert competitive intelligence analyst with 15+ years of experience...
     [300 lines of almost identical persona and instructions]
     
     CRITICAL: Return ONLY valid JSON. No markdown...
     Format your response as JSON with this EXACT structure...
     """),
    ("user", """Company: {company}
Goal: {goal}
...
""")
])

# Repeat 6 times with minor variations...
```

**Problems:**
- Want to change "Return ONLY valid JSON"? Update 6 files
- Want to update output structure? Update 6 places
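- Near-identical text maintained in 6 places (centralizing removes the code duplication, but each request still sends the full prompt, so token usage stays the same)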
- Easy to miss an update in one analyzer

---

## Solution: Centralized Prompt Base

### 1. Create a Base Prompt Class

```python
# agents/prompts/base_prompt.py
from langchain_core.prompts import ChatPromptTemplate
from typing import Dict, Any

class BaseAnalyzerPrompt:
    """
    Base class for all analyzer prompts.
    Provides common structure and instructions.
    """
    
    # Shared persona components
    BASE_PERSONA = """You are an expert business analyst with 15+ years of experience in strategic analysis, market research, and corporate strategy. You specialize in conducting thorough business analyses for investors, executives, and strategic decision-makers.

Your expertise includes:
- Analyzing company performance and market positioning
- Identifying competitive advantages and vulnerabilities  
- Assessing market opportunities and industry threats
- Evaluating strategic options with evidence-based insights"""

    # Common output requirements (used by ALL frameworks)
    OUTPUT_REQUIREMENTS = """
    
CRITICAL: Return ONLY valid JSON. No markdown, no code blocks, no explanations. Just raw JSON.

Guidelines:
- Be specific and actionable in your insights
- Use concrete examples from the data
- Prioritize strategic importance over quantity
- If data is limited, acknowledge it with lower confidence scores
- Focus on insights that matter for decision-making"""

    # Common user template
    USER_TEMPLATE = """Company: {company}
Goal: {goal}
Region: {region}
Competitors: {competitors}

Company Data:
{data}

Generate the {framework_name} analysis:"""

    def __init__(self, framework_name: str, framework_specific_instructions: str):
        """
        Args:
            framework_name: e.g., "SWOT", "Porter's Five Forces"
            framework_specific_instructions: Instructions unique to this framework
        """
        self.framework_name = framework_name
        self.framework_specific = framework_specific_instructions
        
        # Build the prompt
        self.prompt = self._build_prompt()
    
    def _build_prompt(self) -> ChatPromptTemplate:
        """Construct the complete prompt with all components"""
        
        # Combine all system message components
        system_message = (
            self.BASE_PERSONA + "\n\n" +
            f"You are conducting a {self.framework_name} analysis.\n\n" +
            self.framework_specific + "\n\n" +
            self.OUTPUT_REQUIREMENTS
        )
        
        return ChatPromptTemplate.from_messages([
            ("system", system_message),
            ("user", self.USER_TEMPLATE)
        ])
    
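    def format(self, **kwargs) -> list:
        """Format the prompt into chat messages (system + user)"""
        # format_messages() keeps the system/user roles; format() would flatten
        # them into one string that the LLM receives as a single user message
        return self.prompt.format_messages(
            company=kwargs.get("company"),
            goal=kwargs.get("goal"),
            region=kwargs.get("region", "Global"),
            competitors=kwargs.get("competitors", "Not specified"),
            data=kwargs.get("data"),
            framework_name=self.framework_name
        )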
```

---

### 2. Framework-Specific Prompts (Much Shorter!)

```python
# agents/prompts/swot_prompt.py
from agents.prompts.base_prompt import BaseAnalyzerPrompt

class SWOTPrompt(BaseAnalyzerPrompt):
    """
    SWOT-specific prompt. Only defines what's different.
    """
    
    def __init__(self):
        # Only the SWOT-specific instructions!
        swot_instructions = """Analyze the provided company information and generate a comprehensive SWOT analysis.

REQUIRED: You MUST provide insights for ALL FOUR categories (minimum 2 items per category):
- Strengths: 2-4 key internal advantages and capabilities
- Weaknesses: 2-4 key internal limitations and vulnerabilities
- Opportunities: 2-4 key external trends, markets, or situations that could benefit the company
- Threats: 2-4 key external risks, competitors, or challenges that could harm the company

For each item, provide:
1. A clear, concise insight
2. Supporting evidence from the data
3. A confidence score (0.0-1.0)
4. An impact level (Very Low to Very High)

Format your response as JSON with this structure:
{{
    "strengths": [
        {{"insight": "...", "evidence": "...", "confidence": 0.8, "impact": "High"}}
    ],
    "weaknesses": [
        {{"insight": "...", "evidence": "...", "confidence": 0.6, "impact": "Moderate"}}
    ],
    "opportunities": [
        {{"insight": "...", "evidence": "...", "confidence": 0.7, "impact": "High"}}
    ],
    "threats": [
        {{"insight": "...", "evidence": "...", "confidence": 0.7, "impact": "High"}}
    ]
}}"""
        
        super().__init__(
            framework_name="SWOT",
            framework_specific_instructions=swot_instructions
        )


# agents/prompts/porter_prompt.py
from agents.prompts.base_prompt import BaseAnalyzerPrompt

class PorterPrompt(BaseAnalyzerPrompt):
    """Porter's Five Forces prompt. Only defines what's different."""
    
    def __init__(self):
        porter_instructions = """Analyze the provided company and industry information and generate a comprehensive Porter's Five Forces analysis.

REQUIRED: You MUST provide insights for ALL FIVE forces:
- Competitive Rivalry
- Threat of New Entrants
- Threat of Substitutes
- Bargaining Power of Suppliers
- Bargaining Power of Buyers

For each force, provide:
1. A clear, concise insight
2. A rating (Low, Moderate, High)
3. Supporting evidence
4. Confidence score (0.0-1.0)
5. Impact level

Format your response as JSON with this structure:
{{
    "competitive_rivalry": [
        {{"insight": "...", "rating": "High", "evidence": "...", "confidence": 0.8, "impact": "High"}}
    ],
    ... [other forces]
}}"""
        
        super().__init__(
            framework_name="Porter's Five Forces",
            framework_specific_instructions=porter_instructions
        )
```

---

### 3. Updated Analyzers (Much Cleaner!)

```python
# agents/analyzers/swot_analyzer.py
from typing import Dict, List

from langchain_openai import ChatOpenAI

from agents.prompts.swot_prompt import SWOTPrompt
# BusinessAnalysisState is imported from the project's state module (path not shown)


class SWOTAnalyzer:
    def __init__(self):
        self.llm = ChatOpenAI(...)  # model settings elided

        # Just instantiate the prompt - that's it!
        self.prompt = SWOTPrompt()

    def analyze(self, state: BusinessAnalysisState) -> List[Dict]:
        # The prompt class handles all the formatting
        messages = self.prompt.format(
            company=state["company_name"],
            goal=state["goal_type"].value,
            region=state.get("geographic_region", "Global"),
            competitors=", ".join(state.get("competitors", []) or []),
            data=self._summarize_data(state["raw_data"])
        )

        response = self.llm.invoke(messages)
        # ... parse and return
```

---

## Benefits

### 1. **Single Source of Truth for Common Elements**

```python
# Want to change JSON requirement? One place:
class BaseAnalyzerPrompt:
    OUTPUT_REQUIREMENTS = """
    
CRITICAL: Return ONLY valid JSON. No markdown, no code blocks, no explanations.
Enhancement: Give confidence scores with two decimal places (e.g., 0.75, 0.80) for consistency.
"""
```

Update once and all 6 frameworks pick up the change.

---

### 2. **Consistent Persona Across All Frameworks**

```python
# Want to change expertise level? One place:
class BaseAnalyzerPrompt:
    BASE_PERSONA = """You are an expert business analyst with 20+ years of experience..."""
    # All analyzers get the update
```

---

### 3. **Easy to Add New Frameworks**

```python
# Adding a new framework is now just 20 lines instead of 100+
class NewFrameworkPrompt(BaseAnalyzerPrompt):
    def __init__(self):
        instructions = """[Just the framework-specific requirements]"""
        super().__init__("New Framework", instructions)
```

---

### 4. **Reusable Components for Common Patterns**

```python
# agents/prompts/base_prompt.py - Add reusable helpers

class BaseAnalyzerPrompt:
    @staticmethod
    def confidence_guidance() -> str:
        """Reusable confidence score guidance"""
        return """
Confidence scores (0.0-1.0):
- 0.7-1.0: High confidence (strong evidence, multiple sources)
- 0.4-0.7: Moderate confidence (adequate evidence)
- 0.0-0.4: Low confidence (limited evidence, needs more data)"""
    
    @staticmethod
    def impact_guidance() -> str:
        """Reusable impact level guidance"""
        return """
Impact levels:
- Very High: Critical to success/failure
- High: Significantly affects competitive position
- Moderate: Noticeable impact
- Low: Limited strategic relevance
- Very Low: Minimal impact"""
```

Now any framework can use these:

```python
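class NewFrameworkPrompt(BaseAnalyzerPrompt):
    def __init__(self):
        # The helpers are staticmethods, so they work before super().__init__()
        instructions = f"""
    Analyze and provide:
    1. Insight with evidence
    2. Confidence score: {self.confidence_guidance()}
    3. Impact level: {self.impact_guidance()}
    """
        # In an f-string, a JSON brace meant for the prompt template must be
        # written {{{{ }}}}: the f-string turns it into {{, the template into {
        super().__init__("New Framework", instructions)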
```

---

## Advanced Pattern: Prompt Registry

For even more flexibility, create a registry:

```python
# agents/prompts/__init__.py
from agents.prompts.base_prompt import BaseAnalyzerPrompt  # needed for the return annotation
from agents.prompts.swot_prompt import SWOTPrompt
from agents.prompts.porter_prompt import PorterPrompt

PROMPT_REGISTRY = {
    "swot": SWOTPrompt,
    "porter_five_forces": PorterPrompt,
    # ... etc
}

def get_prompt(framework_id: str) -> BaseAnalyzerPrompt:
    """Factory function to get prompt by ID"""
    prompt_class = PROMPT_REGISTRY.get(framework_id)
    if prompt_class:
        return prompt_class()
    raise ValueError(f"No prompt found for framework: {framework_id}")


# agents/analyzers/swot_analyzer.py
# Now analyzers don't need to know which prompt class to import:
from agents.prompts import get_prompt

class SWOTAnalyzer:
    def __init__(self):
        self.llm = ChatOpenAI(...)
        self.prompt = get_prompt("swot")  # Clean!
```

---

## Migration Path

### Phase 1: Create Base Class (No Changes to Existing Analyzers)
```python
# Add base_prompt.py
# Existing analyzers still work independently
```

### Phase 2: Migrate One Framework
```python
# Migrate SWOT first
# Test thoroughly
# All other frameworks unchanged
```

### Phase 3: Migrate Remaining Frameworks
```python
# One by one
# Easy to rollback if issues
```

---

## Answer to Your Question

> "Should we be reusing the persona pattern - save it as the pattern then reference it in the prompt?"

**Exactly!** The persona becomes:
1. A **class constant** in the base prompt class
2. Automatically included in all framework-specific prompts
3. Referenced via inheritance, not copied

The pattern is:
```
BaseAnalyzerPrompt (has persona)
    ↓ extends
SWOTPrompt (has framework-specific instructions)
    ↓ used by
SWOTAnalyzer (just calls it)
```

This is the **DRY (Don't Repeat Yourself)** principle applied to LLM prompts.

---

## Summary

| Current Approach | Centralized Approach | Benefit |
|-----------------|---------------------|---------|
| 300 lines per analyzer | 20 lines per analyzer | 15x less code |
| Update 6 files | Update 1 file | Maintainability |
| Inconsistent formatting | Consistent across all | Quality |
| Hard to add frameworks | Easy to add | Scalability |

**Bottom line:** Yes, save the persona (and all other common elements) as a reusable pattern that is referenced, not copied. That is the difference between maintainable code and technical debt.



## Five insights for the next build

### 1. **Structure testing early**
Add `pytest` from day one. Manually running the agent slows iteration and misses edge cases. Unit tests for each analyzer save hours later.

### 2. **Logging and observability**
Make logs searchable and structured; add context and IDs. A short deployment log helps surface issues quickly.

### 3. **Mock data**
Use in-memory fixtures during development. Avoid API costs while coding, test offline, and simulate failures.

### 4. **Output validation with Pydantic**
Enforce schemas. Fail fast on malformed responses with clear errors.

### 5. **Progressive data sources**
Enable fallbacks and caching. If one source is down, others continue.

### Bonus
Cost tracking prevents billing surprises, and configuration management avoids hardcoded values.

## Principle
**Build for observability from the start.** You can’t fix what you can’t see. Logging, tests, and validation make debugging faster.

These tools compound over time—early effort pays off as the agent grows.

# Additional Insights for Agent Development

## Beyond Simplified Architecture and Centralized Prompts

These are the "lessons learned" that would make the next agent development cycle even better:

---

## 1. **Structured Testing from Day 1** ⭐ CRITICAL

### Problem We Had
- Tested manually by running the full agent each time
- No way to test analyzers in isolation
- No validation that reports format correctly
- Hard to reproduce failures

### Better Approach
```python
# tests/test_swot_analyzer.py
import pytest
from agents.analyzers.swot_analyzer import SWOTAnalyzer
from agents.state import GoalType  # hypothetical path: import GoalType from wherever your state schema lives

REQUIRED_CATEGORIES = {"Strength", "Weakness", "Opportunity", "Threat"}

def test_swot_returns_all_4_categories():
    # Note: this still calls the real LLM; inject a fake LLM to make it fast and deterministic
    analyzer = SWOTAnalyzer()
    mock_data = [
        {"title": "Test", "content": "Company has strengths and weaknesses"}
    ]

    mock_state = {
        "company_name": "Tesla",
        "raw_data": mock_data,
        "goal_type": GoalType.INVESTMENT_DUE_DILIGENCE,
    }

    result = analyzer.analyze(mock_state)

    categories = {insight.get("category") for insight in result}
    missing = REQUIRED_CATEGORIES - categories
    assert not missing, f"Missing SWOT categories: {missing}"

@pytest.mark.skip(reason="TODO: mock the LLM to return bad JSON and assert the fallback is used")
def test_swot_handles_malformed_json():
    pass
```

**Why:** Catches issues before you run the full agent. Provides regression tests when you refactor.
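The malformed-JSON test above is still a stub, and the 4-category test calls the real LLM. A common way to make both fast and deterministic is to inject the LLM as a plain callable so a test can pass a fake. The sketch below uses a hypothetical `ToySWOTAnalyzer`, not the project's `SWOTAnalyzer`, and assumes the LLM returns a JSON object keyed by singular category name.

```python
import json
from typing import Callable, Dict, List

CATEGORIES = ("Strength", "Weakness", "Opportunity", "Threat")


class ToySWOTAnalyzer:
    """Hypothetical stand-in: takes the LLM as a callable so tests can inject a fake."""

    def __init__(self, llm: Callable[[str], str]):
        self.llm = llm

    def analyze(self, state: Dict) -> List[Dict]:
        raw = self.llm(f"SWOT for {state['company_name']}")
        try:
            parsed = json.loads(raw)
            return [{"category": c, **item} for c in CATEGORIES for item in parsed[c]]
        except (json.JSONDecodeError, KeyError, TypeError):
            # Bad JSON, a missing category, or the wrong shape all degrade to the fallback.
            return self._fallback(state)

    def _fallback(self, state: Dict) -> List[Dict]:
        # One zero-confidence placeholder per category so downstream formatting still works.
        return [
            {"category": c, "insight": f"Insufficient data ({state['company_name']})", "confidence": 0.0}
            for c in CATEGORIES
        ]


# --- tests: pytest collects any function named test_* ---

def test_all_4_categories_with_valid_json():
    payload = {c: [{"insight": f"{c} insight"}] for c in CATEGORIES}
    analyzer = ToySWOTAnalyzer(llm=lambda prompt: json.dumps(payload))
    categories = {i["category"] for i in analyzer.analyze({"company_name": "Tesla"})}
    assert categories == set(CATEGORIES)


def test_malformed_json_uses_fallback():
    analyzer = ToySWOTAnalyzer(llm=lambda prompt: "Sure! Here is your SWOT: {oops")
    insights = analyzer.analyze({"company_name": "Tesla"})
    assert {i["category"] for i in insights} == set(CATEGORIES)
    assert all(i["confidence"] == 0.0 for i in insights)
```

These tests run in milliseconds with no API key, and the fake can return exactly the kind of broken output that caused trouble in real runs.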

---

## 2. **Better Logging and Observability**

### Problem We Had
```python
# Only print statements
print(f"📊 Analysis categories found: {list(analysis_result.keys())}")
```

### Better Approach
```python
import logging
from contextlib import contextmanager

logger = logging.getLogger(__name__)

class AnalysisLogger:
    """Structured logging for analysis pipeline"""
    
    @contextmanager
    def log_step(self, step_name: str, context: dict):
        """Log start and end of a step"""
        logger.info(f"Starting {step_name}", extra=context)
        try:
            yield
            logger.info(f"Completed {step_name}", extra=context)
        except Exception as e:
            logger.error(f"Failed {step_name}: {e}", extra=context, exc_info=True)
            raise
    
    def log_insight_count(self, framework: str, count: int):
        logger.info("insight_count", extra={
            "framework": framework,
            "count": count,
            "metric_type": "counter"
        })

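# Usage (a different name, so the module-level `logger` used inside the class isn't shadowed)
analysis_logger = AnalysisLogger()

with analysis_logger.log_step("swot_analysis", {"company": company}):
    insights = analyzer.analyze(state)

analysis_logger.log_insight_count("swot", len(insights))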
```

**Why:**
- Can track performance over time
- Easier debugging with context
- Can build dashboards/monitoring
- Find bottlenecks in the pipeline
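One catch with `AnalysisLogger`: Python's default `logging.Formatter` only prints fields named in its format string, so values passed through `extra=` (`company`, `framework`, `count`) never appear in the output unless a formatter emits them. A minimal sketch of a JSON formatter that does, plus a filter that stamps a run ID on every record (both class names are hypothetical):

```python
import json
import logging
import uuid

# Attribute names every LogRecord carries; anything else arrived via `extra=`.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", logging.INFO, "", 0, "", None, None)))
_STANDARD_ATTRS |= {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON line, including custom `extra=` fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        payload.update({k: v for k, v in vars(record).items() if k not in _STANDARD_ATTRS})
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        # default=str keeps non-JSON values (datetimes, enums) from crashing logging
        return json.dumps(payload, default=str)


class RunIdFilter(logging.Filter):
    """Attach the same run_id to every record that passes through a handler."""

    def __init__(self, run_id: str):
        super().__init__()
        self.run_id = run_id

    def filter(self, record: logging.LogRecord) -> bool:
        record.run_id = self.run_id
        return True


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    handler.addFilter(RunIdFilter(uuid.uuid4().hex[:8]))
    log = logging.getLogger("analysis")
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    log.info("insight_count", extra={"framework": "swot", "count": 8})
```

With one JSON object per line, a log search for a single `run_id` reconstructs one analysis run end to end.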

---

## 3. **Mock Data for Development**

### Problem We Had
- Had to make real API calls to Tavily every time
- Cost money and time
- Couldn't test offline

### Better Approach
```python
# agents/data_collectors/base_collector.py
class MockDataCollector:
    """Returns canned responses for development"""
    
    def collect_company_data(self, company: str, region: str, goal: str):
        return [
            {
                "title": f"{company} Overview",
                "content": f"{company} is a leading company in the industry with strong market position...",
                "url": "https://example.com",
                "source": "mock"
            },
            # ... more realistic mock data
        ]

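# config.py
import os

# Set USE_MOCK_DATA=true to develop offline; TavilyDataCollector is the project's real collector
if os.getenv("USE_MOCK_DATA", "").lower() == "true":
    COLLECTOR = MockDataCollector()
else:
    COLLECTOR = TavilyDataCollector()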
```

**Why:**
- Fast iteration during development
- No API costs while coding
- Reproducible test cases
- Can simulate edge cases (empty results, malformed data)
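`MockDataCollector` only covers the happy path. To exercise the edge cases listed above, one option is a mock that replays a named scenario, paired with a consumer that has to cope with each one. `ScenarioMockCollector`, `usable_records`, and the scenario names are hypothetical; the `Protocol` just documents the method shape the mock and the real collector are assumed to share.

```python
from typing import Dict, List, Protocol


class DataCollector(Protocol):
    """Shape shared by the real collector and the mocks (assumed signature)."""

    def collect_company_data(self, company: str, region: str, goal: str) -> List[Dict]: ...


class ScenarioMockCollector:
    """Hypothetical mock that replays one named scenario, including failure modes."""

    SCENARIOS = ("normal", "empty", "malformed", "error")

    def __init__(self, scenario: str = "normal") -> None:
        if scenario not in self.SCENARIOS:
            raise ValueError(f"Unknown scenario: {scenario}")
        self.scenario = scenario

    def collect_company_data(self, company: str, region: str, goal: str) -> List[Dict]:
        if self.scenario == "error":
            raise ConnectionError("Simulated search API outage")
        if self.scenario == "empty":
            return []
        if self.scenario == "malformed":
            # Missing "content" and a null title
            return [{"title": None, "url": "https://example.com"}]
        return [{
            "title": f"{company} Overview",
            "content": f"{company} is a leading company in {region}.",
            "url": "https://example.com",
            "source": "mock",
        }]


def usable_records(collector: DataCollector, company: str, region: str, goal: str) -> List[Dict]:
    """Keep records that have text to analyze; treat a connection failure as no data."""
    try:
        records = collector.collect_company_data(company, region, goal)
    except ConnectionError:
        return []
    return [r for r in records if r.get("title") and r.get("content")]
```

Each scenario becomes a one-line test case, so the "empty results" and "malformed data" paths get checked on every run instead of only when a real API misbehaves.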

---

## 4. **Cost Tracking**

### Problem We Had
- No idea how much each analysis costs
- Can't optimize expensive calls
- Surprise bills

### Better Approach
```python
# agents/utils/cost_tracker.py
class CostTracker:
    def __init__(self):
        self.costs = {
            "tavily": {"count": 0, "estimated_cost": 0},
            "openai": {"tokens": 0, "estimated_cost": 0}
        }
    
    def track_tavily_call(self, results: int):
        self.costs["tavily"]["count"] += 1
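        # Assumed rate (~$0.10 per 100 results) for illustration; check Tavily's current
        # pricing, which may be billed per request/credit rather than per result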
        self.costs["tavily"]["estimated_cost"] += (results / 100) * 0.10
    
    def track_openai_tokens(self, prompt_tokens: int, completion_tokens: int):
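        # gpt-4o-mini list prices per 1M tokens at time of writing ($0.15 input, $0.60 output);
        # verify against current pricing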
        prompt_cost = (prompt_tokens / 1_000_000) * 0.15
        completion_cost = (completion_tokens / 1_000_000) * 0.60
        self.costs["openai"]["tokens"] += prompt_tokens + completion_tokens
        self.costs["openai"]["estimated_cost"] += prompt_cost + completion_cost
    
    def get_summary(self) -> str:
        return f"""
Cost Summary:
- Tavily: {self.costs['tavily']['count']} calls, ${self.costs['tavily']['estimated_cost']:.2f}
- OpenAI: {self.costs['openai']['tokens']:,} tokens, ${self.costs['openai']['estimated_cost']:.2f}
- Total: ${self.costs['tavily']['estimated_cost'] + self.costs['openai']['estimated_cost']:.2f}
"""
```

**Why:** Know your costs before they surprise you, and see which operations are worth optimizing 👀
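The tracker above reports totals, which says what a run cost but not where. A small extension, sketched with a hypothetical `StepCostTracker`, attributes spend to pipeline steps so the most expensive one stands out. The prices repeat the gpt-4o-mini figures from the code above and should be treated as assumptions to re-check. As a worked example, 10,000 prompt tokens plus 2,000 completion tokens cost (10,000 × 0.15 + 2,000 × 0.60) / 1,000,000 = $0.0027.

```python
from collections import defaultdict
from typing import Dict

# Assumed USD prices per 1M tokens; verify against the provider's current price list.
PRICES_PER_1M = {"gpt-4o-mini": {"input": 0.15, "output": 0.60}}


class StepCostTracker:
    """Hypothetical tracker that attributes LLM spend to named pipeline steps."""

    def __init__(self, prices: Dict[str, Dict[str, float]] = PRICES_PER_1M):
        self.prices = prices
        self.by_step: Dict[str, float] = defaultdict(float)

    def record(self, step: str, model: str, prompt_tokens: int, completion_tokens: int) -> float:
        p = self.prices[model]  # KeyError for an unpriced model, rather than silently $0
        cost = (prompt_tokens * p["input"] + completion_tokens * p["output"]) / 1_000_000
        self.by_step[step] += cost
        return cost

    def most_expensive(self) -> str:
        return max(self.by_step, key=self.by_step.get)

    def total(self) -> float:
        return sum(self.by_step.values())
```

Calling `record("swot_analysis", "gpt-4o-mini", 10_000, 2_000)` after each LLM call (with token counts taken from the response's usage data) is enough to rank steps by cost.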

---

## 5. **Configuration Management**

### Problem We Had
```python
# agents/config.py - hard-coded values
LLM_MODEL = "gpt-4o-mini"
LLM_TEMPERATURE = 0.7
MAX_ITERATIONS = 2
```

### Better Approach
```python
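# config.py
# Pydantic v2: BaseSettings moved to the separate pydantic-settings package
# (pip install pydantic-settings). In Pydantic v1 it was `from pydantic import BaseSettings`.
from pydantic_settings import BaseSettings, SettingsConfigDict

class AgentConfig(BaseSettings):
    # LLM settings
    llm_model: str = "gpt-4o-mini"
    llm_temperature: float = 0.7

    # Data collection
    tavily_max_results: int = 10
    data_points_to_summarize: int = 15

    # Quality thresholds
    min_confidence: float = 0.5
    min_data_coverage: float = 0.8

    # Framework-specific settings
    swot_min_items_per_category: int = 2
    porter_min_forces: int = 5
    pestel_min_dimensions: int = 6

    # Read AGENT_-prefixed variables from the environment and from .env
    # (real environment variables take precedence over .env values)
    model_config = SettingsConfigDict(env_prefix="AGENT_", env_file=".env")

config = AgentConfig()

# Now easily configurable:
# .env file or environment variables:
# AGENT_LLM_MODEL=gpt-4o
# AGENT_LLM_TEMPERATURE=0.5
# AGENT_MIN_CONFIDENCE=0.6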
```


**Why:**
- Environment-specific configs (dev vs prod)
- No code changes to adjust behavior
- Can experiment with settings easily

---

## 6. **Output Validation and Schema Enforcement**

### Problem We Had
- LLM might return wrong JSON structure
- Had to handle parsing errors manually
- Silent failures possible

### Better Approach
```python
import json
import logging
import re
from typing import List, Literal

from pydantic import BaseModel, Field, ValidationError, field_validator

logger = logging.getLogger(__name__)

# Pydantic v2 syntax (v1 used `regex=`, `min_items=`, and `@validator`)
class SWOTInsight(BaseModel):
    """Validated SWOT insight schema"""
    category: Literal["Strength", "Weakness", "Opportunity", "Threat"]
    insight: str = Field(min_length=10)
    evidence: str = Field(min_length=5)
    confidence: float = Field(ge=0.0, le=1.0)
    impact: Literal["Very Low", "Low", "Moderate", "High", "Very High"]

    @field_validator("insight")
    @classmethod
    def insight_not_generic(cls, v: str) -> str:
        # Reject generic statements; match whole words so "took" or "goodwill" don't trip it
        generic = {"good", "bad", "ok", "fine"}
        if generic & set(re.findall(r"[a-z]+", v.lower())):
            raise ValueError(f"Insight too generic: {v}")
        return v

class SWOTResult(BaseModel):
    """Validated SWOT result schema"""
    strengths: List[SWOTInsight] = Field(min_length=2)
    weaknesses: List[SWOTInsight] = Field(min_length=2)
    opportunities: List[SWOTInsight] = Field(min_length=2)
    threats: List[SWOTInsight] = Field(min_length=2)

# In analyzer:
def analyze(self, state):
    response = self.llm.invoke(messages)

    # Parse and validate in one try: bad JSON and a wrong shape both trigger the fallback
    try:
        validated = SWOTResult.model_validate(json.loads(response.content))
        # Now guaranteed to have correct structure!
        return self._format_insights(validated)
    except (json.JSONDecodeError, ValidationError) as e:
        logger.error(f"LLM output rejected: {e}")
        return self._get_fallback_insights(state)
```

**Why:**
- Fail fast on malformed responses
- Clear error messages
- Type safety
- Self-documenting data structures
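The analyzer above calls `json.loads(response.content)` directly, but models often wrap JSON in a fenced code block or add a sentence around it (one of the parsing issues this project ran into). A small pre-parsing step can strip that before schema validation. `extract_json` is a hypothetical helper and a heuristic: it takes the first fenced block if there is one, otherwise the span from the first `{` to the last `}`.

```python
import json
import re
from typing import Any

# Matches a fenced block (optionally tagged json) and captures its body.
_FENCE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL)


def extract_json(text: str) -> Any:
    """Parse a JSON object from raw LLM output that may be fenced or wrapped in prose."""
    match = _FENCE.search(text)
    if match:
        text = match.group(1)
    # Fall back to the outermost {...} span when the model adds chatter around the JSON.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in LLM output")
    return json.loads(text[start:end + 1])  # JSONDecodeError is a ValueError subclass
```

Because every failure surfaces as a `ValueError`, the caller can catch it alongside `ValidationError` and route both to the same fallback.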

---

## 7. **Progressive Enhancement of Data Sources**

### Better Pattern
```python
# agents/data_collectors/data_orchestrator.py
import logging

logger = logging.getLogger(__name__)

class DataOrchestrator:
    """Coordinates multiple data sources with fallbacks"""

    def __init__(self):
        self.sources = [
            TavilyCollector(),
            WikipediaCollector(),
            AlphaVantageCollector()  # Future
        ]
        self.cache = {}

    @staticmethod
    def _make_cache_key(company: str, **kwargs) -> tuple:
        # A kwargs dict isn't hashable; a sorted tuple of its items is
        return (company.lower(), tuple(sorted(kwargs.items())))

    def collect_data(self, company: str, **kwargs):
        # Check cache first
        cache_key = self._make_cache_key(company, **kwargs)
        if cache_key in self.cache:
            logger.info("Cache hit")
            return self.cache[cache_key]

        all_data = []
        for source in self.sources:
            name = type(source).__name__
            try:
                data = source.collect(company, **kwargs)
                all_data.extend(data)

                # If we have enough, stop early
                if len(all_data) >= 20:
                    logger.info(f"Sufficient data from {name}")
                    break
            except Exception as e:
                logger.warning(f"{name} failed: {e}")
                continue

        # Cache only non-empty results, so a total outage is retried next time
        if all_data:
            self.cache[cache_key] = all_data
        return all_data
```

**Why:**
- Resilience (if one source fails, others work)
- Performance (cache repeated queries)
- Scalability (easy to add new sources)
- Cost optimization (stop when you have enough)

---

## 8. **Git Hooks for Quality Checks**

### Pre-commit Hook
```bash
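#!/bin/sh
# .git/hooks/pre-commit  (make it executable: chmod +x .git/hooks/pre-commit)

# Stop at the first failing check; otherwise only the last command's exit code counts
set -e

# Run linters in check mode so files aren't rewritten mid-commit
black --check agents/
isort --check-only agents/
flake8 agents/

# Run tests
pytest tests/ -v

# Type checking
mypy agents/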
```

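**Why:** Blocks commits that fail formatting, lint, test, or type checks. Note that `.git/hooks` isn't versioned; to share hooks with a team, tools such as the `pre-commit` framework keep the configuration in the repository.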

---

## 9. **Documentation as Code**

### Automated API Documentation
```python
# agents/analyzer.py
from typing import Any, Dict, List

class SWOTAnalyzer:
    def analyze(
        self,
        state: "BusinessAnalysisState",  # the agent's TypedDict state schema
    ) -> List[Dict[str, Any]]:
        """
        Perform SWOT analysis on company data.

        Args:
            state: Current agent state with company data and goal

        Returns:
            List of insights, each containing:
            - category: SWOT category (Strength, Weakness, etc.)
            - insight: Description of the finding
            - evidence: Supporting evidence from data
            - confidence: 0.0-1.0 confidence score
            - impact: Impact level (Very Low to Very High)

        Raises:
            AnalysisError: If data is insufficient

        Example:
            >>> insights = analyzer.analyze(state)  # doctest: +SKIP
            >>> assert len(insights) >= 8  # Min 2 per category  # doctest: +SKIP
        """
        ...
```

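**Why:** Self-documenting code. Tools such as Sphinx (autodoc, plus the napoleon extension for this Google-style format) or pdoc can generate API docs from these docstrings.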
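Docstring examples can also be executed. Python's built-in `doctest` module runs every `>>>` line and compares what it prints with the text below it, so documentation that drifts from behavior fails loudly. The helper below is hypothetical; run its example with `python -m doctest -v yourfile.py` or `pytest --doctest-modules`.

```python
from typing import Dict, List


def count_by_category(insights: List[Dict[str, str]]) -> Dict[str, int]:
    """Count insights per SWOT category.

    The example below is executable documentation: doctest runs it and
    reports a failure if the printed result ever changes.

    >>> count_by_category([
    ...     {"category": "Strength"},
    ...     {"category": "Threat"},
    ...     {"category": "Strength"},
    ... ])
    {'Strength': 2, 'Threat': 1}
    """
    counts: Dict[str, int] = {}
    for insight in insights:
        counts[insight["category"]] = counts.get(insight["category"], 0) + 1
    return counts
```

Examples that need live data (such as the `analyzer.analyze(state)` call above) can be marked `# doctest: +SKIP` so they still document usage without running.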

---

## 10. **User Experience Polish**

### Current CLI
```python
company_name = input("Enter company name: ")
# Not very helpful
```

### Better CLI
```python
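from rich.console import Console
from rich.prompt import Prompt
from rich.table import Table

# GOAL_OPTIONS is defined elsewhere in the project; its keys must be strings (e.g. "1"),
# because Prompt.ask returns a string and checks it against `choices` as text.
console = Console()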

def interactive_setup():
    console.print("[bold blue]Business Analysis Agent[/bold blue]")
    console.print("")
    
    # Better prompts with help text
    company = Prompt.ask(
        "Company name",
        default="Tesla",
        console=console
    )
    
    # Visual goal selection
    table = Table(title="Select Analysis Goal")
    table.add_column("ID", style="cyan")
    table.add_column("Goal", style="magenta")
    table.add_column("Framework", style="green")
    
    for goal_id, goal_info in GOAL_OPTIONS.items():
        table.add_row(str(goal_id), goal_info["name"], goal_info["framework"])
    
    console.print(table)
    
    # Better selection
    goal_id = Prompt.ask(
        "Select goal",
        choices=list(GOAL_OPTIONS.keys()),
        default="1"
    )
    
    return company, GOAL_OPTIONS[goal_id]
```

**Why:** More professional and easier for end users. Keep a non-interactive path (such as command-line flags) so the agent can still run from scripts or CI.
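`interactive_setup` always prompts. To also run non-interactively, one option is to accept command-line flags and prompt only for what's missing. `parse_args`, `resolve_inputs`, and the two-entry `GOAL_OPTIONS` here are hypothetical stand-ins; `ask` defaults to `input` but could be any prompt function, for example a lambda around rich's `Prompt.ask`.

```python
import argparse
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for the project's GOAL_OPTIONS; keys are strings to match CLI input.
GOAL_OPTIONS: Dict[str, Dict[str, str]] = {
    "1": {"name": "Investment due diligence", "framework": "SWOT"},
    "2": {"name": "Competitive positioning", "framework": "Porter's Five Forces"},
}


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Business Analysis Agent")
    parser.add_argument("--company", help="Company name (prompted if omitted)")
    parser.add_argument("--goal", choices=sorted(GOAL_OPTIONS), help="Goal ID (prompted if omitted)")
    return parser.parse_args(argv)


def resolve_inputs(
    args: argparse.Namespace, ask: Callable[[str], str] = input
) -> Tuple[str, Dict[str, str]]:
    """Use flags when given; prompt only for values that are missing."""
    company = args.company or ask("Company name: ")
    goal_id = args.goal or ask(f"Goal ID {sorted(GOAL_OPTIONS)}: ")
    if goal_id not in GOAL_OPTIONS:
        raise SystemExit(f"Unknown goal ID: {goal_id}")
    return company, GOAL_OPTIONS[goal_id]
```

With this, `python run_agent.py --company Tesla --goal 1` runs with no prompts, while running it bare falls back to asking.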

---

## 11. **Metrics and Analytics**

```python
# Track what users actually want
class AgentMetrics:
    def track_run(self, goal: str, framework: str, success: bool):
        # Log to analytics service
        # Track most used frameworks
        # Track success rate
        pass
    
    def track_performance(self, step: str, duration: float):
        # Track which steps are slowest
        # Identify bottlenecks
        pass
```

**Why:** Data-driven improvements. Know what to optimize.
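`AgentMetrics` above is a stub. Before wiring up an analytics service, an in-memory version is enough to answer the two questions it raises: which frameworks succeed, and which steps are slow. `InMemoryMetrics` is a hypothetical sketch that times steps with `time.perf_counter`.

```python
import time
from collections import Counter, defaultdict
from contextlib import contextmanager
from typing import Dict, Iterator, List


class InMemoryMetrics:
    """Hypothetical in-process metrics store; replace with a real backend later."""

    def __init__(self) -> None:
        self.goals: Counter = Counter()
        self.runs: Counter = Counter()
        self.successes: Counter = Counter()
        self.durations: Dict[str, List[float]] = defaultdict(list)

    def track_run(self, goal: str, framework: str, success: bool) -> None:
        self.goals[goal] += 1
        self.runs[framework] += 1
        if success:
            self.successes[framework] += 1

    @contextmanager
    def timed(self, step: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the step raised, so slow failures are visible too.
            self.durations[step].append(time.perf_counter() - start)

    def success_rate(self, framework: str) -> float:
        runs = self.runs[framework]
        return self.successes[framework] / runs if runs else 0.0

    def slowest_step(self) -> str:
        # Step with the highest average duration.
        return max(self.durations, key=lambda s: sum(self.durations[s]) / len(self.durations[s]))
```

Wrapping each pipeline stage in `with metrics.timed("data_collection"):` is enough to find the bottleneck without changing the stage's code.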

---

## 12. **Incremental Feature Flags**

```python
# config.py
FEATURES = {
    "enable_retry_logic": False,  # Start disabled for MVP
    "enable_data_caching": True,
    "enable_cost_tracking": True,
    "enable_visual_output": False,  # Future feature
}

# Usage
if FEATURES["enable_retry_logic"]:
    insights = analyze_with_retry(state)
else:
    insights = analyze(state)
```

**Why:** Safe deployment. Easy rollback. Can test features with subset of users.
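The `FEATURES` dict switches a feature on or off for everyone. Testing with a subset of users needs per-user assignment; a common approach is deterministic percentage bucketing, sketched below with hypothetical names and percentages. It hashes with `hashlib` rather than the built-in `hash()`, because string hashing is randomized per process (see `PYTHONHASHSEED`), which would reshuffle users on every restart.

```python
import hashlib
from typing import Dict

# Hypothetical rollout table: feature -> percentage of users (0-100) who get it.
ROLLOUT: Dict[str, int] = {
    "enable_retry_logic": 10,
    "enable_data_caching": 100,
    "enable_visual_output": 0,
}


def bucket(user_id: str, feature: str) -> int:
    """Stable bucket 0-99; salting with the feature name gives each flag its own cohort."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(feature: str, user_id: str) -> bool:
    # Unknown features default to 0%, i.e. off.
    return bucket(user_id, feature) < ROLLOUT.get(feature, 0)
```

Raising a percentage only adds users: anyone already in the first 10% stays enabled at 25%, which keeps the rollout (and rollback) predictable.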

---

## Summary: Top 5 for Next Time

1. **Testing framework** - Catch bugs before production
2. **Logging system** - Visibility into what's happening
3. **Mock data** - Fast development without API costs
4. **Schema validation** - Guarantee correct outputs
5. **Configuration management** - Flexible, environment-aware

These would have saved us time during development and made the agent production-ready faster.

## The Meta-Insight

**Build for observability from day one.** The biggest issue is not knowing what's going wrong when things fail. Good logging, testing, and validation make debugging much easier.





## What is pytest?

**pytest** is a Python testing framework. It runs automated checks to verify your code.

---

## Simple analogy

**Without pytest:**
- Manually run the agent
- Check output
- Guess if anything broke

**With pytest:**
- Run `pytest` once to execute every test
- Pass/fail feedback
- Automated checks

---

## Why it matters for the agent

Without pytest:
```bash
# Every time you change code:
python run_agent.py
# Wait 30 seconds...
# Manually check output
# Hope nothing broke
```

With pytest:
```bash
# Fast feedback (when the LLM and data sources are mocked):
pytest
# Finishes in seconds and shows exactly which test broke
```

---

## Simple example

```python
# tests/test_swot.py
from agents.analyzers.swot_analyzer import SWOTAnalyzer

# `state` is a pytest fixture (e.g. defined in conftest.py) that builds a sample analysis state
def test_swot_returns_all_4_categories(state):
    analyzer = SWOTAnalyzer()
    insights = analyzer.analyze(state)

    categories = [i.get("category") for i in insights]

    # This test would have FAILED back when we only got 2 categories!
    assert "Strength" in categories
    assert "Weakness" in categories
    assert "Opportunity" in categories
    assert "Threat" in categories
```

This would have caught the “only 2 categories” bug immediately.
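A related bug from this project was category-name mismatches such as "Strengths" versus "Strength". A table-driven test pins that behavior down. `normalize_category` is a hypothetical helper; the tests use plain `assert` so pytest collects them as-is, and `@pytest.mark.parametrize` would be the idiomatic way to report each case as its own test.

```python
CANONICAL = {
    "strength": "Strength",
    "weakness": "Weakness",
    "opportunity": "Opportunity",
    "threat": "Threat",
}


def normalize_category(raw: str) -> str:
    """Map variants like 'Strengths', 'opportunities', or ' THREAT ' to one canonical name."""
    key = raw.strip().lower()
    # Try the word as-is first, then with common plural endings removed.
    candidates = [key]
    if key.endswith("ies"):
        candidates.append(key[:-3] + "y")  # opportunities -> opportunity
    if key.endswith("es"):
        candidates.append(key[:-2])        # weaknesses -> weakness
    if key.endswith("s"):
        candidates.append(key[:-1])        # strengths -> strength
    for candidate in candidates:
        if candidate in CANONICAL:
            return CANONICAL[candidate]
    raise ValueError(f"Unknown SWOT category: {raw!r}")


def test_normalize_category_variants():
    cases = {
        "Strength": "Strength",
        "Strengths": "Strength",
        "weakness": "Weakness",
        "Weaknesses": "Weakness",
        "Opportunities": "Opportunity",
        " THREATS ": "Threat",
    }
    for raw, expected in cases.items():
        assert normalize_category(raw) == expected, raw


def test_unknown_category_is_rejected():
    try:
        normalize_category("Risks")
    except ValueError:
        return
    raise AssertionError("expected ValueError for an unknown category")
```

Trying the unmodified word first matters: "weakness" ends in "s", so stripping blindly would turn it into "weaknes" and reject a valid name.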

---

## Bottom line

pytest helps you:
- Catch bugs before users do
- Ship with more confidence
- Refactor safely
- Document expected behavior
- Integrate with CI/CD (automated runs)