## What is a B2B Sales Orchestrator Agent?

A B2B sales orchestrator agent automates and coordinates sales activities across the sales cycle. It:
- Researches prospects (company background, decision-makers, pain points)
- Plans personalized outreach (message drafts, channel selection, timing)
- Handles responses (qualifies leads, routes based on interest)
- Tracks follow-ups (nurture sequences, engagement monitoring)
- Generates reports (research summaries, proposals, qualification assessments)

## Typical workflow stages

1. Lead research & qualification → Understand the company and fit
2. Outreach planning → Create personalized outreach strategy
3. Initial outreach → Send personalized messages
4. Response handling → Analyze responses and route accordingly
5. Discovery & needs analysis → Understand specific needs
6. Proposal & demo prep → Create customized proposals
7. Follow-up & nurture → Keep prospects engaged

## Expected outputs

1. Lead research report: Company overview, decision-makers, pain points, buying signals, fit assessment
2. Outreach plan: Personalized message drafts, channel selection, timing, follow-up sequences
3. Qualification report: BANT assessment (Budget, Authority, Need, Timeline), qualification score, next steps
4. Needs analysis document: Detailed pain points, goals, requirements
5. Proposal document: Customized solution, pricing, ROI
6. Sales activity dashboard: Pipeline status, follow-up tasks, metrics

## Learning opportunities

- Multi-stage workflows (sequential stages with state management)
- Conditional routing (route based on response type, qualification score, engagement)
- Multi-source data integration (web research, LinkedIn, email tracking)
- LLM-powered personalization (generate personalized messages)
- Error handling (no response, objections, unqualified leads)

## MVP recommendation

Start with a simple 3-stage linear flow:
1. Lead research → Research company, identify decision-makers
2. Outreach planning → Generate personalized outreach plan
3. Report generation → Create lead research report and outreach plan

No actual email/LinkedIn sending for MVP: just generate drafts and reports. Add complexity later.

The full introduction document is saved at: `docs/project/B2B_SALES_ORCHESTRATOR_INTRO.md`

Questions:
1. Should the MVP focus on research → planning → report (3 stages)?
2. What input format? Single company name, or CSV with multiple leads?
3. Should we use mock data for LinkedIn/company data, or integrate real APIs?
4. What level of personalization? Template-based or LLM-generated?





## 🔍 What's Realistic for an MVP?

You can absolutely mix **real data + synthetic data**, and that's actually the best approach for development. Here's the realistic breakdown:

| Data Type                                        | Realistic Source                                    | MVP Strategy                                      |
| ------------------------------------------------ | --------------------------------------------------- | ------------------------------------------------- |
| **Company basic info** (industry, size, revenue) | Web search (Tavily, DuckDuckGo API, Google SerpAPI) | ✅ Use real data via web search API                |
| **Decision-makers / titles / LinkedIn URLs**     | LinkedIn API (requires approval + auth)             | ❌ Use synthetic/mock data for MVP                 |
| **Pain points, buying signals**                  | News scraping, PR, job postings                     | ✅ Use real web search + LLM summarization         |
| **Funding, growth info**                         | Crunchbase, Pitchbook APIs                          | ❌ Use fake values unless you already have access  |
| **Contact emails / enrichment**                  | Apollo, Clearbit, Clay                              | ❌ Mock only. Do not generate real emails.         |
| **Outreach messages**                            | LLM-generated                                       | ✅ Fully real (LLM personalized based on research) |
| **Qualification (BANT)**                         | Based on AI inference using research                | ✅ Use fake values but based on real research      |
| **Follow-up tracking, CRM sync**                 | HubSpot, Salesforce APIs                            | ❌ Skip in MVP, mock state only                    |
| **Proposals, reports**                           | Generated by LLM                                    | ✅ Fully real + templated markdown                 |

So the **MVP can be 100% real for:**

* company-level research
* personalized outreach message generation
* auto-generated reports
* basic lead scoring logic

And **synthetic/mock for:**

* decision-maker contact data
* CRM sync
* funding / revenue / hiring signals if no API access

---

## 🧠 Why This Still Feels "Real"

Because we can:

* Use **real company names** (you pick any real company)
* Pull **real web data about them**
* Generate **real personalized outreach messages**
* Output **real, usable research reports + outreach plans**
* Treat fake contact data as *placeholder fields* until you wire in Apollo/Clearbit/etc.

Example:

```json
{
  "contact": {
    "name": "FAKE - John Smith",
    "role": "VP of Engineering",
    "email": "placeholder@acme.com"
  }
}
```

---

## 📦 Suggested MVP Input / Output

### ✅ MVP Input

```json
{
  "company": "Acme Corp",
  "website": "https://acme.com",
  "product": "AI-driven sales analytics platform"
}
```

### ✅ MVP Output Files

| Output                     | Format              | Real?            |
| -------------------------- | ------------------- | ---------------- |
| Lead Research Report       | Markdown            | ✅ Real           |
| Outreach Plan              | JSON + Markdown     | ✅ Real           |
| Personalized Message Draft | Generated by LLM    | ✅ Real           |
| Qualification Assessment   | Mixed (real + mock) | ✅ Partially fake |

---

## 🧱 Suggested MVP Architecture (3-Stage Linear)

```
research → outreach_plan → report_generation
```

Then later:

```
research → outreach_plan → outreach_send → response_handling → qualification → proposal
```

---

## 🧪 Testing Strategy

| Area          | Test Type                                        |
| ------------- | ------------------------------------------------ |
| Research Node | Use real public companies (Zoom, Notion, Stripe) |
| Outreach Node | Validate message personalization logic           |
| Report Node   | Snapshot test markdown outputs                   |
| Later         | Add mock "response handler" unit tests           |





## ✅ MVP Strategy

| Area            | MVP Strategy                           | Status in Scaffold                                      |
| --------------- | -------------------------------------- | ------------------------------------------------------- |
| Research Data   | ✅ Real via Tavily                      | ✅ Already supported                                     |
| Decision Makers | ❌ Fake / templated                     | ✅ Scaffold uses mock placeholders                       |
| ICP Fit Score   | ❌ Hard-coded / defaults                | ✅ Scaffold uses deterministic scoring, can start simple |
| Personalization | ❌ Placeholder until later              | ✅ Outreach node supports real + fake inputs             |
| Debugging       | ✅ Build nodes one at a time, test each | ✅ Scaffold uses smoke test flow before wiring           |
| API Keys        | ✅ You have Tavily key in `.env`        | ✅ Research node structured for Tavily                   |
| Company Example | ✅ Target (real company)                | ✅ Works for MVP                                         |

The scaffold already matches the phased approach you want:

> *"Test, debug, isolate issues, improve inputs once the pipeline works"* ✅

# B2B Sales Orchestrator Agent - Scaffold Plan

**Status:** 📋 Planning Phase - Review Before Implementation  
**Purpose:** Build MVP sales orchestrator that researches companies, plans personalized outreach, and generates reports  
**Learning Focus:** Multi-source data orchestration, LLM-powered personalization, conditional routing (future), state management

---

## Overview

Build an orchestrator agent that takes a company name/URL as input, performs research, generates personalized outreach plans, and creates comprehensive lead research reports. This MVP focuses on the **research → planning → reporting** workflow without actual email/LinkedIn sending.

**Key Learning Goals:**
- Practice multi-source data collection (web search, company data)
- Learn LLM-powered personalization patterns
- Build orchestrator patterns for multi-stage workflows
- Create actionable, formatted reports

**MVP Scope (Incremental Approach):**
- ✅ Single company input (Target for testing)
- ✅ Real web research (Tavily API - user has key)
- ✅ **Fixed template for decision-makers** (not LLM-generated, focus on orchestration)
- ✅ **Fixed defaults for ICP scoring** (not configurable, get logic working first)
- ✅ **Dummy data for personalization** (get MVP working, then improve incrementally)
- ✅ Markdown reports as output
- ❌ No CRM integration (mock state only)
- ❌ No email/LinkedIn sending (just generate drafts)
- ❌ No real contact enrichment APIs (use placeholders)

**Development Philosophy:**
Start with templates/defaults/dummy data → Get orchestration working → Replace dummy data incrementally (one section at a time) → Test → Debug → Isolate issues → Move to next section

---

## What This Agent Does (MVP)

1. **Research** company background, industry, pain points, buying signals
2. **Plan** personalized outreach strategy (message drafts, channel selection, timing)
3. **Generate** comprehensive lead research report and outreach plan

**Input:** Company name and website URL (optional product/service description)  
**Output:** Lead research report (markdown) + Outreach plan (JSON + markdown)

---

## State Schema (Plain English)

The agent will track:

**Input Fields:**
- `company_name`: Company name to research (e.g., "Acme Corp")
- `company_website`: Optional website URL
- `product_service`: Optional description of what we're selling (for personalization)

**Research Data:**
- `company_research`: Raw research data from web search
  - Company overview (industry, size, revenue estimates, growth stage)
  - Recent news/articles (pain points, buying signals)
  - Technology stack (if available)
  - Job postings (hiring signals, needs)
  - Industry trends (market context)
- `decision_makers`: Mock/synthetic decision-maker data (for MVP)
  - Name, title, LinkedIn placeholder
  - Role in decision-making process
  - Contact preference (email vs LinkedIn)

**Analysis Results:**
- `company_profile`: Structured company profile
  - Industry, size, revenue estimate, growth stage
  - Key pain points (from research)
  - Buying signals (funding, hiring, expansion)
  - ICP fit score (0-100, based on criteria)
  - Technology alignment
- `pain_points`: Extracted pain points and challenges
- `buying_signals`: Identified buying signals (funding, hiring, news, etc.)
- `fit_assessment`: ICP fit analysis
  - Fit score (0-100)
  - Fit reasons (why it's a good/bad fit)
  - Priority level (high/medium/low)

**Outreach Planning:**
- `outreach_plan`: Personalized outreach strategy
  - Target contact (decision-maker)
  - Channel recommendation (email vs LinkedIn)
  - Timing recommendation (best day/time)
  - Value proposition (personalized based on research)
  - Message drafts (initial + follow-up sequence)
  - Personalization elements (company-specific hooks)

**Output:**
- `research_report`: Generated lead research report (markdown)
- `outreach_plan_markdown`: Formatted outreach plan (markdown)
- `report_file_paths`: Paths to saved report files

**Metadata:**
- `errors`: Any errors encountered
- `processing_time`: Time taken
- `research_sources`: List of sources used for research

---

## Node Design (Minimal Linear Flow for MVP)

Following your guide: **Start with minimal linear flow, add conditional routing later when needed.**

### Node 1: `goal_node` - Define Research Goal
**Purpose:** Set up the research and outreach planning framework

**What it does:**
- Define research goal (understand company, identify pain points, assess fit)
- Set up outreach planning objective (create personalized outreach strategy)
- Structure goal as dictionary with:
  - Research objectives (what to find out)
  - Outreach objectives (what to create)
  - Product/service context (for personalization)
  - ICP criteria (ideal customer profile match criteria)

**Input (State):**
- `company_name`
- `company_website` (optional)
- `product_service` (optional)

**Output (State):**
- `goal`: Research and outreach planning goal definition

**Logic:**
- Fixed structure (template-based, no LLM needed)
- Use standard research objectives
- Include product/service context if provided
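
A minimal sketch of what this template-based node could look like. The objective wording, the `icp_criteria` contents, and the key names inside `goal` are illustrative assumptions, not the scaffold's actual values.

```python
from typing import Any, Dict

# Hypothetical fixed objectives; the plan only states that they are standard.
RESEARCH_OBJECTIVES = [
    "Understand the company",
    "Identify pain points",
    "Assess ICP fit",
]
OUTREACH_OBJECTIVES = ["Create personalized outreach strategy"]


def goal_node(state: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the state with a template-based `goal` added (no LLM)."""
    goal = {
        "company_name": state["company_name"],
        "research_objectives": list(RESEARCH_OBJECTIVES),
        "outreach_objectives": list(OUTREACH_OBJECTIVES),
        # Optional input: stays None when no product description was given.
        "product_context": state.get("product_service"),
        # Assumed shape for the ICP criteria; the real defaults live in config.
        "icp_criteria": {"industries": ["Retail", "Technology"]},
    }
    return {**state, "goal": goal}
```

Returning a new dictionary rather than mutating the input keeps each node easy to test in isolation.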

---

### Node 2: `planning_node` - Create Research & Outreach Plan
**Purpose:** Define the research strategy and outreach planning approach

**What it does:**
- Create execution plan based on goal
- Define research steps (company overview, pain points, buying signals, decision-makers)
- Define outreach planning steps (personalization strategy, message crafting, channel selection)
- Structure plan as list of steps

**Input (State):**
- `goal`

**Output (State):**
- `plan`: Execution plan with research and outreach planning steps

**Logic:**
- Template-based plan (no LLM needed)
- Plan structure: step number, action, node responsible
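
As a rough illustration, the step list described above could be built from a fixed template like this. The action wording is an assumption; only the three fields (step number, action, responsible node) come from the plan.

```python
from typing import Any, Dict, List, Tuple

# (action, responsible node) pairs; the wording is illustrative only.
PLAN_TEMPLATE: List[Tuple[str, str]] = [
    ("Research company overview, pain points, buying signals", "research_node"),
    ("Assess ICP fit and generate placeholder decision-makers", "analyze_node"),
    ("Draft outreach strategy and messages", "outreach_plan_node"),
    ("Render research report and outreach plan", "report_node"),
]


def planning_node(state: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the state with a numbered, template-based `plan`."""
    plan = [
        {"step": number, "action": action, "node": node}
        for number, (action, node) in enumerate(PLAN_TEMPLATE, start=1)
    ]
    return {**state, "plan": plan}
```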

---

### Node 3: `research_node` - Collect Company Data
**Purpose:** Research company using web search and extract structured insights

**What it does:**
- Perform web searches for:
  - Company overview (industry, size, revenue, growth stage)
  - Recent news/articles (pain points, challenges, initiatives)
  - Technology stack (if available)
  - Job postings (hiring signals, needs)
  - Industry trends (market context)
- Use multiple search queries (per framework-specific search strategy pattern)
- Collect data from multiple sources (Tavily, web search, etc.)
- Extract structured insights:
  - Company profile (industry, size, revenue estimate, growth stage)
  - Pain points (from news, articles, job postings)
  - Buying signals (funding, hiring, expansion, technology adoption)
  - Technology alignment (if available)

**Input (State):**
- `company_name`
- `company_website` (optional - helps with search)
- `goal` (research objectives)
- `plan` (research steps)

**Output (State):**
- `company_research`: Raw research data (search results, articles, etc.)
- `company_profile`: Structured company profile
- `pain_points`: Extracted pain points
- `buying_signals`: Identified buying signals
- `research_sources`: List of sources used

**Logic:**
- Web search API calls (Tavily, SerpAPI, or Bing)
- Multiple targeted queries (e.g., "{company} industry", "{company} challenges", "{company} funding")
- LLM-powered extraction (summarize search results, extract structured data)
- **Error handling:**
  - API failures → fail gracefully, log error, continue with available data
  - No results found → log warning, use partial data
  - Invalid responses → retry once, then fail gracefully
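
The "retry once, then fail gracefully" rule can be sketched independently of any search provider. Here `search_fn` is a hypothetical callable standing in for the real Tavily wrapper, and `errors` is the state's error list.

```python
from typing import Any, Callable, Dict, List

SearchFn = Callable[[str], List[Dict[str, Any]]]


def search_with_retry(
    query: str,
    search_fn: SearchFn,
    errors: List[str],
    retries: int = 1,
) -> List[Dict[str, Any]]:
    """Call search_fn(query); on failure retry, then log and return []."""
    last_error = None
    for _ in range(retries + 1):
        try:
            return search_fn(query)
        except Exception as exc:  # broad on purpose: API failures are non-fatal
            last_error = exc
    # All attempts failed: record the problem and continue with partial data.
    errors.append(f"search failed for {query!r}: {last_error}")
    return []
```

Because the function returns an empty list instead of raising, the research node can keep going with whatever the other queries returned.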

**Challenges:**
- Extracting structured data from unstructured web search results
- Identifying relevant pain points from news/articles
- Determining company size/revenue from public sources (may need estimates)
- Identifying buying signals from various sources

---

### Node 4: `analyze_node` - Analyze Fit & Generate Decision-Makers
**Purpose:** Assess ICP fit and generate mock decision-maker data

**What it does:**
- Analyze company profile against **fixed default ICP criteria**
- Calculate fit score (0-100) using **deterministic algorithm**
- Generate fit assessment (fit reasons, priority level)
- **Generate decision-makers using fixed template** (not LLM-generated for MVP):
  - Use template structure: VP/Director level roles
  - Create synthetic contact data (name, title, LinkedIn placeholder)
  - Mark clearly as "PLACEHOLDER" or "FAKE"
  - Recommend contact preference (email vs LinkedIn)

**Input (State):**
- `company_profile`
- `pain_points`
- `buying_signals`
- `goal` (ICP criteria)

**Output (State):**
- `fit_assessment`: ICP fit analysis (score, reasons, priority)
- `decision_makers`: Mock decision-maker data (synthetic for MVP)

**Logic:**
- Fit scoring algorithm (deterministic, based on **fixed default ICP criteria**)
- **Fixed template for decision-makers** (not LLM-generated for MVP):
  - Template: ["VP of Sales", "Director of Operations", "VP of Engineering"]
  - Generate 1-3 decision-makers using template
  - Mark clearly as "PLACEHOLDER" in output
- **Error handling:**
  - Missing company data → lower fit score, log warning
  - Use default template structure (no LLM calls needed for MVP)

**Notes:**
- **MVP:** Fixed template (focus on orchestration, not data quality)
- **Phase 2:** Replace with LLM-generated or real APIs
- Clearly mark all placeholder data in output
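
A small sketch of the fixed-template generator, assuming one contact per role and an `is_placeholder` flag (both the flag and the field names are illustrative choices, not the scaffold's schema).

```python
from typing import Any, Dict, List

ROLE_TEMPLATE = ["VP of Sales", "Director of Operations", "VP of Engineering"]


def generate_decision_makers(company_name: str, count: int = 3) -> List[Dict[str, Any]]:
    """Build 1-3 clearly marked placeholder contacts from the fixed role list."""
    count = max(1, min(count, len(ROLE_TEMPLATE)))  # clamp to the 1-3 range
    return [
        {
            "name": f"PLACEHOLDER - Contact {index}",
            "title": title,
            "company": company_name,
            "linkedin": "PLACEHOLDER",
            "contact_preference": "email",  # dummy default for the MVP
            "is_placeholder": True,
        }
        for index, title in enumerate(ROLE_TEMPLATE[:count], start=1)
    ]
```

Keeping an explicit flag on every record makes it straightforward for the report template to label synthetic data.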

---

### Node 5: `outreach_plan_node` - Generate Personalized Outreach Plan
**Purpose:** Create personalized outreach strategy and message drafts

**What it does:**
- **MVP: Use dummy data for personalization** (get orchestration working first):
  - Use template message with company name insertion
  - Use dummy pain points and buying signals
  - Generate basic outreach plan structure
- **Phase 2: Replace with LLM-powered personalization**:
  - Analyze company research and pain points (real data)
  - Craft personalized value proposition (based on research findings)
  - Select best channel (email vs LinkedIn) based on decision-maker profile
  - Determine optimal timing (industry patterns, timezone)
  - Create initial message draft (personalized, value-focused, concise)
  - Create follow-up sequence (if no response)
- Include personalization elements (dummy for MVP, real for Phase 2):
  - Company-specific hooks (reference recent news, funding, initiatives)
  - Pain point references (address specific challenges)
  - Value proposition alignment (connect product/service to company needs)

**Input (State):**
- `company_profile`
- `pain_points`
- `buying_signals`
- `fit_assessment`
- `decision_makers`
- `product_service` (for personalization)
- `goal` (outreach objectives)

**Output (State):**
- `outreach_plan`: Personalized outreach strategy
  - Target contact (decision-maker)
  - Channel recommendation
  - Timing recommendation
  - Value proposition
  - Message drafts (initial + follow-ups)
  - Personalization elements

**Logic:**
- **MVP: Template-based with dummy data** (company name insertion only)
- **Phase 2: LLM-powered message generation** (personalized based on real research)
- Template-based structure (ensure consistent format)
- **Error handling:**
  - LLM API failures → retry once, then fail gracefully (Phase 2)
  - Invalid responses → log error, use fallback template

**Development Strategy:**
- **MVP:** Get basic outreach plan structure working (dummy data)
- **Phase 2:** Replace dummy data with real research → test → debug → isolate issues
- **Phase 3:** Improve message quality (LLM personalization) → test → debug
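
For the MVP step (template plus company-name insertion), a sketch using only the standard library might look like the following. The message wording, the dummy channel and timing values, and the function name `build_outreach_plan` are all assumptions for illustration.

```python
from string import Template
from typing import Any, Dict

# Illustrative wording only; the plan does not prescribe the template text.
INITIAL_MESSAGE = Template(
    "Hi $contact_name,\n\n"
    "I have been following $company_name and wanted to reach out about "
    "$pain_point. We offer $product_service.\n\n"
    "Would a short call be useful?"
)


def build_outreach_plan(state: Dict[str, Any]) -> Dict[str, Any]:
    """Build an MVP outreach plan: fixed structure, company name inserted."""
    contact = (state.get("decision_makers") or [{"name": "PLACEHOLDER"}])[0]
    message = INITIAL_MESSAGE.substitute(
        contact_name=contact["name"],
        company_name=state["company_name"],
        pain_point="DUMMY PAIN POINT",  # replaced with real research in Phase 2
        product_service=state.get("product_service") or "our product",
    )
    return {
        "target_contact": contact,
        "channel": "email",           # dummy default
        "timing": "Tuesday morning",  # dummy default
        "value_proposition": "DUMMY VALUE PROPOSITION",
        "message_drafts": {"initial": message, "follow_ups": []},
        "personalization_elements": [state["company_name"]],
    }
```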

---

### Node 6: `report_node` - Generate Lead Research Report
**Purpose:** Generate comprehensive markdown reports (research report + outreach plan)

**What it does:**
- Generate lead research report using Jinja2 template:
  - Company overview
  - Key decision-makers (marked as placeholder for MVP)
  - Pain points and challenges
  - Buying signals
  - Fit assessment
  - Recommended outreach approach
- Generate outreach plan document:
  - Outreach strategy summary
  - Personalized message drafts
  - Follow-up sequence
  - Channel and timing recommendations
- Save reports to files (e.g., `lead_research_<company>_<timestamp>.md`)

**Input (State):**
- `company_profile`
- `pain_points`
- `buying_signals`
- `fit_assessment`
- `decision_makers`
- `outreach_plan`
- `research_sources`

**Output (State):**
- `research_report`: Generated lead research report (markdown)
- `outreach_plan_markdown`: Formatted outreach plan (markdown)
- `report_file_paths`: Paths to saved report files

**Logic:**
- Jinja2 template rendering
- Format markdown with proper headers, sections, tables
- Include clear indicators for placeholder/synthetic data (decision-makers)
- **Error handling:** Template render fail → fail immediately (can't produce output without template)
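
The file-naming pattern `lead_research_<company>_<timestamp>.md` can be sketched without the template engine. The slug rules and the timestamp format below are assumptions, since the plan does not specify them.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional


def report_path(
    company_name: str,
    output_dir: str = "sales_reports",
    now: Optional[datetime] = None,
    prefix: str = "lead_research",
) -> Path:
    """Build a filesystem-safe report path for one company."""
    now = now or datetime.now()
    # Collapse anything that is not a lowercase letter or digit into "_".
    slug = re.sub(r"[^a-z0-9]+", "_", company_name.lower()).strip("_")
    stamp = now.strftime("%Y%m%d_%H%M%S")  # assumed timestamp format
    return Path(output_dir) / f"{prefix}_{slug}_{stamp}.md"
```

Passing `now` explicitly keeps the function deterministic, which helps with snapshot-style tests of the report node.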

---

## Graph Wiring (MVP: Linear Flow)

**Start with linear flow only:**

```
goal → planning → research → analyze → outreach_plan → report → END
```

**Rationale:** MVP doesn't need conditional routing initially. All companies go through the same research → planning → reporting pipeline. We can add conditional routing later if we want to:
- Skip certain research steps for known companies
- Route to different outreach strategies based on fit score
- Handle batch processing (multiple companies)

**Later Enhancement (Not MVP):**
- After `analyze_node`: if fit score < 50, route to "low priority" report format
- After `research_node`: if insufficient data, route to enhanced research or flag for manual review
- Batch processing: if directory/CSV input, route to batch processing subgraph

But for MVP: **Linear flow is sufficient.**
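
When the fit-score branch is eventually added, the routing decision itself can be a plain function. This is a sketch only: the `score` key inside `fit_assessment` and the branch names are assumptions, and it is the kind of function one would typically hand to LangGraph's conditional-edge mechanism.

```python
from typing import Any, Dict


def route_after_analyze(state: Dict[str, Any], threshold: int = 50) -> str:
    """Pick the next branch from the ICP fit score (missing score counts as 0)."""
    score = state.get("fit_assessment", {}).get("score", 0)
    # Hypothetical branch names; the MVP graph has no such branches yet.
    return "low_priority_report" if score < threshold else "outreach_plan"
```

Keeping the rule in a pure function means it can be unit-tested before any graph wiring exists.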

---

## Error Handling Strategy Matrix

| Error Type | Strategy | Example |
|------------|----------|---------|
| **Company name not found** | Fail gracefully | Log warning, continue with partial data, note in report |
| **Web search API failure** | Retry once, then fail gracefully | Log error, use available data, continue with partial results |
| **No research results** | Log warning, continue | Use partial data, note data gaps in report |
| **LLM API failure** | Retry once, then fail gracefully | Log error, use fallback template, continue |
| **Invalid LLM response** | Retry once, then fail gracefully | Log error, use fallback, continue |
| **Template render fail** | Fail immediately | Can't produce output without template |
| **File write fail** | Fail immediately | Can't save reports |

**Principle:** Fail fast for output issues; fail gracefully for research/analysis issues (continue with partial data, log warnings).

---

## Data Sources & Research Strategy

### Web Search Queries (Framework-Specific Pattern)

**Research Strategy:**
```python
def build_research_queries(company_name: str) -> dict:
    """Build the targeted search queries for one company, grouped by category."""
    return {
        "company_overview": [
            f"{company_name} company overview",
            f"{company_name} industry size revenue",
            f"{company_name} growth stage funding",
        ],
        "pain_points": [
            f"{company_name} challenges problems",
            f"{company_name} pain points",
            f"{company_name} recent news struggles",
        ],
        "buying_signals": [
            f"{company_name} funding raised",
            f"{company_name} hiring jobs",
            f"{company_name} expansion growth",
            f"{company_name} technology adoption",
        ],
        "technology": [
            f"{company_name} technology stack",
            f"{company_name} software tools",
            f"{company_name} infrastructure",
        ],
    }
```

**Data Sources:**
- **Primary:** Tavily API (web search) or SerpAPI/Google Search
- **Secondary:** Company website (if provided)
- **Future:** Crunchbase, Pitchbook (if API access available)

**Research Collection:**
- Execute multiple queries per category
- Collect top N results per query
- Aggregate and deduplicate results
- Extract structured data using LLM
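
The "top N, aggregate, deduplicate" steps could look roughly like this. It assumes each search result is a dictionary with a `url` key, which is an assumption about the search wrapper's output rather than a documented contract.

```python
from typing import Any, Dict, List


def aggregate_results(
    results_by_query: Dict[str, List[Dict[str, Any]]],
    top_n: int = 5,
) -> List[Dict[str, Any]]:
    """Keep the top N results per query and drop duplicate URLs."""
    seen = set()
    merged = []
    for query, results in results_by_query.items():
        for result in results[:top_n]:
            # Treat "https://a.com/" and "https://a.com" as the same source.
            key = result.get("url", "").rstrip("/")
            if not key or key in seen:
                continue
            seen.add(key)
            merged.append({**result, "query": query})
    return merged
```

Recording the originating query on each result makes it easier to trace a pain point or buying signal back to its source in the report.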

---

## ICP Fit Scoring (Deterministic - Fixed Defaults for MVP)

**ICP Criteria (Fixed Defaults - MVP):**
- Company size (employee count): 100-1000 employees = high fit (20 points)
- Industry: Retail/Technology = high fit (20 points)
- Growth stage: Established = medium fit (15 points)
- Technology alignment: Using similar tools = high fit (20 points)
- Pain points: Match our solution = high fit (25 points)

**Scoring Formula:**
- Each criterion contributes points (fixed for MVP)
- Total score: 0-100
- Priority levels:
  - High: 70-100
  - Medium: 40-69
  - Low: 0-39

**MVP Strategy:**
- Use fixed defaults to get scoring logic working
- **Phase 2:** Make criteria configurable (via goal/config)
- **Phase 3:** Add more sophisticated scoring (ML models, etc.)
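
A sketch of the deterministic scorer using the point values and priority bands listed above. The profile field names (`employee_count`, `uses_similar_tools`, `pain_points_match`) are hypothetical; the real profile schema may differ.

```python
from typing import Any, Dict

# Points per criterion, taken from the fixed MVP defaults (sum = 100).
ICP_POINTS = {
    "company_size": 20,
    "industry": 20,
    "growth_stage": 15,
    "technology": 20,
    "pain_points": 25,
}


def score_fit(profile: Dict[str, Any]) -> Dict[str, Any]:
    """Score a company profile against the fixed ICP defaults."""
    matched = []
    employees = profile.get("employee_count")
    if employees is not None and 100 <= employees <= 1000:
        matched.append("company_size")
    if profile.get("industry") in {"Retail", "Technology"}:
        matched.append("industry")
    if profile.get("growth_stage") == "Established":
        matched.append("growth_stage")
    if profile.get("uses_similar_tools"):
        matched.append("technology")
    if profile.get("pain_points_match"):
        matched.append("pain_points")

    score = sum(ICP_POINTS[name] for name in matched)
    priority = "high" if score >= 70 else "medium" if score >= 40 else "low"
    return {"score": score, "reasons": matched, "priority": priority}
```

Missing fields simply earn no points, which matches the "missing company data lowers the fit score" rule in the analyze node.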

---

## Folder Structure

```
project_root/
├── agents/
│   └── b2b_sales_orchestrator_agent.py    # LangGraph workflow (after smoke test)
├── nodes/
│   ├── __init__.py
│   ├── goal_node.py                       # Node 1: Define research goal
│   ├── planning_node.py                   # Node 2: Create research plan
│   ├── research_node.py                   # Node 3: Collect company data
│   ├── analyze_node.py                    # Node 4: Analyze fit & generate decision-makers
│   ├── outreach_plan_node.py              # Node 5: Generate personalized outreach plan
│   └── report_node.py                     # Node 6: Generate reports
├── prompts/
│   ├── __init__.py
│   ├── base_analyzer.py                   # Base prompt class (reuse existing)
│   ├── research_prompt.py                 # Research extraction prompt
│   └── outreach_prompt.py                 # Outreach planning prompt
├── templates/
│   ├── lead_research_report.md.j2         # Jinja2 template for research report
│   └── outreach_plan.md.j2                # Jinja2 template for outreach plan
├── utils/
│   ├── __init__.py
│   ├── web_search.py                      # Web search utilities (Tavily/SerpAPI)
│   ├── research_parser.py                 # Research data extraction utilities
│   └── validators.py                      # Data validation utilities (reuse existing)
├── tests/
│   ├── test_mvp_runner.py                 # ⭐ Smoke test (create first)
│   └── test_sales_orchestrator.py         # Integration test (after wiring)
├── config.py                              # State schema (SalesOrchestratorState TypedDict) + AgentConfig
├── requirements.txt
└── sales_reports/                         # Output directory for reports
```

---

## Implementation Order (Following Your Guide)

1. **Goal node** (simplest, defines structure) - Fixed logic, no dependencies
2. **Planning node** (uses goal) - Template-based, depends on goal structure
3. **Research node** (most complex, depends on setup) - Web search, LLM extraction
4. **Analyze node** (scoring logic, depends on research) - Fit assessment, mock data generation
5. **Outreach plan node** (template-based for MVP, LLM-powered in Phase 2; depends on analyze) - Outreach plan and message generation
6. **Report node** (formats output) - Template rendering, file saving

**Why this order:** Build from simplest → most complex, test each before dependencies.

---

## Testing Strategy

### ⭐ Smoke Test First (Before LangGraph)
Create `test_mvp_runner.py` that:
- Manually calls nodes in sequence
- Tests with sample company name (e.g., "Zoom", "Notion", "Stripe")
- Verifies state contracts (what each node reads/writes)
- Catches most state-contract issues before graph complexity is added

**Example:**
```python
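# Import paths assume the nodes/ layout from the Folder Structure section.
from nodes.goal_node import goal_node
from nodes.planning_node import planning_node
from nodes.research_node import research_node
from nodes.analyze_node import analyze_node
from nodes.outreach_plan_node import outreach_plan_node
from nodes.report_node import report_node


def test_linear_flow():
    # Minimal input state; each node adds its own keys.
    state = {
        "company_name": "Zoom",
        "company_website": "https://zoom.us",
        "product_service": "AI-driven sales analytics platform",
        "errors": []
    }

    state = goal_node(state)
    assert "goal" in state

    state = planning_node(state)
    assert "plan" in state

    state = research_node(state)
    assert "company_profile" in state

    state = analyze_node(state)
    assert "fit_assessment" in state

    state = outreach_plan_node(state)
    assert "outreach_plan" in state

    state = report_node(state)
    assert "research_report" in state

    print("✅ All nodes passed smoke test!")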
```

### Unit Tests
- Test web search utilities against mocked API responses (no live calls in unit tests)
- Test research parser with sample search results
- Test fit scoring algorithm with known inputs
- Test outreach message generation (personalization logic)

### Integration Tests
- Test full workflow with real company names (Zoom, Notion, Stripe)
- Test error handling (API failures, no results, etc.)
- Validate output reports (markdown format, completeness)

---

## Key Challenges & Solutions

### Challenge 1: Extracting Structured Data from Web Search
**Problem:** Web search returns unstructured text, need to extract structured company profile.

**Solution:**
- Use LLM to extract structured data from search results
- Define clear extraction schema (company profile, pain points, buying signals)
- Use Pydantic schemas for validation
- Retry logic if extraction fails

### Challenge 2: Generating Personalized Messages
**Problem:** Need truly personalized messages, not generic templates.

**Solution:**
- Use LLM with company research context
- Include specific pain points and buying signals in prompt
- Generate company-specific hooks (reference recent news, funding)
- Validate personalization (check for company name, specific references)
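
The validation step in the last bullet can start as a simple substring check. This is a deliberately naive sketch; the function name and the shape of the returned dictionary are assumptions.

```python
from typing import Any, Dict, List


def validate_personalization(
    message: str,
    company_name: str,
    references: List[str],
) -> Dict[str, Any]:
    """Check that a draft mentions the company and at least one specific hook."""
    lowered = message.lower()
    has_company = company_name.lower() in lowered
    # References are research-derived phrases, e.g. a funding round or news item.
    matched = [ref for ref in references if ref.lower() in lowered]
    return {
        "has_company_name": has_company,
        "matched_references": matched,
        "is_personalized": has_company and bool(matched),
    }
```

A check like this will not judge message quality, but it does catch drafts that fell back to a generic template.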

### Challenge 3: Mock Decision-Maker Data
**Problem:** Need realistic but clearly synthetic decision-makers.

**Solution (MVP):**
- Use **fixed template** (not LLM-generated)
- Template: ["VP of Sales", "Director of Operations", "VP of Engineering"]
- Generate 1-3 decision-makers using template
- Mark clearly as "PLACEHOLDER" in output
- **Phase 2:** Replace with LLM-generated or real APIs

### Challenge 4: ICP Fit Scoring
**Problem:** Need objective fit scoring criteria.

**Solution:**
- Define clear ICP criteria upfront
- Use deterministic scoring algorithm
- Keep criteria fixed for the MVP; make them configurable (via goal or config) in Phase 2
- Document scoring rationale in report

---

## Success Criteria

### MVP Success Criteria
- ✅ Successfully researches real companies using web search
- ✅ Extracts structured company profile, pain points, buying signals
- ✅ Calculates ICP fit score
- ✅ Generates personalized outreach messages
- ✅ Creates comprehensive lead research reports
- ✅ Handles API failures gracefully (continues with partial data)

### Quality Gates
- Smoke test passes (all nodes execute in sequence)
- Unit tests pass (web search, research parser, fit scorer)
- Integration test passes (full workflow with real company)
- Reports render correctly from templates
- Personalization validated (messages contain company-specific elements)
- Error handling works for common failure modes

---

## Future Enhancements (Not MVP)

1. **Conditional Routing:** Route based on fit score (high fit → proposal, low fit → nurture)
2. **Batch Processing:** Support CSV input with multiple companies
3. **Real Contact Enrichment:** Integrate Apollo, Clearbit, or Clay APIs
4. **Response Handling:** Analyze email/LinkedIn responses and route accordingly
5. **CRM Integration:** Sync with HubSpot, Salesforce
6. **Follow-up Sequences:** Automated follow-up tracking and sending
7. **Lead Scoring:** More sophisticated scoring with ML models
8. **Multi-Channel Outreach:** Actual email/LinkedIn sending

---

## Decisions Made ✅

1. **Input Format:** ✅ Single company name (Target for MVP testing)
2. **Web Search API:** ✅ Tavily (user has API key)
3. **Decision-Maker Generation:** ✅ Fixed template (not LLM-generated for MVP)
4. **ICP Criteria:** ✅ Fixed defaults (not configurable for MVP)
5. **Message Personalization:** ✅ Dummy data initially, replace incrementally

**Development Strategy:**
- Phase 1: MVP with templates/defaults/dummy data → Get orchestration working
- Phase 2: Replace dummy data incrementally → Test → Debug → Isolate → Next section

---

## Next Steps ✅

1. ✅ Review and refine scaffold plan
2. ✅ Create `PROJECT_REQUIREMENTS.md` entry
3. ⏭️ Begin implementation following the implementation order:
   - Create state schema in `config.py`
   - Implement nodes (goal → planning → research → analyze → outreach → report)
   - Create smoke test runner
   - Wire nodes into LangGraph
4. ⏭️ Test with Target company
5. ⏭️ Iterate on quality improvements (incremental approach)

---

*End of Scaffold Plan*





## State schema created

### `SalesOrchestratorState` TypedDict
Includes fields from the scaffold plan:

**Input fields:**
- `company_name`, `company_website`, `product_service`

**Goal & Planning:**
- `goal`, `plan`

**Research Data:**
- `company_research` (raw web search results)
- `research_sources` (list of sources)

**Analysis Results:**
- `company_profile` (structured profile)
- `pain_points` (list of challenges)
- `buying_signals` (list of signals)
- `fit_assessment` (ICP fit score, reasons, priority)
- `decision_makers` (template-based for MVP)

**Outreach Planning:**
- `outreach_plan` (strategy, messages, personalization)

**Output:**
- `research_report`, `outreach_plan_markdown`, `report_file_paths`

**Metadata:**
- `errors`, `processing_time`

### `SalesOrchestratorConfig` dataclass
- LLM settings (model, temperature)
- Tavily API key (from env)
- Output directory (`sales_reports/`)
- Fixed ICP criteria defaults (MVP)

### Features
- `TypedDict` declared with `total=False`, so every key is optional and nodes can populate state incrementally
- Inline documentation for each field
- Structure examples in comments
- Mutable dataclass defaults set via `field(default_factory=...)`
- Preserves existing `ArticleSummaryState` for backward compatibility
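
To make the effect of `total=False` concrete, here is a small sketch. `MiniState` is a hypothetical, trimmed-down stand-in for `SalesOrchestratorState`, used only to keep the example self-contained.

```python
from typing import Any, Dict, List, TypedDict


class MiniState(TypedDict, total=False):
    """Trimmed-down stand-in for SalesOrchestratorState (illustration only)."""
    company_name: str
    goal: Dict[str, Any]
    errors: List[str]


# With total=False, a partial dict is a valid MiniState for type checkers.
state: MiniState = {"company_name": "Target"}

# Keys are absent until a node writes them, so read with .get() and a default.
goal = state.get("goal", {})
state.setdefault("errors", []).append("example error")

print(goal, state["errors"])
```

At runtime a `TypedDict` is a plain `dict`, so the optional keys are enforced only by static type checkers. That is why the nodes in this notebook read state defensively with `.get()`.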



In [None]:
"""Configuration and state schema for AI Agents"""

from typing import TypedDict, Optional, List, Dict, Any
from dataclasses import dataclass, field
from dotenv import load_dotenv
from pathlib import Path
import os

# Load environment variables from API_KEYS.env file
env_path = Path(__file__).parent / "API_KEYS.env"
load_dotenv(dotenv_path=env_path)


# ============================================================================
# Article Summarization Agent (Existing)
# ============================================================================

class ArticleSummaryState(TypedDict, total=False):
    """State for article summarization agent"""

    # Input fields
    article_path: str                    # Path to article file

    # Goal & Planning fields (MVP: Fixed goal, template-based plan)
    goal: Dict[str, Any]                 # Fixed goal definition
    plan: List[Dict[str, Any]]           # Execution plan

    # Article fields
    article_content: str                 # Full article text
    article_title: Optional[str]         # Extracted title
    article_author: Optional[str]        # Extracted author
    article_date: Optional[str]          # Extracted date

    # Processing fields
    extracted_sections: Dict[str, Any]  # Structured insights from LLM

    # Output fields
    summary_markdown: str                # Final formatted output
    summary_file_path: Optional[str]     # Path to saved summary file

    # Metadata
    errors: List[str]                    # Any errors encountered
    processing_time: Optional[float]     # Time taken to process


# ============================================================================
# B2B Sales Orchestrator Agent
# ============================================================================

class SalesOrchestratorState(TypedDict, total=False):
    """State for B2B sales orchestrator agent"""

    # Input fields
    company_name: str                   # Company name to research (e.g., "Target")
    company_website: Optional[str]       # Optional website URL
    product_service: Optional[str]       # Optional description of what we're selling

    # Goal & Planning fields (MVP: Fixed goal, template-based plan)
    goal: Dict[str, Any]                 # Research and outreach planning goal definition
    plan: List[Dict[str, Any]]           # Execution plan

    # Research Data
    company_research: Dict[str, Any]     # Raw research data from web search
    # Structure:
    # {
    #   "company_overview": {...},
    #   "recent_news": [...],
    #   "technology_stack": [...],
    #   "job_postings": [...],
    #   "industry_trends": [...]
    # }

    research_sources: List[str]          # List of sources used for research

    # Analysis Results
    company_profile: Dict[str, Any]       # Structured company profile
    # Structure:
    # {
    #   "industry": str,
    #   "size": str,  # e.g., "500-1000 employees"
    #   "revenue_estimate": str,  # e.g., "$1B-$10B"
    #   "growth_stage": str,  # e.g., "Established", "Growth", "Startup"
    #   "technology_alignment": Optional[str]
    # }

    pain_points: List[str]               # Extracted pain points and challenges

    buying_signals: List[Dict[str, Any]] # Identified buying signals
    # Structure:
    # [
    #   {"type": "funding", "description": "...", "date": "..."},
    #   {"type": "hiring", "description": "...", "date": "..."},
    #   ...
    # ]

    fit_assessment: Dict[str, Any]       # ICP fit analysis
    # Structure:
    # {
    #   "fit_score": int,  # 0-100
    #   "fit_reasons": List[str],
    #   "priority_level": str,  # "high", "medium", "low"
    #   "icp_criteria_met": Dict[str, bool]
    # }

    decision_makers: List[Dict[str, Any]] # Mock/synthetic decision-maker data (MVP: template-based)
    # Structure:
    # [
    #   {
    #     "name": str,  # "PLACEHOLDER - John Smith"
    #     "title": str,  # "VP of Sales"
    #     "linkedin_placeholder": str,  # "linkedin.com/in/placeholder"
    #     "role": str,  # "Decision Maker" or "Influencer"
    #     "contact_preference": str  # "email" or "linkedin"
    #   },
    #   ...
    # ]

    # Outreach Planning
    outreach_plan: Dict[str, Any]        # Personalized outreach strategy
    # Structure:
    # {
    #   "target_contact": Dict[str, Any],  # Selected decision-maker
    #   "channel": str,  # "email" or "linkedin"
    #   "timing": str,  # "Tuesday 10am EST"
    #   "value_proposition": str,  # Personalized value prop
    #   "message_drafts": {
    #     "initial": str,
    #     "follow_up_1": str,
    #     "follow_up_2": str
    #   },
    #   "personalization_elements": List[str]  # Company-specific hooks
    # }

    # Output fields
    research_report: str                 # Generated lead research report (markdown)
    outreach_plan_markdown: str          # Formatted outreach plan (markdown)
    report_file_paths: Dict[str, str]   # Paths to saved report files
    # Structure:
    # {
    #   "research_report": "path/to/research_report.md",
    #   "outreach_plan": "path/to/outreach_plan.md"
    # }

    # Metadata
    errors: List[str]                    # Any errors encountered
    processing_time: Optional[float]    # Time taken to process


# ============================================================================
# Agent Configuration Classes
# ============================================================================

@dataclass
class AgentConfig:
    """Configuration for Article Summarization Agent"""
    llm_model: str = os.getenv("LLM_MODEL", "gpt-4o-mini")
    temperature: float = 0.3
    articles_dir: str = "articles"
    summaries_dir: str = "article_summaries"  # Where to save summaries
    template_path: str = "articles/_Article_Summarization_Template copy.txt"


@dataclass
class SalesOrchestratorConfig:
    """Configuration for B2B Sales Orchestrator Agent"""
    llm_model: str = os.getenv("LLM_MODEL", "gpt-4o-mini")
    temperature: float = 0.3
    tavily_api_key: str = os.getenv("TAVILY_API_KEY", "")
    sales_reports_dir: str = "sales_reports"  # Where to save reports

    # ICP Scoring Defaults (MVP: Fixed)
    icp_criteria: Dict[str, Any] = field(default_factory=lambda: {
        "company_size_min": 100,
        "company_size_max": 1000,
        "preferred_industries": ["Retail", "Technology", "SaaS"],
        "growth_stages": ["Established", "Growth"],
        "scoring_weights": {
            "company_size": 20,
            "industry": 20,
            "growth_stage": 15,
            "technology_alignment": 20,
            "pain_points": 25
        }
    })
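
The `field(default_factory=...)` pattern matters because a dataclass rejects a plain mutable default such as `= {...}` with a `ValueError` at class-definition time. The sketch below shows the behavior the factory gives instead; `DemoConfig` is a hypothetical stand-in for `SalesOrchestratorConfig` with only two of its ICP keys.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class DemoConfig:
    """Hypothetical stand-in for SalesOrchestratorConfig."""
    icp_criteria: Dict[str, Any] = field(
        default_factory=lambda: {"company_size_min": 100, "company_size_max": 1000}
    )


a = DemoConfig()
b = DemoConfig()
a.icp_criteria["company_size_max"] = 5000  # mutate one instance only

print(b.icp_criteria["company_size_max"])  # 1000
```

Each instance gets its own dict because the factory runs once per construction, so changing the criteria on one config object does not affect another.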



# Goal Node

In [None]:
"""Goal node - Define research and outreach planning goal"""

import logging
from typing import Dict, Any
from config import SalesOrchestratorState

logger = logging.getLogger(__name__)


def goal_node(state: SalesOrchestratorState) -> SalesOrchestratorState:
    """Define research and outreach planning goal

    Creates a fixed goal structure with:
    - Research objectives (what to find out about the company)
    - Outreach objectives (what to create for outreach)
    - Product/service context (for personalization)
    - ICP criteria (ideal customer profile match criteria)
    """
    logger.info("🎯 Defining research and outreach planning goal...")

    try:
        company_name = state.get("company_name", "")
        company_website = state.get("company_website")
        product_service = state.get("product_service")

        # Define research objectives
        research_objectives = [
            "Understand company background (industry, size, revenue, growth stage)",
            "Identify pain points and challenges from recent news/articles",
            "Find buying signals (funding, hiring, expansion, technology adoption)",
            "Assess technology stack and alignment (if available)",
            "Research industry trends and market context"
        ]

        # Define outreach objectives
        outreach_objectives = [
            "Create personalized outreach strategy",
            "Generate message drafts (initial + follow-up sequence)",
            "Select optimal channel (email vs LinkedIn)",
            "Determine best timing for outreach",
            "Develop company-specific value proposition"
        ]

        # Build goal structure
        goal: Dict[str, Any] = {
            "objective": f"Research {company_name} and create personalized outreach plan",
            "company_name": company_name,
            "company_website": company_website,
            "product_service": product_service or "Not specified",
            "research_objectives": research_objectives,
            "outreach_objectives": outreach_objectives,
            "icp_criteria": {
                "company_size_min": 100,
                "company_size_max": 1000,
                "preferred_industries": ["Retail", "Technology", "SaaS"],
                "growth_stages": ["Established", "Growth"],
                "scoring_weights": {
                    "company_size": 20,
                    "industry": 20,
                    "growth_stage": 15,
                    "technology_alignment": 20,
                    "pain_points": 25
                }
            }
        }

        state["goal"] = goal
        logger.info(f"✅ Goal defined for {company_name}")

    except Exception as e:
        logger.error(f"Error in goal_node: {e}")
        if "errors" not in state:
            state["errors"] = []
        state["errors"].append(f"Goal definition failed: {str(e)}")

    return state



# Planning Node

In [None]:
"""Planning node - Create research and outreach planning execution plan"""

import logging
from typing import List, Dict, Any
from config import SalesOrchestratorState

logger = logging.getLogger(__name__)


def planning_node(state: SalesOrchestratorState) -> SalesOrchestratorState:
    """Create execution plan for research and outreach

    Creates a template-based plan with:
    - Research steps (company overview, pain points, buying signals)
    - Outreach planning steps (personalization, message crafting, channel selection)
    """
    logger.info("📋 Creating execution plan...")

    try:
        goal = state.get("goal", {})

        # Create execution plan based on goal objectives
        plan: List[Dict[str, Any]] = [
            {
                "step": 1,
                "action": "Define research and outreach planning goal",
                "node": "goal",
                "status": "completed"
            },
            {
                "step": 2,
                "action": "Create execution plan",
                "node": "planning",
                "status": "in_progress"
            },
            {
                "step": 3,
                "action": "Research company using web search (Tavily)",
                "node": "research",
                "research_focus": [
                    "Company overview (industry, size, revenue, growth stage)",
                    "Recent news/articles (pain points, challenges)",
                    "Buying signals (funding, hiring, expansion)",
                    "Technology stack (if available)",
                    "Industry trends"
                ]
            },
            {
                "step": 4,
                "action": "Analyze company fit and generate decision-makers",
                "node": "analyze",
                "analysis_focus": [
                    "Calculate ICP fit score using fixed criteria",
                    "Generate decision-makers using template (MVP)",
                    "Assess fit reasons and priority level"
                ]
            },
            {
                "step": 5,
                "action": "Generate personalized outreach plan",
                "node": "outreach_plan",
                "outreach_focus": [
                    "Create outreach strategy (channel, timing)",
                    "Generate message drafts (initial + follow-ups)",
                    "Develop value proposition (dummy data for MVP)"
                ]
            },
            {
                "step": 6,
                "action": "Generate lead research report and outreach plan",
                "node": "report",
                "report_focus": [
                    "Format research report using Jinja2 template",
                    "Format outreach plan using Jinja2 template",
                    "Save reports to sales_reports/ directory"
                ]
            }
        ]

        state["plan"] = plan
        logger.info("✅ Execution plan created")

    except Exception as e:
        logger.error(f"Error in planning_node: {e}")
        if "errors" not in state:
            state["errors"] = []
        state["errors"].append(f"Planning failed: {str(e)}")

    return state
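
Each node repeats the same `try`/`except` block that logs the failure and appends a message to `state["errors"]`. One possible way to factor that out is a decorator, sketched below. `safe_node` and the `State` alias are hypothetical and not part of the notebook's code; the demo node exists only to trigger the error path.

```python
import functools
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger(__name__)

State = Dict[str, Any]  # stand-in for SalesOrchestratorState


def safe_node(label: str) -> Callable[[Callable[[State], State]], Callable[[State], State]]:
    """Wrap a node so exceptions are logged and recorded in state['errors']."""
    def decorator(node: Callable[[State], State]) -> Callable[[State], State]:
        @functools.wraps(node)
        def wrapper(state: State) -> State:
            try:
                return node(state)
            except Exception as exc:
                logger.error("Error in %s: %s", node.__name__, exc)
                state.setdefault("errors", []).append(f"{label} failed: {exc}")
                return state  # keep the pipeline going with partial data
        return wrapper
    return decorator


@safe_node("Demo step")
def failing_node(state: State) -> State:
    raise ValueError("boom")


result = failing_node({"company_name": "Target"})
print(result["errors"])  # ['Demo step failed: boom']
```

Nodes that need to set fallback values on failure, as `analyze_node` does, would still need their own handling, so this is a simplification rather than a drop-in replacement.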



# Smoke Test

In [None]:
"""Smoke test runner - Test nodes manually in sequence before LangGraph wiring"""

import logging
from config import SalesOrchestratorState
from nodes import goal_node, planning_node
# from nodes import research_node, analyze_node, outreach_plan_node, report_node

# Set up logging for visibility
logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')


def test_linear_flow():
    """Test nodes manually in sequence before LangGraph wiring"""
    print("=" * 60)
    print("🧪 Smoke Test: B2B Sales Orchestrator Agent")
    print("=" * 60)

    # Start with minimal state
    state: SalesOrchestratorState = {
        "company_name": "Target",
        "company_website": "https://target.com",
        "product_service": "AI-driven sales analytics platform",
        "errors": []
    }

    print("\n📥 Initial State:")
    print(f"  Company: {state['company_name']}")
    print(f"  Website: {state.get('company_website', 'N/A')}")
    print(f"  Product: {state.get('product_service', 'N/A')}")

    # Execute nodes in sequence
    print("\n" + "-" * 60)
    print("Testing goal node...")
    print("-" * 60)
    state = goal_node(state)
    assert "goal" in state, "Goal node should add 'goal' to state"
    assert state["goal"]["company_name"] == "Target", "Goal should contain company name"
    print("✅ Goal node passed")
    print(f"   Goal objective: {state['goal']['objective']}")

    print("\n" + "-" * 60)
    print("Testing planning node...")
    print("-" * 60)
    state = planning_node(state)
    assert "plan" in state, "Planning node should add 'plan' to state"
    assert len(state["plan"]) == 6, "Plan should have 6 steps"
    print("✅ Planning node passed")
    print(f"   Plan has {len(state['plan'])} steps")

    # TODO: Uncomment when remaining nodes are implemented
    # print("\n" + "-" * 60)
    # print("Testing research node...")
    # print("-" * 60)
    # state = research_node(state)
    # assert "company_profile" in state, "Research node should add 'company_profile' to state"
    # print("✅ Research node passed")
    #
    # print("\n" + "-" * 60)
    # print("Testing analyze node...")
    # print("-" * 60)
    # state = analyze_node(state)
    # assert "fit_assessment" in state, "Analyze node should add 'fit_assessment' to state"
    # print("✅ Analyze node passed")
    #
    # print("\n" + "-" * 60)
    # print("Testing outreach plan node...")
    # print("-" * 60)
    # state = outreach_plan_node(state)
    # assert "outreach_plan" in state, "Outreach plan node should add 'outreach_plan' to state"
    # print("✅ Outreach plan node passed")
    #
    # print("\n" + "-" * 60)
    # print("Testing report node...")
    # print("-" * 60)
    # state = report_node(state)
    # assert "research_report" in state, "Report node should add 'research_report' to state"
    # print("✅ Report node passed")

    print("\n" + "=" * 60)
    print("✅ Smoke test completed successfully!")
    print("   Nodes tested: 2/6 (goal, planning)")
    print(f"   Errors: {len(state.get('errors', []))}")
    if state.get("errors"):
        print(f"   Error details: {state['errors']}")
    print("=" * 60)

    return state

if __name__ == "__main__":
    test_linear_flow()



# Test Results

In [None]:
(.venv) micahshull@Micahs-iMac LG_Cursor_012 % python tests/test_mvp_runner.py
============================================================
🧪 Smoke Test: B2B Sales Orchestrator Agent
============================================================

📥 Initial State:
  Company: Target
  Website: https://target.com
  Product: AI-driven sales analytics platform

------------------------------------------------------------
Testing goal node...
------------------------------------------------------------
INFO: 🎯 Defining research and outreach planning goal...
INFO: ✅ Goal defined for Target
✅ Goal node passed
   Goal objective: Research Target and create personalized outreach plan

------------------------------------------------------------
Testing planning node...
------------------------------------------------------------
INFO: 📋 Creating execution plan...
INFO: ✅ Execution plan created
✅ Planning node passed
   Plan has 6 steps

============================================================
✅ Smoke test completed successfully!
   Nodes tested: 2/6 (goal, planning)
   Errors: 0
============================================================

# Analyze Node


In [None]:
"""Analyze node - Assess ICP fit and generate decision-makers"""

import logging
from typing import List, Dict, Any
from config import SalesOrchestratorState

logger = logging.getLogger(__name__)


def _calculate_fit_score(company_profile: Dict[str, Any], goal: Dict[str, Any], pain_points: List[str]) -> tuple[int, List[str], Dict[str, bool]]:
    """Calculate ICP fit score using fixed default criteria

    Returns:
        tuple: (fit_score (0-100), fit_reasons, icp_criteria_met)
    """
    icp_criteria = goal.get("icp_criteria", {})
    weights = icp_criteria.get("scoring_weights", {})

    fit_score = 0
    fit_reasons = []
    criteria_met = {}

    # Check company size (dummy check for MVP - will use real data later)
    company_size_score = weights.get("company_size", 20)
    # For MVP: assume medium fit if we have company data
    if company_profile:
        fit_score += company_size_score * 0.7  # 70% of points
        criteria_met["company_size"] = True
        fit_reasons.append("Company size within acceptable range")
    else:
        criteria_met["company_size"] = False

    # Check industry (dummy check for MVP)
    industry_score = weights.get("industry", 20)
    preferred_industries = icp_criteria.get("preferred_industries", [])
    company_industry = company_profile.get("industry", "").lower() if company_profile else ""

    if company_industry and any(ind.lower() in company_industry for ind in preferred_industries):
        fit_score += industry_score
        criteria_met["industry"] = True
        fit_reasons.append(f"Industry ({company_profile.get('industry', 'Unknown')}) aligns with ICP")
    elif company_industry:
        fit_score += industry_score * 0.5  # Partial match
        criteria_met["industry"] = False
        fit_reasons.append(f"Industry ({company_profile.get('industry', 'Unknown')}) partially aligns")
    else:
        criteria_met["industry"] = False

    # Check growth stage
    growth_stage_score = weights.get("growth_stage", 15)
    preferred_stages = icp_criteria.get("growth_stages", [])
    company_stage = company_profile.get("growth_stage", "") if company_profile else ""

    if company_stage in preferred_stages:
        fit_score += growth_stage_score
        criteria_met["growth_stage"] = True
        fit_reasons.append(f"Growth stage ({company_stage}) aligns with ICP")
    elif company_stage:
        fit_score += growth_stage_score * 0.5
        criteria_met["growth_stage"] = False
    else:
        criteria_met["growth_stage"] = False

    # Check technology alignment (dummy check for MVP)
    tech_score = weights.get("technology_alignment", 20)
    if company_profile and company_profile.get("technology_alignment"):
        fit_score += tech_score * 0.8  # Assume good alignment
        criteria_met["technology_alignment"] = True
        fit_reasons.append("Technology stack shows alignment")
    else:
        criteria_met["technology_alignment"] = False

    # Check pain points (dummy check for MVP)
    pain_points_score = weights.get("pain_points", 25)
    if pain_points:
        fit_score += pain_points_score
        criteria_met["pain_points"] = True
        fit_reasons.append(f"Identified {len(pain_points)} relevant pain points")
    else:
        criteria_met["pain_points"] = False

    # Ensure score is between 0-100
    fit_score = max(0, min(100, int(fit_score)))

    return fit_score, fit_reasons, criteria_met


def _generate_decision_makers(company_name: str) -> List[Dict[str, Any]]:
    """Generate decision-makers using fixed template (MVP)

    Returns:
        List of decision-maker dictionaries with placeholder data
    """
    # Fixed template for MVP
    template_titles = [
        "VP of Sales",
        "Director of Operations",
        "VP of Engineering"
    ]

    decision_makers = []
    for i, title in enumerate(template_titles[:3], 1):  # Generate up to 3
        decision_maker = {
            "name": f"PLACEHOLDER - {title.split()[-1]} {i}",  # e.g., "PLACEHOLDER - Sales 1"
            "title": title,
            "linkedin_placeholder": f"linkedin.com/in/placeholder-{company_name.lower().replace(' ', '-')}-{i}",
            "role": "Decision Maker" if "VP" in title else "Influencer",
            "contact_preference": "linkedin" if "VP" in title else "email"
        }
        decision_makers.append(decision_maker)

    return decision_makers


def _determine_priority_level(fit_score: int) -> str:
    """Determine priority level based on fit score"""
    if fit_score >= 70:
        return "high"
    elif fit_score >= 40:
        return "medium"
    else:
        return "low"


def analyze_node(state: SalesOrchestratorState) -> SalesOrchestratorState:
    """Analyze company fit and generate decision-makers

    Calculates ICP fit score using fixed default criteria and generates
    decision-makers using a fixed template (MVP approach).
    """
    logger.info("🔍 Analyzing company fit and generating decision-makers...")

    try:
        company_profile = state.get("company_profile", {})
        pain_points = state.get("pain_points", [])
        buying_signals = state.get("buying_signals", [])
        goal = state.get("goal", {})
        company_name = state.get("company_name", "Company")

        # Calculate fit score
        fit_score, fit_reasons, criteria_met = _calculate_fit_score(company_profile, goal, pain_points)

        # Determine priority level
        priority_level = _determine_priority_level(fit_score)

        # Build fit assessment
        fit_assessment: Dict[str, Any] = {
            "fit_score": fit_score,
            "fit_reasons": fit_reasons if fit_reasons else ["Insufficient data for full assessment"],
            "priority_level": priority_level,
            "icp_criteria_met": criteria_met
        }

        # Generate decision-makers using template
        decision_makers = _generate_decision_makers(company_name)

        # Update state
        state["fit_assessment"] = fit_assessment
        state["decision_makers"] = decision_makers

        logger.info(f"✅ Fit analysis complete: Score {fit_score}/100, Priority: {priority_level}")
        logger.info(f"✅ Generated {len(decision_makers)} decision-makers (template-based)")

    except Exception as e:
        logger.error(f"Error in analyze_node: {e}")
        if "errors" not in state:
            state["errors"] = []
        state["errors"].append(f"Fit analysis failed: {str(e)}")
        # Set defaults on error
        state["fit_assessment"] = {
            "fit_score": 0,
            "fit_reasons": ["Analysis failed"],
            "priority_level": "low",
            "icp_criteria_met": {}
        }
        state["decision_makers"] = []

    return state
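
As a worked example of the scoring arithmetic, the sketch below reproduces the calculation in `_calculate_fit_score` for one assumed set of inputs: a non-empty profile whose industry and growth stage match the ICP, no `technology_alignment` value, and at least one pain point. The inputs are an assumption chosen for illustration; the weights and multipliers are the ones in the code above.

```python
# Weights from the fixed ICP criteria.
weights = {
    "company_size": 20,
    "industry": 20,
    "growth_stage": 15,
    "technology_alignment": 20,
    "pain_points": 25,
}

# Contribution of each criterion under the assumed inputs.
contributions = {
    "company_size": weights["company_size"] * 0.7,  # profile present: 70% credit
    "industry": weights["industry"],                # preferred industry: full credit
    "growth_stage": weights["growth_stage"],        # preferred stage: full credit
    "technology_alignment": 0,                      # field missing: no credit
    "pain_points": weights["pain_points"],          # at least one pain point: full credit
}

# Same clamping and thresholds as the node's helpers.
fit_score = max(0, min(100, int(sum(contributions.values()))))
priority = "high" if fit_score >= 70 else "medium" if fit_score >= 40 else "low"
print(fit_score, priority)  # 74 high
```

The total is 14 + 20 + 15 + 0 + 25 = 74, which clears the 70-point cutoff for high priority. Note that `int()` truncates, so fractional totals from partial-credit branches round down.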



# Test Results

In [None]:
(.venv) micahshull@Micahs-iMac LG_Cursor_012 % python tests/test_mvp_runner.py
============================================================
🧪 Smoke Test: B2B Sales Orchestrator Agent
============================================================

📥 Initial State:
  Company: Target
  Website: https://target.com
  Product: AI-driven sales analytics platform

------------------------------------------------------------
Testing goal node...
------------------------------------------------------------
INFO: 🎯 Defining research and outreach planning goal...
INFO: ✅ Goal defined for Target
✅ Goal node passed
   Goal objective: Research Target and create personalized outreach plan

------------------------------------------------------------
Testing planning node...
------------------------------------------------------------
INFO: 📋 Creating execution plan...
INFO: ✅ Execution plan created
✅ Planning node passed
   Plan has 6 steps

------------------------------------------------------------
Testing analyze node...
------------------------------------------------------------
INFO: 🔍 Analyzing company fit and generating decision-makers...
INFO: ✅ Fit analysis complete: Score 74/100, Priority: high
INFO: ✅ Generated 3 decision-makers (template-based)
✅ Analyze node passed
   Fit score: 74/100
   Priority: high
   Decision-makers: 3

============================================================
✅ Smoke test completed successfully!
   Nodes tested: 3/6 (goal, planning, analyze)
   Errors: 0
============================================================

# Outreach Plan Node

In [None]:
"""Outreach plan node - Generate personalized outreach plan"""

import logging
from typing import Dict, Any, List
from config import SalesOrchestratorState

logger = logging.getLogger(__name__)


def _select_target_contact(decision_makers: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Select primary target contact from decision-makers

    MVP: Select first decision-maker (prefer VP-level)
    """
    if not decision_makers:
        # Return default if no decision-makers
        return {
            "name": "PLACEHOLDER - Contact",
            "title": "VP of Sales",
            "role": "Decision Maker",
            "contact_preference": "linkedin"
        }

    # Prefer VP-level contacts
    for dm in decision_makers:
        if "VP" in dm.get("title", ""):
            return dm

    # Otherwise return first
    return decision_makers[0]


def _determine_channel(contact: Dict[str, Any]) -> str:
    """Determine best channel based on contact preference"""
    preference = contact.get("contact_preference", "linkedin")
    return preference if preference in ["email", "linkedin"] else "linkedin"


def _generate_timing_recommendation() -> str:
    """Generate timing recommendation (dummy for MVP)"""
    # MVP: Fixed recommendation
    return "Tuesday 10am EST"


def _generate_value_proposition(company_name: str, product_service: str) -> str:
    """Generate value proposition (template-based for MVP)"""
    product = product_service or "our solution"
    return f"Help {company_name} improve operational efficiency and customer experience with {product}"


def _generate_message_drafts(company_name: str, contact: Dict[str, Any],
                            value_proposition: str, product_service: str) -> Dict[str, str]:
    """Generate message drafts using templates (MVP: dummy data)"""

    contact_name = contact.get("name", "there").replace("PLACEHOLDER - ", "")
    product = product_service or "our AI-driven sales analytics platform"

    # Initial message template
    initial_message = f"""Hi {contact_name},

I'm reaching out because, as {contact.get('title', 'a leader')} at {company_name}, you might be interested in how {product} can help companies like yours improve operational efficiency.

{value_proposition}

Would you be open to a brief 15-minute conversation to explore how this could benefit {company_name}?

Best regards"""

    # Follow-up 1 (Day 3)
    follow_up_1 = f"""Hi {contact_name},

I wanted to follow up on my previous message about {product} for {company_name}.

I believe this could be particularly valuable for addressing operational challenges and improving customer experience.

Would you be available for a quick call this week?

Best regards"""

    # Follow-up 2 (Day 7)
    follow_up_2 = f"""Hi {contact_name},

Last attempt to connect - I wanted to share that {product} has helped similar companies in your industry achieve significant improvements.

If this isn't the right time, no worries. I'll touch base again in a few months.

Best regards"""

    return {
        "initial": initial_message,
        "follow_up_1": follow_up_1,
        "follow_up_2": follow_up_2
    }


def _generate_personalization_elements(company_name: str, company_profile: Dict[str, Any],
                                       pain_points: List[str], buying_signals: List[Dict[str, Any]]) -> List[str]:
    """Generate personalization elements (dummy for MVP)"""
    elements = []

    # Company name
    elements.append(f"Company: {company_name}")

    # Industry (if available)
    if company_profile.get("industry"):
        elements.append(f"Industry: {company_profile['industry']}")

    # Pain points (if available)
    if pain_points:
        elements.append(f"Pain points: {len(pain_points)} identified")

    # Buying signals (if available)
    if buying_signals:
        elements.append(f"Buying signals: {len(buying_signals)} identified")

    # Note: MVP uses dummy/template data
    if not elements:
        elements.append("Template-based personalization (MVP)")

    return elements


def outreach_plan_node(state: SalesOrchestratorState) -> SalesOrchestratorState:
    """Generate personalized outreach plan with message drafts

    Creates outreach strategy using template-based approach with dummy data for MVP.
    In Phase 2, this will be replaced with LLM-powered personalization.
    """
    logger.info("📧 Generating personalized outreach plan...")

    try:
        company_name = state.get("company_name", "Company")
        company_profile = state.get("company_profile", {})
        pain_points = state.get("pain_points", [])
        buying_signals = state.get("buying_signals", [])
        decision_makers = state.get("decision_makers", [])
        product_service = state.get("product_service", "AI-driven sales analytics platform")
        goal = state.get("goal", {})

        # Select target contact
        target_contact = _select_target_contact(decision_makers)

        # Determine channel
        channel = _determine_channel(target_contact)

        # Generate timing recommendation
        timing = _generate_timing_recommendation()

        # Generate value proposition
        value_proposition = _generate_value_proposition(company_name, product_service)

        # Generate message drafts
        message_drafts = _generate_message_drafts(
            company_name, target_contact, value_proposition, product_service
        )

        # Generate personalization elements
        personalization_elements = _generate_personalization_elements(
            company_name, company_profile, pain_points, buying_signals
        )

        # Build outreach plan
        outreach_plan: Dict[str, Any] = {
            "target_contact": target_contact,
            "channel": channel,
            "timing": timing,
            "value_proposition": value_proposition,
            "message_drafts": message_drafts,
            "personalization_elements": personalization_elements
        }

        state["outreach_plan"] = outreach_plan

        logger.info("✅ Outreach plan generated")
        logger.info(f"   Target: {target_contact.get('name', 'Unknown')}")
        logger.info(f"   Channel: {channel}")
        logger.info(f"   Messages: {len(message_drafts)} drafts")

    except Exception as e:
        logger.error(f"Error in outreach_plan_node: {e}")
        if "errors" not in state:
            state["errors"] = []
        state["errors"].append(f"Outreach planning failed: {str(e)}")
        # Set defaults on error
        state["outreach_plan"] = {
            "target_contact": {},
            "channel": "email",
            "timing": "TBD",
            "value_proposition": "Not generated",
            "message_drafts": {},
            "personalization_elements": []
        }

    return state
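
Each node in this notebook repeats the same few lines to record a failure in `state["errors"]`. A minimal sketch of how that could be factored out, assuming a hypothetical helper named `record_error` (not part of the original project) and a plain dict standing in for `SalesOrchestratorState`:

```python
from typing import Any, Dict


def record_error(state: Dict[str, Any], message: str) -> Dict[str, Any]:
    """Append a message to state["errors"], creating the list on first use.

    Hypothetical helper; the original nodes inline this logic instead.
    """
    state.setdefault("errors", []).append(message)
    return state


# Plain dict used here in place of SalesOrchestratorState
state: Dict[str, Any] = {"company_name": "Target"}
record_error(state, "Outreach planning failed: example")
record_error(state, "Report generation failed: example")
print(state["errors"])
```

Inside an `except` block, a node would then call `record_error(state, f"Outreach planning failed: {e}")` before setting its defaults.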



# Test Results

In [None]:
(.venv) micahshull@Micahs-iMac LG_Cursor_012 % python tests/test_mvp_runner.py
============================================================
🧪 Smoke Test: B2B Sales Orchestrator Agent
============================================================

📥 Initial State:
  Company: Target
  Website: https://target.com
  Product: AI-driven sales analytics platform

------------------------------------------------------------
Testing goal node...
------------------------------------------------------------
INFO: 🎯 Defining research and outreach planning goal...
INFO: ✅ Goal defined for Target
✅ Goal node passed
   Goal objective: Research Target and create personalized outreach plan

------------------------------------------------------------
Testing planning node...
------------------------------------------------------------
INFO: 📋 Creating execution plan...
INFO: ✅ Execution plan created
✅ Planning node passed
   Plan has 6 steps

------------------------------------------------------------
Testing analyze node...
------------------------------------------------------------
INFO: 🔍 Analyzing company fit and generating decision-makers...
INFO: ✅ Fit analysis complete: Score 74/100, Priority: high
INFO: ✅ Generated 3 decision-makers (template-based)
✅ Analyze node passed
   Fit score: 74/100
   Priority: high
   Decision-makers: 3

------------------------------------------------------------
Testing outreach plan node...
------------------------------------------------------------
INFO: 📧 Generating personalized outreach plan...
INFO: ✅ Outreach plan generated
INFO:    Target: PLACEHOLDER - Sales 1
INFO:    Channel: linkedin
INFO:    Messages: 3 drafts
✅ Outreach plan node passed
   Target: PLACEHOLDER - Sales 1
   Channel: linkedin
   Messages: 3 drafts

============================================================
✅ Smoke test completed successfully!
   Nodes tested: 4/6 (goal, planning, analyze, outreach_plan)
   Errors: 0
============================================================
(.venv) micahshull@Micahs-iMac LG_Cursor_012 %

# Report Node

In [None]:
"""Report node - Generate lead research report and outreach plan"""

import logging
from datetime import datetime
from pathlib import Path
from typing import Dict, Any
from jinja2 import Environment, FileSystemLoader
from config import SalesOrchestratorState, SalesOrchestratorConfig

logger = logging.getLogger(__name__)


def _get_template_environment() -> Environment:
    """Get Jinja2 template environment with FileSystemLoader

    Uses absolute path to templates directory as per development guide.
    """
    # Get absolute path to templates directory
    project_root = Path(__file__).parent.parent
    templates_dir = project_root / "templates"

    return Environment(
        loader=FileSystemLoader(str(templates_dir)),
        trim_blocks=True,
        lstrip_blocks=True
    )


def _generate_timestamp() -> str:
    """Generate formatted timestamp for reports"""
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")


def _generate_filename(company_name: str, report_type: str) -> str:
    """Generate filename for report

    Args:
        company_name: Company name
        report_type: 'research' or 'outreach'

    Returns:
        Filename string (e.g., 'lead_research_Target_20250101_120000.md')
    """
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    safe_name = company_name.replace(" ", "_").replace("/", "_")

    if report_type == "research":
        return f"lead_research_{safe_name}_{timestamp}.md"
    else:
        return f"outreach_plan_{safe_name}_{timestamp}.md"


def _save_report(content: str, filename: str, config: SalesOrchestratorConfig) -> Path:
    """Save report to file

    Returns:
        Path to saved file
    """
    project_root = Path(__file__).parent.parent
    reports_dir = project_root / config.sales_reports_dir

    # Create directory if it doesn't exist
    reports_dir.mkdir(parents=True, exist_ok=True)

    file_path = reports_dir / filename

    with open(file_path, "w", encoding="utf-8") as f:
        f.write(content)

    return file_path


def report_node(state: SalesOrchestratorState) -> SalesOrchestratorState:
    """Generate markdown reports (research report + outreach plan)

    Renders Jinja2 templates and saves reports to sales_reports/ directory.
    """
    logger.info("📄 Generating lead research report and outreach plan...")

    try:
        # Get config
        config = SalesOrchestratorConfig()

        # Get data from state
        company_name = state.get("company_name", "Company")
        company_website = state.get("company_website", "")
        company_profile = state.get("company_profile", {})
        pain_points = state.get("pain_points", [])
        buying_signals = state.get("buying_signals", [])
        fit_assessment = state.get("fit_assessment", {})
        decision_makers = state.get("decision_makers", [])
        outreach_plan = state.get("outreach_plan", {})
        research_sources = state.get("research_sources", [])

        # Generate timestamp
        timestamp = _generate_timestamp()

        # Get template environment
        env = _get_template_environment()

        # Render research report
        logger.info("Rendering research report template...")
        research_template = env.get_template("lead_research_report.md.j2")
        research_report = research_template.render(
            company_name=company_name,
            company_website=company_website,
            timestamp=timestamp,
            company_profile=company_profile,
            decision_makers=decision_makers,
            pain_points=pain_points,
            buying_signals=buying_signals,
            fit_assessment=fit_assessment,
            research_sources=research_sources
        )

        # Render outreach plan
        logger.info("Rendering outreach plan template...")
        outreach_template = env.get_template("outreach_plan.md.j2")
        outreach_plan_markdown = outreach_template.render(
            company_name=company_name,
            timestamp=timestamp,
            outreach_plan=outreach_plan
        )

        # Save reports to files
        research_filename = _generate_filename(company_name, "research")
        outreach_filename = _generate_filename(company_name, "outreach")

        logger.info(f"Saving research report to {research_filename}...")
        research_path = _save_report(research_report, research_filename, config)

        logger.info(f"Saving outreach plan to {outreach_filename}...")
        outreach_path = _save_report(outreach_plan_markdown, outreach_filename, config)

        # Update state
        state["research_report"] = research_report
        state["outreach_plan_markdown"] = outreach_plan_markdown
        state["report_file_paths"] = {
            "research_report": str(research_path),
            "outreach_plan": str(outreach_path)
        }

        logger.info("✅ Reports generated successfully")
        logger.info(f"   Research report: {research_path}")
        logger.info(f"   Outreach plan: {outreach_path}")

    except Exception as e:
        logger.error(f"Error in report_node: {e}")
        if "errors" not in state:
            state["errors"] = []
        state["errors"].append(f"Report generation failed: {str(e)}")
        # Set defaults on error
        state["research_report"] = ""
        state["outreach_plan_markdown"] = ""
        state["report_file_paths"] = {}
        raise  # Re-raise since template rendering is critical

    return state
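
`_generate_filename` only replaces spaces and `/`, and it reads the clock separately for each report. A hedged sketch of a stricter variant, assuming a hypothetical function `safe_report_filename` that collapses every non-word character and accepts one shared timestamp (neither is in the original code):

```python
import re
from datetime import datetime
from typing import Optional


def safe_report_filename(company_name: str, report_type: str,
                         now: Optional[datetime] = None) -> str:
    """Build a report filename from an arbitrary company name (hypothetical helper)."""
    # Collapse each run of characters other than letters, digits, "_" or "-"
    safe_name = re.sub(r"[^\w-]+", "_", company_name).strip("_") or "company"
    # Passing `now` lets both reports of one run share the same timestamp
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    prefix = "lead_research" if report_type == "research" else "outreach_plan"
    return f"{prefix}_{safe_name}_{stamp}.md"


fixed = datetime(2025, 1, 1, 12, 0, 0)
print(safe_report_filename("AT&T / Warner: Media", "research", fixed))
# lead_research_AT_T_Warner_Media_20250101_120000.md
print(safe_report_filename("Target", "outreach", fixed))
# outreach_plan_Target_20250101_120000.md
```

Characters such as `:`, `&` or `?` are legal in a company name but awkward or invalid in filenames on some platforms, which is the reason for the broader pattern.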



# Test Results

In [None]:
(.venv) micahshull@Micahs-iMac LG_Cursor_012 % python tests/test_mvp_runner.py
============================================================
🧪 Smoke Test: B2B Sales Orchestrator Agent
============================================================

📥 Initial State:
  Company: Target
  Website: https://target.com
  Product: AI-driven sales analytics platform

------------------------------------------------------------
Testing goal node...
------------------------------------------------------------
INFO: 🎯 Defining research and outreach planning goal...
INFO: ✅ Goal defined for Target
✅ Goal node passed
   Goal objective: Research Target and create personalized outreach plan

------------------------------------------------------------
Testing planning node...
------------------------------------------------------------
INFO: 📋 Creating execution plan...
INFO: ✅ Execution plan created
✅ Planning node passed
   Plan has 6 steps

------------------------------------------------------------
Testing analyze node...
------------------------------------------------------------
INFO: 🔍 Analyzing company fit and generating decision-makers...
INFO: ✅ Fit analysis complete: Score 74/100, Priority: high
INFO: ✅ Generated 3 decision-makers (template-based)
✅ Analyze node passed
   Fit score: 74/100
   Priority: high
   Decision-makers: 3

------------------------------------------------------------
Testing outreach plan node...
------------------------------------------------------------
INFO: 📧 Generating personalized outreach plan...
INFO: ✅ Outreach plan generated
INFO:    Target: PLACEHOLDER - Sales 1
INFO:    Channel: linkedin
INFO:    Messages: 3 drafts
✅ Outreach plan node passed
   Target: PLACEHOLDER - Sales 1
   Channel: linkedin
   Messages: 3 drafts

------------------------------------------------------------
Testing report node...
------------------------------------------------------------
INFO: 📄 Generating lead research report and outreach plan...
INFO: Rendering research report template...
INFO: Rendering outreach plan template...
INFO: Saving research report to lead_research_Target_20251103_165204.md...
INFO: Saving outreach plan to outreach_plan_Target_20251103_165204.md...
INFO: ✅ Reports generated successfully
INFO:    Research report: /Users/micahshull/Documents/AI_LangGraph/LG_Cursor_012/sales_reports/lead_research_Target_20251103_165204.md
INFO:    Outreach plan: /Users/micahshull/Documents/AI_LangGraph/LG_Cursor_012/sales_reports/outreach_plan_Target_20251103_165204.md
✅ Report node passed
   Research report: 1546 chars
   Outreach plan: 1780 chars
   Files saved: 2

============================================================
✅ Smoke test completed successfully!
   Nodes tested: 5/6 (goal, planning, analyze, outreach_plan, report)
   Errors: 0
============================================================
(.venv) micahshull@Micahs-iMac LG_Cursor_012 %

# Web Search

In [None]:
"""Web search utilities using Tavily API"""

import logging
from typing import List, Dict, Any, Optional
from tavily import TavilyClient
from config import SalesOrchestratorConfig

logger = logging.getLogger(__name__)


def get_tavily_client(config: Optional[SalesOrchestratorConfig] = None) -> TavilyClient:
    """Get Tavily client instance"""
    if config is None:
        config = SalesOrchestratorConfig()

    if not config.tavily_api_key:
        raise ValueError("TAVILY_API_KEY not found in environment. Please add it to API_KEYS.env")

    return TavilyClient(api_key=config.tavily_api_key)


def search_company(company_name: str, company_website: Optional[str] = None,
                  max_results: int = 5) -> List[Dict[str, Any]]:
    """Search for company information using Tavily

    Args:
        company_name: Company name to search for
        company_website: Optional website URL to help with search
        max_results: Maximum number of results per query

    Returns:
        List of search results (dictionaries with title, url, content, etc.)
    """
    try:
        config = SalesOrchestratorConfig()
        client = get_tavily_client(config)

        # Build search query
        query = company_name
        if company_website:
            query += f" {company_website}"

        logger.info(f"Searching Tavily for: {query}")

        # Perform search
        response = client.search(
            query=query,
            max_results=max_results,
            search_depth="advanced"
        )

        results = response.get("results", [])
        logger.info(f"Found {len(results)} results from Tavily")

        return results

    except Exception as e:
        logger.error(f"Tavily search failed: {e}")
        return []


def search_multiple_queries(company_name: str, queries: List[str],
                           max_results_per_query: int = 3) -> Dict[str, List[Dict[str, Any]]]:
    """Execute multiple search queries and return categorized results

    Args:
        company_name: Company name
        queries: List of search query strings
        max_results_per_query: Maximum results per query

    Returns:
        Dictionary mapping query category to list of results
    """
    config = SalesOrchestratorConfig()
    client = get_tavily_client(config)

    all_results = {}

    for query in queries:
        try:
            # Add company name to each query
            full_query = f"{company_name} {query}"

            logger.info(f"Searching: {full_query}")

            response = client.search(
                query=full_query,
                max_results=max_results_per_query,
                search_depth="advanced"
            )

            results = response.get("results", [])
            all_results[query] = results
            logger.info(f"  Found {len(results)} results for '{query}'")

        except Exception as e:
            logger.warning(f"Query '{query}' failed: {e}")
            all_results[query] = []

    return all_results
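
`search_multiple_queries` stores an empty list as soon as a query raises, so a single transient network error costs a whole research category. A minimal sketch of retrying with exponential backoff first; `search_with_retry`, its parameters, and the fake search function below are all hypothetical and stand in for a real Tavily call:

```python
import time
from typing import Any, Callable, Dict, List

SearchFn = Callable[[str], List[Dict[str, Any]]]


def search_with_retry(search_fn: SearchFn, query: str, attempts: int = 3,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> List[Dict[str, Any]]:
    """Call search_fn(query), backing off exponentially after each failure.

    Returns [] once every attempt has failed, like the original fallback.
    """
    for attempt in range(attempts):
        try:
            return search_fn(query)
        except Exception:
            if attempt == attempts - 1:
                break
            sleep(base_delay * (2 ** attempt))  # waits 1s, 2s, 4s, ...
    return []


# Demo: a fake search function that fails twice, then succeeds
calls = {"count": 0}


def flaky_search(query: str) -> List[Dict[str, Any]]:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return [{"title": f"Result for {query}", "url": "https://example.com"}]


delays: List[float] = []  # record delays instead of actually sleeping
results = search_with_retry(flaky_search, "Target recent news", sleep=delays.append)
print(len(results), delays)
```

In the notebook's code, `search_fn` could be a small wrapper around `client.search(...)` that returns `response.get("results", [])`.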



# Research Node

In [None]:
"""Research node - Collect company data using web search"""

import logging
import json
from typing import Dict, Any, List
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from config import SalesOrchestratorState, SalesOrchestratorConfig
from utils.web_search import search_multiple_queries

logger = logging.getLogger(__name__)


def _get_research_queries() -> List[str]:
    """Get list of research queries to execute"""
    return [
        "company overview industry size revenue",
        "recent news challenges problems",
        "funding hiring expansion growth",
        "technology stack software tools"
    ]


def _extract_structured_data_with_llm(search_results: Dict[str, List[Dict[str, Any]]],
                                     company_name: str, config: SalesOrchestratorConfig) -> tuple[Dict[str, Any], List[str]]:
    """Use LLM to extract structured data from search results

    Returns:
        Tuple of (extracted data, sources): a dictionary with company_profile,
        pain_points and buying_signals, plus the list of source URLs
    """
    try:
        # Combine search results into text
        combined_text = ""
        sources = []

        for query_category, results in search_results.items():
            combined_text += f"\n\n=== {query_category} ===\n"
            for result in results:
                title = result.get("title", "")
                url = result.get("url", "")
                content = result.get("content", "")
                combined_text += f"Title: {title}\nURL: {url}\nContent: {content}\n\n"
                sources.append(url)

        # Create LLM prompt for extraction
        llm = ChatOpenAI(
            model=config.llm_model,
            temperature=config.temperature
        )

        prompt = ChatPromptTemplate.from_messages([
            ("system", """You are an expert business analyst. Extract structured information about a company from search results.
Return ONLY valid JSON, no prose, no markdown, no explanations. Just the JSON object."""),
            ("user", """Extract structured information about {company_name} from the following search results:

{search_results_text}

Return a JSON object with this structure:
{{
    "company_profile": {{
        "industry": "string or null",
        "size": "string (e.g., '500-1000 employees') or null",
        "revenue_estimate": "string (e.g., '$1B-$10B') or null",
        "growth_stage": "string (e.g., 'Established', 'Growth', 'Startup') or null",
        "technology_alignment": "string or null"
    }},
    "pain_points": ["list of pain points or challenges mentioned"],
    "buying_signals": [
        {{
            "type": "funding|hiring|expansion|technology",
            "description": "string",
            "date": "string or null"
        }}
    ]
}}

Return ONLY valid JSON now:""")
        ])

        messages = prompt.format_messages(
            company_name=company_name,
            search_results_text=combined_text[:8000]  # Cap at 8000 characters to stay under the model's context limit
        )

        logger.info("Calling LLM to extract structured data...")
        response = llm.invoke(messages)
        response_text = response.content.strip()

        # Parse JSON response
        # Try to extract JSON if wrapped in markdown code blocks
        if "```json" in response_text:
            response_text = response_text.split("```json")[1].split("```")[0].strip()
        elif "```" in response_text:
            response_text = response_text.split("```")[1].split("```")[0].strip()

        extracted_data = json.loads(response_text)

        logger.info("✅ Successfully extracted structured data from search results")
        return extracted_data, sources

    except json.JSONDecodeError as e:
        logger.error(f"Failed to parse LLM JSON response: {e}")
        logger.error(f"Response was: {response_text[:500]}")
        # Return defaults
        return {
            "company_profile": {},
            "pain_points": [],
            "buying_signals": []
        }, sources
    except Exception as e:
        logger.error(f"LLM extraction failed: {e}")
        # Return defaults
        return {
            "company_profile": {},
            "pain_points": [],
            "buying_signals": []
        }, sources


def research_node(state: SalesOrchestratorState) -> SalesOrchestratorState:
    """Research company using Tavily web search and extract structured insights

    Performs multiple targeted searches and uses LLM to extract structured data.
    """
    logger.info("🔍 Researching company using web search...")

    try:
        company_name = state.get("company_name", "")
        company_website = state.get("company_website")
        goal = state.get("goal", {})
        config = SalesOrchestratorConfig()

        if not company_name:
            raise ValueError("company_name is required for research")

        # Get research queries
        queries = _get_research_queries()

        # Perform multiple searches
        logger.info(f"Executing {len(queries)} search queries for {company_name}...")
        search_results = search_multiple_queries(
            company_name=company_name,
            queries=queries,
            max_results_per_query=3
        )

        # Store raw research data
        state["company_research"] = search_results

        # Extract structured data using LLM
        extracted_data, sources = _extract_structured_data_with_llm(
            search_results, company_name, config
        )

        # Update state with extracted data
        state["company_profile"] = extracted_data.get("company_profile", {})
        state["pain_points"] = extracted_data.get("pain_points", [])
        state["buying_signals"] = extracted_data.get("buying_signals", [])
        state["research_sources"] = list(dict.fromkeys(sources))  # Deduplicate sources, keeping first-seen order

        logger.info("✅ Research complete")
        logger.info(f"   Company profile: {len(state['company_profile'])} fields")
        logger.info(f"   Pain points: {len(state['pain_points'])} found")
        logger.info(f"   Buying signals: {len(state['buying_signals'])} found")
        logger.info(f"   Sources: {len(state['research_sources'])} unique")

    except ValueError as e:
        logger.error(f"Validation error in research_node: {e}")
        if "errors" not in state:
            state["errors"] = []
        state["errors"].append(f"Research validation failed: {str(e)}")
        # Set defaults
        state["company_research"] = {}
        state["company_profile"] = {}
        state["pain_points"] = []
        state["buying_signals"] = []
        state["research_sources"] = []
    except Exception as e:
        logger.error(f"Error in research_node: {e}")
        if "errors" not in state:
            state["errors"] = []
        state["errors"].append(f"Research failed: {str(e)}")
        # Set defaults on error
        state["company_research"] = {}
        state["company_profile"] = {}
        state["pain_points"] = []
        state["buying_signals"] = []
        state["research_sources"] = []

    return state
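
The extraction step asks the model for bare JSON, but replies can still arrive wrapped in a code fence, surrounded by prose, or missing keys. A hedged sketch of a more tolerant parser, assuming a hypothetical function `parse_llm_json` that always returns the three keys the research node reads (this is not the original implementation):

```python
import json
import re
from typing import Any, Dict

DEFAULT_EXTRACTION: Dict[str, Any] = {
    "company_profile": {},
    "pain_points": [],
    "buying_signals": [],
}


def parse_llm_json(response_text: str) -> Dict[str, Any]:
    """Best-effort parse of an LLM reply into the expected extraction dict."""
    text = response_text.strip()

    # Prefer the contents of a fenced block if the model added one
    fenced = re.search(r"`{3}(?:json)?\s*(.*?)`{3}", text, re.DOTALL)
    if fenced:
        text = fenced.group(1).strip()
    else:
        # Otherwise keep the span from the first "{" to the last "}"
        start, end = text.find("{"), text.rfind("}")
        if start != -1 and end > start:
            text = text[start:end + 1]

    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}

    # Replace missing or wrongly typed values with empty defaults
    result: Dict[str, Any] = {}
    for key, default in DEFAULT_EXTRACTION.items():
        value = data.get(key)
        result[key] = value if isinstance(value, type(default)) else type(default)()
    return result


fence = "`" * 3  # built at runtime so this example contains no literal fence
wrapped = fence + 'json\n{"pain_points": ["slow checkout"]}\n' + fence
print(parse_llm_json(wrapped))
print(parse_llm_json('Sure! {"company_profile": {"industry": "Retail"}} Hope this helps.'))
print(parse_llm_json("not json at all"))
```

Because the result always has all three keys with the right container types, the caller no longer needs the `.get(..., default)` fallbacks when copying values into state.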



# Test Results

In [None]:
(.venv) micahshull@Micahs-iMac LG_Cursor_012 % python tests/test_mvp_runner.py
============================================================
🧪 Smoke Test: B2B Sales Orchestrator Agent
============================================================

📥 Initial State:
  Company: Target
  Website: https://target.com
  Product: AI-driven sales analytics platform

------------------------------------------------------------
Testing goal node...
------------------------------------------------------------
INFO: 🎯 Defining research and outreach planning goal...
INFO: ✅ Goal defined for Target
✅ Goal node passed
   Goal objective: Research Target and create personalized outreach plan

------------------------------------------------------------
Testing planning node...
------------------------------------------------------------
INFO: 📋 Creating execution plan...
INFO: ✅ Execution plan created
✅ Planning node passed
   Plan has 6 steps

------------------------------------------------------------
Testing research node...
------------------------------------------------------------
⚠️  Note: This requires TAVILY_API_KEY in API_KEYS.env
⚠️  If API key is missing, node will fail gracefully
INFO: 🔍 Researching company using web search...
INFO: Executing 4 search queries for Target...
INFO: Searching: Target company overview industry size revenue
INFO:   Found 3 results for 'company overview industry size revenue'
INFO: Searching: Target recent news challenges problems
INFO:   Found 3 results for 'recent news challenges problems'
INFO: Searching: Target funding hiring expansion growth
INFO:   Found 3 results for 'funding hiring expansion growth'
INFO: Searching: Target technology stack software tools
INFO:   Found 3 results for 'technology stack software tools'
INFO: Calling LLM to extract structured data...
INFO: HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO: ✅ Successfully extracted structured data from search results
INFO: ✅ Research complete
INFO:    Company profile: 5 fields
INFO:    Pain points: 9 found
INFO:    Buying signals: 0 found
INFO:    Sources: 12 unique
✅ Research node passed
   Company profile: 5 fields
   Pain points: 9 found
   Buying signals: 0 found

------------------------------------------------------------
Testing analyze node...
------------------------------------------------------------
INFO: 🔍 Analyzing company fit and generating decision-makers...
INFO: ✅ Fit analysis complete: Score 74/100, Priority: high
INFO: ✅ Generated 3 decision-makers (template-based)
✅ Analyze node passed
   Fit score: 74/100
   Priority: high
   Decision-makers: 3

------------------------------------------------------------
Testing outreach plan node...
------------------------------------------------------------
INFO: 📧 Generating personalized outreach plan...
INFO: ✅ Outreach plan generated
INFO:    Target: PLACEHOLDER - Sales 1
INFO:    Channel: linkedin
INFO:    Messages: 3 drafts
✅ Outreach plan node passed
   Target: PLACEHOLDER - Sales 1
   Channel: linkedin
   Messages: 3 drafts

------------------------------------------------------------
Testing report node...
------------------------------------------------------------
INFO: 📄 Generating lead research report and outreach plan...
INFO: Rendering research report template...
INFO: Rendering outreach plan template...
INFO: Saving research report to lead_research_Target_20251103_170431.md...
INFO: Saving outreach plan to outreach_plan_Target_20251103_170431.md...
INFO: ✅ Reports generated successfully
INFO:    Research report: /Users/micahshull/Documents/AI_LangGraph/LG_Cursor_012/sales_reports/lead_research_Target_20251103_170431.md
INFO:    Outreach plan: /Users/micahshull/Documents/AI_LangGraph/LG_Cursor_012/sales_reports/outreach_plan_Target_20251103_170431.md
✅ Report node passed
   Research report: 2959 chars
   Outreach plan: 1801 chars
   Files saved: 2

============================================================
✅ Smoke test completed successfully!
   Nodes tested: 6/6 (goal, planning, research, analyze, outreach_plan, report)
   Errors: 0
============================================================
(.venv) micahshull@Micahs-iMac LG_Cursor_012 %

All 6 nodes passed. The research node ran against live Tavily results and extracted:
- 5 company profile fields
- 9 pain points
- 0 buying signals
- 12 unique sources

The report node then rendered both reports from this data (2959 and 1801 characters).

## MVP complete

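All six nodes now run end to end. The research node works from live Tavily search results; decision-makers and message drafts are still template-based (the selected contact is `PLACEHOLDER - Sales 1`), pending the LLM-powered personalization planned for Phase 2.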
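
One detail of the research step may be worth revisiting after the MVP. `_extract_structured_data_with_llm` keeps only the first 8000 characters of the combined search text, so the query categories appended last are the first to be cut. A minimal sketch of giving each category an equal share instead, assuming a hypothetical function `build_prompt_text` (not in the original code):

```python
from typing import Any, Dict, List


def build_prompt_text(search_results: Dict[str, List[Dict[str, Any]]],
                      total_budget: int = 8000) -> str:
    """Combine search results, capping each query category at an equal share."""
    if not search_results:
        return ""
    per_category = total_budget // len(search_results)
    sections = []
    for category, results in search_results.items():
        body = "".join(
            f"Title: {r.get('title', '')}\nURL: {r.get('url', '')}\n"
            f"Content: {r.get('content', '')}\n\n"
            for r in results
        )
        section = f"\n\n=== {category} ===\n{body}"
        sections.append(section[:per_category])  # truncate per category, not globally
    return "".join(sections)


# Demo: one very long category no longer crowds out the next one
demo = {
    "company overview": [{"title": "A", "url": "https://example.com/a", "content": "x" * 20000}],
    "funding hiring": [{"title": "B", "url": "https://example.com/b", "content": "raised a new round"}],
}
text = build_prompt_text(demo, total_budget=8000)
print(len(text) <= 8000, "raised a new round" in text)
# True True
```

With a single global slice, the 20000-character first category in this demo would have used the entire budget and the funding result would never reach the model.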

