<a href="https://colab.research.google.com/github/micah-shull/AI_Agents/blob/main/142_B2B_Sales_Agent_Claude_Langchain_00_Research_Agent.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



### 🏗️ **Complete LangChain Implementation**

**1. Data Models (`langchain_models.py`)**
- Pydantic models for type safety and validation
- Enums for structured data
- Built-in serialization/deserialization

**2. Research Agent (`langchain_research_agent.py`)**
- LangChain tools for company research
- Structured output with Pydantic models
- Mock data integration

**3. Analysis Agent (`langchain_analysis_agent.py`)**
- LangChain LLM chains for analysis
- Prompt templates for structured analysis
- Mock and real LLM support

**4. Personalization Agent (`langchain_personalization_agent.py`)**
- LangChain templates for message generation
- Chain composition for complex workflows
- Multiple personalization strategies

**5. Orchestrator (`langchain_orchestrator.py`)**
- LangChain workflow management
- Agent coordination and data flow
- Error handling and retry logic
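
The "error handling and retry logic" item can be illustrated without any framework. The block below is a minimal sketch, assuming a hypothetical `run_with_retry` helper and a made-up flaky step; neither name comes from `langchain_orchestrator.py`, which is not shown in this notebook.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_retry(
    step: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call `step` up to `attempts` times, doubling the delay after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")  # keeps type checkers satisfied


# Hypothetical flaky step: fails twice, then succeeds.
calls = {"count": 0}


def flaky_research() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "company data"


result = run_with_retry(flaky_research, attempts=3)
```

In a real orchestrator you would likely narrow `retry_on` to transient errors (timeouts, rate limits) so that validation errors fail fast instead of being retried.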

### 📊 **Key Differences Discovered**

**LangChain Advantages:**
- **Built-in LLM integration** - Ready for real AI models
- **Template management** - Structured prompt handling
- **Tool integration** - Easy to add external APIs
- **Chain composition** - Complex workflows made simple
- **Data validation** - Pydantic models with automatic validation
- **Error handling** - Built-in retry and error management

**Pure Python Advantages:**
- **Full control** - Every line of code is yours
- **Simple deployment** - No external dependencies
- **Fast execution** - No framework overhead
- **Easy debugging** - Clear, linear code flow
- **Learning value** - Understand every component

### 🎯 **Real-World Insights**

**When to Use Pure Python:**
- Learning and understanding fundamentals
- Simple, single-purpose agents
- Performance-critical applications
- Full control requirements
- Minimal dependency projects

**When to Use LangChain:**
- Production systems with LLM integration
- Complex multi-agent workflows
- Team collaboration projects
- Rapid prototyping with AI
- Enterprise features needed

### 🚀 **What This Means for Your Career**

**You Now Have:**
1. **Deep understanding** of both orchestration approaches
2. **Production-ready patterns** for AI agent systems
3. **Framework expertise** in LangChain
4. **Software engineering skills** in Python
5. **Real-world implementation** experience

**This Directly Applies to Your HBR Research:**
- **Sales Automation** - Both approaches can scale personalized outreach
- **Consulting Obelisk** - Orchestration replaces junior analyst work
- **B2B Omnichannel** - Multi-agent workflows for customer journeys

### 💡 **Next Steps for Your Learning**

**1. Production Enhancements:**
- Integrate real APIs (web scraping, CRM systems)
- Add LLM integration for natural language generation
- Implement A/B testing for message optimization

**2. Advanced Features:**
- Human-in-the-loop approval workflows
- Multi-tenant scaling
- Real-time monitoring and analytics

**3. Career Positioning:**
- You can now position yourself as someone who understands **both** approaches
- You have the **orchestration expertise** that HBR identified as high-value
- You can help organizations choose the right approach for their needs

### 🏆 **The Big Picture**

This comparison demonstrates exactly what the HBR articles identified as the **core superpower** of AI agents: **orchestration**. You now understand:

- **How to design** multi-agent workflows
- **How to manage** state and data flow
- **How to handle** errors and retries
- **How to scale** from simple to complex systems
- **How to choose** the right tools for the job



"""
LangChain Research Agent - Finds company information using LangChain tools

This agent demonstrates:
- LangChain tools and tool calling
- Structured output with Pydantic models
- Error handling and validation
- Mock data integration
"""

import logging
from typing import Optional, Dict, Any
from langchain.tools import BaseTool
from langchain_core.pydantic_v1 import BaseModel as PydanticV1BaseModel
from langchain_models import CompanyInfo, CompanySize

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class CompanyResearchInput(PydanticV1BaseModel):
    """Input schema for company research tool"""
    company_name: str

class CompanyResearchTool(BaseTool):
    """LangChain tool for researching company information"""

    name: str = "company_research"
    description: str = "Research company information including industry, size, location, and key contacts"
    args_schema: type[PydanticV1BaseModel] = CompanyResearchInput

    def __init__(self):
        super().__init__()
        # Mock data for demonstration
        self._mock_companies = {
            "acme_corporation": CompanyInfo(
                name="Acme Corporation",
                industry="Manufacturing",
                size=CompanySize.MID_MARKET,
                location="Chicago, IL",
                website="https://acmecorp.com",
                description="Leading manufacturer of industrial equipment with 500+ employees",
                recent_news=[
                    "Acme Corp announces expansion into European markets",
                    "New sustainability initiative launched",
                    "Partnership with major automotive supplier"
                ],
                key_contacts=[
                    {"name": "Sarah Johnson", "title": "CEO", "email": "sarah@acmecorp.com"},
                    {"name": "Mike Chen", "title": "VP Operations", "email": "mike@acmecorp.com"}
                ]
            ),
            "techstartup_inc": CompanyInfo(
                name="TechStartup Inc",
                industry="SaaS",
                size=CompanySize.STARTUP,
                location="San Francisco, CA",
                website="https://techstartup.com",
                description="Fast-growing SaaS platform for project management",
                recent_news=[
                    "Series A funding round of $10M completed",
                    "New AI features launched",
                    "Team expansion to 50 employees"
                ],
                key_contacts=[
                    {"name": "Alex Rodriguez", "title": "Founder & CEO", "email": "alex@techstartup.com"},
                    {"name": "Lisa Wang", "title": "CTO", "email": "lisa@techstartup.com"}
                ]
            )
        }

    def _run(self, company_name: str) -> str:
        """Execute the company research tool"""
        try:
            logger.info(f"Researching company: {company_name}")

            # Input validation
            if not company_name or not isinstance(company_name, str):
                raise ValueError("Company name must be a non-empty string")

            # Look up company in mock data
            company_key = company_name.lower().replace(" ", "_").replace(".", "")

            if company_key in self._mock_companies:
                company_info = self._mock_companies[company_key]
                logger.info(f"Successfully found information for {company_name}")

                # Return structured data as JSON string
                return company_info.model_dump_json()
            else:
                logger.warning(f"No information found for company: {company_name}")
                return "null"

        except Exception as e:
            logger.error(f"Error researching company {company_name}: {str(e)}")
            raise

    async def _arun(self, company_name: str) -> str:
        """Async version of the tool"""
        return self._run(company_name)

class LangChainResearchAgent:
    """
    LangChain Research Agent that uses tools to find company information

    This demonstrates:
    - LangChain tool integration
    - Structured output with Pydantic models
    - Error handling and logging
    - Mock data for demonstration
    """

    def __init__(self, agent_id: str = "langchain_research_agent"):
        self.agent_id = agent_id
        self.logger = logging.getLogger(f"{__name__}.{agent_id}")

        # Initialize the research tool
        self.research_tool = CompanyResearchTool()

        self.logger.info(f"LangChain Research Agent initialized: {agent_id}")

    def research_company(self, company_name: str) -> Optional[CompanyInfo]:
        """
        Research a company using LangChain tools

        Args:
            company_name: Name of the company to research

        Returns:
            CompanyInfo object with company details, or None if not found
        """
        try:
            self.logger.info(f"Starting LangChain research for: {company_name}")

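            # Call the tool's _run directly. This skips args_schema validation
            # and callbacks; use research_tool.invoke(...) for the full LangChain path.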
            result = self.research_tool._run(company_name)

            if result == "null":
                self.logger.warning(f"No information found for {company_name}")
                return None

            # Parse the JSON result back to CompanyInfo
            company_info = CompanyInfo.model_validate_json(result)

            self.logger.info(f"LangChain research completed for {company_name}")
            return company_info

        except Exception as e:
            self.logger.error(f"LangChain research failed for {company_name}: {str(e)}")
            raise

    def get_status(self) -> Dict[str, Any]:
        """Return agent status for monitoring"""
        return {
            "agent_id": self.agent_id,
            "status": "ready",
            "framework": "langchain",
            "available_companies": list(self.research_tool._mock_companies.keys()),
            "tool_name": self.research_tool.name
        }

# Example usage and testing
if __name__ == "__main__":
    print("=== LangChain Research Agent Demo ===\n")

    # Create agent
    research_agent = LangChainResearchAgent()

    # Test the agent
    companies_to_test = ["Acme Corporation", "TechStartup Inc", "Unknown Company"]

    for company in companies_to_test:
        try:
            company_info = research_agent.research_company(company)
            if company_info:
                print(f"✅ Found: {company_info.name}")
                print(f"   Industry: {company_info.industry}")
                print(f"   Size: {company_info.size}")
                print(f"   Contacts: {len(company_info.key_contacts)}")
                print(f"   Recent News: {company_info.recent_news[0] if company_info.recent_news else 'None'}")
            else:
                print(f"❌ Not found: {company}")
        except Exception as e:
            print(f"❌ Error researching {company}: {str(e)}")

        print()

    # Show agent status
    status = research_agent.get_status()
    print("Agent Status:")
    for key, value in status.items():
        print(f"  {key}: {value}")


Let’s walk through your **LangChain Research Agent** step by step and highlight the **key concepts + what to learn from it**.

---

# 🟦 Purpose of the Research Agent

This agent’s job is to **fetch structured company information** (industry, size, location, website, contacts, recent news).
It’s designed as a **LangChain-style tool agent**, meaning it:

* Defines **tools** (callable functions with schemas).
* Returns **structured output** (Pydantic models).
* Has **error handling + logging**.
* Uses **mock data** for safe testing.

---

# 🟦 Core Components

### 1. **Schemas with Pydantic**

```python
class CompanyResearchInput(PydanticV1BaseModel):
    company_name: str
```

* Defines what inputs the tool accepts (`company_name`).
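* Ensures **validation** (can’t pass junk types) when the tool is invoked through LangChain’s public entry points (`run` / `invoke`). Calling `_run` directly, as the agent wrapper does, skips the schema check, which is why `_run` also does its own input validation.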
* This schema makes your agent predictable → contracts between orchestrator and agent.

---

### 2. **The Tool**

```python
class CompanyResearchTool(BaseTool):
    name: str = "company_research"
    description: str = "Research company info..."
    args_schema: type[PydanticV1BaseModel] = CompanyResearchInput
```

* A **LangChain Tool** = a unit of action the agent can call.
* Has a name + description (so LLMs can reason about when to call it).
* Tied to an input schema (no ambiguous arguments).

Inside, it uses:

```python
def _run(self, company_name: str) -> str: ...
```

* Validates input.
* Looks up in **mock dataset** (`acme_corporation`, `techstartup_inc`).
* Returns **structured JSON string** (`company_info.model_dump_json()`).

⚡ Lesson: Tools abstract messy APIs → return clean, validated data.
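
To make the lookup step concrete, here is a dependency-free sketch of what `_run` does with the company name. It assumes plain dicts in place of `CompanyInfo` models, and the names `MOCK_COMPANIES`, `normalize_key`, and `lookup` are illustrative, not part of the notebook.

```python
import json
from typing import Dict

# Hypothetical stand-in for the tool's mock dictionary; values are plain dicts
# here instead of CompanyInfo models.
MOCK_COMPANIES: Dict[str, dict] = {
    "acme_corporation": {"name": "Acme Corporation", "industry": "Manufacturing"},
    "techstartup_inc": {"name": "TechStartup Inc", "industry": "SaaS"},
}


def normalize_key(company_name: str) -> str:
    """Same normalization as `_run`: lowercase, spaces to underscores, drop periods."""
    return company_name.lower().replace(" ", "_").replace(".", "")


def lookup(company_name: str) -> str:
    """Return the record as a JSON string, or the JSON literal "null" on a miss."""
    if not company_name or not isinstance(company_name, str):
        raise ValueError("Company name must be a non-empty string")
    record = MOCK_COMPANIES.get(normalize_key(company_name))
    return json.dumps(record)  # json.dumps(None) produces "null"
```

Note how narrow the matching is: `"TechStartup Inc."` (trailing period) still resolves, but `"Acme Corp"` normalizes to `acme_corp` and misses. A real data source would need fuzzier matching than this.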

---

### 3. **The Agent Wrapper**

```python
class LangChainResearchAgent:
    def research_company(self, company_name: str) -> Optional[CompanyInfo]:
        result = self.research_tool._run(company_name)
        if result == "null":
            return None
        return CompanyInfo.model_validate_json(result)
```

* Calls the tool.
* Handles errors + empty results.
* Converts JSON back to **CompanyInfo object**.

So the agent doesn’t just return raw text: callers get either a validated **`CompanyInfo` object** or `None`, never an unparsed string.
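
The JSON round trip can be sketched with the standard library alone. This is an illustration, not the notebook's code: `CompanyRecord` is a hypothetical dataclass standing in for the Pydantic `CompanyInfo` model, and `parse_tool_result` is an assumed helper name.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CompanyRecord:
    """Dependency-free stand-in for CompanyInfo (the real model is a Pydantic class)."""
    name: str
    industry: str
    recent_news: List[str] = field(default_factory=list)


def parse_tool_result(raw: str) -> Optional[CompanyRecord]:
    """Turn the tool's JSON string into a typed object, or None for the "null" sentinel."""
    data = json.loads(raw)
    if data is None:
        return None
    return CompanyRecord(**data)  # raises TypeError on missing or unexpected keys


found = parse_tool_result('{"name": "Acme Corporation", "industry": "Manufacturing"}')
missing = parse_tool_result("null")
```

Unlike Pydantic's `model_validate_json`, a dataclass does not check field types, so this stand-in only catches missing or extra keys. That gap is one reason the notebook uses Pydantic for the real model.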

---

### 4. **Status / Observability**

```python
def get_status(self):
    return {
        "agent_id": self.agent_id,
        "status": "ready",
        "framework": "langchain",
        "available_companies": list(...),
        "tool_name": self.research_tool.name
    }
```

* Lets orchestrators check: *“Is this agent alive? What companies can it handle?”*
* **Health checks** → critical for orchestration.
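
Here is a small sketch of how an orchestrator might consume these status dicts. The helper names `ready_agents` and `can_handle` are assumptions for illustration, and the second status entry is invented to show a not-ready agent.

```python
from typing import Any, Dict, List


def ready_agents(statuses: List[Dict[str, Any]]) -> List[str]:
    """Return the ids of agents whose status dict reports "ready"."""
    return [s["agent_id"] for s in statuses if s.get("status") == "ready"]


def can_handle(status: Dict[str, Any], company_key: str) -> bool:
    """Check a normalized company key against the agent's advertised list."""
    return company_key in status.get("available_companies", [])


# Status dicts shaped like get_status() output; the second agent is hypothetical.
statuses = [
    {
        "agent_id": "langchain_research_agent",
        "status": "ready",
        "framework": "langchain",
        "available_companies": ["acme_corporation", "techstartup_inc"],
        "tool_name": "company_research",
    },
    {"agent_id": "analysis_agent", "status": "starting"},
]

ready = ready_agents(statuses)
```

Keep in mind that `get_status` in this notebook always reports `"ready"`; a production version would need to actually probe its data sources before answering.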

---

# 🟦 What to Learn From This

1. **Schemas are contracts** → Pydantic ensures inputs/outputs are structured.
2. **Tools encapsulate logic** → clean wrappers around messy data sources.
3. **Mock-first development** → build/test agent flow *before* wiring real APIs.
4. **Logging + error handling** → makes the agent production-friendly.
5. **Structured returns, not free text** → downstream agents (analysis, personalization) can consume outputs reliably.
6. **Agent != LLM** → this one doesn’t call an LLM at all! It’s a *tool agent* that wraps structured logic.

---

# 🟦 Big Picture

This agent is your **pipeline entry point**:

* ResearchAgent → collects company info.
* AnalysisAgent → interprets info into pain points/opportunities.
* PersonalizationAgent → generates outreach messages.

It sets the stage by ensuring that **all later steps receive clean, validated company data**.
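
The three-stage flow can be sketched as plain function composition. Everything below is a stub under stated assumptions: the function names, the dict shapes, and the talking-point text are invented for illustration and do not reflect the actual Analysis or Personalization agents.

```python
from typing import Dict, List, Optional


def research(company_name: str) -> Optional[Dict[str, object]]:
    """Stub research step: returns structured data or None when nothing is found."""
    known = {"Acme Corporation": {"name": "Acme Corporation", "industry": "Manufacturing"}}
    return known.get(company_name)


def analyze(company: Dict[str, object]) -> List[str]:
    """Stub analysis step: derives a placeholder talking point from the industry."""
    return [f"Operational efficiency in {company['industry']}"]


def personalize(company: Dict[str, object], points: List[str]) -> str:
    """Stub personalization step: renders one outreach line."""
    return f"Hi {company['name']} team, we'd like to discuss: {points[0]}."


def run_pipeline(company_name: str) -> Optional[str]:
    """Research -> Analysis -> Personalization; stop early if research finds nothing."""
    company = research(company_name)
    if company is None:
        return None
    return personalize(company, analyze(company))


message = run_pipeline("Acme Corporation")
```

The early return is the important design choice: if research comes back empty, the later stages never run on missing data.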



All those fields you see filled out inside `self._mock_companies` are just **mock data examples** for demonstration.

Here’s how to think about it:

---

## 🟦 Why Mock Data?

* **Purpose:** To let you **test the ResearchAgent end-to-end** without calling LinkedIn, Crunchbase, or a news API.
* **Benefit:** You can run the full Research → Analysis → Personalization pipeline and see structured outputs immediately.
* **Limitation:** You only get results for the hardcoded demo companies (`acme_corporation`, `techstartup_inc`).

---

## 🟦 Where Actual Data Would Come From

In production, you’d **swap out the mock dictionary** with real data sources, for example:

* **LinkedIn API / Scraper** → industry, size, location, employees, contacts.
* **Crunchbase API** → funding rounds, company metadata.
* **Clearbit / Apollo.io** → enrichment (website, contact emails).
* **News APIs (Google News, Bing, GDELT, etc.)** → latest articles about the company.
* **Internal CRM** → if you already have some records.

---

## 🟦 How You’d Replace the Mock

Instead of:

```python
self._mock_companies = { "acme_corporation": CompanyInfo(...), ... }
```

You’d do something like:

```python
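# NOTE: get_linkedin_profile, get_crunchbase_info, get_recent_news, and map_size
# are placeholders you would implement yourself; they are not library functions.
def _run(self, company_name: str) -> str: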
    # 1. Call LinkedIn API → get company profile
    linkedin_data = get_linkedin_profile(company_name)

    # 2. Call Crunchbase → get funding & size
    cb_data = get_crunchbase_info(company_name)

    # 3. Call News API → get recent articles
    news_data = get_recent_news(company_name)

    # 4. Combine into CompanyInfo object
    company_info = CompanyInfo(
        name=linkedin_data["name"],
        industry=linkedin_data["industry"],
        size=map_size(cb_data["employee_count"]),
        location=linkedin_data["hq_location"],
        website=linkedin_data["website"],
        description=linkedin_data["description"],
        recent_news=news_data,
        key_contacts=linkedin_data["contacts"]
    )

    return company_info.model_dump_json()
```

That way, the tool **still returns the same `CompanyInfo` JSON** and the agent wrapper still parses it into a clean `CompanyInfo` object, but the data now comes from real APIs instead of mock values.
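
The `map_size` helper used above is not defined anywhere in the notebook, so here is one way it might look. Treat this as a sketch: the cutoffs are illustrative assumptions, `CompanySize` is redefined locally as a stand-in, and only `STARTUP` and `MID_MARKET` are confirmed members (the `ENTERPRISE` tier and the string values are guesses).

```python
from enum import Enum


class CompanySize(Enum):
    """Local stand-in; STARTUP and MID_MARKET appear in the notebook, ENTERPRISE is assumed."""
    STARTUP = "startup"
    MID_MARKET = "mid_market"
    ENTERPRISE = "enterprise"


def map_size(employee_count: int) -> CompanySize:
    """Bucket a headcount into a size tier. The cutoffs are illustrative assumptions."""
    if employee_count < 0:
        raise ValueError("employee_count must be non-negative")
    if employee_count < 200:
        return CompanySize.STARTUP
    if employee_count < 1000:
        return CompanySize.MID_MARKET
    return CompanySize.ENTERPRISE
```

With these cutoffs, the mock companies land where the notebook puts them: 50 employees maps to `STARTUP` and 500 maps to `MID_MARKET`.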

---

✅ **Key learning:**
The mock data is just a **training wheel**. The *structure* (schemas, tools, outputs) is what matters. Later, you “swap the engine” (replace mock lookups with API calls) but keep the exact same interface.
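
One way to make "swap the engine, keep the interface" explicit is a small protocol that both the mock and the real source satisfy. This is a sketch under assumptions: `CompanySource`, `MockSource`, and `ApiSource` are hypothetical names, and the "API" here is just a callable you pass in.

```python
from typing import Callable, Dict, Optional, Protocol

Record = Dict[str, str]


class CompanySource(Protocol):
    """Interface every data source must satisfy (hypothetical name)."""

    def fetch(self, company_key: str) -> Optional[Record]: ...


class MockSource:
    """Hardcoded records, mirroring the idea of the notebook's mock dictionary."""

    def __init__(self) -> None:
        self._records = {
            "acme_corporation": {"name": "Acme Corporation", "industry": "Manufacturing"}
        }

    def fetch(self, company_key: str) -> Optional[Record]:
        return self._records.get(company_key)


class ApiSource:
    """Placeholder for a real integration; `client` is any callable returning a record."""

    def __init__(self, client: Callable[[str], Optional[Record]]) -> None:
        self._client = client

    def fetch(self, company_key: str) -> Optional[Record]:
        return self._client(company_key)


def research(source: CompanySource, company_key: str) -> Optional[Record]:
    """The caller depends only on the interface, not on where the data comes from."""
    return source.fetch(company_key)


mock_result = research(MockSource(), "acme_corporation")
api_result = research(
    ApiSource(lambda key: {"name": key, "industry": "unknown"}), "acme_corporation"
)
```

Because `research` only sees `fetch`, tests can keep using `MockSource` after the real integration ships.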




Great question — and you’re thinking like an **orchestrator designer** now 👌.

Yes, you have **two main design options**:

---

## 🟦 Option 1: ResearchAgent Does It All

* ResearchAgent **directly calls APIs** (LinkedIn, Crunchbase, News, CRM, etc.).
* It enriches everything and outputs a full `CompanyInfo`.
  ✅ Simpler, fewer moving parts.
  ❌ Gets big & harder to maintain — one agent knows too much.

---

## 🟦 Option 2: Data Gathering Agents + ResearchAgent (Modular)

* Create **specialist agents/tools**:

  * `LinkedInAgent` → profiles, contacts.
  * `CrunchbaseAgent` → funding, company size.
  * `NewsAgent` → latest company mentions.
  * `CRMEnrichmentAgent` → check your internal records.
* ResearchAgent acts more like a **coordinator**:

  * Calls the data gatherers.
  * Merges results into a `CompanyInfo`.

✅ Modular (easier to test and swap).
✅ You can run/enrich in parallel.
✅ ResearchAgent stays slim — it just orchestrates.
❌ Slightly more overhead in orchestration logic.
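
The "run in parallel" benefit of Option 2 can be sketched with `asyncio.gather`. The three gatherer coroutines below are hypothetical stubs returning canned data; real ones would await HTTP calls, and the merge strategy (later results overwrite earlier keys) is an assumption.

```python
import asyncio
from typing import Any, Dict


async def linkedin_agent(company: str) -> Dict[str, Any]:
    """Hypothetical gatherer; a real one would await an HTTP call here."""
    await asyncio.sleep(0)
    return {"industry": "Manufacturing", "location": "Chicago, IL"}


async def crunchbase_agent(company: str) -> Dict[str, Any]:
    """Hypothetical gatherer returning a canned headcount."""
    await asyncio.sleep(0)
    return {"employee_count": 500}


async def news_agent(company: str) -> Dict[str, Any]:
    """Hypothetical gatherer returning canned headlines."""
    await asyncio.sleep(0)
    return {"recent_news": ["New sustainability initiative launched"]}


async def research_company(company: str) -> Dict[str, Any]:
    """Coordinator: run the gatherers concurrently and merge their partial results."""
    parts = await asyncio.gather(
        linkedin_agent(company),
        crunchbase_agent(company),
        news_agent(company),
    )
    merged: Dict[str, Any] = {"name": company}
    for part in parts:
        merged.update(part)  # later gatherers win on key conflicts
    return merged


profile = asyncio.run(research_company("Acme Corporation"))
```

In a notebook, where an event loop is already running, you would write `await research_company(...)` in a cell instead of calling `asyncio.run`.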

---

## 🟦 What This Looks Like

```mermaid
flowchart TD
    L[LinkedInAgent] --> R[ResearchAgent]
    C[CrunchbaseAgent] --> R
    N[NewsAgent] --> R
    CRM[CRMEnrichmentAgent] --> R
    R --> CI["CompanyInfo (structured data)"]
```

---

## 🔑 Key Learning

* If you’re prototyping → **Option 1** (monolithic ResearchAgent with mock/real data).
* If you’re building for scale → **Option 2** (specialized data agents, ResearchAgent as integrator).

That’s the classic **“monolith vs. micro-agents”** trade-off.


