# CRM Lead Qualifier Agent

## Goal

To develop an AI agent that automatically enriches a new sales lead (identified by an email address) by gathering publicly available company information, checking for prior engagement in the internal CRM, and assigning a preliminary qualification score.

## Context

Sales representatives often spend valuable time manually researching leads and cross-referencing internal systems before a discovery call. This process is slow, inconsistent, and often leads to a poorly prepared first interaction.

## Agent Functionality

The agent must be able to:

1. Extract Domain: Take the email address and extract the company domain name (e.g., jane@acmecorp.com → acmecorp.com).

2. Enrich Company Data: Use the domain to look up (simulated) company details like industry, size, and annual revenue.

3. Check CRM History: Search the internal (simulated) CRM for any past contact or notes associated with the lead's email.

4. Calculate Lead Score: Synthesize all gathered data to assign a qualitative priority score (e.g., High, Medium, Low).

5. Final Summary: Present a concise, actionable summary of all findings to the sales representative.
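Step 1 is small enough to sketch directly. The helper below is a minimal illustration, not part of the agent built later in this notebook (there, the model itself extracts the domain). The name `extract_domain` is hypothetical, and the sketch assumes a single address of the form `local@domain`.

```python
def extract_domain(email: str) -> str:
    """Return the lowercased domain part of an email address.

    Raises ValueError if the input does not look like 'local@domain'.
    """
    # rpartition splits on the LAST '@', so the domain is everything after it
    local, sep, domain = email.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"Not a valid email address: {email!r}")
    return domain.lower()


print(extract_domain("jane@acmecorp.com"))  # acmecorp.com
```

Lowercasing the result keeps later dictionary lookups consistent when a lead types their address in mixed case.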

# Technical Implementation

We will use the OpenAI client's Function Calling capability to define and execute the necessary business logic tools in a structured loop.

## What is function calling?

Function calling is the mechanism that bridges the gap between an AI model (which is just a text generator) and your actual code (which can perform actions).

Think of the Large Language Model (LLM) as a very smart receptionist. It understands what people want, but it doesn't have the keys to the file cabinet or the ability to make phone calls itself.

Function calling gives the receptionist a "menu" of services it can request from the back office (your code).

![](https://cdn.openai.com/API/docs/images/function-calling-diagram-steps.png)

### The Core Concept

* You provide the tools: You tell the model, "I have a function called get_weather(city) that takes a city name as an argument."

* The model "thinks": If a user asks, "What's the weather in Tokyo?", the model recognizes that your tool can solve this.

* The model outputs JSON (not text): Instead of replying to the user, the model pauses and gives you a structured request. In simplified form, it looks like {"function": "get_weather", "arguments": {"city": "Tokyo"}}.

* You execute: Your code sees this request, runs the actual Python function, and gets the result (e.g., "Sunny, 25°C").

* The model finishes: You feed that result back to the model, and it writes the final natural language answer: "It is currently sunny and 25 degrees in Tokyo."
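The five steps above can be walked through without calling any API. In this sketch the model's request is written by hand in the simplified shape used above, and `get_weather` returns canned data; both are assumptions made purely for illustration.

```python
import json


def get_weather(city: str) -> str:
    """Stand-in tool: a real version would call a weather service."""
    fake_readings = {"Tokyo": "Sunny, 25°C"}
    return fake_readings.get(city, "No data")


# What the model might hand back instead of prose (hand-written, simplified).
model_request = {"function": "get_weather", "arguments": {"city": "Tokyo"}}

# Your code looks up the requested function and runs it with the model's arguments.
tools = {"get_weather": get_weather}
result = tools[model_request["function"]](**model_request["arguments"])

# This payload is what you would send back to the model as the tool result.
print(json.dumps({"tool": model_request["function"], "result": result}, ensure_ascii=False))
```

The model never runs `get_weather` itself. It only names the function and supplies arguments; the lookup and the call happen in your code.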

### Why is this powerful?

Without function calling, LLMs are isolated text predictors: they can describe an action but cannot perform it. With function calling, they become agents.

* **Structured Data Extraction**: Instead of hoping the model formats data correctly, you force it to output clean JSON that fits your database schema.

* **Connecting to the World**: It allows the AI to browse the web, query databases, send emails, or control software, merely by defining those actions as "functions."

In [None]:
import os
import json
from openai import OpenAI
from google.colab import userdata

In [None]:
# --- 1. Initialize OpenAI Client ---
try:
    client = OpenAI(api_key=userdata.get('OPENAI_APIKEY'))
except Exception as e:
    print(f"Error initializing OpenAI client: {e}")
    print("Please ensure the OPENAI_APIKEY secret is set in Colab's Secrets panel and shared with this notebook.")
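The cell above only works inside Google Colab. If the notebook might also run elsewhere, one option is to fall back to an environment variable. The sketch below assumes the same name, `OPENAI_APIKEY`, is used in both places; the helper `get_api_key` is hypothetical and is not used by the rest of this notebook.

```python
import os


def get_api_key(name: str = "OPENAI_APIKEY"):
    """Return the key from Colab secrets if available, else from the environment.

    Returns None when the key cannot be found in either place.
    """
    try:
        from google.colab import userdata  # only importable inside Colab
        key = userdata.get(name)
        if key:
            return key
    except Exception:
        # Not running in Colab, or the secret is missing / not shared with this notebook
        pass
    return os.environ.get(name)


api_key = get_api_key()
print("Key found" if api_key else "Key not found")
```

The resulting value could then be passed as `OpenAI(api_key=api_key)` in place of the direct `userdata.get` call.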

# Defining the "Tools" (Business Logic)

Here we define the actual Python functions that perform the work. In AI Agent terminology, these are the **Tools**.

For this demonstration, we are mocking the external systems (simulating database lookups with dictionaries), but in a production environment, these functions would:
1.  **`lookup_domain_info`**: Query an API like Clearbit or Crunchbase.
2.  **`check_crm_history`**: Query a CRM API such as Salesforce or HubSpot, or an internal SQL database.
3.  **`calculate_lead_score`**: Run a custom algorithm or ML model to grade the lead.

The agent "calls" these tools by asking us to run these specific Python functions.

In [None]:

def lookup_domain_info(domain: str) -> str:
    """
    Looks up and returns mock company information based on its domain.
    In a real system, this would call an external data enrichment API (e.g., Clearbit).
    """
    print(f"-> TOOL ACTIVATED: Looking up domain info for {domain}...")

    # Mock database for demonstration
    mock_data = {
        "acmecorp.com": {"industry": "Software/SaaS", "size": "501-1000 employees", "revenue": "$50M - $100M"},
        "widgetco.net": {"industry": "Manufacturing", "size": "100-250 employees", "revenue": "$10M - $25M"},
        "globalfin.org": {"industry": "Financial Services", "size": "5000+ employees", "revenue": "$1B+"},
    }

    info = mock_data.get(domain, {"industry": "Unknown", "size": "N/A", "revenue": "N/A"})

    # Return the data as a JSON string for the AI model to parse easily
    return json.dumps(info)

In [None]:
def check_crm_history(email: str) -> str:
    """
    Checks the internal CRM system for past engagement history with the lead.
    In a real system, this would query a PostgreSQL database or a CRM API (e.g., Salesforce).
    """
    print(f"-> TOOL ACTIVATED: Checking CRM history for {email}...")

    # Mock database for demonstration
    mock_data = {
        "jane@acmecorp.com": {"last_contact": "2025-11-15", "status": "Cold Lead", "notes": "Attended webinar, no follow-up yet."},
        "bob@widgetco.net": {"last_contact": "2025-12-01", "status": "Active Opportunity", "notes": "Discussed Q1 budget and product integration."},
        "default": {"last_contact": "N/A", "status": "No Record", "notes": "New lead, first contact opportunity."},
    }

    history = mock_data.get(email, mock_data["default"])
    return json.dumps(history)


In [None]:
def calculate_lead_score(data_summary: str) -> str:
    """
    Analyzes the collected data (domain and CRM history) to assign a lead score (High/Medium/Low).
    This function simulates a complex scoring algorithm.
    """
    print("-> TOOL ACTIVATED: Calculating lead score...")

    data = json.loads(data_summary)
    # Use .get so a missing section falls through to "Low" instead of raising KeyError
    domain_info = data.get("domain_info", {})
    crm_history = data.get("crm_history", {})
    score = "Low" # Default score

    # Simple scoring logic for demonstration
    if domain_info.get("revenue", "").startswith("$1B+"):
        score = "High"
    elif crm_history.get("status") == "Active Opportunity":
        score = "High"
    elif domain_info.get("revenue", "").startswith("$50M"):
        score = "Medium"

    return json.dumps({"lead_score": score})

In [None]:
# Map of available function names to the actual Python functions
AVAILABLE_FUNCTIONS = {
    "lookup_domain_info": lookup_domain_info,
    "check_crm_history": check_crm_history,
    "calculate_lead_score": calculate_lead_score,
}
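A name-to-function map like this makes dispatch a dictionary lookup. The sketch below shows one defensive way to use such a map: it always returns a JSON string, even when the requested name is unknown or the arguments are malformed. The helper `dispatch_tool` and the stand-in `echo_domain` tool are hypothetical and exist only so the example runs on its own.

```python
import json


def dispatch_tool(name: str, arguments_json: str, registry: dict) -> str:
    """Run a registered tool and always return a JSON string, even on failure."""
    func = registry.get(name)
    if func is None:
        return json.dumps({"error": f"Unknown function: {name}"})
    try:
        kwargs = json.loads(arguments_json)
        return func(**kwargs)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON, or arguments that do not match the function signature
        return json.dumps({"error": f"Bad arguments for {name}: {exc}"})


def echo_domain(domain: str) -> str:
    """Tiny stand-in tool so the sketch is self-contained."""
    return json.dumps({"domain": domain})


demo_registry = {"echo_domain": echo_domain}
print(dispatch_tool("echo_domain", '{"domain": "acmecorp.com"}', demo_registry))
print(dispatch_tool("missing_tool", "{}", demo_registry))
```

Returning an error payload instead of raising lets the caller pass the failure back to the model as an ordinary tool result.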

# Configuring the Agent's "Menu" (Tool Schema)

The Large Language Model (LLM) cannot see our Python code directly. We must describe our tools to it using a specific JSON format known as a **Schema**.

This schema tells the model:
* **What** the tool does (Description).
* **When** to use it (Context).
* **How** to use it (Parameters/Arguments).

We pass this list to the `tools` parameter in the API call later. It effectively gives the AI a "menu" of actions it can take.
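Each of the three parts maps onto a field of a schema entry. As a rough sketch, the hypothetical helper `make_tool_schema` below builds one entry from a name, a description, and a mapping of argument names to descriptions. It assumes every argument is a required string, which happens to hold for the tools in this notebook but is not a general rule.

```python
def make_tool_schema(name: str, description: str, params: dict) -> dict:
    """Build one entry for the `tools` list.

    `params` maps each argument name to its description; every argument is
    treated as a required string (an assumption of this sketch).
    """
    return {
        "type": "function",
        "function": {
            "name": name,                      # which tool
            "description": description,        # what it does and when to use it
            "parameters": {                    # how to call it
                "type": "object",
                "properties": {
                    arg: {"type": "string", "description": desc}
                    for arg, desc in params.items()
                },
                "required": list(params),
            },
        },
    }


entry = make_tool_schema(
    "lookup_domain_info",
    "Retrieves general business information about a company based on its domain name.",
    {"domain": "The company's domain name, e.g., 'acmecorp.com'"},
)
print(entry["function"]["parameters"]["required"])  # ['domain']
```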

In [None]:
# This is how the AI model learns about the tools it can use.
tools_schema = [
    {
        "type": "function",
        "function": {
            "name": "lookup_domain_info",
            "description": "Retrieves general business information (industry, size, revenue) about a company based on its domain name.",
            "parameters": {
                "type": "object",
                "properties": {
                    "domain": {"type": "string", "description": "The company's domain name, e.g., 'acmecorp.com'"},
                },
                "required": ["domain"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "check_crm_history",
            "description": "Checks the internal CRM system for past contact, status, and notes associated with a specific lead email.",
            "parameters": {
                "type": "object",
                "properties": {
                    "email": {"type": "string", "description": "The full email address of the lead."},
                },
                "required": ["email"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "calculate_lead_score",
            "description": "Calculates the priority score (High/Medium/Low) for a lead based on a summary of all collected domain and CRM history data.",
            "parameters": {
                "type": "object",
                "properties": {
                    "data_summary": {"type": "string", "description": "A JSON string containing the combined domain_info and crm_history."},
                },
                "required": ["data_summary"],
            },
        },
    },
]
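Because the schema and the Python functions are maintained separately, they can drift apart. A quick sanity check, sketched below, compares the argument names declared in each schema entry with the parameters of the registered function. The helper `check_schema_matches` is hypothetical, and the stub tool and one-entry schema exist only to make the example self-contained.

```python
import inspect


def check_schema_matches(tools: list, registry: dict) -> list:
    """Return a list of human-readable problems; an empty list means no mismatch."""
    problems = []
    for tool in tools:
        spec = tool["function"]
        func = registry.get(spec["name"])
        if func is None:
            problems.append(f"{spec['name']}: no Python function registered")
            continue
        declared = set(spec["parameters"]["properties"])
        actual = set(inspect.signature(func).parameters)
        if declared != actual:
            problems.append(
                f"{spec['name']}: schema args {sorted(declared)} != function args {sorted(actual)}"
            )
    return problems


def lookup_domain_info(domain: str) -> str:
    """Stub with the same signature as the notebook's tool."""
    return "{}"


demo_tools = [{
    "type": "function",
    "function": {
        "name": "lookup_domain_info",
        "description": "Stub entry for the check.",
        "parameters": {
            "type": "object",
            "properties": {"domain": {"type": "string"}},
            "required": ["domain"],
        },
    },
}]

print(check_schema_matches(demo_tools, {"lookup_domain_info": lookup_domain_info}))  # []
```

In the notebook itself, the equivalent call would be `check_schema_matches(tools_schema, AVAILABLE_FUNCTIONS)`.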

# The Agent Loop (Think → Act → Observe)

This is the brain of the application. The `run_agent` function implements the core feedback loop required for an autonomous agent.

**How it works:**
1.  **State Management (`collected_data`)**: We initialize a dictionary *outside* the loop to act as the agent's "memory" across multiple steps.
2.  **Think**: We send the conversation history to the model (`client.chat.completions.create`).
3.  **Act**: If the model decides to call a tool, we execute the corresponding Python function.
4.  **Observe**: We append the tool's output to the conversation history as a new message.
5.  **Repeat**: The loop continues until the model decides it has enough information to answer the user directly.
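The shape of this loop is easier to see with the network call removed. The sketch below replaces the chat API with a scripted stand-in, `fake_model`, that requests one tool and then answers. Every name here (`fake_model`, `lookup`, `run_loop`) and the plain-dictionary message format are simplifications invented for illustration; they are not the OpenAI client's actual response objects.

```python
import json


def fake_model(messages: list) -> dict:
    """Scripted stand-in for the chat API: ask for one tool, then answer."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {"tool_calls": [{"id": "call_1", "name": "lookup",
                                "arguments": json.dumps({"domain": "acmecorp.com"})}]}
    return {"content": f"Summary based on: {tool_results[-1]['content']}"}


def lookup(domain: str) -> str:
    """Stand-in tool returning canned data."""
    return json.dumps({"domain": domain, "industry": "Software/SaaS"})


def run_loop(prompt: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": prompt}]
    # Bounded so a misbehaving model cannot keep the loop running forever
    for _ in range(max_steps):
        reply = fake_model(messages)                          # Think
        messages.append({"role": "assistant", **reply})
        if not reply.get("tool_calls"):
            return reply["content"]                           # model answered directly
        for call in reply["tool_calls"]:
            result = lookup(**json.loads(call["arguments"]))  # Act
            messages.append({"role": "tool", "tool_call_id": call["id"],
                             "content": result})              # Observe
    raise RuntimeError("Agent did not finish within max_steps")


print(run_loop("Qualify jane@acmecorp.com"))
```

The `max_steps` cap is a design choice of this sketch: it trades a possible early stop for a guarantee that the loop terminates.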



In [None]:
def run_agent(user_prompt: str):
    """
    The main execution loop for the CRM Lead Qualifier Agent.
    """
    print("\n--- Running Lead Qualifier Agent ---")

    system_prompt = (
        "You are an expert CRM Lead Qualifier Agent. Your sole task is to analyze a sales lead "
        "provided via email address. You must follow these steps precisely: "
        "1. Identify the domain from the email. "
        "2. Call `lookup_domain_info` and `check_crm_history` sequentially to gather all data. "
        "3. Combine all collected data into a single JSON object. "
        "4. Call `calculate_lead_score` with the combined JSON object. "
        "5. Finally, synthesize all information (domain info, CRM history, and score) "
        "into a single, easy-to-read summary for a busy sales rep."
    )

    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

    collected_data = {}

    while True:
        print("\n[AI Thinking...]")
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools_schema,
            tool_choice="auto",
        )

        response_message = response.choices[0].message
        messages.append(response_message)

        if response_message.tool_calls:
            tool_calls = response_message.tool_calls

            for tool_call in tool_calls:
                function_name = tool_call.function.name
                function_to_call = AVAILABLE_FUNCTIONS.get(function_name)

                if not function_to_call:
                    # The API expects a tool message for every tool call,
                    # so report the problem back instead of skipping it.
                    print(f"Error: Unknown function {function_name}")
                    function_result = json.dumps({"error": f"Unknown function {function_name}"})
                else:
                    function_args = json.loads(tool_call.function.arguments)

                    if function_name == "calculate_lead_score":
                        # Inject the accumulated data from previous turns
                        # rather than relying on the model's own summary.
                        function_args = {"data_summary": json.dumps(collected_data)}

                    # Execute the function exactly once
                    function_result = function_to_call(**function_args)

                    # Update the persistent memory (collected_data)
                    if function_name == "lookup_domain_info":
                        collected_data["domain_info"] = json.loads(function_result)
                    elif function_name == "check_crm_history":
                        collected_data["crm_history"] = json.loads(function_result)

                # Append the tool result as a NEW message
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "name": function_name,
                    "content": function_result
                })

        else:
            print("\n--- FINAL AGENT SUMMARY ---")
            print(response_message.content)
            break

# Execution and Testing

Finally, we test our agent with two distinct scenarios to see how it dynamically adapts its behavior.

* **Scenario 1 (Jane @ AcmeCorp)**: Represents a mid-sized SaaS company with a cold CRM record. With revenue in the $50M - $100M band and no active deal, the scoring logic should assign a "Medium" score.
* **Scenario 2 (Bob @ WidgetCo)**: Represents a smaller company but with an active deal. The agent should detect the "Active Opportunity" status in the CRM and assign a "High" score.
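What the scoring rules return for each lead can be checked without calling the model. The sketch below restates the three rules from `calculate_lead_score` as a standalone function and applies them to the mock records for both leads. The name `expected_score` is hypothetical, and the function takes plain dictionaries rather than a JSON string.

```python
def expected_score(domain_info: dict, crm_history: dict) -> str:
    """Standalone restatement of the notebook's demo scoring rules."""
    revenue = domain_info.get("revenue", "")
    if revenue.startswith("$1B+"):
        return "High"
    if crm_history.get("status") == "Active Opportunity":
        return "High"
    if revenue.startswith("$50M"):
        return "Medium"
    return "Low"


# Mock records copied from the lookup tables defined earlier
jane = expected_score({"revenue": "$50M - $100M"}, {"status": "Cold Lead"})
bob = expected_score({"revenue": "$10M - $25M"}, {"status": "Active Opportunity"})
print(jane, bob)  # Medium High
```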

In [None]:
# Scenario 1: Mid-sized company with a cold CRM record (expected score: Medium)
lead_email_1 = "jane@acmecorp.com"
run_agent(f"Please qualify this lead for my call tomorrow: {lead_email_1}")

print("\n" + "="*80 + "\n")


--- Running Lead Qualifier Agent ---

[AI Thinking...]
-> TOOL ACTIVATED: Looking up domain info for acmecorp.com...
-> TOOL ACTIVATED: Checking CRM history for jane@acmecorp.com...

[AI Thinking...]
-> TOOL ACTIVATED: Calculating lead score...
-> TOOL ACTIVATED: Calculating lead score...

[AI Thinking...]

--- FINAL AGENT SUMMARY ---
Here's a concise summary of the lead based on the information gathered:

### Domain Information:
- **Industry**: Software/SaaS
- **Company Size**: 501-1000 employees
- **Revenue**: $50M - $100M

### CRM History:
- **Last Contact Date**: November 15, 2025
- **Status**: Cold Lead
- **Notes**: The lead, Jane, attended a webinar but no follow-up has been conducted yet.

### Lead Score:
- **Priority**: Medium

This information suggests that Jane works for a reasonably large and financially healthy software company. Although currently listed as a cold lead, her recent participation in a webinar indicates potential interest, making her a promising candidate for

In [None]:
# Scenario 2: Smaller company with an active opportunity (expected score: High)
lead_email_2 = "bob@widgetco.net"
run_agent(f"Can you run an analysis on this lead: {lead_email_2}")


--- Running Lead Qualifier Agent ---

[AI Thinking...]
-> TOOL ACTIVATED: Looking up domain info for widgetco.net...
-> TOOL ACTIVATED: Checking CRM history for bob@widgetco.net...

[AI Thinking...]
-> TOOL ACTIVATED: Calculating lead score...
-> TOOL ACTIVATED: Calculating lead score...

[AI Thinking...]

--- FINAL AGENT SUMMARY ---
### Lead Summary for Sales Rep:

- **Email**: bob@widgetco.net
- **Domain Information**:
  - **Industry**: Manufacturing
  - **Company Size**: 100-250 employees
  - **Annual Revenue**: $10M - $25M

- **CRM History**:
  - **Last Contacted**: December 1, 2025
  - **Status**: Active Opportunity
  - **Notes**: Discussion around Q1 budget and product integration has already taken place.

- **Lead Score**: High

This lead represents a high-priority opportunity in the manufacturing sector, with engagement in budgetary discussions, indicating potential purchasing plans.
