
# Synthesis utilities for Product-Customer Fit Discovery Orchestrator

This final set of utilities, the **Synthesis Agent (Step 6)**, is the culmination of the entire orchestrator. Its role is to take the disparate analytical outputs from all previous agents (the clusters, the rules, the motifs, and the scores) and **synthesize** them into a unified list of **actionable, ranked business opportunities.**

This is where the analytical output is finally transformed into a consumable **strategy**.

***

## 🧠 Core Agent Architecture: Cross-Agent Validation and Strategic Scoring

The module's power comes from its ability to validate findings across multiple analytical methods and score them based on business value.

### 1. The Synthesis Engine (`combine_insights`)

This function is the strategic core, using pre-defined **business heuristics** to generate opportunities from the combined evidence.

| Opportunity Type | Evidence Source (Agent) | Strategic Goal |
| :--- | :--- | :--- |
| **Product Gap** | Customer Clusters (Segmentation) | Identify **Cross-Sell** opportunities to fill gaps in a specific customer segment's purchasing profile. |
| **Bundle Opportunity** | Product Clusters + Association Rules | **Cross-Validation:** Checks whether products that *cluster* together are also *statistically associated* in purchase history. Confidence is 0.8 when supporting rules exist and 0.6 otherwise. |
| **Cross-Sell** | Association Rules (Pattern Mining) | Direct translation of high-confidence purchase dependencies into **Targeted Marketing** actions. |
| **Market Gap** | Graph Centrality (Isolated Products) | **Flagging Structural Failure:** Explicitly identifies products that are failing to connect with the market (Ghost Demand), leading to investigation actions. |
| **Customer Segment** | Customer Clusters + Association Rules | Highlights segments of more than 10 customers whose top products appear in association rules, leading to **Segment-Specific Marketing**. |
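
The Bundle Opportunity check can be illustrated in isolation. The following is a minimal sketch, assuming rules are dictionaries with `antecedent` and `consequent` lists. The helper name `rules_supporting_bundle` is hypothetical, and the sketch takes the strict reading that both sides of a rule must fall inside the product cluster.

```python
from typing import Any, Dict, List


def rules_supporting_bundle(
    product_ids: List[str],
    association_rules: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Return rules whose antecedent and consequent both lie inside the cluster.

    Hypothetical helper for illustration only.
    """
    cluster = set(product_ids)
    supporting = []
    for rule in association_rules:
        antecedent = set(rule.get("antecedent", []))
        consequent = set(rule.get("consequent", []))
        # Skip malformed rules: an empty set is a subset of every set.
        if not antecedent or not consequent:
            continue
        if antecedent <= cluster and consequent <= cluster:
            supporting.append(rule)
    return supporting


rules = [
    {"antecedent": ["P01"], "consequent": ["P02"], "confidence": 0.75},
    {"antecedent": ["P01"], "consequent": ["P99"], "confidence": 0.90},
]
supporting = rules_supporting_bundle(["P01", "P02"], rules)

# Same confidence constants as the bundle branch of combine_insights.
bundle_confidence = 0.8 if supporting else 0.6
print(len(supporting), bundle_confidence)
```

Only the first rule counts here, because `P99` sits outside the cluster, so the bundle would receive the higher confidence of 0.8.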

**Output Structure:** Crucially, every generated insight includes an `evidence` dictionary that explicitly tracks which agent (clustering, patterns, graph) contributed to the finding.
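
To make the `evidence` structure concrete, here is a small sketch of one insight and a way to list the agents that contributed to it. The three evidence keys match those used in `combine_insights`. The helper `contributing_agents` is hypothetical and exists only for illustration.

```python
from typing import Any, Dict, List

# One evidence key per contributing agent, as in combine_insights.
insight: Dict[str, Any] = {
    "insight_type": "bundle_opportunity",
    "evidence": {
        "from_clustering": ["Product cluster Bundle 1"],
        "from_patterns": ["1 supporting association rules"],
        "from_graph": [],
    },
}


def contributing_agents(item: Dict[str, Any]) -> List[str]:
    """List evidence keys that hold at least one entry (hypothetical helper)."""
    return [key for key, entries in item.get("evidence", {}).items() if entries]


agents = contributing_agents(insight)
print(agents)            # ['from_clustering', 'from_patterns']
print(len(agents) >= 2)  # True: meets the two-agent validation threshold
```

An empty list means the agent contributed nothing, which is why `validate_insights` counts only non-empty entries.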

***

### 2. Validation and Ranking for Robustness

The remaining functions ensure that the final list of strategies is robust, ranked, and ready for execution.

| Function | Purpose | Strategic Implication |
| :--- | :--- | :--- |
| **`validate_insights`** | Requires **Cross-Agent Evidence**. Sets `validated=True` only if an insight is supported by two or more independent analytical agents. | **Mitigates Risk:** Prevents the agent from acting on a noisy signal from a single source. Only robust, multi-perspective findings are confirmed. |
| **`score_opportunities`** | Calculates the **Composite Score**. This score is a weighted blend of: **Confidence** (40%), **Business Value** (30%), **Feasibility** (20%), and **Evidence Strength** (10%). | **Prioritizes Action:** Embeds the business's priorities (confidence is key) into the final metric, ensuring the agent recommends high-quality, executable ideas. |
| **`rank_opportunities`** | Ranks the final list by prioritizing **Validated** insights first, then by the **Composite Score**. | **Final Playbook:** Creates a definitive, ordered list of the top business strategies, maximizing the impact of the analysis. |
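
A worked example may help make the composite score tangible. The sketch below restates the weighting as a standalone function. The name `composite_score` is hypothetical, since the module computes the score inline in `score_opportunities`. The constants (business value capped at 1,000, evidence count capped at 3, and the feasibility mapping) are taken from that function.

```python
def composite_score(
    confidence: float,
    business_value: float,
    feasibility: str,
    evidence_count: int,
) -> float:
    """Weighted blend: 40% confidence, 30% value, 20% feasibility, 10% evidence."""
    feasibility_scores = {"high": 1.0, "medium": 0.7, "low": 0.4}
    value_norm = min(1.0, business_value / 1000.0)      # cap business value at 1,000
    evidence_strength = min(1.0, evidence_count / 3.0)  # cap at 3 evidence entries
    return (
        confidence * 0.4
        + value_norm * 0.3
        + feasibility_scores.get(feasibility, 0.5) * 0.2
        + evidence_strength * 0.1
    )


# 0.8*0.4 + 0.5*0.3 + 1.0*0.2 + (2/3)*0.1 = 0.32 + 0.15 + 0.20 + 0.0667
score = composite_score(0.8, 500.0, "high", 2)
print(round(score, 3))  # 0.737
```

Because confidence carries the largest weight, a low-value but high-confidence cross-sell rule can outrank a high-value insight with weak confidence.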

***

## ✨ Differentiation: The Final Strategic Playbook

This module is the definitive component that transforms your system into a **Strategic Discovery Orchestrator**.

* **Autonomous Strategy:** The agent produces a complete, ranked list of **recommended actions** and flags them with a confidence level and feasibility score. No human intervention is needed to translate the data science output into a sales strategy.
* **The "Ghost Demand" Resolution:** By generating and prioritizing **Market Gap** and **Product Gap** insights, the agent fulfills its core mission: to identify and quantify the specific, actionable opportunities represented by the unseen market potential.
* **Self-Correction:** The **cross-validation step**, enabled by default through `require_cross_validation=True`, gives the agent an internal self-correction mechanism, ensuring the top-ranked strategic recommendations are the most reliable outcome of the entire 6-step analytical process.
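
The effect of ranking validated insights first can be seen with three toy records. This is a minimal sketch using the same tuple sort key as `rank_opportunities`. The sample records and the variable names are illustrative assumptions.

```python
from typing import Any, Dict, List

candidates: List[Dict[str, Any]] = [
    {"insight_id": "a", "validated": True, "composite_score": 0.80},
    {"insight_id": "b", "validated": False, "composite_score": 0.90},
    {"insight_id": "c", "validated": True, "composite_score": 0.70},
]

# Tuples compare element by element, so the validated flag dominates
# and the composite score only breaks ties within each group.
ranked = sorted(
    candidates,
    key=lambda x: (x.get("validated", False), x.get("composite_score", 0.0)),
    reverse=True,
)

order = [item["insight_id"] for item in ranked]
print(order)  # ['a', 'c', 'b']
```

Insight `b` has the highest score but lands last, because single-source findings never outrank cross-validated ones under this key.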

In [None]:
"""Synthesis utilities for Product-Customer Fit Discovery Orchestrator"""

from typing import List, Dict, Any
from collections import Counter
import uuid


def combine_insights(
    customer_clusters: List[Dict[str, Any]],
    product_clusters: List[Dict[str, Any]],
    association_rules: List[Dict[str, Any]],
    sequential_patterns: List[Dict[str, Any]],
    graph_motifs: List[Dict[str, Any]],
    centrality_metrics: Dict[str, Any],
    preprocessed_data: Dict[str, Any]
) -> List[Dict[str, Any]]:
    """
    Combine insights from all analysis agents into unified opportunities.

    Args:
        customer_clusters: Customer segmentation results
        product_clusters: Product bundling results
        association_rules: Product association rules
        sequential_patterns: Purchase sequence patterns
        graph_motifs: Network motif patterns
        centrality_metrics: Centrality analysis results
        preprocessed_data: Preprocessed data for context

    Returns:
        List of synthesized insight dictionaries
    """
    insights = []

    # Get derived features for context
    derived_features = preprocessed_data.get("derived_features", {})
    customer_engagement = derived_features.get("customer_engagement", {})
    product_popularity = derived_features.get("product_popularity", {})

    # 1. Product Gap Opportunities (from clustering)
    for cluster in customer_clusters:
        underserved_products = cluster.get("underserved_products", [])
        if underserved_products:
            for product_id in underserved_products[:3]:  # Top 3 per cluster
                insight = {
                    "insight_id": str(uuid.uuid4())[:8],
                    "insight_type": "product_gap",
                    "title": f"Untapped Product: {product_id} for {cluster['cluster_label']}",
                    "description": f"Customer segment {cluster['cluster_label']} ({cluster['size']} customers) doesn't use {product_id}, representing a cross-sell opportunity",
                    "confidence": 0.7,  # Medium confidence
                    "business_value": cluster.get("business_value", 0.0) * 0.1,  # Estimate
                    "evidence": {
                        "from_clustering": [f"Segment {cluster['cluster_label']} missing {product_id}"],
                        "from_patterns": [],
                        "from_graph": []
                    },
                    "recommended_actions": [
                        f"Target {cluster['cluster_label']} with {product_id} marketing",
                        f"Create bundle including {product_id} for this segment"
                    ],
                    "implementation_feasibility": "medium"
                }
                insights.append(insight)

    # 2. Bundle Opportunities (from product clustering + association rules)
    for cluster in product_clusters:
        if cluster.get("bundle_potential", 0) > 0.5:
            product_ids = cluster.get("entity_ids", [])
            if len(product_ids) >= 2:
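                # A rule supports the bundle only when both of its sides are
                # non-empty and fall entirely inside this product cluster
                cluster_products = set(product_ids)
                supporting_rules = [
                    r for r in association_rules
                    if r.get("antecedent") and r.get("consequent")
                    and set(r["antecedent"]) <= cluster_products
                    and set(r["consequent"]) <= cluster_products
                ]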

                insight = {
                    "insight_id": str(uuid.uuid4())[:8],
                    "insight_type": "bundle_opportunity",
                    "title": f"Natural Product Bundle: {', '.join(product_ids[:3])}",
                    "description": f"Products {', '.join(product_ids)} naturally cluster together with bundle potential {cluster.get('bundle_potential', 0):.2f}",
                    "confidence": 0.8 if supporting_rules else 0.6,
                    "business_value": cluster.get("bundle_potential", 0.0) * 1000,  # Estimate
                    "evidence": {
                        "from_clustering": [f"Product cluster {cluster['cluster_label']}"],
                        "from_patterns": [f"{len(supporting_rules)} supporting association rules"] if supporting_rules else [],
                        "from_graph": []
                    },
                    "recommended_actions": [
                        f"Create bundle package: {', '.join(product_ids)}",
                        "Test bundle pricing strategy"
                    ],
                    "implementation_feasibility": "high"
                }
                insights.append(insight)

    # 3. Cross-Sell Opportunities (from association rules)
    cross_sell_rules = [r for r in association_rules if r.get("rule_type") == "cross_sell"]
    for rule in cross_sell_rules[:10]:  # Top 10 cross-sell rules
        antecedent = rule.get("antecedent", [])
        consequent = rule.get("consequent", [])
        confidence = rule.get("confidence", 0.0)

        if confidence >= 0.5:  # High confidence only
            insight = {
                "insight_id": str(uuid.uuid4())[:8],
                "insight_type": "cross_sell",
                "title": f"Cross-Sell: {', '.join(antecedent)} → {', '.join(consequent)}",
                "description": f"Customers with {', '.join(antecedent)} have {confidence:.0%} probability of also using {', '.join(consequent)}",
                "confidence": confidence,
                "business_value": rule.get("business_value", 0.0),
                "evidence": {
                    "from_clustering": [],
                    "from_patterns": [f"Association rule: {confidence:.0%} confidence, {rule.get('support', 0):.0%} support"],
                    "from_graph": []
                },
                "recommended_actions": [
                    f"Recommend {', '.join(consequent)} to customers with {', '.join(antecedent)}",
                    "Create automated cross-sell campaign"
                ],
                "implementation_feasibility": "high"
            }
            insights.append(insight)

    # 4. Market Gap Opportunities (from graph analysis - isolated products)
    isolated_products = centrality_metrics.get("isolated_products", [])
    for product_id in isolated_products[:5]:  # Top 5 isolated products
        popularity = product_popularity.get(product_id, {})
        popularity_score = popularity.get("popularity_score", 0.0)

        insight = {
            "insight_id": str(uuid.uuid4())[:8],
            "insight_type": "market_gap",
            "title": f"Underutilized Product: {product_id}",
            "description": f"Product {product_id} has low network connectivity (isolated) but may represent untapped market potential",
            "confidence": 0.6,
            "business_value": (1.0 - popularity_score) * 500,  # Inverse of popularity
            "evidence": {
                "from_clustering": [],
                "from_patterns": [],
                "from_graph": ["Low centrality: isolated product in network"]
            },
            "recommended_actions": [
                f"Investigate why {product_id} has low adoption",
                f"Consider targeted marketing campaign for {product_id}",
                "Review product positioning and messaging"
            ],
            "implementation_feasibility": "medium"
        }
        insights.append(insight)

    # 5. Customer Segment Opportunities (from clustering + patterns)
    for cluster in customer_clusters:
        characteristics = cluster.get("characteristics", {})
        top_products = characteristics.get("top_products", [])

        # Find association rules relevant to this segment's products
        segment_rules = [
            r for r in association_rules
            if any(p in r.get("antecedent", []) + r.get("consequent", []) for p in top_products)
        ]

        if segment_rules and cluster.get("size", 0) > 10:  # Significant segment
            insight = {
                "insight_id": str(uuid.uuid4())[:8],
                "insight_type": "customer_segment",
                "title": f"High-Value Segment: {cluster['cluster_label']}",
                "description": f"Segment {cluster['cluster_label']} ({cluster['size']} customers) shows strong product patterns with {len(segment_rules)} relevant association rules",
                "confidence": 0.75,
                "business_value": cluster.get("business_value", 0.0),
                "evidence": {
                    "from_clustering": [f"Segment size: {cluster['size']} customers"],
                    "from_patterns": [f"{len(segment_rules)} relevant association rules"],
                    "from_graph": []
                },
                "recommended_actions": [
                    f"Develop segment-specific marketing for {cluster['cluster_label']}",
                    f"Create personalized product recommendations"
                ],
                "implementation_feasibility": "high"
            }
            insights.append(insight)

    return insights


def score_opportunities(insights: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """
    Score opportunities on confidence, business value, feasibility, and evidence strength.

    Args:
        insights: List of insight dictionaries

    Returns:
        List of insights with added scores, sorted by score
    """
    scored = []

    for insight in insights:
        confidence = insight.get("confidence", 0.0)
        business_value = insight.get("business_value", 0.0)
        feasibility = insight.get("implementation_feasibility", "low")

        # Feasibility score
        feasibility_scores = {"high": 1.0, "medium": 0.7, "low": 0.4}
        feasibility_score = feasibility_scores.get(feasibility, 0.5)

        # Evidence strength (number of supporting sources)
        evidence = insight.get("evidence", {})
        evidence_count = sum(len(v) for v in evidence.values())
        evidence_strength = min(1.0, evidence_count / 3.0)  # Normalize to 0-1

        # Composite score
        composite_score = (
            confidence * 0.4 +
            min(1.0, business_value / 1000.0) * 0.3 +  # Normalize business value
            feasibility_score * 0.2 +
            evidence_strength * 0.1
        )

        insight_copy = insight.copy()
        insight_copy["composite_score"] = composite_score
        scored.append(insight_copy)

    # Sort by composite score (descending)
    scored.sort(key=lambda x: x["composite_score"], reverse=True)

    return scored


def validate_insights(
    insights: List[Dict[str, Any]],
    require_cross_validation: bool = True
) -> List[Dict[str, Any]]:
    """
    Validate insights by checking for cross-agent evidence.

    Args:
        insights: List of insight dictionaries
        require_cross_validation: Whether to require evidence from multiple agents

    Returns:
        List of validated insights with validation flags
    """
    validated = []

    for insight in insights:
        evidence = insight.get("evidence", {})

        # Count evidence sources
        sources = sum(1 for v in evidence.values() if v)

        if require_cross_validation:
            is_validated = sources >= 2  # Need evidence from at least 2 agents
        else:
            is_validated = sources >= 1

        insight_copy = insight.copy()
        insight_copy["validated"] = is_validated
        insight_copy["evidence_sources"] = sources

        validated.append(insight_copy)

    return validated


def rank_opportunities(
    insights: List[Dict[str, Any]],
    top_n: int = 10
) -> List[Dict[str, Any]]:
    """
    Rank opportunities by composite score and validation status.

    Args:
        insights: List of validated insight dictionaries
        top_n: Number of top opportunities to return

    Returns:
        Top N ranked opportunities
    """
    # Sort by: validated first, then composite score
    ranked = sorted(
        insights,
        key=lambda x: (x.get("validated", False), x.get("composite_score", 0.0)),
        reverse=True
    )

    return ranked[:top_n]


def create_synthesis_summary(
    insights: List[Dict[str, Any]],
    validated_insights: List[Dict[str, Any]],
    ranked_opportunities: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """
    Create summary of synthesis results.

    Args:
        insights: All insights
        validated_insights: Validated insights
        ranked_opportunities: Top ranked opportunities

    Returns:
        Summary dictionary
    """
    # Count by type
    insights_by_type = Counter(i.get("insight_type", "unknown") for i in insights)

    # Calculate total potential value
    total_value = sum(i.get("business_value", 0.0) for i in ranked_opportunities)

    # Count validated
    validated_count = sum(1 for i in validated_insights if i.get("validated", False))

    # Top opportunity types
    top_types = [insight_type for insight_type, _ in insights_by_type.most_common(3)]

    return {
        "total_insights": len(insights),
        "high_confidence_insights": sum(1 for i in insights if i.get("confidence", 0) >= 0.7),
        "total_potential_value": float(total_value),
        "insights_by_type": dict(insights_by_type),
        "cross_validated_insights": validated_count,
        "top_opportunity_types": top_types
    }



# Tests for synthesis utilities

In [None]:
"""Tests for synthesis utilities"""

import pytest
from tools.synthesis import (
    combine_insights,
    score_opportunities,
    validate_insights,
    rank_opportunities,
    create_synthesis_summary
)


def test_combine_insights():
    """Test combining insights from all agents"""
    customer_clusters = [
        {
            "cluster_id": 0,
            "cluster_label": "Segment 1",
            "entity_ids": ["C001", "C002"],
            "size": 2,
            "underserved_products": ["P10"],
            "business_value": 100.0,
            "characteristics": {"top_products": ["P01", "P02"]}
        }
    ]

    product_clusters = [
        {
            "cluster_id": 0,
            "cluster_label": "Bundle 1",
            "entity_ids": ["P01", "P02"],
            "size": 2,
            "bundle_potential": 0.8
        }
    ]

    association_rules = [
        {
            "antecedent": ["P01"],
            "consequent": ["P02"],
            "confidence": 0.75,
            "support": 0.5,
            "rule_type": "cross_sell",
            "business_value": 50.0
        }
    ]

    sequential_patterns = []
    graph_motifs = []
    centrality_metrics = {"isolated_products": ["P20"]}
    preprocessed_data = {
        "derived_features": {
            "customer_engagement": {},
            "product_popularity": {"P20": {"popularity_score": 0.2}}
        }
    }

    insights = combine_insights(
        customer_clusters,
        product_clusters,
        association_rules,
        sequential_patterns,
        graph_motifs,
        centrality_metrics,
        preprocessed_data
    )

    assert len(insights) > 0
    assert all("insight_id" in i for i in insights)
    assert all("insight_type" in i for i in insights)
    assert all("confidence" in i for i in insights)


def test_score_opportunities():
    """Test scoring opportunities"""
    insights = [
        {
            "insight_id": "test1",
            "confidence": 0.8,
            "business_value": 500.0,
            "implementation_feasibility": "high",
            "evidence": {"from_clustering": ["test"], "from_patterns": ["test"]}
        },
        {
            "insight_id": "test2",
            "confidence": 0.5,
            "business_value": 200.0,
            "implementation_feasibility": "low",
            "evidence": {"from_clustering": []}
        }
    ]

    scored = score_opportunities(insights)

    assert len(scored) == 2
    assert all("composite_score" in s for s in scored)
    assert scored[0]["composite_score"] >= scored[1]["composite_score"]  # Should be sorted


def test_validate_insights():
    """Test validating insights"""
    insights = [
        {
            "insight_id": "test1",
            "evidence": {
                "from_clustering": ["evidence1"],
                "from_patterns": ["evidence2"],
                "from_graph": []
            }
        },
        {
            "insight_id": "test2",
            "evidence": {
                "from_clustering": ["evidence1"],
                "from_patterns": [],
                "from_graph": []
            }
        }
    ]

    validated = validate_insights(insights, require_cross_validation=True)

    assert len(validated) == 2
    assert all("validated" in v for v in validated)
    assert all("evidence_sources" in v for v in validated)
    assert validated[0]["validated"] is True  # Has 2 sources
    assert validated[1]["validated"] is False  # Has only 1 source


def test_rank_opportunities():
    """Test ranking opportunities"""
    insights = [
        {
            "insight_id": "test1",
            "validated": True,
            "composite_score": 0.8
        },
        {
            "insight_id": "test2",
            "validated": False,
            "composite_score": 0.9
        },
        {
            "insight_id": "test3",
            "validated": True,
            "composite_score": 0.7
        }
    ]

    ranked = rank_opportunities(insights, top_n=2)

    assert len(ranked) == 2
    # Validated should come first
    assert ranked[0]["validated"] is True
    assert ranked[1]["validated"] is True


def test_create_synthesis_summary():
    """Test creating synthesis summary"""
    insights = [
        {"insight_type": "product_gap", "confidence": 0.8, "business_value": 100.0},
        {"insight_type": "bundle_opportunity", "confidence": 0.7, "business_value": 200.0},
        {"insight_type": "product_gap", "confidence": 0.6, "business_value": 50.0}
    ]

    validated = [
        {"validated": True},
        {"validated": True},
        {"validated": False}
    ]

    ranked = [
        {"business_value": 200.0},
        {"business_value": 100.0}
    ]

    summary = create_synthesis_summary(insights, validated, ranked)

    assert "total_insights" in summary
    assert "high_confidence_insights" in summary
    assert "total_potential_value" in summary
    assert "insights_by_type" in summary
    assert summary["total_insights"] == 3
    assert summary["cross_validated_insights"] == 2


def test_combine_insights_product_gap():
    """Test product gap insights are created"""
    customer_clusters = [
        {
            "cluster_id": 0,
            "cluster_label": "Test Segment",
            "entity_ids": ["C001"],
            "size": 1,
            "underserved_products": ["P10", "P11"],
            "business_value": 100.0,
            "characteristics": {}
        }
    ]

    insights = combine_insights(
        customer_clusters,
        [],
        [],
        [],
        [],
        {},
        {"derived_features": {}}
    )

    # Should have product gap insights
    product_gaps = [i for i in insights if i["insight_type"] == "product_gap"]
    assert len(product_gaps) > 0


def test_combine_insights_bundle_opportunity():
    """Test bundle opportunity insights are created"""
    product_clusters = [
        {
            "cluster_id": 0,
            "cluster_label": "Test Bundle",
            "entity_ids": ["P01", "P02", "P03"],
            "size": 3,
            "bundle_potential": 0.9
        }
    ]

    insights = combine_insights(
        [],
        product_clusters,
        [],
        [],
        [],
        {},
        {"derived_features": {}}
    )

    # Should have bundle opportunity insights
    bundles = [i for i in insights if i["insight_type"] == "bundle_opportunity"]
    assert len(bundles) > 0


def test_combine_insights_cross_sell():
    """Test cross-sell insights are created"""
    association_rules = [
        {
            "antecedent": ["P01"],
            "consequent": ["P02"],
            "confidence": 0.8,
            "support": 0.4,
            "rule_type": "cross_sell",
            "business_value": 100.0
        }
    ]

    insights = combine_insights(
        [],
        [],
        association_rules,
        [],
        [],
        {},
        {"derived_features": {}}
    )

    # Should have cross-sell insights
    cross_sells = [i for i in insights if i["insight_type"] == "cross_sell"]
    assert len(cross_sells) > 0



# Test Results

In [None]:
(.venv) micahshull@Micahs-iMac LG_Cursor_035_Product-CustomerFitDiscoveryOrchestrator % python3 -m pytest tests/test_synthesis.py -v
============================================================ test session starts ============================================================
platform darwin -- Python 3.13.7, pytest-9.0.1, pluggy-1.6.0 -- /Users/micahshull/Documents/AI_LangGraph/LG_Cursor_035_Product-CustomerFitDiscoveryOrchestrator/.venv/bin/python3
cachedir: .pytest_cache
rootdir: /Users/micahshull/Documents/AI_LangGraph/LG_Cursor_035_Product-CustomerFitDiscoveryOrchestrator
plugins: langsmith-0.4.53, anyio-4.12.0, asyncio-1.3.0, cov-7.0.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 8 items

tests/test_synthesis.py::test_combine_insights PASSED                                                                                 [ 12%]
tests/test_synthesis.py::test_score_opportunities PASSED                                                                              [ 25%]
tests/test_synthesis.py::test_validate_insights PASSED                                                                                [ 37%]
tests/test_synthesis.py::test_rank_opportunities PASSED                                                                               [ 50%]
tests/test_synthesis.py::test_create_synthesis_summary PASSED                                                                         [ 62%]
tests/test_synthesis.py::test_combine_insights_product_gap PASSED                                                                     [ 75%]
tests/test_synthesis.py::test_combine_insights_bundle_opportunity PASSED                                                              [ 87%]
tests/test_synthesis.py::test_combine_insights_cross_sell PASSED                                                                      [100%]

============================================================= 8 passed in 0.02s =============================================================
