<a href="https://colab.research.google.com/github/micah-shull/AI_Agents/blob/main/242_PredRevenue_Gap_Orchestrator_Tier2_BizLogic.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Predict Revenue

```python
from typing import Any, Dict, List


def predict_revenue(
    sales_records: List[Dict[str, Any]],
    prediction_horizon_weeks: int = 4,
    baseline_weeks: int = 4,
    recent_weeks: int = 4
) -> Dict[str, Any]:
    """
    Predict future revenue using simple methods.

    Args:
        sales_records: List of sales records for the customer
        prediction_horizon_weeks: Weeks ahead to predict
        baseline_weeks: Weeks for baseline calculation
        recent_weeks: Weeks for recent average

    Returns:
        Dictionary with predictions
    """
    if not sales_records:
        return {
            "predicted_next_week": 0.0,
            "predicted_next_month": 0.0,
            "prediction_method": "no_data",
            "confidence": 0.0
        }

    # Sort by date (string comparison, so dates must be in a sortable format)
    sorted_records = sorted(
        sales_records,
        key=lambda x: x.get('week_start_date', '')
    )

    # Method 1: Moving average (recent weeks)
    recent_records = sorted_records[-recent_weeks:]
    moving_avg = sum(r.get('weekly_spend', 0.0) for r in recent_records) / len(recent_records)

    # Method 2: Trend projection
    baseline_records = sorted_records[:baseline_weeks]
    baseline_avg = sum(r.get('weekly_spend', 0.0) for r in baseline_records) / len(baseline_records)
    recent_avg = moving_avg  # same window as the moving average

    trend_per_week = (recent_avg - baseline_avg) / max(len(sorted_records), 1)
    # Computed for reference only; trend_projection is not part of the returned dict
    trend_projection = recent_avg + (trend_per_week * prediction_horizon_weeks)

    # Method 3: Baseline (fallback)
    baseline_prediction = baseline_avg

    # Choose prediction method based on data quality
    if len(sorted_records) >= recent_weeks:
        # Use moving average if we have enough recent data
        predicted_next_week = moving_avg
        predicted_next_month = moving_avg * prediction_horizon_weeks
        prediction_method = "moving_average"
        confidence = min(0.9, len(sorted_records) / 12.0)  # Higher confidence with more data
    elif len(sorted_records) >= baseline_weeks:
        # Use baseline if limited data (reachable only when baseline_weeks < recent_weeks)
        predicted_next_week = baseline_prediction
        predicted_next_month = baseline_prediction * prediction_horizon_weeks
        prediction_method = "baseline"
        confidence = 0.6
    else:
        # Insufficient data
        predicted_next_week = 0.0
        predicted_next_month = 0.0
        prediction_method = "insufficient_data"
        confidence = 0.0

    return {
        "predicted_next_week": round(predicted_next_week, 2),
        "predicted_next_month": round(predicted_next_month, 2),
        "prediction_method": prediction_method,
        "confidence": round(confidence, 2)
    }
```


This is one of my favorite functions in the entire orchestrator, because it looks *simple*, but it actually demonstrates a ton of important concepts in forecasting, agent workflows, model governance, and business logic design.

Let’s break it down thoroughly so you understand **both the code AND the bigger-picture implications**.

---

# 🔍 **Revenue Prediction Function: Deep Analysis**

Here is the function signature again for reference:

```python
def predict_revenue(
    sales_records: List[Dict[str, Any]],
    prediction_horizon_weeks: int = 4,
    baseline_weeks: int = 4,
    recent_weeks: int = 4
) -> Dict[str, Any]:
    ...
```

This is a **simple**, **transparent**, and **explainable** forecasting function that provides:

* A short-term prediction (next week revenue)
* A slightly longer prediction (next month = weekly prediction × `prediction_horizon_weeks`, 4 weeks by default)
* The method used (so it’s auditable)
* A confidence score (so downstream agents can adjust decisions)

This is the essence of a **safe, business-friendly forecasting system**.

Let’s go through each part.

---

# 🧠 **1. Early Exit for No Data**

```
if not sales_records:
    return {
        "predicted_next_week": 0.0,
        "predicted_next_month": 0.0,
        "prediction_method": "no_data",
        "confidence": 0.0
    }
```

### **Why this matters**

* No data = no model, period.
* You avoid hallucinated numbers: the function returns zeros with an explicit `"no_data"` method instead of guessing.
* You keep the system consistent and trustworthy.

### **Agent perspective**

Agents should never guess when there is no signal; they should return **explicit** “no data” markers.
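
A downstream node can rely on that marker instead of inspecting the numbers. The sketch below assumes a hypothetical helper, `has_usable_forecast`, which is not part of the orchestrator; it only illustrates why an explicit `prediction_method` matters when a genuine forecast can also be `0.0`.

```python
from typing import Any, Dict

# Method labels that predict_revenue uses when it cannot forecast.
NO_FORECAST_METHODS = {"no_data", "insufficient_data"}


def has_usable_forecast(prediction: Dict[str, Any]) -> bool:
    """Hypothetical helper: True only when a real method produced the numbers."""
    method = prediction.get("prediction_method", "no_data")
    return method not in NO_FORECAST_METHODS


empty_result = {
    "predicted_next_week": 0.0,
    "predicted_next_month": 0.0,
    "prediction_method": "no_data",
    "confidence": 0.0,
}
real_result = {
    "predicted_next_week": 0.0,  # a genuine forecast of zero spend
    "predicted_next_month": 0.0,
    "prediction_method": "moving_average",
    "confidence": 0.5,
}

print(has_usable_forecast(empty_result))  # False
print(has_usable_forecast(real_result))   # True
```

Both results carry the same numbers; only the method label tells them apart.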

---

# 🗂️ **2. Sort Records Chronologically**

```
sorted_records = sorted(
    sales_records,
    key=lambda x: x.get('week_start_date', '')
)
```

Sorting ensures:

* Baseline = earliest weeks
* Recent = latest weeks
* Predictions align with real time

Without sorting, the baseline and recent slices would pick arbitrary weeks rather than the earliest and latest ones.
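
The sort compares `week_start_date` values as plain strings, so it is chronological only when the date format sorts correctly as text. A minimal sketch with made-up dates, assuming ISO 8601 (`YYYY-MM-DD`) on one side and `MM/DD/YYYY` on the other:

```python
from datetime import datetime

iso_dates = ["2024-02-05", "2023-12-25", "2024-01-08"]
us_dates = ["02/05/2024", "12/25/2023", "01/08/2024"]

# ISO 8601 strings sort chronologically as plain text.
iso_sorted = sorted(iso_dates)
print(iso_sorted)  # ['2023-12-25', '2024-01-08', '2024-02-05']

# MM/DD/YYYY strings do not: December 2023 lands last.
us_sorted_as_text = sorted(us_dates)
print(us_sorted_as_text)  # ['01/08/2024', '02/05/2024', '12/25/2023']

# Parsing to datetime restores the true order.
us_sorted_as_dates = sorted(us_dates, key=lambda d: datetime.strptime(d, "%m/%d/%Y"))
print(us_sorted_as_dates)  # ['12/25/2023', '01/08/2024', '02/05/2024']
```

If the source data is not guaranteed to use ISO dates, parsing inside the sort key is the safer choice.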

---

# 📈 **3. Moving Average (Method 1: Main Prediction)**

```
recent_records = sorted_records[-recent_weeks:]
moving_avg = sum(r.get('weekly_spend', 0.0) for r in recent_records) / len(recent_records)
```

### What you’re learning:

* Moving averages are a common choice for operational forecasting.
* This is stable, predictable, and easy to explain.
* Sudden noise in a single week doesn’t dominate the prediction.

### Business logic:

* “Your average revenue over the last 4 weeks is the best predictor of next week.”
* Works well when customer behavior is somewhat stable.
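
To make the smoothing effect concrete, here is a small worked example with made-up spend values in which the final week is a spike. It compares the moving average with a naive "repeat the last week" forecast.

```python
weekly_spend = [100.0, 102.0, 98.0, 250.0]  # made-up data; the last week is a spike
recent_weeks = 4

recent = weekly_spend[-recent_weeks:]
moving_avg = sum(recent) / len(recent)
last_value = weekly_spend[-1]

print(moving_avg)  # 137.5
print(last_value)  # 250.0
```

The spike still lifts the average above the typical level of about 100, but far less than a forecast built on the last week alone.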

---

# 📉 **4. Trend Projection (Method 2: Supporting Insight, Not the Chosen Prediction)**

```
trend_per_week = (recent_avg - baseline_avg) / max(len(sorted_records), 1)
trend_projection = recent_avg + (trend_per_week * prediction_horizon_weeks)
```

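This models:

* An approximate rate of change per week: the difference between the recent and baseline averages, divided by the total number of records
* How fast the customer is growing or declining

### Why we calculate this even if we don’t *use* it:

* Gives insights for future personalization
* Can feed ML models later
* Can be used for better alerting
* Helps interpret whether a growing customer is accelerating or plateauing

As written, `trend_projection` stays a local variable and is not part of the returned dictionary, so it would have to be added to the output before any of these uses are possible.

### Agent perspective:

Agents should calculate more signals than they initially need, because downstream nodes might require them.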
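
The trend arithmetic is easier to follow with numbers. The sketch below mirrors the two lines from the function on a plain list of weekly spend values; `project_trend` is a hypothetical helper name, the data is made up, and the list is assumed to be non-empty and already in chronological order.

```python
from typing import Dict, List


def project_trend(
    weekly_spend: List[float],
    baseline_weeks: int = 4,
    recent_weeks: int = 4,
    horizon_weeks: int = 4,
) -> Dict[str, float]:
    """Mirror the trend arithmetic from predict_revenue on an ordered list."""
    baseline = weekly_spend[:baseline_weeks]
    recent = weekly_spend[-recent_weeks:]
    baseline_avg = sum(baseline) / len(baseline)
    recent_avg = sum(recent) / len(recent)
    trend_per_week = (recent_avg - baseline_avg) / max(len(weekly_spend), 1)
    return {
        "baseline_avg": baseline_avg,
        "recent_avg": recent_avg,
        "trend_per_week": trend_per_week,
        "trend_projection": recent_avg + trend_per_week * horizon_weeks,
    }


# Twelve weeks: 40 per week, then 35, then 30.
spend = [40.0] * 4 + [35.0] * 4 + [30.0] * 4
result = project_trend(spend)

print(result["baseline_avg"], result["recent_avg"])  # 40.0 30.0
print(round(result["trend_per_week"], 2))            # -0.83
print(round(result["trend_projection"], 2))          # 26.67
```

Dividing by the total record count (12 here) gives a conservative slope: the centres of the two windows are only 8 weeks apart, so the actual week-over-week change in this example is closer to -1.25.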

---

# 🧘 **5. Baseline Prediction (Method 3: Fallback)**

```
baseline_prediction = baseline_avg
```

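If recent data is too sparse:

* Fall back to early behavior (baseline)
* It’s safer than assuming they’re declining or zero

With the default arguments (`baseline_weeks` and `recent_weeks` both 4), this fallback is never selected: any history long enough for the baseline is also long enough for the moving average. It only applies when `baseline_weeks` is set lower than `recent_weeks`.

### Why this is a good practice:

* You never overreact to 1–2 weird weeks.
* The system stays reliable.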

---

# 🧩 **6. Choose Prediction Logic Based on Data Quality**

```
if len(sorted_records) >= recent_weeks:
    # Use moving average
elif len(sorted_records) >= baseline_weeks:
    # Use baseline
else:
    # Insufficient data
```

This hierarchy is key.

### Priority:

1. **Moving Average (best)**
2. **Baseline (okay)**
3. **No prediction (honest)**

### Agent philosophy:

Agents should choose **the best available method**, not blindly apply sophisticated tools.
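
To see which branch each history length selects, the selection logic can be isolated from the arithmetic. This is a sketch: `select_method` is a hypothetical helper that reproduces only the `if`/`elif`/`else` conditions (plus the early exit for empty input), not the predictions themselves.

```python
def select_method(n_records: int, recent_weeks: int = 4, baseline_weeks: int = 4) -> str:
    """Reproduce only the branch selection of predict_revenue."""
    if n_records == 0:
        return "no_data"
    if n_records >= recent_weeks:
        return "moving_average"
    if n_records >= baseline_weeks:
        return "baseline"
    return "insufficient_data"


# Default windows: the "baseline" branch never fires.
default_methods = {n: select_method(n) for n in range(7)}
print(default_methods[3], default_methods[4])  # insufficient_data moving_average

# A shorter baseline window than recent window makes the fallback reachable.
custom_methods = {n: select_method(n, recent_weeks=6, baseline_weeks=3) for n in range(7)}
print(custom_methods[2], custom_methods[3], custom_methods[6])
# insufficient_data baseline moving_average
```

With equal window sizes the hierarchy effectively has two levels; choosing `baseline_weeks < recent_weeks` is what turns it into three.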

---

# ⭐ **7. Confidence Score**

```
confidence = min(0.9, len(sorted_records) / 12.0)
```

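This is a simple, workable heuristic for an MVP.

### Interpretation:

* Confidence increases with the number of weeks available. This formula applies to the moving-average branch only; the baseline branch returns a fixed 0.6 and the no-data branches return 0.0.
* Max confidence = 0.9 (never 100% confident), reached once there are 11 or more weeks of data.
* The score reflects data volume, not measured forecast accuracy.
* Downstream agents can use confidence as a weighting factor.

### Business value:

Executives LOVE seeing confidence scores; it turns forecasting into risk management.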
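
A quick tabulation shows how the score behaves as history grows. The first helper copies the formula from the function; `weighted_forecast` is a hypothetical example of a downstream node using confidence as a weight, and is an assumption rather than part of the orchestrator.

```python
def moving_average_confidence(n_weeks: int) -> float:
    """Confidence rule from the moving-average branch, rounded like the output."""
    return round(min(0.9, n_weeks / 12.0), 2)


confidence_by_weeks = {n: moving_average_confidence(n) for n in (4, 6, 9, 11, 24)}
print(confidence_by_weeks)
# {4: 0.33, 6: 0.5, 9: 0.75, 11: 0.9, 24: 0.9}


def weighted_forecast(predicted: float, confidence: float) -> float:
    """Hypothetical downstream use: discount a forecast by its confidence."""
    return round(predicted * confidence, 2)


print(weighted_forecast(120.0, confidence_by_weeks[9]))  # 90.0
```

Note that a customer with exactly 4 weeks of data scores 0.33, well below the fixed 0.6 assigned to the baseline branch.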

---

# 🧾 **Final Output Is Fully Explainable**

```
return {
    "predicted_next_week": ...,
    "predicted_next_month": ...,
    "prediction_method": ...,
    "confidence": ...
}
```

This is crucial.

### Executive-friendly output:

* “We used a moving average.”
* “Confidence is 0.75 based on data volume.”
* “Expected next week revenue: $XX.XX.”

### Agent-friendly output:

Nodes downstream know *how much* to trust this prediction.
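
One way to make this output contract explicit is a `TypedDict`. This is a sketch, not the orchestrator's own code: the `RevenuePrediction` type and the `describe` helper are hypothetical names, and the example values are made up.

```python
from typing import TypedDict


class RevenuePrediction(TypedDict):
    """Shape of the dictionary returned by predict_revenue."""
    predicted_next_week: float
    predicted_next_month: float
    prediction_method: str
    confidence: float


def describe(prediction: RevenuePrediction) -> str:
    """Turn a prediction into a one-line summary for a report."""
    method = prediction["prediction_method"].replace("_", " ")
    return (
        f"Method: {method}. "
        f"Confidence: {prediction['confidence']:.2f}. "
        f"Expected next week revenue: ${prediction['predicted_next_week']:.2f}."
    )


example: RevenuePrediction = {
    "predicted_next_week": 30.22,
    "predicted_next_month": 120.88,
    "prediction_method": "moving_average",
    "confidence": 0.75,
}
summary = describe(example)
print(summary)
# Method: moving average. Confidence: 0.75. Expected next week revenue: $30.22.
```

A static type checker can then flag any node that reads a key the prediction does not provide.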

---

# 🚀 **What You Should Be Learning From This Function**

## 1. **Transparent forecasting beats complex ML in early systems**

This example shows:

* Clear logic
* Easy stakeholder communication
* Predictable behavior
* Zero hallucinations

Many teams prefer this over an ML black box unless the extra complexity is clearly necessary.

---

## 2. **Prediction methods should be adaptive**

The function selects the best method **based on the data available**.

That is EXACTLY how good agents should behave.

---

## 3. **Confidence scoring is a critical part of agent design**

Agents must know:

* When they know
* When they don’t know
* And how strongly they know it

This function teaches you how to encode that.

---

## 4. **Baseline, recent average, and trend form the holy trinity of time-series comprehension**

You learn:

* How to evaluate customer behavior
* How to detect shifts
* How to predict short-term patterns
* Without overthinking the math

---

## 5. **The structure is future-proof**

You could easily replace:

* Moving average → ML forecast
* Trend projection → regression model
* Baseline → Bayesian estimator

…without changing the *shape* of the output.

Agents MUST have stable contracts between nodes.
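
A minimal sketch of what a stable contract can look like in code. Everything here is illustrative: `Predictor`, `run_prediction`, and `last_value_predictor` are hypothetical names, the confidence values are placeholders, and records are assumed to be sorted and non-empty.

```python
from typing import Any, Callable, Dict, List

Records = List[Dict[str, Any]]
Predictor = Callable[[Records], Dict[str, Any]]

# The four keys returned by predict_revenue.
REQUIRED_KEYS = {
    "predicted_next_week",
    "predicted_next_month",
    "prediction_method",
    "confidence",
}


def moving_average_predictor(records: Records) -> Dict[str, Any]:
    """Average of the last four weeks (records assumed sorted and non-empty)."""
    recent = [r["weekly_spend"] for r in records[-4:]]
    avg = sum(recent) / len(recent)
    return {
        "predicted_next_week": round(avg, 2),
        "predicted_next_month": round(avg * 4, 2),
        "prediction_method": "moving_average",
        "confidence": 0.5,  # placeholder value for the sketch
    }


def last_value_predictor(records: Records) -> Dict[str, Any]:
    """Hypothetical stand-in for a different model: repeat the last week."""
    last = records[-1]["weekly_spend"]
    return {
        "predicted_next_week": round(last, 2),
        "predicted_next_month": round(last * 4, 2),
        "prediction_method": "last_value",
        "confidence": 0.3,  # placeholder value for the sketch
    }


def run_prediction(records: Records, predictor: Predictor) -> Dict[str, Any]:
    """Call any predictor and verify it honours the output contract."""
    result = predictor(records)
    if set(result) != REQUIRED_KEYS:
        raise ValueError(f"unexpected keys: {sorted(set(result) ^ REQUIRED_KEYS)}")
    return result


history = [{"weekly_spend": s} for s in (10.0, 20.0, 30.0, 40.0)]
ma_result = run_prediction(history, moving_average_predictor)
lv_result = run_prediction(history, last_value_predictor)
print(ma_result["predicted_next_week"])  # 25.0
print(lv_result["predicted_next_week"])  # 40.0
```

Downstream nodes only see the four keys, so swapping the predictor does not ripple through the rest of the pipeline.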

---

# 🧠 **Final Takeaway**

This prediction function is not about *being the perfect forecasting model*.

It’s about:

* Safety
* Explainability
* Reliability
* Zero hallucinations
* Easy auditability
* Agent reasoning clarity
* Predictable behavior

And above all:

⚡ *It gives you exactly what you need to power gap detection and prioritization: no more, no less.*



# Analyze All Customer Revenue

```python
from typing import Any, Dict, List

# calculate_customer_baseline, calculate_revenue_trend, and predict_revenue
# are assumed to be defined in earlier cells of this notebook.


def analyze_all_customers_revenue(
    sales_lookup: Dict[str, List[Dict[str, Any]]],
    baseline_weeks: int = 4,
    recent_weeks: int = 4,
    prediction_horizon_weeks: int = 4
) -> Dict[str, Dict[str, Any]]:
    """
    Analyze revenue for all customers.

    Args:
        sales_lookup: Dictionary mapping customer_id to sales records
        baseline_weeks: Weeks for baseline calculation
        recent_weeks: Weeks for recent trend analysis
        prediction_horizon_weeks: Weeks ahead to predict

    Returns:
        Dictionary mapping customer_id to revenue analysis
    """
    customer_baselines = {}

    for customer_id, sales_records in sales_lookup.items():
        # Calculate baseline
        baseline = calculate_customer_baseline(sales_records, baseline_weeks)

        # Calculate trend
        trend = calculate_revenue_trend(sales_records, baseline_weeks, recent_weeks)

        # Predict revenue (keyword arguments guard against parameter-order mix-ups)
        prediction = predict_revenue(
            sales_records,
            prediction_horizon_weeks=prediction_horizon_weeks,
            baseline_weeks=baseline_weeks,
            recent_weeks=recent_weeks
        )

        # Combine all metrics; on duplicate keys, later dictionaries win
        customer_baselines[customer_id] = {
            "customer_id": customer_id,
            **baseline,
            **trend,
            **prediction
        }

    return customer_baselines
```

This is a GREAT function to study, because it teaches you **how an orchestrator processes *all entities* in a scalable, deterministic, fully interpretable way.**

Let's break it down from:

* **Business Logic Perspective**
* **Agent Architecture Perspective**
* **Data Science Perspective**

And we’ll connect it to the earlier functions you analyzed.

---

# 🧠 **Function: `analyze_all_customers_revenue`**



### **Purpose**

Process **all customers at once**. For each customer, the function:

1. Computes baseline revenue
2. Computes the revenue trend
3. Produces a simple prediction
4. Merges these insights into a single record for that customer

This becomes the **foundation for all downstream gap detection, scoring, ranking, and report generation.**

---

# ⭐ 1. Business Logic Perspective

This function answers:

> “How healthy is each customer from a revenue perspective?”

For each customer, we compute:

### **Baseline**

* What’s their “normal” spending level?
* How much do they spend on average?
* Over how many weeks?

### **Trend**

* Are they spending **more**, **less**, or **the same** recently?
* By what % did their spending shift?

### **Prediction**

* What will they likely spend next week?
* What is their next 4-week outlook?

### **Output**

Each customer becomes a record like:

```json
{
  "customer_id": "123",
  "total_revenue": 450.85,
  "average_weekly_spend": 37.57,
  "weeks_active": 12,
  "baseline_weeks_avg": 42.10,
  "revenue_trend": "declining",
  "recent_weeks_avg": 30.22,
  "trend_percentage": -28.2,
  "predicted_next_week": 30.22,
  "predicted_next_month": 120.88,
  "prediction_method": "moving_average",
  "confidence": 0.9
}
```

This is a **complete 360° revenue health snapshot** for every customer.

---

# ⭐ 2. Agent Architecture Perspective

This function teaches you several **core orchestrator design patterns**:

---

## **Pattern A: Stateless, Pure Functions**

This is critical for:

* Deterministic behavior
* Reliability
* Debugging
* No hidden side effects

All computations come directly from `sales_records` and the window parameters, so the same inputs always produce the same output.

---

## **Pattern B: Map-Reduce Architecture**

You are mapping a function (baseline + trend + prediction) over all customers:

```
for each customer → compute metrics → store in dictionary
```

Strictly speaking, this is the map half of the pattern; any reduce step (aggregating or ranking across customers) happens in later nodes.

This is the classic agent pattern:

**SINGLE ENTITY → MULTIPLE ENTITIES**

You extend analysis to:

* All customers
* All stores
* All products
* All regions
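
The same loop generalizes to any entity type once the per-entity analyzers are passed in as arguments. A minimal sketch, assuming hypothetical names (`analyze_entities`, `total_spend`, `week_count`) and made-up store data:

```python
from typing import Any, Callable, Dict, List

Records = List[Dict[str, Any]]
Analyzer = Callable[[Records], Dict[str, Any]]


def analyze_entities(
    lookup: Dict[str, Records],
    analyzers: List[Analyzer],
    id_field: str = "entity_id",
) -> Dict[str, Dict[str, Any]]:
    """Apply every analyzer to every entity and merge the results per entity."""
    results: Dict[str, Dict[str, Any]] = {}
    for entity_id, records in lookup.items():
        metrics: Dict[str, Any] = {id_field: entity_id}
        for analyze in analyzers:
            metrics.update(analyze(records))
        results[entity_id] = metrics
    return results


def total_spend(records: Records) -> Dict[str, Any]:
    return {"total_revenue": sum(r["weekly_spend"] for r in records)}


def week_count(records: Records) -> Dict[str, Any]:
    return {"weeks_active": len(records)}


stores = {
    "store_a": [{"weekly_spend": 10.0}, {"weekly_spend": 15.0}],
    "store_b": [{"weekly_spend": 7.5}],
}
store_metrics = analyze_entities(stores, [total_spend, week_count], id_field="store_id")
print(store_metrics["store_a"])
# {'store_id': 'store_a', 'total_revenue': 25.0, 'weeks_active': 2}
```

Only the lookup and the analyzer list change between customers, stores, products, or regions; the loop itself stays the same.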

---

## **Pattern C: Modular Composition**

Each step is isolated:

* Step 1: `calculate_customer_baseline`
* Step 2: `calculate_revenue_trend`
* Step 3: `predict_revenue`

This isolation gives you:

* Swappability (replace prediction with ML model)
* Testability (each utility can be unit tested)
* Extensibility (add as many utilities as needed)

---

## **Pattern D: Single Source of Truth for Customer Metrics**

This function produces the **canonical customer revenue dataset**.

Every other agent step will rely on this:

* Gap detection
* Churn analysis
* Scoring
* Ranking
* Report generation

The orchestrator state will include:

```python
"customer_revenue_baseline": customer_baselines
```

This entry is the shared revenue record that later nodes read from the orchestrator state.

---

# ⭐ 3. Data Science Perspective

This function is powerful because it:

### **1. Implements batch feature engineering**

For each customer, you compute features like:

* total revenue
* baseline weekly average
* recent weekly average
* trend percentage
* predicted revenue

These are customer-level features in the spirit of **RFM** (Recency-Frequency-Monetary) analysis, concentrated on the monetary dimension.
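
Because every customer record shares the same keys, turning the analysis into a model-ready table is mechanical. A minimal sketch with made-up values for three customers; the key names follow the example record shown earlier, and `FEATURE_COLUMNS` is a hypothetical selection:

```python
from typing import Any, Dict, List

customer_metrics: Dict[str, Dict[str, Any]] = {
    "A": {"customer_id": "A", "baseline_weeks_avg": 42.0, "recent_weeks_avg": 30.0, "trend_percentage": -28.6},
    "B": {"customer_id": "B", "baseline_weeks_avg": 20.0, "recent_weeks_avg": 21.0, "trend_percentage": 5.0},
    "C": {"customer_id": "C", "baseline_weeks_avg": 55.0, "recent_weeks_avg": 53.0, "trend_percentage": -3.6},
}
FEATURE_COLUMNS = ["baseline_weeks_avg", "recent_weeks_avg", "trend_percentage"]

# One row per customer, one column per feature, in a fixed order.
customer_ids: List[str] = list(customer_metrics)
feature_matrix: List[List[float]] = [
    [customer_metrics[cid][col] for col in FEATURE_COLUMNS] for cid in customer_ids
]
print(feature_matrix[0])  # [42.0, 30.0, -28.6]

# The same records support simple rule-based queries.
steepest_decline = min(customer_ids, key=lambda cid: customer_metrics[cid]["trend_percentage"])
print(steepest_decline)  # A
```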

---

### **2. Enables downstream ML**

The orchestrator is designed so you can later plug in:

* Gradient boosted trees
* Time-series forecasting (Prophet, ARIMA)
* Deep learning models

But the agent still works today using simple, explainable rules.

---

### **3. Provides interpretable signals**

These metrics are:

* Transparent
* Auditable
* Business-aligned
* Easy to explain to stakeholders

Unlike black-box ML models.

---

# ⭐ 4. Step-by-Step Logic Walkthrough

Here’s the exact flow:

---

## **Step 1: Initialize empty dictionary**

```python
customer_baselines = {}
```

This will hold one entry per customer (**200 customers** in this example).

---

## **Step 2: Loop through each customer**

```python
for customer_id, sales_records in sales_lookup.items():
```

This means:

* O(N) complexity in the number of customers (each customer also incurs a sort over their own weekly records)
* Scales linearly with customers
* Very efficient (200 customers → trivial)

---

## **Step 3: Compute baseline**

```python
baseline = calculate_customer_baseline(sales_records, baseline_weeks)
```

Determines the “normal” customer behavior.

---

## **Step 4: Compute trend**

```python
trend = calculate_revenue_trend(sales_records, baseline_weeks, recent_weeks)
```

Determines whether they’re now:

* Declining
* Stable
* Growing

---

## **Step 5: Predict revenue**

```python
prediction = predict_revenue(sales_records, prediction_horizon_weeks, baseline_weeks, recent_weeks)
```

Provides a simple forecast.

---

## **Step 6: Merge results**

```python
customer_baselines[customer_id] = {
    "customer_id": customer_id,
    **baseline,
    **trend,
    **prediction
}
```

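This is important because:

* Python dict unpacking merges the three result dictionaries into one flat record
* Very clean and readable
* Every customer record ends up with the same set of fields
* Duplicate keys are not detected: if two dictionaries share a key, the later one silently wins (`prediction` over `trend` over `baseline`), so the utilities must use distinct key names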
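
Dict unpacking does not guard against key clashes, which is worth seeing once. In this sketch the `confidence` key inside `baseline` is a deliberately contrived clash, and `merge_strict` is a hypothetical alternative that fails loudly instead of overwriting.

```python
from typing import Any, Dict

baseline = {"average_weekly_spend": 37.57, "confidence": 1.0}  # contrived clash
prediction = {"predicted_next_week": 30.22, "confidence": 0.9}

# With ** unpacking, the last dictionary wins without any warning.
merged = {"customer_id": "123", **baseline, **prediction}
print(merged["confidence"])  # 0.9


def merge_strict(*parts: Dict[str, Any]) -> Dict[str, Any]:
    """Merge dictionaries, raising if any key appears more than once."""
    result: Dict[str, Any] = {}
    for part in parts:
        overlap = result.keys() & part.keys()
        if overlap:
            raise KeyError(f"duplicate keys: {sorted(overlap)}")
        result.update(part)
    return result


try:
    merge_strict(baseline, prediction)
except KeyError as exc:
    print(exc)  # "duplicate keys: ['confidence']"
```

A strict merge like this turns a silent overwrite into an immediate error, which is easier to debug than a wrong value three nodes downstream.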

---

## **Step 7: Return complete dataset**

```python
return customer_baselines
```

This now feeds:

* `detect_all_gaps_for_customer`
* `detect_all_customers_churn_risk`
* `score_all_gaps`
* `rank_gaps`
* `generate_revenue_gap_report`

This is the **heart of the orchestrator's intelligence**.

---

# ⭐ 5. Key Takeaways

### ✔ **This is the orchestrator's feature generator**

It prepares all the signals the agent needs.

### ✔ **Modify here → change the agent's “intelligence”**

Want seasonality?
Add a utility.

Want ML predictions?
Swap the predictor.

Want segmentation?
Add cluster labels here.

This is where **custom business logic plugs in**.

### ✔ **This builds a clean structured dataset for downstream nodes**

This mirrors how production data pipelines are commonly structured.

### ✔ **This function is the core of scalable agent design**

Because it:

* Processes all entities
* Generates derived insights
* Produces interpretable features

---

# ⭐ 6. Why This Matters For Your Career

This function teaches you to design:

* Multi-stage agent reasoning pipelines
* Business-aware data engineering logic
* Explainable intelligence systems
* Enterprise-grade orchestrators

Most people only build toy agents.
This is the architecture companies actually need.


