


## Aggregation & Theme Detection

### Turning Individual Feedback Into Organizational Signal

This code block is where the Employee Feedback Intelligence Agent makes its first major leap:
it transforms **individual employee comments** into **patterns leadership can see and act on**.

Rather than jumping straight to AI-driven interpretation, this layer uses **clear, auditable rules** to organize feedback by department, category, and recurring themes. This ensures the system produces insight that is **understandable, explainable, and defensible**.

---

## Why Aggregation Comes Before AI

Leaders don’t struggle because they lack feedback.
They struggle because feedback arrives as:

* Isolated comments
* Anecdotes
* One-off complaints
* Scattered suggestions

This aggregation layer answers the real question leaders care about:

> “Is this an isolated issue — or is this something many people are experiencing?”

Only after that question is answered does summarization or AI enhancement make sense.

---

## Grouping Feedback by Department

### `aggregate_by_department`

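This function groups feedback by **job area** (the `job_area` field, e.g., Fulfillment, Guest Advocate, General Merchandise). Entries with no `job_area` are collected under `"Unknown"` rather than dropped.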

In practical terms, this allows the agent to:

* Show where issues are concentrated
* Compare departments side by side
* Generate department-specific summaries
* Avoid mixing unrelated operational realities

For leaders, this immediately answers:

* “Which teams are under the most pressure?”
* “Where should attention be focused first?”
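
As a minimal sketch of how grouping answers those questions, the snippet below groups a few entries by `job_area` and ranks departments by issue count. The sample records are invented for illustration; only the field names (`submission_id`, `job_area`, `category`) come from the agent's code.

```python
from collections import Counter, defaultdict

# Invented sample records; only the field names mirror the agent's schema.
sample_feedback = [
    {"submission_id": "s1", "job_area": "Fulfillment", "category": "Issue"},
    {"submission_id": "s2", "job_area": "Fulfillment", "category": "Issue"},
    {"submission_id": "s3", "job_area": "Guest Advocate", "category": "Idea"},
    {"submission_id": "s4", "job_area": "General Merchandise", "category": "Issue"},
    {"submission_id": "s5", "category": "Issue"},  # no job_area recorded
]

# Group entries by department; entries without a job_area go to "Unknown".
by_department = defaultdict(list)
for entry in sample_feedback:
    by_department[entry.get("job_area", "Unknown")].append(entry)

# Count issues per department and rank from most to fewest.
issue_counts = Counter(
    entry.get("job_area", "Unknown")
    for entry in sample_feedback
    if entry.get("category") == "Issue"
)
ranked_departments = issue_counts.most_common()

for department, count in ranked_departments:
    print(f"{department}: {count} issue(s)")
```

Ranking by raw issue count is only one possible reading of "pressure"; a department with more staff will naturally submit more feedback.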

---

## Separating Issues From Ideas

### `aggregate_by_category`

Not all feedback has the same intent.

This function cleanly separates:

* **Issues** (problems to fix)
* **Ideas** (opportunities to improve)

This distinction is critical:

* Issues highlight risk and friction
* Ideas highlight innovation and engagement

By separating them early, the agent avoids muddy analysis and enables clearer prioritization later.
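
A small sketch of that split, using invented entries. An entry with no `category` lands in an `"Unknown"` bucket instead of being silently dropped, which mirrors the default used in `aggregate_by_category`.

```python
from collections import defaultdict

# Invented sample records for illustration.
sample_feedback = [
    {"submission_id": "s1", "category": "Issue"},
    {"submission_id": "s2", "category": "Idea"},
    {"submission_id": "s3", "category": "Issue"},
    {"submission_id": "s4"},  # category missing
]

# Bucket submission IDs by category, defaulting to "Unknown".
by_category = defaultdict(list)
for entry in sample_feedback:
    by_category[entry.get("category", "Unknown")].append(entry["submission_id"])

by_category = dict(by_category)
print(by_category)
```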

---

## Creating a High-Level Feedback Snapshot

### `calculate_feedback_summary`

This function produces an **executive-friendly snapshot** of the entire feedback set.

It calculates:

* Total feedback volume
* Issues vs ideas
* Feedback by department
* Issues by department
* Ideas by department

This summary powers:

* Dashboards
* Top-level reports
* “At-a-glance” leadership views

Importantly, it does this **without interpretation**: every number is a direct count of feedback entries.

That transparency builds trust.
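
To illustrate how such a summary could feed an at-a-glance view, here is a sketch that renders a summary dictionary as plain-text lines. `format_summary` is a hypothetical helper, not part of the agent, and the example values are hand-written in the shape that `calculate_feedback_summary` returns.

```python
from typing import Any, Dict, List


def format_summary(summary: Dict[str, Any]) -> List[str]:
    """Render a summary dict as plain-text lines (hypothetical helper)."""
    lines = [
        f"Total feedback: {summary['total_feedback']}",
        f"Issues: {summary['total_issues']} | Ideas: {summary['total_ideas']}",
    ]
    # One line per department, largest volume first.
    by_dept = summary["feedback_by_department"]
    for dept in sorted(by_dept, key=by_dept.get, reverse=True):
        issues = summary["issues_by_department"].get(dept, 0)
        ideas = summary["ideas_by_department"].get(dept, 0)
        lines.append(f"{dept}: {by_dept[dept]} total ({issues} issues, {ideas} ideas)")
    return lines


# Hand-written example values, not real data.
example_summary = {
    "total_feedback": 5,
    "total_issues": 3,
    "total_ideas": 2,
    "departments": ["Fulfillment", "Guest Advocate"],
    "feedback_by_department": {"Fulfillment": 3, "Guest Advocate": 2},
    "issues_by_department": {"Fulfillment": 2, "Guest Advocate": 1},
    "ideas_by_department": {"Fulfillment": 1, "Guest Advocate": 1},
}

report_lines = format_summary(example_summary)
print("\n".join(report_lines))
```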

---

## Detecting Themes: From Noise to Patterns

### `detect_themes`

This is where individual comments begin to form **organizational themes**.

For the MVP, theme detection is intentionally:

* Rule-based
* Keyword-driven
* Explicit and explainable

Each theme is defined by a clear set of keywords tied to real operational concepts, such as:

* Staffing and scheduling
* Device or technology issues
* Training and onboarding
* Inventory accuracy
* Process bottlenecks
* Goal pressure

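Each comment is checked against every theme's keywords, and a single comment can count toward more than one theme. When enough comments match a theme, the system reports it.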
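
The matching step can be sketched as follows. The two themes and their keywords are copied from the agent's `theme_keywords` dictionary; `match_themes` is an illustrative helper that also returns which keywords were found, and the comment is invented.

```python
from typing import Dict, List

# Two themes copied from the agent's keyword dictionary; the rest are omitted.
theme_keywords: Dict[str, List[str]] = {
    "Device/Technology Issues": ["device", "zebra", "freeze", "connection", "system", "machine", "technical"],
    "Staffing/Scheduling": ["staffing", "scheduled", "coverage", "people", "team", "shift", "hours"],
}


def match_themes(text: str, keywords_by_theme: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return {theme name: keywords found} for themes with at least one hit."""
    lowered = text.lower()
    matches = {}
    for theme_name, keywords in keywords_by_theme.items():
        # Substring check, the same kind of test the agent applies.
        hits = [kw for kw in keywords if kw in lowered]
        if hits:
            matches[theme_name] = hits
    return matches


# Invented comment for illustration.
comment = "My Zebra device keeps freezing halfway through a shift."
matched = match_themes(comment, theme_keywords)
print(matched)
```

Note that "freezing" does not trigger the keyword "freeze", since the match is on literal character sequences. Gaps like this are the kind of thing a keyword review would surface.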

---

### Why This Approach Is Intentional

This design avoids a common trap:
letting an AI model decide what themes “feel important” without accountability.

Instead:

* Themes require a **minimum frequency** (`min_frequency`, default 3)
* Keywords are visible and adjustable
* Example feedback is preserved
* Departments and categories are tracked explicitly

If a leader asks:

> “Why did the system flag this as a top issue?”

The answer is clear and inspectable.

---

## Filtering for What Actually Matters

Themes only become visible if they occur often enough to matter.

By applying a minimum frequency threshold:

* One-off complaints are filtered out
* Recurring issues rise naturally
* Attention stays focused on systemic problems

This aligns with the **Pareto principle** that will be applied later.
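
The threshold is a simple filter over per-theme match counts. A sketch with invented counts, using the same default of 3 as `detect_themes`:

```python
from collections import Counter

# Invented match counts per theme, for illustration.
theme_counts = Counter({
    "Staffing/Scheduling": 7,
    "Device/Technology Issues": 4,
    "Communication": 2,
    "Resource Availability": 1,
})

min_frequency = 3  # same default as detect_themes

# Keep themes that meet the threshold, most frequent first.
recurring = [
    (name, count)
    for name, count in theme_counts.most_common()
    if count >= min_frequency
]

# Themes below the threshold are hidden from the output, not deleted from the data.
filtered_out = [name for name, count in theme_counts.items() if count < min_frequency]

print(recurring)
print(filtered_out)
```

The right threshold depends on feedback volume: 3 may be meaningful in a set of 50 comments and noise in a set of 5,000.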

---

## Output: Clean, Actionable Themes

The final output of this layer is a list of themes, sorted by frequency, each of which includes:

* A theme ID (`theme_001`, `theme_002`, ...)
* A clear name
* A category (Issue or Idea)
* How often it appears
* Which departments are affected
* Up to three example employee comments
* The IDs of the feedback entries that contributed

These themes become the backbone for:

* Prioritization
* Executive summaries
* Visualizations
* Follow-up actions
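
For downstream consumers, the shape of one theme record can be written down as a type. The field names below match the dictionary that `detect_themes` builds; the `Theme` type name, the assumption that submission IDs are strings, and the example values are illustrative.

```python
from typing import List, TypedDict


class Theme(TypedDict):
    """Shape of one entry returned by detect_themes (type name is illustrative)."""
    theme_id: str
    theme_name: str
    category: str
    frequency: int
    departments: List[str]
    example_feedback: List[str]
    feedback_ids: List[str]  # assumes submission IDs are strings


# Invented record for illustration.
example_theme: Theme = {
    "theme_id": "theme_002",
    "theme_name": "Staffing/Scheduling",
    "category": "Issue",
    "frequency": 3,
    "departments": ["Fulfillment", "Guest Advocate"],
    "example_feedback": ["Not enough coverage on the closing shift."],
    "feedback_ids": ["s1", "s4", "s9"],
}

# frequency is defined as the number of contributing feedback entries.
print(example_theme["frequency"] == len(example_theme["feedback_ids"]))
```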

---

## Architectural Takeaway

This aggregation layer is where the agent earns credibility.

Before sentiment scoring, before prioritization, before LLM summaries, the system proves that it can:

* Separate signal from noise
* Detect patterns using rules anyone can inspect
* Explain *why* something matters

By grounding insight in frequency, structure, and transparency, the agent creates a foundation leaders can trust — and that AI can safely build on later.




## Configurable Themes: A System That Adapts to Leadership Priorities


> The theme system is intentionally configurable, allowing HR and leadership to adapt how feedback is interpreted over time, starting with transparent keyword rules today and evolving toward smarter, data-driven discovery as the organization matures.

---


One of the most important design decisions in this agent is that **themes are not hard-coded truths** — they are **configurable lenses**.

The `theme_keywords` dictionary defines how the system recognizes recurring issues and ideas. Rather than hiding this logic inside a black-box model, the agent makes it **explicit, adjustable, and open to evolution**.

This is intentional.

---

## Why This Matters to Leaders and HR

Organizational priorities change.

What leadership cares about today may not be what matters most six months from now:

* A new CEO may focus on execution speed
* HR may prioritize training quality
* Operations may focus on staffing stability
* Technology teams may want early warning on system issues

By keeping themes configurable, the system stays aligned with **current business concerns**, not yesterday’s assumptions.

---

## How Theme Configuration Works

Each theme is defined by:

* A clear, human-readable name
* A set of keywords that signal that theme in employee feedback

For example:

* Device issues are detected through words like *device*, *zebra*, *system*
* Staffing concerns through *coverage*, *shift*, *hours*
* Training gaps through *new*, *onboarding*, *confident*

If feedback contains those signals repeatedly, the system flags a pattern.

There’s no mystery behind it — and that’s a feature.
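
One property of these rules is worth seeing concretely: keywords are matched as substrings, so a short keyword such as *new* also matches inside longer words. The sketch below compares substring matching with a whole-word variant. `matches_whole_word` is a hypothetical alternative, not something the agent does today, and both comments are invented.

```python
import re
from typing import List


def matches_substring(text: str, keywords: List[str]) -> bool:
    """Substring matching, the behavior of a plain `keyword in text` check."""
    lowered = text.lower()
    return any(kw in lowered for kw in keywords)


def matches_whole_word(text: str, keywords: List[str]) -> bool:
    """Alternative: match keywords only as whole words (hypothetical variant)."""
    pattern = r"\b(?:" + "|".join(re.escape(kw) for kw in keywords) + r")\b"
    return re.search(pattern, text.lower()) is not None


# Keyword list copied from the agent's Training/Onboarding theme.
training_keywords = ["training", "new", "hires", "trained", "learn", "onboarding", "confident"]

# Invented comments for illustration.
unrelated = "My contract was renewed last week."
related = "I never felt confident after onboarding."

print(matches_substring(unrelated, training_keywords))   # "new" is inside "renewed"
print(matches_whole_word(unrelated, training_keywords))
print(matches_whole_word(related, training_keywords))
```

Whole-word matching has its own cost: a keyword like *expect* would stop matching "expected". Which trade-off is better depends on the keyword list, and because the rules are visible, either choice can be checked against real comments.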

---

## Designed for HR and Management Control

Because themes are rule-based and visible:

* HR teams can adjust keywords as language evolves
* Managers can add new themes tied to current initiatives
* Leadership can temporarily elevate focus areas (e.g., safety, burnout, retention)

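Shifting attention requires no change to the matching logic, only to the keyword configuration. In the current MVP, that configuration is the `theme_keywords` dictionary defined inside `detect_themes`.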

This turns the agent into a **living system**, not a static dashboard.
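
One way to let non-developers edit themes would be to load the keyword dictionary from an external source. The sketch below assumes a JSON document mapping theme names to keyword lists; `load_theme_keywords`, the inline JSON, and the "Safety" keywords are all hypothetical and not part of the agent.

```python
import json
from typing import Dict, List

# Hypothetical configuration, shown inline as a JSON string for the example.
# In practice this could live in a file that HR maintains.
THEME_CONFIG_JSON = """
{
  "Staffing/Scheduling": ["staffing", "coverage", "shift", "hours"],
  "Safety": ["Injury", "unsafe", " hazard "]
}
"""


def load_theme_keywords(raw_json: str) -> Dict[str, List[str]]:
    """Parse and validate a theme-to-keywords mapping (hypothetical helper)."""
    config = json.loads(raw_json)
    if not isinstance(config, dict):
        raise ValueError("Theme config must be a JSON object")
    cleaned = {}
    for theme_name, keywords in config.items():
        if not isinstance(keywords, list) or not all(isinstance(k, str) for k in keywords):
            raise ValueError(f"Keywords for {theme_name!r} must be a list of strings")
        # Normalize so keywords compare correctly against lowercased feedback text.
        cleaned[theme_name] = [k.strip().lower() for k in keywords if k.strip()]
    return cleaned


theme_keywords = load_theme_keywords(THEME_CONFIG_JSON)
print(sorted(theme_keywords))
```

Validating on load keeps a typo in the configuration from silently disabling a theme.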

---

## A Clear Path to Smarter Detection Over Time

This approach also creates a natural upgrade path.

As more data is collected, the system can evolve in stages:

### Phase 1 — Keyword-Driven (Current MVP)

* Fast to deploy
* Fully explainable
* Easy to adjust
* Low risk

### Phase 2 — Expanded Dictionaries

* Keyword lists grow based on real employee language
* Themes become more precise over time
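
A possible starting point for growing the dictionaries is to look at the comments no theme currently catches and count their most frequent words. This is a rough sketch under stated assumptions: `suggest_keywords` is a hypothetical helper, the stopword set is a tiny hand-picked example, and the comments are invented. A person would still decide which suggestions become keywords.

```python
import re
from collections import Counter
from typing import Dict, List, Set, Tuple


def suggest_keywords(
    comments: List[str],
    theme_keywords: Dict[str, List[str]],
    stopwords: Set[str],
    top_n: int = 5,
) -> List[Tuple[str, int]]:
    """Count frequent words in comments that matched no theme (hypothetical helper)."""
    all_keywords = [kw for kws in theme_keywords.values() for kw in kws]
    word_counts: Counter = Counter()
    for comment in comments:
        lowered = comment.lower()
        if any(kw in lowered for kw in all_keywords):
            continue  # already covered by an existing theme
        words = re.findall(r"[a-z']+", lowered)
        word_counts.update(w for w in words if w not in stopwords and len(w) > 2)
    return word_counts.most_common(top_n)


# Invented inputs for illustration.
theme_keywords = {"Staffing/Scheduling": ["staffing", "coverage", "shift", "hours"]}
stopwords = {"the", "and", "are", "for", "too", "not", "with"}
comments = [
    "The carts are broken again",
    "Broken carts slow everyone down",
    "Not enough coverage on weekends",      # matched by an existing theme, skipped
    "Carts with bad wheels are a hazard",
]

suggestions = suggest_keywords(comments, theme_keywords, stopwords, top_n=2)
print(suggestions)
```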

### Phase 3 — ML or LLM-Assisted Clustering

* Semantic similarity replaces exact keyword matching
* New themes can be discovered automatically
* Human-defined themes and AI-discovered themes coexist

Even then, leadership still retains control over:

* Which themes matter
* What gets escalated
* What appears in reports

AI enhances detection — it does not override judgment.

---

## Why Managers Will Like This

From a management perspective, this design:

* Makes priorities visible
* Prevents surprise insights with no explanation
* Allows leaders to say, “Let’s watch *this* more closely right now”
* Builds confidence that the system reflects real organizational concerns

It’s analytics with a steering wheel.






In [None]:
"""Aggregation Utilities for Employee Feedback Intelligence Agent

Groups and aggregates feedback by department, category, and themes.
"""

from typing import List, Dict, Any
from collections import defaultdict, Counter


def aggregate_by_department(feedback: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """
    Group feedback by department (job_area).

    Args:
        feedback: List of feedback entries

    Returns:
        Dictionary mapping department name to list of feedback entries
    """
    by_department = defaultdict(list)

    for entry in feedback:
        department = entry.get("job_area", "Unknown")
        by_department[department].append(entry)

    return dict(by_department)


def aggregate_by_category(feedback: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """
    Group feedback by category (Issue vs Idea).

    Args:
        feedback: List of feedback entries

    Returns:
        Dictionary mapping category to list of feedback entries
    """
    by_category = defaultdict(list)

    for entry in feedback:
        category = entry.get("category", "Unknown")
        by_category[category].append(entry)

    return dict(by_category)


def calculate_feedback_summary(feedback: List[Dict[str, Any]]) -> Dict[str, Any]:
    """
    Calculate overall summary statistics for feedback.

    Args:
        feedback: List of feedback entries

    Returns:
        Summary dictionary with counts and statistics
    """
    total = len(feedback)

    # Count by category
    issues = [e for e in feedback if e.get("category") == "Issue"]
    ideas = [e for e in feedback if e.get("category") == "Idea"]

    # Count by department
    departments = Counter(e.get("job_area", "Unknown") for e in feedback)

    return {
        "total_feedback": total,
        "total_issues": len(issues),
        "total_ideas": len(ideas),
        "departments": list(departments.keys()),
        "feedback_by_department": dict(departments),
        "issues_by_department": dict(Counter(e.get("job_area", "Unknown") for e in issues)),
        "ideas_by_department": dict(Counter(e.get("job_area", "Unknown") for e in ideas))
    }


def detect_themes(feedback: List[Dict[str, Any]], min_frequency: int = 3) -> List[Dict[str, Any]]:
    """
    Detect recurring themes in feedback using keyword matching.

    This is a rule-based MVP approach. Future enhancement: Use LLM for semantic grouping.

    Args:
        feedback: List of feedback entries
        min_frequency: Minimum occurrences to form a theme

    Returns:
        List of theme dictionaries
    """
    # Simple keyword-based theme detection (MVP)
    # Future: Use LLM for semantic similarity grouping

    theme_keywords = {
        "Device/Technology Issues": ["device", "zebra", "freeze", "connection", "system", "machine", "technical"],
        "Staffing/Scheduling": ["staffing", "scheduled", "coverage", "people", "team", "shift", "hours"],
        "Training/Onboarding": ["training", "new", "hires", "trained", "learn", "onboarding", "confident"],
        "Process Efficiency": ["time", "slow", "efficient", "speed", "fast", "quick", "bottleneck", "congested"],
        "Communication": ["communication", "clear", "feedback", "information", "update", "message"],
        "Inventory/Location": ["location", "inventory", "find", "stock", "backroom", "aisle", "item"],
        "Goal/Pressure": ["goal", "pressure", "expect", "target", "hit", "miss", "impossible"],
        "Task Switching": ["switch", "multiple", "different", "task", "focus", "split"],
        "Priority Management": ["priority", "important", "urgent", "first", "focus"],
        "Resource Availability": ["supplies", "materials", "equipment", "tools", "resources"]
    }

    # Accumulate matches per theme. Categories are counted (not just collected)
    # so the dominant category can be reported deterministically.
    themes = defaultdict(lambda: {
        "feedback_ids": [],
        "departments": set(),
        "categories": Counter(),
        "examples": []
    })

    # Match feedback to themes based on keywords.
    # Note: this is substring matching, so "expect" also matches "expected",
    # but short keywords such as "new" or "hit" can match inside longer words.
    for entry in feedback:
        raw_text = entry.get("free_text_feedback") or ""  # tolerate missing/None text
        text = raw_text.lower()
        submission_id = entry.get("submission_id")
        department = entry.get("job_area", "Unknown")
        category = entry.get("category", "Unknown")

        for theme_name, keywords in theme_keywords.items():
            if any(keyword in text for keyword in keywords):
                theme = themes[theme_name]
                theme["feedback_ids"].append(submission_id)
                theme["departments"].add(department)
                theme["categories"][category] += 1
                if len(theme["examples"]) < 3:  # Keep up to 3 examples
                    theme["examples"].append(raw_text)

    # Convert to list format and filter by frequency.
    # IDs come from each theme's position in theme_keywords, so they stay
    # stable regardless of the order in which feedback arrives.
    theme_list = []
    for theme_id, theme_name in enumerate(theme_keywords, 1):
        if theme_name not in themes:
            continue  # no feedback matched this theme
        theme_data = themes[theme_name]
        frequency = len(theme_data["feedback_ids"])

        if frequency >= min_frequency:
            theme_list.append({
                "theme_id": f"theme_{theme_id:03d}",
                "theme_name": theme_name,
                # Most common category among the matching entries
                "category": theme_data["categories"].most_common(1)[0][0],
                "frequency": frequency,
                # Sorted so the output order does not vary between runs
                "departments": sorted(theme_data["departments"], key=str),
                "example_feedback": theme_data["examples"],
                "feedback_ids": theme_data["feedback_ids"]
            })

    # Sort by frequency (descending)
    theme_list.sort(key=lambda x: x["frequency"], reverse=True)

    return theme_list

