<a href="https://colab.research.google.com/github/micah-shull/AI_Agents/blob/main/360_EFIA_DataLoader_Utils.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



## Data Ingestion: Clean Inputs, Trusted Outputs

Before an intelligence system can generate insight, it needs something far more basic — **reliable data**.

This code block defines the **data loading and validation layer** for the Employee Feedback Intelligence Agent. Its job is simple but critical: ensure that only **well-formed feedback records** (valid JSON, expected structure, required fields present) ever enter the agent’s reasoning pipeline.

No assumptions.
No silent failures.
No “best guess” parsing.

---

## Why This Layer Exists

Employee feedback drives real decisions — staffing, process changes, leadership attention.

If the data is incomplete, inconsistent, or malformed:

* Trends become misleading
* Prioritization breaks down
* Executive trust erodes quickly

This layer prevents that by enforcing **clear rules at the boundary** of the system.

---

## Loading Feedback One File at a Time (With Guardrails)

### `load_feedback_file`

This function handles a single responsibility:
**load one feedback file and confirm it meets expectations.**

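Before any feedback is accepted, the function passes the file to `validate_json_file` (imported from `toolshed.validation`) with these expectations:

* The file must exist
* The file must be valid JSON
* The data must be a list, and each item in it must be a JSON object (a `dict`)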
* Each record must contain required fields:

  * `submission_id`
  * `job_area`
  * `category`
  * `free_text_feedback`

If anything is wrong, the function raises a `ValueError` that names the file and lists every validation error found.

That’s intentional.

This approach ensures that:

* Errors are caught early
* Bad data never contaminates analysis
* Problems are easy to trace back to their source

In other words, the agent refuses to reason on questionable input.
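
The checks listed in this section can be sketched with the standard library alone. The notebook's implementation delegates to `validate_json_file` from `toolshed.validation`, whose internals aren't shown here, so the following is an illustrative approximation and not that helper's actual logic. `load_feedback_file_sketch` and `REQUIRED_FIELDS` are hypothetical names introduced for this example.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

# Required keys, as listed in this section.
REQUIRED_FIELDS = ["submission_id", "job_area", "category", "free_text_feedback"]


def load_feedback_file_sketch(file_path: str) -> List[Dict[str, Any]]:
    """Stdlib-only illustration of the checks above (hypothetical helper)."""
    path = Path(file_path)
    if not path.exists():
        raise FileNotFoundError(f"Feedback file not found: {path}")

    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        raise ValueError(f"Invalid JSON in {path}: {exc}") from exc

    if not isinstance(data, list):
        raise ValueError(
            f"{path}: expected a list of records, got {type(data).__name__}"
        )

    # Collect every problem so a single error message reports all of them.
    errors = []
    for index, record in enumerate(data):
        if not isinstance(record, dict):
            errors.append(f"record {index} is not an object")
            continue
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            errors.append(f"record {index} missing: {', '.join(missing)}")

    if errors:
        raise ValueError(f"Validation errors in {path}: {'; '.join(errors)}")
    return data
```

Collecting all errors before raising means one failed load reports every malformed record in the file, which keeps the "easy to trace" property described above.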

---

## Combining Multiple Departments Into One View

### `load_all_feedback_files`

In a real organization, feedback doesn’t live in one place.

This function:

* Loads feedback from multiple job-area files
* Applies the same validation rules to each
* Combines everything into a single, clean dataset

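If *any* file is missing or invalid, the function raises (`FileNotFoundError` for a missing file, `ValueError` for a file that fails to load or validate) and returns nothing, so the process stops immediately.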

This design choice reinforces an important principle:

> **Partial insight is worse than no insight.**

Leadership decisions should be based on a complete and reliable picture — not silently degraded data.
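
From the caller's side, the all-or-nothing behavior might look like the sketch below. To keep the example runnable on its own, `load_all_or_nothing` is a hypothetical, simplified stand-in for the notebook's loader (it checks only existence and list structure), and the file names and sample record are made up.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def load_all_or_nothing(data_dir: str, filenames: List[str]) -> List[Dict[str, Any]]:
    """Hypothetical stand-in: combine files, abort on the first problem."""
    combined: List[Dict[str, Any]] = []
    for name in filenames:
        path = Path(data_dir) / name
        if not path.exists():
            # Abort before returning anything, even if earlier files loaded fine.
            raise FileNotFoundError(f"Feedback file not found: {path}")
        records = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(records, list):
            raise ValueError(f"Error loading {name}: expected a list of records")
        combined.extend(records)
    return combined


with tempfile.TemporaryDirectory() as tmp:
    # Only one of the two expected files is present.
    record = {
        "submission_id": "FB-001",
        "job_area": "Warehouse",
        "category": "Issue",
        "free_text_feedback": "Scanner batteries die mid-shift.",
    }
    (Path(tmp) / "warehouse.json").write_text(json.dumps([record]), encoding="utf-8")

    try:
        feedback = load_all_or_nothing(tmp, ["warehouse.json", "retail.json"])
    except FileNotFoundError as exc:
        feedback = None  # no partial dataset is handed to the agent
        print(f"Load aborted: {exc}")

    # With a complete file list, the combined dataset comes back as usual.
    complete = load_all_or_nothing(tmp, ["warehouse.json"])
```

Because the exception is raised before the function returns, the records already read from `warehouse.json` never reach the caller in the failing case.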

---

## Entry-Level Validation for Safety and Consistency

### `validate_feedback_entry`

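This utility validates individual feedback records. The two loaders described above do not call it; it is a separate check to run per record when the rules below need to be enforced.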

It ensures:

* Required fields are present
* Required fields aren’t empty
* Categories are valid (`Issue` or `Idea`)
* Optional fields (like employee name) are not checked, so they may be missing or empty

This allows the agent to:

* Enforce consistency across departments
* Avoid downstream edge cases
* Support future extensions (e.g., stricter rules for ideas vs issues)

It’s a safety net that protects every downstream step.
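
One way to apply the per-record check across a loaded list is to separate clean records from rejected ones and keep the reasons. This is a sketch under assumptions: `validate_entry` is a condensed copy of the rules described in this section so the example runs on its own, `split_valid_invalid` is a hypothetical helper, and the sample records are invented.

```python
from typing import Any, Dict, List, Tuple

REQUIRED_FIELDS = ["submission_id", "job_area", "category", "free_text_feedback"]
VALID_CATEGORIES = ("Issue", "Idea")


def validate_entry(entry: Dict[str, Any]) -> List[str]:
    """Condensed copy of the rules above; returns a list of error messages."""
    errors = []
    for field in REQUIRED_FIELDS:
        if field not in entry:
            errors.append(f"Missing required field: {field}")
        elif not entry[field]:
            errors.append(f"Empty required field: {field}")
    if "category" in entry and entry["category"] not in VALID_CATEGORIES:
        errors.append(f"Invalid category: {entry['category']}")
    return errors


def split_valid_invalid(
    entries: List[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Tuple[Dict[str, Any], List[str]]]]:
    """Hypothetical helper: partition entries and keep the reasons for rejection."""
    valid, rejected = [], []
    for entry in entries:
        errors = validate_entry(entry)
        if errors:
            rejected.append((entry, errors))
        else:
            valid.append(entry)
    return valid, rejected


sample = [
    {"submission_id": "FB-001", "job_area": "Warehouse",
     "category": "Idea", "free_text_feedback": "Add a second dock scanner."},
    {"submission_id": "FB-002", "job_area": "", "category": "Complaint"},
]

valid, rejected = split_valid_invalid(sample)
for entry, errors in rejected:
    print(entry.get("submission_id"), errors)
```

Here the second record is rejected with three errors: an empty `job_area`, a missing `free_text_feedback`, and an invalid category. Whether to quarantine rejected records or raise as soon as `rejected` is non-empty is a policy choice; raising is closer to the fail-fast stance of the loaders.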

---

## Business Impact of This Design

This data loading layer provides several important guarantees:

* **Trust**
  Leaders can be confident that insights are based on valid input.

* **Transparency**
  If something goes wrong, the error is explicit and traceable.

* **Scalability**
  New departments or stores can be added by dropping in a new file and adding its name to the list passed to `load_all_feedback_files`. The loader code itself doesn't change.

* **Auditability**
  Every decision can be traced back to a known, validated dataset.

This is how you build an agent that people are willing to rely on.
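
Since `load_all_feedback_files` takes an explicit list of filenames, one way to let new files be picked up automatically is to build that list from the directory contents. The helper below is a hypothetical addition, not part of the notebook, and the file names are made up.

```python
import tempfile
from pathlib import Path
from typing import List


def discover_feedback_files(data_dir: str, pattern: str = "*.json") -> List[str]:
    """Return sorted filenames in data_dir that match pattern (hypothetical helper)."""
    return sorted(p.name for p in Path(data_dir).glob(pattern) if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    for name in ("warehouse.json", "retail.json", "notes.txt"):
        (Path(tmp) / name).write_text("[]", encoding="utf-8")

    feedback_files = discover_feedback_files(tmp)
    print(feedback_files)  # ['retail.json', 'warehouse.json']
```

The resulting list can be passed as the `feedback_files` argument. The trade-off is that discovery only ever lists files that exist, so an expected department file that is absent goes unnoticed; an explicit list keeps the missing-file check meaningful.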

---

## Architectural Takeaway

This code doesn’t analyze feedback.
It protects the system from bad assumptions.

By enforcing structure, validation, and clear failure modes at the data boundary, the agent establishes a foundation of **credibility** before any intelligence is generated.

That’s not just good engineering; it’s essential for executive confidence.




# Data Loading Utilities for Employee Feedback Intelligence Agent

In [None]:
"""Data Loading Utilities for Employee Feedback Intelligence Agent

Loads feedback data from JSON files.
"""

from pathlib import Path
from typing import List, Dict, Any

# External helper, not part of the standard library; it must be importable here.
from toolshed.validation import validate_json_file


def load_feedback_file(file_path: str) -> List[Dict[str, Any]]:
    """
    Load feedback data from a single JSON file.

    Args:
        file_path: Path to JSON file

    Returns:
        List of feedback entries

    Raises:
        ValueError: If file is invalid or missing required fields
    """
    data, errors = validate_json_file(
        file_path,
        expected_type=list,
        item_type=dict,
        required_fields=["submission_id", "job_area", "category", "free_text_feedback"]
    )

    if errors:
        raise ValueError(f"Validation errors in {file_path}: {', '.join(errors)}")

    return data


def load_all_feedback_files(data_dir: str, feedback_files: List[str]) -> List[Dict[str, Any]]:
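    """
    Load all feedback files and combine into a single list.

    Args:
        data_dir: Directory containing feedback files
        feedback_files: List of JSON filenames to load

    Returns:
        Combined list of all feedback entries

    Raises:
        FileNotFoundError: If any listed file does not exist
        ValueError: If any file fails to load or validate
    """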
    all_feedback = []
    data_path = Path(data_dir)

    for filename in feedback_files:
        file_path = data_path / filename

        if not file_path.exists():
            raise FileNotFoundError(f"Feedback file not found: {file_path}")

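        try:
            feedback_data = load_feedback_file(str(file_path))
            all_feedback.extend(feedback_data)
        except Exception as e:
            # Add the filename to the message and chain the original
            # exception so its traceback is preserved.
            raise ValueError(f"Error loading {filename}: {e}") from e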

    return all_feedback


def validate_feedback_entry(entry: Dict[str, Any]) -> List[str]:
    """
    Validate a single feedback entry.

    Args:
        entry: Feedback entry dictionary

    Returns:
        List of error messages (empty if valid)
    """
    errors = []

    required_fields = ["submission_id", "job_area", "category", "free_text_feedback"]
    for field in required_fields:
        if field not in entry:
            errors.append(f"Missing required field: {field}")
        elif not entry[field]:
            # Optional fields such as employee_name are not in required_fields,
            # so they are never checked here and may be empty.
            errors.append(f"Empty required field: {field}")

    # Validate category
    if "category" in entry and entry["category"] not in ["Issue", "Idea"]:
        errors.append(f"Invalid category: {entry['category']}. Must be 'Issue' or 'Idea'")

    return errors

