# Compare Case Groups

This notebook demonstrates how to use the Intelligence Toolkit library to compare groups of case records in a dataset.

See the [README](https://github.com/microsoft/intelligence-toolkit/blob/main/app/workflows/compare_case_groups/README.md) for more details.

In [5]:
import sys

sys.path.append("..")
import polars as pl
from toolkit.compare_case_groups.api import CompareCaseGroups

In [6]:
# Create the workflow object
import os
from toolkit.AI.openai_configuration import OpenAIConfiguration


ccg = CompareCaseGroups()

ai_configuration = OpenAIConfiguration(
    {
        "api_type": "OpenAI",
        "api_key": os.environ["OPENAI_API_KEY"],
        "model": "gpt-4o",
    }
)
ccg.set_ai_configuration(ai_configuration)

data_path = "../example_outputs/compare_case_groups/customer_complaints/customer_complaints_prepared.csv"
customer_cases = pl.read_csv(data_path)
print("Loaded data")

Loaded data
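
The cell above reads the API key with `os.environ["OPENAI_API_KEY"]`, which raises a bare `KeyError` when the variable is unset. A minimal sketch of a friendlier check, using only the standard library; the helper name `require_env` is hypothetical and is not part of the Intelligence Toolkit:

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or fail with a clear message.

    Hypothetical helper for this notebook; not a toolkit function.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "export it before running the notebook."
        )
    return value
```

With this helper, the configuration entry could read `"api_key": require_env("OPENAI_API_KEY")`.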


In [7]:
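# Placeholder: the create_data_summary call below passes its filter list directly
filters = []
# To list the available filter options, uncomment:
# ccg.get_filter_options(customer_cases)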

groups = ["city"]
aggregates = [
    "product_code",
    "delivery_issue",
    "description_issue",
    "price_issue",
    "quality_issue",
    "service_issue",
]
temporal = "period"
print("Selected params for workflow")

Selected params for workflow


In [8]:
ccg.create_data_summary(
    customer_cases,
    ["product_code:H"],  # filters: keep records where product_code is H
    groups,
    aggregates,
    temporal,
)
print("Created data summary")

Created data summary
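
The filter passed above is an `attribute:value` string. As a rough illustration of what such a filter could mean, here is a standard-library sketch on toy records. It is not the toolkit's implementation: the helper names `parse_filter` and `apply_filters` are hypothetical, the records are invented, and combining several filters with AND is an assumption.

```python
def parse_filter(spec: str) -> tuple[str, str]:
    """Split an "attribute:value" string at the first colon."""
    attribute, _, value = spec.partition(":")
    return attribute, value


def apply_filters(rows: list[dict], specs: list[str]) -> list[dict]:
    """Keep rows that match every filter (AND semantics are an assumption)."""
    parsed = [parse_filter(s) for s in specs]
    return [
        row
        for row in rows
        if all(str(row.get(attr)) == value for attr, value in parsed)
    ]


# Toy records, invented for illustration (not taken from the example dataset)
rows = [
    {"city": "A", "product_code": "H"},
    {"city": "B", "product_code": "G"},
    {"city": "A", "product_code": "H"},
    {"city": "C", "product_code": "H"},
    {"city": "B", "product_code": "F"},
]

matched = apply_filters(rows, ["product_code:H"])
share = 100 * len(matched) / len(rows)
print(f"{len(matched)} of {len(rows)} records match ({share:.1f}%)")
```

The share of matching records computed here is analogous to the percentage of the overall dataset that the workflow's summary description reports.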


In [9]:
ccg.model_df.head()

| city | group_count | group_rank | attribute_value | attribute_count | attribute_rank | period_window | period_window_count | period_window_rank | period_window_delta |
|---|---|---|---|---|---|---|---|---|---|
| str | u32 | i32 | str | u32 | i32 | str | u32 | i32 | i32 |
| "Baytown" | 1 | 49 | "delivery_issue…" | 1 | 43 | "2025-H1" | 1 | 1 | 0 |
| "Baytown" | 1 | 49 | "description_is…" | 1 | 42 | "2025-H1" | 1 | 1 | 0 |
| "Baytown" | 1 | 49 | "price_issue:tr…" | 1 | 28 | "2025-H1" | 1 | 1 | 0 |
| "Baytown" | 1 | 49 | "product_code:H…" | 1 | 49 | "2025-H1" | 1 | 1 | 0 |
| "Baytown" | 1 | 49 | "quality_issue:…" | 1 | 37 | "2025-H1" | 1 | 1 | 0 |


In [10]:
ccg.get_summary_description()

'This table shows:\n- A summary of **296** data records matching [**product_code\\:H**], representing **40.0%** of the overall dataset with values for all grouping attributes\n- The **group_count** of records for all [**city**] groups, and corresponding **group_rank**\n- The **attribute_count** of each **attribute_value** for all [**city**] groups, and corresponding **attribute_rank**\n- The **period_window_count** of each **attribute_value** for each **period_window** for all [**city**] groups, and corresponding **period_window_rank**\n- The **period_window_delta**, or change in the **attribute_value_count** for successive **period_window** values, within each [**city**] group'
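
To make the columns described above more concrete, the following standard-library sketch computes a group count, a group rank, and a per-window delta on toy data. It is only an approximation under stated assumptions, not the toolkit's code: the records are invented, dense ranking by descending count is an assumption, and giving a group's first observed window a delta of 0 simply mirrors the Baytown rows shown earlier.

```python
from collections import Counter

# Toy (city, period_window) records, invented for illustration
records = [
    ("Hilltop", "2023-H1"), ("Hilltop", "2023-H2"), ("Hilltop", "2023-H2"),
    ("Lakeside", "2023-H1"), ("Lakeside", "2023-H1"), ("Lakeside", "2023-H2"),
    ("Seaside", "2023-H2"),
]

# group_count: number of records per group
group_count = Counter(city for city, _ in records)

# group_rank: dense rank by descending count (ties share a rank; an assumption)
distinct_counts = sorted(set(group_count.values()), reverse=True)
group_rank = {city: distinct_counts.index(n) + 1 for city, n in group_count.items()}

# period_window_count: number of records per (group, window) pair
window_count = Counter(records)

# period_window_delta: change in count between successive observed windows
# within each group; the first observed window gets 0 (an assumption)
window_delta = {}
for city in group_count:
    observed = sorted(w for (c, w) in window_count if c == city)
    previous = None
    for w in observed:
        current = window_count[(city, w)]
        window_delta[(city, w)] = 0 if previous is None else current - previous
        previous = current

print(dict(group_count))
print(group_rank)
print(window_delta)
```

Sorting the window labels as strings works here because labels of the form `YYYY-Hn` sort chronologically.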

In [11]:
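# Select the groups to report on, either by name or by top-n rank
# Option 1: by group name (shown for reference; not passed to get_report_data below)
groups = ["Baytown", "Brookside"]
# Option 2: the top n groups by rank (used below)
top_groups = 4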

report_data, filter_description = ccg.get_report_data(top_group_ranks=top_groups)
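
The two selection options above (by name, or by top-n rank) can be sketched on a plain mapping of group to rank. This is an illustrative stand-in only: `select_groups` is a hypothetical helper, the group names and ranks are invented, and nothing here reflects how `get_report_data` works internally.

```python
# Hypothetical mapping of group name to group_rank (values invented)
ranked_groups = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}


def select_groups(ranks, names=None, top_n=None):
    """Pick groups by explicit name, or else by rank <= top_n.

    Names take precedence when both are given (a design choice of this sketch).
    """
    if names is not None:
        wanted = set(names)
        return [g for g in ranks if g in wanted]
    if top_n is not None:
        ordered = sorted(ranks.items(), key=lambda item: item[1])
        return [g for g, rank in ordered if rank <= top_n]
    return list(ranks)


print(select_groups(ranked_groups, top_n=4))
print(select_groups(ranked_groups, names=["E", "B"]))
```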

In [12]:
# Generate an AI report on the selected data
explanation = ccg.generate_group_report(report_data, filter_description)
print(explanation)

# Group Comparison Report

## Introduction

This report provides a detailed comparison of data records filtered by the product code "H" across different city groups. The dataset consists of 296 records, representing 40% of the overall dataset. The focus is on the top four city groups based on record count: Hilltop, Lakeside, Riverside, and Seaside. Each city group is analyzed in terms of various attributes and their changes over different time periods.

## City Groups Overview

### Hilltop

- **Total Records**: 42 (Rank 1)
- **Key Observations**:
  - The most common issue is "description_issue:false" with 33 occurrences (Rank 1).
  - "Delivery_issue:true" and "quality_issue:true" both have 16 occurrences, indicating some level of concern in these areas.
  - Over the period from 2020-H1 to 2023-H2, there is a notable increase in "product_code:H" from 2 to 34 (Delta +19).
  - "Delivery_issue:false" saw a significant increase in 2023-H2 with 21 occurrences (Delta +18).

### Lakeside

- **