# Compare Case Groups

This notebook demonstrates how to use the Intelligence Toolkit library to compare groups of cases in a dataset.

See [readme](https://github.com/microsoft/intelligence-toolkit/blob/main/app/workflows/compare_case_groups/README.md) for more details.

In [1]:
import sys

sys.path.append("..")
import polars as pl
from toolkit.compare_case_groups.api import CompareCaseGroups



In [2]:
# Create the workflow object
import os
from toolkit.AI.openai_configuration import OpenAIConfiguration


ccg = CompareCaseGroups()

ai_configuration = OpenAIConfiguration(
    {
        "api_type": "OpenAI",
        "api_key": os.environ["OPENAI_API_KEY"],
        "model": "gpt-4o",
    }
)
ccg.set_ai_configuration(ai_configuration)

data_path = "../example_outputs/compare_case_groups/customer_complaints/customer_complaints_prepared.csv"
customer_cases = pl.read_csv(data_path)
print("Loaded data")

Loaded data
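
The configuration cell above reads `OPENAI_API_KEY` with `os.environ[...]`, which raises a bare `KeyError` when the variable is unset. A minimal sketch of a friendlier lookup follows; `require_env` is a hypothetical helper name, not part of the Intelligence Toolkit.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a descriptive error.

    Hypothetical helper for illustration; not part of the toolkit.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; export it before running the notebook."
        )
    return value
```

With this in place, the configuration dictionary could use `"api_key": require_env("OPENAI_API_KEY")` instead of indexing `os.environ` directly.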


In [3]:
filters = []
# To filter the data, list the available filter options with:
# ccg.get_filter_options(customer_cases)

groups = ["city"]
aggregates = [
    "product_code",
    "delivery_issue",
    "description_issue",
    "price_issue",
    "quality_issue",
    "service_issue",
]
temporal = "period"
print("Selected params for workflow")

Selected params for workflow
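
Before building the summary, it may help to confirm that every selected column exists in the loaded data. The sketch below is one way to do that with plain Python; `missing_columns` and the sample column lists are hypothetical and only illustrate the check.

```python
def missing_columns(available, requested):
    """Return the requested column names that are absent, preserving their order."""
    present = set(available)
    return [name for name in requested if name not in present]


# Hypothetical column lists for illustration; in the notebook the first
# argument would be customer_cases.columns.
available = ["city", "period", "product_code", "price_issue"]
requested = ["city", "period", "product_code", "quality_issue"]
missing = missing_columns(available, requested)
print(missing)
```

In this notebook the call would look like `missing_columns(customer_cases.columns, groups + aggregates + [temporal])`, and an empty list means all selected parameters are present.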


In [4]:
ccg.create_data_summary(
    customer_cases,
    filters,
    groups,
    aggregates,
    temporal,
)
print("Created data summary")

Created data summary


In [5]:
ccg.model_df.head()

| city | group_count | group_rank | attribute_value | attribute_count | attribute_rank | period_window | period_window_count | period_window_rank | period_window_delta |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| str | u32 | i32 | str | u32 | i32 | str | u32 | i32 | i32 |
| "Baytown" | 8 | 50 | "delivery_issue…" | 6 | 49 | "2020-H1" | 1 | 4 | 0 |
| "Baytown" | 8 | 50 | "delivery_issue…" | 2 | 56 | "2020-H1" | 0 | 3 | 0 |
| "Baytown" | 8 | 50 | "description_is…" | 5 | 56 | "2020-H1" | 1 | 4 | 0 |
| "Baytown" | 8 | 50 | "description_is…" | 3 | 41 | "2020-H1" | 0 | 3 | 0 |
| "Baytown" | 8 | 50 | "price_issue:fa…" | 5 | 56 | "2020-H1" | 1 | 4 | 0 |


In [6]:
ccg.get_summary_description()

'This table shows:\n- A summary of all **2769** data records with values for all grouping attributes\n- The **group_count** of records for all [**city**] groups, and corresponding **group_rank**\n- The **attribute_count** of each **attribute_value** for all [**city**] groups, and corresponding **attribute_rank**\n- The **period_window_count** of each **attribute_value** for each **period_window** for all [**city**] groups, and corresponding **period_window_rank**\n- The **period_window_delta**, or change in the **attribute_value_count** for successive **period_window** values, within each [**city**] group'
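
The column definitions in the description above can be illustrated on a toy dataset. The following is a rough sketch in plain Python, not the toolkit's implementation: the records are made up, and the ranking convention (rank 1 for the largest count, ties sharing a rank) is an assumption. The first window is given a delta of 0, matching the `2020-H1` rows shown earlier.

```python
from collections import Counter

# Hypothetical toy records: (city, attribute_value, period_window)
records = [
    ("Lakeside", "price_issue:true", "2020-H1"),
    ("Lakeside", "price_issue:true", "2020-H2"),
    ("Lakeside", "price_issue:true", "2020-H2"),
    ("Lakeside", "price_issue:false", "2020-H1"),
    ("Baytown", "price_issue:true", "2020-H1"),
]


def dense_rank(counts):
    """Rank keys by descending count; equal counts share a rank (assumed convention)."""
    ordered = sorted(set(counts.values()), reverse=True)
    position = {count: i + 1 for i, count in enumerate(ordered)}
    return {key: position[count] for key, count in counts.items()}


# group_count and group_rank: records per city, ranked across cities
group_count = Counter(city for city, _, _ in records)
group_rank = dense_rank(group_count)

# attribute_count: records per (city, attribute_value)
attribute_count = Counter((city, attr) for city, attr, _ in records)

# period_window_count: records per (city, attribute_value, period_window)
window_count = Counter(records)


def window_deltas(city, attr, windows):
    """Change in count between successive windows; the first window gets 0."""
    counts = [window_count[(city, attr, w)] for w in windows]
    return [0] + [later - earlier for earlier, later in zip(counts, counts[1:])]


windows = sorted({w for _, _, w in records})
deltas = window_deltas("Lakeside", "price_issue:true", windows)
```

Here `deltas` holds one value per period window, in chronological order, for a single attribute value within a single group.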

In [9]:
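# Select the groups to include in the report, either:
# - by group name, or
# - by the top n groups by rank.
# Only selected_groups is passed to get_report_data below;
# top_group_ranks is defined here but not used in this call.
selected_groups = ["Lakeside"]
top_group_ranks = 10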

report_data, filter_description = ccg.get_report_data(selected_groups=selected_groups)
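
The two selection modes described in the cell above (by group name, or by top n ranks) can be sketched on plain dictionaries. This is an illustration of the idea only, not the toolkit's logic; `select_rows` is a hypothetical function, and the rank given to Lakeside is a made-up value.

```python
def select_rows(rows, selected_groups=None, top_group_ranks=None):
    """Keep rows for explicitly named groups, or rows whose group_rank is within the top n.

    Hypothetical helper: named groups take precedence when both arguments are given.
    """
    if selected_groups:
        wanted = set(selected_groups)
        return [row for row in rows if row["city"] in wanted]
    if top_group_ranks:
        return [row for row in rows if row["group_rank"] <= top_group_ranks]
    return list(rows)


rows = [
    {"city": "Lakeside", "group_rank": 1},  # rank is a made-up value
    {"city": "Baytown", "group_rank": 50},
]
by_name = select_rows(rows, selected_groups=["Lakeside"])
by_rank = select_rows(rows, top_group_ranks=10)
```

Both calls keep only the Lakeside row in this toy example, since Baytown's rank of 50 falls outside the top 10.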

In [10]:
# Generates AI report on selected data
explanation = ccg.generate_group_report(report_data, filter_description)
print(explanation)

# Group Comparison Report: Lakeside

## Introduction

This report focuses on the dataset filtered to include only the city group "Lakeside." The dataset provides a comprehensive overview of various issues and product codes over different time periods, from the first half of 2020 to the second half of 2026. The analysis includes counts, ranks, and changes in these attributes over time.

## Data Summary

The dataset consists of 349 records for the city group "Lakeside." The analysis covers several attributes, including delivery issues, description issues, price issues, product codes, quality issues, and service issues. Each attribute is evaluated for its occurrence and rank within the group, as well as its changes over successive time periods.

## Key Findings

### Delivery Issues

- **False Delivery Issues**: The count of records without delivery issues is consistently high, with a peak in the second half of 2023 (124 records, rank 1). There is a significant increase in the first half o