# Compare Case Groups

Demonstrates use of the Intelligence Toolkit library to compare groups in a dataset.

See the [README](https://github.com/microsoft/intelligence-toolkit/blob/main/app/workflows/compare_case_groups/README.md) for more details.

In [1]:
import sys

sys.path.append("..")
import polars as pl
from toolkit.compare_case_groups.api import CompareCaseGroups



In [2]:
import os

from toolkit.AI.openai_configuration import OpenAIConfiguration

# Create the workflow object
ccg = CompareCaseGroups()

# Configure the model; the API key is read from the OPENAI_API_KEY environment variable
ai_configuration = OpenAIConfiguration(
    {
        "api_type": "OpenAI",
        "api_key": os.environ["OPENAI_API_KEY"],
        "model": "gpt-4o",
    }
)
ccg.set_ai_configuration(ai_configuration)

data_path = "../example_outputs/compare_case_groups/customer_complaints/customer_complaints_prepared.csv"
customer_cases = pl.read_csv(data_path)
print("Loaded data")

Loaded data
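
The cell above reads the key with `os.environ["OPENAI_API_KEY"]`, which raises a bare `KeyError` when the variable is missing. A minimal sketch of a clearer early check follows; the helper name `require_env` is hypothetical and not part of the toolkit.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message.

    Hypothetical helper for illustration; it is not part of Intelligence Toolkit.
    """
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "export it before running this notebook."
        )
    return value


# Usage (commented out so the sketch runs without a key):
# api_key = require_env("OPENAI_API_KEY")
```

The returned value could then be passed as the `"api_key"` entry of the configuration dictionary in place of the direct `os.environ` lookup.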


In [3]:
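# No filters: keep all records
filters = []
# Column(s) that define the groups to compare
groups = ["city"]
# Attribute columns whose values are counted within each group
aggregates = [
    "product_code",
    "delivery_issue",
    "description_issue",
    "price_issue",
    "quality_issue",
    "service_issue",
]
# Column holding the time window used for period-over-period comparison
temporal = "period"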
print("Selected params for workflow")

Selected params for workflow


In [4]:
ccg.create_data_summary(
    customer_cases,
    filters,
    groups,
    aggregates,
    temporal,
)
print("Created data summary")

Created data summary


In [5]:
ccg.model_df.head()

| city | group_count | group_rank | attribute_value | attribute_count | attribute_rank | period_window | period_window_count | period_window_rank | period_window_delta |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| str | u32 | i32 | str | u32 | i32 | str | u32 | i32 | i32 |
| "Baytown" | 8 | 50 | "delivery_issue… | 6 | 49 | "2020-H1" | 1 | 4 | 0 |
| "Baytown" | 8 | 50 | "description_is… | 5 | 56 | "2020-H1" | 1 | 4 | 0 |
| "Baytown" | 8 | 50 | "price_issue:fa… | 5 | 56 | "2020-H1" | 1 | 4 | 0 |
| "Baytown" | 8 | 50 | "product_code:G… | 1 | 44 | "2020-H1" | 1 | 3 | 0 |
| "Baytown" | 8 | 50 | "quality_issue:… | 6 | 51 | "2020-H1" | 1 | 4 | 0 |
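
The `group_count` and `group_rank` columns can be read as the number of records per group and the rank of that count among all groups. The following is a minimal sketch in plain Python, independent of the toolkit. The function name `count_and_rank` and the sample city labels are hypothetical, and dense ranking of ties is an assumption, not confirmed behavior of `CompareCaseGroups`.

```python
from collections import Counter


def count_and_rank(values):
    """Map each distinct value to (count, rank), where rank 1 is the largest count.

    Assumption: ties share a rank (dense ranking). The toolkit may rank differently.
    """
    counts = Counter(values)
    # Distinct counts in descending order; position + 1 is the dense rank
    distinct_counts = sorted(set(counts.values()), reverse=True)
    rank_of = {count: i + 1 for i, count in enumerate(distinct_counts)}
    return {value: (count, rank_of[count]) for value, count in counts.items()}


# Made-up records, one city label per record
cities = ["A", "A", "A", "B", "B", "C", "C", "D"]
print(count_and_rank(cities))
# {'A': (3, 1), 'B': (2, 2), 'C': (2, 2), 'D': (1, 3)}
```

The same idea applies to `attribute_count` and `attribute_rank`, with attribute values counted instead of group labels.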


In [6]:
ccg.get_summary_description()

'This table shows:\n- A summary of all **2769** data records with values for all grouping attributes\n- The **group_count** of records for all [**city**] groups, and corresponding **group_rank**\n- The **attribute_count** of each **attribute_value** for all [**city**] groups, and corresponding **attribute_rank**\n- The **period_window_count** of each **attribute_value** for each **period_window** for all [**city**] groups, and corresponding **period_window_rank**\n- The **period_window_delta**, or change in the **attribute_value_count** for successive **period_window** values, within each [**city**] group'
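
The description defines `period_window_delta` as the change in count between successive `period_window` values within a group. The following is a minimal sketch of that calculation in plain Python. The function name `window_deltas` and the sample counts are hypothetical, and giving the first window a delta of 0 is an assumption based on the rows shown earlier, not confirmed toolkit behavior.

```python
def window_deltas(counts_by_window):
    """Return {window: delta}, the change in count from the preceding window.

    Assumptions: window labels such as "2020-H1" sort chronologically as
    strings, and the first window has a delta of 0.
    """
    deltas = {}
    previous = None
    for window in sorted(counts_by_window):
        count = counts_by_window[window]
        deltas[window] = 0 if previous is None else count - previous
        previous = count
    return deltas


# Made-up counts of one attribute value within one group
counts = {"2020-H1": 1, "2020-H2": 4, "2021-H1": 2}
print(window_deltas(counts))
# {'2020-H1': 0, '2020-H2': 3, '2021-H1': -2}
```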

In [7]:
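# Select the groups to report on, in one of two ways:
# 1. By group name (defined here for illustration; not passed to the call below)
groups = ["Baytown", "Brookside"]
# 2. By the top n groups, ranked by record count
top_groups = 4

report_data, filter_description = ccg.get_report_data(top_group_ranks=top_groups)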
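
The two selection modes in the cell above can be pictured as two filters over the summary rows: one on the group name and one on `group_rank`. The following is a minimal sketch in plain Python. The function `select_groups` and its parameters are hypothetical and do not reflect the signature of `get_report_data`; the sample rows reuse city counts and ranks that appear in this notebook's outputs.

```python
def select_groups(rows, names=None, top_n=None):
    """Keep rows for the named groups, or for groups ranked within the top n.

    Hypothetical helper for illustration; not the toolkit's implementation.
    """
    if names is not None:
        wanted = set(names)
        return [row for row in rows if row["city"] in wanted]
    if top_n is not None:
        return [row for row in rows if row["group_rank"] <= top_n]
    return list(rows)


rows = [
    {"city": "Lakeside", "group_count": 349, "group_rank": 1},
    {"city": "Springfield", "group_count": 265, "group_rank": 2},
    {"city": "Hilltop", "group_count": 259, "group_rank": 3},
    {"city": "Rivertown", "group_count": 204, "group_rank": 4},
    {"city": "Baytown", "group_count": 8, "group_rank": 50},
]

print([row["city"] for row in select_groups(rows, top_n=4)])
# ['Lakeside', 'Springfield', 'Hilltop', 'Rivertown']
print([row["city"] for row in select_groups(rows, names=["Baytown"])])
# ['Baytown']
```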

In [8]:
# Generate an AI report on the selected data
explanation = ccg.generate_group_report(report_data, filter_description)
print(explanation)

# Group Comparison Report

## Introduction

This report provides a detailed comparison of the top four city groups based on record count from a dataset containing 2769 records. The cities included in this analysis are Lakeside, Springfield, Hilltop, and Rivertown. The report examines various attributes and their changes over different time periods.

## City Groups Overview

The dataset is filtered to include the top four cities by record count:

1. **Lakeside**: 349 records
2. **Springfield**: 265 records
3. **Hilltop**: 259 records
4. **Rivertown**: 204 records

## Attribute Analysis

### Lakeside

- **Delivery Issues**: The majority of records indicate no delivery issues (274), ranking first in this category. There is a significant increase in records without delivery issues in 2023-H2 (124), a rise of 100 from the previous period.
- **Description Issues**: Most records show no description issues (285), also ranking first. The count increased significantly in 2023-H2 (131), up by 91.