# Compare Case Groups

This notebook demonstrates how to use the Intelligence Toolkit library to compare groups of case records in a dataset.

See the [README](https://github.com/microsoft/intelligence-toolkit/blob/main/app/workflows/compare_case_groups/README.md) for more details.

In [1]:
import sys

sys.path.append("..")
import polars as pl
from intelligence_toolkit.compare_case_groups.api import CompareCaseGroups



In [2]:
# Create the workflow object
import os
from intelligence_toolkit.helpers import df_functions
from intelligence_toolkit.AI.openai_configuration import OpenAIConfiguration
import pandas as pd

ccg = CompareCaseGroups()

ai_configuration = OpenAIConfiguration(
    {
        "api_type": "OpenAI",
        "api_key": os.environ["OPENAI_API_KEY"],
        "model": "gpt-4o",
    }
)
ccg.set_ai_configuration(ai_configuration)

data_path = "../example_outputs/compare_case_groups/customer_complaints/customer_complaints_prepared.csv"
customer_cases = pd.read_csv(data_path)
customer_cases = pl.from_pandas(df_functions.supress_boolean_binary(customer_cases))
print("Loaded data")

Loaded data
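
The configuration cell reads the key with `os.environ["OPENAI_API_KEY"]`, which raises a bare `KeyError` when the variable is unset. A minimal sketch of a friendlier check follows; `require_env` is a hypothetical helper written for this notebook, not part of Intelligence Toolkit.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a descriptive error.

    Hypothetical helper for illustration; not an Intelligence Toolkit function.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "export it before running the notebook."
        )
    return value
```

With this in place, the configuration could use `"api_key": require_env("OPENAI_API_KEY")` so that a missing key fails with a clear message before any request is made.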


In [3]:
filters = []
# To filter the data, inspect the available options with:
# ccg.get_filter_options(customer_cases)

groups = ["city"]
aggregates = [
    "product_code",
    "delivery_issue",
    "description_issue",
    "price_issue",
    "quality_issue",
    "service_issue",
]
temporal = "period"
print("Selected params for workflow")

Selected params for workflow
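
Because the grouping, aggregate, and temporal parameters are plain column names, a typo only surfaces later when the summary is built. The sketch below checks the names up front; `missing_columns` is a hypothetical helper, and the example lists are made up for illustration.

```python
def missing_columns(available, required):
    """Return the required column names that are absent, preserving order.

    Hypothetical helper for illustration; not an Intelligence Toolkit function.
    """
    available_set = set(available)
    return [name for name in required if name not in available_set]


# Made-up column lists standing in for the real DataFrame columns
example_columns = ["city", "period", "product_code", "price_issue"]
example_required = ["city", "product_code", "quality_issue", "period"]
print(missing_columns(example_columns, example_required))  # ['quality_issue']
```

In this notebook the call would be `missing_columns(customer_cases.columns, groups + aggregates + [temporal])`, and an empty list means every selected column is present.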


In [4]:
ccg.create_data_summary(
    customer_cases,
    filters,
    groups,
    aggregates,
    temporal,
)
print("Created data summary")

Created data summary


In [5]:
len(ccg.model_df)

9646

In [6]:
ccg.model_df.head()

| city | group_count | group_rank | attribute_value | attribute_count | attribute_rank | period_window | period_window_count | period_window_rank | period_window_delta |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| str | u32 | i32 | str | u32 | i32 | str | u32 | i32 | i32 |
| "Baytown" | 8 | 50 | "delivery_issue…" | 2 | 56 | "2020-H1" | 0 | 3 | 0 |
| "Baytown" | 8 | 50 | "description_is…" | 3 | 41 | "2020-H1" | 0 | 3 | 0 |
| "Baytown" | 8 | 50 | "price_issue:Tr…" | 3 | 50 | "2020-H1" | 0 | 2 | 0 |
| "Baytown" | 8 | 50 | "product_code:A…" | 1 | 38 | "2020-H1" | 0 | 2 | 0 |
| "Baytown" | 8 | 50 | "product_code:D…" | 2 | 27 | "2020-H1" | 0 | 2 | 0 |


In [7]:
ccg.get_summary_description()

'This table shows:\n- A summary of all **2769** data records with values for all grouping attributes\n- The **group_count** of records for all [**city**] groups, and corresponding **group_rank**\n- The **attribute_count** of each **attribute_value** for all [**city**] groups, and corresponding **attribute_rank**\n- The **period_window_count** of each **attribute_value** for each **period_window** for all [**city**] groups, and corresponding **period_window_rank**\n- The **period_window_delta**, or change in the **attribute_value_count** for successive **period_window** values, within each [**city**] group'
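
To make the column definitions in the description above concrete, here is a small sketch that computes the same kinds of counts, ranks, and deltas on toy records using only the standard library. It is not the library's implementation: the records are invented, and the tie handling in the rank and the treatment of the first period (compared against zero) are assumptions.

```python
from collections import Counter

# Toy records (invented data, not taken from the customer complaints file)
records = [
    {"city": "A", "period": "2020-H1", "price_issue": True},
    {"city": "A", "period": "2020-H2", "price_issue": True},
    {"city": "A", "period": "2020-H2", "price_issue": True},
    {"city": "B", "period": "2020-H1", "price_issue": True},
]

# group_count: number of records per city
group_count = Counter(r["city"] for r in records)

# group_rank: 1 for the largest group (assumption: tied groups share a rank)
distinct_sizes = sorted(set(group_count.values()), reverse=True)
group_rank = {g: distinct_sizes.index(c) + 1 for g, c in group_count.items()}

# period_window_count: count of one attribute value per (city, period)
window_count = Counter(
    (r["city"], r["period"]) for r in records if r["price_issue"]
)

# period_window_delta: change in count between successive periods in a city
# (assumption: the first period is compared against a count of zero)
periods = sorted({r["period"] for r in records})
window_delta = {}
for city in group_count:
    previous = 0
    for period in periods:
        current = window_count.get((city, period), 0)
        window_delta[(city, period)] = current - previous
        previous = current

print(dict(group_count))
print(group_rank)
print(window_delta)
```

Each row of `ccg.model_df` combines these quantities for one group, one attribute value, and one period window.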

In [8]:
# Select the groups to report on, either by group name:
selected_groups = [{"city": "Lakeside"}]
# or by taking the top n groups (the option used in this notebook):
top_group_ranks = 10

report_data, filter_description = ccg.get_report_data(top_group_ranks=top_group_ranks)

In [9]:
# Generate an AI report on the selected data
explanation = ccg.generate_group_report(report_data, filter_description)
print(explanation)

# Group Comparison Report

## Introduction

This report provides a detailed analysis of the top 10 city groups based on record count from a dataset of 2769 records. The dataset includes information on various issues and product codes across different time periods. The focus is on comparing these city groups in terms of the frequency and ranking of specific attributes and their changes over time.

## Data Summary

The dataset is filtered to include only the top 10 city groups by record count. These groups are:

1. Lakeside (349 records)
2. Springfield (265 records)
3. Hilltop (259 records)
4. Rivertown (204 records)
5. Riverside (184 records)
6. Seaside (127 records)
7. Mountainview (119 records)
8. Brookside (111 records)
9. Greenfield (104 records)
10. Meadowville (94 records)

## Key Findings

### Attribute Analysis

- **Delivery Issues**: 
  - Hilltop has the highest count of delivery issues with 104 records, ranking 1st among all groups.
  - Springfield follows with 88 records, ran