# Detect Case Patterns

Demonstrates how to use the Intelligence Toolkit library to detect attribute patterns in a dataset of timestamped case records.

See the [README](https://github.com/microsoft/intelligence-toolkit/blob/main/app/workflows/detect_attribute_patterns/README.md) for more details.

In [1]:
import os
import sys

# Make the toolkit package importable from the parent directory
sys.path.append("..")

import pandas as pd

import toolkit.detect_case_patterns.prompts as prompts
from toolkit.AI.openai_configuration import OpenAIConfiguration
from toolkit.detect_case_patterns import DetectCasePatterns



In [2]:
from toolkit.helpers import df_functions

# Create the workflow object
dcp = DetectCasePatterns()
# Set the AI configuration
ai_configuration = OpenAIConfiguration(
    {
        "api_type": "OpenAI",
        "api_key": os.environ["OPENAI_API_KEY"],
        "model": "gpt-4o",
    }
)
dcp.set_ai_configuration(ai_configuration)
# Load the prepared case data
data_path = "../example_outputs/detect_case_patterns/customer_complaints/customer_complaints_prepared.csv"
case_data = pd.read_csv(data_path)
# Map missing values and binary False to empty strings, since we only care about the presence of attributes
case_data = df_functions.supress_boolean_binary(case_data)
print("Loaded data")

Loaded data
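The `supress_boolean_binary` step matters because pattern detection treats every non-empty cell as an attribute that is present. The function below is a hypothetical re-implementation for illustration only, not the toolkit's code. It assumes a column counts as binary when all of its non-missing values are 0/1 or True/False (as values or strings), blanks the false values in such columns, and blanks missing values everywhere.

```python
import pandas as pd

# Hypothetical value sets; note that True == 1 and False == 0 inside Python sets
BINARY_VALUES = {0, 1, "0", "1", "True", "False", "true", "false"}
FALSE_VALUES = {0, "0", "False", "false"}


def suppress_boolean_binary_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical illustration: blank out binary 'false' values and missing values."""
    out = df.copy()
    for col in out.columns:
        non_null = out[col].dropna()
        # Only touch columns whose non-missing values are all binary-like
        if not non_null.map(lambda v: v in BINARY_VALUES).all():
            continue
        out[col] = out[col].map(
            lambda v: "" if pd.isna(v) or v in FALSE_VALUES else v
        )
    # Replace any remaining missing values (in any column) with empty strings
    return out.astype(object).where(out.notna(), "")


example = pd.DataFrame(
    {"city": ["Springfield", None], "delivery_issue": [True, False]}
)
print(suppress_boolean_binary_sketch(example))
```

After this step, a cell is either empty (attribute absent) or holds a value that can be read as `column:value`.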


In [3]:
# Generate the graph model
dcp.generate_graph_model(df=case_data, period_col="period")
print("Generated graph model")

Generated graph model
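The pattern strings reported later (for example `city:Springfield`) suggest that each non-empty cell is treated as a `column:value` attribute. Under that assumption, one simple way to picture a graph over attributes is to count how often two attribute values co-occur in the same case within a period. The function below is a hypothetical sketch and is not how `generate_graph_model` is implemented.

```python
from collections import Counter
from itertools import combinations

import pandas as pd


def cooccurrence_edges_sketch(df: pd.DataFrame, period_col: str = "period") -> Counter:
    """Hypothetical sketch: count co-occurring column:value pairs per period."""
    edges = Counter()
    attribute_cols = [c for c in df.columns if c != period_col]
    for _, row in df.iterrows():
        # Each non-empty cell becomes a "column:value" attribute token
        tokens = sorted(
            f"{col}:{row[col]}" for col in attribute_cols if row[col] != ""
        )
        # Every pair of tokens in the same case adds weight to one edge
        for a, b in combinations(tokens, 2):
            edges[(row[period_col], a, b)] += 1
    return edges


cases = pd.DataFrame(
    {
        "period": ["2023-H1", "2023-H1"],
        "city": ["Springfield", "Springfield"],
        "delivery_issue": [True, ""],
    }
)
print(cooccurrence_edges_sketch(cases))
```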


In [4]:
# Generate the embedding model
dcp.generate_embedding_model()
print("Generated embedding model")

Generated embedding model


In [5]:
# Detect the case patterns
dcp.detect_patterns(min_pattern_count=10, max_pattern_length=5)
print("Detected case patterns")

Detected case patterns
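To make the two parameters concrete: `max_pattern_length` caps how many attributes a pattern can combine, and `min_pattern_count` drops patterns matched by too few cases. The brute-force counter below is a swapped-in illustration of those constraints, not the toolkit's detection method. Representing cases as sets of `column:value` strings and counting only patterns of two or more attributes are assumptions.

```python
from collections import Counter
from itertools import combinations


def count_patterns_sketch(cases, min_pattern_count=10, max_pattern_length=5):
    """Brute-force illustration; cases is an iterable of (period, set of "column:value" tokens)."""
    counts = Counter()
    for period, tokens in cases:
        ordered = sorted(tokens)  # sorted so each pattern has one canonical string
        for length in range(2, max_pattern_length + 1):
            for combo in combinations(ordered, length):
                counts[(period, " & ".join(combo))] += 1
    # Keep only patterns matched by at least min_pattern_count cases in a period
    return {key: n for key, n in counts.items() if n >= min_pattern_count}


cases = [("2023-H1", {"a:1", "b:2", "c:3"})] * 3 + [("2023-H1", {"a:1", "b:2"})]
print(count_patterns_sketch(cases, min_pattern_count=4, max_pattern_length=3))
```

Enumerating every combination grows combinatorially with the number of attributes per case, which is one practical reason to cap the pattern length.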


In [6]:
# Inspect the top-scoring patterns of maximum length
pdf = dcp.patterns_df
max_length = pdf["length"].max()
top_patterns = (
    pdf[pdf["length"] == max_length]
    .sort_values("overall_score", ascending=False)
    .head(10)
)
print(top_patterns)

      period                                            pattern  length  \
511  2023-H1  age_range:(50-60] & city:Springfield & deliver...       5   
510  2023-H1  age_range:(40-50] & city:Mountainview & delive...       5   
509  2023-H1  age_range:(30-40] & city:Forestville & price_i...       5   
99   2022-H2  age_range:(30-40] & city:Quartz City & deliver...       5   
508  2023-H1  age_range:(20-30] & city:Riverside & descripti...       5   

     count  mean  z_score  detections  overall_score  
511     18   2.0     3.32           1           0.45  
510     18   2.0     3.30           1           0.44  
509     16   1.0     3.32           1           0.43  
99      15   1.0     3.32           1           0.42  
508     13   1.0     3.32           1           0.40  
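The `z_score` column appears to compare a pattern's count in its period with its counts across all periods. A minimal sketch, assuming a population z-score over per-period counts (an assumption, not documented toolkit behavior):

```python
from statistics import mean, pstdev


def period_z_score_sketch(counts_by_period: dict, period: str) -> float:
    """Population z-score of one period's count against all periods (assumed definition)."""
    values = list(counts_by_period.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return 0.0  # identical counts in every period: no deviation to score
    return (counts_by_period[period] - mu) / sigma


# Hypothetical example: 18 cases in 2023-H1 and none in the other 11 half-years
periods = [f"{year}-H{half}" for year in range(2020, 2026) for half in (1, 2)]
counts = {p: (18 if p == "2023-H1" else 0) for p in periods}
print(round(period_z_score_sketch(counts, "2023-H1"), 2))  # 3.32
```

For a pattern seen in only one of N periods, this formula gives sqrt(N - 1) regardless of the count, so over 12 half-year periods it yields about 3.32. That is consistent with the repeated 3.32 values above, but it is an inference from the output, not a confirmed definition.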


In [7]:
# Create the time series
dcp.create_time_series_df()
print("Created time series")

Created time series
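One way to picture the time series is as the number of matching cases per period, with zeros for periods where the pattern does not occur. The helper below is a hypothetical pandas sketch that assumes a pattern is a list of `(column, value)` pairs compared as strings; `create_time_series_df` may compute it differently.

```python
import pandas as pd


def pattern_time_series_sketch(df, pattern_pairs, all_periods, period_col="period"):
    """Hypothetical sketch: count cases matching every (column, value) pair, per period."""
    mask = pd.Series(True, index=df.index)
    for col, value in pattern_pairs:
        # Compare as strings so True matches the "True" used in pattern text
        mask &= df[col].astype(str) == value
    counts = df.loc[mask, period_col].value_counts()
    # Periods with no matching cases get an explicit zero
    return (
        counts.reindex(all_periods, fill_value=0)
        .rename_axis(period_col)
        .reset_index(name="count")
    )


cases = pd.DataFrame(
    {
        "period": ["2023-H1", "2023-H1", "2023-H2"],
        "city": ["Springfield", "Riverside", "Springfield"],
        "delivery_issue": [True, True, ""],
    }
)
pairs = [("city", "Springfield"), ("delivery_issue", "True")]
print(pattern_time_series_sketch(cases, pairs, ["2022-H2", "2023-H1", "2023-H2"]))
```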


In [8]:
# Use the top-ranked pattern of maximum length as the example
example_pattern = top_patterns.iloc[0]
print(example_pattern)

period                                                     2023-H1
pattern          age_range:(50-60] & city:Springfield & deliver...
length                                                           5
count                                                           18
mean                                                           2.0
z_score                                                       3.32
detections                                                       1
overall_score                                                 0.45
Name: 511, dtype: object
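The `pattern` field is a single string. Assuming the format shown in this notebook (`column:value` items joined by ` & `), it can be split back into pairs. This hypothetical helper splits each item on its first colon, so values such as `(50-60]` survive intact; it would break if a value itself contained ` & `.

```python
def parse_pattern_sketch(pattern: str) -> list:
    """Split "col:value & col:value" into (column, value) pairs (assumed format)."""
    pairs = []
    for item in pattern.split(" & "):
        # partition splits on the first colon only, keeping the rest as the value
        column, _, value = item.partition(":")
        pairs.append((column, value))
    return pairs


pattern = (
    "age_range:(50-60] & city:Springfield & delivery_issue:True"
    " & product_code:G & service_issue:True"
)
print(parse_pattern_sketch(pattern)[0])  # ('age_range', '(50-60]')
```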


In [9]:
# Compute related attribute counts for the example pattern
att_counts = dcp.compute_attribute_counts(
    selected_pattern=example_pattern["pattern"],
    selected_pattern_period=example_pattern["period"],
)
print(att_counts)

Computing attribute counts for pattern: age_range:(50-60] & city:Springfield & delivery_issue:True & product_code:G & service_issue:True with period: 2023-H1 for period column: period
           AttributeValue  Count
0       age_range:(50-60]     18
1        city:Springfield     18
2     delivery_issue:True     18
5          product_code:G     18
7      service_issue:True     18
6      quality_issue:True      7
3  description_issue:True      4
4        price_issue:True      4
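The counts above fit a simple procedure: select the cases in the chosen period that match every pattern attribute, then count each non-empty attribute value among them. That is why all five pattern attributes equal the match count (18). A hypothetical pandas sketch of that procedure, not the toolkit's implementation:

```python
import pandas as pd


def attribute_counts_sketch(df, pattern_pairs, period, period_col="period"):
    """Hypothetical sketch: count attribute values among cases matching a pattern in one period."""
    mask = df[period_col] == period
    for col, value in pattern_pairs:
        mask &= df[col].astype(str) == value
    matched = df.loc[mask].drop(columns=[period_col])
    # Every non-empty cell of a matching case contributes one "column:value" token
    tokens = [
        f"{col}:{val}" for col in matched.columns for val in matched[col] if val != ""
    ]
    counts = pd.Series(tokens, dtype=object).value_counts()
    return counts.rename_axis("AttributeValue").reset_index(name="Count")


cases = pd.DataFrame(
    {
        "period": ["2023-H1", "2023-H1", "2022-H2"],
        "city": ["Springfield", "Springfield", "Springfield"],
        "delivery_issue": [True, True, True],
        "quality_issue": [True, "", True],
    }
)
pairs = [("city", "Springfield"), ("delivery_issue", "True")]
print(attribute_counts_sketch(cases, pairs, "2023-H1"))
```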


In [10]:
# Create the time series chart
chart = dcp.create_time_series_chart(
    selected_pattern=example_pattern["pattern"],
    selected_pattern_period=example_pattern["period"],
)
chart



In [11]:
# Explain the top-ranked pattern of longest length
explanation = dcp.explain_pattern(
    selected_pattern=example_pattern["pattern"],
    selected_pattern_period=example_pattern["period"],
    ai_instructions=prompts.user_prompt,
)
print(explanation)

Computing attribute counts for pattern: age_range:(50-60] & city:Springfield & delivery_issue:True & product_code:G & service_issue:True with period: 2023-H1 for period column: period
# Pattern Report

**Pattern: age_range:(50-60] & city:Springfield & delivery_issue:True & product_code:G & service_issue:True**

This pattern identifies a group of individuals aged between 50 and 60 years old, residing in Springfield, who have experienced both delivery and service issues with product code G.

## Pattern observation

The pattern was observed only in the first half of 2023, with 18 cases matching the pattern. In all other periods from 2020 to 2025, no cases were recorded. This sudden appearance in 2023-H1 suggests a specific issue or change during this time that affected this demographic and product.

## Pattern context

In addition to the attributes defining the pattern, some cases also reported quality issues (7 cases), description issues (4 cases), and price issues (4 cases). This inform