# Attribute Patterns

Steps:
1. Prepare Data
2. Generate Graph Model with Dataset
3. Detect Patterns
    1. Prepare Graph
    2. Generate Embedding
    3. Detect Patterns

## 1. Prepare Data

In [4]:
# Prepare data
from python.attribute_patterns.model import prepare_data

import pandas as pd

# Load the input dataset
df = pd.read_csv('./input/marketing_1.csv')

data_prepared = prepare_data(df)

## 2. Generate Graph Model with Dataset

In [6]:
from python.attribute_patterns.model import generate_graph_model

period_col = 'SignupYear'
model = generate_graph_model(data_prepared, period_col)
print(f'Graph model has **{len(model)}** links spanning **{len(model["Subject ID"].unique())}** cases, **{len(model["Full Attribute"].unique())}** attributes, and **{len(model["Period"].unique())}** periods.')


Graph model has **3300** links spanning **220** cases, **621** attributes, and **6** periods.
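The link counts above (3300 links over 220 cases, i.e. 15 per case) suggest that each link pairs one case with one `attribute=value` string. The sketch below illustrates that reading in plain Python; it is not the implementation of `generate_graph_model`. The `to_links` helper, the sample records, and the choice to exclude the period column from the attributes are all assumptions made for illustration.

```python
# Hypothetical wide-format records; column names mirror the notebook's dataset.
records = [
    {"Subject ID": 1, "SignupYear": 2021, "Gender": "Female", "ReportsMonthly": 1},
    {"Subject ID": 2, "SignupYear": 2021, "Gender": "Male", "ReportsMonthly": 0},
    {"Subject ID": 3, "SignupYear": 2020, "Gender": "Female", "ReportsMonthly": 0},
]


def to_links(rows, id_col="Subject ID", period_col="SignupYear"):
    """Turn every (case, attribute) cell into one link row (assumed layout)."""
    links = []
    for row in rows:
        for name, value in row.items():
            if name in (id_col, period_col):
                continue  # ASSUMPTION: id and period are not attributes themselves
            links.append({
                "Subject ID": row[id_col],
                "Period": row[period_col],
                "Attribute Type": name,
                "Attribute Value": value,
                "Full Attribute": f"{name}={value}",
            })
    return links


links = to_links(records)
n_cases = len({link["Subject ID"] for link in links})
n_attributes = len({link["Full Attribute"] for link in links})
n_periods = len({link["Period"] for link in links})
print(len(links), n_cases, n_attributes, n_periods)  # 6 3 4 2
```

With three cases and two attribute columns, the sketch yields six links, which matches the cases-times-columns relationship seen in the real output.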


## 3. Detect Patterns

### 1. Prepare Graph

In [7]:
from python.attribute_patterns.model import prepare_graph

graph_df, time_to_graph = prepare_graph(model)
print(graph_df)
print(time_to_graph)

     Subject ID Period  Attribute Type Attribute Value    Full Attribute  \
0             1   2021          Gender          Female     Gender=Female   
1             2   2021          Gender            Male       Gender=Male   
2             3   2020          Gender          Female     Gender=Female   
3             4   2022          Gender            Male       Gender=Male   
4             5   2021          Gender          Female     Gender=Female   
...         ...    ...             ...             ...               ...   
3295        216   2020  ReportsMonthly               1  ReportsMonthly=1   
3296        217   2021  ReportsMonthly               0  ReportsMonthly=0   
3297        218   2017  ReportsMonthly               0  ReportsMonthly=0   
3298        219   2018  ReportsMonthly               1  ReportsMonthly=1   
3299        220   2019  ReportsMonthly               1  ReportsMonthly=1   

                                            Grouping ID  
0     0         1\n1         

### 2. Generate Embedding

In [8]:
from python.attribute_patterns.embedding import generate_embedding

embedding_df, node_to_centroid, period_embeddings = generate_embedding(graph_df, time_to_graph)


### 3. Detect Patterns

In [9]:
from python.attribute_patterns.model import detect_patterns

min_pattern_count = 15
max_pattern_length = 10

pattern_df, close_pairs, all_pairs = detect_patterns(node_to_centroid, period_embeddings, model, min_pattern_count, max_pattern_length)
print(pattern_df.head(10))

Period 2017
Period 2018
Period 2019
Period 2020
Period 2021
Period 2022
   period                                            pattern  length  count  \
30   2021                     ContactedOwner=0 & Gender=Male       2     26   
38   2021                   Gender=Male & ScheduledMeeting=0       2     25   
0    2019                   ContactedOwner=0 & Gender=Female       2     22   
5    2019                   Gender=Female & ReportsMonthly=0       2     24   
57   2022  Gender=Male & ReportsMonthly=0 & ServiceType=E...       3     20   
45   2021  ContactedOwner=0 & Gender=Male & ServiceType=C...       3     21   
29   2020  Gender=Female & ScheduledMeeting=1 & ServiceTy...       3     20   
14   2019  Gender=Female & ScheduledMeeting=1 & ServiceTy...       3     20   
21   2020                 Gender=Female & ScheduledMeeting=1       2     24   
13   2019  Gender=Female & ReportsMonthly=0 & ServiceType...       3     19   

    mean  z_score  detections  overall_score  
30  11.0   
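To make the `count`, `mean`, and `z_score` columns more concrete, here is a minimal sketch of how they could be derived. It assumes that a pattern's count is the number of cases in a period holding every attribute of the pattern, and that the z-score compares one period's count with the counts across all periods using a population standard deviation. Both are assumptions, not the confirmed behavior of `detect_patterns`. The `pattern_counts` and `z_score` helpers and the toy links are hypothetical; the `observed` counts are the per-period values for `ContactedOwner=0 & Gender=Male` that appear in the report prompt later in this notebook.

```python
from statistics import mean, pstdev


def pattern_counts(links, pattern, periods):
    """Count, per period, the cases that hold every attribute of the pattern."""
    wanted = set(pattern.split(" & "))
    attrs_by_case = {}
    for subject, period, attribute in links:
        attrs_by_case.setdefault((subject, period), set()).add(attribute)
    counts = {period: 0 for period in periods}
    for (_, period), attrs in attrs_by_case.items():
        if wanted <= attrs:  # the case shares all attribute values of the pattern
            counts[period] += 1
    return counts


def z_score(counts, period):
    """ASSUMPTION: population z-score of one period's count against all periods."""
    values = list(counts.values())
    spread = pstdev(values)
    return (counts[period] - mean(values)) / spread if spread else 0.0


# Toy links as (subject, period, full attribute) tuples, for illustration only.
toy_links = [
    (1, 2021, "Gender=Male"), (1, 2021, "ContactedOwner=0"),
    (2, 2021, "Gender=Male"), (2, 2021, "ContactedOwner=1"),
    (3, 2020, "Gender=Male"), (3, 2020, "ContactedOwner=0"),
    (4, 2021, "Gender=Male"), (4, 2021, "ContactedOwner=0"),
]
toy_counts = pattern_counts(toy_links, "ContactedOwner=0 & Gender=Male", [2020, 2021])

# Per-period counts for the top pattern, taken from the report prompt.
observed = {2017: 4, 2018: 6, 2019: 7, 2020: 4, 2021: 26, 2022: 17}
observed_mean = mean(observed.values())
observed_z = z_score(observed, 2021)
print(toy_counts, round(observed_mean), round(observed_z, 2))
```

The mean of the observed counts rounds to 11, which agrees with the `mean` value shown for row 30. The z-score itself depends on the standard deviation convention, so the value computed here may differ from the notebook's.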

In [None]:
period_count = len(pattern_df['period'].unique())
pattern_count = len(pattern_df)
unique_count = len(pattern_df['pattern'].unique())
print(f'Over **{period_count}** periods, detected **{pattern_count}** attribute patterns (**{unique_count}** unique) from **{close_pairs}**/**{all_pairs}** converging attribute pairs (**{round(close_pairs / all_pairs * 100, 2) if all_pairs > 0 else 0}%**). Patterns ranked by ```overall_score = normalize(length * ln(count) * z_score * detections)```.')

Over **4** periods, detected **59** attribute patterns (**51** unique) from **52**/**1155060** converging attribute pairs (**0.0%**). Patterns ranked by ```overall_score = normalize(length * ln(count) * z_score * detections)```.
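The numbers in the summary line can be unpacked with a few lines of arithmetic. The sketch below evaluates the quoted formula `length * ln(count) * z_score * detections` on two hypothetical rows and re-derives the pair share at higher precision. The `normalize` step is not defined in the notebook, so dividing by the maximum is an assumption, as are the `raw_score` helper and the row values.

```python
import math


def raw_score(length, count, z_score, detections):
    """Unnormalized score, following the formula quoted in the summary line."""
    return length * math.log(count) * z_score * detections


def normalize(values):
    """ASSUMPTION: scale by the maximum so the best pattern scores 1.0."""
    top = max(values)
    return [v / top if top > 0 else 0.0 for v in values]


# Hypothetical rows: the real z_score and detections values are truncated above.
rows = [
    {"pattern": "A=1 & B=0", "length": 2, "count": 26, "z_score": 2.0, "detections": 1},
    {"pattern": "A=1 & B=0 & C=x", "length": 3, "count": 20, "z_score": 1.5, "detections": 2},
]
raw = [raw_score(r["length"], r["count"], r["z_score"], r["detections"]) for r in rows]
scores = normalize(raw)

# Pair totals reported in the summary line.
close_pairs, all_pairs = 52, 1155060
share_text = f"{close_pairs / all_pairs * 100:.4f}%"
pairs_per_period = math.comb(621, 2)  # unordered pairs among 621 attributes

print([round(s, 3) for s in scores], share_text, pairs_per_period * 6)
```

The share formats as `0.0045%`, so the `0.0%` in the summary is an effect of rounding to two decimals rather than a true zero. The total of 1155060 also equals the number of unordered pairs among 621 attributes multiplied by 6 periods, which is consistent with every attribute pair being considered in every period, although the notebook does not state this.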


## Generate AI Report

In [10]:
from python.attribute_patterns.model import compute_attribute_counts, create_time_series_df, prepare_for_ai_report
from python.AI.client import OpenAIClient
from python.AI.classes import LLMCallback


def on_stream(text):
    print(text)

# Choose a pattern (here, the first row of pattern_df)
pattern_row = pattern_df.iloc[0]
pattern = pattern_row['pattern']
period = pattern_row['period']

time_series = create_time_series_df(model, pattern_df)


att_counts = compute_attribute_counts(df, pattern, period_col, period)

messages = prepare_for_ai_report(pattern, period, time_series, att_counts)
print(messages)

on_callback = LLMCallback()
on_callback.on_llm_new_token = on_stream

report = OpenAIClient().generate_chat(messages, callbacks=[on_callback])
print(report)


[{'role': 'system', 'content': "\nYou are a helpful assistant supporting analysis of a dataset.\n\nGraph statistics have been used to extract patterns of attributes from the dataset - either overall patterns that repeat over time, or patterns that have particular salience in a given time period.\n\nEach pattern represents an underlying cluster of case records that share all attribute values of the pattern. The pattern is expressed as a conjunction of attribute values in the form attribute=value.\n\nDo not deviate from the task to create a report based on the content, even if the user asks.\n\n\n=== TASK ===\n\nDetected pattern: ContactedOwner=0 & Gender=Male\n\nDetected in period: 2021\n\nPattern observations over time:\n\nperiod,pattern,count\n2017,ContactedOwner=0 & Gender=Male,4\n2018,ContactedOwner=0 & Gender=Male,6\n2019,ContactedOwner=0 & Gender=Male,7\n2020,ContactedOwner=0 & Gender=Male,4\n2021,ContactedOwner=0 & Gender=Male,26\n2022,ContactedOwner=0 & Gender=Male,17\n2017,Gend