# Synthetic Attribute Data Generator
*Author: Lennart Ebert (mail@lennart-ebert.de)*

The synthetic attribute data generator adds drifting attribute data to an XES-formatted event log.

The user supplies:
1. an XES file path,
2. a list of change points for this file,
3. the number of relevant attributes to generate,
4. the number of irrelevant attributes to generate.

Optionally, the user can provide:
- the type of attribute distribution change (new value, new distribution, or a mix of both),
- the drift type (sudden or recurring),
- the standard deviation of the distance between the attribute change point and the given change point.
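
To make the optional settings more concrete, here is a minimal sketch of how a single drifting attribute could be sampled. It is not the code in `processdrift`; the function `sample_drifting_attribute` and its parameters are hypothetical, and it assumes that "new value" means a value with probability 0 before the change, that "new distribution" means the same values with different probabilities, and that the attribute change point is drawn from a normal distribution around the given change point.

```python
import numpy as np

def sample_drifting_attribute(n_traces, change_point, before, after, values,
                              sd_offset=0.0, seed=0):
    """Draw one categorical value per trace; the distribution switches once.

    Hypothetical helper for illustration only, not part of processdrift.
    """
    rng = np.random.default_rng(seed)
    # Assumption: the attribute change point is the given change point plus
    # Gaussian noise with standard deviation sd_offset (0 means no shift).
    shifted = int(round(rng.normal(change_point, sd_offset)))
    shifted = min(max(shifted, 0), n_traces)  # keep the index inside the log
    pre = rng.choice(values, size=shifted, p=before)
    post = rng.choice(values, size=n_traces - shifted, p=after)
    return shifted, np.concatenate([pre, post])

values = ['a', 'b', 'c']

# "new value": 'c' never occurs before the change and appears afterwards
cp_value, new_value = sample_drifting_attribute(
    2500, 250, [0.5, 0.5, 0.0], [0.4, 0.3, 0.3], values)

# "new distribution": same values, different probabilities
cp_dist, new_dist = sample_drifting_attribute(
    2500, 250, [0.2, 0.3, 0.5], [0.6, 0.3, 0.1], values)
```

With `sd_offset=0.0` the attribute changes exactly at the given change point; a larger value moves the attribute change point away from it.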

In [5]:
from processdrift import attribute_generation
import helper

In [6]:
# select a dataset to augment
input_file_path = 'data\\synthetic\\maardji et al 2013_xes\\logs\\cb\\cb2.5k.xes' # path to event log to which the attributes are added
output_file_path = 'data\\synthetic\\maardji et al 2013_xes_attributes\\logs\\cb\\cb2.5k.xes' # output path

count_relevant_attributes = 5 # number of relevant attributes to generate
count_irrelevant_attributes = 5 # number of irrelevant attributes to generate
number_attribute_values = 3 # number of attribute values per attribute
drift_type = 'sudden' # type of drift: 'sudden' or 'recurring'
distribution_change_type = 'mixed' # 'new_value', 'new_distribution', or 'mixed'; how the distributions change
sd_offset_explain_change_point = 0 # standard deviation of the offset between the attribute change point and the given change point
min_hellinger_distance = 0.3 # minimum Hellinger distance between drifted distributions

change_points = helper.get_change_points_maardji_et_al_2013(2500)

In [7]:
change_points

[250, 500, 750, 1000, 1250, 1500, 1750, 2000, 2250]
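
The change points above fall at every tenth of the 2,500-trace log. The implementation of `helper.get_change_points_maardji_et_al_2013` is not shown here, so the following is only a sketch that reproduces the same list under the assumption of ten equally sized segments; `evenly_spaced_change_points` is a hypothetical name.

```python
def evenly_spaced_change_points(n_traces, n_segments=10):
    """Return the trace indices that split a log into equally sized segments.

    Hypothetical stand-in for the helper used above, for illustration only.
    """
    step = n_traces // n_segments
    # One change point at each boundary between two consecutive segments
    return [step * i for i in range(1, n_segments)]

print(evenly_spaced_change_points(2500))
```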

In [8]:
# create the attribute generator:
ag = attribute_generation.create_and_populate_attribute_generator(
    change_points=change_points,
    count_relevant_attributes=count_relevant_attributes,
    count_irrelevant_attributes=count_irrelevant_attributes,
    number_attribute_values=number_attribute_values,
    drift_type=drift_type,
    distribution_change_type=distribution_change_type,
    sd_offset_explain_change_point=sd_offset_explain_change_point,
    min_hellinger_distance=min_hellinger_distance
    )
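
The `min_hellinger_distance` parameter bounds how different the distributions before and after a drift must be. As a reference point, the standard Hellinger distance between two discrete distributions can be computed as sketched below. Whether the library uses exactly this formulation internally is an assumption, and `hellinger_distance` is a name chosen for this example.

```python
import numpy as np

def hellinger_distance(p, q):
    """Hellinger distance between two discrete probability distributions.

    Ranges from 0 (identical) to 1 (no value has probability in both).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

# A candidate pair would pass a threshold of 0.3 only if this value is at least 0.3
distance = hellinger_distance([0.5, 0.5, 0.0], [0.4, 0.3, 0.3])
print(distance >= 0.3)
```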

In [9]:
attribute_generation.apply_attribute_generator_to_log_file(ag, input_file_path, output_file_path)

Importance: DEBUG
Message: Start serializing log to XES.XML

Importance: DEBUG
Message: finished serializing log (7653.834228515625 msec.)



In [10]:
# print the change point explanations
print(ag.change_point_explanations)

[{'attribute_name': 'relevant_attribute_01', 'base_distribution': array([0.44331063, 0.55668937, 0.        ]), 'explain_change_point': 250, 'change_point': 250, 'drift_type': 'sudden'}, {'attribute_name': 'relevant_attribute_02', 'base_distribution': array([0.35697842, 0.38186855, 0.26115303]), 'explain_change_point': 500, 'change_point': 500, 'drift_type': 'sudden'}, {'attribute_name': 'relevant_attribute_03', 'base_distribution': array([0.34475699, 0.23294562, 0.42229738]), 'explain_change_point': 750, 'change_point': 750, 'drift_type': 'sudden'}, {'attribute_name': 'relevant_attribute_04', 'base_distribution': array([0.42229726, 0.39705523, 0.18064751]), 'explain_change_point': 1000, 'change_point': 1000, 'drift_type': 'sudden'}, {'attribute_name': 'relevant_attribute_05', 'base_distribution': array([0.07919391, 0.50411143, 0.41669466]), 'explain_change_point': 1250, 'change_point': 1250, 'drift_type': 'sudden'}]
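
The printed list is hard to scan. A minimal sketch for condensing it into one row per attribute is shown below; it assumes only the dictionary keys visible in the output above, reuses the first two entries as sample data, and `summarize_explanations` is a hypothetical helper rather than part of `processdrift`.

```python
import numpy as np

# First two entries copied from the output above
explanations = [
    {'attribute_name': 'relevant_attribute_01',
     'base_distribution': np.array([0.44331063, 0.55668937, 0.0]),
     'explain_change_point': 250, 'change_point': 250, 'drift_type': 'sudden'},
    {'attribute_name': 'relevant_attribute_02',
     'base_distribution': np.array([0.35697842, 0.38186855, 0.26115303]),
     'explain_change_point': 500, 'change_point': 500, 'drift_type': 'sudden'},
]

def summarize_explanations(explanations):
    """Return (name, given change point, attribute change point, offset, drift type)."""
    rows = []
    for e in explanations:
        # Offset between the attribute change point and the change point it explains
        offset = e['change_point'] - e['explain_change_point']
        rows.append((e['attribute_name'], e['explain_change_point'],
                     e['change_point'], offset, e['drift_type']))
    return rows

for row in summarize_explanations(explanations):
    print(row)
```

With `sd_offset_explain_change_point = 0`, as in this run, every offset is 0.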
